Default Component References
See also the John Snow Labs Modelhub and the John Snow Labs Model Repository for further information about the models and pipelines.
Each string in the nlp.load() Reference column can be passed to nlp.load() to get the corresponding model wrapped inside an NLP pipeline.
Language | nlp.load() Reference | Spark NLP Reference | Component Type |
---|---|---|---|
English | yake | yake | pipe |
English | xlnet | xlnet_base_cased | pipe |
English | use | tfhub_use | pipe |
English | toxic | multiclassifierdl_use_toxic | pipe |
English | tokenize | spark_nlp_tokenizer | pipe |
English | t5 | t5_base | pipe |
English | summarize | t5_base | pipe |
English | stopwords | stopwords_en | pipe |
English | stem | stemmer | pipe |
English | spell | spellcheck_dl | pipe |
English | spell.symmetric | spellcheck_sd | pipe |
English | spell.norivg | spellcheck_norvig | pipe |
English | spam | classifierdl_use_spam | pipe |
English | sentiment | sentimentdl_glove_imdb | pipe |
English | sentiment.vivekn | sentiment_vivekn | pipe |
English | sentiment.twitter | analyze_sentimentdl_use_twitter | model |
English | sentiment.twitter.use | analyze_sentimentdl_use_twitter | model |
English | sentiment.imdb | analyze_sentimentdl_use_imdb | model |
English | sentiment.imdb.use | analyze_sentimentdl_use_imdb | pipe |
English | sentiment.imdb.glove | sentimentdl_glove_imdb | pipe |
English | sentence_detector | sentence_detector_dl | pipe |
English | sentence_detector.pragmatic | pragmatic_sentence_detector | pipe |
English | sentence_detector.deep | sentence_detector_dl | model |
English | sarcasm | classifierdl_use_sarcasm | model |
English | questions | classifierdl_use_trec50 | model |
English | pos | pos_anc | model |
English | pos.ud_ewt | pos_ud_ewt | model |
English | pos.anc | pos_anc | model |
English | norm_document | normalizer | model |
English | norm | normalizer | model |
English | ngram | ngram | model |
English | ner | onto_recognize_entities_sm | model |
English | ner.onto | onto_recognize_entities_sm | model |
English | ner.onto.sm | onto_recognize_entities_sm | model |
English | ner.onto.glove.6B_300d | onto_300 | model |
English | ner.onto.glove.6B_100d | onto_100 | model |
English | ner.dl | recognize_entities_dl | model |
English | ner.dl.glove.6B_100d | ner_dl | model |
English | ner.dl.bert | ner_dl_bert | model |
English | ner.conll | recognize_entities_dl | model |
English | ner.bert | recognize_entities_bert | model |
English | match.chunks | match_chunks | model |
English | lemma | lemma_antbnc | model |
English | lemma.antbnc | lemma_antbnc | model |
English | lang | detect_language_375 | model |
English | grammar_correctness | t5_base | model |
English | glove | glove_100d | model |
English | explain | explain_document_ml | model |
English | explain.ml | explain_document_ml | model |
English | explain.dl | explain_document_dl | model |
English | emotion | classifierdl_use_emotion | model |
English | embed_sentence | tfhub_use | model |
English | embed_sentence.use_lg | tfhub_use_lg | model |
English | embed_sentence.use | tfhub_use | model |
English | embed_sentence.tfhub_use_lg | tfhub_use_lg | model |
English | embed_sentence.tfhub_use | tfhub_use | model |
English | embed_sentence.small_bert_L2_128 | sent_small_bert_L2_128 | model |
English | embed_sentence.electra | sent_electra_small_uncased | model |
English | embed_sentence.bert | sent_small_bert_L2_128 | model |
English | embed_chunk | chunk_embeddings | model |
English | embed | glove_100d | model |
English | embed.xlnet_large_cased | xlnet_large_cased | model |
English | embed.xlnet_base_cased | xlnet_base_cased | model |
English | embed.xlnet | xlnet_base_cased | model |
English | embed.glove | glove_100d | model |
English | embed.glove.840B_300 | glove_840B_300 | model |
English | embed.glove.100d | glove_100d | model |
English | embed.elmo | elmo | model |
English | embed.electra | electra_small_uncased | model |
English | embed.biobert_pubmed_pmc_base_cased | biobert_pubmed_pmc_base_cased | model |
English | embed.biobert_pubmed_large_cased | biobert_pubmed_large_cased | model |
English | embed.biobert_pubmed_base_cased | biobert_pubmed_base_cased | model |
English | embed.biobert_pmc_base_cased | biobert_pmc_base_cased | model |
English | embed.biobert_discharge_base_cased | biobert_discharge_base_cased | model |
English | embed.biobert_clinical_base_cased | biobert_clinical_base_cased | model |
English | embed.biobert | biobert_pubmed_base_cased | model |
English | embed.bert_large_uncased | bert_large_uncased | model |
English | embed.bert_large_cased | bert_large_cased | model |
English | embed.bert_base_uncased | bert_base_uncased | model |
English | embed.bert_base_cased | bert_base_cased | model |
English | embed.bert | bert_base_uncased | model |
English | embed.albert_xxlarge_uncased | albert_xxlarge_uncased | model |
English | embed.albert_xlarge_uncased | albert_xlarge_uncased | model |
English | embed.albert_large_uncased | albert_large_uncased | model |
English | embed.albert_base_uncased | albert_base_uncased | model |
English | elmo | elmo | model |
English | electra | electra_small_uncased | model |
English | e2e | multiclassifierdl_use_e2e | model |
English | dependency | dependency_conllu | model |
English | dep | dependency_typed_conllu | model |
English | dep.untyped | dependency_conllu | model |
English | dep.untyped.conllu | dependency_conllu | model |
English | dep.typed | dependency_typed_conllu | model |
English | dep.typed.conllu | dependency_typed_conllu | model |
English | cyberbullying | classifierdl_use_cyberbullying | model |
English | covidbert | covidbert_large_uncased | model |
English | clean.stop | clean_stop | model |
English | clean.slang | clean_slang | model |
English | classify | analyze_sentiment | model |
English | classify.trec6 | classifierdl_use_trec6 | model |
English | classify.trec6.use | classifierdl_use_trec6 | model |
English | classify.trec50 | classifierdl_use_trec50 | model |
English | classify.trec50.use | classifierdl_use_trec50 | model |
English | classify.spam | classifierdl_use_spam | model |
English | classify.spam.use | classifierdl_use_spam | model |
English | classify.sentiment_t5 | t5_base | model |
English | classify.sarcasm | classifierdl_use_sarcasm | model |
English | classify.sarcasm.use | classifierdl_use_sarcasm | model |
English | classify.questions | classifierdl_use_trec50 | model |
English | classify.lang | detect_language_375 | model |
English | classify.fakenews | classifierdl_use_fakenews | model |
English | classify.fakenews.use | classifierdl_use_fakenews | model |
English | classify.emotion | classifierdl_use_emotion | model |
English | classify.emotion.use | classifierdl_use_emotion | model |
English | classify.cyberbullying | classifierdl_use_cyberbullying | model |
English | classify.cyberbullying.use | classifierdl_use_cyberbullying | model |
English | chunk | default_chunker | model |
English | biobert | biobert_pubmed_base_cased | model |
English | bert | small_bert_L2_128 | model |
English | answer_question | t5_base | model |
English | albert | albert_base_uncased | model |
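The lookup described by the table above can be sketched in a few lines of Python. The dictionary holds a handful of rows copied from the table; `spark_nlp_reference` and `load_component` are hypothetical helper names introduced for illustration only. The sketch assumes the `johnsnowlabs` package is installed and exposes `nlp.load()`, so that import is deferred until a component is actually loaded.

```python
# A few rows copied from the table above:
# nlp.load() reference -> Spark NLP reference
DEFAULT_REFERENCES = {
    "sentiment": "sentimentdl_glove_imdb",
    "ner": "onto_recognize_entities_sm",
    "pos": "pos_anc",
    "lemma": "lemma_antbnc",
    "embed": "glove_100d",
}


def spark_nlp_reference(reference: str) -> str:
    """Return the Spark NLP reference listed for an nlp.load() reference."""
    try:
        return DEFAULT_REFERENCES[reference]
    except KeyError:
        # Only an excerpt of the table is stored in this sketch.
        raise KeyError(f"{reference!r} is not in this excerpt of the table") from None


def load_component(reference: str):
    """Pass a reference from the table to nlp.load().

    Assumption: the `johnsnowlabs` package is installed. The import is
    deferred so the plain lookup above works without it.
    """
    from johnsnowlabs import nlp

    return nlp.load(reference)
```

With the library installed, `load_component("sentiment")` would return the pipeline that wraps the model listed in the `sentiment` row; the lookup function alone needs no installation.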
Model references
| Language Name(s) | nlp.load() Reference | Spark NLP Reference |
|:---|:---|:---|
| Aequian | vn.answer_question.xlm_roberta.base | xlm_roberta_qa_xlm_roberta_base_vietnamese |
| Aequian | roberta | distilroberta_base |
| Church Slavic, Church Slavonic, Old Bulgarian, Old Church Slavonic, Old Slavonic | cu.pos | pos_proiel |
| Church Slavic, Church Slavonic, Old Bulgarian, Old Church Slavonic, Old Slavonic | cu.lemma | lemma_proiel |
| Church Slavic, Church Slavonic, Old Bulgarian, Old Church Slavonic, Old Slavonic | cu.lemma.proiel | lemma_proiel |
| Gothic | got.pos.proiel | pos_proiel |
| Gothic | got.lemma | lemma_proiel |
| Gothic | got.lemma.proiel | lemma_proiel |
| Latin | la.stopwords | stopwords_la |
| Latin | la.pos | pos_perseus |
| Latin | la.pos.udante | pos_udante |
| Latin | la.pos.proiel | pos_proiel |
| Latin | la.pos.perseus | pos_perseus |
| Latin | la.pos.llct | pos_llct |
| Latin | la.pos.ittb | pos_ittb |
| Latin | la.lemma | lemma_proiel |
| Latin | la.lemma.udante | lemma_udante |
| Latin | la.lemma.proiel | lemma_proiel |
| Latin | la.lemma.perseus | lemma_perseus |
| Latin | la.lemma.llct | lemma_llct |
| Latin | la.lemma.ittb | lemma_ittb |
| Sanskrit | sa.stopwords | stopwords_iso |
| Sanskrit | sa.pos | pos_vedic |
| Sanskrit | sa.lemma | lemma_vedic |
| Sanskrit | sa.embed.w2v_cc_300d | w2v_cc_300d |
| Esperanto | xx.eo.marian.translate_to.vi | opus_mt_vi_eo |
| Esperanto | xx.eo.marian.translate_to.tr | opus_mt_tr_eo |
| Esperanto | xx.eo.marian.translate_to.sv | opus_mt_sv_eo |
| Esperanto | xx.eo.marian.translate_to.sh | opus_mt_sh_eo |
| Esperanto | xx.eo.marian.translate_to.ru | opus_mt_ru_eo |
| Esperanto | xx.eo.marian.translate_to.ro | opus_mt_ro_eo |
| Esperanto | xx.eo.marian.translate_to.pt | opus_mt_pt_eo |
| Esperanto | xx.eo.marian.translate_to.pl | opus_mt_pl_eo |
| Esperanto | xx.eo.marian.translate_to.nl | opus_mt_nl_eo |
| Esperanto | xx.eo.marian.translate_to.lt | opus_mt_lt_eo |
| Esperanto | xx.eo.marian.translate_to.it | opus_mt_it_eo |
| Esperanto | xx.eo.marian.translate_to.is | opus_mt_is_eo |
| Esperanto | xx.eo.marian.translate_to.hu | opus_mt_hu_eo |
| Esperanto | xx.eo.marian.translate_to.he | opus_mt_he_eo |
| Esperanto | xx.eo.marian.translate_to.fr | opus_mt_fr_eo |
| Esperanto | xx.eo.marian.translate_to.fi | opus_mt_fi_eo |
| Esperanto | xx.eo.marian.translate_to.es | opus_mt_es_eo |
| Esperanto | xx.eo.marian.translate_to.en | opus_mt_eo_en |
| Esperanto | xx.eo.marian.translate_to.el | opus_mt_el_eo |
| Esperanto | xx.eo.marian.translate_to.de | opus_mt_de_eo |
| Esperanto | xx.eo.marian.translate_to.da | opus_mt_da_eo |
| Esperanto | xx.eo.marian.translate_to.cs | opus_mt_cs_eo |
| Esperanto | xx.eo.marian.translate_to.bg | opus_mt_bg_eo |
| Esperanto | xx.eo.marian.translate_to.ar | opus_mt_ar_eo |
| Esperanto | xx.eo.marian.translate_to.af | opus_mt_af_eo |
| Esperanto | eo.stopwords | stopwords_eo |
| Esperanto | eo.embed.w2v_cc_300d | w2v_cc_300d |
| Volapük | vo.embed.w2v_cc_300d | w2v_cc_300d |
| Coptic | cop.pos | pos_scriptorium |
| Coptic | cop.lemma | lemma_scriptorium |
| Coptic | cop.lemma.scriptorium | lemma_scriptorium |
| Afro-Asiatic languages | xx.afa.marian.translate_to.en | opus_mt_afa_en |
| Afro-Asiatic languages | xx.afa.marian.translate_to.afa | opus_mt_afa_afa |
| Atlantic-Congo languages | xx.alv.marian.translate_to.en | opus_mt_alv_en |
| Austro-Asiatic languages | xx.aav.marian.translate_to.en | opus_mt_aav_en |
| Baltic languages | xx.bat.marian.translate_to.en | opus_mt_bat_en |
| Bantu languages | xx.bnt.marian.translate_to.en | opus_mt_bnt_en |
| Basque (family) | xx.euq.marian.translate_to.en | opus_mt_euq_en |
| Berber languages | xx.ber.marian.translate_to.fr | opus_mt_fr_ber |
| Berber languages | xx.ber.marian.translate_to.es | opus_mt_es_ber |
| Berber languages | xx.ber.marian.translate_to.en | opus_mt_ber_en |
| Celtic languages | xx.cel.marian.translate_to.en | opus_mt_cel_en |
| Cushitic languages | xx.cus.marian.translate_to.en | opus_mt_cus_en |
| Dravidian languages | xx.dra.marian.translate_to.en | opus_mt_dra_en |
| East Slavic languages | xx.zle.marian.translate_to.zle | opus_mt_zle_zle |
| East Slavic languages | xx.zle.marian.translate_to.en | opus_mt_zle_en |
| Eastern Malayo-Polynesian languages | xx.pqe.marian.translate_to.en | opus_mt_pqe_en |
| Finno-Ugrian languages | xx.fiu.marian.translate_to.fiu | opus_mt_fiu_fiu |
| Finno-Ugrian languages | xx.fiu.marian.translate_to.en | opus_mt_fiu_en |
| Germanic languages | xx.gem.marian.translate_to.gem | opus_mt_gem_gem |
| Germanic languages | xx.gem.marian.translate_to.en | opus_mt_gem_en |
| Greek languages | xx.grk.marian.translate_to.en | opus_mt_grk_en |
| Indic languages | xx.inc.marian.translate_to.inc | opus_mt_inc_inc |
| Indic languages | xx.inc.marian.translate_to.en | opus_mt_inc_en |
| Indo-European languages | xx.ine.marian.translate_to.ine | opus_mt_ine_ine |
| Multilingual | xx.classify.wiki_21 | ld_wiki_tatoeba_cnn_21 |
| Multilingual | xx.classify.wiki_21.bigru | ld_tatoeba_bigru_21 |
| Multilingual | xx.classify.token_xlm_roberta.token_classifier_ner_40_lang | xlm_roberta_token_classifier_ner_40_lang |
| Multilingual | xx.answer_question.xquad_tydiqa.bert.cased | bert_qa_bert_multi_cased_finedtuned_xquad_tydiqa_goldp |
| Multilingual | xx.answer_question.xquad.bert.uncased | bert_qa_bert_multi_uncased_finetuned_xquadv1 |
| Multilingual | xx.answer_question.xquad.bert.cased | bert_qa_bert_multi_cased_finetuned_xquadv1 |
| Multilingual | xx.answer_question.xlm_roberta.distilled | xlm_roberta_qa_distill_xlm_mrc |
| Multilingual | xx.answer_question.tydiqa.multi_lingual_bert | bert_qa_Part_1_mBERT_Model_E1 |
| Multilingual | xx.answer_question.tydiqa.bert | bert_qa_telugu_bertu_tydiqa |
| Multilingual | xx.answer_question.squad.distil_bert.en_de_es_tuned.by_ZYW | distilbert_qa_squad_en_de_es_model |
| Multilingual | xx.answer_question.squad.distil_bert._en_de_es_vi_zh_tuned.by_ZYW | distilbert_qa_squad_en_de_es_vi_zh_model |
| Multilingual | xx.answer_question.roberta | roberta_qa_ft_lr_cu_leolin12345 |
| Multilingual | xx.answer_question.distil_bert.vi_zh_es_tuned.by_ZYW | distilbert_qa_en_de_vi_zh_es_model |
| Multilingual | xx.answer_question.distil_bert.en_de_tuned.by_ZYW | distilbert_qa_en_de_model |
| Multilingual | xx.answer_question.distil_bert.en_de_es_tuned.by_ZYW | distilbert_qa_en_de_es_model |
| Multilingual | xx.answer_question.chaii.xlm_roberta | xlm_roberta_qa_xlm_roberta_qa_chaii |
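When working with the model references above, it can help to hold the rows as data and filter them by language or by dotted prefix. The following is a minimal sketch using a few rows copied from the table; `references_by_language` and `references_with_prefix` are hypothetical helper names, not part of any library. In the table, one Spark NLP reference can be listed under several languages (for example `lemma_proiel` for Latin and Gothic), so the sketch keys on the nlp.load() reference.

```python
from collections import defaultdict

# Rows copied from the Model references table:
# (language name, nlp.load() reference, Spark NLP reference)
ROWS = [
    ("Latin", "la.pos", "pos_perseus"),
    ("Latin", "la.pos.proiel", "pos_proiel"),
    ("Latin", "la.lemma", "lemma_proiel"),
    ("Latin", "la.lemma.proiel", "lemma_proiel"),
    ("Gothic", "got.lemma", "lemma_proiel"),
    ("Sanskrit", "sa.pos", "pos_vedic"),
    ("Esperanto", "xx.eo.marian.translate_to.en", "opus_mt_eo_en"),
]


def references_by_language(rows):
    """Group rows as {language: {nlp.load() reference: Spark NLP reference}}."""
    grouped = defaultdict(dict)
    for language, reference, spark_reference in rows:
        grouped[language][reference] = spark_reference
    return dict(grouped)


def references_with_prefix(rows, prefix):
    """List nlp.load() references equal to `prefix` or nested under it.

    Matching on `prefix + "."` keeps "la.pos" from also matching a
    reference such as "la.postag" (a made-up name, shown only to
    explain the check).
    """
    return [
        reference
        for _, reference, _ in rows
        if reference == prefix or reference.startswith(prefix + ".")
    ]
```

For instance, `references_with_prefix(ROWS, "la.pos")` returns the two Latin part-of-speech references in this excerpt, and any of them can then be passed to nlp.load().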