Load & Predict 1-liner
The johnsnowlabs library provides two simple methods with which most NLP tasks can be solved while achieving state-of-the-art
results: load and predict.
When building a load-&-predict based model, you follow these steps:
- Pick a model/pipeline/component you want to create from the Namespace
- Call `model = nlp.load(component)`, which returns an auto-completed pipeline
- Call `model.predict('that was easy')` on some string input
These three steps can be boiled down to just one line:
```python
from johnsnowlabs import nlp
nlp.load('sentiment').predict('How does this witchcraft work?')
```
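predict() is flexible about its input. A minimal sketch, assuming the NLU-style predict() API that the johnsnowlabs library wraps (it also accepts lists of strings and pandas DataFrames, and returns a pandas DataFrame):

```python
from johnsnowlabs import nlp

# Load once and reuse the pipeline for several inputs;
# predict() is assumed to return a pandas DataFrame (NLU-style API)
model = nlp.load('sentiment')
df = model.predict(['I love this library!', 'This part confused me.'])
print(df.columns)  # inspect which output columns were generated
```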
nlp.load() defines the component types listed in the table below, all usable in 1-liners; some can additionally be prefixed with `train.` for training models.
Any of the actions for these component types can be passed as a string to nlp.load(), which returns the default model
for that component type for the English language.
You can refine your model selection by appending a '.' to your component selection, followed by a specific dataset or model version.
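For example, a sketch of the dot notation (the model references below are illustrative assumptions; check the Models Hub for the exact names that exist):

```python
from johnsnowlabs import nlp

# Default English NER model
nlp.load('ner')

# NER model trained on the OntoNotes dataset (illustrative reference)
nlp.load('ner.onto')

# Sentiment model trained on Twitter data (illustrative reference)
nlp.load('sentiment.twitter')
```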
See the Models Hub, the Components Namespace,
and the load function for more information.
| Component type | nlp.load() base |
|---|---|
| Named Entity Recognition (NER) | nlp.load('ner') |
| Part of Speech (POS) | nlp.load('pos') |
| Classifiers | nlp.load('classify') |
| Word embeddings | nlp.load('embed') |
| Sentence embeddings | nlp.load('embed_sentence') |
| Chunk embeddings | nlp.load('embed_chunk') |
| Labeled dependency parsers | nlp.load('dep') |
| Unlabeled dependency parsers | nlp.load('dep.untyped') |
| Lemmatizers | nlp.load('lemma') |
| Matchers | nlp.load('match') |
| Normalizers | nlp.load('norm') |
| Sentence detectors | nlp.load('sentence_detector') |
| Chunkers | nlp.load('chunk') |
| Spell checkers | nlp.load('spell') |
| Stemmers | nlp.load('stem') |
| Stopwords cleaners | nlp.load('stopwords') |
| Cleaners | nlp.load('clean') |
| N-Grams | nlp.load('ngram') |
| Tokenizers | nlp.load('tokenize') |
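For instance, loading one of the base components above and calling predict() on it (a minimal sketch; the exact output columns depend on the component):

```python
from johnsnowlabs import nlp

# Load the default English part-of-speech tagger and tag a sentence;
# the result comes back as a DataFrame of tokens and their POS tags
pos_df = nlp.load('pos').predict('John Snow Labs builds NLP tools')
print(pos_df)
```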
Annotator- & PretrainedPipeline-based pipelines
You can create Annotator- and PretrainedPipeline-based pipelines using all the classes
attached to the nlp module.
`nlp.PretrainedPipeline('pipe_name')` gives access to Pretrained Pipelines:
```python
from johnsnowlabs import nlp
from pprint import pprint

# Start a Spark session with the required jars
nlp.start()

# Download and load the pretrained explain_document_ml pipeline
explain_document_pipeline = nlp.PretrainedPipeline("explain_document_ml")

# annotate() returns a dict mapping each annotator's output column to its results
annotations = explain_document_pipeline.annotate("We are very happy about SparkNLP")
pprint(annotations)
```
OUTPUT:

```
{'stem': ['we', 'ar', 'veri', 'happi', 'about', 'sparknlp'],
 'checked': ['We', 'are', 'very', 'happy', 'about', 'SparkNLP'],
 'lemma': ['We', 'be', 'very', 'happy', 'about', 'SparkNLP'],
 'document': ['We are very happy about SparkNLP'],
 'pos': ['PRP', 'VBP', 'RB', 'JJ', 'IN', 'NNP'],
 'token': ['We', 'are', 'very', 'happy', 'about', 'SparkNLP'],
 'sentence': ['We are very happy about SparkNLP']}
```
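Beyond annotate(), Spark NLP pretrained pipelines also expose fullAnnotate(), which returns Annotation objects carrying character offsets and metadata instead of plain strings. A short sketch continuing the example above:

```python
# fullAnnotate() returns one result dict per input text;
# each value is a list of Annotation objects with offsets and metadata
full = explain_document_pipeline.fullAnnotate("We are very happy about SparkNLP")
for annotation in full[0]['pos']:
    print(annotation.result, annotation.begin, annotation.end, annotation.metadata)
```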
Custom Pipes
Alternatively, you can compose Annotators into a pipeline yourself, which offers the highest degree of customization:
```python
from johnsnowlabs import nlp

# Start a Spark session
spark = nlp.start(nlp=False)

# Compose annotators into a pipeline; each stage reads the previous stage's output column
pipe = nlp.Pipeline(stages=[
    nlp.DocumentAssembler().setInputCol('text').setOutputCol('doc'),
    nlp.Tokenizer().setInputCols('doc').setOutputCol('tok'),
])

spark_df = spark.createDataFrame([['Hello NLP World']]).toDF("text")
pipe.fit(spark_df).transform(spark_df).show()
```
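transform() returns a Spark DataFrame whose annotation columns are arrays of structs. A minimal sketch of extracting just the token strings with plain PySpark:

```python
# Fit and run the pipeline, then pull the 'result' field out of the
# token annotations ('tok' is the output column defined above)
result_df = pipe.fit(spark_df).transform(spark_df)
result_df.selectExpr("tok.result").show(truncate=False)
```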