1-liners reference

Usage examples of nlp.load()

The following examples demonstrate how to use NLU's load API, together with the output each call generates. The load method loads any model or pipeline in one line.
You need to pass one NLU reference to the load method.
You can also pass multiple whitespace separated references.
You can find all NLU references here
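
As a minimal sketch of the whitespace-separated form, the helper below joins several references into the single string that load expects. The name build_reference is hypothetical and not part of NLU, and the import path in the trailing comment is an assumption. The references 'sentiment' and 'pos' are taken from examples further down this page.

```python
def build_reference(*refs):
    """Join several NLU references into one whitespace-separated string."""
    # Drop empty entries and surrounding whitespace before joining.
    cleaned = [r.strip() for r in refs if r and r.strip()]
    if not cleaned:
        raise ValueError("at least one reference is required")
    return " ".join(cleaned)


combined = build_reference("sentiment", "pos")
print(combined)  # sentiment pos

# With the library installed, the string would be passed on unchanged
# (import path assumed, not confirmed by this page):
# from johnsnowlabs import nlp
# df = nlp.load(combined).predict("I love NLU!")
```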

Named Entity Recognition (NER) 18 class

NER ONTO example

Predicts the following 18 NER classes from the ONTO dataset :

Type	Description
PERSON	People, including fictional like Harry Potter
NORP	Nationalities or religious or political groups like the Germans
FAC	Buildings, airports, highways, bridges, etc. like New York Airport
ORG	Companies, agencies, institutions, etc. like Microsoft
GPE	Countries, cities, states. like Germany
LOC	Non-GPE locations, mountain ranges, bodies of water. Like the Sahara desert
PRODUCT	Objects, vehicles, foods, etc. (Not services.) like playstation
EVENT	Named hurricanes, battles, wars, sports events, etc. like hurricane Katrina
WORK_OF_ART	Titles of books, songs, etc. Like Mona Lisa
LAW	Named documents made into laws. Like : Declaration of Independence
LANGUAGE	Any named language. Like Turkish
DATE	Absolute or relative dates or periods. Like every second friday
TIME	Times smaller than a day. Like every minute
PERCENT	Percentage, including “%”. Like 55% of workers enjoy their work
MONEY	Monetary values, including unit. Like 50$ for those pants
QUANTITY	Measurements, as of weight or distance. Like this person weighs 50kg
ORDINAL	“first”, “second”, etc. Like David placed first in the tournament
CARDINAL	Numerals that do not fall under another type. Like hundreds of models are available in NLU

nlp.load('ner').predict('Angela Merkel from Germany and the American Donald Trump dont share many opinions')

embeddings	ner_tag	entities
[[-0.563759982585907, 0.26958999037742615, 0.3…	PER	Angela Merkel
[[-0.563759982585907, 0.26958999037742615, 0.3…	GPE	Germany
[[-0.563759982585907, 0.26958999037742615, 0.3…	NORP	American
[[-0.563759982585907, 0.26958999037742615, 0.3…	PER	Donald Trump

Named Entity Recognition (NER) 5 Class

NER CONLL example

Predicts the following NER classes from the CONLL dataset :

Tag	Description
B-PER	A person like Jim or Joe
B-ORG	An organisation like Microsoft or PETA
B-LOC	A location like Germany
B-MISC	Anything else like Playstation
O	Everything that is not an entity.

nlp.load('ner.conll').predict('Angela Merkel from Germany and the American Donald Trump dont share many opinions')

embeddings	ner_tag	entities
[[-0.563759982585907, 0.26958999037742615, 0.3…	PER	Angela Merkel
[[-0.563759982585907, 0.26958999037742615, 0.3…	LOC	Germany
[[-0.563759982585907, 0.26958999037742615, 0.3…	MISC	American
[[-0.563759982585907, 0.26958999037742615, 0.3…	PER	Donald Trump
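
The tag table lists token-level tags such as B-PER, while the output shows whole entities such as Angela Merkel. The sketch below illustrates how token tags can be merged into entities. It assumes the common IOB convention in which continuation tokens carry an I- prefix (not shown in the table above), and group_entities is a hypothetical helper, not NLU's own implementation.

```python
def group_entities(tokens, tags):
    """Merge token-level IOB tags into (entity text, label) pairs."""
    entities = []
    current_tokens, current_label = [], None

    def flush():
        # Store the entity collected so far, if any.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label continues the open entity.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent I- tag) closes the open entity.
            flush()
            current_tokens, current_label = [], None
    flush()
    return entities


tokens = ["Angela", "Merkel", "from", "Germany"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
print(group_entities(tokens, tags))
# [('Angela Merkel', 'PER'), ('Germany', 'LOC')]
```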

Part of speech (POS)

POS Classifies each token with one of the following tags

Part of Speech example

Tag	Description	Example
CC	Coordinating conjunction	This batch of mushroom stew is savory and delicious
CD	Cardinal number	Here are five coins
DT	Determiner	The bunny went home
EX	Existential there	There is a storm coming
FW	Foreign word	I’m having a déjà vu
IN	Preposition or subordinating conjunction	He is cleverer than I am
JJ	Adjective	She wore a beautiful dress
JJR	Adjective, comparative	My house is bigger than yours
JJS	Adjective, superlative	I am the shortest person in my family
LS	List item marker	A number of things need to be considered before starting a business , such as premises , finance , product demand , staffing and access to customers
MD	Modal	You must stop when the traffic lights turn red
NN	Noun, singular or mass	The dog likes to run
NNS	Noun, plural	The cars are fast
NNP	Proper noun, singular	I ordered the chair from Amazon
NNPS	Proper noun, plural	We visited the Kennedys
PDT	Predeterminer	Both the children had a toy
POS	Possessive ending	I built the dog’s house
PRP	Personal pronoun	You need to stop
PRP$	Possessive pronoun	Remember not to judge a book by its cover
RB	Adverb	The dog barks loudly
RBR	Adverb, comparative	Could you sing more quietly please?
RBS	Adverb, superlative	Everyone in the race ran fast, but John ran the fastest of all
RP	Particle	He ate up all his dinner
SYM	Symbol	What are you doing ?
TO	to	Please send it back to me
UH	Interjection	Wow! You look gorgeous
VB	Verb, base form	We play soccer
VBD	Verb, past tense	I worked at a restaurant
VBG	Verb, gerund or present participle	Smoking kills people
VBN	Verb, past participle	She has done her homework
VBP	Verb, non-3rd person singular present	You flit from place to place
VBZ	Verb, 3rd person singular present	He never calls me
WDT	Wh-determiner	The store honored the complaints, which were less than 25 days old
WP	Wh-pronoun	Who can help me?
WP$	Possessive wh-pronoun	Whose fault is it?
WRB	Wh-adverb	Where are you going?

nlp.load('pos').predict('Part of speech assigns each token in a sentence a grammatical label')

token	pos
Part	NN
of	IN
speech	NN
assigns	NNS
each	DT
token	NN
in	IN
a	DT
sentence	NN
a	DT
grammatical	JJ
label	NN

Emotion Classifier

Emotion Classifier example
Classifies text as one of 4 categories (joy, fear, surprise, sadness)

nlp.load('emotion').predict('I love NLU!')

sentence_embeddings	emotion_confidence	sentence	emotion
[0.027570432052016258, -0.052647676318883896, …]	0.976017	I love NLU!	joy

Sentiment Classifier

Sentiment Classifier Example

Classifies binary sentiment for every sentence, either positive or negative.

nlp.load('sentiment').predict("I hate this guy Sami")

sentiment_confidence	sentence	sentiment	checked
0.5778	I hate this guy Sami	negative	[I, hate, this, guy, Sami]

Question Classifier 50 class

50 Class Questions Classifier example

Classifies questions into 50 different types; the model was trained on the Trec50 dataset. When setting predict(meta=True), nlu will also output the probabilities for the other 49 question classes. The classes are the following:

Abbreviation question classes:

Class	Definition
abb	abbreviation
exp	expression abbreviated

Entities question classes:

Class	Definition
animal	animals
body	organs of body
color	colors
creative	inventions, books and other creative pieces
currency	currency names
dis.med.	diseases and medicine
event	events
food	food
instrument	musical instrument
lang	languages
letter	letters like a-z
other	other entities
plant	plants
product	products
religion	religions
sport	sports
substance	elements and substances
symbol	symbols and signs
technique	techniques and methods
term	equivalent terms
vehicle	vehicles
word	words with a special property

Description and abstract concepts question classes:

Class	Definition
definition	definition of sth.
description	description of sth.
manner	manner of an action
reason	reasons

Human being question classes:

Class	Definition
group	a group or organization of persons
ind	an individual
title	title of a person
description	description of a person

Location question classes:

Class	Definition
city	cities
country	countries
mountain	mountains
other	other locations
state	states

Numeric question classes:

Class	Definition
code	postcodes or other codes
count	number of sth.
date	dates
distance	linear measures
money	prices
order	ranks
other	other numbers
period	the lasting time of sth.
percent	fractions
speed	speed
temp	temperature
size	size, area and volume
weight	weight

nlp.load('en.classify.trec50').predict('How expensive is the Watch?')

sentence_embeddings	question_confidence	sentence	question
[0.051809534430503845, 0.03128402680158615, -0…]	0.919436	How expensive is the watch?	NUM_count
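
The predicted label NUM_count combines a category prefix with one of the classes from the tables above. The sketch below splits such a label into its two parts. Only NUM_count appears on this page; the other prefixes in the mapping are assumptions based on the usual TREC naming, and split_question_label is a hypothetical helper.

```python
# Category prefixes; every key except "NUM" is an assumption, not shown on this page.
COARSE_CLASSES = {
    "ABBR": "Abbreviation",
    "ENTY": "Entities",
    "DESC": "Description and abstract concepts",
    "HUM": "Human being",
    "LOC": "Location",
    "NUM": "Numeric",
}


def split_question_label(label):
    """Split a label like 'NUM_count' into (category name, class)."""
    coarse, _, fine = label.partition("_")
    # Unknown prefixes are returned unchanged instead of raising.
    return COARSE_CLASSES.get(coarse, coarse), fine


print(split_question_label("NUM_count"))  # ('Numeric', 'count')
```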

Fake News Classifier

Fake News Classifier example

nlp.load('en.classify.fakenews').predict('Unicorns have been sighted on Mars!')

sentence_embeddings	fake_confidence	sentence	fake
[-0.01756167598068714, 0.015006818808615208, -…]	1.000000	Unicorns have been sighted on Mars!	FAKE

Cyberbullying Classifier

Cyberbullying Classifier example

Classifies sexism and racism

nlp.load('en.classify.cyberbullying').predict('Women belong in the kitchen.') # sorry we really don't mean it

sentence_embeddings	cyberbullying_confidence	sentence	cyberbullying
[-0.054944973438978195, -0.022223370149731636,…]	0.999998	Women belong in the kitchen.	sexism

Spam Classifier

Spam Classifier example

nlp.load('en.classify.spam').predict('Please sign up for this FREE membership it costs $$NO MONEY$$ just your mobile number!')

sentence_embeddings	spam_confidence	sentence	spam
[0.008322705514729023, 0.009957313537597656, 0…]	1.000000	Please sign up for this FREE membership it cos…	spam

Sarcasm Classifier

Sarcasm Classifier example

nlp.load('en.classify.sarcasm').predict('gotta love the teachers who give exams on the day after halloween')

sentence_embeddings	sarcasm_confidence	sentence	sarcasm
[-0.03146284446120262, 0.04071342945098877, 0….]	0.999985	gotta love the teachers who give exams on the…	sarcasm

IMDB Movie Sentiment Classifier

Movie Review Sentiment Classifier example

nlp.load('en.sentiment.imdb').predict('The Matrix was a pretty good movie')

document	sentence_embeddings	sentiment_negative	sentiment_negative	sentiment_positive	sentiment
The Matrix was a pretty good movie	[[0.04629608988761902, -0.020867452025413513, … ]	[2.7235753918830596e-07]	[2.7235753918830596e-07]	[0.9999997615814209]	[positive]

Twitter Sentiment Classifier

Twitter Sentiment Classifier Example

nlp.load('en.sentiment.twitter').predict('@elonmusk Tesla stock price is too high imo')

document	sentence_embeddings	sentiment_negative	sentiment_negative	sentiment_positive	sentiment
@elonmusk Tesla stock price is too high imo	[[0.08604438602924347, 0.04703635722398758, -0…]	[1.0]	[1.0]	[1.692714735043349e-36]	[negative]

Language Classifier

Languages Classifier example
Classifies the following 20 languages :
Bulgarian, Czech, German, Greek, English, Spanish, Finnish, French, Croatian, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Swedish, Turkish, and Ukrainian

nlp.load('lang').predict(['NLU is an open-source text processing library for advanced natural language processing for the Python.','NLU est une bibliothèque de traitement de texte open source pour le traitement avancé du langage naturel pour les langages de programmation Python.'])

language_confidence	document	language
0.985407	NLU is an open-source text processing library …]	en
0.999822	NLU est une bibliothèque de traitement de text…]	fr

E2E Classifier

E2E Classifier example

This is a multi-class classifier trained on the E2E dataset for natural language generation.

nlp.load('e2e').predict('E2E is a dataset for training generative models')

sentence_embeddings	e2e	e2e_confidence	sentence
[0.021445205435156822, -0.039284929633140564, …,]	customer rating[high]	0.703248	E2E is a dataset for training generative models
None	name[The Waterman]	0.703248	None
None	eatType[restaurant]	0.703248	None
None	priceRange[£20-25]	0.703248	None
None	familyFriendly[no]	0.703248	None
None	familyFriendly[yes]	0.703248	None

Toxic Classifier

Toxic Text Classifier example

nlp.load('en.classify.toxic').predict('You are to stupid')

toxic_confidence	toxic	sentence_embeddings	document
0.978273	[toxic,insult]	[[-0.03398505970835686, 0.0007853527786210179,…,]	You are to stupid

YAKE Unsupervised Keyword Extractor

YAKE Keyword Extraction Example

nlp.load('yake').predict("NLU is a Python Library for beginners and experts in NLP")

keywords_score_confidence	keywords	sentence
0.454232	[nlu, nlp, python library]	NLU is a Python Library for beginners and expe…

Word Embeddings Bert

BERT Word Embeddings example

nlp.load('bert').predict('NLU offers the latest embeddings in one line ')

token	bert_embeddings
NLU	[0.3253086805343628, -0.574441134929657, -0.08…]
offers	[-0.6660361886024475, -0.1494743824005127, -0…]
the	[-0.6587662696838379, 0.3323703110218048, 0.16…]
latest	[0.7552685737609863, 0.17207926511764526, 1.35…]
embeddings	[-0.09838500618934631, -1.1448147296905518, -1…]
in	[-0.4635896384716034, 0.38369956612586975, 0.0…]
one	[0.26821616291999817, 0.7025910019874573, 0.15…]
line	[-0.31930840015411377, -0.48271292448043823, 0…]

Word Embeddings Biobert

BIOBERT Word Embeddings example
Bert model pretrained on Bio dataset

nlp.load('biobert').predict('Biobert was pretrained on a medical dataset')

token	biobert_embeddings
NLU	[0.3253086805343628, -0.574441134929657, -0.08…]
offers	[-0.6660361886024475, -0.1494743824005127, -0…]
the	[-0.6587662696838379, 0.3323703110218048, 0.16…]
latest	[0.7552685737609863, 0.17207926511764526, 1.35…]
embeddings	[-0.09838500618934631, -1.1448147296905518, -1…]
in	[-0.4635896384716034, 0.38369956612586975, 0.0…]
one	[0.26821616291999817, 0.7025910019874573, 0.15…]
line	[-0.31930840015411377, -0.48271292448043823, 0…]

Word Embeddings Covidbert

COVIDBERT Word Embeddings
Bert model pretrained on COVID dataset

nlp.load('covidbert').predict('He was suprised by the diversity of NLU')

token	covid_embeddings
He	[-1.0551927089691162, -1.534174919128418, 1.29…,]
was	[-0.14796507358551025, -1.3928604125976562, 0….,]
suprised	[1.0647121667861938, -0.3664901852607727, 0.54…,]
by	[-0.15271103382110596, -0.6812090277671814, -0…,]
the	[-0.45744237303733826, -1.4266574382781982, -0…,]
diversity	[-0.05339818447828293, -0.5118572115898132, 0….,]
of	[-0.2971905767917633, -1.0936176776885986, -0….,]
NLU	[-0.9573594331741333, -0.18001675605773926, -1…,]

Word Embeddings Albert

ALBERT Word Embeddings example

nlp.load('albert').predict('Albert uses a collection of many berts to generate embeddings')

token	albert_embeddings
Albert	[-0.08257609605789185, -0.8017427325248718, 1…]
uses	[0.8256351947784424, -1.5144840478897095, 0.90…]
a	[-0.22089454531669617, -0.24295514822006226, 3…]
collection	[-0.2136894017457962, -0.8225528597831726, -0…]
of	[1.7623294591903687, -1.113651156425476, 0.800…]
many	[0.6415284872055054, -0.04533941298723221, 1.9…]
berts	[-0.5591965317726135, -1.1773797273635864, -0…]
to	[1.0956681966781616, -1.4180747270584106, -0.2…]
generate	[-0.6759272813796997, -1.3546931743621826, 1.6…]
embeddings	[-0.0035803020000457764, -0.35928264260292053,…]

Electra Embeddings

ELECTRA Word Embeddings example

nlp.load('electra').predict('He was suprised by the diversity of NLU')

token	electra_embeddings
He	[0.29674115777015686, -0.21371933817863464, -0…,]
was	[-0.4278327524662018, -0.5352768898010254, -0….,]
suprised	[-0.3090559244155884, 0.8737565279006958, -1.0…,]
by	[-0.07821277529001236, 0.13081523776054382, 0….,]
the	[0.5462881922721863, 0.0683358758687973, -0.41…,]
diversity	[0.1381239891052246, 0.2956242859363556, 0.250…,]
of	[-0.5667567253112793, -0.3955455720424652, -0….,]
NLU	[0.5597224831581116, -0.703249454498291, -1.08…,]

Word Embeddings Elmo

ELMO Word Embeddings example

nlp.load('elmo').predict('Elmo was trained on Left to right masked to learn its embeddings')

token	elmo_embeddings
Elmo	[0.6083735227584839, 0.20089012384414673, 0.42…]
was	[0.2980785369873047, -0.07382500916719437, -0…]
trained	[-0.39923471212387085, 0.17155063152313232, 0…]
on	[0.04337821900844574, 0.1392083466053009, -0.4…]
Left	[0.4468783736228943, -0.623046875, 0.771505534…]
to	[-0.18209676444530487, 0.03812692314386368, 0…]
right	[0.23305709660053253, -0.6459438800811768, 0.5…]
masked	[-0.7243442535400391, 0.10247116535902023, 0.1…]
to	[-0.18209676444530487, 0.03812692314386368, 0…]
learn	[1.2942464351654053, 0.7376189231872559, -0.58…]
its	[0.055951207876205444, 0.19218483567237854, -0…]
embeddings	[-1.31377112865448, 0.7727609872817993, 0.6748…]

Word Embeddings Xlnet

XLNET Word Embeddings example

nlp.load('xlnet').predict('XLNET computes contextualized word representations using combination of Autoregressive Language Model and Permutation Language Model')

token	xlnet_embeddings
XLNET	[-0.02719488926231861, -1.7693557739257812, -0…]
computes	[-1.8262947797775269, 0.8455266356468201, 0.57…]
contextualized	[2.8446314334869385, -0.3564329445362091, -2.1…]
word	[-0.6143839359283447, -1.7368144989013672, -0…]
representations	[-0.30445945262908936, -1.2129613161087036, 0…]
using	[0.07423821836709976, -0.02561005763709545, -0…]
combination	[-0.5387097597122192, -1.1827564239501953, 0.5…]
of	[-1.403516411781311, 0.3108177185058594, -0.32…]
Autoregressive	[-1.0869172811508179, 0.7135171890258789, -0.2…]
Language	[-0.33215752243995667, -1.4108021259307861, -0…]
Model	[-1.6097160577774048, -0.2548254430294037, 0.0…]
and	[0.7884324789047241, -1.507911205291748, 0.677…]
Permutation	[0.6049966812133789, -0.157279372215271, -0.06…]
Language	[-0.33215752243995667, -1.4108021259307861, -0…]
Model	[-1.6097160577774048, -0.2548254430294037, 0.0…]

Word Embeddings Glove

GLOVE Word Embeddings example

nlp.load('glove').predict('Glove embeddings are generated by aggregating global word-word co-occurrence matrix from a corpus')

token	glove_embeddings
Glove	[0.3677999973297119, 0.37073999643325806, 0.32…]
embeddings	[0.732479989528656, 0.3734700083732605, 0.0188…]
are	[-0.5153300166130066, 0.8318600058555603, 0.22…]
generated	[-0.35510000586509705, 0.6115900278091431, 0.4…]
by	[-0.20874999463558197, -0.11739999800920486, 0…]
aggregating	[-0.5133699774742126, 0.04489300027489662, 0.1…]
global	[0.24281999468803406, 0.6170300245285034, 0.66…]
word-word	[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, …]
co-occurrence	[0.16384999454021454, -0.3178800046443939, 0.1…]
matrix	[-0.2663800120353699, 0.4449099898338318, 0.32…]
from	[0.30730998516082764, 0.24737000465393066, 0.6…]
a	[-0.2708599865436554, 0.04400600120425224, -0…]
corpus	[0.39937999844551086, 0.15894000232219696, -0…]

Multiple Token Embeddings at once

Compare 6 Embeddings at once with NLU and T-SNE example

#This takes around 10GB RAM, watch out!
nlp.load('bert albert electra elmo xlnet use glove').predict('Get all of them at once! Watch your RAM tough!')

xlnet_embeddings	use_embeddings	elmo_embeddings	electra_embeddings	glove_embeddings	sentence	albert_embeddings	biobert_embeddings	bert_embeddings
[[-0.003953204490244389, -1.5821468830108643, …,]	[-0.019299551844596863, -0.04762779921293259, …,]	[[0.04002974182367325, -0.43536433577537537, -…,]	[[0.19559216499328613, -0.46693214774131775, -…,]	[[0.1443299949169159, 0.4395099878311157, 0.58…,]	Get all of them at once, watch your RAM tough!	[[-0.4743960201740265, -0.581386387348175, 0.7…,]	[[-0.00012563914060592651, -1.372296929359436,…,]	[[-0.7687976360321045, 0.8489367961883545, -0….,]

Bert Sentence Embeddings

BERT Sentence Embeddings example

nlp.load('embed_sentence.bert').predict('He was suprised by the diversity of NLU')

sentence	bert_sentence_embeddings
He was suprised by the diversity of NLU	[-1.0726687908172607, 0.4481312036514282, -0.0…,]

Electra Sentence Embeddings

ELECTRA Sentence Embeddings example

nlp.load('embed_sentence.electra').predict('He was suprised by the diversity of NLU')

sentence	electra_sentence_embeddings
He was suprised by the diversity of NLU	[0.005376118700951338, 0.18036000430583954, -0…,]

Sentence Embeddings Use

USE Sentence Embeddings example

nlp.load('use').predict('USE is designed to encode whole sentences and documents into vectors that can be used for text classification, semantic similarity, clustering or oder NLP tasks')

sentence	use_embeddings
USE is designed to encode whole sentences and …]	[0.03302069380879402, -0.004255455918610096, -…]
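
Semantic similarity, one of the uses named in the example sentence, is commonly computed as the cosine similarity between two sentence vectors. The sketch below shows that computation in plain Python. The short vectors are made-up placeholders rather than real USE output, and cosine_similarity is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Similarity with an all-zero vector is undefined; report 0.0.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the use_embeddings column.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(round(same_direction, 6), orthogonal)  # 1.0 0.0
```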

Spell Checking

Spell checking example

nlp.load('spell').predict('I liek pentut buttr ant jely')

token	checked
I	I
liek	like
peantut	pentut
buttr	buttr
and	and
jelli	jely

Dependency Parsing Unlabeled

Untyped Dependency Parsing example

nlp.load('dep.untyped').predict('Untyped Dependencies represent a grammatical tree structure')

token	pos	dependency
Untyped	NNP	ROOT
Dependencies	NNP	represent
represent	VBD	Untyped
a	DT	structure
grammatical	JJ	structure
tree	NN	structure
structure	NN	represent

Dependency Parsing Labeled

Typed Dependency Parsing example

nlp.load('dep').predict('Typed Dependencies represent a grammatical tree structure where every edge has a label')

token	pos	dependency	labled_dependency
Typed	NNP	ROOT	root
Dependencies	NNP	represent	nsubj
represent	VBD	Typed	parataxis
a	DT	structure	nsubj
grammatical	JJ	structure	amod
tree	NN	structure	flat
structure	NN	represent	nsubj
where	WRB	structure	mark
every	DT	edge	nsubj
edge	NN	where	nsubj
has	VBZ	ROOT	root
a	DT	label	nsubj
label	NN	has	nsubj

Tokenization

Tokenization example

nlp.load('tokenize').predict('Each word and symbol in a sentence will generate token.')

token
Each
word
and
symbol
will
generate
a
token
.

Stemmer

Stemmer example

nlp.load('stem').predict('NLU can get you the stem of a word')

token	stem
NLU	nlu
can	can
get	get
you	you
the	the
stem	stem
of	of
a	a
word	word

Stopwords Removal

Stopwords Removal example

nlp.load('stopwords').predict('I want you to remove stopwords from this sentence please')

token	cleanTokens
I	remove
want	stopwords
you	sentence
to	None
remove	None
stopwords	None
from	None
this	None
sentence	None
please	None
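
The idea behind the cleanTokens column can be sketched as a simple filter. The stopword list below is a small hypothetical sample chosen to reproduce the example; the list used by the actual model is not shown on this page.

```python
# Hypothetical stopword sample, not the list shipped with the model.
STOPWORDS = {"i", "want", "you", "to", "from", "this", "please"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Keep only the tokens whose lowercase form is not a stopword."""
    return [t for t in tokens if t.lower() not in stopwords]


tokens = "I want you to remove stopwords from this sentence please".split()
print(remove_stopwords(tokens))  # ['remove', 'stopwords', 'sentence']
```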

Lemmatization

Lemmatization example

nlp.load('lemma').predict('Lemmatizing generates a less noisy version of the inputted tokens')

token	lemma
Lemmatizing	Lemmatizing
generates	generate
a	a
less	less
noisy	noisy
version	version
of	of
the	the
inputted	input
tokens	token

Normalizers

Normalizing example

nlp.load('norm').predict('@CKL_IT says that #normalizers are pretty useful to clean #structured_strings in #NLU like tweets')

normalized	token
CKLIT	@CKL_IT
says	says
that	that
normalizers	#normalizers
are	are
pretty	pretty
useful	useful
to	to
clean	clean
structuredstrings	#structured_strings
in	in
NLU	#NLU
like	like
tweets	tweets
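
The normalized column above is consistent with stripping every character that is not a letter. The sketch below reproduces that behaviour with a regular expression. The pattern is an assumption inferred from the example output, not the model's documented configuration, and normalize_token is a hypothetical helper.

```python
import re

# Assumed cleanup rule: drop everything except ASCII letters.
NON_LETTERS = re.compile(r"[^A-Za-z]")


def normalize_token(token):
    """Remove symbols such as @, # and _ from a single token."""
    return NON_LETTERS.sub("", token)


for raw in ["@CKL_IT", "#structured_strings", "#NLU", "tweets"]:
    print(raw, "->", normalize_token(raw))
# @CKL_IT -> CKLIT
# #structured_strings -> structuredstrings
# #NLU -> NLU
# tweets -> tweets
```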

NGrams

NGrams example

nlp.load('ngram').predict('To be or not to be')

document	ngrams	pos
To be or not to be	[To, be, or, not, to, be, To be, be or, or not…]	[TO, VB, CC, RB, TO, VB]
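
The ngrams column lists all unigrams first and then all bigrams. A minimal sketch of that enumeration follows; the upper bound of 2 is inferred from the visible part of the output, and ngrams_up_to is a hypothetical helper.

```python
def ngrams_up_to(tokens, max_n=2):
    """Return all n-grams of length 1..max_n, shortest first."""
    result = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the sentence.
        for start in range(len(tokens) - n + 1):
            result.append(" ".join(tokens[start:start + n]))
    return result


tokens = "To be or not to be".split()
print(ngrams_up_to(tokens)[:9])
# ['To', 'be', 'or', 'not', 'to', 'be', 'To be', 'be or', 'or not']
```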

Date Matching

Date Matching example

nlp.load('match.datetime').predict('In the years 2000/01/01 to 2010/01/01 a lot of things happened')

document	date
In the years 2000/01/01 to 2010/01/01 a lot of things happened	[2000/01/01, 2001/01/01]

Entity Chunking

Splits text into rows based on matched grammatical entities.
See the Part of speech (POS) section for all possible POS labels.

Entity Chunking Example

# First we load the pipeline
pipe = nlp.load('match.chunks')
# Now we print the info to see at which index each component is and which parameters we can configure on it
pipe.generate_class_metadata_table()
# Let's set our chunker to match only runs of NN or runs of JJ
pipe['default_chunker'].setRegexParsers(['<NN>+', '<JJ>+'])
# Now we can predict with the configured pipeline
pipe.predict("Jim and Joe went to the big blue market next to the town hall")

# the outputs of component_list.print_info()
The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :
>>> component_list['document_assembler'] has settable params:
component_list['document_assembler'].setCleanupMode('disabled')         | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : disabled
>>> component_list['sentence_detector'] has settable params:
component_list['sentence_detector'].setCustomBounds([])                 | Info: characters used to explicitly mark sentence bounds | Currently set to : []
component_list['sentence_detector'].setDetectLists(True)                | Info: whether detect lists during sentence detection | Currently set to : True
component_list['sentence_detector'].setExplodeSentences(False)          | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False
component_list['sentence_detector'].setMaxLength(99999)                 | Info: Set the maximum allowed length for each sentence | Currently set to : 99999
component_list['sentence_detector'].setMinLength(0)                     | Info: Set the minimum allowed length for each sentence. | Currently set to : 0
component_list['sentence_detector'].setUseAbbreviations(True)           | Info: whether to apply abbreviations at sentence detection | Currently set to : True
component_list['sentence_detector'].setUseCustomBoundsOnly(False)       | Info: Only utilize custom bounds in sentence detection | Currently set to : False
>>> component_list['regex_matcher'] has settable params:
component_list['regex_matcher'].setCaseSensitiveExceptions(True)        | Info: Whether to care for case sensitiveness in exceptions | Currently set to : True
component_list['regex_matcher'].setTargetPattern('\S+')                 | Info: pattern to grab from text as token candidates. Defaults \S+ | Currently set to : \S+
component_list['regex_matcher'].setMaxLength(99999)                     | Info: Set the maximum allowed length for each token | Currently set to : 99999
component_list['regex_matcher'].setMinLength(0)                         | Info: Set the minimum allowed length for each token | Currently set to : 0
>>> component_list['sentiment_dl'] has settable params:
>>> component_list['default_chunker'] has settable params:
component_list['default_chunker'].setRegexParsers(['<DT>?<JJ>*<NN>+'])  | Info: an array of grammar based chunk parsers | Currently set to : ['<DT>?<JJ>*<NN>+']

chunk	pos
market	[NNP, CC, NNP, VBD, TO, DT, JJ, JJ, NN, JJ, TO…
town hall	[NNP, CC, NNP, VBD, TO, DT, JJ, JJ, NN, JJ, TO…
big blue	[NNP, CC, NNP, VBD, TO, DT, JJ, JJ, NN, JJ, TO…
next	[NNP, CC, NNP, VBD, TO, DT, JJ, JJ, NN, JJ, TO…
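
To make the chunk grammar more concrete, the sketch below applies patterns such as <NN>+ to a sequence of POS tags and returns the matching token runs. It is a simplified illustration, not the chunker's actual implementation. The helper chunk_by_pos is hypothetical, and the last three tags (DT, NN, NN) are assumptions, because the pos column above is truncated.

```python
import re
from bisect import bisect_left


def chunk_by_pos(tokens, tags, patterns):
    """Return the token runs whose POS tags match one of the patterns."""
    # Encode the tags as one string, e.g. "<NNP><CC><NNP>...".
    tag_string = "".join(f"<{t}>" for t in tags)
    # Character offset at which each tag starts inside tag_string.
    starts, offset = [], 0
    for t in tags:
        starts.append(offset)
        offset += len(t) + 2
    chunks = []
    for pattern in patterns:
        # Wrap every <TAG> in a group so that +, * and ? apply to the whole tag.
        regex = re.sub(
            r"<([^>]+)>",
            lambda m: "(?:<" + re.escape(m.group(1)) + ">)",
            pattern,
        )
        for match in re.finditer(regex, tag_string):
            if match.start() == match.end():
                continue  # skip empty matches
            first = bisect_left(starts, match.start())
            last = bisect_left(starts, match.end())
            chunks.append(" ".join(tokens[first:last]))
    return chunks


tokens = "Jim and Joe went to the big blue market next to the town hall".split()
tags = ["NNP", "CC", "NNP", "VBD", "TO", "DT", "JJ", "JJ", "NN", "JJ",
        "TO", "DT", "NN", "NN"]  # last three tags assumed
print(chunk_by_pos(tokens, tags, ["<NN>+", "<JJ>+"]))
# ['market', 'town hall', 'big blue', 'next']
```

With the default pattern <DT>?<JJ>*<NN>+ the same helper would return the longer chunks "the big blue market" and "the town hall".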

Sentence Detection

Sentence Detection example

nlp.load('sentence_detector').predict('NLU can detect things. Like beginning and endings of sentences. It can also do much more!', output_level='sentence')

sentence	word_embeddings	pos	ner
NLU can detect things.	[[0.4970400035381317, -0.013454999774694443, 0…]	[NNP, MD, VB, NNS, ., IN, VBG, CC, NNS, IN, NN… ]	[O, O, O, O, O, B-sent, O, O, O, O, O, O, B-se…]
Like beginning and endings of sentences.	[[0.4970400035381317, -0.013454999774694443, 0…]	[NNP, MD, VB, NNS, ., IN, VBG, CC, NNS, IN, NN…]	[O, O, O, O, O, B-sent, O, O, O, O, O, O, B-se…]
It can also do much more!	[[0.4970400035381317, -0.013454999774694443, 0…]	[NNP, MD, VB, NNS, ., IN, VBG, CC, NNS, IN, NN…]	[O, O, O, O, O, B-sent, O, O, O, O, O, O, B-se…]

Document Normalization

Document Normalizer example
The DocumentNormalizer extracts content from HTML or XML documents, applying either data cleansing with an arbitrary number of custom regular expressions or data extraction, depending on the configured parameters.

pipe = nlp.load('norm_document')
data = '<!DOCTYPE html> <html> <head> <title>Example</title> </head> <body> <p>This is an example of a simple HTML page with one paragraph.</p> </body> </html>'
df = pipe.predict(data,output_level='document')
df

text	normalized_text
`<!DOCTYPE html> <html> <head> <title>Example</title> </head> <body> <p>This is an example of a simple HTML page with one paragraph.</p> </body> </html>`	Example This is an example of a simple HTML page with one paragraph.

Word Segmenter

Word Segmenter Example
The WordSegmenter segments text in languages that cannot be tokenized with simple rules, such as Chinese, Japanese, or Korean.

pipe = nlp.load('ja.segment_words')
# japanese for 'Donald Trump and Angela Merkel dont share many opinions'
ja_data = ['ドナルド・トランプとアンゲラ・メルケルは多くの意見を共有していません']
df = pipe.predict(ja_data, output_level='token')
df

token
ドナルド
・
トランプ
と
アンゲラ
・
メルケル
は
多く
の
意見
を
共有
し
て
い
ませ
ん

Translation

Translation example
You can translate between more than 192 language pairs with the Marian models. You need to specify the language your data is in as start_language and the language you want to translate to as target_language.
The language references must be ISO language codes.

nlp.load('xx.<start_language>.translate_to.<target_language>')

Translate Turkish to English:
nlp.load('xx.tr.translate_to.en')

Translate English to French:
nlp.load('xx.en.translate_to.fr')

Translate French to Hebrew:
nlp.load('xx.fr.translate_to.he')

translate_pipe = nlp.load('xx.en.translate_to.de')
df = translate_pipe.predict('Billy likes to go to the mall every sunday')
df

sentence	translation
Billy likes to go to the mall every sunday	Billy geht gerne jeden Sonntag ins Einkaufszentrum
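
The reference pattern xx.<start_language>.translate_to.<target_language> can also be assembled programmatically. The helper below is a hypothetical sketch. Its check for two- or three-letter lowercase codes is a simplifying assumption about ISO language codes, and it does not verify that a model exists for the requested pair.

```python
def translation_reference(start_language, target_language):
    """Build an 'xx.<start>.translate_to.<target>' reference string."""
    for code in (start_language, target_language):
        # Rough sanity check only; real ISO validation is out of scope here.
        if not (code.isalpha() and code.islower() and 2 <= len(code) <= 3):
            raise ValueError(f"not a plausible ISO language code: {code!r}")
    return f"xx.{start_language}.translate_to.{target_language}"


print(translation_reference("en", "de"))  # xx.en.translate_to.de
print(translation_reference("tr", "en"))  # xx.tr.translate_to.en
```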

T5

Example of every T5 task

Overview of every task available with T5

The T5 model is trained on various datasets for 17 different tasks which fall into 8 categories.

Text summarization
Question answering
Translation
Sentiment analysis
Natural Language inference
Coreference resolution
Sentence Completion
Word sense disambiguation

Every T5 Task with explanation:

Task Name	Explanation
1.CoLA	Classify if a sentence is grammatically correct
2.RTE	Classify whether a statement can be deduced from a sentence
3.MNLI	Classify for a hypothesis and premise whether they entail each other, contradict each other, or neither (3 classes).
4.MRPC	Classify whether a pair of sentences is a re-phrasing of each other (semantically equivalent)
5.QNLI	Classify whether the answer to a question can be deduced from an answer candidate.
6.QQP	Classify whether a pair of questions is a re-phrasing of each other (semantically equivalent)
7.SST2	Classify the sentiment of a sentence as positive or negative
8.STSB	Classify the semantic similarity of a pair of sentences on a scale from 1 to 5 (21 classes)
9.CB	Classify for a premise and a hypothesis whether they contradict each other or not (binary).
10.COPA	Classify for a question, premise, and 2 choices which choice the correct choice is (binary).
11.MultiRc	Classify for a question, a paragraph of text, and an answer candidate, if the answer is correct (binary).
12.WiC	Classify for a pair of sentences and an ambiguous word if the word has the same meaning in both sentences.
13.WSC/DPR	Predict for an ambiguous pronoun in a sentence what it is referring to.
14.Summarization	Summarize text into a shorter representation.
15.SQuAD	Answer a question for a given context.
16.WMT1.	Translate English to German
17.WMT2.	Translate English to French
18.WMT3.	Translate English to Romanian

Every T5 Task example notebook to see how to use every T5 Task.
T5 Open and Closed Book question answering notebook

Text Summarization

Summarization example

Summarizes a paragraph into a shorter version with the same semantic meaning, based on Text summarization

# Set the task on T5
pipe = nlp.load('summarize')

# define Data, add additional tags between sentences
data = [
'''
The belgian duo took to the dance floor on monday night with some friends . manchester united face newcastle in the premier league on wednesday . red devils will be looking for just their second league away win in seven . louis van gaal’s side currently sit two points clear of liverpool in fourth .
''',
'''  Calculus, originally called infinitesimal calculus or "the calculus of infinitesimals", is the mathematical study of continuous change, in the same way that geometry is the study of shape and algebra is the study of generalizations of arithmetic operations. It has two major branches, differential calculus and integral calculus; the former concerns instantaneous rates of change, and the slopes of curves, while integral calculus concerns accumulation of quantities, and areas under or between curves. These two branches are related to each other by the fundamental theorem of calculus, and they make use of the fundamental notions of convergence of infinite sequences and infinite series to a well-defined limit.[1] Infinitesimal calculus was developed independently in the late 17th century by Isaac Newton and Gottfried Wilhelm Leibniz.[2][3] Today, calculus has widespread uses in science, engineering, and economics.[4] In mathematics education, calculus denotes courses of elementary mathematical analysis, which are mainly devoted to the study of functions and limits. The word calculus (plural calculi) is a Latin word, meaning originally "small pebble" (this meaning is kept in medicine – see Calculus (medicine)). Because such pebbles were used for calculation, the meaning of the word has evolved and today usually means a method of computation. It is therefore used for naming specific methods of calculation and related theories, such as propositional calculus, Ricci calculus, calculus of variations, lambda calculus, and process calculus.'''
]


#Predict on text data with T5
pipe.predict(data)

Predicted summary	Text
manchester united face newcastle in the premier league on wednesday . louis van gaal’s side currently sit two points clear of liverpool in fourth . the belgian duo took to the dance floor on monday night with some friends .	the belgian duo took to the dance floor on monday night with some friends . manchester united face newcastle in the premier league on wednesday . red devils will be looking for just their second league away win in seven . louis van gaal’s side currently sit two points clear of liverpool in fourth .
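As a quick sanity check, the length of a predicted summary can be compared with the length of its source text. The following is a minimal sketch in plain Python; compression_ratio is a hypothetical helper, not part of NLU, and it only compares whitespace-separated word counts.

```python
def compression_ratio(summary: str, text: str) -> float:
    """Return (words in summary) / (words in text); lower means a shorter summary."""
    text_words = text.split()
    if not text_words:
        raise ValueError("text must contain at least one word")
    return len(summary.split()) / len(text_words)


# Toy strings for illustration only, not real model output
ratio = compression_ratio("a b", "a b c d")
print(ratio)
```

A ratio close to 1 means the summary is almost as long as the input text.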

Binary Sentence similarity/ Paraphrasing

Binary sentence similarity example
Classifies whether one sentence is a rephrasing of, or similar to, another sentence.
This is a sub-task of GLUE and is based on MRPC - Binary Paraphrasing/sentence similarity classification.

t5 = nlp.load('en.t5.base')
# Set the task on T5
t5['t5'].setTask('mrpc ')

# Define the data, tagging each sentence with sentence1: / sentence2:
data = [
''' sentence1: We acted because we saw the existing evidence in a new light , through the prism of our experience on 11 September , " Rumsfeld said .
sentence2: Rather , the US acted because the administration saw " existing evidence in a new light , through the prism of our experience on September 11 "
'''
,
'''  
sentence1: I like to eat peanutbutter for breakfast
sentence2: I like to play football.
'''
]

#Predict on text data with T5
t5.predict(data)

Sentence1	Sentence2	prediction
We acted because we saw the existing evidence in a new light , through the prism of our experience on 11 September , “ Rumsfeld said .	Rather , the US acted because the administration saw “ existing evidence in a new light , through the prism of our experience on September 11 “ .	equivalent
I like to eat peanutbutter for breakfast	I like to play football	not_equivalent

How to configure T5 task for MRPC and pre-process text

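Set the task with .setTask('mrpc sentence1:') and prefix the second sentence with sentence2:. Alternatively, as in the example above, use .setTask('mrpc ') and tag both sentences in the data with sentence1: and sentence2:.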

Example pre-processed input for T5 MRPC - Binary Paraphrasing/ sentence similarity

mrpc
sentence1: We acted because we saw the existing evidence in a new light , through the prism of our experience on 11 September , " Rumsfeld said .
sentence2: Rather , the US acted because the administration saw " existing evidence in a new light , through the prism of our experience on September 11 "
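The tagging described above can be sketched as a small helper. This is a minimal sketch in plain Python; to_pair_input is a hypothetical name, not part of NLU, and it assumes the task prefix itself (for example 'mrpc ') has already been set via setTask.

```python
def to_pair_input(sentence1: str, sentence2: str) -> str:
    """Format a sentence pair with the sentence1: / sentence2: tags."""
    # Collapse stray whitespace so tabs or newlines do not leak into the input
    s1 = " ".join(sentence1.split())
    s2 = " ".join(sentence2.split())
    return f"sentence1: {s1} sentence2: {s2}"


data = [
    to_pair_input("I like to eat peanutbutter for breakfast", "\tI like to play football."),
]
print(data[0])
```

The resulting list can then be passed to t5.predict(data) in the same way as the hand-written strings.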

Regressive Sentence similarity/ Paraphrasing

Measures how similar two sentences are on a scale from 0 to 5. The score is predicted as one of 21 classes, which turns the regression target into a classification label.
This is a sub-task of GLUE and is based on STSB - Regressive semantic sentence similarity.

t5 = nlp.load('en.t5.base')
# Set the task on T5
t5['t5'].setTask('stsb ') 

# Define the data, tagging each sentence with sentence1: / sentence2:
data = [
    '''
    sentence1: What attributes would have made you highly desirable in ancient Rome?
    sentence2: How I GET OPPERTINUTY TO JOIN IT COMPANY AS A FRESHER?
    ''',
    '''
    sentence1: What was it like in Ancient rome?
    sentence2: What was Ancient rome like?
    ''',
    '''
    sentence1: What was live like as a King in Ancient Rome??
    sentence2: What is it like to live in Rome?
    ''',
]

#Predict on text data with T5
t5.predict(data)

Sentence1	Sentence2	prediction
What attributes would have made you highly desirable in ancient Rome?	How I GET OPPERTINUTY TO JOIN IT COMPANY AS A FRESHER?	0
What was it like in Ancient rome?	What was Ancient rome like?	5.0
What was live like as a King in Ancient Rome??	What is it like to live in Rome?	3.2
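The predictions in the table are score strings, so they usually need to be converted before they can be compared or thresholded. The following is a minimal sketch in plain Python; parse_stsb_score and is_similar are hypothetical helpers, and the threshold of 4.0 is an arbitrary assumption, not a value recommended by NLU or T5.

```python
def parse_stsb_score(prediction: str) -> float:
    """Convert a prediction string such as '3.2' into a float within [0, 5]."""
    score = float(prediction)  # raises ValueError if the text is not numeric
    if not 0.0 <= score <= 5.0:
        raise ValueError(f"score out of range: {score}")
    return score


def is_similar(prediction: str, threshold: float = 4.0) -> bool:
    """Treat a pair as similar when its score reaches the (assumed) threshold."""
    return parse_stsb_score(prediction) >= threshold


# The three prediction strings from the table above
scores = [parse_stsb_score(p) for p in ["0", "5.0", "3.2"]]
print(scores)
```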

How to configure T5 task for stsb and pre-process text

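Set the task with .setTask('stsb sentence1:') and prefix the second sentence with sentence2:. Alternatively, as in the example above, use .setTask('stsb ') and tag both sentences in the data with sentence1: and sentence2:.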

Example pre-processed input for T5 STSB - Regressive semantic sentence similarity

stsb
sentence1: What attributes would have made you highly desirable in ancient Rome?
sentence2: How I GET OPPERTINUTY TO JOIN IT COMPANY AS A FRESHER?

Grammar Checking

Grammar checking with T5 example
Judges whether a sentence is grammatically acceptable.
Based on CoLA - Binary Grammatical Sentence acceptability classification.

pipe = nlp.load('grammar_correctness')
# Set the task on T5
pipe['t5'].setTask('cola sentence: ')
# Define the data
data = ['Anna and Mike is going skiing and they is liked is', 'Anna and Mike like to dance']
# Predict on text data with T5
pipe.predict(data)

sentence	prediction
Anna and Mike is going skiing and they is liked is	unacceptable
Anna and Mike like to dance	acceptable
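The acceptable/unacceptable labels can be used to filter a list of sentences. This is a minimal sketch in plain Python; keep_acceptable is a hypothetical helper, and it assumes the sentences and predictions have already been extracted from the output as two parallel lists.

```python
def keep_acceptable(sentences, predictions):
    """Return only the sentences whose prediction is 'acceptable'."""
    return [s for s, p in zip(sentences, predictions) if p == "acceptable"]


# Sentences and labels taken from the table above
sentences = [
    "Anna and Mike is going skiing and they is liked is",
    "Anna and Mike like to dance",
]
predictions = ["unacceptable", "acceptable"]
print(keep_acceptable(sentences, predictions))
```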

Open book question answering

T5 Open and Closed Book question answering tutorial

You can think of an open book question as an exam where you are allowed to bring text documents or cheat sheets that help you answer the questions, much like bringing a history book to a history exam.

In T5's terms, this means the model is given a question and an additional piece of textual information, the so-called context.

This enables the T5 model to answer questions on textual datasets such as medical records, news articles, wiki databases, stories, movie scripts, product descriptions, legal documents and many more.

You can answer open book questions in 1 line of code, leveraging the latest NLU release and Google’s T5.
All it takes is:

nlp.load('answer_question').predict("""
Where did Jebe die?
context: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards,
 and Jebe died on the road back to Samarkand""")
>>> Output: Samarkand
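The input string used above, a question followed by a context: tag and the context, can also be assembled programmatically. The following is a minimal sketch in plain Python; build_open_book_input is a hypothetical helper, not part of NLU.

```python
def build_open_book_input(question: str, context: str) -> str:
    """Join a question and its context into one 'question context: ...' string."""
    # Normalize whitespace so multi-line contexts become a single line
    question = " ".join(question.split())
    context = " ".join(context.split())
    if not question or not context:
        raise ValueError("both question and context must be non-empty")
    return f"{question} context: {context}"


text = build_open_book_input(
    "Where did Jebe die?",
    "Jebe died on the road back to Samarkand",
)
print(text)
```

The resulting string can be passed to predict() just like the hand-written one.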

Example for answering medical questions based on medical context

question ='''
What does increased oxygen concentrations in the patient’s lungs displace? 
context: Hyperbaric (high-pressure) medicine uses special oxygen chambers to increase the partial pressure of O 2 around the patient and, when needed, the medical staff. 
Carbon monoxide poisoning, gas gangrene, and decompression sickness (the ’bends’) are sometimes treated using these devices. Increased O 2 concentration in the lungs helps to displace carbon monoxide from the heme group of hemoglobin.
 Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible is part of the treatment.
'''


#Predict on text data with T5
nlp.load('answer_question').predict(question)
>>> Output: carbon monoxide	

Take a look at this example based on a recent news article snippet:

question1 = 'Who is Jack ma?'
question2 = 'Who is founder of Alibaba Group?'
question3 = 'When did Jack Ma re-appear?'
question4 = 'How did Alibaba stocks react?'
question5 = 'Whom did Jack Ma meet?'
question6 = 'Who did Jack Ma hide from?'

# from https://www.bbc.com/news/business-55728338 
news_article_snippet = """ context:
Alibaba Group founder Jack Ma has made his first appearance since Chinese regulators cracked down on his business empire.
His absence had fuelled speculation over his whereabouts amid increasing official scrutiny of his businesses.
The billionaire met 100 rural teachers in China via a video meeting on Wednesday, according to local government media.
Alibaba shares surged 5% on Hong Kong's stock exchange on the news.
"""
# Join each question with the context; this works with a pandas DataFrame as well
questions = [
             question1+ news_article_snippet,
             question2+ news_article_snippet,
             question3+ news_article_snippet,
             question4+ news_article_snippet,
             question5+ news_article_snippet,
             question6+ news_article_snippet,]
nlp.load('answer_question').predict(questions)

This will output a pandas DataFrame similar to this:

Answer	Question
Alibaba Group founder	Who is Jack ma?
Jack Ma	Who is founder of Alibaba Group?
Wednesday	When did Jack Ma re-appear?
surged 5%	How did Alibaba stocks react?
100 rural teachers	Whom did Jack Ma meet?
Chinese regulators	Who did Jack Ma hide from?

Closed book question answering

T5 Open and Closed Book question answering tutorial

A closed book question is the exact opposite of an open book question. In an exam scenario, you are only allowed to use what you have memorized and nothing else.
In T5's terms, this means that T5 can only use its stored weights to answer a question and is given no additional context.
T5 was pre-trained on the C4 dataset (Colossal Clean Crawled Corpus), roughly 750 GB of cleaned English web text derived from Common Crawl, a public web archive that holds petabytes of crawl data.

This gives T5 broad knowledge of the internet, stored in its weights, to answer various closed book questions.

You can answer closed book questions in 1 line of code, leveraging the latest NLU release and Google’s T5.
You only need to pass the question itself as one string to NLU; no context: tag or context contents are required. All it takes is:

nlp.load('en.t5').predict('Who is president of Nigeria?')
>>> Muhammadu Buhari 

nlp.load('en.t5').predict('What is the most spoken language in India?')
>>> Hindi

nlp.load('en.t5').predict('What is the capital of Germany?')
>>> Berlin
