NLU: State-of-the-Art Text Mining in Python

The Simplicity of Python, the Power of Spark NLP

Powerful One-Liners

Hundreds of NLP models in 50+ languages are at your fingertips with just one line of code
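
As a rough sketch of what such a one-liner can look like, the helper below wraps NLU's `nlu.load(...).predict(...)` call chain. The helper name `predict_with` and the `'sentiment'` model reference used in the note afterwards are illustrative assumptions, not something this page specifies. Running it requires a working NLU installation, and the model is typically downloaded on first use.

```python
def predict_with(model_ref, text):
    """Load an NLU model by its reference string and run it on `text`."""
    # Imported lazily, so defining this helper does not itself require NLU.
    import nlu

    # The one-liner: load a pre-trained component, then predict on the input.
    return nlu.load(model_ref).predict(text)
```

With NLU installed, a call such as `predict_with('sentiment', 'I love NLU!')` should return the model's predictions for that sentence.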

Elegant Python

Read and write pandas DataFrames directly for frictionless integration with other libraries and existing ML pipelines
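
The pandas round trip might look roughly like the sketch below. It assumes the input DataFrame holds its raw strings in a column named `text`, and it uses `'emotion'` as an example model reference; the helper names `annotate_frame` and `build_example_frame` are hypothetical and exist only for this illustration.

```python
def build_example_frame():
    """Build a tiny DataFrame with the raw strings in a 'text' column."""
    import pandas as pd

    return pd.DataFrame({"text": ["I love this library", "What a frustrating day"]})


def annotate_frame(df, model_ref="emotion"):
    """Run an NLU model over `df` and return whatever NLU hands back."""
    # Assumption: NLU expects the input strings in a column called 'text'.
    if "text" not in df.columns:
        raise ValueError("expected a 'text' column holding the input strings")

    # Imported lazily, so defining this helper does not itself require NLU.
    import nlu

    return nlu.load(model_ref).predict(df)
```

With NLU and pandas installed, `annotate_frame(build_example_frame())` should return a DataFrame that can be passed straight on to other pandas-based tooling.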

100% Open Source

Including pre-trained models & pipelines

Quick and Easy

NLU is available on PyPI and Conda:
# Install NLU from PyPI
pip install nlu

# Install NLU from Anaconda/Conda
conda install -c johnsnowlabs nlu
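
To confirm that an install succeeded, one option is to ask Python's packaging metadata for the installed version. This is a minimal standard-library sketch rather than an NLU feature; the helper name `installed_version` is hypothetical, and the distribution name `nlu` is taken from the `pip` command above.

```python
from importlib import metadata


def installed_version(package="nlu"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string when NLU is installed, otherwise None.
print(installed_version("nlu"))
```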

Benchmark

Spark NLP 2.5.x obtained the best-performing academic peer-reviewed results

Training NER

  • State-of-the-art Deep Learning algorithms
  • Achieve high accuracy with one line of code
  • 350+ NLP models
  • 176+ unique NLP models and algorithms
  • 68+ unique NLP pipelines consisting of different NLU components
  • 50+ languages supported
  • 14+ embeddings, including BERT, ELMo, ALBERT, XLNet, GloVe, USE, and ELECTRA
  • 50+ pre-trained classifiers: Emotion, Sarcasm, Language, Question, E2E, Toxic
  • 36+ pre-trained NER (Named Entity Recognition) models
  • 34+ pre-trained POS (Part of Speech) models
  • 3+ pre-trained Lemmatizer models
  • Dependency parsing, untyped and typed
  • Spell Checking
  • Multi-lingual NER models in Dutch, English, French, German, Italian, Norwegian, Polish, Portuguese, Russian, Spanish
System          Year  Language             Accuracy
Spark NLP v2.4  2020  Python/Scala/Java/R  93.3 (test F1) - 95.9 (dev F1)
Spark NLP v2.x  2019  Python/Scala/Java/R  93
Spark NLP v1.x  2018  Python/Scala/Java/R  92
spaCy v2.x      2017  Python/Cython        92.6
spaCy v1.x      2015  Python/Cython        91.8
ClearNLP        2015  Java                 91.7
CoreNLP         2015  Java                 89.6
MATE            2015  Java                 92.5
Turbo           2015  C++                  92.4