Tokenizing Search/regex Statistics

Presentation on theme: "Tokenizing Search/regex Statistics" — Presentation transcript:

1 Tokenizing Search/regex Statistics
Parts of Speech (PoS) tagging and Semantics
Machine Translation
Logic
Sentiment analysis (easier in other libraries, so let's look at a couple)
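As a minimal illustration of the tokenizing, regex, and statistics topics listed above, here is a sketch using only Python's standard library (the text and pattern are illustrative, not taken from the slides):

```python
import re
from collections import Counter

# A crude tokenizer using a regular expression:
# \w+ matches runs of word characters (letters, digits, underscore),
# so punctuation and whitespace are discarded.
text = "The quick brown fox -- it jumped!"
tokens = re.findall(r"\w+", text)
print(tokens)  # ['The', 'quick', 'brown', 'fox', 'it', 'jumped']

# Simple statistics: count token frequencies, case-folded.
counts = Counter(t.lower() for t in tokens)
print(counts.most_common(3))
```

NLTK's own tokenizers handle harder cases (contractions, sentence boundaries), but the regex approach above is often enough for quick counts.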

2 Other processes in the NLTK (with book sections)
Covering the whole of NLTK is a course all of its own (literally, the book lays this out in some detail). However, here are a few additional processing jobs that are detailed in the book and worth knowing about.
Identifying male/female names: Section
Classifying documents:
Identifying dialogue act types, such as "Statement," "Emotion," "Question":
Named Entity Recognition (e.g. people and organisations): 7.5
Relationship discovery (e.g. between organisations): 7.6
Parsing sentences as trees: Chapters 8 and 9
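The name-classification task above can be sketched without downloading any corpora. The NLTK book trains a NaiveBayesClassifier on the names corpus using the last letter of the name as a feature; the plain-Python toy below stands in for that trained model (the set of "female" endings here is an illustrative assumption, not a trained result):

```python
# Toy stand-in for the NLTK book's name-gender classifier.
# Assumed ending letters for the demo only -- a real classifier
# learns these weights from the nltk names corpus.
FEMALE_ENDINGS = {"a", "e", "i"}

def gender_features(name):
    """The feature extractor from the book: just the last letter."""
    return {"last_letter": name[-1].lower()}

def guess_gender(name):
    """Hard-coded rule standing in for a trained NaiveBayesClassifier."""
    if gender_features(name)["last_letter"] in FEMALE_ENDINGS:
        return "female"
    return "male"

print(guess_gender("Anna"))  # female
print(guess_gender("John"))  # male
```

The point of the real version is that the classifier discovers which features matter; the feature extractor is the part you write yourself.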

3 Other libraries: Textblob
Finally, it is worth pointing out a couple of other libraries. Textblob doesn't come with Anaconda, but is a good addition. It is a wrapper for nltk that makes a number of functions simple: low power for adaptation, but extremely easy to use.

4 Textblob
For example, sentiment analysis: the analysis of text for its emotional and subjective content. We might, for example, track how Tweets mentioning a politician change over time. In NLTK you need to set up and train your own text classifier. In Textblob, it is just:
kafka = textblob.TextBlob(raw)
print(kafka.sentiment)
# Sentiment(polarity= , subjectivity= )
Polarity ranges from -1.0 (very negative) to 1.0 (very positive). Subjectivity ranges from 0.0 (very objective) to 1.0 (very subjective).

5 Other functions
Spelling correction and checking:
b = TextBlob("I havv goood speling!")
print(b.correct())  # I have good spelling!
Also:
Translation
POS and noun phrase extraction
A wide variety of the areas covered by nltk. The translation facilities look especially good. You can find the example above, and examples of all the techniques mentioned, at:

6 Other libraries: SpaCy
"Industrial Strength NLP": optimised for big data and deep learning. The other library which is picking up traction is spacy, which is more like nltk in levels of complication, but very optimised. It has to be said that nltk does some things surprisingly fast, but spacy is organised for big data specifically.

