WORDS Lab CSC 9010: Special Topics. Natural Language Processing.


WORDS Lab
CSC 9010: Special Topics. Natural Language Processing.
Paula Matuszek, Mary-Angela Papalaskari
Spring, 2005
Examples taken from Bird, Klein and Loper: NLTK Tutorial, Tagging, nltk.sourceforge.net/tutorial/tagging/index.html
CSC 9010: Special Topics, Natural Language Processing. Spring, 2005. Matuszek & Papalaskari

Words, Words, Words
So far we have covered methods that largely operate on tokens:
- Tokenizing text
- Stemming words and determining lemmas
- POS-tagging
- Language models based on n-gram frequencies
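As a reminder of what two of these token-level methods look like in practice, here is a minimal sketch of tokenizing and of a bigram estimate from n-gram frequencies. It uses only the Python standard library rather than NLTK; the function names (`tokenize`, `ngrams`, `bigram_probability`), the regular expression, and the sample sentence are assumptions made for illustration, and stemming and POS-tagging are not covered.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out word tokens with a crude regex."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probability(tokens, first, second):
    """Unsmoothed estimate of P(second | first) from raw counts."""
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(ngrams(tokens, 2))
    if unigram_counts[first] == 0:
        return 0.0
    return bigram_counts[(first, second)] / unigram_counts[first]


# Illustrative sample text (not from the course materials).
tokens = tokenize("The cat sat on the mat. The cat slept.")
print(tokens)
print(bigram_probability(tokens, "the", "cat"))
```

The estimate is a plain ratio of counts with no smoothing, so any bigram that never occurs in the sample gets probability zero.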

Every time I fire a linguist, my performance goes up [1]
None of this has much of what could be considered "linguistic" knowledge or "understanding":
- No parsing
- Not much domain knowledge
- No "meaning"
For the next two sections of the course we will talk extensively about syntax and semantics.
[1] Hirschberg, Julia. 1998. "Every time I fire a linguist, my performance goes up," and other myths of the statistical natural language processing revolution. Invited talk, Fifteenth National Conference on Artificial Intelligence (AAAI-98).

What's In a Word?
For this lab, we will focus on some of the things that can be done by applying the techniques we have already studied.
The format will be:
- Try a demo
- Discuss what techniques were needed to implement it
- Discuss some of what would be needed to improve it

Gender Genie
www.bookblog.net/gender/genie.html
- Techniques:
- How good is it?
- What might improve it?
Reference: www.cs.biu.ac.il/~koppel/papers/male-female-text-final.pdf
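As a starting point for the "Techniques" discussion, here is a minimal sketch of one token-level approach a text classifier could take: sum per-word weights over the tokens and compare the total to zero. This is not a description of how Gender Genie or the referenced paper works. The weights, the class labels "A" and "B", and the function names are all hypothetical, invented only to show the mechanics.

```python
import re

# Hypothetical weights, invented for illustration only.
# Positive values push toward class "A", negative values toward class "B".
HYPOTHETICAL_WEIGHTS = {"with": 1.0, "she": 2.0, "the": -1.0, "a": -0.5}


def tokenize(text):
    """Lowercase the text and pull out word tokens with a crude regex."""
    return re.findall(r"[a-z']+", text.lower())


def score(text, weights=HYPOTHETICAL_WEIGHTS):
    """Sum the weight of every token; unknown tokens contribute nothing."""
    return sum(weights.get(token, 0.0) for token in tokenize(text))


def classify(text, weights=HYPOTHETICAL_WEIGHTS):
    """Return "A" for a positive total score, otherwise "B"."""
    return "A" if score(text, weights) > 0 else "B"


print(classify("She went with a friend"))
print(classify("The dog chased the ball"))
```

A sketch like this uses only tokenizing and word counts, which makes it a useful baseline when discussing what parsing or semantics might add.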

Pearson Knowledge Technologies Text Classification Demo
www.k-a-t.com:8080/classify/
- Techniques:
- How good is it?
- What might improve it?
Reference: www.k-a-t.com/publications.shtml

Google Sets
labs.google.com/sets
- Techniques:
- How good is it?
- What might improve it?
Reference: If you find one, let me know. Possibly something like this: www.arxiv.org/pdf/cs.CL/0412098

AT&T Text to Speech
www.research.att.com/projects/tts/demo.html
- Techniques:
- How good is it?
- What might improve it?
Reference: www.research.att.com/projects/tts/pubs.html