CC 2007, 2011 attrbution - R.B. Allen Text and Text Processing.

Slides:



Advertisements
Similar presentations
Presented by Erin Palmer. Speech processing is widely used today Can you think of some examples? Phone dialog systems (bank, Amtrak) Computers dictation.
Advertisements

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Review of AI from Chapter 3. Journal May 13  What advantages and disadvantages do you see with using Expert Systems in real world applications like business,
Spelling Correction for Search Engine Queries Bruno Martins, Mario J. Silva In Proceedings of EsTAL-04, España for Natural Language Processing Presenter:
Chapter 5: Introduction to Information Retrieval
Crawling, Ranking and Indexing. Organizing the Web The Web is big. Really big. –Over 3 billion pages, just in the indexable Web The Web is dynamic Problems:
Advanced Google Becoming a Power Googler. (c) Thomas T. Kaun 2005 How Google Works PageRank: The number of pages link to any given page. “Importance”
Natural Language Processing WEB SEARCH ENGINES August, 2002.
For Friday No reading Homework –Chapter 23, exercises 1, 13, 14, 19 –Not as bad as it sounds –Do them IN ORDER – do not read ahead here.
Information Retrieval in Practice
Chapter 11 Beyond Bag of Words. Question Answering n Providing answers instead of ranked lists of documents n Older QA systems generated answers n Current.
Web Search – Summer Term 2006 III. Web Search - Introduction (Cont.) - Jeff Dean, Google's Systems Lab:
Feature vs. Model Based Vocal Tract Length Normalization for a Speech Recognition-based Interactive Toy Jacky CHAU Department of Computer Science and Engineering.
Introduction to Speech Production Lecture 1. Phonetics and Phonology Phonetics: The physical manifestation of language in sound waves. –How sounds are.
Properties of Text CS336 Lecture 3:. 2 Information Retrieval Searching unstructured documents Typically text –Newspaper articles –Web pages Other documents.
ITCS 6010 Natural Language Understanding. Natural Language Processing What is it? Studies the problems inherent in the processing and manipulation of.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Learning Bit by Bit Search. Information Retrieval Census Memex Sea of Documents Find those related to “new media” Brute force.
Carnegie Mellon © Copyright 2000 Michael G. Christel and Alexander G. Hauptmann 1 Informedia 03/12/97.
Information Retrieval
Chapter 5: Information Retrieval and Web Search
Overview of Search Engines
Text-To-Speech System for Marathi Miss. Deepa V. Kadam Indian Institute of Technology, Bombay.
“ The Initiative's focus is to dramatically advance the means to collect,store,and organize information in digital forms,and make it available for searching,retrieval,and.
Natural Language Understanding
DETECTING NEAR-DUPLICATES FOR WEB CRAWLING Authors: Gurmeet Singh Manku, Arvind Jain, and Anish Das Sarma Presentation By: Fernando Arreola.
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Knowledge Base approach for spoken digit recognition Vijetha Periyavaram.
Reyyan Yeniterzi Weakly-Supervised Discovery of Named Entities Using Web Search Queries Marius Pasca Google CIKM 2007.
Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa
AnswerBus Question Answering System Zhiping Zheng School of Information, University of Michigan HLT 2002.
1 University of Qom Information Retrieval Course Web Search (Link Analysis) Based on:
Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.
Information Retrieval and Web Search Cross Language Information Retrieval Instructor: Rada Mihalcea Class web page:
Mining the Web to Create Minority Language Corpora Rayid Ghani Accenture Technology Labs - Research Rosie Jones Carnegie Mellon University Dunja Mladenic.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Chapter 6: Information Retrieval and Web Search
Search Engines. Search Strategies Define the search topic(s) and break it down into its component parts What terms, words or phrases do you use to describe.
1 Automatic Classification of Bookmarked Web Pages Chris Staff Second Talk February 2007.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
LATENT SEMANTIC INDEXING Hande Zırtıloğlu Levent Altunyurt.
Collocations and Information Management Applications Gregor Erbach Saarland University Saarbrücken.
Publication Spider Wang Xuan 07/14/2006. What is publication spider Gathering publication pages Using focused crawling With the help of Search Engine.
Searching the web Enormous amount of information –In 1994, 100 thousand pages indexed –In 1997, 100 million pages indexed –In June, 2000, 500 million pages.
Overview ► Recall ► What are sound features? ► Feature detection and extraction ► Features in Sphinx III.
WEB MINING. In recent years the growth of the World Wide Web exceeded all expectations. Today there are several billions of HTML documents, pictures and.
For Monday Read chapter 24, sections 1-3 Homework: –Chapter 23, exercise 8.
For Monday Read chapter 26 Last Homework –Chapter 23, exercise 7.
Iana Atanassova Research: – Information retrieval in scientific publications exploiting semantic annotations and linguistic knowledge bases – Ranking algorithms.
For Friday Finish chapter 23 Homework –Chapter 23, exercise 15.
Web Information Retrieval Prof. Alessandro Agostini 1 Context in Web Search Steve Lawrence Speaker: Antonella Delmestri IEEE Data Engineering Bulletin.
The World Wide Web. What is the worldwide web? The content of the worldwide web is held on individual pages which are gathered together to form websites.
NATURAL LANGUAGE PROCESSING Zachary McNellis. Overview  Background  Areas of NLP  How it works?  Future of NLP  References.
1 An Introduction to Computational Linguistics Mohammad Bahrani.
Steve Cassidy Computing at MacquarieNo 1 Searching The Web Steve Cassidy Centre for Language Technology Department of Computing Macquarie University.
For Monday Read chapter 26 Homework: –Chapter 23, exercises 8 and 9.
CS791 - Technologies of Google Spring A Web­based Kernel Function for Measuring the Similarity of Short Text Snippets By Mehran Sahami, Timothy.
General Architecture of Retrieval Systems 1Adrienn Skrop.
CS 440 Database Management Systems Web Data Management 1.
CS 540 Database Management Systems Web Data Management some slides are due to Kevin Chang 1.
Information Retrieval in Practice
Speech Recognition
Digital Video Library - Jacky Ma.
Search Engine Architecture
Text Based Information Retrieval
Text-To-Speech System for English
IST 516 Fall 2011 Dongwon Lee, Ph.D.
Submitted By: Usha MIT-876-2K11 M.Tech(3rd Sem) Information Technology
Data Mining Chapter 6 Search Engines
Lecture 8 Information Retrieval Introduction
Presentation transcript:

CC 2007, 2011 attrbution - R.B. Allen Text and Text Processing

CC 2007, 2011 attrbution - R.B. Allen Fonts Clearview (top) is a new font developed to make highway signs more readable. At highway speeds using headlights, the Clearview font is significantly more readable than the font traditionally used for highway signs.

CC 2007, 2011 attrbution - R.B. Allen OCR Optical Character Recognition Recognition Process Looking for features Versus matching a template How much linguistic and world knowledge is needed for processing Collaborative corrections

CC 2007, 2011 attrbution - R.B. Allen Reading Teaching reading. Close reading. Literacy

CC 2007, 2011 attrbution - R.B. Allen Authorship of the Federalist Papers The Federalist Papers are a series of 30 essays published in newspapers to argue for the adoption of the U.S. Constitution. James Monroe and Alexander authored almost all of them but for some of the essays, the identity of the author them was lost. Because each author used a distinctive set of terms, the authorship was able to be determined by a Bayesian statistical analysis of the words.

CC 2007, 2011 attrbution - R.B. Allen Text Processing Spell-checking –Edit distance Parsing Summarization Text categorization

CC 2007, 2011 attrbution - R.B. Allen Information Extraction Texts (e.g., Web pages) have a lot of information but it not well structured. If we could extract that information, we could develop better question answering systems. Named-Entity Extraction Template Matching

CC 2007, 2011 attrbution - R.B. Allen Text Document Retrieval Literally

CC 2007, 2011 attrbution - R.B. Allen Vector Model Words carry a lot of the meaning of documents. Thus, we can represent the meaning of a document fairly well with a list (i.e. a vector) of terms. Queries can be also be represented as vectors. Weighting terms with term frequency or document frequency.

CC 2007, 2011 attrbution - R.B. Allen Other Text-Retrieval Techniques For Web pages, the hyperlinks are also an indication of similarity. This was captured in Google’s PageRank Algorithm Learning from users. Social network links (what your friends are looking for)

CC 2007, 2011 attrbution - R.B. Allen Retrieval Interfaces

CC 2007, 2011 attrbution - R.B. Allen Indexing the Web Spidering

CC 2007, 2011 attrbution - R.B. Allen Search Engine Business Models Advertising Ad-words Search engines linked to other services

CC 2007, 2011 attrbution - R.B. Allen Automated Question Answering Recall the discussion of answering reference questions Automated question answering –Question categorization –Finding the answers From a knowledgebase Synthesizing answers from the Web

CC 2007, 2011 attrbution - R.B. Allen Sentiment Analysis and Blog Retrieval There is can be a great advantage to knowing what’s the populace is thinking. Example of difficulty. Valence detection

CC 2007, 2011 attrbution - R.B. Allen Summarization What do we mean by a summary Techniques –Extractive summarization Teaching summarization

CC 2007, 2011 attrbution - R.B. Allen Translation Surface translation Pair-wise translations versus a common, language-neutral representation. Try a round-trip translation Increasingly, statistical methods are used for improving translations.

CC 2007, 2011 attrbution - R.B. Allen Speech Processing Representing speech with Phonemes Basic sound units. In English there are about 56 phonemes Vowels vs. consonants Types of consonants: plosives, fricatives Many applications Speaker identification Word spotting Language recognition

CC 2007, 2011 attrbution - R.B. Allen Speech Recognition Digitize the sound waves Spectrograms: From waveform to frequency Can we find the phonemes? Look for “formants”

Original Sound Wave Sampled Sound Wave Frequency RepresentationWave Representation Automatically processing speech: Creating a spectrogram CC 2007, 2011 attribution - R.B. Allen

CC 2007, 2011 attrbution - R.B. Allen Phonemes Sounds which differentiate meaning Bit, But, Bat, Bet, Robot Types of Phonemes Vowels Consonants Fricatives – f, s Nasals – m, n Plosives p, t, k Flap – tt (utter) Non-English sound Trill – (Spanish perro) Click (!kung)

CC 2007, 2011 attrbution - R.B. Allen Processing Speech to Find Formats