Query Segmentation and Structured Annotation via NLP Rifat Reza Joye Panagiotis Papadimitriou.

Presentation transcript:


Problem
Caloricious.com:
– Semantic search engine for food items
Free-text queries over structured data
– Query: gluten free high protein bars
– Data: Each food item is a database record with attributes name, brand, category, nutrients, allergens, ...
Query segmentation and structured annotation
– gluten free → ALLERGEN
– high protein → NUTRIENT
– bars → CATEGORY

1st Approach: MEMM with Synthetic Training Data
The task looks like an instance of NER.
Problem: No labeled queries to train the MEMM
Solution: Generate synthetic labeled queries
– Query study of 100 queries
  – 96% of queries contain 1–3 segments
  – In 98% of queries, one of the segments refers to Name, Category, or Brand
– Algorithm
  – Pick a food item at random
  – Pick 1–3 attributes and generate a query
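The generation algorithm on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the food records, the function name synthetic_query, and the rule of always including one of Name, Category, or Brand (the slide reports this for 98% of queries, not all) are hypothetical and not taken from the presentation.

```python
import random

# Hypothetical food records. Attribute names follow the slides;
# the values are invented for illustration.
FOOD_ITEMS = [
    {"name": "protein crunch bar", "brand": "acme", "category": "bars",
     "nutrient": "high protein", "allergen": "gluten free"},
    {"name": "oat cookie", "brand": "bakewell", "category": "cookies",
     "nutrient": "low sugar", "allergen": "nut free"},
]

# Attributes that the query study found in almost every query.
CORE = ("name", "category", "brand")


def synthetic_query(items, rng):
    """Return (tokens, labels) for one synthetic labeled query."""
    item = rng.choice(items)                 # pick a food item at random
    n_segments = rng.randint(1, 3)           # 1-3 segments per query
    # Simplifying assumption: always include one core attribute.
    chosen = [rng.choice(CORE)]
    others = [attr for attr in item if attr not in chosen]
    chosen += rng.sample(others, n_segments - 1)
    rng.shuffle(chosen)                      # vary the segment order
    tokens, labels = [], []
    for attr in chosen:
        words = item[attr].split()
        tokens.extend(words)
        labels.extend([attr.upper()] * len(words))  # one label per token
    return tokens, labels


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(synthetic_query(FOOD_ITEMS, rng))
```

Each generated pair gives one label per token, which is the shape a sequence tagger such as an MEMM would be trained on.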

2nd Approach: Segmentation & MaxEnt Classification
Query Segmentation
– Train a language model on the structured data text
– Use the model to find segment probabilities
– Find the maximum-likelihood (ML) segmentation through dynamic programming (DP)
Segment Annotation
– Annotate each segment with an attribute using a MaxEnt classifier
– Training: for each attribute, training examples come from the corresponding entries of the database products
Example: gluten free high protein bars → gluten free | high protein | bars
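The segmentation step can be illustrated with a minimal sketch. It makes strong assumptions that are not in the slides: the "language model" here is just relative frequencies of whole attribute values, unseen single words get a small floor probability, unseen multi-word segments are disallowed, and the VALUES list and the segment function are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical attribute values drawn from the product database.
VALUES = ["gluten free", "high protein", "bars", "nut free",
          "low sugar", "cookies", "gluten free", "bars"]


def segment(query, counts, max_len=3, floor=1e-6):
    """Most likely segmentation of `query` under a phrase-frequency model."""
    words = query.lower().split()
    total = sum(counts.values())

    def log_prob(phrase, n_words):
        if counts[phrase] > 0:
            return math.log(counts[phrase] / total)
        # Unseen single words get a floor; unseen longer phrases are rejected.
        return math.log(floor) if n_words == 1 else -math.inf

    # best[i] = (best log-probability of words[:i], start of the last segment)
    best = [(-math.inf, 0)] * (len(words) + 1)
    best[0] = (0.0, 0)
    for end in range(1, len(words) + 1):
        for start in range(max(0, end - max_len), end):
            phrase = " ".join(words[start:end])
            score = best[start][0] + log_prob(phrase, end - start)
            if score > best[end][0]:
                best[end] = (score, start)

    # Follow the back-pointers from the end of the query to recover segments.
    segments, end = [], len(words)
    while end > 0:
        start = best[end][1]
        segments.append(" ".join(words[start:end]))
        end = start
    return segments[::-1]


if __name__ == "__main__":
    counts = Counter(VALUES)
    print(segment("gluten free high protein bars", counts))
    # prints ['gluten free', 'high protein', 'bars']
```

Each resulting segment would then be passed to the attribute classifier. The dynamic program considers every split with segments of up to max_len words, so its cost grows linearly with query length for a fixed max_len.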

Results

Conclusions – Future Work
– The combination of language model, dynamic programming, and MaxEnt classification provides very good accuracy without labeled data.
– It would be interesting to compare with NER on a large labeled set.
– We also plan to compare with the state-of-the-art algorithm in the context of a research submission.

More Results… Evangelos March 12, 9.14am 19.5 inches 6lbs 11oz