Presentation transcript:

Deriving concept hierarchies from text
Mark Sanderson, University of Sheffield
Bruce Croft, CIIR, University of Massachusetts

The question is...
• What paper already presented at this SIGIR is most like the one you’re about to see?
• We’ll have the answer, right after this!

Concept hierarchies from documents?
• A hierarchy of concepts, as in Yahoo
  – General down to specific
  – A child may sit under one or more parents
• No training data
• Why?
  – Understandable

Current methods
• Polythetic clustering

An alternative?
• Monothetic clustering
  – Clusters based on a single feature
  – More ‘Yahoo/Dewey Decimal’-like?
  – Easier to understand?
    » Preferable to users?
  – What about hierarchies of clusters?

How to arrange cluster terms?
• Existing techniques
  – WordNet
    » earthquake, volcano (eruption?)
  – Key phrases (Hearst 1998), see the sketch after this slide
    » “such as”, “especially”
  – Phrase classification (Grefenstette 1997)
    » NP head or modifier: “types of research” from “research things”
  – Hierarchical phrase analysis (Woods 1997)
    » Head/modifier again: “car washing” under “washing”, not “car”
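The “key phrases” entry above refers to Hearst-style lexico-syntactic patterns. A minimal sketch follows (illustrative only, not code from the talk; the pattern list and example text are assumptions) of how such patterns can be matched with regular expressions to propose general/specific term pairs:

```python
# Minimal sketch (assumption, not from the talk): match Hearst-style
# "such as" / "especially" patterns to propose (general, specific) term pairs.
import re

PATTERNS = [
    r"(\w+) such as (\w+)",
    r"(\w+), especially (\w+)",
]

def hearst_pairs(text):
    """Return (general term, specific term) pairs found by the patterns."""
    pairs = []
    for pattern in PATTERNS:
        for general, specific in re.findall(pattern, text, flags=re.IGNORECASE):
            pairs.append((general, specific))
    return pairs

print(hearst_pairs("phenomena such as earthquakes shake the ground; "
                   "events, especially earthquakes, release energy"))
# [('phenomena', 'earthquakes'), ('events', 'earthquakes')]
```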

WordNet (aside)
• 1 sense of earthquake, sense 1
  – earthquake, quake, temblor, seism -- (shaking and vibration at the surface of the earth resulting from underground movement along a fault plane or from volcanic activity)
    » geological phenomenon -- (a natural phenomenon involving the structure or composition of the earth)
    » natural phenomenon, nature -- (all non-artificial phenomena)
    » phenomenon -- (any state or process known through the senses rather than by intuition or reasoning)

WordNet (aside)
• 5 senses of eruption, sense 1
  – volcanic eruption, eruption -- (the sudden occurrence of a violent discharge of steam and volcanic material)
    » discharge -- (the sudden giving off of energy)
    » happening, occurrence, natural event -- (an event that happens)
    » event -- (something that happens at a given place and time)

Start with something simpler?
• Term clustering?
  – Simple monothetic clusters
  – No ordering

Use subsumption
• Initially using subsumption
  – Finds related terms
  – Decides which term is more general and which more specific (idf?)
• Strict interpretation
  – x subsumes y iff P(x|y) = 1 and P(y|x) < 1
• In practice
  – x subsumes y iff P(x|y) > 0.8 and P(y|x) < 1
  – or P(x|y) > 0.8 and P(y|x) < P(x|y)
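As a rough illustration of the relaxed test above, the sketch below (an assumption for illustration; the talk gives only the formulas, not code) estimates P(x|y) and P(y|x) from the sets of retrieved documents each term occurs in:

```python
# Minimal sketch of the subsumption test (illustrative; the talk does not give code).
# P(x|y) is estimated as the fraction of y's documents that also contain x.

def subsumes(docs_x, docs_y, threshold=0.8):
    """True if term x subsumes (is more general than) term y."""
    if not docs_x or not docs_y:
        return False
    both = len(docs_x & docs_y)
    p_x_given_y = both / len(docs_y)
    p_y_given_x = both / len(docs_x)
    # Relaxed rule from the slide: P(x|y) > 0.8 and P(y|x) < P(x|y).
    return p_x_given_y > threshold and p_y_given_x < p_x_given_y

# Toy document-id sets: "polio" occurs in every document that mentions "vaccine",
# but not vice versa, so "polio" is judged the more general term here.
polio = {1, 2, 3, 4, 5, 6}
vaccine = {2, 3, 4}
print(subsumes(polio, vaccine))   # True
print(subsumes(vaccine, polio))   # False
```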

How to build a “hierarchy”
• From pairwise subsumption decisions, e.g.
  – X s Y, X s Z, X s M, X s N
  – Y s Z
  – A s B, A s Z
  – B s Z
• Really it’s a DAG
  (slide shows the resulting graph over the nodes X, Y, Z, M, N, A, B)
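Collecting the pairwise decisions into a structure is simple; the sketch below (toy pairs taken from the slide, everything else assumed) builds a parent-to-children map and shows why the result is a DAG rather than a tree, since a term such as Z ends up with several parents:

```python
# Minimal sketch (assumption): turn pairwise subsumption decisions into a
# parent -> children mapping; the result is a DAG because a term can have
# several parents.
from collections import defaultdict

pairs = [("X", "Y"), ("X", "Z"), ("X", "M"), ("X", "N"),
         ("Y", "Z"), ("A", "B"), ("A", "Z"), ("B", "Z")]

children = defaultdict(set)
parents = defaultdict(set)
for general, specific in pairs:
    children[general].add(specific)
    parents[specific].add(general)

roots = [t for t in children if not parents[t]]  # terms that nothing subsumes
print(roots)                 # ['X', 'A']
print(sorted(parents["Z"]))  # ['A', 'B', 'X', 'Y'], i.e. Z has several parents
```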

How to display it?
• DAGs were big
  – Unlikely to get it all on screen
• Only want to see the current focus plus the route taken to get there?
• Use a method users are familiar with
• Hierarchical menus
  (slide shows the DAG redrawn as a set of expandable menus)
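One way to read the “hierarchical menus” point: a node with several parents simply appears under each of them, so the DAG can still be shown with ordinary expandable menus. A small sketch with the toy data from the earlier slide (again an assumption, not the talk’s code):

```python
# Minimal sketch (assumption): print the DAG as indented, menu-style entries.
# Z has several parents, so it is listed under each of them, exactly as a
# hierarchical menu would repeat it.

children = {"X": {"Y", "Z", "M", "N"}, "Y": {"Z"}, "A": {"B", "Z"}, "B": {"Z"}}
roots = ["X", "A"]  # terms that no other term subsumes

def show(term, depth=0):
    print("  " * depth + term)
    for child in sorted(children.get(term, ())):
        show(child, depth + 1)

for root in roots:
    show(root)
```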

What about ambiguity?
• Monothetic clusters of ambiguous terms?
• Derive the hierarchy from retrieved documents
  – take a query and retrieve on it,
  – take the top 500 documents,
  – build the hierarchy from them.
• Topics/concepts are words/phrases taken from
  – the query
  – the retrieved documents
  – a comparison of frequencies (see the sketch after this slide)
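The slide only says candidate words and phrases come from the query, the retrieved documents, and a “comparison of frequencies”; the exact weighting is not given, so the sketch below uses a simple retrieved-set versus whole-collection ratio purely as a stand-in (all names and numbers are assumptions):

```python
# Minimal sketch (assumption: a simple frequency ratio stands in for whatever
# weighting the paper actually uses to pick topic words/phrases).

def candidate_terms(retrieved_counts, collection_counts,
                    n_retrieved, n_collection, top_k=20):
    """Rank terms by how much more often they occur in the top-retrieved
    documents than in the collection as a whole."""
    scores = {}
    for term, df_ret in retrieved_counts.items():
        df_col = collection_counts.get(term, df_ret)
        scores[term] = (df_ret / n_retrieved) / (df_col / n_collection)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy numbers: "poliomyelitis" is rare overall but common in the retrieved set,
# so it outranks a background word like "year".
retrieved = {"poliomyelitis": 300, "vaccine": 150, "year": 400}
collection = {"poliomyelitis": 900, "vaccine": 4000, "year": 200000}
print(candidate_terms(retrieved, collection, n_retrieved=500, n_collection=500000))
# ['poliomyelitis', 'vaccine', 'year']
```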

Poliomyelitis and Post-Polio, TREC topic 302
(series of example slides: screenshots of the hierarchy derived for this topic)

Did you guess the paper?
• A bit like Peter Anick’s work?

Experiment
• Test properties of the hierarchy
• Does it mimic (in some way) Yahoo-like categories?
  – Parent related to child?
  – Parent more general than child?

Experimental set-up
• Gathered eight subjects
  – Presented subsumption categories and ‘random’ categories.
  – Asked whether each parent-child pair was ‘interesting’.
    » If yes, then what type of relationship is it? (roughly) from WordNet:
      » Aspect of
      » Type of
      » Same as
      » Opposite of
      » Don’t know

Results
• Were parent/child pairings ‘interesting’ or not?
  – Random: 51%
  – Subsumption: 67%
  – Difference significant by t-test, p < 0.002
• If interesting, what is the parent/child type? Odd?

Yahoo categories?

Results and conclusions
• Interesting AND (aspect of OR type of)
  – Random: 28% (51% × (47% + 8%))
  – Subsumption: 48% (67% × (49% + 23%))
• It appears that subsumption, with an ordering based on document frequency, does a reasonable job.
  – For term frequency work see:
    » Sparck Jones, K. (1972) A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1).
    » Caraballo, S.A., Charniak, E. (1999) Determining the specificity of nouns from text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

Future work?
• More user studies
• Incorporate other term relationship techniques
• Other visualisations
• Application of the techniques to whole document collections
• Presentation of cross-language IR results?