Database and Information-Retrieval Methods for Knowledge Discovery, Gerhard Weikum et al.


1 Database and Information-Retrieval Methods for Knowledge Discovery. Gerhard Weikum, Gjergji Kasneci, Maya Ramanath, Fabian Suchanek. Idea Summary: Aaron Stewart, April 29, 2009

2 Abstract “Our aim here is to advocate… the integration of database systems (DB) and information-retrieval (IR) methods… “One grand goal of such an endeavor is the automatic building and maintenance of a comprehensive knowledge base of facts from encyclopedic sources and the scientific literature. “Facts should be represented in terms of typed entities and relationships and allow expressive queries that return ranked results with precision in an efficient and scalable manner. We thus explore how DB and IR methods might contribute toward this ambitious goal.”

3 Goal “Find young patients in central Europe who have been reported, in the past two weeks, to have symptoms of tropical virus diseases and an indication of anomalies” –Structured predicates (age) –Fuzzy predicates (anomaly) –Ranking Google: http://www.google.com/search?sourceid=navclient&ie=UTF-8&rlz=1T4SUNA_enUS322US322&q=Find+young+patients+in+central+Europe+who+have+been+reported%2c+in+the+past+two+weeks%2c+to+have+symptoms+of+tropical+virus+diseases+and+an+indication+of+anomalies
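The mix of structured predicates, fuzzy predicates, and ranking on this slide can be sketched roughly as follows. This is only an illustration: the `Report` fields, the `FUZZY_TERMS` vocabulary, and the scoring rule are all assumptions made for the example, not anything proposed in the paper.

```python
from dataclasses import dataclass


@dataclass
class Report:
    # Hypothetical record layout, invented for this sketch.
    patient_age: int
    region: str
    days_ago: int
    text: str


# Assumed vocabulary standing in for the fuzzy predicate "anomaly".
FUZZY_TERMS = {"anomaly", "anomalies", "abnormal", "irregular"}


def fuzzy_score(text):
    """Fraction of words in the text that hint at an anomaly."""
    words = [w.strip(".,;").lower() for w in text.split()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in FUZZY_TERMS)
    return hits / len(words)


def search(reports, max_age=30, region="central Europe", max_days=14):
    """Filter exactly on structured fields, then rank by the fuzzy score."""
    # DB side: structured predicates are evaluated as exact filters.
    candidates = [
        r for r in reports
        if r.patient_age <= max_age
        and r.region == region
        and r.days_ago <= max_days
    ]
    # IR side: the fuzzy predicate yields a score instead of true/false.
    scored = [(fuzzy_score(r.text), r) for r in candidates]
    scored = [(s, r) for s, r in scored if s > 0]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

The structured part prunes the candidate set exactly as a database would, while the fuzzy part only orders the survivors, which is the division of labor the slide points at.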

4 DB/IR Requests Approximate matching and record linkage –“M-31” and “NGC 224”: Andromeda galaxy Too-many-answers ranking Schema relaxation and heterogeneity Information extraction and uncertain data Entity search and ranking
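As a rough illustration of approximate matching and record linkage, the sketch below maps different names for the same object to one canonical entity. The `ALIASES` table and the 0.8 similarity cutoff are assumptions chosen for the example; only the standard-library `difflib` module is used.

```python
import difflib

# Hypothetical alias table; a real system would learn or curate this.
ALIASES = {
    "m-31": "Andromeda galaxy",
    "ngc 224": "Andromeda galaxy",
    "andromeda galaxy": "Andromeda galaxy",
}


def normalize(name):
    """Lowercase the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def resolve(name, cutoff=0.8):
    """Return the canonical entity for a name, or None if nothing is close."""
    key = normalize(name)
    if key in ALIASES:
        return ALIASES[key]
    # Fall back to approximate string matching against the known aliases.
    close = difflib.get_close_matches(key, list(ALIASES), n=1, cutoff=cutoff)
    return ALIASES[close[0]] if close else None
```

With this table, `resolve("M-31")` and `resolve("NGC 224")` link to the same entity, and a slightly misspelled variant such as `"NGC224"` still resolves through the approximate step.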

5 Problems: Web Querying Which German Nobel laureate survived both world wars and outlived all four of his children? –Max Planck Which politicians are also accomplished scientists? –Benjamin Franklin –Angela Merkel How are Max Planck, Angela Merkel, Jim Gray, and the Dalai Lama related? –All four have doctoral degrees from German universities

7 Approaches: Web Querying Semantic web repositories –SUMO, OpenCyc, WordNet –GeneOntology, UMLS Information extraction –YAGO, etc. Social web –Wikipedia

8 Projects Libra Cimple/DBLife KnowItAll/TextRunner YAGO Kylin/KOG

9 Libra Microsoft Research – Beijing Entity web search HTML tables and lists Tools: hierarchical CRF, LM

10 Cimple/DBLife http://dblife.cs.wisc.edu/ University of Wisconsin / Yahoo! Research “Super-homepages” Tools: Datalog, DB, tf*idf
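For readers unfamiliar with tf*idf, here is a minimal sketch of one common variant (relative term frequency times the log of inverse document frequency). It is not DBLife's actual implementation; the whitespace tokenizer and the exact weighting formula are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights
```

Terms that occur in every document get weight zero under this variant, while terms concentrated in few documents are weighted up.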

11 KnowItAll/TextRunner http://www.cs.washington.edu/research/textrunner/ –“Who built the pyramids?” University of Washington – Seattle Gathers information from many pages Seed patterns TextRunner: unsupervised bootstrapping

12 YAGO www.mpi-inf.mpg.de/yago/ “Yet another great ontology” Typed ER graph Wikipedia infoboxes and categories NLP processing (identify relationship type) WordNet
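A typed entity-relationship graph can be pictured as a set of subject-predicate-object triples with a class hierarchy on top. The sketch below is a toy illustration only: the `TripleStore` class, the relation names `type` and `subClassOf` as used here, and the handful of facts are assumptions for the example, not YAGO's actual schema or data.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) facts."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        facts = self._by_subject.get(subject, [])
        return [o for p, o in facts if p == predicate]

    def subjects(self):
        return list(self._by_subject)

    def is_a(self, entity, cls):
        """True if entity has type cls, directly or via subClassOf links."""
        frontier = self.objects(entity, "type")
        seen = set()
        while frontier:
            current = frontier.pop()
            if current == cls:
                return True
            if current in seen:
                continue
            seen.add(current)
            frontier.extend(self.objects(current, "subClassOf"))
        return False


def entities_of_both(store, cls_a, cls_b):
    """Entities that belong to both classes, e.g. politician and scientist."""
    return sorted(
        e for e in store.subjects()
        if store.is_a(e, cls_a) and store.is_a(e, cls_b)
    )


# Illustrative facts only, loosely following the deck's examples.
store = TripleStore()
store.add("physicist", "subClassOf", "scientist")
store.add("Max_Planck", "type", "physicist")
store.add("Angela_Merkel", "type", "politician")
store.add("Angela_Merkel", "type", "physicist")
store.add("Benjamin_Franklin", "type", "politician")
store.add("Benjamin_Franklin", "type", "scientist")
```

Calling `entities_of_both(store, "politician", "scientist")` then answers the deck's "Which politicians are also accomplished scientists?" question over this toy graph by following the type and subclass edges.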

13 Kylin/KOG “Intelligence in Wikipedia” project http://www.cs.washington.edu/ai/iwp/ They use interesting tools: CRFs, SVMs, Markov Logic Networks

14 NAGA (for YAGO) www.mpi-inf.mpg.de/yago/ Not Another Google Answer –“What is known about Einstein?” Ranking –Informativeness –Confidence –Compactness

15 Challenges Scalable harvesting Expressive ranking –User context –Data context Efficient search

17 Background “DB and IR are separate fields in computer science due to historical accident. Both investigate concepts, models, and computational methods for managing large amounts of complex information, though each began almost 40 years ago with very different application areas as motivations and technology drivers; for DB it was accounting systems (such as online reservations and banking), and for IR it was library systems (such as bibliographic catalogs and patent collections). “Moreover, these two directions and their related research communities emphasized very different aspects of information management; for DB it was data consistency, precise query processing, and efficiency, and for IR it was text understanding, statistical ranking models, and user satisfaction.”

