Database and Information-Retrieval Methods for Knowledge Discovery Gerhard Weikum,

Presentation transcript:

Database and Information-Retrieval Methods for Knowledge Discovery
Gerhard Weikum, Gjergji Kasneci, Maya Ramanath, Fabian Suchanek
Idea Summary
Aaron Stewart
April 29, 2009

Abstract
“Our aim here is to advocate… the integration of database systems (DB) and information-retrieval (IR) methods…
“One grand goal of such an endeavor is the automatic building and maintenance of a comprehensive knowledge base of facts from encyclopedic sources and the scientific literature.
“Facts should be represented in terms of typed entities and relationships and allow expressive queries that return ranked results with precision in an efficient and scalable manner. We thus explore how DB and IR methods might contribute toward this ambitious goal.”

Goal
“Find young patients in central Europe who have been reported, in the past two weeks, to have symptoms of tropical virus diseases and an indication of anomalies”
–Structured predicates (age)
–Fuzzy predicates (anomaly)
–Ranking
Google: 8&rlz=1T4SUNA_enUS322US322&q=Find+young+patients+in+central+Europe+who+have+been+reported%2c+in+the+past+two+weeks%2c+to+have+symptoms+of+tropical+virus+diseases+and+an+indication+of+anomalies
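A query like the one on this slide mixes exact DB-style filters with an IR-style relevance score. The following is a minimal sketch of that combination, not the authors' system: the `Report` record, the `FUZZY_TERMS` vocabulary, the age and time thresholds, and the term-fraction scoring are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical patient report; the field names are illustrative."""
    patient_age: int
    region: str
    days_ago: int
    text: str


# Assumed vocabulary standing in for the fuzzy predicate "anomaly".
FUZZY_TERMS = {"anomaly", "anomalies", "abnormal", "unusual"}


def fuzzy_score(text):
    """Fraction of words in the text that belong to FUZZY_TERMS."""
    words = [w.strip(".,;").lower() for w in text.split()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in FUZZY_TERMS)
    return hits / len(words)


def search(reports, max_age=30, region="central Europe", max_days=14):
    """Apply the structured predicates exactly, then rank by fuzzy score."""
    # Structured predicates: a report either satisfies them or it does not.
    matches = [
        r for r in reports
        if r.patient_age <= max_age
        and r.region == region
        and r.days_ago <= max_days
    ]
    # Fuzzy predicate: keep reports with a nonzero score, best first.
    scored = [(fuzzy_score(r.text), r) for r in matches]
    scored = [(s, r) for s, r in scored if s > 0]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

The structured predicates decide which reports qualify at all, while the fuzzy score only orders the survivors. A real system would replace the term-fraction score with a proper statistical ranking model.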

DB/IR Requests
Approximate matching and record linkage
–“M-31” and “NGC 224”: Andromeda galaxy
Too-many-answers ranking
Schema relaxation and homogeneity
Information extraction and uncertain data
Entity search and ranking
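The record-linkage example on this slide ("M-31" and "NGC 224" both naming the Andromeda galaxy) can be sketched as a small alias lookup with an approximate-matching fallback. This is only an illustration under stated assumptions: the `ALIASES` table, the normalization rule, and the 0.8 similarity threshold are hypothetical choices, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever string-similarity measure a real system would use.

```python
import re
from difflib import SequenceMatcher

# Hypothetical alias table: normalized surface form -> canonical entity.
ALIASES = {
    "m31": "Andromeda galaxy",
    "ngc224": "Andromeda galaxy",
}


def normalize(name):
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def resolve(name, threshold=0.8):
    """Return the canonical entity for a name, or None if nothing is close."""
    key = normalize(name)
    # Exact match on the normalized form handles "M-31" vs. "M 31".
    if key in ALIASES:
        return ALIASES[key]
    # Otherwise fall back to approximate string matching against known aliases.
    best_alias, best_ratio = None, 0.0
    for alias in ALIASES:
        ratio = SequenceMatcher(None, key, alias).ratio()
        if ratio > best_ratio:
            best_alias, best_ratio = alias, ratio
    if best_alias is not None and best_ratio >= threshold:
        return ALIASES[best_alias]
    return None
```

Note that string similarity alone cannot discover that "M-31" and "NGC 224" denote the same object; that link has to come from the alias table, which is why record linkage typically combines approximate matching with background knowledge.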

Problems: Web Querying
Which German Nobel laureate survived both world wars and outlived all four of his children?
–Max Planck
Which politicians are also accomplished scientists?
–Benjamin Franklin
–Angela Merkel
How are Max Planck, Angela Merkel, Jim Gray, and the Dalai Lama related?
–All four have doctoral degrees from German universities

Approaches: Web Querying
Semantic web repositories
–SUMO, OpenCyc, WordNet
–GeneOntology, UMLS
Information extraction
–YAGO, etc.
Social web
–Wikipedia

Projects
Libra
Cimple/DBLife
KnowItAll/TextRunner
YAGO
Kylin/KOG

Libra
Microsoft Research – Beijing
Entity web search
HTML tables and lists
Tools: hierarchical CRF, LM

Cimple/DBLife
University of Wisconsin / Yahoo! Research
“Super-homepages”
Tools: Datalog, DB, tf*idf

KnowItAll/TextRunner
xtrunner/
–“Who built the pyramids?”
University of Washington – Seattle
Gathers information from many pages
Seed patterns
TextRunner: unsupervised bootstrapping

YAGO
“Yet another great ontology”
Typed ER graph
Wikipedia infoboxes and categories
NLP processing (identify relationship type)
WordNet
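A typed entity-relationship graph of the kind this slide describes can be pictured as a set of (subject, relation, object) triples with a class hierarchy on top. The sketch below is illustrative only: the facts, the relation names `type` and `subClassOf`, and the helper functions are assumptions chosen to answer the earlier question "Which politicians are also accomplished scientists?", not YAGO's actual data or API.

```python
# Hypothetical facts as (subject, relation, object) triples.
FACTS = {
    ("Max_Planck", "type", "physicist"),
    ("Angela_Merkel", "type", "physicist"),
    ("Angela_Merkel", "type", "politician"),
    ("Benjamin_Franklin", "type", "politician"),
    ("Benjamin_Franklin", "type", "scientist"),
    ("physicist", "subClassOf", "scientist"),
}


def superclasses(cls):
    """Return cls together with every class reachable via subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for subject, relation, obj in FACTS:
            if subject == current and relation == "subClassOf" and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


def instances_of(cls):
    """Entities whose declared type is cls or one of its subclasses."""
    return {
        subject
        for subject, relation, obj in FACTS
        if relation == "type" and cls in superclasses(obj)
    }


def politicians_who_are_scientists():
    """Intersect two typed entity sets, sorted for stable output."""
    return sorted(instances_of("politician") & instances_of("scientist"))
```

The class hierarchy is what lets an entity typed as "physicist" satisfy a query for "scientist", which is the role the slide assigns to WordNet.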

Kylin/KOG
“Intelligence in Wikipedia” project
Tools: CRFs, SVMs, Markov Logic Networks

NAGA (for YAGO)
Not Another Google Answer
–“What is known about Einstein?”
Ranking
–Informativeness
–Confidence
–Compactness
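The three ranking criteria on this slide can be combined in many ways. The sketch below uses a plain weighted sum, which is an assumption made for illustration and not NAGA's actual scoring model. The `Fact` fields, the `prominence` value used as a stand-in for informativeness, and the definition of compactness as one over the number of facts in an answer are likewise hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Fact:
    """Hypothetical fact in an answer graph."""
    subject: str
    relation: str
    obj: str
    confidence: float  # assumed extraction confidence in [0, 1]
    prominence: float  # assumed informativeness proxy in [0, 1]


def score(facts, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of informativeness, confidence, and compactness.

    An answer is a non-empty list of facts.
    """
    w_info, w_conf, w_compact = weights
    informativeness = mean(f.prominence for f in facts)
    confidence = mean(f.confidence for f in facts)
    compactness = 1.0 / len(facts)  # fewer facts means a tighter answer
    return w_info * informativeness + w_conf * confidence + w_compact * compactness


def rank(answers, weights=(1.0, 1.0, 1.0)):
    """Order candidate answers from highest to lowest score."""
    return sorted(answers, key=lambda facts: score(facts, weights), reverse=True)
```

Under these assumptions a short answer built from prominent, high-confidence facts outranks a longer chain of weaker facts; changing the weights shifts the trade-off among the three criteria.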

Challenges
Scalable harvesting
Expressive ranking
–User context
–Data context
Efficient search

Background
“DB and IR are separate fields in computer science due to historical accident. Both investigate concepts, models, and computational methods for managing large amounts of complex information, though each began almost 40 years ago with very different application areas as motivations and technology drivers; for DB it was accounting systems (such as online reservations and banking), and for IR it was library systems (such as bibliographic catalogs and patent collections).
“Moreover, these two directions and their related research communities emphasized very different aspects of information management; for DB it was data consistency, precise query processing, and efficiency, and for IR it was text understanding, statistical ranking models, and user satisfaction.”