NEVER-ENDING LANGUAGE LEARNER Student: Nguyễn Hữu Thành Phạm Xuân Khoái Vũ Mạnh Cầm Instructor: PhD Lê Hồng Phương Hà Nội, January 11 2014.



Idea: build a structured knowledge base (KB).
What is a KB?
- Categories: cities, companies, sports teams, ...
- Relations: hasOfficeIn(organisation, location)
- Noun phrases
What is a structured KB?

Idea: Structuring Knowledge Base
[Figure: a fragment of the knowledge base drawn as a graph. Nodes are noun phrases such as Toronto, Maple Leafs, NHL, Stanley Cup, Red Wings, Detroit, GM, Toyota and Prius, typed with categories such as city, company, team, stadium, league and country. Edges are relations such as home town, plays in league, won, competes with, created, acquired and uses equipment.]
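
To make the idea of a structured KB concrete, here is a minimal Python sketch of one possible in-memory layout. The class name `KnowledgeBase`, its fields and its methods are hypothetical and chosen only for illustration; they are not NELL's actual data model. The sketch stores category instances, typed relation signatures, and relation instances, and rejects a relation instance whose arguments do not match the signature.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class KnowledgeBase:
    """Hypothetical toy KB: categories, relation signatures, relation instances."""
    # category name -> noun phrases believed to belong to it
    categories: Dict[str, Set[str]] = field(default_factory=dict)
    # relation name -> (category of argument 1, category of argument 2)
    signatures: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    # relation name -> pairs of noun phrases believed to stand in that relation
    relations: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)

    def add_category_instance(self, category: str, noun_phrase: str) -> None:
        self.categories.setdefault(category, set()).add(noun_phrase)

    def add_relation_instance(self, relation: str, arg1: str, arg2: str) -> bool:
        """Store relation(arg1, arg2) only if both arguments have the right type."""
        type1, type2 = self.signatures[relation]
        if arg1 not in self.categories.get(type1, set()):
            return False
        if arg2 not in self.categories.get(type2, set()):
            return False
        self.relations.setdefault(relation, set()).add((arg1, arg2))
        return True


kb = KnowledgeBase()
kb.signatures["competesWith"] = ("company", "company")
kb.add_category_instance("company", "GM")
kb.add_category_instance("company", "Toyota")
kb.add_category_instance("city", "Toronto")
kb.add_relation_instance("competesWith", "GM", "Toyota")   # accepted
kb.add_relation_instance("competesWith", "GM", "Toronto")  # rejected: Toronto is not a company
```

Keeping the signature next to the instances is what lets the KB reject ill-typed facts, which is the kind of type checking the later slides rely on.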

Ideas: using machine learning
Machine learning is a branch of artificial intelligence concerned with the construction and study of systems that can learn from data.

Ideas
[Diagram: NELL starts from an initial ontology and seed examples, reads the Web with input from human trainers, and builds the Knowledge Base (KB).]

Ideas: the task
Run 24x7, forever. Each day:
1. Reading task: extract more facts from the web to populate the initial ontology.
2. Learning task: learn to read (perform #1) better than yesterday.

NELL Architecture
[Diagram: four subsystem components (CPL, CSEAL, CMC, RL) draw on data resources and on the beliefs in the Knowledge Base to propose candidate facts; the Knowledge Integrator turns candidate facts into beliefs.]

Coupled Pattern Learner (CPL)
- Learns to extract category and relation instances and patterns from unstructured text.
- Learns contextual patterns that act as high-precision extractors for each predicate.
- E.g.:
+ "Trang An la ten mot co gai." (Trang An is the name of a girl.)
+ "Trang An la ten mot cong ty." (Trang An is the name of a company.)
→ Use this to improve precision.

Input/Output
- Input:
+ A large text corpus
+ An initial ontology containing the predicates to learn
- Output:
+ Proposed instances and contextual patterns for each predicate

Input: An ontology O, and a text corpus C
Output: Trusted instances/patterns for each predicate
for i = 1, 2, ..., ∞ do
    foreach predicate p in O do
        EXTRACT candidate instances/contextual patterns using recently promoted patterns/instances;
        FILTER candidates that violate coupling;
        RANK candidate instances/patterns;
        PROMOTE top candidates;
    end
end

Example: "Samsung vừa tung clip chế nhạo sản phẩm mới của Nokia." (Samsung has just released a clip mocking Nokia's new product.)
City: Ha Noi, Ho Chi Minh, Da Nang, ...
Company: Son Ha, Kinh Do, ...
competesWith: (AMD, Intel), (Google, Microsoft), (Samsung, Nokia), ...
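
The EXTRACT / FILTER / RANK / PROMOTE loop above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the CPL implementation: the function name `cpl`, its parameters, and the pre-extracted `(noun_phrase, context)` pairs are hypothetical; the only coupling constraint modelled is mutual exclusion between categories; and candidates are ranked by raw co-occurrence count rather than by any real scoring scheme.

```python
from collections import Counter


def cpl(seeds, mutex, occurrences, iterations=2, top_k=1):
    """Toy coupled pattern learner for categories.

    seeds:       predicate -> set of seed noun phrases
    mutex:       predicate -> predicates it is mutually exclusive with
    occurrences: (noun_phrase, context_pattern) pairs observed in the corpus
    """
    instances = {p: set(nps) for p, nps in seeds.items()}
    patterns = {p: set() for p in seeds}

    def rivals(p, table):
        # Everything currently trusted for predicates that exclude p.
        out = set()
        for q in mutex.get(p, ()):
            out |= table[q]
        return out

    for _ in range(iterations):
        for p in seeds:
            bad_nps = rivals(p, instances)

            # EXTRACT candidate patterns from trusted instances.
            cand = Counter(ctx for np, ctx in occurrences
                           if np in instances[p] and ctx not in patterns[p])
            # FILTER patterns that also fire on a rival predicate's instances.
            for ctx in list(cand):
                if any(np in bad_nps for np, c in occurrences if c == ctx):
                    del cand[ctx]
            # RANK by count and PROMOTE the top candidates.
            patterns[p] |= {ctx for ctx, _ in cand.most_common(top_k)}

            bad_pats = rivals(p, patterns)
            # EXTRACT candidate instances from trusted patterns.
            cand = Counter(np for np, ctx in occurrences
                           if ctx in patterns[p] and np not in instances[p])
            # FILTER instances that a rival predicate already claims.
            for np in list(cand):
                if np in bad_nps or any(c in bad_pats
                                        for n, c in occurrences if n == np):
                    del cand[np]
            instances[p] |= {np for np, _ in cand.most_common(top_k)}
    return instances, patterns


# Made-up corpus observations, reusing the names from the example slide.
occurrences = [
    ("Ha Noi", "cities such as _"), ("Da Nang", "cities such as _"),
    ("Kinh Do", "companies such as _"), ("Son Ha", "companies such as _"),
    ("Ha Noi", "_ said"), ("Kinh Do", "_ said"),
]
seeds = {"city": {"Ha Noi"}, "company": {"Kinh Do"}}
mutex = {"city": {"company"}, "company": {"city"}}
instances, patterns = cpl(seeds, mutex, occurrences)
```

In this toy run the context "_ said" is never promoted, because it occurs with both a city and a company; that is the effect the FILTER step is meant to have.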

Coupled SEAL
[Diagram: CSEAL takes beliefs, queries the Internet, and outputs new candidate facts.]

Coupled SEAL
- SEAL (Set Expander for Any Language) expands sets of entities automatically using resources from the Web.
- CSEAL adds mutual-exclusion and type-checking constraints.

Coupled SEAL: a semi-structured extractor
- Queries the internet with sets of beliefs from each category or relation, and mines lists and tables for instances.
- Uses mutual-exclusion relationships to provide negative examples for filtering out overly general lists and tables.
- Issues 5 queries per category and 10 queries per relation, and fetches 50 web pages per query.
- Assigns probabilities as in CPL.
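
The filtering step can be illustrated with a short Python sketch. It assumes the lists have already been mined from fetched pages; the function `expand_from_lists`, the `min_overlap` threshold and the sample lists are hypothetical and stand in for CSEAL's real wrapper induction. The point shown is only how negative examples from a mutually exclusive category knock out an overly general list.

```python
def expand_from_lists(positives, negatives, mined_lists, min_overlap=2):
    """Propose new instances for a category from lists mined on the web.

    positives:   current beliefs for the category (used as the query set)
    negatives:   beliefs of mutually exclusive categories
    mined_lists: lists/table columns extracted from fetched pages
    """
    candidates = set()
    for items in mined_lists:
        items = set(items)
        if len(items & positives) < min_overlap:
            continue  # the list does not clearly cover this category
        if items & negatives:
            continue  # overly general: it mixes in a known negative example
        candidates |= items - positives
    return candidates


cities = {"Ha Noi", "Da Nang"}
companies = {"Kinh Do"}
mined = [
    ["Ha Noi", "Da Nang", "Hue"],                 # a clean list of cities
    ["Ha Noi", "Da Nang", "Kinh Do", "Vietnam"],  # a mixed, overly general list
    ["Hue", "Can Tho"],                           # no overlap with the beliefs
]
new_cities = expand_from_lists(cities, companies, mined)
```

Only the first list survives both checks, so the mixed list cannot leak "Vietnam" into the city category.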

Coupled SEAL Example:

Coupled Morphological Classifier (CMC)
[Diagram: CMC reads the KB and data resources and outputs new candidate facts.]
CMC classifies noun phrases (NPs) based on various morphological features (words, capitalization, affixes).

Coupled Morphological Classifier
Ex1: Bach Mai hotel → hotel(Bach Mai)
Ex2: Mai → person(Mai)
Ex3: tradition → noun(tradition)

Coupled Morphological Classifier
- Beliefs from the KB are used as training instances.
- CMC examines candidate facts proposed by other components and classifies up to 30 new beliefs per predicate per iteration.
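
As a rough illustration of classifying noun phrases from morphological features, the Python sketch below extracts word, capitalization and suffix features and scores categories with a toy count-based scorer trained on KB beliefs. The function names, the two-character suffix and the scoring rule are assumptions made for this example; they stand in for whatever classifier the real component uses.

```python
from collections import Counter, defaultdict


def features(noun_phrase):
    """Word, capitalization and suffix features for each word of the phrase."""
    feats = set()
    for word in noun_phrase.split():
        feats.add("word=" + word.lower())
        feats.add("capitalized=" + str(word[0].isupper()))
        feats.add("suffix=" + word[-2:].lower())
    return feats


def train(beliefs):
    """beliefs: category -> noun phrases already in the KB (training instances)."""
    counts = defaultdict(Counter)
    for category, noun_phrases in beliefs.items():
        for np in noun_phrases:
            counts[category].update(features(np))
    return counts


def classify(counts, noun_phrase):
    """Pick the category whose training features best match the phrase."""
    feats = features(noun_phrase)
    scores = {
        category: sum(seen[f] for f in feats) / sum(seen.values())
        for category, seen in counts.items()
    }
    return max(scores, key=scores.get)


# Tiny, made-up training set built around the slide's examples.
model = train({
    "hotel": ["Bach Mai hotel", "Sheraton hotel"],
    "person": ["Mai", "Thanh"],
})
label_a = classify(model, "Hilton hotel")
label_b = classify(model, "Lan")
```

With this training set, a phrase ending in the word "hotel" lands in the hotel category, while a single capitalized word lands in the person category.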

Rule Learner (RL)
[Diagram: RL reads candidate facts and beliefs and outputs new candidate facts.]
RL takes the categories and relations in the KB as its input and derives new facts for the KB.

Rule Learner
Example 1: playSport(Rooney, football) → athlete(Rooney), sport(football)
Example 2: isCapital(Hanoi, Vietnam), liveIn(Thanh, Hanoi), roommate(Thanh, Khoai), roommate(Khoai, Cam) → liveIn(Thanh, Vietnam), roommate(Thanh, Cam), liveIn(Khoai, Hanoi), ...
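
Two of the inferences in Example 2 follow a chain shape, r1(X, Y) and r2(Y, Z) implies head(X, Z). The Python sketch below applies such a rule to a set of facts. It only applies hand-written rules and does not learn them; the function `apply_chain_rule` and the reading of the slide's inferences as chain rules are assumptions made for illustration.

```python
def apply_chain_rule(facts, r1, r2, head):
    """Infer head(x, z) whenever r1(x, y) and r2(y, z) are both known.

    facts: set of (relation, arg1, arg2) triples
    """
    inferred = set()
    for rel_a, x, y in facts:
        if rel_a != r1:
            continue
        for rel_b, y2, z in facts:
            if rel_b == r2 and y2 == y and x != z:
                inferred.add((head, x, z))
    return inferred - facts  # report only facts that are new


facts = {
    ("isCapital", "Hanoi", "Vietnam"),
    ("liveIn", "Thanh", "Hanoi"),
    ("roommate", "Thanh", "Khoai"),
    ("roommate", "Khoai", "Cam"),
}
# liveIn(X, C) and isCapital(C, N) implies liveIn(X, N)
new_live = apply_chain_rule(facts, "liveIn", "isCapital", "liveIn")
# roommate(X, Y) and roommate(Y, Z) implies roommate(X, Z)
new_room = apply_chain_rule(facts, "roommate", "roommate", "roommate")
```

The third inference on the slide, liveIn(Khoai, Hanoi), joins on the first argument of roommate rather than the second, so it would need a differently shaped rule than the chain handled here.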

Rule Learner
Some rule-learner systems: OneR, Ridor, PART, JRip, ConjunctiveRule.
Clip: tDeu2ic

Initial result
Running 24x7 since January 12, 2010.
Inputs:
- an ontology defining >600 categories and relations
- seed examples of each
- 100,000 web search queries per day
- ~5 minutes/day of human guidance
Result:
- a KB with >15 million candidate beliefs, growing daily
- learning to reason, as well as read
- automatically extending its ontology

Initial result
Demo: erage:beer

References
- NELL article: aaai10.pdf
- guage_learning/
- Tom Mitchell's seminar:
- RL: learner-or-rule-induction/