Ontologies & Machine Learning

Ontologies & Machine Learning
Marko Grobelnik, Blaz Fortuna
Jozef Stefan Institute, Slovenia

Aim of the talk
The main goal of this talk is to show how knowledge modeling relates to machine learning, in two ways:
- …top-down modeling with “deep ontologies”, using the Cyc system (http://www.cyc.com/) as the example
- …bottom-up modeling of “light ontologies”, using the OntoGen ontology learning system (http://ontogen.ijs.si) as the example

What areas of research are we trying to target?
- Text mining, link analysis, and other analytic techniques deal mainly with extracting and aggregating information from raw data
  …they maximize the quality of the extracted information
- The Semantic Web deals mainly with integrating and representing the given data
  …it maximizes the reusability of the given information
- The two areas are complementary, and both are necessary for operational information engineering

Ontologies

What is an Ontology?
- Ontologies are the main formal objects within the Semantic Web and, more recently, within text analytics
- Ontologies originate in philosophy, but within computer science an ontology is a data model that represents a domain and is used to reason about the objects in that domain and the relations between them
  …their main aim is to describe and represent an area of knowledge in a formal way

What is an Ontology?
An ontology is a formal, explicit specification of a shared conceptualisation:
- formal: machine processable
- explicit specification: concepts, properties, relations, functions
- shared: consensual knowledge
- conceptualisation: abstract model of some domain
Source: Frank van Harmelen, 2003: http://seminars.ijs.si/sekt

Which elements represent an ontology?
An ontology typically consists of the following elements:
- Instances: the basic or “ground level” objects
- Classes: sets, collections, or types of objects
- Attributes: properties, features, characteristics, or parameters that objects can have and share
- Relations: ways that objects can be related to one another
Analogies between ontologies and relational databases:
- Instances correspond to records
- Classes correspond to tables
- Attributes correspond to record fields
- Relations correspond to relations between the tables
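
The four elements above can be sketched as plain data types. This is a minimal illustration, not how Cyc or OntoGen store ontologies; the type names (`OntClass`, `Instance`, `Relation`) and the `infectedBy` relation are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class OntClass:
    """A class: a set or type of objects (analogous to a table)."""
    name: str
    attributes: list = field(default_factory=list)  # analogous to record fields

@dataclass
class Instance:
    """A ground-level object (analogous to a record)."""
    name: str
    cls: OntClass
    values: dict = field(default_factory=dict)  # attribute name -> value

@dataclass
class Relation:
    """A named link between two instances (analogous to a relation between tables)."""
    name: str
    source: Instance
    target: Instance

# Illustrative data only
person = OntClass("Person", ["name"])
disease = OntClass("Disease", ["name"])
john = Instance("John", person, {"name": "John"})
anthrax = Instance("Anthrax", disease, {"name": "anthrax"})
infected_by = Relation("infectedBy", john, anthrax)

print(infected_by.source.cls.name, infected_by.name, infected_by.target.cls.name)
```

Keeping classes, instances, and relations as separate types mirrors the table, record, and foreign-key split in the database analogy.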

Levels of Semantic Web formalisms
The W3C “Semantic Web Layer Cake” shows representation levels and related technologies.
[Figure: Semantic Web Layer Cake. Labels: character-level encoding; addressing the information; infrastructure; different levels of semantic abstraction; higher level of representation and reasoning (OWL, RIF).]

Top-down modeling of knowledge: the Cyc system

Cyc …a little bit of historical context
- Older AI-ers know about Cyc:
  …one of the boldest attempts in AI history to encode common-sense knowledge in one KB
- The project started in 1984 at Stanford as the US response to Japan’s “5th Generation Computer Systems” project
- In 1994 the company Cycorp was established (in Austin, TX)
- In 2005 the Cyc KB was opened and made available for research:
  OpenCyc (http://www.opencyc.org/)
  ResearchCyc (http://research.cyc.com/)
- In 2006 Cyc-Europe was established (in Ljubljana, Slovenia)
- By 2006, ~$80M had been spent on constructing the KB

The Cyc Ontology
General knowledge about various domains, represented in:
- First-order logic
- Higher-order logic
- Context logic
- Microtheories
Cyc contains:
- 15,000 predicates
- 300,000 concepts
- 3,200,000 assertions
[Figure: pyramid of the Cyc knowledge base. “Thing” sits at the top, above upper-level categories (Intangible, Individual, Temporal, Spatial, Partially Tangible; Sets, Relations, Logic, Math; Time, Agents, Space, Physical Objects, Living Things, Artifacts, Events, …), general knowledge about various domains (for example Human Beings, Organizations, Geography, Politics, Business & Commerce, Transportation & Logistics, Everyday Living), and, at the base, specific data, facts, and observations. Cycorp © 2006]

…part of Cyc Ontology on Human Beings

Structure of Cyc Ontology
Knowledge Base Layers:
- Upper Ontology: Abstract Concepts
- Core Theories: Space, Time, Causality, …
- Domain-Specific Theories
- Facts: Instances (Database)
The Knowledge Base (KB) itself comprises a massive taxonomy of concepts and specifically defined relationships that describe how those concepts are related. The figure arranges the knowledge by degree of generality, with a small layer of abstract generalizations at the top and a large layer of real-world facts at the bottom.

Structure of Cyc Ontology
Knowledge Base Layers:
- Upper Ontology: Abstract Concepts
  EVENT ⊆ TEMPORAL-THING ⊆ INDIVIDUAL ⊆ THING
- Core Theories
- Domain-Specific Theories
- Facts (Database)
The Upper Ontology doesn’t say much about the world at all. It represents very general relations between very general concepts. For example, it contains assertions to the effect that every event is a temporal thing, every temporal thing is an individual, and every individual is a thing. “Thing” is Cyc’s most general concept: everything whatsoever is an instance of “thing.”

Structure of Cyc Ontology
Knowledge Base Layers:
- Upper Ontology: Abstract Concepts
  EVENT ⊆ TEMPORAL-THING ⊆ INDIVIDUAL ⊆ THING
- Core Theories: Space, Time, Causality, …
  For all events a and b, a causes b implies a precedes b
- Domain-Specific Theories
- Facts (Database)
The KB contains several core theories that represent general facts about space, time, and causality. These theories are essential to almost all common-sense reasoning.

Structure of Cyc Ontology
Knowledge Base Layers:
- Upper Ontology: Abstract Concepts
  EVENT ⊆ TEMPORAL-THING ⊆ INDIVIDUAL ⊆ THING
- Core Theories: Space, Time, Causality, …
  For all events a and b, a causes b implies a precedes b
- Domain-Specific Theories
  For any mammal m and any anthrax bacteria a, m’s being exposed to a causes m to be infected by a.
- Facts (Database)
Domain-specific theories are more specific than core theories. They apply to special areas of interest such as military movement, the propagation of diseases, finance, and chemistry. These theories make Cyc particularly useful but are not necessary for common-sense reasoning.

Structure of Cyc Ontology
Knowledge Base Layers:
- Upper Ontology: Abstract Concepts
  EVENT ⊆ TEMPORAL-THING ⊆ INDIVIDUAL ⊆ THING
- Core Theories: Space, Time, Causality, …
  For all events a and b, a causes b implies a precedes b
- Domain-Specific Theories
  For any mammal m and any anthrax bacteria a, m’s being exposed to a causes m to be infected by a.
- Facts: Instances (Database)
  John is a person infected by anthrax.
The final layer contains what are sometimes called “ground-level facts.” These are statements about particular individuals in the world. For example, “John has anthrax” is a specific statement about one person. Generalizations would not go here; they belong in a layer above. Anything you can imagine as a newspaper headline would probably go here.
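
To make the layering concrete, here is a minimal sketch of how a ground-level fact combines with the upper-ontology chain EVENT ⊆ TEMPORAL-THING ⊆ INDIVIDUAL ⊆ THING. The dictionaries and function names are illustrative assumptions, not Cyc's API; the `Person` link to `Individual` is also assumed for the example.

```python
# Hypothetical miniature KB (illustrative only, not Cyc data structures)
SPECIALIZATION_OF = {            # direct "every X is a Y" links
    "Event": {"TemporalThing"},
    "TemporalThing": {"Individual"},
    "Individual": {"Thing"},
    "Person": {"Individual"},    # assumed link for the example
}
INSTANCE_OF = {"John": {"Person"}}   # ground-level facts

def all_generalizations(cls):
    """Return cls plus every class reachable through specialization links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SPECIALIZATION_OF.get(current, ()))
    return seen

def is_instance_of(individual, cls):
    """True if any direct class of the individual specializes cls."""
    return any(cls in all_generalizations(c)
               for c in INSTANCE_OF.get(individual, ()))

print(is_instance_of("John", "Thing"))   # follows Person -> Individual -> Thing
print(is_instance_of("John", "Event"))   # no path from Person to Event
```

The traversal is a plain transitive closure, so a fact stated once at the bottom layer inherits everything asserted about its generalizations above.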

Cyc KB Extended w/Domain Knowledge
General knowledge about terrorism:
- Terrorist groups are capable of directing assassinations:
  (implies (isa ?GROUP TerroristGroup) (behaviorCapable ?GROUP AssassinatingSomeone directingAgent))
- …
- If a terrorist group considers an agent an enemy, that agent is vulnerable to an attack by that group:
  (implies (and (considersAsEnemy ?GROUP ?TARGET)) (vulnerableTo ?GROUP ?TARGET TerroristAttack))
[Figure: the Cyc KB pyramid (Thing at the top, domain knowledge below), extended with a layer of general knowledge about terrorism and a base of specific data, facts, and observations about terrorist groups and activities.]

Cyc KB Extended w/Domain Knowledge
Specific facts about Al Qaida:
- (basedInRegion AlQaida Afghanistan): Al-Qaida is based in Afghanistan.
- (hasBeliefSystems AlQaida IslamicFundamentalistBeliefs): Al-Qaida has Islamic fundamentalist beliefs.
- (hasLeaders AlQaida OsamaBinLaden): Al-Qaida is led by Osama bin Laden.
- …
- (affiliatedWith AlQaida AlQudsMosqueOrganization): Al-Qaida is affiliated with the Al Quds Mosque.
- (affiliatedWith AlQaida SudaneseIntelligenceService): Al-Qaida is affiliated with the Sudanese Intelligence Service.
- (sponsors AlQaida HarakatUlAnsar): Al-Qaida sponsors Harakat ul-Ansar.
- (sponsors AlQaida LaskarJihad): Al-Qaida sponsors Laskar Jihad.
- …
- (performedBy EmbassyBombingInNairobi AlQaida): Al-Qaida bombed the Embassy in Nairobi.
- (performedBy EmbassyBombingInTanzania AlQaida): Al-Qaida bombed the Embassy in Tanzania.
[Figure: the Cyc KB pyramid (Thing at the top, domain knowledge below), extended with a layer of general knowledge about terrorism and a base of specific data, facts, and observations about terrorist groups and activities.]

An example of Psychoanalyst’s Cyc taxonomic context
#$Psychoanalyst (lexical representation: “psychoanalyst”, “psychoanalysts”)
specialization-of #$MedicalCareProfessional
| specialization-of #$HealthProfessional
| specialization-of #$Professional-Adult
| specialization-of #$Professional
specialization-of #$Psychologist
| specialization-of #$Scientist
| specialization-of #$Researcher
| | specialization-of #$PersonWithOccupation
| | | specialization-of #$Person
| | | | specialization-of #$HomoSapiens
| | | | | instance-of #$BiologicalSpecies
| | | | | | specialization-of #$BiologicalTaxon
| | | | | instance-of #$SomeSampleKindsOfMammal-Biology-Topic

Example Vocabulary: Senses of ‘In’ relation (1/3)
- Can the inner object leave by passing between members of the outer group?
  Yes: try #$in-Among

Example Vocabulary: Senses of ‘In’ relation (2/3)
- Does part of the inner object stick out of the container?
  None of it: try #$in-ContCompletely
  Yes: try #$in-ContPartially
- If the container were turned around, could the contained object fall out?
  No: try #$in-ContClosed
  Yes: try #$in-ContOpen

Example Vocabulary: Senses of ‘In’ relation (3/3)
- Is it attached to the inside of the outer object?
  Yes: try #$connectedToInside
- Can it be removed by pulling, if enough force is used, without damaging either object?
  No: try #$in-Snugly or #$screwedIn
- Does the inner object stick into the outer object?
  Yes: try #$sticksInto

Cyc’s front-end: “Cyc Analytic Environment” – querying (1/2)
[Screenshot labels: text query; query (semi-)automatically translated into first-order logic; answers to the query]

Cyc’s front-end: “Cyc Analytic Environment” – justification (2/2)
[Screenshot labels: query and answer; justification; sources for reasoning and justification]

Document Tagging

Annotating the document with CycKB

Probabilistic Concept Tagging
“The plants that produced the cranes that NASA deployed in space in the 1990s are in Canada.”
Candidate concepts (with probabilities) for each phrase:
- The plants: #$FactoryBuildingComplex 0.8817, #$Plant 0.0967
- that produced: #$Production-Generic 0.6017
- the cranes: #$Crane-MotorizedDevice 0.9387, #$Crane-Bird 0.0408
- that NASA: #$NASA
- deployed: #$DeployingMaterial
- in space: #$OuterSpace 0.51, #$SpaceInAHOC 0.1473, #$ReservedSpaceRegion 0.0459, #$Area 0
- in the 1990s: (#$DecadeFn 199)
- are in Canada: #$Canada 1
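
One simple way to consume such scored candidates is to keep the most probable concept per phrase and leave the phrase untagged when nothing is confident enough. The sketch below uses the probabilities from the slide; the `best_sense` function and its `threshold` parameter are assumptions for illustration, not part of the Cyc tagger.

```python
# Candidate concepts per phrase, with the probabilities shown on the slide
tagged = {
    "The plants": {"#$FactoryBuildingComplex": 0.8817, "#$Plant": 0.0967},
    "the cranes": {"#$Crane-MotorizedDevice": 0.9387, "#$Crane-Bird": 0.0408},
    "in space": {"#$OuterSpace": 0.51, "#$SpaceInAHOC": 0.1473,
                 "#$ReservedSpaceRegion": 0.0459, "#$Area": 0.0},
}

def best_sense(candidates, threshold=0.5):
    """Return the most probable concept, or None if it falls below threshold.

    The threshold is a hypothetical knob, not something the slide specifies.
    """
    concept, prob = max(candidates.items(), key=lambda item: item[1])
    return concept if prob >= threshold else None

for phrase, candidates in tagged.items():
    print(phrase, "->", best_sense(candidates))
```

With a stricter threshold such as 0.6, “in space” would stay ambiguous, which is one way to route borderline cases to a human reviewer.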

Knowledge Template Induction

Train:
  [Link-grammar parse diagram of the training sentence]
  Royal Dutch Shell Plc halted output of 455,000 barrels a day in Nigeria.
  (#$and (#$isa (#$TheFn #$DecreaseEvent) (#$DecreaseInValueReturnedByFn (#$ExportRateOfByFn #$Petroleum-CrudeOil) #$Nigeria))
         (#$doneBy (#$TheFn #$DecreaseEvent) #$RoyalDutchShell)
         (#$quantityChangeAmount (#$TheFn #$DecreaseEvent) (#$BarrelsPerDay 455000)))
Template:
  [Link-grammar parse diagram with the agent, quantity, and location replaced by slots]
  [Agent] halted output of [Quantity] in [Locn].
  (#$and (#$isa (#$TheFn #$DecreaseEvent) (#$DecreaseInValueReturnedByFn (#$ExportRateOfByFn #$Petroleum-CrudeOil) [Locn]))
         (#$doneBy (#$TheFn #$DecreaseEvent) [Agent])
         (#$quantityChangeAmount (#$TheFn #$DecreaseEvent) [Quantity]))
Example-Based Machine Translation (EBMT) is an approach to building translation rules between two natural languages by giving a learning system pairs of sentences in the two languages. We can apply EBMT to the acquisition of information extraction (IE) rules by giving the system pairs in which the first element is a natural-language sentence and the second element is an appropriate internal representation in CycL.
Use:
  Petróleos de Venezuela S.A. halted output of 760 000 barrels a week in Maracaibo.
  (#$and (#$isa (#$TheFn #$DecreaseEvent) (#$DecreaseInValueReturnedByFn (#$ExportRateOfByFn #$Petroleum-CrudeOil) #$CityOfMaracaiboVenezuela))
         (#$doneBy (#$TheFn #$DecreaseEvent) #$PetroleosdeVenezuelaSA)
         (#$quantityChangeAmount (#$TheFn #$DecreaseEvent) (#$BarrelsPerDay 760000)))
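
The “Use” step can be sketched as slot matching followed by substitution. This is a deliberately simplified stand-in: it matches on the surface string with a regular expression rather than on the link-grammar parse, and it copies the matched text into the CycL template instead of resolving it to Cyc constants such as #$CityOfMaracaiboVenezuela. The function names are assumptions.

```python
import re

SENTENCE_TEMPLATE = "[Agent] halted output of [Quantity] in [Locn]."
CYCL_TEMPLATE = (
    "(#$and (#$isa (#$TheFn #$DecreaseEvent) (#$DecreaseInValueReturnedByFn "
    "(#$ExportRateOfByFn #$Petroleum-CrudeOil) [Locn])) "
    "(#$doneBy (#$TheFn #$DecreaseEvent) [Agent]) "
    "(#$quantityChangeAmount (#$TheFn #$DecreaseEvent) [Quantity]))"
)

def template_to_regex(template):
    """Turn each [Slot] marker into a named capture group; escape the rest."""
    parts = re.split(r"\[(\w+)\]", template)  # odd positions hold slot names
    pattern = ""
    for position, part in enumerate(parts):
        pattern += f"(?P<{part}>.+?)" if position % 2 else re.escape(part)
    return re.compile("^" + pattern + "$")

def apply_template(sentence):
    """Fill the CycL template from a matching sentence; None if no match."""
    match = template_to_regex(SENTENCE_TEMPLATE).match(sentence)
    if match is None:
        return None
    result = CYCL_TEMPLATE
    for slot, text in match.groupdict().items():
        result = result.replace(f"[{slot}]", text)  # raw text, not a Cyc constant
    return result

print(apply_template(
    "Petróleos de Venezuela S.A. halted output of 760 000 barrels a week in Maracaibo."
))
```

A real system would still need a lookup step that maps each filled slot to an existing concept and normalizes the quantity and its unit.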

Learning Facts by Search

Learning Facts by Search
Query:
  “What are symptoms of Whooping Cough?”
  (symptomOfAilment WhoopingCough ?SYMP)
NL generation produces partial English sentences:
- “A symptom of whooping cough is ___”
- “Whooping cough can cause ___”
- “A symptom of Pertussis Bordetella is ___”
- “Symptoms (such as ____) of whooping cough”

Parsing Results
Looking for something that matches the argument constraints on the predicate…
  “… symptoms of pertussis such as fever and a dry cough …”
Parse back into existing CycL concepts:
  (symptomOfAilment WhoopingCough Fever)
  (symptomOfAilment WhoopingCough Coughing-AilmentCondition)
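
The parse-back step can be sketched as pattern matching plus a lexicon lookup. This is a simplified illustration under stated assumptions: the regular expression, the `LEXICON` mapping, and the function name are hypothetical, and the real system parses against Cyc's argument constraints rather than a hand-written pattern.

```python
import re

# Hypothetical lexicon from English phrases to existing CycL concepts
LEXICON = {
    "fever": "Fever",
    "a dry cough": "Coughing-AilmentCondition",
}

# One of the partial-sentence shapes, written as a pattern (an assumption)
PATTERN = re.compile(
    r"symptoms of (?:pertussis|whooping cough) such as (.+?)(?:[.;]|$)",
    re.IGNORECASE,
)

def extract_candidates(snippet):
    """Return candidate symptomOfAilment assertions found in a text snippet."""
    match = PATTERN.search(snippet)
    if match is None:
        return []
    raw = re.split(r",?\s+and\s+|,\s*", match.group(1))
    phrases = [p.strip(" .…").lower() for p in raw]
    return [f"(symptomOfAilment WhoopingCough {LEXICON[p]})"
            for p in phrases if p in LEXICON]  # drop phrases with no known concept

print(extract_candidates("… symptoms of pertussis such as fever and a dry cough …"))
```

Phrases that do not map to a known concept are dropped silently here, which mirrors the idea of only parsing back into concepts the KB already has.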

KB Consistency Check
- Throw out provably wrong answers
  - Explicitly: perform one step of inference to throw out facts inconsistent with the KB
  - Implicitly: don’t even look at things that don’t match the argument constraints
- Skip already known (provably right) knowledge
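
The three checks can be sketched as a small triage function. Everything here is a placeholder: the type table, the known-true and known-false sets, and the constant `HypotheticalSymptomX` are invented for illustration, and the set lookup only stands in for the single step of inference the slide describes.

```python
# All names and facts below are illustrative placeholders, not Cyc data.
KNOWN_TRUE = {("symptomOfAilment", "WhoopingCough", "Coughing-AilmentCondition")}
KNOWN_FALSE = {("symptomOfAilment", "WhoopingCough", "HypotheticalSymptomX")}
ARG_TYPES = {"symptomOfAilment": ("Ailment", "Symptom")}  # argument constraints
TYPE_OF = {
    "WhoopingCough": "Ailment",
    "Fever": "Symptom",
    "Coughing-AilmentCondition": "Symptom",
    "HypotheticalSymptomX": "Symptom",
    "Canada": "Country",
}

def triage(candidate):
    """Classify a candidate assertion (predicate, arg1, arg2)."""
    predicate, *args = candidate
    expected = ARG_TYPES[predicate]
    if any(TYPE_OF.get(arg) != kind for arg, kind in zip(args, expected)):
        return "skipped"          # implicit check: argument constraints fail
    if candidate in KNOWN_TRUE:
        return "already known"    # provably right, nothing new to learn
    if candidate in KNOWN_FALSE:
        return "inconsistent"     # stand-in for one step of inference
    return "novel"

print(triage(("symptomOfAilment", "WhoopingCough", "Fever")))
```

Running the cheap type check first means most bad candidates never reach the more expensive inference step.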

Initial Results
Total queries: 348
Web searches: 4290
- initial: 817
- verification: 3474
Sentences found: 1014
Rejected results: 954
- inconsistent with KB: 4
- already known to the KB: 384
- rejected using Google: 566
Novel formulæ: 61
Example formulæ: (symptomOfAilment WhoopingCough Coughing-AilmentCondition), (symptomOfAilment WhoopingCough Blindness), (symptomOfAilment WhoopingCough Fever)
[Figure labels: 348 queries; 817 searches; 1016 sentences found; 3474 searches; 566 rejected; 388 sentences rejected; 61 sentences asserted]

Microtheory (context) Suggestion

Automatic Ontology Placement
- Cyc’s knowledge is contextualized into internally consistent microtheories (Mts).
- New knowledge is inserted into that hierarchy manually by ontologists.
- An Mt Suggestor recommends where new knowledge should be placed among the microtheories (contexts).

MT Suggestor Approach
The problem is similar to hierarchical text classification, but with:
- Much less data per instance
- Very rich (deep) structure
Approaches:
- Generative Bayes model
- Multiclass SVM classification
Inputs:
- Each assertion is broken into atomic terms
- Each unique term is given an index
- Each assertion is a list of term indices (as few as 3 for binary assertions, as many as 180 for complex rules)
- Training examples are indexed identically
Outputs:
- SVM classification: the index of the best Mt
- Bayes model: the probability of fit for each Mt
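
The input encoding and the Bayes variant can be sketched in a few lines. This is a minimal multinomial naive Bayes with add-one smoothing, written under assumptions: the slide does not say which generative model was used, and the microtheory names `TerrorismMt` and `MedicineMt` and the class name `MtSuggestorSketch` are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict

def atomic_terms(assertion):
    """Break an assertion into atomic terms (parentheses are dropped)."""
    return re.findall(r"[^\s()]+", assertion)

class MtSuggestorSketch:
    def __init__(self):
        self.index = {}                          # term -> integer index
        self.term_counts = defaultdict(Counter)  # Mt -> counts of term indices
        self.mt_counts = Counter()               # Mt -> number of assertions

    def encode(self, assertion, grow=False):
        """Map an assertion to term indices; unseen terms are skipped unless grow."""
        ids = []
        for term in atomic_terms(assertion):
            if grow:
                ids.append(self.index.setdefault(term, len(self.index)))
            elif term in self.index:
                ids.append(self.index[term])
        return ids

    def train(self, assertion, mt):
        self.mt_counts[mt] += 1
        self.term_counts[mt].update(self.encode(assertion, grow=True))

    def probabilities(self, assertion):
        """Return a normalized probability of fit for each Mt."""
        ids, vocab = self.encode(assertion), len(self.index)
        total = sum(self.mt_counts.values())
        log_p = {}
        for mt, n in self.mt_counts.items():
            denom = sum(self.term_counts[mt].values()) + vocab
            log_p[mt] = math.log(n / total) + sum(
                math.log((self.term_counts[mt][i] + 1) / denom) for i in ids)
        top = max(log_p.values())
        norm = sum(math.exp(v - top) for v in log_p.values())
        return {mt: math.exp(v - top) / norm for mt, v in log_p.items()}

suggestor = MtSuggestorSketch()
suggestor.train("(basedInRegion AlQaida Afghanistan)", "TerrorismMt")
suggestor.train("(sponsors AlQaida HarakatUlAnsar)", "TerrorismMt")
suggestor.train("(symptomOfAilment WhoopingCough Fever)", "MedicineMt")
print(suggestor.probabilities("(symptomOfAilment WhoopingCough Blindness)"))
```

The model returns a distribution over microtheories rather than a single label, which matches the “probability of fit for each Mt” output listed for the Bayes approach.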

Results
89,000 assertions, 64,000 distinct terms, 28 Mts; 10-fold cross-validation
Method           Precision   Recall   F1 Score
Bayes            0.85        0.95     0.9
Multiclass SVM   0.98 (single value reported)
These are all micro-averaged numbers.
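
As a quick consistency check on the Bayes row, the F1 score is the harmonic mean of precision and recall. The snippet below only recomputes it from the two reported values; the function name is an assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Bayes row from the table: precision 0.85, recall 0.95
print(round(f1_score(0.85, 0.95), 3))  # 0.897, consistent with the reported 0.9
```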

Induction of new rules with ILP

Learning Higher-Order Knowledge
- Learning rules with Inductive Logic Programming
  - Integrating the ALEPH ILP system into Cyc
- Verification (asking or experimenting)
  - Asking a human directly
  - Natural language processing of text
  - Probabilistic analysis
[Figure: cycle around the Cyc Ontology & Knowledge Base. Labels: “All the mothers I know about are female…”; “Maybe all mothers are female?”; “Mothers are female.”; Fill gaps; New knowledge forces strengthening.]
So right now we have a preliminary version of rule induction working, and we must improve it and automate its use. A lot of common-sense knowledge still isn’t captured!

Performing Induction in Cyc
Integrate Cyc and Aleph:
- FOL-ify CycL and export it to Aleph
- Produce the ILP learning bias from background knowledge, based on the semantic content of predicate knowledge
- CycL-ify, review, and assert ILP-produced rules
[Figure labels: Facts & Background Knowledge; First-orderized Facts; Perform Induction; Induced Rules; Evaluate Results; Good Rules]
[Screenshot: rule-review game. Cyc says: “I have an idea. If I’m right, it would mean that… Coughing is a symptom of whooping cough. Coughing is a symptom of Eastern Equine Encephalitis. Coughing is a symptom of bacterial anthrax. Because I think maybe all bacterial diseases that affect the lungs cause coughing.” Answer options: True, False, Don’t Know, Doesn’t make sense. Other interface text: status (“I have 6 answers”), session and round scores, How to Play, High Scores.]

Sample Rules Produced
(implies
  (and (cyclistPrimaryProject ?KE ?PROJECT)
       (projectTasks ?PROJECT ?TASK)
       (requestedEffortPercent ?TASK ?KE ?X))
  (assignedEffortPercent ?TASK ?KE ?X))
(implies
  (projectManagers ?PROJECT ?AGENT)
  (projectParticipants ?PROJECT ?AGENT))
(implies
  (and (primarySupervisor ?AGENT ?AGENT-1)
       (requestedEffortPercent ?TASK ?AGENT ?X)
       (projectManagers ?PROJECT ?AGENT-1)
       (projectTasks ?PROJECT ?TASK))
  (assignedEffortPercent ?TASK ?AGENT ?X))

Sample Rules Produced
- If someone’s time has been requested for a task by that person’s primary project, the time will be assigned.
- People participate in the projects they manage. (One hopes!)
- People are assigned to tasks requested of them by projects managed by that person’s direct supervisor.
These are only patterns, not always guaranteed to be true, but they’re useful and common-sensical.
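
Because induced rules are patterns rather than guarantees, a natural review aid is to count how often a rule holds on the known facts. The sketch below does this for the second rule (managers participate in their projects). The facts, project names, and agent names are invented for illustration, and this counting is not a description of how Aleph scores rules.

```python
# Illustrative facts only; none of these come from the Cyc KB.
facts = {
    ("projectManagers", "ProjectA", "Alice"),
    ("projectParticipants", "ProjectA", "Alice"),
    ("projectManagers", "ProjectB", "Bob"),
    ("projectParticipants", "ProjectB", "Bob"),
    ("projectManagers", "ProjectC", "Carol"),  # no participant fact recorded
}

def rule_support(facts, body_predicate, head_predicate):
    """Count cases where the rule body matches and where the head also holds.

    Handles only rules of the form (implies (body ?X ?Y) (head ?X ?Y)).
    """
    body_matches = [(x, y) for (pred, x, y) in facts if pred == body_predicate]
    confirmed = sum((head_predicate, x, y) in facts for x, y in body_matches)
    return confirmed, len(body_matches)

confirmed, total = rule_support(facts, "projectManagers", "projectParticipants")
print(f"rule holds in {confirmed} of {total} known cases")
```

The unconfirmed case is either a counterexample or a gap in the KB, which is exactly the question a human reviewer would be asked to settle.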

Bottom-up modeling of knowledge: the OntoGen system

Underlying concepts
Semi-Automatic:
- Text-mining methods provide suggestions and insights into the domain
- The user can interact with the parameters of the text-mining methods
- All final decisions are taken by the user
Data-Driven:
- Most of the aid provided by the system is based on some underlying data provided by the system
- Instances are described by features extracted from the data (e.g. bag-of-words vectors)

Main Features
Interactive user interface:
- The user can interact in real time with the integrated machine learning and text-mining methods
Concept discovery methods:
- Unsupervised: k-means clustering, Latent Semantic Indexing (LSI)
- Supervised: active learning
Methods that help in understanding the discovered concepts:
- Keyword extraction: TFIDF-based and SVM-normal-based
- Concept visualization: based on LSI and multi-dimensional scaling
  Also available as a separate tool named Document Atlas: http://docatlas.ijs.si

Ontology management
[Screenshot labels: ontology visualization; concept hierarchy; list of suggested sub-concepts; selected concept]

Concept’s instance management
[Screenshot labels: concept management; selected concept; selected instance; concept’s details; keywords]

Active Learning for concept learning
[Figure labels: Query; SVM; New Concept]
- The active learning algorithm is based on distance from the SVM hyperplane
- The first few labelled documents are bootstrapped from a query search
- Instances for the final concept are selected using the final SVM model
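
The selection step of hyperplane-distance active learning can be sketched without any SVM library: given the weights of the current linear model, ask the user to label the document whose score is closest to zero. The weight vector, the feature vectors, and the function names below are assumptions for illustration; training the SVM itself is out of scope here.

```python
def decision_value(weights, bias, features):
    """Signed score under a linear model; its magnitude grows with distance
    from the separating hyperplane."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def next_to_label(weights, bias, unlabeled):
    """Return the index of the unlabeled instance closest to the hyperplane."""
    return min(
        range(len(unlabeled)),
        key=lambda i: abs(decision_value(weights, bias, unlabeled[i])),
    )

# Hypothetical current model and unlabeled feature vectors
weights, bias = [1.0, -1.0], 0.0
unlabeled = [[3.0, 0.0], [1.0, 0.9], [0.0, 2.0]]
print(next_to_label(weights, bias, unlabeled))  # index of the most uncertain instance
```

After the user labels that instance, the model is retrained and the selection repeats, so each question targets the region where the current model is least certain.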

Multiple views of the same data
[Figure: two example Reuters articles.
 Countries view: “Lloyd’s CEO questioned in recovery suit in U.S. Ronald Sandler, chief executive of Lloyd's of London, on Tuesday underwent a second day of court interrogation about …”
 Topics view: “UK takeovers and mergers. The following are additions and deletions to the takeovers and mergers list for the week beginning August 19, as provided by the Takeover …”]
- The example uses Reuters news articles with two different sets of categories: topics, or the countries that appear in the articles
- Each set of categories offers a different view of the data
- An SVM-based method detects the importance of keywords for each view

Concept’s instances visualization
- Instances are visualized as points on a 2D map.
- The distance between two instances on the map corresponds to their similarity.
- Characteristic keywords are shown for all parts of the map.
- The user can select groups of instances on the map to create sub-concepts.

Ontology population
[Screenshot labels: new documents; selected document; classification of selected document]
- The system uses a one-vs-all linear SVM, trained on the created ontology, to classify new instances into concepts.
- Users can finalize the classifications through an interactive user interface.