Applications
Chapter 9, Cimiano Ontology Learning Textbook
Presented by Aaron Stewart
Typical Applications of Ontologies
– Agent communication
– Data integration
– Description of service capabilities for matching and composition purposes
– Formal verification of process descriptions
– Unification of terminology across communities
Text Applications of Ontologies
– Information Retrieval (IR)
– Clustering and Classification of Documents
– Semantic Annotation
– Natural Language Processing
Task-Based Evaluation (Porzel and Malaka 2005)
Task-Based Evaluation Requirements
1. Algorithm output can be quantified
2. Task can use background knowledge
3. Ontology is an additional parameter
4. Output can be traced to the ontology
Contents
1. Text Clustering and Classification
2. Information Highlighting for Supporting Search
3. Related Work
Text Clustering and Classification What is the difference?
Text Clustering
Text Classification
[Figure labels: Arrows, Weather, Flat shapes, 3-D forms, Smile!]
Dot Kom Project One of many competitions
Approaches
– Bag of words
– Manually engineered MeSH Tree Structures
– Automatically constructed ontologies
What is a “Bag of Words” anyway? the quick brown fox
Bag of Words
the quick brown fox jumps over the lazy dog (2)
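To make the idea concrete, here is a minimal sketch of a bag of words for the example sentence. The function name `bag_of_words` and the choice to lowercase and split on whitespace are assumptions for illustration; the slides do not specify a tokenizer.

```python
from collections import Counter


def bag_of_words(text):
    """Map each word to its count, ignoring case and word order."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    return Counter(tokens)


bag = bag_of_words("the quick brown fox jumps over the lazy dog")
print(bag["the"])  # 2: the only word that occurs twice
print(bag["fox"])  # 1
```

Word order is discarded: any reordering of the same words produces the same bag.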
Building Hierarchies
Note on Ontologies
– Our ontologies (“micro”)
  – Like a database record schema
– Their ontologies (“macro”)
  – Like WordNet
Clustering
– Hierarchical Agglomerative Clustering
– Bi-Section K-means
– “A Comparison of Document Clustering Techniques” –
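As a rough illustration of hierarchical agglomerative clustering, the sketch below starts with one cluster per document and repeatedly merges the two most similar clusters. The slide does not say which linkage or similarity measure was used, so average linkage, the `similarity` parameter, and the Jaccard word-overlap measure in the usage example are all assumptions.

```python
def agglomerative(items, k, similarity):
    """Merge the two most similar clusters until only k remain.

    Clusters are lists of indices into `items`. Cluster similarity is
    the average pairwise similarity of their members (average linkage,
    an assumption made for this sketch).
    """
    clusters = [[i] for i in range(len(items))]

    def avg_link(a, b):
        total = sum(similarity(items[i], items[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        best = None  # (score, index of first cluster, index of second)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = avg_link(clusters[x], clusters[y])
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because y > x
    return clusters


def jaccard(a, b):
    """Word-overlap similarity between two sets (stand-in measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


docs = [
    {"fox", "dog"},
    {"fox", "dog", "cat"},
    {"stock", "bank"},
    {"bank", "loan", "stock"},
]
print(agglomerative(docs, 2, jaccard))
```

Passing the similarity function as a parameter keeps the merge loop independent of how documents are represented.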
Document Representations
– Bag of Words
– Certain words + ontology -> extended features
– Strategies: add, replace, only
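The three strategies can be sketched as follows. The slide only names them, so the reading used here is an assumption: "add" keeps all words and adds their concepts, "replace" swaps each word that has a concept for that concept, and "only" keeps the concepts alone. `TOY_ONTOLOGY` and `extend_features` are hypothetical names, and the word-to-concept dictionary is a toy stand-in for a real ontology.

```python
from collections import Counter

# Hypothetical toy ontology: word -> more general concept.
TOY_ONTOLOGY = {"fox": "ANIMAL", "dog": "ANIMAL"}


def extend_features(bag, ontology, strategy):
    """Build an extended feature vector from a bag of words."""
    concepts = Counter()
    for word, count in bag.items():
        if word in ontology:
            concepts[ontology[word]] += count

    if strategy == "add":
        # Keep every word and add the concept counts alongside.
        return Counter(bag) + concepts
    if strategy == "replace":
        # Drop words that have a concept; keep the rest plus the concepts.
        kept = Counter({w: c for w, c in bag.items() if w not in ontology})
        return kept + concepts
    if strategy == "only":
        # Keep the concepts and discard all words.
        return concepts
    raise ValueError(f"unknown strategy: {strategy}")


bag = Counter({"quick": 1, "fox": 1, "dog": 2})
for strategy in ("add", "replace", "only"):
    print(strategy, dict(extend_features(bag, TOY_ONTOLOGY, strategy)))
```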
Vectors and Cosine Similarity
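Cosine similarity treats each document as a vector of word counts and measures the angle between two such vectors: the dot product divided by the product of the vector lengths. A minimal sketch over word-count dictionaries follows; the function name and the convention of returning 0.0 for an empty document are assumptions.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    shared = a.keys() & b.keys()  # only shared words contribute to the dot product
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for empty documents (assumption)
    return dot / (norm_a * norm_b)


doc1 = Counter("the quick brown fox".split())
doc2 = Counter("the lazy dog".split())
print(cosine_similarity(doc1, doc2))
```

With non-negative counts the result lies between 0 (no shared words) and 1 (same direction), regardless of document length.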
Classification Results (Categories)
Classification Results (Documents)
Cluster Metrics
– P : a computer-generated cluster
– L : a human-created cluster
– ℙ, 𝕃 : the sets of such clusters (partitionings)
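The slide defines the computer-generated and human-created partitionings but does not name the metrics computed from them. Purity is one widely used choice and is assumed here purely for illustration: each computer-generated cluster is credited with its largest overlap with any human-created cluster, and the credits are divided by the total number of documents.

```python
def purity(computed, reference):
    """Purity of a computed partitioning against a reference one.

    Both arguments are lists of sets of document ids. Purity is assumed
    as the metric; the slides do not state which metrics were used.
    """
    total = sum(len(p) for p in computed)
    if total == 0:
        return 0.0
    # For each computed cluster, take its best overlap with a reference cluster.
    matched = sum(
        max((len(p & l) for l in reference), default=0) for p in computed
    )
    return matched / total


computed = [{1, 2, 3}, {4, 5}]
reference = [{1, 2}, {3, 4, 5}]
print(purity(computed, reference))  # 0.8: four of five documents match
```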
Clustering Results
Information Highlighting for Supporting Search
Challenge:
– 10-minute limit
– KMi Planet News web site
– Compile a list of important
  – People
  – Technologies
Information Highlighting for Supporting Search
Tools:
– Regular browser
– Magpie
– ESpotter
– C-PANKOW
Teams
– A : web browser only
– B : web browser with AKT information
– C : web browser with AKT++ information
AKT++ Lexicon
Scores
Conclusions (for this section)
– Generated ontologies can be comparable to hand-crafted ontologies
– Humans can trust the computer too much! (Group C drop in score)
Related Work
– Query Expansion
– Information Retrieval
– Text Clustering and Classification
– Natural Language Processing
– Ambiguity resolution
  – Bank
– Compounds
  – Headache medicine
– Vague words
  – With, of, has
  – Selectional restrictions
– Anaphora
More Applications
– Word sense disambiguation
– Classification of unknown words
– Named Entity Recognition (NER)
– Anaphora Resolution
– Question Answering
  – Who wrote the Hobbit?
  – Tolkien is the author of the Hobbit.
– Information Extraction
  – AUTOSLOG, ASIUM
Analysis/Conclusion
Pro/con:
– Focused on two systems
– Passing survey of others