Path Knowledge Discovery: Association Mining Based on Multi-Category Lexicons Chen Liu, Wesley W. Chu, Fred Sabb, Stott Parker and Joseph Korpela.

Presentation transcript:

Path Knowledge Discovery: Association Mining Based on Multi-Category Lexicons Chen Liu, Wesley W. Chu, Fred Sabb, Stott Parker and Joseph Korpela

Outline
– Motivation
– Infrastructure
– Path Mining: Discovering Sequences of Associations
– Path Content Retrieval
– Method Validation: Comparing to Traditional Meta Analysis Process
– Conclusion

Motivation (1/2)
– Knowledge discovery
  Increasingly, scientific discovery requires the connection of concepts across disciplines
  Often there is no direct association between two given concepts in the existing scientific literature
  In such situations, we must search for chains of associations
– How to search for chains of associations?
  Traditional search methods require researchers to manually review documents in a potential chain
  When searching a large corpus, a manual search of all returned documents becomes infeasible
  This can lead to biased or arbitrary methods of reduction

Motivation (2/2)
What GENES are associated with ADHD?
[Figure: chain of associations ADHD, Attention Deficit, Working Memory Dysfunction, PFC, DRD2 A1, connecting ADHD to DRD2 A1]

Path Knowledge Discovery

Infrastructure for Path Mining Discovery (1/2)
Sources of Knowledge
– Multilevel Lexicon
  Evolving concept hierarchy
  Concepts are mapped to specific domains/matched with synonyms
– Semi-Structured Corpus
  Distributed in HTML/XML format
  Maps concepts to documents at varying granularities
Example lexicon entries:
  SYNDROME: ADHD (ADD, Attention Deficit Disorder, Attention Deficit Hyperactivity Disorder), Bipolar Disorder, …
  COGNITIVE CONCEPT: Declarative Memory, Episodic Memory, …

Infrastructure for Path Mining Discovery (2/2)
Facilitating Knowledge Discovery
– Association index
  How frequently two concepts occur together in a paper
  Measures the strengths of relations
  Facilitates path mining
– Document element index
  In which documents the concepts occur
  Provides evidence of relations between concepts
  Facilitates path content retrieval
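The two indexes described on this slide can be illustrated with a minimal sketch. It assumes each paper has already been reduced to the set of lexicon concepts it mentions, uses raw co-occurrence counts as the strength of a relation, and indexes whole documents rather than finer-grained elements. The function name `build_indexes` and the sample papers are hypothetical and are not taken from the presentation.

```python
from collections import defaultdict
from itertools import combinations


def build_indexes(docs):
    """Build both indexes from a mapping of document id -> set of concepts.

    Assumption: concept extraction has already been done upstream.
    """
    association_index = defaultdict(int)  # (concept_a, concept_b) -> co-occurrence count
    document_index = defaultdict(set)     # concept -> ids of documents mentioning it
    for doc_id, concepts in docs.items():
        for concept in concepts:
            document_index[concept].add(doc_id)
        # Sort so that each unordered pair is always stored under the same key.
        for a, b in combinations(sorted(concepts), 2):
            association_index[(a, b)] += 1
    return dict(association_index), dict(document_index)


# Hypothetical toy corpus.
docs = {
    "paper1": {"ADHD", "working memory", "PFC"},
    "paper2": {"ADHD", "working memory"},
    "paper3": {"PFC", "DRD2"},
}
association_index, document_index = build_indexes(docs)
```

In this sketch the association index supports path mining (which pairs are linked, and how strongly), while the document index answers where the evidence for a link can be found.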

Path Mining
Given a query, find the sequences of associations among concepts between different domains of knowledge
Find the paths based on their occurrences in the corpus (i.e. pair-wise associations)
Measure the strength of each path
Path Ranking: Find the most relevant path for a query
Example path across domains:
  Syndromes: Shrink-Wrap-Loving Tech Syndrome
  Symptoms: Impaired Response Inhibition
  Cognitive Concepts: Impulsivity
  Brain Signaling: Thinner Orbitofrontal Cortex
  Genes: DRD4 VNTR

Using Wildcards in a Path Query – Allow paths to match with any concept in a concept domain Example: Researcher is interested in paths connecting concept C to concepts from the γ domain, via any concept in domain β
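One way to picture a wildcard path query is the sketch below. It assumes a query is a list of steps, each either a fixed concept or a wildcard over a lexicon domain, and that a candidate path is kept only when every adjacent pair of concepts has a non-zero association strength. The step encoding, the function name `expand_path_query`, and the sample data are illustrative assumptions, not the authors' implementation.

```python
from itertools import product


def expand_path_query(query, lexicon, strength):
    """Enumerate concrete paths matching a query that may contain wildcards.

    query: list of ("concept", name) or ("any", domain) steps (hypothetical encoding).
    lexicon: domain -> set of concepts in that domain.
    strength: frozenset({a, b}) -> association strength of the pair.
    """
    choices = [
        [value] if kind == "concept" else sorted(lexicon[value])
        for kind, value in query
    ]
    paths = []
    for candidate in product(*choices):
        links = zip(candidate, candidate[1:])
        # Keep the candidate only if every consecutive pair is associated.
        if all(strength.get(frozenset(link), 0) > 0 for link in links):
            paths.append(candidate)
    return paths


# Hypothetical lexicon domains and association strengths.
lexicon = {
    "cognitive": {"working memory", "impulsivity"},
    "gene": {"DRD2", "DRD4"},
}
strength = {
    frozenset({"ADHD", "working memory"}): 5,
    frozenset({"working memory", "DRD2"}): 2,
    frozenset({"ADHD", "impulsivity"}): 3,
}
query = [("concept", "ADHD"), ("any", "cognitive"), ("any", "gene")]
matches = expand_path_query(query, lexicon, strength)
```

Here the fixed concept ADHD plays the role of C, the cognitive domain plays the role of β, and the gene domain plays the role of γ. Exhaustive enumeration is only practical for small domains; it is used here for clarity.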

Types of Associations in Path: Local Association / Global Association

Types of Associations in Path: Local Association Approach / Global Association Approach

Phenograph: Aggregated Results of Path Mining Combine the paths that satisfy the path query.

Path Ranking Pick top K paths for a query Weakest link approach – For each path, use the strength of the weakest link as the strength of the whole path – Among all paths, pick the top K paths with highest strengths
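The weakest link approach can be sketched as follows, assuming pairwise association strengths are available as a lookup table and each path contains at least two concepts. The names `path_strength` and `top_k_paths` and the numeric strengths are hypothetical.

```python
import heapq


def path_strength(path, strength):
    """Strength of a path = strength of its weakest pairwise link."""
    return min(strength.get(frozenset(pair), 0) for pair in zip(path, path[1:]))


def top_k_paths(paths, strength, k):
    """Return the k paths with the highest weakest-link strength."""
    return heapq.nlargest(k, paths, key=lambda p: path_strength(p, strength))


# Hypothetical association strengths between concept pairs.
strength = {
    frozenset({"ADHD", "working memory"}): 5,
    frozenset({"working memory", "PFC"}): 4,
    frozenset({"PFC", "DRD2"}): 2,
    frozenset({"ADHD", "impulsivity"}): 3,
    frozenset({"impulsivity", "PFC"}): 3,
}
paths = [
    ("ADHD", "working memory", "PFC", "DRD2"),  # weakest link: 2
    ("ADHD", "impulsivity", "PFC"),             # weakest link: 3
    ("ADHD", "working memory", "PFC"),          # weakest link: 4
]
best = top_k_paths(paths, strength, k=2)
```

Taking the minimum means a single weak association caps the score of the whole path, no matter how strong its other links are.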

Path Content Retrieval Content is important for understanding the interrelations specified by the paths Differences from traditional information retrieval: – Query is a set of relations instead of query terms – Retrieved content should be in fine granularity so that it can explicitly explain the relations – Specific types of content may be required (e.g. quantitative results from experiments, tables, etc.)

Process Flow of Path Content Retrieval

Path Content Retrieval Example: Document Content Explorer (1/2) Facilitates Path Content Retrieval – Coarse Granularity: Displays list of papers returned using the user-defined query Papers listed with summary data

Path Content Retrieval Example: Document Content Explorer (2/2) – Fine Granularity: Content from the paper is displayed with relevant material highlighted for easier viewing Different types of content are shown in corresponding tabs Concepts are highlighted in the matching content

Method Validation: Applying Path Knowledge Discovery to Phenomics Research Mined corpus of 9000 papers – Retrieved from PubMed Central using a query designed by domain experts Searched for data supporting the heritability of cognitive control Cognitive control – Complex process that involves different phenotype components – Each phenotype component is measured by different behavioral tasks – Heritability of these behavioral tasks is reported in scientific publications

Traditional Manual Approach: Meta-Analysis Search corpus to find “relevant” publications – Publications retrieved using a literature search engine – Researcher manually reviews the publications to determine which are relevant – Researcher determines which publications form a chain of associations Using content found, extract the measures of cognitive tasks (e.g. heritability) and their corresponding cognitive processes Combine the heritability measures for different cognitive processes to compute the heritability of “cognitive control” Problems of the manual approach: – Reading papers, digesting the content, and picking the numbers manually is time consuming, biased and not scalable.

Automated Approach: Path Knowledge Discovery (1/2) Path mining: – Searched for paths connecting cognitive control with indicators Path content retrieval: – Found relevant quantitative results in those publications Meta-Analysis: – Researchers then reviewed those results to perform the meta-analysis
[Figure: path from cognitive control to sub-processes to cognitive tasks]

Automated Approach: Path Knowledge Discovery (2/2) Comparison to manual analysis: – 12 out of 15 tasks were correctly associated with corresponding sub-processes – Increased corpus size: 150 (manual) << 9000 (automated) Able to use quantitative measures for ranking relations rather than matching manually – Reduces error and bias

Conclusion Path Knowledge Discovery – Identifies and measures a path of knowledge – Retrieves relevant coarse- and fine-granularity content describing the relations specified in the path Validated the methodology using the heritability example in cognitive control Significantly increases the scalability and efficiency of conducting complex cross-discipline analysis

Backup slides

Path Content Retrieval Query processing – Translate the path to queries digestible by search systems Example – Schizophrenia -> working memory -> PFC – Translate to: (schizophrenia AND working memory) OR (working memory AND PFC)
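The translation step in this example can be sketched in a few lines: each consecutive pair of concepts in the path becomes an AND clause, and the clauses are joined with OR. The function name `path_to_query` is hypothetical, and the Boolean syntax simply mirrors the slide rather than any particular search system.

```python
def path_to_query(path):
    """Turn a path of concepts into a Boolean query over its pairwise links."""
    clauses = [f"({a} AND {b})" for a, b in zip(path, path[1:])]
    return " OR ".join(clauses)


query = path_to_query(["schizophrenia", "working memory", "PFC"])
# query == "(schizophrenia AND working memory) OR (working memory AND PFC)"
```

A real search backend would likely need multi-word concepts quoted as phrases; that detail is left out here because the slide does not specify it.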

Lexicon-Based Query Expansion
– Expand according to the synonyms:
  ADHD AND impaired response inhibition
  becomes: (attention deficit hyperactivity disorder OR attention deficit disorder OR ADHD OR ADD) AND impaired response inhibition
– Expand according to concepts/sub-concepts:
  underactive prefrontal cortex AND dopamine receptors
  becomes: underactive prefrontal cortex AND (DRD1 OR DRD2 OR D5-like)
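Both kinds of expansion on this slide can be reproduced with a small sketch that assumes the lexicon is available as plain dictionaries, one mapping a concept to its synonyms and one mapping a concept to its sub-concepts. The dictionary layout and the names `expand_term` and `expand_query` are assumptions made for illustration.

```python
def expand_term(term, expansions):
    """Replace a term with an OR group of its alternatives, if it has any."""
    alternatives = expansions.get(term)
    if not alternatives:
        return term
    return "(" + " OR ".join(alternatives) + ")"


def expand_query(terms, expansions):
    """Expand each term of a conjunctive query using the given lexicon table."""
    return " AND ".join(expand_term(term, expansions) for term in terms)


# Lexicon fragments taken from the slide's examples.
SYNONYMS = {
    "ADHD": [
        "attention deficit hyperactivity disorder",
        "attention deficit disorder",
        "ADHD",
        "ADD",
    ],
}
SUB_CONCEPTS = {"dopamine receptors": ["DRD1", "DRD2", "D5-like"]}

by_synonym = expand_query(["ADHD", "impaired response inhibition"], SYNONYMS)
by_sub_concept = expand_query(
    ["underactive prefrontal cortex", "dopamine receptors"], SUB_CONCEPTS
)
```

Terms without an entry in the lexicon table pass through unchanged, which is why "impaired response inhibition" and "underactive prefrontal cortex" stay as they are.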

Path Content Retrieval
Retrieve relevant path content
– Vector space model
Multi-granularity content
– First rank by coarse-granularity content: Documents, Sections
– For each item of coarse-granularity content, rank its fine-granularity content: Assertions (sentences), Figures, Tables
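The two-stage ranking can be sketched with a deliberately simple vector space model. The sketch assumes raw term-frequency vectors and cosine similarity, treats documents as the coarse level and sentences as the fine level, and ignores sections, figures, and tables. The slides do not say which weighting scheme was used, so the scoring here is a stand-in, and all function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (assumption: whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query, items):
    """Return the keys of items (id -> text) by descending similarity to query."""
    q = vectorize(query)
    return sorted(items, key=lambda i: cosine(q, vectorize(items[i])), reverse=True)


def retrieve(query, documents):
    """Rank documents first, then rank the sentences inside each document."""
    doc_texts = {doc_id: " ".join(sents) for doc_id, sents in documents.items()}
    results = []
    for doc_id in rank(query, doc_texts):
        sentences = dict(enumerate(documents[doc_id]))
        ordered = [sentences[i] for i in rank(query, sentences)]
        results.append((doc_id, ordered))
    return results


# Hypothetical corpus: document id -> list of sentences.
documents = {
    "paperA": ["schizophrenia impairs working memory", "methods are described here"],
    "paperB": ["the weather was mild", "working memory was measured once"],
}
results = retrieve("schizophrenia working memory", documents)
```

Ranking coarse units first keeps the fine-grained comparison limited to content from documents already judged relevant, which matches the order of steps listed on the slide.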