Towards Automating Complex Associative Access to Multiple Bioinformatics Data Sources


Slide 1: Towards Automating Complex Associative Access to Multiple Bioinformatics Data Sources
Ling Liu, Calton Pu, David Buttler, Wei Han, Henrique Paques, Dan Rocco
Georgia Tech

Slide 2: Outline
- State of the Art
  - Users’ Perspective
  - Technology Perspective
- Why SDM Technology: XWRAPComposer
  - Users’ Perspective
  - Technology Perspective
- Progress Report and Near Term Deliverables
- Related Long Term Research

Slide 3: Why Automating Complex Associative Access
- Today: Simple Query-Based Searching
  - Web: large and unorganized document collections
  - Complex associative access requires experts
- Tomorrow with SDM Technology
  - Semantic Web
  - Complex associative access is automated (one-stop shopping)
[Figure labels: Query 1, Query 2, Query 3, Query 4, Query]

Slide 4: Why Automating Complex Associative Access
- Today: Simple Query-Based Searching
  - Web: large and unorganized document collections
- Tomorrow with SDM Technology
  - Semantic Web
[Figure labels: Characterize, Sort, Partition, Filter, Summarize; Query 1, Query 2, Query 3, Query 4]

Slide 5: Automating Complex Associative Access
- Wrapper Technology
- Workflow Technology
- Semantic Web Technology
  - Service Discovery
  - Service Selection
  - Service Composition
- Research Issues
  - Semantic Data Integration, Interoperability
  - Scalability, High Performance
  - Trusted Computing, Dependable, Survivable

Slide 6: XWRAPComposer
- What is it?
  - A wrapper generation system that can semi-automatically generate wrappers (information extraction programs)
  - Capable of accessing multiple scientific Web pages in one shot
- What makes it different from other existing XWRAP tools?
  - Capable of generating wrappers that extract information from multiple Web pages connected by URLs (page links) and compose the results into an integrated XML document
  - Extremely useful for automating complex associative access to multiple scientific data sources
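The idea of extracting from several linked pages and composing one XML document can be illustrated with a minimal Python sketch. This is not XWRAPComposer itself: the in-memory `PAGES` table, the `Name: value` page layout, the `Detail` link field, and the function names are all assumptions made for illustration, and a real wrapper would fetch and parse live HTML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical in-memory "Web site" (URL -> page text); layout and field
# names are invented for this sketch, not taken from any real data source.
PAGES = {
    "/summary?id=AA045112": (
        "Accession: AA045112\n"
        "Detail: /detail?id=AA045112\n"
    ),
    "/detail?id=AA045112": (
        "Division: htgs\n"
        "Sequence: CACCTGGAGAAACTTCTGCACTGG\n"
    ),
}

# One "Name: value" pair per line.
FIELD = re.compile(r"^(\w+): (.+)$", re.MULTILINE)


def extract_fields(text):
    """Return the 'Name: value' pairs found on one page as a dict."""
    return dict(FIELD.findall(text))


def compose(start_url):
    """Follow the 'Detail' link from the start page and merge the data
    extracted from both pages into a single XML element."""
    root = ET.Element("record")
    summary = extract_fields(PAGES[start_url])
    detail_url = summary.pop("Detail")  # the page link that connects the two pages
    detail = extract_fields(PAGES[detail_url])
    for source, fields in (("summary", summary), ("detail", detail)):
        page = ET.SubElement(root, "page", {"source": source})
        for name, value in fields.items():
            ET.SubElement(page, name.lower()).text = value
    return root


doc = compose("/summary?id=AA045112")
print(ET.tostring(doc, encoding="unicode"))
```

The point of the sketch is only the shape of the task: one extraction step per page, a link that carries the wrapper from one page to the next, and a single composed document at the end.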

Slide 7: SDM Enabling Technology: XWRAPComposer
Existing Wrapper Technology: extracting data from a single Web document
[Figure labels:]
- Queries: Query 1, Query 2, Query 3, Query 4
- Wrappers: Seq. Link Wrapper, Sequence Wrapper, Blast Sum Wrapper, Blast Detail Wrapper
- Example data shown: AA045112; htgs; sequence:
  CACCTGGAGAAACTTCTGCACTGGCACTGTGTTCCNAGAGCTCCTTCTATGCGTCCCTCC
  CAAGTGATTTAATTTCAGCTGATTGGACTACGAATTCACAAGGCAGAAAAGTCAAGGTCA
  TTTGGNATCTGGAGACAGGAGAACTCAAGGAACCNAAAGGACT

Slide 8: SDM Enabling Technology: XWRAPComposer
WrapperComposer Technology: extracting data from multiple Web documents
[Figure labels:]
- Queries: Query 1, Query 2
- Wrappers: Full Seq Wrapper, Blast Wrapper
- Example data shown: AA045112; htgs; sequence:
  CACCTGGAGAAACTTCTGCACTGGCACTGTGTTCCNAGAGCTCCTTCTATGCGTCCCTCC
  CAAGTGATTTAATTTCAGCTGATTGGACTACGAATTCACAAGGCAGAAAAGTCAAGGTCA
  TTTGGNATCTGGAGACAGGAGAACTCAAGGAACCNAAAGGACT

Slide 9: XWRAPComposer: Technical Perspective
Example task: given a sequence, list all matching DNAs.
[Figure labels:]
- NCBI Blast Site; Web; Blast Wrapper
- Pages: Blast Query Page, Blast Format Page, Blast Delay Page, Blast Summary Page, Blast Detail Page
- Interface/Outerface Specification; Composer Script; Multi-page Control Flow Modeling; Data Extraction Workflow
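The page sequence named on this slide (query, format, delay, summary, detail) suggests a control flow in which the wrapper submits a query, waits while a delay page is returned, and then follows each summary entry to its detail page. The sketch below models that flow against a fake site. `FakeBlastSite`, its methods, and the dictionary page format are hypothetical; they are not the NCBI interface or the Composer scripting language.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBlastSite:
    """Hypothetical stand-in for a BLAST-like site (not the NCBI interface).
    It returns a delay page a fixed number of times before the summary."""
    delays_remaining: int = 2
    hits: dict = field(default_factory=lambda: {
        "hit1": "detail for hit1",
        "hit2": "detail for hit2",
    })

    def submit_query(self, sequence):
        # Submitting the query form leads to the format page.
        return {"page": "format", "request_id": "REQ-1"}

    def submit_format(self, request_id):
        # Submitting the format form yields a delay page or the summary.
        return self.poll(request_id)

    def poll(self, request_id):
        if self.delays_remaining > 0:
            self.delays_remaining -= 1
            return {"page": "delay", "request_id": request_id}
        return {"page": "summary", "hit_ids": list(self.hits)}

    def fetch_detail(self, hit_id):
        return {"page": "detail", "hit_id": hit_id, "text": self.hits[hit_id]}


def run_blast_workflow(site, sequence, max_polls=10):
    """Walk the pages in order and return (pages visited, detail pages)."""
    visited = ["query"]
    page = site.submit_query(sequence)
    visited.append(page["page"])
    page = site.submit_format(page["request_id"])
    visited.append(page["page"])
    polls = 0
    while page["page"] == "delay":  # results not ready yet: poll again
        if polls >= max_polls:
            raise TimeoutError("results not ready after max_polls polls")
        polls += 1
        page = site.poll(page["request_id"])
        visited.append(page["page"])
    details = [site.fetch_detail(hit_id) for hit_id in page["hit_ids"]]
    visited.extend(d["page"] for d in details)
    return visited, details


visited, details = run_blast_workflow(FakeBlastSite(), "CACCTGGAGAAACTTCTGCACTGG")
print(visited)
```

A real wrapper would also wait between polls and extract fields from each page; the sketch keeps only the branching that makes the flow multi-page.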

Slide 10: SDM Center Data Integration Infrastructure
[Architecture diagram; labels recovered from the figure, in source order:]
- User (Matt); Workflow Agent; Service registry and brokering
- Data Integration Agent(s); Data Mediation; Wrapper based Agent; Other Agents (e.g., VIPAR)
- Database Access; Communication Protocol Gateway; External Program
- XML Wrapper; Data Source; XML Wrapper; Data Source
- Executable Workflow Plan: “Matt’s WF”
- DB Data Sources; External Interface; Program Interfacing; Other I/O Agents
- Extraction Rules; Human Knowledge; GUI; Code Generator
- Parameterized Workflow Specification (PWS); Source Capabilities (SC); Binding Patterns
- User Agent; User constraints & parameters
- Workflow Resolution Service (WRS); Domain Map/Ontology; Workflow Instantiation Service (WIS)
- WF feasible; WF infeasible: report reason
- Data Registration; Services Registration; DB
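The diagram pairs source capabilities and binding patterns with a resolution step that reports a workflow as feasible or infeasible with a reason. One possible reading of that check is sketched below: each step declares which inputs must already be bound and which outputs it provides, and the workflow is feasible if every step's inputs are bound when it runs. The `Source` type, the `resolve` function, and the example sources are assumptions for illustration, not the SDM Center design.

```python
from typing import NamedTuple


class Source(NamedTuple):
    """Hypothetical source capability: inputs that must be bound before the
    source can be queried, and the outputs it then makes available."""
    name: str
    requires: frozenset
    provides: frozenset


def resolve(steps, user_params):
    """Check a linear workflow against the binding patterns of its steps.
    Returns (True, "feasible") or (False, reason)."""
    bound = set(user_params)
    for step in steps:
        missing = step.requires - bound
        if missing:
            names = ", ".join(sorted(missing))
            return False, f"{step.name} needs unbound input(s): {names}"
        bound |= step.provides  # outputs become available to later steps
    return True, "feasible"


# Illustrative sources only; names and attributes are invented.
sequence_lookup = Source("sequence_lookup", frozenset({"accession"}), frozenset({"sequence"}))
blast_search = Source("blast_search", frozenset({"sequence"}), frozenset({"hit_ids"}))

ok, reason = resolve([sequence_lookup, blast_search], {"accession"})
bad, why = resolve([blast_search], {"accession"})
print(ok, reason)
print(bad, why)
```

The second call is infeasible because nothing binds `sequence` before `blast_search` runs, which mirrors the "WF infeasible: report reason" branch in the diagram.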

Slide 11: Progress Report
- Status
  - Produced three deliverables
    - Composer Interface/Outerface Specification
    - Five Java wrappers for the pilot scenario
    - Composer script examples for the pilot scenario
  - XWRAPComposer design and development
- Near Term Plan
  - Finish the design of the XWRAPComposer scripting language (Nov. 2002)
  - Develop the first prototype of the XWRAPComposer system (Jan. 2003)
  - Performance evaluation (March 2003)

Slide 12: Related Long Term Research
- Semantic Web and Semantic Data Integration
  - Service Discovery
    - Dynamic content crawler
  - Service Selection
    - Adaptive query routing
  - Service Composition
    - Infopipe technology

