Information Integration. José Luis Ambite, Ph.D., Project Leader, Information Sciences Institute; Research Assistant Professor, Computer Science, University of Southern California


1 Information Integration
José Luis Ambite, Ph.D.
Project Leader, Information Sciences Institute
Research Assistant Professor, Computer Science
University of Southern California

2 Team
Information Integration Infrastructure: Jose Luis Ambite, Craig Knoblock, Maria Muslea, Gowri Kumaraguruparan (USC/ISI)
Domain Collaborators:
– FBIRN: Naveen Ashish (UCI), Jessica Turner (MRN), Karl Helmer (MGH), Tim Olsen (WUSTL), Dingying Wei (UCI)
– NHPRC: John Nylander, Dave Brink, Liz Moran (NHPRC)
– CVRG: Naveen Ashish (UCI), Steve Granite (JHU)
– NeuroDev: Dobyns, Paciorkowski (UW), Sherr (UCSF), …
– UCI CTSI: Ashish, Keator (UCI), …
Security: Rachana Ananthakrishnan (UC), Laura Pearlman (USC/ISI)
Data Management: Robert Schuler, Ann Chervenak (USC/ISI)
Knowledge Engineering: Gully Burns (USC/ISI), Naveen Ashish (UCI), Jessica Turner (MRN)
User Interfaces: Naveen Ashish (UCI), Jose Luis Ambite, Pedro Szekely, Craig Rogers, Gowri Kumaraguruparan, Maria Muslea (USC/ISI)

3 Information Integration
Problem: provide a consistent view of heterogeneous, distributed data
Challenges:
– Syntactic heterogeneity: formats, data models
– Semantic heterogeneity: names, structure, viewpoint
– Efficiency: query execution
– Scalability: ease of adding new sources
Approaches:
– Warehouse/ETL
– Common-schema federation
– Virtual integration/mediator
BIRN supports deep integration across complex data sources:
– Heterogeneous sources: relational and XML databases, Web services, HTML, files
– Structured queries
– Secure, efficient query execution
[Figure: the BIRN mediator connecting decision-support applications, application programs, and workflows to knowledge bases, databases, computer programs, and the Web]
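One practical difference between the warehouse/ETL and virtual-integration approaches listed above is data freshness. The sketch below uses hypothetical in-memory lists as stand-ins for sources: a warehouse snapshot misses a row added after loading, while a query-time (mediated) read sees it. It deliberately ignores the other trade-offs between the approaches, such as query performance and schema mapping.

```python
from typing import Callable

# Hypothetical in-memory "sources"; in practice these would be remote databases.
source_a = [{"id": 1, "age": 55}]
source_b = [{"id": 2, "age": 61}]


def etl_load(sources: list[list[dict]]) -> list[dict]:
    """Warehouse/ETL: copy every source row into a central store at load time."""
    return [dict(row) for rows in sources for row in rows]


def virtual_query(sources: list[list[dict]],
                  pred: Callable[[dict], bool]) -> list[dict]:
    """Virtual integration: the data stays at the sources and is read at query time."""
    return [row for rows in sources for row in rows if pred(row)]


def over_50(row: dict) -> bool:
    return row["age"] > 50


warehouse = etl_load([source_a, source_b])
source_b.append({"id": 3, "age": 70})  # a source is updated after the ETL load

print(len([r for r in warehouse if over_50(r)]))          # 2 (stale snapshot)
print(len(virtual_query([source_a, source_b], over_50)))  # 3 (sees the new row)
```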

4 Information Mediator
Virtual integration architecture:
– Virtual organization: providers and consumers sharing data for a specific purpose
– Autonomous sources: data and control remain at the sources; no changes to the sources are required
– Mediator: defines a domain schema and describes source contents
   – Domain schema: view of the domain agreed upon by the virtual organization
   – Source descriptions: declarative logical formulas relating the source and domain schemas
Query answering:
– User writes a query in the domain schema
– Mediator:
   – Determines the sources relevant to the query
   – Rewrites the query in the source schemas
   – Breaks the query into sub-queries for the sources
   – Optimizes the query evaluation plan
   – Combines the answers from the sources
Declarative → easy to add new sources
EZ-config: automatic configuration for single-schema federations
[Figure: user queries pass through the mediator's reformulation, optimizer, and execution engine, which use the domain schema and logical source descriptions to reach data sources (via wrappers) in their source schemas]
[VLDBJ 2005, Frontiers NeuroScience 2010, JAMIA 2011]
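The query-answering steps above can be illustrated with a toy mediator. This is a minimal sketch under strong assumptions: each source description is reduced to a column renaming into one domain relation (a simple global-as-view style mapping, far weaker than the logical formulas on the slide), and conditions are evaluated in the mediator rather than pushed down to the sources, so there is no optimizer. The names `SourceDescription`, `Mediator`, `siteA`, and `siteB` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SourceDescription:
    """Hypothetical source description: the domain relation a source provides and
    a column mapping from domain names to source names (a simple GAV-style view)."""
    name: str
    domain_relation: str
    column_map: dict[str, str]        # domain column -> source column
    fetch: Callable[[], list[dict]]   # stands in for a wrapper call to the source


class Mediator:
    def __init__(self) -> None:
        self.sources: list[SourceDescription] = []

    def register(self, desc: SourceDescription) -> None:
        # Adding a source only adds a description; the mediator code is unchanged.
        self.sources.append(desc)

    def query(self, relation: str, columns: list[str],
              where: Callable[[dict], bool]) -> list[dict]:
        results: list[dict] = []
        # 1. Determine the sources relevant to the queried domain relation.
        for src in (s for s in self.sources if s.domain_relation == relation):
            # 2. One sub-query per source: fetch rows, rename into the domain schema.
            for row in src.fetch():
                domain_row = {d: row[c] for d, c in src.column_map.items()}
                # 3. Apply the condition, project, and combine (union) the answers.
                if where(domain_row):
                    results.append({c: domain_row[c] for c in columns})
        return results


mediator = Mediator()
mediator.register(SourceDescription(
    "siteA", "subject", {"id": "subj_id", "sex": "gender", "age": "age_yrs"},
    lambda: [{"subj_id": "A1", "gender": "M", "age_yrs": 62}]))
mediator.register(SourceDescription(
    "siteB", "subject", {"id": "pid", "sex": "sex", "age": "age"},
    lambda: [{"pid": "B7", "sex": "M", "age": 45}]))

print(mediator.query("subject", ["id"], lambda r: r["sex"] == "M" and r["age"] > 50))
# [{'id': 'A1'}]
```

The two sources name the same domain attributes differently (`subj_id`/`pid`, `gender`/`sex`), yet the user writes a single query against the domain schema.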

5 FBIRN Data Integration Use Case: HID and XNAT
Sources:
– HID (Human Imaging Database) instances @MRN and @UCI: Oracle DB, queried with SQL
– XNAT (EXtensible Neuroimaging Archive Toolkit): Web service API, queried with XML
User query (in the domain schema): find all male patients over 50 with T1 scans
Using the logical source descriptions, the BIRN mediator sends SQL queries to HID and XML queries to XNAT, then integrates the HID results with the XNAT (XML) results.
[Figure: domain query → BIRN mediator → HID and XNAT; integrated results returned to the user]
[Front. NeuroScience 2010]
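To make the use case concrete, here is a hedged sketch of two wrappers feeding one domain query. An in-memory SQLite database stands in for HID (the real HID runs on Oracle), and a small XML document stands in for an XNAT response; the table names, XML layout, and `male`/`female` vocabulary are all hypothetical, not XNAT's actual web-service format. Each wrapper translates its source's format and codes into common domain rows, which is where the syntactic and semantic heterogeneity from slide 3 gets resolved.

```python
import sqlite3
import xml.etree.ElementTree as ET

SEX_CODES = {"male": "M", "female": "F"}  # assumed XNAT-side vocabulary -> domain codes


def hid_wrapper(conn: sqlite3.Connection) -> list[dict]:
    """Hypothetical HID-like wrapper: SQL over subject and scan tables."""
    sql = ("SELECT s.subject_id, s.sex, s.age, sc.scan_type "
           "FROM subject s JOIN scan sc ON sc.subject_id = s.subject_id")
    return [{"id": i, "sex": sex, "age": age, "scan": scan}
            for i, sex, age, scan in conn.execute(sql)]


def xnat_wrapper(xml_text: str) -> list[dict]:
    """Hypothetical XNAT-like wrapper: flattens an XML listing into domain rows."""
    rows = []
    for subj in ET.fromstring(xml_text).findall("subject"):
        for scan in subj.findall("scan"):
            rows.append({"id": subj.get("label"),
                         "sex": SEX_CODES.get(subj.findtext("gender"), "?"),
                         "age": int(subj.findtext("age")),
                         "scan": scan.get("type")})
    return rows


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (subject_id TEXT, sex TEXT, age INTEGER);
CREATE TABLE scan (subject_id TEXT, scan_type TEXT);
INSERT INTO subject VALUES ('H1', 'M', 57), ('H2', 'F', 63), ('H3', 'M', 44);
INSERT INTO scan VALUES ('H1', 'T1'), ('H2', 'T1'), ('H3', 'T1');
""")
xml_text = """<subjects>
  <subject label="X1"><gender>male</gender><age>71</age><scan type="T1"/></subject>
  <subject label="X2"><gender>male</gender><age>52</age><scan type="T2"/></subject>
</subjects>"""

# Domain query: male subjects over 50 with a T1 scan, answered over both sources.
rows = hid_wrapper(conn) + xnat_wrapper(xml_text)
answer = sorted({r["id"] for r in rows
                 if r["sex"] == "M" and r["age"] > 50 and r["scan"] == "T1"})
print(answer)  # ['H1', 'X1']
```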

6 CardioVascular Research Grid (CVRG)
Sources:
– ECG_Mesa (MySQL DB)
– Chesnokov Analysis (eXistDB XML DBMS)
– Image metadata: dcm4che PACS (MySQL DB)
– WaveformDB (eXistDB XML DBMS)
– DICOM image files (file system)
– Waveform files (file system)
Use the mediator to identify subjects and files of interest.
Same BIRN mediator: just plug in the CVRG source descriptions and an additional wrapper for eXistDB (XML/XQuery database).
[Figure: domain query → BIRN mediator (logical source descriptions) → CVRG sources; integrated results returned]
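The "just plug in" claim rests on the mediator reaching every source through a uniform wrapper interface. In the hedged sketch below, supporting a new kind of store means writing one more class with the same `rows()` method, while the query function stays unchanged. The classes, XML layout, and file paths are hypothetical, and the XML wrapper parses a string rather than calling eXistDB's actual API.

```python
import xml.etree.ElementTree as ET
from typing import Protocol


class Wrapper(Protocol):
    """Interface the mediator relies on: every wrapper yields rows as dicts."""
    def rows(self) -> list[dict]: ...


class TableWrapper:
    """Stand-in for a relational source (e.g. a MySQL table) with rows already fetched."""
    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def rows(self) -> list[dict]:
        return list(self._rows)


class XmlRecordWrapper:
    """Hypothetical XML-store wrapper: each <record> element becomes one row."""
    def __init__(self, xml_text: str) -> None:
        self._root = ET.fromstring(xml_text)

    def rows(self) -> list[dict]:
        return [{child.tag: child.text for child in rec}
                for rec in self._root.iter("record")]


def files_of_interest(wrappers: list[Wrapper], subjects: set[str]) -> list[str]:
    """Collect the file paths that any source records for the selected subjects."""
    return sorted(r["file"] for w in wrappers for r in w.rows()
                  if r["subject"] in subjects)


ecg = TableWrapper([{"subject": "S1", "file": "/data/ecg/S1.dat"},
                    {"subject": "S2", "file": "/data/ecg/S2.dat"}])
waveforms = XmlRecordWrapper(
    "<records><record><subject>S1</subject><file>/data/wave/S1.xml</file></record></records>")

print(files_of_interest([ecg, waveforms], {"S1"}))
# ['/data/ecg/S1.dat', '/data/wave/S1.xml']
```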

7 Neuro Developmental Disorders
Sources: LISDB and SherrDB, each queried with SQL
User query: find all white females with Aicardi syndrome
Results integrated from LISDB and SherrDB
Same BIRN mediator; just plug in the NeuroDev source descriptions
[Figure: domain query → BIRN mediator (logical source descriptions) → LISDB and SherrDB; integrated results returned]
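When the same subject can appear in both LISDB and SherrDB, integrating results means more than concatenating them. The minimal sketch below rests on a hypothetical assumption: both databases share a subject identifier and differ only in the capitalization of coded values. Rows are merged per subject, and the domain query is evaluated over normalized values. The slide does not say how subjects are actually matched across these sources, and all rows are made up.

```python
def norm(value: str) -> str:
    """Normalize free-text values so 'White' and 'white' compare equal."""
    return value.strip().lower()


def integrate(*sources: list[dict]) -> list[dict]:
    """Union rows from several sources, merging rows that share a subject_id."""
    merged: dict[str, dict] = {}
    for rows in sources:
        for row in rows:
            entry = merged.setdefault(row["subject_id"],
                                      {"subject_id": row["subject_id"], "sources": []})
            entry["sources"].append(row["source"])
            for key, value in row.items():
                if key not in ("subject_id", "source"):
                    entry.setdefault(key, value)  # keep the first value seen per attribute
    return list(merged.values())


lisdb = [
    {"subject_id": "P1", "sex": "F", "race": "White", "diagnosis": "Aicardi syndrome", "source": "LISDB"},
    {"subject_id": "P2", "sex": "F", "race": "Asian", "diagnosis": "Aicardi syndrome", "source": "LISDB"},
]
sherrdb = [
    {"subject_id": "P1", "sex": "F", "race": "white", "diagnosis": "AICARDI SYNDROME", "source": "SherrDB"},
    {"subject_id": "P3", "sex": "F", "race": "White", "diagnosis": "Aicardi syndrome", "source": "SherrDB"},
]

answer = [(e["subject_id"], e["sources"]) for e in integrate(lisdb, sherrdb)
          if e["sex"] == "F" and norm(e["race"]) == "white"
          and norm(e["diagnosis"]) == "aicardi syndrome"]
print(answer)  # [('P1', ['LISDB', 'SherrDB']), ('P3', ['SherrDB'])]
```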

8 Non-Human Primate Research Consortium (NHPRC)
Provide data integration infrastructure for NHPRC:
– Colony management, genetics, pathology, …
BIRN NHPRC activities:
– BIRN/ISI demonstrated a colony management integration prototype
– The NHPRC team developed a DNA banking application using the BIRN mediator
– Collaborated on the NHPRC Pathology Project

9 Scanner Mediator
Integration of multiple clinical data sources:
– Relational databases: UCSD, UCI, RAND, Brigham & Women’s Hospital, …
– EMR system → relational export
Domain model based on the OMOP common data model
– OMOP: Observational Medical Outcomes Partnership
http://scanner.ucsd.edu/
[Figure: custom interface → Scanner mediator (BIRN Mediator with the OMOP model) → RAND, BWH, UCSD, and UCI sources]
[Ashish (UCI), Boxwala (UCSD), …]
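Using the OMOP common data model as the domain model means each site's records have to be expressed in OMOP terms. The sketch below maps a hypothetical site-local patient record (the fields `mrn`, `sex`, and `dob` are assumptions) onto a few columns of the OMOP CDM PERSON table. It uses 8507 and 8532, the standard OMOP gender concept IDs for male and female, and 0, OMOP's convention for "no matching concept". Race, ethnicity, and the other PERSON columns are omitted.

```python
from datetime import date

# Standard OMOP gender concepts; 0 is the OMOP convention for "no matching concept".
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


def to_omop_person(local: dict, person_id: int) -> dict:
    """Map a hypothetical site-local patient record onto a subset of OMOP PERSON columns."""
    dob = date.fromisoformat(local["dob"])
    return {
        "person_id": person_id,
        # Assumed local coding: any value starting with M/F (e.g. "male", "F").
        "gender_concept_id": GENDER_CONCEPTS.get(local["sex"].upper()[:1], 0),
        "year_of_birth": dob.year,
        "month_of_birth": dob.month,
        "day_of_birth": dob.day,
        "person_source_value": local["mrn"],   # keep the local identifier for traceability
        "gender_source_value": local["sex"],   # keep the raw local value as well
    }


row = to_omop_person({"mrn": "UCI-001", "sex": "female", "dob": "1960-04-12"}, 1)
print(row["gender_concept_id"], row["year_of_birth"])  # 8532 1960
```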

10 Cross-CTSI Data Integration: Oxytocin Study
UCSD-UCI cross-CTSI oxytocin study:
– HID@UCI
– REDCap@UCSD
Mediated solution: BIRN Mediator
Data from neurological assessment scales:
– PANSS, STM, SCID, …
[Figure: custom interface → BIRN Mediator → REDCap and HID]
[Ashish, Keator, Potkin, Fiefel, …]
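A cross-site study like this one typically needs assessment-scale data from one system combined with imaging data from another. The minimal sketch below assumes, hypothetically, that REDCap and HID records share a study subject ID; the column names and values are made up. It uses a hash join, and the slide does not say how the BIRN Mediator actually executes joins.

```python
def join_on_subject(left: list[dict], right: list[dict],
                    key: str = "subject_id") -> list[dict]:
    """Inner hash join of two result sets on a shared subject identifier."""
    index: dict[str, list[dict]] = {}
    for rrow in right:                       # build phase: index the right input by key
        index.setdefault(rrow[key], []).append(rrow)
    return [{**lrow, **rrow}                 # probe phase: match each left row
            for lrow in left for rrow in index.get(lrow[key], [])]


# Made-up example rows: assessment scores (REDCap-like) and imaging records (HID-like).
redcap = [{"subject_id": "S01", "panss_total": 78},
          {"subject_id": "S02", "panss_total": 64}]
hid = [{"subject_id": "S01", "scan": "T1"},
       {"subject_id": "S03", "scan": "T1"}]

print(join_on_subject(redcap, hid))
# [{'subject_id': 'S01', 'panss_total': 78, 'scan': 'T1'}]
```

A hash join builds an index on one input and probes it with the other, so its cost grows with the sizes of the inputs rather than with the product of those sizes.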

11 Cross-CTSI Data Integration: Oxytocin Study

12 BIRN Information Integration
General information integration infrastructure:
– BIRN Mediator: bridges semantics across data sources and provides integrated data for analysis and visualization
– Domain model development and curation process: balances bottom-up and top-down domain model/ontology development and reuse
– Security and user data access control built in
Approach:
– Engage research communities: NHPRC, FBIRN, CVRG, NeuroDev, Radiation Oncology, CTSIs, ...
– Build applications incrementally
– Enhance capabilities while providing useful tools
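One simple way to build access control into a mediator is to check a user's permissions before any sub-query is sent. The sketch below assumes a hypothetical per-user allow-list of source names. The slides do not describe BIRN's actual security mechanism, and a real deployment would rely on authenticated credentials rather than an in-memory set.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class User:
    name: str
    allowed_sources: set[str] = field(default_factory=set)  # hypothetical allow-list


def authorized_query(user: User, sources: dict[str, list[dict]],
                     pred: Callable[[dict], bool]) -> list[dict]:
    """Answer a query using only the sources this user is authorized to read."""
    results = []
    for name, rows in sources.items():
        if name not in user.allowed_sources:
            continue  # access denied: this source is never queried for this user
        results.extend({**row, "source": name} for row in rows if pred(row))
    return results


sources = {"HID": [{"id": "H1", "age": 57}], "XNAT": [{"id": "X1", "age": 71}]}
alice = User("alice", {"HID"})
print(authorized_query(alice, sources, lambda r: r["age"] > 50))
# [{'id': 'H1', 'age': 57, 'source': 'HID'}]
```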

