Published by Alan Bradford, modified over 9 years ago
1
I. Khalil Ibrahim
Data Integration in Digital Libraries: Approaches and Challenges
Bringing Digital Libraries Together
Dr. Ismail Khalil Ibrahim
ismail.khalil-ibrahim@scch.at
+43 7236 3343 852
www.scch.at
2
Biography
Dr. Ismail Khalil Ibrahim is a senior software developer and AgenCom project manager at the Software Competence Center Hagenberg, Austria. He worked at the University of Technology, Baghdad, Iraq, from 1985 to 1990 as a lecturer; at the Human Resources Training and Development Institute, Iraq, from 1990 to 1996 as head of the academic studies department; and at Gadjah Mada University from 1996 to 2000 as a teaching and research assistant. His main research interests lie in E-commerce & I-Commerce, Database Applications and Techniques for the Web, Practical Experience and Applications in Information Integration Systems, Logic Programming for Information Integration, Agents for Information Retrieval and Knowledge Discovery, XML and Semistructured Data Management, Information Systems Management and Development, and Information Technology: Impact and Economic Analysis. Ismail is a member of ACM, SIGMOD, SIGKDD, and SIGecom; general secretary of the Indonesian Information Society Initiative (IISI); a member of the Iraqi Engineers Association (IEA); an overseas collaborator in the E-commerce Lab at the National University of Singapore; a member of the editorial board of the Colombian Journal of Computing ("Revista Colombiana de Computación"); chairman of the organizing committee of the 1st and 2nd International Workshops on Information Integration and Web-based Applications & Services (IIWAS'99, IIWAS'00), Yogyakarta, Indonesia; and chairman of the organizing committee of the 3rd International Conference on Information Integration and Web-based Applications & Services (IIWAS'2001), Linz, Austria. Ismail holds a B.Sc. in Electrical Engineering from the University of Technology, Iraq (1985), and an M.Sc. and a Ph.D. in Computer Engineering and Information Systems from Gadjah Mada University (1998, 2001).
3
Outline
- Data integration
  - What is it?
  - What does a data integration system look like?
  - What are some data integration challenges?
4
What Is Data Integration?
Providing uniform access to multiple autonomous, heterogeneous, unstructured information sources:
- uniform: sources are transparent to the user
- access: queries, and eventually updates
- multiple: even two sources are a problem
- autonomous: integration does not affect the behavior of the sources
- heterogeneous: different data models and schemas
- unstructured: at least semi-structured
- information sources: not only databases
5
Example Scenario
- http://www.amazon.com: s1(Title, Author, Subject)
- http://www.book-a-million.com: s2(ISBN, Title, Publisher)
- http://……...
6
Example Scenario (cont.)
Retrieve the titles and subjects of all technical reports written by Stephane Bressan and published by MIT Press:
- q1: amazon(Title, "Stephane Bressan", Subject)
- q2: book-a-million(ISBN, Title, "MIT Press")
- Join the results on the shared variable Title
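The two subqueries and the final join can be sketched in Python. This is a minimal illustration under stated assumptions: the sources are modelled as in-memory lists instead of live web sites, the tuples are made-up placeholders rather than real catalogue data, and the join is a simple hash join on Title.

```python
# Hypothetical extensions of the two source relations (placeholder data).
# s1(Title, Author, Subject)
s1 = [
    ("Book A", "Stephane Bressan", "Databases"),
    ("Book B", "Stephane Bressan", "Logic Programming"),
    ("Book C", "Someone Else", "Networks"),
]
# s2(ISBN, Title, Publisher)
s2 = [
    ("isbn-1", "Book A", "MIT Press"),
    ("isbn-2", "Book B", "Other Press"),
    ("isbn-3", "Book C", "MIT Press"),
]

# q1: amazon(Title, "Stephane Bressan", Subject): select on Author, keep Title and Subject
q1 = [(title, subject) for (title, author, subject) in s1 if author == "Stephane Bressan"]

# q2: book-a-million(ISBN, Title, "MIT Press"): select on Publisher, keep Title
q2 = [title for (isbn, title, publisher) in s2 if publisher == "MIT Press"]

# Join on the shared variable Title: build a hash set from q2, probe it with q1
mit_titles = set(q2)
answer = [(title, subject) for (title, subject) in q1 if title in mit_titles]

print(answer)  # [('Book A', 'Databases')]
```

Only "Book A" satisfies both subqueries: "Book B" fails the publisher condition and "Book C" fails the author condition.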
7
So What Is the Problem?
- Virtual vs. materialized architectures
- Access: query only, or query and update?
  - The problem is similar to updating through views
  - Updates need distributed transactional services
- Mediated schema: yes or no?
  - Without a mediated schema we lose its advantages (a uniform view of the sources)
  - A mediated schema requires schema integration
  - Schema integration needs query transformation
  - Query transformation needs query optimization
8
Additional Dimensions
- How many sources are we accessing?
- How autonomous are the sources?
- How much knowledge do we have about the sources?
- How structured is the data in the sources?
- Requirements on responses:
  - accuracy
  - completeness
  - machine-readable vs. human-readable
  - handling inconsistencies
  - speed
  - Closed World Assumption (CWA) vs. Open World Assumption (OWA)
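The CWA vs. OWA choice mainly changes how a fact that no source reports is interpreted. A minimal Python sketch, assuming facts are plain tuples and using a three-valued result; the `Truth` enum, the `holds` helper, and the fact set are hypothetical and only illustrate the distinction.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def holds(fact, known_facts, closed_world):
    """Evaluate a ground fact against the facts the sources returned."""
    if fact in known_facts:
        return Truth.TRUE
    # CWA: whatever the sources do not report is false.
    # OWA: the sources may be incomplete, so a missing fact is only unknown.
    return Truth.FALSE if closed_world else Truth.UNKNOWN


known = {("Book A", "MIT Press")}
print(holds(("Book B", "MIT Press"), known, closed_world=True))   # Truth.FALSE
print(holds(("Book B", "MIT Press"), known, closed_world=False))  # Truth.UNKNOWN
```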
9
Related Technologies / Issues
- Distributed databases
  - sources are homogeneous
  - data is distributed a priori
  - sources are not autonomous
  - similarities at the optimization and execution level
- Information retrieval
  - keyword search
  - no semantics
- Data mining: discovering properties and patterns in data
10
Current Applications
- Intranets
  - enterprise data integration
  - web-site construction
- World Wide Web
  - digital libraries
  - comparison shopping (Netbot, Junglee)
  - portals integrating data from multiple sources
  - XML integration
- Science & Culture
  - Medical genetics: integrating genomic data
  - Astrophysics: monitoring events in the sky
  - Environment: Puget Sound Regional Synthesis Model
  - Culture: uniform access to all cultural databases
11
Paradigms of Data Integration
[Diagram: a taxonomy of integration approaches. Branches: global defined from local; global "independent" of local. Labels: CWA, OWA, global-schema-as-view, global-as-view-of-local, local-as-view-of-global, Database Schema Integration, Data Warehousing, Mediation.]
12
Paradigms of Data Integration II
- Data warehousing (materialized architecture)
  - data of interest is collected in a central place, and a web site is built on top of it
  - queries are applied to the data warehouse
  - easy to support queries and transactions
  - hard to modify; the warehouse is not connected to the providers of the information, etc.
13
Data Warehousing Architecture
[Diagram: an Application on top of a Data Warehouse, which is loaded by Data Extraction through Wrappers over several Data Sources.]
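The warehousing flow (wrappers extract from each source, the rows are loaded once into a central store, and queries then run only against that store) can be sketched with Python's built-in `sqlite3` module. The wrapper functions, table layout, and rows are hypothetical; the sketch also shows why the warehouse is "not connected" to the providers: later changes at a source are invisible until the next extraction.

```python
import sqlite3


# Hypothetical wrappers: each turns one source's native format into (title, author, year) rows.
def wrap_source_1():
    return [("Book A", "Author X", 1999)]


def wrap_source_2():
    return [("Book B", "Author Y", 2000), ("Book C", "Author X", 2001)]


def build_warehouse(wrappers):
    """Extract from every source once and materialize the rows in one central table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (title TEXT, author TEXT, year INTEGER)")
    for wrapper in wrappers:
        conn.executemany("INSERT INTO book VALUES (?, ?, ?)", wrapper())
    conn.commit()
    return conn


warehouse = build_warehouse([wrap_source_1, wrap_source_2])

# Application queries hit only the warehouse, never the original sources.
rows = warehouse.execute(
    "SELECT title FROM book WHERE author = ? ORDER BY title", ("Author X",)
).fetchall()
print(rows)  # [('Book A',), ('Book C',)]
```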
14
Paradigms of Data Integration III
- Information mediation (virtual architecture)
  - data remains in the web sources
  - rules relate the external data to the internal application
  - data is not replicated, so it is guaranteed to be up to date
  - query optimization and execution are more complex
15
Mediation Architecture
[Diagram: an Application over a Global Data Model; a Query Execution Engine, using a Catalog, sends queries through Wrappers, each mapping to a source's Local Data Model, over the Data Sources.]
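By contrast, a mediator keeps no copy: every query is forwarded to the wrappers at query time, so answers reflect the current state of the sources. A minimal Python sketch; the `Mediator` class, its `catalog` dictionary, the wrappers, and the source data are hypothetical stand-ins for the components in the diagram, and query translation is reduced to an author filter applied in the mediator (a real wrapper might push such a filter down to a source that supports it).

```python
# Live, mutable source data (stands in for remote web sources with different local models).
source_1 = [{"ttl": "Book A", "by": "Author X"}]
source_2 = [("Book B", "Author Y")]


# Hypothetical wrappers: map each local data model to the global one, (title, author).
def wrapper_1():
    return [(r["ttl"], r["by"]) for r in source_1]


def wrapper_2():
    return list(source_2)


class Mediator:
    def __init__(self):
        self.catalog = {}  # source name -> wrapper

    def register(self, name, wrapper):
        self.catalog[name] = wrapper

    def query(self, author):
        """Call every registered wrapper at query time and combine the answers."""
        return sorted(
            title
            for wrapper in self.catalog.values()
            for (title, a) in wrapper()
            if a == author
        )


mediator = Mediator()
mediator.register("s1", wrapper_1)
mediator.register("s2", wrapper_2)

print(mediator.query("Author Y"))  # ['Book B']
source_2.append(("Book D", "Author Y"))  # a source changes; nothing needs reloading
print(mediator.query("Author Y"))  # ['Book B', 'Book D']
```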
16
Running Example
World relations:
- Book(title, year, author, subject)
- BookYear(title, year)
- BookRev(title, author, review)
Source relations:
- DB1(title, author, year)
- DB2(title, author, year)
- DB3(title, review)
[Diagram: GAV and LAV mappings between the world relations and the source relations.]
17
Global As View (GAV)
- Define a global schema of objects and write down rules to collect these objects
- For each relation R in the mediated schema, we write a query over the sources' relations specifying how to obtain R's tuples from the sources
- Queries are answered by query unfolding, so traditional query processing applies
- Requires the right sources to be available and compliant
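Query unfolding under GAV can be sketched on the running example. The GAV rules in the comments are assumptions chosen for illustration (the slides do not give them), and the tuples are invented. Each mediated relation is a function computed from the sources, so a user query over the mediated schema unfolds into an ordinary query over DB1, DB2, and DB3.

```python
# Hypothetical extensions of the running example's source relations.
DB1 = {("Book A", "Author X", 1999), ("Book B", "Author Y", 2000)}  # (title, author, year)
DB2 = {("Book C", "Author Z", 2000)}                                # (title, author, year)
DB3 = {("Book B", "Clear and thorough."), ("Book C", "Too short.")}  # (title, review)


# GAV: each mediated relation is defined as a query over the sources (assumed rules).
def book_year():
    # BookYear(t, y) :- DB1(t, a, y)
    # BookYear(t, y) :- DB2(t, a, y)
    return {(t, y) for (t, a, y) in DB1} | {(t, y) for (t, a, y) in DB2}


def book_rev():
    # BookRev(t, a, r) :- DB1(t, a, y), DB3(t, r)
    return {(t, a, r) for (t, a, y) in DB1 for (t2, r) in DB3 if t == t2}


# User query over the mediated schema:
#   Q(t, r) :- BookYear(t, 2000), BookRev(t, a, r)
# Unfolding replaces each mediated atom by its definition over the sources.
def q():
    years = {t for (t, y) in book_year() if y == 2000}
    return {(t, r) for (t, a, r) in book_rev() if t in years}


print(q())  # {('Book B', 'Clear and thorough.')}
```

"Book C" has a review in DB3 but no author in DB1, so the assumed BookRev rule never produces it: a GAV answer is only as complete as the sources its rules name.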
18
Local As View (LAV)
- For every information source S, we write a query over the relations in the mediated schema that describes which tuples are found in S
- Queries are answered by query folding (answering queries using views)
  - may be able to answer a query based on the available partial information
  - in general, may not be able to answer the query
  - needs non-standard query processing techniques
  - potentially high complexity
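One known technique for answering queries using LAV descriptions is the inverse-rules approach: each source description is inverted, and attributes the source does not store are filled with Skolem terms (placeholders for unknown values). Answers that still contain a Skolem term are not certain and are dropped. A minimal Python sketch for DB1 from the running example, assuming the hypothetical description `DB1(t, a, y) :- Book(t, y, a, s)` and invented tuples; it shows both cases on the slide: one query answerable from partial information, one not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skolem:
    """An unknown value invented by an inverse rule, e.g. f_subject(t, a, y)."""
    fn: str
    args: tuple


# Hypothetical extension of DB1(title, author, year).
db1 = {("Title A", "Author X", 1999), ("Title B", "Author Y", 2000)}


# Assumed LAV description:  DB1(t, a, y) :- Book(t, y, a, s)
# Inverse rule:             Book(t, y, a, f_subject(t, a, y)) :- DB1(t, a, y)
def inverse_rule_book(db1_tuples):
    return {(t, y, a, Skolem("f_subject", (t, a, y))) for (t, a, y) in db1_tuples}


def certain_answers(answers):
    # Keep only answers that contain no Skolem (unknown) values.
    return {row for row in answers if not any(isinstance(v, Skolem) for v in row)}


book = inverse_rule_book(db1)

# Q1(t, a) :- Book(t, y, a, s)  -- answerable from DB1
q1 = certain_answers({(t, a) for (t, y, a, s) in book})
# Q2(t, s) :- Book(t, y, a, s)  -- no source described here stores subjects
q2 = certain_answers({(t, s) for (t, y, a, s) in book})

print(sorted(q1))  # [('Title A', 'Author X'), ('Title B', 'Author Y')]
print(q2)          # set()
```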
19
Challenges
- Added complexity over traditional DBs: heterogeneous, autonomous, network-bound sources
- Query reformulation is now well understood
  - map queries over mediated schemas to "wrapped" sources (heterogeneity)
- Issues remain in query processing
  - few statistics (autonomous sources)
  - unanticipated delays and failures (network-bound sources)
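For network-bound sources, one common tactic is to query all sources in parallel with a deadline and return whatever arrived, reporting the rest, so that one slow or failed source does not block the whole answer. A minimal sketch with Python's `concurrent.futures`; the source functions, delays, and timeout value are simulated and arbitrary, and `query_sources` is a hypothetical helper, not part of any mediator product.

```python
import concurrent.futures as cf
import time


# Simulated sources (hypothetical): one fast, one delayed, one failing.
def fast_source():
    return [("Book A", "Author X")]


def slow_source():
    time.sleep(1.0)  # simulates an unanticipated network delay
    return [("Book B", "Author Y")]


def failing_source():
    raise ConnectionError("source unavailable")


def query_sources(sources, timeout_s):
    """Query all sources in parallel; return (rows, problems) rather than failing the whole query."""
    rows, problems = [], {}
    pool = cf.ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(fn): name for name, fn in sources.items()}
    done, not_done = cf.wait(futures, timeout=timeout_s)
    for fut in done:
        try:
            rows.extend(fut.result())
        except Exception as exc:  # a source error becomes a reported problem
            problems[futures[fut]] = f"failed: {exc}"
    for fut in not_done:
        problems[futures[fut]] = "timed out"
    pool.shutdown(wait=False)  # do not block on the slow source; its thread finishes in the background
    return rows, problems


rows, problems = query_sources(
    {"s1": fast_source, "s2": slow_source, "s3": failing_source}, timeout_s=0.3
)
print(rows)
print(problems)
```

Returning partial results plus a problem report fits the open-world reading: the user gets what the reachable sources could supply and learns which sources were missing.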
20
Conclusions
Data integration handles many of the problems that embedded systems applications need to address:
- many data sources
- easy addition and deletion of sources
- different source capabilities
- dealing with network delays
- ease of use for the user
21
Publications
- Semantic Query Transformation for the Integration of Autonomous Information Sources (INAP'99, Tokyo)
- IKA: Unity in Heterogeneity (IIWAS'99, Yogyakarta)
- Information Retrieval Agents for the Intelligent Integration of Information Sources (MulNet 2000, Bandung)
- A Multilingual Natural Language Interface for Mediating E-Commerce Product Catalogs (INAP2000, Tokyo)
- Semantic Query Transformation for the Intelligent Integration of Information Sources over the Web (WIIW2001, Rio de Janeiro)
- Rewriting Rules for Semantic Query Transformation in E-Commerce Applications (DS9, Hong Kong)
- Data Integration in Digital Libraries: Challenges and Approaches (IndonesiaDL, Bandung)
22
Thank you for your attention!