Data Integration in Digital Libraries: Approaches and Challenges. Bringing Digital Libraries Together. Dr. Ismail Khalil Ibrahim


1 Data Integration in Digital Libraries: Approaches and Challenges
Bringing Digital Libraries Together
Dr. Ismail Khalil Ibrahim
ismail.khalil-ibrahim@scch.at
+43 7236 3343 852
www.scch.at

2 Biography
Dr. Ismail Khalil Ibrahim is a senior software developer and AgenCom project manager at the Software Competence Center Hagenberg, Austria. He worked at the University of Technology, Baghdad, Iraq from 1985 to 1990 as a lecturer, at the Human Resources Training and Development Institute, Iraq from 1990 to 1996 as head of the academic studies department, and at Gadjah Mada University from 1996 to 2000 as a teaching and research assistant.
His main research interests lie in the fields of E-commerce & I-Commerce, Database Applications and Techniques for the Web, Practical Experience and Applications in Information Integration Systems, Logic Programming for Information Integration, Agents for Information Retrieval and Knowledge Discovery, XML and Semistructured Data Management, Information Systems Management and Development, and Information Technology: Impact, Economic Analysis.
Ismail is a member of ACM, SIGMOD, SIGKDD, and SIGecom; General Secretary of the Indonesian Information Society Initiative (IISI); a member of the Iraqi Engineers Association (IEA); an overseas collaborator in the E-commerce Lab at the National University of Singapore; a member of the editorial board of the Colombian Journal of Computing, "Revista Colombiana de Computación"; chairman of the organizing committee of the 1st and 2nd International Workshop on Information Integration and Web-based Applications & Services (IIWAS'99, IIWAS'00), Yogyakarta, Indonesia; and chairman of the organizing committee of the 3rd International Conference on Information Integration and Web-based Applications & Services (IIWAS'2001), Linz, Austria.
Ismail holds a B.Sc. in Electrical Engineering from the University of Technology, Iraq (1985), and an M.Sc. and a Ph.D. in Computer Engineering and Information Systems from Gadjah Mada University (1998, 2001).

3 Outline
- Data Integration
  - What is it?
  - What does a data integration system look like?
  - What are some data integration challenges?

4 What Is Data Integration?
Providing:
- uniform: the sources are transparent to the user
- access: queries, and eventually updates
- multiple: even two sources is a problem
- autonomous: integration must not affect the behavior of the sources
- heterogeneous: different data models and schemas
- unstructured: at least semi-structured
- information sources: not only databases

5 Example Scenario
- http://www.amazon.com: s1(Title, Author, Subject)
- http://www.book-a-million.com: s2(ISBN, Title, Publisher)
- http://……...

6 Example Scenario (cont.)
Retrieve the titles and subjects of all the technical reports written by Stephane Bressan and published by MIT Press.
- q1: amazon(Title, "Stephane Bressan", Subject)
- q2: book-a-million(ISBN, Title, "MIT Press")
- Join the results
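A minimal Python sketch of this scenario, assuming the two sources are available as in-memory tables. All rows, and the function names q1, q2, and answer, are invented for illustration; real sources would be reached through wrappers rather than Python lists.

```python
# Toy stand-ins for the two sources. Every row here is made up.
S1 = [  # s1(Title, Author, Subject)
    ("Report A", "Stephane Bressan", "Databases"),
    ("Report B", "Stephane Bressan", "Logic"),
    ("Report C", "Someone Else", "Databases"),
]
S2 = [  # s2(ISBN, Title, Publisher)
    ("isbn-1", "Report A", "MIT Press"),
    ("isbn-2", "Report B", "Other Press"),
    ("isbn-3", "Report C", "MIT Press"),
]


def q1(author):
    """Sub-query for source 1: (Title, Subject) pairs for one author."""
    return [(title, subject) for title, a, subject in S1 if a == author]


def q2(publisher):
    """Sub-query for source 2: titles issued by one publisher."""
    return {title for _isbn, title, p in S2 if p == publisher}


def answer(author, publisher):
    """Join the two partial results on Title, the only shared attribute."""
    titles = q2(publisher)
    return [(t, s) for t, s in q1(author) if t in titles]


result = answer("Stephane Bressan", "MIT Press")
print(result)
```

Title is the only attribute the two source schemas share, so it is the join key here; in practice, titles from different sites rarely match exactly, which is one of the heterogeneity problems the later slides discuss.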

7 So What Is the Problem?
- Virtual vs. materialized architectures
- Access: query, or query and update?
  - the problem is similar to updating through views
  - updates need distributed transactional services
- Mediated schema: yes or no?
  - without a mediated schema we lose its advantages
  - a mediated schema requires schema integration
  - schema integration needs query transformation
  - query transformation needs query optimization

8 Additional Dimensions
- How many sources are we accessing?
  - how autonomous are the sources?
  - how much knowledge do we have about the sources?
  - how structured are the data in the sources?
- Requirements from responses:
  - accuracy
  - completeness
  - machine readable vs. human readable
  - handling inconsistencies
  - speed
  - Closed World Assumption vs. Open World Assumption

9 Related Technologies / Issues
- Distributed databases
  - sources are homogeneous
  - data is distributed a priori
  - sources are not autonomous
  - similarities at the optimization and execution level
- Information retrieval
  - keyword search
  - no semantics
- Data mining: discovering properties and patterns in data

10 Current Applications
- Intranets
  - enterprise data integration
  - web-site construction
- World Wide Web
  - digital libraries
  - comparison shopping (Netbot, Junglee)
  - portals integrating data from multiple resources
  - XML integration
- Science & Culture
  - medical genetics: integrating genomic data
  - astrophysics: monitoring events in the sky
  - environment: Puget Sound Regional Synthesis Model
  - culture: uniform access to all the cultural databases

11 Paradigms of Data Integration
[Diagram: a classification of integration approaches. Labels: Integration; global defined from local; global "independent" of local; CWA; OWA; global-schema-as-view; global-as-view-of-local; local-as-view-of-global; Database Schema Integration; Data Warehousing; Mediation.]

12 Paradigms of Data Integration II
- Data warehousing (materialized architecture)
  - data of interest is collected in a central place and a web site is built on top of it
  - queries are applied to the data warehouse
  - easy to support queries and transactions
  - hard to modify; the warehouse is not connected to the providers of information; etc.

13 Data Warehousing Architecture
[Diagram components: three data sources, wrapper, data extraction, data warehouse, application.]
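A small Python sketch of the materialized approach, assuming an in-memory SQLite database as the warehouse. The source layouts, the wrapper functions wrap_a and wrap_b, and the single book table are hypothetical; the point is only that queries run against the copy, so the copy goes stale until the next extraction.

```python
import sqlite3

# Two hypothetical sources, each with its own record layout (made-up rows).
SOURCE_A = [{"title": "Report A", "writer": "Author X"}]
SOURCE_B = [("Report B", "Author Y")]


def wrap_a(rows):
    """Wrapper for source A: translate dict records to (title, author)."""
    return [(r["title"], r["writer"]) for r in rows]


def wrap_b(rows):
    """Wrapper for source B: rows are already (title, author) tuples."""
    return [(title, author) for title, author in rows]


def load_warehouse(conn):
    """Extraction step: rebuild the central copy from all wrapped sources."""
    conn.execute("CREATE TABLE IF NOT EXISTS book (title TEXT, author TEXT)")
    conn.execute("DELETE FROM book")
    conn.executemany("INSERT INTO book VALUES (?, ?)",
                     wrap_a(SOURCE_A) + wrap_b(SOURCE_B))
    conn.commit()


def titles(conn):
    """Application query: answered from the warehouse, not from the sources."""
    return [t for (t,) in conn.execute("SELECT title FROM book ORDER BY title")]


conn = sqlite3.connect(":memory:")
load_warehouse(conn)

SOURCE_B.append(("Report C", "Author Z"))  # a source changes after the load
titles_before_reload = titles(conn)        # the warehouse has not seen it yet

load_warehouse(conn)                       # re-run the extraction
titles_after_reload = titles(conn)
print(titles_before_reload, titles_after_reload)
```

Rebuilding the whole table on each load keeps the sketch short; incremental refresh would be the usual design choice once sources are large.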

14 Paradigms of Data Integration III
- Information mediation (virtual architecture)
  - data remains in the web sources
  - rules relate external data to the internal application
  - data is not replicated, so it is guaranteed to be up to date
  - query optimization and execution are more complex

15 Mediation Architecture
[Diagram components: application, global data model, query execution engine, catalog, local data model, two wrappers, two data sources.]

16 Running Example
World relations:
- Book(title, year, author, subject)
- BookYear(title, year)
- BookRev(title, author, review)
Source relations:
- DB1(title, author, year)
- DB2(title, author, year)
- DB3(title, review)
(Diagram labels between the two schemas: GAV, LAV)

17 Global As View (GAV)
- Define a global schema of objects and write down rules to collect these objects.
- For each relation R in the mediated schema, we write a query over the sources' relations specifying how to obtain R's tuples from the sources (query unfolding).
  - traditional query processing applies
  - requires the right sources to be available and compliant
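A Python sketch of GAV on the running example relations. The slides do not give the mapping rules, so the two rules below (BookYear as the union of DB1 and DB2, BookRev as a join with DB3) are assumptions chosen for illustration, and all tuples are made up.

```python
# Source relations from the running example, with invented tuples.
DB1 = [("T1", "A1", 1998)]                      # DB1(title, author, year)
DB2 = [("T2", "A2", 1999), ("T1", "A1", 1998)]  # DB2(title, author, year)
DB3 = [("T1", "good"), ("T2", "mixed")]         # DB3(title, review)


def book_year():
    """Assumed GAV rule: BookYear(title, year) is DB1 union DB2, projected."""
    return {(t, y) for t, _a, y in DB1} | {(t, y) for t, _a, y in DB2}


def book_rev():
    """Assumed GAV rule: BookRev(title, author, review) joins DB1/DB2 with DB3."""
    authors = {(t, a) for t, a, _y in DB1 + DB2}
    return {(t, a, r) for t, a in authors for t3, r in DB3 if t == t3}


def reviews_of_year(year):
    """Global query q(title, review) :- BookYear(title, year), BookRev(title, _, review).

    Unfolding means replacing each global relation by its defining query,
    which here is simply a call to the function that implements the rule.
    """
    return sorted({(t, r)
                   for t, y in book_year() if y == year
                   for t2, _a, r in book_rev() if t == t2})


print(reviews_of_year(1998))
```

Because every global relation is defined by an ordinary query, the unfolded query is itself an ordinary query over the sources, which is why traditional query processing applies. Adding a new source, however, means revisiting the rules that should include it.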

18 Local As View (LAV)
- For every information source S, we write a query over the relations in the mediated schema that describes which tuples are found in S (query folding, or answering queries using views).
  - may be able to answer a query based on the available partial information
  - in general, may not be able to answer the query
  - needs non-standard query processing techniques
  - potentially high complexity
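A deliberately simplified Python sketch of the LAV idea. It is not a full algorithm for answering queries using views: each source is described as a projection of a single mediated relation, and a query is limited to one relation and a list of wanted attributes. The SourceDescription type, the source contents, and the answer function are all hypothetical.

```python
from typing import NamedTuple


class SourceDescription(NamedTuple):
    """Describes a source as a view (here: a projection) of one mediated relation."""
    name: str
    global_relation: str
    exported: tuple  # attributes of the mediated relation, in column order
    rows: list       # made-up tuples held by the source


SOURCES = [
    SourceDescription("DB1", "Book", ("title", "author", "year"), [("T1", "A1", 1998)]),
    SourceDescription("DB2", "Book", ("title", "author", "year"), [("T2", "A2", 1999)]),
    SourceDescription("DB3", "BookRev", ("title", "review"), [("T1", "good")]),
]


def answer(relation, wanted):
    """Answer a projection query over one mediated relation using the views.

    Returns None when no source exports every wanted attribute, i.e. the
    query cannot be answered from the available descriptions.
    """
    usable = [s for s in SOURCES
              if s.global_relation == relation and set(wanted) <= set(s.exported)]
    if not usable:
        return None
    result = set()
    for s in usable:
        idx = [s.exported.index(a) for a in wanted]
        result |= {tuple(row[i] for i in idx) for row in s.rows}
    return sorted(result)


print(answer("Book", ("title", "year")))     # partial but usable information
print(answer("Book", ("title", "subject")))  # no source exports subject
```

The answer to the first query is only as complete as the sources are, and the second query cannot be answered at all, which mirrors the two bullets above. Adding a source only requires appending one description; the cost moves into query processing.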

19 Challenges
- Complexity over traditional DBs: heterogeneous, autonomous, network-bound sources
- Query reformulation is now understood
  - map queries over mediated schemas to "wrapped" sources (heterogeneity)
- Issues remain in query processing
  - few statistics (autonomous sources)
  - unanticipated delays and failures (network-bound sources)
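One common way to cope with unanticipated delays and failures is to query the sources concurrently under a shared deadline and return whatever arrived in time. The Python sketch below illustrates that idea only; it is not a technique taken from the slides, and the three source functions and query_all are hypothetical stand-ins for wrapped remote sources.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def fast_source():
    return [("T1", 1998)]


def slow_source():
    time.sleep(0.5)  # simulated network delay
    return [("T2", 1999)]


def broken_source():
    raise ConnectionError("source unreachable")  # simulated failure


def query_all(sources, timeout_s):
    """Collect rows from every source that answers before the deadline.

    Returns (rows, skipped), where skipped lists the sources that timed
    out or failed, so the caller knows the answer may be incomplete.
    """
    rows, skipped = [], []
    # Note: leaving the with-block still waits for slow threads to finish;
    # their results are simply not used.
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fn) for name, fn in sources.items()}
        deadline = time.monotonic() + timeout_s
        for name, fut in futures.items():
            remaining = max(0.0, deadline - time.monotonic())
            try:
                rows.extend(fut.result(timeout=remaining))
            except FutureTimeout:
                skipped.append((name, "timeout"))
            except ConnectionError:
                skipped.append((name, "failed"))
    return rows, skipped


rows, skipped = query_all(
    {"fast": fast_source, "slow": slow_source, "broken": broken_source},
    timeout_s=0.1,
)
print(rows, skipped)
```

Reporting the skipped sources alongside the rows makes the incompleteness of the answer explicit, which ties back to the completeness requirement listed under Additional Dimensions.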

20 Conclusions
Data integration addresses many problems that matter for embedded systems applications:
- many data sources
- easy addition and deletion of sources
- different source capabilities
- dealing with network delays
- ease of use for the user

21 Publications
- Semantic Query Transformation for the Integration of Autonomous Information Sources (INAP'99 – Tokyo)
- IKA: Unity in Heterogeneity (IIWAS'99 – Yogyakarta)
- Information Retrieval Agents for the Intelligent Integration of Information Sources (MulNet 2000 – Bandung)
- A Multilingual Natural Language Interface for Mediating E-Commerce Product Catalogs (INAP2000 – Tokyo)
- Semantic Query Transformation for the Intelligent Integration of Information Sources over the Web (WIIW2001 – Rio de Janeiro)
- Rewriting Rules for Semantic Query Transformation in E-Commerce Applications (DS9 – Hong Kong)
- Data Integration in Digital Libraries: Challenges and Approaches (IndonesiaDL – Bandung)

22 Thank you for your attention!

