I. Khalil Ibrahim1 Data Integration in Digital Libraries: Approaches and Challenges
Bringing Digital Libraries together
Dr. Ismail Khalil Ibrahim


I. Khalil Ibrahim2 Biography
Dr. Ismail Khalil Ibrahim is a senior software developer and AgenCom project manager at the Software Competence Center Hagenberg, Austria. He previously worked as a lecturer at the University of Technology, Baghdad, Iraq; as head of the academic studies department at the Human Resources Training and Development Institute, Iraq; and as a teaching and research assistant at Gadjah Mada University.
His main research interests lie in the fields of E-commerce & I-Commerce, Database Applications and Techniques for the Web, Practical Experience and Applications in Information Integration Systems, Logic Programming for Information Integration, Agents for Information Retrieval and Knowledge Discovery, XML and Semistructured Data Management, Information Systems Management and Development, and Information Technology: Impact, Economic Analysis.
Ismail is a member of ACM, SIGMOD, SIGKDD, and SIGecom; General Secretary of the Indonesian Information Society Initiative (IISI); a member of the Iraqi Engineers Association (IEA); an overseas collaborator in the E-commerce Lab at the National University of Singapore; a member of the editorial board of the Colombian Journal of Computing, "Revista Colombiana de Computación"; chairman of the organizing committee of the 1st and 2nd International Workshop on Information Integration and Web-based Applications & Services (IIWAS'99, IIWAS'00), Yogyakarta, Indonesia; and chairman of the organizing committee of the 3rd International Conference on Information Integration and Web-based Applications & Services (IIWAS'2001), Linz, Austria.
Ismail holds a B.Sc. in Electrical Engineering from the University of Technology, Iraq (1985), and an M.Sc. and a Ph.D. in Computer Engineering and Information Systems from Gadjah Mada University (1998, 2001).

I. Khalil Ibrahim3 Outline
Data Integration
- What is it?
- What does a data integration system look like?
- What are some data integration challenges?

I. Khalil Ibrahim4 What Is Data Integration?
Providing:
- uniform: sources are transparent to the user
- access: query, and eventually updates
- multiple: even two sources are a problem
- autonomous: integration must not affect the behavior of the sources
- heterogeneous: different data models and schemas
- unstructured: at least semi-structured
- information sources: not only databases

I. Khalil Ibrahim5 Example Scenario
s1(Title, Author, Subject)
s2(ISBN, Title, Publisher)

I. Khalil Ibrahim6 Example Scenario (cont.)
Retrieve the titles and subjects of all the technical reports written by Stephane Bressan and published by MIT Press:
- q1: amazon(Title, "Stephane Bressan", Subject)
- q2: book-a-million(ISBN, Title, "MIT Press")
- Join the results
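
The two sub-queries and the final join can be sketched in a few lines of Python. This is only an illustration: the in-memory lists stand in for the two sources, the sample rows are invented, and the function names are hypothetical rather than anything taken from the slides.

```python
# Hypothetical in-memory stand-ins for the two sources (rows are made up).
# Layout follows s1(Title, Author, Subject), queried as "amazon" above.
S1 = [
    ("Intro to Mediation", "Stephane Bressan", "Databases"),
    ("Web Wrappers", "Stephane Bressan", "Web"),
    ("Other Book", "Someone Else", "Misc"),
]
# Layout follows s2(ISBN, Title, Publisher), queried as "book-a-million" above.
S2 = [
    ("111", "Intro to Mediation", "MIT Press"),
    ("222", "Web Wrappers", "Other Press"),
    ("333", "Other Book", "MIT Press"),
]


def q1(author):
    """Sub-query on the first source: (Title, Subject) pairs for one author."""
    return [(title, subject) for (title, a, subject) in S1 if a == author]


def q2(publisher):
    """Sub-query on the second source: titles from one publisher."""
    return {title for (_isbn, title, p) in S2 if p == publisher}


def titles_and_subjects(author, publisher):
    """Join the two partial results on Title."""
    published = q2(publisher)
    return [(title, subject) for (title, subject) in q1(author) if title in published]


result = titles_and_subjects("Stephane Bressan", "MIT Press")
print(result)
```

Neither source can answer the request alone: the first knows authors and subjects, the second knows publishers, and Title is the only shared attribute to join on.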

I. Khalil Ibrahim7 So What is the Problem?
- Virtual vs. materialized architectures
- Access: query only, or query and update?
  - the problem is similar to updating through views
  - updates need distributed transactional services
- Mediated schema: yes or no?
  - without a mediated schema we lose its advantages
  - a mediated schema requires schema integration
  - schema integration needs query transformation
  - query transformation needs query optimization

I. Khalil Ibrahim8 Additional Dimensions
- How many sources are we accessing?
  - how autonomous are the sources?
  - how much knowledge do we have about the sources?
  - how structured are the data in the sources?
- Requirements from responses:
  - accuracy
  - completeness
  - machine readable vs. human readable
  - handling inconsistencies
  - speed
  - Closed World Assumption vs. Open World Assumption

I. Khalil Ibrahim9 Related Technologies / Issues
- Distributed databases
  - sources are homogeneous
  - data is distributed a priori
  - sources are not autonomous
  - similarities at the optimization and execution level
- Information retrieval
  - keyword search
  - no semantics
- Data mining: discovering properties and patterns in data

I. Khalil Ibrahim10 Current Applications
- Intranets
  - enterprise data integration
  - web-site construction
- World Wide Web
  - digital libraries
  - comparison shopping (Netbot, Junglee)
  - portals integrating data from multiple resources
  - XML integration
- Science & Culture
  - medical genetics: integrating genomic data
  - astrophysics: monitoring events in the sky
  - environment: Puget Sound Regional Synthesis Model
  - culture: uniform access to all the cultural databases

I. Khalil Ibrahim11 Paradigms of Data Integration
[Figure: classification of integration approaches. Labels: global defined from local; global "independent" of local; CWA; global-schema-as-view; OWA; global-as-view-of-local; local-as-view-of-global; Database Schema Integration; Data Warehousing; Mediation.]

I. Khalil Ibrahim12 Paradigms of Data Integration II
Data Warehousing (materialization architecture)
- data of interest is collected in a central place and a web site is built on top of it
- queries are applied to the data warehouse
- advantage: easy to support queries and transactions
- drawback: hard to modify; the warehouse is not connected to the providers of information, etc.

I. Khalil Ibrahim13 Data Warehousing Architecture
[Figure components: Data Source (three), Wrapper, Data Extraction, Data Warehouse, Application.]
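
A minimal sketch of the materialized approach, assuming two hypothetical sources with different layouts and an in-memory SQLite database as the warehouse. The source contents, the wrapper functions, and the `book` table are illustrative assumptions, not part of the original slides.

```python
import sqlite3

# Hypothetical sources with different layouts (rows are made up).
SOURCE_A = [{"title": "Web Wrappers", "author": "A. Author", "year": 1999}]
SOURCE_B = [("Mediators", "B. Writer", "2001")]  # tuples, year stored as text


def wrap_a(rows):
    """Wrapper for source A: dicts -> (title, author, year) tuples."""
    for r in rows:
        yield (r["title"], r["author"], int(r["year"]))


def wrap_b(rows):
    """Wrapper for source B: converts the textual year to an integer."""
    for title, author, year in rows:
        yield (title, author, int(year))


def build_warehouse():
    """Extract from every source once and load the rows into one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (title TEXT, author TEXT, year INTEGER)")
    for wrapper, source in ((wrap_a, SOURCE_A), (wrap_b, SOURCE_B)):
        conn.executemany("INSERT INTO book VALUES (?, ?, ?)", wrapper(source))
    conn.commit()
    return conn


conn = build_warehouse()
# Queries run against the local copy only; the sources are not contacted.
recent = [row[0] for row in conn.execute(
    "SELECT title FROM book WHERE year >= 2000 ORDER BY title")]
print(recent)
```

Because the data is copied at load time, a row added to a source afterwards is invisible to the warehouse until the next extraction run.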

I. Khalil Ibrahim14 Paradigms of Data Integration III
Information Mediation (virtual architecture)
- data remains in the web sources
- rules relate external data to internal application data
- advantage: data is not replicated, so it is guaranteed to be up to date
- drawback: query optimization and execution are more complex

I. Khalil Ibrahim15 Mediation Architecture
[Figure components: Application, Global Data Model, Query Execution Engine, Catalog, Local Data Model, Wrapper (two), Data Source (two).]
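
The virtual approach can be sketched as a mediator that keeps a catalog of wrappers and contacts the sources only when a query arrives. The `Mediator` class, the wrappers, and the sample rows below are hypothetical and serve only to illustrate the idea.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[str, str, int]  # (title, author, year) in the global data model

# Hypothetical live sources in their own local formats (rows are made up).
SOURCE_A = [{"title": "Web Wrappers", "author": "A. Author", "year": 1999}]
SOURCE_B = [("Mediators", "B. Writer", "2001")]


def wrap_a() -> Iterable[Row]:
    """Wrapper: translate source A's dicts into the global data model."""
    return [(r["title"], r["author"], int(r["year"])) for r in SOURCE_A]


def wrap_b() -> Iterable[Row]:
    """Wrapper: translate source B's tuples (textual year) likewise."""
    return [(t, a, int(y)) for (t, a, y) in SOURCE_B]


class Mediator:
    """Holds a catalog of wrappers and evaluates queries at request time."""

    def __init__(self) -> None:
        self.catalog: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, wrapper: Callable[[], Iterable[Row]]) -> None:
        self.catalog[name] = wrapper

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        answers: List[Row] = []
        for wrapper in self.catalog.values():
            answers.extend(row for row in wrapper() if predicate(row))
        return answers


mediator = Mediator()
mediator.register("A", wrap_a)
mediator.register("B", wrap_b)

before = mediator.query(lambda row: row[2] >= 2000)
SOURCE_B.append(("XML Integration", "C. Editor", "2002"))  # source changes
after = mediator.query(lambda row: row[2] >= 2000)         # change is visible
print(before, after)
```

Nothing is copied, so the second query sees the new row immediately; the price is that every query pays the cost of reaching the sources.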

I. Khalil Ibrahim16 Running Example
World Relations:
- Book(title, year, author, subject)
- BookYear(title, year)
- BookRev(title, author, review)
Source Relations:
- DB1(title, author, year)
- DB2(title, author, year)
- DB3(title, review)
[Figure labels between the two groups: GAV, LAV]

I. Khalil Ibrahim17 Global As View (GAV)
- Define a global schema of objects and write down rules to collect these objects.
- For each relation R in the mediated schema, we write a query over the sources' relations specifying how to obtain R's tuples from the sources (query unfolding).
- advantage: traditional query processing applies
- drawback: requires the right sources to be available and compliant
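
As a sketch of query unfolding on the running example, each mediated relation below is written as a Python function over the source relations. The two mapping rules (BookYear as the union of DB1 and DB2, BookRev as the join of DB1 and DB3 on title) are assumptions chosen for illustration; the slides do not state the actual rules, and the sample rows are invented.

```python
# Source relations from the running example (rows are made up).
DB1 = [("Mediators", "B. Writer", 2001)]     # DB1(title, author, year)
DB2 = [("Web Wrappers", "A. Author", 1999)]  # DB2(title, author, year)
DB3 = [("Mediators", "A clear overview")]    # DB3(title, review)


def book_year():
    """Assumed GAV rule: BookYear(title, year) is DB1 union DB2, projected."""
    return {(t, y) for (t, _a, y) in DB1} | {(t, y) for (t, _a, y) in DB2}


def book_rev():
    """Assumed GAV rule: BookRev(title, author, review) is DB1 join DB3 on title."""
    return {(t, a, r) for (t, a, _y) in DB1 for (t3, r) in DB3 if t == t3}


def reviews_since(year):
    """Query over the mediated schema: reviews of books published from `year` on.

    Evaluating it simply replaces BookRev and BookYear by their definitions
    over the sources, which is what query unfolding amounts to.
    """
    return sorted(
        (t, r)
        for (t, _a, r) in book_rev()
        for (t2, y) in book_year()
        if t == t2 and y >= year
    )


print(reviews_since(2000))
```

The user query never mentions DB1, DB2, or DB3; it only becomes a query over the sources once the mediated relations are expanded.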

I. Khalil Ibrahim18 Local As View (LAV)
- For every information source S, we write a query over the relations in the mediated schema that describes which tuples are found in S (query folding, or answering queries using views).
- may be able to answer a query based on the available partial information
- in general, may not be able to answer the query
- needs non-standard query processing techniques
- potentially high complexity
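
In the local-as-view style the descriptions point the other way: each source is described in terms of the mediated schema. The sketch below covers only a much-simplified source-selection step, not a full rewriting algorithm, and the view descriptions are assumptions made for the running example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class SourceView:
    name: str
    over: str                # mediated relation the source is described over
    exports: FrozenSet[str]  # attributes of that relation the source exposes


# Assumed descriptions: DB1 and DB2 hold parts of Book, DB3 parts of BookRev.
VIEWS = [
    SourceView("DB1", "Book", frozenset({"title", "author", "year"})),
    SourceView("DB2", "Book", frozenset({"title", "author", "year"})),
    SourceView("DB3", "BookRev", frozenset({"title", "review"})),
]


def relevant_sources(relation: str, needed: Iterable[str]) -> List[str]:
    """Sources whose description covers `relation` and exports every needed attribute."""
    wanted = frozenset(needed)
    return [v.name for v in VIEWS if v.over == relation and wanted <= v.exports]


print(relevant_sources("Book", {"title", "year"}))
print(relevant_sources("Book", {"title", "subject"}))
```

The second call returns an empty list because no source exports `subject`, which mirrors the point above that a LAV system may be unable to answer some queries. Adding a new source only requires appending one more description to `VIEWS`.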

I. Khalil Ibrahim19 Challenges
- Complexity over traditional DBs: heterogeneous, autonomous, network-bounded sources
- Query reformulation is now understood
  - map queries over mediated schemas to "wrapped" sources (heterogeneity)
- Issues remain in query processing
  - few statistics (autonomous sources)
  - unanticipated delays and failures (network-bounded sources)
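
One common way to cope with slow or unreachable sources is to query them concurrently with a timeout and return a partial answer. The sketch below uses Python's standard `concurrent.futures`; the three source functions, the delay, and the timeout value are made up for illustration and are not taken from the slides.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def fast_source():
    return [("Mediators", 2001)]


def slow_source():
    time.sleep(0.5)  # simulated network delay
    return [("Web Wrappers", 1999)]


def failing_source():
    raise ConnectionError("source unreachable")


def query_all(sources, timeout_s):
    """Ask every source in parallel; keep what arrives in time, note the rest."""
    answers, skipped = [], []
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = {name: pool.submit(fn) for name, fn in sources.items()}
    for name, future in futures.items():
        try:
            answers.extend(future.result(timeout=timeout_s))
        except (FutureTimeout, ConnectionError):
            skipped.append(name)  # too slow or unreachable: answer stays partial
    pool.shutdown(wait=False)  # do not block on sources that are still running
    return answers, skipped


answers, skipped = query_all(
    {"fast": fast_source, "slow": slow_source, "failing": failing_source},
    timeout_s=0.1,
)
print(answers, skipped)
```

Returning the list of skipped sources alongside the answers lets the caller judge how complete the result is.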

I. Khalil Ibrahim20 Conclusions
Data integration addresses many needs of embedded systems applications:
- Many data sources
- Easy addition and deletion of sources
- Different source capabilities
- Dealing with network delays
- Easy for the user

I. Khalil Ibrahim21 Publications
- Semantic Query Transformation for the Integration of Autonomous Information Sources (INAP'99, Tokyo)
- IKA: Unity in Heterogeneity (IIWAS'99, Yogyakarta)
- Information Retrieval Agents for the Intelligent Integration of Information Sources (MulNet, Bandung)
- A Multilingual Natural Language Interface for Mediating E-Commerce Product Catalogs (INAP2000, Tokyo)
- Semantic Query Transformation for the Intelligent Integration of Information Sources over the Web (WIIW2001, Rio de Janeiro)
- Rewriting Rules for Semantic Query Transformation in E-Commerce Applications (DS9, Hong Kong)
- Data Integration in Digital Libraries: Challenges and Approaches (IndonesiaDL, Bandung)

I. Khalil Ibrahim22 Thank you for your attention!