INFORMATION RETRIEVAL

DATA WAREHOUSING & INFORMATION RETRIEVAL
Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
P.O. Box 750122
Dallas, Texas 75275-0122
214-768-3087
mhd@engr.smu.edu
The contents of this presentation draw extensively from slides for: Data Mining, Introductory and Advanced Topics, by Margaret H. Dunham, Prentice Hall, 2003.
4/17/07, Tecnológico de Monterrey, SMU CSE 8337

DW&IR Outline
- Introduction
- Data Warehousing
- Research
- Summary

DW&IR Outline
- Introduction
  - Data Warehousing Overview
  - Information Retrieval
- Data Warehousing
- Research
- Summary

Data Warehousing
- “Subject-oriented, integrated, time-variant, nonvolatile” (William Inmon, http://www.inmondatasystems.com/)
- Operational data: data used in the day-to-day needs of the company.
- Informational data: supports other functions such as planning and forecasting.
- Data mining tools often access data warehouses rather than operational data.

Data Warehouse Variations
- Data mart: subset of the complete data warehouse
- Virtual warehouse: warehouse implemented as a view of operational data

Operational vs. Informational

|              | Operational Data   | Data Warehouse |
|--------------|--------------------|----------------|
| Application  | OLTP               | OLAP           |
| Use          | Precise queries    | Ad hoc         |
| Temporal     | Snapshot           | Historical     |
| Modification | Dynamic            | Static         |
| Orientation  |                    | Business       |
| Data         | Operational values | Integrated     |
| Size         | Gigabytes          | Terabytes      |
| Level        | Detailed           | Summarized     |
| Access       | Often              | Less often     |
| Response     | Few seconds        | Minutes        |
| Data schema  | Relational         | Star/Snowflake |

Information Retrieval
- Information Retrieval (IR): retrieving desired information from textual data.
  - Library science
  - Digital libraries
  - Web search engines
- Traditionally keyword based. Sample query: find all documents about “data mining”.
- IR is being applied to other unformatted data.

DB vs IR
- Records (tuples) vs. documents
- Well-defined results vs. fuzzy results
- DB grew out of files and traditional business systems
- IR grew out of library science and the need to categorize, group, and access books and articles

DB vs IR (cont’d)
- Data retrieval
  - Which docs contain a set of keywords?
  - Well-defined semantics
  - A single erroneous object implies failure!
- Information retrieval
  - Information about a subject or topic
  - Semantics is frequently loose
  - Small errors are tolerated
- IR system
  - Interprets the contents of information items
  - Generates a ranking which reflects relevance
  - The notion of relevance is most important
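To make the contrast concrete, here is a minimal sketch of both behaviors over a toy collection. Data retrieval returns exactly the documents that contain every keyword, unranked; IR scores every document by similarity and ranks by relevance. The documents are hypothetical, and the weighting (whitespace tokens, raw term frequency times log inverse document frequency, cosine similarity) is one common choice assumed here, not necessarily the similarity measure used in these slides.

```python
import math
from collections import Counter

# Hypothetical toy collection; IDs and texts are illustrative only.
DOCS = {
    "d1": "data mining finds patterns in data warehouses",
    "d2": "a data warehouse stores historical data",
    "d3": "library science catalogs books and articles",
}

def tokenize(text):
    return text.lower().split()

def boolean_retrieve(docs, keywords):
    """Data retrieval: exactly the docs containing every keyword, unranked."""
    terms = set(tokenize(keywords))
    return sorted(d for d, text in docs.items() if terms <= set(tokenize(text)))

def tfidf(docs):
    """Return per-document TF-IDF vectors (term -> weight) and the IDF table."""
    n = len(docs)
    tokens = {d: tokenize(t) for d, t in docs.items()}
    df = Counter(term for toks in tokens.values() for term in set(toks))
    idf = {term: math.log(n / df[term]) for term in df}
    vectors = {d: {t: c * idf[t] for t, c in Counter(toks).items()}
               for d, toks in tokens.items()}
    return vectors, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def ranked_retrieve(docs, query):
    """Information retrieval: score every doc by similarity, best first."""
    vectors, idf = tfidf(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scored = ((d, cosine(q, v)) for d, v in vectors.items())
    return sorted((p for p in scored if p[1] > 0), key=lambda p: p[1], reverse=True)

print(boolean_retrieve(DOCS, "data mining"))                  # ['d1']
print([d for d, _ in ranked_retrieve(DOCS, "data mining")])  # ['d1', 'd2']
```

The Boolean query silently drops d2, which mentions data but not mining; the ranked query keeps it with a lower score, which is the "small errors are tolerated" behavior described above.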

Information Retrieval (cont’d)
- Similarity: measure of how close a query is to a document.
- Documents which are “close enough” are retrieved.
- Metrics:
  $$\text{Precision} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Retrieved}|}$$
  $$\text{Recall} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Relevant}|}$$
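The two metrics translate directly into set operations. This is a small sketch; the document IDs are hypothetical, and returning 0.0 when a denominator set is empty is an assumed convention, not something the slide specifies.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using the set definitions above."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # |Relevant ∩ Retrieved|
    precision = hits / len(retrieved) if retrieved else 0.0  # assumed convention for empty sets
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments: 4 documents returned, 3 of them relevant, 6 relevant in total.
p, r = precision_recall({"d1", "d2", "d3", "d7"}, {"d1", "d2", "d3", "d4", "d5", "d6"})
print(p, r)  # 0.75 0.5
```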

IR Query Result Measures and Classification

DW&IR Outline
- Introduction
- Data Warehousing
  - Dimensional Modeling
  - OLAP
  - Decision Support Systems
- Research
- Summary

Data Transformation for Data Warehouse
- ETL: Extract, Transform, Load
- Unwanted data must be removed
- Convert heterogeneous sources into one common schema
- As the operational data is probably a snapshot of the data, multiple snapshots may need to be merged to create the historical view
- Summarize data
- Create new derived data
- Handle missing and erroneous data
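A minimal ETL sketch of the steps listed above, assuming two hypothetical operational sources with different schemas, each delivering dated daily-sales snapshots. The record layouts, the `SalesFact` schema, and the policy of dropping missing or negative quantities are all illustrative assumptions; a real pipeline might impute or flag such rows instead.

```python
from dataclasses import dataclass

# Hypothetical operational sources, each a dated snapshot of daily sales.
STORE_A = [  # schema: (sku, qty_sold, day)
    ("P1", 10, "2007-04-01"), ("P2", None, "2007-04-01"), ("TEST", 1, "2007-04-01"),
]
STORE_B = [  # schema: {"product", "units", "as_of"}, quantities as strings
    {"product": "P1", "units": "7", "as_of": "2007-04-02"},
    {"product": "P2", "units": "5", "as_of": "2007-04-02"},
    {"product": "P2", "units": "-3", "as_of": "2007-04-02"},
]

@dataclass(frozen=True)
class SalesFact:  # the common warehouse schema (assumed)
    product: str
    quantity: int
    date: str
    source: str

def extract():
    yield from (("A", r) for r in STORE_A)
    yield from (("B", r) for r in STORE_B)

def transform(source, record):
    # Convert heterogeneous sources into one common schema.
    if source == "A":
        product, qty, date = record
    else:
        product, qty, date = record["product"], record["units"], record["as_of"]
    if product == "TEST":   # remove unwanted data
        return None
    if qty is None:         # missing data: dropped in this sketch
        return None
    qty = int(qty)
    if qty < 0:             # erroneous data: dropped in this sketch
        return None
    return SalesFact(product, qty, date, source)

def load(facts):
    # Merge snapshots: keep every dated row (no overwrite) to build the historical view.
    warehouse = sorted(facts, key=lambda f: (f.product, f.date))
    # Summarize / derive: total quantity sold per product.
    totals = {}
    for f in warehouse:
        totals[f.product] = totals.get(f.product, 0) + f.quantity
    return warehouse, totals

facts = (transform(s, r) for s, r in extract())
warehouse, totals = load(f for f in facts if f is not None)
print(totals)  # {'P1': 17, 'P2': 5}
```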

Data Warehouse Creation Fig 1 [1]

Dimensional Modeling
- View data in a hierarchical manner, more as business executives might
- Useful in decision support systems and mining
- Dimension: collection of logically related attributes; axis for modeling data
- Facts: data stored
- Ex: Dimensions: products, locations, date; Facts: quantity, unit price
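A small sketch of the example above: product and location dimensions with attributes at two hierarchy levels (name/category, city/state), and fact rows holding dimension keys plus the quantity and unit-price measures. All table contents and key values are hypothetical. Grouping by a coarser attribute of each dimension gives the hierarchical, executive-style view.

```python
from collections import defaultdict

# Hypothetical dimension tables, keyed by surrogate IDs.
PRODUCTS = {1: {"name": "Widget", "category": "Hardware"},
            2: {"name": "Gadget", "category": "Hardware"},
            3: {"name": "Manual", "category": "Books"}}
LOCATIONS = {10: {"city": "Dallas", "state": "TX"},
             11: {"city": "Austin", "state": "TX"},
             12: {"city": "Monterrey", "state": "NL"}}

# Fact rows: (product key, location key, date, quantity, unit price).
FACTS = [
    (1, 10, "2007-04-17", 5, 2.0),
    (2, 11, "2007-04-17", 3, 4.0),
    (3, 12, "2007-04-18", 2, 10.0),
    (1, 12, "2007-04-18", 1, 2.0),
]

def sales_by(product_attr, location_attr):
    """Total sales (quantity * unit price) grouped by one attribute of each dimension."""
    totals = defaultdict(float)
    for prod_id, loc_id, _date, qty, price in FACTS:
        key = (PRODUCTS[prod_id][product_attr], LOCATIONS[loc_id][location_attr])
        totals[key] += qty * price
    return dict(totals)

print(sales_by("category", "state"))
# {('Hardware', 'TX'): 22.0, ('Books', 'NL'): 20.0, ('Hardware', 'NL'): 2.0}
```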

Multidimensional Model Example Fig 2 [1]

Cube view of Data Fig 4 [1]

Aggregation Hierarchies

Multidimensional Schemas
- Star schema shows facts and dimensions
  - The center of the star holds the facts, stored in fact tables
  - Around the facts, each dimension is shown separately in its own dimension table
  - The fact table is reached from a dimension table via a join, e.g.:
        SELECT Quantity, Price
        FROM Facts, Location
        WHERE (Facts.LocationID = Location.LocationID) AND (Location.City = 'Dallas')
  - Viewed as relations, the problems are the volume of data and indexing
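The join on this slide can be run as-is against a tiny star schema in SQLite's in-memory database. Only `Facts`, `Location`, `LocationID`, `City`, `Quantity`, and `Price` come from the slide; the `State` and `ProductID` columns, the column types, and the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table (one of the points of the star).
    CREATE TABLE Location (LocationID INTEGER PRIMARY KEY, City TEXT, State TEXT);
    -- Fact table at the center, referencing the dimension by key.
    CREATE TABLE Facts (
        ProductID  INTEGER,
        LocationID INTEGER REFERENCES Location(LocationID),
        Quantity   INTEGER,
        Price      REAL
    );
    INSERT INTO Location VALUES (1, 'Dallas', 'TX'), (2, 'Austin', 'TX');
    INSERT INTO Facts VALUES (100, 1, 5, 2.50), (101, 2, 3, 4.00), (102, 1, 1, 9.99);
""")

# The query from the slide, unchanged apart from straight quotes.
rows = conn.execute("""
    SELECT Quantity, Price
    FROM Facts, Location
    WHERE (Facts.LocationID = Location.LocationID) AND (Location.City = 'Dallas')
""").fetchall()
print(sorted(rows))
```

The query has no ORDER BY, so row order is up to the engine; the example sorts before printing.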

Star Schema

Flattened Star

Normalized Star

Snowflake Schema

OLAP
- Online Analytic Processing (OLAP): supports more complex queries than OLTP
- OnLine Transaction Processing (OLTP): traditional database/transaction processing
- Dimensional data; cube view
- Supports ad hoc querying
- Requires analysis of data
- Can be thought of as an extension of some of the basic aggregation functions available in SQL
- OLAP tools may be used in decision support systems (DSS)
- The multidimensional view is fundamental
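One way to see OLAP as an extension of SQL aggregation: a single GROUP BY computes one summary, while the cube view needs the GROUP BY for every subset of the dimensions (2^n of them), the idea behind the GROUP BY CUBE extension offered by several SQL dialects. The sketch below is plain Python with hypothetical rows and dimension names.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "city", "month")
# Hypothetical fact rows with one numeric measure, "qty".
ROWS = [
    {"product": "Widget", "city": "Dallas", "month": "Apr", "qty": 5},
    {"product": "Widget", "city": "Austin", "month": "Apr", "qty": 3},
    {"product": "Gadget", "city": "Dallas", "month": "May", "qty": 2},
]

def group_by(rows, dims, measure="qty"):
    """Like SQL: SELECT dims, SUM(measure) FROM rows GROUP BY dims."""
    out = defaultdict(int)
    for r in rows:
        out[tuple(r[d] for d in dims)] += r[measure]
    return dict(out)

def cube(rows, dims=DIMENSIONS, measure="qty"):
    """GROUP BY every subset of the dimensions, from the grand total to full detail."""
    return {
        subset: group_by(rows, subset, measure)
        for k in range(len(dims) + 1)
        for subset in combinations(dims, k)
    }

c = cube(ROWS)
print(len(c))          # 8 aggregates for 3 dimensions
print(c[()])           # {(): 10}  (grand total)
print(c[("city",)])    # {('Dallas',): 7, ('Austin',): 3}
```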

OLAP Implementations
- MOLAP (Multidimensional OLAP)
  - Multidimensional database (MDD): specialized DBMS and software system capable of supporting multidimensional data directly
  - Data stored as an n-dimensional array (cube)
  - Indexes used to speed up processing
- ROLAP (Relational OLAP)
  - Data stored in a relational database
  - ROLAP server (middleware) creates the multidimensional view for the user
  - Less complex; less efficient
- HOLAP (Hybrid OLAP)
  - Data not updated frequently: MDD
  - Data updated frequently: RDB
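The MOLAP/ROLAP distinction is mainly about storage. The sketch below stores the same hypothetical sales facts two ways: as a dense three-dimensional array (nested lists addressed by member positions, MOLAP style) and as rows in a relational table queried with SQL on demand (ROLAP style, using SQLite). The function and table names are hypothetical, and real MOLAP engines add compression, sparse storage, and indexes that this sketch omits.

```python
import sqlite3
from itertools import product as cartesian

PRODUCTS = ["Widget", "Gadget"]
CITIES = ["Dallas", "Austin"]
MONTHS = ["Apr", "May"]
SALES = [("Widget", "Dallas", "Apr", 5), ("Widget", "Austin", "Apr", 3),
         ("Gadget", "Dallas", "May", 2)]

# MOLAP style: dense n-dimensional array; each dimension member maps to a position.
index = [{m: i for i, m in enumerate(dim)} for dim in (PRODUCTS, CITIES, MONTHS)]
cube = [[[0 for _ in MONTHS] for _ in CITIES] for _ in PRODUCTS]
for p, c, m, qty in SALES:
    cube[index[0][p]][index[1][c]][index[2][m]] += qty

def molap_cell(p, c, m):
    # Direct positional lookup: no join or scan.
    return cube[index[0][p]][index[1][c]][index[2][m]]

# ROLAP style: the same facts as relational rows; the cube view is computed by SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (product TEXT, city TEXT, month TEXT, qty INTEGER)")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", SALES)

def rolap_cell(p, c, m):
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM Sales"
        " WHERE product = ? AND city = ? AND month = ?",
        (p, c, m),
    ).fetchone()
    return total

# Both storage schemes answer every cell query identically.
assert all(molap_cell(*k) == rolap_cell(*k) for k in cartesian(PRODUCTS, CITIES, MONTHS))
print(molap_cell("Widget", "Dallas", "Apr"), rolap_cell("Widget", "Dallas", "Apr"))  # 5 5
```

The dense array allocates every cell (8 here for only 3 facts), which is why sparsity matters for MOLAP, while the relational table stores only the facts that exist but must aggregate at query time.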

OLAP Operations
- Roll Up
- Drill Down
- Single Cell
- Multiple Cells
- Slice
- Dice

OLAP Operations
- Simple query: a single cell in the cube
- Slice: look at a subcube to get more specific information
- Dice: rotate the cube to look at another dimension (many texts call this rotation a pivot and reserve dice for selecting a subcube on two or more dimensions)
- Roll up: dimension reduction; aggregation
- Drill down: the reverse of roll up, moving to more detailed data
- Visualization: these operations allow OLAP users to actually “see” the results of an operation
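A minimal sketch of these operations on a small cube stored as a dictionary from (product, city, month) to quantity. The cube contents and the city-to-state hierarchy are hypothetical. The rotation that the slide calls Dice is written here as `pivot`, the name many OLAP texts use for it.

```python
from collections import defaultdict

# Hypothetical base cube: (product, city, month) -> quantity sold.
CUBE = {
    ("Widget", "Dallas", "Apr"): 5,
    ("Widget", "Austin", "Apr"): 3,
    ("Gadget", "Dallas", "May"): 2,
    ("Gadget", "Monterrey", "May"): 4,
}
DIMS = ("product", "city", "month")
CITY_TO_STATE = {"Dallas": "TX", "Austin": "TX", "Monterrey": "NL"}  # assumed hierarchy

def cell(cube, *coords):
    """Simple query: the value of a single cell."""
    return cube.get(tuple(coords), 0)

def slice_(cube, dim, value):
    """Slice: fix one dimension to a value, giving a lower-dimensional subcube."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}

def pivot(cube, order):
    """Rotate: reorder the axes so a different dimension comes first."""
    perm = [DIMS.index(d) for d in order]
    return {tuple(k[i] for i in perm): v for k, v in cube.items()}

def roll_up_city(cube):
    """Roll up: aggregate from the city level to the state level."""
    out = defaultdict(int)
    for (p, c, m), v in cube.items():
        out[(p, CITY_TO_STATE[c], m)] += v
    return dict(out)

def drill_down_state(cube, state):
    """Drill down: go from a state total back to the city-level cells behind it."""
    return {k: v for k, v in cube.items() if CITY_TO_STATE[k[1]] == state}

print(cell(CUBE, "Widget", "Dallas", "Apr"))  # 5
print(slice_(CUBE, "month", "Apr"))           # {('Widget', 'Dallas'): 5, ('Widget', 'Austin'): 3}
print(roll_up_city(CUBE))
# {('Widget', 'TX', 'Apr'): 8, ('Gadget', 'TX', 'May'): 2, ('Gadget', 'NL', 'May'): 4}
```

Drill down is applied to the base cube here because the detailed cells must still be available; once only the rolled-up cube is kept, there is nothing finer to drill into.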

Relationship Between Topics

Decision Support Systems
- Tools and computer systems that assist management in decision making
- What-if questions
- High-level decisions
- Data warehouse: data which supports DSS

Data Warehouse Links
- OLAP
  - http://www.olapreport.com/
- General Data Warehousing
  - http://www.inmoncif.com/home/
  - http://www.datawarehouseconsulting.com/
  - http://www.datawarehousing.com/
  - http://www.dw-institute.com/
- DW Products
  - http://www-306.ibm.com/software/data/informix/redbrick/
  - http://www.oracle.com/solutions/business_intelligence/dw_home.html
  - http://www.sas.com/technologies/dw/index.html
  - http://msdn2.microsoft.com/en-us/library/aa545535.aspx
  - http://www.sybase.com/detail?id=1027323
- Interesting Articles
  - “Teaching Effective Methodologies to Design a Data Warehouse,” by Behrooz Seyed-Abbassi, http://isedj.org/isecon/2001/35c/ISECON.2001.Seyed-Abbassi.pdf
  - “An Oracle DBA’s Guide to the OLAP Option,” by Mark Rittman, http://www.dbazine.com/datawarehouse/dw-articles/rittman1

DW&IR Outline
- Introduction
- Data Warehousing
- Research
  - Bibliomining
  - Hierarchical Multimedia IR
  - Ontology-based OLAP & IR
- Summary

Bibliomining [2,3]
- Data Warehousing + Data Mining + Libraries
- Abstract, cleanse, and summarize library data:
  - Documents
  - Users (including demographics)
  - Circulation records (including Web server records)
- Privacy of utmost importance
- http://www.bibliomining.com/nicholson/biblioprocess.htm [2]
- http://bibliomining.com/nicholson/nicholsonbibliointro.html [3]
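A minimal sketch of the abstract, cleanse, and summarize step under a privacy constraint: direct identifiers are dropped before circulation data reaches the warehouse, demographic attributes are kept, and usage is summarized by demographic group. Every field name, record, and the key are hypothetical, and keyed pseudonyms (HMAC-SHA256 from the standard library) are one common technique assumed here, not necessarily the procedure described in [2,3]. Pseudonymized data can still be re-identifiable, so a real deployment would need further safeguards.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret held outside the warehouse; never stored with the data.
SECRET_KEY = b"replace-with-a-locally-held-secret"

# Hypothetical operational circulation records containing personal identifiers.
CIRCULATION = [
    {"patron_id": "P-1001", "name": "A. Reader", "status": "student", "dept": "CSE", "item": "item-17"},
    {"patron_id": "P-1001", "name": "A. Reader", "status": "student", "dept": "CSE", "item": "item-42"},
    {"patron_id": "P-2002", "name": "B. Scholar", "status": "faculty", "dept": "EMIS", "item": "item-17"},
]

def pseudonym(patron_id):
    """Keyed hash: links one patron's records without revealing who the patron is."""
    return hmac.new(SECRET_KEY, patron_id.encode(), hashlib.sha256).hexdigest()[:16]

def abstract(record):
    """Keep demographic surrogates and the item; drop direct identifiers (id, name)."""
    return {"patron": pseudonym(record["patron_id"]), "status": record["status"],
            "dept": record["dept"], "item": record["item"]}

warehouse_rows = [abstract(r) for r in CIRCULATION]
# Summarize: circulation counts per demographic group.
usage_by_group = Counter((r["status"], r["dept"]) for r in warehouse_rows)
print(usage_by_group)  # Counter({('student', 'CSE'): 2, ('faculty', 'EMIS'): 1})
```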

Hierarchical Multimedia IR [4]
- DW approach to multimedia IR
  - Allows easier integration of multiple data types
  - Facilitates indexing
  - Facilitates searching
  - Allows data to be stored at many different granularities and dimensions
  - Data aggregation
- “data warehouses are not just large databases; they are large, complex environments that integrate many technologies” (p. 729)
- Multimedia starflake schema
  - Denormalized star dimension table
  - Normalized snowflake tables
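To illustrate what a starflake schema mixes, the sketch below builds one in SQLite: a denormalized (star) media dimension holding all its attributes in a single table, a normalized (snowflake) feature dimension split into a feature table and a feature-class table, and a fact table of feature measurements. The table names, columns, and values are hypothetical and are not the schema from [4]; the final query aggregates to a coarser granularity (feature class) by walking the snowflaked hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized (star) dimension: all media attributes in one table.
    CREATE TABLE MediaDim (
        MediaID INTEGER PRIMARY KEY, Title TEXT, MediaType TEXT, Format TEXT
    );
    -- Normalized (snowflake) dimension: feature hierarchy split into linked tables.
    CREATE TABLE FeatureClass (ClassID INTEGER PRIMARY KEY, ClassName TEXT);
    CREATE TABLE FeatureDim (
        FeatureID INTEGER PRIMARY KEY, FeatureName TEXT,
        ClassID INTEGER REFERENCES FeatureClass(ClassID)
    );
    -- Fact table: one row per (media item, feature) measurement.
    CREATE TABLE FeatureFacts (
        MediaID INTEGER REFERENCES MediaDim(MediaID),
        FeatureID INTEGER REFERENCES FeatureDim(FeatureID),
        FeatureValue REAL
    );
    INSERT INTO MediaDim VALUES (1, 'Sunset', 'image', 'JPEG'), (2, 'Harbor', 'image', 'PNG');
    INSERT INTO FeatureClass VALUES (1, 'color'), (2, 'texture');
    INSERT INTO FeatureDim VALUES (1, 'mean_red', 1), (2, 'mean_blue', 1), (3, 'coarseness', 2);
    INSERT INTO FeatureFacts VALUES
        (1, 1, 0.8), (1, 2, 0.3), (1, 3, 0.5), (2, 1, 0.2), (2, 2, 0.7);
""")

# Coarser granularity: average feature value per media item and feature class.
rows = conn.execute("""
    SELECT m.Title, c.ClassName, AVG(f.FeatureValue)
    FROM FeatureFacts f
    JOIN MediaDim m ON f.MediaID = m.MediaID
    JOIN FeatureDim d ON f.FeatureID = d.FeatureID
    JOIN FeatureClass c ON d.ClassID = c.ClassID
    GROUP BY m.Title, c.ClassName
    ORDER BY m.Title, c.ClassName
""").fetchall()
for title, cls, avg in rows:
    print(title, cls, round(avg, 2))
```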

Starflake Fig 2 [4]

Hierarchy of Data Cubes Fig 4 [4]

Ontology-Based OLAP & IR [5]
- Combine structured and document data obtained from the Web
- Global ontology
  - Includes OLAP dimensions
  - Contains resource metadata
  - RDF based
- IR based on RDF metadata: both queries and resources are represented as RDF metadata
- http://www.w3.org/RDF/
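The core idea, that queries and resources are both RDF metadata, can be sketched as matching query triple patterns against resource triples, much like a basic graph pattern in SPARQL. This is a hand-rolled illustration with a hypothetical `ex:` vocabulary and made-up resources, not the system or ontology from [5]; a real implementation would use an RDF store and SPARQL. The `ex:partOf` triple stands in for an OLAP dimension hierarchy expressed in RDF.

```python
# Hypothetical triples (subject, predicate, object); "ex:" names are illustrative only.
TRIPLES = [
    ("ex:report42", "rdf:type", "ex:Document"),
    ("ex:report42", "ex:aboutRegion", "ex:Texas"),
    ("ex:report42", "ex:aboutProduct", "ex:Widget"),
    ("ex:memo7", "rdf:type", "ex:Document"),
    ("ex:memo7", "ex:aboutRegion", "ex:NuevoLeon"),
    ("ex:Texas", "ex:partOf", "ex:USA"),  # dimension hierarchy as RDF
]

def match(pattern, triple, binding):
    """Unify one pattern (terms starting with '?' are variables) with a triple."""
    b = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if b.setdefault(p, t) != t:  # variable already bound to something else
                return None
        elif p != t:
            return None
    return b

def query(patterns, triples=TRIPLES):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pat in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                b2 = match(pat, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings

# Documents about any region that rolls up to the USA.
print(query([("?doc", "rdf:type", "ex:Document"),
             ("?doc", "ex:aboutRegion", "?r"),
             ("?r", "ex:partOf", "ex:USA")]))
# [{'?doc': 'ex:report42', '?r': 'ex:Texas'}]
```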

Ontology OLAP&IR Architecture Fig 1 [5]

OLAP Dimensions in RDF Fig 2 [5]

RDF Query Fig 6 [5]

DW&IR Outline
- Introduction
- Data Warehousing
- Research
- Summary

Summary
- Information retrieval is being extended to many different data types:
  - Multimedia
  - Data warehouse
- Data warehousing is being extended beyond the basic business domain
- Little research in combining DW and IR
  - “Integrating Unstructured Text into the Structured Environment: The Value Proposition,” by Bill Inmon, http://www.inmondatasystems.com/whitepapers/integratingunstructured.pdf

Bibliography
[1] Anne-Muriel Arigon, Anne Tchounikine, and Maryvonne Miquel, “Handling Multiple Points of View in a Multimedia Data Warehouse,” ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 2, No. 3, August 2006, pp. 199–218.
[2] S. Nicholson, “The Bibliomining Process: Data Warehousing and Data Mining for Library Decision-Making,” Information Technology and Libraries, Vol. 22, No. 4, 2003.
[3] S. Nicholson, “The Basis for Bibliomining: Frameworks for Bringing Together Usage-Based Data Mining and Bibliometrics through Data Warehousing in Digital Library Services,” Information Processing & Management, Vol. 42, No. 3, May 2006, pp. 785–804.
[4] Jane You, Tharam Dillon, James Liu, and Edwige Pissaloux, “On Hierarchical Multimedia Information Retrieval,” Proceedings of the 2001 International Conference on Image Processing, 7–10 October 2001, pp. 729–732.
[5] Torsten Priebe and Gunther Pernul, “Ontology-based Integration of OLAP and Information Retrieval,” Proceedings of the 14th International Workshop on Database and Expert Systems Applications, 2003.

Thank You