IS 257 – Fall 2009 (2009.11.03): JDBC and Java Access to DBMS & Introduction to Data Warehouses. University of California, Berkeley, School of Information.


SLIDE 1IS 257 – Fall 2009 JDBC and Java Access to DBMS & Introduction to Data Warehouses University of California, Berkeley School of Information IS 257: Database Management

SLIDE 2IS 257 – Fall 2009 Lecture Outline Review: –Object-Relational DBMS –OR features in Oracle –OR features in PostgreSQL –Extending OR databases (examples from PostgreSQL) Java and JDBC Introduction to Data Warehouses

SLIDE 3IS 257 – Fall 2009 Lecture Outline Object-Relational DBMS –OR features in Oracle –OR features in PostgreSQL Extending OR databases (examples from PostgreSQL) Java and JDBC Introduction to Data Warehouses

SLIDE 4IS 257 – Fall 2009 Object Relational Data Model Class, instance, attribute, method, and integrity constraints OID per instance Encapsulation Multiple inheritance hierarchy of classes Class references via OID object references Set-Valued attributes Abstract Data Types

SLIDE 5 IS 257 – Fall 2009 Object Relational Extended SQL (Illustra)
    CREATE TABLE tablename {OF TYPE Typename}|{OF NEW TYPE typename}
        (attr1 type1, attr2 type2, …, attrn typen)
        {UNDER parent_table_name};
    CREATE TYPE typename
        (attribute_name type_desc, attribute2 type2, …, attrn typen);
    CREATE FUNCTION functionname (type_name, type_name)
        RETURNS type_name AS sql_statement

SLIDE 6 IS 257 – Fall 2009 Object-Relational SQL in ORACLE
    CREATE [OR REPLACE] TYPE typename AS OBJECT (attr_name attr_type, …);
    CREATE TABLE tablename OF typename;

SLIDE 7 IS 257 – Fall 2009 Example
    CREATE TYPE ANIMAL_TY AS OBJECT (
        Breed     VARCHAR2(25),
        Name      VARCHAR2(25),
        Birthdate DATE);
Creates a new type
    CREATE TABLE Animal OF Animal_ty;
Creates an “Object Table”

SLIDE 8 IS 257 – Fall 2009 Constructor Functions
    INSERT INTO Animal VALUES
        (ANIMAL_TY('Mule', 'Frances', TO_DATE('01-APR-1997', 'DD-MON-YYYY')));
Inserts a new ANIMAL_TY object into the table

SLIDE 9IS 257 – Fall 2009 PostgreSQL Classes The fundamental notion in Postgres is that of a class, which is a named collection of object instances. Each instance has the same collection of named attributes, and each attribute is of a specific type. Furthermore, each instance has a permanent object identifier (OID) that is unique throughout the installation. Because SQL syntax refers to tables, we will use the terms table and class interchangeably. Likewise, an SQL row is an instance and SQL columns are attributes.

SLIDE 10 IS 257 – Fall 2009 Creating a Class You can create a new class by specifying the class name, along with all attribute names and their types:
    CREATE TABLE weather (
        city    varchar(80),
        temp_lo int,   -- low temperature
        temp_hi int,   -- high temperature
        prcp    real,  -- precipitation
        date    date
    );

SLIDE 11IS 257 – Fall 2009 PostgreSQL Postgres can be customized with an arbitrary number of user-defined data types. Consequently, type names are not syntactical keywords, except where required to support special cases in the SQL92 standard. So far, the Postgres CREATE command looks exactly like the command used to create a table in a traditional relational system. However, we will presently see that classes have properties that are extensions of the relational model.

SLIDE 12 IS 257 – Fall 2009 Inheritance
    CREATE TABLE cities (
        name       text,
        population float,
        altitude   int   -- (in ft)
    );
    CREATE TABLE capitals (
        state char(2)
    ) INHERITS (cities);

SLIDE 13IS 257 – Fall 2009 Inheritance In Postgres, a class can inherit from zero or more other classes. A query can reference either –all instances of a class –or all instances of a class plus all of its descendants

SLIDE 14 IS 257 – Fall 2009 Non-Atomic Values - Arrays The preceding SQL command (not reproduced in this transcript) will create a class named SAL_EMP with a text string (name), a one-dimensional array of int4 (pay_by_quarter), which represents the employee's salary by quarter, and a two-dimensional array of text (schedule), which represents the employee's weekly schedule. Now we do some INSERTs; note that when appending to an array, we enclose the values within braces and separate them by commas.

SLIDE 15 IS 257 – Fall 2009 PostgreSQL Extensibility
Postgres is extensible because its operation is catalog-driven
–RDBMSs store information about databases, tables, columns, etc., in what are commonly known as system catalogs. (Some systems call this the data dictionary.)
One key difference between Postgres and standard RDBMSs is that Postgres stores much more information in its catalogs
–not only information about tables and columns, but also information about its types, functions, access methods, etc.
These classes can be modified by the user, and since Postgres bases its internal operation on these classes, this means that Postgres can be extended by users
–By comparison, conventional database systems can only be extended by changing hardcoded procedures within the DBMS or by loading modules specially written by the DBMS vendor.

SLIDE 16 IS 257 – Fall 2009 Rules System
    CREATE RULE name AS ON event
        TO object [ WHERE condition ]
        DO [ INSTEAD ] [ action | NOTHING ]
Rules can be triggered by any event (select, update, delete, etc.)

SLIDE 17 IS 257 – Fall 2009 Views as Rules Views in Postgres are implemented using the rule system. In fact there is absolutely no difference between a
    CREATE VIEW myview AS SELECT * FROM mytab;
compared against the two commands
    CREATE TABLE myview (same attribute list as for mytab);
    CREATE RULE "_RETmyview" AS ON SELECT TO myview DO INSTEAD
        SELECT * FROM mytab;

SLIDE 18 IS 257 – Fall 2009 Extensions to Indexing Access Method extensions in Postgres GiST: Generalized Search Trees –Joe Hellerstein, UC Berkeley

SLIDE 19 IS 257 – Fall 2009 Indexing in OO/OR Systems
Quick access to user-defined objects
Support queries natural to the objects
Two previous approaches
–Specialized Indices (“ABCDEFG-trees”): redundant code (most trees are very similar); concurrency control, etc. tricky!
–Extensible B-trees & R-trees (Postgres/Illustra): B-tree or R-tree lookups only! E.g. WHERE movie.video < 'Terminator 2'

SLIDE 20IS 257 – Fall 2009 GiST Approach A generalized search tree. Must be: Extensible in terms of queries General (B+-tree, R-tree, etc.) Easy to extend Efficient (match specialized trees) Highly concurrent, recoverable, etc.

SLIDE 21IS 257 – Fall 2009 GiST Applications New indexes needed for new apps... –find all supersets of S –find all molecules that bind to M –your favorite query here (multimedia?)...and for new queries over old domains: –find all points in region from 12 to 2 o’clock –find all text elements estimated relevant to a query string

SLIDE 22IS 257 – Fall 2009 Lecture Outline Review –Object-Relational DBMS –OR features in Oracle –OR features in PostgreSQL –Extending OR databases (examples from PostgreSQL) Java and JDBC Introduction to Data Warehouses

SLIDE 23 IS 257 – Fall 2009 Java and JDBC
Java is probably the high-level language used most in instruction and development today
One of the earliest “enterprise” additions to Java was JDBC
JDBC is an API that provides mid-level access to DBMSs from Java applications
Intended to be an open, cross-platform standard for database access in Java
Similar in intent to Microsoft’s ODBC

SLIDE 24 IS 257 – Fall 2009 JDBC Architecture The goal of JDBC is to be a generic SQL database access framework that works for any database system with no changes to the interface code
[Diagram layers: Java Applications; JDBC API; JDBC Driver Manager; Driver; databases (Oracle, MySQL, Postgres)]

SLIDE 25 IS 257 – Fall 2009 JDBC Provides a standard set of interfaces for any DBMS with a JDBC driver – using SQL to specify the database operations.
[Diagram labels: Application; ResultSet; Statement, PreparedStatement, CallableStatement; Connection; DriverManager; Oracle Driver, ODBC Driver, Postgres Driver; Oracle DB, ODBC DB, Postgres DB]

SLIDE 26 IS 257 – Fall 2009 JDBC Simple Java Implementation
    import java.sql.*;
    import oracle.jdbc.*;
    public class JDBCSample {
        public static void main(java.lang.String[] args) {
            try {
                // this is where the driver is loaded
                // Class.forName("jdbc.oracle.thin");
                DriverManager.registerDriver(new OracleDriver());
            } catch (SQLException e) {
                System.out.println("Unable to load driver Class");
                return;
            }

SLIDE 27 IS 257 – Fall 2009 JDBC Simple Java Impl.
            try {
                // All DB access is within the try/catch block...
                // make a connection to ORACLE on Dream
                // NOTE: getConnection() also needs the JDBC URL as its first
                // argument; the URL is missing from this slide
                Connection con = DriverManager.getConnection(
                        "mylogin", "myoraclePW");
                // Do an SQL statement...
                Statement stmt = con.createStatement();
                ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST");

SLIDE 28 IS 257 – Fall 2009 JDBC Simple Java Impl.
                // show the Results...
                while (rs.next()) {
                    System.out.println(rs.getString("NAME"));
                }
                // Release the database resources...
                rs.close();
                stmt.close();
                con.close();
            } catch (SQLException se) {
                // inform user of errors...
                System.out.println("SQL Exception: " + se.getMessage());
                se.printStackTrace(System.out);
            }
        }
    }

SLIDE 29IS 257 – Fall 2009 JDBC Once a connection has been made you can create three different types of statement objects Statement –The basic SQL statement as in the example PreparedStatement –A pre-compiled SQL statement CallableStatement –Permits access to stored procedures in the Database
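As a minimal sketch of the PreparedStatement type listed above, the class below runs a parameterized query against the DIVECUST table from the earlier example. The LIKE filter, the class and method names, and whatever JDBC URL and credentials are passed in are assumptions made for illustration; they are not from the lecture. The try-with-resources blocks close the ResultSet, statement, and connection automatically, which replaces the explicit close() calls in the earlier example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class PreparedStatementSketch {

    // Returns the NAME of every DIVECUST row whose NAME matches the LIKE pattern.
    // url, user, and password are supplied by the caller (hypothetical values).
    static List<String> findCustomerNames(String url, String user, String password,
                                          String namePattern) throws SQLException {
        // The '?' is a placeholder that is bound below, not concatenated into the SQL.
        String sql = "SELECT NAME FROM DIVECUST WHERE NAME LIKE ?";
        List<String> names = new ArrayList<>();
        try (Connection con = DriverManager.getConnection(url, user, password);
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, namePattern); // parameter indexes start at 1
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("NAME"));
                }
            }
        }
        return names;
    }
}
```

A call such as findCustomerNames(url, user, password, "A%") would return the names starting with "A", assuming a suitable driver is on the classpath and the table exists. Binding the value with setString() rather than building the SQL string by hand also avoids quoting problems in the pattern.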

SLIDE 30 IS 257 – Fall 2009 JDBC ResultSet methods Use next() to loop through the rows in the ResultSet. To access the attributes of each row you need to know their types, or you can use the generic “getObject()”, which wraps the attribute as an object

SLIDE 31 IS 257 – Fall 2009 JDBC “getXXX()” methods
SQL data type    Java type               getXXX()
CHAR             String                  getString()
VARCHAR          String                  getString()
LONGVARCHAR      String                  getString()
NUMERIC          java.math.BigDecimal    getBigDecimal()
DECIMAL          java.math.BigDecimal    getBigDecimal()
BIT              boolean                 getBoolean()
TINYINT          byte                    getByte()

SLIDE 32 IS 257 – Fall 2009 JDBC getXXX() Methods
SQL data type    Java type    getXXX()
SMALLINT         short        getShort()
INTEGER          int          getInt()
BIGINT           long         getLong()
REAL             float        getFloat()
FLOAT            double       getDouble()
DOUBLE           double       getDouble()
BINARY           byte[]       getBytes()
VARBINARY        byte[]       getBytes()
LONGVARBINARY    byte[]       getBytes()

SLIDE 33 IS 257 – Fall 2009 JDBC getXXX() Methods
SQL data type    Java type             getXXX()
DATE             java.sql.Date         getDate()
TIME             java.sql.Time         getTime()
TIMESTAMP        java.sql.Timestamp    getTimestamp()
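The mapping in the three tables above can be written down as a small lookup, sketched here with the standard java.sql.JDBCType enum (Java 8 and later). The class name GetterLookup and the choice to fall back to getObject() for unlisted types are assumptions for illustration, in line with the earlier remark that getObject() is the generic accessor.

```java
import java.sql.JDBCType;
import java.util.EnumMap;
import java.util.Map;

class GetterLookup {
    private static final Map<JDBCType, String> GETTERS = new EnumMap<>(JDBCType.class);

    static {
        // Character types
        GETTERS.put(JDBCType.CHAR, "getString");
        GETTERS.put(JDBCType.VARCHAR, "getString");
        GETTERS.put(JDBCType.LONGVARCHAR, "getString");
        // Exact and approximate numerics
        GETTERS.put(JDBCType.NUMERIC, "getBigDecimal");
        GETTERS.put(JDBCType.DECIMAL, "getBigDecimal");
        GETTERS.put(JDBCType.BIT, "getBoolean");
        GETTERS.put(JDBCType.TINYINT, "getByte");
        GETTERS.put(JDBCType.SMALLINT, "getShort");
        GETTERS.put(JDBCType.INTEGER, "getInt");
        GETTERS.put(JDBCType.BIGINT, "getLong");
        GETTERS.put(JDBCType.REAL, "getFloat");
        GETTERS.put(JDBCType.FLOAT, "getDouble");
        GETTERS.put(JDBCType.DOUBLE, "getDouble");
        // Binary types
        GETTERS.put(JDBCType.BINARY, "getBytes");
        GETTERS.put(JDBCType.VARBINARY, "getBytes");
        GETTERS.put(JDBCType.LONGVARBINARY, "getBytes");
        // Date and time types
        GETTERS.put(JDBCType.DATE, "getDate");
        GETTERS.put(JDBCType.TIME, "getTime");
        GETTERS.put(JDBCType.TIMESTAMP, "getTimestamp");
    }

    // Name of the ResultSet getter for a SQL type; getObject is the fallback
    // for any type that the tables do not list.
    static String getterFor(JDBCType type) {
        return GETTERS.getOrDefault(type, "getObject");
    }
}
```

An integer type code, such as the one returned by ResultSetMetaData.getColumnType(col), can be converted first with JDBCType.valueOf(code).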

SLIDE 34 IS 257 – Fall 2009 Large Object Handling Large binary data can be read from a ResultSet as streams using:
–getAsciiStream()
–getBinaryStream()
–getUnicodeStream()
    ResultSet rs = stmt.executeQuery(
            "SELECT IMAGE FROM PICTURES WHERE PID = 1223");
    if (rs.next()) {
        BufferedInputStream gifData = new BufferedInputStream(
                rs.getBinaryStream("IMAGE"));
        byte[] buf = new byte[4 * 1024]; // 4K buffer
        int len;
        // "out" is an OutputStream opened elsewhere
        while ((len = gifData.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, len);
        }
    }
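The buffer loop in the large-object example works on any InputStream, so it can be tried without a database. The sketch below factors it into a helper and feeds it an in-memory array standing in for the stream that getBinaryStream() would return; the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopySketch {

    // Copies everything from in to out in 4K chunks and returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4 * 1024]; // 4K buffer, as in the slide
        long total = 0;
        int len;
        // read() returns the number of bytes placed in buf, or -1 at end of stream
        while ((len = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, len);
            total += len;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeImage = new byte[10000]; // stands in for a stored image
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(fakeImage), out);
        System.out.println(copied + " bytes copied");
    }
}
```

Writing only len bytes on each pass matters because the last chunk is usually shorter than the buffer.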

SLIDE 35IS 257 – Fall 2009 JDBC Metadata There are also methods to access the metadata associated with a resultSet –ResultSetMetaData rsmd = rs.getMetaData(); Metadata methods include… –getColumnCount(); –getColumnLabel(col); –getColumnTypeName(col)
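A short sketch of how the metadata methods listed above might be combined: the helper walks the columns of a ResultSetMetaData and builds one "label (type)" string per column. The class and method names are hypothetical; the three metadata calls are the ones named on the slide.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class ColumnDescriber {

    // Returns one "label (type name)" entry per column of the result set.
    static List<String> describe(ResultSetMetaData rsmd) throws SQLException {
        List<String> columns = new ArrayList<>();
        // JDBC column indexes run from 1 to getColumnCount(), not from 0
        for (int col = 1; col <= rsmd.getColumnCount(); col++) {
            columns.add(rsmd.getColumnLabel(col)
                    + " (" + rsmd.getColumnTypeName(col) + ")");
        }
        return columns;
    }
}
```

With a live query this would be called as describe(rs.getMetaData()), for example to print a header row before looping over the rows with next().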

SLIDE 36 IS 257 – Fall 2009 JDBC access to MySQL The basic JDBC interface is the same; the only differences are in how the drivers are loaded
    public class JDBCTestMysql {
        public static void main(java.lang.String[] args) {
            try {
                // this is where the driver is loaded
                Class.forName("com.mysql.jdbc.Driver").newInstance();
            } catch (InstantiationException i) {
                System.out.println("Unable to load driver Class");
                return;
            } catch (ClassNotFoundException e) {
                System.out.println("Unable to load driver Class");
                …

SLIDE 37 IS 257 – Fall 2009 JDBC for MySQL
            try {
                // All DB access is within the try/catch block...
                // make a connection to MySQL on Dream
                // (the URL string is really one line)
                Connection con = DriverManager.getConnection(
                        "jdbc:mysql://localhost/MyDatabase?user=MyLogin&password=MySQLPW");
                // Do an SQL statement...
                Statement stmt = con.createStatement();
                ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST");
Otherwise everything is the same as in the Oracle example
For connecting to the machine you are running the program on, you can use “localhost” instead of the machine name
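One way to avoid the long single-line URL is to pass the login separately. The sketch below builds the same jdbc:mysql:// URL from a host and database name and puts the user and password into a java.util.Properties object, which DriverManager.getConnection(String, Properties) accepts. The class and method names are assumptions for illustration, and the values are the placeholders used on the slide. On current setups the explicit Class.forName() call is usually unnecessary because JDBC 4 drivers register themselves, and newer MySQL Connector/J releases name the driver class com.mysql.cj.jdbc.Driver.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

class MySqlConnectionSketch {

    // Builds the URL without the login, e.g. jdbc:mysql://localhost/MyDatabase
    static String jdbcUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database;
    }

    // "user" and "password" are the property names DriverManager documents.
    static Properties credentials(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        return props;
    }

    // Opens the connection; needs a MySQL driver on the classpath to succeed.
    static Connection open(String host, String database,
                           String user, String password) throws SQLException {
        return DriverManager.getConnection(jdbcUrl(host, database),
                                           credentials(user, password));
    }
}
```

Calling open("localhost", "MyDatabase", "MyLogin", "MySQLPW") should then be equivalent to the one-line URL form, with the password kept out of the URL string.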

SLIDE 38IS 257 – Fall 2009 Demo – JDBC for MySQL Demo of JDBC code on Harbinger Code is available on class web site

SLIDE 39IS 257 – Fall 2009 Lecture Outline Review –Object-Relational DBMS –OR features in Oracle –OR features in PostgreSQL –Extending OR databases (examples from PostgreSQL) Java and JDBC Introduction to Data Warehouses

SLIDE 40IS 257 – Fall 2009 Overview Data Warehouses and Merging Information Resources What is a Data Warehouse? History of Data Warehousing Types of Data and Their Uses

SLIDE 41 IS 257 – Fall 2009 Problem: Heterogeneous Information Sources “Heterogeneities are everywhere”
–Different interfaces
–Different data representations
–Duplicate and inconsistent information
[Diagram of sources: Personal Databases, Digital Libraries, Scientific Databases, World Wide Web]
Slide credit: J. Hammer

SLIDE 42 IS 257 – Fall 2009 Problem: Data Management in Large Enterprises
Vertical fragmentation of informational systems (vertical stove pipes)
Result of application (user)-driven development of operational systems
[Diagram labels: Sales Administration, Finance, Manufacturing, ...; Sales Planning, Stock Mngmt, ..., Suppliers, ..., Debt Mngmt, Num. Control, ..., Inventory]
Slide credit: J. Hammer

SLIDE 43 IS 257 – Fall 2009 Goal: Unified Access to Data
Integration System
Collects and combines information
Provides integrated view, uniform user interface
Supports sharing
[Diagram of sources: World Wide Web, Digital Libraries, Scientific Databases, Personal Databases]
Slide credit: J. Hammer

SLIDE 44 IS 257 – Fall 2009 The Traditional Research Approach
Query-driven (lazy, on-demand)
[Diagram labels: Clients; Integration System; Metadata; Wrapper ...; Source ...]
Slide credit: J. Hammer

SLIDE 45IS 257 – Fall 2009 Disadvantages of Query-Driven Approach Delay in query processing –Slow or unavailable information sources –Complex filtering and integration Inefficient and potentially expensive for frequent queries Competes with local processing at sources Hasn’t caught on in industry Slide credit: J. Hammer

SLIDE 46 IS 257 – Fall 2009 The Warehousing Approach
Information integrated in advance
Stored in WH for direct querying and analysis
[Diagram labels: Clients; Data Warehouse; Integration System; Metadata; Extractor/Monitor ...; Source ...]
Slide credit: J. Hammer

SLIDE 47IS 257 – Fall 2009 Advantages of Warehousing Approach High query performance –But not necessarily most current information Doesn’t interfere with local processing at sources –Complex queries at warehouse –OLTP at information sources Information copied at warehouse –Can modify, annotate, summarize, restructure, etc. –Can store historical information –Security, no auditing Has caught on in industry Slide credit: J. Hammer

SLIDE 48IS 257 – Fall 2009 Not Either-Or Decision Query-driven approach still better for –Rapidly changing information –Rapidly changing information sources –Truly vast amounts of data from large numbers of sources –Clients with unpredictable needs Slide credit: J. Hammer

SLIDE 49 IS 257 – Fall 2009 Data Warehouse Evolution
[Timeline figure, axis: TIME. Era labels: Information-Based Management, Data Revolution, “Middle Ages”, “Prehistoric Times”. Event labels: Relational Databases, PC’s and Spreadsheets, End-user Interfaces, 1st DW Article, DW Confs., Vendor DW Frameworks, Company DWs, “Building the DW” Inmon (1992), Data Replication Tools]
Slide credit: J. Hammer

SLIDE 50IS 257 – Fall 2009 What is a Data Warehouse? “A Data Warehouse is a –subject-oriented, –integrated, –time-variant, –non-volatile collection of data used in support of management decision making processes.” -- Inmon & Hackathorn, 1994: viz. Hoffer, Chap 11

SLIDE 51IS 257 – Fall 2009 DW Definition… Subject-Oriented: –The data warehouse is organized around the key subjects (or high-level entities) of the enterprise. Major subjects include Customers Patients Students Products Etc.

SLIDE 52IS 257 – Fall 2009 DW Definition… Integrated –The data housed in the data warehouse are defined using consistent Naming conventions Formats Encoding Structures Related Characteristics

SLIDE 53IS 257 – Fall 2009 DW Definition… Time-variant –The data in the warehouse contain a time dimension so that they may be used as a historical record of the business

SLIDE 54IS 257 – Fall 2009 DW Definition… Non-volatile –Data in the data warehouse are loaded and refreshed from operational systems, but cannot be updated by end-users
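The time-variant and non-volatile properties can be illustrated with a toy in-memory store, sketched below. It is only an analogy for the idea, not how a warehouse is built: every row carries an as-of date, the load process can only append rows, and end users get a read-only view. All names here (AppendOnlyStoreSketch, Snapshot, the customer and balance fields) are invented for the example.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class AppendOnlyStoreSketch {

    // One historical row: the balance of a customer as of a given date.
    static final class Snapshot {
        final String customer;
        final LocalDate asOf;
        final double balance;

        Snapshot(String customer, LocalDate asOf, double balance) {
            this.customer = customer;
            this.asOf = asOf;
            this.balance = balance;
        }
    }

    private final List<Snapshot> rows = new ArrayList<>();

    // Used by the load/refresh process only: rows are appended, never changed.
    void load(String customer, LocalDate asOf, double balance) {
        rows.add(new Snapshot(customer, asOf, balance));
    }

    // End users see the data but cannot add, remove, or replace rows.
    List<Snapshot> readOnlyView() {
        return Collections.unmodifiableList(rows);
    }

    // Time-variant query: the latest snapshot on or before the given date,
    // or null if the customer has no history that early.
    Snapshot asOf(String customer, LocalDate date) {
        Snapshot best = null;
        for (Snapshot s : rows) {
            boolean inRange = s.customer.equals(customer) && !s.asOf.isAfter(date);
            if (inRange && (best == null || s.asOf.isAfter(best.asOf))) {
                best = s;
            }
        }
        return best;
    }
}
```

Because old rows are kept rather than overwritten, a question such as "what was this customer's balance in mid-February?" can still be answered after later loads.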

SLIDE 55 IS 257 – Fall 2009 What is a Data Warehouse? A Practitioner’s Viewpoint “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.” -- Barry Devlin, IBM Consultant Slide credit: J. Hammer

SLIDE 56 IS 257 – Fall 2009 A Data Warehouse is...
Stored collection of diverse data
–A solution to data integration problem
–Single repository of information
Subject-oriented
–Organized by subject, not by application
–Used for analysis, data mining, etc.
Optimized differently from transaction-oriented db
User interface aimed at executive decision makers and analysts

SLIDE 57IS 257 – Fall 2009 … Cont’d Large volume of data (Gb, Tb) Non-volatile –Historical –Time attributes are important Updates infrequent May be append-only Examples –All transactions ever at WalMart –Complete client histories at insurance firm –Stockbroker financial information and portfolios Slide credit: J. Hammer

SLIDE 58 IS 257 – Fall 2009 Warehouse is a Specialized DB
Standard DB                                  Warehouse
Mostly updates                               Mostly reads
Many small transactions                      Queries are long and complex
Mb - Gb of data                              Gb - Tb of data
Current snapshot                             History
Index/hash on p.k.                           Lots of scans
Raw data                                     Summarized, reconciled data
Thousands of users (e.g., clerical users)    Hundreds of users (e.g., decision-makers, analysts)
Slide credit: J. Hammer

SLIDE 59 IS 257 – Fall 2009 Summary
[Diagram labels: Operational Systems, Enterprise Modeling, Business Information Guide, Data Warehouse Catalog, Data Warehouse Population, Data Warehouse, Business Information Interface]
Slide credit: J. Hammer

SLIDE 60 IS 257 – Fall 2009 Warehousing and Industry
Warehousing is big business
–$2 billion in 1995
–$3.5 billion in early 1997
–Predicted: $8 billion in 1998 [Metagroup]
Wal-Mart is said to have the largest warehouse
–1000-CPU, 583 Terabyte, Teradata system (InformationWeek, Jan 9, 2006)
–“Half a Petabyte” in warehouse (Ziff Davis Internet, October 13, 2004)
–1 billion rows of data or more are updated every day (InformationWeek, Jan 9, 2006)
–Some government and scientific databases are larger, however
Slide credit: J. Hammer

SLIDE 61IS 257 – Fall 2009 Other Large Data Warehouses Not including Wal-Mart and Ebay (InformationWeek, Jan 9, 2006)

SLIDE 62IS 257 – Fall 2009 Types of Data Business Data - represents meaning –Real-time data (ultimate source of all business data) –Reconciled data –Derived data Metadata - describes meaning –Build-time metadata –Control metadata –Usage metadata Data as a product* - intrinsic meaning –Produced and stored for its own intrinsic value –e.g., the contents of a text-book Slide credit: J. Hammer

SLIDE 63IS 257 – Fall 2009 Next Time More on Data Warehouses Introduction to data mining