Download presentation
Presentation is loading. Please wait.
1
2009.11.03 SLIDE 1IS 257 – Fall 2009 JDBC and Java Access to DBMS & Introduction to Data Warehouses University of California, Berkeley School of Information IS 257: Database Management
2
2009.11.03 SLIDE 2IS 257 – Fall 2009 Lecture Outline Review: –Object-Relational DBMS –OR features in Oracle –OR features in PostgreSQL –Extending OR databases (examples from PostgreSQL) Java and JDBC Introduction to Data Warehouses
3
2009.11.03 SLIDE 3IS 257 – Fall 2009 Lecture Outline Object-Relational DBMS –OR features in Oracle –OR features in PostgreSQL Extending OR databases (examples from PostgreSQL) Java and JDBC Introduction to Data Warehouses
4
2009.11.03 SLIDE 4IS 257 – Fall 2009 Object Relational Data Model Class, instance, attribute, method, and integrity constraints OID per instance Encapsulation Multiple inheritance hierarchy of classes Class references via OID object references Set-Valued attributes Abstract Data Types
5
2009.11.03 SLIDE 5IS 257 – Fall 2009 Object Relational Extended SQL (Illustra) CREATE TABLE tablename {OF TYPE Typename}|{OF NEW TYPE typename} (attr1 type1, attr2 type2,…,attrn typen) {UNDER parent_table_name}; CREATE TYPE typename (attribute_name type_desc, attribute2 type2, …, attrn typen); CREATE FUNCTION functionname (type_name, type_name) RETURNS type_name AS sql_statement
6
2009.11.03 SLIDE 6IS 257 – Fall 2009 Object-Relational SQL in ORACLE CREATE (OR REPLACE) TYPE typename AS OBJECT (attr_name, attr_type, …); CREATE TABLE OF typename;
7
2009.11.03 SLIDE 7IS 257 – Fall 2009 Example CREATE TYPE ANIMAL_TY AS OBJECT (Breed VARCHAR2(25), Name VARCHAR2(25), Birthdate DATE); Creates a new type CREATE TABLE Animal of Animal_ty; Creates “Object Table”
8
2009.11.03 SLIDE 8IS 257 – Fall 2009 Constructor Functions INSERT INTO Animal values (ANIMAL_TY(‘Mule’, ‘Frances’, TO_DATE(‘01-APR-1997’, ‘DD-MM- YYYY’))); Insert a new ANIMAL_TY object into the table
9
2009.11.03 SLIDE 9IS 257 – Fall 2009 PostgreSQL Classes The fundamental notion in Postgres is that of a class, which is a named collection of object instances. Each instance has the same collection of named attributes, and each attribute is of a specific type. Furthermore, each instance has a permanent object identifier (OID) that is unique throughout the installation. Because SQL syntax refers to tables, we will use the terms table and class interchangeably. Likewise, an SQL row is an instance and SQL columns are attributes.
10
2009.11.03 SLIDE 10IS 257 – Fall 2009 Creating a Class You can create a new class by specifying the class name, along with all attribute names and their types: CREATE TABLE weather ( city varchar(80), temp_lo int, -- low temperature temp_hi int, -- high temperature prcp real, -- precipitation date date );
11
2009.11.03 SLIDE 11IS 257 – Fall 2009 PostgreSQL Postgres can be customized with an arbitrary number of user-defined data types. Consequently, type names are not syntactical keywords, except where required to support special cases in the SQL92 standard. So far, the Postgres CREATE command looks exactly like the command used to create a table in a traditional relational system. However, we will presently see that classes have properties that are extensions of the relational model.
12
2009.11.03 SLIDE 12IS 257 – Fall 2009 Inheritance CREATE TABLE cities ( name text, population float, altitude int -- (in ft) ); CREATE TABLE capitals ( state char(2) ) INHERITS (cities);
13
2009.11.03 SLIDE 13IS 257 – Fall 2009 Inheritance In Postgres, a class can inherit from zero or more other classes. A query can reference either –all instances of a class –or all instances of a class plus all of its descendants
14
2009.11.03 SLIDE 14IS 257 – Fall 2009 Non-Atomic Values - Arrays The preceding SQL command will create a class named SAL_EMP with a text string (name), a one-dimensional array of int4 (pay_by_quarter), which represents the employee's salary by quarter and a two-dimensional array of text (schedule), which represents the employee's weekly schedule Now we do some INSERTSs; note that when appending to an array, we enclose the values within braces and separate them by commas.
15
2009.11.03 SLIDE 15IS 257 – Fall 2009 PostgreSQL Extensibility Postgres is extensible because its operation is catalog- driven –RDBMS store information about databases, tables, columns, etc., in what are commonly known as system catalogs. (Some systems call this the data dictionary). One key difference between Postgres and standard RDBMS is that Postgres stores much more information in its catalogs –not only information about tables and columns, but also information about its types, functions, access methods, etc. These classes can be modified by the user, and since Postgres bases its internal operation on these classes, this means that Postgres can be extended by users –By comparison, conventional database systems can only be extended by changing hardcoded procedures within the DBMS or by loading modules specially-written by the DBMS vendor.
16
2009.11.03 SLIDE 16IS 257 – Fall 2009 Rules System CREATE RULE name AS ON event TO object [ WHERE condition ] DO [ INSTEAD ] [ action | NOTHING ] Rules can be triggered by any event (select, update, delete, etc.)
17
2009.11.03 SLIDE 17IS 257 – Fall 2009 Views as Rules Views in Postgres are implemented using the rule system. In fact there is absolutely no difference between a CREATE VIEW myview AS SELECT * FROM mytab; compared against the two commands CREATE TABLE myview (same attribute list as for mytab); CREATE RULE "_RETmyview" AS ON SELECT TO myview DO INSTEAD SELECT * FROM mytab;
18
2009.11.03 SLIDE 18IS 257 – Fall 2009 Extensions to Indexing Access Method extensions in Postgres GiST: A Generalized Search Trees –Joe Hellerstein, UC Berkeley
19
2009.11.03 SLIDE 19IS 257 – Fall 2009 Indexing in OO/OR Systems Quick access to user-defined objects Support queries natural to the objects Two previous approaches –Specialized Indices (“ABCDEFG-trees”) redundant code: most trees are very similar concurrency control, etc. tricky! –Extensible B-trees & R-trees (Postgres/Illustra) B-tree or R-tree lookups only! E.g. ‘WHERE movie.video < ‘Terminator 2’
20
2009.11.03 SLIDE 20IS 257 – Fall 2009 GiST Approach A generalized search tree. Must be: Extensible in terms of queries General (B+-tree, R-tree, etc.) Easy to extend Efficient (match specialized trees) Highly concurrent, recoverable, etc.
21
2009.11.03 SLIDE 21IS 257 – Fall 2009 GiST Applications New indexes needed for new apps... –find all supersets of S –find all molecules that bind to M –your favorite query here (multimedia?)...and for new queries over old domains: –find all points in region from 12 to 2 o’clock –find all text elements estimated relevant to a query string
22
2009.11.03 SLIDE 22IS 257 – Fall 2009 Lecture Outline Review –Object-Relational DBMS –OR features in Oracle –OR features in PostgreSQL –Extending OR databases (examples from PostgreSQL) Java and JDBC Introduction to Data Warehouses
23
2009.11.03 SLIDE 23IS 257 – Fall 2009 Java and JDBC Java is probably the high-level language used in instruction and development today one of the earliest “enterprise” additions to Java was JDBC JDBC is an API that provides a mid-level access to DBMS from Java applications Intended to be an open cross-platform standard for database access in Java Similar in intent to Microsoft’s ODBC
24
2009.11.03 SLIDE 24IS 257 – Fall 2009 JDBC Architecture The goal of JDBC is to be a generic SQL database access framework that works for any database system with no changes to the interface code OracleMySQLPostgres Java Applications JDBC API JDBC Driver Manager Driver
25
2009.11.03 SLIDE 25IS 257 – Fall 2009 JDBC Provides a standard set of interfaces for any DBMS with a JDBC driver – using SQL to specify the databases operations. Resultset Statement Resultset Connection PreparedStatementCallableStatement DriverManager Oracle Driver ODBC DriverPostgres Driver Oracle DBPostgres DBODBC DB Application
26
2009.11.03 SLIDE 26IS 257 – Fall 2009 JDBC Simple Java Implementation import java.sql.*; import oracle.jdbc.*; public class JDBCSample { public static void main(java.lang.String[] args) { try { // this is where the driver is loaded //Class.forName("jdbc.oracle.thin"); DriverManager.registerDriver(new OracleDriver()); } catch (SQLException e) { System.out.println("Unable to load driver Class"); return; }
27
2009.11.03 SLIDE 27IS 257 – Fall 2009 JDBC Simple Java Impl. try { //All DB access is within the try/catch block... // make a connection to ORACLE on Dream Connection con = DriverManager.getConnection( "jdbc:oracle:thin:@dream.sims.berkeley.edu:1521:dev", “mylogin", “myoraclePW"); // Do an SQL statement... Statement stmt = con.createStatement(); ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST");
28
2009.11.03 SLIDE 28IS 257 – Fall 2009 JDBC Simple Java Impl. // show the Results... while(rs.next()) { System.out.println(rs.getString("NAME")); } // Release the database resources... rs.close(); stmt.close(); con.close(); } catch (SQLException se) { // inform user of errors... System.out.println("SQL Exception: " + se.getMessage()); se.printStackTrace(System.out); }
29
2009.11.03 SLIDE 29IS 257 – Fall 2009 JDBC Once a connection has been made you can create three different types of statement objects Statement –The basic SQL statement as in the example PreparedStatement –A pre-compiled SQL statement CallableStatement –Permits access to stored procedures in the Database
30
2009.11.03 SLIDE 30IS 257 – Fall 2009 JDBC Resultset methods Next() to loop through rows in the resultset To access the attributes of each row you need to know its type, or you can use the generic “getObject()” which wraps the attribute as an object
31
2009.11.03 SLIDE 31IS 257 – Fall 2009 JDBC “GetXXX()” methods SQL data typeJava TypeGetXXX() CHARStringgetString() VARCHARStringgetString() LONGVARCHARStringgetString() NUMERICJava.math. BigDecimal GetBigDecimal() DECIMALJava.math. BigDecimal GetBigDecimal() BITBooleangetBoolean() TINYINTBytegetByte()
32
2009.11.03 SLIDE 32IS 257 – Fall 2009 JDBC GetXXX() Methods SQL data typeJava TypeGetXXX() SMALLINTInteger (short)getShort() INTEGERIntegergetInt() BIGINTLonggetLong() REALFloatgetFloat() FLOATDoublegetDouble() DOUBLEDoublegetDouble() BINARYByte[]getBytes() VARBINARYByte[]getBytes() LONGVARBINARYByte[]getBytes()
33
2009.11.03 SLIDE 33IS 257 – Fall 2009 JDBC GetXXX() Methods SQL data type Java TypeGetXXX() DATEjava.sql.DategetDate() TIMEjava.sql.TimegetTime() TIMESTAMPJava.sql.TimestampgetTimeStamp()
34
2009.11.03 SLIDE 34IS 257 – Fall 2009 Large Object Handling Large binary data can be read from a resultset as streams using: –getAsciiStream() –getBinaryStream() –getUnicodeStream() ResultSet rs = stmt.executeQuery(“SELECT IMAGE FROM PICTURES WHERE PID = 1223”)); if (rs.next()) { BufferedInputStream gifData = new BufferedInputSteam( rs.getBinaryStream(“IMAGE”)); byte[] buf = new byte[4*1024]; // 4K buffer int len; while ((len = gifData.read(buf,0,buf.length)) != -1) { out.write(buf, 0, len); }
35
2009.11.03 SLIDE 35IS 257 – Fall 2009 JDBC Metadata There are also methods to access the metadata associated with a resultSet –ResultSetMetaData rsmd = rs.getMetaData(); Metadata methods include… –getColumnCount(); –getColumnLabel(col); –getColumnTypeName(col)
36
2009.11.03 SLIDE 36IS 257 – Fall 2009 JDBC access to MySQL The basic JDBC interface is the same, the only differences are in how the drivers are loaded public class JDBCTestMysql { public static void main(java.lang.String[] args) { try { // this is where the driver is loaded Class.forName("com.mysql.jdbc.Driver").newInstance(); } catch (InstantiationException i) { System.out.println("Unable to load driver Class"); return; } catch (ClassNotFoundException e) { System.out.println("Unable to load driver Class"); …
37
2009.11.03 SLIDE 37IS 257 – Fall 2009 JDBC for MySQL try { //All DB access is within the try/catch block... // make a connection to MySQL on Dream Connection con = DriverManager.getConnection( "jdbc:mysql://localhost/ (this is really one line) MyDatabase?user=MyLogin&password=MySQLPW"); // Do an SQL statement... Statement stmt = con.createStatement(); ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST"); Otherwise everything is the same as in the Oracle example For connecting to the machine you are running the program on, you can use “localhost” instead of the machine name
38
2009.11.03 SLIDE 38IS 257 – Fall 2009 Demo – JDBC for MySQL Demo of JDBC code on Harbinger Code is available on class web site
39
2009.11.03 SLIDE 39IS 257 – Fall 2009 Lecture Outline Review –Object-Relational DBMS –OR features in Oracle –OR features in PostgreSQL –Extending OR databases (examples from PostgreSQL) Java and JDBC Introduction to Data Warehouses
40
2009.11.03 SLIDE 40IS 257 – Fall 2009 Overview Data Warehouses and Merging Information Resources What is a Data Warehouse? History of Data Warehousing Types of Data and Their Uses
41
2009.11.03 SLIDE 41IS 257 – Fall 2009 Problem: Heterogeneous Information Sources “Heterogeneities are everywhere” p Different interfaces p Different data representations p Duplicate and inconsistent information Personal Databases Digital Libraries Scientific Databases World Wide Web Slide credit: J. Hammer
42
2009.11.03 SLIDE 42IS 257 – Fall 2009 Problem: Data Management in Large Enterprises Vertical fragmentation of informational systems (vertical stove pipes) Result of application (user)-driven development of operational systems Sales AdministrationFinanceManufacturing... Sales Planning Stock Mngmt... Suppliers... Debt Mngmt Num. Control... Inventory Slide credit: J. Hammer
43
2009.11.03 SLIDE 43IS 257 – Fall 2009 Goal: Unified Access to Data Integration System Collects and combines information Provides integrated view, uniform user interface Supports sharing World Wide Web Digital LibrariesScientific Databases Personal Databases Slide credit: J. Hammer
44
2009.11.03 SLIDE 44IS 257 – Fall 2009 The Traditional Research Approach Source... Integration System... Metadata Clients Wrapper Query-driven (lazy, on-demand) Slide credit: J. Hammer
45
2009.11.03 SLIDE 45IS 257 – Fall 2009 Disadvantages of Query-Driven Approach Delay in query processing –Slow or unavailable information sources –Complex filtering and integration Inefficient and potentially expensive for frequent queries Competes with local processing at sources Hasn’t caught on in industry Slide credit: J. Hammer
46
2009.11.03 SLIDE 46IS 257 – Fall 2009 The Warehousing ApproachDataWarehouse Clients Source... Extractor/ Monitor Integration System... Metadata Extractor/ Monitor Extractor/ Monitor Information integrated in advance Stored in WH for direct querying and analysis Slide credit: J. Hammer
47
2009.11.03 SLIDE 47IS 257 – Fall 2009 Advantages of Warehousing Approach High query performance –But not necessarily most current information Doesn’t interfere with local processing at sources –Complex queries at warehouse –OLTP at information sources Information copied at warehouse –Can modify, annotate, summarize, restructure, etc. –Can store historical information –Security, no auditing Has caught on in industry Slide credit: J. Hammer
48
2009.11.03 SLIDE 48IS 257 – Fall 2009 Not Either-Or Decision Query-driven approach still better for –Rapidly changing information –Rapidly changing information sources –Truly vast amounts of data from large numbers of sources –Clients with unpredictable needs Slide credit: J. Hammer
49
2009.11.03 SLIDE 49IS 257 – Fall 2009 Data Warehouse Evolution TIME 2000199519901985198019601975 Information- Based Management Data Revolution “Middle Ages” “Prehistoric Times” Relational Databases PC’s and Spreadsheets End-user Interfaces 1st DW Article DW Confs. Vendor DW Frameworks Company DWs “Building the DW” Inmon (1992) Data Replication Tools Slide credit: J. Hammer
50
2009.11.03 SLIDE 50IS 257 – Fall 2009 What is a Data Warehouse? “A Data Warehouse is a –subject-oriented, –integrated, –time-variant, –non-volatile collection of data used in support of management decision making processes.” -- Inmon & Hackathorn, 1994: viz. Hoffer, Chap 11
51
2009.11.03 SLIDE 51IS 257 – Fall 2009 DW Definition… Subject-Oriented: –The data warehouse is organized around the key subjects (or high-level entities) of the enterprise. Major subjects include Customers Patients Students Products Etc.
52
2009.11.03 SLIDE 52IS 257 – Fall 2009 DW Definition… Integrated –The data housed in the data warehouse are defined using consistent Naming conventions Formats Encoding Structures Related Characteristics
53
2009.11.03 SLIDE 53IS 257 – Fall 2009 DW Definition… Time-variant –The data in the warehouse contain a time dimension so that they may be used as a historical record of the business
54
2009.11.03 SLIDE 54IS 257 – Fall 2009 DW Definition… Non-volatile –Data in the data warehouse are loaded and refreshed from operational systems, but cannot be updated by end-users
55
2009.11.03 SLIDE 55IS 257 – Fall 2009 What is a Data Warehouse? A Practitioners Viewpoint “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.” -- Barry Devlin, IBM Consultant Slide credit: J. Hammer
56
2009.11.03 SLIDE 56IS 257 – Fall 2009 A Data Warehouse is... Stored collection of diverse data –A solution to data integration problem –Single repository of information Subject-oriented –Organized by subject, not by application –Used for analysis, data mining, etc. Optimized differently from transaction- oriented db User interface aimed at executive decision makers and analysts
57
2009.11.03 SLIDE 57IS 257 – Fall 2009 … Cont’d Large volume of data (Gb, Tb) Non-volatile –Historical –Time attributes are important Updates infrequent May be append-only Examples –All transactions ever at WalMart –Complete client histories at insurance firm –Stockbroker financial information and portfolios Slide credit: J. Hammer
58
2009.11.03 SLIDE 58IS 257 – Fall 2009 Warehouse is a Specialized DB Standard DB Mostly updates Many small transactions Mb - Gb of data Current snapshot Index/hash on p.k. Raw data Thousands of users (e.g., clerical users) Warehouse Mostly reads Queries are long and complex Gb - Tb of data History Lots of scans Summarized, reconciled data Hundreds of users (e.g., decision-makers, analysts) Slide credit: J. Hammer
59
2009.11.03 SLIDE 59IS 257 – Fall 2009 Summary Operational Systems Enterprise Modeling Business Information Guide Data Warehouse Catalog Data Warehouse Population Data Warehouse Business Information Interface Slide credit: J. Hammer
60
2009.11.03 SLIDE 60IS 257 – Fall 2009 Warehousing and Industry Warehousing is big business –$2 billion in 1995 –$3.5 billion in early 1997 –Predicted: $8 billion in 1998 [Metagroup] Wal-Mart is said to have the largest warehouse –1000-CPU, 583 Terabyte, Teradata system (InformationWeek, Jan 9, 2006) –“Half a Petabyte” in warehouse (Ziff Davis Internet, October 13, 2004) –1 billion rows of data or more are updated every day (InformationWeek, Jan 9, 2006) –Some Government and Scientific database are larger, however Slide credit: J. Hammer
61
2009.11.03 SLIDE 61IS 257 – Fall 2009 Other Large Data Warehouses Not including Wal-Mart and Ebay (InformationWeek, Jan 9, 2006)
62
2009.11.03 SLIDE 62IS 257 – Fall 2009 Types of Data Business Data - represents meaning –Real-time data (ultimate source of all business data) –Reconciled data –Derived data Metadata - describes meaning –Build-time metadata –Control metadata –Usage metadata Data as a product* - intrinsic meaning –Produced and stored for its own intrinsic value –e.g., the contents of a text-book Slide credit: J. Hammer
63
2009.11.03 SLIDE 63IS 257 – Fall 2009 Next Time More on Data Warehouses Introduction to data mining
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.