CS 157B: Database Management Systems II March 20 Class Meeting Department of Computer Science San Jose State University Spring 2013 Instructor: Ron Mak.


1 CS 157B: Database Management Systems II March 20 Class Meeting Department of Computer Science San Jose State University Spring 2013 Instructor: Ron Mak www.cs.sjsu.edu/~mak

2 Department of Computer Science Spring 2013: March 20 CS 157B: Database Management Systems II © R. Mak
Unofficial Field Trip
• Computer History Museum in Mt. View: http://www.computerhistory.org/
• Experience a fully restored IBM 1401 mainframe computer from the early 1960s in operation.
    General info: http://en.wikipedia.org/wiki/IBM_1401
    My summer seminar: http://www.cs.sjsu.edu/~mak/1401/
    Restoration: http://ed-thelen.org/1401Project/1401RestorationPage.html
    Private demos at 11:45 and at 2:00.
• See a life-size working model of Charles Babbage’s Difference Engine in operation, a hand-cranked mechanical computer designed in the early 1800s.
    Public demo at 1:00.
• Saturday, March 23. Meet in the museum lobby at 11:15 AM.

3 Extra Credit!
• There will be extra credit if you participate in the unofficial field trip to the Computer History Museum.
    Up to 10 points added to your midterm score.
    To be decided: a quiz (via Desire2Learn) or an essay.

4 Extract, Transform, and Load (ETL)

5 Extract, Transform, and Load (ETL)
• You want only high-quality data in your data warehouse.
• What is high-quality data?
    correct
    unambiguous
    consistent
    complete
• The transform phase of ETL produces high-quality data.
    Cleaning the data.
    Conforming data from multiple sources.

6 Extract, Transform, and Load (ETL)
• In the real world, data is often dirty.
    Therefore, the ETL process must clean the source data when the data is being copied into the data warehouse.
• Cleaning operations
    Remove or correct corrupted data.
    Remove or correct invalid or inconsistent data:
        unexpected null values
        missing data
        values out of range
        misspellings
        referential integrity violations
        business rule violations
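A minimal sketch of a few of these cleaning operations. The field names (store_id, city, units), the list of valid stores, the spelling fix, and the 0 to 10,000 range are all hypothetical, chosen only for illustration. Rows that cannot be corrected are set aside with a reason instead of being silently dropped.

```python
# Hypothetical reference data and rules; none of these come from a real source system.
KNOWN_STORES = {1, 2, 3}                    # valid store IDs (referential integrity)
SPELLING_FIXES = {"Sna Jose": "San Jose"}   # known misspellings and their corrections

def clean(rows):
    """Return (cleaned_rows, rejected) where rejected holds (row, reason) pairs."""
    cleaned, rejected = [], []
    for row in rows:
        row = dict(row)  # copy so the source data is left untouched
        # Correct known misspellings; other values pass through unchanged.
        row["city"] = SPELLING_FIXES.get(row.get("city"), row.get("city"))
        if row.get("units") is None:
            rejected.append((row, "missing units"))        # unexpected null
        elif not 0 <= row["units"] <= 10_000:
            rejected.append((row, "units out of range"))   # assumed valid range
        elif row.get("store_id") not in KNOWN_STORES:
            rejected.append((row, "unknown store_id"))     # referential integrity
        else:
            cleaned.append(row)
    return cleaned, rejected

rows = [
    {"store_id": 1, "city": "Sna Jose", "units": 5},
    {"store_id": 2, "city": "Fremont", "units": None},
    {"store_id": 9, "city": "Gilroy", "units": 3},
    {"store_id": 3, "city": "Milpitas", "units": -4},
]
cleaned, rejected = clean(rows)
```

Only the first row survives, with its city corrected. The other three are rejected for a null value, an unknown store, and an out-of-range value respectively.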

7 Extract, Transform, and Load (ETL)
• Data from multiple sources may need to be conformed to be usable together in the data warehouse.
    Type conversion
        Example: Convert a user ID in a data source from a string to a long integer to match with the user ID in other data sources.
    Format conversion
        Example: Dates and times, names
    Align field and attribute names
        Examples: customer_name vs. name_of_client, store vs. retail_outlet
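The three kinds of conforming can be sketched in a few lines. This is only an illustration: the alias table reuses the field names from the examples above, while the user_id and date fields, the date formats, and the sample records are assumptions.

```python
from datetime import datetime

# Hypothetical mapping from one source's field names to the warehouse's names.
FIELD_ALIASES = {"name_of_client": "customer_name", "retail_outlet": "store"}

def conform(record, date_format):
    """Conform one source record: align names, convert types, normalize dates."""
    # Align field names; names without an alias are kept as they are.
    out = {FIELD_ALIASES.get(key, key): value for key, value in record.items()}
    # Type conversion: a user ID stored as a string becomes an integer.
    out["user_id"] = int(out["user_id"])
    # Format conversion: parse the source's date format, emit ISO 8601.
    out["date"] = datetime.strptime(out["date"], date_format).date().isoformat()
    return out

# Two made-up sources that describe the same sale differently.
a = conform({"user_id": "00042", "name_of_client": "Ada",
             "retail_outlet": "SJ-1", "date": "03/20/2013"}, "%m/%d/%Y")
b = conform({"user_id": 42, "customer_name": "Ada",
             "store": "SJ-1", "date": "2013.03.20"}, "%Y.%m.%d")
```

After conforming, both records are identical, so they can be loaded into the same warehouse table.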

8 ETL: Semantic Mappings
• Unit conversions
    Example: feet vs. yards, miles vs. kilometers
• Structural mappings
    Example: federal → state → city → district vs. kingdom → region → parish
• Temporal mappings
    Example: One data source has a measure taken once an hour, another data source has the same measure taken daily.
• Spatial mappings
    Example: street addresses vs. GIS coordinates (latitude + longitude) vs. political boundaries (cities, districts, counties, etc.)
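A small sketch of the unit and temporal mappings. The function names and sample readings are made up. Averaging the hourly values into a daily value is an assumption: an additive measure such as units sold would be summed instead.

```python
from collections import defaultdict
from datetime import datetime

FEET_PER_YARD = 3
KM_PER_MILE = 1.609344  # exact, by definition of the international mile

def yards_to_feet(yards):
    return yards * FEET_PER_YARD

def miles_to_km(miles):
    return miles * KM_PER_MILE

def hourly_to_daily(readings):
    """Average hourly (ISO timestamp, value) readings into one value per day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        by_day[day].append(value)
    return {day: sum(values) / len(values) for day, values in by_day.items()}

# Hypothetical hourly source, conformed to the grain of a daily source.
hourly = [
    ("2013-03-20T09:00", 10.0),
    ("2013-03-20T10:00", 14.0),
    ("2013-03-21T09:00", 8.0),
]
daily = hourly_to_daily(hourly)
```

Going the other way, from daily to hourly, is not possible without extra assumptions, so the coarser source usually dictates the common grain.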

9 ETL: Semantic Mappings
• Spatio-temporal mappings
    Locations in space-time
• And even more complex mappings
    May require the use of ontologies.
        shared vocabularies
        knowledge structures
        models of reality
        etc.

10 Dimensional Modeling
• Fact tables
    Contain values that are measures, usually numeric.
        Example: the number of sales
• Dimension tables
    Contain the context for the measures.
        Examples: time, location, product
    Dimensions are usually grouped and hierarchical.
        Example: western locations, eastern locations
        Example: yearly, quarterly, monthly, weekly, daily, hourly
    Often denormalized for query performance.
        Many queries, few updates.

11 Dimensional Modeling
• Design criteria
    What are the facts? What are we measuring?
        Example: number of sales
    What is the grain, or granularity, of the facts?
        Determined by the dimensions.
        All measurements in a fact table must be at the same grain.
        Example: sales figures collected at the point of sale
    What are the dimensions? What context do we need to provide for the measures in the fact table?
        Examples: stores, dates, products

12 Dimensional Modeling
• Implementation
    Star schema
    Measures: number of units sold
    Dimensions: date, store, product
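One way such a star schema might look, sketched with Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions. Only the measure (units sold) and the three dimensions (date, store, product) come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: denormalized context for the measures.
    CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, quarter TEXT, month TEXT);
    CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);

    -- Fact table at the center of the star: one row per date, store, and product.
    -- SQLite only enforces these references when PRAGMA foreign_keys is on.
    CREATE TABLE sales_fact (
        date_key    INTEGER REFERENCES date_dim(date_key),
        store_key   INTEGER REFERENCES store_dim(store_key),
        product_key INTEGER REFERENCES product_dim(product_key),
        units_sold  INTEGER NOT NULL
    );

    -- Made-up sample data.
    INSERT INTO date_dim VALUES (1, 'Q1', 'Jan'), (2, 'Q2', 'Apr');
    INSERT INTO store_dim VALUES (1, 'Toronto', 'Canada'), (2, 'Chicago', 'USA');
    INSERT INTO product_dim VALUES (1, 'computer'), (2, 'phone');
    INSERT INTO sales_fact VALUES (1, 1, 1, 10), (1, 2, 2, 7), (2, 1, 1, 4);
""")

# A typical query joins the fact table to a dimension and aggregates the measure.
units_by_city = conn.execute("""
    SELECT s.city, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
```

Every fact row is at the same grain, and each dimension is a single join away from the fact table.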

13 Online Analytical Processing (OLAP)
• A common type of business analysis.
    Also used to analyze scientific data.
• Visualize data in a multidimensional manner.
    Analytical processes that involve manipulating data along different dimensions.
    The OLAP cube.
• “What happened recently, and why?”
http://gerardnico.com/wiki/database/oracle/oracle_olap

14 Online Analytical Processing (OLAP)
• OLAP operations
    slice and dice
    drill up, drill down
    drill across, drill through
    pivot

15 Online Analytical Processing (OLAP)
• Slice
    View or manipulate the data along a subset of the dimensions.
    Consider only data from the first quarter.
http://www.csun.edu/~twang/595DM/Slides/Week6.pdf

16 Online Analytical Processing (OLAP)
• Dice
    View or manipulate the data within subsets of the ranges of the dimensions.
    Consider only data from Q1 and Q2 from only Toronto and Vancouver for only computers and home entertainment.
http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
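Slice and dice can be sketched on a tiny cube held in a Python dictionary. The cube layout (quarter, city, item), the cell values, and the function names are assumptions made for illustration. A real OLAP engine would not store a cube this way.

```python
# Hypothetical cube cells: (quarter, city, item) -> units sold.
cube = {
    ("Q1", "Toronto", "computer"): 10,
    ("Q1", "Vancouver", "phone"): 6,
    ("Q2", "Toronto", "home entertainment"): 8,
    ("Q3", "Chicago", "computer"): 5,
}

def slice_cube(cube, quarter):
    """Slice: fix the time dimension to one value, leaving city and item."""
    return {(city, item): units
            for (q, city, item), units in cube.items() if q == quarter}

def dice_cube(cube, quarters, cities, items):
    """Dice: keep cells whose coordinates fall in the chosen subsets."""
    return {key: units for key, units in cube.items()
            if key[0] in quarters and key[1] in cities and key[2] in items}

# Slice: only data from the first quarter.
q1 = slice_cube(cube, "Q1")

# Dice: Q1 and Q2, Toronto and Vancouver, computers and home entertainment.
sub = dice_cube(cube, {"Q1", "Q2"}, {"Toronto", "Vancouver"},
                {"computer", "home entertainment"})
```

The slice has one dimension fewer than the cube. The dice keeps all three dimensions but narrows the range of each.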

17 Online Analytical Processing (OLAP)
• Drill down
    View or manipulate a dimension at a lower level of detail.
    Drill down on the time dimension from quarters to months.
http://www.csun.edu/~twang/595DM/Slides/Week6.pdf

18 Online Analytical Processing (OLAP)
• Drill up
    “Roll up” (aggregate) data to a higher level along a dimension.
    Sum up the cities by country.
http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
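Drilling down and drilling up are the same aggregation run at different levels of a dimension's hierarchy. A minimal sketch, with made-up facts and hypothetical hierarchy tables (months to quarters, cities to countries):

```python
from collections import defaultdict

# Hypothetical facts at the finest grain kept here: (month, city, units sold).
facts = [
    ("Jan", "Toronto", 4),
    ("Feb", "Toronto", 6),
    ("Jan", "Chicago", 3),
    ("Apr", "Vancouver", 5),
]

# Hypothetical hierarchies for the time and location dimensions.
QUARTER_OF = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}
COUNTRY_OF = {"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"}

def total_by(facts, key):
    """Sum units after mapping each fact's (month, city) to a grouping key."""
    totals = defaultdict(int)
    for month, city, units in facts:
        totals[key(month, city)] += units
    return dict(totals)

by_quarter = total_by(facts, lambda month, city: QUARTER_OF[month])
by_month = total_by(facts, lambda month, city: month)               # drill down: quarters to months
by_country = total_by(facts, lambda month, city: COUNTRY_OF[city])  # drill up: cities to countries
```

Drilling down is only possible because the facts are stored at a grain at least as fine as months. Detail that was never collected cannot be recovered.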

19 Online Analytical Processing (OLAP)
• Drill across
    Integrate data from more than one fact table.
• Drill through
    Access the database tables that underlie the OLAP cube.
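Drill across can be sketched as combining measures from two fact tables that share a dimension. The two tables below (units sold and units returned per product) are hypothetical. Treating a product that is missing from one table as 0 is an assumption that suits counts but not every measure.

```python
# Two hypothetical fact tables, already aggregated by the shared product dimension.
sales_units = {"computer": 14, "phone": 7}
returned_units = {"computer": 2}

def drill_across(*fact_tables):
    """Line up measures from several fact tables on their shared dimension key."""
    keys = sorted(set().union(*fact_tables))
    # Assumption: a key absent from a table contributes 0 for that measure.
    return {key: tuple(table.get(key, 0) for table in fact_tables) for key in keys}

combined = drill_across(sales_units, returned_units)
```

This only works when both fact tables use the same definition of the shared dimension, which is the job of conforming during ETL.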

20 Online Analytical Processing (OLAP)
• Pivot
    Rotate the axes (dimensions) to present a different view.
http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
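For a two-dimensional view, a pivot amounts to swapping rows and columns. A minimal sketch on a nested dictionary; the view layout and its values are made up:

```python
# Hypothetical two-dimensional view: rows are cities, columns are quarters.
view = {
    "Toronto": {"Q1": 10, "Q2": 8},
    "Vancouver": {"Q1": 6, "Q2": 9},
}

def pivot(table):
    """Swap the row and column dimensions of a nested-dict table."""
    rotated = {}
    for row_key, columns in table.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

# Same numbers, now with quarters as rows and cities as columns.
by_quarter = pivot(view)
```

The data itself does not change. Pivoting twice returns the original view.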

21 OLAP Summary
http://www.csun.edu/~twang/595DM/Slides/Week6.pdf

22 DW Summary
http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
Plus: dashboards and scorecards

23 Cognos
• Business intelligence (BI) tool from IBM.
    Queries and reports
    Dashboards and scorecards
    OLAP
    Data mining
        predictive analysis
• Cognos Business Intelligence 10 is available in the IBM Academic Cloud along with a sample data warehouse.
    I will create student accounts.
    Online tutorials
Cognos demo

