3. Vertical Data


First, a brief description of Data Warehouses (DWs) versus Database Management Systems (DBMSs).

C.J. Date recommended, circa 1980:
- Do transaction processing on a Database Management System (DBMS), rather than doing file processing on file systems.
- "Using a DBMS, instead of file systems, unifies data resources, centralizes control, standardizes usage, minimizes redundancy and inconsistency, and maximizes data value and usage."

Inmon et al., circa 1990:
- "Buy a separate Data Warehouse (DW) for long-running queries and data mining" (separate from the DBMS used for transaction processing).
- "Double your hardware! Double your software! Double your fun!"

Section 3 # 0

Data Warehouses (DWs) vs. Database Management Systems (DBMSs)

What happened?
- Inmon's idea was a great marketing success,
- but it signaled a great Concurrency Control Research & Development (CC R&D) failure: CC R&D people had failed to integrate transaction processing and query processing (also known as OnLine Transaction Processing (OLTP) and OnLine Analytic Processing (OLAP), that is, update and read workloads) in one system with acceptable performance.
- The marketing of Data Warehouses was so successful that nobody noticed the failure (or seemed to mind paying double).
- Most enterprises now have a DW separate from their DBMS.

Section 3 # 0.1

Some still hope that DWs and DBs will one day be unified again. The industry may eventually demand it; already there is research on real-time updating of Data Warehouses.

For now, let's just focus on DATA. In data processing you immediately run up against two curses:
- Curse of cardinality: solutions don't scale well with respect to record volume. "Files are too deep!"
- Curse of dimensionality: solutions don't scale well with respect to attribute dimension. "Files are too wide!"

The curse of cardinality is a problem in both the horizontal and the vertical world. In the horizontal world it was disguised as the "curse of the slow join": we decompose relations to get a good design (e.g., 3rd normal form), but then pay for it by needing many slow joins to get the answers we need.

Section 3 # 0.2

Techniques to address these curses

- Horizontal Processing of Vertical Data (HPVD), instead of the ubiquitous Vertical Processing of Horizontal (record-oriented) Data (VPHD).
- Parallelizing the processing engine:
  - Parallelize the software engine on clusters of computers.
  - Parallelize the greyware engine on clusters of people (i.e., enable visualization and use the web...).

Again, we need better techniques for data analysis, querying and mining because of:
- Parkinson's Law: data volume expands to fill available data storage.
- Moore's Law (storage analogue): available storage doubles roughly every 9 months!

Section 3 # 2
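As a rough illustration of the HPVD idea, the sketch below stores one numeric attribute vertically, as one bit vector per bit position, and answers a count query by ANDing those bit vectors across all records at once instead of scanning the records one by one. The function names (`to_bit_slices`, `count_equal`), the sample values, and the use of uncompressed Python integers as bit vectors are assumptions made for illustration; this is not the Ptree structure that the slides define later.

```python
def to_bit_slices(values, width):
    """Return `width` bit vectors (Python ints), most significant bit first.

    Bit i of slice j is 1 when record i has a 1 in that bit position.
    """
    slices = []
    for pos in range(width - 1, -1, -1):
        vec = 0
        for row, value in enumerate(values):
            if (value >> pos) & 1:
                vec |= 1 << row
        slices.append(vec)
    return slices


def count_equal(slices, target, n_rows):
    """Count records equal to `target` by ANDing slices or their complements."""
    width = len(slices)
    all_rows = (1 << n_rows) - 1          # mask with one bit per record
    result = all_rows
    for j, vec in enumerate(slices):
        pos = width - 1 - j
        if (target >> pos) & 1:
            result &= vec                 # target bit is 1: keep rows with a 1
        else:
            result &= ~vec & all_rows     # target bit is 0: keep rows with a 0
    return bin(result).count("1")


# Hypothetical 3-bit "red" band values for six pixels.
red = [5, 3, 5, 7, 0, 5]
slices = to_bit_slices(red, width=3)
print(count_equal(slices, 5, len(red)))   # prints 3
```

Each pass of the loop touches every record through a single bitwise operation, which is the sense in which the processing runs horizontally across vertically stored data.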

A few HPVD successes:

1. Precision Agriculture

Yield prediction: the Remotely Sensed Imagery (RSI) dataset consists of an aerial photograph (RGB TIFF image taken ~July) and a synchronized crop yield map taken at harvest; thus, 4 feature attributes (B, G, R, Y) and ~100,000 pixels.

[Figure: TIFF image and yield map]

Producers are able to analyze the color intensity patterns from aerial and satellite photos taken in mid-season to predict yield (find associations between electromagnetic reflection and yield). E.g., "hi_green & low_red → hi_yield". That is very intuitive.

A stronger association, "hi_NIR & low_red → hi_yield", found through HPVD data mining, allows producers to take and query mid-season aerial photographs for low_NIR & high_red grid cells and, where low yield is anticipated, apply (top-dress) additional nitrogen.

Can producers use Landsat images of China to predict wheat prices before planting?

2. Infestation Detection (e.g., Grasshopper Infestation Prediction, again involving RSI)

Grasshoppers cause significant economic loss each year. Early infestation prediction is key to damage control. Pixel classification on remotely sensed imagery holds much promise for early detection.

Pixel classification (signaturing) has many, many applications: pest detection, flood monitoring, fire detection, wetlands monitoring, …

Section 3 # 3
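To make the strength of an association such as "hi_green & low_red → hi_yield" concrete, the sketch below computes the two standard association-rule measures, support and confidence, over a handful of pixels. The pixel readings, the `HI`/`LOW` thresholds, and the `rule_stats` helper are all hypothetical and chosen only for illustration; they are not taken from the RSI dataset described in the slides.

```python
def rule_stats(pixels, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n_ante = sum(1 for p in pixels if antecedent(p))
    n_both = sum(1 for p in pixels if antecedent(p) and consequent(p))
    support = n_both / len(pixels)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical (green, red, yield) readings on a 0-255 scale.
pixels = [
    (200, 40, 210),
    (190, 60, 180),
    (80, 200, 50),
    (210, 50, 90),
    (70, 180, 60),
    (220, 30, 230),
    (100, 120, 110),
    (205, 45, 200),
]
HI, LOW = 150, 100  # assumed cut-offs for "hi" and "low"

support, confidence = rule_stats(
    pixels,
    antecedent=lambda p: p[0] >= HI and p[1] <= LOW,  # hi_green & low_red
    consequent=lambda p: p[2] >= HI,                  # hi_yield
)
print(support, confidence)  # prints 0.5 0.8
```

Comparing these two numbers for competing rules is one simple way to judge which association is stronger.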

3. Sensor Network Data HPVD

Micro- and nano-scale sensor blocks are being developed for sensing:
- Biological agents
- Chemical agents
- Motion detection
- Coatings deterioration
- RF-tagging of inventory (RFID tags for Supply Chain Management)
- Structural materials fatigue

There will be trillions++ of individual sensors creating mountains of data, which can be data mined using HPVD (maybe it shouldn't be called a success yet?).

Section 3 # 4

4. A Sensor Network Application: CubE for Active Situation Replication (CEASR)

Nano-sensors are dropped into the situation space. Wherever a threshold level is sensed (chemical, biological, thermal, etc.), a ping is registered in a compressed vertical data structure for that location. (The compressed vertical data structure is a Ptree; a detailed definition of Ptrees is coming up later.)

Each energized nano-sensor transmits a ping (its location is triangulated from the ping). These locations are then translated to 3-dimensional coordinates at the display, and the corresponding voxel on the display lights up. This is the expendable, one-time, cheap sensor version. A more sophisticated CEASR device could sense and transmit the intensity levels, lighting up the display voxel with the same intensity.

The single compressed vertical data structure (Ptree) containing all the information is transmitted to the cube, where the pattern is reconstructed (uncompressed and displayed). A clear plexiglass cube, with embedded nano-LEDs at each voxel (volume pixel), displays the situation to the user. The soldier sees a replica of the sensed situation prior to entering the space.

[Figure: situation space with nano-sensors, carrier, and the CEASR display cube]

Section 3 # 5
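The register, compress, transmit, and reconstruct flow described for CEASR can be sketched as follows. Since the Ptree is only defined later in the slides, this sketch substitutes a plain run-length encoding as a stand-in for the compressed vertical structure; the function names, the grid shape, the sensor readings, and the threshold are all hypothetical.

```python
def register_pings(readings, threshold, shape):
    """Flatten the 3-D grid to a bit list: 1 where a reading meets the threshold."""
    nx, ny, nz = shape
    bits = [0] * (nx * ny * nz)
    for (x, y, z), level in readings.items():
        if level >= threshold:
            bits[(x * ny + y) * nz + z] = 1
    return bits


def run_length_encode(bits):
    """Compress the bit list into (bit, run_length) pairs (stand-in for a Ptree)."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(b, n) for b, n in runs]


def run_length_decode(runs):
    """Rebuild the bit list at the display side."""
    bits = []
    for b, n in runs:
        bits.extend([b] * n)
    return bits


def lit_voxels(bits, shape):
    """Translate set bits back to (x, y, z) voxel coordinates to light up."""
    nx, ny, nz = shape
    return [(i // (ny * nz), (i // nz) % ny, i % nz)
            for i, b in enumerate(bits) if b]


shape = (2, 2, 2)                                  # hypothetical 2x2x2 cube
readings = {(0, 0, 1): 0.9, (1, 0, 0): 0.2, (1, 1, 1): 0.7}
bits = register_pings(readings, threshold=0.5, shape=shape)
packet = run_length_encode(bits)                   # what would be transmitted
print(lit_voxels(run_length_decode(packet), shape))  # prints [(0, 0, 1), (1, 1, 1)]
```

Sparse situations, where most voxels are dark, produce long runs of zeros, which is why a compressed representation suits this kind of data.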

5. Anthropology Application

Digital Archive Network for Anthropology (DANA): analyze, query and mine anthropological artifacts (shape, color, discovery location, …).

Section 3 # 6

What has spawned these successes? (i.e., What is Data Mining?)

- Querying is asking specific questions for specific answers.
- Data Mining is finding the patterns that exist in data (going into MOUNTAINS of raw data for the information gems hidden in that mountain of data).

[Figure: the data mining process. Labels on the diagram:]
- Raw data: must be cleaned of missing items, outliers, noise, errors
- Data Warehouse: cleaned, integrated, read-only, periodic, historical database
- Smart files
- Task-relevant Data Selection: feature extraction, tuple selection
- Data Mining: Classification, Clustering, Rule Mining
- Pattern Evaluation and Assay; visualization
- Loop backs

Section 3 # 7
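The cleaning step named above (removing missing items and outliers from raw data) might look like the following minimal sketch. The slides do not say how outliers are detected; the median-absolute-deviation rule, the cut-off `k`, the `clean` helper, and the sample readings are all assumptions made for illustration.

```python
from statistics import median


def clean(readings, k=5.0):
    """Drop missing values (None), then drop values far from the median.

    A value is kept when |value - median| <= k * MAD, where MAD is the
    median absolute deviation. The rule and k are illustrative choices.
    """
    present = [v for v in readings if v is not None]
    if not present:
        return []
    mid = median(present)
    mad = median(abs(v - mid) for v in present)
    if mad == 0:
        return present  # no spread to judge outliers against
    return [v for v in present if abs(v - mid) <= k * mad]


# Hypothetical raw sensor readings with one missing item and one outlier.
raw = [10.0, None, 11.0, 9.5, 250.0, 10.5]
print(clean(raw))  # prints [10.0, 11.0, 9.5, 10.5]
```

In a warehouse setting this kind of filter would run once, before the data is loaded into the cleaned, read-only store.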