Chapter 2: Data Models
Database Systems: Design, Implementation, and Management, Eleventh Edition, Coronel & Morris
Data Models
  - Big Concept definition: a school of thought about what a database should do, how it should work, and the technologies it should be based on.
  - Small Concept definition: a representation, usually graphical, of the structure of a database solution for a given business problem.
  - Both views of a data model are very important!
The Importance of Data Models (Small Concept definition)
  - Relatively simple representations, usually graphical, of complex real-world data structures
  - Facilitate interaction among the designer, the applications programmer, and the end user
  - End users have different views of, and needs for, data
  - A data model organizes data for various users
Data Model Basic Building Blocks (Small Concept definition)
  - Entity: anything about which data are to be collected and stored
  - Attribute: a characteristic of an entity
  - Relationship: describes an association among entities
      - One-to-many (1:M) relationship
      - Many-to-many (M:N or M:M) relationship
      - One-to-one (1:1) relationship
  - Constraint: a restriction placed on the data
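
The four building blocks above can be illustrated with a small object sketch. This is only an illustration under stated assumptions: the PAINTER and PAINTING entities, their attributes, and the "title is required" rule are hypothetical names chosen for the example and do not come from the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical PAINTER/PAINTING example; the names are illustrative, not from the slides.
class BuildingBlocksSketch {

    // Entity: something about which data are collected and stored.
    static class Painting {
        final int paintingId;   // attribute
        final String title;     // attribute

        Painting(int paintingId, String title) {
            // Constraint: a restriction placed on the data (here, a title is required).
            if (title == null || title.trim().isEmpty()) {
                throw new IllegalArgumentException("title must not be empty");
            }
            this.paintingId = paintingId;
            this.title = title;
        }
    }

    // A second entity that owns the "one" side of the relationship.
    static class Painter {
        final int painterId;    // attribute
        final String name;      // attribute
        // Relationship (1:M): one painter is associated with many paintings.
        private final List<Painting> paintings = new ArrayList<>();

        Painter(int painterId, String name) {
            this.painterId = painterId;
            this.name = name;
        }

        void addPainting(Painting painting) {
            paintings.add(painting);
        }

        int paintingCount() {
            return paintings.size();
        }
    }

    public static void main(String[] args) {
        Painter painter = new Painter(1, "Example Painter");
        painter.addPainting(new Painting(10, "First Work"));
        painter.addPainting(new Painting(11, "Second Work"));
        System.out.println(painter.name + " has " + painter.paintingCount() + " paintings");
    }
}
```

In a relational design the same 1:M relationship would normally be expressed by storing the painter's identifier in each painting row rather than by holding a list inside the painter.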
Business Rules
  - Brief, precise, and unambiguous descriptions of policies, procedures, or principles within a specific organization
  - Descriptions of operations that help to create and enforce actions within that organization’s environment
  - Sources of business rules:
      - Company managers
      - Policy makers
      - Department managers
      - Written documentation
          - Procedures
          - Standards
          - Operations manuals
      - Direct interviews with end users
The Evolution of Data Models (Big Concept definition)
The Relational Model
  - Developed by E.F. “Ted” Codd (IBM) in 1970
  - Considered ingenious but impractical at the time
      - Conceptually simple
      - Computers lacked the power to implement the relational model
  - Today, microcomputers can run sophisticated relational database software
The Relational Model (continued)
  - Relational Database Management System (RDBMS)
      - The most important advantage of the RDBMS is its ability to hide the complexities of the relational model from the user
The Relational Model (continued)
  - Tables (relations)
      - Each table is a matrix consisting of a series of row/column intersections
      - Tables are related to each other by sharing a common entity characteristic
      - A relational table is a purely logical structure
          - How data are physically stored in the database is of no concern to the user or the designer
          - This property became the source of a real database revolution
  - Relational diagram
      - Representation of the relational database’s entities, the attributes within those entities, and the relationships between those entities
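
The idea that tables are related by sharing a common characteristic can be sketched in plain Java. This is a rough illustration, not how an RDBMS is implemented: the AGENT and CUSTOMER tables, the shared agentCode column, and the join method are all hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical AGENT and CUSTOMER tables linked by a shared agentCode value.
class RelationalLinkSketch {

    // One row of the hypothetical AGENT table.
    static class Agent {
        final int agentCode;
        final String lastName;

        Agent(int agentCode, String lastName) {
            this.agentCode = agentCode;
            this.lastName = lastName;
        }
    }

    // One row of the hypothetical CUSTOMER table; agentCode is the common attribute.
    static class Customer {
        final int customerNumber;
        final String lastName;
        final int agentCode;

        Customer(int customerNumber, String lastName, int agentCode) {
            this.customerNumber = customerNumber;
            this.lastName = lastName;
            this.agentCode = agentCode;
        }
    }

    // Pair each customer with the agent whose agentCode matches (a simple nested-loop join).
    static List<String> join(List<Customer> customers, List<Agent> agents) {
        List<String> result = new ArrayList<>();
        for (Customer customer : customers) {
            for (Agent agent : agents) {
                if (customer.agentCode == agent.agentCode) {
                    result.add(customer.lastName + " is served by " + agent.lastName);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Agent> agents = Arrays.asList(new Agent(501, "Agent A"), new Agent(502, "Agent B"));
        List<Customer> customers = Arrays.asList(
                new Customer(10010, "Customer X", 502),
                new Customer(10011, "Customer Y", 501));
        for (String line : join(customers, agents)) {
            System.out.println(line);
        }
    }
}
```

The link between the two tables exists only through matching values, which is why the physical storage layout does not matter to the user or the designer.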
The Relational Model (continued)
The Relational Model (continued)
  - Rise to dominance due in part to its powerful and flexible query language
  - Structured Query Language (SQL) allows the user to specify what must be done without specifying how it must be done
The Entity Relationship Model
  - Widely accepted and adapted graphical tool for data modeling
  - Introduced by Peter Chen in 1976
  - Graphical representation of entities and their relationships in a database structure
The Entity Relationship Model (continued)
  - Entity relationship diagram (ERD)
      - Uses graphic representations to model database components
      - An entity is mapped to a relational table
  - An entity instance (or occurrence) is a row in a table
  - An entity set is a collection of like entities
  - Connectivity labels the types of relationships
  - A diamond is connected to the related entities through a relationship line
The Entity Relationship Model
Comparison of Representations
Big Data
  - Exponential growth in data
      - Business data
          - 2005: largest data warehouse was 100 TB
          - 2013: eBay “singularity system” was 40,000 TB
      - Consumer-generated data
          - Twitter generates over 8 TB of data every day
      - Machine-generated data
          - A Boeing engine generates 10 TB every 30 minutes (over 25,000 airline flights/day = a total of more than 1,000 TB/day)
          - The Large Hadron Collider generates 40 TB every second
Why Is Big Data Different?
  - Volume
  - Variety
  - Velocity
  - Data products: the data is the product
      - Google
      - Amazon
Conceptual Alternatives (NoSQL)
  - Key-Value Store
  - Document Database
Map Representations

    {
      "1": {
        "Title": "How to X",
        "Author": "Jane Doe",
        "Categories": {"foo": "", "bar": ""},
        "Content": "So you want to X? Here’s…"
      },
      "2": {
        "Title": "X Y About Z",
        "Author": "John Doe",
        "Content": "Here’s a list of Y about Z…"
      },
      "3": {
        "Title": "Why X Instead of Z",
        "Author": "Jane Doe",
        "Categories": {"foo": ""},
        "Content": "Some people think Z, but…"
      }
    }
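
The map representation above can be mirrored in code as nested maps: an outer map from key to document, where each document is itself a map of fields. The sketch below is a minimal in-memory illustration, not a real key-value or document database; the class and method names (DocumentStoreSketch, buildStore, countByAuthor) are hypothetical, and the Content fields are left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory stand-in for a document store: key -> document, document -> fields.
class DocumentStoreSketch {

    // Build one document; categories may be null because documents need not share a structure.
    static Map<String, Object> document(String title, String author, String... categories) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("Title", title);
        doc.put("Author", author);
        if (categories.length > 0) {
            Map<String, String> cats = new LinkedHashMap<>();
            for (String category : categories) {
                cats.put(category, "");
            }
            doc.put("Categories", cats);
        }
        return doc;
    }

    // Recreate the three documents from the slide (without their Content fields).
    static Map<String, Map<String, Object>> buildStore() {
        Map<String, Map<String, Object>> store = new LinkedHashMap<>();
        store.put("1", document("How to X", "Jane Doe", "foo", "bar"));
        store.put("2", document("X Y About Z", "John Doe"));
        store.put("3", document("Why X Instead of Z", "Jane Doe", "foo"));
        return store;
    }

    // Structure is checked at read time: each document is inspected for the field we need.
    static int countByAuthor(Map<String, Map<String, Object>> store, String author) {
        int count = 0;
        for (Map<String, Object> doc : store.values()) {
            if (author.equals(doc.get("Author"))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> store = buildStore();
        System.out.println("Documents by Jane Doe: " + countByAuthor(store, "Jane Doe"));
        System.out.println("Document 2 has categories: " + store.get("2").containsKey("Categories"));
    }
}
```

Note that document "2" has no Categories field at all, which a fixed relational table would have to represent with a null or a separate table.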
MapReduce
  - Combination of the Map and Reduce algorithms
      - Map: partition a task into several smaller subtasks, each assigned to a different worker process
      - Reduce: aggregate the results of the map workers into a single result set
  - Written primarily in Java
  - Uses predefined queries
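
The two steps described above can be sketched without any cluster software. The following is a single-process illustration of the idea, assuming a word-count task; it does not use the Hadoop API, the names (MapReduceSketch, map, reduce, run) are hypothetical, and the "workers" run one after another rather than in parallel.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process sketch of the Map and Reduce steps for counting words.
class MapReduceSketch {

    // Map step: one worker turns its chunk of text into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : chunk.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<String, Integer>(token, 1));
            }
        }
        return pairs;
    }

    // Reduce step: aggregate the outputs of all map workers into a single result set.
    static Map<String, Integer> reduce(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> pair : output) {
                totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    // Partition the task: each chunk stands in for the input of one worker.
    static Map<String, Integer> run(List<String> chunks) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = new ArrayList<>();
        for (String chunk : chunks) {
            mapOutputs.add(map(chunk)); // sequential here; a real framework would distribute this
        }
        return reduce(mapOutputs);
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("to be or", "not to be")));
        // prints {be=2, not=1, or=1, to=2}
    }
}
```

A real framework also groups the pairs by key between the two steps and ships them across machines; here that grouping is folded into the merge call inside reduce.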
Relational and NoSQL Differences
  - Structure requirements
      - RDBMS: Structured in storage
      - Big Data: Unstructured in storage; structure imposed during processing
  - Integrity constraints
      - RDBMS: Ensuring consistency and integrity requires overhead
      - Big Data: Data captured “as is”
  - Distributed limitations
      - RDBMS: ACID transactions require greater synchronization between distributed nodes
      - Big Data: Designed to be highly distributed over thousands of nodes without enforcement of strict consistency
Relational vs. NoSQL example
  - Word Count (count how many times each word appears in the data)
  - NoSQL partial solution (as provided in the Hadoop Tutorial; the job driver is omitted):

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;

    public class WordCount {

      // Mapper: emits (word, 1) for every token in an input line.
      public static class MapClass extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
          String line = value.toString();
          StringTokenizer itr = new StringTokenizer(line);
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
          }
        }
      }

      // Reducer: sums the counts collected for each word.
      public static class Reduce extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          output.collect(key, new IntWritable(sum));
        }
      }
    }

  - SQL solution:

    SELECT Word, COUNT(*) FROM Res_Ad_Words GROUP BY Word;