Published by Moses Walsh. Modified over 9 years ago.
1
BTM 382 Database Management
Chapter 2: Data models
Chapter 12.12-13: CAP and Hadoop
Chitu Okoli
Associate Professor in Business Technology Management
John Molson School of Business, Concordia University, Montréal
2
Models and data models
3
What is a model?
A model is a simplified way to describe or explain a complex reality.
A model helps people communicate and work simply yet effectively when talking about and manipulating complex real-world phenomena.
4
Scientific models
Image sources:
http://www.redorbit.com/education/reference_library/space_1/universe/2574692/geocentric_model/
http://hendrianusthe.wordpress.com/2012/06/21/heliocentric-vs-geocentric/
5
Conceptual models
Image sources:
http://info563.malagaclasses.info/strategy-it-2/
http://fivewhys.wordpress.com/2012/05/22/business-model-innovation/
6
Importance of Data Models
7
The Evolution of Data Models
8
Obsolete models: Hierarchical and network models
9
The Relational Model
Uses key concepts from mathematical relations (tables)
  “Relational” in “relational model” means “tables” (mathematical relations), not “relationships”
Table (relation)
  Matrix consisting of row/column intersections
  Relations have well-defined methods (queries) for combining their data members
  Selecting (reading) and joining (combining) data is defined based on rigorous mathematical principles
Relational database management system (RDBMS)
  Relations were originally too advanced for 1970s computing power
  As computing power increased, the simplicity of the model prevailed
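As a minimal sketch of selecting and joining, the example below uses Python's built-in sqlite3 module with an in-memory database. The customer and invoice tables, their columns, and the sample rows are assumptions made up for illustration; they do not come from the slides.

```python
import sqlite3

# In-memory database; table names, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice  (inv_id INTEGER PRIMARY KEY, cust_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO invoice  VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# Selecting: keep only the rows of one table that satisfy a condition.
big_invoices = conn.execute(
    "SELECT inv_id FROM invoice WHERE total >= 40 ORDER BY inv_id"
).fetchall()

# Joining: combine rows of two tables that share a common value (cust_id).
totals = conn.execute("""
    SELECT c.name, SUM(i.total)
    FROM customer AS c
    JOIN invoice AS i ON i.cust_id = c.cust_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(big_invoices)
print(totals)
```

Both queries state what data is wanted rather than how to loop over it; the RDBMS works out the retrieval steps.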
10
The Entity Relationship Model
Very detailed specification of relationships and their properties
Enhancement of the relational model
  Relations (tables) become entities
Entity relationship diagram (ERD)
  Uses graphic representations to model database components
  Many variations for notation exist
  In this class, we use the Crow’s Foot notation
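One way to see how a one-to-many relationship from an ERD might end up in a database is a foreign key. The sketch below assumes two hypothetical entities, painter and painting (one painter paints many paintings), and uses Python's sqlite3 module; the names are illustrative and not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE painter (
        painter_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    -- The "many" side carries a foreign key that points at the "one" side.
    CREATE TABLE painting (
        painting_id INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        painter_id  INTEGER NOT NULL REFERENCES painter(painter_id)
    );
    INSERT INTO painter  VALUES (1, 'Frida');
    INSERT INTO painting VALUES (100, 'Self-portrait', 1);
""")

# A painting that refers to a painter who does not exist is rejected.
try:
    conn.execute("INSERT INTO painting VALUES (101, 'Orphan', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

painting_count = conn.execute("SELECT COUNT(*) FROM painting").fetchone()[0]
print(orphan_rejected, painting_count)
```

The relationship drawn as a line in the diagram becomes a rule that the database itself enforces.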
11
The Object-Oriented Data Model (OODM)
Addresses the “impedance mismatch” problem of the ER model
  The ER model’s view of data (tables) and programmers’ view of data (objects in OOP) are completely different
  This mismatch makes database programming painful, especially for very complex data structures
OODM
  Uses object-oriented programming concepts to store data
  Objects represent nouns (entities or records)
  Objects have attributes (properties or fields) with values (data)
  Objects have methods (operations or functions)
  Classes group similar objects using a hierarchy and inheritance
In an OODBMS, data retrieval and storage closely mirror the data structures that programmers use, so programming complex objects is much easier than with the ER model
More advanced forms support the Extended Relational Data Model, Object/Relational DBMS, and XML data structures
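The object concepts listed above, and the mismatch with tables, can be sketched in a few lines of Python. The Person and Student classes and the to_rows helper are hypothetical names chosen for illustration; no OODBMS product is involved here.

```python
class Person:
    """An object represents a noun; its attributes hold values."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        # A method is an operation that belongs to the object.
        return self.name


class Student(Person):
    """Student inherits from Person: classes form a hierarchy."""

    def __init__(self, name):
        super().__init__(name)
        self.courses = []  # a nested collection, natural in an object

    def enroll(self, course):
        self.courses.append(course)

    def describe(self):
        return f"{self.name} takes {len(self.courses)} course(s)"


def to_rows(student):
    """Flatten one object into the rows that two tables would need.

    This hand-written translation step is the impedance mismatch.
    """
    student_row = (student.name,)
    enrollment_rows = [(student.name, course) for course in student.courses]
    return student_row, enrollment_rows


s = Student("Ana")
s.enroll("BTM 382")
s.enroll("BTM 200")
print(s.describe())
print(to_rows(s))
```

In a relational design the single Student object has to be split across a student table and an enrollment table and reassembled on every read; an object database aims to store the object much as the program already holds it.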
12
OODBMS vs. RDBMS https://youtu.be/kORTgvfHl4g
15
Big Data and NoSQL
16
Explaining Big Data https://youtu.be/7D1CQ_LOizA
17
Big Data
Volume
  Huge amounts of data (terabytes and petabytes), especially from the Internet
Velocity
  Organizations need to process the huge amounts of data rapidly, just as with smaller databases
Variety
  Wide variety of data, much of it unstructured and even changing in structure
18
How do you handle Big Data? The problem with RDBMSs
1. Scale up: use more powerful, expensive servers
  But an RDBMS is very computing intensive
  Big data would require much faster, more capable, more expensive computers, and even that is not good enough for big data
2. Scale out: use many cheap distributed servers
  But an RDBMS is slow with distributed processing
  Consistency is the biggest problem: guaranteeing consistency (which an RDBMS is great at) is slow
  Slow infrastructure isn’t good enough for big data
19
What is NoSQL? https://www.youtube.com/watch?v=qUV2j3XBRHc
20
NoSQL databases to the Big Data rescue
“NoSQL” means:
  Non-relational or non-RDBMS
  Also “Not only SQL”: a few in fact do support SQL
  It is not one model; it is many different models that are not relational data models
Scale out (many cheap distributed servers) instead of scale up
High scalability
  Support distributed database architectures
High availability
  Rapid performance for big data, including unstructured and sparse data
Fault tolerance
  Continue to work even if some servers in the cluster fail
Emphasis is on high performance (speed) rather than transaction consistency
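To make "scale out" and "fault tolerance" concrete, here is a toy in-memory sketch in Python: keys are spread over several servers by hashing, and each key is copied to a second server so a read still succeeds when one server is down. The Cluster class and its methods are hypothetical and greatly simplified; real NoSQL products use more elaborate partitioning and replication schemes.

```python
import hashlib


class Cluster:
    """Toy key-value store spread over several in-memory 'servers'."""

    def __init__(self, n_servers, replicas=2):
        if not 1 <= replicas <= n_servers:
            raise ValueError("replicas must be between 1 and n_servers")
        self.servers = [dict() for _ in range(n_servers)]
        self.alive = [True] * n_servers
        self.replicas = replicas

    def owners(self, key):
        # A stable hash decides which server holds the key; the next
        # server(s) in line hold the extra copies.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.servers)
        return [(start + i) % len(self.servers) for i in range(self.replicas)]

    def put(self, key, value):
        # Write to every owner that is currently up.
        for idx in self.owners(key):
            if self.alive[idx]:
                self.servers[idx][key] = value

    def get(self, key):
        # Read from the first owner that is up and has the key.
        for idx in self.owners(key):
            if self.alive[idx] and key in self.servers[idx]:
                return self.servers[idx][key]
        return None


cluster = Cluster(n_servers=4, replicas=2)
cluster.put("user:1", {"name": "Ana"})

# Simulate the failure of the first server responsible for this key.
primary = cluster.owners("user:1")[0]
cluster.alive[primary] = False

print(cluster.get("user:1"))  # still answered by the surviving copy
```

Adding capacity means adding another cheap server to the list rather than buying a bigger machine, which is the scale-out idea.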
21
Types of NoSQL databases
Image sources:
https://www.linkedin.com/pulse/20140823125259-38485481-nosql-databases-where-i-can-use?trk=sushi_topic_posts
http://www.monitis.com/blog/2011/05/22/picking-the-right-nosql-database-tool/
Also see: Picking the Right NoSQL Database Tool
22
Disadvantages of NoSQL
Complex programming is required
  “NoSQL” means you lose the ease of use and structural independence of SQL
  There is often no built-in implementation of relationships in the database: you have to program relationships yourself in code
Data is often inconsistent
  No guarantee of transaction integrity
  Entity integrity and referential integrity are not guaranteed
  The data you retrieve at any given moment might be wrong… but it will eventually become OK
  This is the price to pay for rapid performance in a distributed database
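What "program relationships yourself" can look like is sketched below in Python, with plain dictionaries standing in for two document collections. The collections, field names, and the orders_with_customer helper are hypothetical; the point is only that the application, not the database, has to match records and notice broken references.

```python
# Two "collections" of documents, keyed by id (illustrative data only).
customers = {
    "c1": {"name": "Ana"},
    "c2": {"name": "Ben"},
}
orders = {
    "o1": {"customer_id": "c1", "total": 50.0},
    "o2": {"customer_id": "c1", "total": 25.0},
    # Nothing in the store prevents a reference to a missing customer.
    "o3": {"customer_id": "c9", "total": 10.0},
}


def orders_with_customer(orders, customers):
    """Hand-written join: look up each order's customer in application code."""
    joined, orphans = [], []
    for order_id, order in sorted(orders.items()):
        customer = customers.get(order["customer_id"])
        if customer is None:
            # Referential integrity is the application's job here.
            orphans.append(order_id)
        else:
            joined.append((order_id, customer["name"], order["total"]))
    return joined, orphans


joined, orphans = orders_with_customer(orders, customers)
print(joined)
print(orphans)
```

In an RDBMS the same result would be a single JOIN, and the dangling order could have been rejected at insert time by a foreign key.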
23
The CAP theorem for distributed databases
CAP stands for:
  Consistency: All nodes see the same data
  Availability: A request always gets a response (success or failure)
  Partition tolerance: Even if a node fails or nodes lose contact with each other, the system can still function
A distributed database can guarantee only two of the three CAP characteristics, never all three at the same time
  However, over time, it might be able to provide all three
NoSQL databases are distributed, and so the CAP theorem restricts them to providing BASE, not ACID
Image source: PRWEB
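The trade-off can be illustrated with a toy two-node store in Python. When the nodes lose contact, a "CP" configuration refuses writes to stay consistent, while an "AP" configuration keeps accepting them and lets the copies disagree until contact returns. The TwoNodeStore class, its mode names, and the newest-version-wins repair rule are assumptions for illustration, not how any particular database works.

```python
class Replica:
    def __init__(self):
        self.value = None
        self.version = 0


class TwoNodeStore:
    """Toy store with two replicas and a switchable network partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [Replica(), Replica()]
        self.partitioned = False

    def write(self, node_id, value):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than let the copies diverge.
            return False
        local = self.nodes[node_id]
        local.version += 1
        local.value = value
        if not self.partitioned:
            other = self.nodes[1 - node_id]
            other.value, other.version = local.value, local.version
        return True

    def read(self, node_id):
        return self.nodes[node_id].value

    def heal(self):
        # Contact is restored: the copy with the higher version wins
        # (a simplification; ties are not resolved in any special way).
        self.partitioned = False
        newest = max(self.nodes, key=lambda r: r.version)
        for replica in self.nodes:
            replica.value, replica.version = newest.value, newest.version


ap = TwoNodeStore("AP")
ap.write(0, "v1")
ap.partitioned = True
ap_accepted = ap.write(0, "v2")   # accepted: the store stays available
ap_stale = ap.read(1)             # node 1 has not seen the new value yet
ap.heal()
ap_after = ap.read(1)             # both copies agree again

cp = TwoNodeStore("CP")
cp.write(0, "v1")
cp.partitioned = True
cp_accepted = cp.write(0, "v2")   # refused: availability is given up

print(ap_accepted, ap_stale, ap_after, cp_accepted)
```

The AP run shows the "over time" remark on the slide: during the partition one node returns stale data, and consistency returns once the partition heals.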
24
ACID versus BASE
A relational database guarantees the ACID properties: Atomicity, Consistency, Isolation, Durability
  In short, a set of SQL statements (called a transaction) will either all work or all fail, with no halfway success, and the result will not corrupt the database
  A price to pay: results might be somewhat slow
A NoSQL database does not guarantee ACID; it only guarantees BASE properties: Basically Available, Soft state, Eventual consistency
  In short, at any given moment, not everything might be consistent, but the database will eventually become consistent
  In return, these imperfect results are delivered fast
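The "all work or all fail" behaviour can be tried out with Python's sqlite3 module. In this sketch the account table, the CHECK constraint, and the transfer function are hypothetical names invented for illustration; the second UPDATE is made to fail on purpose so that the first one is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 20)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction: both updates happen, or neither."""
    try:
        # 'with conn' commits if the block succeeds and rolls back
        # if an exception escapes from it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


# The debit would make alice's balance negative, so the CHECK fails and
# the credit to bob that already ran is rolled back too.
failed = transfer(conn, "alice", "bob", 500)
after_failed = balances(conn)

succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)

print(failed, after_failed)
print(succeeded, after_success)
```

A BASE-style store would typically skip this coordination to answer faster, accepting that a reader might briefly see the credit without the debit.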
27
Summary and conclusions of various data models
28
Distributed Database Spectrum Table 12.8 Sacrifices availability to ensure consistency and isolation
29
Historical outline of data models
30
Which data model should you use?
Hierarchical or network models
  Obsolete: no one uses these any longer
Entity-relationship model
  Almost always: 90% or more of professional database situations
Object-oriented database
  When you have very complex data structures, you need rapid performance, and it helps achieve organizational objectives (Source: Barry & Associates, Inc)
  When data structures are so complex that organizing data as tables causes headaches in programming retrieval and storage
NoSQL
  When you have vast amounts of unstructured data and you need rapid performance
  When speed is more important than data consistency
31
Sources
Most of the slides are adapted from Database Systems: Design, Implementation and Management by Carlos Coronel and Steven Morris, 11th edition (2015), published by Cengage Learning. ISBN 13: 978-1-285-19614-5
Other sources are noted on the slides themselves