1
Cloud Computing and Architecture
NoSQL and MongoDB
2
Henrik Bærbak Christensen
Big Data
New services need to store and query very large data sets
  Google, Amazon, Twitter, Netflix, …
Another issue is scalability
  Hyper-growth in the number of users and traffic
  Do not want to go there!
3
Can an RDBMS do it?
Performance: queries do not scale linearly in the size of the data set
  10,000 rows ~ 1 second
  1 billion rows ~ 24 hours
Answer: No [Jacobs (2009)]
4
Why do RDBMs become slow?
Jacobs’ paper explains the details. It boils down to:
The memory hierarchy
  Cache, RAM, Disk, (Tape)
  Each step has a big performance penalty
The reading pattern
  Sequential reads are faster than random reads
  Tape: obvious, but even for RAM this is true!
RDBs do not respect the underlying hardware
5
Normalization
It becomes even worse due to the classic RDBMS virtue of normalization:
  A user table and a transaction table
  Big mess of random reads on disk/RAM
6
Jacobs’ statement
7
Denormalization technique
Make aggregate objects
  Allows much faster access
  Liability: much more space intensive
And! Store data in the sequence in which it is to be queried…
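A minimal Python sketch of the idea, assuming hypothetical user and transaction records (the field names and values are illustrative and not taken from Jacobs' paper). The normalized form has to match rows across two collections, while the denormalized aggregate embeds each user's transactions in the user object, stored in the order they will be read.

```python
# Normalized form: two "tables"; a query has to match rows by key.
users = [
    {"user_id": 1, "name": "Alice"},
    {"user_id": 2, "name": "Bob"},
]
transactions = [
    {"tx_id": 10, "user_id": 1, "amount": 25},
    {"tx_id": 11, "user_id": 2, "amount": 40},
    {"tx_id": 12, "user_id": 1, "amount": 15},
]


def total_spent_normalized(user_id):
    """Scan the whole transaction table to find one user's rows (a join)."""
    return sum(t["amount"] for t in transactions if t["user_id"] == user_id)


def denormalize(users, transactions):
    """Build one aggregate object per user with the transactions embedded."""
    aggregates = {u["user_id"]: {**u, "transactions": []} for u in users}
    # Store the embedded rows in the sequence they will be queried (by tx_id).
    for t in sorted(transactions, key=lambda t: t["tx_id"]):
        aggregates[t["user_id"]]["transactions"].append(
            {"tx_id": t["tx_id"], "amount": t["amount"]}
        )
    return aggregates


def total_spent_denormalized(aggregates, user_id):
    """One key lookup, then a sequential read of the embedded list."""
    return sum(t["amount"] for t in aggregates[user_id]["transactions"])


aggregates = denormalize(users, transactions)
```

The liability named on the slide is visible here as well: the aggregate repeats data that the normalized tables hold only once, so it costs more space and must be rebuilt or updated when transactions change.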
8
NoSQL Databases
A new take on storing and querying big data
9
Key features
NoSQL: “Not Only SQL” / Not Relational
Horizontal scaling of simple operations over many servers
Replication and partitioning of data over many servers
Simple call-level interface
Weaker concurrency model than ACID
Efficient use of RAM and distributed indexes for storage
Ability to dynamically add new attributes to records
Architectural drivers:
  Performance
  Scalability
[Cattell, 2010]
10
Clarifications
NoSQL focuses on
Simple operations
  Key lookup, read/write of one or a few records
  Opposite: complex joins over many tables (SQL)
  (NoSQL generally denormalizes and thus does not support joins)
Horizontal scaling
  Many servers that share neither RAM nor disk
  Commodity hardware
  Cheap but more prone to failures
11
NoSQL DB Types
12
Basic types Four types
13
Key-Value stores
Basically a Java HashMap<K,V>()
Typically RAM based, with periodic flushing to disk
And if you lose the key, you are lost
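As a rough illustration of "RAM based, with periodic flushing to disk", here is a small Python sketch. The class `KeyValueStore`, its `flush_every` parameter, and the JSON file format are assumptions made for this example and do not describe any particular product; "periodic" is simplified to "every N writes" rather than a timer.

```python
import json
import os
import tempfile


class KeyValueStore:
    """In-memory dict that is flushed to a JSON file every few writes."""

    def __init__(self, path, flush_every=3):
        self.path = path
        self.flush_every = flush_every
        self._data = {}
        self._writes_since_flush = 0
        if os.path.exists(path):
            # Recover whatever was flushed before the last shutdown or crash.
            with open(path) as f:
                self._data = json.load(f)

    def put(self, key, value):
        self._data[key] = value
        self._writes_since_flush += 1
        if self._writes_since_flush >= self.flush_every:
            self.flush()

    def get(self, key, default=None):
        # The key is the only access path: no key, no value.
        return self._data.get(key, default)

    def flush(self):
        with open(self.path, "w") as f:
            json.dump(self._data, f)
        self._writes_since_flush = 0


path = os.path.join(tempfile.mkdtemp(), "store.json")
store = KeyValueStore(path, flush_every=2)
store.put("user:1", {"name": "Alice"})
store.put("user:2", {"name": "Bob"})    # second write triggers a flush
store.put("user:3", {"name": "Carol"})  # so far only in RAM

reloaded = KeyValueStore(path)  # simulates a restart after a crash
```

After the simulated restart, `reloaded` knows `user:1` and `user:2` but not `user:3`, which shows the durability trade-off of flushing only now and then.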
14
Document Stores
Store “documents”
  MongoDB: JSON objects
Stronger queries, also in document contents
Schema: any JSON object may be stored!
Atomic updates, otherwise no concurrency control
Supports
  Master-slave replication, automatic failover and recovery
  Automatic sharding
    Range-based, on a shard key (like zip code, CPR, etc.)
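Range-based sharding on a shard key can be sketched in a few lines of Python. This is only an illustration of the routing idea and not MongoDB's implementation or API: the class `RangeShardedCollection`, the split points, and the sample documents are all hypothetical.

```python
import bisect


class RangeShardedCollection:
    """Routes documents to shards by comparing a shard key to split points."""

    def __init__(self, split_points):
        # split_points [3000, 6000] gives three ranges:
        # key < 3000, 3000 <= key < 6000, key >= 6000
        self.split_points = sorted(split_points)
        self.shards = [[] for _ in range(len(self.split_points) + 1)]

    def shard_for(self, key):
        """Index of the shard whose range contains the key."""
        return bisect.bisect_right(self.split_points, key)

    def insert(self, document, shard_key="zip"):
        self.shards[self.shard_for(document[shard_key])].append(document)

    def find_by_key(self, key, shard_key="zip"):
        # Only one shard has to be searched for an exact shard-key match.
        shard = self.shards[self.shard_for(key)]
        return [d for d in shard if d[shard_key] == key]


people = RangeShardedCollection(split_points=[3000, 6000])
# Documents need not share a schema; only the shard key must be present.
people.insert({"zip": 2100, "name": "Anna"})
people.insert({"zip": 5000, "name": "Bo", "phone": "12345678"})
people.insert({"zip": 8000, "name": "Clara", "tags": ["student"]})
```

Because neighbouring key values land on the same shard, a range query on the shard key touches few servers; the price is that a skewed key distribution can overload a single shard.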
15
CAP Theorem
16
Reviewing ACID
Basic RDBMS teaching covers ACID
Atomicity
  Transaction: all or none succeed
Consistency
  DB is in a valid state before and after a transaction
Isolation
  N transactions executed concurrently = N executed in sequence
Durability
  Once a transaction has been committed, it will remain so (even in the event of power loss or a crash)
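Atomicity and consistency can be seen in a small example using Python's standard `sqlite3` module. The `account` table, the names, and the amounts are made up for the illustration; the `CHECK` constraint plays the role of the "valid state" rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # the validity rule
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; returns True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would break the CHECK rule, so the credit is undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
after_ok = balances(conn)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
after_failed = balances(conn)
```

In the failing transfer the credit to `bob` is executed first, yet it never becomes visible: either both updates take effect or neither does.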
17
CAP
Eric Brewer: you can only get two of the three:
Consistency
  A set of operations appears to have occurred all at once (client view)
Availability
  Operations will terminate in an intended response
Partition tolerance
  Operations will complete, even if components are unavailable
18
Horizontal Scaling: P taken
We have already taken P, so we have to relax either
  Consistency
  Availability
RDBMSs prefer consistency over availability
NoSQL prefers availability over consistency
  replacing it with eventual consistency
19
Achieving ACID
Two-phase commit
  All partitions pre-commit and report the result to the master
  If success, master tells each to commit; else roll back
Guarantees consistency, but availability suffers
Example
  Two partitions, each 99.9% available => 99.9% × 99.9% = 99.8% (+43 min down every month)
  Five partitions: 99.5% (36 hours down time in all)
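The protocol and the availability arithmetic can both be sketched in Python. The `Partition` class and its method names are hypothetical, and the sketch deliberately ignores timeouts and crashes; `combined_availability` assumes that partitions fail independently and that a commit needs all of them up at the same time.

```python
class Partition:
    """A hypothetical partition that votes in the pre-commit phase."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def pre_commit(self):
        self.state = "prepared" if self.can_commit else "failed"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def roll_back(self):
        self.state = "rolled back"


def two_phase_commit(partitions):
    """Master logic: commit everywhere only if every partition votes yes."""
    votes = [p.pre_commit() for p in partitions]  # phase 1: collect votes
    if all(votes):
        for p in partitions:                      # phase 2: commit
            p.commit()
        return True
    for p in partitions:                          # phase 2: roll back
        p.roll_back()
    return False


def combined_availability(per_partition, n):
    """All n partitions must be up at once, so availabilities multiply."""
    return per_partition ** n


healthy = [Partition("p1"), Partition("p2")]
committed = two_phase_commit(healthy)

one_bad = [Partition("p1"), Partition("p2", can_commit=False)]
aborted = two_phase_commit(one_bad)

two = combined_availability(0.999, 2)   # about 0.998
five = combined_availability(0.999, 5)  # about 0.995
minutes_per_month = 30 * 24 * 60
extra_downtime = ((1 - two) - (1 - 0.999)) * minutes_per_month
```

With these assumptions `extra_downtime` comes out at roughly 43 minutes per 30-day month, which matches the figure on the slide for two partitions.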
20
BASE
Replace ACID with BASE
  BA: Basically Available
  S: Soft state
  E: Eventually consistent
Availability is achieved by partial failures not leading to system failures
  In two-phase commit, what would the master do if one partition does not respond?
21
Eventual Consistency
So what does this mean?
  Upon a write, an immediate read may retrieve the old value, or not see the newest added item!
  Why? It gets data from a replica that is not yet updated…
Eventual consistency:
  Given a sufficiently long period of time with no updates, all replicas become consistent, and thus all reads consistently return the same data…
System is always available, but state is ‘soft’ / cached
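A toy simulation makes the stale read concrete. It is a sketch under simplifying assumptions: the class names are invented, every write goes to replica 0, and replication is modelled as an explicit `sync()` call instead of a background process.

```python
class Replica:
    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Writes go to one replica; the others catch up when sync() runs."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # updates not yet applied to the other replicas

    def write(self, key, value):
        self.replicas[0].data[key] = value  # acknowledged immediately
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # Always answers, even if this replica has not seen the latest write.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        """Propagate pending updates, in order, to every other replica."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("status", "old")
store.sync()
store.write("status", "new")

stale = store.read("status", replica_index=2)  # replica not yet updated
fresh = store.read("status", replica_index=0)  # replica that took the write

store.sync()  # a period with no further updates
converged = [store.read("status", i) for i in range(3)]
```

Here `stale` is still `"old"` while `fresh` is already `"new"`; once `sync()` has run with no further writes, every replica returns `"new"`.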
22
Discussion
Web applications
  Is it a problem that a Facebook update takes some minutes to appear for my friends?
23
Summary
RDBMSs have issues with ‘big data’
  Ignore the physical laws of hardware (random read)
  RDB model requires joins (random read)
NoSQL
  Class of alternative DB models and implementations
  Two main classes
    Key-value stores
    Document stores
CAP
  You can only get two of three
  NoSQL chooses A, and sacrifices C for eventual consistency