1 V 1.0 DBMAN 10 Non-traditional Databases szabo.zsolt@nik.uni-obuda.hu 1

2 Non-Traditional = schema-less
Many implementations, each working very differently and serving a specific need
Common aim: less complex storage, more flexibility, faster access
Usually also: arbitrary types of relations, or even relation-less structures

3 Comparison
Structure and type of data being kept:
– SQL/relational databases require a structure with defined attributes to hold the data, whereas NoSQL databases usually accept data without a fixed structure.
Querying:
– Regardless of their licences, relational databases all implement the SQL standard to a certain degree, so they can be queried using the Structured Query Language (SQL). NoSQL databases, on the other hand, each implement their own way of working with the data they manage.

4 Comparison
Scaling:
– Both solutions are easy to scale vertically (i.e. by increasing system resources). However, being more modern (and simpler) applications, NoSQL solutions usually offer much easier means to scale horizontally (i.e. by creating a cluster of multiple machines).
Reliability:
– When it comes to data reliability and guaranteed safe execution of transactions, SQL databases are still the better bet.

5 Comparison
Support:
– Relational database management systems have a decades-long history. They are extremely popular, and it is very easy to find both free and paid support. An issue is therefore much easier to solve than with the more recently popular NoSQL databases, especially if the solution in question is complex in nature (e.g. MongoDB).
Complex data keeping and querying needs:
– By nature, relational databases are the go-to solution for complex querying and data keeping needs. They are much more efficient and excel in this domain.

6 NoSQL advantages – “BIG DATA”
VVVC = Velocity, Variety, Volume, Complexity
High data velocity – lots of data coming in very quickly, possibly from different locations.
Data variety – storage of data that is structured, semi-structured and unstructured.
Data volume – data that involves many terabytes or petabytes in size.
Data complexity – data that is stored and managed in different locations or data centers.

7 Typical “BIG DATA” requirements
Continuous Data Availability & Real Location Independence & Better Architecture
– Easy replication, because of the simpler data format
Modern Transactional Capabilities (CAP vs ACID!)
– Consistency (all nodes see the same data at the same time)
– Availability (a guarantee that every request receives a response about whether it succeeded or failed)
– Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)
– Pick TWO!

8 Typical “BIG DATA” requirements
Flexible Data Models
– A NoSQL database is able to accept all types of data (structured, semi-structured, and unstructured) much more easily than a relational database, which relies on a predefined schema
– Faster operations on semi-structured / unstructured data
Analytics and Business Intelligence
– Ability to mine the data that is being collected
– “Real-time data warehousing”: no need for a middle layer, no group by and join queries
NO NEED FOR GROUP BY AND JOIN? HOW?
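One common answer to the question above is denormalization: related data is embedded in a single document, so reading it needs no join, and frequently needed aggregates can be stored next to the data. The following is only a minimal sketch in plain JavaScript; the order document and its field names are hypothetical and do not come from the slides.

```javascript
// Hypothetical order document: customer and line items are embedded,
// so no join against separate customer/item tables is needed to read it.
const order = {
  _id: 1001,
  customer: { name: "Alice", city: "Budapest" },
  items: [
    { product: "keyboard", price: 30, qty: 1 },
    { product: "cable", price: 5, qty: 3 },
  ],
};

// The total can be computed once at write time and stored with the document,
// which avoids running a GROUP BY over line items at read time.
order.total = order.items.reduce((sum, item) => sum + item.price * item.qty, 0);

console.log(order.customer.name, order.total);
```

The trade-off is that embedded copies have to be kept up to date by the application whenever the underlying data changes.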

9 Storage formats
HTML = document descriptor
XML, YAML = data descriptor
JSON = object descriptor

10 Storage formats
XML → many tools, fast and memory efficient, hard to read, hard to parse, XPath is official
YAML → few tools, 2x memory, easy to read, easy to parse
JSON → more and more tools, 2x memory, hard to read, easy to parse, JSONPath is unofficial
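To illustrate the "easy to parse" point for JSON, here is a small sketch in plain JavaScript. The record is hypothetical (it reuses the user_id / age fields that appear in the MongoDB slides); only the built-in JSON.stringify and JSON.parse functions are used.

```javascript
// Hypothetical record used only for illustration.
const user = { user_id: "abc123", age: 55, tags: ["admin", "staff"] };

// Serialize to JSON text (what gets stored or sent over the wire)...
const text = JSON.stringify(user);

// ...and parse it back into an object with a single built-in call.
const restored = JSON.parse(text);

console.log(text);
console.log(restored.tags[1]);
```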

11 NoSQL types
Graph database
– Data is stored in graphs
– OrientDB, Neo4j, etc.
– Google: Linked data, Linked open data
– RDF = Resource Description Framework
When to use
– Handling complex relational information
– Modelling and handling classifications
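The "classification" use case can be pictured as walking edges in a graph. The sketch below is plain JavaScript with a hypothetical IS-A hierarchy stored as an adjacency list; it is not the API of any graph database, only an illustration of the kind of traversal such systems are built to answer.

```javascript
// Hypothetical "IS-A" classification graph stored as an adjacency list.
const edges = new Map([
  ["poodle", ["dog"]],
  ["dog", ["mammal"]],
  ["cat", ["mammal"]],
  ["mammal", ["animal"]],
  ["animal", []],
]);

// Collect every node reachable from the start node (breadth-first traversal).
function ancestors(start) {
  const seen = new Set();
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    for (const next of edges.get(node) || []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return [...seen];
}

console.log(ancestors("poodle"));
```

In a relational schema the same question typically needs a recursive query or repeated self-joins.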

12 NoSQL types
Column store / wide-column stores
– Sounds like the inverse of a standard database
– Very high performance and a highly scalable architecture
– Apache Cassandra → fastest
When to use
– Keeping unstructured, non-volatile information
– Scaling

13 NoSQL types
Key-Value store
– Least complex, like Dictionary in C#
– Memcached vs MemcacheDB (cache vs storage)
– Oracle NoSQL Database (Eventually-Consistent)
When to use
– Caching
– Queueing
– Distributing information / tasks
– Keeping live information
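The caching use case can be sketched with a plain dictionary plus an expiry time per entry. The TtlCache class below is hypothetical and written in plain JavaScript; it does not reproduce the API of Memcached or any other product, and it takes the current time as an optional parameter only to keep the example deterministic.

```javascript
// Minimal in-memory key-value cache with per-entry expiry (hypothetical class).
class TtlCache {
  constructor() {
    this.store = new Map(); // key -> { value, expiresAt }
  }

  set(key, value, ttlMs, now = Date.now()) {
    this.store.set(key, { value, expiresAt: now + ttlMs });
  }

  get(key, now = Date.now()) {
    const entry = this.store.get(key);
    if (entry === undefined) return undefined;
    if (now >= entry.expiresAt) {
      this.store.delete(key); // expired entries are dropped lazily on read
      return undefined;
    }
    return entry.value;
  }
}

const cache = new TtlCache();
cache.set("session:42", "alice", 1000, 0); // stored at t=0, valid for 1000 ms
console.log(cache.get("session:42", 500)); // read at t=500, before expiry
console.log(cache.get("session:42", 1500)); // read at t=1500, after expiry
```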

14 NoSQL types
Document database
– Expands on the basic idea of key-value stores
– “Documents” contain data (using a storage format!) and each document is assigned a unique key
– Apache CouchDB, Lotus Notes
– MongoDB
– XML databases (DB2, SQL Server, Oracle, PostgreSQL)
When to use
– Nested information
– JavaScript/programming language friendly

15 MongoDB features
JSON-like documents (called BSON: Binary JSON)
Ad hoc queries that can include user-defined JavaScript functions
Indexing is similar to RDBMSes
Replication/Load balancing: high availability with replica sets; one replica set = two or more copies of the data
Biggest problem: only byte-wise comparison, no UTF8 order by / replace features!
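Why byte-wise comparison matters for sorting text can be shown in plain JavaScript, whose default sort compares UTF-16 code units and therefore behaves much like a byte-wise comparison for these characters. The names are hypothetical sample data, and the "hu" locale tag is an assumption chosen for the example. Later MongoDB releases added a collation option for locale-aware ordering; the sketch only illustrates the ordering difference itself.

```javascript
const names = ["Zoltán", "Ádám", "Béla"];

// Default sort compares UTF-16 code units, so "Á" (U+00C1) lands after "Z".
const byCodeUnit = [...names].sort();

// Locale-aware comparison orders accented letters next to their base letters.
const byLocale = [...names].sort((a, b) => a.localeCompare(b, "hu"));

console.log(byCodeUnit);
console.log(byLocale);
```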

16
SQL                            MongoDB
database                       database
table                          collection
row                            document or BSON document
column                         field
index                          index
table joins                    embedded documents and linking
primary key                    primary key (automatically set to the _id field)
aggregation (e.g. group by)    aggregation pipeline

17 MongoDB vs SQL
SQL: CREATE TABLE
MongoDB: Implicitly created when inserting the first record
db.users.insert( { user_id: "abc123", age: 55, status: "A" } )
or:
db.createCollection("users")

18 MongoDB vs SQL
SQL: ALTER TABLE... ADD...
MongoDB:
db.users.update( { }, { $set: { join_date: new Date() } }, { multi: true } )

19 MongoDB vs SQL
SQL: ALTER TABLE... DROP COLUMN...
MongoDB:
db.users.update( { }, { $unset: { join_date: "" } }, { multi: true } )
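Because each document carries its own fields, "adding a column" or "dropping a column" amounts to setting or removing a field on every document. The sketch below mimics that effect on a plain JavaScript array; the setOnAll and unsetOnAll helpers and the sample data are hypothetical and are not part of the MongoDB shell.

```javascript
// In-memory stand-in for the users collection (hypothetical data).
const users = [
  { user_id: "abc123", age: 55 },
  { user_id: "bcd001", age: 30 },
];

// Mimics update({}, { $set: fields }, { multi: true }): add fields to every document.
function setOnAll(docs, fields) {
  for (const doc of docs) Object.assign(doc, fields);
}

// Mimics update({}, { $unset: { name: "" } }, { multi: true }): drop a field everywhere.
function unsetOnAll(docs, name) {
  for (const doc of docs) delete doc[name];
}

setOnAll(users, { join_date: new Date(0) });
const afterSet = users.map((doc) => Object.keys(doc));

unsetOnAll(users, "join_date");
const afterUnset = users.map((doc) => Object.keys(doc));

console.log(afterSet, afterUnset);
```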

20 MongoDB vs SQL
SQL: INSERT INTO / UPDATE / DELETE
MongoDB:
db.users.insert( { user_id: "abc123", age: 55, status: "A" } )
db.users.remove( { status: "D" } )
db.users.update( { age: { $gt: 25 } }, { $set: { status: "C" } }, { multi: true } )
db.users.update( { status: "A" }, { $inc: { age: 3 } }, { multi: true } )

21 MongoDB vs SQL
SQL: SELECT... FROM
MongoDB:
db.users.find()
db.users.find().limit(5).skip(10)
db.users.find( { }, { user_id: 1, status: 1 } )
db.users.find( { status: "A" }, { user_id: 1, status: 1, _id: 0 } )
db.users.find( { age: { $gt: 25, $lte: 50 } } )
db.users.find( { user_id: /^bc/ } )
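To make the meaning of the find() filters and projections above concrete, the sketch below expresses a few of them as plain JavaScript array operations, with the rough SQL equivalent in the comments. The sample documents are hypothetical, and the array code only mirrors the query semantics; it says nothing about how MongoDB evaluates them.

```javascript
// In-memory stand-in for the users collection (hypothetical data).
const users = [
  { _id: 1, user_id: "abc123", age: 55, status: "A" },
  { _id: 2, user_id: "bcd001", age: 30, status: "A" },
  { _id: 3, user_id: "bce777", age: 22, status: "D" },
];

// find({ age: { $gt: 25, $lte: 50 } })  ~  WHERE age > 25 AND age <= 50
const ageRange = users.filter((u) => u.age > 25 && u.age <= 50);

// find({ user_id: /^bc/ })  ~  WHERE user_id LIKE 'bc%'
const prefixBc = users.filter((u) => /^bc/.test(u.user_id));

// find({ status: "A" }, { user_id: 1, status: 1, _id: 0 })
//   ~  SELECT user_id, status FROM users WHERE status = 'A'
const projected = users
  .filter((u) => u.status === "A")
  .map(({ user_id, status }) => ({ user_id, status }));

console.log(ageRange.map((u) => u._id));
console.log(prefixBc.map((u) => u._id));
console.log(projected);
```

Note that the anchored pattern /^bc/ matches only values that start with "bc", so "abc123" is not selected.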

22
SQL         MongoDB
WHERE       $match
GROUP BY    $group
HAVING      $match
SELECT      $project
ORDER BY    $sort
LIMIT       $limit
SUM()       $sum
COUNT()     $sum
join        No direct corresponding operator; use $unwind for somewhat similar functionality

23 MongoDB vs SQL
SQL: SELECT COUNT(*)...
MongoDB:
db.users.count()
or
db.users.find().count()
db.orders.aggregate( [ { $group: { _id: null, total: { $sum: "$price" } } } ] )
db.orders.aggregate( [ { $group: { _id: "$cust_id", total: { $sum: "$price" } } }, { $sort: { total: 1 } } ] )
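The second aggregate() call above groups orders by cust_id, sums price per group, and sorts ascending by the total. The same computation is sketched below in plain JavaScript over a hypothetical in-memory orders array, roughly corresponding to SELECT cust_id, SUM(price) AS total FROM orders GROUP BY cust_id ORDER BY total. It mirrors the result shape only and is not how the pipeline is executed by the server.

```javascript
// In-memory stand-in for the orders collection (hypothetical data).
const orders = [
  { cust_id: "A", price: 25 },
  { cust_id: "B", price: 10 },
  { cust_id: "A", price: 5 },
  { cust_id: "C", price: 40 },
];

// $group with _id: "$cust_id" and total: { $sum: "$price" }
const totals = new Map();
for (const o of orders) {
  totals.set(o.cust_id, (totals.get(o.cust_id) || 0) + o.price);
}

// Shape the result like aggregation output, then apply $sort: { total: 1 }
const result = [...totals]
  .map(([custId, total]) => ({ _id: custId, total }))
  .sort((a, b) => a.total - b.total);

console.log(result);
```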

24 MongoDB
Usually works well if bi-directional speed / reports are not that important
Practice has shown that a RamDisk + LOAD DATA INFILE + MyISAM is still faster than thousands of insert commands on MongoDB
With MongoDB, we lose the power of joins, and we lose the performance gain of group by queries...
We have to be careful when using it!

25 XML databases
“XML Enabled” vs “native XML” databases
The “big four” support the XML type in CLOB fields
– Typically an XML-enabled database is best suited where the majority of data are non-XML
– For datasets where the majority of data are XML, a native XML database is better suited.
Native XML database: BDB
– Defines a (logical) model for an XML document
– Has an XML document as its fundamental unit of (logical) storage
– Is not required to have any particular underlying physical storage model

26 XML databases
Querying documents via XQuery / XPath
Results are transformed via XSLT
These databases are no longer widespread, so they are not covered in more detail here.

27 List of questions
Types of data models
Basic units of the RDBMS systems
Types of relations between entities/tables in an RDBMS
Types of keys in RDBMS tables
Elements of an ER diagram
Purpose of normalization, anomalies
Normalization levels: base model.. BCNF
Verification of dependency preserving decomposition
Verification of lossless decomposition
SQL: list of sub-languages, commands, dialects, suffixes

28 List of questions
SQL: Purpose and usage of group by, CUBE, ROLLUP
SQL: Types of table joins (inner, right outer, left outer, full outer)
SQL: object types (table, view, user, role, cursor, procedure, function, trigger)
SQL: Constraint types
OLTP vs OLAP comparison: usage
OLTP vs OLAP comparison: normalized structure vs star
Transaction management, definition of the ACID principles
Storage models of an RDBMS

29 List of questions
RAID levels
Heap file vs indexed file
Index properties
Relational algebra and calculus: approach, principle
OOP vs RDBMS: examples of HAS-A, PART-OF, IS-A, INSTANCE-OF
Mapping OOP inheritance to tables
Principles of the ORM layer and the Repository Pattern
NoSQL advantages/requirements
NoSQL database types
Document Storage formats

30 Never forget!
“The requirement for consistency is the convergence of the coherency”
