Oracle Data Warehouse: Enabling new horizons for the Data Warehouse with Big Data. Alfred Schlaucher, Detlef Schroeder, DATA WAREHOUSE
Topics
Big Data: buzzword, or a new dimension and new possibilities?
Oracle's technology for storing unstructured and semi-structured mass data
The Cloudera framework
Connectors into the new world: Oracle Loader for Hadoop and HDFS
Big Data Appliance
Discovering new analysis horizons with Oracle R Enterprise
Big Data analyses with Endeca
Hive
Hive is an abstraction on top of MapReduce. It allows users to query data in the Hadoop cluster without knowing Java or MapReduce, using the HiveQL language, which is very similar to SQL. The Hive interpreter runs on a client machine: it turns HiveQL queries into MapReduce jobs and submits those jobs to the cluster. Note that this does not turn the cluster into a relational database server! It is still simply running MapReduce jobs; those jobs are created by the Hive interpreter.
Hive (cont'd)
Sample Hive query:

SELECT stock.product, SUM(orders.purchases)
FROM stock INNER JOIN orders ON (stock.id = orders.stock_id)
WHERE orders.quarter = 'Q1'
GROUP BY stock.product;
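To make concrete what the Hive interpreter generates behind the scenes, here is a toy Python sketch (not Hive's actual code) of the reduce-side join and aggregation that a query like the sample above compiles down to. The sample data is invented for illustration:

```python
from collections import defaultdict

# Invented sample data standing in for the stock and orders tables.
stock = [(1, "widget"), (2, "gadget")]               # (id, product)
orders = [(1, 5, "Q1"), (1, 3, "Q1"), (2, 7, "Q2")]  # (stock_id, purchases, quarter)

# Map phase: tag each record with its join key (the stock id).
# The WHERE clause is applied here as a map-side filter.
mapped = [(sid, ("stock", product)) for sid, product in stock]
mapped += [(sid, ("order", n)) for sid, n, q in orders if q == "Q1"]

# Shuffle phase: group all records sharing a join key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: join within each group, then sum purchases per product.
totals = defaultdict(int)
for key, values in groups.items():
    products = [v for tag, v in values if tag == "stock"]
    counts = [v for tag, v in values if tag == "order"]
    for p in products:
        if counts:  # inner join: emit only keys present on both sides
            totals[p] += sum(counts)

print(dict(totals))  # {'widget': 8}
```

The point is not the exact code but the shape: even a four-line HiveQL query expands into a full map/shuffle/reduce pass over the data.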
Pig
Pig is an alternative abstraction on top of MapReduce. It uses a dataflow scripting language called PigLatin. The Pig interpreter runs on the client machine: it takes the PigLatin script, turns it into a series of MapReduce jobs, and submits those jobs to the cluster. As with Hive, nothing magical happens on the cluster; it is still simply running MapReduce jobs.
Pig (cont'd)
Sample Pig script:

stock  = LOAD '/user/fred/stock' AS (id, item);
orders = LOAD '/user/fred/orders' AS (id, cost);
grpd   = GROUP orders BY id;
totals = FOREACH grpd GENERATE group, SUM(orders.cost) AS t;
result = JOIN stock BY id, totals BY group;
DUMP result;
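The dataflow style is easier to see when each PigLatin step is mirrored one-for-one in ordinary code. Here is a toy Python sketch of the same pipeline, with invented sample data in place of the two HDFS files:

```python
from collections import defaultdict

# Invented sample data standing in for the two input files.
stock = [(1, "widget"), (2, "gadget")]    # (id, item)
orders = [(1, 10.0), (1, 2.5), (2, 4.0)]  # (id, cost)

# grpd = GROUP orders BY id;
grpd = defaultdict(list)
for oid, cost in orders:
    grpd[oid].append(cost)

# totals = FOREACH grpd GENERATE group, SUM(orders.cost) AS t;
totals = {oid: sum(costs) for oid, costs in grpd.items()}

# result = JOIN stock BY id, totals BY group;
result = [(sid, item, totals[sid]) for sid, item in stock if sid in totals]

# DUMP result;
print(result)  # [(1, 'widget', 12.5), (2, 'gadget', 4.0)]
```

Each named relation in the script is just an intermediate dataset; the Pig interpreter decides how many MapReduce jobs are needed to materialize the chain.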
Flume and Sqoop
Flume provides a method to import data into HDFS as it is generated, rather than batch-processing it later; a typical example is log files from a Web server. Sqoop provides a method to import data from tables in a relational database into HDFS (or Hive). It does this very efficiently via a map-only MapReduce job. Sqoop can also go the other way, populating database tables from files in HDFS.
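Conceptually, Sqoop's map-only import splits the source table on a key range and lets each map task copy one slice into a delimited file in HDFS. A toy Python sketch of that idea, using sqlite3 as a stand-in for the real RDBMS and in-memory lists for the HDFS output files (table, ranges, and paths are invented):

```python
import sqlite3

# Stand-in for the source RDBMS (Sqoop would connect via JDBC).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, cost REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [(1, 10.0), (2, 4.0), (3, 7.5)])
db.commit()

# One "map task" per key-range split: read a slice of the table and
# emit comma-delimited lines, as a Sqoop mapper would write to HDFS.
def run_mapper(lo, hi):
    rows = db.execute(
        "SELECT id, cost FROM orders WHERE id BETWEEN ? AND ?", (lo, hi))
    return ["%d,%s" % (rid, cost) for rid, cost in rows]

part_0 = run_mapper(1, 2)  # would become e.g. /user/fred/orders/part-m-00000
part_1 = run_mapper(3, 4)  # would become e.g. /user/fred/orders/part-m-00001
print(part_0, part_1)
```

Because the splits are independent SELECTs, the copy parallelizes across mappers with no reduce phase at all, which is what makes the import efficient.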
Oozie
Oozie allows developers to create a workflow of MapReduce jobs, including dependencies between jobs. The Oozie server submits the jobs to the cluster in the correct sequence.
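The scheduling idea behind this is dependency-ordered submission: a job is only submitted once everything it depends on has finished. A minimal Python sketch of that logic (the job names and dependencies are invented; real Oozie workflows are defined in XML):

```python
# Toy dependency-ordered submission, mimicking what the Oozie server
# does with a workflow definition.
deps = {
    "import": [],                 # no prerequisites
    "clean":  ["import"],         # cleaning waits for the import
    "join":   ["import"],
    "report": ["clean", "join"],  # report waits for both upstream jobs
}

def submit_in_order(deps):
    done, order = set(), []
    while len(done) < len(deps):
        for job, prereqs in deps.items():
            if job not in done and all(p in done for p in prereqs):
                order.append(job)  # "submit" the job to the cluster
                done.add(job)
    return order

print(submit_in_order(deps))
```

Jobs with no unmet prerequisites could also be submitted in parallel; the essential guarantee is only that no job starts before its dependencies complete.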
HBase
HBase is the Hadoop database, a NoSQL datastore. It can store massive amounts of data: gigabytes, terabytes, and even petabytes in a table. It scales to provide very high write throughput, on the order of hundreds of thousands of inserts per second. It copes well with sparse data: tables can have many thousands of columns, even if most columns are empty for any given row. However, HBase has a very constrained access model: insert a row, retrieve a row, or do a full or partial table scan. Only one column, the row key, is indexed.
HBase vs. Traditional RDBMSs

                              RDBMS                          HBase
Data layout                   Row-oriented                   Column-oriented
Transactions                  Yes                            Single row only
Query language                SQL                            get/put/scan
Security                      Authentication/Authorization   TBD
Indexes                       On arbitrary columns           Row-key only
Max data size                 TBs                            PB+
Read/write throughput limits  1000s of queries/second        Millions of queries/second
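The constrained get/put/scan access model above can be illustrated with a toy in-memory sketch (the class and sample rows are invented, not HBase's API): the only indexed lookup is by row key, scans walk a sorted key range, and rows may carry different sparse column sets.

```python
# Toy sketch of HBase's access model: put a row, get a row by key,
# scan a key range. Nothing else is indexed.
class TinyHBase:
    def __init__(self):
        self.rows = {}  # key-sorted storage stands in for HBase regions

    def put(self, row_key, columns):
        # Insert or update; columns are free-form "family:qualifier" names.
        self.rows.setdefault(row_key, {}).update(columns)

    def get(self, row_key):
        return self.rows.get(row_key)  # the one indexed lookup: row key

    def scan(self, start=None, stop=None):
        # Full or partial table scan over the sorted row keys.
        for key in sorted(self.rows):
            if (start is None or key >= start) and (stop is None or key < stop):
                yield key, self.rows[key]

t = TinyHBase()
t.put("row1", {"cf:product": "widget"})
t.put("row2", {"cf:product": "gadget", "cf:qty": "7"})  # sparse: column sets differ
print(t.get("row1"))
print(list(t.scan(start="row2")))
```

Finding rows by any non-key column in this model means scanning the whole table, which is exactly the trade-off the comparison table captures under "Indexes".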
Contact and more information
Oracle Data Warehouse Community: become a member. Many free seminars and events. Download server: www.ORACLEdwh.de. Next German-language Oracle DWH conference: 19 and 20 March 2013, Kassel.