PANEL SENIOR BIG DATA ARCHITECT BD-COE

Confidential and proprietary. Copyright © 2012 Teradata Corporation.

When to Use Which? The best approach by workload and data type

Processing as a Function of Schema Requirements and Stage of Data Pipeline

Stage of data pipeline:
- Low Cost Storage and Fast Loading
- Data Pre-Processing, Refining, Cleansing
- "Simple math at scale" (Score, filter, sort, avg., count...)
- Joins, Unions, Aggregates
- Analytics (Iterative and data mining)
- Reporting

Schema requirements:
- Stable Schema
- Evolving Schema
- Format, No Schema

Engines:
- Teradata
- Teradata/Hadoop
- Aster
- Aster/Hadoop
- Aster (SQL + MapReduce Analytics)
- Aster (MapReduce Analytics)
- Hadoop

Workloads and data types:
- Financial Analysis, Ad-Hoc/OLAP
- Enterprise-Wide BI and Reporting
- Spatial/Temporal
- Active Execution
- Interactive Data Discovery
- Web Clickstream, Set-Top Box Analysis
- CDRs, Sensor Logs, JSON
- Social Feeds, Text, Image Processing
- Audio/Video Storage and Refining
- Storage and Batch Transformations
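As a rough illustration of the "simple math at scale" category named on the slide (score, filter, sort, avg., count), the Python sketch below applies those operations to a handful of made-up purchase records. The record fields, the threshold, and the sample values are assumptions for illustration only and do not come from the presentation; in practice the equivalent logic would run on one of the engines listed on the slide rather than in local Python.

```python
from statistics import mean

# Hypothetical purchase records; field names and values are illustrative only.
records = [
    {"customer": "a", "amount": 120.0},
    {"customer": "b", "amount": 35.5},
    {"customer": "a", "amount": 60.0},
    {"customer": "c", "amount": 210.0},
]


def simple_math(rows, min_amount):
    """Filter, sort, average, and count a list of purchase records."""
    # filter: keep only purchases at or above the threshold
    kept = [r for r in rows if r["amount"] >= min_amount]
    # sort: largest purchase first
    kept.sort(key=lambda r: r["amount"], reverse=True)
    return {
        "count": len(kept),                                        # count
        "avg": mean(r["amount"] for r in kept) if kept else None,  # avg
        "top": kept[0]["customer"] if kept else None,              # largest purchase
    }


summary = simple_math(records, min_amount=50.0)
print(summary)
```

Each step is a single pass or a sort over the records, which is why this class of work tends to parallelize well across many nodes.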

When to Use Which Data Engine? The best approach by workload and data type

Processing as a Function of Schema Requirements by Data

Processing:
- Low Cost Storage and Fast Loading
- Data Pre-Processing, Refining, Cleansing
- "Simple math at scale" (Score, filter, sort, avg., count...)
- Joins, Unions, Aggregates
- Reporting
- Analytics (Iterative and data mining)

Schema requirements:
- Stable Schema
- Evolving Schema
- Format, No Schema

Engines:
- EDW (SQL analytics)
- EDW/Hadoop
- A-DBMS (SQL + MapReduce Analytics)
- A-DBMS/Hadoop
- A-DBMS (MapReduce Analytics)
- Hadoop

Need Schema

Analytic_DBMS – Hadoop – EDW

Columns: Requirements | A-DBMS | Hadoop | EDW

Requirements:
- MapReduce integration
- Interactive user tools
- Complex analytics (e.g. time-series, graph, social network)
- UDF Multi-language support (Java, R, Python, Perl, SAS, scripts, Bash, C+)
- UDF Programming flexibility and ease
- UDF Performance
- Integrated data
- System management, WLM
- Labor costs
- Concurrent users

Rating scale: Excellent, Very Good, Good, Fair, Poor

Note: +¼ moon can mean years of investment

END