Big Data Intro.

Slides:



Advertisements
Similar presentations
An Information Architecture for Hadoop Mark Samson – Systems Engineer, Cloudera.
Advertisements

Hive: A data warehouse on Hadoop
Overview of Hadoop for Data Mining Federal Big Data Group confidential Mark Silverman Treeminer, Inc. 155 Gibbs Street Suite 514 Rockville, Maryland
© Hortonworks Inc Secure SQL Standard based Authorization for Apache Hive Thejas Page 1.
Module 1: Database and Instance. Overview Defining a Database and an Instance Introduce Microsoft’s and Oracle’s Implementations of a Database and an.
Raghav Ayyamani. Copyright Ellis Horowitz, Why Another Data Warehousing System? Problem : Data, data and more data Several TBs of data everyday.
Hive: A data warehouse on Hadoop Based on Facebook Team’s paperon Facebook Team’s paper 8/18/20151.
This presentation was scheduled to be delivered by Brian Mitchell, Lead Architect, Microsoft Big Data COE Follow him Contact him.
A Brief Overview by Aditya Dutt March 18 th ’ Aditya Inc.
Analytics Map Reduce Query Insight Hive Pig Hadoop SQL Map Reduce Business Intelligence Predictive Operational Interactive Visualization Exploratory.
 Introduction Introduction  Purpose of Database SystemsPurpose of Database Systems  Levels of Abstraction Levels of Abstraction  Instances and Schemas.
DWA Example Scenarios This presentation shows a number of the most common scenarios used with the Distributed Websydian Architecture. Note that there are.
Penwell Debug Intel Confidential BRIEF OVERVIEW OF HIVE Jonathan Brauer ESE 380L Feb
Hive Facebook 2009.
Enabling data management in a big data world Craig Soules Garth Goodson Tanya Shastri.
DATABASE CONNECTIVITY TO MYSQL. Introduction =>A real life application needs to manipulate data stored in a Database. =>A database is a collection of.
Windows 7 WampServer 2.1 MySQL PHP 5.3 Script Apache Server User Record or Select Media Upload to Internet Return URL Forward URL Create.
Impala. Impala: Goals General-purpose SQL query engine for Hadoop High performance – C++ implementation – runtime code generation (using LLVM) – direct.
ECMM6018 Enterprise Networking For Electronic Commerce Tutorial 6 CGI/Perl and databases.
DAT602 Database Application Development Lecture 1 Course Structure & Background knowledge.
Learn Hadoop and Big Data Technologies. Hadoop  An Open source framework that stores and processes Big Data in distributed manner on a large groups of.
Beyond Hadoop The leading open source system for processing big data continues to evolve, but new approaches with added features are on the rise. Ibrahim.
MSBIC Hadoop Series Hadoop & Microsoft BI Bryan Smith
BIG DATA. The information and the ability to store, analyze, and predict based on that information that is delivering a competitive advantage.
BIG DATA. Big Data: A definition Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database.
BI 202 Data in the Cloud Creating SharePoint 2013 BI Solutions using Azure 6/20/2014 SharePoint Fest NYC.
Data Analytics Challenges Some faults cannot be avoided Decrease the availability for running physics Preventive maintenance is not enough Does not take.
1 Gaurav Kohli Xebia Breaking with DBMS and Dating with Relational Hbase.
Hadoop Introduction. Audience Introduction of students – Name – Years of experience – Background – Do you know Java? – Do you know linux? – Any exposure.
Big Data-An Analysis. Big Data: A definition Big data is a collection of data sets so large and complex that it becomes difficult.
Introduction to Hadoop
SAS users meeting in Halifax
PROTECT | OPTIMIZE | TRANSFORM
CSCE 587: Big Data Analytics
By Chris immanuel, Heym Kumar, Sai janani, Susmitha
Unit 2 Hadoop and big data
Big Data –An Overview
INTRODUCTION TO PIG, HIVE, HBASE and ZOOKEEPER
Eric Shook Department of Geography Kent State University
Sqoop Mr. Sriram
A Warehousing Solution Over a Map-Reduce Framework
Hadoopla: Microsoft and the Hadoop Ecosystem
TABLE OF CONTENTS. TABLE OF CONTENTS Not Possible in single computer and DB Serialised solution not possible Large data backup difficult so data.
Hive Mr. Sriram
SQOOP.
ICT Database Lesson 1 What is a Database?.
IBM NETEZZA DBA online Training at GoLogica Technologies
Hadoop Clusters Tess Fulkerson.
Central Florida Business Intelligence User Group
07 | Analyzing Big Data with Excel
Ministry of Higher Education
Big Data - in Performance Engineering
QA Validation in Big Data
Introduction to PIG, HIVE, HBASE & ZOOKEEPER
Server & Tools Business
Introduction to Apache
Big Data Young Lee BUS 550.
Zoie Barrett and Brian Lam
Charles Tappert Seidenberg School of CSIS, Pace University
Cloud Computing for Data Analysis Pig|Hive|Hbase|Zookeeper
Dep. of Information Technology By: Raz Dara Mohammad Amin
Big Data: Four Vs Salhuldin Alqarghuli.
Big Data Analysis in Digital Marketing
Big DATA.
Big-Data Analytics with Azure HDInsight
Copyright © JanBask Training. All rights reserved Get Started with Hadoop Hive HiveQL Languages.
SQL Server 2019 Bringing Apache Spark to SQL Server
Pig Hive HBase Zookeeper
Big Data.
Presentation transcript:

Big Data Intro

Examples - Volvo Collect inf. of road profile Next time optimizing fuel consuming http://etd.dtu.dk/thesis/191250/oersted_dtu2647.pdf

Examples - Google car Collect all kind of information Self driving https://techcrunch.com/2015/05/15/google-self-driving- cars-mountain- view/?ncid=rss&utm_source=feedburner&utm_medium= feed&utm_campaign=Feed%3A+Techcrunch+%28TechCru nch%29&utm_content=Google+International

Examples - Formula 1 racing team ‘Red Bull‘ Several TB data in just one race 40-50 engineers analysing real time -> modify configuration http://www.redbullracing.com/car/rb10

Examples - Social medias Approx. 562.000.000 tweets / day 4.5 billion likes generated daily as of May 2013

What characterize Big Data Volume - The quantity of generated and stored data. Variety - The type and nature of the data. – structured / unstructured Velocity - The speed at which the data is generated and processed. Variability - Inconsistency of the data set can hamper processes to handle and manage it. Veracity - The quality of captured data can vary greatly, affecting accurate analysis

Big Data implementations Apache – Hadoop – most known Apache - https://hadoop.apache.org/ Sandbox with bundled tools Hortonworks – http://hortonworks.com/products/sandbox/ -- We use this Cloudera – http://www.cloudera.com/downloads.html Some others Oracle IBM – Watson – Beats Jeopardy world masters Microsoft

Hadoop architecture overview http://www.dineshonjava.com/2014/11/hadoop-architecture.html#.WJJBxxsrLDc

Full picture https://www.mapr.com/why-hadoop/game-changer/overview

Our Architecture HADOOP * User interface – Ambari * Analyzing / extracting tool – PIG, Hive * Map Reduce (processing file request) * Hadoop File System System in sandbox Linux (with a filesystem) eg. RedHat with ext2 HOST OS (with a filesystem) eg. Windows 10 with NTFS

Hive extract / insert data HIVE data storage -> upon HDFS Hive shell -> ’command prompt’ inside HIVE. Metastore -> inf about the DB e.g. the DB it self, tables, HiveQL -> SQL like interface Hive Clients e.g. ODBC, JDBC

Hive Metastore Holds information of different storage e.g. show database show tables describe ”table” -> show design of table

Hive Query Language -- HiveQL SQl like interface Create, insert, select Example Select * from xxxx where yy < 10