Hadoop, Hive, JSON, and Data! Oh, my!! TJay Belt

Presentation transcript:

Hadoop, Hive, JSON, and Data! Oh, my!! TJay Belt

Database Administrator at Imagine Learning

Thanks to our Sponsors! Yearly Partners Gold Sponsors

Overview
- Big Data ecosystem
- 30,000-foot view of our ecosystem
- Issues found along the way

Json (JavaScript Object Notation)  Lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.

JSON (JavaScript Object Notation)
{
  "_id": " ",
  "Revision": 12,
  "ModelData": {
    "GradeLevel": "Kindergarten",
    "FirstLanguage": "English"
  },
  "SetTheStageData": {
    "LastSetTheStageLibraryWords": 1,
    "LastSetTheStageTakeATest": 0
  }

JSON (JavaScript Object Notation)
  "TestInstances": [{
    "Product": "ILE",
    "Lesson": "30698aac-5a3d c-16de4ba9db70",
    "LessonBranch": "Main",
    "TestType": "PlacementTest",
    "TimeStarted": " T15:16: :00",
    "TimeCompleted": " T15:26: :00",
    "TestInstanceId": "1",
    "TestSectionInstances": [{
      "TestSection": "Letter Recognition",
      "TestQuestionInstances": [{
        "TestQuestion": "q43",
        "TimeStarted": " T15:17: :00",
        "TimeCompleted": " T15:17: :00",
        "TestOptionInstances": [{
          "ClickCount": 1, "IsSelected": false, "ResponseLatency": 0, "TestOption": "opt256"
        }, {
          "ClickCount": 1, "IsSelected": false, "ResponseLatency": 0, "TestOption": "opt258"
        }, {
          "ClickCount": 1, "IsSelected": false, "ResponseLatency": 0, "TestOption": "opt257"
        }, {
          "ClickCount": 1, "IsSelected": true, "ResponseLatency": -8467, "TestOption": "opt253"
        }, {
          "ClickCount": 1, "IsSelected": false, "ResponseLatency": 0, "TestOption": "opt255"
        }, {
          "ClickCount": 1, "IsSelected": false, "ResponseLatency": 0, "TestOption": "opt254"
        }]
      },
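
A record nested four arrays deep is awkward to query with table-oriented tools. One common approach, sketched below, is to flatten it into one row per test option. This is an illustration only, not necessarily what was done in the talk: `record` is a trimmed, syntactically complete stand-in for the sample above, the `_id` value is a hypothetical placeholder, and `flatten_options` and the output column names are invented for the example.

```python
from typing import Any, Dict, Iterator, List

# Trimmed stand-in for the slide's record. "placeholder-id" is hypothetical;
# the real _id value did not survive in the slides.
record: Dict[str, Any] = {
    "_id": "placeholder-id",
    "TestInstances": [{
        "TestType": "PlacementTest",
        "TestSectionInstances": [{
            "TestSection": "Letter Recognition",
            "TestQuestionInstances": [{
                "TestQuestion": "q43",
                "TestOptionInstances": [
                    {"ClickCount": 1, "IsSelected": False,
                     "ResponseLatency": 0, "TestOption": "opt256"},
                    {"ClickCount": 1, "IsSelected": True,
                     "ResponseLatency": -8467, "TestOption": "opt253"},
                ],
            }],
        }],
    }],
}


def flatten_options(doc: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per test option, copying parent fields down."""
    for test in doc.get("TestInstances", []):
        for section in test.get("TestSectionInstances", []):
            for question in section.get("TestQuestionInstances", []):
                for option in question.get("TestOptionInstances", []):
                    yield {
                        "doc_id": doc.get("_id"),
                        "test_type": test.get("TestType"),
                        "section": section.get("TestSection"),
                        "question": question.get("TestQuestion"),
                        "option": option.get("TestOption"),
                        "is_selected": option.get("IsSelected"),
                        "click_count": option.get("ClickCount"),
                    }


rows: List[Dict[str, Any]] = list(flatten_options(record))
for row in rows:
    print(row)
```

Using `.get(..., [])` at each level lets documents that lack a nested array pass through without raising, which matters when the schema varies from record to record.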

Blob Storage
- Reliable, cost-effective cloud storage for large amounts of unstructured data
- Microsoft Azure Cloud

MongoDB
- MongoDB (from "humongous") is a cross-platform, document-oriented database.
- It is classified as a NoSQL database: it eschews the traditional table-based relational structure in favor of JSON-like documents with dynamic schemas.
- This makes integrating data in certain types of applications easier and faster.

Hadoop
- Hadoop is a Java-based programming framework that supports the processing of large data sets in a distributed computing environment.

MapReduce
- MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
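
The programming model can be sketched in a few lines of single-process Python. This is an illustration of the map, shuffle, and reduce phases only, not Hadoop's implementation or API: the input tuples are hypothetical, and a real cluster would run each phase in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: (question, option, is_selected) tuples.
records: List[Tuple[str, str, bool]] = [
    ("q43", "opt256", False),
    ("q43", "opt253", True),
    ("q44", "opt300", True),
    ("q43", "opt253", True),
]


def map_phase(record: Tuple[str, str, bool]) -> Iterator[Tuple[str, int]]:
    """Map: emit (key, value) pairs; here, (option, 1) for each selected option."""
    _question, option, selected = record
    if selected:
        yield (option, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values that share a key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse each key's values into a single result."""
    return key, sum(values)


mapped = [pair for rec in records for pair in map_phase(rec)]
grouped = shuffle(mapped)
result = dict(reduce_phase(key, values) for key, values in grouped.items())
print(result)
```

Because each `map_phase` call looks at one record and each `reduce_phase` call looks at one key, both phases can be split across workers without coordination; only the shuffle moves data between them.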

Hive
- Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.
- It supports queries expressed in HiveQL, a SQL-like language. Hive automatically translates these queries into MapReduce jobs executed on Hadoop.
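
To give a feel for the kind of SQL-like summarization query described here, the sketch below runs an aggregate over a small table. Note the substitution: it uses SQLite from Python's standard library as a stand-in, not Hive, so nothing here is translated into MapReduce jobs. The table name, columns, and rows are hypothetical; a plain `SELECT ... GROUP BY` of this shape should look much the same in HiveQL.

```python
import sqlite3

# SQLite (in memory) stands in for Hive purely to show the query shape.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_options ("
    "test_question TEXT, test_option TEXT, is_selected INTEGER, click_count INTEGER)"
)

# Hypothetical rows, one per test option.
conn.executemany(
    "INSERT INTO test_options VALUES (?, ?, ?, ?)",
    [
        ("q43", "opt256", 0, 1),
        ("q43", "opt253", 1, 1),
        ("q44", "opt300", 1, 2),
    ],
)

# Summarize: options shown and total clicks per question.
query = """
SELECT test_question,
       COUNT(*)         AS options_shown,
       SUM(click_count) AS total_clicks
FROM test_options
GROUP BY test_question
ORDER BY test_question
"""
summary = conn.execute(query).fetchall()
conn.close()

for row in summary:
    print(row)
```

Conceptually, the `GROUP BY` key plays the role of the map output key and the aggregates play the role of the reduce step, which is why this style of query maps naturally onto MapReduce.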

What do we have?

Things we tried
- SQL Server JSON procs
- SlamData
- PowerQuery
- DocumentDB
- MongoDirector
- SQL Azure

Issues I encountered


Thank You!
TJay Belt
Cell: (801)
Blog: http://tjaybelt.blogspot.com
LinkedIn: www.linkedin.com/in/tjaybelt
Skype: tjaybelt
Google+: link
