3 days on January 16th, 17th & 18th, 2015, Santa Clara Convention Center, 5001 Great America Parkway, Santa Clara, CA 95054, United States.

Data to Analytics in 2 Hours Using Hue, Hive and Tableau Public

About me: Technology geek and data evangelist with deep-dive expertise in Big Data, decision support and operational systems.

Agenda: Using NYSE data, we will generate reports and a dashboard to perform a top-down analysis of trading volume for a year.
Targeted audience: architects, developers, analysts and almost every IT professional. You will learn about the open source tools, tips and techniques available for a quick turnaround of POCs.

Agenda:
– Understanding data and tools
– Gather data (eoddata.com)
– Prepare or format data
– Upload data to HDFS using Hue
– Process data using Hive
– Develop reports and dashboard using Tableau Public

Understanding data and tools: NYSE end-of-day (EOD) data and the company list. Tools:
– Apache Hadoop: HDFS (distributed, logical file system) and MapReduce (distributed batch computing framework)
– Apache Hue: web interface that consolidates the Hadoop ecosystem tools; useful for developers, testers and analysts
– Apache Hive: query language built on HDFS and MapReduce; used as a database that can complement or replace an existing data warehouse
– Tableau Public: reporting tool

Gather data: NYSE EOD data and company list (companylist) data.

Prepare or format data: NYSE EOD data is provided as one file per day, so loading it as-is runs into the "too many small files" problem. Concatenate the small files into larger ones (the best approach is to use partitioned tables). The company list is delimited by "," and this causes some issues (for example, when a value itself contains a comma), so change the delimiter to "|".
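A minimal Python sketch of both preparation steps, assuming the daily files share the same columns (and, when `has_header=True`, the same header line) and that the company list uses standard double-quote CSV quoting. The function names and the sample company row are hypothetical, not taken from the talk. The functions work on text streams, so they can be given open file handles or tested in memory. This only merges files; the partitioned-table approach the slide recommends is done in Hive instead.

```python
import csv
import io
from typing import Iterable, TextIO


def concatenate_daily(sources: Iterable[TextIO], dst: TextIO,
                      has_header: bool = False) -> int:
    """Append many small daily EOD files into one larger output.

    If has_header is True, every source is assumed to start with the same
    header line; it is written only once. Blank lines are skipped.
    Returns the number of data rows written.
    """
    rows = 0
    header_written = False
    for src in sources:
        lines = src.read().splitlines()
        if has_header and lines:
            if not header_written:
                dst.write(lines[0] + "\n")
                header_written = True
            lines = lines[1:]  # drop this file's header
        for line in lines:
            if line.strip():
                dst.write(line + "\n")
                rows += 1
    return rows


def comma_to_pipe(src: TextIO, dst: TextIO) -> None:
    """Rewrite a quoted, comma-delimited stream as '|'-delimited text.

    csv.reader honours double quotes, so a value such as
    "Agilent Technologies, Inc." stays one field. The output is unquoted;
    a literal '|' inside a value is escaped with a backslash.
    """
    writer = csv.writer(dst, delimiter="|", quoting=csv.QUOTE_NONE,
                        escapechar="\\", lineterminator="\n")
    for row in csv.reader(src):
        writer.writerow(row)


# In-memory example (hypothetical row); real use would pass open file handles.
pipe_out = io.StringIO()
comma_to_pipe(io.StringIO('"A","Agilent Technologies, Inc.","Health Care"\n'), pipe_out)
print(pipe_out.getvalue(), end="")  # prints: A|Agilent Technologies, Inc.|Health Care
```

Because values containing "|" are backslash-escaped, a Hive table over the converted file would declare `FIELDS TERMINATED BY '|'` and, if such values occur, a matching `ESCAPED BY '\\'`.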

Upload data to HDFS using Hue: in the File Browser, create 2 directories (nyse and companylist), then upload the files.
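Hue's File Browser is point-and-click. To script the same steps, the WebHDFS REST API is one option; this is a swapped-in technique and not part of the talk. The sketch below only builds the request URLs. The host name, user name, base directory, file names and the Hadoop 2.x default NameNode HTTP port 50070 are all assumptions, and `user.name` applies only to simple (non-Kerberos) authentication.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, path: str, op: str, user: str,
                port: int = 50070, **params: str) -> str:
    """Build http://<host>:<port>/webhdfs/v1<path>?op=<OP>&user.name=<user>&...

    user.name is the WebHDFS parameter for simple (non-Kerberos) auth.
    """
    if not path.startswith("/"):
        raise ValueError(f"HDFS path must be absolute: {path}")
    query = urlencode({"op": op, "user.name": user, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def upload_plan(host: str, user: str, base_dir: str,
                files_by_dir: dict) -> list:
    """Return (HTTP method, URL) pairs mirroring the Hue steps:
    one MKDIRS per directory, then one CREATE per file in it."""
    plan = []
    for name, files in files_by_dir.items():
        hdfs_dir = f"{base_dir.rstrip('/')}/{name}"
        plan.append(("PUT", webhdfs_url(host, hdfs_dir, "MKDIRS", user)))
        for fname in files:
            # CREATE is step 1 of 2: the NameNode answers with a redirect to a
            # DataNode, and the file bytes are then PUT to that redirect location.
            plan.append(("PUT", webhdfs_url(host, f"{hdfs_dir}/{fname}",
                                            "CREATE", user, overwrite="true")))
    return plan


# Hypothetical cluster, user and file names.
for method, url in upload_plan("namenode.example.com", "analyst", "/user/analyst",
                               {"nyse": ["NYSE_2014.txt"],
                                "companylist": ["companylist.txt"]}):
    print(method, url)
```

Each pair can be issued with, for example, `curl -i -X PUT "<url>"`. For CREATE, the NameNode replies with a 307 redirect whose `Location` header points at a DataNode, and the file content is sent there with a second request such as `curl -i -X PUT -T NYSE_2014.txt "<location>"`.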

Process data using Hive: create 2 tables, one for the NYSE EOD data and one for the company list; create a user-defined function (UDF) to transform the date into a sortable format; develop and execute the query and validate the results; create a stage table.
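The slide uses a Hive UDF to make dates sortable; Hive UDFs are commonly written in Java. As a swapped-in stand-in, here is the same conversion logic in Python, shaped for Hive's TRANSFORM streaming clause, which passes rows to an external script as tab-separated lines. The raw format `dd-Mon-yyyy` (e.g. `16-Jan-2015`) is an assumption about the downloaded eoddata.com files; change `fmt` to match what the files actually contain. The function names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Iterator


def to_sortable_date(raw: str, fmt: str = "%d-%b-%Y") -> str:
    """Convert a raw date such as '16-Jan-2015' (assumed format) into
    ISO 'yyyy-MM-dd', which sorts correctly as a plain string."""
    return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")


def transform_rows(lines: Iterable[str], date_col: int = 1,
                   sep: str = "\t") -> Iterator[str]:
    """Rewrite the date column of each tab-separated row; Hive's TRANSFORM
    clause streams rows to a script in this tab-separated shape."""
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        fields[date_col] = to_sortable_date(fields[date_col])
        yield sep.join(fields)


# As raw strings, '02-Feb-2015' sorts before '16-Jan-2015'; after conversion
# the order matches the calendar.
print(sorted(["16-Jan-2015", "02-Feb-2015"], key=to_sortable_date))
# prints ['16-Jan-2015', '02-Feb-2015']
```

To use this from Hive, a script would loop over `transform_rows(sys.stdin)` and print each line; it would be registered with `ADD FILE` and called through `SELECT TRANSFORM(...) USING 'python <script>'`. Note that `%b` matches English month abbreviations only in an English or C locale.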

Develop reports and dashboard using Tableau Public: download the data using Hue; determine the granularity for the report (we need to compute monthly volume per stock ticker in each sector); develop reports and a dashboard using Tableau Public. Filters, calculated fields and other features will be covered.
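The report granularity (monthly volume per ticker within each sector) amounts to a join on ticker plus a three-key group-by. A minimal in-memory Python sketch of that computation, assuming EOD rows already carry ISO `yyyy-MM-dd` dates (the sortable format from the Hive step) and the company list has been reduced to a ticker-to-sector dictionary. In the talk the heavy lifting happens in Hive; this only illustrates the arithmetic. Function names and the "Unknown" bucket for unmatched tickers are hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str, str]  # (sector, ticker, 'yyyy-MM')


def monthly_volume_by_sector(eod_rows: Iterable[Tuple[str, str, int]],
                             sector_of: Dict[str, str]) -> Dict[Key, int]:
    """Sum daily volume per (sector, ticker, month).

    eod_rows holds (ticker, 'yyyy-MM-dd', volume); sector_of maps ticker to
    sector from the company list. Tickers missing from the list are grouped
    under 'Unknown' (a hypothetical choice; an inner join would drop them).
    """
    totals: Dict[Key, int] = defaultdict(int)
    for ticker, iso_date, volume in eod_rows:
        month = iso_date[:7]  # 'yyyy-MM' prefix of an ISO date
        sector = sector_of.get(ticker, "Unknown")
        totals[(sector, ticker, month)] += int(volume)
    return dict(totals)


def as_sorted_rows(totals: Dict[Key, int]) -> List[Tuple[str, str, str, int]]:
    """Flatten to (sector, ticker, month, volume) rows, e.g. for a
    delimited text file that Tableau Public can open."""
    return sorted((s, t, m, v) for (s, t, m), v in totals.items())
```

In HiveQL terms this corresponds roughly to joining the EOD table to the company list on ticker and grouping by sector, ticker and month; the actual table and column names depend on the DDL written in the Hive step.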

Thank You