Joe Hummel, PhD Visiting Researcher: U. of California, Irvine Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago Materials:

Slides:



Advertisements
Similar presentations
Syncsort Data Integration Update Summary Helping Data Intensive Organizations Across the Big Data Continuum Hadoop – The Operating System.
Advertisements

MAP REDUCE PROGRAMMING Dr G Sudha Sadasivam. Map - reduce sort/merge based distributed processing Best for batch- oriented processing Sort/merge is primitive.
The State of SharePoint BI
Overview of MapReduce and Hadoop
Based on the text by Jimmy Lin and Chris Dryer; and on the yahoo tutorial on mapreduce at index.html
Joe Hummel, PhD Microsoft MVP Visual C++ Technical Staff: Pluralsight, LLC Adjunct Professor: U. of Illinois, Chicago and Loyola University Chicago
FAST FORWARD WITH MICROSOFT BIG DATA Vinoo Srinivas M Solutions Specialist Windows Azure (Hadoop, HPC, Media)
IT Experts: Personal BI with Power Pivot y Power View Ruben Gonzalez Senior Premier Field Engineer Jun 19 th, 2013.
Running Hadoop-as-a-Service in the Cloud
Overview of Hadoop for Data Mining Federal Big Data Group confidential Mark Silverman Treeminer, Inc. 155 Gibbs Street Suite 514 Rockville, Maryland
Microsoft Big Data Essentials Module 1 - Introduction to Big Data
How to Install Windows 7.
Hadoop as a Service Boston Azure 29-March-2012 Copyright (c) 2011, Bill Wilder – Use allowed under Creative Commons license
Server & Tools Business
Analytics Map Reduce Query Insight Hive Pig Hadoop SQL Map Reduce Business Intelligence Predictive Operational Interactive Visualization Exploratory.
Joe Hummel, PhD Visiting Researcher: U. of California, Irvine Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago Materials:
Joe Hummel, PhD Technical Staff: Pluralsight Adjunct Professor: UIC, LUC
MapReduce April 2012 Extract from various presentations: Sudarshan, Chungnam, Teradata Aster, …
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean and Sanjay Ghemawat.
Esri Maps for Office A brief introduction Dan Ames 9/20/2012.
Building BI App on Cloud Rohit Chatter Sr.
Hadoop Basics -Venkat Cherukupalli. What is Hadoop? Open Source Distributed processing Large data sets across clusters Commodity, shared-nothing servers.
Apache Hadoop MapReduce What is it ? Why use it ? How does it work Some examples Big users.
Hadoop/MapReduce Computing Paradigm 1 Shirish Agale.
HAMS Technologies 1
CPS216: Advanced Database Systems (Data-intensive Computing Systems) Introduction to MapReduce and Hadoop Shivnath Babu.
An Introduction to HDInsight June 27 th,
Big Data for Relational Practitioners Len Wyatt Program Manager Microsoft Corporation DBI225.
Joe Hummel, PhD Microsoft MVP Visual C++ Technical Staff: Pluralsight, LLC Professor: U. of Illinois, Chicago stuff:
Joe Hummel, the compiler is at your service Chicago Code Camp 2014.
Hadoop as a Service Boston Azure / Microsoft DevBoston 07-Feb-2012 Copyright (c) 2011, Bill Wilder – Use allowed under Creative Commons license
Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface.
SLIDE 1IS 240 – Spring 2013 MapReduce, HBase, and Hive University of California, Berkeley School of Information IS 257: Database Management.
Map-Reduce Big Data, Map-Reduce, Apache Hadoop SoftUni Team Technical Trainers Software University
Virtual techdays INDIA │ 9-11 February 2011 virtual techdays Data grail: Data Market on Windows Azure Sudhindra Kovalam │ Developer, Icertis Inc.
Copyright © 2015, SAS Institute Inc. All rights reserved. THE ELEPHANT IN THE ROOM SAS & HADOOP.
Map-Reduce examples 1. So, what is it? A two phase process geared toward optimizing broad, widely distributed parallel computing platforms Apache Hadoop.
Joe Hummel, the compiler is at your service SDC Meetup, Sept 2014.
CPS 216: Advanced Database Systems Shivnath Babu.
Big Data Analytics with Excel Peter Myers Bitwise Solutions.
Learn Hadoop and Big Data Technologies. Hadoop  An Open source framework that stores and processes Big Data in distributed manner on a large groups of.
Harnessing Big Data with Hadoop Dipti Sangani; Madhu Reddy DBI210.
Joe Hummel, the compiler is at your service Chicago Coder Conference, June 2016.
What is it and why it matters? Hadoop. What Is Hadoop? Hadoop is an open-source software framework for storing data and running applications on clusters.
Apache Hadoop on Windows Azure Avkash Chauhan
B ig D ata Analysis for Page Ranking using Map/Reduce R.Renuka, R.Vidhya Priya, III B.Sc., IT, The S.F.R.College for Women, Sivakasi.
Big Data Anton Boyko. Agenda What is Big Data? Why Big Data? How to Big Data?
Day 11 Review. Where would I click to send an ?
Big Data & Test Automation
#SQLSat266.
Microsoft Office 2010 Basics and the Internet
Big Data-BI Fusion: Microsoft HDInsight & MS BI
CS 341 : Programming Languages
Vazi Okhandiar, MCT, PMP, MBA, MSCS SQLSaturday #497 . April 2nd, 2016
An Open Source Project Commonly Used for Processing Big Data Sets
CS122B: Projects in Databases and Web Applications Winter 2017
Hadoop Developer.
Map Reduce.
Hadoopla: Microsoft and the Hadoop Ecosystem
07 | Analyzing Big Data with Excel
The freedom to learn what you want, when you want, absolutely free!
Database Applications (15-415) Hadoop Lecture 26, April 19, 2016
MIT 802 Introduction to Data Platforms and Sources Lecture 2
This meme comes from South Park (S2E )
Server & Tools Business
The freedom to learn what you want, when you want, absolutely free!
Copyright © JanBask Training. All rights reserved Get Started with Hadoop Hive HiveQL Languages.
Department of Engineering Science EE 465 (CES 440) - Intro
Presentation transcript:

Joe Hummel, PhD Visiting Researcher: U. of California, Irvine Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago Materials: Materials:

 Map-Reduce is from functional programming Hadoop on Azure 2 // function returns 1 if i is prime, 0 if not: let isPrime(i) =... // sums 2 numbers: let sum(x, y) = return x + y // count the number of primes in 1..N: let countPrimes(N) = let L = [ 1.. N ] // [ 1, 2, 3, 4, 5, 6,... ] let T = map isPrime L // [ 0, 1, 1, 0, 1, 0,... ] let count = reduce sum T // 42 return count // function returns 1 if i is prime, 0 if not: let isPrime(i) =... // sums 2 numbers: let sum(x, y) = return x + y // count the number of primes in 1..N: let countPrimes(N) = let L = [ 1.. N ] // [ 1, 2, 3, 4, 5, 6,... ] let T = map isPrime L // [ 0, 1, 1, 0, 1, 0,... ] let count = reduce sum T // 42 return count

 Hadoop: ◦ Created by to drive internet search ◦ Parallelism ◦ Data partitioning ◦ Fault tolerance 3 BIG Data BIG Data page hits

 Freely-available framework for big data ◦  Based on concept of Map-Reduce: 4 BIG data BIG data Map Reduce R R map function reduce intermediate results......

5 Map Sort Reduce Merge [,, … ] [, … ] R R Data Map Sort Map Sort [,,, … ] [,, … ]

 We’ll be working with Chicago crime data… ◦ ◦ GB 5M rows 1 GB 5M rows

 Compute top-10 crimes… IUCR Count IUCR = Illinois Uniform Crime Codes Uniform-Crime-R/c7ck-438e IUCR = Illinois Uniform Crime Codes Uniform-Crime-R/c7ck-438e

 Hadoop on Azure… 8 Hadoop on Azure // Javascript version: var map = function (key, value, context) { var values = value.split(","); context.write(values[4], 1); }; var reduce = function (key, values, context) { var sum = 0; while (values.hasNext()) { sum += parseInt(values.next()); } context.write(key, sum); }; // Javascript version: var map = function (key, value, context) { var values = value.split(","); context.write(values[4], 1); }; var reduce = function (key, values, context) { var sum = 0; while (values.hasNext()) { sum += parseInt(values.next()); } context.write(key, sum); };

 Rich ecosystem around Hadoop ◦ Pig ◦ Hive ◦ HBASE ◦ … Hadoop on Azure 9 // interactive PIG with explicit Map-Reduce functions: pig.from("CC-from-2001.txt"). mapReduce("IUCR-Count.js", "IUCR, Count:long"). orderBy("Count DESC"). take(10). to("output-from-2001") // interactive PIG with explicit Map-Reduce functions: pig.from("CC-from-2001.txt"). mapReduce("IUCR-Count.js", "IUCR, Count:long"). orderBy("Count DESC"). take(10). to("output-from-2001") // interactive PIG without explicit Map-Reduce: schema = "ID,Case Number,Date,Block,IUCR,..." pig.from("CC-from-2001.txt", schema). groupBy("IUCR"). select("group, SUM($1.Count"). orderBy("Count DESC"). take(10). to("output-from-2001") // interactive PIG without explicit Map-Reduce: schema = "ID,Case Number,Date,Block,IUCR,..." pig.from("CC-from-2001.txt", schema). groupBy("IUCR"). select("group, SUM($1.Count"). orderBy("Count DESC"). take(10). to("output-from-2001")

 Microsoft is offering free access to Hadoop ◦ Request  Hadoop connector for Excel ◦ Process data using Hadoop, analyze/visualize using Excel Hadoop on Azure 10

 Freely-available plugin for Excel 2010 ◦  Turns Excel into an in-memory database ◦ More precisely, turns spreadsheet into an OLAP cube  Note: ◦ If you have 32-bit Excel, install 32-bit PowerPivot ◦ If you have 64-bit Excel, install 64-bit PowerPivot ◦ GBs of data will require 64-bit ◦ [ How to tell what version of Excel you have? File menu, help… ] Big Data Processing, Cheap 11

 PowerPivot… ◦ Install ◦ PowerPivot menu ◦ PowerPivot Window ◦ Get Data... ◦ PivotTable… 12 Big Data Processing, Cheap

15 Big Data Processing, Cheap

 Presenter: Joe Hummel ◦ ◦ Materials:  Keep an eye for final release of: ◦ Hadoop on Azure ◦ Hadoop on Windows ◦ PowerView plugin for Excel Big Data Processing, Cheap