Presentation is loading. Please wait.

Presentation is loading. Please wait.

Joe Hummel, PhD Visiting Researcher: U. of California, Irvine Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago Materials:

Similar presentations


Presentation on theme: "Joe Hummel, PhD Visiting Researcher: U. of California, Irvine Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago Materials:"— Presentation transcript:

1 Joe Hummel, PhD Visiting Researcher: U. of California, Irvine Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago Materials: http://www.joehummel.net/downloads.html Email: joe@joehummel.net Materials: http://www.joehummel.net/downloads.html Email: joe@joehummel.net

2  Map-Reduce is from functional programming Hadoop on Azure 2 // function returns 1 if i is prime, 0 if not: let isPrime(i) =... // sums 2 numbers: let sum(x, y) = return x + y // count the number of primes in 1..N: let countPrimes(N) = let L = [ 1.. N ] // [ 1, 2, 3, 4, 5, 6,... ] let T = map isPrime L // [ 0, 1, 1, 0, 1, 0,... ] let count = reduce sum T // 42 return count // function returns 1 if i is prime, 0 if not: let isPrime(i) =... // sums 2 numbers: let sum(x, y) = return x + y // count the number of primes in 1..N: let countPrimes(N) = let L = [ 1.. N ] // [ 1, 2, 3, 4, 5, 6,... ] let T = map isPrime L // [ 0, 1, 1, 0, 1, 0,... ] let count = reduce sum T // 42 return count

3  Hadoop: ◦ Created by to drive internet search ◦ Parallelism ◦ Data partitioning ◦ Fault tolerance 3 BIG Data BIG Data page hits

4  Freely-available framework for big data ◦ http://hadoop.apache.org/ http://hadoop.apache.org/  Based on concept of Map-Reduce: 4 BIG data BIG data Map...... Reduce R R map function reduce intermediate results......

5 5 Map Sort Reduce Merge [,, … ] [, … ] R R Data Map Sort Map Sort [,,, … ] [,, … ]

6  We’ll be working with Chicago crime data… ◦ https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2 ◦ http://www.cityofchicago.org/city/en/narr/foia/CityData.html 6 1 GB 5M rows 1 GB 5M rows

7  Compute top-10 crimes… 7 0486 366903 0820 308074. 0890 166916 0486 366903 0820 308074. 0890 166916 IUCR Count IUCR = Illinois Uniform Crime Codes https://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois- Uniform-Crime-R/c7ck-438e IUCR = Illinois Uniform Crime Codes https://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois- Uniform-Crime-R/c7ck-438e

8  Hadoop on Azure… 8 Hadoop on Azure // Javascript version: var map = function (key, value, context) { var values = value.split(","); context.write(values[4], 1); }; var reduce = function (key, values, context) { var sum = 0; while (values.hasNext()) { sum += parseInt(values.next()); } context.write(key, sum); }; // Javascript version: var map = function (key, value, context) { var values = value.split(","); context.write(values[4], 1); }; var reduce = function (key, values, context) { var sum = 0; while (values.hasNext()) { sum += parseInt(values.next()); } context.write(key, sum); }; 0486 366903 0820 308074. 0486 366903 0820 308074.

9  Rich ecosystem around Hadoop ◦ Pig ◦ Hive ◦ HBASE ◦ … Hadoop on Azure 9 // interactive PIG with explicit Map-Reduce functions: pig.from("CC-from-2001.txt"). mapReduce("IUCR-Count.js", "IUCR, Count:long"). orderBy("Count DESC"). take(10). to("output-from-2001") // interactive PIG with explicit Map-Reduce functions: pig.from("CC-from-2001.txt"). mapReduce("IUCR-Count.js", "IUCR, Count:long"). orderBy("Count DESC"). take(10). to("output-from-2001") // interactive PIG without explicit Map-Reduce: schema = "ID,Case Number,Date,Block,IUCR,..." pig.from("CC-from-2001.txt", schema). groupBy("IUCR"). select("group, SUM($1.Count"). orderBy("Count DESC"). take(10). to("output-from-2001") // interactive PIG without explicit Map-Reduce: schema = "ID,Case Number,Date,Block,IUCR,..." pig.from("CC-from-2001.txt", schema). groupBy("IUCR"). select("group, SUM($1.Count"). orderBy("Count DESC"). take(10). to("output-from-2001")

10  Microsoft is offering free access to Hadoop ◦ Request invitation @ http://www.hadooponazure.com/http://www.hadooponazure.com/  Hadoop connector for Excel ◦ Process data using Hadoop, analyze/visualize using Excel Hadoop on Azure 10

11  Freely-available plugin for Excel 2010 ◦ http://www.powerpivot.com/ http://www.powerpivot.com/  Turns Excel into an in-memory database ◦ More precisely, turns spreadsheet into an OLAP cube  Note: ◦ If you have 32-bit Excel, install 32-bit PowerPivot ◦ If you have 64-bit Excel, install 64-bit PowerPivot ◦ GBs of data will require 64-bit ◦ [ How to tell what version of Excel you have? File menu, help… ] Big Data Processing, Cheap 11

12  PowerPivot… ◦ Install ◦ PowerPivot menu ◦ PowerPivot Window ◦ Get Data... ◦ PivotTable… 12 Big Data Processing, Cheap

13

14

15 15 Big Data Processing, Cheap

16  Presenter: Joe Hummel ◦ Email: joe@joehummel.net ◦ Materials: http://www.joehummel.net/downloads.html  Keep an eye for final release of: ◦ Hadoop on Azure ◦ Hadoop on Windows ◦ PowerView plugin for Excel 2013 16 Big Data Processing, Cheap


Download ppt "Joe Hummel, PhD Visiting Researcher: U. of California, Irvine Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago Materials:"

Similar presentations


Ads by Google