Data / Information / Knowledge. Presentation by Pauline Lake. Modifications by Rick Mercer.

1 Data / Information / Knowledge. Presentation by Pauline Lake. Modifications by Rick Mercer. Acknowledgment and Disclaimer: This presentation is supported in part by the National Science Foundation under Grant 1240841. Any opinions, findings, and conclusions or recommendations expressed in these materials are those of the authors and do not necessarily reflect the views of the National Science Foundation.

2 Outline ● Processing Large Data Sets ● Using Data ● Big Data and Mobile Computing

3 Processing Large Data Sets ● Sort a petabyte (10^15 bytes) of data ● 10^15 = 10^3 x 10^3 x 10^3 x 10^3 x 10^3 bytes ● Quicksort assumes the data are in RAM ● 1 petabyte would occupy 1,000 1-TB disk drives, or 10,000 100-GB drives
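The drive counts on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, as the slide does; the constant and variable names are chosen here for illustration.

```python
# Decimal (SI) units, matching the powers of ten used on the slide.
PETABYTE = 10**15
TERABYTE = 10**12
GIGABYTE = 10**9

drives_1tb = PETABYTE // TERABYTE             # number of 1-TB drives
drives_100gb = PETABYTE // (100 * GIGABYTE)   # number of 100-GB drives

print(drives_1tb)                 # 1000
print(drives_100gb)               # 10000
print(PETABYTE == (10**3) ** 5)   # True
```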

4 The MapReduce Model ● MapReduce is a programming model for processing large data sets ● Distributed file system -- data sets are stored over many computers ● Parallel algorithm -- i.e., many identical processes running simultaneously ● MapReduce was developed at Google ● Hadoop is the open source Apache version
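The idea of "many identical processes" working on pieces of a data set can be sketched on one machine. The example below is only an illustration, not how Hadoop or Google's system is implemented: it assumes the data fit in a Python list, uses threads as stand-ins for separate computers, and the function names `split_into_chunks` and `parallel_sort` are hypothetical. Each worker sorts its own chunk, and the sorted chunks are then merged.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_chunks):
    """Split data into pieces; each piece stands in for data on one computer."""
    size = max(1, -(-len(data) // num_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sort(data, num_workers=4):
    """Sort each chunk with an identical worker, then merge the sorted chunks."""
    chunks = split_into_chunks(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Every worker runs the same function (sorted) on a different chunk.
        sorted_chunks = list(pool.map(sorted, chunks))
    # heapq.merge combines already-sorted sequences into one sorted stream.
    return list(heapq.merge(*sorted_chunks))


print(parallel_sort([5, 3, 9, 1, 7, 2, 8], num_workers=3))  # [1, 2, 3, 5, 7, 8, 9]
```

Threads in CPython do not speed up CPU-bound sorting; the point of the sketch is the structure (split, process identically, combine), which is what a distributed system spreads over many machines.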

5 MapReduce Experiment: Sort a Petabyte (10^15 bytes) ● References: Petabyte Sort Blog (Quantcast Sort Blog); Sorting Petabytes with Map Reduce (Google Research)

6 MapReduce Example ● Problem: Count the occurrences of every word in a large set of documents, D1, D2, …, DN. D1: “a man, a plan, a canal, panama” D2: “in for a penny in for a pound” … DN: ...

7 MapReduce Example ● Algorithm: Map step: for each word, w, in D1, ..., DN, output the partial count (w, 1). Reduce step: for each distinct word, w: set sum = 0; for each partial count, pc, produced for w by the Map step, set sum = sum + pc; output (w, sum).
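The Map and Reduce steps above can be sketched in Python on the two example documents. This is a single-process illustration under stated assumptions: words are taken to be runs of lowercase letters, and the helper names `map_step`, `group_by_word`, and `reduce_step` are hypothetical, not part of any MapReduce library.

```python
import re
from collections import defaultdict


def map_step(document):
    """Map: emit a partial count (word, 1) for every word in one document."""
    return [(word, 1) for word in re.findall(r"[a-z]+", document.lower())]


def group_by_word(pairs):
    """Collect the partial counts that belong to the same word."""
    groups = defaultdict(list)
    for word, pc in pairs:
        groups[word].append(pc)
    return groups


def reduce_step(partial_counts):
    """Reduce: add up the partial counts for one word."""
    total = 0
    for pc in partial_counts:
        total = total + pc
    return total


documents = [
    "a man, a plan, a canal, panama",
    "in for a penny in for a pound",
]

pairs = [pair for doc in documents for pair in map_step(doc)]
counts = {word: reduce_step(pcs) for word, pcs in group_by_word(pairs).items()}

print(counts["a"], counts["for"], counts["in"])  # 5 2 2
```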

8 MapReduce Example ● Problem: Count the occurrences of every word in a set of documents D1, D2, …, DN. Diagram: Map/Reduce System

9 MapReduce Example ● Problem: Count the occurrences of every word in a set of documents D1, D2, …, DN. Diagram: Map/Reduce System; Master; inputs “a man, a plan, a canal, panama” and “in for a penny in for a pound”

10 MapReduce Example ● Problem: Count the occurrences of every word in a set of documents D1, D2, …, DN. Diagram: Map/Reduce System; Master; Mapper 1, …, Mapper M; inputs “a man, a plan, a canal, panama” and “in for a penny in for a pound”

11 MapReduce Example ● Problem: Count the occurrences of every word in a set of documents D1, D2, …, DN. Diagram: Map/Reduce System; Master; Mapper 1, …, Mapper M; inputs “a man, a plan, a canal, panama” and “in for a penny in for a pound”; mapper output (a,1),(a,1),(a,1),(a,1),(a,1) (for,1),(for,1) (in,1),(in,1)

12 MapReduce Example ● Problem: Count the occurrences of every word in a set of documents D1, D2, …, DN. Diagram: Map/Reduce System; Master; Mapper 1, …, Mapper M; inputs “a man, a plan, a canal, panama” and “in for a penny in for a pound”; mapper output (a,1),(a,1),(a,1),(a,1),(a,1) (for,1),(for,1) (in,1),(in,1), labeled "partial counts"

13 MapReduce Example ● Problem: Count the occurrences of every word in a set of documents D1, D2, …, DN. Diagram: Map/Reduce System; Master; Mapper 1, …, Mapper M; Reducer 1, Reducer 2, Reducer 3, …, Reducer R; inputs “a man, a plan, a canal, panama” and “in for a penny in for a pound”; mapper output (a,1),(a,1),(a,1),(a,1),(a,1) (for,1),(for,1) (in,1),(in,1)

14 MapReduce Example ● Problem: Count the occurrences of every word in a set of documents D1, D2, …, DN. Diagram: Map/Reduce System; Master; Mapper 1, …, Mapper M; Reducer 1, Reducer 2, Reducer 3, …, Reducer R; inputs “a man, a plan, a canal, panama” and “in for a penny in for a pound”; mapper output (a,1),(a,1),(a,1),(a,1),(a,1) (for,1),(for,1) (in,1),(in,1); reducer output (a,5) (for,2) (in,2) (man,1),...

15 MapReduce Example ● Problem: Count the occurrences of every word in a set of documents D1, D2, …, DN. Diagram: Map/Reduce System; Master; Mapper 1, …, Mapper M; Reducer 1, Reducer 2, Reducer 3, …, Reducer R; inputs “a man, a plan, a canal, panama” and “in for a penny in for a pound”; mapper output (a,1),(a,1),(a,1),(a,1),(a,1) (for,1),(for,1) (in,1),(in,1); reducer output (a,5) (for,2) (in,2) (man,1),..., labeled "sum of partial counts"

16 MapReduce Example ● Problem: Count the occurrences of every word in a set of documents D1, D2, …, DN. Diagram: Map/Reduce System; final output (a,5), (for,2), (in,2), (man,1), (plan,1), (canal,1), (panama,1), (penny,1), (pound,1)
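The routing of partial counts from mappers to reducers shown in the diagram can be sketched as follows. The slides do not say how a word is assigned to a reducer, so this example assumes a common scheme: hash the word and take the remainder modulo the number of reducers. The value `NUM_REDUCERS = 3` and the names `reducer_for`, `shuffle`, and `reduce_all` are assumptions made for illustration.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 3  # assumption: the slides show R reducers without fixing R


def reducer_for(word, num_reducers=NUM_REDUCERS):
    """Pick a reducer index by hashing the word (assumed partitioning scheme)."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers=NUM_REDUCERS):
    """Route each (word, 1) pair so all pairs for a word reach one reducer."""
    inboxes = [defaultdict(list) for _ in range(num_reducers)]
    for output in mapper_outputs:
        for word, pc in output:
            inboxes[reducer_for(word, num_reducers)][word].append(pc)
    return inboxes


def reduce_all(inboxes):
    """Each reducer sums the partial counts for the words it received."""
    result = {}
    for inbox in inboxes:
        for word, pcs in inbox.items():
            result[word] = sum(pcs)
    return result


# Partial counts as the two mappers in the diagram would emit them.
mapper_1 = [("a", 1), ("man", 1), ("a", 1), ("plan", 1),
            ("a", 1), ("canal", 1), ("panama", 1)]
mapper_m = [("in", 1), ("for", 1), ("a", 1), ("penny", 1),
            ("in", 1), ("for", 1), ("a", 1), ("pound", 1)]

inboxes = shuffle([mapper_1, mapper_m])
totals = reduce_all(inboxes)
print(totals["a"], totals["for"], totals["in"])  # 5 2 2
```

Because the reducer is chosen from the word alone, the pairs for "a" from both mappers land in the same inbox, which is what lets a single reducer produce the complete count (a,5).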

17 Using Data

18 Big Data: Government Data ● 2012, Data.gov, 84 programs, six departments ○Benefit: helping government address problems ○Tradeoff: Government has too much data on us?

19 Big Data: Web Analytics ● Analytics: the discovery and use of meaningful patterns in data ○Benefit: Provide customers with targeted ads ○Tradeoff: Loss of privacy and anonymity of web search

20 Big Data: Data Mining ● Data Mining -- discovering patterns in large data sets. ○Benefit: Discovering risk factors in medical data. ○Tradeoff: Can we keep patient medical data secure? [Figure: normal patients vs. diabetic patients]

21 Data Visualization ● IBM chromogram of Wikipedia edits reveals known and new editing patterns

22 Data Mining: Neonatal monitoring ● Data mining real-time data (heart rate, respiratory rate, O2 saturation) provides a non-invasive way of predicting neonatal health ● Traditional approach: Apgar score: measure tone, cry, color, breathing, …, scored on a scale of 0 through 10 at 1 minute and 5 minutes after birth, and again at 10 minutes if needed

23 Big Data and Mobile Computing

24 Big Data and Mobile Google: Translate “Ciao mondo!”

25 Big Data and Mobile Google: Translate “Ciao mondo!” Map/Reduce (speech recognition)

26 Big Data and Mobile Google: Translate “Ciao mondo!” “Hello world!” Map/Reduce (speech recognition)

27 Big Data and Mobile Google: Translate “Ciao mondo!” “Hello world!” Map/Reduce (speech recognition) Benefit: Improves ability to learn a foreign language. Tradeoff: Google knows what we’re thinking about.

28 Big Data and Mobile Google: Augment reality

29 Big Data and Mobile Google: Augment reality Map/Reduce

30 Big Data and Mobile Google: Augment reality Map/Reduce

31 Big Data and Mobile Google: Augment reality Map/Reduce Benefit: Better awareness of what’s around us. Tradeoff: Google knows where we are, what we’re thinking.

32 Summary ● The digital era involves large data sets ● Presents challenges and opportunities ● Requires new processing and visualization techniques ● Comes with the promise of benefits ● Comes with tradeoffs in terms of privacy and security

