Introduction to Big Data. World Cup soccer 2014.07.05 (Money Today) : IoT + Bigdata German soccer Team.




1 Introduction to Big Data

2 World Cup soccer, 2014.07.05 (Money Today): IoT + big data and the German soccer team

3 What is big data? Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

4 Big Data is Everywhere! Lots of data is being collected and warehoused:
– Web data, e-commerce
– Purchases at department/grocery stores
– Bank/credit card transactions
– Social networks

5

6 How much data?
– Google processes 20 PB a day (2008)
– Wayback Machine has 3 PB + 100 TB/month (3/2009)
– Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
– eBay has 6.5 PB of user data + 50 TB/day (5/2009)
640K ought to be enough for anybody.

7 What does big data do?

8 Government
– In 2012, the Obama administration announced the Big Data Research and Development Initiative, which explored how big data could be used to address important problems faced by the government. The initiative was composed of 84 different big data programs spread across six departments.
– Big data analysis played a large role in Barack Obama's successful 2012 re-election campaign.
– The United States Federal Government owns six of the ten most powerful supercomputers in the world.
– The Utah Data Center is a data center currently being constructed by the United States National Security Agency. When finished, the facility will be able to handle yottabytes of information collected by the NSA over the Internet.

9 Business
– Amazon.com handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers. The core technology that keeps Amazon running is Linux-based, and as of 2005 it had the world’s three largest Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB.
– Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress.
– Facebook handles 50 billion photos from its user base.
– The FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide.
– The volume of business data worldwide, across all companies, doubles every 1.2 years, according to estimates.
– Windermere Real Estate uses anonymous GPS signals from nearly 100 million drivers to help new home buyers determine their typical drive times to and from work at various times of the day.

10 Examples of free big data sites: Google Trends, Google Flu Trends, Google Correlate, Social Metrics Insight

11 Big data in Google Trends

12 Movement of carts: a big data case for product display

13 Wildfires in Korea (1991–2011)

14 Google Flu service

15 Find a location for your business

16 Crime mapping in San Francisco: 71% accuracy

17 Similar names for big data: data science, business analytics, data analytics, data mining, business intelligence, machine learning

18

19

20 Case 1: Market Basket Analysis (MBA), a case of big data analysis

21 1). POS Data (1000 records; baskets separated by semicolons): bananas; plums, lettuce, tomatoes; celery, bean; bean; apples, carrots, tomatoes, potatoes; potatoes; bean; carrots; bean; apples, oranges, lettuce, tomatoes; peaches, oranges, celery, potatoes, bean; beans; oranges, lettuce, carrots, tomatoes; apples, bananas, plums, carrots, tomatoes, onions, bean; apples, potatoes; lettuce, peas, beans.
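A minimal sketch of the counting step that market basket analysis starts from: how often each pair of items shows up in the same basket. The five baskets are comma-joined groups taken from the POS sample on this slide; treating each group as one basket is an assumption about the original layout, and `pair_counts` is a hypothetical helper, not the tool used in the case.

```python
from collections import Counter
from itertools import combinations

# Five baskets read from the slide's POS sample (basket boundaries are assumed).
baskets = [
    ["plums", "lettuce", "tomatoes"],
    ["apples", "carrots", "tomatoes", "potatoes"],
    ["apples", "oranges", "lettuce", "tomatoes"],
    ["oranges", "lettuce", "carrots", "tomatoes"],
    ["apples", "potatoes"],
]


def pair_counts(baskets):
    """Count, for every unordered item pair, the number of baskets containing both."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical (a, b) form.
        items = sorted(set(basket))
        for pair in combinations(items, 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
# Support of a pair = share of all baskets that contain both items.
support = {pair: n / len(baskets) for pair, n in counts.items()}

for pair, n in counts.most_common(3):
    print(pair, n, support[pair])
```

On real POS data the same counts feed the rule statistics (support, lift and so on); a dedicated rule miner additionally prunes rare itemsets instead of enumerating every pair.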

22 2). Association Rules as Output (Model). Only 55 rules satisfy the specified constraints.
– tomatoes -> lettuce [Coverage=0.263 (263); Support=0.111 (111); Strength=0.422; Lift=1.94; Leverage=0.0539 (53.9); p=2.35E-019]
– lettuce -> tomatoes [Coverage=0.217 (217); Support=0.111 (111); Strength=0.512; Lift=1.94; Leverage=0.0539 (53.9); p=2.35E-019]
– tomatoes -> carrots [Coverage=0.263 (263); Support=0.085 (85); Strength=0.323; Lift=1.85; Leverage=0.0390 (39.0); p=1.83E-012]
– carrots -> tomatoes [Coverage=0.175 (175); Support=0.085 (85); Strength=0.486; Lift=1.85; Leverage=0.0390 (39.0); p=1.83E-012]
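The rule statistics on this slide can be reproduced from four counts. The sketch below uses the common textbook definitions and assumes that "Strength" here is what is usually called confidence; `rule_stats` and its parameter names are hypothetical. The counts come from the slide: 1000 baskets, 263 with tomatoes, 217 with lettuce, 111 with both.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    coverage: float  # share of baskets containing the left-hand side
    support: float   # share of baskets containing both sides
    strength: float  # P(rhs | lhs), usually called confidence
    lift: float      # strength relative to the baseline P(rhs)
    leverage: float  # support minus the support expected under independence


def rule_stats(n_total, n_lhs, n_rhs, n_both):
    """Compute statistics for a rule lhs -> rhs from basket counts."""
    coverage = n_lhs / n_total
    rhs_share = n_rhs / n_total
    support = n_both / n_total
    strength = n_both / n_lhs
    lift = strength / rhs_share
    leverage = support - coverage * rhs_share
    return RuleStats(coverage, support, strength, lift, leverage)


# tomatoes -> lettuce, using the counts shown in brackets on the slide.
stats = rule_stats(n_total=1000, n_lhs=263, n_rhs=217, n_both=111)
print(stats)
```

With these inputs the values agree with the slide's figures for tomatoes -> lettuce to the precision shown there. Swapping `n_lhs` and `n_rhs` gives the reverse rule, which is why lift and leverage are identical for both directions while coverage and strength differ.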

23 3). Graphic Representation

24 Relationship graph when the link is set to 0

25 Association rules: relationship graph when the link is set to 0 (graphic representations of association rules)

26 Graph when the link strength is set to 6. Relationship graph when the distance is set by value (network form)

27 Application of MBA: product recommendation system

28 Case 2: SNS analysis

29 Social Network (http://nexus.ludios.net/view/demo)

30 Analysis of Human Relations (NodeXL)

31 Friends Networks

32 Case 3: Bankruptcy Prediction. The data are yearly financial data collected by the Korea Credit Guarantee Fund. They consist of 944 bankrupt corporations and 944 healthy (non-bankrupt) corporations from fiscal years 1999 to 2002.

33 List of financial variables selected (Variable: Definition)
– X13: interest expenses to sales = (interest expenses / sales) × 100
– X17: profit to sales = (profit / sales) × 100
– X24: operating profit to sales = (operating profit / sales) × 100
– X27: ordinary profit to total capital = (ordinary profit / total capital) × 100
– X28: current liabilities to total capital = (current liabilities / total capital) × 100
– X103: growth rate of tangible assets = (tangible assets at the end of the year / tangible assets at the beginning of the year × 100) − 100
– X108: turnover of managerial assets = sales / {total assets − (construction in progress + investment assets)}
– net financing cost = interest expenses − interest incomes
– X127: net working capital to total capital = {(current assets − current liabilities) / total capital} × 100
– X129: growth rate of current assets = (current assets at the end of the year / current assets at the beginning of the year × 100) − 100
– X140: ordinary income to net worth = (ordinary income / net worth) × 100
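As an illustration of how such variables are derived, here is a small sketch that computes four of them for one company. It assumes the operators lost in the slide's formulas are "× 100" for the ratios and "− 100" for the growth rates; the balance-sheet figures and helper names are hypothetical and are not taken from the Korea Credit Guarantee Fund data.

```python
def pct_ratio(numerator, denominator):
    """Ratio expressed in percent, e.g. (interest expenses / sales) x 100."""
    return numerator / denominator * 100


def growth_rate(end_value, begin_value):
    """Year-over-year growth in percent: (end / begin x 100) - 100."""
    return end_value / begin_value * 100 - 100


# Hypothetical figures for one company (same currency unit throughout).
figures = {
    "sales": 2000.0,
    "interest_expenses": 50.0,
    "profit": 100.0,
    "tangible_assets_begin": 400.0,
    "tangible_assets_end": 500.0,
    "current_assets": 900.0,
    "current_liabilities": 600.0,
    "total_capital": 1500.0,
}

variables = {
    "X13": pct_ratio(figures["interest_expenses"], figures["sales"]),
    "X17": pct_ratio(figures["profit"], figures["sales"]),
    "X103": growth_rate(figures["tangible_assets_end"], figures["tangible_assets_begin"]),
    "X127": pct_ratio(
        figures["current_assets"] - figures["current_liabilities"],
        figures["total_capital"],
    ),
}

for name, value in variables.items():
    print(f"{name}: {value:.2f}")
```

Each company then becomes one row of such variables plus a bankrupt/healthy label, which is the form a decision tree or any other classifier expects.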

34 Decision Tree Analysis

35 Case 4: Income Prediction. For our study we selected the United States Census (5%) 1990 Public Use Microsample data (Census 1990). This data set, which was divided into 18 files, contained the entire 5% sample made public from the 1990 U.S. Census in STATA 6.0 format. Combined, these 18 files included about 4.5 million males and 5 million females, totaling 9.1 million records. Census 1990: http://www.macalester.edu/econdata/United_States/pums.html

36 Data Sampling. We converted the 18 data files into flat files; then, using Java code, we merged these 18 flat files into a single file consisting of 9.1 million records with 85 variables (approximately 1.5 GB in size).
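The study did this merge in Java; the code itself is not shown, so what follows is only a Python sketch of the same idea. It assumes every flat file is delimited text with an identical header row, which the slide does not state. `merge_flat_files` and the tiny demo files are hypothetical. Reading line by line keeps memory use flat, which matters for a 1.5 GB result.

```python
import tempfile
from pathlib import Path


def merge_flat_files(paths, out_path):
    """Concatenate flat files into one, keeping only the first file's header.

    Returns the number of data records written.
    """
    records = 0
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        for index, path in enumerate(paths):
            with open(path, encoding="utf-8", newline="") as src:
                header = src.readline()
                if index == 0:
                    out.write(header)  # header is written exactly once
                for line in src:       # stream rows instead of loading the file
                    if line.strip():
                        out.write(line if line.endswith("\n") else line + "\n")
                        records += 1
    return records


# Demo with three tiny stand-in files (the study merged 18 census files).
with tempfile.TemporaryDirectory() as tmp:
    parts = []
    for i in range(3):
        part = Path(tmp) / f"part{i}.csv"
        part.write_text(
            f"age,income\n{20 + i},{1000 * i}\n{30 + i},{2000 * i}\n",
            encoding="utf-8",
        )
        parts.append(part)
    merged = Path(tmp) / "merged.csv"
    n_records = merge_flat_files(parts, merged)
    merged_lines = merged.read_text(encoding="utf-8").splitlines()

print(n_records, len(merged_lines))
```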

37 Algorithm: Analogy of Discovering the Complete Set of Rules (Drawing the Perfect Picture via Coin Scrubbing)

38 The Repetitive Methodology of Merging New Rules into the Domain Knowledge Base

39 The Relationship Between IRA’s Accuracy Level and Number of Iterations for This Study

40 Performance Comparison (method: tool used; accuracy with a 2/3-1/3 split)
– CHAID: Answer Tree (SPSS); accuracy 80.195
– CART: Answer Tree (SPSS); accuracy 80.30
– ANN: Neural Connection (SPSS); accuracy RBF 76.12, MLP 80.68
– LR and DA: SPSS; accuracy 81.1 (LR), 78.3 (DA)
– See5: See5 (with default rule); accuracy 82.3
– This study: IRA; accuracy 82.7
Training sample size: 3.24m, 10000, 300k

41 Mining tools: Enterprise Miner (SAS), Clementine (SPSS), R, Python, many visualisation tools (infographics etc.), RapidMiner, Hadoop, RHive

42 Future direction of big data

43 bigdata 2013 bigdata 2014

44 Google Glass: mashup, big data, visualisation -> analysis of commercial areas

45 IoT Key: Smart & Intelligence

46 3D Printer Healthy food, organ, face recommended?

47 Evolution of big data

48 cup

49

50

51

52

53

54

55

56

57

58 Cup with Art

59

60 Cup with emotion

61

62

63 Cup without cup

