Big Data.


1 Big Data

2 World Cup Soccer: the German soccer team, IoT + big data

3 What is big data? Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

4 Big Data Is Everywhere! Lots of data is being collected and warehoused: web data and e-commerce; purchases at department/grocery stores; bank/credit card transactions; social networks.

5

6 What does big data do?

7 Time of Big Data
What is big data? The most popular big data software framework is Hadoop. What is Hadoop?

8 Evolution of Names: Artificial Intelligence, Machine Learning, Business Intelligence, Data Mining, Big Data/Data Sciences

9 What Is Data Mining? Data mining (knowledge discovery in databases):
A process of identifying hidden patterns and relationships within data (Groth) Data mining: Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases

10 DM and Business Decision Support
Database marketing (target marketing, customer relationship management); credit risk management (credit scoring); fraud detection; healthcare informatics (clinical decision support)

11 Data Mining: A KDD Process
Databases -> Data Cleaning & Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation -> Knowledge. Data mining is the core of the knowledge discovery process.
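The KDD steps can be sketched as a chain of small Python functions. This is only an illustration: the two input "databases", their records, and the 0.5 support threshold are hypothetical, the integrated list stands in for the data warehouse, and the mining step is plain pair counting rather than the algorithm of any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw sources: (basket_id, item) records from two databases.
store_db = [(1, "apples"), (1, "bananas"), (2, "apples"), (2, None)]
web_db = [(3, "apples"), (3, "bananas"), (4, "carrots"), (4, "Lettuce ")]


def clean(records):
    """Data cleaning: drop records with a missing item, normalise item names."""
    return [(bid, item.strip().lower()) for bid, item in records if item]


def integrate(*sources):
    """Data integration: merge all sources into one list (our 'warehouse')."""
    warehouse = []
    for source in sources:
        warehouse.extend(source)
    return warehouse


def select(records):
    """Selection: build the task-relevant data, one item set per basket."""
    baskets = {}
    for bid, item in records:
        baskets.setdefault(bid, set()).add(item)
    return list(baskets.values())


def mine(baskets):
    """Data mining: count how often each pair of items appears together."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    return {pair: c / n_baskets for pair, c in pair_counts.items()
            if c / n_baskets >= min_support}


baskets = select(integrate(clean(store_db), clean(web_db)))
knowledge = evaluate(mine(baskets), len(baskets))
print(knowledge)  # {('apples', 'bananas'): 0.5}
```

Keeping each step as a separate function mirrors the diagram: any stage (for example, the mining step) can be swapped out without touching the others.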

12 Data mining software: SAS Enterprise Miner (EM), Clementine for SPSS, R, Python

13 Government In 2012, the Obama administration announced the Big Data Research and Development Initiative, which explored how big data could be used to address important problems faced by the government. The initiative was composed of 84 different big data programs spread across six departments. Big data analysis played a large role in Barack Obama's successful re-election campaign. The United States Federal Government owns six of the ten most powerful supercomputers in the world. The Utah Data Center is a data center currently being constructed by the United States National Security Agency. When finished, the facility will be able to handle yottabytes of information collected by the NSA over the Internet.

14 Business Amazon.com handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers. The core technology that keeps Amazon running is Linux-based, and as of 2005 it had the world’s three largest Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB. Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress. Facebook handles 50 billion photos from its user base. The FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide. According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years. Windermere Real Estate uses anonymous GPS signals from nearly 100 million drivers to help new home buyers determine their typical drive times to and from work at various times of the day.

15 Big data in Google Trends

16 Big data case: movement of carts and product display

17 Wildfires in Korea (1991 – 2011)

18 Google Flu Trends service

19 Finding a location for your business

20 Crime mapping in San Francisco: 71% accuracy

21 Evolution of big data: Artificial Intelligence, Data Mining, Business Intelligence, Big Data, Business Analytics, Data Sciences

22

23 Future direction of big data

24 Big data 2013, big data 2014

25 Google Glass: mashup, big data, visualisation
-> analysis of a commercial area

26 IoT Key: Smart & Intelligence

27 3D Printer Healthy food, organ, face recommended?

28 A Case on Big Data (Association Rule Analysis)

29 Association Rule Analysis as an Example of a Data Mining Tool:
Market Basket Analysis

30 What Is Association Mining?
Association rule mining: finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories. Applications: market basket analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc. Rule form: “Body → Head [support, confidence]”. Example: buys(x, “cookie”) → buys(x, “milk”) [0.5%, 60%]
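To make "finding frequent patterns" concrete, here is a minimal brute-force sketch that enumerates every itemset, keeps those with enough support, and splits each into Body → Head rules with enough confidence. It is an illustration only: the function name `mine_rules` and the baskets are hypothetical, and real tools avoid this exhaustive enumeration with pruning strategies such as Apriori.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence):
    """Brute-force association rule mining: enumerate every itemset, no pruning."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Fraction of transactions containing every item in `itemset`.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_support:
                continue
            # Split the itemset into a non-empty body and a non-empty head.
            for k in range(1, size):
                for body_items in combinations(combo, k):
                    body = frozenset(body_items)
                    confidence = s / support(body)
                    if confidence >= min_confidence:
                        rules.append((body, itemset - body, s, confidence))
    return rules


# Hypothetical baskets, for illustration only.
baskets = [
    {"cookie", "milk"},
    {"cookie", "milk", "bread"},
    {"cookie", "bread"},
    {"milk"},
]
rules = mine_rules(baskets, min_support=0.5, min_confidence=0.6)
for body, head, s, c in rules:
    print(f"{', '.join(sorted(body))} -> {', '.join(sorted(head))} [{s:.0%}, {c:.0%}]")
```

With these baskets the loop prints four rules in the "Body -> Head [support, confidence]" form: `bread -> cookie [50%, 100%]`, `cookie -> bread [50%, 67%]`, `cookie -> milk [50%, 67%]` and `milk -> cookie [50%, 67%]`.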

31 Support and Confidence
Support: percentage of samples that contain both A and B; support(A → B) = P(A ∩ B). Confidence: percentage of samples containing A that also contain B; confidence(A → B) = P(B|A). Example: sliced pork → lettuce [support = 2%, confidence = 60%]
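Both definitions translate directly into counting. The sketch below handles single items A and B; the function names and the four baskets are hypothetical and are not the data behind the 2% / 60% example.

```python
def support(transactions, a, b):
    """support(A -> B) = P(A ∩ B): share of transactions containing both A and B."""
    both = sum(1 for t in transactions if a in t and b in t)
    return both / len(transactions)


def confidence(transactions, a, b):
    """confidence(A -> B) = P(B | A): share of A-transactions that also contain B."""
    with_a = [t for t in transactions if a in t]
    if not with_a:
        raise ValueError(f"no transaction contains {a!r}")
    return sum(1 for t in with_a if b in t) / len(with_a)


# Hypothetical baskets.
baskets = [
    {"sliced pork", "lettuce"},
    {"sliced pork", "lettuce", "garlic"},
    {"sliced pork", "garlic"},
    {"lettuce"},
]
print(support(baskets, "sliced pork", "lettuce"))     # 2 of 4 baskets
print(confidence(baskets, "sliced pork", "lettuce"))  # 2 of the 3 pork baskets
```

Support is symmetric (support(A → B) = support(B → A)), while confidence in general is not, because it divides by the number of baskets containing the body.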

32 A store selling fruits and vegetables
Which items are sold together frequently?

33 An Example of Market Basket (1)
There are 8 transactions on three items: A (apple), B (banana), and C (carrot). Check the associations for the two cases below.
(1) A (apple) → B (banana)
#  Basket
1  A
2  B
3  C
4  A, B
5  A, C
6  B, C
7  A, B, C
8

34 An Example of Market Basket (2)
Basic probabilities are below:
(1) A → B
Coverage P(A) = 5/8 = 0.625
Support P(A∩B) = 3/8 = 0.375
Confidence P(B|A) = 3/5 = 0.6
Lift P(A∩B) / (P(A)*P(B)) = 0.375 / (0.625*0.625) = 0.375 / 0.390625 ≈ 0.96
Leverage P(A∩B) - P(A)*P(B) = 0.375 - 0.390625 ≈ -0.016
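The arithmetic for A → B can be checked with exact fractions. One assumption is needed: the contents of basket 8 did not survive in this transcript, but the values P(A) = P(B) = 5/8 and P(A∩B) = 3/8 on this slide imply it contains at least A and B, so the sketch assumes basket 8 = {A, B} (adding C would give the same numbers for A → B).

```python
from fractions import Fraction

# Baskets 1-7 as listed on slide 33; basket 8 = {A, B} is an ASSUMPTION
# inferred from P(A) = P(B) = 5/8 and P(A ∩ B) = 3/8.
baskets = [{"A"}, {"B"}, {"C"}, {"A", "B"}, {"A", "C"}, {"B", "C"},
           {"A", "B", "C"}, {"A", "B"}]


def p(*items):
    """Exact fraction of baskets containing all the given items."""
    return Fraction(sum(1 for b in baskets if set(items) <= b), len(baskets))


coverage = p("A")                     # P(A)
support = p("A", "B")                 # P(A ∩ B)
confidence = support / p("A")         # P(B | A)
lift = support / (p("A") * p("B"))    # P(A ∩ B) / (P(A) * P(B))
leverage = support - p("A") * p("B")  # P(A ∩ B) - P(A) * P(B)

print(coverage, support, confidence, lift, leverage)  # 5/8 3/8 3/5 24/25 -1/64
print(float(lift), float(leverage))                   # 0.96 -0.015625
```

Using `Fraction` keeps the results exact, which shows that the lift is exactly 24/25 and the leverage exactly -1/64.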

35 Lift What are good association rules? (How to interpret them?)
If lift is close to 1, it means there is no association between two items (sets). If lift is greater than 1, it means there is a positive association between two items (sets). If lift is less than 1, it means there is a negative association between two items (sets).
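The three cases can be written as a small classifier. The slide does not say how close to 1 counts as "close", so the `tolerance` of 0.05 below is an assumption, and `interpret_lift` is a hypothetical helper name.

```python
def interpret_lift(lift, tolerance=0.05):
    """Classify a lift value; `tolerance` (assumed 0.05) defines "close to 1"."""
    if abs(lift - 1) <= tolerance:
        return "no association"
    return "positive association" if lift > 1 else "negative association"


# 0.96 is the A -> B lift from the market basket example; 1.94 and 3.30
# appear in the Magnum Opus output later in this lab; 0.5 is made up.
for value in (0.96, 1.94, 3.30, 0.5):
    print(value, interpret_lift(value))
```

With a tolerance of 0.05 the A → B lift of 0.96 counts as "no association"; a stricter tolerance would call it a weak negative association.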

36 Leverage
Leverage = P(A∩B) - P(A)*P(B). Its sign indicates one of three cases:
① Leverage > 0: the two items (sets) are positively associated
② Leverage = 0: the two items (sets) are independent
③ Leverage < 0: the two items (sets) are negatively associated
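The three cases follow from the sign of a single subtraction. In this sketch, `leverage_type` is a hypothetical name, and `eps` is an assumed tolerance that only absorbs floating-point rounding; it is not a significance threshold.

```python
def leverage_type(p_a, p_b, p_ab, eps=1e-9):
    """Classify by the sign of leverage = P(A∩B) - P(A)*P(B).

    `eps` (assumed) only absorbs floating-point rounding.
    """
    leverage = p_ab - p_a * p_b
    if leverage > eps:
        return "positively associated"
    if leverage < -eps:
        return "negatively associated"
    return "independent"


print(leverage_type(0.5, 0.5, 0.25))       # P(A∩B) = P(A)P(B) exactly
print(leverage_type(0.263, 0.217, 0.111))  # tomatoes/lettuce proportions, slide 44
print(leverage_type(0.625, 0.625, 0.375))  # A/B proportions, slide 34
```

Since leverage = P(A)*P(B)*(lift - 1), leverage is positive exactly when lift is above 1 (for nonzero P(A) and P(B)), so the two measures always agree in direction. A strict sign test calls the slide 34 example (leverage = -0.015625) negatively associated, even though its lift of 0.96 is close to 1. Deciding whether such a small deviation matters needs a threshold or a significance test, such as the p values in the Magnum Opus output.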

37 Lab on Association Rules (1)
SAS Enterprise Miner and SPSS Clementine both provide association rule tools. For this exercise, however, we use Magnum Opus. Download the Magnum Opus evaluation version.

38 After you install the program, you will see the initial screen below.
From the menu, choose File – Import Data (Ctrl+O).

39 Demo data sets are already there.
Magnum Opus accepts two types of data sets: transaction data (*.idi, *.itl) and attribute-value data (*.data, *.nam). Transaction data comes in the two formats below.
idi (identifier-item file):
001, apples
001, oranges
001, bananas
002, apples
002, carrots
002, lettuce
002, tomatoes
itl (item list file):
apples, oranges, bananas
apples, carrots, lettuce, tomatoes
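The two transaction formats carry the same baskets. The following sketch converts idi text into itl text. The format rules are inferred only from the sample on this slide, not from Magnum Opus documentation: one "identifier, item" pair per line, with the item taken to be everything after the first comma. `idi_to_itl` is a hypothetical helper name.

```python
def idi_to_itl(idi_text):
    """Group 'identifier, item' lines into one comma-separated line per basket."""
    baskets = {}  # identifier -> items, kept in first-seen order
    for raw in idi_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        identifier, item = [part.strip() for part in line.split(",", 1)]
        baskets.setdefault(identifier, []).append(item)
    return "\n".join(", ".join(items) for items in baskets.values())


sample_idi = """001, apples
001, oranges
001, bananas
002, apples
002, carrots
002, lettuce
002, tomatoes"""
print(idi_to_itl(sample_idi))
```

The output reproduces the itl sample line for line: `apples, oranges, bananas` and `apples, carrots, lettuce, tomatoes`.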

40 If you open tutorial.idi in Notepad, you can see the file's contents, as shown on the left.
The example on the left has 5 transactions (baskets).

41 Choose File – Import Data (or click the toolbar icon), then select tutorial.idi.
Check Identifier – item file and click Next >.

42 Leave the other settings as they are, then set:
Search by: LIFT; Minimum lift: 1; Maximum no. of rules: 10
Click GO.

43 Results are saved in the tutorial.out file.
Below is an example of a derived rule: tomatoes -> lettuce [Coverage=0.263 (263); Support=0.111 (111); Strength=0.422; Lift=1.94; Leverage= (53.9); p=2.35E-019]
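For further processing, a rule line like this one can be split into its parts. The pattern below is inferred only from the tutorial.out lines shown in this lab and may not cover every output variant; `parse_rule` and the regular expression are hypothetical. Fields written as "0.263 (263)" keep the leading proportion in `values` and the parenthesised count in `counts`, and a field with no leading number (such as "Leverage= (53.9)") gets `None`.

```python
import re

# Pattern inferred from the tutorial.out lines shown in this lab only.
RULE_RE = re.compile(r"^(?P<body>.+?) -> (?P<head>.+?) \[(?P<stats>[^\]]*)\]$")


def parse_rule(line):
    """Split one rule line into body items, head items and numeric fields."""
    m = RULE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a rule line: {line!r}")
    values, counts = {}, {}
    for field in m.group("stats").split(";"):
        name, _, text = field.strip().partition("=")
        leading = text.split("(")[0].strip()     # "0.263" in "0.263 (263)"
        values[name] = float(leading) if leading else None
        paren = re.search(r"\(([^)]*)\)", text)  # "263" in "0.263 (263)"
        if paren:
            counts[name] = float(paren.group(1))
    body = m.group("body").split(" & ")
    head = m.group("head").split(" & ")
    return body, head, values, counts


line = ("tomatoes -> lettuce [Coverage=0.263 (263); Support=0.111 (111); "
        "Strength=0.422; Lift=1.94; Leverage= (53.9); p=2.35E-019]")
body, head, values, counts = parse_rule(line)
print(body, head, values["Lift"], values["Leverage"], counts["Leverage"])
```

Multi-item bodies such as "lettuce & carrots" are split on " & " into separate items.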

44 Output from association rule analysis
Only 55 rules satisfy the specified constraints.
tomatoes -> lettuce [Coverage=0.263 (263); Support=0.111 (111); Strength=0.422; Lift=1.94; Leverage= (53.9); p=2.35E-019]
lettuce -> tomatoes [Coverage=0.217 (217); Support=0.111 (111); Strength=0.512; Lift=1.94; Leverage= (53.9); p=2.35E-019]
tomatoes -> carrots [Coverage=0.263 (263); Support=0.085 (85); Strength=0.323; Lift=1.85; Leverage= (39.0); p=1.83E-012]
carrots -> tomatoes [Coverage=0.175 (175); Support=0.085 (85); Strength=0.486; Lift=1.85; Leverage= (39.0); p=1.83E-012]
onions -> potatoes [Coverage=0.189 (189); Support=0.082 (82); Strength=0.434; Lift=1.53; Leverage= (28.5); p=5.30E-007]
potatoes -> onions [Coverage=0.283 (283); Support=0.082 (82); Strength=0.290; Lift=1.53; Leverage= (28.5); p=5.30E-007]
lettuce & carrots -> tomatoes [Coverage=0.045 (45); Support=0.039 (39); Strength=0.867; Lift=3.30; Leverage= (27.2); p=3.16E-008]
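The reported figures can be cross-checked against the definitions from the earlier slides. This sketch makes two assumptions that the numbers appear to support but that are not stated in the output: Strength is the confidence P(head | body), and the parenthesised Leverage value is a count out of n = 1000 transactions (inferred from "Coverage=0.263 (263)"). Each head's coverage is read from the rule where that item is the body.

```python
# Item coverages read off the rules above; n = 1000 transactions is inferred
# from "Coverage=0.263 (263)", since 263 / 0.263 = 1000.
N = 1000
coverage = {"tomatoes": 0.263, "lettuce": 0.217, "carrots": 0.175,
            "onions": 0.189, "potatoes": 0.283}

# (body, head, support) for the single-item rules listed above.
rules = [("tomatoes", "lettuce", 0.111), ("lettuce", "tomatoes", 0.111),
         ("tomatoes", "carrots", 0.085), ("carrots", "tomatoes", 0.085),
         ("onions", "potatoes", 0.082), ("potatoes", "onions", 0.082)]


def recompute(body_coverage, support, head_coverage, n=N):
    """Return (strength, lift, leverage count), rounded like the output above."""
    strength = support / body_coverage                  # P(head | body)
    lift = support / (body_coverage * head_coverage)    # observed / expected
    leverage = support - body_coverage * head_coverage  # as a proportion
    return round(strength, 3), round(lift, 2), round(leverage * n, 1)


for body, head, support in rules:
    print(body, "->", head, recompute(coverage[body], support, coverage[head]))

# "lettuce & carrots -> tomatoes": its body coverage 0.045 comes from its own line.
print(recompute(0.045, 0.039, coverage["tomatoes"]))
```

Every recomputed triple matches the Strength, Lift and Leverage values in the output, which supports both assumptions. Under the same assumptions, the missing leverage proportion for tomatoes -> lettuce would be about 0.111 - 0.263 * 0.217 ≈ 0.0539.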

