Big Data.

Presentation transcript:

Big Data

World Cup Soccer German soccer Team 2014.07.05 : IoT + Bigdata

What is big data? Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

Big Data Is Everywhere!
Lots of data is being collected and warehoused:
Web data, e-commerce
Purchases at department/grocery stores
Bank/credit card transactions
Social networks

What does big data do?

The most popular big data application program is HADOOP
Time of Big Data. What is Big Data? http://www.youtube.com/watch?v=7D1CQ_LOizA
What is HADOOP? http://www.youtube.com/watch?v=9s-vSeWej1U

Evolution of Names: Artificial Intelligence → Machine Learning → Business Intelligence → Data Mining → Big Data/Data Sciences

What Is Data Mining?
Data mining (knowledge discovery in databases): a process of identifying hidden patterns and relationships within data (Groth).
Data mining: extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases.

DM and Business Decision Support
Database marketing: target marketing, customer relationship management
Credit risk management: credit scoring
Fraud detection
Healthcare informatics: clinical decision support

Data Mining: A KDD Process
Data mining is the core of the knowledge discovery process.
[Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge]

Data mining software: SAS Enterprise Miner (EM), Clementine for SPSS, R, Python

Government
In 2012, the Obama administration announced the Big Data Research and Development Initiative, which explored how big data could be used to address important problems faced by the government. The initiative was composed of 84 different big data programs spread across six departments.
Big data analysis played a large role in Barack Obama's successful 2012 re-election campaign.
The United States Federal Government owns six of the ten most powerful supercomputers in the world.
The Utah Data Center is a data center currently being constructed by the United States National Security Agency. When finished, the facility will be able to handle yottabytes of information collected by the NSA over the Internet.

Business
Amazon.com handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers. The core technology that keeps Amazon running is Linux-based, and as of 2005 it had the world’s three largest Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB.
Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress.
Facebook handles 50 billion photos from its user base.
FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide.
The volume of business data worldwide, across all companies, doubles every 1.2 years, according to estimates.
Windermere Real Estate uses anonymous GPS signals from nearly 100 million drivers to help new home buyers determine their typical drive times to and from work throughout various times of the day.

Big data in Google Trends

Big data case: movement of carts and product display

Wildfires in Korea (1991 – 2011)

Google Flu Service

Find a location for your business

Crime mapping in San Francisco: 71% accuracy

Evolution of big data: Artificial Intelligence → Data Mining → Business Intelligence → Big Data → Business Analytics → Data Sciences

Future direction of bigdata

bigdata 2013 bigdata 2014

Google glass Mashup, bigdata, visualisation -> analysis of commerce area

IoT Key: Smart & Intelligence

3D Printer Healthy food, organ, face recommended?

A Case on Big Data (Association Rule Analysis)

Association Rules Analysis as an Example of a Data Mining Tool: Market Basket Analysis

What Is Association Mining?
Association rule mining: finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
Applications: market basket analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
Examples:
Rule form: “Body → Head [support, confidence]”
buys(x, “cookie”) → buys(x, “milk”) [0.5%, 60%]

Support and Confidence
Support: percent of samples that contain both A and B.
support(A → B) = P(A ∩ B)
Confidence: percent of samples containing A that also contain B.
confidence(A → B) = P(B|A)
Example: sliced pork → lettuce [support = 2%, confidence = 60%]
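As a minimal sketch of the two definitions above, the following Python computes support and confidence from a list of transactions. The function names and the four toy baskets are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def support(transactions, a, b):
    """Fraction of transactions that contain every item of A and of B: P(A ∩ B)."""
    both = set(a) | set(b)
    return sum(both <= t for t in transactions) / len(transactions)


def confidence(transactions, a, b):
    """Fraction of the transactions containing A that also contain B: P(B | A)."""
    a = set(a)
    n_a = sum(a <= t for t in transactions)
    if n_a == 0:
        raise ValueError("the antecedent A never occurs")
    n_ab = sum((a | set(b)) <= t for t in transactions)
    return n_ab / n_a


# Hypothetical toy data (an assumption, not taken from the slides).
baskets = [
    {"cookie", "milk"},
    {"cookie"},
    {"milk", "bread"},
    {"cookie", "milk", "bread"},
]

print(support(baskets, {"cookie"}, {"milk"}))     # 2 of the 4 baskets
print(confidence(baskets, {"cookie"}, {"milk"}))  # 2 of the 3 cookie baskets
```

Note that support is symmetric in A and B, while confidence is not: swapping the two sides changes the denominator.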

A store selling fruits and vegetables Which items are sold together frequently?

An Example of Market Basket (1)
There are 8 transactions on three items: A (apple), B (banana), and C (carrot). Check the associations for the two cases below.
(1) A (apple) → B (banana)

#   Basket
1   A
2   B
3   C
4   A, B
5   A, C
6   B, C
7   A, B, C
8

An Example of Market Basket (2)
Basic probabilities are below:
(1) A → B
Coverage: P(A) = 5/8 = 0.625
Support: P(A∩B) = 3/8 = 0.375
Confidence: P(B|A) = 3/5 = 0.6
Lift: P(A∩B) / (P(A)*P(B)) = 0.375 / (0.625*0.625) = 0.375 / 0.391 = 0.96
Leverage: P(A∩B) - P(A)*P(B) = 0.375 - 0.391 = -0.016
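The arithmetic above can be checked with a short Python sketch using exact fractions. Baskets 1 to 7 are the ones listed on the slide; the content of basket 8 is missing from the transcript, so {A, B} is used here purely as an assumption that makes the counts agree with the slide's figures (P(A) = 5/8, P(A∩B) = 3/8).

```python
from fractions import Fraction

# Baskets 1-7 as listed on the slide. Basket 8 is missing from the transcript;
# {"A", "B"} is an ASSUMPTION chosen so the counts match the slide's figures.
baskets = [
    {"A"}, {"B"}, {"C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B"},  # assumed
]


def prob(items):
    """Exact fraction of baskets that contain all of the given items."""
    return Fraction(sum(items <= b for b in baskets), len(baskets))


p_a, p_b, p_ab = prob({"A"}), prob({"B"}), prob({"A", "B"})

coverage = p_a               # P(A)
support = p_ab               # P(A ∩ B)
confidence = p_ab / p_a      # P(B | A)
lift = p_ab / (p_a * p_b)    # P(A ∩ B) / (P(A) * P(B))
leverage = p_ab - p_a * p_b  # P(A ∩ B) - P(A) * P(B)

for name, value in [("coverage", coverage), ("support", support),
                    ("confidence", confidence), ("lift", lift),
                    ("leverage", leverage)]:
    print(f"{name:<10} = {value} = {float(value):.4f}")
```

Working with `Fraction` avoids the intermediate rounding of P(A)*P(B), so lift comes out as exactly 24/25 and leverage as exactly -1/64.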

Lift
What are good association rules? (How to interpret them?)
If lift is close to 1, there is no association between the two items (sets).
If lift is greater than 1, there is a positive association between the two items (sets).
If lift is less than 1, there is a negative association between the two items (sets).

Leverage
Leverage = P(A∩B) - P(A)*P(B). There are three cases:
① Leverage > 0: the two items (sets) are positively associated
② Leverage = 0: the two items (sets) are independent
③ Leverage < 0: the two items (sets) are negatively associated
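The interpretation rules for lift and leverage can be sketched as a small Python helper. The function names and the tolerance `tol` used to decide what counts as "close to" independence are assumptions made for illustration; the slides do not specify a threshold.

```python
def lift(p_a, p_b, p_ab):
    """Lift = P(A ∩ B) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)


def leverage(p_a, p_b, p_ab):
    """Leverage = P(A ∩ B) - P(A) * P(B)."""
    return p_ab - p_a * p_b


def interpret(p_a, p_b, p_ab, tol=1e-9):
    """Classify the association by the sign of leverage (tol is a hypothetical cutoff)."""
    lev = leverage(p_a, p_b, p_ab)
    if abs(lev) <= tol:
        return "independent"
    return "positive" if lev > 0 else "negative"


# Hypothetical probabilities with P(A) = P(B) = 0.5.
print(interpret(0.5, 0.5, 0.25))
print(interpret(0.5, 0.5, 0.40))
print(interpret(0.5, 0.5, 0.10))
```

Because lift divides and leverage subtracts the same two quantities, lift > 1 exactly when leverage > 0, so either measure gives the same classification.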

Lab on Association Rules (1)
SAS Enterprise Miner and SPSS Clementine both include association rule software. For this exercise, however, we use Magnum Opus. Download the Magnum Opus evaluation version.

After you install the program, you will see the initial screen below. From the menu, choose File – Import Data (Ctrl – O).

Demo data sets are already there. Magnum Opus has two types of data sets available: transaction data (*.idi, *.itl) and attribute-value data (*.data, *.nam). Transaction data comes in the two formats below.

idi (identifier-item file):
001, apples
001, oranges
001, bananas
002, apples
002, carrots
002, lettuce
002, tomatoes

itl (item list file):
apples, oranges, bananas
apples, carrots, lettuce, tomatoes
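To make the relationship between the two transaction formats concrete, here is a minimal Python sketch that groups identifier-item lines into item lists. It assumes the layout is exactly as in the sample above (one "identifier, item" pair per line, comma-separated); real Magnum Opus files may differ, and `idi_to_itl` is a hypothetical helper, not part of Magnum Opus.

```python
def idi_to_itl(idi_text):
    """Group 'identifier, item' lines into one comma-separated item list per identifier."""
    baskets = {}  # identifier -> list of items, in first-seen order
    for line in idi_text.splitlines():
        line = line.strip()
        if not line:
            continue
        ident, item = (part.strip() for part in line.split(",", 1))
        baskets.setdefault(ident, []).append(item)
    return [", ".join(items) for items in baskets.values()]


# The sample identifier-item data from the slide.
sample_idi = """\
001, apples
001, oranges
001, bananas
002, apples
002, carrots
002, lettuce
002, tomatoes
"""

for row in idi_to_itl(sample_idi):
    print(row)
```

Each output row corresponds to one basket, which is the form the item list file uses.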

If you open tutorial.idi in Notepad, you can see the file contents, as shown on the left. The example on the left has 5 transactions (baskets).

Choose File – Import Data, or click the corresponding toolbar icon. Select Tutorial.idi, check Identifier – item file, and click Next >.

Leave the settings as they are:
Search by: LIFT
Minimum lift: 1
Maximum no. of rules: 10
Click GO.

Results are saved in the tutorial.out file. Below is an example of a derived rule:
tomatoes -> lettuce [Coverage=0.263 (263); Support=0.111 (111); Strength=0.422; Lift=1.94; Leverage=0.0539 (53.9); p=2.35E-019]

Output from association rule analysis
Only 55 rules satisfy the specified constraints.
tomatoes -> lettuce [Coverage=0.263 (263); Support=0.111 (111); Strength=0.422; Lift=1.94; Leverage=0.0539 (53.9); p=2.35E-019]
lettuce -> tomatoes [Coverage=0.217 (217); Support=0.111 (111); Strength=0.512; Lift=1.94; Leverage=0.0539 (53.9); p=2.35E-019]
tomatoes -> carrots [Coverage=0.263 (263); Support=0.085 (85); Strength=0.323; Lift=1.85; Leverage=0.0390 (39.0); p=1.83E-012]
carrots -> tomatoes [Coverage=0.175 (175); Support=0.085 (85); Strength=0.486; Lift=1.85; Leverage=0.0390 (39.0); p=1.83E-012]
onions -> potatoes [Coverage=0.189 (189); Support=0.082 (82); Strength=0.434; Lift=1.53; Leverage=0.0285 (28.5); p=5.30E-007]
potatoes -> onions [Coverage=0.283 (283); Support=0.082 (82); Strength=0.290; Lift=1.53; Leverage=0.0285 (28.5); p=5.30E-007]
lettuce & carrots -> tomatoes [Coverage=0.045 (45); Support=0.039 (39); Strength=0.867; Lift=3.30; Leverage=0.0272 (27.2); p=3.16E-008]
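For further analysis, rule lines like those above can be read into Python. The sketch below assumes every rule follows the layout shown in this output ("body -> head [Key=value; ...]"); `parse_rule` is a hypothetical helper and not part of Magnum Opus. It also checks the apparent relationship Strength = Support / Coverage, which holds for the rules listed here up to rounding and suggests that Strength plays the role of confidence.

```python
import re

# Layout assumed from the sample output: "body -> head [Key=value; Key=value; ...]".
RULE_RE = re.compile(r"^(?P<body>.+?) -> (?P<head>.+?) \[(?P<stats>.+)\]$")


def parse_rule(line):
    """Parse one rule line into a dict with body items, head, and numeric statistics."""
    m = RULE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised rule line: {line!r}")
    stats = {}
    for field in m.group("stats").split(";"):
        key, value = field.strip().split("=", 1)
        # Drop the parenthesised count, e.g. "0.263 (263)" -> "0.263".
        stats[key] = float(value.split("(")[0])
    return {"body": m.group("body").split(" & "), "head": m.group("head"), **stats}


line = ("lettuce & carrots -> tomatoes [Coverage=0.045 (45); Support=0.039 (39); "
        "Strength=0.867; Lift=3.30; Leverage=0.0272 (27.2); p=3.16E-008]")
rule = parse_rule(line)

print(rule["body"], "->", rule["head"])
# Compare the reported Strength with Support / Coverage.
print(rule["Strength"], round(rule["Support"] / rule["Coverage"], 3))
```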