Introduction to Big Data. World Cup soccer 2014.07.05 (Money Today) : IoT + Bigdata German soccer Team.

Presentation transcript:

Introduction to Big Data

World Cup soccer (Money Today) : IoT + Bigdata German soccer Team

What is big data? Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

Big Data Is Everywhere! Lots of data is being collected and warehoused:
– Web data, e-commerce
– Purchases at department/grocery stores
– Bank/credit card transactions
– Social networks

How much data?
Google processes 20 PB a day (2008)
Wayback Machine has 3 PB TB/month (3/2009)
Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
eBay has 6.5 PB of user data + 50 TB/day (5/2009)
"640K ought to be enough for anybody."

What does big data do?

Government
In 2012, the Obama administration announced the Big Data Research and Development Initiative, which explored how big data could be used to address important problems faced by the government. The initiative was composed of 84 different big data programs spread across six departments.
Big data analysis played a large role in Barack Obama's successful 2012 re-election campaign.
The United States Federal Government owns six of the ten most powerful supercomputers in the world.
The Utah Data Center is a data center currently being constructed by the United States National Security Agency. When finished, the facility will be able to handle yottabytes of information collected by the NSA over the Internet.

Business
Amazon.com handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers. The core technology that keeps Amazon running is Linux-based, and as of 2005 they had the world's three largest Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB.
Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress.
Facebook handles 50 billion photos from its user base.
FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide.
The volume of business data worldwide, across all companies, doubles every 1.2 years, according to estimates.
Windermere Real Estate uses anonymous GPS signals from nearly 100 million drivers to help new home buyers determine their typical drive times to and from work throughout various times of the day.

Examples of free big data sites: Google Trends, Google Flu, Google Correlate, Social Metrics Insight

Big data in Google Trends

Big data case: movement of carts and product display

Wildfires in Korea (1991–2011)

Google Flu service

Find a location for your business

Crime mapping in San Francisco: 71% accuracy

Similar names for big data: data science, business analytics, data analytics, data mining, business intelligence, machine learning

Case 1: Market Basket Analysis (MBA), a case of big data analysis

1). POS Data (1000 data) bananas plums, lettuce, tomatoes celery, bean bean apples, carrots, tomatoes, potatoes potatoes bean carrots bean apples, oranges, lettuce, tomatoes peaches, oranges, celery, potatoes, bean beans oranges, lettuce, carrots, tomatoes apples, bananas, plums, carrots, tomatoes, onions, bean apples, potatoes lettuce, peas, beans.

2). Association Rules as Output (Model)
Only 55 rules satisfy the specified constraints.
tomatoes -> lettuce [Coverage=0.263 (263); Support=0.111 (111); Strength=0.422; Lift=1.94; Leverage= (53.9); p=2.35E-019]
lettuce -> tomatoes [Coverage=0.217 (217); Support=0.111 (111); Strength=0.512; Lift=1.94; Leverage= (53.9); p=2.35E-019]
tomatoes -> carrots [Coverage=0.263 (263); Support=0.085 (85); Strength=0.323; Lift=1.85; Leverage= (39.0); p=1.83E-012]
carrots -> tomatoes [Coverage=0.175 (175); Support=0.085 (85); Strength=0.486; Lift=1.85; Leverage= (39.0); p=1.83E-012]
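The figures reported for each rule can be reproduced from four counts. The following is a minimal Python sketch, assuming the usual definitions (coverage is the share of baskets containing the left-hand item, support the share containing both items, strength is support divided by coverage, lift is strength divided by the share containing the right-hand item, and leverage is support minus the product of the two item shares). The function name `rule_metrics` is illustrative; the counts are the ones listed for tomatoes -> lettuce, with 1000 baskets in total.

```python
def rule_metrics(n_total, n_lhs, n_rhs, n_both):
    """Compute association-rule metrics for a rule LHS -> RHS from raw counts."""
    coverage = n_lhs / n_total          # share of baskets containing the LHS
    rhs_share = n_rhs / n_total         # share of baskets containing the RHS
    support = n_both / n_total          # share of baskets containing both
    strength = n_both / n_lhs           # also called confidence
    lift = strength / rhs_share         # > 1 means a positive association
    leverage = support - coverage * rhs_share
    return {
        "coverage": coverage,
        "support": support,
        "strength": strength,
        "lift": lift,
        "leverage": leverage,
    }


# tomatoes -> lettuce: 263 baskets with tomatoes, 217 with lettuce, 111 with both
metrics = rule_metrics(n_total=1000, n_lhs=263, n_rhs=217, n_both=111)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Swapping `n_lhs` and `n_rhs` gives the reverse rule, which is why lift and leverage are identical for both directions while strength differs.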

3). Graphic Representation

Relationship graph when the link is set to 0

Graphic Representations of Association Rules: relationship graph when the link is set to 0

Graph when the link strength is set to 6. Relationship graph when the distance is set by value (network form)

Application of MBA : product recommendation system

Case 2: SNS analysis

Social Network (http://nexus.ludios.net/view/demo)

Analysis of Human Relations (NodeXL)

Friends Networks

Case 3. Bankruptcy Prediction. The data are yearly financial data collected by the Korea Credit Guarantee Fund. They consist of 944 bankrupt corporations and 944 healthy (non-bankrupt) corporations from the fiscal year 1999 to

List of financial variables selected (variable: definition)
X13: interest expenses to sales = (interest expenses / sales) × 100
X17: profit to sales = (profit / sales) × 100
X24: operating profit to sales = (operating profit / sales) × 100
X27: ordinary profit to total capital = (ordinary profit / total capital) × 100
X28: current liabilities to total capital = (current liabilities / total capital) × 100
X103: growth rate of tangible assets = (tangible assets at the end of the year / tangible assets at the beginning of the year × 100) − 100
X108: turnover of managerial assets = sales / {total assets − (construction in progress + investment assets)}
net financing cost = interest expenses − interest incomes
X127: net working capital to total capital = {(current assets − current liabilities) / total capital} × 100
X129: growth rate of current assets = (current assets at the end of the year / current assets at the beginning of the year × 100) − 100
X140: ordinary income to net worth = (ordinary income / net worth) ×
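A minimal Python sketch of three of the definitions above, reading each ratio as a quotient times 100 and each growth rate as the end-of-year value over the beginning-of-year value times 100, minus 100. The function names and the sample figures are illustrative assumptions, not values from the Korea Credit Guarantee Fund data.

```python
def ratio_pct(numerator, denominator):
    """Percentage ratio, e.g. X13 = (interest expenses / sales) * 100."""
    return numerator / denominator * 100


def growth_rate_pct(end_of_year, beginning_of_year):
    """Growth rate, e.g. X129 = (end / beginning * 100) - 100."""
    return end_of_year / beginning_of_year * 100 - 100


def net_working_capital_pct(current_assets, current_liabilities, total_capital):
    """X127 = ((current assets - current liabilities) / total capital) * 100."""
    return (current_assets - current_liabilities) / total_capital * 100


# Hypothetical company figures, in arbitrary currency units
print(ratio_pct(25, 200))                      # interest expenses to sales
print(growth_rate_pct(150, 100))               # growth rate of current assets
print(net_working_capital_pct(300, 200, 400))  # net working capital to total capital
```

Each function returns a percentage, so a company whose current assets grow from 100 to 150 has a growth rate of 50, not 0.5.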

Decision Tree Analysis

Case 4. Income Prediction. For our study we selected the United States Census (5%) 1990 Public Use Microsample data (Census 1990). This data, which was divided into 18 files, contained the entire 5% sample made public domain from the 1990 U.S. Census in STATA 6.0 format. Combined, these 18 files included about 4.5 million males and 5 million females, totaling 9.1 million records. Census s/pums.html

Data Sampling: We converted the 18 data files into flat files; then, using Java code, we merged these 18 flat files into a single file consisting of 9.1 million records with 85 variables (approximately 1.5 GB in size).
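The merge step can be sketched as follows. The source states that the merge was written in Java; this sketch uses Python instead, and it assumes (this is not stated in the source) that the flat files are comma-separated and share an identical header row. The function name `merge_flat_files` is hypothetical. Rows are streamed one at a time so that a file of about 1.5 GB never has to fit in memory.

```python
import csv


def merge_flat_files(input_paths, output_path):
    """Concatenate CSV flat files that share one header.

    Writes the header once, appends all data rows in input order,
    and returns the number of data rows written.
    """
    rows_written = 0
    header = None
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in input_paths:
            with open(path, newline="") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip completely empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Comparing the returned row count with the expected total (9.1 million in this study) is a cheap check that no file was skipped or truncated.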

Algorithm Analogy of Discovering the Complete Set of Rules (Drawing the Perfect Picture via Coin Scrubbing)

The Repetitive Methodology of Merging New Rules into the Domain Knowledge Base

The Relationship Between IRA’s Accuracy Level and Number of Iterations for This Study

Performance Comparison
Methods: CHAID, CART, ANN, LR, DA, See5, This study
Tool used: Answer Tree (SPSS), Answer Tree (SPSS), Neural Connection (SPSS), SPSS, See5 (with default rule), IRA
Training sample size: 3.24m k
Accuracy (2/3-1/3): RBF: 76.12 MLP

Mining tools: Enterprise Miner (SAS), Clementine (SPSS), R, Python, RapidMiner, Hadoop, RHive, and many visualisation tools (infographics, etc.)

Future direction of bigdata

bigdata 2013 bigdata 2014

Google glass Mashup, bigdata, visualisation -> analysis of commerce area

IoT Key: Smart & Intelligence

3D Printer Healthy food, organ, face recommended?

Evolution of bigdata

cup

Cup with Art

Cup with emotion

Cup without cup