Data Mining

Presentation transcript:

1 Data Mining

2 Agenda Examples What is data mining? The Industry comments Techniques

3 Examples
“On Friday evenings, shoppers who buy diapers also buy beer.”
–Supermarket transaction database
“People with good credit ratings have fewer accidents.”
–Insurance database
“A one-dollar gas station credit-card transaction followed by a large transaction is likely to be indicative of fraud.”
–Credit card transactions database

4 More Examples
Marketing
–Targeted marketing using decision trees
Stock selection / Fraud detection
–Using neural networks
Telecommunications
–Churn modeling, identifying valuable customers

5 Even More Examples
Healthcare
–Fish oil and Raynaud’s disease
Finding communities on the Web
–Abortion example
Personalization
–Recommender systems

6 Even More More Examples
Games (e.g. Hollywood Stock Exchange)
Viral Marketing
–Social networks and network mining
Sports
–NBA Scout

7 Agenda Examples What is data mining? The Industry comments Techniques

8 What is Data Mining?

9 Querying large databases? Learning patterns from data? Building models from data?

10 What is Data Mining?
Learning “structure” from large data
–“reverse engineering”
–“structure” could be patterns or models
How is this different from statistics?

11 Data mining techniques
Lots of them exist!
How to categorize these?
–Two approaches
  Description vs prediction
  RES (representation, evaluation, search) framework

12 Classification of the main engines/techniques

13 Representation, Evaluation & Search: Linear Model Example
Representation
–Risk = 0.93*prior_default *num_cards - 1.3*employed - 0.734
Evaluation
–R-squared/degree of fit
Search
–How did the technique find the coefficients?
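The three parts of the linear model example can be made concrete with a minimal sketch. It assumes a one-variable model and uses closed-form ordinary least squares as the search step; the slide does not say which procedure produced its coefficients, and the data and the names `fit_line` and `r_squared` are hypothetical.

```python
# Representation: a linear model  y = slope * x + intercept.
# Evaluation: R-squared (fraction of the variance in y that the line explains).
# Search: closed-form ordinary least squares (an assumed choice, not stated on the slide).

def fit_line(xs, ys):
    """Search step: return the (slope, intercept) that minimize squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Evaluation step: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: number of credit cards vs. a made-up risk score.
num_cards = [1, 2, 3, 4, 5]
risk = [1.1, 1.9, 3.2, 3.8, 5.0]

slope, intercept = fit_line(num_cards, risk)
fit = r_squared(num_cards, risk, slope, intercept)
print(f"risk = {slope:.2f} * num_cards + {intercept:.2f}, R-squared = {fit:.3f}")
```

Swapping the search step for a heuristic (for example, gradient descent) would leave the representation and the evaluation measure unchanged, which is the point of separating the three.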

14 Representation, evaluation and search
Different techniques represent, evaluate and search for patterns differently.
–Methods can be characterized based on how they do these things.
Data mining methods use very different representation schemes, use predictive accuracy as the main evaluation measure, and use heuristic search procedures.
Strengths: can build very accurate models and learn interesting patterns in a bottom-up manner.
Weaknesses: can find false patterns and may “overfit” the training data.
–How to mitigate these?
This is one way to think about the difference between DM methods and traditional statistical methods.
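One common way to notice overfitting is to compare accuracy on the training data with accuracy on held-out data. The sketch below is an illustration only, not a method from the slides: the data, the noisy labels, and both toy models (`memorizer`, `threshold_rule`) are hypothetical.

```python
def accuracy(model, rows):
    """Fraction of (x, label) rows the model predicts correctly."""
    return sum(model(x) == label for x, label in rows) / len(rows)

# Hypothetical data: the underlying pattern is "x >= 5 means class 1",
# but two training labels (x=3 and x=8) are noise.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
holdout = [(0, 0), (3, 0), (6, 1), (8, 1), (10, 1)]

# Overfit model: memorize every training label, noise included, and answer
# unseen values with the label of the nearest memorized value.
memory = dict(train)

def memorizer(x):
    nearest = min(memory, key=lambda seen: abs(seen - x))
    return memory[nearest]

# Simpler model: a single threshold that ignores the noisy points.
def threshold_rule(x):
    return 1 if x >= 5 else 0

for name, model in [("memorizer", memorizer), ("threshold", threshold_rule)]:
    print(name, "train:", accuracy(model, train), "holdout:", accuracy(model, holdout))
```

The memorizer is perfect on the training data but worse on the held-out rows, while the threshold rule shows the opposite pattern; a gap of that kind is a typical sign of overfitting.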

15 Agenda Examples What is data mining? The Industry comments Techniques

16 The Industry Space
Data gathering and management
–External data sources
–Integrating databases to design unified views
  For realtime support
  For historical warehouse driven apps
Firms
–Data vendors, consulting services

17 Customer Centric Architecture
(Diagram labels: channels (web, phone, golf course); Action Database; Other Data Sources)

18 The Industry Space
Broad Data Analytics
–Traditional statistical tools
–Data mining tools
Firms
–SPSS, SAS, Trajecta, IBM, SGI, Gainsmarts, HNC Software
Other common sources
–In-house analytics development and academia

19 The Industry Space
Niche Market Analytics and Services
–Fraud detection
–Customer Segmentation
–Direct Marketing
–Bioinformatics
–Internet Advertising
–Personalization
Firms
–Examples: Doubletwist, Celera, HNC Software, Knowledge Stream Partners, Adknowledge (acquired by Engage), Epiphany.

20 The Industry Space
Broad CRM Technologies and Services
–General features
  Some data collection and integration tools
  Some analytics and profitability analyses
  Some features to streamline operations
  Often customizable based on client needs
  Boils down to client needs
Firms
–E.g. Siebel.

21 Data Mining Revisited
Smart techniques
–Data mining
Not a problem.
Engineering
–Integrating this into an overall data management architecture
The more difficult problem
When and how to use
–The hard part is figuring out which problem to solve, what data to use, etc.
–The importance of thinking “bottom up” for solving problems

22 The Chief Data Officer

23 The Chief Data Officer

24 Agenda Examples What is data mining? The Industry comments Techniques

25 Example DM Models: Neural Networks Attempts to mimic the way neurons work in translating input data into an output (dependent variable)
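To make "translating input data into an output" concrete, here is a minimal sketch of a forward pass through a tiny network with one hidden layer. The sigmoid activation, the layer sizes, and every weight are assumptions chosen for illustration; a real network would learn its weights from data.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One unit: weighted sum of its inputs plus a bias, passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def forward(inputs, hidden_layer, output_unit):
    """Feed the inputs through the hidden units, then through the output unit."""
    hidden = [neuron(inputs, weights, bias) for weights, bias in hidden_layer]
    out_weights, out_bias = output_unit
    return neuron(hidden, out_weights, out_bias)

# Hypothetical hand-picked weights: two inputs, two hidden units, one output.
hidden_layer = [([0.5, -0.4], 0.0), ([-0.3, 0.8], 0.1)]
output_unit = ([1.2, -0.7], 0.05)

prediction = forward([1.0, 2.0], hidden_layer, output_unit)
print(prediction)  # a value between 0 and 1, the predicted dependent variable
```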

26 Structure of a Neural Network

27 Surface-fitters or Function Approximators

28 Example DM Models: OLAP (On Line Analytical Processing) Provides visual tools to slice and dice the data
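OLAP products do this interactively over a data warehouse; the underlying slice and roll-up operations can be sketched on a small in-memory fact table. The table contents, the dimension names, and the helpers `roll_up` and `slice_cube` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: (region, product, quarter, sales).
facts = [
    ("East", "Beer", "Q1", 100),
    ("East", "Diapers", "Q1", 80),
    ("West", "Beer", "Q1", 60),
    ("East", "Beer", "Q2", 120),
    ("West", "Diapers", "Q2", 90),
]
DIMENSIONS = ("region", "product", "quarter")

def roll_up(rows, by):
    """Group rows by the dimensions named in `by` and sum sales over the rest."""
    idx = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)

def slice_cube(rows, dimension, value):
    """Slice: keep only the rows where one dimension equals a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in rows if row[i] == value]

sales_by_region = roll_up(facts, by=("region",))
q1_by_product = roll_up(slice_cube(facts, "quarter", "Q1"), by=("product",))
print(sales_by_region)
print(q1_by_product)
```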

29 Browsing a Data Cube

30 Example: Clustering
Identify homogeneous and separable groups (“clusters”) so that there is:
–maximum similarity between points within a group
–maximum difference between groups
Applications
–Group customers into categories useful for targeted marketing.
–Identify clusters in image data
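The slide does not name a clustering algorithm. As one widely used option, the sketch below runs plain k-means on 2-D points; the data, the choice of k, and seeding the centroids with the first k points are all assumptions made to keep the example short and deterministic.

```python
def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def mean_point(points):
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical customers described by two numeric attributes.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

On this data the two groups are well separated, so the result does not depend much on the seeding; on real data k-means can settle on different clusters for different starting centroids.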

31 What clusters can look like

32 Example: Classification

33 Example: Nearest neighbor methods Read “Amazon.com recommendations” paper

34 Online Recommender Systems
Opportunities
–Customized stores and all the associated benefits
–Easy measurement
–Permits experimentation
Challenges
–Scale (tens of millions of users, and millions of items)
–Need for real-time results
–Amount of info on customers varies, but often sparse data

35 Simple collaborative filtering
(Customer-item matrix over customers $C_1, C_2, C_3, \ldots, C_n$ and items $I_1, I_2, I_3, \ldots, I_m$)
1. Let $C_1$ be the vector of zeros and ones corresponding to customer 1.
2. Define similarity between customers A and B as $\cos(A, B) = \dfrac{A \cdot B}{\|A\| \, \|B\|}$
3. In traditional collaborative filtering, for a given customer, find the closest customer and then recommend the other products purchased by this closest customer.
Advantages and disadvantages?
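The three steps on this slide can be sketched as follows. The purchase vectors and the helper names `cosine` and `recommend` are hypothetical, and the zeros and ones are assumed to mean "did not buy" and "bought".

```python
import math

def cosine(a, b):
    """cos(A, B) = (A . B) / (||A|| * ||B||); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(target, purchases):
    """Find the closest other customer and return the item indexes they bought
    that the target customer has not bought."""
    others = [c for c in purchases if c != target]
    closest = max(others, key=lambda c: cosine(purchases[target], purchases[c]))
    mine, theirs = purchases[target], purchases[closest]
    return closest, [i for i, (m, t) in enumerate(zip(mine, theirs)) if t and not m]

# Step 1: one 0/1 vector per customer, one position per item (hypothetical data).
purchases = {
    "C1": [1, 1, 0, 0],
    "C2": [1, 1, 1, 0],
    "C3": [0, 0, 1, 1],
}

closest, items = recommend("C1", purchases)
print(closest, items)
```

Every recommendation here compares the target against all other customers, which ties back to the scale challenge raised on the previous slide.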

36 Content based recommendations Treat recommendations as search for related items. E.g. if you liked “Men In Black” you may get recommendations for comedy films. Advantages and disadvantages?

37 Item-to-Item Collaborative Filtering
(Item-customer matrix over items $I_1, I_2, I_3, \ldots, I_n$ and customers $C_1, C_2, C_3, \ldots, C_m$)
1. For each item, find all similar items in an offline computation.
2. Create a similar-items table in which, for each item, the set of all related items is stored.
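A minimal sketch of the two steps, assuming cosine similarity between item columns as the similarity measure (the slide does not specify one). The data and the name `build_similar_items` are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_similar_items(purchases, items):
    """Offline step: for each item, list the other items with nonzero
    similarity, most similar first."""
    # One 0/1 vector per item, with one position per customer.
    columns = {item: [row[j] for row in purchases.values()]
               for j, item in enumerate(items)}
    table = {}
    for item in items:
        scored = [(cosine(columns[item], columns[other]), other)
                  for other in items if other != item]
        scored.sort(key=lambda pair: -pair[0])  # stable sort keeps ties in item order
        table[item] = [other for score, other in scored if score > 0]
    return table

# Hypothetical purchase data: rows are customers, columns follow `items`.
items = ["I1", "I2", "I3", "I4"]
purchases = {
    "C1": [1, 1, 0, 0],
    "C2": [1, 1, 1, 0],
    "C3": [0, 0, 1, 1],
}

similar_items = build_similar_items(purchases, items)
print(similar_items["I1"])  # a recommendation at request time is just this lookup
```

Because the table is computed ahead of time, the online work per request is a dictionary lookup rather than a scan over all customers.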

38 Example : Rule discovery methods Read: “On the discovery of statistical quantitative rules”
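The paper named on this slide is not reproduced here. As a generic illustration of rule discovery, the sketch below computes the classical support and confidence measures for a rule such as "diapers → beer" from the earlier examples. This is the standard association-rule formulation, not the paper's method, and the baskets are hypothetical.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain
    the consequent. Assumes the antecedent occurs in at least one basket."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

# Hypothetical supermarket transactions.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

rule_support = support(baskets, {"diapers", "beer"})
rule_confidence = confidence(baskets, {"diapers"}, {"beer"})
print(f"diapers -> beer: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```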

39 On Evaluation
Apparently I would like watching movies on gang violence in New York theaters.
–Why? Because…
Hamburger grills product recommendation
On evaluation
–Absolutely critical in a world in which more interactions are being structured automatically
–“Evaluation” has multiple aspects, not just how “accurate” a model may seem to be.

40 Agenda Examples What is data mining? The Industry comments Techniques