Data Mining Ketaki Borkar CS157A November 29, 2007.

Agenda 1. Definition 2. Overview 3. History 4. Evolution 5. Scope 6. Stages 7. Process 8. Relationships 9. Elements 10. Data Warehousing 11. Techniques 12. Examples 13. Advantages/Disadvantages 14. References

Definition “Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.”

Overview Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online.

History Data mining is the evolution of a field with a long history, but the term itself was introduced only relatively recently, in the 1990s. Statistics is the foundation of most of the technologies on which data mining is built. Data mining's roots can be traced back along three family lines:
• Classical statistics
• Artificial intelligence
• Machine learning
Data mining is finding increasing acceptance in areas of science and business that need to analyze large amounts of data to discover trends they could not otherwise find.

Classical Statistics Classical statistics embraces concepts such as regression analysis, standard distribution, standard deviation, standard variance, and cluster analysis, all of which are used to study data and data relationships. These are the building blocks that underpin more advanced statistical analyses. Classical statistical analysis plays a significant role at the heart of today’s data mining tools and techniques.

Artificial Intelligence (AI) AI is built upon heuristics (methods that often lead rapidly to a solution that is usually close to the best possible answer) rather than statistics, and attempts to apply human-thought-like processing to statistical problems. Because this approach requires vast computer processing power, it was not practical until the early 1980s, when computers began to offer useful power at reasonable prices. Certain AI concepts were adopted by some high-end commercial products, such as query optimization modules for Relational Database Management Systems (RDBMS).

Machine Learning Machine learning is the union of statistics and artificial intelligence. It is an evolution of artificial intelligence because it blends AI heuristics with advanced statistical analysis. Machine learning attempts to let computer programs learn about the data they study, so that programs make different decisions based on the qualities of the studied data, using statistics for fundamental concepts and adding more advanced AI heuristics and algorithms to achieve its goals.

Evolution of Data Mining
Each evolutionary step is listed with its business question, enabling technologies, product providers, and purpose.
Data Collection (1960s)
• Business question: "What was my total revenue in the last five years?"
• Enabling technologies: Computers, tapes, disks
• Product providers: IBM, CDC
• Purpose: Retrospective, static data delivery
Data Access (1980s)
• Business question: "What were unit sales in New England last March?"
• Enabling technologies: Relational databases (RDBMS), Structured Query Language (SQL), ODBC
• Product providers: Oracle, Sybase, Informix, IBM, Microsoft
• Purpose: Retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (1990s)
• Business question: "What were unit sales in New England last March? Drill down to Boston."
• Enabling technologies: On-line analytic processing (OLAP), multidimensional databases, data warehouses
• Product providers: Pilot, Comshare, Arbor, Cognos, Microstrategy
• Purpose: Retrospective, dynamic data delivery at multiple levels
Data Mining (Emerging Today)
• Business question: "What’s likely to happen to Boston unit sales next month? Why?"
• Enabling technologies: Advanced algorithms, multiprocessor computers, massive databases
• Product providers: Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry)
• Purpose: Prospective, proactive information delivery

Scope of Data Mining
Automated prediction of trends and behaviors. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings.
• EX: forecasting bankruptcy; identifying segments of a population likely to respond similarly to given events.
Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step.
• EX: analysis of retail sales data to identify seemingly unrelated products that are often purchased together (e.g., beer and diapers); detecting fraudulent credit card transactions; identifying anomalous data that could represent data entry keying errors.
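
The last example above, spotting anomalous values that may be keying errors, can be illustrated with a small sketch. The slides do not name a method; this one uses a median absolute deviation (MAD) rule as one simple choice. The function name `flag_outliers`, the threshold of 5, and the amounts are all hypothetical.

```python
from statistics import median

def flag_outliers(values, threshold=5.0):
    """Return values whose distance from the median exceeds threshold * MAD."""
    center = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # Assumption: when most values are identical, flag anything that differs.
        return [v for v in values if v != center]
    return [v for v in values if abs(v - center) / mad > threshold]

# Hypothetical transaction amounts; 2900 looks like 29.00 keyed without the decimal point.
amounts = [25, 30, 27, 31, 29, 2900, 28]
suspects = flag_outliers(amounts)
print(suspects)
```

A flagged value is only a candidate for review; whether it is a keying error or a genuine large purchase still has to be checked against the source record.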

Stages
Stage 1: Exploration
• Data preparation, cleaning, and transformation.
Stage 2: Model building and validation
• Considering various models and choosing the best one based on their performance.
Stage 3: Deployment
• Taking the model selected as best in Stage 2 and applying it to new data in order to generate predictions or estimates of the expected outcome.
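
The three stages can be sketched end to end on a toy problem. This is a minimal illustration, not a prescribed workflow: the data, the two candidate models (a constant average and a least-squares line), and the use of mean absolute error on a holdout set are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical raw data: (ad spend, unit sales); None marks a missing value.
raw = [(1, 12), (2, 19), (3, None), (4, 41), (5, 52), (6, 58), (None, 30), (8, 81)]

# Stage 1: exploration. Here that just means dropping incomplete records.
clean = [(x, y) for x, y in raw if x is not None and y is not None]
train, holdout = clean[:4], clean[4:]

def fit_mean(data):
    """Candidate model 1: always predict the average outcome."""
    avg = mean(y for _, y in data)
    return lambda x: avg

def fit_line(data):
    """Candidate model 2: least-squares straight line."""
    x_bar = mean(x for x, _ in data)
    y_bar = mean(y for _, y in data)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in data)
             / sum((x - x_bar) ** 2 for x, _ in data))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def mean_abs_error(model, data):
    return mean(abs(model(x) - y) for x, y in data)

# Stage 2: build each candidate on the training data, validate on held-out data.
candidates = {"mean": fit_mean(train), "line": fit_line(train)}
errors = {name: mean_abs_error(model, holdout) for name, model in candidates.items()}
best_name = min(errors, key=errors.get)

# Stage 3: deployment. Apply the selected model to a new, unseen input.
forecast = candidates[best_name](10)
print(best_name, errors, forecast)
```

Keeping the holdout records out of the fitting step is what makes the Stage 2 comparison meaningful: a model is judged on data it has not seen.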

Data Mining Process

Relationships
Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.
Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.
Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining.
Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
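
Associative mining of the beer-diaper kind can be illustrated by counting how often pairs of items share a basket. The sketch below computes two common measures, support (the share of baskets containing a pair) and confidence (how often one item appears given the other). The baskets and function names are hypothetical, and real association mining uses more scalable algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, made up for illustration.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread", "chips"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` appears in baskets containing `antecedent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

support = pair_support(baskets)
best_pair = max(support, key=support.get)
print(best_pair, support[best_pair], confidence(baskets, "beer", "diapers"))
```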

Elements
• Extract, transform, and load transaction data onto the data warehouse system.
• Store and manage the data in a multidimensional database system.
• Provide data access to business analysts and information technology professionals.
• Analyze the data with application software.
• Present the data in a useful format, such as a graph or table.
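
The elements above can be walked through in miniature. This is only a sketch: an in-memory SQLite table stands in for the warehouse (it is relational, not the multidimensional system the slide mentions), and the records, table name, and column names are hypothetical.

```python
import sqlite3

# Hypothetical raw transaction records as they might arrive from a source system.
raw_rows = [
    ("2007-11-01", "Boston", " 19.99"),
    ("2007-11-01", "boston", "5.00"),
    ("2007-11-02", "Hartford", "12.50"),
]

# Extract and transform: normalize city names and parse amounts into numbers.
rows = [(day, city.strip().title(), float(amount)) for day, city, amount in raw_rows]

# Load and store: an in-memory SQLite table stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, city TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Provide access and analyze: total sales per city via a query.
totals = conn.execute(
    "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
).fetchall()

# Present: print the result as a small table.
for city, total in totals:
    print(f"{city:<10} {total:>8.2f}")
```

The transform step matters here: without normalizing "boston" to "Boston", the two Boston sales would be reported as separate cities.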

Data Warehousing vs. Data Mining
Data Warehouse: “is a repository (or archive) of information gathered from multiple sources, stored under a unified schema, at a single site.” (Silberschatz)
• Collects data.
• Stores it in a single repository.
• Allows for easier query development, because a single repository can be queried.
Data Mining:
• Analyzing databases or data warehouses to discover patterns in the data and gain knowledge.
• Knowledge is power.

Data Mining Techniques
Clustering is the method by which like records are grouped together. Usually this is done to give the end user a high-level view of what is going on in the database. Clustering is sometimes used to mean segmentation, which most marketing people will tell you is useful for coming up with a bird's-eye view of the business.
• EX: 1) Clustering people with similar movie preferences. 2) Amazon.com displays “Customers who bought this book also bought…”
The nearest neighbor algorithm is a refinement of clustering. It performs prediction by looking up the prediction value of the records (near neighbors) most similar to the record to be predicted.
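
Nearest neighbor prediction can be sketched in a few lines: find the k stored records closest to the new record and take a majority vote over their known values. The records, the function name `knn_predict`, the choice of k = 3, and the use of unscaled Euclidean distance are all assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical labeled records: (age, movies watched per month) -> favorite genre.
records = [
    ((18, 12), "action"),
    ((22, 10), "action"),
    ((25, 9), "action"),
    ((45, 2), "drama"),
    ((52, 3), "drama"),
    ((60, 1), "drama"),
]

def knn_predict(records, query, k=3):
    """Predict a label by majority vote among the k records closest to `query`."""
    by_distance = sorted(records, key=lambda rec: math.dist(rec[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

prediction = knn_predict(records, (20, 11))
print(prediction)
```

In practice the features would usually be rescaled first, since a field with a large numeric range (age here) otherwise dominates the distance.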

Techniques…continued
Decision Tree: A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Specifically, each branch of the tree is a classification question, and the leaves of the tree are partitions of the dataset with their classification.
CART: Classification and Regression Trees.
CHAID: Chi-Square Automatic Interaction Detector.
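
The idea that a decision tree represents a Boolean function can be shown with a hand-built tree. This sketch only evaluates a fixed tree; it does not learn one from data as CART or CHAID would. The credit questions, thresholds, and field names are hypothetical.

```python
# A tree is either a bool leaf (the yes/no decision) or a tuple of
# (question, branch_if_yes, branch_if_no), where question maps a record to a bool.
# The questions and thresholds below are hypothetical.
credit_tree = (
    lambda a: a["income"] >= 40000,
    (lambda a: a["debt"] < 10000, True, False),
    (lambda a: a["years_employed"] >= 5, True, False),
)

def decide(tree, applicant):
    """Walk from the root to a leaf, answering one question per branch."""
    while not isinstance(tree, bool):
        question, yes_branch, no_branch = tree
        tree = yes_branch if question(applicant) else no_branch
    return tree

applicant = {"income": 50000, "debt": 5000, "years_employed": 1}
approved = decide(credit_tree, applicant)
print(approved)
```

Each path from the root to a `True` leaf is one conjunction of answers, so the whole tree is equivalent to a Boolean expression over the applicant's properties.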

Examples – Amazon.com

Credit Risk – Decision Tree

Advantages
• Historical data can be used to predict future trends.
• Knowledge about new trends can be used to improve products and services.
• Knowledge hidden in large volumes of data can be extracted.
• Data mining is used in developing models to predict outcomes of future situations.

Disadvantages
• Background checks
• Spam
• Privacy concerns: birthdates, SSNs, and personal information scrutinized for corporate gain.
• Telemarketing
• Surveillance and profiling
