By Dan Stalloch

Association – what could be linked together with something else
Patterns – sequential and time series; shows us how often certain things occur
Classification – shows us how data is grouped

Prediction – the detection of a stable occurrence within the data that may continue into the future
Identification – what can be found out by system usage or what might be present in a thing
Classification – how the data could be grouped
Optimization – finding ways to utilize resources

Apriori – finds frequent (large) item sets
Sampling – finds frequent item sets from a small sample of the database
Frequent-Pattern (FP) Tree and FP-Growth – an improvement on Apriori
Partition – an efficient way to apply the Apriori algorithm
Decision Tree Induction – constructs a decision tree from a training data set
k-Means – groups the data into clusters
And others
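As a rough illustration of the first algorithm in the list, here is a minimal Apriori sketch. It is not taken from the slides: the function name `apriori`, the use of an absolute count for `min_support`, and the grocery transactions are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for every frequent itemset.

    min_support is an absolute count here (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count the single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count how many transactions contain each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical market-basket data, invented for illustration only.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

The prune step is what gives the algorithm its name: an item set can only be frequent if all of its subsets are already known to be frequent.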

Marketing – analyzing customer behavior
Finance – keeping track of credit and fraud
Manufacturing – optimizing use of resources
Health Care – checking patterns for useful information

databases/auto-mpg/auto-mpg.data
This is a car database from a repository of databases that UCI makes available to everyone.
When mining a database, it is essential to ask what you would like to be able to predict from it. In this instance, we would like to know which cars have decent mpg.
We might also be able to predict which companies are likely to stay in business.

We must create or use a program that shows us either a 2-D contingency table or a 3-D contingency table.
ab.org/tutorials/dtree18.pdf
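A contingency table is just a count of how often each combination of attribute values occurs. The sketch below shows one possible way to build one in Python. The helper name `contingency_counts`, the attribute names, and the six car records are hypothetical and are not rows from the auto-mpg file.

```python
from collections import Counter


def contingency_counts(records, *attrs):
    """Count how often each combination of values of attrs occurs.

    Two attribute names give a 2-D table, three give a 3-D table.
    """
    return Counter(tuple(r[a] for a in attrs) for r in records)


# Hypothetical records, invented for illustration only.
cars = [
    {"cylinders": 4, "maker": "asia", "mpg": "good"},
    {"cylinders": 4, "maker": "asia", "mpg": "good"},
    {"cylinders": 4, "maker": "america", "mpg": "bad"},
    {"cylinders": 6, "maker": "america", "mpg": "bad"},
    {"cylinders": 8, "maker": "america", "mpg": "bad"},
    {"cylinders": 8, "maker": "america", "mpg": "bad"},
]

table_2d = contingency_counts(cars, "cylinders", "mpg")
table_3d = contingency_counts(cars, "cylinders", "maker", "mpg")

# Print the 2-D table with one row per cylinder count.
for cyl in sorted({c["cylinders"] for c in cars}):
    row = {mpg: table_2d[(cyl, mpg)] for mpg in ("good", "bad")}
    print(cyl, row)
```

A `Counter` returns 0 for combinations that never occur, so empty cells need no special handling.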

We use a formula to decide which attributes give the highest information gain about what we would like to know. The formula is
IG(Y|X) = H(Y) - H(Y|X)
where H(Y) is the entropy of Y and H(Y|X) is the conditional entropy of Y given X.
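The information gain formula can be worked through in a few lines of Python. This sketch assumes the standard Shannon definitions, H(Y) = -sum of p(y) * log2 p(y), and H(Y|X) as the weighted average of the entropy of Y within each value of X. The function names and the toy cylinder and mpg values are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy in bits of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(xs, ys):
    """H(Y|X): entropy of ys within each group of equal x, weighted by group size."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(xs, ys):
    """IG(Y|X) = H(Y) - H(Y|X)."""
    return entropy(ys) - conditional_entropy(xs, ys)


# Hypothetical values, invented for illustration only.
cylinders = [4, 4, 4, 6, 8, 8]
mpg = ["good", "good", "bad", "bad", "bad", "bad"]
gain = information_gain(cylinders, mpg)
print(round(gain, 3))
```

Computing this gain for every candidate attribute and picking the largest is how a decision tree learner such as ID3 chooses which attribute to split on.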

databases/auto-mpg/auto-mpg.data
Chapter 28 from Fundamentals of Database Systems, 6th Edition, by Elmasri and Navathe
Pictures from Andrew W. Moore's slides