Data Mining. Jim King.

Presentation transcript:

Data Mining Jim King

What is Data Mining?
- A.k.a. knowledge discovery
  - The search for previously unknown relationships in large data sets
- Why?
  - Improved technology allows vast quantities of data to be gathered
  - Those relationships can perhaps be used to guide future decisions and strategies

How do we Data Mine?
- Three considerations to be made
  - Classification
  - Association
  - Sequential

Classification
- Generate grouping rules
  - Future data can then be classified quickly
- Example: Disease classification based on symptoms may lead to better treatments

Association
- Two conditions occur together
  - Presumptive
  - Objective
- With some probability (confidence)
  - Cond1 => Cond2
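The confidence of Cond1 => Cond2 can be estimated from transaction data as the share of records meeting Cond1 that also meet Cond2. The sketch below is a minimal illustration, not something from the slides: the function names and the `baskets` data are made up for the example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, cond1, cond2):
    """Of the transactions meeting cond1, the share that also meet cond2."""
    base = support(transactions, cond1)
    if base == 0:
        return 0.0  # cond1 never occurs, so the rule has no evidence
    return support(transactions, set(cond1) | set(cond2)) / base


# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"cereal", "milk"},
    {"cereal", "milk", "eggs"},
    {"cereal", "eggs"},
    {"bread"},
]

# Three baskets contain cereal; two of those also contain milk.
print(confidence(baskets, {"cereal"}, {"milk"}))
```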

Sequential
- Event B follows Event A
- Ex. In e-commerce, what links do people follow?
  - After following links to a product, how often do they buy?

Classification Algorithms
- Hard clustering vs. Soft clustering
  - Collection of classes { C1, C2, ..., Cn }
  - Arbitrary object O
  - Soft clustering: Classes may overlap, so an object can belong to multiple classes
  - Hard clustering: Every object belongs to exactly one class. No overlap

Classification
- One way: Agglomerative
  - Every object starts as its own cluster
  - Find the two clusters with the least distance between them
  - Combine them into one cluster
  - Repeat; stop when only one cluster remains
  - Returns a hierarchy of the clustering
- Need to decide on some distance function
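The agglomerative steps above can be sketched as follows. This is an illustrative sketch under assumptions the slides do not state: clusters are compared by single linkage (the distance between their closest members), the data are 1-D numbers, and the function names are hypothetical.

```python
def agglomerate(points, distance):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [(p,) for p in points]  # every object starts as its own cluster
    history = []

    def cluster_distance(a, b):
        # Single linkage (an assumption): distance between the closest members.
        return min(distance(x, y) for x in a for y in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the least distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        history.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history


# Hypothetical 1-D data with absolute difference as the distance function.
merges = agglomerate([1.0, 1.5, 5.0, 5.2, 9.0], lambda a, b: abs(a - b))
for left, right in merges:
    print(left, "+", right)
```

The merge history read from first to last is the hierarchy: early merges join near neighbours, and the last merge joins everything into one cluster.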

Classification
- Another way: Division method
  - Everything starts in one cluster
  - Split into two clusters
  - Split each new cluster into two more clusters
  - Stop when the clusters can't be divided any more
- Requires more computational power, yet usually gives worse results
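A toy version of the division method is sketched below. The slides do not say how a cluster should be split or when to stop, so both choices here are assumptions made for illustration: 1-D values are split at their widest gap, and splitting stops once the widest gap falls below a hypothetical `min_gap` threshold.

```python
def divide(values, min_gap=1.0):
    """Recursively split 1-D values at their widest gap; return a nested tuple tree."""
    values = sorted(values)
    if len(values) < 2:
        return tuple(values)
    gaps = [values[k + 1] - values[k] for k in range(len(values) - 1)]
    widest = max(range(len(gaps)), key=gaps.__getitem__)
    if gaps[widest] < min_gap:
        return tuple(values)  # treated as "can't divide any more"
    left = values[: widest + 1]
    right = values[widest + 1 :]
    return (divide(left, min_gap), divide(right, min_gap))


# Same hypothetical data as a top-down split.
tree = divide([1.0, 1.5, 5.0, 5.2, 9.0])
print(tree)
```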

Association Algorithms
- Given constraints, minimize the criteria needed for a condition
- Bought cereal & eggs -> Bought milk
  - 80% confidence
- Bought cereal -> Bought milk
  - 90% confidence

Association
- Pruning conditions that fall below a minimum improvement yields simpler rules
- Other constraints:
  - Minimum confidence (e.g. 30% of records with A also include B)
  - Minimum support (e.g. 2% of records have both A and B)
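The three constraints can be applied as a filter over candidate rules, as in the sketch below. The slides do not define "improvement"; the code assumes it is a rule's confidence minus the best confidence of any rule with a proper subset of its conditions. The confidences and the 30% and 2% thresholds come from the slides, while the support values and the bread rule are invented for illustration.

```python
from itertools import combinations


def prune_rules(rules, min_confidence=0.30, min_support=0.02, min_improvement=0.0):
    """Filter rules given as {(frozenset(conditions), consequent): (support, confidence)}."""
    kept = {}
    for (conds, target), (sup, conf) in rules.items():
        if sup < min_support or conf < min_confidence:
            continue
        # Confidences of simpler rules: same consequent, proper subset of conditions.
        simpler = [
            rules[(frozenset(sub), target)][1]
            for size in range(1, len(conds))
            for sub in combinations(conds, size)
            if (frozenset(sub), target) in rules
        ]
        if simpler and conf - max(simpler) < min_improvement:
            continue  # the extra conditions do not improve on a simpler rule
        kept[(conds, target)] = (sup, conf)
    return kept


# Confidences from the slides; support values are hypothetical.
rules = {
    (frozenset({"cereal"}), "milk"): (0.10, 0.90),
    (frozenset({"cereal", "eggs"}), "milk"): (0.05, 0.80),
    (frozenset({"bread"}), "milk"): (0.01, 0.50),
}
kept = prune_rules(rules)
print(sorted((sorted(conds), target) for conds, target in kept))
```

Here the cereal & eggs rule is dropped because adding eggs lowers confidence from 90% to 80%, and the bread rule is dropped for falling below minimum support.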

Sequential Algorithms
- People buy basic camping equipment
- Later they buy other related items
- Starting with basic item sets, try to concatenate them and look for the resulting set in customer behavior

Sequential
- If the resulting item set is not supported (at all, or not above a threshold), drop it
- Sequences do not have to be contiguous
  - e.g. A customer buys A then B then C; the sequence A then C is valid
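The concatenate-and-check idea can be sketched as below. As a simplification, each step in a sequence is a single item rather than an item set, and the function names, purchase histories and threshold are all hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if pattern's items appear in sequence in order; gaps are allowed."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the match, preserving order.
    return all(item in remaining for item in pattern)


def sequence_support(pattern, histories):
    """Fraction of customer histories that contain the pattern."""
    return sum(is_subsequence(pattern, h) for h in histories) / len(histories)


def grow_patterns(frequent, items, histories, min_support):
    """Extend each frequent pattern by one item; drop unsupported results."""
    candidates = (p + (i,) for p in frequent for i in items)
    return [c for c in candidates if sequence_support(c, histories) >= min_support]


# Hypothetical purchase histories, one list per customer.
histories = [
    ["A", "B", "C"],
    ["A", "C"],
    ["B", "A"],
]

print(is_subsequence(("A", "C"), ["A", "B", "C"]))
print(grow_patterns([("A",)], ["A", "B", "C"], histories, min_support=0.5))
```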

Case Study - SchulWeb
- Search site for schools in Germany
- How to improve performance and user satisfaction?
- Use the log to track user navigation patterns (i.e. which URLs were requested, and in what order?)
- Extract information from these patterns
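One simple way to pull navigation patterns out of a request log is to count how often one URL is followed by another within a session. The sketch below only illustrates that idea: the log layout, session ids and URLs are hypothetical and do not reflect SchulWeb's actual logs.

```python
from collections import Counter, defaultdict


def transition_counts(log):
    """Count URL-to-URL steps; `log` holds (session_id, url) pairs in request order."""
    sessions = defaultdict(list)
    for session_id, url in log:
        sessions[session_id].append(url)
    counts = Counter()
    for urls in sessions.values():
        counts.update(zip(urls, urls[1:]))  # consecutive requests in one session
    return counts


# Hypothetical log entries, invented for illustration.
log = [
    ("s1", "/search"),
    ("s2", "/search"),
    ("s1", "/state"),
    ("s2", "/state"),
    ("s1", "/results"),
    ("s2", "/school-type"),
]
counts = transition_counts(log)
print(counts.most_common(1))
```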

Interpretations of Mining
- Users don’t like to type text
- Prefer to select from available choices
- What were they looking for?
  - Schools close to some region
  - Used option to specify a state (for location)
  - Used option to specify a school type (to limit search size)

Changes Made
- Made “Near Town” the default
  - Made the option obvious; people started to use it
  - Limited region size further, so shorter lists were produced
  - Shorter lists are less intimidating; more people found what they needed

Conclusions
- Data mining is a useful tool with multiple algorithms that can be tuned for specific tasks
- Can benefit business, medicine, science
- More efficient algorithms are needed to speed up the data mining process

Conclusions
- Making data mining easier to use
  - Data with rich descriptions (more fields)
  - More data/records
  - Controlled/reliable data collection (automated vs. manual)
  - A way to evaluate results
  - Integrate information gained back into the system

Final Questions? 