Clustering and Term Project

Clustering and Term Project: Plan for this week

Term Project: Questions?
Examples:
- Research problems in data mining
- Industry problems in data mining
- Explore new data with existing or new tools
- Explore data with a different process (tools, data selection, preprocessing)
- Focus on solving a problem (application or technical)

Data Exploration Process (Dorian Pyle): share of time (%) and importance (%) per stage

Stage                                        Time %   Importance %
Exploring the problem space                    10          15
Exploring the solution space                    9          14
Specifying the implementation method            1          51
  (increases profitability, reduces waste, decreases fraud, or meets goal X)
Mining the data:
  Preparing the data                           60          15
  Surveying the data                           15           3
  Modeling the data                             5           2

Ten Golden Rules for Miners (Dorian Pyle)
1. Select clearly defined problems that will yield tangible benefits.
2. Specify the required solution.
3. Define how the delivered solution is going to be used.
4. Understand as much as possible about the problem and the data set (the domain).
5. Let the problem drive the modeling (tools and data preparation for model building).

Ten Golden Rules for Miners (cont.)
6. Stipulate assumptions.
7. Refine the model iteratively.
8. Make the model as simple as possible.
9. Define instability in the model (critical areas where a small change in input produces a large change in output).
10. Define uncertainty in the model (areas of low confidence).

Selection of Research Paper for Review
Paper types: algorithm-centered, application-centered, or survey-centered.
Selection due Mar. 24.

Plan of the Week
Monday (Dunham's PowerPoint slides, Part II: Clustering, 74-128)
- Similarity and distance measures
- Hierarchical algorithms (single link…)
- Partition algorithms (K-Means, MST,…)
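The Monday topics can be tied together in one short sketch: a Euclidean distance measure, single-link (hierarchical) clustering, and K-Means (partitional) clustering. This is a minimal pure-Python illustration under stated assumptions, not code from Dunham's slides; the function names `euclidean`, `single_link`, and `k_means`, the toy points, and the seeded random initialization are hypothetical choices for the example. Single link is closely related to the MST approach listed above: cutting the k-1 longest edges of a minimum spanning tree yields the same k single-link clusters.

```python
import math
import random


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_link(points, k):
    """Agglomerative clustering with single-link (nearest-neighbour) merging.

    Starts with one cluster per point and repeatedly merges the two clusters
    whose closest members are nearest, until k clusters remain.
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(points[p], points[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


def k_means(points, k, iterations=100, seed=0):
    """Lloyd-style k-means: alternate assignment and centroid-update steps.

    Hypothetical choice: initial centroids are k points sampled with a
    seeded RNG so runs are repeatable.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, with two well-separated groups such as `[(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]`, both `single_link(points, 2)` and `k_means(points, 2)` put the first three points in one cluster and the last three in the other. This naive single-link loop compares every cross-cluster pair on every merge, so it is cubic in the number of points and only suited to small examples.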

Plan of the Week (cont.)
Wednesday (Witten's book, 218-224 (PDF 94-104); Dunham's book, 47-51)
- Statistically based clustering (EM algorithm)
- Case study: a data mining application using Cubist
- Term Project: directions and discussion
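Statistically based clustering with the EM algorithm can be illustrated with a one-dimensional Gaussian mixture. This is a minimal sketch under stated assumptions, not the version in Witten's or Dunham's book; the function names `normal_pdf` and `em_gmm_1d`, the caller-supplied starting means, the shared initial variance, and the variance floor `min_var` are hypothetical choices for the example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, means, iterations=50, min_var=1e-6):
    """EM for a 1-D Gaussian mixture, starting from the given component means.

    Assumption: every component starts with the overall data variance and
    equal mixing weight. Returns (weights, means, variances).
    """
    k = len(means)
    means = list(means)
    n = len(xs)
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility resp[i][j] = P(component j | x_i).
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            # Floor the variance so a component cannot collapse onto one point.
            variances[j] = max(min_var, var_j)
    return weights, means, variances
```

For instance, starting from means 0 and 6 on `[1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]`, the estimates settle near means 1.0 and 5.0 with equal weights. Unlike the hard assignment in K-Means, every point contributes fractionally to each component through its responsibility.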