Principles of Data Mining Pham Tho Hoan

Slides:



Advertisements
Similar presentations
Machine Learning and Data Mining Clustering
Advertisements

Visual Recognition Tutorial
Maximum likelihood Conditional distribution and likelihood Maximum likelihood estimations Information in the data and likelihood Observed and Fisher’s.
Maximum likelihood (ML) and likelihood ratio (LR) test
These slides are additional material for TIES4451 Lecture 5 TIES445 Data mining Nov-Dec 2007 Sami Äyrämö.
Visual Recognition Tutorial
Data Mining – Intro.
Review Rong Jin. Comparison of Different Classification Models  The goal of all classifiers Predicating class label y for an input x Estimate p(y|x)
Maximum likelihood (ML)
Chapter 2 – Simple Linear Regression - How. Here is a perfect scenario of what we want reality to look like for simple linear regression. Our two variables.
Machine Learning in Simulation-Based Analysis 1 Li-C. Wang, Malgorzata Marek-Sadowska University of California, Santa Barbara.
Crash Course on Machine Learning
Data Mining Techniques
Overview DM for Business Intelligence.
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
INTRODUCTION TO MACHINE LEARNING. $1,000,000 Machine Learning  Learn models from data  Three main types of learning :  Supervised learning  Unsupervised.
EM and expected complete log-likelihood Mixture of Experts
Bayesian networks Classification, segmentation, time series prediction and more. Website: Twitter:
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Deterministic vs. Random Maximum A Posteriori Maximum Likelihood Minimum.
Data Mining with Oracle using Classification and Clustering Algorithms Proposed and Presented by Nhamo Mdzingwa Supervisor: John Ebden.
Empirical Research Methods in Computer Science Lecture 7 November 30, 2005 Noah Smith.
Principles of Data Mining. Introduction: Topics 1. Introduction to Data Mining 2. Nature of Data Sets 3. Types of Structure Models and Patterns 4. Data.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
ECE 8443 – Pattern Recognition LECTURE 10: HETEROSCEDASTIC LINEAR DISCRIMINANT ANALYSIS AND INDEPENDENT COMPONENT ANALYSIS Objectives: Generalization of.
CLUSTER ANALYSIS Introduction to Clustering Major Clustering Methods.
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
Clustering Instructor: Max Welling ICS 178 Machine Learning & Data Mining.
Robust Estimation With Sampling and Approximate Pre-Aggregation Author: Christopher Jermaine Presented by: Bill Eberle.
Data Mining and Decision Support
Machine Learning 5. Parametric Methods.
Chapter 2-OPTIMIZATION G.Anuradha. Contents Derivative-based Optimization –Descent Methods –The Method of Steepest Descent –Classical Newton’s Method.
Data Mining Lectures Lecture 7: Regression Padhraic Smyth, UC Irvine ICS 278: Data Mining Lecture 7: Regression Algorithms Padhraic Smyth Department of.
Computacion Inteligente Least-Square Methods for System Identification.
Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.
Overfitting, Bias/Variance tradeoff. 2 Content of the presentation Bias and variance definitions Parameters that influence bias and variance Bias and.
ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Overview G. Jogesh Babu. R Programming environment Introduction to R programming language R is an integrated suite of software facilities for data manipulation,
Advanced Data Analytics
Lecture 1.31 Criteria for optimal reception of radio signals.
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Data Mining – Intro.
Data Mining ICCM
OPERATING SYSTEMS CS 3502 Fall 2017
Who am I? Work in Probabilistic Machine Learning Like to teach 
Chapter 7. Classification and Prediction
By Arijit Chatterjee Dr
DEEP LEARNING BOOK CHAPTER to CHAPTER 6
Deep Feedforward Networks
12. Principles of Parameter Estimation
DATA MINING © Prentice Hall.
LECTURE 11: Advanced Discriminant Analysis
Data Mining 101 with Scikit-Learn
10701 / Machine Learning.
Microsoft Office Illustrated
Machine Learning Basics
Data Mining Lecture 11.
Data Mining: Concepts and Techniques Course Outline
Collaborative Filtering Matrix Factorization Approach
Ying shen Sse, tongji university Sep. 2016
Chapter 8 Models and Decision Support
Biointelligence Laboratory, Seoul National University
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
The loss function, the normal equation,
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Analytics – Statistical Approaches
Mathematical Foundations of BME Reza Shadmehr
Ch 3. Linear Models for Regression (2/2) Pattern Recognition and Machine Learning, C. M. Bishop, Previously summarized by Yung-Kyun Noh Updated.
12. Principles of Parameter Estimation
Machine Learning and Data Mining Clustering
Presentation transcript:

Principles of Data Mining Pham Tho Hoan

References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2 nd Edition, [3] Christopher M. Bishop, Pattern Recognition and Machine Learning, 2006

Principles of Data Mining

What is data mining Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. The relationships and summaries derived through a data mining exercise are often referred to as models or patterns. Examples include linear equations, rules, clusters, graphs, tree structures, and recurrent patterns in time series. Observational data ≠ experimental data, convenience (opportunity) samples ≠ random samples, huge data ≠ small data, data mining ≠ statistics Novelty >< triviality, novelty must be measured relative to the user's prior knowledge Simple relationships are more readily understood than complicated ones, and may well be preferred, but simple ones may not be useful.

Data mining and Knowledge Discovery in Data

Types of data sets n × p data matrix {real number, category, missing, noise} Text, sequence, structure, pictures Transactions Etc. Lost information

Model and pattern structures A model structure, as defined here, is a global summary of a data set; it makes statements about any point in the full measurement space. (Y = aX + c ) Pattern structures make statements only about restricted regions of the space spanned by the variables. An example is a simple probabilistic statement of the form: if X > x1 then prob(Y > y1) = p1; or p(Y > y1|X > x1) = p1. This structure consists of constraints on the values of the variables X and Y, related in the form of a probabilistic rule Once we have established the structural form we are interested in finding, the next step is to estimate its parameters from the available data. We refer to a particular model, such as y = 3:2x + 2:8, as a "fitted model," or just "model" for short (and similarly for patterns).

Data mining tasks Exploratory Data Analysis (EDA) the goal is simply to explore the data without any clear ideas of what we are looking for. Typically, EDA techniques are interactive and visual, and there are many effective graphical display methods for relatively small, low-dimensional data sets. Descriptive Modeling The goal of a descriptive model is describe all of the data (or the process generating the data). Examples of such descriptions include models for the overall probability distribution of the data (density estimation), partitioning of the p-dimensional space into groups (cluster analysis and segmentation), and models describing the relationship between variables (dependency modeling). Predictive Modeling: Classification and Regression The aim here is to build a model that will permit the value of one variable to be predicted from the known values of other variables. Discovering Patterns and Rules Retrieval by Content

Components of data mining algorithms 1.Model or Pattern Structure: determining the underlying structure or functional forms that we seek from the data 2.Score Function: judging the quality of a fitted model 3.Optimization and Search Method: optimizing the score function and searching over different model and pattern structures. 4. Data Management Strategy: handling data access efficiently during the search/optimization

Score functions Without some form of score function, we cannot tell whether one model is better than another or, indeed, how to choose a good set of values for the parameters of the model. Several score functions are widely used for this purpose; these include likelihood, sum of squared errors, and misclassification rate (the latter is used in supervised classification problems). Penalize model complexity: score(model) = error(model) + penaltyFunction(model),

Optimization and Search Methods The goal of optimization and search is to determine the structure and the parameter values that achieve a minimum (or maximum, depending on the context) value of the score function. Methods: Greedy Search Algorithm, Systematic Search and Search Heuristics, Branch-and-Bound, Gradient-Based Methods for Optimizing Smooth Functions, Univariate Parameter Optimization, Multivariate Parameter Optimization, Constrained Optimization, etc.

Data Management Strategy The ways in which the data are stored, indexed, and accessed.

An example Problem: – Input: a dataset of credit-card spending {(x i, y i ), i=1,.., n}; – Output: a model which would allow us to predict a person's annual credit-card spending given their annual income. One solution: the model would not be perfect, but since spending typically increases with income, the model might well be adequate as a rough characterization. – Model structure: variable spending (f) is linearly related to the variable income (x): f(x) = ax +b – The score function: The smaller this sum is, the better the model fits the data. – The optimization algorithm (to find a, b) is quite simple in the case of linear regression: a and b can be expressed as explicit functions of the observed values of spending and income.

Some questions Cùng bài toán trên (mô hình hóa quan hệ giữa x và y), xem xét 3 mô hình sau, anh chị thích mô hình nào? M1: y=(y 1 + … + y n )/n với mọi x M2: y=ax + b (với a, b tìm được như trong slide trước) M3: if (x=x 1 ) then y=y 1 else if (x=x 2 ) then y=y 2 else … if (x=x n ) then y=y n else y=random-value (default) Mô hình nào phức tạp nhất? Mô hình nào phù hợp với dữ liệu huấn luyện nhất? Mô hình nào có khả năng dự đoán tốt nhất?