Theoretic Frameworks for Data Mining Reporter: Qi Liu.

What should a framework for data mining look like?
– Encompass all or most typical data mining tasks
– Have a probabilistic nature
– Be able to talk about inductive generalizations
– Deal with different types of data
– Recognize the data mining process as an iterative and interactive process
– Account for background knowledge in deciding what is an interesting discovery

Statistics Framework
Statistics viewpoint:
– Volume of data
– Computational feasibility
– Database integration
– Simplicity of use
– Understandability of results

Machine Learning Framework
– Data mining is applied machine learning
– Machine learning focuses on prediction, based on known properties learned from the training data
– Data mining focuses on the discovery of (previously) unknown properties in the data
– In a typical data mining task, supervised methods cannot be used due to the unavailability of training data

Probabilistic Framework
Goal: find the underlying joint distribution (e.g., a Bayesian network) of the variables in the data.
Advantages:
– Solid theoretical background
– Clustering and classification fit easily into this framework
Drawback:
– Cannot take the iterative and interactive nature of the data mining process into account
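
To make the probabilistic view concrete, here is a minimal sketch, assuming two binary variables A and B and a hypothetical two-node Bayesian network A → B. The joint distribution is estimated by counting, using the factorization P(A, B) = P(A) P(B | A). The data rows and the helper names fit_two_node_bn and joint are invented for illustration; learning a real Bayesian network also involves searching over network structures.

```python
from collections import Counter

def fit_two_node_bn(rows):
    """Estimate P(A) and P(B | A) from (a, b) pairs by counting (maximum likelihood)."""
    n = len(rows)
    count_a = Counter(a for a, _ in rows)
    count_ab = Counter(rows)
    p_a = {a: count_a[a] / n for a in count_a}
    p_b_given_a = {(a, b): count_ab[(a, b)] / count_a[a] for (a, b) in count_ab}
    return p_a, p_b_given_a

def joint(p_a, p_b_given_a, a, b):
    """Joint probability via the network factorization P(A=a, B=b) = P(A=a) * P(B=b | A=a)."""
    return p_a.get(a, 0.0) * p_b_given_a.get((a, b), 0.0)

# Hypothetical observations of (A, B).
rows = [(1, 1), (1, 1), (1, 0), (0, 0)]
p_a, p_b_given_a = fit_two_node_bn(rows)
p_11 = joint(p_a, p_b_given_a, 1, 1)
```

Here P(A=1) = 3/4 and P(B=1 | A=1) = 2/3, so the estimated P(A=1, B=1) is 1/2, and the four joint probabilities sum to one.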

Data Compression Framework
Goal: compress the data set by finding some structure in it and then encoding the data using few bits.
– Minimum description length (MDL) principle
– Instances: association rules, decision trees, clustering
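
The MDL principle can be illustrated with a two-part code: the total description length is L(M) + L(D | M), the bits for the model plus the bits for the data encoded with the model's help. The sketch below compares, for a binary sequence, a fixed uniform model against a fitted Bernoulli model. Assumptions not taken from the slides: the sequence is treated as i.i.d. binary, and the fitted parameter is charged about 0.5 · log2(n) bits, a commonly used asymptotic approximation. The function names are hypothetical.

```python
import math

def bernoulli_code_length(bits, p):
    """Bits needed to encode the sequence under a Bernoulli(p) model: sum of -log2 P(x)."""
    ones = sum(bits)
    zeros = len(bits) - ones
    total = 0.0
    if ones:
        total += -ones * math.log2(p)
    if zeros:
        total += -zeros * math.log2(1.0 - p)
    return total

def two_part_lengths(bits):
    """Return (L_uniform, L_fitted), the total description lengths in bits."""
    n = len(bits)
    # Model 1: fixed p = 0.5; nothing to transmit about the model, so L(M) = 0.
    l_uniform = bernoulli_code_length(bits, 0.5)
    # Model 2: p estimated from the data; L(M) approximated as 0.5 * log2(n) bits
    # for one real-valued parameter (an assumed approximation).
    p_hat = sum(bits) / n
    l_fitted = 0.5 * math.log2(n) + bernoulli_code_length(bits, p_hat)
    return l_uniform, l_fitted

skewed = [1] * 90 + [0] * 10
balanced = [0, 1] * 50
l_uniform, l_fitted = two_part_lengths(skewed)
```

For the skewed sequence the fitted model wins (about 50 bits versus 100), because the structure "mostly ones" pays for its own description; for the balanced sequence the extra parameter cost makes the uniform code preferable.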

Microeconomic Framework
– Goal: find actionable patterns that increase utility
– Define the utility function from the perspective of customers
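
A toy sketch of the utility-maximization view, assuming (hypothetically) that the enterprise's decision is a single price and that total utility is the sum of per-customer utilities. The customers, candidate prices, and the revenue function are invented; the point is only that a decision, and hence the patterns that inform it, is judged by the utility it produces rather than by a purely statistical criterion.

```python
def total_utility(decision, customers, utility):
    """Utility of a decision, summed over all customers."""
    return sum(utility(decision, c) for c in customers)

def best_decision(decisions, customers, utility):
    """Pick the candidate decision with the highest total utility."""
    return max(decisions, key=lambda d: total_utility(d, customers, utility))

# Hypothetical example: each customer is described only by a willingness to pay,
# and contributes the price as revenue if the price does not exceed it.
def revenue(price, willingness_to_pay):
    return price if price <= willingness_to_pay else 0.0

customers = [3.0, 5.0, 8.0, 10.0]
prices = [3.0, 5.0, 8.0, 10.0]
chosen = best_decision(prices, customers, revenue)
```

With these numbers the price 8.0 wins: two customers buy, for a total of 16.0, versus 12.0, 15.0, and 10.0 for the other prices.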

Inductive Database Framework
– Store both data and patterns
– An inductive database I(D, P) consists of a data component D and a pattern component P. We assume that both the data and the pattern components D and P are sets of sets. This assumption is motivated by an analogy with traditional relational databases.
– PS: deductive database: partial rules
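
A minimal sketch of an inductive database holding both components, assuming transactions over string items and frequent itemsets as the pattern type. The class InductiveDB and its methods are hypothetical. One deviation from the slide: D is kept as a list of sets rather than a set of sets, so that duplicate transactions still count toward support.

```python
from itertools import combinations

class InductiveDB:
    """Toy inductive database I(D, P) (hypothetical, for illustration only).

    D: data component, a list of transactions, each a frozenset of items.
    P: pattern component, a set of itemsets (frozensets) stored alongside D.
    """

    def __init__(self, transactions):
        self.D = [frozenset(t) for t in transactions]
        self.P = set()

    def support(self, itemset):
        """Number of transactions in D that contain every item of the itemset."""
        return sum(1 for t in self.D if itemset <= t)

    def mine_frequent(self, min_support, max_size=2):
        """Brute-force enumeration of itemsets up to max_size; frequent ones go into P."""
        items = sorted(set().union(*self.D))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                itemset = frozenset(combo)
                if self.support(itemset) >= min_support:
                    self.P.add(itemset)
        return self.P

    def query_patterns(self, required_items):
        """Query the pattern component: stored patterns containing all required items."""
        required = frozenset(required_items)
        return {p for p in self.P if required <= p}

db = InductiveDB([{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}])
db.mine_frequent(min_support=2)
```

After mining with a minimum support of 2, P holds {a}, {b}, {c}, {a, b}, and {a, c}, and querying for patterns containing a returns {a}, {a, b}, and {a, c}. Keeping P next to D lets patterns be queried much like data.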

Information Theoretic Framework
– Data mining is a process of information transmission from an algorithm to the data miner.
– Model the data miner’s state of mind as a probability distribution, called the background distribution, which represents her uncertainty and misconceptions about the data.
– In the data mining process, properties of the data (referred to as patterns) are revealed.

Attention!
– Focus on the data miner as much as on the data.
– An interesting pattern should be defined subjectively, rather than objectively.
– The primary concern is understanding the data itself, rather than the stochastic source that generated it.

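Bird’s eye view of the information theoretic (IT) framework
– A data miner is able to formalize her beliefs in a background distribution, denoted P*
– Kraft’s inequality holds with equality, so every distribution corresponds to a complete code and vice versa
– Code length of x under a probability distribution P: -log(P(x))
– The entropy of P* could be small due to the data miner being overly confident
– When a pattern is revealed, update P* to a new background distribution P*’
– Measure the reduction of code length: -log(P*(x)) - (-log(P*’(x))) = log(P*’(x) / P*(x))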
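
A small worked example of the code-length bookkeeping, in bits (log base 2). The slides do not fix how P*’ is obtained; this sketch assumes the simplest rule, conditioning P* on the revealed pattern "x lies in Ω". The toy distribution over eight possible data sets and the helper names are hypothetical.

```python
import math

def code_length(dist, x):
    """Shannon code length of outcome x in bits: -log2 P(x).
    Assumes a complete code (Kraft's inequality holds with equality)."""
    return -math.log2(dist[x])

def condition_on_pattern(dist, omega):
    """Assumed update rule: P*' is P* conditioned on the revealed pattern x in omega."""
    mass = sum(dist[x] for x in omega)
    return {x: (dist[x] / mass if x in omega else 0.0) for x in dist}

# Toy setting: the data is one of 8 possible data sets, all equally expected.
p_star = {x: 1 / 8 for x in range(8)}
omega = {4, 5, 6, 7}   # revealed pattern: "the data lies in omega"
x_true = 5             # the actual data, which satisfies the pattern
p_star_new = condition_on_pattern(p_star, omega)

reduction = code_length(p_star, x_true) - code_length(p_star_new, x_true)
```

Under conditioning, P*’(x) = P*(x) / P*(Ω) for x in Ω, so the reduction log(P*’(x) / P*(x)) equals -log(P*(Ω)). Here it is 3 - 2 = 1 bit, and the less probable the data miner considered the pattern, the larger the reduction.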

Trade-off
– Good data mining algorithms are those that are able to pinpoint patterns that lead to a large information gain.
– It is the trade-off between the information gain due to revealing a pattern in the data and the description length of the pattern that should define the pattern’s interest to the data miner.
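
One way to turn this trade-off into a ranking score, assumed here rather than taken from the slides, is to divide a pattern's information gain by its description length. The Pattern type, the scoring function, and the numbers are hypothetical.

```python
from typing import NamedTuple

class Pattern(NamedTuple):
    name: str
    info_gain_bits: float     # reduction in code length when the pattern is revealed
    description_bits: float   # cost of communicating the pattern itself

def interest(p: Pattern) -> float:
    """Assumed trade-off score: information gained per bit of description."""
    return p.info_gain_bits / p.description_bits

def rank_patterns(patterns):
    """Sort patterns from most to least interesting under the assumed score."""
    return sorted(patterns, key=interest, reverse=True)

candidates = [
    Pattern("A", 10.0, 5.0),
    Pattern("B", 12.0, 8.0),
    Pattern("C", 3.0, 1.0),
]
ranked = [p.name for p in rank_patterns(candidates)]
```

Pattern B has the largest raw information gain but ranks last, because it is also the most expensive to describe; C, which is cheap to state, ranks first.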

How to determine P* and P*’?

Patterns

More issues about the framework
– The cost of a pattern should be specified in advance by the data miner.
– Joint patterns
Cases:
– Clustering and alternative clustering
– Dimensionality reduction (PCA)
– Frequent pattern mining
– Community detection
– Subgroup discovery and supervised learning