Modeling Data the different views on Data Mining.

Slides:



Advertisements
Similar presentations
Sampling Design, Spatial Allocation, and Proposed Analyses Don Stevens Department of Statistics Oregon State University.
Advertisements

The Software Infrastructure for Electronic Commerce Databases and Data Mining Lecture 4: An Introduction To Data Mining (II) Johannes Gehrke
Quantitative Methods Interactions - getting more complex.
Learning deformable models Yali Amit, University of Chicago Alain Trouvé, CMLA Cachan.
Ch2 Data Preprocessing part3 Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009.
Pattern Analysis Prof. Bennett Math Model of Learning and Discovery 2/14/05 Based on Chapter 1 of Shawe-Taylor and Cristianini.
BOAT - Optimistic Decision Tree Construction Gehrke, J. Ganti V., Ramakrishnan R., Loh, W.
Generating Novel Reflectance Functions Adam Brady.
Correlation and Autocorrelation
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
Logistic Regression Rong Jin. Logistic Regression Model  In Gaussian generative model:  Generalize the ratio to a linear model Parameters: w and c.
Testing Bridge Lengths The Gadsden Group. Goals and Objectives Collect and express data in the form of tables and graphs Look for patterns to make predictions.
General Mining Issues a.j.m.m. (ton) weijters Overfitting Noise and Overfitting Quality of mined models (some figures are based on the ML-introduction.
Statistical Methods For Engineers ChE 477 (UO Lab) Larry Baxter & Stan Harding Brigham Young University.
Design Patterns Discussion of pages: xi-11 Sections: Preface, Forward, Chapter
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
Introduction The large amount of traffic nowadays in Internet comes from social video streams. Internet Service Providers can significantly enhance local.
Then/Now You recognized arithmetic sequences and related them to linear functions. (Lesson 3–5) Write an equation for a proportional relationship. Write.
Spatial Statistics and Spatial Knowledge Discovery First law of geography [Tobler]: Everything is related to everything, but nearby things are more related.
5.7 SCATTER PLOTS AND TREND LINES:
Hybrid Intelligent Systems for Network Security Lane Thames Georgia Institute of Technology Savannah, GA
A General Framework for Tracking Multiple People from a Moving Camera
Introduction to machine learning and data mining 1 iCSC2014, Juan López González, University of Oviedo Introduction to machine learning Juan López González.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
Objectives Compare linear, quadratic, and exponential models.
2. Write an exponential decay function to model this situation.
A Short Overview of Microarrays Tex Thompson Spring 2005.
Overview of CCSS Statistics and Probability Math Alliance September 2011.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.
AN INTELLIGENT AGENT is a software entity that senses its environment and then carries out some operations on behalf of a user, with a certain degree of.
Tell Me What You See and I will Show You Where It Is Jia Xu 1 Alexander G. Schwing 2 Raquel Urtasun 2,3 1 University of Wisconsin-Madison 2 University.
Chapter 5: The Periodic Table Video. Section 1: Organizing the Elements Video 2.
Multiple Regression. Simple Regression in detail Y i = β o + β 1 x i + ε i Where Y => Dependent variable X => Independent variable β o => Model parameter.
Theoretic Frameworks for Data Mining Reporter: Qi Liu.
MODEL-BASED SOFTWARE ARCHITECTURES.  Models of software are used in an increasing number of projects to handle the complexity of application domains.
Foundations for Functions Chapter Exploring Functions Terms you need to know – Transformation, Translation, Reflection, Stretch, and Compression.
BILCO - 1 BILCO 2.0 for Windows In = Out......a simple starting point !
(c) Adaptive Processes Consulting Be with the Best!
Data Mining and Decision Support
Modeling Data the different views on Data Mining.
Tracking with dynamics
Computer Vision Exercise Session 8 – Condensation Tracker.
Show Me the Money! Deriving the Pricing Power of Product Features by Mining Consumer Reviews Nikolay Archak, Anindya Ghose, and Panagiotis G. Ipeirotis.
METHODS OF SPATIAL ECONOMIC ANALYSIS LECTURE 06 Δρ. Μαρί-Νοέλ Ντυκέν, Αναπληρώτρια Καθηγήτρια, Τηλ Γραφείο Γ.6 UNIVERSITY.
Supervised Machine Learning: Classification Techniques Chaleece Sandberg Chris Bradley Kyle Walsh.
Sporadic model building for efficiency enhancement of the hierarchical BOA Genetic Programming and Evolvable Machines (2008) 9: Martin Pelikan, Kumara.
Linear Models (II) Rong Jin. Recap  Classification problems Inputs x  output y y is from a discrete set Example: height 1.8m  male/female?  Statistical.
WHAT IS DATA MINING?  The process of automatically extracting useful information from large amounts of data.  Uses traditional data analysis techniques.
Data Mining: Neural Network Applications by Louise Francis CAS Convention, Nov 13, 2001 Francis Analytics and Actuarial Data Mining, Inc.
Machine learning & object recognition Cordelia Schmid Jakob Verbeek.
Introduction to Machine Learning, its potential usage in network area,
Research Methods in I/O Psychology
Machine Learning for Computer Security
Data mining and statistical learning, lecture 1b
“The Art of Forecasting”
Data Mining 101 with Scikit-Learn
Scientific Method & Measurements
Do Now Can you Reason abstractly?
Regression Analysis PhD Course.
Regression-Based Prediction for Artifacts in JPEG-Compressed Images
الفصل الثامن خطة العمل أ/ سلطانة العطاوي....
Xaq Pitkow, Dora E. Angelaki  Neuron 
Proportional and Non-Proportional Relationships
Unit 2: Ecology Lesson #3: Sampling Techniques
Learning Targets Students will be able to: Compare linear, quadratic, and exponential models and given a set of data, decide which type of function models.
The EM Algorithm With Applications To Image Epitome
Objectives Compare linear, quadratic, and exponential models.
Parallel Programming in C with MPI and OpenMP
Pattern Analysis Prof. Bennett
Presentation transcript:

Modeling Data the different views on Data Mining

Views on Data Mining  Fitting the data  Density Estimation  Learning  being able to perform a task more accurately than before  Prediction  use the data to predict future data  Compressing the data  capture the essence of the data  discard the noise and details

Views on Data Mining  Fitting the data  Density Estimation  Learning  being able to perform a task more accurately than before  Prediction  use the data to predict future data  Compressing the data  capture the essence of the data  discard the noise and details

Data fitting  Very old concept  Capture function between variables  Often  few variables  simple models  Functions  step-functions  linear  quadratic  Trade-off between complexity of model and fit (generalization)

response to new drug body weight

response to new drug body weight

money spent income ¾ ratio

Kleiber’s Law of Metabolic Rate

Views on Data Mining  Fitting the data  Density Estimation  Learning  being able to perform a task more accurately than before  Prediction  use the data to predict future data  Compressing the data  capture the essence of the data  discard the noise and details

Density Estimation  Dataset describes a sample from a distribution  Describe distribution is simple terms prototypes

Density Estimation  Other methods also take into account the spatial relationships between prototypes  Self-Organizing Map (SOM)

Views on Data Mining  Fitting the data  Density Estimation  Learning  being able to perform a task more accurately than before  Prediction  use the data to predict future data  Compressing the data  capture the essence of the data  discard the noise and details

Learning  Perform a task more accurately than before  Learn to perform a task (at all)  Suggests an interaction between model and domain  perform some action in domain  observe performance  update model to reflect desirability of action  Often includes some form of experimentation  Not so common in Data Mining  often static data (warehouse), observational data

Views on Data Mining  Fitting the data  Density Estimation  Learning  being able to perform a task more accurately than before  Prediction  use the data to predict future data  Compressing the data  capture the essence of the data  discard the noise and details

Prediction: learning a decision boundary

Views on Data Mining  Fitting the data  Density Estimation  Learning  being able to perform a task more accurately than before  Prediction  use the data to predict future data  Compressing the data  capture the essence of the data  discard the noise and details

Compression  Compression is possible when data contains structure (repeting patterns)  Compression algorithms will discover structure and replace that by short code  Code table forms interesting set of patterns ABCDEF …1… 1…1… 1…1… 1…1… 0…0… 1…1… Pattern ACD appears frequently ACD helps to compress the data ACD is a relevant pattern to report

Compression Paul Vitanyi (CWI, Amsterdam)  Software to unzip identity of unknown composers  Beethoven, Miles Davis, Jimmy Hendrix  SARS virus similarity  internet worms, viruses  intruder attack traffic  images, video, …

Mobile calls: modeling duration of calls

More data: linear model

Even more data: still linear?

Hmmm