Modeling Data: the different views on Data Mining


Views on Data Mining
- Fitting the data
- Density Estimation
- Learning
  - being able to perform a task more accurately than before
- Prediction
  - use the data to predict future data
- Compressing the data
  - capture the essence of the data
  - discard the noise and details


Data fitting
- Very old concept
- Capture function between variables
- Often
  - few variables
  - simple models
- Functions
  - step-functions
  - linear
  - quadratic
- Trade-off between complexity of model and quality (generalization)

[Figure: response to new drug plotted against body weight]

Kleiber's Law of Metabolic Rate
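Kleiber's law states that metabolic rate scales roughly as the 3/4 power of body mass, so it is a natural example of fitting a simple function to data. The sketch below fits a power law y = a * x**b by ordinary least squares on the logarithms of both variables. The data are synthetic and generated from an assumed exact power law; the prefactor 3.4, the mass values, and the function name `fit_power_law` are illustrative choices, not measurements or anything taken from the slides.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                        # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log-log space = log(a)
    return a, b

# Synthetic data (assumption): body masses and rates from an exact 3/4 power law.
masses = [0.02, 0.3, 4.0, 70.0, 600.0, 4000.0]
rates = [3.4 * m ** 0.75 for m in masses]

a, b = fit_power_law(masses, rates)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Because the model has only two parameters, it sits at the simple end of the complexity/quality trade-off mentioned on the data fitting slide; with real, noisy measurements the recovered exponent would only approximate 0.75.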


Density Estimation
- Dataset describes a sample from a distribution
- Describe the distribution in simple terms

[Figure label: prototypes]

Density Estimation
- Data is just a sample from an underlying distribution
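One way to make the density estimation view concrete is a kernel density estimate, which treats every data point as a small Gaussian bump and averages them. This is a minimal sketch, not the method used in the slides: the two-cluster sample, the bandwidth of 0.5, and the name `gaussian_kde` are assumptions chosen for illustration.

```python
import math
import random

def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density underlying `sample`."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian kernels centred on each observed point.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
        )

    return density

# Synthetic sample (assumption): two clusters centred at 0 and 5.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
sample += [rng.gauss(5.0, 1.0) for _ in range(500)]

density = gaussian_kde(sample, bandwidth=0.5)
for x in (0.0, 2.5, 5.0):
    print(f"estimated density at {x}: {density(x):.4f}")
```

The estimate is high near the two cluster centres and low between them, which is the sense in which the sample is summarised by a distribution rather than by its individual points.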


(Machine) Learning
- Perform a task more accurately than before
- Learn to perform a task (at all)
- Suggests an interaction between model and domain
  - perform some action in domain
  - observe performance
  - update model to reflect desirability of action
- Often includes some form of experimentation
- Not so common in Data Mining
  - often static data (warehouse), observational data
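The act, observe, update cycle listed above can be sketched with an epsilon-greedy multi-armed bandit, a standard toy setting for learning by interaction. This is an illustration only: the slides do not name a specific algorithm, and the success probabilities, step count, and `epsilon` value are assumed.

```python
import random

def epsilon_greedy(success_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off best by trying actions and updating estimates."""
    rng = random.Random(seed)
    n_actions = len(success_probs)
    counts = [0] * n_actions
    values = [0.0] * n_actions  # the model: estimated reward of each action

    for _ in range(steps):
        # Perform some action in the domain; a random one with probability
        # epsilon (experimentation), otherwise the currently best-looking one.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=values.__getitem__)

        # Observe performance: a reward of 1 or 0 from the simulated domain.
        reward = 1.0 if rng.random() < success_probs[action] else 0.0

        # Update the model to reflect the desirability of the action
        # (incremental running mean of observed rewards).
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts

values, counts = epsilon_greedy([0.2, 0.5, 0.8])
print("estimated rewards:", [round(v, 2) for v in values])
print("times each action was tried:", counts)
```

The loop needs a domain it can act on, which is why this style of learning is less common with the static, observational data typical of data mining.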


Prediction: learning a decision boundary
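A minimal sketch of learning a decision boundary is the perceptron, which adjusts a line w1*x1 + w2*x2 + b = 0 until it separates two labelled classes. The perceptron is used here only as a simple stand-in; the slide does not say which classifier it has in mind, and the two synthetic point clouds are assumptions.

```python
import random

def train_perceptron(points, labels, epochs=100):
    """Learn weights (w, b) of a linear boundary; labels must be +1 or -1."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            # A point is misclassified if it lies on the wrong side of the line.
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b

def predict(w, b, point):
    """Classify a new point by the side of the boundary it falls on."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1

# Synthetic training data (assumption): two well-separated clouds.
rng = random.Random(0)
points, labels = [], []
for _ in range(100):
    points.append((rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)))
    labels.append(1)
    points.append((rng.gauss(-2.0, 0.5), rng.gauss(-2.0, 0.5)))
    labels.append(-1)

w, b = train_perceptron(points, labels)
print("boundary weights:", w, "bias:", b)
print("prediction for (3, 3):", predict(w, b, (3.0, 3.0)))
```

Once the boundary is learned, predicting future data amounts to checking on which side of it a new point falls.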


Compression
- Compression is possible when data contains structure (repeating patterns)
- Compression algorithms will discover structure and replace it by a short code
- The code table forms an interesting set of patterns

[Table: binary dataset over items A to F; cell values not recoverable from the transcript]

- Pattern ACD appears frequently
- ACD helps to compress the data
- ACD is a relevant pattern to report
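The role of a code table can be sketched with a toy example: each transaction is covered by patterns from the table, and the encoded size is taken to be the number of codes used. This is a deliberate simplification (it counts codes rather than bits and ignores the cost of the table itself), and the transactions below are hypothetical, since the original table on the slide could not be recovered.

```python
def cover(transaction, code_table):
    """Greedily cover a transaction with patterns, in code-table order."""
    remaining = set(transaction)
    codes = []
    for pattern in code_table:
        if pattern <= remaining:   # pattern fits entirely in what is left
            codes.append(pattern)
            remaining -= pattern
    return codes

def encoded_length(transactions, code_table):
    """Total number of codes needed to write down all transactions."""
    return sum(len(cover(t, code_table)) for t in transactions)

# Hypothetical dataset over items A to F in which A, C and D often co-occur.
transactions = [
    set("ACD"), set("ACDF"), set("ABCD"), set("BE"), set("ACDE"), set("F"),
]
singletons = [frozenset(item) for item in "ABCDEF"]

baseline = encoded_length(transactions, singletons)
with_acd = encoded_length(transactions, [frozenset("ACD")] + singletons)
print("codes needed with single items only:", baseline)
print("codes needed with pattern ACD added:", with_acd)
```

Adding ACD to the code table shortens the encoding because the pattern occurs in most transactions, which is the sense in which a pattern that helps compression is a relevant pattern to report.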

Compression
Paul Vitanyi (CWI, Amsterdam)
- Software to unzip identity of unknown composers
  - Beethoven, Miles Davis, Jimi Hendrix
- SARS virus similarity
  - internet worms, viruses
  - intruder attack traffic
- images, video, …
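Applications like these are usually associated with the normalized compression distance (NCD) of Cilibrasi and Vitanyi, which scores two objects as similar when compressing them together costs little more than compressing one alone. The slide does not name the method, so treating NCD as the technique behind it is an assumption. The sketch uses `zlib` as the compressor and synthetic byte strings as stand-ins for real music or genome data.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed representation of `data`."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small for similar inputs."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def random_text(seed: int, length: int = 2000) -> bytes:
    """Synthetic stand-in for a real object such as a score or a genome."""
    rng = random.Random(seed)
    return bytes(rng.choice(b"abcd") for _ in range(length))

original = random_text(seed=1)
variant = bytearray(original)
for position in (100, 700, 1500):      # a near-copy with three small edits
    variant[position] = ord("z")
variant = bytes(variant)
unrelated = random_text(seed=2)

print("NCD(original, variant):  ", round(ncd(original, variant), 3))
print("NCD(original, unrelated):", round(ncd(original, unrelated), 3))
```

The near-copy gets a much smaller distance than the unrelated string, and a matrix of such pairwise distances can then be fed to a clustering method to group objects by origin.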