ESL Chap1 - Introduction Statistical Learning Problems Identify the risk factors for prostate cancer, based on clinical and demographic variables.

Slides:



Advertisements
Similar presentations
Applications of one-class classification
Advertisements

Some learnings and open questions from the Montpellier DSM workshop First DSM working group meeting University of Miskolc, Hungary, 7-8 April, 2005 P.
Statistical Machine Learning- The Basic Approach and Current Research Challenges Shai Ben-David CS497 February, 2007.
Week 3. Logistic Regression Overview and applications Additional issues Select Inputs Optimize complexity Transforming Inputs.
What is Statistical Modeling
Discriminant Analysis To describe multiple regression analysis and multiple discriminant analysis. Discriminant Analysis.
Statistical Learning Introduction: Data Mining Process and Modeling Examples Data Mining Process.
Introduction to Neural Networks Simon Durrant Quantitative Methods December 15th.
1 BA 275 Quantitative Business Methods Simple Linear Regression Introduction Case Study: Housing Prices Agenda.
Statistical Learning Introduction: Data Mining Process and Modeling Examples Data Mining Process.
Introduction to Machine Learning course fall 2007 Lecturer: Amnon Shashua Teaching Assistant: Yevgeny Seldin School of Computer Science and Engineering.
CSE 300: Software Reliability Engineering Topics covered: Software metrics and software reliability Software complexity and software quality.
Supervised Learning & Classification, part I Reading: W&F ch 1.1, 1.2, , 3.2, 3.3, 4.3, 6.1*
Introduction of Cancer Molecular Epidemiology Zuo-Feng Zhang, MD, PhD University of California Los Angeles.
Statistical Learning: Pattern Classification, Prediction, and Control Peter Bartlett August 2002, UC Berkeley CIS.
Guidelines on Statistical Analysis and Reporting of DNA Microarray Studies of Clinical Outcome Richard Simon, D.Sc. Chief, Biometric Research Branch National.
Data Mining: A Closer Look
Chapter 5 Data mining : A Closer Look.
Introduction to machine learning
1 Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data Presented by: Tun-Hsiang Yang.
CS Machine Learning. What is Machine Learning? Adapt to / learn from data  To optimize a performance function Can be used to:  Extract knowledge.
1 A Taste of Data Mining. 2 Definition “Data mining is the analysis of data to establish relationships and identify patterns.” practice.findlaw.com/glossary.html.
Statistical Learning Introduction: Modeling Examples.
Lecture 2: Introduction to Machine Learning
Data Mining Joyeeta Dutta-Moscato July 10, Wherever we have large amounts of data, we have the need for building systems capable of learning information.
Pattern Recognition & Machine Learning Debrup Chakraborty
Predictive Modeling with Heterogeneous Sources Xiaoxiao Shi 1 Qi Liu 2 Wei Fan 3 Qiang Yang 4 Philip S. Yu 1 1 University of Illinois at Chicago 2 Tongji.
The Broad Institute of MIT and Harvard Classification / Prediction.
1 CSC 8520 Spring Paula Matuszek Kinds of Machine Learning Machine learning techniques can be grouped into several categories, in several ways: –What.
Jeff Howbert Introduction to Machine Learning Winter Regression Linear Regression.
Overview of Supervised Learning Overview of Supervised Learning2 Outline Linear Regression and Nearest Neighbors method Statistical Decision.
What is science? an introduction to life science.
1 Machine Learning 1.Where does machine learning fit in computer science? 2.What is machine learning? 3.Where can machine learning be applied? 4.Should.
Map of the Great Divide Basin, Wyoming, created using a neural network and used to find likely fossil beds See:
Science Process Skills Vocabulary 8/17/15. Predicting Forming an idea of an expected result. Based on inferences.
Gaussian Processes Li An Li An
Chapter1: Introduction Chapter2: Overview of Supervised Learning
Learning Kernel Classifiers 1. Introduction Summarized by In-Hee Lee.
24 October 2002Data Mining & Visualization1 Data Mining and Visualization Jeremy Walton NAG Ltd, Oxford.
A New Generation of Artificial Neural Networks.  Support Vector Machines (SVM) appeared in the early nineties in the COLT92 ACM Conference.  SVM have.
Creating a Database for Corporate Structure and Regulatory Risk in the Oil & Gas Industry Kirby Lawrence Faculty Mentor-Benjamin Gilbert.
Modeling of Core Protection Calculator System Software February 28, 2005 Kim, Sung Ho Kim, Sung Ho.
DEMAND FORECASTING & MARKET SEGMENTATION. Why demand forecasting?  Planning and scheduling production  Acquiring inputs  Making provision for finances.
Machine Learning Artificial Neural Networks MPλ ∀ Stergiou Theodoros 1.
CSE 4705 Artificial Intelligence
Machine Learning for Computer Security
Statistical Learning Introduction: Modeling Examples
Introduction to Linear Regression

Statistical Learning Introduction: Modeling Examples
Lecture 3: Linear Regression (with One Variable)
Data mining and statistical learning, lecture 1b
The Elements of Statistical Learning
Supervised Learning Seminar Social Media Mining University UC3M
Map of the Great Divide Basin, Wyoming, created using a neural network and used to find likely fossil beds See:
Linear regression project
Machine Learning Ali Ghodsi Department of Statistics
Overview of Supervised Learning
Predicting Primary Myocardial Infarction from Electronic Health Records -Jitong Lou.
Introduction to Azure Machine Learning Studio
Machine Learning Week 1.
BA 275 Quantitative Business Methods
Supervised vs. unsupervised Learning
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Speech recognition, machine learning
Multivariate Methods Berlin Chen
Multivariate Methods Berlin Chen, 2005 References:
Ensemble Methods: Bagging.
Azure Machine Learning
Speech recognition, machine learning
Presentation transcript:

ESL Chap1 - Introduction Statistical Learning Problems Identify the risk factors for prostate cancer, based on clinical and demographic variables.

Classify a recorded phoneme, based on a log-periodogram. Classify a recorded phoneme, based on a log-periodogram.

Predict whether someone will have a heart attack on the basis of demographic, diet and clinical measurements

Customize an spam detection system.

Identify the numbers in a handwritten zip code, from a digitized image

Classify a tissue sample into one of several cancer classes, based on a gene expression profile.

Classify the pixels in a LANDSAT image, according to usage: {red soil, cotton, vegetation stubble, mixture, gray soil, damp gray soil, very damp gray soil}

The Supervised Learning Problem Starting point: Outcome measurement Y (also called dependent variable, response, target) Vector of p predictor measurements X (also called inputs, regressors, covariates, features, independent variables) In the regression problem, Y is quantitative (e.g price, blood pressure) In the classification problem, Y takes values in a finite, unordered set (survived/died, digit 0-9, cancer class of tissue sample) We have training data (x 1, y 1 )  (x N, y N ). These are observations (examples, instances) of these measurements.

Objectives On the basis of the training data we would like to: Accurately predict unseen test cases Understand which inputs affect the outcome, and how Assess the quality of our predictions and inferences

Philosophy It is important to understand the ideas behind the various techniques, in order to know how and when to use them. One has to understand the simpler methods first, in order to grasp the more sophisticated ones. It is important to accurately assess the performance of a method, to know how well or how badly it is working [simpler methods often perform as well as fancier ones!] This is an exciting research area, having important applications in science, industry and finance.