Topics of discussion:

1. Go over classification methods: what can be said about their applicability in different situations?
2. Models are not perfect; what can we say about robustness or other flaws in the model?
3. How do we get an intuitive feeling for the classification methods? Graphical methods?
4. Methods for variable selection: how do we find the "best" set of variables? Why do we need to reduce the number of variables?
5. How many different data sets have people looked at?
6. Unisims, multisims: estimating systematic uncertainties.
7. Technical framework: implementation of different classifiers.
8. Is subjectivity adding to the quality of the analysis or not?
9. Can we learn about statisticians' methodology?

1. Classification methods
- Try them all (sort of); don't worry so much about which one is better.
- Start simple.
- Could a statistician prepare some comparisons? Unlikely to get very precise conclusions.
- Some methods are more sensitive to tuning parameters than others; random forests, for example, are not.
- Often "off the shelf" versions give largely the same results (amongst the good ones).
- Don't overlook naïve Bayes.
- "Good" classifiers: decision trees with boosting, random forests, Bayesian neural networks.
(A code sketch comparing off-the-shelf classifiers follows after section 2.)

2. Robustness
- A classifier might be chosen on the basis of something other than average class errors.
- Compromise between class errors and sensitivity to parameter settings (insensitivity to parameter settings is not a sufficient condition to choose one classifier over another).
- Trade off parameter settings against sample size.
- Use parameter settings as inputs to classifiers.
- What to do regarding non-Gaussian tails?
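Below is a minimal sketch of the "try them all, start simple" advice in section 1: fit a few off-the-shelf classifiers with default settings on the same data and compare their cross-validated accuracy. The synthetic dataset, the specific classifier choices, and the accuracy metric are illustrative assumptions, not part of the original discussion.

```python
# Hedged sketch: compare default-settings classifiers on one (synthetic) dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Placeholder data; in practice this would be the analysis sample.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=8, random_state=0)

classifiers = {
    "random forest": RandomForestClassifier(random_state=0),
    "boosted trees": GradientBoostingClassifier(random_state=0),
    "naive Bayes":   GaussianNB(),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)  # default accuracy scoring
    print(f"{name:15s} mean accuracy = {scores.mean():.3f} +- {scores.std():.3f}")
```

Re-running the same loop while scanning a single tuning parameter (for instance the number of trees) over a range is one simple way to probe the sensitivity to parameter settings discussed in section 2.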

3. A graphical suggestion
Plot the change in prediction relative to a change in the values of a simple variable vs. those values. Look for:
- additivity or not
- interactions
- linearity or not
- intercept = 0? (i.e. a useless variable)
(A rough code sketch of this kind of plot follows at the end of this section.)

7. Technical framework: implementation of different classifiers
(See Ilya Narsky's talk.) Suggestion to prepare consumer reports for the different software packages.

6. Systematic uncertainties
Unisim / multisim / designing the multisim (the multisim is generally preferred). Changing the mix of data points produced vs. parameter settings.

10. Mistakes not to make
See Volker's slides on normalization.
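Here is a rough sketch of the graphical suggestion in section 3: sweep one input variable over a grid while the other variables keep their observed values, then plot the average predicted probability against that variable (essentially a partial-dependence-style plot). The synthetic data, the choice of a random forest, and the feature index are assumptions made only for illustration.

```python
# Hedged sketch of a "prediction vs. one variable" plot for an assumed classifier.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder data and model; substitute the actual analysis sample and classifier.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=8, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

def prediction_vs_variable(model, X, feature, n_grid=30):
    """Mean predicted class-1 probability as one feature is swept over its range."""
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), n_grid)
    means = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value          # other variables keep their observed values
        means.append(model.predict_proba(X_mod)[:, 1].mean())
    return grid, np.array(means)

grid, means = prediction_vs_variable(clf, X, feature=0)
plt.plot(grid, means)
plt.xlabel("value of variable 0")
plt.ylabel("mean predicted probability")
plt.show()
```

A flat curve suggests the variable contributes little (the "useless variable" case), curvature indicates non-linearity, and comparing curves across subsets of another variable is one way to look for interactions.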