Roberto Battiti, Mauro Brunato. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Feb 2014.


© Roberto Battiti and Mauro Brunato, 2014, all rights reserved. The slides can be used and modified for classroom usage, provided that the attribution (link to the book website) is kept.

Chap. 7: Ranking and selecting features
“I don’t mind my eyebrows. They add... something to me. I wouldn’t say they were my best feature, though. People tell me they like my eyes. They distract from the eyebrows.” (Nicholas Hoult)

Feature selection

Feature selection (2)
Before starting to learn a model from the examples, one must make sure that the input data contain sufficient information to predict the outputs, without excessive redundancy, which may cause “big” models and poor generalization. Feature selection is the process of selecting a subset of relevant features to be used in model construction.

Reasons for feature selection
Selecting a small number of informative features has advantages:
1. Dimensionality reduction
2. Memory usage reduction
3. Improved generalization
4. Better human understanding

Methods for feature selection
Feature selection is a problem with many possible solutions: there is no simple recipe.
1. Use the designer’s intuition and existing knowledge.
2. Estimate the relevance or discrimination power of the individual features.

Wrapper, filter and embedded methods
The value of a feature is related to a model-construction method. Three classes of methods:
1. Wrapper methods are built “around” a specific predictive model (they measure its error rate directly).
2. Filter methods use a proxy measure instead of the error rate to score a feature subset.
3. Embedded methods perform feature selection as an integral part of the model construction process.

Top-down and bottom-up methods
In a bottom-up method one gradually adds the ranked features in order of their individual discrimination power, and stops when the error rate stops decreasing.
In a top-down truncation method one starts with the complete set of features and progressively eliminates features while searching for the optimal performance point.
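The bottom-up procedure can be sketched as a short greedy loop. This is an illustrative sketch, not code from the book: `error_rate` is a hypothetical evaluator (e.g. cross-validated error of the chosen model on a feature subset) that the caller must supply.

```python
def forward_selection(ranked_features, error_rate):
    """Greedy bottom-up selection: add features in ranked order,
    stop as soon as the error rate no longer decreases."""
    selected = []
    best_err = float("inf")
    for f in ranked_features:
        err = error_rate(selected + [f])
        if err >= best_err:  # adding f did not help: stop here
            break
        selected.append(f)
        best_err = err
    return selected


# Toy illustration with a made-up error table:
errs = {("a",): 0.5, ("a", "b"): 0.3, ("a", "b", "c"): 0.4}
print(forward_selection(["a", "b", "c"], lambda fs: errs[tuple(fs)]))  # ['a', 'b']
```

The top-down variant is symmetric: start from the full set and greedily remove the feature whose elimination degrades the error least, stopping when performance drops.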

Linear models
Can we associate the importance of a feature with its weight? Be careful with ranges and scaling: normalization helps.

Nonlinearities and mutual relationships between features
Measuring individual features in isolation discards mutual relationships → the selection can be suboptimal. Example: the XOR function of two inputs. To get a proper meal one needs to eat either a hamburger or a dessert, but not both. The individual presence or absence of a hamburger (or of a dessert) in a menu is not related to classifying the menu as correct or not.
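A minimal numeric sketch of the XOR effect (data values and function name are ours): each input taken alone has zero Pearson correlation with the output, yet the two inputs together determine the output completely.

```python
# XOR toy data: y = x1 XOR x2.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

x1 = [inp[0] for inp, _ in data]
x2 = [inp[1] for inp, _ in data]
y = [out for _, out in data]

# Each feature alone is uncorrelated with the XOR output...
print(pearson(x1, y), pearson(x2, y))  # 0.0 0.0
# ...but knowing both inputs determines y exactly.
```

A per-feature ranking based on correlation would therefore discard both inputs, even though together they carry full information about the output.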

Correlation coefficient
The Pearson correlation coefficient is a widely used measure of the linear relationship between numeric variables. Let Y be the random variable associated with the output and X_i the random variable associated with the i-th input:
  ρ(X_i, Y) = Cov(X_i, Y) / (σ_{X_i} σ_Y)

Correlation coefficient (2)
[Figure: examples of data distributions and the corresponding correlation values.]

Correlation ratio
The correlation ratio is used to measure the relationship between a numeric input and a categorical output. A feature is significant if there is at least one outcome class where the feature’s average value is significantly different from its average over all classes. Let ℓ_y be the number of times that outcome y appears, so that one can partition the sample input vectors by their output: the inputs leading to output y form one group.

Correlation ratio (2)
Average of the i-th feature within each output class:
  x̄_{i,y} = (1/ℓ_y) Σ_{j : output_j = y} x_{i,j}
Overall average (ℓ = Σ_y ℓ_y is the total number of samples):
  x̄_i = (1/ℓ) Σ_j x_{i,j}
Correlation ratio between the i-th feature and the outcome:
  η²_i = Σ_y ℓ_y (x̄_{i,y} − x̄_i)² / Σ_j (x_{i,j} − x̄_i)²
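A minimal sketch of the computation, following the standard definition above (the function name is ours): the between-class variance of the feature divided by its total variance.

```python
def correlation_ratio(values, labels):
    """eta^2 for one numeric feature against categorical labels:
    between-class variance divided by total variance."""
    overall = sum(values) / len(values)
    between = 0.0
    for c in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == c]
        group_mean = sum(group) / len(group)
        between += len(group) * (group_mean - overall) ** 2
    total = sum((v - overall) ** 2 for v in values)
    return between / total


# Feature values clearly separated by class -> ratio close to 1:
print(correlation_ratio([1, 2, 3, 4], ["a", "a", "b", "b"]))  # 0.8
```

The ratio is 1 when the feature value is fully determined by the class and 0 when all class means coincide, so larger values indicate a more discriminative feature.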

Statistical hypothesis testing
A statistical hypothesis test is a method of making statistical decisions from experimental data. Hypothesis testing answers the question: assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value actually observed? Reject the null hypothesis if this probability is too low. Statistically significant → unlikely to have occurred by chance.

Relationship between two categorical features
Under the null hypothesis that the two events “occurrence of term t” and “document of class c” are independent, the expected values of the joint-event counts are obtained by multiplying the probabilities of the individual events. If a count deviates from the one expected for two independent events, one can conclude that the two events are dependent, and therefore that the feature is significant for predicting the output. One must check whether the deviation is sufficiently large that it cannot have happened by chance.

Chi-squared test
Chi-squared statistic:
  χ² = Σ_{c,t} (count_{c,t} − E_{c,t})² / E_{c,t}
where count_{c,t} is the number of occurrences of value t given class c, and E_{c,t} is the count expected if the two events were independent. The best features are the ones with larger χ² values.
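A small sketch of the statistic computed from a contingency table of counts (rows are classes, columns are feature values); the function name is ours, and in practice a library routine such as SciPy's `scipy.stats.chi2_contingency` would also supply the p-value.

```python
def chi_squared(table):
    """Chi-squared statistic for a contingency table given as a
    list of rows of counts; expected counts assume independence."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (count - expected) ** 2 / expected
    return chi2


print(chi_squared([[10, 10], [10, 10]]))  # 0.0  (counts match independence)
print(chi_squared([[20, 0], [0, 20]]))    # 40.0 (strong dependence)
```

A larger statistic means the observed counts deviate more from the independence hypothesis, i.e. the feature is more informative about the class.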

Mutual information (1): entropy
The uncertainty in an output distribution can be measured by its entropy:
  H(Y) = − Σ_y p(y) log₂ p(y)
After knowing a specific input value x, the uncertainty in the output can decrease.

Mutual information (2): conditional entropy
The entropy of Y after observing the i-th input feature value x is
  H(Y | x) = − Σ_y p(y | x) log₂ p(y | x)
The conditional entropy of variable Y is the expected value of H(Y | x) over the inputs:
  H(Y | X_i) = Σ_x p(x) H(Y | x)

Mutual information (3)
Mutual information between X_i and Y is the amount by which the uncertainty decreases:
  I(X_i; Y) = H(Y) − H(Y | X_i)
An equivalent expression, which clarifies the symmetry between X_i and Y:
  I(X_i; Y) = Σ_{x,y} p(x, y) log₂ ( p(x, y) / (p(x) p(y)) )
Mutual information captures arbitrary non-linear dependencies between variables.
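A short sketch estimating mutual information from paired samples via empirical frequencies, using the symmetric expression above (the function name is ours):

```python
from math import log2
from collections import Counter

def mutual_information(xs, ys):
    """Empirical I(X;Y) in bits, estimated from paired samples:
    I = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) )."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        mi += (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
    return mi


print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))  # 1.0 bit: y determined by x
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 bits: independent
```

Unlike the correlation coefficient, this estimate is non-zero for any dependence between the variables, linear or not, which is what makes it useful for ranking features with non-linear effects.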

GIST
Reducing the number of input attributes used by a model, while keeping roughly equivalent performance, has many advantages. It is difficult to rank individual features without considering the specific modeling method and their mutual relationships.

GIST 2
- Trust the correlation coefficient only if you have reasons to suspect linear relationships.
- The correlation ratio can be computed even if the outcome is not quantitative.
- Use the chi-squared test to identify possible dependencies between inputs and output.
- Use mutual information to estimate arbitrary dependencies between qualitative or quantitative features.