NORTEL NETWORKS CONFIDENTIAL
Evaluation of Objective Quality Estimators: Methods Used with Voice Models & Implications for Video Testing
Leigh Thorpe

Slide 2: Evaluation of Objective Quality Estimators: Methods Used with Voice Models & Implications for Video Testing
Leigh Thorpe, Nortel CTO Services Group
VQEG Ottawa Meeting, Sept 10-14, 2007

Slide 3: Overview
- Evaluation goals and analysis approach
- Database characteristics
- Subjective results: internal consistency
- Specific characteristics of measurement: resolution & performance on specific types of impairment

Slide 4: Evaluation of Measurement Models
We want to understand how well the model predicts quality as rated by users. This requires assessing its performance against an evaluation database:
1. How close are the predictions for a set of test cases to the subjective ratings for those same cases?
2. Does the model differentiate neighbouring points in the correct direction?
We are interested in three aspects of performance:
- Accuracy: is the model good at predicting the subjective rating?
- Resolution/Monotonicity:
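Question 2 can be turned into a simple count. The sketch below is one possible way to do it, not the method used in this presentation: sort the test cases by subjective score, walk through neighbouring pairs, and count how often the model's score moves in the same direction. The function name `neighbour_order_agreement` and the tie handling (pairs with equal subjective scores are skipped; an unchanged model score counts as a miss) are assumptions made for illustration.

```python
def neighbour_order_agreement(subjective, objective):
    """Fraction of neighbouring test cases (ordered by subjective score)
    for which the model's score changes in the same direction.

    Hypothetical helper: pairs with equal subjective scores are skipped,
    and an unchanged model score counts as a miss.
    """
    cases = sorted(zip(subjective, objective))
    agree = total = 0
    for (s1, o1), (s2, o2) in zip(cases, cases[1:]):
        if s2 == s1:
            continue  # no "correct direction" to check for tied ratings
        total += 1
        if o2 > o1:  # s2 > s1 after sorting, so the model should also rise
            agree += 1
    return agree / total if total else float("nan")


# Hypothetical MOS values for five test cases
subj = [1.2, 2.1, 2.8, 3.6, 4.3]
model = [1.5, 1.9, 3.0, 2.9, 4.1]
print(neighbour_order_agreement(subj, model))  # 3 of 4 pairs agree -> 0.75
```

A value of 1.0 means the model never reverses the order of neighbouring test cases; each reversal lowers the score.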

Slide 5: Three Methods of Analysis
(1) Graphical: scatterplot and regression line
- Plot subjective scores on the x-axis and the objective measure on the y-axis.
- The spread of the dots shows visually how closely the variables track each other and how close their relationship is to the ideal (the main diagonal).
- By inspection, we can see how subgroups behave compared to overall performance.
(2) The correlation coefficient r (the Pearson product-moment correlation)
- Measures the strength of the linear relationship: the tendency for the two variables to increase or decrease together.
- Does not indicate how close the values of the two variables are.
- Perfect correlation gives r = 1 or −1; no linear relationship gives r = 0.
- The measurement units of the two variables may be the same or different.
- The number of points and the dynamic range of the variables (difference from highest to lowest) each affect the value of the correlation coefficient.
(3) The standard error of estimate (SEE)
- A measure of the deviation of the dependent variable from its regression line.
- A score can be computed for subsets of the conditions tested.
- Smaller is better: the closer the points are to the line (the better the prediction), the smaller the SEE value.
- SEE is a measure of dispersion similar to the standard deviation, and it behaves like one.
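The numerical quantities on this slide can be computed directly. The following is a minimal Python sketch, assuming the slide's orientation (subjective score on x, objective score on y) and ordinary least-squares regression of y on x. Following the later description of SEE as "the square root of the average of squared deviations", `see()` divides by n by default; the conventional standard error of estimate divides by n − 2, which `ddof=2` selects. The function names and the example scores are illustrative only.

```python
import math


def _means(x, y):
    return sum(x) / len(x), sum(y) / len(y)


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = _means(x, y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares regression of y on x; returns (intercept, slope)."""
    mx, my = _means(x, y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def see(x, y, intercept, slope, ddof=0):
    """RMS deviation of y from the line; ddof=2 gives the usual n - 2 form."""
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return math.sqrt(sum(e * e for e in residuals) / (len(residuals) - ddof))


# Hypothetical per-condition scores (subjective on x, model output on y)
subj = [1.3, 1.9, 2.6, 3.1, 3.8, 4.2]
model = [1.6, 1.8, 2.9, 2.9, 3.6, 4.4]
b0, b1 = fit_line(subj, model)
print(f"r = {pearson_r(subj, model):.3f}, line: y = {b0:.2f} + {b1:.2f}x, "
      f"SEE = {see(subj, model, b0, b1):.3f}")
```

Note that r is unitless, while SEE comes out in the units of y (here, the model's MOS scale).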

Slide 6: Performance on Subgroups of Points: What Correlation Tells Us
Computing the correlation coefficient for a subgroup can mislead us about how the subgroup relates to the overall group. The red points show a different relationship between the variables than is seen for the overall group. The correlation for those points tells us about their relationship to each other, but not about their relationship to the rest of the data.
Figure: scatterplot of the overall group with a subgroup of red points; correlations shown: r = 0.83 and r = 0.94.
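A small numerical sketch of the same effect, using made-up data rather than the data in the figure: one subgroup is scored exactly 1 MOS too low by a hypothetical model. Within the subgroup the correlation is essentially perfect, yet every one of its points lies 1 MOS from the diagonal, and adding the subgroup lowers the overall correlation.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: most conditions track the diagonal closely...
main_subj = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
main_obj = [1.1, 1.4, 2.1, 2.4, 3.1, 3.4, 4.1, 4.4]
# ...but one impairment type is rated exactly 1 MOS too low by the model.
sub_subj = [3.0, 3.5, 4.0, 4.5]
sub_obj = [2.0, 2.5, 3.0, 3.5]

r_sub = pearson_r(sub_subj, sub_obj)  # correlation within the subgroup only
r_all = pearson_r(main_subj + sub_subj, main_obj + sub_obj)
offset = sum(o - s for s, o in zip(sub_subj, sub_obj)) / len(sub_subj)
print(f"subgroup r = {r_sub:.2f}, overall r = {r_all:.2f}, "
      f"subgroup mean offset from diagonal = {offset:+.1f} MOS")
```

Judged by its own r, the subgroup looks better than the data as a whole, even though it is where the model performs worst.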

Slide 7: What SEE Tells Us
Analogous to a standard deviation, SEE is the square root of the average of the squared deviations: it is the RMS deviation from the regression line for a given set of points. It can be calculated for any set of points with sufficient n, say n ≥ 6.
Figure: comparison of two groups of points. SEE is smaller for the yellow deviations than for the red deviations.
SEE is in the same units as the variable whose variation it captures. In this example, SEE has the units of y.
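To compare groups as in the figure, one reasonable approach (assumed here, since the slide does not spell it out) is to fit the regression line once to all the points and then compute SEE from each group's deviations about that common line. A minimal sketch with invented "yellow" and "red" groups; the function names are illustrative, and the divisor n follows the slide's "average of squared deviations" wording.

```python
import math


def fit_line(x, y):
    """Least-squares regression of y on x; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def see_about_line(x, y, line):
    """RMS deviation of the points (x, y) from an already-fitted line."""
    intercept, slope = line
    squared = [(b - (intercept + slope * a)) ** 2 for a, b in zip(x, y)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical groups of six points each (n >= 6, as suggested on the slide)
yellow_x = [1.25, 1.75, 2.25, 2.75, 3.25, 3.75]
yellow_y = [1.30, 1.70, 2.30, 2.70, 3.30, 3.70]
red_x = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
red_y = [2.3, 1.4, 3.2, 2.3, 4.2, 3.3]

overall = fit_line(yellow_x + red_x, yellow_y + red_y)  # one line for all points
see_yellow = see_about_line(yellow_x, yellow_y, overall)
see_red = see_about_line(red_x, red_y, overall)
print(f"SEE yellow = {see_yellow:.2f}, SEE red = {see_red:.2f} (units of y)")
```

Because both values are measured against the same line, they are directly comparable, which is not true of correlations computed separately within each group.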

Slide 8: Evaluation Samples: The “Database”
The evaluation database consists of:
- a number of samples of the signal of interest
- a mean subjective rating for each sample
Ideally, the database should contain samples (test cases) covering the full range of types and levels of impairment that the model will encounter in usage conditions.
- Single database: all subjects have rated all test cases.
- Where multiple databases are used, there should be enough test cases common to the databases to show whether the subjective ratings line up.

Slide 9: Criteria Used for the New Voice Quality Database
- Cover a broad range of impairment types and levels:
  - different types of codecs, a range of packet loss, background noise (for these cases, the noise is in the reference)
  - combinations of these: coding, noise, packet loss, tandeming
- Two languages: English and French
- Multiple talkers: eight in total, four per language
- Include conditions that will challenge candidate methods: time warping (temporal shift) and noise reduction
- A large number of judgments to obtain stable scores: we used n = 60 for each sample

Slide 10: Effect of Truncating the Quality Range
Figure panels: full-range database, r = 0.85; small-range database, r = 0.53.
The small-range database is simulated from the full-range one by restricting the range of subjective values. (Care was taken in the simulation to keep the number of points about the same.) The range restriction reduced the correlation coefficient from 0.85 to 0.53.
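The range effect is easy to reproduce with synthetic data. Rather than subsetting a real database as the slide does, this sketch builds two hypothetical databases with the same number of conditions (five) and the same model error pattern (an alternating ±0.3 MOS), differing only in the range of subjective scores they span. The helper `synthetic_db` and its parameters are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def synthetic_db(lowest, highest, n=5, error=0.3):
    """Hypothetical database: n evenly spaced subjective scores, and model
    scores equal to subjective + an alternating +/-error pattern."""
    step = (highest - lowest) / (n - 1)
    subj = [lowest + step * i for i in range(n)]
    model = [s + (error if i % 2 == 0 else -error) for i, s in enumerate(subj)]
    return subj, model


r_full = pearson_r(*synthetic_db(1.0, 4.5))   # wide range of quality
r_small = pearson_r(*synthetic_db(2.5, 3.5))  # same n, same errors, narrow range
print(f"full range r = {r_full:.2f}, restricted range r = {r_small:.2f}")
```

The model is equally accurate in both cases; only the spread of the subjective scores has changed, which is why a narrow-range evaluation database can make a model look worse than it is.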

Slide 11: Database Details
- Languages tested separately; listeners were native speakers of the language heard.
- Samples of 6-8 s duration, each made up of two unrelated sentences from the same talker.
- Four talkers per language; talkers crossed with conditions, giving 1304 samples (326 x 4) per language.
- Low ambient noise in the test room.
- Presented at nominal telephone listening volume.
- Too many samples to complete in one session, so the samples were divided across four test sessions:
  - each session included one instance of each condition
  - the four talkers were represented equally in all sessions
  - therefore, every listener heard every test case, but not always with the same talker.
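The slide does not say how talkers were assigned to sessions. One simple scheme that meets the stated constraints is a cyclic rotation: in session s, condition c is presented by talker (c + s) mod 4. The sketch below is a hypothetical illustration of such a design, not the assignment actually used. Since 326 is not a multiple of 4, talker counts within a session can only be equal to within one sample.

```python
from collections import Counter

N_CONDITIONS, N_TALKERS, N_SESSIONS = 326, 4, 4


def talker_for(condition, session):
    """Hypothetical cyclic assignment of talkers to conditions per session."""
    return (condition + session) % N_TALKERS


sessions = [
    [(c, talker_for(c, s)) for c in range(N_CONDITIONS)]  # one instance of each condition
    for s in range(N_SESSIONS)
]

# Across the four sessions, every condition is heard with every talker once.
all_samples = {pair for session in sessions for pair in session}
print(len(all_samples))  # 1304 (326 x 4) samples per language

# Talker representation in each session (equal to within one sample).
for s, session in enumerate(sessions):
    print(s, sorted(Counter(t for _, t in session).values()))
```

A listener who attends all four sessions therefore hears each condition four times, once with each talker.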

Slide 12: Internal Consistency of Database: English
Figure: English Database: Internal Consistency (per-condition means, arbitrary split), plotting one half of the ratings against the other half; R = …
r = … for the English samples. This is the upper limit of performance that can be detected with this database.
The variability of these samples indicates a resolution of about 0.25 MOS, as would be expected for n = 30 (i.e., half of the ratings for each sample).
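A split-half check of this kind can be sketched as follows. The assumptions here are ours, not the presentation's: ratings are stored as one list of listener scores per condition, the arbitrary split takes alternate listeners, and the per-condition means of the two halves are compared with Pearson's r. The helper names and the ratings are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_r(ratings):
    """ratings[c] holds every listener's score for condition c.
    Alternate listeners form the two halves (an arbitrary split)."""
    half_a = [sum(r[0::2]) / len(r[0::2]) for r in ratings]
    half_b = [sum(r[1::2]) / len(r[1::2]) for r in ratings]
    return pearson_r(half_a, half_b)


# Hypothetical ratings: 4 conditions x 6 listeners on a 1-5 scale
ratings = [
    [1, 2, 1, 1, 2, 1],
    [2, 3, 3, 2, 2, 3],
    [4, 3, 3, 4, 4, 3],
    [5, 4, 5, 5, 4, 5],
]
print(f"split-half r = {split_half_r(ratings):.3f}")
```

Since no model can agree with the listeners more closely than the listeners agree with themselves, this value bounds the correlation a model can be shown to achieve on the database.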

Slide 13: Internal Consistency of Database: French
r = … for the French samples; R = 0.995

Slide 14: Correlation Coefficient (r) by Algorithm
Chart: r for the subjective data (Subj Data) and for Models A, B, C and D, computed on the English, French, Merged and Averaged* data.
*This is the correlation for the French and English scores averaged together, not the average of the correlation coefficients!
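The footnote's distinction can be shown in a few lines. In this sketch, the "Averaged" figure is obtained by averaging each condition's English and French scores first and then computing a single r. The scores for the four conditions are hypothetical and were chosen so that the language-specific deviations cancel, which makes the difference from either per-language r easy to see.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-condition subjective means and model scores per language
subj = {"English": [1.0, 2.0, 3.0, 4.0], "French": [1.0, 2.0, 3.0, 4.0]}
model = {"English": [1.2, 1.8, 3.2, 3.8], "French": [0.8, 2.2, 2.8, 4.2]}

r_by_language = {lang: pearson_r(subj[lang], model[lang]) for lang in subj}

# "Averaged": average the scores per condition across languages, then correlate.
avg_subj = [(e + f) / 2 for e, f in zip(subj["English"], subj["French"])]
avg_model = [(e + f) / 2 for e, f in zip(model["English"], model["French"])]
r_averaged = pearson_r(avg_subj, avg_model)

print(r_by_language, f"averaged-scores r = {r_averaged:.3f}")
```

Here the averaged-scores r is higher than either per-language value, so it cannot be the mean of the two coefficients.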

Slide 15: Results for Model A
r = 0.93. The spread of these points shows that Model A can resolve subjective quality to no better than about 0.5 MOS.

Slide 16: Results for Model C
r = 0.84. This model tends to compress the range of its output scores relative to the subjective scores. There are a number of outliers in the lower-left quadrant. The mid-range resolution is about 3/4 MOS.

Slide 17: Example: Data Plotted by Subgroup

Slide 18: Example of Results for Subgroups
Chart: SEE* values for Models A, B, C and D by subgroup, languages combined. Subgroups: Overall, Noise Reduction, Noise + Packet Loss, Noise, Temporal Clipping, Constrained Bursty PL, Bursty Packet Loss, Constrained Random PL, Random Packet Loss, Codecs, MNRU.
*Based on means across languages.
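A per-subgroup SEE table like this one can be produced by fitting one regression line to all conditions and then computing the RMS deviation separately for each impairment category. The sketch below is a hypothetical illustration: the category labels echo the slide, but the scores are invented, and categories with fewer than six conditions are reported as undefined, following the n ≥ 6 rule of thumb from the SEE slide. The name `see_by_category` is ours.

```python
import math
from collections import defaultdict


def fit_line(x, y):
    """Least-squares regression of y on x; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def see_by_category(cases, min_n=6):
    """cases: (category, subjective, objective) tuples, e.g. means across
    languages. Returns {category: SEE about the overall line, or None}."""
    intercept, slope = fit_line([c[1] for c in cases], [c[2] for c in cases])
    squared = defaultdict(list)
    for category, s, o in cases:
        squared[category].append((o - (intercept + slope * s)) ** 2)
    return {
        cat: math.sqrt(sum(v) / len(v)) if len(v) >= min_n else None
        for cat, v in squared.items()
    }


# Hypothetical cases: (category, subjective mean, model score)
xs = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
cases = (
    [("Codecs", s, s + d) for s, d in zip(xs, [0.1, -0.1] * 3)]
    + [("Noise Reduction", s, s + d) for s, d in zip(xs, [0.8, -0.8] * 3)]
    + [("Clean", 4.4, 4.3), ("Clean", 4.5, 4.5)]
)
for category, value in see_by_category(cases).items():
    print(category, "n too small" if value is None else f"SEE = {value:.2f}")
```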

Slide 19: What can we learn from the voice metric testing that can assist in the evaluation of video metrics?
1. Ensure that the subjective test samples cover a range of quality (next slide); this can affect the correlation observed.
2. Include all the impairments you will want to assess with the model, and any that may be encountered in signals that pass through networks.
3. Within reason, any subjective metric can be used, as long as it is sufficiently sensitive to the variation in quality over the range used. It doesn't need to be MOS.
4. Collect data from as many viewers as practicable: n > 30 if possible.
5. Examine the internal consistency of the subjective ratings.
6. Examine the performance of the models on subgroups within the data. Select a statistic that provides an unbiased result (r is not unbiased in this application); the SEE statistic provides a credible alternative.
7. Examine resolution and monotonicity: quantitative metrics??

Slide 21: Interpreting Regression and Correlation
Figure: four example scatterplots with regression lines.
- Weak relationship: the points fall far from the line, and the cloud of points is about as long as it is wide. It looks as though a line in any direction would be as good.
- Strong relationship, with the line very similar to the diagonal: on average, the objective measure closely tracks the subjective score. For MOS prediction, this is the most desirable result.
- Strong relationship, but the line is canted relative to the diagonal: the objective measure uses a smaller range than the subjective score. Note: the value of the correlation coefficient does not indicate whether the line tracks the diagonal.
- Deviation from linearity: the objective measure follows the diagonal for the lower portion, but underestimates the quality of the conditions in the upper range. We can compute a regression line, but it will not account for the non-linearity. We could compute a best-fit curve, but there is no "correlation" statistic to indicate the strength of a non-linear relationship.
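The note about the canted line can be checked numerically. In this hypothetical example the model output is an exact linear function of the subjective score but uses only half of its range (objective = 0.5 × subjective + 1.5). The correlation is perfect, yet the regression slope is 0.5 and the points sit up to 1 MOS from the diagonal. The RMS distance from the diagonal is an extra summary chosen for illustration, not a statistic used in the presentation.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope_of_fit(x, y):
    """Least-squares slope of y on x (1.0 means parallel to the diagonal)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def rms_from_diagonal(x, y):
    """RMS of (objective - subjective): distance from the ideal y = x line."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, y)) / len(x))


subj = [1.0, 2.0, 3.0, 4.0, 5.0]
compressed = [0.5 * s + 1.5 for s in subj]  # hypothetical range-compressing model

print(f"r = {pearson_r(subj, compressed):.2f}")         # r = 1.00
print(f"slope = {slope_of_fit(subj, compressed):.2f}")  # slope = 0.50
print(f"RMS from diagonal = {rms_from_diagonal(subj, compressed):.2f} MOS")  # 0.71 MOS
```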

Slide 22: Working with Correlation (1)
Correlation coefficients cannot be averaged. Why not?
Figure panels: Database A, r = 0.94; Database B, r = 0.92; Databases A & B merged, r = 0.65. Averaging the two coefficients would give r = 0.93.
Correlation is not a linear measure, so correlation coefficients cannot be treated with linear operations such as averaging.
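The same effect can be reproduced with two small hypothetical databases whose subjective scales are offset: the listeners in database B rated the same conditions (with the same model scores) 1 MOS lower than the listeners in database A. Each database on its own gives r of about 0.99, so the average of the two coefficients is also about 0.99, but merging the data gives a much lower r. The data are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical databases: same model scores, but database B's listeners
# rated every condition 1 MOS lower than database A's listeners.
obj = [2.1, 2.4, 3.1, 3.4, 4.1]
subj_a = [2.0, 2.5, 3.0, 3.5, 4.0]
subj_b = [s - 1.0 for s in subj_a]

r_a = pearson_r(subj_a, obj)
r_b = pearson_r(subj_b, obj)
r_merged = pearson_r(subj_a + subj_b, obj + obj)

print(f"A: r = {r_a:.2f}, B: r = {r_b:.2f}, mean of r = {(r_a + r_b) / 2:.2f}")
print(f"A and B merged: r = {r_merged:.2f}")
```

The merged coefficient reflects the offset between the two databases, which neither per-database coefficient (nor their average) can reveal.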

Slide 23: Nortel Database: Summary of Impairment Conditions

Category               Range of quality                              No. of cases
Clean                  High quality only                             2
MNRU                   dBQ                                           7
Codecs                 G.711, G.729, AMR, tandem                     7
Random Packet Loss     1%-10% PL; 10, 20, 30 ms packets              54
Constrained Random PL  Same speech & mask for each codec             22
Bursty Packet Loss     1%-10% PL; 10, 20, 30 ms packets              54
Constrained Bursty PL  Same speech & mask for each codec             22
Temporal Clipping      ms clip, +/-80 ms shift, 120 ms mute          21
Noise                  20, 10, 0 dB SNR; Hoth, car, babble, street   33
Noise + Packet Loss    2%, 4%, random & bursty                       54
Noise Reduction        Good and poor noise reduction algorithms      48
Total                                                                326

326 cases x 4 talkers x 2 languages = 2608 test samples in the database.

Slide 24: Results for Model A by Subgroup (English)