How to remove an outlier tester
Lucjan Janowski, Faculty of Electrical Engineering, Automatics, Computer Science and Electronics, Department of Telecommunications

Agenda
- Can a tester be an outlier?
- The detection philosophy
- Latent variables
- Rasch model
- WinSteps
- The final decision
- Conclusion
2008 I

Can a tester be an outlier?

What would we like to model?
- Why do we use testers? A tester represents human perception, which is difficult to model.
- People are different, and so are our users/clients. Our goal is to take such differences into account.
- Some of us are critical and others are uncritical.
- A tester can be tired or not focused enough, and therefore his/her answers can be random.

A tired tester problem
- A user can be tired too. Should we remove all tired testers?
- Can a tester score randomly? What are the consequences?
- Note that detecting that a tester scores a picture differently from the average score does not mean that he/she is a random tester.
- We have to be very careful with tester removal, since our goal is to build a model of the average user, not of a "proper" user.

Why are some scores different?
- Different effects (e.g. motion intensity, color) can affect testers' judgement differently.
- Testers have different experience (e.g. watching mainly YouTube or films on DVD).
- Each of us is more or less critical of anything that he/she judges.
- The words describing the opinion scale can be understood differently (in Poland OK means good; in England OK means fair).

What can we do?
- We have to detect random scores.
- A tester who often scores randomly should be removed from model building.
- An answer that differs from the average score is not necessarily random; therefore we have to consider the average score corrected for the tester's individual tendencies.
- We need a mathematical model of user behavior that takes these properties into account.

Latent variable
[Diagram; labels: "Any distortion that influences QoE", "Latent variable", "OS", "This is what a tester sees"]

Latent variable manifestation

[Figures: five slides, each showing scores by Tester ID and Video ID (increasing distortion). Slide titles:]
- An example
- Non-extreme-value testers
- Wide range for 10 and …
- Critical tester
- Are the answers random?

Rasch model
- We assume that the latent variable is the variable that testers really score.
- We assume that the opinion score probability depends on the model parameters through a logit link.
- The function has parameters describing:
  - a tester's "criticism" factor
  - the quality of a film/picture/…
  - an average threshold value for each particular score

Rasch model equation
[Equation not preserved in the transcript]
- n: the tester number
- i: the object number (what is scored)
- x: the opinion score value (1-5, 0-10, …)
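The formula itself did not survive in the transcript. As a reference point only, and as an assumption about what the slide showed, the rating scale form of the Rasch model that is commonly used with WinSteps can be written with the same indices n, i and x. The symbols B_n, D_i and F_x are not from the slides; they stand for the tester parameter, the object parameter and the threshold for score x, and the sign convention is also an assumption.

```latex
\[
  \ln\frac{P_{nix}}{P_{ni(x-1)}} = B_n - D_i - F_x
\]
% P_{nix}: probability that tester n gives object i the score x
% B_n:     parameter of tester n (assumed to capture the "criticism" factor)
% D_i:     parameter of object i (assumed to capture its quality)
% F_x:     threshold between scores x-1 and x, shared by all testers and objects
```

In this form the log-odds of choosing score x rather than x-1 are linear in the three parameter groups, which matches the three kinds of parameters named on the previous slide.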

[Two figures: values by Tester ID and Video ID (increasing distortion)]
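To make the model values concrete, here is a minimal sketch of how score probabilities could be computed, assuming the rating scale form of the Rasch model in which the log-odds of score x over score x-1 equal (tester parameter - object parameter - threshold for x). The function names, the sign convention and the example parameter values are illustrative assumptions, not taken from the slides or from WinSteps.

```python
import math


def category_probabilities(tester, item, thresholds):
    """Return the probability of each score category, lowest first.

    Assumed rating scale form: ln(P_k / P_(k-1)) = tester - item - thresholds[k-1].
    With m thresholds there are m + 1 score categories.
    """
    # Each category's unnormalised log-probability is the running sum of
    # (tester - item - threshold); the lowest category starts at 0.
    exponents = [0.0]
    for threshold in thresholds:
        exponents.append(exponents[-1] + (tester - item - threshold))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score_and_variance(probs, lowest_score=1):
    """Return the model mean and variance of the score for one tester/object pair."""
    scores = [lowest_score + k for k in range(len(probs))]
    mean = sum(s * p for s, p in zip(scores, probs))
    variance = sum((s - mean) ** 2 * p for s, p in zip(scores, probs))
    return mean, variance


if __name__ == "__main__":
    # Hypothetical parameters for a 5-point scale (four thresholds).
    probs = category_probabilities(tester=0.5, item=-0.2,
                                   thresholds=[-1.5, -0.5, 0.5, 1.5])
    print(probs)
    print(expected_score_and_variance(probs))
```

The expected score and its variance are the quantities a fit statistic compares with the score a tester actually gave.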

Rasch model
- We assume that the Rasch model is correct and that the data that do not fit this model are incorrect [sic].
- Note that without any assumption we are not able to detect randomly scoring testers.
[Figure; labels: "Data", "Model values", "Observed values"]

OMS (Outfit Mean Square)
- Knowing the model probability and the tester's answer, we can estimate how far a tester is from the model.
- A tester's accuracy or quality is judged by the OMS (Outfit Mean Square).
- The Rasch model can be computed with the WinSteps software.
- The OMS can be interpreted on the basis of heuristically obtained ranges.
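As a rough illustration of the distance measure described above, the outfit mean square is usually defined as the mean of the squared standardized residuals between observed scores and model expectations. The sketch below assumes that the model's expected score and variance for each scored object are already available (for example, exported from the fitting software); the function name and the input layout are hypothetical.

```python
def outfit_mean_square(observed, expected, variance):
    """Outfit mean square for one tester.

    observed -- scores the tester actually gave, one per object
    expected -- model-expected score for each of those objects
    variance -- model variance of the score for each of those objects
    """
    if not (len(observed) == len(expected) == len(variance)):
        raise ValueError("all inputs must have the same length")
    if not observed:
        raise ValueError("at least one score is required")
    if any(v <= 0 for v in variance):
        raise ValueError("model variances must be positive")
    # Squared standardized residual for each scored object.
    z_squared = [(x - e) ** 2 / v
                 for x, e, v in zip(observed, expected, variance)]
    # Unweighted mean, so single surprising answers have a strong influence.
    return sum(z_squared) / len(z_squared)


if __name__ == "__main__":
    # Hypothetical values for a tester who scored three videos.
    print(outfit_mean_square(observed=[4, 3, 2],
                             expected=[3.5, 3.0, 2.5],
                             variance=[0.5, 0.5, 0.5]))
```

Values near 1 mean the tester deviates from the model about as much as the model itself predicts; much larger values indicate answers the model finds surprising.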

Results interpretation

OMS range          Interpretation
2 < OMS            The tester is not relevant and he/she should be removed
1.5 < OMS < 2      We should be suspicious
0.5 < OMS < 1.5    Correct tester
OMS < 0.5          The tester fits the model too well
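The ranges above could be applied automatically with a small helper such as the following sketch. The slide uses strict inequalities and does not say what happens exactly at 0.5, 1.5 or 2, so the handling of those boundary values here is an assumption, as are the function name and the returned labels.

```python
def interpret_oms(oms):
    """Map an outfit mean square value to the heuristic categories.

    Boundary handling is an assumption: exactly 2 counts as suspicious,
    exactly 1.5 and exactly 0.5 count as correct.
    """
    if oms < 0:
        raise ValueError("a mean square cannot be negative")
    if oms > 2.0:
        return "not relevant, remove"
    if oms > 1.5:
        return "suspicious"
    if oms >= 0.5:
        return "correct"
    return "fits the model too well"


if __name__ == "__main__":
    # Hypothetical OMS values for four testers.
    for tester_id, oms in [(1, 0.3), (2, 1.0), (3, 1.7), (4, 2.4)]:
        print(tester_id, oms, interpret_oms(oms))
```

Because the ranges are heuristic, the label is best treated as a prompt to inspect the tester's answers rather than as an automatic removal rule.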

Example results
[Figure: scores by Tester ID and Video ID (increasing distortion), with OMS values]

Rasch model disadvantages
- It is more accurate with more data, but it is difficult to collect many results since the tests are expensive.
- Not all types of correct tester behavior can be modeled.
- The algorithms are not implemented in Matlab, so it is difficult to include them in an automatic analysis done in Matlab.

Conclusion
- A tester's answers make it possible to model human perception, but not all of his/her answers are correct.
- Outliers should be removed.
- The Rasch model helps to detect testers who are not relevant.
- The final decision should be checked, since not all correct behaviors can be modeled by the Rasch model.
