The Browser Evaluation Test: A Proposal
Pierre Wellner, Mike Flynn
IDIAP, September 2003


Ricoh “MuVIE”, Lee et al
Video editing, key frames, transcript search, embedded web browser, slides, whiteboard, minutes, perspective & panoramic views, speaker location, visual & audio activity
NOT TESTED ON HUMANS

Microsoft “Distributed Meetings”, Cutler et al
Panoramic video, person-tracking, audio source localisation & beam-forming, speaker clustering & change, whiteboard camera, PC capture
SUBJECTIVELY TESTED

The Problem
No assessment, or...
Assessed by unique scheme
Often very subjective [from Cutler et al, “Distributed Meetings: A Meeting Capture and Broadcasting System”, ACM Multimedia, 2002]
–“I was able to get the information I needed […]”
–“I would use this system again if I had to miss a meeting.”
–“I would recommend the use of this system to my peers.”
No standard browsing task
→ Objective comparison not possible ←

Aims of the BET
Performance, not judgment
Independent of experimenter perception
Directly comparable numeric scores
Replicable

The Browsing Task
Find a maximum number of observations of interest in a minimum amount of time.
But what is an “observation of interest”?

BET Overview
[Diagram. Labels: meeting participants, recording system, corpus, playback system, observers, observations, sampling, test, subjects, media browser, answers, scoring, scores]

People
Participants
Observers
–Observer selection
–Many diverse interests
–Interesting for participants or absentees?
Subjects
–Subject selection

Data
Corpus
–Discussion, Presentation, Decision, Status…
–Normal meetings, if possible
–Reflect common distribution
Observations
–Pairs of statements, one true, one false
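
One way to picture an observation as a pair of statements, one true and one false, is the small Python sketch below. It is only an illustration: the class name `ObservationPair`, its fields, the `present` helper, and the idea of shuffling the two statements before showing them are all assumptions, not something the slides specify.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservationPair:
    """Hypothetical record for one observation: a true statement and a false one."""
    meeting_id: str
    true_statement: str
    false_statement: str


def present(pair, rng):
    """Return both statements in random order and the index of the true one.

    Shuffling is an assumption made here so that position gives no hint
    about which statement is true.
    """
    statements = [pair.true_statement, pair.false_statement]
    rng.shuffle(statements)
    return statements, statements.index(pair.true_statement)
```

A caller would pass its own `random.Random` instance, so that the order shown to a subject can be reproduced later when scoring.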

Tests & Scores
Test: sample of observations
Subjects must decide on truth
–using the browser
Score is correct minus incorrect answers
Control scores established:
–Educated guesses, no media
–Same software as observers
–Well-known basic applications
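
The scoring rule above (correct minus incorrect answers, over a sampled test) can be sketched in a few lines of Python. The function names `sample_test` and `bet_score`, the dictionary layout, and the choice to ignore unanswered observations are assumptions made for illustration; the slides give only the rule itself.

```python
import random


def sample_test(observation_ids, k, seed=None):
    """Draw k observation ids without replacement to form one test."""
    rng = random.Random(seed)
    return rng.sample(list(observation_ids), k)


def bet_score(answers, truth):
    """Score = number of correct answers minus number of incorrect answers.

    answers: dict mapping observation id -> the subject's judgment (True/False)
    truth:   dict mapping observation id -> the actual truth value
    Observations the subject never answered add nothing (an assumption).
    """
    correct = sum(1 for obs_id, judged in answers.items() if judged == truth[obs_id])
    incorrect = len(answers) - correct
    return correct - incorrect
```

For example, a subject with two correct answers and one incorrect answer scores 1. With two-way true/false judgments, pure guessing has an expected score of zero under this rule, which gives a natural baseline for the control scores.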

Illustration
Corpus
–20 meetings × ~40 minutes ≈ 13 hrs 20 mins of recordings
Observations
–60 observers
–3 observers watch each meeting
–18 observation-pairs/hour
–6 × real-time ≈ 240 hours observation time
–216 observation-pairs/meeting, or 4,320 observation-pairs total
Testing
–10 subjects each watch 8 meetings, in 2 hours 40 mins per subject
–4 subjects watch each meeting, 26 hours 40 mins total subject time
–1 answer per minute, 160 answers/subject ≈ 1,600 answers total
Significance
–Assume: binomial distribution of results, 90% answered correctly
–Confidence interval: 88.2% to 91.6%, with 95% confidence level
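
The arithmetic in this illustration can be checked with a short Python sketch. The constants are read off the slide, taking the corpus as 20 meetings of 40 minutes each; the function names are hypothetical. The slide does not say how its confidence interval was computed, so the normal-approximation interval used here is an assumption. It gives roughly 88.5% to 91.5% for 1,600 answers at 90% correct, close to but not identical with the slide's 88.2% to 91.6%.

```python
import math

# Figures taken from the slide.
MEETINGS = 20
MEETING_MINUTES = 40
OBSERVERS_PER_MEETING = 3
OBSERVER_SLOWDOWN = 6            # observers work at 6 x real-time
PAIRS_PER_OBSERVER_HOUR = 18
SUBJECTS = 10
MEETINGS_PER_SUBJECT = 8
MINUTES_PER_SUBJECT = 160        # 2 hours 40 mins
ANSWERS_PER_MINUTE = 1


def illustration():
    """Recompute the totals quoted in the illustration."""
    observer_minutes = OBSERVER_SLOWDOWN * MEETING_MINUTES        # per observer, per meeting
    pairs_per_observer = observer_minutes * PAIRS_PER_OBSERVER_HOUR // 60
    pairs_per_meeting = OBSERVERS_PER_MEETING * pairs_per_observer
    return {
        "corpus_minutes": MEETINGS * MEETING_MINUTES,
        "observers": MEETINGS * OBSERVERS_PER_MEETING,
        "observation_hours": MEETINGS * OBSERVERS_PER_MEETING * observer_minutes // 60,
        "pairs_per_meeting": pairs_per_meeting,
        "pairs_total": MEETINGS * pairs_per_meeting,
        "subjects_per_meeting": SUBJECTS * MEETINGS_PER_SUBJECT // MEETINGS,
        "subject_minutes_total": SUBJECTS * MINUTES_PER_SUBJECT,
        "answers_total": SUBJECTS * MINUTES_PER_SUBJECT * ANSWERS_PER_MINUTE,
    }


def normal_approx_interval(p, n, z=1.96):
    """Normal-approximation interval for a binomial proportion.

    z = 1.96 corresponds to a 95% confidence level. This method is an
    assumption; the slide does not name the one it used.
    """
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width
```

Calling `normal_approx_interval(0.9, illustration()["answers_total"])` reproduces the interval mentioned above; a Wilson or exact binomial interval would give slightly different bounds.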

Summary
Performance, not judgment
–Subjects are measured in performance of tasks
Independent of experimenter perception
–Observers indirectly decide the tasks
Directly comparable numeric scores
–Standard methods, standard scores
Replicable
–Publicly accessible Web-site
–All media available for download
–Tests and scoring on-line

Questions…?
Is this a good method?
Do you recognise the problem?
Would you use this method?
Do you have a browser to test?
Do you know of an existing MM corpus?
…