Protecting Users Pavel Dmitriev, Microsoft Analysis & Experimentation

Presentation transcript:

The Challenge
- As more and more experiments are run, the possibility of user harm increases
  - Less manual monitoring of experiments
  - A buggy feature or a bad idea may make it to real users
  - Interactions are possible between concurrently running experiments
  - The experimentation system itself may have issues and can hurt users (link)
- Need to minimize harm to users!
- "If you have to kiss a lot of frogs to find a prince, find more frogs and kiss them faster and faster" -- Mike Moran, Do It Wrong Quickly

Fast Auto-Detection and Shutdown of Bad Experiments
- The experimentation system needs to
  - Automatically analyze scorecards
  - Detect bad experiments
  - Send alerts to experimenters
  - In cases of extreme badness, shut down the experiment automatically
- The challenge is doing it fast (seconds to minutes)
  - Requires a real-time data pipeline
  - Data at the beginning of an experiment may be noisy
    - Small amount of data
    - Easily dominated by a few very active users or bots
  - Aggregating from event level to user level helps reduce false positives
Notes: The experimentation system needs to not just report the results, but be able to analyze them and take action based on them. Example: a bot that generated 100 queries in the first 30 seconds.
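The point about aggregating from event level to user level can be illustrated with a minimal Python sketch. The event records, the `error` field, and both function names are hypothetical and chosen only for illustration; the slides do not describe the actual scorecard computation.

```python
from collections import defaultdict
from statistics import mean

def event_level_rate(events):
    """Error rate where every event counts equally."""
    return sum(e["error"] for e in events) / len(events)

def user_level_rate(events):
    """Average of per-user error rates, so each user counts once."""
    per_user = defaultdict(list)
    for e in events:
        per_user[e["user"]].append(e["error"])
    return mean(sum(v) / len(v) for v in per_user.values())

# Hypothetical early data: 9 ordinary users with one clean query each,
# plus one bot that issued 100 failing queries.
events = [{"user": f"u{i}", "error": 0} for i in range(9)]
events += [{"user": "bot", "error": 1} for _ in range(100)]

print(round(event_level_rate(events), 3))  # 100/109 -> 0.917
print(round(user_level_rate(events), 3))   # 1/10   -> 0.1
```

At event level the single bot makes the treatment look catastrophic; at user level it is one unhealthy user out of ten, which is less likely to trigger a false alarm.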

Starting Small
- Start with a small percentage, e.g. 0.5%
  - This should be enough to detect outrageously bad experiments
  - Once verified that things look OK, can [automatically] ramp up
- Run the experiment with partial exposure
  - E.g. only 1 out of 10 queries in the treatment actually gets served the treatment experience
  - Once verified that things look OK, ramp up the exposure to 100%
  - The advantage is that no single user can be stuck in a bad experience for a long time
  - The disadvantage is an inconsistent user experience, and dilution of user-level metrics
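One possible way to combine a small starting percentage with partial exposure is sketched below in Python. Everything here is an assumption made for illustration: the bucket count, the key prefixes `assign:` and `expose:`, and the function names are not from the talk, and a production system would likely differ.

```python
import hashlib

BUCKETS = 1000  # assumed resolution: one bucket = 0.1% of traffic

def bucket(key: str) -> int:
    """Deterministically map a string key to a bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % BUCKETS

def in_treatment(user_id: str, traffic_pct: float) -> bool:
    """User-level assignment; traffic_pct=0.5 means 0.5% of users."""
    return bucket(f"assign:{user_id}") < traffic_pct * BUCKETS / 100

def serves_treatment(user_id: str, query_id: str,
                     traffic_pct: float, exposure: float) -> bool:
    """Partial exposure: only a fraction of a treatment user's queries
    (exposure=0.1 means about 1 in 10) get the treatment experience."""
    if not in_treatment(user_id, traffic_pct):
        return False
    return bucket(f"expose:{user_id}:{query_id}") < exposure * BUCKETS

users = [f"user{i}" for i in range(20000)]
small = {u for u in users if in_treatment(u, 0.5)}
larger = {u for u in users if in_treatment(u, 5.0)}
# Ramping up only adds users; nobody already in treatment is dropped.
print(len(small), len(larger), small <= larger)
```

Because assignment compares a fixed bucket against a growing threshold, users exposed at 0.5% stay in the treatment after the ramp-up, which keeps their experience consistent across ramp stages.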

Prevent and Detect Interactions
- An interaction happens if the effect of exposing users to several experiments at the same time is not the same as adding up the effects of the individual experiments
- “The whole is greater than the sum of its parts” -- Aristotle
- “I told about the whole being greater than the sum of its parts. It's that way with people, too, he said, only with people it's sometimes that the whole is less than the sum of the parts.” -- Wendelin Van Draanen, Flipped

Prevent and Detect Interactions
- Example (antagonistic): E1 changes the font color to blue, E2 changes the background color to blue
- Example (synergetic): E1,…,EN make the page header more convenient on page1,…,pageN
- Our experience is that, when a prevention setup is in place, interactions are rare
  - About a dozen interactions per year in Bing, with over 10,000 experiments run

Interaction Prevention
- When suspecting an interaction:
  - Run experiments sequentially (slow)
  - Run non-overlapping experiments (the two experiments need to use the same hash seed, and get assigned to different portions of the hash space)
[Figure: Overlapping experiments: user (uid1) is mapped by f(hash1,uid1) into the hash1 space for E1 and by f(hash2,uid1) into the hash2 space for E2. Non-overlapping experiments: user (uid1) is mapped by f(hash3,uid1) into a single hash3 space that is split between E1 and E2.]
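The difference between the two setups can be sketched in Python as follows. The seeds `hash1`, `hash2`, and `hash3` follow the labels in the slide, but the hash function, the bucket count, and the 50/50 splits are assumptions made for this illustration only.

```python
import hashlib

BUCKETS = 1000  # assumed size of each hash space

def f(seed: str, uid: str) -> int:
    """Stand-in for f(hash, uid): position of uid in the seed's hash space."""
    digest = hashlib.sha256(f"{seed}:{uid}".encode("utf-8")).hexdigest()
    return int(digest, 16) % BUCKETS

def overlapping(uid: str):
    """Different seeds: every user is in both E1 and E2, independently."""
    arm1 = "T1" if f("hash1", uid) < BUCKETS // 2 else "C1"
    arm2 = "T2" if f("hash2", uid) < BUCKETS // 2 else "C2"
    return arm1, arm2

def non_overlapping(uid: str):
    """Same seed: E1 owns the lower half of the space, E2 the upper half,
    so a user is in exactly one of the two experiments."""
    pos = f("hash3", uid)
    if pos < BUCKETS // 2:
        return "E1", ("T1" if pos < BUCKETS // 4 else "C1")
    return "E2", ("T2" if pos < 3 * BUCKETS // 4 else "C2")

uids = [f"uid{i}" for i in range(4000)]
print(sorted({overlapping(u) for u in uids}))      # all four arm combinations
print(sorted({non_overlapping(u) for u in uids}))  # one experiment per user
```

With a shared seed, no user can see both treatments, so the two experiments cannot interact; the cost is that each experiment gets only part of the traffic.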

Interaction Detection
- Given two overlapping experiments E1(T1,C1) and E2(T2,C2) and a metric M, there is an interaction if the results for M in E1 are statistically significantly different in the segment of users who are in T2 compared to the segment of users who are in C2.
- Note: this needs to be run for all pairs of overlapping experiments.
  - Complexity = O(#metrics * #experiments^2)
  - Need to control for type I errors (e.g. Bonferroni correction)
[Figure: all E1 users, split into T1 and C1 and further segmented by T2 and C2; the test is (T1-C1 | T2) =?= (T1-C1 | C2)]
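The comparison (T1-C1 | T2) =?= (T1-C1 | C2) could be sketched in Python as below. The slides do not say which statistical test is used; a two-sided z-test on the difference of the two deltas is assumed here, and the function names and synthetic data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def delta_and_var(treatment, control):
    """Difference of means and the variance of that difference."""
    d = mean(treatment) - mean(control)
    v = variance(treatment) / len(treatment) + variance(control) / len(control)
    return d, v

def interaction_p_value(t1_in_t2, c1_in_t2, t1_in_c2, c1_in_c2):
    """Two-sided p-value for (T1-C1 | T2) == (T1-C1 | C2), z-test assumed."""
    d_t2, v_t2 = delta_and_var(t1_in_t2, c1_in_t2)
    d_c2, v_c2 = delta_and_var(t1_in_c2, c1_in_c2)
    z = (d_t2 - d_c2) / sqrt(v_t2 + v_c2)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def bonferroni_threshold(alpha, n_metrics, n_experiments):
    """Divide alpha by the number of tests: metrics x experiment pairs."""
    n_pairs = n_experiments * (n_experiments - 1) // 2
    return alpha / (n_metrics * n_pairs)

# Synthetic per-user metric values (hypothetical, 200 users per cell).
base = [0, 1, 2, 3, 4] * 40
up = [x + 1 for x in base]    # E1 treatment helps by +1
down = [x - 1 for x in base]  # E1 treatment hurts by -1

threshold = bonferroni_threshold(0.05, n_metrics=10, n_experiments=5)
p_same = interaction_p_value(up, base, up, base)      # same delta in T2 and C2
p_flip = interaction_p_value(down, base, up, base)    # delta flips sign in T2
print(threshold, p_same, p_flip < threshold)
```

The Bonferroni divisor grows with #metrics * #experiments^2, which is why the number of pairwise tests, and not only their cost, has to be accounted for when many experiments overlap.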

Summary
- Automated methods to protect users are required as the number of experiments ramps up
- The experimentation system should not just report the results, but should auto-analyze them and take action:
  - Send alerts
  - Auto-shutdown bad experiments
- Start small and ramp up
- Prevent and detect interactions