Real World Interactive Learning

Slides:



Advertisements
Similar presentations
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
Advertisements

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or.
Feature: Identity Management - Login © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or.
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
Feature: Reprint Outstanding Transactions Report © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product.
Feature: Purchase Requisitions - Requester © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names.
MIX 09 4/15/ :14 PM © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
Feature: Payroll and HR Enhancements © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or.
Interactivity Navigating a data model Working with large quantities of data Entry Editing and adding data User feedback and validation Presentation.
Co- location Mass Market Managed Hosting ISV Hosting.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or.
Windows 7 Training Microsoft Confidential. Windows ® 7 Compatibility Version Checking.
Feature: Purchase Order Prepayments II © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are.
Feature: OLE Notes Migration Utility
Feature: Web Client Keyboard Shortcuts © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are.
Feature: SmartList Usability Enhancements © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names.
Session 1.
Built by Developers for Developers…. © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names.
 Rico Mariani Architect Microsoft Corporation.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or.
Feature: Assign an Item to Multiple Sites © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or.
Feature: Print Remaining Documents © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or.
Connect with life Connect with life
NEXT: Overview – Sharing skills & code.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or.
Feature: Document Attachment –Replace OLE Notes © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product.
Feature: Suggested Item Enhancements – Sales Script and Additional Information © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows.
Feature: Customer Combiner and Modifier © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are.
Feature: Employee Self Service Timecard Entry © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names.
Ian Ellison-Taylor General Manager Microsoft Corporation PC27.
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or.
demo Instance AInstance B Read “7” Write “8”

customer.
demo © 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names.
demo Demo.
Feature: Void Historical/Open Transaction Updates © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product.
demo QueryForeign KeyInstance /sm:body()/x:Order/x:Delivery/y:TrackingId1Z
Feature: Suggested Item Enhancements – Analysis and Assignment © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and.
projekt202 © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are.
The CLR CoreCLRCoreCLR © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product.
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks.
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or.

MIX 09 4/17/2018 4:41 PM © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
Возможности Excel 2010, о которых следует знать
Title of Presentation 11/22/2018 3:34 PM
Baseline: How Are We Doing Now?
Office Mac /30/2018 © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
Title of Presentation 12/2/2018 3:48 PM
1/3/2019 1:21 PM © 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
28 days.
Building SaaS Solutions on Windows Azure
Feature: Document Attachment - Flow from Master Records
8/04/2019 9:13 PM © 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
Виктор Хаджийски Катедра “Металургия на желязото и металолеене”
Feature: Multi-user Editing Allowed in RMA Entry
WINDOWS AZURE A LAP AROUND PLATFORM THE Steve Marx
PENSACOLA ENERGY WORK PLAN OCTOBER 10, 2016
Title of Presentation 5/12/ :53 PM
Шитманов Дархан Қаражанұлы Тарих пәнінің
Title of Presentation 5/24/2019 1:26 PM
5/24/2019 6:44 PM 1/8/18 Bell #10 In a world governed by the gods, is there any room for human will? Do human choices make a difference? EXPLAIN © 2007.
日本初公開!? Vista の新機能を実演 とっちゃん わんくま同盟 7/23/2019 9:09 AM
Title of Presentation 7/24/2019 8:53 PM
Presentation transcript:

Real World Interactive Learning Alekh Agarwal John Langford ICML Tutorial, August 6 Slides and full references at http://hunch.net/~rwil

Accurate digit classifier 1/23/2018 The Supervised Learning Paradigm Training examples 1 5 4 3 7 9 6 2 Accurate digit classifier ML takes a different approach. By applying concepts from a range of fields, including statistics, probability theory, and so on, we can build an ML system that “learns” to recognize handwritten digits by being trained on thousands, or even millions, of examples. So, in order to get an accurate digit classifier, all we need to do is acquire lots of training examples, with labels (this is called “labeled training data”), and then feed it into the ML system. Training labels 2 Supervised Learner © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Supervised Learning is cool

How about news? Repeatedly: Observe features of user+articles Choose a news article. Observe click-or-not Goal: Maximize fraction of clicks Add Wellness

A standard pipeline Collect (𝑢𝑠𝑒𝑟, 𝑎𝑟𝑡𝑖𝑐𝑙𝑒, 𝑐𝑙𝑖𝑐𝑘) information. Build 𝑓𝑒𝑎𝑡𝑢𝑟𝑒𝑠(𝑢𝑠𝑒𝑟,𝑎𝑟𝑡𝑖𝑐𝑙𝑒) Learn 𝑃 𝑐𝑙𝑖𝑐𝑘 𝑓𝑒𝑎𝑡𝑢𝑟𝑒𝑠(𝑢𝑠𝑒𝑟,𝑎𝑟𝑡𝑖𝑐𝑙𝑒)) Act: arg max 𝑎𝑟𝑡𝑖𝑐𝑙𝑒𝑠 𝑃 (𝑐𝑙𝑖𝑐𝑘|𝑓𝑒𝑎𝑡𝑢𝑟𝑒𝑠(𝑢𝑠𝑒𝑟,𝑎𝑟𝑡𝑖𝑐𝑙𝑒)) Deploy in A/B test for 2 weeks A/B test fails  Why?

A: Need Right Signal for Right Answer Q: What goes wrong? Is Ukraine interesting to John ? A: Need Right Signal for Right Answer

Q: What goes wrong? Model value over time A: The world changes!

Interactive Learning Learning Action (selected news story) Features (user history, news stories) Consequence (click-or-not)

Q: Why Interactive Learning? Use free interaction data rather than expensive labels

Q: Why Interactive Learning? AI: A function programmed with data AI: A function programmed with data AI: An economically viable digital agent that explores, learns, and acts

Flavors of Interactive Learning Full Reinforcement Learning: Special Domains +Right Signal, -Nonstationary Bad, -$$$ +AI Contextual Bandits: Immediate Reward RL ±Rightish Signal, +Nonstationary ok, +$$$, +AI Active Learning: Choose examples to label. -Wrong Signal, -Nonstationary bad, +$$$, -”not AI” Revisit after motivations

Ex: Which advice? Repeatedly: Observe features of user+advice Choose an advice. Observe steps walked Goal: Healthy behaviors

Other Real-world Applications News Rec: [LCLS ‘10] Ad Choice: [BPQCCPRSS ‘12] Ad Format: [TRSA ‘13] Education: [MLLBP ‘14] Music Rec: [WWHW ‘14] Robotics: [PG ‘16] Wellness/Health: [ZKZ ’09, SLLSPM ’11, NSTWCSM ’14, PGCRRH ’14, NHS ’15, KHSBATM ‘15, HFKMTY ’16]

Take-aways Good fit for many real problems

Outline Algs & Theory Overview Things that go wrong in practice Evaluate? Learn? Explore? Things that go wrong in practice Systems for going right Really doing it in practice

Contextual Bandits Repeatedly: Observe features 𝑥 Choose action 𝑎∈𝐴 Observe reward 𝑟 Goal: Maximize expected reward

Policies Policy maps features to actions. Policy = Classifier that acts.

Exploration Policy

Exploration Randomization Policy

Exploration Randomization Policy

Inverse Propensity Score(IPS) [HT ‘52] Given experience 𝑥,𝑎,𝑝,𝑟 and a policy 𝜋:𝑥→𝑎, how good is 𝜋? 𝑉 IPS 𝜋 = 1 𝑛 (𝑥,𝑎,𝑝,𝑟) 𝑟𝐼(𝜋 𝑥 =𝑎) 𝑝 Propensity Score

What do we know about IPS? Theorem: For all 𝜋, for all 𝐷(𝑥, 𝑟 ) E 𝑟 𝜋 𝑥 =E 𝑉 IPS 𝜋 = E 1 𝑛 𝑥,𝑎,𝑝,𝑟 𝑟𝐼 𝜋 𝑥 =𝑎 𝑝 Proof: For all (𝑥, 𝑟 ), 𝐸 𝑎∼ 𝑝 𝑟 𝑎 𝐼(𝜋 𝑥 =𝑎) 𝑝 𝑎 = 𝑎 𝑝 𝑎 𝑟 𝑎 𝐼(𝜋 𝑥 =𝑎) 𝑝 𝑎 = 𝑟 𝜋(𝑥)

Reward over time System’s actual online performance Offline estimate of system’s performance Add reference slide for evaluation Offline estimate of baseline’s performance

Better Evaluation Techniques Double Robust: [DLL ‘11] Weighted IPS: [K ’92, SJ ‘15] Clipping: [BL ’08]

Learning from Exploration [‘Z 03] Given Data 𝑥, 𝑎, 𝑝,𝑟 how to maximize E[ 𝑟 𝜋 𝑥 ]? Maximize E 𝑉 IPS 𝜋 instead! 𝑟 𝑎 = 𝑟/𝑝 if 𝜋 𝑥 =𝑎 0 otherwise Equivalent to: 𝑟 𝑎 ′ = 1 if 𝜋 𝑥 =𝑎 0 otherwise with importance weight 𝒓 𝒑 Importance weighted multiclass classification!

Vowpal Wabbit: Online/Fast learning BSD License, 10 year project Mailing List>500, Github>1K forks, >4K stars, >1K issues, >100 contributors Command Line/C++/C#/Python/Java/AzureML/Daemon

VW for Contextual Bandit Learning echo “1:2:0.5 | here are some features” | vw --cb 2 Format: <action>:<loss>:<probability> | features… Training on a large dataset: vw --cb 2 rcv1.cb.gz --ngram 2 --skips 4 -b 24 Result: 0.048616

Better Learning from Exploration Data Policy Gradient: [W ‘92] Offset Tree: [BL ’09] Double Robust for learning: [DLL ‘11] Multitask Regression: Unpublished, but in Vowpal Wabbit Weighted IPS for learning: [SJ ‘15]

Evaluating Online Learning Problem: How do you evaluate an online learning algorithm Offline? Answer: Use Progressive Validation [BKL ’99, CCG ‘04] Theorem: 1) Expected PV value = Uniform expected policy value. 2) Trust like a test set error.

How do you do Exploration? Simplest Algorithm: 𝜖-greedy. With probability 𝜖 act uniform random With probability 1−𝜖 act greedily

Better Exploration Algorithms Better algorithms maintain ensemble and explore amongst actions of this ensemble. Thompson Sampling: [T ‘33] EXP4: [ACFS ‘02] Epoch Greedy: [LZ ‘07] Polytime: [DHKKLRZ ‘11] Cover&Bag: [AHKLLS ‘14] Bootstrap: [EK ‘14]

Evaluating Exploration Algorithms Problem: How do you take the choice of examples acquired by an exploration algorithm into account? Answer: Rejection Sample from history. [DELL ‘12] Theorem: Realized history is unbiased up to length observed. Better versions: [DELL ‘14] & VW code

More Details! NIPS tutorial: http://hunch.net/~jl/interact.pdf John’s Spring 2017 Cornell Tech class (http://hunch.net/~mltf) with slides and recordings [Forthcoming] Alekh’s Fall 2017 Columbia class with extensive class notes.

Take-aways Good fit for many problems Fundamental questions have useful answers