Midterm Exam Review Notes
William Cohen



General hints in studying
Understand what you've done and why – there will be questions that test your understanding of the techniques you implemented:
– why will/won't this shortcut work?
– what does the analysis say about the method?
This course is mostly about learning meets computation – there won't be many "pure 601" questions – but 601 topics discussed in lecture are fair game.

General hints in studying
Techniques covered in class but not in the assignments:
– know when/where/how to use them
– that usually includes understanding the analytic results presented in class
– e.g.: is the lazy regularization update an approximation or not? when does it help? when does it not?

General hints in studying
What about assignments you haven't done?
– You should read through the assignments and be familiar with the algorithms being implemented.
There won't be questions about programming details that you could look up online, but:
– you should know how architectures like Hadoop work (e.g., when and where they communicate, how they recover from failure, …)
– you should be able to sketch out simple map-reduce algorithms and answer questions about GuineaPig
– you should be able to read programs written in workflow operators and discuss how they'd be implemented and/or how to optimize them
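As a concrete example of the kind of simple map-reduce algorithm the slide mentions, here is word count sketched in plain Python (not actual Hadoop or GuineaPig code; the function names are my own, and a real cluster would run the map and reduce phases in parallel with the shuffle done by the framework):

```python
import re
from collections import defaultdict

def map_phase(documents):
    """Mapper: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in re.findall(r"\w+", doc.lower()):
            yield (word, 1)

def shuffle(pairs):
    """Shuffle/sort: group values by key (Hadoop does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reduce_phase(grouped):
    """Reducer: sum the counts for each word."""
    for word, values in grouped:
        yield (word, sum(values))

docs = ["the cat sat", "the dog sat"]
counts = dict(reduce_phase(shuffle(map_phase(docs))))
```

Being able to write this sort of sketch for a new problem (and say what the keys, values, and reducer do) is the skill the exam is after.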

General hints in studying
There are no detailed questions on the guest speakers, but there might be high-level questions.

General hints in exam taking
You can bring in one 8½ by 11 inch sheet (front and back).
Look over everything quickly and skip around – probably nobody will know everything on the test.
If you're not sure what we've got in mind, state your assumptions clearly in your answer.
– This is ok even on true/false questions.
If you look at a question and don't know the answer:
– we probably haven't told you the answer
– but we've told you enough to work it out
– imagine arguing for some answer and see if you like it

Outline – major topics before the midterm
Hadoop
– stream-and-sort is how I ease you into that, not really a topic on its own
Phrase finding and similarity joins
Parallelizing learners (perceptron, …)
Hash kernels and streaming SGD
Distributed SGD for matrix factorization
Some of these are easier to ask questions about than others.

And now… guest lecture!

MORE REVIEW SLIDES

HADOOP

What data gets lost if the job tracker is rebooted?
If I have a 1TB file and shard it 1000 ways, will it take longer than sharding it 10 ways?
Where should a combiner run?

$ hadoop fs -ls rcv1/small/sharded
Found 10 items
-rw-r--r--   3 … :28 /user/wcohen/rcv1/small/sharded/part…
(… nine more part files, identical except for the shard number …)
$ hadoop fs -tail rcv1/small/sharded/part…
weak as the arrival of arbitraged cargoes from the West has put the local market under pressure… M14,M143,MCAT The Brent crude market on the Singapore International …

Where is this data? How many disks is it on?
If I set up a directory on /afs that looks the same, will it work the same? What about a local disk?


Why do I see this same error over and over again?

PARALLEL LEARNERS

Parallelizing perceptrons – take 2
(Diagram: the instances/labels are split into example subsets 1–3; starting from the previous w, each subset is used to compute a local weight vector w-1, w-2, w-3; the local vectors are then combined by some sort of weighted averaging into a new w.)
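The scheme the diagram describes can be sketched in plain Python/NumPy as follows (a minimal iterative-parameter-mixing sketch with uniform averaging; the function names are my own, and a real implementation would train the shards in parallel, e.g. as map-reduce jobs):

```python
import numpy as np

def perceptron_epoch(w, X, y):
    """One perceptron pass over a shard, starting from the shared weights w."""
    w = w.copy()
    for xi, yi in zip(X, y):
        if yi * (w @ xi) <= 0:   # mistake: standard perceptron update
            w += yi * xi
    return w

def iterative_parameter_mixing(shards, n_features, epochs=10):
    """Train a local perceptron per shard, then average (uniform mixing)."""
    w = np.zeros(n_features)
    for _ in range(epochs):
        local = [perceptron_epoch(w, X, y) for X, y in shards]
        w = np.mean(local, axis=0)   # combine by uniform weighted averaging
    return w

# toy demo: two linearly separable shards
X1, y1 = np.array([[2.0, 1.0], [-2.0, -1.0]]), np.array([1, -1])
X2, y2 = np.array([[1.0, 2.0], [-1.0, -2.0]]), np.array([1, -1])
w = iterative_parameter_mixing([(X1, y1), (X2, y2)], n_features=2)
```

The key design point is that only the weight vectors cross shard boundaries, not the examples, which is what makes the approach cheap to distribute.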

A theorem
Corollary: if we weight the vectors uniformly, then the number of mistakes is still bounded – i.e., this is "enough communication" to guarantee convergence.
I probably won't ask about the proof, but I could definitely ask about the theorem.
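For reference, the classical perceptron mistake bound that this corollary extends (stated from memory, so treat the exact form as an assumption): if some unit vector separates the data with margin γ and all examples have norm at most R, the perceptron makes at most (R/γ)² mistakes; the corollary says a bound of this kind survives uniform parameter mixing.

```latex
\text{If } \exists\,\mathbf{u},\ \|\mathbf{u}\|=1,\ \text{such that } y_i\,(\mathbf{u}\cdot\mathbf{x}_i) \ge \gamma
\ \text{and}\ \|\mathbf{x}_i\| \le R \text{ for all } i,
\quad \text{then the number of mistakes } k \le \left(\tfrac{R}{\gamma}\right)^{2}.
```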

NAACL 2010
What does the word "structured" mean here? Why is it important? Would the results be better or worse with a regular perceptron?

STREAMING SGD

Learning as optimization for regularized logistic regression
Algorithm: with a naive (dense) regularization update, time goes from O(nT) to O(mVT), where
– n = number of non-zero entries
– m = number of examples
– V = number of features
– T = number of passes over the data
What if you had a different update for the regularizer?
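One answer the course develops is the lazy ("just-in-time") regularization update. A plain-Python sketch of the idea (variable names are my own, and the exact form of the deferred shrinkage is one common variant, not necessarily the lecture's): instead of shrinking every weight on every example, record when each weight was last touched and apply the accumulated shrinkage only when its feature actually appears, so each step costs O(active features) rather than O(V).

```python
import math
from collections import defaultdict

def lazy_sgd_lr(examples, n_passes=5, lr=0.1, mu=0.01):
    """SGD for L2-regularized logistic regression with lazy shrinkage.
    examples: list of (features, label), features a dict {feat: value},
    label in {0, 1}."""
    w = defaultdict(float)
    last = defaultdict(int)   # step at which each weight was last updated
    t = 0
    for _ in range(n_passes):
        for feats, y in examples:
            t += 1
            # apply the deferred shrinkage (1 - lr*mu)^(t - last[j]) lazily,
            # only for features active in this example
            for j in feats:
                w[j] *= (1.0 - lr * mu) ** (t - last[j])
                last[j] = t
            p = 1.0 / (1.0 + math.exp(-sum(w[j] * v for j, v in feats.items())))
            for j, v in feats.items():
                w[j] += lr * (y - p) * v
    # flush the remaining shrinkage so every weight is current
    for j in w:
        w[j] *= (1.0 - lr * mu) ** (t - last[j])
        last[j] = t
    return dict(w)
```

Whether this deferred update is exact or an approximation (and why) is precisely the kind of question the earlier study-hints slide flags.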

Formalization of the "Hash Trick"
First: review of kernels.
What is it? How does it work? What aspects of performance does it help?
What did we say about it formally?
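As a reminder of the mechanics being formalized, here is a minimal sketch of the hash trick itself (feature hashing into a fixed-size vector; the signed variant and the md5-based hash below are illustrative choices of mine, not the lecture's):

```python
import hashlib

def _h(s):
    """Deterministic string hash (Python's built-in hash is salted per run)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def hashed_features(tokens, n_buckets=2 ** 10):
    """Map arbitrary string features into a fixed-size sparse vector.
    A second hash picks a sign, which keeps collisions unbiased
    in expectation."""
    x = {}
    for tok in tokens:
        idx = _h(tok) % n_buckets            # bucket index, independent of V
        sign = 1 if _h("sign:" + tok) % 2 == 0 else -1
        x[idx] = x.get(idx, 0) + sign
    return x
```

The performance win is in memory and indexing: the weight vector has a fixed size regardless of how many distinct features the stream contains, at the cost of occasional collisions.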