ML Approaches – Conceptual Stuff
Nitin Kohli, DS W210 – Capstone Project


Sequence Matching – Marathon Runner Analogy
- Imagine you are watching a runner run a marathon.
- During the marathon, the runner reaches various checkpoints, and their time is recorded at each one.
- For instance, if there are 26 checkpoints in the race and we know the runner's times at the first 3 checkpoints, we can use this information to deduce the time at the 4th checkpoint, the 5th checkpoint, and so on.
- In general, we can infer the time to complete the remainder of the race for that particular runner.

Sequence Matching
- We apply this analogy to trains within the BART system.
- The trains depart a "starting" station at a particular time and check in at checkpoints (train stops) along the way.
- This information gives us a partial story of the sequence of arrival times for the train.
- To deduce the remaining times, we can match these incomplete sequences against complete historical sequences and read off the next arrival time.
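The matching step can be sketched as follows. This is a minimal illustration, not the project's actual code; the function names and the toy arrival times are made up.

```python
# Sketch of the sequence-matching idea (illustrative; function and data
# names are hypothetical, not taken from the project code).

def match_distance(partial, historical):
    """Sum of absolute differences between a partial sequence of
    arrival times and the same-length prefix of a historical one."""
    return sum(abs(p - h) for p, h in zip(partial, historical))

def predict_next_arrival(partial, history):
    """Find the historical complete sequence whose prefix best matches
    the observed partial sequence, and read off its next arrival time."""
    best = min(history, key=lambda seq: match_distance(partial, seq))
    return best[len(partial)]

# Times expressed as minutes past midnight for simplicity.
history = [
    [392, 400, 402, 404, 405],   # a past run of this line
    [395, 404, 407, 410, 412],   # a slower past run
]
observed = [392, 400, 402]       # train observed at the first 3 stops
print(predict_next_arrival(observed, history))  # -> 404
```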

Lag Time Analysis
- Unlike the marathon runner in our previous analogy, once a train arrives at a given station it does not immediately continue.
- It pauses for a bit to allow passengers to get on and off the train before continuing.
- Thus, we need to supplement the arrival time from sequence matching by accounting for the lag time at a given station.
- This is done using a ridge regression with features such as (but not limited to):
  - Length of the train
  - Which stop the train is at
  - Time of the (estimated) arrival
  - Whether the arrival is in the AM or PM
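A ridge regression of this kind can be sketched with the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy, written out here for two of the listed features. The feature values and lag times below are invented for illustration, and the model is kept deliberately tiny (two features, no intercept); the real model uses more features.

```python
# Minimal ridge-regression sketch for lag-time prediction (illustrative;
# the feature rows and targets are made up, not the project's data).
# Model: lag = w0 * train_length + w1 * is_am, fit via the closed-form
# ridge solution, specialized to 2 features so no linear-algebra
# library is needed.

def ridge_2feature(X, y, lam):
    # Accumulate X^T X (symmetric 2x2) and X^T y (2-vector).
    a = b = c = t0 = t1 = 0.0
    for (x0, x1), yi in zip(X, y):
        a += x0 * x0; b += x0 * x1; c += x1 * x1
        t0 += x0 * yi; t1 += x1 * yi
    a += lam; c += lam                 # ridge penalty on the diagonal
    det = a * c - b * b                # invert the 2x2 matrix explicitly
    w0 = (c * t0 - b * t1) / det
    w1 = (a * t1 - b * t0) / det
    return w0, w1

# Hypothetical rows: (number of cars, 1 if AM else 0) -> lag in seconds.
X = [(10, 1), (10, 0), (8, 1), (8, 0)]
y = [35.0, 30.0, 30.0, 25.0]
w = ridge_2feature(X, y, lam=0.1)
lag = w[0] * 10 + w[1] * 1   # predicted lag for a 10-car AM train
print(round(lag, 1))
```

In practice a library implementation (e.g. scikit-learn's `Ridge`) would be used instead of hand-rolled normal equations; the point here is only the shape of the computation.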

Summary: Sequence Matching -> Lag Time Prediction -> Updated Departure Time -> Repeat
(The accompanying diagram alternates between the sequence matching model and the lag model.)

Tech Summary of System-Level Prediction
1. The user enters various information.
2. We first need to tell the user when the train will arrive at the selected station for departure.
   - This means we need to query the MySQL db to find the most recent trains heading in the direction of the user.
   - Then, we use the previous train stops as input to perform a sequence match.
   - Once we have a sequence match, we can predict from the matched sequences.
   - This gives us a predicted arrival time at the next stop.
   - But at each stop, the train will wait some time before departing from that station.
   - This is where the lag_times model comes in: the current stop, length of the train, etc. are used to predict how long the train will wait at a given station.
3. Repeat this process until we have predictions for both the departure station and the arrival station.
4. Output these values back to the user in the UI.
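The alternating loop in steps 2–3 can be sketched as below. The function names, the toy models, and the stop names are hypothetical stand-ins for the real sequence-matching model, lag model, and database query.

```python
# Sketch of the stop-by-stop prediction loop (all names are hypothetical
# stand-ins for the real models and the MySQL query).

def predict_trip(observed_times, stops_remaining, match_model, lag_model):
    """Alternate sequence-match arrival predictions with lag-time
    predictions until the user's arrival station is reached."""
    times = list(observed_times)
    for stop in stops_remaining:
        arrival = match_model(times)       # predicted arrival at next stop
        lag = lag_model(stop, len(times))  # predicted dwell time there
        times.append(arrival + lag)        # updated departure time
    return times[-1]

# Toy stand-ins: each hop takes 3 minutes, trains dwell 0.5 minutes.
arrival_time = predict_trip(
    observed_times=[0, 3, 6],
    stops_remaining=["StopD", "StopE"],
    match_model=lambda ts: ts[-1] + 3,
    lag_model=lambda stop, i: 0.5,
)
print(arrival_time)  # -> 13.0
```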

ML Approaches – “Mathy” Stuff Nitin Kohli DS W210 – Capstone Project

The following slides contain (for the most part) all the math used to construct the system-level prediction. They include the first model, which was iterated on to arrive at the more accurate second model.

Conceptual Framework: k-Nearest Sequences
- In the picture on the right, note that there are 5 distinct paths.
- Within each path, trains can run in either direction; thus, there are 10 directional paths.
- For each directional path, we denote the stops using {1, 2, ..., n}.
- For example, on the orange line from Richmond to Fremont:
  - 1 refers to Richmond
  - 2 refers to El Cerrito del Norte
  - 3 refers to El Cerrito Plaza, etc.
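Under this framework, a prediction on one directional path can be sketched as a k-nearest-sequences lookup: find the k historical runs whose prefixes are closest to the observed partial run and average their next arrival. This is an illustrative sketch under the assumption of L1 distance and a simple mean; the function name and toy data are made up.

```python
# Sketch of k-nearest sequences on one directional path: stops are
# indexed 1..n, and each historical run is a list of arrival times
# (minutes past midnight) at stops 1, 2, ..., n. Illustrative only.

def k_nearest_next(partial, history, k):
    """Average the next arrival time over the k historical runs whose
    prefixes are closest (in L1 distance) to the observed partial run."""
    def dist(seq):
        return sum(abs(p - h) for p, h in zip(partial, seq))
    nearest = sorted(history, key=dist)[:k]
    i = len(partial)                     # index of the next stop
    return sum(seq[i] for seq in nearest) / k

history = [
    [392, 400, 402, 404],
    [392, 401, 403, 406],
    [395, 404, 407, 410],
]
print(k_nearest_next([392, 400, 402], history, k=2))  # -> 405.0
```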

Conceptual Framework Continued


Approach 1: Complete the whole sequence

Approach 1: Empirical Results

Predicted: 6:32  6:40  6:42  6:44  6:45
Actual:    6:32  6:40  6:41  6:43  6:44

Approach 2: A Dynamic Probabilistic System

Solution: Invoke the Weak Law of Large Numbers
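The role of the weak law of large numbers here can be illustrated with a small simulation: the average next-arrival time over many matched historical runs concentrates around the true mean, whereas an average over only a few runs stays noisy. The distribution and its parameters below are invented purely for illustration.

```python
# Illustration of the weak law of large numbers: the sample mean of
# many matched runs concentrates around the true mean travel time.
# Simulated data; the Gaussian and its parameters are made up.
import random

random.seed(0)
true_mean = 4.0   # "true" minutes between two stops (assumed)
samples = [random.gauss(true_mean, 0.5) for _ in range(10_000)]

small_avg = sum(samples[:10]) / 10          # few matched runs: noisy
large_avg = sum(samples) / len(samples)     # many matched runs: stable
print(abs(small_avg - true_mean))
print(abs(large_avg - true_mean))           # near 0
```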

Dynamic Probabilistic System Algorithm