Fast Cars, Big Data How Streaming Can Help Formula 1.

Slides:



Advertisements
Similar presentations
John Consiglio The Cooper Union February 24, 2009 Using Hardware-in-the-Loop Simulations to Improve EPA Emissions Testing.
Advertisements

© 2015 Ellen Friedman 1 Big Data Stories: Decisions That Drive Successful Projects Ellen Friedman Strata Conference San Jose 18 February 2015.
© 2014 MapR Technologies 1 Ted Dunning February 20, 2015.
This refers to the use of computers to represent a real life situation.
IGCSE ICT Computer Simulation.
Teaching and Learning in a Digital World. FaceBook Twitter Delicious Glogster LinkedIn.
SIGCOMM 2002 New Directions in Traffic Measurement and Accounting Focusing on the Elephants, Ignoring the Mice Cristian Estan and George Varghese University.
John Plummer Technical Specialist Data Platform Microsoft Ltd StreamInsight Complex Event Processing (CEP) Platform.
Modelling and Simulation 3.5 Science and the Environment.
Team W3: Anthony Marchetta Derek R. Ritchea David M. Roderick Adam Stoler Milestone 1: Jan 21 st Project Proposal Overall Project Objective: Design an.
Kaskad Technology Korrelera for Market Surveillance Candyce Edelen November 8, 2006.
Machine Learning as a Service
Distributed Time Series Database
We will start shortly…. DiView II DiView II Software Presented by: Daniele Posenato.
Mr/Ms Period ---- Class. Take a couple of minutes and write down a couple of sentences explaining what Science means to you.
Scott Fallen Sales Engineer, SQL Sentry Blog: scottfallen.blogspot.com.
Apache Hadoop on Windows Azure Avkash Chauhan
Computer Architecture Organization and Architecture
International Online Virtual Grand Prix Racing An exciting maths project for schools in international partnerships Val Brooks.
MAE 494/598 Group 3: Fabian Gadau Lucas Jaramillo Myrtle Lin Bryce Thompson Racetrack Optimization.
Database Design: Solving Problems Before they Start! Ed Pollack Database Administrator CommerceHub.
1© 2009 Autodesk Hardware Shade – Presenting Your Designs Hardware and Software Shading HW Shade Workflow Tessellation Quality Settings Lighting Settings.
Heuristic Evaluation May 4, 2016
© 2014 MapR Technologies 1 Ted Dunning. © 2014 MapR Technologies 2 Me, Us Ted Dunning, MapR Chief Application Architect, Apache Member –Committer PMC.
Data Analytics (CS40003) Introduction to Data Lecture #1
Basic Outputs to Troubleshooting Queries.
Export Services Deep Dive
Overview Modern chip designs have multiple IP components with different process, voltage, temperature sensitivities Optimizing mix to different customer.
SNS COLLEGE OF TECHNOLOGY
Marketing Data: 101 Charts & Graphs
Distributed Tracing How to do latency analysis for microservice-based applications Reshmi
Introduction to Spark Streaming for Real Time data analysis
Using Unity as an Animator and Simulator for PaypyrusRT Models
Contact Information Ted Dunning Chief Applications Architect at MapR Technologies Committer & PMC for Apache’s Drill, Zookeeper & others VP of Incubator.
Real Time Data with Azure and Power BI
File Format Benchmark - Avro, JSON, ORC, & Parquet
Neha Jain Shashwat Yadav
Heuristic Evaluation August 5, 2016
Statistics 202: Statistical Aspects of Data Mining
Modelling rich interaction
Software Engineering and Best Practices
Outline Introduction Characteristics of intrusion detection systems
Using Sequence Statistics to Fight Advanced Persistent Threats
A lap around DirectX game development tools
Application Level Fault Tolerance and Detection
Overview of Hadoop MapReduce MapReduce is a soft work framework for easily writing applications which process vast amounts of.
Lecture 10: Buffer Manager and File Organization
Sales Performance Definition Purpose Instructions
September 11, Ian R Brooks Ph.D.
Power BI Performance …Tips and Techniques.
Introduction to mobile app development Module 3 – Improving your App Studio app Lance McCarthy.
Fault Injection: A Method for Validating Fault-tolerant System
Tooling and Diagnostics
CSCI1600: Embedded and Real Time Software
Data collection and activity analysis
Learn. Imagine. Build. .NET Conf
Introduction to SAP HANA
Testing Mutable Objects
DISSERTATION ON CRYPTOGRAPHY.
Big DATA.
Actually, You CAN Measure the Impact of Customer Content on Sales
Bharat Prakash | Solutions Consultant
Polyglot Persistence: Column Stores
Introduction to ASP.NET Parts 1 & 2
Towards Formula SAE Driver/Vehicle Optimization
Towards Formula SAE Driver/Vehicle Optimization
End of day Calculator and special order parts tracking
Visual Data Flows – Azure Data Factory v2
The “perfect” data platform
Presentation transcript:

Fast Cars, Big Data How Streaming Can Help Formula 1

Contact Information Ted Dunning Chief Applications Architect at MapR Technologies Committer & PMC for Apache’s Drill, Zookeeper & others VP of Incubator at Apache Foundation Email tdunning@apache.org tdunning@maprtech.com Twitter @ted_dunning Hashtags today: #strataconf @mapr

Agenda What’s the point of data in motorsports? How can we play too? KPI preserving generation How the sim works What works? Live demo

How data plays in F1 motorsports

Data in Motorsports http://f1framework.blogspot.de/2013/08/short-guide-to-f1-telemetry-spa-circuit.html

Blue driver carries more speed while slowing for curve https://scarbsf1.wordpress.com/2011/08/18/telemetry-and-data-analysis-introduction/

Difference is due to later and sharper braking

Real Analytics as Well as Visualization Inputs Predictive analysis of consumables and tires Physical models of car + driver performance Tire wear slows lap times, lower fuel weight speeds lap times Game theoretic analysis of competitors’ options Monte Carlo analysis of likely weather conditions Current GP points status Outputs Tactical options, outcome distributions

Data for Marketing as well http://formula1.ferrari.com/en/inforacing-hungarian-gp-2015/

More Data = More Gearhead Engagement Occasionally, somebody makes a tiny bit of data available like this data dump of Button’s ride in the Australian Gran Prix Fans go nuts with this http://bit.ly/f1-data-dump http://bit.ly/f1-data-plotting-with-r

But it still doesn’t work ...

ENODATA ! Available data is definitely not good enough for more than an occasional blog post Available data is intermittent, out of date and inconsistent Except in special cases we can’t really build publicly available systems from scrounged data

The Unrealistic Nature of Real Data Real data has several defects We can’t share it We can’t get it We can’t break it We can’t understand it KPI preserving synthetic data has several virtues We can generate at any scale we like We can inject faults or oddities We have a god’s-eye view We have guarantees about practical fidelity

The Unrealistic Nature of Real Data Real data has several defects We can’t share it (due to confidentiality) We can’t get it (too big, wrong scale, out of date) We can’t break it (injecting major real-world faults frowned upon) We can’t understand it (we don’t know what really happened) KPI preserving synthetic data has several virtues We can generate at any scale we like We can inject faults or oddities We have a god’s-eye view We have guarantees about practical fidelity

KPI Matching Simulation Parametric matching of key performance and failure signatures allows emulation of complex data properties Matching on KPI’s and failure modes guarantees practical fidelity

If it breaks the same, it’s as good as the same

The Method Pick realistic and important KPI’s and failure measures Sample rates, data volumes Plausible physics Plausible data semantics Your mileage may vary Build emulation roughly based on real system Tune data spec to match KPI’s using real models Export data spec to alternative models Re-tune data spec to match on alternative models

So how does that work?

So how does that work? Especially for real-time data?

Production System Outline

Simplified Demo System Outline

TORCS for Cars, Physics and Drivers TORCS is a pseudo-physics based racing simulator with full graphics output and pluggable control modules. TORCS is commonly used for AI research, but the control model can just as well collect data

What is the Point? We would like to Prove out software architectures Test data pipelines and visualization systems Tune UI’s

What is the Point? We would like to Prove out software architectures Test data pipelines and visualization systems Tune UI’s Play video games?

What is the Point? We would like to We also need to Prove out software architectures Test data pipelines and visualization systems Tune UI’s We also need to Simulate system failure scenarios Push limits for future usage

Current Status It works, is available on github, ASL 2 Data collected is unrealistically limited, lacks Tire pressure, temperature x 4 Brake usage, temperature x 8 Engine monitoring is primitive (RPMs only, no KERS) Data rate is fixed, real data comes in at highly variable rates Real data has variable delays due to RF dropout + buffering Data collected is in pure JSON Real data is columnar compressed blobs

Tables as Objects, Objects as Tables List of objects Row-wise form Column-wise form Object containing lists

Micro Columnar Formats An entire table stored in columnar form can be a first-class value using these techniques This is very powerful for in-lining one-to-many relations.

Compression Results Samples are 64b time, 16 bit sample Sample time at 10kHz Sample time jitter makes it important to keep original time-stamp How much overhead to retain time-stamp?

Sensor Data V1 3 main data points: Buffered Speed (m/s) RPM { "_id":"1.458141858E9/0.324", "car" = "car1", "timestamp":1458141858, "racetime”:0.324, "records": [ { "sensors":{ "Speed":3.588583, "Distance":2003.023071, "RPM":1896.575806 }, "racetime":0.324, "timestamp":1458141858 "Speed":6.755624, "Distance":2004.084717, "RPM":1673.264526 "racetime":0.556, 3 main data points: Speed (m/s) RPM Distance (m) Buffered

Sensor Data V2 3 main data points: Buffered Speed (m/s) RPM { "_id":"1.458141858E9/0.324", "car" = "car1", "timestamp":1458141858, "racetime”:0.324, "records": [ { "sensors":{ "Speed":3.588583, "Distance":2003.023071, "RPM":1896.575806, "gear" : 2 }, "racetime":0.324, "timestamp":1458141858 "Speed":6.755624, "Distance":2004.084717, “RPM":1673.264526, "racetime":0.556, Sensor Data V2 3 main data points: Speed (m/s) RPM Distance (m) Throttle Gear … Buffered

Thank you for coming today!

Short Books by Ted Dunning & Ellen Friedman Published by O’Reilly in 2014 and 2015 For sale from Amazon or O’Reilly Free e-books currently available courtesy of MapR http://bit.ly/recommendation-ebook http://bit.ly/ebook-anomaly http://bit.ly/mapr-tsdb-ebook http://bit.ly/ebook-real-world-hadoop

Streaming Architecture by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly) Free copies at book signing tomorrow morning before Tug’s talk http://bit.ly/mapr-ebook-streams

Thank You!

Q & A Engage with us! @mapr maprtech mapr-technologies MapR tdunning@maprtech.com maprtech