Making Watson Fast
Daniel Brown, HON111



Need for Watson to be fast to play Jeopardy successfully
- All computations have to be done in a few seconds
- Initial application speed: 1-2 hours of processing time per question

Unstructured Information Management Architecture (UIMA)
- Framework for NLP applications; facilitates parallel processing
  - UIMA-AS: Asynchronous Scaleout
- UIMA was chosen at the start for these reasons; other optimization work only began after 2 years (after QA accuracy/confidence improved)

UIMA implementation of DeepQA

- Type System
- Common Analysis Structure (CAS)
- Annotator
  - CAS multiplier (CM): creates new “child” CASes
- Flow Controller
- CASes can be spread across multiple systems (processed in parallel) for efficiency
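How these components relate can be sketched in a few lines of Python. This is a toy model and not UIMA's actual API: the names Cas, length_annotator, sentence_multiplier and flow_controller are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cas:
    """Toy stand-in for a Common Analysis Structure: text plus annotations."""
    text: str
    annotations: dict = field(default_factory=dict)


def length_annotator(cas):
    """Annotator: adds one analysis result to the CAS it receives."""
    cas.annotations["n_words"] = len(cas.text.split())
    return cas


def sentence_multiplier(cas):
    """CAS multiplier: creates one child CAS per sentence of the parent."""
    parts = [p.strip() for p in cas.text.split(".") if p.strip()]
    return [Cas(text=p) for p in parts]


def flow_controller(cas):
    """Flow controller: decides which component sees each CAS, and in what order."""
    children = sentence_multiplier(cas)
    return [length_annotator(child) for child in children]


results = flow_controller(Cas("Watson plays Jeopardy. It must answer fast."))
```

Because each child CAS is independent of its siblings, the annotator calls in flow_controller could run on different machines, which is the property the last bullet relies on.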

Scaling out
- Two systems:
  - Development (+question processing): meant to analyze many questions accurately
  - Production (+speed): meant to answer one question quickly

Scaling out: UIMA-AS
- UIMA-AS: Asynchronous Scaleout
  - Manages the multithreading and inter-process communication necessary for parallel processing
- Feasibility test: simulated production system with 110 processes, core machines
  - Goal: less than 3 seconds; actual: more than 3 seconds
  - Two sources of latency: CAS serialization, network communication
  - Optimizing CAS serialization resulted in a runtime of <1s
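Why serialization can dominate latency is easy to illustrate. The sketch below is not how UIMA-AS serializes a CAS, and the slides do not say which optimization was applied. It only compares a verbose, self-describing encoding with a compact fixed-width one for the same hypothetical annotation data.

```python
import json
import struct

# Hypothetical annotations: (begin offset, end offset, type id) triples.
annotations = [(i, i + 5, i % 4) for i in range(0, 5000, 5)]

# Verbose encoding: self-describing text with repeated field names.
verbose = json.dumps(
    [{"begin": b, "end": e, "type": t} for b, e, t in annotations]
).encode("utf-8")

# Compact encoding: a count followed by fixed-width binary records
# (two unsigned 32-bit offsets and one unsigned byte, 9 bytes each).
compact = struct.pack("<I", len(annotations)) + b"".join(
    struct.pack("<IIB", b, e, t) for b, e, t in annotations
)


def decode_compact(buf):
    """Decode the compact form back into the list of triples."""
    (count,) = struct.unpack_from("<I", buf, 0)
    return [struct.unpack_from("<IIB", buf, 4 + 9 * i) for i in range(count)]


ratio = len(verbose) / len(compact)  # how many times larger the verbose form is
```

Fewer bytes per CAS means less time spent both encoding and sending, so one change addresses both latency sources named on the slide.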

Scaling out: Deployment
- 400 processes, 72 machines

How to find time bottlenecks in such a system?
- Monitoring tool
- Integrated timing measurements (in flow controller component)
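Integrated timing measurements can be as simple as wrapping each component call and accumulating the elapsed time. The sketch below is a minimal illustration, not Watson's monitoring code: the stage names are hypothetical and the sleeps stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)  # component name -> accumulated seconds


@contextmanager
def timed(component):
    """Accumulate wall-clock time spent inside the block under `component`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[component] += time.perf_counter() - start


def slowest(n=1):
    """Return the n components with the largest accumulated time."""
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical pipeline stages, with sleeps standing in for real work.
with timed("question_analysis"):
    time.sleep(0.01)
with timed("primary_search"):
    time.sleep(0.05)
with timed("answer_scoring"):
    time.sleep(0.02)

bottleneck = slowest(1)[0][0]
```

Placing the timer in the component that routes work (the flow controller, on the slide) means every stage is measured without modifying the stages themselves.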

RAM Optimizations
- Wanted to avoid disk read/write time delays, so all (production system) data was put into RAM
- Some optimizations:
  - Reference size reduction
  - Java object size reduction
  - Java object overhead
  - String size
  - Special hash tables
  - Java garbage collection with large heap sizes
    - Full GC between games
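The "full GC between games" idea is to move collection pauses to a moment when latency does not matter. Watson ran on the JVM, so the following is only a Python analogy using the standard gc module; no_gc_pauses and play_game are hypothetical names.

```python
import gc
from contextlib import contextmanager


@contextmanager
def no_gc_pauses():
    """Disable automatic cyclic GC inside the block, then collect once after it."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        gc.collect()  # full collection at a moment when latency does not matter
        if was_enabled:
            gc.enable()


def play_game(questions):
    """Hypothetical stand-in for answering the questions of one game."""
    return [q.upper() for q in questions]


before = gc.isenabled()
with no_gc_pauses():
    inside = gc.isenabled()
    answers = play_game(["who is watson", "what is uima"])
after = gc.isenabled()
```

The trade-off is memory headroom: garbage accumulates during the game, so the heap must be large enough to hold it until the next scheduled collection.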

Indri Search Optimizations
- Indri search: used to find the most relevant 1-2 sentences from the Watson database
- Using a single processor, primary search takes too long (i.e. 100s)
  - Supporting evidence search takes even longer
- Solution?
  - Divide the corpus (body of information to search) into chunks, then assign each search daemon a chunk
  - Specifically: 50GB corpus of 6.8 million documents, 79 chunks of documents each, 79 Indri search daemons with 8 CPU cores each; end result: 32 passage queries could be run at once
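The divide-and-search approach is a scatter-gather pattern: every chunk is searched independently and the partial results are merged. The sketch below is a toy version with a made-up scoring function, not Indri's API; threads stand in for the separate search daemons, and all function names are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def score(query, doc):
    """Toy relevance score: number of query terms that occur in the document."""
    words = set(doc.lower().split())
    return sum(term in words for term in query.lower().split())


def search_chunk(query, chunk, k):
    """One 'search daemon': return the top-k (score, doc) pairs from its chunk."""
    return heapq.nlargest(k, ((score(query, d), d) for d in chunk))


def sharded_search(query, corpus, n_chunks, k=2):
    """Split the corpus into n_chunks, search them in parallel, merge results."""
    chunks = [corpus[i::n_chunks] for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = pool.map(lambda c: search_chunk(query, c, k), chunks)
        merged = [hit for part in partials for hit in part]
    return heapq.nlargest(k, merged)


corpus = [
    "Watson answers Jeopardy questions",
    "Indri is a search engine",
    "Hadoop runs batch jobs",
    "Watson uses Indri passage search",
]
top = sharded_search("Watson Indri search", corpus, n_chunks=2)
```

Merging only the per-chunk top-k is sufficient, because any document in the global top-k must also be in the top-k of its own chunk.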

Preprocessing and Custom Content Services
- Watson must first analyze the passage texts before being able to use them
  - Deep NLP analysis: semantic/structural parsing, etc.
- Since Watson had to be self-contained, its corpus was fixed in advance, so this analysis could be done before run time (preprocessed)
  - Used Hadoop (distributed storage and batch-processing framework)
  - 50 machines, 16GB/8 cores each

Preprocessing and Custom Content Services
- Retrieving the preprocessed data?
  - Preprocessed data much larger than the unprocessed corpus (~300GB total)
  - Built custom content server: allocated data to 14 machines, ~20GB each
  - Documents were then accessed from these servers
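The figures on this slide are consistent with an even split: 300GB over 14 machines is roughly 21GB each. How documents were actually assigned to servers is not stated, so the sketch below assumes a simple stable-hash placement; server_for and the document ids are hypothetical.

```python
import zlib

N_SERVERS = 14   # from the slide
TOTAL_GB = 300   # approximate preprocessed size, from the slide


def server_for(doc_id, n_servers=N_SERVERS):
    """Map a document id to a server index with a stable hash (assumed scheme)."""
    return zlib.crc32(doc_id.encode("utf-8")) % n_servers


# About 21.4, consistent with "~20GB each" on the slide.
per_server_gb = TOTAL_GB / N_SERVERS

# Hypothetical document ids; a real deployment would use its own id scheme.
placement = {f"doc-{i}": server_for(f"doc-{i}") for i in range(1000)}
```

With a stable hash, any client can compute which server holds a document without consulting a central directory, which keeps lookups to a single network hop.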

End result
- Parallel processing combined with a number of other performance optimizations resulted in a final average latency of less than 3 seconds
  - No one “silver bullet” solution