Pattern Matching in DAME using AURA technology Jim Austin, Robert Davis, Bojian Liang, Andy Pasley University of York.


Distributed Aircraft Maintenance Environment - DAME

Overview
  Context
  AURA technology
  DAME pattern matching problem
  AURA solution
  Search performance
  Next steps

Context
  Vibration data from all engines in flight
  Detection of unusual vibration patterns
    – Novelties, anomalies
    – Automatic or manual
  Search for similar vibration behaviour
    – Need to search large volumes of historical vibration data
  Investigate search results and associated data
    – Service data records
    – CBR tools: Sheffield

AURA technology
  AURA
    – Proven technology for searching large data sets
    – Ability to scale and maintain performance
    – Easily parallelised
  Examples
    – Address matcher
    – Molecular matcher
  Operation
    – Vectors compared to stored examples
    – Uses bit level comparison methods
    – Correlation Matrix Memory operations

AURA architecture
  [Diagram. Labels: Data Adaptor; Store; Search; Input pattern; Candidate Engine (Back check); Indexer; Output pattern; AURA SearchEngine; Results; binary; Store & Search; Indexes or Data; ResultStore; Candidate Selector]

AURA storage & recall
  [Diagram. Labels: Input pattern; Output pattern; AURA SearchEngine; binary; Correlation Matrix Memories]
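Note: the storage and recall operations of a binary correlation matrix memory can be sketched in a few lines. This is a toy illustration of the general technique only, not the AURA library or its API; the class name, method names, pattern sizes and thresholds are all assumptions made for the example.

```python
class CorrelationMatrixMemory:
    """Toy binary correlation matrix memory (hypothetical, not the AURA API)."""

    def __init__(self, n_inputs, n_outputs):
        # One row per output bit, one column per input bit, all initially 0.
        self.weights = [[0] * n_inputs for _ in range(n_outputs)]

    def store(self, input_bits, output_bits):
        # Superimpose the binary outer product of output and input (logical OR).
        for r, out_bit in enumerate(output_bits):
            if out_bit:
                row = self.weights[r]
                for c, in_bit in enumerate(input_bits):
                    if in_bit:
                        row[c] = 1

    def recall(self, input_bits, threshold):
        # Count matching set bits per row, then threshold to a binary output.
        sums = [sum(w & x for w, x in zip(row, input_bits))
                for row in self.weights]
        return [1 if s >= threshold else 0 for s in sums]


cmm = CorrelationMatrixMemory(n_inputs=6, n_outputs=3)
cmm.store([1, 1, 0, 0, 0, 0], [1, 0, 0])
cmm.store([0, 0, 1, 1, 0, 0], [0, 1, 0])
cmm.store([0, 0, 0, 0, 1, 1], [0, 0, 1])

exact = cmm.recall([1, 1, 0, 0, 0, 0], threshold=2)
partial = cmm.recall([1, 0, 0, 0, 0, 0], threshold=1)
```

Lowering the threshold lets an incomplete input still recall its stored association, which is the property that makes this kind of memory suitable for imprecise matching.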

AURA software
  AURA re-designed
    – To improve performance of the AURA library in terms of both memory usage and search times
      – 3-fold reduction in memory
      – 3-fold reduction in search time
    – To make the library easy to use
      – Simple API
      – Typically only 4 or 5 API calls used
      – Enable implementation as an OGSI GT3 service
    – To engineer the library to commercial software standards
      – Comprehensive user guide and reference manual

Pattern matching problem
  Vibration data from sensors forms Z-mod data.
  Tracked orders extracted from Z-mod data
  [Plots. Axis labels: Frequency, Time; Tracked order: Time, Amplitude]

Pattern matching problem
  Novelty or anomaly identified in tracked order data by feature detectors
  Forms query sub-sequence

Pattern matching problem
  Search for sub-sequences similar to the query in a large volume of tracked order data.
    – Need to investigate all possible alignments
    – Benchmark method is sequential scan
    – Noisy data: imprecise matching required
    – Various possible similarity measures
      – Euclidean distance
      – Correlation
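Note: the benchmark sequential scan can be sketched as follows. This is a minimal illustration assuming Euclidean distance as the similarity measure; the function names and the sample data are hypothetical and are not taken from the DAME code.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sequential_scan(series, query, top_k=3):
    """Score every alignment of the query and return the top_k closest.

    Returns a list of (distance, start_offset) pairs, closest first.
    """
    m = len(query)
    scored = [(euclidean(series[i:i + m], query), i)
              for i in range(len(series) - m + 1)]
    scored.sort()
    return scored[:top_k]


# Made-up tracked order values containing two similar peaks.
series = [0.0, 0.1, 0.0, 1.0, 2.0, 1.0, 0.0, 0.1, 0.9, 2.1, 1.1, 0.0]
query = [1.0, 2.0, 1.0]
best = sequential_scan(series, query, top_k=2)
```

Every one of the `len(series) - len(query) + 1` alignments is scored in full, which is why the cost grows in proportion to the volume of stored data.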

AURA solution
  [Diagram. Labels: Stored Time Series; AURA Search Engine; Results; Encoded Query; Query Time Series; AURA Backcheck; Encoded Time Series; Candidate Matches]

AURA solution
  Encoding: reduction in dimensionality
    – e.g. from 100 points to 10 values
  Approximate search
    – From ~1,000,000s of alignments down to ~1000s of candidate matches
  Backcheck
    – From ~1000s of candidate matches to 100 or fewer results
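Note: the encode, approximate search, backcheck sequence is a filter-and-refine pattern. The sketch below illustrates the pattern only. As a stand-in for the AURA search it filters with a distance on segment means, scaled so that it never exceeds the true Euclidean distance; the function names, the radius parameter and the synthetic data are assumptions. It also encodes each window on the fly, whereas a deployed system would presumably store the encoded series in advance.

```python
import math


def paa(values, segments):
    """Mean of each equal-width frame; len(values) must divide evenly."""
    size = len(values) // segments
    return [sum(values[k * size:(k + 1) * size]) / size
            for k in range(segments)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_and_backcheck(series, query, segments, radius):
    """Return (candidate offsets, confirmed result offsets)."""
    m = len(query)
    scale = math.sqrt(m / segments)  # makes the coarse distance a lower bound
    encoded_query = paa(query, segments)

    # Approximate search: cheap comparison on the reduced representation.
    candidates = []
    for start in range(len(series) - m + 1):
        encoded = paa(series[start:start + m], segments)
        if scale * euclidean(encoded, encoded_query) <= radius:
            candidates.append(start)

    # Backcheck: full-resolution comparison on the surviving candidates only.
    results = [s for s in candidates
               if euclidean(series[s:s + m], query) <= radius]
    return candidates, results


series = [math.sin(0.3 * t) for t in range(200)]
query = [v + 0.01 for v in series[50:70]]  # slightly shifted copy of a window
candidates, results = filter_and_backcheck(series, query, segments=5, radius=0.5)
```

The expensive full comparison runs only on the candidates, so the saving depends on how many alignments the coarse stage discards.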

Encoding technique
  Piecewise Aggregate Approximation
  Values encoded using integer bins
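Note: a minimal sketch of Piecewise Aggregate Approximation followed by integer binning. The slides do not give the frame count, value range or number of bins, so those used here are assumptions, as are the function names.

```python
def paa(values, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame.

    Assumes len(values) is an exact multiple of segments.
    """
    size = len(values) // segments
    return [sum(values[k * size:(k + 1) * size]) / size
            for k in range(segments)]


def to_bins(means, low, high, n_bins):
    """Map each mean to an integer bin index in 0..n_bins-1 (uniform bins).

    Values outside [low, high) are clamped to the first or last bin.
    """
    width = (high - low) / n_bins
    return [min(n_bins - 1, max(0, int((m - low) / width))) for m in means]


signal = [0.0, 0.2, 0.4, 0.6, 1.0, 1.2, 1.4, 1.6, 3.0, 3.2, 3.4, 3.6]
means = paa(signal, segments=3)                       # roughly 0.3, 1.3, 3.3
codes = to_bins(means, low=0.0, high=4.0, n_bins=8)   # bins are 0.5 wide
```

Twelve samples reduce to three integer codes, the same kind of reduction as the 100 points to 10 values mentioned earlier.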

Search efficiency
  Approximate search using AURA
    – Fast method of discarding poor matches
    – AURA search typically an order of magnitude or more faster than sequential scan
    – Candidate matches typically <1% of total
    – Back check stage very efficient due to reduction in volume of data
      – Typically 1% or less of processing time for full sequential scan

Data size
  Assume
    – Fleet of 100 aircraft, 4 engines each
    – Flying 10 hours per day
    – 5 data points per tracked order per second
    – 4 bytes per data point
  Totals
    – Approx. 100 GigaBytes per year per tracked order
    – Roughly 10 tracked orders of interest so…
      – Total approx. 1 TeraByte per year
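Note: the totals follow from the stated assumptions. The calculation below additionally assumes flying on all 365 days of the year and decimal units (1 GigaByte = 10^9 bytes), neither of which the slide states.

```python
AIRCRAFT = 100
ENGINES_PER_AIRCRAFT = 4
FLIGHT_HOURS_PER_DAY = 10
POINTS_PER_SECOND = 5        # per tracked order
BYTES_PER_POINT = 4
TRACKED_ORDERS = 10
DAYS_PER_YEAR = 365          # assumption: aircraft fly every day

# Data points per tracked order per year, across the whole fleet.
points_per_year = (AIRCRAFT * ENGINES_PER_AIRCRAFT * FLIGHT_HOURS_PER_DAY
                   * 3600 * POINTS_PER_SECOND * DAYS_PER_YEAR)

bytes_per_order_per_year = points_per_year * BYTES_PER_POINT
bytes_per_year = bytes_per_order_per_year * TRACKED_ORDERS

print(f"{bytes_per_order_per_year / 1e9:.1f} GB per tracked order per year")
print(f"{bytes_per_year / 1e12:.2f} TB per year in total")
```

This gives about 105 GB per tracked order per year and about 1.05 TB in total, in line with the rounded figures on the slide.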

Search performance
  Deployed system assumptions
    – 100 CPUs, 2 GHz each with 1 GByte RAM. One per aircraft
    – Each search needs to check 25,000,000,000 alignments of the query per year of tracked order data.
  Sequential scan
    – Measured at approx. 2 seconds for 5,000,000 alignments of a 100 data point query (one CPU).
    – Extrapolates to approx. 500 seconds to search 5 years of data assuming 1 CPU per aircraft
    – This is too slow!
      – Need to support multiple searches and searches on more than one tracked order.
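Note: the 500 second figure can be reproduced from the numbers on this slide. The calculation assumes the data is split evenly across the 100 CPUs and ignores communication and other overheads.

```python
ALIGNMENTS_PER_YEAR = 25_000_000_000   # per tracked order, whole fleet
YEARS_OF_DATA = 5
CPUS = 100                             # one per aircraft

MEASURED_ALIGNMENTS = 5_000_000        # sequential scan benchmark, one CPU
MEASURED_SECONDS = 2.0

alignments_per_second = MEASURED_ALIGNMENTS / MEASURED_SECONDS
alignments_per_cpu = ALIGNMENTS_PER_YEAR * YEARS_OF_DATA / CPUS
sequential_scan_seconds = alignments_per_cpu / alignments_per_second

print(f"{sequential_scan_seconds:.0f} seconds per search")
```

Each CPU handles 1.25 billion alignments at 2.5 million alignments per second, which gives 500 seconds for a single search on a single tracked order.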

Search performance
  Using AURA and PAA based approach
    – Search time reduced by approx. an order of magnitude.
    – Can search 5 years of data for 100 aircraft in approx. 50 seconds
    – Believe this to be a workable solution
    – But response times potentially slower than this
      – Need to handle a number of searches in parallel
      – Communications and other overheads

Next steps
  Technology
    – Refine similarity measures and encoding methods.
  Architecture
    – Develop additional services to distribute and organise the search
    – Support multiple searches in parallel
  Measurement
    – Perform scaling trials on engine data
    – Obtain better estimates of overall performance
      – Multiple searches
      – Overheads