1
Data Mining with AURA
Jim Austin
University of York & Cybula Ltd
2
Overview
AURA
  Background to AURA
  Brief overview of its components
  Its implementation
AURA within UK e-Science
  What is e-Science
  The DAME pilot project
  Use of AURA in DAME
GRID issues in DM
22 Oct 2001
3
The AURA Technology
Neural network based associative storage
Set of tools to build fast pattern recognition systems
Aimed at unstructured data
Aimed at large datasets
Scalable technology
4
AURA as a basis for search
The game is to remove the chaff using AURA. Later processes find the exact match.
5
The storage system
Correlation Matrix Memory based
Exploits threshold logic methods
Uses distributed encoding of information
Implemented using binary ‘weights’ for efficient software and hardware implementation
6
[Figure: correlation matrix memory diagram. Labels: Inputs, P; weights, M; Threshold, T; R.]
7
Why is it fast?
Access only rows that are activated by inputs.
Inputs are made as sparse as possible and have a fixed weight (a fixed number of set bits).
Only need to sum over active rows (bit vectors): ideal for most processors.
Great for bit vector machines (DAP!).
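The storage and recall steps described on the last few slides can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the AURA library: the class name `BinaryCMM`, its methods and the example patterns are hypothetical, and each matrix row is held as a Python integer standing in for a bit vector.

```python
class BinaryCMM:
    """Toy binary correlation matrix memory (hypothetical, not the AURA API)."""

    def __init__(self, n_inputs, n_outputs):
        self.n_outputs = n_outputs
        # One row per input line; each row is an int used as a bit vector.
        self.rows = [0] * n_inputs

    def train(self, input_bits, output_bits):
        # Binary weights: OR the output pattern into every active input row.
        out_mask = 0
        for j in output_bits:
            out_mask |= 1 << j
        for i in input_bits:
            self.rows[i] |= out_mask

    def recall(self, input_bits, threshold):
        # Sum only over the rows activated by the (sparse) input.
        sums = [0] * self.n_outputs
        for i in input_bits:
            row = self.rows[i]
            for j in range(self.n_outputs):
                sums[j] += (row >> j) & 1
        # Threshold logic: keep the output bits whose column sum reaches T.
        return {j for j, s in enumerate(sums) if s >= threshold}


cmm = BinaryCMM(n_inputs=8, n_outputs=6)
cmm.train({0, 3, 5}, {1, 4})
cmm.train({1, 3, 6}, {2, 5})

print(cmm.recall({0, 3, 5}, threshold=3))  # {1, 4}
print(cmm.recall({0, 3}, threshold=2))     # {1, 4}, recalled from a partial input
```

Because every input has the same number of set bits, the threshold can be tied to that number; the cost of a recall depends on the number of active rows rather than on the total number of rows.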
8
Use of the CMM
[Figure: data flow diagram. Labels: Query, Data, CMM system, Data subset, Slow algorithm, Final data.]
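The two-stage idea (a fast filter that removes the chaff, then a slower exact matcher over the surviving subset) can be sketched as below. The function names and the word-based filter are hypothetical stand-ins chosen for illustration; in the talk the first stage is the CMM system.

```python
def two_stage_search(query, records, coarse_filter, exact_match):
    """Return (final matches, number of candidates passed to the slow stage)."""
    candidates = [r for r in records if coarse_filter(query, r)]
    final = [r for r in candidates if exact_match(query, r)]
    return final, len(candidates)


def shares_all_words(query, record):
    # Cheap stand-in for the fast stage: may let false positives through,
    # but never rejects a record that contains the query phrase.
    return set(query.split()) <= set(record.split())


def contains_phrase(query, record):
    # Stand-in for the slow, exact algorithm.
    return query in record


records = [
    "engine vibration high",
    "oil pressure low",
    "vibration engine check",
    "fuel pump fault",
]
final, n_candidates = two_stage_search(
    "engine vibration", records, shares_all_words, contains_phrase
)
print(final)         # ['engine vibration high']
print(n_candidates)  # 2
```

Only two of the four records reach the exact matcher, which is the point of the design: the expensive step runs on a small subset.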
9
CMM system
[Figure: CMM system diagram. Labels: Pre-process, Operations, Prepare data, Post process.]
10
Pre-processing
Implements a number of pre-processors:
  N-grams for text strings
  CMAC for numeric data
  Graphs for images and graphics
  Tokens for logical data
  Quantisation for time series
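As an illustration of the first pre-processor in the list, the sketch below turns a text string into a sparse binary vector via its character n-grams. The slides do not say how n-grams are mapped to bit positions, so the hashing step (`zlib.crc32` modulo a vector width) and the function names are assumptions made for this example only.

```python
import zlib


def ngrams(text, n=3):
    """All overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def encode(text, n=3, width=64):
    """Set one bit per n-gram in an int used as a bit vector (assumed scheme)."""
    bits = 0
    for gram in ngrams(text, n):
        bits |= 1 << (zlib.crc32(gram.encode("utf-8")) % width)
    return bits


def overlap(a, b):
    """Number of bit positions set in both vectors."""
    return bin(a & b).count("1")


print(ngrams("engine"))  # ['eng', 'ngi', 'gin', 'ine']

# Every n-gram of "engine" also occurs in "engines", so its bits are a subset.
a, b = encode("engine"), encode("engines")
print(a & b == a)  # True
```

Similar strings share n-grams and therefore share set bits, which is what makes the resulting vectors suitable as inputs to a pattern-matching store.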
11
Post processing
Data selected by the CMM must be accessed quickly.
Uses ‘best bit index’ method to match output data and recover stored data.
12
Implementation
The AURA C++ library
Implemented on:
  PC or workstation
  Beowulf parallel cluster
  Origin 2000 supercomputer
  Bespoke hardware
13
Cortex-1
AURA parallel implementation
28 dedicated PCI based processors
Beowulf configuration
3.5Gb memory size
14
UK e-Science
Aims to build on the concept of Grids
To make computing and data provision as direct and simple as electrical power delivery
£110M initiative started 18 months ago
DAME is a £3.5M pilot project to demonstrate its application in the engineering field.
15
DAME Objectives
DAME: Distributed Aircraft Maintenance Environment.
Demonstrate diagnostic capability on the GRID
Examine timeliness properties of the GRID
Demonstrate on the RR Aeroengine diagnostic problem
16
University of Sheffield, P Fleming.
University of Leeds, Peter Dew, Alison McKay.
York, J Austin, J McDermid, A Wellings.
University of Oxford, Lionel Tarassenko.
Rolls-Royce, Derby.
Data Systems & Solutions.
Cybula Ltd.
17
[Figure: Grid deployment diagram. Labels: Engine flight data, London Airport, New York Airport, Airline office, Grid, Diagnostics centre, Maintenance Centre, American data center, European data center.]
18
Diagnostic issues
The system must analyse and report:
  Novel engine operation
  Identify any cause of events
  Do this quickly
Data:
  Large (many Tb)
19
Data – Zmod plots
20
How does AURA contribute?
Search technology for multi-media data
Parallel pattern match engine based on neural networks.
Built on Correlation Matrix Memories.
High performance Beowulf and dedicated hardware implementations.
Commercially sold by Cybula Ltd.
21
[Figure: system diagram. Labels: Diagnostic station, Engine data, Novelty indication, Quote, Data used to identify novelty, Data reduction processes, Match requests, Features, Data to be searched for, Pattern match results, Data stores/data warehouse, Diagnosis, AURA-G, GRID.]
22
Simple example of processing chain
Matching previous events
[Figure: processing chain. Labels: Data sample, DM coding, CMM.]
23
Typical pre-processing
DM coding (1 up and 0 down)
Fast
Preserves information
Produces a binary vector
[Figure: plot with axes Frequency and Time.]
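A minimal sketch of the "1 up and 0 down" coding follows. It assumes that DM stands for delta modulation and that each bit simply records whether a sample rose relative to the previous one; the slide specifies neither, and the treatment of an unchanged value as 0 is likewise an assumption. The function name `dm_code` is hypothetical.

```python
def dm_code(samples):
    """One bit per step: 1 if the value went up, 0 otherwise (assumed rule)."""
    return [1 if cur > prev else 0 for prev, cur in zip(samples, samples[1:])]


print(dm_code([3, 5, 4, 4, 7, 6]))  # [1, 0, 0, 1, 0]
```

The output is a binary vector one element shorter than the input, computed in a single pass over the data.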
24
AURA-G
This is a Globus enabled AURA implementation.
Developed under DAME
Will be available end of 2002 for use in other problems.
25
AURA-G
Support of scalable pattern matching
Supports distributed search, across multiple CMM engines at different sites
OGSA compliant
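The fan-out pattern behind a distributed search can be sketched as follows. This does not use Globus or OGSA and is not AURA-G code: the site names, `make_engine` and `distributed_search` are hypothetical, and each "engine" is a local function standing in for a remote CMM engine.

```python
from concurrent.futures import ThreadPoolExecutor


def make_engine(stored_records):
    """Hypothetical stand-in for one site's engine: substring match over its store."""
    def search(query):
        return [r for r in stored_records if query in r]
    return search


def distributed_search(query, engines):
    """Send the query to every engine in parallel and merge hits, tagged by site."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {site: pool.submit(search, query) for site, search in engines.items()}
        hits_by_site = {site: future.result() for site, future in futures.items()}
    return sorted((site, hit) for site, hits in hits_by_site.items() for hit in hits)


engines = {
    "site_a": make_engine(["engine vibration high", "oil pressure low"]),
    "site_b": make_engine(["fuel pump fault", "engine surge event"]),
}
results = distributed_search("engine", engines)
print(results)
# [('site_a', 'engine vibration high'), ('site_b', 'engine surge event')]
```

Each site searches only its own data, and only the matches travel back to the caller, so the approach scales by adding engines rather than by moving data.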
26
Grid Issues in Data Mining
Data provenance
Standards:
Data transparency independent of location
Managing DB/Data mining link in distributed system
OGSA DAI
27
Conclusions
AURA is a mature component for data search and retrieval
Robust software and hardware implementation available
Applications in e-Science for Grid applications underway
28
Contacts
Jim Austin
Dept Computer Science, University of York, York, YO10 5DD.
Cybula Ltd.
DAME :