The Mercury System: Embedding Computation into Disk Drives

Presentation transcript:

The Mercury System: Embedding Computation into Disk Drives Roger Chamberlain, Ron Cytron, Mark Franklin, Ron Indeck Center for Security Technologies Washington University in St. Louis

Enabling Technology: Disk Drives
Magnetic disk storage areal density vs. year of IBM product introduction (from D. A. Thompson)
~10,000,000x increase in 45 years! (over 50% per year)
Chart axis: areal density (Mb/in²)

Cost per Megabyte
Price history of hard disk product vs. year of product introduction (from D. A. Thompson)
Cost decreasing 3% per week!
Chart axis: price per megabyte (dollars)

Massive Data
Storage industry shipped 4,000,000,000,000,000,000 bytes last year
MasterCard recently installed a 200 TByte data warehouse in St. Louis
US intelligence services collect data equaling the printed collection of the US Library of Congress every day!

Enabling Technology: Reconfigurable Hardware
Field Programmable Gate Arrays (FPGAs) provide custom logic function capability
Operate at hardware speeds
Can be altered (reconfigured) in the field to meet specific application needs

What are we doing? Within the Center, we are combining the capabilities of these two enabling technologies to build extremely fast data search engines. We do this by moving the search closer to the data, and performing it in hardware rather than software.

Important Application: Intelligence Data
Lots of data
  – Public (e.g., web pages)
  – Clandestine (e.g., via national technical means)
Growing constantly
Many perturbations of individual words
  – Tzar, Tsar, Czar, …
Query and field types aren’t known a priori

Finding a needel in a haystack
Text can contain errors
Often seek an approximate match, e.g. needle
No match? Try 2-transpositions: enedle, needle, nedele, neelde, needel
No match? Try 1-deletions: eedle, nedle, nedle, neele, neede, needl
No match? Try insertions, larger edits, …
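The staged fallback on this slide can be sketched in a few lines of Python. This is a minimal software illustration, not the Mercury hardware logic: the function names and the substring test are assumptions made for the example. It generates the same variant lists the slide shows for "needle" and tries each stage in turn.

```python
def transpositions(word):
    """Swap each pair of adjacent characters, left to right."""
    return [word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1)]


def deletions(word):
    """Delete the character at each position, left to right."""
    return [word[:i] + word[i + 1:] for i in range(len(word))]


def approximate_find(query, text):
    """Try an exact match, then transpositions, then deletions.

    Returns (stage_name, sorted list of variants found in text),
    or (None, []) if no stage produced a hit. Hypothetical helper.
    """
    stages = [
        ("exact", [query]),
        ("transposition", transpositions(query)),
        ("deletion", deletions(query)),
    ]
    for name, variants in stages:
        hits = sorted({v for v in variants if v in text})
        if hits:
            return name, hits
    return None, []


print(transpositions("needle"))
print(deletions("needle"))
print(approximate_find("needle", "finding a needel in a haystack"))
```

Swapping the two adjacent e's reproduces the original word, and deleting either e gives the same result, which is why "needle" and a repeated "nedle" appear in the slide's lists.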

Genome Application
Genome maps being expanded daily
  – 80,000 genes, 3 billion base pairs (A, C, G, T)
Look for matches
  – Identify function
  – Disease: understand, diagnose, detect, therapy
  – Biofuels, warfare, toxic waste
  – Understand evolution
  – Forensics, organ donors, authentication
  – More effective crops, disease resistance

DNA String Matching
Looking for CACGTTAGT…TAGC
Interested in matches and near matches
Search human genome, other gene oceans
  – Need to search entire data sets
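As a rough illustration of scanning an entire sequence for matches and near matches, the sketch below slides a pattern along a DNA string and reports every offset within a mismatch budget. It assumes a "near match" means a small number of substituted bases (Hamming distance); the slide does not define the term, and the function name, the parameter `max_mismatches`, and the sample sequence are all hypothetical.

```python
def scan(sequence, pattern, max_mismatches=1):
    """Return (offset, mismatches) for every window of the sequence that
    differs from the pattern in at most max_mismatches positions."""
    m = len(pattern)
    hits = []
    for start in range(len(sequence) - m + 1):
        mismatches = 0
        for a, b in zip(sequence[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window is already too far from the pattern
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Exact matches at offsets 0 and 8, one near match (ACGA) at offset 4.
print(scan("ACGTACGAACGT", "ACGT", max_mismatches=1))
```

Every window is examined, so the cost grows with the size of the data set rather than with the number of hits, which is the reason the whole database has to be streamed past the matcher.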

Bio Computation Problem
*BIG* genome databases
Diagram: a DNA pattern is compared against a DNA sequence (bases A, C, G, T). Match?

Image Database Applications
Challenging database
Unstructured
Massive data sets
Don’t know what we need to look for in each picture

Object Recognition
Face recognition
Match template with image
Template database must be searched
Strict time constraints for matching and overall search

Washington University Campus

Satellite Data
Low orbit fly-over every 90 minutes
Look for differences in images
  – Large objects
  – Troops
  – Changes to landscape
Flag, transmit these differences immediately

How do we find what we’re looking for most effectively?!

Conventional Structured Database
Document table (columns: D id, Document):
  – Agent James Bond
  – Agent mobile computer
  – James Madison movie
  – James Bond movie
Index (columns: Word, Inverted list - pointers):
  – James, computer, agent, Bond, Madison, mobile, movie
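The word-to-document structure on this slide is an inverted index. A minimal sketch follows, assuming the slide's text splits into the four documents listed in the code; the document ids 1 to 4 and both function names are hypothetical, since the original ids and pointer lists did not survive in the transcript.

```python
from collections import defaultdict

# Assumed reading of the slide's document table; ids are hypothetical.
documents = {
    1: "Agent James Bond",
    2: "Agent mobile computer",
    3: "James Madison movie",
    4: "James Bond movie",
}


def build_inverted_index(docs):
    """Map each lower-cased word to the sorted list of ids containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for word in sorted(set(docs[doc_id].lower().split())):
            index[word].append(doc_id)
    return dict(index)


def lookup(index, *words):
    """Return ids of documents containing every query word."""
    postings = [set(index.get(w.lower(), [])) for w in words]
    return sorted(set.intersection(*postings)) if postings else []


index = build_inverted_index(documents)
print(index["james"])
print(lookup(index, "James", "Bond"))
```

A lookup touches only the posting lists for the query words, which is fast, but the index has to exist before the query arrives and must be updated whenever the documents change.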

Challenges in Searching These Massive Databases
If we know what we will be looking for
  – Need to build index beforehand
  – Maintain index as it changes
If we don’t know what we want a priori
  – Need to search the whole database!

Conventional Search
Diagram: hard drives attached over the I/O bus to the processor and memory.
Animation frames: find …; contents; no, no, no, yes, no …; no, no, yes, no, no …

Conventional Approach

WUSTL’s Approach

Streaming Approach
Diagram: each hard drive is paired with a search engine built from reconfigurable hardware; the processor and memory sit on the I/O bus and memory bus.
Animation frames: find …; no, no, no, yes, no …; no, no, yes, no, no …

Search Engine in Context

Reconfigurable Hardware for Text Searches

Sources of Performance Gains
1. Disk search parallelism: Each engine searches in parallel across a disk or disk surface
2. System parallelism: Searching is off-loaded to search engines and main processor can perform other tasks
3. Reduced data movement overhead: Disk data moves principally to search engine, not successively over system bus, memory bus, to cache, etc.
4. Hardware logic for searching: Searching, matching, and query operations are performed on streaming data in hardware rather than in software
5. Specialized hardware logic tailored to queries: Reconfigurable hardware permits matching the query logic to the search engine logic and preserves flexibility
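Items 1 to 3 can be mimicked in software to show the data flow, though not the speed. In the sketch below each "drive" is a list of records with its own search task, and only the locations of hits travel back to the caller. This is an analogy under stated assumptions: the function names are hypothetical, the record lists stand in for disk contents, and Python threads do not reproduce the hardware parallelism the slide describes.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard_id, records, predicate):
    """Scan one drive's records locally; return only the hit locations."""
    return [(shard_id, i) for i, record in enumerate(records) if predicate(record)]


def streaming_search(shards, predicate):
    """Run one search task per drive and gather the (drive, offset) hits."""
    if not shards:
        return []
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(search_shard, shard_id, records, predicate)
                   for shard_id, records in enumerate(shards)]
        hits = []
        for future in futures:  # collect in drive order for a stable result
            hits.extend(future.result())
    return hits


# Two drives whose records mirror the yes/no streams in the animation frames.
drives = [["no", "no", "no", "yes", "no"],
          ["no", "no", "yes", "no", "no"]]
print(streaming_search(drives, lambda record: record == "yes"))
```

The caller receives two small tuples instead of ten records; in the conventional layout every record would cross the I/O bus before the processor could reject it.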

Technical Status
Prototype operational
  – External to an ATA/100 drive
  – ⇒ performance is currently disk-limited
  – SCSI-based RAID system under development
3 applications functional
  – Exact text search
  – Approximate text search (agrep)
  – Biosequence search (Smith-Waterman)
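For readers unfamiliar with the third application, here is a minimal software sketch of Smith-Waterman local alignment scoring. It assumes a linear gap penalty and illustrative scores (match +2, mismatch -1, gap -1); the slides do not give the scoring scheme or describe how the prototype implements the algorithm, so treat every parameter here as an assumption.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Keeps only two rows of the dynamic-programming table, so memory is
    proportional to len(b). Scores are floored at zero, which is what
    makes the alignment local rather than global.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))  # four matches: 8
print(smith_waterman_score("ACGT", "ACT"))   # AC, one gap, T: 5
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time. That regularity is a common reason the algorithm is mapped onto parallel hardware.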

Performance
Speedup relative to 1 GHz processor

Application            Disk-limited speedup   Logic-limited speedup
Exact text search      1.1                    14
Approx. text search    12                     31
Biosequence search     50                     125

Summary
Fast, inexpensive searches for large and changing databases
Approximate searches supported
Up to 100 times faster than standard database searches
Performance is scalable and uses conventional disk drives
Data Search Systems, Inc. is actively commercializing the technology