DESIGN OF LARGE SCALE DATA ARCHIVAL AND RETRIEVAL SYSTEM FOR TRANSPORTATION SENSOR (WRITE-ONCE-READ-MANY TYPE) DATA, by Nirish Dhruv, Department of Computer Science.

Presentation transcript:

DESIGN OF LARGE SCALE DATA ARCHIVAL AND RETRIEVAL SYSTEM FOR TRANSPORTATION SENSOR (WRITE-ONCE-READ-MANY TYPE) DATA
by Nirish Dhruv, Department of Computer Science
Advisor: Dr. Taek Kwon, Department of Electrical and Computer Engineering
Graduate Committee:
– Dr. Donald Crouch, Department of Computer Science
– Dr. Carolyn Crouch, Department of Computer Science
– Dr. Taek Kwon, Department of Electrical and Computer Engineering

Background
– ITS sensor networks produce huge amounts of data
– Presently the data is limited to operational and monitoring uses because of its size
– Examples: RWIS, WIM, and traffic detector networks
– Efficient archival and retrieval are needed for planning and research

Problem Statement
Present TMC Archive
– Flat, zip-compressed format
– Difficult to extract spatially correlated data
– Need for efficient archival/retrieval of spatially and/or temporally correlated data

Existing File Format and Archive
Unified Traffic Data Format (UTDF)
– ###.v30 file (2880 bytes): volume, one 1-byte value per record
– ###.o30 file (5760 bytes): occupancy, one 2-byte value per record
– Record times run in 30-second steps: 00:00:00, 00:00:30, 00:01:00, 00:01:30, 00:02:00, ...
– The ###.v30 and ###.o30 files for 4000 sensors are zipped into one yyyymmdd.traffic file
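A minimal sketch of reading one sensor's day out of such an archive, under stated assumptions: entries are named `<sensor id>.v30` and `<sensor id>.o30` inside the zip, values are unsigned, and the 2-byte occupancy values are big-endian. None of these details are confirmed by the slides, and the helper names (`read_sensor`, `record_time`, `make_demo_archive`) are hypothetical.

```python
import io
import struct
import zipfile

RECORDS_PER_DAY = 2880  # 24 hours of 30-second intervals


def read_sensor(archive, sensor_id, byteorder=">"):
    """Return (volumes, occupancies) for one sensor from an open ZipFile.

    Assumes entry names '<id>.v30' / '<id>.o30' and unsigned values;
    the byte order of the 2-byte occupancy values is also an assumption.
    """
    v_raw = archive.read(f"{sensor_id}.v30")
    o_raw = archive.read(f"{sensor_id}.o30")
    if len(v_raw) != RECORDS_PER_DAY or len(o_raw) != 2 * RECORDS_PER_DAY:
        raise ValueError("unexpected file size for a one-day sensor file")
    volumes = list(v_raw)  # one byte per record
    occupancies = list(struct.unpack(f"{byteorder}{RECORDS_PER_DAY}H", o_raw))
    return volumes, occupancies


def record_time(index):
    """Map a record index (0-based) to its hh:mm:ss start time."""
    seconds = index * 30
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def make_demo_archive():
    """Build a tiny in-memory archive holding one synthetic sensor."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("3263.v30", bytes(i % 20 for i in range(RECORDS_PER_DAY)))
        occ = [i % 1000 for i in range(RECORDS_PER_DAY)]
        zf.writestr("3263.o30", struct.pack(f">{RECORDS_PER_DAY}H", *occ))
    buf.seek(0)
    return buf
```

Pulling a single sensor still means opening the whole day's zip, which is the "flat" access pattern the problem statement describes.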

Review of Large Data Archive
Data Warehouse
– Inflow: get data from various systems
– Upflow: put data into a more compact form
– Downflow: move the compact data to archival storage
– Outflow: output data to consumers as required
– Metaflow: manage the warehouse itself

Why Data Warehouse?
– Simplicity
– Better quality of data
– Fast access
– Platform independent

Hierarchical Data Format (HDF)
– File format and library for storing scientific data
– Software includes I/O libraries and tools for analyzing, visualizing, and converting scientific data
– Platform independent

Common Data Format (CDF)
– Self-describing data abstraction for storing and manipulating multi-dimensional data in a discipline-independent format
– File format and library
– Transparent data compression
– Platform independent
– API available in C, FORTRAN, Java, and Perl

Creating Traffic CDF
Traffic Archive -> C Program (.EXE) -> traffic.cdf
– The C program is built with the CDF 2.7 C API (DLL, Lib, and cdf.h file)

Traffic Data Archive in CDF
– Designing the data structure for traffic data
– Setting dimensions
– Setting variances
– Setting CDF variables, CDF data types, CDF attributes (metadata), and the compression algorithm

Data Organization
[Table: Record Number | Sensor ID | Time | Volume | Occupancy. Row values are not recoverable.]

Variances Specification for traffic CDF rVariables
                            Sensor ID   Time    Volume   Occupancy
Record Variance             TRUE        FALSE   TRUE     TRUE
First Dimension Variance    FALSE       TRUE    TRUE     TRUE
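The effect of these variance flags can be sketched with a small in-memory model. This is not the CDF API. It assumes the table reads as: Sensor ID varies by record only, Time varies along the first dimension only, and Volume and Occupancy vary along both. It also assumes one record per sensor with a first dimension of 2880 time slots. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RVariable:
    name: str
    record_variance: bool  # does the value change from record to record?
    dim_variance: bool     # does it change along the first dimension?

    def distinct_values(self, n_records, dim_size):
        """Count the values that actually differ and so need storing."""
        per_record = dim_size if self.dim_variance else 1
        records = n_records if self.record_variance else 1
        return per_record * records


# Assumed reading of the variance table (see lead-in).
LAYOUT = [
    RVariable("Sensor ID", record_variance=True, dim_variance=False),
    RVariable("Time", record_variance=False, dim_variance=True),
    RVariable("Volume", record_variance=True, dim_variance=True),
    RVariable("Occupancy", record_variance=True, dim_variance=True),
]


def layout_summary(n_records=4000, dim_size=2880):
    """Map each variable name to its number of distinct stored values."""
    return {v.name: v.distinct_values(n_records, dim_size) for v in LAYOUT}
```

Under these assumptions the time axis is held once and shared by every sensor record, while each sensor ID is held once per record instead of once per sample.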

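CDF Compression Algorithms
[Table columns: Level | Uncompressed CDF | RLE | Huffman | Adaptive Huffman | GZIP. Surviving size values, in order: (missing) MB, 45.4 MB, 36.3 MB, 29.8 MB, 18.7 MB. The values of the remaining row are missing.]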
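A rough illustration of comparing compression schemes on sensor-like data. It does not use CDF's own compressors: the run-length coder is a simple (count, value) scheme written for this sketch, DEFLATE comes from Python's `zlib` (the algorithm behind GZIP), and the input is synthetic, so the sizes it reports say nothing about the figures in the slides.

```python
import random
import zlib


def rle_encode(data: bytes) -> bytes:
    """Encode as (count, value) byte pairs, with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        out += bytes([encoded[k + 1]]) * encoded[k]
    return bytes(out)


def synthetic_volumes(n_records: int = 2880, seed: int = 0) -> bytes:
    """Made-up day of 1-byte volumes: a quiet stretch of zeros, then traffic."""
    rng = random.Random(seed)
    quiet = n_records // 3
    busy = bytes(rng.randrange(16) for _ in range(n_records - quiet))
    return bytes(quiet) + busy


def compressed_sizes(data: bytes) -> dict:
    """Size in bytes of the raw data and of each encoding."""
    return {
        "raw": len(data),
        "rle": len(rle_encode(data)),
        "deflate-1": len(zlib.compress(data, 1)),
        "deflate-9": len(zlib.compress(data, 9)),
    }
```

Run-length coding only pays off on long runs of identical values, so on noisy daytime counts it can come out larger than the raw data.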

Data Retrieval in CDF
CDF Archive (.cdf) + Station Definition -> C Program using CDF API (.EXE) -> Volume Count (.txt)

Data Archive in SQL Server
Traffic Data Archive (zipped binary files) -> Visual Basic Interface -> Traffic Data Archive (SQL Server 2000)
Components used by the Visual Basic Interface:
– Dynazip ActiveX control
– ADODB connection
– 32-bit ODBC (DSN)

Retrieval Task
Station definitions
– Station 1: 10069N, Detectors: 3263, 3264, 3265, 3266
– Station 2: 10069S
– ...
– Station 492: 17750W
Volume computation
– Station 1: 3263(Vol) + 3264(Vol) + 3265(Vol) + 3266(Vol)
– Station 2: Volume Computation
– ...
– Station 492: Volume Computation
Output text file
– 10069N Total Vol
– 10069S Total Vol
– ...
– 17750W Total Vol
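The station computation amounts to summing the volumes of a station's detectors. A minimal sketch follows, assuming volumes have already been loaded into a dictionary keyed by detector ID. Only the detector list for station 10069N comes from the slides. The function names, the report layout, and the choice to report a whole-day total are assumptions.

```python
# Detector list for 10069N is from the slides; other stations are omitted
# because their detector lists are not given.
STATIONS = {
    "10069N": [3263, 3264, 3265, 3266],
}


def station_series(detector_ids, volumes_by_detector):
    """Per-record station volume: the sum across the station's detectors."""
    series = [volumes_by_detector[d] for d in detector_ids]
    return [sum(values) for values in zip(*series)]


def station_total(detector_ids, volumes_by_detector):
    """Station volume summed over every record."""
    return sum(station_series(detector_ids, volumes_by_detector))


def volume_report(stations, volumes_by_detector):
    """One '<station> <total volume>' line per station (layout is assumed)."""
    return "\n".join(
        f"{name} {station_total(dets, volumes_by_detector)}"
        for name, dets in stations.items()
    )
```

For example, with two records per detector, `{3263: [1, 2], 3264: [0, 1], 3265: [3, 0], 3266: [2, 2]}` gives a per-record series of `[6, 5]` for 10069N.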

Results on single-day traffic data
                  Binary (uncompressed)   CDF         RDBMS
Archival Time     N/A                     5 minutes   6 hours
Size              40 MB                   16.6 MB     370 MB
Retrieval Time    N/A                     2 minutes   2 hours
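The gap between the CDF and RDBMS columns is easier to see as ratios. The short calculation below uses only the figures reported in the results. The dictionary keys and the `rdbms_to_cdf` helper are names made up for this sketch.

```python
# Figures from the single-day results; times converted to minutes.
RESULTS = {
    "CDF": {"archival_min": 5, "retrieval_min": 2, "size_mb": 16.6},
    "RDBMS": {"archival_min": 6 * 60, "retrieval_min": 2 * 60, "size_mb": 370},
}
BINARY_SIZE_MB = 40


def rdbms_to_cdf(metric):
    """How many times larger the RDBMS figure is than the CDF figure."""
    return RESULTS["RDBMS"][metric] / RESULTS["CDF"][metric]


if __name__ == "__main__":
    for metric in ("archival_min", "retrieval_min", "size_mb"):
        print(f"{metric}: RDBMS / CDF = {rdbms_to_cdf(metric):.1f}")
    cdf_fraction = RESULTS["CDF"]["size_mb"] / BINARY_SIZE_MB
    print(f"CDF size as a fraction of the binary size: {cdf_fraction:.3f}")
```

On these figures the RDBMS took 72 times as long to archive, 60 times as long to retrieve, and about 22 times the storage of the CDF archive.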

Conclusions
A transportation archive built on CDF could be the better archive for the following reasons:
– Stores more data with almost no additional storage requirement
– Indexed data allows random access
– Open standard, portable, and free
– Can be used directly with many scientific visualization and analysis packages

Conclusions
An RDBMS is less suitable for large-scale traffic data for the following reasons:
– Large storage requirements due to overhead
– Retrieval is comparatively slow
– High initial investment

Future Work
– Using XML with CDF for the web
– Scaling CDF
– Adding more features: variables and attributes