1
DESIGN OF LARGE SCALE DATA ARCHIVAL AND RETRIEVAL SYSTEM FOR TRANSPORTATION SENSOR (WRITE-ONCE-READ-MANY TYPE) DATA
by Nirish Dhruv, Department of Computer Science
Advisor: Dr. Taek Kwon, Department of Electrical and Computer Engineering
Graduate Committee:
– Dr. Donald Crouch, Department of Computer Science
– Dr. Carolyn Crouch, Department of Computer Science
– Dr. Taek Kwon, Department of Electrical and Computer Engineering
2
Background
– ITS sensor networks produce huge amounts of data
– Because of its size, the data is presently used only for operations and monitoring
– Examples: RWIS, WIM, and traffic detector networks
– Planning and research need efficient archival and retrieval
3
Problem Statement
Present TMC Archive
– Flat, zip-compressed format
– Difficult to extract spatially correlated data
– Need for efficient archival/retrieval of spatially and/or temporally correlated data
4
Existing File Format and Archive
Unified Traffic Data Format (UTDF)
– ###.v30 file (2880 bytes): one 1-byte volume value per record
– ###.o30 file (5760 bytes): one 2-byte occupancy value per record
– 2880 records per day, one every 30 seconds: record 1 = 00:00:00, record 2 = 00:00:30, record 3 = 00:01:00, ..., record 2880 = 23:59:30
– The ###.v30 and ###.o30 files for 4000 sensors are zipped into one yyyymmdd.traffic file
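The per-sensor layout above can be sketched in Python. This is a minimal illustration, not the author's code: the slide gives only the field widths, so the unsigned interpretation and the big-endian byte order of the 2-byte occupancy values are assumptions, and the function names are hypothetical.

```python
import struct

RECORDS_PER_DAY = 2880  # one record every 30 seconds over 24 hours


def record_time(index):
    """Return the HH:MM:SS start time of a 0-based 30-second record."""
    minutes, seconds = divmod(index * 30, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def decode_volume(v30_bytes):
    """Decode a ###.v30 payload: 2880 one-byte volume values."""
    if len(v30_bytes) != RECORDS_PER_DAY:
        raise ValueError("expected 2880 bytes of volume data")
    # ASSUMPTION: each byte is an unsigned count.
    return list(v30_bytes)


def decode_occupancy(o30_bytes):
    """Decode a ###.o30 payload: 2880 two-byte occupancy values."""
    if len(o30_bytes) != 2 * RECORDS_PER_DAY:
        raise ValueError("expected 5760 bytes of occupancy data")
    # ASSUMPTION: unsigned 16-bit, big-endian; the slide does not say.
    return list(struct.unpack(f">{RECORDS_PER_DAY}H", o30_bytes))
```

Reading a real yyyymmdd.traffic archive would additionally need the standard `zipfile` module and the actual member naming scheme, which the slide only gives as `###`.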
5
Review of Large Data Archive
Data Warehouse
– Inflow: get data from various systems
– Upflow: put data into a more compact form
– Downflow: move the compact data to archival storage
– Outflow: output data to consumers as required
– Metaflow: manage the warehouse itself
6
Why Data Warehouse?
– Simplicity
– Better quality of data
– Fast access
– Platform independent
7
Hierarchical Data Format (HDF)
– File format and library for storing scientific data
– Software includes I/O libraries and tools for analyzing, visualizing, and converting scientific data
– Platform independent
8
Common Data Format (CDF)
– Self-describing data abstraction for the storage and manipulation of multi-dimensional data in a discipline-independent format
– File format and a library
– Transparent data compression
– Platform independent
– API available in C, FORTRAN, Java, and Perl
9
Creating Traffic CDF
Traffic Archive -> C Program (.EXE) -> traffic.cdf
The C program uses the CDF 2.7 C API (DLL, Lib and cdf.h file)
10
Traffic Data Archive in CDF
– Designing the data structure for traffic data
– Setting dimensions
– Setting variances
– Setting CDF variables, CDF data types, CDF attributes (meta-data), and compression algorithm
11
Data Organization
Record Number  Sensor ID  Time      Volume  Occupancy
1              1          00:00:00  10      5.0
               1          00:00:30  1       2.0
               1          00:01:00  2       1.0
               ...
               1          23:59:30  2       3.0
2              2          00:00:00  2       1.0
               2          00:00:30  3       2.5
               2          00:01:00  2       1.0
               ...
3999           3999       00:00:00  0       0.0
               ...
4000           4000       00:00:00  3       4.5
               4000       00:00:30  2       3.0
               ...
               4000       23:59:30  2       1.0
12
Variances Specification for traffic CDF rVariables
                          Sensor ID  Time   Volume  Occupancy
Record Variance           TRUE       FALSE  TRUE    TRUE
First Dimension Variance  FALSE      TRUE   TRUE    TRUE
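What the variance settings buy can be checked with a small model. The sketch below does not use the CDF library; it only counts how many values each rVariable would physically hold, assuming one record per sensor (4000 records), a first dimension of 2880 time slots, and the variance settings read from the slide (Sensor ID: TRUE/FALSE, Time: FALSE/TRUE, Volume and Occupancy: TRUE/TRUE). The class and constant names are hypothetical.

```python
from dataclasses import dataclass

NUM_RECORDS = 4000  # ASSUMPTION: one CDF record per sensor
DIM_SIZE = 2880     # 30-second slots per day along the first dimension


@dataclass
class RVariable:
    name: str
    record_variance: bool  # does the value change from record to record?
    dim_variance: bool     # does the value change along the first dimension?

    def stored_values(self):
        """Count values physically stored; a FALSE variance stores one
        value along that axis and logically repeats it."""
        records = NUM_RECORDS if self.record_variance else 1
        per_record = DIM_SIZE if self.dim_variance else 1
        return records * per_record


TRAFFIC_VARIABLES = [
    RVariable("Sensor ID", record_variance=True, dim_variance=False),
    RVariable("Time", record_variance=False, dim_variance=True),
    RVariable("Volume", record_variance=True, dim_variance=True),
    RVariable("Occupancy", record_variance=True, dim_variance=True),
]

if __name__ == "__main__":
    for var in TRAFFIC_VARIABLES:
        print(f"{var.name}: {var.stored_values()} stored values")
```

Under these assumptions the time axis is stored once (2880 values) instead of once per sensor, and each sensor ID is stored once instead of once per time slot.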
13
CDF Compression Algorithms
Level  Uncompressed CDF  RLE      Huffman  Adaptive Huffman  GZIP
1      67.6 MB           45.4 MB  36.3 MB  29.8 MB           18.7 MB
2      --                --       --       --                17.8 MB
3      --                --       --       --                16.9 MB
4      --                --       --       --                16.6 MB
...
9      --                --       --       --                15.3 MB
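The effect of the GZIP level can be explored outside CDF with Python's standard `zlib` module, which implements the same DEFLATE family of compression. This is only an illustration: the data generator below is a hypothetical stand-in for one day of 1-byte volume values, not real traffic data, and the sizes it produces are not comparable to the figures in the table.

```python
import zlib


def synthetic_volume_day(num_sensors=50, records=2880):
    """Build hypothetical 1-byte volume data: slowly changing small counts."""
    return bytes(
        (sensor + record // 120) % 12
        for sensor in range(num_sensors)
        for record in range(records)
    )


def compressed_sizes(data, levels=range(1, 10)):
    """Return {level: compressed size in bytes} for each zlib level."""
    return {level: len(zlib.compress(data, level)) for level in levels}


if __name__ == "__main__":
    data = synthetic_volume_day()
    print(f"raw: {len(data)} bytes")
    for level, size in compressed_sizes(data).items():
        print(f"level {level}: {size} bytes")
```

Higher levels spend more CPU time searching for matches, which is the trade-off behind choosing an intermediate level rather than level 9.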
14
Data Retrieval in CDF
Inputs: CDF Archive (.cdf) and Station Definition
Process: C Program using CDF API (.EXE)
Output: Volume Count (.txt)
15
Data Archive in SQL Server
Source: Traffic Data Archive (zipped binary files)
Destination: Traffic Data Archive (SQL Server 2000)
Components:
– Visual Basic Interface
– Dynazip ActiveX control
– ADODB Connection
– 32-bit ODBC (DSN)
16
Retrieval Task
Station definitions:
– Station 1: 10069N, detectors 3263, 3264, 3265, 3266
– Station 2: 10069S
– ...
– Station 492: 17750W
Volume computation:
– Station 1: 3263(Vol) + 3264(Vol) + 3265(Vol) + 3266(Vol)
– Station 2: volume computation
– ...
– Station 492: volume computation
Text file output:
– 10069N Total Vol
– 10069S Total Vol
– ...
– 17750W Total Vol
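The station volume computation can be sketched as follows. This is an illustrative Python version, not the C or Visual Basic programs used in the study. It assumes that "Total Vol" means the sum of all 30-second volumes of a station's detectors over the day; the function names and the sample volumes in the usage example are hypothetical, and only the detector list for station 10069N comes from the slide.

```python
def station_volumes(stations, detector_volumes):
    """Sum the volumes of every detector belonging to each station.

    stations: {station name: [detector ids]}
    detector_volumes: {detector id: [volume per 30-second record]}
    """
    totals = {}
    for station, detectors in stations.items():
        totals[station] = sum(sum(detector_volumes[d]) for d in detectors)
    return totals


def format_report(totals):
    """Render one 'station total' line per station, as in the text file."""
    return "\n".join(f"{station} {total}" for station, total in totals.items())


if __name__ == "__main__":
    stations = {"10069N": [3263, 3264, 3265, 3266]}
    # Hypothetical volumes: two records per detector instead of 2880.
    volumes = {3263: [1, 2], 3264: [3, 4], 3265: [5, 6], 3266: [7, 8]}
    print(format_report(station_volumes(stations, volumes)))
```

Because each station needs the full-day series of only a few detectors, the task favors an archive that can read one sensor's record directly rather than unpacking every file.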
17
Results on single-day traffic data
                Binary Uncompressed  CDF        RDBMS
Archival Time   N/A                  5 minutes  6 hours
Size            40 MB                16.6 MB    370 MB
Retrieval Time  N/A                  2 minutes  2 hours
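The gap between the two archives is easier to see as ratios. The short Python sketch below only restates the reported figures in common units (minutes and MB); the dictionary layout and the function name are illustrative.

```python
# Reported single-day results, converted to minutes and megabytes.
RESULTS = {
    "CDF": {"archival_min": 5, "size_mb": 16.6, "retrieval_min": 2},
    "RDBMS": {"archival_min": 6 * 60, "size_mb": 370, "retrieval_min": 2 * 60},
}


def rdbms_to_cdf_ratio(metric):
    """Return how many times larger the RDBMS figure is than the CDF figure."""
    return RESULTS["RDBMS"][metric] / RESULTS["CDF"][metric]


if __name__ == "__main__":
    for metric in ("archival_min", "size_mb", "retrieval_min"):
        print(f"{metric}: RDBMS is {rdbms_to_cdf_ratio(metric):.1f}x CDF")
```

On these figures the RDBMS archive takes 72 times longer to build, 60 times longer to query, and roughly 22 times more space than the CDF archive.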
18
Conclusions
A transportation archive using CDF could be a better archive for the following reasons:
– More data storage with almost no additional storage requirements
– Indexed data allowing random access
– Open standard, portable and free
– Can be used directly with many scientific visualization and analysis packages
19
Conclusions
An RDBMS is less suitable for large-scale traffic data for the following reasons:
– Large storage requirements due to overheads
– Retrieval is comparatively slow
– Initial investment is expensive
20
Future Work
– Using XML with CDF for the web
– Scaling CDF
– Adding more features: variables and attributes