Metadata Management of Terabyte Datasets from an IP Backbone Network: Experience and Challenges
Sue B. Moon and Timothy Roscoe
5/25/2001, NRDM 2001

Overview
Sprint IP Monitoring Project
Types of Data
Types of Analysis
Experience and Challenges
Metadata Abstractions and Model
Design and Implementation
Sprint IP Monitoring Project
Design goal: to collect data without sampling and without loss of accuracy.
System components:
  – Linux PC with 3 PCI buses and 100 GB
  – DAG card with OC-3 to OC-48 support and GPS
  – SAN-based analysis platform
  – Data repository
Configuration at Monitored PoP
[Figure: configuration at a monitored PoP; surviving label: "customer"]
Analysis Platform and Data Repository at Sprint ATL
Types of Collected Data
Packet trace of 50 to 100 GB
  – 44-byte packet header + 12-byte framing info per packet
BGP routing tables
IS-IS tables
PoP configuration (topology)
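The record sizes on this slide allow a rough estimate of how many packets a single trace holds. The sketch below assumes decimal gigabytes (10^9 bytes) and that a trace contains nothing but fixed-size 56-byte records; neither assumption is stated on the slide, and the function name is illustrative.

```python
# Per-packet record size from the slide: 44-byte header + 12-byte framing info.
HEADER_BYTES = 44
FRAMING_BYTES = 12
RECORD_BYTES = HEADER_BYTES + FRAMING_BYTES  # 56 bytes per packet

# Assumption: decimal gigabytes (10**9 bytes); the slide does not say which.
GB = 10**9


def packets_in_trace(trace_bytes: int) -> int:
    """Number of whole packet records that fit in a trace of the given size."""
    return trace_bytes // RECORD_BYTES


low = packets_in_trace(50 * GB)
high = packets_in_trace(100 * GB)
print(f"{low:,} to {high:,} packets per trace")
```

Under these assumptions a trace holds roughly 0.9 to 1.8 billion packet records.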
Types of Analysis
Simple statistics gathering
Isolation of TCP flows
Trace correlation
Generation of traffic matrices
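As an illustration of one analysis type, "isolation of TCP flows" can be sketched as grouping packets by their 5-tuple. This is a minimal sketch, not the project's actual tool: the `Packet` record and its field names are hypothetical, since the slides do not describe the decoded record layout, and flows are treated as unidirectional.

```python
from collections import defaultdict
from dataclasses import dataclass

TCP = 6  # IANA protocol number for TCP


@dataclass(frozen=True)
class Packet:
    """Hypothetical decoded packet record; field names are illustrative."""
    timestamp: float
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int
    length: int


def isolate_tcp_flows(packets):
    """Group TCP packets by 5-tuple, preserving arrival order within a flow."""
    flows = defaultdict(list)
    for pkt in packets:
        if pkt.protocol != TCP:
            continue  # skip non-TCP traffic
        key = (pkt.src_ip, pkt.dst_ip, pkt.protocol, pkt.src_port, pkt.dst_port)
        flows[key].append(pkt)
    return dict(flows)


sample = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", TCP, 1234, 80, 40),
    Packet(0.1, "10.0.0.3", "10.0.0.2", 17, 5353, 53, 60),  # UDP, ignored
    Packet(0.2, "10.0.0.1", "10.0.0.2", TCP, 1234, 80, 1500),
]
flows = isolate_tcp_flows(sample)
```

Here `flows` contains a single entry whose list holds the two TCP packets in arrival order.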
Challenges
Total amount of data > 10 TB
  – What to keep on-line and off-line
Sharing data and results
  – What has been computed/generated
Correlating different types of data
  – E.g. packet traces with routing tables
Determining software dependencies
Reproducibility of results
Task Abstraction
Storage of data
  – Ad-hoc solution: disk arrays, SAN, tape library
Source code maintenance
  – CVS
Metadata management
  – Our focus in this work
Metadata Abstraction
Raw input data sets
Result data sets
Analysis programs
  – Versions of software
Analysis operations
  – Between data sets and programs
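The four abstractions on this slide can be modelled as plain records. The following is a minimal sketch; all class names, field names, and example values are hypothetical and are not taken from the authors' system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DataSet:
    """A raw input or a result data set. `kind` is 'raw' or 'result'."""
    name: str
    kind: str


@dataclass(frozen=True)
class Program:
    """An analysis program pinned to a specific software version."""
    name: str
    version: str


@dataclass
class Operation:
    """One analysis run: a program applied to inputs, producing outputs."""
    program: Program
    inputs: List[DataSet] = field(default_factory=list)
    outputs: List[DataSet] = field(default_factory=list)


# Illustrative values only.
trace = DataSet("trace-a", "raw")
flows = DataSet("flows-a", "result")
op = Operation(Program("flow-isolator", "1.0"), inputs=[trace], outputs=[flows])
```

Recording the program version with each operation is what ties a result back to the exact software that produced it.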
Design and Implementation
Dependency graph in relational database schema => RDBMS
Interaction with version control
  – Software major release
Linkage to data storage system
  – Make raw data set self-describing
  – Metadata independent of data location
User interface
  – Browsing the DB through a GUI and capturing analysis operations with simple command scripts
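One way to hold a dependency graph in a relational schema is sketched below using Python's built-in `sqlite3` module. The table layout, column names, and sample rows are assumptions made for illustration; the slides do not give the actual schema or name the RDBMS used. A recursive query then answers which data sets were derived from a given raw trace.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT UNIQUE, kind TEXT);
CREATE TABLE program (id INTEGER PRIMARY KEY, name TEXT, version TEXT);
CREATE TABLE operation (id INTEGER PRIMARY KEY,
                        program_id INTEGER REFERENCES program(id));
-- Edges of the dependency graph: which data sets an operation read and wrote.
CREATE TABLE op_input  (op_id INTEGER REFERENCES operation(id),
                        dataset_id INTEGER REFERENCES dataset(id));
CREATE TABLE op_output (op_id INTEGER REFERENCES operation(id),
                        dataset_id INTEGER REFERENCES dataset(id));

-- Illustrative rows: trace-a -> flows-a -> matrix-a.
INSERT INTO dataset VALUES (1, 'trace-a', 'raw'), (2, 'flows-a', 'result'),
                           (3, 'matrix-a', 'result');
INSERT INTO program VALUES (1, 'flow-isolator', '1.0'), (2, 'matrix-gen', '2.1');
INSERT INTO operation VALUES (1, 1), (2, 2);
INSERT INTO op_input  VALUES (1, 1), (2, 2);
INSERT INTO op_output VALUES (1, 2), (2, 3);
""")


def derived_from(conn, dataset_name):
    """Names of all data sets that depend, directly or not, on dataset_name."""
    rows = conn.execute("""
        WITH RECURSIVE downstream(id) AS (
            SELECT id FROM dataset WHERE name = ?
            UNION
            SELECT o.dataset_id
            FROM downstream d
            JOIN op_input  i ON i.dataset_id = d.id
            JOIN op_output o ON o.op_id = i.op_id
        )
        SELECT name FROM dataset
        WHERE id IN (SELECT id FROM downstream) AND name != ?
        ORDER BY name
    """, (dataset_name, dataset_name)).fetchall()
    return [name for (name,) in rows]


print(derived_from(conn, "trace-a"))
```

Because the tables reference data sets by identifier rather than by file path, the metadata stays independent of where the data is stored, in line with the slide's "metadata independent of data location".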
Conclusion and Future Work
Flexible and minimally intrusive
Extensions:
  – Automatic storage management
  – Result caching
  – Job scheduling
  – Automation of analysis
Will results be easily reproducible?
Will users adapt to the new discipline?