CLASS Information Management Presented at NOAATECH Conference 2006 Presented by Pat Schafer (CLASS-WV Development Lead)


CLASS Information Management Presented at NOAATECH Conference 2006 Presented by Pat Schafer (CLASS-WV Development Lead) November 3, 2005

2 Agenda
– What is CLASS?
– Overview of the CLASS System
– Distributed Redundant Archive
– CLASS Cache Management
– Current and Future CLASS Data Volumes
– Dealing with Larger Data Volumes
– Information Management Research
– Case of CLASS Scalability
– Questions?

3 What is CLASS? CLASS stands for Comprehensive Large Array-data Stewardship System. CLASS is a web-based data archive and distribution system for NOAA’s environmental data. Mission Statement: NOAA's National Data Centers and their world-wide clientele of customers look to CLASS as the sole NOAA IT infrastructure project in which all of NOAA’s current and future environmental data sets will reside. CLASS provides permanent, secure storage, and safe, efficient data discovery and access between the Data Centers and the customers.

4 Overview of the CLASS System

5 Overview of CLASS Activity Controller Controls all activities for the back end systems. Configures processing paths. A processing path is a group of activities that are executed in a specified sequence. There is a trigger for each activity of a processing path. Each activity consists of a process and its parameters. Each process has configurable environment specifications.
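The relationship between processing paths, activities, triggers, and parameters can be pictured with a small data model. This is only an illustrative sketch: the class names, the `trigger` labels, and the two stand-in processes are hypothetical and are not taken from the CLASS code base.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Activity:
    """One step of a processing path: a process plus its parameters."""
    name: str
    process: Callable[[Dict[str, str]], str]    # hypothetical process signature
    parameters: Dict[str, str] = field(default_factory=dict)
    trigger: str = "previous_done"              # hypothetical trigger label


@dataclass
class ProcessingPath:
    """A group of activities executed in a specified sequence."""
    name: str
    activities: List[Activity]

    def run(self) -> List[str]:
        results = []
        for activity in self.activities:
            # A real controller would wait for the activity's trigger here.
            results.append(activity.process(activity.parameters))
        return results


def extract_metadata(params: Dict[str, str]) -> str:
    """Stand-in process: pretend to extract metadata from a file."""
    return f"metadata extracted from {params['file']}"


def archive_file(params: Dict[str, str]) -> str:
    """Stand-in process: pretend to archive a file."""
    return f"{params['file']} archived to {params['target']}"


path = ProcessingPath(
    name="example_ingest_path",
    activities=[
        Activity("extract", extract_metadata, {"file": "sample.dat"}, trigger="file_arrived"),
        Activity("archive", archive_file, {"file": "sample.dat", "target": "tape"}),
    ],
)
results = path.run()
```

Keeping the process and its parameters as data, rather than hard-coding them, is what would let a path be reconfigured without a software change.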

6 Overview of CLASS Ingest Checks for files to ingest periodically. These files are either pushed to CLASS or CLASS pulls them from the data provider. CLASS can also work with manifest files that contain lists of files to ingest. Extracts metadata. Creates inventory records in the database. Archives the data into a robotic tape archive. Puts the data in the local cache. Generates browse data files for AVHRR and GOES data types. Starts the subscription process.
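The manifest and inventory steps can be sketched as follows. The manifest layout (one file name per line, `#` for comments), the `<TYPE>_<rest>` file naming, and the `inventory` table schema are all assumptions made for illustration; the slides do not describe the real formats.

```python
import sqlite3


def parse_manifest(text):
    """Return the file names listed in a manifest (assumed: one per line, '#' comments)."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


def ingest(manifest_text, conn):
    """Create one inventory record per file named in the manifest."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inventory "
        "(file_name TEXT PRIMARY KEY, data_type TEXT)"
    )
    ingested = []
    for name in parse_manifest(manifest_text):
        data_type = name.split("_", 1)[0]   # hypothetical naming: "<TYPE>_<rest>"
        conn.execute("INSERT OR IGNORE INTO inventory VALUES (?, ?)", (name, data_type))
        ingested.append(name)
    conn.commit()
    return ingested
```

For example, `ingest("GOES_001.dat\nAVHRR_002.dat\n", sqlite3.connect(":memory:"))` records both files in an in-memory database. Archiving to tape, caching, and browse generation are left out of the sketch.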

7 Overview of CLASS Delivery The delivery system processes orders. Retrieves order information from the database. Locates files in the temporary or permanent caches or retrieves the files for the order from the robotic tape. Performs data extraction, sub-setting, conversion, etc. upon user request through the order. Encrypts data that is restricted. Generates digital signatures on all files for users who request digital signatures. Copies the order data to the CLASS FTP area. Pushes the data to subscriber users that have requested it. Notifies the user that their data is ready.
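The file-location step of delivery amounts to a fallback lookup. The sketch below assumes the caches can be modeled as dictionaries from file name to path and that the permanent cache is checked first; the function name, the lookup order, and the `retrieve_from_tape` callback are hypothetical.

```python
def locate_file(name, temporary_cache, permanent_cache, retrieve_from_tape):
    """Return (source, path) for a file, preferring on-line caches over tape."""
    if name in permanent_cache:
        return "permanent", permanent_cache[name]
    if name in temporary_cache:
        return "temporary", temporary_cache[name]
    # Not on-line: retrieve from tape and keep a copy in the temporary cache.
    path = retrieve_from_tape(name)
    temporary_cache[name] = path
    return "tape", path
```

A second request for the same file is then served from the temporary cache instead of going back to tape.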

8 Overview of the CLASS WEB Interface Users can register and order data for free. Tomcat, Cocoon, XSL, Java, and JavaScript are used to display information to the user. The web interface uses the VisServer to generate browse images to display to the user. The web interface uses the InvServer to search the inventory and retrieve search results. Users can place data into the shopping cart where it can be ordered. Users can update user preferences and profiles. Approved users can manage user subscriptions. URL:

9 CLASS Cache Management Manages files in three types of caches: permanent, temporary, and delivery. To save disk space, files are stored in the temporary cache for a limited time and are removed once the demand for them is gone. Files not in the on-line caches can be retrieved from the robotic tape archive and stored in the temporary cache for a limited time. Tracks activities on files and file locations in the cache. Parameterization of file cleanup and file storage. Operator interface access to manage caches.
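A parameterized cleanup of the temporary cache might look like the sketch below. It assumes "demand" can be modeled as a count of pending orders and "limited time" as a retention period in seconds; `CacheEntry`, its fields, and `cleanup_temporary_cache` are hypothetical names, not part of CLASS.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    path: str
    last_access: float        # seconds since some epoch
    pending_orders: int = 0   # demand: orders that still need the file


def cleanup_temporary_cache(cache, now, retention_seconds):
    """Remove entries older than the retention period that have no remaining demand."""
    removed = []
    for name in list(cache):
        entry = cache[name]
        expired = now - entry.last_access > retention_seconds
        if expired and entry.pending_orders == 0:
            del cache[name]
            removed.append(name)
    return removed
```

Passing `retention_seconds` as an argument mirrors the idea of parameterized cleanup: an operator could tune the retention period without changing the code.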

10 Distributed Redundant Archive
[Diagram. Labels: Provider; two sites, Suitland and Asheville, each with Ingest process, Operational inventory, Archiver, Archive interchange, and Robotic storage; Operational datastore.]

11 Current and Future Data Volumes
Current:
– Ingest: 71 GB/Day (average)
– Distribute: 120 GB/Day (average)
Future (2010):
– Ingest: 8+ TB/Day
– Distribute: 48+ TB/Day (estimate: 6 times the ingest volume)
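A quick conversion puts these daily volumes in terms of sustained throughput. The sketch assumes decimal units (1 TB = 1,000,000 MB) and a perfectly even load over 24 hours, neither of which the slides state; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day_to_mb_per_sec(tb_per_day: float) -> float:
    """Convert a daily volume in TB to an average rate in MB/s (decimal units)."""
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


def growth_factor(current_gb_per_day: float, future_tb_per_day: float) -> float:
    """Ratio of a future daily volume (TB) to a current daily volume (GB)."""
    return future_tb_per_day * 1_000 / current_gb_per_day


ingest_rate = tb_per_day_to_mb_per_sec(8)      # 2010 ingest estimate
distribute_rate = tb_per_day_to_mb_per_sec(48) # 2010 distribution estimate
ingest_growth = growth_factor(71, 8)
distribute_growth = growth_factor(120, 48)

print(f"Ingest:     {ingest_rate:.0f} MB/s sustained, {ingest_growth:.0f}x current")
print(f"Distribute: {distribute_rate:.0f} MB/s sustained, {distribute_growth:.0f}x current")
```

Under these assumptions the 2010 figures work out to roughly 93 MB/s of sustained ingest and 556 MB/s of sustained distribution, about 113 and 400 times the current averages.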

12 Dealing with Larger Data Volumes
Hardware and communication studies recommended upgrades:
– CPUs: from GHz to GHz.
– Increase RAM from 4 to 8 GB.
– Increase processor-to-memory bandwidth: 25.5 GB/sec.
– Increase remote I/O bandwidth: 8.8 GB/sec.
– SAN for all fibre channel transfers.
– Shared File System (SFS) to eliminate unnecessary file copies.
– System scalability for easy addition of new hardware.
This upgrade will handle the immediate increase of data volume for EOS and NPP data:
– Ingest: 4 TB/Day
– Delivery: 24 TB/Day

13 Information Management Research
Long Term Architecture (LTA)
– CLASS Node Study
– Reprocessing
– APIs for access to CLASS
– Data Models
– External repositories and systems
CLASS Near Term Upgrade
– Upgrade the CLASS Ingest system for optimization and easier integration of new data streams.
– Upgrade the CLASS Delivery system for optimization and easier integration of new data streams.
– Implement an Order Generator to centralize order generation. Needed for API access to CLASS ordering.
Integrating with other systems for CLASS metadata, like the NMMR (NOAA Metadata Manager's Repository).
Delivery of data on physical media.

14 Case of CLASS Scalability
Historical ingest of GOES data. Goal was 400 GB/Day. Started at 100 GB/Day. Now as high as 900+ GB/Day.
Changes to increase the ingest rate:
– Parallelized the ingest process across multiple servers.
– Switched from shared disks to local disks for temporary directories and some caches.
– Increased the RAM from 2 GB to 4 GB.
– Re-configured, in the database, the number of Ingest processes that could run at a time to maximize ingest throughput.
– Turned off operational ingest of GOES data at NCDC and turned it on at Suitland.
– Added GigE network cards to the servers to increase data transfers.
– No software changes were made to increase the ingest rate, only software and hardware configuration changes.
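The idea of raising throughput by changing only a configured concurrency limit can be illustrated with a worker pool. This is a single-machine stand-in that uses threads, whereas CLASS parallelized ingest processes across multiple servers; `ingest_one`, `ingest_all`, and `max_concurrent` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def ingest_one(file_name):
    """Stand-in for ingesting a single file; returns a status string."""
    return f"ingested {file_name}"


def ingest_all(file_names, max_concurrent):
    """Ingest files with at most max_concurrent running at a time.

    max_concurrent plays the role of the configured number of ingest
    processes: changing it alters throughput without touching the code.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(ingest_one, file_names))
```

Calling `ingest_all(files, 2)` and `ingest_all(files, 8)` runs the same code with different degrees of parallelism, which is the kind of configuration-only change the slide describes.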

15 Open Discussion