The University of Texas Research Data Repository : “Corral” A Geographically Replicated Repository for Research Data Chris Jordan.

Presentation transcript:

The University of Texas Research Data Repository: “Corral”
A Geographically Replicated Repository for Research Data
Chris Jordan

The University of Texas System
15 Total Institutions
  – 9 “Academic”, i.e. non-medical
  – 6 Health Institutions
Most prominently:
  – UT Austin
  – UT Southwestern Medical School
  – MD Anderson Cancer Center
  – UT Health (Houston)

UT Research Cyberinfrastructure
University of Texas System assessment of needs for research cyberinfrastructure:
  – Research data storage
  – High-Performance Computing
  – Dedicated 10 gigabit networking for research
$23 Million total investment in 2011
$15 Million for network infrastructure
$4 Million for research data infrastructure

UT Data Repository Requirements
5 Petabyte usable capacity
Geographically replicated
Open to all UT System researchers
Scalable in performance and capacity
Support for sharing data outside UT System
Support for data management functions

UTDR Implementation: Corral
2 Installations (Austin and Arlington):
  – 12 Dell Servers w/ InfiniBand & 10GigE
  – 2 DataDirect Networks storage controllers
  – 5PB raw capacity, 3TB SATA drives
  – 350TB raw capacity, 600GB SAS drives
  – 20GB/sec measured performance
IBM General Parallel File System
iRODS data management software
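
The raw capacities and drive sizes on this slide imply a rough drive count. The sketch below is only back-of-the-envelope arithmetic, not a description of the actual hardware layout. It assumes decimal units (1 PB = 1,000,000 GB) and ignores RAID parity, hot spares, and whether the figures are per site or in total. The function name `drives_needed` is hypothetical.

```python
import math


def drives_needed(raw_capacity_gb: int, drive_size_gb: int) -> int:
    """Smallest whole number of drives whose combined size covers the raw capacity."""
    return math.ceil(raw_capacity_gb / drive_size_gb)


# Figures from the slide, converted to GB under the decimal-unit assumption.
sata_drives = drives_needed(5_000_000, 3_000)  # 5 PB raw on 3 TB SATA drives
sas_drives = drives_needed(350_000, 600)       # 350 TB raw on 600 GB SAS drives

print(f"SATA drives: {sata_drives}")
print(f"SAS drives:  {sas_drives}")
```

Under these assumptions the SATA tier works out to roughly 1,700 drives and the SAS tier to roughly 600.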

Data Management Functions
Collaboration with UT Library to develop data management planning resources
Consulting support for researchers developing proposals and research plans
Development support for structured data, data management tools, metadata, web
Up to 5 year commitments for data storage
Not yet truly “long-term” preservation

Allocation Model
5TB “free” for UT Principal Investigators
Web-based account signup/allocation request
Human review for allocations at present
Plan to automate approvals using InCommon
Larger allocations on a $250/TB/year basis
Allocations often coupled with HPC or analysis resource access
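
The pricing on this slide can be sketched as a simple function. The slide does not say whether the $250/TB/year rate applies to the whole allocation or only to the part above the free 5TB, so this sketch assumes the latter. The names `FREE_TB`, `RATE_PER_TB_YEAR`, and `annual_cost` are illustrative and not part of any Corral tooling.

```python
FREE_TB = 5              # free tier per Principal Investigator (from the slide)
RATE_PER_TB_YEAR = 250   # US dollars per TB per year (from the slide)


def annual_cost(allocated_tb: float) -> float:
    """Yearly charge in dollars, ASSUMING only capacity above the free tier is billed."""
    billable_tb = max(0.0, allocated_tb - FREE_TB)
    return billable_tb * RATE_PER_TB_YEAR


for tb in (2, 5, 25, 100):
    print(f"{tb:>4} TB -> ${annual_cost(tb):,.0f} per year")
```

If the rate instead applies to the full allocation once a PI exceeds the free tier, the `max(...)` line would change to bill `allocated_tb` directly.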

Sustainability Plan
Expectation of ongoing subsidy from UT for “free” tier of storage
Recharge tier provides for ongoing expansion/replacement of hardware for paying users
Subsidy not required for continued operation

Adoption/Usage
Over 100 PIs within first year of operation
Data growth about 10% per month
~10% of PIs account for ~50% of usage
Usage in almost all disciplines/departments
Major adoption from genome researchers
  – Up to 80% of new requests in some months
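
To put "about 10% per month" in perspective, the sketch below treats it as steady compound growth, which is an assumption; real usage is unlikely to grow this smoothly. It computes the implied doubling time and how long a given volume would take to reach the 5 PB capacity target. The 500 TB starting point is purely hypothetical, since the slides do not state current usage.

```python
import math

MONTHLY_GROWTH = 0.10  # "about 10% per month", from the slide


def doubling_time_months(rate: float = MONTHLY_GROWTH) -> float:
    """Months for stored data to double under steady compound growth."""
    return math.log(2) / math.log(1 + rate)


def months_until_full(current_tb: float, capacity_tb: float,
                      rate: float = MONTHLY_GROWTH) -> float:
    """Months until current_tb grows to capacity_tb at the given monthly rate."""
    if current_tb <= 0:
        raise ValueError("current_tb must be positive")
    if current_tb >= capacity_tb:
        return 0.0
    return math.log(capacity_tb / current_tb) / math.log(1 + rate)


print(f"Doubling time: {doubling_time_months():.1f} months")
# Hypothetical starting point of 500 TB against a 5 PB (5000 TB) target.
print(f"500 TB -> 5 PB: {months_until_full(500, 5000):.1f} months")
```

At this rate data roughly triples each year, which is consistent with the emphasis on scalable capacity and a recharge tier that funds expansion.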

Future Plans
Implement true “long-term” preservation via Digital Preservation Network
Expand to third site for increased replication
Support HIPAA, other high-security data types
Increase support of web-based data sharing

Q & A
Thanks to:
  – UT Library Data Management team
  – TACC Data Management and Collections Group
  – Digital Preservation Network