
Grid Applications and Repositories
Head, ANU Internet Futures; Lead, APAC Information Infrastructure Program; APAC Grid Services Architect; Grid Services Coordinator, GrangeNet

Overview
- Common and uncommon issues from diverse "Grid" application areas
  - e-Research activities
  - Also relevant to education
  - Large range of community ICT literacy
  - Scholarly input and output
- Slice by issue, dice by application

The bigger context: e-Research + infrastructure
- The use of IT to enhance research
  - ... and education!
  - Access distributed resources transparently
  - Make data readily and appropriately available
  - Make collaboration easier
- Is it "The Grid"?
  - No, and yes: the Grid is one tool in the kit

What are the bits in eRI (e-Research Infrastructure)? From the bottom up:
- Network Layer (physical and transmission)
- (Advanced) Communications Services Layer
- Applications, Grid, Middleware Services Layer
- Applications and Users…

What's in that middle bit? From the bottom up:
- (Advanced) Communications Services Layer
- Computing, Visualisation, Collaboration, Data, Instruments, Middleware
- Applications and Users…

A (local) data architecture: a repository
- Object store: files, DBs, streams, instruments
  - Held on disk, tape, HSM, RAM, …
- Metadata DB: scientific, management, annotation, preservation, access, …
- Access interface
- Presentation interface
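
As a rough illustration of the slide above, the sketch below models a repository as an object store plus a metadata DB behind a small access interface. It is a minimal, hypothetical sketch: the class and method names (`LocalRepository`, `ingest`, `get`, `describe`) are invented for illustration, storage tiers are collapsed into an in-memory dict, and the presentation interface is left out.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class LocalRepository:
    """Hypothetical repository: object store + metadata DB + access interface."""

    # Object store: raw bytes keyed by identifier (stands in for files, DBs,
    # streams or instrument output held on disk, tape, HSM, RAM, ...).
    objects: dict = field(default_factory=dict)
    # Metadata DB: per-object records grouped by kind
    # (scientific, management, annotation, preservation, access, ...).
    metadata: dict = field(default_factory=dict)

    def ingest(self, data: bytes, **metadata_by_kind) -> str:
        """Store an object and its metadata; identical content gets the same id."""
        object_id = hashlib.sha256(data).hexdigest()[:16]
        self.objects[object_id] = data
        self.metadata[object_id] = dict(metadata_by_kind)
        return object_id

    def get(self, object_id: str) -> bytes:
        """Access interface: fetch the raw object."""
        return self.objects[object_id]

    def describe(self, object_id: str, kind: str) -> dict:
        """Access interface: fetch one kind of metadata (empty if none recorded)."""
        return self.metadata[object_id].get(kind, {})


repo = LocalRepository()
oid = repo.ingest(
    b"seismic trace 001",
    scientific={"instrument": "seismometer"},
    preservation={"format": "text/plain"},
)
print(repo.describe(oid, "scientific"))  # {'instrument': 'seismometer'}
```

Keeping metadata kinds separate mirrors the slide's point that scientific, preservation and access metadata have different owners and lifecycles, even for one object.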

It's not just users
- Other services act on users' behalf, or on each other's
  - They must operate within the same frameworks and standards
- [Diagram: a repository federation (Rep., IRP) behind a "portal" or federation interface; users and services (computing, collaboration, visualisation) reach it through access protocols (queries, curation), with AAA services, AAA flows, metadata flows and proxies. Labelled: Data Grids, Federated Repositories, Virtual Collections, …]
- This all applies even with a single repository
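
One way to picture services acting on a user's behalf is a federation interface that authorises a proxying service as the user it represents, and records who asked as whom. The sketch below assumes that delegation rule; `Principal`, `Repository`, `Federation` and the role names are all hypothetical, and authentication is assumed to have happened upstream.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principal:
    name: str
    roles: frozenset
    on_behalf_of: Optional["Principal"] = None  # set when a service acts as a proxy


@dataclass
class Repository:
    name: str
    records: list  # each record: {"title": str, "required_role": str}


class Federation:
    """Hypothetical "portal"/federation interface over several repositories."""

    def __init__(self, repositories):
        self.repositories = repositories
        self.audit_log = []  # accounting: (caller, effective subject, query)

    def query(self, caller: Principal, term: str) -> list:
        # A service acting for a user is authorised as that user,
        # so it can never see more than the user could.
        subject = caller.on_behalf_of or caller
        self.audit_log.append((caller.name, subject.name, term))
        hits = []
        for repo in self.repositories:
            for rec in repo.records:
                if term in rec["title"] and rec["required_role"] in subject.roles:
                    hits.append((repo.name, rec["title"]))
        return hits


fed = Federation([
    Repository("geo", [{"title": "gravity survey", "required_role": "public"}]),
    Repository("bio", [{"title": "gene survey", "required_role": "consortium"}]),
])
alice = Principal("alice", frozenset({"public"}))
viz = Principal("viz-service", frozenset({"public", "consortium"}), on_behalf_of=alice)
print(fed.query(viz, "survey"))  # [('geo', 'gravity survey')]
```

The same checks apply if the federation holds only one repository, which is the slide's closing point.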

Application Areas - 1
- Geosciences
  - Minerals, oils and gases, tectonics
  - Government, surveys, industry
  - Many data sources (spatial and physical) and simulations
- Bioinformatics
  - Genetics, proteomics, …
  - Public datasets, private queries, private annotations

Application Areas - 2
- High Energy Physics
  - Large, expensive instruments and projects
  - Massive data, computation and simulation
- Earth Systems Sciences
  - Climate studies, oceanography
  - Massive remote-sensing datasets, large and complex simulations
- Astronomy
  - Big data, complex reduction processes, big simulations, long-term research

Application Areas - 3
- Linguistics, Musicology
  - Archives of digitised cultural material
  - Complex analyses
- Social Science Data
  - Census, health, surveys, …
  - Complex data structures, qualitative data
- Archaeology
  - Digitised physical materials, spatial and chronological data

Application Areas - 4
- Financial
  - Many sources: SX, FX, news, …
  - Timeliness (low delay, high throughput) and long time scales are both important
- Music, Arts, Sports
  - Performance, formal and practice
  - Education focus

Longevity
- Sustainability
  - Data formats
    - Descriptions, compression, lifetimes
    - Simplex vs complex (compound) objects
  - Software
    - Algorithms, implementations, operating systems
  - Versioning
    - Recalculation, interpretation, validation, derivatives
  - Community valuation and quality
- Underlying infrastructure, technologies
  - Storage facilities
  - Mirroring for protection: policy and technical issues
Areas: Geo, Bio, ESS, Astro, Ling, SS, Arch, Fin, Mus.
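
Mirroring only protects data if the copies are checked. A common technique is a fixity audit: record a checksum at ingest, then periodically compare every mirror against it. The sketch below assumes SHA-256 digests and represents each mirror (with hypothetical site names) as the bytes it currently holds.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit_mirrors(reference_digest: str, mirrors: dict) -> dict:
    """Compare each mirror's copy against a recorded SHA-256 digest.

    `mirrors` maps a (hypothetical) mirror name to the bytes it currently
    holds, or None if the copy is missing.
    """
    report = {}
    for name, data in mirrors.items():
        if data is None:
            report[name] = "missing"
        elif sha256_hex(data) == reference_digest:
            report[name] = "ok"
        else:
            report[name] = "corrupt"
    return report


original = b"census extract 1911"
digest = sha256_hex(original)  # recorded at ingest as preservation metadata
print(audit_mirrors(digest, {
    "site-a": original,
    "site-b": b"census extract 1912",  # silently altered copy
    "site-c": None,
}))
# {'site-a': 'ok', 'site-b': 'corrupt', 'site-c': 'missing'}
```

The technical check is the easy part. Deciding which site is authoritative and whether a copy may be repaired from another jurisdiction is the policy side the slide mentions.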

Metadata
- Varied research schemas
  - One is nice; most have zero or five…
- Baseline DC (Dublin Core)
  - Almost non-existent…
- Provenance and processing
- Preservation, curation and valuation
- Subjective metadata, annotations
- Scientific description
  - Itself subjective, and contentious…
Areas: Geo, Bio, ESS, Astro, Ling, SS, Arch, Fin, Mus.
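
A baseline Dublin Core record can usually be derived from a richer domain schema through a crosswalk. The sketch below uses the 15 elements of the Dublin Core Metadata Element Set 1.1. The survey record and its field names are hypothetical, and fields without a mapping simply stay in the domain schema.

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
)


def to_dublin_core(record: dict, crosswalk: dict) -> dict:
    """Map a domain-specific record onto baseline Dublin Core.

    `crosswalk` maps domain field names to DC element names; unmapped
    fields are omitted from the baseline record.
    """
    dc = {}
    for field_name, value in record.items():
        element = crosswalk.get(field_name)
        if element is None:
            continue
        if element not in DC_ELEMENTS:
            raise ValueError(f"{element!r} is not a Dublin Core element")
        dc.setdefault(element, []).append(value)  # DC elements are repeatable
    return dc


# Hypothetical geoscience record and crosswalk.
survey = {
    "survey_name": "Gravity traverse 7",
    "chief_scientist": "A. Researcher",
    "acquired": "2004-03-18",
    "station_spacing_m": 500,
}
crosswalk = {"survey_name": "title", "chief_scientist": "creator", "acquired": "date"}
print(to_dublin_core(survey, crosswalk))
# {'title': ['Gravity traverse 7'], 'creator': ['A. Researcher'], 'date': ['2004-03-18']}
```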

Lifecycles
- Workflows for data to be:
  - Acquired
  - Ingested, curated
  - Delivered
- Workflows vary over time as we learn things
- ... and as we value things
- Data needs to be reprocessed
  - How does that impact the existing stored data?
- Workflows themselves become part of the metadata, and need to be stored and managed
Areas: Geo, Bio, ESS, Astro, Ling, SS, Arch, Fin, Mus.
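
One possible answer to "how does reprocessing impact stored data?" is to never overwrite. Each reprocessing run derives a new version, and the workflow that produced it is kept as metadata. The sketch below assumes that append-only policy and a simple ordered stage model. `Dataset`, `DataVersion`, `reprocess` and the step names are hypothetical.

```python
from dataclasses import dataclass, field

STAGES = ("acquired", "ingested", "curated", "delivered")


@dataclass
class DataVersion:
    content: list
    workflow: list  # processing steps that produced this version (provenance)


@dataclass
class Dataset:
    versions: list = field(default_factory=list)
    stage: str = "acquired"

    def advance(self, to_stage: str) -> None:
        # Lifecycle stages must be passed through in order.
        if STAGES.index(to_stage) != STAGES.index(self.stage) + 1:
            raise ValueError(f"cannot go from {self.stage} to {to_stage}")
        self.stage = to_stage

    def reprocess(self, step_name: str, func) -> DataVersion:
        """Derive a new version; the stored versions are left untouched and
        the extended workflow is recorded alongside the new content."""
        latest = self.versions[-1]
        new = DataVersion(
            content=[func(x) for x in latest.content],
            workflow=latest.workflow + [step_name],
        )
        self.versions.append(new)
        return new


ds = Dataset(versions=[DataVersion(content=[1.0, 2.0, 3.0], workflow=["raw"])])
ds.advance("ingested")
ds.reprocess("calibrate_v2", lambda x: x * 1.5)
print(len(ds.versions), ds.versions[-1].workflow)  # 2 ['raw', 'calibrate_v2']
```

Append-only versioning trades storage for the ability to recalculate, reinterpret and validate later. Whether that trade is affordable varies by domain.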

Data Movement
- Performance vs political requirements
  - Mirroring/caching; federated repositories
  - Movement across policy boundaries
- Collision with authorisation
  - Some data cannot move (in bulk) from its host
- Appropriate delivery needs
  - Remote/field access to data
  - Clients in a different "circle": bandwidth, compute, language, culture
- Movement protocols
  - Access protocols and inter-repository protocols
  - One standard is great; ten are not
- Resource discovery
Areas: Geo, HEP, Ling, SS, Arch, Fin, Mus.
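
When some data cannot leave its host in bulk, a broker has to choose between shipping data to the client and shipping the computation to the data. The sketch below encodes one hypothetical rule set for that choice. The policy fields, site names and return labels are illustrative assumptions, not a description of any particular grid middleware.

```python
from dataclasses import dataclass


@dataclass
class DatasetPolicy:
    host: str
    bulk_export_allowed: bool
    mirrors: tuple = ()  # sites already holding an approved copy


def plan_access(policy: DatasetPolicy, client_site: str) -> tuple:
    """Decide how a client at `client_site` should work with a dataset.

    Hypothetical rules: use a local copy if one exists; copy in bulk only
    when policy allows it; otherwise run the computation at the host.
    """
    if client_site == policy.host or client_site in policy.mirrors:
        return ("local", client_site)
    if policy.bulk_export_allowed:
        return ("copy", policy.host)
    return ("remote-compute", policy.host)


health = DatasetPolicy(host="data-centre-a", bulk_export_allowed=False)
climate = DatasetPolicy(host="data-centre-b", bulk_export_allowed=True, mirrors=("campus-c",))
print(plan_access(health, "campus-c"))     # ('remote-compute', 'data-centre-a')
print(plan_access(climate, "campus-c"))    # ('local', 'campus-c')
print(plan_access(climate, "field-site"))  # ('copy', 'data-centre-b')
```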

Rights
- Needs AAA to be working, and to scale
  - Authentication, Authorisation and Accounting
  - Requires identities, roles and policies to be understood
- Privacy, security
  - Personal information leakage
  - Anonymised and de-identified data needs to stay usable
- Ownership
  - Not always with the researcher
- Time-varying
  - Data sourced under old agreements
  - Rights vary by status of source: people die, agreements expire, …
Areas: Geo, Bio, HEP, ESS, Astro, Ling, SS, Arch, Fin, Mus.
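
Time-varying rights mean an authorisation decision depends on the date as well as the user's roles, and accounting should record every decision. The sketch below assumes authentication has already established the user and their roles, and models an agreement as a set of allowed roles plus an optional expiry date. `Agreement`, `authorise` and the example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Agreement:
    """Hypothetical access agreement attached to a dataset."""
    allowed_roles: frozenset
    expires: Optional[date] = None  # None: no expiry recorded


def authorise(user: str, user_roles: set, agreement: Agreement,
              on: date, ledger: list) -> bool:
    """Authorisation check that also appends an accounting entry."""
    in_force = agreement.expires is None or on <= agreement.expires
    granted = in_force and bool(user_roles & agreement.allowed_roles)
    ledger.append((user, on.isoformat(), granted))
    return granted


ledger = []
oral_history = Agreement(frozenset({"linguist"}), expires=date(2010, 12, 31))
print(authorise("bob", {"linguist"}, oral_history, date(2008, 6, 1), ledger))  # True
print(authorise("bob", {"linguist"}, oral_history, date(2011, 1, 1), ledger))  # False
```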

Types
- Digital
- Non-digital: paintings, objects, manuscripts
- Semi-digital: books, texts, images, film
- Quantitative and qualitative
  - Describing, searching and finding useful qualitative data is hard
Areas: Ling, SS, Arch, Fin

Processing
- Data fusion
  - Single or multiple repositories
- Data slicing, latitudinal searches
  - Impacts technology choices
- Interfaces for non-humans: computing, collaboration, visualisation
Areas: Geo, Bio, Chem, HEP, ESS, Astro, Ling, SS, Fin
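
Data fusion across repositories often reduces to joining records on a shared key. The sketch below shows a simple inner join over any number of sources. The key name, field names and values are hypothetical, and real fusion would also need to reconcile units, schemas and provenance.

```python
def fuse(key: str, *sources: list) -> list:
    """Join records from one or more repositories on a shared key.

    Only keys present in every source are kept (an inner join); if two
    sources use the same field name, the later source's value wins.
    """
    indexed = [{rec[key]: rec for rec in src} for src in sources]
    common = set(indexed[0])
    for idx in indexed[1:]:
        common &= set(idx)
    fused = []
    for k in sorted(common):
        merged = {}
        for idx in indexed:
            merged.update(idx[k])
        fused.append(merged)
    return fused


# Hypothetical records from two repositories.
gravity = [{"site": "S1", "gravity_mGal": 12.3}, {"site": "S2", "gravity_mGal": 9.8}]
geochem = [{"site": "S2", "cu_ppm": 41}, {"site": "S3", "cu_ppm": 7}]
print(fuse("site", gravity, geochem))
# [{'site': 'S2', 'gravity_mGal': 9.8, 'cu_ppm': 41}]
```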

Summary
- Common and uncommon issues from diverse application areas
- One size (of infrastructure) does not fit all (yet)
  - But 3-4 (40?) sizes may fit most (for now)
- Some domains have very different definitions of sustainability, rights issues, data movement, etc.
  - But many don't…
- User and developer education is still needed