Grid Applications and Repositories Head, ANU Internet Futures, Lead, APAC Information Infrastructure Program, APAC Grid Services Architect, Grid Services Coordinator, GrangeNet
Overview Common and Uncommon Issues from Diverse “Grid” Application Areas E-Research activities Also relevant to education Large range of community ICT literacy Scholarly Input and Output Slice by issue, dice by application
The bigger context: e-Research + infrastructure The use of IT to enhance research and education! Access distributed resources transparently Make data readily and appropriately available Make collaboration easier Is it The Grid ? No, and yes – the Grid is a tool in the kit
What are the bits in eRI? Network Layer (Physical and Transmission) (Advanced) Communications Services Layer Applications, Grid, Middleware Services Layer Applications and Users…
What’s in that middle bit? Computing Visualisation Collaboration Data Instruments Middleware (Advanced) Communications Services Layer Applications and Users…
A (local) data architecture A Repository Object Store Files, DB, streams, instruments Metadata DB Scientific, Management, Annotation, Preservation, Access,… Access Interface Presentation Interface Disk, Tape, HSM, RAM, …
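The pieces on this slide can be sketched in a few lines of Python. This is a minimal illustration only, assuming an in-memory object store and metadata DB; the class name `Repository`, its methods and the sample identifiers are hypothetical and do not come from any particular repository product.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Toy local repository: an object store plus a metadata DB, both in memory."""
    # Stands in for the object store (disk, tape, HSM, RAM, ...).
    objects: dict[str, bytes] = field(default_factory=dict)
    # Stands in for the metadata DB (scientific, management, access, ...).
    metadata: dict[str, dict[str, str]] = field(default_factory=dict)

    def ingest(self, object_id: str, payload: bytes, **descriptive: str) -> None:
        """Store the payload and its metadata under one identifier."""
        self.objects[object_id] = payload
        self.metadata[object_id] = dict(descriptive)

    def fetch(self, object_id: str) -> bytes:
        """Access interface, data side: return the stored bytes."""
        return self.objects[object_id]

    def describe(self, object_id: str) -> dict[str, str]:
        """Access interface, metadata side: return a copy of the record."""
        return dict(self.metadata[object_id])

    def find(self, key: str, value: str) -> list[str]:
        """Return sorted identifiers whose metadata has key == value."""
        return sorted(oid for oid, md in self.metadata.items() if md.get(key) == value)


# Hypothetical sample objects.
repo = Repository()
repo.ingest("survey-001", b"lat,lon,depth\n", discipline="geo", access="public")
repo.ingest("corpus-007", b"<recording/>", discipline="ling", access="restricted")
print(repo.find("discipline", "geo"))
```

Keeping the object store and the metadata DB behind one access interface means a presentation layer, or another service, never needs to know where the bytes are physically held.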
It’s not just users Other services act on users’ behalf, or each other’s Must operate within the same frameworks and standards [Diagram labels: Users; Computing, Collaboration, Visualisation; “Portal” or Federation interface; Repository Federation; Rep.; IRP; AAA Services; Access protocols; Queries, Curation; Metadata flows; AAA flows; proxy; Data Grids, Federated Repositories, Virtual Collections, …] This all applies even with a single repository
Application Areas - 1 Geosciences Minerals, oils and gases, tectonics Govt, Surveys, Industry Many data sources (spatial and physical) and simulations Bioinformatics Genetics, proteomics, … Public datasets, private queries, private annotations
Application Areas - 2 High Energy Physics Large expensive instruments, projects Massive data, computation and simulation Earth Systems Sciences Climate studies, oceanography Massive remote sensing data set, large and complex simulations Astronomy Big data, complex reduction process, big simulations, long-term research
Application Areas - 3 Linguistics, Musicology Archives of digitised cultural material Complex analyses Social Science Data Census, health, surveys, … Complex data structures, qualitative data Archaeology Digitised physical materials, spatial and chronological data
Application Areas - 4 Financial Many sources, SX, FX, news, … Timeliness (low-delay, high-throughput) and long time scales are important Music, Arts, Sports Performance, formal and practice Education focus
Longevity Sustainability Data formats Descriptions, Compression, lifetimes Simplex vs Complex (compound) objects Software Algorithms, implementations, Operating Systems Versioning Recalculation, interpretation, validation, derivatives Community valuation and quality Underlying infrastructure, technologies Storage Facilities Mirroring for protection – policy and technical issues Geo, Bio, ESS, Astro, Ling, SS, Arch, Fin, Mus.
Metadata Varied research schemas 1 is nice, most have zero or five… Baseline Dublin Core (DC) Almost non-existent… Provenance and processing Preservation, curation and valuation Subjective metadata, annotations Scientific description Itself subjective, and contentious… Geo, Bio, ESS, Astro, Ling, SS, Arch, Fin, Mus.
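One way to see how far a collection is from a baseline is to check each record against the Dublin Core elements. The sketch below assumes records are plain dictionaries keyed by lower-case element name; the helper `missing_dc_elements` and the sample record are hypothetical, and which elements a repository treats as required is a local policy choice.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def missing_dc_elements(record, required=DC_ELEMENTS):
    """Return the required element names that are absent or blank in the record."""
    return [name for name in required if not str(record.get(name, "")).strip()]


# Hypothetical record with a blank rights statement and no identifier.
record = {
    "title": "Hypothetical seabed survey",
    "creator": "Example Survey Team",
    "date": "2005",
    "rights": "",
}

# Assumed local policy: only these five elements are mandatory.
print(missing_dc_elements(record, required=("title", "creator", "date", "rights", "identifier")))
```

A report like this says nothing about the discipline-specific schemas, provenance or annotations listed above; it only measures the baseline.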
Lifecycles Workflows for data to be Acquired Ingested, Curated Delivered Vary over time as we learn things Vary over time as we value things Data needs to be reprocessed How does that impact the existing stored data? Workflows themselves become part of the metadata and need to be stored and managed Geo, Bio, ESS, Astro, Ling, SS, Arch, Fin, Mus.
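If the workflow that produced each derived object is recorded as metadata, the question of what reprocessing does to existing stored data becomes a query. The following is a minimal sketch under that assumption; `Derivative`, `stale_derivatives`, the workflow name "reduce" and the object identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivative:
    """Provenance record: which workflow, at which version, produced an object."""
    object_id: str
    source_id: str
    workflow: str
    workflow_version: int


def stale_derivatives(derivatives, current_versions):
    """Return ids of derivatives made by an older version of their workflow.

    Workflows missing from current_versions are treated as up to date.
    """
    return [
        d.object_id
        for d in derivatives
        if d.workflow_version < current_versions.get(d.workflow, d.workflow_version)
    ]


# Hypothetical holdings: two reduced images, one made before the workflow changed.
holdings = [
    Derivative("img-1-reduced", "img-1", "reduce", 1),
    Derivative("img-2-reduced", "img-2", "reduce", 2),
]
print(stale_derivatives(holdings, {"reduce": 2}))
```

Whether a stale derivative is recomputed, kept alongside the new one, or retired is a curation decision that the record alone cannot make.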
Data Movement Performance vs political requirements Mirroring/Caching; federated repositories Movement across policy boundaries Collision with authorisation Some data cannot move from its host (in bulk) Appropriate Delivery needs Remote/field access to data Clients in a different ‘circle’ Bandwidth, compute, language, culture Movement Protocols Access protocols and inter-repository protocols One standard is great – ten are not Resource discovery Geo, HEP, Ling, SS, Arch, Fin, Mus.
Rights Needs AAA to be working, to scale Authentication, Authorisation and Accounting Requires identities and roles and policies to be understood Privacy, Security Personal information leakage Anonymised and de-identified data, needs to stay usable Ownership Not always with the researcher Time-varying Data sourced under old agreements Rights vary by status of source: people die, agreements expire, … Geo, Bio, HEP, ESS, Astro, Ling, SS, Arch, Fin, Mus.
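Time-varying rights mean an authorisation decision depends on the date as well as the role. A minimal sketch, assuming each dataset carries a single agreement with a validity window and a set of permitted roles; `Agreement`, `may_access` and the sample dates are hypothetical, and a real AAA service would also handle authentication and accounting.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Agreement:
    """Terms under which a dataset was sourced (simplified)."""
    allowed_roles: frozenset
    valid_from: date
    valid_until: Optional[date] = None  # None means no expiry is recorded


def may_access(role: str, agreement: Agreement, on: date) -> bool:
    """Authorise only if the role is permitted and the agreement is in force on that date."""
    if role not in agreement.allowed_roles:
        return False
    if on < agreement.valid_from:
        return False
    return agreement.valid_until is None or on <= agreement.valid_until


# Hypothetical agreement that expired at the end of 2005.
old_agreement = Agreement(frozenset({"researcher"}), date(1995, 1, 1), date(2005, 12, 31))
print(may_access("researcher", old_agreement, date(2004, 6, 1)))
print(may_access("researcher", old_agreement, date(2006, 1, 1)))
```

The same researcher is allowed in 2004 and refused in 2006, which is the behaviour the slide describes for data sourced under old agreements.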
Types Digital Non-Digital Paintings, Objects, Manuscripts Semi-Digital Books, texts, images, film Quantitative and Qualitative Describing, searching and finding useful qualitative data is hard Ling, SS, Arch, Fin
Processing Data fusion Single or multiple repositories Data slicing, latitudinal searches Impacts technology choices Interfaces for non-humans: computing, collaboration, visualisation Geo, Bio, Chem, HEP, ESS, Astro, Ling, SS, Fin
Summary Common and Uncommon Issues from Diverse Application Areas One size (infrastructure) does not fit all (yet) But 3-4 (40?) sizes may fit most (for now) Some domains have very different definitions of sustainability, rights issues, data movement, etc. But many don’t… User and developer education is still needed