The data challenge in astronomy: archives, technology, problems, solution (the virtual observatory). Andy Lawrence, DCC conference, Bath, Sep 2005.

astronomical archives (1)

IT in astronomy: key areas: (1) facility operations; (2) facility output processing; (3) shared supercomputers for theory; (4) science archives; (5) end-user tools. (1-3): big bucks. (4-5): smaller bucks, but they produce the final science output and set requirements for (1-2).

astronomical archives: major archives growing at TB/yr; issue is not storage but management (curation); improving quality of data access and presentation; needs specialist data centres

end users: increasing fraction of archive re-use; increasing multi-archive use; most download small files and analyse at home; some users process whole databases; reduction standardised, analysis home grown

needles in a haystack (Hambly et al): faint moving object is a cool white dwarf; may be solution to the dark matter problem; but hard to find: one in a million; even harder across multiple archives

failed stars: compare optical and infra-red; the extra object is very cold, a "brown dwarf" or failed star
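The optical versus infra-red comparison above amounts to a positional cross-match that keeps the infra-red sources with no optical counterpart. A minimal sketch in Python, assuming a brute-force match and a 1 arcsec match radius; the catalogue entries are made-up coordinates for illustration, not real sources, and the function names are hypothetical.

```python
import math

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcsec (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600

def unmatched(infrared, optical, radius_arcsec=1.0):
    """Return infra-red sources with no optical source within the match radius."""
    return [
        src for src in infrared
        if all(separation_arcsec(src[1], src[2], o[1], o[2]) > radius_arcsec
               for o in optical)
    ]

# Made-up (name, RA deg, Dec deg) entries, for illustration only.
optical = [("opt-1", 10.0000, -5.0000), ("opt-2", 10.0100, -5.0100)]
infrared = [("ir-1", 10.0001, -5.0000),
            ("ir-2", 10.0100, -5.0101),
            ("ir-3", 10.0200, -5.0200)]

candidates = unmatched(infrared, optical)
print([name for name, _, _ in candidates])
```

The brute-force loop costs one comparison per pair of sources, which is fine for a sketch but not for billion-row tables; a real cross-match would use a spatial index.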

multi-wavelength views of a Supernova Remnant: shocks seen in the X-ray; heavy elements seen in the optical; dust seen in the IR; relativistic electrons seen in the radio

solar-terrestrial links: coronal mass ejection imaged by space-based solar observatory; effect detected hours later by satellites and ground radar

background technology (2)

dogs and fleas there is a very large dog

hardware trends: ops, storage, bw all 1000x/decade (can get 1TB IDE = $5K; backbones and LANs are Gbps); but device bw 10x/decade (real PC disks 10MB/s; fibre channel SCSI poss 100MB/s); and last mile problem remains (end-end b/w typically 10Mbps)
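The per-decade figures above can be converted into yearly rates to show how quickly the gap opens. A minimal sketch in Python that takes the slide's 1000x and 10x per decade at face value; the function names are illustrative.

```python
def yearly_factor(per_decade: float) -> float:
    """Convert a growth factor per decade into the equivalent factor per year."""
    return per_decade ** (1 / 10)

capacity = yearly_factor(1000)  # ops, storage, backbone bandwidth
device_bw = yearly_factor(10)   # device (disk) bandwidth

def gap_after(years: int) -> float:
    """Ratio of capacity to device bandwidth after some years, starting from 1."""
    return (capacity / device_bw) ** years

print(f"capacity grows about {capacity:.2f}x per year")
print(f"device bandwidth grows about {device_bw:.2f}x per year")
print(f"gap after 10 years: about {gap_after(10):.0f}x")
```

On these numbers capacity roughly doubles each year while device bandwidth grows by about a quarter, so the amount of data per unit of disk bandwidth grows a hundredfold per decade.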

operations on a TB database: searching at 10 MB/s takes a day (solved by parallelism, but development non-trivial ==> people); transfer at 10 Mbps takes a week (leave it where it is) ==> data centres provide search and analysis services
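The two timings above follow from simple division. A minimal sketch in Python, assuming a decimal terabyte (10^12 bytes) and sustained rates with no overhead; the function names and the disk count of 24 are illustrative.

```python
TB = 1e12                 # bytes; decimal terabyte (assumption)
SECONDS_PER_DAY = 86_400

def scan_days(size_bytes: float, bytes_per_s: float, disks: int = 1) -> float:
    """Days to read the whole database, optionally spread evenly over several disks."""
    return size_bytes / (bytes_per_s * disks) / SECONDS_PER_DAY

def transfer_days(size_bytes: float, bits_per_s: float) -> float:
    """Days to move the whole database over a network link."""
    return size_bytes * 8 / bits_per_s / SECONDS_PER_DAY

scan = scan_days(TB, 10e6)           # search at 10 MB/s
parallel = scan_days(TB, 10e6, 24)   # same search over 24 disks (illustrative)
transfer = transfer_days(TB, 10e6)   # transfer at 10 Mbps

print(f"scan: {scan:.2f} days, parallel: {parallel * 24:.2f} hours, "
      f"transfer: {transfer:.2f} days")
```

The scan comes out slightly over a day and the transfer at a little over nine days, the same order as the slide's "a week".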

network development: higher level protocols ==> transparency
TCP/IP : message exchange
HTTP : doc sharing (web)
grid suite : CPU sharing
XML/SOAP : data exchange
==> service paradigm

next up on the internet: workflow definition; dynamic semantics (ontology); software agents

the problems (3)

data growth: astronomical data is growing fast, but so is computing power, so what's the problem? (1) Heterogeneity (2) End user delivery (3) End user demand

data rich future: heritage (Schmidt, IRAS, Hipparcos); current hits (VLT, SDSS, 2MASS, HST, Chandra, XMM, WMAP); coming up (UKIDSS, VISTA, ALMA, JWST, Planck, Herschel); cross fingers (LSST, ELT, LISA, Darwin, SKA, XEUS, etc.); plus lots more. issue is archive interoperability: need standards and transparent infrastructure

archive data rates: map the sky at 0.1" x 16 bits = 100 TB; process to find objects: billion row tables; VISTA 100 TB/yr by 2007; SKA datacubes 100PB/yr by 2020; not a technical or financial problem (LHC doing 100PB/yr by 2007); issue is logistic: data management; need professional data centres
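The 100 TB figure for mapping the sky can be checked from the sky area and the pixel size. A minimal sketch in Python, assuming a single pass over the whole sky in one band with square 0.1 arcsec pixels, 16 bits per pixel, and no compression or overlap; the function name is illustrative.

```python
import math

def sky_map_terabytes(pixel_arcsec: float = 0.1, bits_per_pixel: int = 16) -> float:
    """Size in decimal TB of one full-sky image at the given pixel scale."""
    sky_deg2 = 4 * math.pi * (180 / math.pi) ** 2   # whole sky, about 41253 square degrees
    sky_arcsec2 = sky_deg2 * 3600 ** 2               # square degrees to square arcsec
    pixels = sky_arcsec2 / pixel_arcsec ** 2
    return pixels * bits_per_pixel / 8 / 1e12

print(f"full sky: about {sky_map_terabytes():.0f} TB")
```

Under these assumptions the result lands just above 100 TB, in line with the slide.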

data rates: user delivery. disk I/O and bandwidth: end-user bottlenecks will get WORSE, but links between data centres can be good. move from download to service paradigm: leave the data where it is; operations on data (search, cluster analysis, etc) as services; shift the results not the data; networks of collaborating data centres (datagrid or VO)
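The difference between the download and service paradigms can be shown with a toy model. A minimal sketch in Python: the `DataCentre` class, the 100-byte row size, and the synthetic catalogue are all assumptions made for illustration, not a description of any real archive interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

ROW_BYTES = 100  # assumed average row size, for illustration only

@dataclass
class DataCentre:
    """Hypothetical data centre that holds a catalogue and can search it locally."""
    rows: List[dict]

    def download_all(self) -> Tuple[List[dict], int]:
        """Download paradigm: ship every row to the user."""
        return list(self.rows), len(self.rows) * ROW_BYTES

    def search(self, predicate: Callable[[dict], bool]) -> Tuple[List[dict], int]:
        """Service paradigm: filter next to the data and ship only the matches."""
        hits = [row for row in self.rows if predicate(row)]
        return hits, len(hits) * ROW_BYTES

# Synthetic catalogue with made-up values; one row in a thousand passes the cut.
centre = DataCentre([{"id": i, "motion": i % 1000} for i in range(100_000)])

def fast_mover(row: dict) -> bool:
    return row["motion"] > 998

all_rows, bytes_download = centre.download_all()
local_hits = [row for row in all_rows if fast_mover(row)]
remote_hits, bytes_service = centre.search(fast_mover)

print(f"download: {bytes_download} bytes, service: {bytes_service} bytes")
```

Both routes give the same answer; the service route moves a thousand times fewer bytes in this toy case, and the saving grows as the objects of interest get rarer.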

user demands: bar constantly rising (online ease; multi-archive transparency; easy data intensive science); new requirements (automated resource discovery, an "intelligent Google"; cheap I/O and CPU cycles; new standards and software infrastructure)

the virtual observatory (4)

the VO concept: web = all docs in the world inside your PC; VO = all databases in the world inside your PC

Generic science drivers: data growth; multi-archive science; large database science (can do all this now, but needs to be fast and easy); empowerment (Beijing as good as Berkeley)

what it's not: not a monolith; not a warehouse

VO framework: framework + standards; inter-operable data; inter-operable software modules; no central VO-command - it's not a thing - it's a way of life

VO geometry: not a warehouse; not a hierarchy; not a peer-to-peer system; small set of service centres and large population of end users (note: latest hot database lives with creators / curators)

yesterday [diagram]: browser <-> front end (CGI request out, html web page back) <-> DB engine (SQL in, data out)

today [diagram]: application <-> web service (SOAP/XML request out, SOAP/XML data back) <-> DB engine (SQL in, native data out) or anything; standard formats
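A SOAP/XML request of the kind the slide refers to is an XML envelope wrapping an operation and its parameters. A minimal sketch in Python using the standard library: the envelope namespace is the SOAP 1.1 one, while the service namespace, the `SearchCatalogue` operation, and its `ra`, `dec`, and `radius` parameters are hypothetical and not taken from any real archive service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:archive"                         # hypothetical service namespace

def build_request(ra_deg: float, dec_deg: float, radius_deg: float) -> bytes:
    """Build a SOAP envelope for a hypothetical SearchCatalogue operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{SVC_NS}}}SearchCatalogue")  # hypothetical
    for name, value in (("ra", ra_deg), ("dec", dec_deg), ("radius", radius_deg)):
        ET.SubElement(operation, f"{{{SVC_NS}}}{name}").text = repr(value)
    return ET.tostring(envelope, encoding="utf-8")

request_xml = build_request(180.0, -30.0, 0.1)
print(request_xml.decode("utf-8"))
```

The application only needs to agree with the service on this message format; what sits behind the service (a DB engine or anything else) stays hidden.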

tomorrow [diagram]: application; job / results; multiple web services in front of "anything"; Registry; Workflow; GLUE; Certification; VO Space; standard semantics; publish WSDL; grid connected
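The roles of the registry and the workflow in this picture can be sketched with a toy model. A minimal sketch in Python: the `Registry` class, the capability names, and the stand-in services are all hypothetical and only illustrate publish, discover, and chain; they do not represent any IVOA or AstroGrid interface.

```python
from typing import Callable, Dict, Iterable, List

class Registry:
    """Hypothetical registry: maps a capability name to the services offering it."""

    def __init__(self) -> None:
        self._services: Dict[str, List[Callable]] = {}

    def publish(self, capability: str, service: Callable) -> None:
        """A service announces what it can do."""
        self._services.setdefault(capability, []).append(service)

    def discover(self, capability: str) -> List[Callable]:
        """An application asks who offers a capability."""
        return list(self._services.get(capability, []))

def run_workflow(registry: Registry, steps: Iterable[str], job):
    """Pass the job through one discovered service per step, in order."""
    result = job
    for capability in steps:
        services = registry.discover(capability)
        if not services:
            raise LookupError(f"no service registered for {capability!r}")
        result = services[0](result)
    return result

registry = Registry()
# Stand-ins for a catalogue search and an analysis service.
registry.publish("search", lambda job: [x for x in job if x % 2 == 0])
registry.publish("analyse", lambda rows: sum(rows))

result = run_workflow(registry, ["search", "analyse"], range(10))
print(result)
```

The application names capabilities rather than hosts, which is what lets services be added or moved without changing the application.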

publishing metaphor: facilities are authors; data centres are publishers; VO portals are shops; end-users are readers; VO infrastructure is distribution system.

International VO alliance (IVOA)

IVOA standards: formal process modelled on W3C; technical working groups and interop workshops; agreed functionality roadmap

IVOA standards: key standards so far (table formats; resource and service metadata definitions; semantic dictionary; protocols for image and spectrum access); coming year (grid and web service interfaces; authentication; storage sharing protocols; application metadata and interfaces)

state of implementations: key projects: AstroGrid, US-NVO, Euro-VO; many compliant data services; VO aware tools; mutually harvesting registries; workflow system; simple shared storage; AstroGrid has ~100 registered users; first science results coming out

coming year single sign on internationally shared storage NGS link up many more tools

next steps: intelligent glue (ontology, agents); analysis services (cluster analysis, multi-D visualisation, etc); theory services (simulated data, models on demand); embedding facilities (VO ready facilities; links to data creation)

lessons

drivers: end user bottleneck; end user demand; empowerment. need network of healthy data centres; need last mile investment; need facilities to be VO ready; need continuing technology development; need continuing standards programme

FIN