The data challenge in astronomy archives : the virtual observatory
technology / problems / solution
Andy Lawrence, DCC conference, Bath, Sep 2005
astronomical archives (1)
IT in astronomy : key areas
(1) facility operations
(2) facility output processing
(3) shared supercomputers for theory
(4) science archives
(5) end-user tools
(1-3) : big bucks
(4-5) : smaller bucks, but they produce the final science output and set the requirements for (1-2)
astronomical archives
major archives growing at TB/yr
issue is not storage but management (curation)
– improving quality of data access and presentation
– needs specialist data centres
end users
– increasing fraction of archive re-use
– increasing multi-archive use
– most download small files and analyse at home
– some users process whole databases
– reduction is standardised; analysis is home-grown
needles in a haystack (Hambly et al 2001)
– faint moving object is a cool white dwarf
– may be a solution to the dark matter problem
– but hard to find : one in a million
– even harder across multiple archives
failed stars
compare optical and infra-red : the extra object is very cold
– a "brown dwarf", or failed star
multi-wavelength views of a supernova remnant
– shocks seen in the X-ray
– heavy elements seen in the optical
– dust seen in the IR
– relativistic electrons seen in the radio
solar-terrestrial links
– coronal mass ejection imaged by space-based solar observatory
– effect detected hours later by satellites and ground radar
background technology (2)
dogs and fleas
there is a very large dog
hardware trends
ops, storage, bandwidth : all 1000x/decade
– can get 1TB IDE for $5K
– backbones and LANs are Gbps
but device bandwidth only 10x/decade
– real PC disks 10MB/s; fibre channel SCSI possibly 100MB/s
and the last mile problem remains
– end-to-end bandwidth typically 10Mbps
operations on a TB database
searching at 10 MB/s takes a day
– solved by parallelism
– but development is non-trivial ==> costs people
transferring at 10 Mbps takes a week
– so leave the data where it is
==> data centres provide search and analysis services
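A quick back-of-envelope check of those two figures, as a sketch in Python (assuming 1 TB = 10^12 bytes):

```python
# rough check of the slide's day/week figures
TB = 1e12                       # database size in bytes
scan_rate = 10e6                # local disk scan : 10 MB/s, in bytes/s
transfer_rate = 10e6 / 8        # end-to-end network : 10 Mbps, in bytes/s

print(f"scan     : {TB / scan_rate / 86400:.1f} days")      # ~1.2 days
print(f"transfer : {TB / transfer_rate / 86400:.1f} days")  # ~9.3 days
```

The day-versus-week gap is just the bits-versus-bytes factor of 8 between the two nominally equal "10 M" rates.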
network development
higher level protocols ==> transparency
– TCP/IP : message exchange
– HTTP : doc sharing (the web)
– grid suite : CPU sharing
– XML/SOAP : data exchange
==> service paradigm
next up on the internet
– workflow definition
– dynamic semantics (ontology)
– software agents
the problems (3)
data growth
astronomical data is growing fast, but so is computing power, so what's the problem ?
(1) heterogeneity
(2) end user delivery
(3) end user demand
data rich future
heritage : Schmidt, IRAS, Hipparcos
current hits : VLT, SDSS, 2MASS, HST, Chandra, XMM, WMAP
coming up : UKIDSS, VISTA, ALMA, JWST, Planck, Herschel
cross fingers : LSST, ELT, LISA, Darwin, SKA, XEUS, etc.
plus lots more
issue is archive interoperability
– need standards and transparent infrastructure
archive data rates
map the sky : 0.1" pixels x 16 bits = 100 TB
process to find objects : billion-row tables
VISTA : 100 TB/yr by 2007
SKA datacubes : 100 PB/yr by 2020
not a technical or financial problem
– LHC doing 100 PB/yr by 2007
issue is logistic : data management
need professional data centres
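The first figure checks out on the back of an envelope; a sketch (assuming full-sky coverage at 0.1-arcsec pixels, 16 bits each):

```python
import math

sky_deg2 = 4 * math.pi * (180 / math.pi) ** 2  # whole sky : ~41253 deg^2
pix_per_deg = 3600 / 0.1                       # 0.1" pixels -> 36000 per degree
bytes_per_pix = 2                              # 16 bits per pixel

total_bytes = sky_deg2 * pix_per_deg ** 2 * bytes_per_pix
print(f"{total_bytes / 1e12:.0f} TB")          # ~107 TB, i.e. the slide's ~100 TB
```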
data rates : user delivery
disk I/O and bandwidth
– end-user bottlenecks will get WORSE
– but links between data centres can be good
move from download to service paradigm
– leave the data where it is
– operations on data (search, cluster analysis, etc) as services
– shift the results, not the data
– networks of collaborating data centres (datagrid or VO)
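From the user's side, the service paradigm looks something like this sketch; the endpoint URL is hypothetical, modelled loosely on the IVOA cone-search pattern of position-plus-radius queries:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SERVICE = "http://datacentre.example.org/conesearch"  # hypothetical endpoint
params = {"RA": 180.0, "DEC": -30.0, "SR": 0.05}      # position + radius, degrees

# the small query travels out; only the matching rows travel back
with urlopen(SERVICE + "?" + urlencode(params)) as resp:
    results = resp.read()   # kilobytes of table, not terabytes of archive

print(results[:200])
```

Kilobytes of results cross the slow last mile; the terabytes stay at the data centre.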
user demands
the bar is constantly rising
– online ease
– multi-archive transparency
– easy data-intensive science
new requirements
– automated resource discovery (intelligent Google)
– cheap I/O and CPU cycles
– new standards and software infrastructure
the virtual observatory (4)
the VO concept
web : all the docs in the world inside your PC
VO : all the databases in the world inside your PC
generic science drivers
– data growth
– multi-archive science
– large database science
can do all this now, but it needs to be fast and easy
empowerment : Beijing as good as Berkeley
what it's not
– not a monolith
– not a warehouse
VO framework
framework + standards
– inter-operable data
– inter-operable software modules
no central VO-command
– it's not a thing, it's a way of life
VO geometry
not a warehouse, not a hierarchy, not a peer-to-peer system
a small set of service centres and a large population of end users
– note : the latest hot database lives with its creators / curators
yesterday
– browser front end sends a CGI request, gets an html web page back
– behind it, the DB engine runs SQL against the data
today
– application talks to a web service : SOAP/XML request out, SOAP/XML data back
– behind it, the DB engine runs SQL against native data
– the application can be anything; the exchange uses standard formats
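A sketch of that exchange; only the SOAP envelope structure is standard, while the service URL, namespace and operation name here are hypothetical:

```python
from urllib.request import Request, urlopen

envelope = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <QueryArchive xmlns="http://vo.example.org/archive">
      <region>ra=180.0 dec=-30.0 radius=0.05</region>
    </QueryArchive>
  </soap:Body>
</soap:Envelope>"""

req = Request("http://vo.example.org/ArchiveService",   # hypothetical service
              data=envelope.encode("utf-8"),
              headers={"Content-Type": "text/xml; charset=utf-8"})
with urlopen(req) as resp:
    print(resp.read().decode())                         # SOAP/XML data comes back
```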
tomorrow
– application submits a job to a web service and gets results back; the application can be anything
– behind it, a mesh of grid-connected web services, each publishing WSDL
– infrastructure services : Registry, Workflow, GLUE, Certification, VOSpace
– standard semantics
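A toy sketch of the registry-plus-workflow idea; every function, endpoint and capability name below is invented for illustration:

```python
def registry_lookup(capability):
    """Ask a (hypothetical) registry for a service URL by capability."""
    catalogue = {"image-access": "http://vo.example.org/siap",
                 "source-match": "http://vo.example.org/xmatch"}
    return catalogue[capability]

def call_service(url, **params):
    """Stand-in for the real SOAP/grid call; returns a fake result handle."""
    return f"result({url}, {sorted(params.items())})"

# a two-step workflow : find images of a field, then cross-match the sources
images = call_service(registry_lookup("image-access"), ra=180.0, dec=-30.0)
matches = call_service(registry_lookup("source-match"), source=images)
print(matches)
```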
publishing metaphor
– facilities are authors
– data centres are publishers
– VO portals are shops
– end-users are readers
– VO infrastructure is the distribution system
International VO Alliance (IVOA)
IVOA standards
– formal process modelled on W3C
– technical working groups and interop workshops
– agreed functionality roadmap
IVOA standards
key standards so far
– table formats
– resource and service metadata definitions
– semantic dictionary
– protocols for image and spectrum access
coming year
– grid and web service interfaces
– authentication
– storage sharing protocols
– application metadata and interfaces
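The IVOA table format is VOTable; below is a minimal, simplified example (namespace declarations omitted, contents made up), parsed with the Python standard library:

```python
import xml.etree.ElementTree as ET

# a stripped-down VOTable : FIELDs declare the columns, TABLEDATA holds rows
votable = """<?xml version="1.0"?>
<VOTABLE version="1.1">
 <RESOURCE>
  <TABLE name="sources">
   <FIELD name="ra"  datatype="double" unit="deg"/>
   <FIELD name="dec" datatype="double" unit="deg"/>
   <DATA><TABLEDATA>
    <TR><TD>180.001</TD><TD>-29.998</TD></TR>
   </TABLEDATA></DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>"""

root = ET.fromstring(votable)
print([f.get("name") for f in root.iter("FIELD")])   # ['ra', 'dec']
```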
state of implementations
– key projects : AstroGrid, US-NVO, Euro-VO
– many compliant data services
– VO-aware tools
– mutually harvesting registries
– workflow system
– simple shared storage
– AstroGrid has ~100 registered users
– first science results coming out
coming year
– single sign-on
– internationally shared storage
– NGS link-up
– many more tools
next steps
intelligent glue
– ontology, agents
analysis services
– cluster analysis, multi-D visualisation, etc
theory services
– simulated data, models on demand
embedding facilities
– VO-ready facilities
– links to data creation
lessons
drivers :
– end user bottleneck
– end user demand
– empowerment
needs :
– network of healthy data centres
– last mile investment
– facilities to be VO ready
– continuing technology development
– continuing standards programme
FIN