The data challenge in astronomy archives : the virtual observatory
technology / problems / solution
Andy Lawrence, DCC conference, Bath, Sep 2005
astronomical archives (1)
IT in astronomy : key areas
(1) facility operations
(2) facility output processing
(3) shared supercomputers for theory
(4) science archives
(5) end-user tools
(1-3) : big bucks
(4-5) : smaller bucks, but they produce the final science output and set the requirements for (1-2)
astronomical archives
major archives growing at TB/yr
issue is not storage but management (curation)
– improving quality of data access and presentation
– needs specialist data centres
end users
– increasing fraction of archive re-use
– increasing multi-archive use
– most download small files and analyse at home
– some users process whole databases
– reduction is standardised; analysis is home-grown
needles in a haystack (Hambly et al 2001)
– faint moving object is a cool white dwarf
– may be a solution to the dark matter problem
– but hard to find : one in a million
– even harder across multiple archives
failed stars
compare optical and infra-red : the extra object is very cold
– a "brown dwarf", or failed star
multi-wavelength views of a supernova remnant
– shocks seen in the X-ray
– heavy elements seen in the optical
– dust seen in the IR
– relativistic electrons seen in the radio
solar-terrestrial links
– coronal mass ejection imaged by space-based solar observatory
– effect detected hours later by satellites and ground radar
background technology (2)
dogs and fleas
there is a very large dog
hardware trends
ops, storage, bandwidth : all 1000x/decade
– can get 1TB IDE for $5K
– backbones and LANs are Gbps
but device bandwidth only 10x/decade
– real PC disks 10MB/s; fibre channel SCSI possibly 100MB/s
and the last mile problem remains
– end-to-end bandwidth typically 10Mbps
operations on a TB database
searching at 10 MB/s takes a day
– solved by parallelism
– but development is non-trivial ==> costs people
transferring at 10 Mbps takes a week
– so leave the data where it is
==> data centres provide search and analysis services
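A quick back-of-envelope check of those two figures, as a sketch in Python (assuming 1 TB = 10^12 bytes):

```python
# rough check of the slide's day/week figures
TB = 1e12                       # database size in bytes
scan_rate = 10e6                # local disk scan : 10 MB/s, in bytes/s
transfer_rate = 10e6 / 8        # end-to-end network : 10 Mbps, in bytes/s

print(f"scan     : {TB / scan_rate / 86400:.1f} days")      # ~1.2 days
print(f"transfer : {TB / transfer_rate / 86400:.1f} days")  # ~9.3 days
```

The day-versus-week gap is just the bits-versus-bytes factor of 8 between the two nominally equal "10 M" rates.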
network development
higher level protocols ==> transparency
– TCP/IP : message exchange
– HTTP : doc sharing (the web)
– grid suite : CPU sharing
– XML/SOAP : data exchange
==> service paradigm
next up on the internet
– workflow definition
– dynamic semantics (ontology)
– software agents
the problems (3)
data growth
astronomical data is growing fast, but so is computing power, so what's the problem ?
(1) heterogeneity
(2) end user delivery
(3) end user demand
data rich future
heritage : Schmidt, IRAS, Hipparcos
current hits : VLT, SDSS, 2MASS, HST, Chandra, XMM, WMAP
coming up : UKIDSS, VISTA, ALMA, JWST, Planck, Herschel
cross fingers : LSST, ELT, LISA, Darwin, SKA, XEUS, etc.
plus lots more
issue is archive interoperability
– need standards and transparent infrastructure
archive data rates
map the sky : 0.1" pixels x 16 bits = 100 TB
process to find objects : billion-row tables
VISTA : 100 TB/yr by 2007
SKA datacubes : 100 PB/yr by 2020
not a technical or financial problem
– LHC doing 100 PB/yr by 2007
issue is logistic : data management
need professional data centres
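The first figure checks out on the back of an envelope; a sketch (assuming full-sky coverage at 0.1-arcsec pixels, 16 bits each):

```python
import math

sky_deg2 = 4 * math.pi * (180 / math.pi) ** 2  # whole sky : ~41253 deg^2
pix_per_deg = 3600 / 0.1                       # 0.1" pixels -> 36000 per degree
bytes_per_pix = 2                              # 16 bits per pixel

total_bytes = sky_deg2 * pix_per_deg ** 2 * bytes_per_pix
print(f"{total_bytes / 1e12:.0f} TB")          # ~107 TB, i.e. the slide's ~100 TB
```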
data rates : user delivery
disk I/O and bandwidth
– end-user bottlenecks will get WORSE
– but links between data centres can be good
move from download to service paradigm
– leave the data where it is
– operations on data (search, cluster analysis, etc) as services
– shift the results, not the data
– networks of collaborating data centres (datagrid or VO)
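From the user's side, the service paradigm looks something like this sketch; the endpoint URL is hypothetical, modelled loosely on the IVOA cone-search pattern of position-plus-radius queries:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SERVICE = "http://datacentre.example.org/conesearch"  # hypothetical endpoint
params = {"RA": 180.0, "DEC": -30.0, "SR": 0.05}      # position + radius, degrees

# the small query travels out; only the matching rows travel back
with urlopen(SERVICE + "?" + urlencode(params)) as resp:
    results = resp.read()   # kilobytes of table, not terabytes of archive

print(results[:200])
```

Kilobytes of results cross the slow last mile; the terabytes stay at the data centre.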
user demands
the bar is constantly rising
– online ease
– multi-archive transparency
– easy data-intensive science
new requirements
– automated resource discovery (intelligent Google)
– cheap I/O and CPU cycles
– new standards and software infrastructure
the virtual observatory (4)
the VO concept
web : all the docs in the world inside your PC
VO : all the databases in the world inside your PC
generic science drivers
– data growth
– multi-archive science
– large database science
can do all this now, but it needs to be fast and easy
empowerment : Beijing as good as Berkeley
what it's not
– not a monolith
– not a warehouse
VO framework
framework + standards
– inter-operable data
– inter-operable software modules
no central VO-command
– it's not a thing, it's a way of life
VO geometry
not a warehouse, not a hierarchy, not a peer-to-peer system
a small set of service centres and a large population of end users
– note : the latest hot database lives with its creators / curators
yesterday
– browser front end sends a CGI request, gets an html web page back
– behind it, the DB engine runs SQL against the data
today
– application talks to a web service : SOAP/XML request out, SOAP/XML data back
– behind it, the DB engine runs SQL against native data
– the application can be anything; the exchange uses standard formats
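A sketch of that exchange; only the SOAP envelope structure is standard, while the service URL, namespace and operation name here are hypothetical:

```python
from urllib.request import Request, urlopen

envelope = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <QueryArchive xmlns="http://vo.example.org/archive">
      <region>ra=180.0 dec=-30.0 radius=0.05</region>
    </QueryArchive>
  </soap:Body>
</soap:Envelope>"""

req = Request("http://vo.example.org/ArchiveService",   # hypothetical service
              data=envelope.encode("utf-8"),
              headers={"Content-Type": "text/xml; charset=utf-8"})
with urlopen(req) as resp:
    print(resp.read().decode())                         # SOAP/XML data comes back
```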
tomorrow
– application submits a job to a web service and gets results back; the application can be anything
– behind it, a mesh of grid-connected web services, each publishing WSDL
– infrastructure services : Registry, Workflow, GLUE, Certification, VOSpace
– standard semantics
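A toy sketch of the registry-plus-workflow idea; every function, endpoint and capability name below is invented for illustration:

```python
def registry_lookup(capability):
    """Ask a (hypothetical) registry for a service URL by capability."""
    catalogue = {"image-access": "http://vo.example.org/siap",
                 "source-match": "http://vo.example.org/xmatch"}
    return catalogue[capability]

def call_service(url, **params):
    """Stand-in for the real SOAP/grid call; returns a fake result handle."""
    return f"result({url}, {sorted(params.items())})"

# a two-step workflow : find images of a field, then cross-match the sources
images = call_service(registry_lookup("image-access"), ra=180.0, dec=-30.0)
matches = call_service(registry_lookup("source-match"), source=images)
print(matches)
```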
publishing metaphor
– facilities are authors
– data centres are publishers
– VO portals are shops
– end-users are readers
– VO infrastructure is the distribution system
International VO Alliance (IVOA)
IVOA standards
– formal process modelled on W3C
– technical working groups and interop workshops
– agreed functionality roadmap
IVOA standards
key standards so far
– table formats
– resource and service metadata definitions
– semantic dictionary
– protocols for image and spectrum access
coming year
– grid and web service interfaces
– authentication
– storage sharing protocols
– application metadata and interfaces
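The IVOA table format is VOTable; below is a minimal, simplified example (namespace declarations omitted, contents made up), parsed with the Python standard library:

```python
import xml.etree.ElementTree as ET

# a stripped-down VOTable : FIELDs declare the columns, TABLEDATA holds rows
votable = """<?xml version="1.0"?>
<VOTABLE version="1.1">
 <RESOURCE>
  <TABLE name="sources">
   <FIELD name="ra"  datatype="double" unit="deg"/>
   <FIELD name="dec" datatype="double" unit="deg"/>
   <DATA><TABLEDATA>
    <TR><TD>180.001</TD><TD>-29.998</TD></TR>
   </TABLEDATA></DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>"""

root = ET.fromstring(votable)
print([f.get("name") for f in root.iter("FIELD")])   # ['ra', 'dec']
```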
state of implementations
– key projects : AstroGrid, US-NVO, Euro-VO
– many compliant data services
– VO-aware tools
– mutually harvesting registries
– workflow system
– simple shared storage
– AstroGrid has ~100 registered users
– first science results coming out
coming year
– single sign-on
– internationally shared storage
– NGS link-up
– many more tools
next steps
intelligent glue
– ontology, agents
analysis services
– cluster analysis, multi-D visualisation, etc
theory services
– simulated data, models on demand
embedding facilities
– VO-ready facilities
– links to data creation
lessons
drivers :
– end user bottleneck
– end user demand
– empowerment
needs :
– network of healthy data centres
– last mile investment
– facilities to be VO ready
– continuing technology development
– continuing standards programme
FIN