1 The data challenge in astronomy : archives, technology, problems, solution
DCC conference, Bath
Andy Lawrence, Sep 2005
the virtual observatory

2 astronomical archives (1)

3 IT in astronomy : key areas
(1) facility operations
(2) facility output processing
(3) shared supercomputers for theory
(4) science archives
(5) end-user tools
(1-3) : big bucks
(4-5) : smaller bucks, but
- produces the final science output
- sets requirements for (1-2)

5 astronomical archives
major archives growing at TB/yr
issue not storage but management (curation)
improving quality of data access and presentation
needs specialist data centres

7 end users
increasing fraction of archive re-use
increasing multi-archive use
most download small files and analyse at home
some users process whole databases
reduction standardised; analysis home grown

8 needles in a haystack (Hambly et al 2001)
- faint moving object is a cool white dwarf
- may be a solution to the dark matter problem
- but hard to find : one in a million
- even harder across multiple archives

9 failed stars
compare optical and infra-red
extra object is very cold
a "brown dwarf" or failed star

10 multi-wavelength views of a Supernova Remnant
Shocks seen in the X-ray
Heavy elements seen in the optical
Dust seen in the IR
Relativistic electrons seen in the radio

11 solar-terrestrial links
Coronal mass ejection imaged by space-based solar observatory
Effect detected hours later by satellites and ground radar

12 background technology (2)

13 dogs and fleas
there is a very large dog

15 hardware trends
ops, storage, bw : all 1000x/decade
- can get 1TB IDE = $5K
- backbones and LANs are Gbps
but device bw 10x/decade (extrapolated below)
- real PC disks 10MB/s; fibre channel SCSI possibly 100MB/s
and last mile problem remains
- end-to-end b/w typically 10Mbps
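The gap between these two trends is the crux: if capacity grows 1000x per decade while device bandwidth grows only 10x, the time to read a full device grows 100x per decade. A minimal sketch of that extrapolation in Python, taking the slide's 2005 baseline of 1 TB at 10 MB/s; the forward projection is illustrative, not a figure from the talk:

    # Extrapolate the slide's trends: capacity ~1000x/decade,
    # device bandwidth ~10x/decade, so a full-device scan
    # slows by ~100x/decade. Baseline (2005): 1 TB at 10 MB/s.

    capacity = 1e12    # bytes   (1 TB disk)
    bandwidth = 10e6   # bytes/s (10 MB/s)

    for decade in range(3):  # illustrative: 2005, 2015, 2025
        scan_h = capacity / bandwidth / 3600
        print(f"+{decade * 10:2d} yrs: {capacity / 1e12:10.0f} TB "
              f"at {bandwidth / 1e6:5.0f} MB/s -> full scan {scan_h:11.1f} h")
        capacity *= 1000
        bandwidth *= 10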

16 operations on a TB database
searching at 10 MB/s takes a day (see the sketch below)
- solved by parallelism
- but development non-trivial ==> people
transfer at 10 Mbps takes a week
- leave it where it is
==> data centres provide search and analysis services
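Both timescales on this slide follow from simple arithmetic, and the case for parallelism drops out of the same numbers. A minimal sketch in Python using the slide's quoted rates; the 100-way parallelism factor is an illustrative assumption, not from the slide:

    # Operating on a 1 TB database at the slide's quoted rates.

    TB = 1e12              # bytes
    disk_rate = 10e6       # bytes/s : sequential disk scan (~10 MB/s)
    link_rate = 10e6 / 8   # bytes/s : end-to-end network (~10 Mbps)
    day = 86400.0

    scan = TB / disk_rate            # ~1e5 s ~= a day
    transfer = TB / link_rate        # ~8e5 s ~= nine days, i.e. "a week"

    n_disks = 100                    # illustrative assumption
    parallel_scan = scan / n_disks   # ~1000 s ~= 17 minutes

    print(f"serial scan   : {scan / day:4.1f} days")
    print(f"network pull  : {transfer / day:4.1f} days")
    print(f"parallel scan : {parallel_scan / 60:4.1f} minutes")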

17 network development
higher level protocols ==> transparency
TCP/IP     : message exchange
HTTP       : doc sharing (web)
grid suite : CPU sharing
XML/SOAP   : data exchange
==> service paradigm

18 next up on the internet
workflow definition
dynamic semantics (ontology)
software agents

19 the problems (3)

20 data growth
astronomical data is growing fast
but so is computing power
so what's the problem ?
(1) Heterogeneity
(2) End user delivery
(3) End user demand

21 data rich future
heritage
- Schmidt, IRAS, Hipparcos
current hits
- VLT, SDSS, 2MASS, HST, Chandra, XMM, WMAP
coming up
- UKIDSS, VISTA, ALMA, JWST, Planck, Herschel
cross fingers
- LSST, ELT, LISA, Darwin, SKA, XEUS, etc.
plus lots more
issue is archive interoperability
- need standards and transparent infrastructure

22 archive data rates
map the sky : 0.1" x 16 bits = 100 TB (checked in the sketch below)
process to find objects : billion row tables
VISTA 100 TB/yr by 2007
SKA datacubes 100 PB/yr by 2020
not a technical or financial problem
- LHC doing 100 PB/yr by 2007
issue is logistic : data management
need professional data centres
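The 100 TB all-sky figure checks out. A minimal sketch, assuming the standard whole-sky area of ~41,253 square degrees; the 0.1" pixel scale and 16-bit depth come from the slide:

    # Sanity-check "map the sky : 0.1 arcsec x 16 bits = 100 TB".

    import math

    sky_deg2 = 4 * math.pi * (180 / math.pi) ** 2  # whole sky ~= 41,253 deg^2
    arcsec2 = sky_deg2 * 3600 ** 2                 # arcsec^2 on the sky
    pixels = arcsec2 / 0.1 ** 2                    # 0.1" x 0.1" pixels
    tb = pixels * 2 / 1e12                         # 16 bits = 2 bytes/pixel

    print(f"{tb:.0f} TB")                          # ~107 TB, i.e. ~100 TB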

23 data rates : user delivery
disk I/O and bandwidth
- end-user bottlenecks will get WORSE
- but links between data centres can be good
move from download to service paradigm
- leave the data where it is
- operations on data (search, cluster analysis, etc) as services
- shift the results, not the data
- networks of collaborating data centres (datagrid or VO)

24 user demands
bar constantly rising
- online ease
- multi-archive transparency
- easy data-intensive science
new requirements
- automated resource discovery (intelligent Google)
- cheap I/O and CPU cycles
- new standards and software infrastructure

25 the virtual observatory (4)

26 the VO concept
web : all docs in the world inside your PC
VO : all databases in the world inside your PC

27 generic science drivers
data growth
multi-archive science
large database science
can do all this now, but needs to be fast and easy
empowerment : Beijing as good as Berkeley

28 what it's not
not a monolith
not a warehouse

29 VO framework
framework + standards
inter-operable data
inter-operable software modules
no central VO-command : it's not a thing, it's a way of life

30 VO geometry
not a warehouse
not a hierarchy
not a peer-to-peer system
small set of service centres and large population of end users
- note : latest hot database lives with creators / curators

31 yesterday
[diagram]
browser --CGI request--> front end --SQL--> DB engine
browser <--html web page-- front end <--data-- DB engine

32 today
[diagram]
application --SOAP/XML request--> web service --SQL--> DB engine
application <--SOAP/XML data-- web service <--native data-- DB engine
(application side : anything ; web service side : standard formats)
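For flavour, a minimal sketch of this "today" pattern in Python: an application posting a SOAP/XML envelope to a data-centre web service over plain HTTP. The endpoint URL and the SearchRequest operation with its ra/dec/radius parameters are hypothetical, invented for illustration:

    # An application talking SOAP/XML to a (hypothetical) data-centre
    # web service, using only the standard library.

    import urllib.request

    ENDPOINT = "http://datacentre.example.org/services/search"  # hypothetical

    envelope = """<?xml version="1.0" encoding="UTF-8"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <SearchRequest>        <!-- hypothetical operation -->
          <ra>180.0</ra>       <!-- degrees -->
          <dec>-1.5</dec>
          <radius>0.05</radius>
        </SearchRequest>
      </soap:Body>
    </soap:Envelope>"""

    req = urllib.request.Request(
        ENDPOINT,
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))  # SOAP/XML reply, standard formats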

33 tomorrow
[diagram]
application --job--> network of web services --> results
(application side : anything)
infrastructure services : Registry, Workflow, GLUE, Certification, VO Space
web services : standard semantics, publish WSDL, grid connected

34 publishing metaphor
facilities are authors
data centres are publishers
VO portals are shops
end-users are readers
VO infrastructure is the distribution system

35 International VO Alliance (IVOA)

36 IVOA standards
formal process modelled on W3C
technical working groups and interop workshops
agreed functionality roadmap

37 IVOA standards
key standards so far
- table formats
- resource and service metadata definitions
- semantic dictionary
- protocols for image and spectrum access (see the sketch below)
coming year
- grid and web service interfaces
- authentication
- storage sharing protocols
- application metadata and interfaces
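To make "protocols for image and spectrum access" concrete: such a protocol is essentially a parameterised HTTP GET returning an XML table (VOTable, the IVOA table format). A minimal sketch of an image-access style query; the service base URL is hypothetical, while the POS/SIZE parameter convention follows the IVOA simple image access protocol:

    # Query a (hypothetical) VO image-access service and fetch the
    # VOTable response, using only the standard library.

    import urllib.parse
    import urllib.request

    BASE = "http://datacentre.example.org/siap"  # hypothetical service
    params = urllib.parse.urlencode({
        "POS": "180.0,-1.5",  # RA,Dec in decimal degrees (IVOA SIA convention)
        "SIZE": "0.1",        # search region size in degrees
    })

    with urllib.request.urlopen(f"{BASE}?{params}") as resp:
        votable = resp.read().decode("utf-8")  # VOTable : IVOA XML table format

    print(votable[:500])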

38 state of implementations
key projects : AstroGrid, US-NVO, Euro-VO
many compliant data services
VO-aware tools
mutually harvesting registries
workflow system
simple shared storage
AstroGrid has ~100 registered users
first science results coming out

39 coming year
single sign-on
internationally shared storage
NGS link-up
many more tools

40 next steps
intelligent glue
- ontology, agents
analysis services
- cluster analysis, multi-D visualisation, etc
theory services
- simulated data, models on demand
embedding facilities
- VO-ready facilities
- links to data creation

41 lessons

42 drivers :
- end user bottleneck
- end user demand
- empowerment
need network of healthy data centres
need last mile investment
need facilities to be VO ready
need continuing technology development
need continuing standards programme

43 FIN

