Virtual Observatory & Grid Technique ZHAO Yongheng (National Astronomical Observatories of China) CANS2002.


Computational Science: The Third Science Branch is Evolving
In the beginning science was empirical.
Then theoretical branches evolved.
Now, we have computational branches.
  – Has primarily been simulation
  – Growth area: data analysis/visualization of peta-scale instrument data.
Analysis & Visualization tools
  – Help both simulation and instruments.
  – Are primitive today.

Computational Science
Traditional Empirical Science
  – Scientist gathers data by direct observation
  – Scientist analyzes data
Computational Science
  – Data captured by instruments, or data generated by simulator
  – Processed by software
  – Placed in a database
  – Scientist analyzes database
Concern: Scalability

Astronomy Data Growth
In the “old days” astronomers took photos.
Starting in the 1960’s they began to digitize.
New instruments are digital (100s of GB/night)
Detectors are following Moore’s law.
Data avalanche: doubles every 2 years
Figure: total area of 3m+ telescopes in the world (in m²) and total number of CCD pixels (in megapixels), as a function of time. Growth over 25 years is a factor of 30 in glass, 3000 in pixels.
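
The growth factors quoted on this slide can be turned into doubling times with a short calculation. The sketch below assumes steady exponential growth over the 25 years, which the slide does not state explicitly; the function name `doubling_time` is illustrative only.

```python
import math


def doubling_time(years: float, growth_factor: float) -> float:
    """Years needed to double, assuming steady exponential growth."""
    if years <= 0 or growth_factor <= 1:
        raise ValueError("need years > 0 and growth_factor > 1")
    return years * math.log(2) / math.log(growth_factor)


# Growth factors quoted on the slide, both over 25 years.
glass_doubling = doubling_time(25, 30)     # telescope collecting area
pixel_doubling = doubling_time(25, 3000)   # CCD pixel count

print(f"Telescope area doubles about every {glass_doubling:.1f} years")
print(f"CCD pixel count doubles about every {pixel_doubling:.1f} years")
```

Under that assumption the pixel count doubles roughly every 2 years, in line with the "data avalanche" figure, while collecting area doubles only about every 5 years.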

Universal Access to Astronomy Data
Astronomers have a few Petabytes now.
  – 1 pixel (byte) / sq arc second ~ 4TB
  – Multi-spectral, temporal, … → 1PB
They mine it looking for
  – new (kinds of) objects, or more of interesting ones (quasars),
  – density variations in 400-D space
  – correlations in 400-D space
Data doubles every 2 years.
Data is public after 2 years.
So, 50% of the data is public.
Some have private access to 5% more data.
So: 50% vs 55% access for everyone
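
The "50% of the data is public" figure follows from the doubling time and the release delay. A minimal sketch, assuming the total archive grows as a pure exponential and that everything older than the embargo period is public (the function name `public_fraction` is hypothetical):

```python
def public_fraction(doubling_years: float, embargo_years: float) -> float:
    """Fraction of all data already public, assuming exponential growth.

    If the archive holds V(t) = V0 * 2**(t / doubling_years) and data are
    released after embargo_years, the public part is V(t - embargo_years),
    so the public fraction is 2**(-embargo_years / doubling_years).
    """
    if doubling_years <= 0 or embargo_years < 0:
        raise ValueError("need doubling_years > 0 and embargo_years >= 0")
    return 2.0 ** (-embargo_years / doubling_years)


# Values from the slide: data doubles every 2 years, public after 2 years.
everyone = public_fraction(doubling_years=2, embargo_years=2)
insiders = everyone + 0.05  # the slide's "5% more data" for some groups

print(f"public: {everyone:.0%}, with private access: {insiders:.0%}")
```

The same function shows how the balance shifts if the embargo were shorter or the growth faster; only the ratio of the two periods matters.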

The Changing Style of Observational Astronomy
The Old Way: Pointed, heterogeneous observations (~ MB - GB ); small samples of objects (~ )
Now: Large, homogeneous sky surveys ( multi-TB, ~ sources); archives of pointed observations (~ TB )
Future: Multiple, federated sky surveys and archives (~ PB ); Virtual Observatory

Why Astronomy Data?
It has no commercial value
  – No privacy concerns
  – Can freely share results with others
  – Great for experimenting with algorithms
It is real and well documented
  – High-dimensional data (with confidence intervals)
  – Spatial data
  – Temporal data
Many different instruments from many different places and many different times
Federation is a goal
The questions are interesting
  – How did the universe form?
There is a lot of it (petabytes)
Image panels: IRAS 100 μm, ROSAT ~keV, DSS Optical, 2MASS 2 μm, IRAS 25 μm, NVSS 20cm, WENSS 92cm, GB 6cm

Virtual Observatory == World-Wide Telescope
Image labels: Chandra, Hubble, MMT, Sub-mm array, VLA, Antarctica sub-mm, Magellan 6.5m, Whipple γ-ray, SIRTF, Oak Ridge 1.2m CO

Virtual Observatory
Premise: Most data is (or could be) online
So, the Internet is the world’s best telescope:
  – It has data on every part of the sky
  – In every measured spectral band: optical, x-ray, radio, ...
  – As deep as the best instruments (2 years ago).
  – It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no ...).
  – It’s a smart telescope: links objects and data to literature on them.

Why is VO a Good Scientific Prospect?
Technological revolutions as the drivers/enablers of the bursts of scientific growth
Historical examples in astronomy:
  – 1960’s: the advent of electronics and access to space
    Quasars, CMBR, x-ray astronomy, pulsars, GRBs, …
  – 1980’s ’s: computers, digital detectors (CCDs etc.)
    Galaxy formation and evolution, extrasolar planets, CMBR fluctuations, dark matter and energy, GRBs, …
  – 2000’s and beyond: information technology
    The next golden age of discovery in astronomy?
VO is the mechanism to effect this process

Diagram labels: Surveys; Observatories; Missions; Survey and Mission Archives; Follow-Up Telescopes and Missions; Results; Data Services; Data Mining and Analysis, Target Selection; Digital libraries; Primary Data Providers; VO; Secondary Data Providers; SDSS (USA); LAMOST (China)

Virtual Observatory & the Public
The universe at anyone’s fingertips
Educational activities involving real data
New discoveries made by schoolchildren
Interactive exhibits based on archived data
Astronomy as a motivator for learning about computing
Real Astronomy Experience

Virtual Observatory Challenges
Size: multi-Petabyte
40,000 square degrees is 2 Trillion pixels
  – One band (at 1 sq arcsec) 4 Terabytes
  – Multi-wavelength Terabytes
  – Time dimension >> 10 Petabytes
  – Need auto parallelism tools
Unsolved MetaData problem
  – Hard to publish data & programs
  – How to federate Archives
  – Hard to find/understand data & programs
Current tools inadequate
  – new analysis & visualization tools
  – Data Federation is problematic
Transition to the new astronomy
  – Sociological issues
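
The size estimate depends on the pixel scale and the number of bytes stored per pixel, which the slide gives only loosely. The sketch below makes both explicit parameters; the two sampling choices tried are assumptions for illustration, not values taken from the talk, and the function names are hypothetical.

```python
ARCSEC_PER_DEGREE = 3600


def survey_pixels(area_sq_deg: float, pixel_arcsec: float) -> float:
    """Number of square pixels needed to tile an area at a given pixel size."""
    pixels_per_degree = ARCSEC_PER_DEGREE / pixel_arcsec
    return area_sq_deg * pixels_per_degree ** 2


def survey_terabytes(area_sq_deg: float, pixel_arcsec: float,
                     bytes_per_pixel: float) -> float:
    """Storage for one band in terabytes (1 TB = 1e12 bytes)."""
    return survey_pixels(area_sq_deg, pixel_arcsec) * bytes_per_pixel / 1e12


SKY_SQ_DEG = 40_000  # area quoted on the slide

# Assumed sampling choices; neither pair is stated in the source.
for pixel_arcsec, bytes_per_pixel in [(1.0, 1), (0.5, 2)]:
    n = survey_pixels(SKY_SQ_DEG, pixel_arcsec)
    tb = survey_terabytes(SKY_SQ_DEG, pixel_arcsec, bytes_per_pixel)
    print(f'{pixel_arcsec}" pixels, {bytes_per_pixel} B/pixel: '
          f"{n:.2e} pixels, {tb:.1f} TB per band")
```

Under these assumptions, half-arcsecond pixels stored in 2 bytes each come to about 2 trillion pixels and about 4 TB per band, which matches the slide's round numbers; one-arcsecond pixels give a quarter of that pixel count.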

Astronomical Strategies
PROBLEM              SOLUTION
Slow CPU growth      Distributed Computing
Limited storage      Distributed Data
Limited bandwidth    Information Hierarchies - Move only what you need
Data diversity       Interoperability
VO

Grids
Diagram labels: Grid middleware; Visualization; Supercomputer, PC-Cluster; Data-storage, Sensors, Experiments; Internet, networks; Desktop; Mobile Access
Credit: Hoffmann, Reinefeld, Putzer

the Virtual Observatory concept
Aim to make all archives speak the same language
  – all searchable and analysable by the same tools
  – all data sources accessible through a uniform interface
  – all data held in distributed databases that appear as one
archives form the Digital Sky
  – eventual interface to real observatories
the archive is the sky
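
The idea of distributed archives that "appear as one" behind a uniform interface can be sketched in a few lines. Everything named here (`Archive`, `InMemoryArchive`, `DigitalSky`, `cone_search`, and the demo archive names) is a hypothetical illustration, not an interface defined in the talk or by any VO project; a real system would query remote services rather than in-memory lists.

```python
import math
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class Source:
    archive: str
    ra_deg: float
    dec_deg: float


def separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation between two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


class Archive(Protocol):
    """The uniform interface every archive is assumed to expose."""

    def cone_search(self, ra_deg: float, dec_deg: float,
                    radius_deg: float) -> List[Source]: ...


class InMemoryArchive:
    """Toy archive holding a list of (ra, dec) positions in degrees."""

    def __init__(self, name: str, positions: Sequence[Tuple[float, float]]):
        self.name = name
        self.positions = list(positions)

    def cone_search(self, ra_deg: float, dec_deg: float,
                    radius_deg: float) -> List[Source]:
        return [Source(self.name, ra, dec) for ra, dec in self.positions
                if separation_deg(ra_deg, dec_deg, ra, dec) <= radius_deg]


class DigitalSky:
    """Federation that makes several archives answer as one."""

    def __init__(self, archives: Sequence[Archive]):
        self.archives = list(archives)

    def cone_search(self, ra_deg: float, dec_deg: float,
                    radius_deg: float) -> List[Source]:
        hits: List[Source] = []
        for archive in self.archives:
            hits.extend(archive.cone_search(ra_deg, dec_deg, radius_deg))
        return hits


sky = DigitalSky([
    InMemoryArchive("optical-demo", [(10.00, 20.00), (50.0, -5.0)]),
    InMemoryArchive("radio-demo", [(10.01, 20.01), (180.0, 45.0)]),
])
for source in sky.cone_search(10.0, 20.0, radius_deg=0.1):
    print(source)
```

The caller issues one query against `DigitalSky` and never needs to know how many archives sit behind it, which is the sense in which "the archive is the sky".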

the Grid concept
shared managed distributed resources
  – documents + data + software + storage + cycles + expertise
network : ability to pass messages
web : transparent document system
computational grid : transparent CPU
datagrid : transparent data access and services
information grid, knowledge grid... ?
Virtual Organisations ?
a supercomputer on your desktop
everybody can be a power user

Three Layer GRID Abstraction
Diagram labels: Information Grid, Knowledge Grid, Computation/Data Grid; Data to Knowledge; Control; Automation; E-Science

What’s needed?
Scientists: Science Data & Questions
Plumbers: Database to store data, execute queries
Miners: Data Mining Algorithms
Question & Answer, Visualization Tools

obstacles to overcome
sociology
internet technology
i/o bottleneck
network bottleneck

obstacles to overcome (1)
sociology
  – need agreed formats for data, metadata, provenance
  – need standardised semantics ("ontology")
internet technology
  – need protocols for publishing and exchanging data
  – need registry for publishing service availability and semantics
  – need method of transmitting authentication/authorisation
  – need methods for managing distributed resources

obstacles to overcome (2)
i/o bottleneck
  – need database supercomputers
  – need innovative search and analysis algorithms
network bottleneck
  – data centers must provide analysis service
  – facility class analysis code needed
shift the results not the data
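
A rough transfer-time estimate shows why the talk recommends shifting results rather than data. In the sketch below the 4 TB figure is the single-band size quoted earlier in the talk; the 100 Mbit/s link and the 10 MB result size are assumptions chosen only for illustration, and the estimate ignores protocol overhead and latency.

```python
def transfer_hours(size_bytes: float, link_bits_per_second: float) -> float:
    """Idealised transfer time in hours for a given payload and link speed."""
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    return size_bytes * 8 / link_bits_per_second / 3600


LINK_BPS = 100e6       # assumed 100 Mbit/s link (not from the source)
RAW_BYTES = 4e12       # one 4 TB band, the size quoted in the talk
RESULT_BYTES = 10e6    # assumed 10 MB result table (not from the source)

raw_hours = transfer_hours(RAW_BYTES, LINK_BPS)
result_seconds = transfer_hours(RESULT_BYTES, LINK_BPS) * 3600

print(f"moving the raw band: {raw_hours:.0f} hours")
print(f"moving the result:   {result_seconds:.1f} seconds")
```

With these assumed numbers the raw band takes several days to move while the result takes under a second, so running the analysis at the data center and returning only the answer is the cheaper path.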

Distributed Computing at Work
Virtual and collaborative exploration of the Universe
Statistics table with columns Total and Last 24 Hours; rows: Users, Results received, Total CPU time (years), Floating Point Operations (TFLOPs/sec)

SkyQuery
Won 2nd prize in Microsoft .NET Contest

National Virtual Observatory Data Grid
Diagram labels: Compute Resources; Catalogs; Data Archives; Information Discovery; Metadata delivery; Data Discovery; Data Delivery; Catalog Mediator; Data mediator; 1. Portals and Workbenches; Bulk Data Analysis; Catalog Analysis; Metadata View; Data View; 4. Grid Security, Caching, Replication, Backup, Scheduling; 2. Knowledge & Resource Management; Standard Metadata format, Data model, Wire format; Catalog/Image Specific Access; Standard APIs and Protocols; Concept space; Derived Collections

AVO STATUS
AVO approved with EU funds ~2 Million € (total budget ~ 4M €)
Contract start on 15 November Year Phase A study
9 NEW POSITIONS for 3 years over 6 institutions - total 18 FTE (~ 50 people)
Total VO funding AVO+NVO+ASTROGRID = $21 million (US)
3 Year target: Build VO 1.0 among the 6 partner archive sets by
  – Defining and executing trial science cases
  – Defining, developing and deploying new interoperability standards and tools
  – Developing and deploying new Grid-based services

Data-Rich Astronomy and Other Fields
Technical and methodological challenges facing the VO are common to most data-intensive sciences today, and beyond (commerce, industry, finance, etc.)
Interdisciplinary exchanges (e.g., with physics, biology, earth sciences, etc.): intellectual cross-fertilization, avoiding wasteful duplication of efforts
Partnerships and collaborations with applied CS/IT are essential, and may lead to significant technological advances
High-energy physics: WWW! The Grid. Astronomy (VO): ???

Scaling the VO Mountain
Diagram labels: Discoveries; Data Mining, Visualization; Data Services; Existing Centers and Archives; We are here
Thank you!