National Virtual Observatory. Theory, Computation, and Data Exploration Panel of the AASC. Charles Alcock, Tom Prince, Alex Szalay.


The National Virtual Observatory
• National
  – distributed in scope across institutions and agencies
  – available to all astronomers and the public
• Virtual
  – not tied to a single “brick-and-mortar” location
  – supports astronomical “observations” and discoveries via remote access to digital representations of the sky
• Observatory
  – general purpose access to large areas of the sky at multiple wavelengths
  – supports a wide range of astronomical explorations
  – enables discovery via new computational tools

Why Now?
The past decade has witnessed
• a thousand-fold increase in computer speed
• a dramatic decrease in the cost of computing and storage
• a dramatic increase in access to broadly distributed data: large archives at multiple sites and high-speed networks
• significant increases in detector size and performance
These form the basis for science of a qualitatively different nature.

Trends
• Future dominated by detector improvements
• Moore’s Law growth in CCD capabilities
• Gigapixel arrays on the horizon
• Improvements in computing and storage will track growth in data volume
• Investment in software is critical, and growing
[Figure: total area of 3m+ telescopes in the world in m², and total number of CCD pixels in megapixels, as a function of time. Growth over 25 years is a factor of 30 in glass, 3000 in pixels.]
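The growth factors quoted for glass and pixels can be turned into doubling times. The sketch below assumes steady exponential growth over the 25 years, which the slide does not state; the function name is illustrative only.

```python
import math

def doubling_time_years(growth_factor: float, span_years: float) -> float:
    """Doubling time implied by a total growth factor over a time span,
    assuming steady exponential growth (an assumption, not a slide claim)."""
    return span_years * math.log(2) / math.log(growth_factor)

glass = doubling_time_years(30, 25)      # telescope collecting area
pixels = doubling_time_years(3000, 25)   # CCD pixel count

print(f"collecting area doubles roughly every {glass:.1f} years")
print(f"CCD pixel count doubles roughly every {pixels:.1f} years")
```

Under that assumption the pixel count doubles a little over twice as fast as the collecting area, which is the sense in which the future is dominated by detector improvements.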

The Discovery Process
• Past: observations of small, carefully selected samples of objects in a narrow wavelength band
• Future: high quality, homogeneous multi-wavelength data on millions of objects, allowing us to
  – discover significant patterns from the analysis of statistically rich and unbiased image/catalog databases
  – understand complex astrophysical systems via confrontation between data and large numerical simulations
The discovery process will rely heavily on advanced visualization and statistical analysis tools.

NVO Science: Discoveries
• Discoveries of rare objects
  – searches for exotic new sources, truly rare at the level of 1 source in 10 million
• Multi-wavelength identification of large statistical samples of previously rare objects: brown dwarfs, high-z quasars, ultra-luminous IR galaxies, etc.
• Efficient cross-identification of “unidentified sources” from new surveys
  – example: use radio, optical, and IR surveys to identify serendipitous Chandra X-ray sources
• Selection of targets for spectroscopic follow-up

NVO Science: Statistical Surveys
• Homogeneous samples of typical objects
• Mega-surveys: sample size is no longer a problem
  – statistical accuracy determined entirely by systematics
• Multi-wavelength data enables accurate sample selection (evolution, rest-frame selection)
• High Precision Astrophysics of Origins
  – large scale structure of the universe
  – Galactic structure
  – galaxy evolution
  – active galaxies, galaxy clusters, ...
  – stellar populations
Leading to New Astronomy

New Astronomy – Different!
• Systematic data exploration will have a central role in the New Astronomy
• Digital archives of the sky will be the main access to data
• Data “avalanche”: the flood of Terabytes of data is already happening, whether we like it or not!
• Transition to the new may be organized or chaotic

Ongoing Mega-Surveys
• Large number of new surveys
  – multi-terabyte in size, 100 million objects or larger
  – individual archives planned and under way
• Multi-wavelength view of the sky
  – coverage of more than 13 wavelengths within 5 years
• Impressive early discoveries
  – finding exotic objects by unusual colors: L and T dwarfs, high redshift quasars
  – finding objects by time variability: gravitational micro-lensing
Surveys: MACHO, 2MASS, SDSS, DPOSS, GSC-II, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE, ...

High Redshift Quasars
Several z>5 QSOs discovered by SDSS in the early test data

Methane/T Dwarf
Discovery of several new objects by SDSS & 2MASS
[Figure: SDSS T-dwarf (June 1999)]

DPOSS Discoveries

New Neighbor of the Milky Way
• Finding new galaxies by spatial clustering of red objects
• New galaxy is about 30 million light years away
• Larger than most of the spiral galaxies in the Messier Catalogue
• Clearly visible in the 2MASS infrared image
• Expect to find 1000’s of such galaxies with 2MASS
[Figure panels: Infrared, Optical]

The Observatories
• NOAO/NRAO
  – 20% of the time on all its telescopes dedicated to major surveys
  – using a wide range of telescope and instrumentation packages
• The NASA Great Observatories
  – new opportunities for surveys: combine mission-specific data with those from other missions and from the ground
  – several multi-Terabyte databases and further extensive catalogs of objects

HST Data Archive
• Several Terabytes/year
• Already more retrieval than ingest!

Proposed Surveys
Next decade: new optimized “survey systems” exploring new parameter space
• Dark Matter Telescope (DMT)
  – map the distribution of matter for z<1.5 from weak lensing, through deep, high quality images of galaxies
  – moving and variable objects through repetitive surveys
• Spectroscopic Wide-Field Telescope (SWIFT)
  – evolution of galaxies from z~4 to the present from star formation rates
  – determine chemical abundances and kinematics

The Road to the NVO
The environment to exploit these huge sky surveys does not exist today!
• 1 Terabyte at 10 Mbyte/s takes 1 day
• Expect 100’s of intensive queries and 1000’s of casual queries per day
• Data will reside at multiple locations
• Existing analysis tools do not scale to Terabyte data sets
Acute need in a few years; it will not just happen. A new initiative is needed!

NVO: A New Initiative
A new initiative is needed
• to ensure an evolutionary, cost-effective transition
• to maximize the impact of large current and future efforts
• to create the necessary new standards in the community
• to develop the software tools needed
• to ensure that the astronomical community has the proper network and hardware infrastructure to carry out its science
The National Virtual Observatory can be the catalyst of the “New Astronomy”.

The Goals of the NVO
• Virtual observations of the sky in multiple wavelengths, by integrating all-sky Mega-surveys
• Query the individual object catalogs and image databases thousands of times per day
• Joint queries of the combined catalogs thousands of times per day
• Enable discovery in these archives via new tools: novel visualization techniques, supervised and unsupervised learning, advanced classification techniques

NVO: The Challenges
• Size of the archived data
  – 40,000 square degrees is 2 trillion pixels
  – one band: 4 Terabytes
  – multi-wavelength: Terabytes
  – time dimension: few Petabytes
• The development of
  – new archival methods
  – new analysis tools
• Hardware requirements
• Training the next generation
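The size figures above imply a pixel scale and a storage cost per pixel. The short calculation below only rearranges the slide's own numbers (40,000 square degrees, 2 trillion pixels, 4 Terabytes per band); it assumes square pixels and decimal Terabytes, neither of which the slide states.

```python
SKY_AREA_SQ_DEG = 40_000      # from the slide
TOTAL_PIXELS = 2e12           # "2 trillion pixels"
ONE_BAND_BYTES = 4e12         # "4 Terabytes", taken as decimal TB (assumption)

ARCSEC2_PER_SQ_DEG = 3600 ** 2

# Pixels needed per square degree, and the side of one (assumed square) pixel.
pixels_per_sq_deg = TOTAL_PIXELS / SKY_AREA_SQ_DEG
pixel_scale_arcsec = (ARCSEC2_PER_SQ_DEG / pixels_per_sq_deg) ** 0.5

# Storage per pixel implied by the one-band total.
bytes_per_pixel = ONE_BAND_BYTES / TOTAL_PIXELS

print(f"{pixels_per_sq_deg:.1e} pixels per square degree")
print(f"pixel scale of about {pixel_scale_arcsec:.2f} arcsec")
print(f"{bytes_per_pixel:.0f} bytes per pixel")
```

In other words, the quoted totals correspond to roughly half-arcsecond pixels stored as 16-bit values.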

Necessary Components
• New archival methods
• New analysis tools
• New hardware requirements

New Archival Methods
• Structure and manage multi-TB (and soon PB) data archives, distributed across the continent
• Rapid and transparent access to image/catalog databases across all wavelengths, via intelligent query agents
• Efficient query and data retrieval by more than 10,000 scientists world-wide, with enhanced search operators (like spatial proximity)

Examples: non-local queries
• Find all objects within 1' which have more than two neighbors with u-g, g-r, i-K colors within 0.05 mag
• Find all star-like objects within dm=0.2 of the colors of a quasar at 5.5<z<6.5, using all colors in all available catalogs
• Find galaxies that are blended with a star, and output the deblended magnitudes
• Provide a list of moving objects consistent with an asteroid, based on all the surveys, and estimate possible orbit parameters
• Find binary stars where at least one of them has the colors of a white dwarf, within the error boxes of hard X-ray sources
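To make the first query concrete, here is a brute-force sketch of one reading of it: return every object that has more than two neighbors within 1 arcminute whose three colors all agree to within 0.05 mag. The catalog layout (dicts with "ra", "dec", "colors") and the function names are hypothetical, and the pairwise loop is exactly the kind of approach that does not scale to survey-sized catalogs.

```python
import math
from itertools import combinations

def angular_separation_arcmin(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 60.0

def objects_with_similar_neighbors(catalog, radius_arcmin=1.0,
                                   color_tol=0.05, min_neighbors=3):
    """Indices of objects with at least `min_neighbors` neighbors that are
    both close on the sky and similar in every color. O(n^2) pair scan."""
    counts = [0] * len(catalog)
    for i, j in combinations(range(len(catalog)), 2):
        a, b = catalog[i], catalog[j]
        if angular_separation_arcmin(a["ra"], a["dec"],
                                     b["ra"], b["dec"]) > radius_arcmin:
            continue
        if all(abs(ca - cb) <= color_tol
               for ca, cb in zip(a["colors"], b["colors"])):
            counts[i] += 1
            counts[j] += 1
    return [i for i, n in enumerate(counts) if n >= min_neighbors]

# Made-up test catalog; colors are (u-g, g-r, i-K).
catalog = [
    {"ra": 10.000, "dec": 0.000, "colors": (0.50, 0.30, 1.00)},
    {"ra": 10.005, "dec": 0.000, "colors": (0.51, 0.31, 1.01)},
    {"ra": 10.000, "dec": 0.005, "colors": (0.52, 0.30, 1.02)},
    {"ra": 10.005, "dec": 0.005, "colors": (0.50, 0.32, 1.00)},
    {"ra": 10.002, "dec": 0.002, "colors": (1.50, 0.90, 2.00)},   # near, different colors
    {"ra": 50.000, "dec": 20.000, "colors": (0.50, 0.30, 1.00)},  # similar colors, far away
]
print(objects_with_similar_neighbors(catalog))
```

The query is "non-local" because answering it for one object requires looking at other rows, so a plain per-row filter is not enough and a spatial index is needed to avoid the all-pairs scan.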

Examples: Today’s I/O rates
Reading a 1 TB data set

Data access                 Speed       Time [days]
Fast database server        50 MB/s     0.23
Local SCSI/Fast Ethernet    10 MB/s     1.2
T1                          0.5 MB/s    23
Typical ‘good’ www          20 KB/s     580

Brute force is not enough – we need clever techniques
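The times in the table follow directly from size divided by rate. The sketch below reproduces them, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and sustained transfer at the quoted rate with no overhead.

```python
SECONDS_PER_DAY = 86_400
DATA_SET_BYTES = 1e12   # 1 TB, decimal units (assumption)

# Rates from the slide, in MB/s.
rates_mb_per_s = {
    "Fast database server": 50,
    "Local SCSI/Fast Ethernet": 10,
    "T1": 0.5,
    "Typical 'good' www": 0.02,   # 20 KB/s
}

def read_time_days(size_bytes: float, rate_mb_per_s: float) -> float:
    """Days needed to read size_bytes at a sustained rate in MB/s."""
    return size_bytes / (rate_mb_per_s * 1e6) / SECONDS_PER_DAY

for name, rate in rates_mb_per_s.items():
    print(f"{name:26s} {read_time_days(DATA_SET_BYTES, rate):8.2f} days")
```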

Geometric Indexing
“Divide and Conquer” partitioning: 3 ⊕ N ⊕ M
• Hierarchical Triangular Mesh
• Split as k-d tree, stored as r-tree of bounding boxes
• Using regular indexing techniques

Attributes          Number
Sky Position        3
Multiband Fluxes    N = 5+
Other               M = 100+
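As an illustration of "split as k-d tree, stored as bounding boxes", the sketch below partitions points by median splits along alternating axes and keeps one bounding box per leaf, so a range query only inspects leaves whose box overlaps the query box. It is a generic textbook construction with hypothetical names, not the survey's actual implementation, and it keeps a flat list of leaf boxes rather than a full r-tree.

```python
def build_kd_partition(points, leaf_size=2, depth=0):
    """Recursively median-split points; return a list of (bounding_box, points)
    leaves, where bounding_box = (lower_corner, upper_corner)."""
    if not points:
        return []
    dims = range(len(points[0]))
    if len(points) <= leaf_size:
        lo = tuple(min(p[d] for p in points) for d in dims)
        hi = tuple(max(p[d] for p in points) for d in dims)
        return [((lo, hi), list(points))]
    axis = depth % len(points[0])          # cycle through the axes
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                # median split
    return (build_kd_partition(ordered[:mid], leaf_size, depth + 1)
            + build_kd_partition(ordered[mid:], leaf_size, depth + 1))

def boxes_overlap(a, b):
    """True if two axis-aligned boxes intersect."""
    (alo, ahi), (blo, bhi) = a, b
    return all(l1 <= h2 and l2 <= h1
               for l1, h1, l2, h2 in zip(alo, ahi, blo, bhi))

def range_query(leaves, query_box):
    """Points inside query_box, testing only leaves whose box overlaps it."""
    qlo, qhi = query_box
    hits = []
    for box, pts in leaves:
        if boxes_overlap(box, query_box):
            hits.extend(p for p in pts
                        if all(l <= x <= h for x, l, h in zip(p, qlo, qhi)))
    return hits

# Made-up two-color points standing in for multiband fluxes.
points = [(0.1, 0.2), (0.4, 0.9), (0.5, 0.5), (0.8, 0.1),
          (0.9, 0.7), (0.3, 0.3), (0.6, 0.6), (0.2, 0.8)]
leaves = build_kd_partition(points, leaf_size=2)
print(sorted(range_query(leaves, ((0.25, 0.25), (0.65, 0.65)))))
```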

Sky coordinates
• Stored as Cartesian coordinates: projected onto a unit sphere
• Longitude and latitude lines: intersections of planes and the sphere
• Boolean combinations: query polyhedron
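A minimal sketch of why Cartesian storage helps: once a position is a unit vector, a circle on the sky or a latitude limit becomes a plane test (a dot product compared with a constant), and a region is a Boolean combination of such tests. The function names and the choice of right ascension and declination as input angles are assumptions made for illustration.

```python
import math

def radec_to_xyz(ra_deg, dec_deg):
    """Unit vector on the sphere for a longitude/latitude pair in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def in_halfspace(xyz, normal, offset):
    """True if the point lies on the side of the plane where normal . x >= offset."""
    return sum(a * b for a, b in zip(xyz, normal)) >= offset

def cone(ra_deg, dec_deg, radius_deg):
    """A circular region is a single half-space: center . x >= cos(radius)."""
    return (radec_to_xyz(ra_deg, dec_deg), math.cos(math.radians(radius_deg)))

def dec_band(dec_min_deg, dec_max_deg):
    """A latitude band is the intersection of two half-spaces along the z axis."""
    return [((0.0, 0.0, 1.0), math.sin(math.radians(dec_min_deg))),
            ((0.0, 0.0, -1.0), -math.sin(math.radians(dec_max_deg)))]

def in_region(xyz, halfspaces):
    """Intersection (logical AND) of half-spaces: a convex query polyhedron."""
    return all(in_halfspace(xyz, n, c) for n, c in halfspaces)

# Region: within 1 degree of (180, 0) AND between declination -0.5 and +0.5.
region = [cone(180.0, 0.0, 1.0)] + dec_band(-0.5, 0.5)
print(in_region(radec_to_xyz(180.3, 0.2), region))
print(in_region(radec_to_xyz(180.3, 0.8), region))
```

No trigonometry is needed per object at query time if the unit vectors are stored in the catalog, which is the point of keeping Cartesian coordinates.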

Sky Partitioning
Hierarchical Triangular Mesh, based on the octahedron

Hierarchical Subdivision
• Hierarchical subdivision of spherical triangles, represented as a quadtree
• In SDSS the tree is 5 levels deep
• In 2MASS the tree goes much deeper in the Galactic plane
One shoe fits all…
• This indexing is now adopted by SDSS, 2MASS, GSC2, POSS2, FIRST and is considered by CDS, PLANCK and GAIA
• New standard spatial index for astronomy!
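The quadtree of spherical triangles can be sketched as follows: start from the eight faces of an octahedron inscribed in the unit sphere, and split each triangle into four by joining the edge midpoints pushed back onto the sphere. This is a generic construction with hypothetical function names; how a given survey counts levels (whether the octahedron itself is level 0) is an assumption here.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def midpoint(a, b):
    """Midpoint of two unit vectors, projected back onto the unit sphere."""
    return normalize(tuple(x + y for x, y in zip(a, b)))

def subdivide(tri):
    """Split one spherical triangle into four children (one quadtree step)."""
    v0, v1, v2 = tri
    w0, w1, w2 = midpoint(v1, v2), midpoint(v0, v2), midpoint(v0, v1)
    return [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]

def octahedron_faces():
    """The 8 root triangles: one per octant of the unit sphere."""
    return [((float(sx), 0.0, 0.0), (0.0, float(sy), 0.0), (0.0, 0.0, float(sz)))
            for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)]

def mesh(depth):
    """All triangles after `depth` rounds of subdivision: 8 * 4**depth of them."""
    tris = octahedron_faces()
    for _ in range(depth):
        tris = [child for t in tris for child in subdivide(t)]
    return tris

for depth in range(6):
    print(depth, len(mesh(depth)))
```

Because each step multiplies the triangle count by four, a few extra levels in crowded regions such as the Galactic plane give much finer cells without changing the scheme.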

Result of the Query

New Analysis Tools
• Discover new patterns through advanced statistical methods and visualization techniques
• Confront catalogs and image databases with numerical simulations of astrophysical systems
• Collaborative exploration of multi-wavelength databases by multiple groups working at remote sites

New Hardware Requirements
• Large distributed database engines with Gbyte/s aggregate I/O speed
• High speed (>10 Gbits/s) backbones cross-connecting the major archives
• Scalable computing environment with hundreds of CPUs for statistical analysis and discovery

What is the NVO? - Content
• Source catalogs, image data
• Query tools
• Specialized data: spectroscopy, time series, polarization
• Information archives: derived and legacy data (NED, Simbad, ADS, etc.)
• Analysis/discovery tools: visualization, statistics
• Standards

What is the NVO? - Components
• Information providers: e.g. ADS, NED, ...
• Data providers: surveys, observatories, archives, SW repositories
• Service providers: query engines, compute engines

Conceptual Architecture
[Diagram labels: Data Archives, Analysis tools, Discovery tools, User Gateway]

The Flavor/Role of the NVO
• Highly distributed and decentralized
• Multiple phases, built on top of one another
  – establish standards, meta-data formats
  – integrate main catalogs
  – develop initial querying tools
  – develop collaboration requirements, establish procedure to import new catalogs
  – develop distributed analysis environment
  – develop advanced visualization tools
  – develop advanced querying tools

NVO Development Functions
• Software development
  – query generation/optimization, software agents, user interfaces, discovery tools, visualization tools
• Standards development
  – meta-data, meta-services, streaming formats, object relationships, object attributes, ...
• Infrastructure development
  – archival storage systems, query engines, compute servers, high speed connections of main centers
• Train the next generation
  – train scientists equally at home in astronomy and modern computer science, statistics, visualization

The Mission of the NVO
The National Virtual Observatory should
• provide seamless integration of the digitally represented multi-wavelength sky
• enable efficient simultaneous access to multi-Terabyte to Petabyte databases
• develop and maintain tools to find patterns and discoveries contained within the large databases
• develop and maintain tools to confront data with sophisticated numerical simulations

NVO Funding
• The NVO is ideal for multi-agency and IT funding
  – relevant for all areas of astronomy and space science
  – excellent match to goals of the IT² initiative
  – but: core funding must come from NASA and NSF
  – needs serious involvement of computer scientists
• Scope
  – approximately $25M for the first 5 years, could be larger in the second half
• Requires long term commitment
  – development/deployment (5 + 5 years)
• Needs to start soon
  – data avalanche has already begun
An effort for the whole astronomy and astrophysics community!


NVO Layers
Three layers built on top of one another, tied together with standards
• Archives
  – data content
  – interconnections
  – cross identifications
  – services
• Basic analysis tools
  – query capabilities
  – statistical tools
  – ability to run user code (API)
  – browsing tools
• Discovery tools
  – visualization
  – advanced classification methods
  – supervised/unsupervised learning
  – data mining
• Standards
  – meta-data
  – interfaces between archives
  – cross-identification standards
  – archive-tool interfaces