Making the Sky Searchable: Automatically Organizing the World’s Astronomical Data Sam Roweis, Dustin Lang &

Slides:



Advertisements
Similar presentations
Trying to Use Databases for Science Jim Gray Microsoft Research
Advertisements

Configuration management
Configuration management
The zooniverse.org real science online. The Zooniverse is a collection of websites where members of the public are asked to look at data and interpret.
3D Shape Histograms for Similarity Search and Classification in Spatial Databases. Mihael Ankerst,Gabi Kastenmuller, Hans-Peter-Kriegel,Thomas Seidl Univ.
Image Indexing and Retrieval using Moment Invariants Imran Ahmad School of Computer Science University of Windsor – Canada.
The Role of Computer Vision in Astronomy
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
Nearest Neighbor. Predicting Bankruptcy Nearest Neighbor Remember all your data When someone asks a question –Find the nearest old data point –Return.
Virtual Dart: An Augmented Reality Game on Mobile Device Supervisor: Professor Michael R. Lyu Prepared by: Lai Chung Sum Siu Ho Tung.
HCI Final Project Robust Real Time Face Detection Paul Viola, Michael Jones, Robust Real-Time Face Detetion, International Journal of Computer Vision,
July 7, 2008SLAC Annual Program ReviewPage 1 Weak Lensing of The Faint Source Correlation Function Eric Morganson KIPAC.
20 Spatial Queries for an Astronomer's Bench (mark) María Nieto-Santisteban 1 Tobias Scholl 2 Alexander Szalay 1 Alfons Kemper 2 1. The Johns Hopkins University,
Algorithms and Problem Solving-1 Algorithms and Problem Solving.
Algorithms and Problem Solving. Learn about problem solving skills Explore the algorithmic approach for problem solving Learn about algorithm development.
Algorithms (Contd.). How do we describe algorithms? Pseudocode –Combines English, simple code constructs –Works with various types of primitives Could.
A Web service for Distributed Covariance Computation on Astronomy Catalogs Presented by Haimonti Dutta CMSC 691D.
Quick Review of Apr 15 material Overflow –definition, why it happens –solutions: chaining, double hashing Hash file performance –loading factor –search.
Making the Sky Searchable: Fast Geometric Hashing for Automated Astrometry Sam Roweis, Dustin Lang & Keir Mierle.
Automatic Image Alignment (feature-based) : Computational Photography Alexei Efros, CMU, Fall 2006 with a lot of slides stolen from Steve Seitz and.
Lecture 6: Feature matching and alignment CS4670: Computer Vision Noah Snavely.
Database Design IST 7-10 Presented by Miss Egan and Miss Richards.
כמה מהתעשייה? מבנה הקורס השתנה Computer vision.
Robust fitting Prof. Noah Snavely CS1114
Multiple testing correction
CS 376b Introduction to Computer Vision 04 / 29 / 2008 Instructor: Michael Eckmann.
Multimedia Databases (MMDB)
Spatial Indexing of large astronomical databases László Dobos, István Csabai, Márton Trencséni ELTE, Hungary.
The Anatomy of a Large-Scale Hypertextual Web Search Engine Presented By: Sibin G. Peter Instructor: Dr. R.M.Verma.
Configuration Management (CM)
HOUGH TRANSFORM Presentation by Sumit Tandon
Search - on the Web and Locally Related directly to Web Search Engines: Part 1 and Part 2. IEEE Computer. June & August 2006.
Making the Sky Searchable: Automatically Organizing the World’s Astronomical Data Sam Roweis, Dustin Lang &
Axial Flip Invariance and Fast Exhaustive Searching with Wavelets Matthew Bolitho.
Lecture Topics: 11/17 Page tables TLBs Virtual memory flat page tables
The Anatomy of a Large-Scale Hypertextual Web Search Engine Sergey Brin & Lawrence Page Presented by: Siddharth Sriram & Joseph Xavier Department of Electrical.
The Sloan Digital Sky Survey ImgCutout: The universe at your fingertips Maria A. Nieto-Santisteban Johns Hopkins University
CS Data Structures I Chapter 2 Principles of Programming & Software Engineering.
240-Current Research Easily Extensible Systems, Octave, Input Formats, SOA.
CS654: Digital Image Analysis Lecture 25: Hough Transform Slide credits: Guillermo Sapiro, Mubarak Shah, Derek Hoiem.
Virtual Memory 1 1.
Making the Sky Searchable: Automatically Organizing the World’s Astronomical Data Sam Roweis, Dustin Lang &
Bart M. ter Haar Romeny.  Question: can top-points be used for object- retrieval tasks?
Survey – extra credits (1.5pt)! Study investigating general patterns of college students’ understanding of astronomical topics There will be 3~4 surveys.
Data Archives: Migration and Maintenance Douglas J. Mink Telescope Data Center Smithsonian Astrophysical Observatory NSF
August 10, 2004 Apache Point Observatory, NM FINDING SUPERNOVAE IN A SLICE OF PI Dennis J. Lamenti San Francisco State University.
Measuring How Good Your Search Engine Is. *. Information System Evaluation l Before 1993 evaluations were done using a few small, well-known corpora of.
Jack Pinches INFO410 & INFO350 S INFORMATION SCIENCE Computer Vision I.
German Astrophysical Virtual Observatory Overview and Results So Far W. Voges, G. Lemson, H.-M. Adorf.
CSC321 Lecture 5 Applying backpropagation to shape recognition Geoffrey Hinton.
Making the Sky Searchable: Fast Geometric Hashing for Automated Astrometry Sam Roweis, Dustin Lang & Keir Mierle.
Lecture 3 With every passing hour our solar system comes forty-three thousand miles closer to globular cluster 13 in the constellation Hercules, and still.
Introduction to Scale Space and Deep Structure. Importance of Scale Painting by Dali Objects exist at certain ranges of scale. It is not known a priory.
June 27-29, DC2 Software Workshop - 1 Tom Stephens GSSC Database Programmer GSSC Data Servers for DC2.
Algorithms and Problem Solving. Learn about problem solving skills Explore the algorithmic approach for problem solving Learn about algorithm development.
Automated Geo-referencing of Images Dr. Ronald Briggs Yan Li GeoSpatial Information Sciences The University.
The Anatomy of a Large-Scale Hypertextual Web Search Engine S. Brin and L. Page, Computer Networks and ISDN Systems, Vol. 30, No. 1-7, pages , April.
Wide-field Infrared Survey Explorer (WISE) is a NASA infrared- wavelength astronomical space telescope launched on December 14, 2009 It’s an Earth-orbiting.
Color Magnitude Diagram VG. So we want a color magnitude diagram for AGN so that by looking at the color of an AGN we can get its luminosity –But AGN.
Seminar on seminar on Presented By L.Nageswara Rao 09MA1A0546. Under the guidance of Ms.Y.Sushma(M.Tech) asst.prof.
MapReduce “MapReduce allows us to stop thinking about fault tolerance.” Cathy O’Neil & Rachel Schutt, 2013.
Algorithms and Problem Solving
Geometric Features for Fast Image Matching
Map Reduce.
Rick, the SkyServer is a website we built to make it easy for professional and armature astronomers to access the terabytes of data gathered by the Sloan.
Geometric Hashing: An Overview
SIFT.
Algorithms and Problem Solving
Efficient Catalog Matching with Dropout Detection
Virtual Memory 1 1.
Presentation transcript:

Making the Sky Searchable: Automatically Organizing the World’s Astronomical Data Sam Roweis, Dustin Lang & Keir Mierle University of Toronto David Hogg & Michael Blanton New York University

Organize All Astronomical Data Vision: Take every astronomical image ever taken in the history of the world and unify them into a single accurately annotated and easily searchable database. We want to include all modern professional telescope surveys plus all amateur photos, satellite images, historical plate archives… How could this ever be possible?

You show me a picture of the night sky. I tell you where on the sky it came from. Core Technology: Starfield Search

Rules of the game We start with a catalogue of stars in the sky, and from it build an index which is used to assist us in locating (‘solving’) new test images. ?

Rules of the game We can spend as much time as we want building the index but solving should be fast. Challenges: 1) The sky is big. 2) Both catalogues and pictures are noisy. We start with a catalogue of stars in the sky, and from it build an index which is used to assist us in locating (‘solving’) new test images.

Catalogues: USNO-B TYCHO-2 USNO-B is an all-sky catalogue compiled from scans of old Schmidt plates. Contains about 10 9 objects, both stars and galaxies. TYCHO-2 is a tiny subset of 2.5M brightest stars.

Bad news: Query images may contain some extra stars that are not in your index catalogue, and some catalogue stars may be missing from the image. Distractors and Dropouts These “distractors” & “dropouts” mean that naïve matching techniques will not work.

Example (a million times easier) Find this “field” on this “sky”.

Example (a million times easier) Find this “field” on this “sky”.

Robust Matching Required We need to do some sort of robust matching of the test image to any proposed location on the sky. Intuitively, we need to ask: “Is there an alignment of the test image and the catalogue so that surprisingly many catalogue stars lie (almost * ) exactly on top of an observed star?” [*Depends on the noise levels in the catalogue & image. ]

Search, then Robustly Verify For each potential match our search algorithm finds, we do a robust verification test to see if we really have the correct alignment on the sky. This test looks at the number of catalog-object matches and computes the log-odds of getting that many “hits” if the query image were dropped on a random patch of sky.

Solving the search problem Now we know how to robustly check if a match is correct. But we still have to solve a huge search problem. Which match locations should we try to verify? Exhaustive search? too expensive! The Sky is Big TM ?

(Inverted) Index of Features To solve this problem, we employ the classic idea of an “inverted index”. We define a set of “features” for any particular view of the sky (image). Then we make an (inverted) index, telling us which views on the sky exhibit certain (combinations of) feature values. When we see a new test image, we compute which features are present, and use our inverted index to look up which possible views from the catalogue also have those feature values.

Matching a test image When we see a new test image, we compute which features are present, and use our inverted index to look up which possible views from the catalogue also have those feature values. Each feature generates a candidate list in this way, and by intersecting the lists we can zero in on the true matching view. The features in our inverted index act as “hash codes” for locations on the sky.

Robust Features for Geometric Hashing In simple search domains like text, the inverted index idea can be applied directly. However, in our star matching task, the features we chose must be invariant to scale, rotation and translation. They must also be robust to small positional noise. Finally, there is the additional problem of distractor & dropout stars. The features we use are the relative positions of nearby quadruples of stars.

Quads as Robust Features We encode the relative positions of nearby quadruples of stars (ABCD) using a coordinate system defined by the most widely separated pair (AB). Within this coordinate system, the positions of the remaining two stars form a 4-dimensional code for the shape of the quad. Swapping AB or CD does not change the shape but it does “reflect” the code, so there is some degeneracy. A B C D Continuous, vector-valued hash codes!

Quads as Robust Features This geometric hash code is invariant to scale, translation and rotation. It is continuous! It also has the property that if stars are uniformly distributed in space, codes are uniformly distributed in 4D. We compute codes for most nearby quadruples of stars, but not all; we require C&D to lie in the unit circle with diameter AB. A B C D Continuous, vector-valued hash codes!

A Typical Final Index at One Scale 144M stars (6 quads/star) 205M quads (4-5 arcmin) 12 healpixes Codes in 4D Quads on the sky

“Solving” a new test image Identify objects (stars+galaxies) in the image bitmap and create a list of their 2D positions. Cycle through all possible valid * quads (brightest first) and compute their corresponding codes. Look up the codes in the code KD-tree to find matches within some tolerance; this stage incurs some false positive and false negative matches. Each code match returns a candidate position & rotation on the sky. As soon as 2 quads agree on a candidate, we proceed to verify that candidate against all objects in the image.

“Solving” is Easy as ) Get your image. 2) Upload it to us. 3) Your exact location + lots of other data!

1.5 degrees

Astronomy Picture of the Day ?

0.5 degrees

7 arcminutes

7.5 degrees

68 degrees

1.4 degrees

6.1 degrees

Preliminary Scaleup Results: SDSS The Sloan Digital Sky Survey (SDSS) is an all-sky, multi-band survey which includes targeted spectroscopy of interesting objects. The telescope is located at Apache Point Observatory. Fields are 14x9arcmin corresponding to 2048x1361 pixels.

Preliminary Scaleup Results: SDSS 336,554 fields science grade+ 0 false positives 99.84% solved 530 unsolved 99.27% solve w/ 60 brightest objs Assume known pixel scale (for speedup of solving only.) Magnitudes used only to decide search order.

Speed/Memory/Disk Indexing takes ~12 hours, uses ~ 2 GB of memory and ~100 GB of disk. Solving a test image almost always takes <<1sec (not including object detection). All the work is in the hardest few% of fields Results on all of SDSS

astrometry.net is open source! We have released all our code. Download it from astrometry.net if you want to try the system out yourself. We are putting the engine on the web. if you want to be an alpha tester for the web service. Our internal trac pages are public. Check out trac.astrometry.net if you want to see all the gory details.

What have we done already Built a working prototype. Resolved ~ 400,000 Sloan Digital Sky Survey images blind. Solved historical plates (~1810). Tested a live service (professional & amateur astronomers as users.) Created a “picture of the day” layer for upcoming Google Sky. All with 2 students and <$80K. Run out of money.

What we need to do next Algorithms are extremely effective. Prototype is highly successful. Now we need to scale up. Next steps: –Get funding & resources. –Work with researchers, amateur & professional astronomers and others to understand/develop needs. –Go live and change science! + You?

The Core Team David Hogg Michael BlantonKeir Mierle Dustin Lang Sam Roweis The real talent!

Caching Computation The idea of an inverted index is that is pushes the computation from search time back to index construction time. We actually do perform an exhaustive search of sorts, but it happens during the building of the inverted index and not at search time, so queries can still be fast. There are millions of patches of the scale of a test image on the sky (plus rotation), so we need to extract about 30 bits.

Making a uniform catalogue Starting with USNO+ TYCHO we “cut” to get a spatially uniform set of the ~150M brightest stars & galaxies. We do this by laying down a fine “healpix” grid and taking the brightest K unique objects in each pixel.

Algorithms & Data Structures Implementations are all in-core. Written in C & Python. Parallelization is at the script level, which has many aggregation & storage advantages. We make extensive use of mem-mapped files, some fancy AVL lists and a cool new “pointerless” KD-tree implementation. [Mierle & Lang]

Pointer-Free KD-Trees

Pointer-Free KD-Trees

Preliminary Results: GALEX GALEX is a space-based telescope, seeing only in the ultraviolet. It was launched in April 2003 by Caltech&NASA and is just about finished collecting data now. It takes huge (80 arcmin) circular fields with 5arcsec resolution and spectra of all objects.

Preliminary Results: GALEX GALEX NUV fields can be solved easily using an index built from bright blue USNO stars.

A Real Example from SDSS Query image (after object detection). An all-sky catalogue.

A Real Example from SDSS Query image (after object detection). Zoomed in by a factor of ~ 1 million.

A Real Example from SDSS Query image (after object detection). The objects in our index.

A Real Example from SDSS All the quads in our index which are present in the query image.

A Real Example from SDSS A single quad which we happened to try.

A Real Example from SDSS The query image scaled, translated & rotated as specified by the quad.

A Real Example from SDSS The proposed match, on which we run verification.

A Real Example from SDSS The verified answer, overlaid on the original catalogue. The proposed match, on which we run verification.