Interactive Pattern Discovery with Large Imaging Databases
Tin Kam Ho, Computing Sciences Research Center, Bell Labs, Lucent Technologies
In collaboration with David Wittman and J. Anthony Tyson of UC Davis; Samuel Carliles, William O’Mullane, and Alex Szalay of JHU

What Is the Story in this Image?

Solving the Puzzle with a 3-step Approach
1. Describe each symbol shape with a numerical vector [ …]
2. Find clusters of symbol shapes
3. Interpret each cluster using context
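
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the two-component shape vectors are made up, k-means stands in for whatever clustering was actually applied, and the names `kmeans` and `shapes` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [vectors[0]]
    while len(centers) < k:
        # Next center: the vector farthest from all centers chosen so far.
        centers.append(max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centers)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(v, centers[c]))
                  for v in vectors]
        # Move every center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = mean_vector(members)
    return labels, centers


# Step 1: each symbol shape is described by a numerical vector (made-up values).
shapes = [(0.9, 0.1), (1.0, 0.2), (0.95, 0.15),
          (0.1, 0.9), (0.2, 1.0), (0.15, 0.95)]

# Step 2: find clusters of symbol shapes.
labels, centers = kmeans(shapes, k=2)

# Step 3: interpretation needs context, e.g. how often each cluster occurs.
cluster_sizes = {c: labels.count(c) for c in set(labels)}
```

Step 3 is where domain knowledge enters: the cluster sizes computed here are only one example of context that could support an interpretation.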

*** SERVICE GOAL -- AT&T said it has set a goal of restoring service on its long-distance network faster than the system itself disconnects calls after a cable break.

Tracking Intensive Rain Cells in Radar Images

The Deep Lens Survey (Tyson, Wittman, …)
BVRz to 26 mag over 28 sq. degrees

Weak Gravitational Lensing
Uses distortion of background galaxies to map foreground mass concentrations
J.A. Tyson, DLS 2002

Catalog of Extracted Objects

Stars or Galaxies? J.A. Tyson, DLS 2002

- The discrimination task depends on tiny differences in color and shape.
- The survey reaches an unprecedented depth: most objects have never been observed before, and nobody knows their true classification.
- How does one build confidence in the results of the classifier?
- Several perspectives need to be correlated: object characteristics in the color space, shape parameters, and brightness statistics.
- Visualization can help verify correctness of preprocessing steps, clean up undesirable artifacts, choose relevant samples, spot explicit patterns, select useful features, and suggest algorithms and models.

The Virtual Observatory

Essential Steps in Automatic Pattern Recognition
- Supervised learning: Feature Extraction, Classifier Training, Classification
- Unsupervised learning: Clustering, Cluster Validation, Cluster Interpretation
[Diagram labels: samples, features, classifier, features, class membership; axes: feature 1, feature 2]
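
As a rough illustration of the supervised branch (feature extraction, classifier training, classification), here is a minimal Python sketch. Everything in it is assumed for the example: the record fields `mag_B`, `mag_R`, `size`, the toy training values, and the nearest-centroid classifier, which is only one simple choice and not the classifier used in the talk.

```python
def extract_features(obj):
    """Feature extraction: a color index (magnitude difference) and a size."""
    return (obj["mag_B"] - obj["mag_R"], obj["size"])


def train_centroids(samples):
    """Classifier training: store the mean feature vector of each class."""
    sums = {}
    for features, label in samples:
        total, count = sums.get(label, ((0.0,) * len(features), 0))
        sums[label] = (tuple(t + f for t, f in zip(total, features)), count + 1)
    return {label: tuple(t / count for t in total)
            for label, (total, count) in sums.items()}


def classify(features, centroids):
    """Classification: pick the class whose centroid is nearest."""
    return min(
        centroids,
        key=lambda label: sum((f - c) ** 2
                              for f, c in zip(features, centroids[label])))


# Made-up labeled objects; real labels would come from trusted samples.
training_objects = [
    {"mag_B": 20.0, "mag_R": 19.0, "size": 1.0, "label": "star"},
    {"mag_B": 21.0, "mag_R": 20.2, "size": 1.1, "label": "star"},
    {"mag_B": 22.0, "mag_R": 20.5, "size": 3.0, "label": "galaxy"},
    {"mag_B": 23.0, "mag_R": 21.4, "size": 3.4, "label": "galaxy"},
]
centroids = train_centroids(
    [(extract_features(o), o["label"]) for o in training_objects])

new_object = {"mag_B": 22.5, "mag_R": 21.0, "size": 2.8}
predicted = classify(extract_features(new_object), centroids)
```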

Data Relationships Across Multiple Feature Sets
[Diagram labels: Feature Set A, Set B, Unknown Relationship; Clustering, Data Mining; Parameters, Responses; Feature Computation; Filtering, Clustering; Simulation Analysis]

Key Algorithms
- Clustering: find natural groups in data, construct index structures to facilitate proximity queries
- Dimensionality reduction: embed high-dimensional data in 2D displays
- Navigation: traverse index structures in systematic ways

Clustering Methods
- Model-based clustering: identification of finite mixtures
- Partitional clustering: divides data set into N mutually exclusive subsets
- Hierarchical clustering:
  - top-down procedures: tree splitting
  - bottom-up, agglomerative procedures: merge similar clusters successively
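
The bottom-up, agglomerative procedure can be sketched as follows. This is a minimal single-linkage version in Python, chosen for brevity; the slide does not say which linkage rule is used, and the sample points are invented.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def single_linkage(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain.

    Returns the final clusters (lists of point indices) and the merge
    history, which is the information a cluster tree would be built from.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((list(clusters[i]), list(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges


points = [(0.0, 0.0), (0.1, 0.0), (0.4, 0.0), (5.0, 5.0), (5.2, 5.0)]
clusters, merges = single_linkage(points, num_clusters=2)
```

The pairwise search makes this cubic in the number of points, so it only suits small samples; index structures of the kind mentioned under Key Algorithms are what make proximity queries practical at survey scale.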

Similarity / Clustering of Objects from Different Perspectives
- Objects can be described by many types of attributes: position, weight, shape, spectrum, time variability, …
- A meaningful similarity metric exists only for attributes of the same type
- Clusters found from one perspective need to be correlated to those from others, e.g. are the objects similar in color also similar in shape?
[Figure labels: Shape clusters, Color clusters]
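
One simple numerical way to ask whether objects similar in color are also similar in shape is to cross-tabulate the two sets of cluster labels and compute a pairwise agreement score. The sketch below uses the Rand index on invented labels; it is an illustrative companion to the visual comparison described in the talk, not a feature of Mirage.

```python
from collections import Counter


def contingency(labels_a, labels_b):
    """Count how many objects fall into each (cluster_a, cluster_b) pair."""
    return Counter(zip(labels_a, labels_b))


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which the two groupings agree.

    A pair agrees when both groupings put the two objects together,
    or both groupings keep them apart.
    """
    n = len(labels_a)
    agree = 0
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_a = labels_a[i] == labels_a[j]
            same_b = labels_b[i] == labels_b[j]
            agree += same_a == same_b
            total += 1
    return agree / total


# Made-up cluster labels for five objects, seen from two perspectives.
color_clusters = ["red", "red", "blue", "blue", "blue"]
shape_clusters = ["round", "round", "round", "elongated", "elongated"]

table = contingency(color_clusters, shape_clusters)
score = rand_index(color_clusters, shape_clusters)
```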

Exploratory Tools Needed
- To bring in domain expertise, interpretation context
- To visualize data or classifier geometry
- To track point/class correlations
- To test tentative classifications
- To compare groupings from different perspectives
- To relate numerical data to other data types
- To facilitate systematic, repeatable explorations

Mirage for Interactive Pattern Recognition
- Data Display in Linked Views: show patterns in histograms, scatter plots, parallel coordinates, tables, and images
- Selection and Tracking: select points in any view, broadcast to all others
- Traversal of Data Structures: walk in histograms, cluster graphs or trees, echoed in all other views
- Graphical Utilities: open multiple-page plots with arbitrary configuration
- Command Scripts: run prepared groups of operations as an animation
Intuitive Graphical Tool for
- Exploratory Data Analysis
- Visualization of Clusters and Classes
- Correlation of Proximity Structures
- Manual or Automatic Classification
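
The "select in any view, broadcast to all others" behavior is a classic observer arrangement. The following Python sketch shows the idea only; Mirage itself is a Java Swing application, and the class and method names here (`SelectionModel`, `View`, `user_selects`) are hypothetical, not taken from its code.

```python
class SelectionModel:
    """Shared selection state that notifies every attached view."""

    def __init__(self):
        self._views = []
        self.selected = frozenset()

    def attach(self, view):
        self._views.append(view)

    def select(self, indices, source=None):
        """Record a new selection and broadcast it to all other views."""
        self.selected = frozenset(indices)
        for view in self._views:
            if view is not source:
                view.on_selection(self.selected)


class View:
    """A plot or table that highlights the currently selected points."""

    def __init__(self, name, model):
        self.name = name
        self.model = model
        self.highlighted = frozenset()
        model.attach(self)

    def on_selection(self, indices):
        # A real view would repaint here; this sketch only stores the state.
        self.highlighted = indices

    def user_selects(self, indices):
        """Simulate the user selecting points in this view."""
        self.highlighted = frozenset(indices)
        self.model.select(indices, source=self)


model = SelectionModel()
scatter = View("scatter plot", model)
histogram = View("histogram", model)
table = View("table", model)

scatter.user_selects([1, 3])
```

Keeping the selection in one shared model means a new view type only has to implement `on_selection` to take part in the linking.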

Software Features
- Based on Java Swing library
- Intuitive, easy-to-use graphical operations
- Multiple-page, arbitrary plot configurations
- Online or offline cluster analysis
- GUI or script-driven command execution
- Database interface via JDBC
- Ready to be adapted for on-line monitoring
- Ready to be integrated with database access and decision support systems

Design Motivated by the Needs
- Interactive plays, intuitive operations: to bring domain experts into the loop
- Multiple types of plots, extensible for more: to visualize data or classifier geometry
- Linked views, traversal actions: to track point/class correlations
- Highlights, colors: to test tentative classifications
- Projection to arbitrary subspaces: to compare groupings in different perspectives
- Linking data with images: to relate numerical data to other types
- Command scripting: to facilitate systematic, repeatable explorations

Challenges for the Analysis Tool
- Separate treatment of non-comparable groups of variables
- Versatile visualization utilities allowing many perspectives
- Support for exploratory discovery across diverse data types
- Integrate manual & automatic pattern recognition methods
Also, a good tool should
  -- leverage existing visualization and analysis methods
  -- enable continued growth: new visualization, analysis tools
  -- support interface with existing databases
  -- be scalable in data volume and processing speed

Towards Extensibility
[Diagram labels]
- Mirage Core: Data Access Clients, Data Analysis Methods, Custom Data Views, Data Exchange Pipes
- External: VO Data Archives, External Rendering Code, Web Services, Other Analysis Platforms
- Examples: Cone Search, CAS; Extinction Calculator; Python? Matlab?; FITS viewer, …

VO Enabled Mirage (with Samuel Carliles, William O’Mullane, and Alex Szalay)

VO Enabled Mirage
- Load VOTable data and perform VO Cone/SIAP and SDSS CAS searches using IVOA Client Package
- Astronomical imaging module loads FITS images using JSky classes, supporting image operations:
  - Select data points and broadcast selection to other views
  - Cut levels
  - Colormap
  - SAO DS9-style brightness/contrast enhancement
  - Zoom
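
For readers unfamiliar with VO cone searches, the request is an HTTP query that carries a sky position and a search radius. The Python sketch below builds such a query following the IVOA Simple Cone Search convention of `RA`, `DEC`, and `SR` parameters in decimal degrees, and adds a local angular-separation check. It does not use the IVOA Client Package (a Java library) and performs no network access; the base URL is a placeholder, and the function names are hypothetical.

```python
import math
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a cone search query URL (RA, DEC, SR in decimal degrees)."""
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + query


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation of two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))


# Placeholder service address; a real service publishes its own base URL.
url = cone_search_url("https://example.org/conesearch", 180.0, -0.5, 0.1)

# Local check of which catalog positions fall inside a given cone.
separation = angular_separation_deg(10.0, 0.0, 11.0, 0.0)
inside_cone = separation <= 1.5
```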

Extinction Web Service (with Chris Miller, Simon Krughoff)
Using DIRBE/IRAS Dust Maps by Schlegel et al.
Mirage Core:
- Object selection
- Extracts RA, DEC, [mag] from Mirage data set
- SOAP client calls Extinction server
- Merges results with Mirage data set
[Diagram labels: Extinction Service; Positions, mags; Positions, mags, filterIDs; E(b-v), dered_mags; Result stream; Enhanced data set]
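
The extract, call, and merge steps can be sketched as below. This is an illustration under stated assumptions, not the service's implementation: `lookup_ebv` is a hypothetical stand-in for the SOAP call and returns a fixed value, the field names are invented, and the coefficients assume the textbook relation A_V = 3.1 E(B-V), which implies A_B = 4.1 E(B-V); the real service may use different per-filter coefficients.

```python
# Assumed extinction coefficients A_band / E(B-V) for R_V = 3.1.
EXTINCTION_COEFF = {"B": 4.1, "V": 3.1}


def lookup_ebv(ra, dec):
    """Hypothetical stand-in for the remote extinction service call."""
    return 0.05


def deredden(rows, lookup=lookup_ebv):
    """Return copies of the rows with E(B-V) and dereddened magnitudes merged in."""
    enhanced = []
    for row in rows:
        ebv = lookup(row["ra"], row["dec"])
        new_row = dict(row, ebv=ebv)  # copy, so the input rows stay unchanged
        for band, coeff in EXTINCTION_COEFF.items():
            key = "mag_" + band
            if key in row:
                # Dereddened magnitude = observed magnitude - extinction.
                new_row["dered_" + key] = row[key] - coeff * ebv
        enhanced.append(new_row)
    return enhanced


# Made-up catalog row: position in degrees plus two magnitudes.
catalog = [{"ra": 150.0, "dec": 2.0, "mag_B": 22.0, "mag_V": 21.5}]
enhanced = deredden(catalog)
```

Passing the lookup as a parameter keeps the merge logic testable without a live service.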

205th Meeting of the American Astronomical Society 9-13 January 2005 San Diego, CA Wednesday, 12 January Astronomical Research with the Virtual Observatory More at NVO Public Release 1.0

Analysis of Simulations of Control Dynamics in Optical Transport Systems (with the FROG collaboration)
[Diagram labels: Head End Terminal, Repeater, Fiber link, Repeater, Gain Equalizer, Tail End Terminal; Signal Spectrum with noise floor]

Monitoring Network Traffic (with Marina Thottan, Ken Swanson)
Software tool for online monitoring and analysis of QoS in IP networks
- continuously monitors traffic statistics at edge and core devices
- synthesizes statistics in real time to obtain network-wide QoS status and general network element health indicators
Mirage refreshes displays on alerts of database updates via Java Message Service
[Diagram labels: SEQUIN; SNMP polling; SLA verification; Billing; Provisioning; MPLS IP Core (QoS-guaranteed paths); DiffServ Edge (aggregation and classification)]