The VAO is operated by the VAO, LLC. Ashish Mahabal, Ciro Donalek, Matthew Graham, Ray Plante, George Djorgovski.

Presentation transcript:

Data 2 Knowledge study project
Ashish Mahabal, Ciro Donalek, Matthew Graham, Ray Plante, George Djorgovski
VAO-LSST Meeting, NOAO, 24 March 2011
The VAO is operated by the VAO, LLC.

March 23, 2011, Ashish Mahabal
Slide 2: Goals
- Feasibility study
- What is out there
- What is needed
- Milestones
- What can be done

Exploration of observable parameter spaces and searches for rare or new types of objects (Djorgovski)

Slide 4: Overview – many connections
- Astroinformatics (next meeting in Sep. 2011)
- VOStat and other R/Statistics tools
- Data challenges
- Various sky surveys
- Related issues: semantics, classification/characterization, distributed data, GPUs
- Focus on time domain

Slide 5: Focus on time-domain
- Expertise, and it encompasses all aspects of data mining (save one)
- Plus, real-time forces us to be fast
- Portfolio building – growing columns of tables
- Bayesian networks utilizing auxiliary information
- Lightcurve techniques for characterizing objects
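
As a rough illustration of what "lightcurve techniques for characterizing objects" and "growing columns of tables" could look like in practice, the sketch below reduces one lightcurve to a handful of descriptive numbers that could be appended as new columns to an object's portfolio. It is not taken from the talk: the function name `lightcurve_features` and the particular statistics chosen are assumptions made for illustration only.

```python
from statistics import median, pstdev


def lightcurve_features(times, mags):
    """Return a few simple descriptive statistics for one lightcurve.

    Hypothetical helper for illustration; `times` and `mags` are
    equal-length sequences of observation times and magnitudes.
    """
    if len(times) != len(mags) or len(times) < 2:
        raise ValueError("need at least two (time, magnitude) pairs")

    # Sort by time so that consecutive differences are meaningful.
    pairs = sorted(zip(times, mags))
    t = [p[0] for p in pairs]
    m = [p[1] for p in pairs]

    # Rate of change between consecutive observations,
    # skipping pairs taken at the same time.
    slopes = [
        (m[i + 1] - m[i]) / (t[i + 1] - t[i])
        for i in range(len(t) - 1)
        if t[i + 1] != t[i]
    ]

    return {
        "n_points": len(m),
        "median_mag": median(m),
        "amplitude": max(m) - min(m),
        "std_mag": pstdev(m),
        "max_abs_slope": max(abs(s) for s in slopes) if slopes else 0.0,
    }
```

Each key of the returned dictionary would correspond to one new table column; richer features (periodicity, colors, auxiliary data) could be added in the same way.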

Slide 6: Missing stat and CS tools

Slide 7: Missing stat and CS tools
- Bootstrap aggregating
- Mixture of experts
- Boosting
- Simulated annealing
- Semi-supervised learning
- …
From IVOA KDD User guide for Data Mining (Nick Ball)
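
To make the first item on this list concrete, here is a minimal sketch of bootstrap aggregating (bagging) using only the Python standard library. It is an illustration, not code from the study: the base learner (a one-feature threshold "stump"), the function names `train_stump` and `bagging`, and the default of 25 models are all assumptions.

```python
import random
from collections import Counter


def train_stump(data):
    """Fit a one-feature threshold classifier to (x, label) pairs, labels 0/1."""
    best = None
    for threshold, _ in data:
        for polarity in (0, 1):
            # `polarity` is the label predicted when x >= threshold.
            errors = sum(
                1
                for x, y in data
                if (polarity if x >= threshold else 1 - polarity) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    _, threshold, polarity = best
    return lambda x: polarity if x >= threshold else 1 - polarity


def bagging(data, n_models=25, seed=0):
    """Train `n_models` stumps on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(train_stump(sample))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict
```

Any base learner could take the place of the stump; the resampling with replacement and the vote are what make this bagging.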

Slide 8: Science goal: to close the growing gap between the huge generation of data and our understanding of it
- Data Gathering (e.g., new generation instruments …)
- Data Farming:
  - Storage/Archiving
  - Indexing, Searchability
  - Data Fusion, Interoperability, ontologies, etc.
- Data Mining (or Knowledge Discovery in Databases):
  - Pattern or correlation search
  - Clustering analysis, automated classification
  - Outlier / anomaly searches
  - Hyperdimensional visualization
- Data visualization and understanding:
  - Computer aided understanding
  - KDD
  - Etc.
- New Knowledge
Data storage, Pbytes; Data access >10^3 access; Scalability: Petaflops, Exaflops; Computing power (multicore); Algorithm: parallelism; Visualization: N-dimensional

Slide 9: Currently on the plate
- DAME
- KNIME (Konstanz Information Miner)
- Orange (Visual/Python)
- Weka (ML/Java)
- RapidMiner (standalone)

Slide 10: Comparison matrix for DM/Viz tools
- Accuracy
- Scalability
- Interpretability
- Usability
- Robustness
- Versatility
- Speed
- Popularity
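
One possible way to hold such a comparison matrix in code is a mapping from tool name to per-criterion scores, ranked by a weighted sum. The sketch below assumes numeric scores and an optional weight per criterion; the function name `rank_tools`, the scoring scale, and the weighting scheme are hypothetical and are not described in the slides.

```python
# Criteria as listed on the slide.
CRITERIA = [
    "accuracy", "scalability", "interpretability", "usability",
    "robustness", "versatility", "speed", "popularity",
]


def rank_tools(matrix, weights=None):
    """Rank tools by weighted total score, highest first.

    `matrix` maps a tool name to a dict of criterion -> numeric score.
    `weights` maps criterion -> weight; equal weights are assumed if omitted.
    """
    if weights is None:
        weights = {c: 1 for c in CRITERIA}

    totals = {}
    for tool, scores in matrix.items():
        missing = [c for c in CRITERIA if c not in scores]
        if missing:
            raise ValueError(f"{tool} is missing scores for: {missing}")
        # Criteria absent from `weights` contribute nothing.
        totals[tool] = sum(weights.get(c, 0) * scores[c] for c in CRITERIA)

    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

The scores themselves would have to come from the evaluation of each tool; none are given in the presentation, so any values passed in are placeholders.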

Slide 11: Related activities
- Skyalert integration (Graham) – adding data and methods
- Solicitation of examples from community
  - WD, Blazars’ example
- Making R more astronomy friendly
- Various datasets
  - Differing number of rows, columns
  - For supervised/unsupervised classification
- TA on GPUs – incorporate in pipeline

Slide 12 (slide from Budavari): CUDA zone, PyCUDA, …

Slide 13: VAO People working on this
- Ashish Mahabal, Ciro Donalek, Matthew Graham, George Djorgovski (Caltech)
- Ray Plante (NCSA)
But we are in touch with many others in astro/CS/stats and are relying on many groups, including the LSST transients and informatics working groups.