New Geometric Methods of Mixture Models for Interactive Visualization

New Geometric Methods of Mixture Models for Interactive Visualization
PIs: Jia Li, Xiaolong (Luke) Zhang, Bruce Lindsay
Department of Statistics; College of Information Sciences and Technology
Penn State University

Project Goals
- Develop theories and algorithms for revealing prominent geometric features of mixture density.
- Develop approaches to clustering, dimension reduction, and variable selection based on the geometry of mixture density.
- Develop a new interactive visualization system empowered by a suite of statistical learning tools.
- Apply the statistical methods and visualization paradigm to meteorology data for weather prediction and to engineering design data (large scale, high dimensional, temporally evolving).

Interdisciplinary Collaboration
- Thesis research of Ph.D. students on the project
  - W.-Y. Hua, Statistics: Clustering and Kalman filter based weather prediction
  - M. Qiao, Computer Science and Engineering: Statistical learning based on mixture models
  - X. Yan, Information Sciences and Technology: Interactive visualization based on statistical learning and data mining
- Close collaboration with faculty members across departments
  - Fuqing Zhang, Meteorology
  - Tim Simpson, Mechanical Engineering, Industrial Engineering
  - Data provision
  - Co-advising students
  - Challenges in visualization for engineering design data
  - Evaluation of visualization system
  - Clustering and other pattern recognition needs in meteorology prediction

Advances: Research
- Modal EM algorithm for finding the modes of a mixture density.
- Clustering methods based on mode association.
- Variable selection based on the geometry of mixture density.
- Two-way mixture model for high dimensional data.
- Constructed a prototype visualization system called LIVE: Learning based Interactive Visualization for Engineering Design.
- Explored applications to meteorology data and engineering design data.
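
The first two items above can be illustrated with a minimal sketch. It is not the project's code; it follows the general idea of Modal EM as commonly described for Gaussian mixtures, under the simplifying assumption that the density is a Gaussian kernel density whose components all share the spherical covariance bandwidth^2 * I. In that special case the M-step reduces to a posterior-weighted mean of the component centers. The function names (modal_em, mode_association_clusters) and the parameters bandwidth, tol, and merge_tol are hypothetical and chosen only for illustration.

```python
import numpy as np


def modal_em(x0, centers, bandwidth, weights=None, tol=1e-8, max_iter=500):
    """Climb from x0 to a local mode of a Gaussian mixture.

    Assumption: every component has covariance bandwidth**2 * I, so the
    M-step has a closed form (weighted mean of the component centers).
    """
    centers = np.asarray(centers, dtype=float)
    n = centers.shape[0]
    if weights is None:
        w = np.full(n, 1.0 / n)  # equal weights, as in a kernel density
    else:
        w = np.asarray(weights, dtype=float)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # E-step: posterior probability of each component at the current x,
        # computed in log space for numerical stability.
        sq_dist = np.sum((centers - x) ** 2, axis=1)
        log_p = np.log(w) - sq_dist / (2.0 * bandwidth ** 2)
        log_p -= log_p.max()
        p = np.exp(log_p)
        p /= p.sum()
        # M-step: maximize sum_k p_k * log f_k(x); with a shared spherical
        # covariance the maximizer is the posterior-weighted mean.
        x_new = p @ centers
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def mode_association_clusters(data, bandwidth, merge_tol=1e-3):
    """Group points whose Modal EM ascents end at the same mode.

    The density is a Gaussian kernel density centered on the data points.
    merge_tol is a hypothetical threshold for treating two ascents as
    having reached the same mode.
    """
    data = np.asarray(data, dtype=float)
    modes = []
    labels = np.empty(len(data), dtype=int)
    for i, point in enumerate(data):
        mode = modal_em(point, data, bandwidth)
        for j, known in enumerate(modes):
            if np.linalg.norm(mode - known) < merge_tol:
                labels[i] = j
                break
        else:
            # No previously found mode is close enough: start a new cluster.
            modes.append(mode)
            labels[i] = len(modes) - 1
    return labels, np.array(modes)
```

A possible usage is to call mode_association_clusters(data, bandwidth=1.0) on an (n, d) array and read the cluster labels and mode locations from the returned pair. The bandwidth controls how many modes the kernel density has, so larger values tend to yield fewer, coarser clusters.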

Geometry of Mixture Models

Cloud Map Segmentation

Visualization System: LIVE

Advances: Publication
- J. Li, S. Ray, B. G. Lindsay, “A nonparametric statistical approach to clustering via mode identification,” Journal of Machine Learning Research, 8(8).
- M. Qiao, J. Li, “Two-way Gaussian mixture models for high dimensional classification,” Journal of Statistical Analysis and Data Mining, vol. 3(4).
- H. M. Lee, J. Li, “Variable selection for clustering by separability based on ridgelines,” submitted to journal.
- J. Li, M. Qiao, T. Simpson, X. Yan, X. Zhang, “Facilitating knowledge discovery and decision making in engineering design with multidimensional data mining and clustering,” in preparation for Journal of Engineering Design: Special Issue on Design Creativity.
Online demo for LIVE:

Participation in Community
- Participation in FODAVA workshops by X. Zhang
  - Attendee, Forum of FODAVA: Geometric Aspects of Machine Learning and Visual Analytics, VisWeek, October 2009
  - Attendee, Workshop of Extreme Scale Visual Analytics, VisWeek, October 2010
- Invited session organized by J. Li at the Joint Statistical Meetings (JSM), Vancouver, Canada, July 2010: Statistical Modeling and Learning for Information Visualization and Dimension Reduction
  - “Energy functions for Nonlinear Dimension Reduction and Graph Visualization,” L. Chen, Yale University
  - “Penalized Matrix Classification Analysis in Classifying Volatile Chemical Toxicants,” W. Zhong, UIUC
  - “A Comparative Study of Variable Screening Methods: Univariate versus Multivariate Screening,” C. Liu, T. Shi, and Y. Lee, Ohio State University
  - “Mode Based Clustering with Applications to Information Visualization,” J. Li, X. Zhang, Penn State University
- Invited panelist: X. Zhang
  - Panel of Visualization and Rich Data Sets in the Annual Workshop of Human-Computer Interaction Consortium, February 2010
  - Panel presentation: Interactive Visualization of Large Data Sets: Challenges and Some Preliminary Answers

Future Directions
- Methodology:
  - Regularized mixture modeling by exploiting geometric characteristics of kernel density or mixture density.
  - Parallel computing for statistical learning methods to improve the efficiency of interactive visualization.
- Visualization system:
  - Work with researchers on engineering design to add new functions to the LIVE system and to enhance existing ones.
  - Evaluate the LIVE system.
- Applications:
  - Test the data summarization method for meteorology data by embedding it into a weather prediction model.
  - Explore other pattern recognition problems faced by meteorologists.
  - Explore new challenges faced in engineering design.