Interactive Machine Learning at Scale with CHISSL

Dustin Arendt¹, Emily Grace², and Svitlana Volkova²
¹Visual Analytics Group, ²Data Science and Analytics Group

CHISSL: Computer-Human Interaction Semi-Supervised Learning

What?
CHISSL is an O(n) incremental transductive learning algorithm and user interface for rapidly organizing unlabeled data into groups defined by example.
- The model learns user intent and makes recommendations.
- The model can be corrected in a feedback loop.
- The user can export the recommendations as labels to train an inductive model.

How?
[Workflow diagram. Labels: Representation, Hierarchical Clustering, Parent pointer array, User Provides Label, 1-NN classifier, Show suggestions and borderlines, Export transduction, Train inductive model. Regions: Server, Client, Loop.]

Tip of the Iceberg: Only a few carefully chosen representative instances are shown per group to avoid overwhelming the user.
Drag and Drop: Users train the model by dragging an instance to the group they feel it is most similar to.
Responsive: The model learns from the user's example after each interaction and re-predicts within milliseconds.
User friendly: Users can choose what to group, how many groups there are, and what the groups mean.

Structured Input
- Boston dataset [1]
- Task: housing price (regression)
- 14 hand-engineered features
- Visualized using radar glyphs
- The distribution is the median housing price

Text Input
- VAST Challenge 2014 text dataset [2]
- Task: text exploration by selecting keywords and coloring by topic
- The distribution is the date of publication

Sequence Input
- VAST Challenge 2014 GPS dataset [2]
- Task: patterns of life; each instance is one day, shown as an icon per hour encoding the predominant activity (e.g., home, work, food)
- The distribution is the date the sequence occurred

Dynamic Graph Input
- VAST Challenge 2014 email dataset [2]
- Task: sub-network clustering
- Instances are day x ego-network pairs
- The features are the degree distribution of the sub-network

Why?
You can't crowdsource every problem:
- Labeling requires domain expertise
- Experts are rare and their time is valuable
- Classes are not well-defined or known ahead of time
Better than active learning because the user:
- Chooses what to label
- Can refine the task on the fly
- Can understand model performance in real time

Next?
Evaluation against active learning:
- Computational
- User study
Applications:
- Trajectories and time series
- Cybersecurity and insider threat
Engineering:
- Enrichment of streaming data

About Pacific Northwest National Laboratory
The Pacific Northwest National Laboratory, located in southeastern Washington State, is a U.S. Department of Energy Office of Science laboratory that solves complex problems in energy, national security, and the environment, and advances scientific frontiers in the chemical, biological, materials, environmental, and computational sciences. The Laboratory employs nearly 5,000 staff members, has an annual budget in excess of $1 billion, and has been managed by Ohio-based Battelle since 1965.

For more information on the science you see here, please contact:
Dustin Arendt
Pacific Northwest National Laboratory
Richland, WA 99352
(509) 371-6902
dustin.arendt@pnnl.gov

Datasets
[1] http://archive.ics.uci.edu/ml/datasets/Housing
[2] http://www.vacommunity.org/VAST+Challenge+2014
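
Illustrative sketch (not the authors' implementation): the "How?" panel names a hierarchical clustering stored as a parent pointer array and a 1-NN classifier that re-predicts after each user label. One plausible reading is that every unlabeled instance takes the label of the labeled instance it merges with lowest in the tree, which two linear passes over the parent array can compute. The function name propagate_labels, the node ordering (leaves first, children indexed before their parents, root parent of -1), and the tie-breaking rule (the lowest-indexed labeled child wins) are all assumptions made for this example.

```python
from typing import Dict, List, Optional


def propagate_labels(
    parent: List[int], n_leaves: int, user_labels: Dict[int, str]
) -> List[Optional[str]]:
    """Assign each leaf the label of its nearest labeled leaf in the tree.

    Assumptions (hypothetical, for illustration only):
      * nodes 0..n_leaves-1 are the instances (leaves);
      * parent[i] > i for every non-root node, and the root has parent -1;
      * ties between differently labeled subtrees go to the child
        with the lowest index.
    Runs in O(n) time: one bottom-up pass and one top-down pass.
    """
    n_nodes = len(parent)

    # up[v] holds a label found somewhere in the subtree rooted at v.
    up: List[Optional[str]] = [None] * n_nodes
    for leaf, label in user_labels.items():
        up[leaf] = label

    # Bottom-up pass: children have smaller indices than their parents,
    # so up[node] is final by the time we visit node.
    for node in range(n_nodes):
        p = parent[node]
        if p >= 0 and up[node] is not None and up[p] is None:
            up[p] = up[node]

    # Top-down pass: a node with no label in its own subtree inherits
    # from its closest ancestor that has one.
    resolved = list(up)
    for node in range(n_nodes - 1, -1, -1):
        p = parent[node]
        if resolved[node] is None and p >= 0:
            resolved[node] = resolved[p]

    return resolved[:n_leaves]


# Toy dendrogram over 5 instances:
#   node 5 = merge(0, 1), node 6 = merge(2, 3),
#   node 7 = merge(5, 4), node 8 = merge(7, 6) is the root.
parent = [5, 5, 6, 6, 7, 7, 8, 8, -1]

# The user drags instance 0 into group "A" and instance 2 into group "B".
predicted = propagate_labels(parent, 5, {0: "A", 2: "B"})

# The user then corrects the model by moving instance 4 into group "B".
corrected = propagate_labels(parent, 5, {0: "A", 2: "B", 4: "B"})
```

In this toy run, instance 4 first follows group "A" because it merges with instances 0 and 1 before anything else, and switches to "B" once the user labels it directly. Recomputing from scratch after each interaction keeps the sketch simple; a truly incremental version would update only the affected ancestors.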