Active Learning on Spatial Data
Christine Körner, Fraunhofer AIS, University of Bonn

2 Outline
– Active Learning
– The FAW Project
– Spatial Data
– Experiment Outline

3 Active Learning
Labelled data is often difficult or expensive to obtain:
– manual preparation of documents for text mining
– analysis of drugs or molecules
Active learning strategies actively select which data points to query in order to
– minimize the number of training examples needed for a given classification quality, or
– maximize the quality of the results for a given number of data points.

4 Selective Sampling
Which instance should we choose next? Candidate criteria:
– where we have no data?
– where we perform poorly?
– where we have low confidence?
– where we expect our model to change?
– where we previously found data that improved quality?
(Diagram: the learner passes an instance to an ORACLE, asks for its label, and adds the labelled instance to the training set.)
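
Conceptually, the loop in the diagram fits in a few lines. Below is a minimal Python sketch; select_query, oracle, and the data structures are placeholders assumed for illustration, not part of the original slides.

    # Minimal sketch of the selective-sampling loop on this slide.
    # select_query and oracle are placeholders for the chosen selection
    # criterion and the labelling process (e.g. a field measurement).
    def active_learning_loop(unlabelled, oracle, select_query, budget=50):
        training_set = []
        for _ in range(budget):
            x = select_query(unlabelled, training_set)  # which instance next?
            y = oracle(x)                               # ask the oracle: "Label?"
            training_set.append((x, y))                 # add to training set
            unlabelled.remove(x)
        return training_set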

5 The FAW Project
FAW: association for the regulation of outdoor advertising
Goal: prediction of traffic frequencies for 82 major German cities
Samples: ~ poster sites measured per city

6 Data Characteristics, Prediction
Attributes per street segment:
– street name, segment ID
– speed class
– street type
– sidewalks
– one-way road
– POIs: no. of restaurants, no. of public buildings, …
– spatial coordinates
kNN prediction:
– similarity is calculated from the scalar attributes and the spatial coordinates
– neighbors are weighted according to their (spatial) distance
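
As a concrete illustration of this prediction step, here is a minimal distance-weighted kNN sketch in Python with NumPy; the feature layout and all names are assumptions, since the slides show no code.

    # Hedged sketch of distance-weighted kNN regression as described above.
    # Rows of X concatenate scaled scalar attributes (speed class, street
    # type, POI counts, ...) with the spatial coordinates; y holds the
    # measured traffic frequencies.
    import numpy as np

    def knn_predict(X_train, y_train, x_query, k=5):
        dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distance
        nn = np.argsort(dists)[:k]                         # k nearest neighbors
        w = 1.0 / (dists[nn] + 1e-9)                       # closer => larger weight
        return np.average(y_train[nn], weights=w)          # weighted mean frequency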

7 Spatial Data
Spatial data exhibit spatial covariance between data points. High autocorrelation and concentrated linkage* on the street name bias test accuracy:
– 1:n relationship between street name and segments
– frequencies within one street are alike
Here the complete instance space is known (all street segments of a city).
(Figure: two example streets, Nordstraße and Riesenweg, shown with their segments and frequencies.)
* David Jensen, Jennifer Neville: Autocorrelation and Linkage Cause Bias in Evaluation of Relational Learners
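
A common remedy for this kind of evaluation bias, in the spirit of Jensen and Neville's observation, is to split train and test sets by street rather than by segment, so segments of one street never land on both sides. A small illustrative sketch with an assumed data layout, not the slides' actual procedure:

    # Illustrative sketch: group-wise train/test split by street name to
    # avoid leakage through street-level autocorrelation. The 'segments'
    # layout (list of dicts with a 'street' key) is an assumption.
    import random
    from collections import defaultdict

    def split_by_street(segments, test_fraction=0.3, seed=0):
        by_street = defaultdict(list)
        for seg in segments:
            by_street[seg["street"]].append(seg)
        streets = sorted(by_street)
        random.Random(seed).shuffle(streets)
        test_streets = set(streets[:int(len(streets) * test_fraction)])
        train = [s for s in segments if s["street"] not in test_streets]
        test = [s for s in segments if s["street"] in test_streets]
        return train, test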

8 Active Learning in FAW
Usage: additional samples at ~50 places per city.
kNN needs the cross product of street segments with all poster places
– Cologne: 50 GB, 5 days
Strategies:
– Data density: mean distance to the k nearest neighbors
– Model differences: build a model tree on the predicted frequencies; do the models disagree?
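
The data-density strategy can be sketched directly: score each candidate segment by the mean distance to its k nearest neighbors and query sparse regions first. A minimal sketch with assumed names:

    # Hedged sketch of the density criterion: large scores mark sparse
    # regions of the instance space, which are queried first.
    # O(n^2) brute force for clarity.
    import numpy as np

    def rank_by_density(X_candidates, k=5):
        n = len(X_candidates)
        scores = np.empty(n)
        for i in range(n):
            d = np.linalg.norm(X_candidates - X_candidates[i], axis=1)
            d[i] = np.inf                      # exclude the point itself
            scores[i] = np.sort(d)[:k].mean()  # mean distance to k nearest
        return np.argsort(-scores)             # sparsest candidates first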

9 Experiment Outline
(Diagram: training and test sets; the oracle provides samples; a model tree and a distance measure produce the ranking for active learning; kNN predicts frequencies over several iterations.)
Comparison of the accuracy increase when added samples follow the ranking vs. a random order.
Alternatives:
– iterative ranking (realistic? is the greedy search optimal?)
– rank once, then remove similar objects (e.g. exclude segments of the same street, …)
Possible problems:
– kNN is not very stable
– with few samples, the oracle has little choice in providing the requested data sets
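
The ranking-vs-random comparison could be run with a loop like the one below; it reuses knn_predict and rank_by_density from the earlier sketches and stands in for the oracle with an already-labelled pool, so all names are assumptions rather than the actual experiment code.

    # Illustrative comparison loop: add pool samples in a given order and
    # record the test error after each addition. Assumes knn_predict and
    # rank_by_density from the sketches above; the pool plays the oracle.
    import numpy as np

    def accuracy_curve(X_pool, y_pool, X_test, y_test, order, steps=20):
        train_idx, errors = [], []
        for i in order[:steps]:
            train_idx.append(int(i))           # the oracle labels one sample
            preds = [knn_predict(X_pool[train_idx], y_pool[train_idx], x)
                     for x in X_test]
            errors.append(np.mean(np.abs(np.array(preds) - y_test)))
        return errors

    # ranked = accuracy_curve(X, y, Xt, yt, rank_by_density(X))
    # random = accuracy_curve(X, y, Xt, yt, np.random.permutation(len(X)))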

10 Thank you! Suggestions, ideas, questions?