Index Driven Selective Sampling for CBR
Nirmalie Wiratunga, Susan Craw, Stewart Massie
School of Computing, The Robert Gordon University, Aberdeen

Overview
- Selective sampling
- Cluster creation using an index
- Cluster and case utility scores
- Evaluation

Selective Sampling
[Diagram labels: unlabelled cases (pool), select interesting cases, selected cases, labelled cases, index, case-base.]
Application areas: relevance feedback, distance learning, patient monitoring.

Uncertainty and Representativeness
[Figure: labelled positive (+) and negative (-) cases among unlabelled (?) cases.]

Sampling Procedure
  L = set of labelled cases
  U = set of unlabelled cases
  LOOP
    model <= create-domain-model(L)
    clusters <= create-clusters(model, L, U)
    k-clusters <= select-clusters(k, clusters, L, U)
    FOR 1 to Max-Batch-Size
      case <= select-case(k-clusters, L, U)
      L <= L ∪ get-label(case, oracle)
      U <= U \ case
  UNTIL stopping-criterion
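
The sampling procedure can be sketched in Python as a loop that takes each step as a pluggable function. This is a minimal sketch, not the authors' implementation: the function names mirror the pseudocode, but their signatures, the dictionary used for L, and the stub functions in `demo` are assumptions made for illustration. The pool update is written as removing the selected case from U.

```python
import random


def selective_sampling(labelled, unlabelled, oracle, create_model,
                       create_clusters, select_clusters, select_case,
                       k, max_batch_size, stop):
    """Run the LOOP ... UNTIL procedure; returns the final (L, U)."""
    L = dict(labelled)   # labelled cases: case -> label
    U = set(unlabelled)  # pool of unlabelled cases
    while True:
        model = create_model(L)
        clusters = create_clusters(model, L, U)
        k_clusters = select_clusters(k, clusters, L, U)
        for _ in range(max_batch_size):
            if not U:
                break
            case = select_case(k_clusters, L, U)
            L[case] = oracle(case)  # L <= L ∪ get-label(case, oracle)
            U.discard(case)         # U <= U \ case
        if stop(L, U) or not U:
            break
    return L, U


def demo():
    """Hypothetical stubs: one cluster holding the whole pool, random picks."""
    rng = random.Random(0)
    return selective_sampling(
        labelled={0: "X", 1: "Y"},
        unlabelled=range(2, 12),
        oracle=lambda c: "X" if c % 2 == 0 else "Y",
        create_model=lambda L: None,
        create_clusters=lambda model, L, U: [sorted(U)],
        select_clusters=lambda k, clusters, L, U: clusters[:k],
        select_case=lambda ks, L, U: rng.choice(
            [c for cluster in ks for c in cluster if c in U]),
        k=1,
        max_batch_size=2,
        stop=lambda L, U: len(L) >= 6,
    )
```

Passing the steps in as functions keeps the loop independent of how the model, clusters and utility scores are actually computed.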

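Forming Clusters
[Figure: a tree-structured index over features f1, f2 and f3, including a split on < N / >= N, partitions the cases into five leaf clusters a to e.]
Leaf contents shown in the figure:
- 5 labelled (4X, 1Y), 6 unlabelled
- 0 labelled, 6 unlabelled
- 5 labelled (2X, 2Z, 1Y), 0 unlabelled
- 5 labelled (2X, 2Y, 1Z), 6 unlabelled
- 5 labelled (4Y, 1Z), 0 unlabelled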

Analysing Clusters
[Figure: class labels (X, Y, Z) of the labelled cases in each cluster.]

Ranking Clusters - Cluster Utility Score

Ranking Cases - Case Utility Score

Evaluation
- Selection heuristics
  - Rnd: randomly selected cluster and cases
  - Rnd-Cluster: random cluster with highest ranked cases
  - Rnd-Case: highest ranked cluster with random cases
  - Informed-S: highest ranked cluster and cases
  - Informed-M: highest ranked clusters and case
- UCI ML (6 datasets)
  - smaller data sets (Zoo, Iris, Lymph, Hep)
  - medium data sets (house votes, breast cancer)
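
The five selection heuristics differ only in whether the cluster and the cases are picked at random or by rank. The sketch below shows this under stated assumptions: the cluster and case utility scores are taken as precomputed inputs (the slides do not give their formulas here), `select_batch` and its parameters are hypothetical names, and Informed-M is read as taking the single top-ranked case from each of the k top-ranked clusters.

```python
import random


def select_batch(heuristic, cluster_scores, case_scores, batch_size,
                 k=2, rng=None):
    """Pick cases for labelling.

    cluster_scores: {cluster_id: utility score}
    case_scores:    {cluster_id: {case: utility score}}
    """
    rng = rng or random.Random()
    ranked = sorted(cluster_scores, key=cluster_scores.get, reverse=True)

    def top_cases(cid, n):
        scores = case_scores[cid]
        return sorted(scores, key=scores.get, reverse=True)[:n]

    def random_cases(cid, n):
        cases = list(case_scores[cid])
        return rng.sample(cases, min(n, len(cases)))

    if heuristic == "Rnd":          # random cluster, random cases
        return random_cases(rng.choice(ranked), batch_size)
    if heuristic == "Rnd-Cluster":  # random cluster, highest ranked cases
        return top_cases(rng.choice(ranked), batch_size)
    if heuristic == "Rnd-Case":     # highest ranked cluster, random cases
        return random_cases(ranked[0], batch_size)
    if heuristic == "Informed-S":   # highest ranked cluster and cases
        return top_cases(ranked[0], batch_size)
    if heuristic == "Informed-M":   # assumption: top case of each of k clusters
        return [top_cases(cid, 1)[0] for cid in ranked[:k] if case_scores[cid]]
    raise ValueError(f"unknown heuristic: {heuristic}")
```

For example, with cluster scores `{"a": 0.9, "b": 0.5}` and case scores `{"a": {1: 0.2, 2: 0.8}, "b": {3: 0.7}}`, Informed-S with a batch size of 2 draws both cases from cluster a, while Informed-M with k = 2 draws one case from each cluster.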

Experimental Design
[Diagram labels: index, case-base, sampling pool, increments (Inc) 2 to 5, test set, kNN accuracy.]
case base size = L + selected cases
selected cases = sampling iterations * Max-Batch-Size

Results I
[Plots comparing Rnd, Rnd-cluster, Rnd-case, Informed-M and Informed-S.]
- Zoo (7C, 18F, A, P9)
- Iris (3C, 4F, #+A, P3)

Results II
[Plots comparing Rnd, Rnd-cluster, Rnd-case, Informed-M and Informed-S.]
- Lymphography (4C, 19F, #+A, P9)
- Hepatitis (2C, 20F, A+?, P7)

Results III
[Plots comparing Rnd, Rnd-cluster, Rnd-case, Informed-M and Informed-S.]
- House (2C, 16F, A+?, P3)
- Breast (2C, 9F, A+?, P7)

Conclusions
- Developed a case selection mechanism exploiting case base partitions
- Utility scores to rank clusters and cases
  - ClUS captures uncertainty within clusters and uses entropy to further weight this score
  - CaUS captures the impact on other cases
- Significant improvement with informed selection on 6 data sets
- The influence of votes, partitions and entropy needs further investigation

Training Time Ratio (Informed-M/Rnd)
[Plots: training time ratio against training set size, one for Zoo, Iris, Lymphography and Hepatitis and one for House Votes and Breast Cancer.]
- Small data sets (difference 2 sec to 15 sec)
- Large data sets (difference 15 sec to 60 sec)

Discussion
- Improving the utility scores
  - the changing performance of Informed-M and Informed-S with different partition numbers needs to be examined
  - should the distances employed with CaUS be transformed?
  - what about considering the votes of the labelled cases?
  - should the training accuracy play a more active role in ClUS?
- How can the presented approach be used for
  - hole discovery?
  - case base maintenance?
- Should be evaluated with other sampling methods
  - Uncertainty sampling

Entropy
L = labelled cases, with m = 2 classes
$p_\oplus$ is the proportion of positive cases in L
$p_\ominus$ is the proportion of negative cases in L
Entropy measures the impurity of L:
$$\mathrm{Entropy}(L) = p_\oplus(-\log_2 p_\oplus) + p_\ominus(-\log_2 p_\ominus) = -p_\oplus \log_2 p_\oplus - p_\ominus \log_2 p_\ominus$$
[Plot: entropy against p, with maximum $\log_2 m$.]
Examples:
- Entropy(C unlabelled) = 0
- Entropy(+1, -1) = 1
- Entropy(+6, -1) = 0.59
- Entropy(+7, -2) = 0.76
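
The entropy examples on this slide can be checked with a few lines of Python. This is a small sketch: the function name `entropy` and the choice to pass class counts as a list are assumptions made for illustration, and a cluster with no labelled cases is treated as having entropy 0, matching the first example.

```python
import math
from typing import Sequence


def entropy(counts: Sequence[int]) -> float:
    """Entropy in bits of a set of labelled cases, given per-class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # no labelled cases
    h = 0.0
    for c in counts:
        if c > 0:  # 0 * log2(0) is taken as 0
            p = c / total
            h -= p * math.log2(p)
    return h


print(entropy([1, 1]))            # 1.0
print(round(entropy([6, 1]), 2))  # 0.59
print(round(entropy([7, 2]), 2))  # 0.76
```

With m equally frequent classes the value reaches its maximum of log2 m, for example `entropy([1, 1, 1, 1])` gives 2.0.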

Creation, Sampling, Maintenance
[Diagram labels: case generation, meta knowledge, sampling, impact of sampling.]

Some Requirements for Sampling
- Uncertainty is not enough?
  - Consider the effect of sampling on the rest of the unlabelled cases
  - Sampling in dense regions may be better than sampling isolated points, because it influences many cases
  - Selecting more than one case may help pick representatives from dense areas, i.e. informed
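
The point about dense regions can be illustrated with a simple neighbour count. This is only an illustration under stated assumptions and is not the CaUS score from the talk: cases are assumed to be numeric tuples, Euclidean distance is used, and `neighbourhood_size`, `rank_by_density` and the `radius` parameter are hypothetical names.

```python
import math


def neighbourhood_size(candidate, pool, radius):
    """Number of other pool cases within `radius` of the candidate."""
    return sum(
        1 for other in pool
        if other != candidate and math.dist(candidate, other) <= radius
    )


def rank_by_density(pool, radius):
    """Order candidates so those with the most close neighbours come first."""
    return sorted(
        pool,
        key=lambda case: neighbourhood_size(case, pool, radius),
        reverse=True,
    )


pool = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
ranking = rank_by_density(pool, radius=0.5)
# The isolated point (5.0, 5.0) has no neighbours and is ranked last.
```

A label obtained for a case in the dense group is informative about its two neighbours, whereas a label for the isolated case says little about the rest of the pool.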

Forming Clusters
[Figure: the index over features f1, f2 and f3 (split < N / >= N) with leaf clusters a to e, annotated with labelled class counts (4X, 1Y; 2X, 2Y, 1Z; 4Y, 1Z) and with labelled and unlabelled counts per leaf (5 labelled, 6 unlabelled; 0 labelled, 6 unlabelled; 5 labelled, 0 unlabelled).]

Experimental Design
- UCI ML (6 datasets)
  - Larger data sets (house votes, breast cancer)
  - Smaller data sets (Zoo, Iris, Lymph, Hep)
- 5 increasing train / test set sizes
  - equally sized splits for selection pool / test sets
  - training set or case base initialised with labelled cases
  - 150 with an increment of …; … with an increment of 25
- k-NN accuracy on test set averaged over 25 runs