The datasets have 100 instances, and 100 features

Slides:



Advertisements
Similar presentations
STUDENT INFORMATION SYSTEM (SIS) Course Approval Procedure.
Advertisements

Topic 23 Red Black Trees "People in every direction No words exchanged No time to exchange And all the little ants are marching Red and black antennas.
How to Add and Manage Service Locations in eGrants
1 Creating and Tweaking Data HRP223 – 2010 October 24, 2011 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This.
Starting Out with C++: Early Objects 5/e © 2006 Pearson Education. All Rights Reserved Starting Out with C++: Early Objects 5 th Edition Chapter 5 Looping.
Cognos 8.4 Upgrade Business Intelligence. Why Cognos 8.4 Increased Performance on Database due to optimized SQL and more filters passed in native SQL.
Separating Columns in Excel. An extremely useful function in Excel is the Text to Column feature which can be used for any type of column separation but.
Checking for Collisions Ellen Yuan Under the direction of Professor Susan Rodger at Duke University June 2014.
Create Mailing Labels (Word 2007) Word 2007 using the Mail Merge function and an Excel spreadsheet Create mailing labels from Member Rosters in.
by Chris Brown under Prof. Susan Rodger Duke University June 2012
In.  This presentation will only make sense if you view it in presentation mode (hit F5). If you don’t do that there will be multiple slides that are.
Overview Trackaparcel booking system is very easy to use which sends up to minute information directly to the logistic company whilst building the manifest.
By Melissa Dalis Professor Susan Rodger Duke University June 2011 Multiplication Table.
AD 305 Electronic Visualization I : School of Art and Design : University of Illinois at Chicago : Spring 2007 Intro to Action Script 9 "The games of a.
1 Chapter 9. To familiarize you with  Simple PERFORM  How PERFORM statements are used for iteration  Options available with PERFORM 2.
AD 206 Intermediate CG : School of Art and Design : University of Illinois at Chicago : Spring 2009 Intro to Action Script 9 "The games of a people reveal.
GEO375 Final Project: From Txt to Geocoded Data. Goal My Final project is to automate the process of separating, geocoding and processing 911 data for.
Processing Text Excel can not only be used to process numbers, but also text. This often involves taking apart (parsing) or putting together text values.
Exploring Taverna engine Aleksandra Pawlik materials by Katy Wolstencroft University of Manchester.
The test data is now online:. Everyone has their own two datasets (small and large). You can find your dataset by looking at the code below.
THE NAE SCREEN (And Other Relevant Things) “ N ame and A ddress Entry”
DU REDCap Introduction
Eamonn Keogh CS005 Introduction to Programming: Matlab Eamonn Keogh
Journal of Mountain Science (JMS)
Topic 2: binary Trees COMP2003J: Data Structures and Algorithms 2
Trees Chapter 15.
Eamonn Keogh CS005 Introduction to Programming: Matlab Eamonn Keogh
On the next slide, you will be shown a figure with an asterisk located at one corner of the figure. Please look at the figure carefully. When it goes away.
Prof. Mark Glauser Created by: David Marr
CS005 Introduction to Programming
What to do when a test fails
Attendance Tracking Module
CS 235 Decision Tree Classification
Chapter 5: Looping Starting Out with C++ Early Objects Seventh Edition
Data File Import / Export
Naviance: Do What You Are Personality Survey
Engineering Innovation Center
Yenka Portfolio Level for this topic: Student Name : My Levels
While Loops BIS1523 – Lecture 12.
Vehicles and Conditionals
Eamonn Keogh CS005 Introduction to Programming: Matlab Eamonn Keogh
Project 2 datasets are now online.
CS 235 Nearest Neighbor Classification
Learning to Program in Python
File Handling Programming Guides.
NextGen Trustee General Ledger Accounting
E-permits Tutorial for first-time users
Iteration: Beyond the Basic PERFORM
Iteration: Beyond the Basic PERFORM
Topic 23 Red Black Trees "People in every direction No words exchanged No time to exchange And all the little ants are marching Red and Black.
Getting Started: Amazon AWS Account Creation
Making Objects Move in Unison: Using Lists
Click ‘browse’ to search your device for
An Introduction to Alice
Nested Loops & The Step Parameter
Lab 2 and Merging Data (with SQL)
Lab 2 HRP223 – 2010 October 18, 2010 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected.
Please use speaker notes for additional information!
Project 2, The Final Project
Overview of Contract Association Batch Upload
HP ALM Test Lab Module To protect the confidential and proprietary information included in this material, it may not be disclosed or provided to any third.
Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.
Data Structures & Algorithms
Making Objects Move in Unison: Using Lists
Correct Function.
The Step Parameter & Nested For … To … Next loops
Depth-First Searches.
Presentation transcript:

The datasets have 100 instances, and 100 features The second column up to the last column are the features Class labels are in the first column Either a 1 or 2

These numbers are in standard IEEE 754-1985, single precision format (space delimited) You can use an off-the-shelf package to read them into your program. In matlab, just >> load CS235testdata1.txt will work

I have a key for the two hidden datasets. On dataset 1 the error rate can be 0.89, when using only features 57 70 99 On dataset 2 the error rate can be 0.93, when using only features 55 87 41 On dataset 3 the error rate can be ----, when using only features -------------- On dataset 4 the error rate can be ----, when using only features ------------- You don’t have this key! So it is your job to do the search to find that subset of features.

To finish this project, I recommend that you completely divorce the search part, from the leave-one-out-cross-validation part. To do this, I wrote a stub function that just returns a random number I will use this in my search algorithm, and only when I am 100% sure that search works, will I “fill in” the full leave-one-out-cross-validation code. function accuracy= leave_one_out_cross_validation(data,current_set,feature_to_add) accuracy = rand; % This is a testing stub only end

function feature_search_demo(data) for i = 1 : size(data,2)-1 disp(['On the ',num2str(i),'th level of the search tree']) end I began by creating a for loop that can “walk” down the search tree. I carefully tested it… 1 2 3 4 EDU>> feature_search_demo(mydata) On the 1th level of the search tree On the 2th level of the search tree On the 3th level of the search tree On the 4th level of the search tree 1,2 1,3 2,3 1,4 2,4 3,4 1,2,3 1,2,4 1,3,4 2,3,4 1,2,3,4

function feature_search_demo(data) for i = 1 : size(data,2)-1 disp(['On the ',num2str(i),'th level of the search tree']) for k = 1 : size(data,2)-1 disp(['--Considering adding the ', num2str(k),' feature']) end EDU>> feature_search_demo(mydata) On the 1th level of the search tree --Considering adding the 1 feature --Considering adding the 2 feature --Considering adding the 3 feature --Considering adding the 4 feature On the 2th level of the search tree On the 3th level of the search tree On the 4th level of the search tree Now, inside the loop that “walks” down the search tree, I created a loop that considers each feature separately… I carefully tested it… 1 2 3 4

We are making great progress! function feature_search_demo(data) for i = 1 : size(data,2)-1 disp(['On the ',num2str(i),'th level of the search tree']) for k = 1 : size(data,2)-1 disp(['--Considering adding the ', num2str(k),' feature']) end We are making great progress! These nested loops are basically all we need to traverse the search space. However at this point we are not measuring the accuracy of leave_one_out_cross_validation and recording it, so lets us do that (next slide).

feature_search_demo(mydata) On the 1th level of the search tree --Considering adding the 1 feature --Considering adding the 2 feature --Considering adding the 3 feature --Considering adding the 4 feature On level 1 i added feature 2 to current set On the 2th level of the search tree --Considering… The code below almost works, but, once you add a feature, you should not add it again… function feature_search_demo(data) current_set_of_features = []; % Initialize an empty set for i = 1 : size(data,2)-1 disp(['On the ',num2str(i),'th level of the search tree']) feature_to_add_at_this_level = []; best_so_far_accuracy = 0; for k = 1 : size(data,2)-1 disp(['--Considering adding the ', num2str(k),' feature']) accuracy = leave_one_out_cross_validation(data,current_set_of_features,k+1); if accuracy > best_so_far_accuracy best_so_far_accuracy = accuracy; feature_to_add_at_this_level = k; end disp(['On level ', num2str(i),' i added feature ', num2str(feature_to_add_at_this_level), ' to current set']) We need an IF statement in the inner loop that says “only consider adding this feature, if it was not already added” (next slide) 1 2 3 4 1,2 2,3 2,4

EDU>> feature_search_demo(mydata) On the 1th level of the search tree --Considering adding the 1 feature --Considering adding the 2 feature --Considering adding the 3 feature --Considering adding the 4 feature On level 1 i added feature 4 to current set On the 2th level of the search tree On level 2 i added feature 2 to current set On the 3th level of the search tree On level 3 i added feature 1 to current set On the 4th level of the search tree On level 4 i added feature 3 to current set …We need an IF statement in the inner loop that says “only consider adding this feature, if it was not already added” function feature_search_demo(data) current_set_of_features = []; % Initialize an empty set for i = 1 : size(data,2)-1 disp(['On the ',num2str(i),'th level of the search tree']) feature_to_add_at_this_level = []; best_so_far_accuracy = 0; for k = 1 : size(data,2)-1 if isempty(intersect(current_set_of_features,k)) % Only consider adding, if not already added. disp(['--Considering adding the ', num2str(k),' feature']) accuracy = leave_one_out_cross_validation(data,current_set_of_features,k+1); if accuracy > best_so_far_accuracy best_so_far_accuracy = accuracy; feature_to_add_at_this_level = k; end current_set_of_features(i) = feature_to_add_at_this_level; disp(['On level ', num2str(i),' i added feature ', num2str(feature_to_add_at_this_level), ' to current set'])

We are done with the search! EDU>> feature_search_demo(mydata) On the 1th level of the search tree --Considering adding the 1 feature --Considering adding the 2 feature --Considering adding the 3 feature --Considering adding the 4 feature On level 1 i added feature 4 to current set On the 2th level of the search tree On level 2 i added feature 2 to current set On the 3th level of the search tree On level 3 i added feature 1 to current set On the 4th level of the search tree On level 4 i added feature 3 to current set We are done with the search! The code is the previous slide is all you need. You just have to replace the stub function leave_one_out_cross_validation with a real function, and echo the numbers it returned to the screen. 1 2 3 4 1,2 1,3 2,3 1,4 2,4 3,4 1,2,3 1,2,4 1,3,4 2,3,4 1,2,3,4