The test data is now online:. Everyone has their own two datasets (small and large). You can find your dataset by looking at the code below.

Slides:



Advertisements
Similar presentations
ProgressBook User Start-Up
Advertisements

STUDENT INFORMATION SYSTEM (SIS) Course Approval Procedure.
Online Sign-Up Demo This demo is designed to walk you through the process of selecting a space during the Room Sign-Up phase of Room Selection. Click your.
Gaussian Elimination Matrices Solutions By Dr. Julia Arnold.
Html: getting started HTML is hyper text markup language. It is what web browsers look at on the Internet. HTML documents should be created in a simple.
Getting Started With Alice By Ruthie Tucker under the direction of Prof. Susan Rodger Duke University, July
Topic 23 Red Black Trees "People in every direction No words exchanged No time to exchange And all the little ants are marching Red and black antennas.
Exit Microsoft Outlook Skills Using Categories for Sorting, Filtering and Creating Group Oklahoma Department of Corrections Training Administration.
CS 206 Introduction to Computer Science II 04 / 27 / 2009 Instructor: Michael Eckmann.
1 Creating and Tweaking Data HRP223 – 2010 October 24, 2011 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This.
Cognos 8.4 Upgrade Business Intelligence. Why Cognos 8.4 Increased Performance on Database due to optimized SQL and more filters passed in native SQL.
Introduction to a Programming Environment
Inventory Throughout this slide show there will be hyperlinks (highlighted in blue) follow the hyperlinks to navigate to the specified Topic or Figure.
4-H Leader Training 4-H Online Orientation State 4-H Online Team Erika Thiel Greg Brixey Adam McKinney.
Application Process USAJOBS – Application Manager USA STAFFING ® —OPM’S AUTOMATED HIRING TOOL FOR FEDERAL AGENCIES.
Separating Columns in Excel. An extremely useful function in Excel is the Text to Column feature which can be used for any type of column separation but.
Karen Y & Enzo Silvestri Fayetteville Technical Community College
Addition Algorithms S. Matthews. Partial-Sums Method for Addition Add from left to right and column by column. The sum of each column is recorded on a.
Checking for Collisions Ellen Yuan Under the direction of Professor Susan Rodger at Duke University June 2014.
Create Mailing Labels (Word 2007) Word 2007 using the Mail Merge function and an Excel spreadsheet Create mailing labels from Member Rosters in.
Dear User, This presentation has been designed for you by the Hearts and Minds Support Team. It provides a template for presenting the results of the SAFE.
by Chris Brown under Prof. Susan Rodger Duke University June 2012
CS 261 – Fall 2009 Binary Search Trees Again, but in detail.
The Civil War: Quiz Yourself
In.  This presentation will only make sense if you view it in presentation mode (hit F5). If you don’t do that there will be multiple slides that are.
Overview Trackaparcel booking system is very easy to use which sends up to minute information directly to the logistic company whilst building the manifest.
Washington Campus Compact New Time Log Database Note to users: You should use Internet Explorer to use this database. In other programs (i.e. Firefox)
MPA Online Entering Solos, Ensembles, and Bands. The first step is to log into MPA Online at: You will need.
Iteration. Adding CDs to Vic Stack In many of the programs you write, you would like to have a CD on the stack before the program runs. To do this, you.
Miscellaneous Excel Combining Excel and Access. – Importing, exporting and linking Parsing and manipulating data. 1.
Shaping Learning Together Jessica Wu ELT Consultant
Alice Learning to program: Part Three Camera Control, Invisibility, and 3-D Text By Ruthie Tucker and Jenna Hayes, Under the direction of Professor Rodger.
Changing Camera Views! Part 2: Simple Scene Change & Lighting Fixes By Bella Onwumbiko under the direction of Professor Susan Rodger Duke University July.
Upper Dublin High School Guidance Department
Discipline, Crime, and Violence August New DCV Application The DCV application and submission process has been revised beginning with the
When you start a new world, pick a backgrou nd.. Click on add objects to add objects to your world.
Arrays and ArrayLists in Java L. Kedigh. Array Characteristics List of values. A list of values where every member is of the same type. Each member in.
1 Lab 2 and Merging Data (with SQL) HRP223 – 2009 October 19, 2009 Copyright © Leland Stanford Junior University. All rights reserved. Warning:
Animating Objects in Groups: Using Arrays and Lists By Ruthie Tucker under the direction of Professor Susan Rodger Summer 2008.
1 What to do before class starts??? Download the sample database from the k: drive to the u: drive or to your flash drive. The database is named “FormBelmont.accdb”
An Introduction to Alice (Short Version) – Extras! Yossra Hamid Under the Supervision of Professor Susan Rodger Duke University, June 2014 This is a continuation.
WJEC Exam Unit 1: Reading English in the Daily World.
WRITING CONTROL STRUCTURES (CONDITIONAL CONTROL).
1 Chapter 9. To familiarize you with  Simple PERFORM  How PERFORM statements are used for iteration  Options available with PERFORM 2.
ERIC Educational Resources Information Center Searching.
Creating a Historical Tour in Alice By Jenna Hayes May 2010.
Page Designer Storyboard J. A. Fitzpatrick December 2004.
A (VERY) SHORT INTRODUCTION TO MATLAB J.A. MARR George Mason University School of Physics, Astronomy and Computational Sciences.
Tutorial for Arrays and Lists. Description This presentation will cover the basics of using Arrays and Lists in an Alice world It uses a set of chickens.
GEO375 Final Project: From Txt to Geocoded Data. Goal My Final project is to automate the process of separating, geocoding and processing 911 data for.
Source Cards. Getting Started: This Power Point will help take you through the process of writing your source cards and making sure they are perfect.
SurveyDIG 2.1 Tutorial. Tutorial Contents Introduction Introduction Item Groups Item Groups –Creating new Groups –Naming Convention –Searching/Editing.
Instructions for transferring names and addresses from a MS WORD table (previously created for printing address labels) to MS Excel for upload to SendOutCards.
Exploring Taverna engine Aleksandra Pawlik materials by Katy Wolstencroft University of Manchester.
THE NAE SCREEN (And Other Relevant Things) “ N ame and A ddress Entry”
Eamonn Keogh CS005 Introduction to Programming: Matlab Eamonn Keogh
Topic 2: binary Trees COMP2003J: Data Structures and Algorithms 2
Attendance Tracking Module
Data File Import / Export
Project 2 datasets are now online.
Managing Rosters Screener Training Module Module 5
The datasets have 100 instances, and 100 features
Iteration: Beyond the Basic PERFORM
Iteration: Beyond the Basic PERFORM
Depth-First Searches Introduction to AI.
Lab 2 HRP223 – 2010 October 18, 2010 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected.
Project 2, The Final Project
MTPD Technology Overview
Depth-First Searches.
Presentation transcript:

The test data is now online:

Everyone has their own two datasets (small and large). You can find your dataset by looking at the code below.

The small datasets have 100 instances, and ten features The large datasets have 1000 instances, and forty features Class labels are in the first column Either a 1 or 2 The second column up to the last column are the features

These numbers are in standard IEEE , single precision format (space delimited) You can use an off-the-shelf package to read them into your program.

On large dataset 80 the error rate can be when using only features *************************** On small dataset 80 the error rate can be 0.89 when using only features *************************** I have a key for all the datasets. For example, I know that for small dataset 80, all the features are irrelevant, except for features 5, 7 and 3. And I know that if you use ONLY those features, you can get an accuracy of about You don’t have this key! So it is your job to do the search to find that subset of features. Everyone will have a different subset and a different achievable accuracy (You can use test case 80 below to test your algorithm)

function accuracy= leave_one_out_cross_validation(data,current_set,feature_to_add) accuracy = rand; % This is a testing stub only end To finish this project, I recommend that you completely divorce the search part, from the leave-one-out-cross-validation part. To do this, I wrote a stub function that just returns a random number I will use this in my search algorithm, and only when I am 100% sure that search works, will I “fill in” the full leave-one-out-cross-validation code.

function feature_search_demo(data) for i = 1 : size(data,2)-1 disp(['On the ',num2str(i),'th level of the search tree']) end end ,42,41,42,31,31,2 2,3,41,3,41,2,41,2,3 1,2,3,4 EDU>> feature_search_demo(mydata) On the 1th level of the search tree On the 2th level of the search tree On the 3th level of the search tree On the 4th level of the search tree I began by creating a for loop that can “walk” down the search tree. I carefully tested it…

function feature_search_demo(data) for i = 1 : size(data,2)-1 disp(['On the ',num2str(i),'th level of the search tree']) for k = 1 : size(data,2)-1 disp(['--Considering adding the ', num2str(k),' feature']) end 1234 EDU>> feature_search_demo(mydata) On the 1th level of the search tree --Considering adding the 1 feature --Considering adding the 2 feature --Considering adding the 3 feature --Considering adding the 4 feature On the 2th level of the search tree --Considering adding the 1 feature --Considering adding the 2 feature --Considering adding the 3 feature --Considering adding the 4 feature On the 3th level of the search tree --Considering adding the 1 feature --Considering adding the 2 feature --Considering adding the 3 feature --Considering adding the 4 feature On the 4th level of the search tree --Considering adding the 1 feature --Considering adding the 2 feature --Considering adding the 3 feature --Considering adding the 4 feature Now, inside the loop that “walks” down the search tree, I created a loop that considers each feature separately… I carefully tested it…

function feature_search_demo(data) for i = 1 : size(data,2)-1 disp(['On the ',num2str(i),'th level of the search tree']) for k = 1 : size(data,2)-1 disp(['--Considering adding the ', num2str(k),' feature']) end We are making great progress! These nested loops are basically all we need to traverse the search space. However at this point we are not measuring the accuracy of leave_one_out_cross_validation and recording it, so lets us do that (next slide).

function feature_search_demo(data) current_set_of_features = []; % Initialize an empty set for i = 1 : size(data,2)-1 disp(['On the ',num2str(i),'th level of the search tree']) feature_to_add_at_this_level = []; best_so_far_accuracy = 0; for k = 1 : size(data,2)-1 disp(['--Considering adding the ', num2str(k),' feature']) accuracy = leave_one_out_cross_validation(data,current_set_of_features,k+1); if accuracy > best_so_far_accuracy best_so_far_accuracy = accuracy; feature_to_add_at_this_level = k; end end disp(['On level ', num2str(i),' i added feature ', num2str(feature_to_add_at_this_level), ' to current set']) end feature_search_demo(mydata) On the 1th level of the search tree --Considering adding the 1 feature --Considering adding the 2 feature --Considering adding the 3 feature --Considering adding the 4 feature On level 1 i added feature 2 to current set On the 2th level of the search tree --Considering adding the 1 feature --Considering adding the 2 feature --Considering… The code below almost works, but, once you add a feature, you should not add it again… ,42,31,2 We need an IF statement in the inner loop that says “only consider adding this feature, if it was not already added” (next slide)

function feature_search_demo(data) current_set_of_features = []; % Initialize an empty set for i = 1 : size(data,2)-1 disp(['On the ',num2str(i),'th level of the search tree']) feature_to_add_at_this_level = []; best_so_far_accuracy = 0; for k = 1 : size(data,2)-1 if isempty(intersect(current_set_of_features,k)) % Only consider adding, if not already added. disp(['--Considering adding the ', num2str(k),' feature']) accuracy = leave_one_out_cross_validation(data,current_set_of_features,k+1); if accuracy > best_so_far_accuracy best_so_far_accuracy = accuracy; feature_to_add_at_this_level = k; end current_set_of_features(i) = feature_to_add_at_this_level; disp(['On level ', num2str(i),' i added feature ', num2str(feature_to_add_at_this_level), ' to current set']) end EDU>> feature_search_demo(mydata) On the 1th level of the search tree --Considering adding the 1 feature --Considering adding the 2 feature --Considering adding the 3 feature --Considering adding the 4 feature On level 1 i added feature 4 to current set On the 2th level of the search tree --Considering adding the 1 feature --Considering adding the 2 feature --Considering adding the 3 feature On level 2 i added feature 2 to current set On the 3th level of the search tree --Considering adding the 1 feature --Considering adding the 3 feature On level 3 i added feature 1 to current set On the 4th level of the search tree --Considering adding the 3 feature On level 4 i added feature 3 to current set …We need an IF statement in the inner loop that says “only consider adding this feature, if it was not already added”

EDU>> feature_search_demo(mydata) On the 1th level of the search tree --Considering adding the 1 feature --Considering adding the 2 feature --Considering adding the 3 feature --Considering adding the 4 feature On level 1 i added feature 4 to current set On the 2th level of the search tree --Considering adding the 1 feature --Considering adding the 2 feature --Considering adding the 3 feature On level 2 i added feature 2 to current set On the 3th level of the search tree --Considering adding the 1 feature --Considering adding the 3 feature On level 3 i added feature 1 to current set On the 4th level of the search tree --Considering adding the 3 feature On level 4 i added feature 3 to current set We are done with the search! The code is the previous slide is all you need. You just have to replace the stub function leave_one_out_cross_validation with a real function, and echo the numbers it returned to the screen ,42,41,42,31,31,2 2,3,41,3,41,2,41,2,3 1,2,3,4

I will review the leave_one_out_cross_validation part another time. However, as you can see from these notes, you can work on the search, and completely code it up now! I strongly recommend that you do so.