Interactive Session on Project 2 & Assorted Topics
By Surendra Singhi
Spring 2005

CSE 471/598, CBS 598 by Surendra Singhi

Agenda
- Project 2 discussion
  - Various terms
  - Requirements
  - Some tips
  - Some code
  - Analysis
- Questions, questions, and questions: ask anything and everything you want to know about the course, the projects, and the homework.

Jargon
- Dataset: the data along with its meta-information.
- Instance: each row in the dataset.
- Attributes/Features: the columns in the dataset.
- Real/Numeric attributes: continuous-valued attributes.
- Nominal attributes: discrete-valued attributes.
- Classifier: the decision tree that is built.
- Classification: the process of making predictions for instances.

Project Requirements

A function named "read-data" should take an input file, store its contents in a structure in some format (it's up to you), and return that structure (henceforth called the dataset). A sample input file, train.arff, is given; a sample call is:

  (read-data "train.arff")
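
Since the storage format is left to you, here is one minimal sketch of read-data, assuming train.arff follows the usual ARFF layout (an @relation line, @attribute lines, then an @data section of comma-separated rows). The dataset struct and the helpers keyword-line-p, split-string, parse-attribute, parse-value and read-data-from-stream are hypothetical names. The sketch handles only nominal ({v1,v2,...}) and numeric attributes, and does not handle quoted values, sparse rows, or "?" missing values. Parsing from a stream rather than a file name keeps it testable on a string.

```lisp
;;; Minimal ARFF reader sketch. All names below are hypothetical.
;;; Assumptions: no quoted names/values, no sparse rows, no "?" missing
;;; values; "%" starts a comment line; types are nominal {..} or numeric.

(defstruct dataset
  name          ; the @relation name
  attributes    ; list of (name . type); type is :numeric or a list of nominal values
  instances)    ; list of rows, each a list of attribute values

(defun keyword-line-p (keyword line)
  "True if LINE starts with KEYWORD, ignoring case (ARFF keywords are case-insensitive)."
  (and (>= (length line) (length keyword))
       (string-equal keyword line :end2 (length keyword))))

(defun split-string (string delimiter)
  "Split STRING at every DELIMITER character and trim whitespace from each piece."
  (loop for start = 0 then (1+ end)
        for end = (position delimiter string :start start)
        collect (string-trim '(#\Space #\Tab) (subseq string start end))
        while end))

(defun parse-attribute (line)
  "Turn \"@attribute name type\" into (name . type)."
  (let* ((spec (string-trim '(#\Space #\Tab)
                            (subseq line (length "@attribute"))))
         (gap (position-if (lambda (c) (member c '(#\Space #\Tab))) spec))
         (attr-name (subseq spec 0 gap))
         (attr-type (string-trim '(#\Space #\Tab) (subseq spec gap))))
    (cons attr-name
          (if (char= (char attr-type 0) #\{)
              (split-string (string-trim "{}" attr-type) #\,)
              :numeric))))

(defun parse-value (string attr-type)
  "Read numeric values as numbers; keep nominal values as strings."
  (if (eq attr-type :numeric)
      (let ((*read-eval* nil))
        (read-from-string string))
      string))

(defun read-data-from-stream (stream)
  "Parse ARFF text from STREAM and return a DATASET structure."
  (let ((data (make-dataset))
        (in-data nil)
        (rows '()))
    (loop for raw = (read-line stream nil nil)
          while raw
          do (let ((line (string-trim '(#\Space #\Tab #\Return) raw)))
               (cond
                 ;; Skip blank lines and % comments.
                 ((or (zerop (length line)) (char= (char line 0) #\%)))
                 ;; After @data, every line is one comma-separated instance.
                 (in-data
                  (push (mapcar #'parse-value
                                (split-string line #\,)
                                (mapcar #'cdr (dataset-attributes data)))
                        rows))
                 ((keyword-line-p "@relation" line)
                  (setf (dataset-name data)
                        (string-trim '(#\Space #\Tab)
                                     (subseq line (length "@relation")))))
                 ((keyword-line-p "@attribute" line)
                  (setf (dataset-attributes data)
                        (append (dataset-attributes data)
                                (list (parse-attribute line)))))
                 ((keyword-line-p "@data" line)
                  (setf in-data t)))))
    (setf (dataset-instances data) (nreverse rows))
    data))

(defun read-data (file-name)
  "Open FILE-NAME and parse it with READ-DATA-FROM-STREAM."
  (with-open-file (in file-name :direction :input)
    (read-data-from-stream in)))
```

Keeping each row as a plain list in attribute order makes it easy to pick out one attribute's values with nth when counting labels later; a vector per row would work just as well.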

A function named "build-classifier" should take the dataset structure (returned by read-data) and an optional impurity function as its parameters, build a decision tree using the given impurity function, and return the built classifier (some structure). The impurity function is passed in as a parameter so that, if you want to use a different impurity measure, you have the flexibility to do so.

  (build-classifier train-data #'entropy)
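
To see why passing #'entropy works, the sketch below shows only the function-as-parameter mechanism, not a full tree builder. A hypothetical helper, split-impurity, computes the weighted impurity of one candidate split and calls whatever impurity function it receives through apply. The names gini, misclassification-error, counts->probabilities and split-impurity are illustrative assumptions. The default here is the Gini index only so that this block runs on its own; in the project the natural default would be your entropy function.

```lisp
;;; Sketch: passing an impurity function as a parameter.
;;; GINI, MISCLASSIFICATION-ERROR, COUNTS->PROBABILITIES and
;;; SPLIT-IMPURITY are illustrative (hypothetical) names.

(defun gini (&rest probabilities)
  "Gini index: 1 minus the sum of squared probabilities."
  (- 1 (loop for p in probabilities sum (* p p))))

(defun misclassification-error (&rest probabilities)
  "1 minus the largest probability."
  (- 1 (reduce #'max probabilities :initial-value 0)))

(defun counts->probabilities (counts)
  "Convert per-label counts, e.g. (3 1), into probabilities, e.g. (3/4 1/4)."
  (let ((total (reduce #'+ counts)))
    (mapcar (lambda (c) (/ c total)) counts)))

(defun split-impurity (branches &optional (impurity #'gini))
  "Weighted impurity of a candidate split. BRANCHES holds one list of
label counts per branch; IMPURITY is any function of the probabilities."
  (let ((total (loop for counts in branches sum (reduce #'+ counts))))
    (loop for counts in branches
          for size = (reduce #'+ counts)
          ;; Weight each branch's impurity by its share of the instances.
          sum (* (/ size total)
                 (apply impurity (counts->probabilities counts))))))
```

Usage: (split-impurity '((3 1) (1 3))) uses the default, while (split-impurity '((3 1) (1 3)) #'misclassification-error) swaps in another measure without touching split-impurity. A tree builder would typically evaluate every candidate attribute this way and split on the one with the lowest weighted impurity.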

A function named "entropy" should take any number of probability values and return the impurity value (0 for a perfectly pure set).

  (entropy )
  (entropy 1)

How to do it? Look at the "&rest" keyword.
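
A minimal sketch of entropy using &rest, measured in bits (log base 2, an assumption; the slide does not fix the base). With &rest, all the arguments arrive bound to one list, so zero, one, or many probabilities are all valid calls. When the probabilities are already in a list, apply spreads them back out as arguments; entropy-of-counts is a hypothetical helper showing that.

```lisp
;;; ENTROPY via &rest. ENTROPY-OF-COUNTS is a hypothetical helper.
(defun entropy (&rest probabilities)
  "Entropy in bits: the sum of p * log2(1/p); terms with p = 0 contribute 0."
  ;; PROBABILITIES is bound to a list of all the arguments passed.
  (loop for p in probabilities
        unless (zerop p)
          sum (* p (log (/ 1 p) 2))))

;; Calling ENTROPY on a list of values uses APPLY.
;; Assumes COUNTS is non-empty with a positive total.
(defun entropy-of-counts (counts)
  "Entropy of a list of label counts such as (9 5)."
  (let ((total (reduce #'+ counts)))
    (apply #'entropy (mapcar (lambda (c) (/ c total)) counts))))
```

Writing each term as p * log2(1/p) instead of negating the whole sum avoids returning -0.0 for a pure set such as (entropy 1).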

A function named "evaluate-classifier" should take a classifier and a dataset as input parameters and return the classification error (as a percentage) on that dataset. It should also print the classification of every instance in the dataset and print the confusion matrix. A confusion matrix shows the predicted versus the actual classifications; it is of size L x L, where L is the number of distinct label values.

Sample calls (the first reports the error on the training data, the second the error on the test data):

  (evaluate-classifier (build-classifier train-data) train-data)
  (evaluate-classifier (build-classifier train-data) test-data)

Confusion Matrix (rows: actual class; columns: predicted class)

  Act/Pred   Class1   Class2   Class3
  Class1     a        b        c
  Class2     d        e        f
  Class3     g        h        i
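
Because correct predictions lie on the diagonal (a, e, i), the classification error percentage follows directly from the matrix: 100 x (b + c + d + f + g + h) / (a + b + c + d + e + f + g + h + i). The sketch below builds and prints such a matrix, assuming the actual and predicted labels have already been collected into two parallel lists; how evaluate-classifier obtains the predictions depends on your classifier structure. The functions confusion-matrix, error-percentage and print-confusion-matrix are hypothetical names.

```lisp
;;; Hypothetical helpers for evaluate-classifier. Assumes ACTUAL and
;;; PREDICTED are parallel lists of labels, and CLASSES lists every label.

(defun confusion-matrix (actual predicted classes)
  "Count (actual, predicted) label pairs. Row = actual, column = predicted,
both in the order given by CLASSES."
  (let* ((n (length classes))
         (matrix (make-array (list n n) :initial-element 0)))
    (loop for a in actual
          for p in predicted
          do (incf (aref matrix
                         (position a classes :test #'equal)
                         (position p classes :test #'equal))))
    matrix))

(defun error-percentage (matrix)
  "Percentage of instances that fall off the diagonal (misclassified)."
  (let ((total 0) (correct 0))
    (dotimes (i (array-dimension matrix 0))
      (dotimes (j (array-dimension matrix 1))
        (incf total (aref matrix i j))
        (when (= i j)
          (incf correct (aref matrix i j)))))
    (if (zerop total)
        0
        (* 100 (/ (- total correct) total)))))

(defun print-confusion-matrix (matrix classes &optional (stream *standard-output*))
  "Print MATRIX with an Act/Pred header, one row per actual class."
  (format stream "~&~10A~{~10A~}~%" "Act/Pred" classes)
  (loop for actual-class in classes
        for i from 0
        do (format stream "~10A" actual-class)
           (dotimes (j (length classes))
             (format stream "~10D" (aref matrix i j)))
           (terpri stream)))
```

error-percentage returns an exact rational (for example 100/3); wrap the result in float if you prefer a decimal for display.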

Tips
- Make extensive use of the built-in ANSI Common Lisp functions and macros:
  - Functions: sort, reduce, map, max, min
  - Control statements: if, when, unless, cond, case, loop while, loop until, loop for
- Name global variables with "earmuffs", e.g. *i-am-a-global-variable*
- Use "let" to create local variables.
- Use a good editor, something better than Notepad; matching up ) ) by hand is irritating for Lisp hackers.
- Comment out all print statements that are used only for testing.
- Do not use "progn" unnecessarily.
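
A small sketch applying several of these tips at once: a global variable named with earmuffs, let for locals, built-ins such as sort and reduce, and when (which has an implicit progn) instead of if plus progn. The task, finding the most frequent label (for example to label a decision-tree leaf), and the names *debug*, label-counts and majority-label are illustrative assumptions. Guarding testing output with a global flag is shown as one alternative to commenting out print statements.

```lisp
;;; Illustrative names: *DEBUG*, LABEL-COUNTS, MAJORITY-LABEL.

(defparameter *debug* nil
  "Global flag (note the *earmuffs*); set to T to see testing output.")

(defun label-counts (label-list)
  "Return an alist of (label . count), most frequent label first."
  (let ((counts '()))
    (dolist (label label-list)
      (let ((entry (assoc label counts :test #'equal)))
        (if entry
            (incf (cdr entry))
            (push (cons label 1) counts))))
    (sort counts #'> :key #'cdr)))

(defun majority-label (label-list)
  "Most frequent label in LABEL-LIST (ties broken arbitrarily); NIL if empty."
  (let ((counts (label-counts label-list)))
    ;; WHEN has an implicit PROGN, so (if *debug* (progn ...)) is unnecessary.
    (when *debug*
      (format t "~&Counts: ~S~%" counts)
      (format t "~&Total: ~D~%" (reduce #'+ counts :key #'cdr)))
    (car (first counts))))
```

Function bodies (defun, let, dolist) are already implicit progns too, so an explicit progn is usually needed only inside a two-branch if.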

Some code snippets
Parsing the TSP problem (the first line of the file lists the city names; the distance matrix follows, row by row):

  (defun read-tsp-problem (file-name)
    (let (city-list matrix len)
      (with-open-file (fptr file-name :direction :input)
        ;; Read every token on the first line as a city name; the string
        ;; stream itself serves as a unique end-of-input marker.
        (setf city-list (with-input-from-string (cities (read-line fptr))
                          (loop for city = (read cities nil cities)
                                until (eq city cities)
                                collect city)))
        (setf len (length city-list))
        (setf matrix (make-array (list len len)))
        ;; Fill the len x len distance matrix one entry at a time.
        (dotimes (i len)
          (dotimes (j len)
            (setf (aref matrix i j) (read fptr)))))
      ;; PRINT displays the matrix and also returns it.
      (print matrix)))
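
One possible variation, sketched under the hypothetical name read-tsp-stream, separates parsing from file handling so the parser can be tried on a string, and returns the city names together with the matrix as two values instead of printing. It assumes the same input layout that read-tsp-problem expects.

```lisp
;;; Hypothetical stream-based variant of READ-TSP-PROBLEM.
(defun read-tsp-stream (stream)
  "Read city names from the first line of STREAM, then a square distance
matrix. Returns two values: the list of cities and the matrix."
  (let* ((city-list (with-input-from-string (cities (read-line stream))
                      ;; The string stream doubles as a unique EOF marker.
                      (loop for city = (read cities nil cities)
                            until (eq city cities)
                            collect city)))
         (len (length city-list))
         (matrix (make-array (list len len))))
    (dotimes (i len)
      (dotimes (j len)
        (setf (aref matrix i j) (read stream))))
    (values city-list matrix)))
```

With this split, the file-based version reduces to (with-open-file (in file-name) (read-tsp-stream in)), and city names are read as symbols in the current package.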

Some more code...
Helper function for parsing the input file:

  (defun read-till-delimiter (stream delimiter)
    ;; Collect characters up to (not including) DELIMITER or end of file.
    ;; The delimiter itself is consumed; surrounding whitespace is trimmed.
    (string-trim '(#\Space #\Tab #\Newline)
                 (with-output-to-string (str)
                   (loop for ch = (read-char stream nil stream)
                         until (or (eq ch stream) (eql ch delimiter))
                         do (princ ch str)))))
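
As an illustration of how read-till-delimiter might be used, the sketch below (hypothetical helper split-fields) splits one comma-separated data line, such as a row of an ARFF @data section, into trimmed fields. Because read-till-delimiter consumes the delimiter and returns an empty string at end of file, the loop uses peek-char to stop once the input is exhausted; as a consequence, a trailing delimiter does not produce an empty final field.

```lisp
(defun read-till-delimiter (stream delimiter)
  ;; Same helper as on the slide: read up to DELIMITER or end of file,
  ;; consume the delimiter, and trim surrounding whitespace.
  (string-trim '(#\Space #\Tab #\Newline)
               (with-output-to-string (str)
                 (loop for ch = (read-char stream nil stream)
                       until (or (eq ch stream) (eql ch delimiter))
                       do (princ ch str)))))

(defun split-fields (line &optional (delimiter #\,))
  "Hypothetical helper: split LINE into trimmed fields at DELIMITER."
  (with-input-from-string (in line)
    ;; Keep reading fields while characters remain in the string.
    (loop collect (read-till-delimiter in delimiter)
          while (peek-char nil in nil nil))))
```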

Question Time!!!!