Feature Engineering Studio January 21, 2015. Welcome to Feature Engineering Studio Design studio-style course teaching how to distill and engineer features.

Presentation transcript:

Feature Engineering Studio January 21, 2015

Welcome to Feature Engineering Studio
A design studio-style course teaching how to distill and engineer features for data mining

What We’ll Cover
The process of feature engineering and distillation:
– brainstorming features
– deciding what features to create
– criteria for selecting features
– actually creating the features
– studying the impact of features on model goodness

Why?
Feature engineering is the most important and least well-studied part of the process of developing prediction models.
It is an art; it is human-driven design.
It involves lore rather than well-known and validated principles.
It is hard! (But fun, and important.)

Why?
It’s well known in data mining (and in statistics, for that matter) that your model will never be any good if your features (predictors) aren’t very good.

The Big Idea
How can we take the voluminous, ill-formed, and yet under-specified data that we now have in education and shape it into a reasonable set of variables in an efficient, effective, and predictive way?

Tools We’ll Use
– Excel
– Google Refine
– RapidMiner
– Other relevant tools (TBD/your choice)

Course times
– Monday 11am-12:40pm
– Wednesday 11am-12:40pm
Not every week; please see the online schedule.

Course Prerequisite
Core Methods in Educational Data Mining, or instructor approval.
I will approve anyone who has at least a little bit of background building prediction models or similar statistical models.
– Talk to me after class, during my office hours, or by appointment

That said…
If you haven’t had experience building prediction models in RapidMiner or a similar tool, then you’ll need to learn.
We will have a few special lab sessions to help you catch up if you don’t have experience with this paradigm or these tools.
You can definitely catch up.

Who here?
– Took or audited my Core Methods course?
– Has built a prediction model using a classification algorithm and cross-validation?
– Has built a regression model in a stats package using stepwise regression?
– Has run a regression in a stats package?
– Has built any kind of mathematical model?

How this class works
Lots of assignments (13)
– They can’t be late, because we will discuss them in class
– 3 of the 12 regular assignments can be missed without penalty, but not the final presentation (#13)
– Important note: You cannot do extra assignments and take the best grades. Only the first 9 assignments turned in will be graded.
Not many required readings
Essential to participate in critique and class discussions

Who here? Has had a design studio style course before?

This is not…
– A lecture class
– A reading discussion seminar

This is…
– A class where you will be working on a project of your own choosing the whole semester
– A class where you’ll get, and give, a lot of constructive criticism

The semester project
You will build a prediction model.
If you have your own data set and research question: perfect!
If you don’t have your own data set and research question: no worries! I will help you find one!

Two types of classes
Regular sessions
– Discuss readings, work on projects
Lab sessions
– Extra practice with tools
– Lecture on concepts beyond regular class topics, including core content from HUDK4050 needed for this class
Not a substitute for HUDK4050; we’ll be covering about 5% of HUDK4050 in these sessions.

Assignments
Let’s look at the syllabus.

Readings
Will be made available very soon.

Any questions?

Upcoming Classes
1/26 Lab session on data set finding
– Come to this if you don’t have a data set in mind
2/2 Problem proposal (Asgn. 1 due)
2/4 Data cleaning (Asgn. 2 due)
2/16 Lab session on RapidMiner
– Come to this if you’ve never built a classifier or regressor in RapidMiner (or a similar tool)
– Statistical significance tests using linear regression don’t count…
2/23 Feature distillation in Excel (Asgn. 3 due)

Assignment One
Problem Proposal – Due Monday, February 2
Be ready to talk for 5 minutes on:
– A data set
  Give where it came from and how big it is
  You need to already have this data set, or be able to acquire it in the next two weeks
– A prediction model you will build in this data set
– What variable will you predict?
– What kind of variables will you use to predict it?
– Why is this worth doing?

Example (Pardos et al., 2014)
Data set
– ASSISTments system, formative assessment and learning software for math used by 60k students a year (Razzaq et al., 2007)
– 810,000 data points from 229 students studied
– Student actions in the software have been overlaid with synchronized field observations of student affect (boredom, frustration, etc.)
  3075 field observations
  Each field observation connects to 20 seconds of log file actions

Example (Pardos et al., 2014)
We will predict whether a student is bored at a specific time
– So that we can replicate the human judgments without needing a field observer
We will predict this from what was going on in the log files at the time the field observation was made
– We know every student action’s correctness, timing, relevant skill, and probability they knew the skill
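
To make the idea of distilling features from the log files concrete, here is a minimal sketch in Python. It is not the method used by Pardos et al. (2014). The record layouts (`Action`, `Observation`), the feature names, and the choice to use the 20 seconds ending at the observation time are all assumptions made for illustration. The sketch only shows the general pattern of turning the log actions tied to one field observation into a single row of candidate predictors plus the label.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

WINDOW_SECONDS = 20.0  # each observation connects to 20 seconds of log actions


@dataclass
class Action:
    """One logged student action (hypothetical layout)."""
    student_id: str
    timestamp: float   # seconds since session start (assumed unit)
    correct: bool      # was the action correct?
    duration: float    # seconds spent on the action
    p_know: float      # estimated probability the student knew the skill


@dataclass
class Observation:
    """One field observation of affect (hypothetical layout)."""
    student_id: str
    timestamp: float
    bored: bool        # ground-truth label from the field observer


def distill_features(obs: Observation, actions: list[Action],
                     window: float = WINDOW_SECONDS) -> Optional[dict]:
    """Aggregate the student's actions in the window ending at the observation.

    Assumption: the window is the `window` seconds up to and including the
    observation time. Returns None when the student has no logged actions
    in that window.
    """
    clip = [a for a in actions
            if a.student_id == obs.student_id
            and obs.timestamp - window <= a.timestamp <= obs.timestamp]
    if not clip:
        return None
    return {
        "n_actions": len(clip),
        "pct_correct": mean(1.0 if a.correct else 0.0 for a in clip),
        "mean_duration": mean(a.duration for a in clip),
        "max_duration": max(a.duration for a in clip),
        "mean_p_know": mean(a.p_know for a in clip),
        "bored": obs.bored,  # the variable we want to predict
    }


if __name__ == "__main__":
    # Made-up toy records, only to show the call pattern.
    log = [
        Action("s1", 5.0, True, 2.0, 0.8),
        Action("s1", 15.0, False, 4.0, 0.4),
        Action("s1", 50.0, True, 1.0, 0.9),
    ]
    print(distill_features(Observation("s1", 20.0, True), log))
```

Each returned row could then go into a spreadsheet or a modeling tool as one training example. Which aggregates are worth computing is exactly the kind of design decision this course is about.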

Example (Pardos et al., 2014)
This is worth doing because boredom is known to predict student learning (Craig et al., 2004; Rodrigo et al., 2009; Pekrun et al., 2010).
Building a detector will help us study boredom more thoroughly, as well as enabling us to intervene on boredom in real time.

Important Considerations
– Is the problem genuinely important? (usable or publishable)
– Is there a good measure of ground truth? (the variable you want to predict)
– Do we have rich enough data to distill meaningful features?
– Is there enough data to be able to take advantage of data mining?

You don’t need to be able to answer these questions in a week
– Think about them
– Think about your problem
– Email me or come to my office hours (or set up an appointment)
– Bring it to class
– We’ll discuss it in class
No idea is perfect right from the start!

Be ready to answer questions

Be ready to ask questions too…

No data ready at hand?
Come to next Monday’s session; we will find you data!

Any questions or concerns?