PSLC DataShop Introduction
Slides current to DataShop version
John Stamper
DataShop Technical Director
The DataShop Team
John Stamper – DataShop Technical Director
Alida Skogsholm – DataShop Manager, Developer
Brett Leber – Interaction Designer
Shanwen Yu – DataShop Developer
Sandy Demi – QA (Quality Assurance – Testing)
What is DataShop?
Central Repository
–Secure place to store & access research data
  Every LearnLab and every study
–Supports various kinds of research
  Primary analysis of study data
  Exploratory analysis of course data
  Secondary analysis of any data set
Analysis & Reporting Tools
–Focus on student-tutor interaction data
–Learning curves & error reports provide summary and low-level views of student performance
–Performance Profiler aggregates across various levels of granularity (problem, dataset levels, knowledge components, etc.)
–Data Export
  Tab-delimited tables you can open with your favorite spreadsheet program or statistical package
–New tools created to meet highest demands
Repository
Allows for full data management
Controlled access for collaboration
File attachments
Paper attachments
Great for secondary analyses
Web Application
Knowledge component model analysis with learning curves
Learning curve point decomposition
Web Application
Performance Profiler tool for exploring the data
Easy knowledge component model creation
DataShop Terminology
Problem: a task for a student to perform that typically involves multiple steps
Step: an observable part of the solution to a problem
Transaction: an interaction between the student and the tutoring system
DataShop Terminology
KC: Knowledge component
–also known as a skill/concept/fact
–a piece of information that can be used to accomplish tasks
KC Model:
–also known as a cognitive model or skill model
–a mapping between correct steps and knowledge components
[Figure: tutor interface for the example problem, with entry fields Multiplier1 to Multiplier3, ExpandedPower1 to ExpandedPower3, Base1 to Base3, and Exponent1 to Exponent3, plus a GeneralHelpGoal node]
[Figure: Transactions and Student-Steps. Transactions such as "Enter 8 in Multiplier1", "Ask for hint", "Enter 10,000 in ExpandedPower1", "Enter 100,000 in ExpandedPower1", "Enter 8 in Base1", "Enter 6 in Exponent1", and "Enter 5 in Exponent1" are grouped by student-step (Multiplier1, ExpandedPower1, Base1, Exponent1); each group is an observation]
[Figure: the same transactions with Selection, Action, and Input columns (for example Multiplier1, UpdateTextField, 8 and HintButton, ButtonPressed, HintRequest) next to student-step columns KC, Opportunity, and Step]
[Figure: student S1 on the next set of steps (Multiplier2, ExpandedPower2, Base2, Exponent2), with columns KC, Opportunity, Selection, Action, Input, Student, and Step]
Terminology Review
Observation: a group of transactions for a particular student working on a particular step.
Attempt: a transaction; an attempt toward a step.
Opportunity: a chance for a student to demonstrate whether he or she has learned a given knowledge component. An opportunity exists each time a step is present with the associated knowledge component.
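The terms above can be made concrete with a small sketch. It groups a transaction log into observations (one per student and step) and numbers the opportunities per student and KC. The log contents, the outcome labels, the problem names, and the step-to-KC mapping are all hypothetical and chosen only for illustration; this is not DataShop's own code.

```python
from collections import defaultdict

# Hypothetical log for one student: (student, problem, step, outcome).
# Outcomes are illustrative; the slides do not state them.
TRANSACTIONS = [
    ("S1", "Problem1", "Multiplier1", "correct"),
    ("S1", "Problem1", "ExpandedPower1", "hint"),
    ("S1", "Problem1", "ExpandedPower1", "hint"),
    ("S1", "Problem1", "ExpandedPower1", "incorrect"),
    ("S1", "Problem1", "ExpandedPower1", "correct"),
    ("S1", "Problem1", "Base1", "correct"),
    ("S1", "Problem1", "Exponent1", "incorrect"),
    ("S1", "Problem1", "Exponent1", "correct"),
    ("S1", "Problem2", "Multiplier2", "correct"),
    ("S1", "Problem2", "ExpandedPower2", "incorrect"),
    ("S1", "Problem2", "ExpandedPower2", "correct"),
]

# Hypothetical KC model: each step maps to one knowledge component.
KC_MODEL = {
    "Multiplier1": "Multiplier", "Multiplier2": "Multiplier",
    "ExpandedPower1": "ExpandedPower", "ExpandedPower2": "ExpandedPower",
    "Base1": "Base", "Exponent1": "Exponent",
}


def build_observations(transactions):
    """Group transactions (attempts) by student, problem, and step, in log order."""
    observations = {}
    for student, problem, step, outcome in transactions:
        observations.setdefault((student, problem, step), []).append(outcome)
    return observations


def number_opportunities(observations, kc_model):
    """Count, per student and KC, how many times a step with that KC appeared."""
    seen = defaultdict(int)
    rows = []
    for (student, problem, step), attempts in observations.items():
        kc = kc_model[step]
        seen[(student, kc)] += 1
        rows.append({
            "student": student, "step": step, "kc": kc,
            "opportunity": seen[(student, kc)],
            "attempts": len(attempts), "first_attempt": attempts[0],
        })
    return rows


OBSERVATIONS = build_observations(TRANSACTIONS)
ROWS = number_opportunities(OBSERVATIONS, KC_MODEL)
for row in ROWS:
    print(row)
```

In this sketch the four transactions on ExpandedPower1 form one observation with four attempts, and Multiplier2 is the student's second opportunity on the Multiplier KC.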
How do I get data in?
Directly
–Some tutors are logging directly to the PSLC logging database
–CTAT-based tutors (when configured correctly)
Indirectly
–Other tutors are logging to their own file formats or their own databases
–These data require a conversion process
–Many studies are in this category
PSLC DataShop Tools
Slides current to DataShop version
Koedinger, K.R., Baker, R.S.J.d., Cunningham, K., Skogsholm, A., Leber, B., Stamper, J. (in press) A Data Repository for the EDM community: The PSLC DataShop. To appear in Romero, C., Ventura, S., Pechenizkiy, M., Baker, R.S.J.d. (Eds.) Handbook of Educational Data Mining. Boca Raton, FL: CRC Press.
Analysis Tools
Dataset Info
Performance Profiler
Error Report
Learning Curve
KC Model Export/Import
Getting to DataShop
Explore data through the DataShop tools
Where is DataShop?
–
–Linked from DataShop homepage and learnlab.org
Creating an account
On DataShop's home page, click "Sign up now". Complete the form to create your DataShop account.
If you’re a CMU student/staff/faculty, click “Log in with WebISO” to create your account.
Getting access to datasets
By default, you will have access to the public datasets. Of these, we recommend three for getting started:
–Geometry Area ( )
–Joint Explanation - Electric Fields - Pitt - Spring 2007
–Chinese Vocabulary Fall 2006
For access to other datasets, contact us:
DataShop – Dataset selection
Datasets you can view or edit. You have to be a project member or PI for the dataset to appear here.
Public datasets that you can view only.
Private datasets you can’t view. Contact us and the PI to get access.
Dataset Info
Meta data for given dataset
PIs get ‘edit’ privilege, others must request it
Papers and Files storage
Problem Breakdown table
Dataset Metrics
Performance Profiler
Multipurpose tool to help identify areas that are too hard or easy
Aggregate by
–Step
–Problem
–Student
–KC
–Dataset Level
View measures of
–Error Rate
–Assistance Score
–Avg # Hints
–Avg # Incorrect
–Residual Error Rate
View multiple samples side by side
Mouse over a row to reveal uniqueness
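As a rough illustration of this kind of aggregation, the sketch below groups student-step rows by a chosen field and computes four of the listed measures. The row layout and field names are hypothetical. It assumes that an error means the first attempt was an incorrect answer or a hint request, and that the assistance score is hints plus incorrect attempts; Residual Error Rate is left out because it needs model predictions that the slides do not describe.

```python
from collections import defaultdict

# Hypothetical student-step rows: one row per observation.
STUDENT_STEPS = [
    {"student": "S1", "problem": "P1", "kc": "Base",
     "first_attempt": "correct", "hints": 0, "incorrects": 0},
    {"student": "S1", "problem": "P1", "kc": "Exponent",
     "first_attempt": "incorrect", "hints": 1, "incorrects": 2},
    {"student": "S2", "problem": "P1", "kc": "Base",
     "first_attempt": "hint", "hints": 2, "incorrects": 0},
    {"student": "S2", "problem": "P2", "kc": "Exponent",
     "first_attempt": "correct", "hints": 0, "incorrects": 0},
]


def profile(rows, aggregate_by):
    """Summarize rows per value of `aggregate_by` (e.g. 'problem', 'student', 'kc')."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[aggregate_by]].append(row)

    summary = {}
    for key, members in groups.items():
        n = len(members)
        # Assumption: any first attempt other than "correct" counts as an error.
        errors = sum(1 for r in members if r["first_attempt"] != "correct")
        hints = sum(r["hints"] for r in members)
        incorrects = sum(r["incorrects"] for r in members)
        summary[key] = {
            "error_rate": errors / n,
            "avg_hints": hints / n,
            "avg_incorrect": incorrects / n,
            # Assumption: assistance score = hints + incorrects, averaged per row.
            "avg_assistance_score": (hints + incorrects) / n,
        }
    return summary


BY_KC = profile(STUDENT_STEPS, "kc")
BY_PROBLEM = profile(STUDENT_STEPS, "problem")
print(BY_KC)
print(BY_PROBLEM)
```

Switching `aggregate_by` between "kc", "problem", and "student" mirrors the tool's "Aggregate by" choice on the same underlying rows.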
Error Report
View by Problem or KC
Provides a breakdown of problem information (by step) for fine-grained analysis of problem-solving behavior
Attempts are categorized by evaluation
Learning Curves
Visualizes changes in student performance over time
Time is represented on the x-axis as ‘opportunity’, or the # of times a student (or students) had an opportunity to demonstrate a KC
Hover over the y-axis to change the type of Learning Curve. Types include:
–Error Rate
–Assistance Score
–Number of Incorrects
–Number of Hints
–Step Duration
–Correct Step Duration
–Error Step Duration
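A minimal sketch of how an error-rate learning curve could be computed from student-step data: for each opportunity number, take the share of observations whose first attempt was not correct. The rows and the tuple layout are hypothetical, and treating a hint request as an error is an assumption.

```python
from collections import defaultdict

# Hypothetical rows: (student, kc, opportunity, first_attempt).
ROWS = [
    ("S1", "Base", 1, "incorrect"), ("S1", "Base", 2, "hint"),
    ("S1", "Base", 3, "correct"),
    ("S2", "Base", 1, "incorrect"), ("S2", "Base", 2, "correct"),
    ("S2", "Base", 3, "correct"),
    ("S3", "Base", 1, "correct"), ("S3", "Base", 2, "correct"),
]


def error_rate_curve(rows, kc):
    """Return [(opportunity, error_rate, n_observations)] for one KC."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for _student, row_kc, opportunity, first_attempt in rows:
        if row_kc != kc:
            continue
        totals[opportunity] += 1
        # Assumption: a hint request or incorrect first attempt is an error.
        if first_attempt != "correct":
            errors[opportunity] += 1
    return [(opp, errors[opp] / totals[opp], totals[opp])
            for opp in sorted(totals)]


CURVE = error_rate_curve(ROWS, "Base")
for opportunity, rate, n in CURVE:
    print(f"opportunity {opportunity}: error rate {rate:.2f} ({n} observations)")
```

The observation count is returned with each point because later opportunities usually have fewer students behind them, which matters when reading the tail of a curve.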
Learning Curves: Drill Down
Click on a data point to view point information
Four types of information for a data point:
–KCs
–Problems
–Steps
–Students
Click on the number link to view details for a particular type of drill-down information. Details include:
–Name
–Value
–Number of Observations
Learning Curve: Latency Curves
For latency curves, a standard deviation cutoff of 2.5 is applied by default. The number of included and dropped observations due to the cutoff is shown in the observation table.
Step Duration = the total length of time spent on a step. It is calculated by adding all of the durations for transactions that were attributed to a given step.
Error Step Duration = step duration when the first attempt is an error
Correct Step Duration = step duration when the first attempt is correct
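The definitions above can be sketched as follows: sum transaction durations into a step duration, then drop observations that lie more than 2.5 standard deviations from the mean. The slides do not say over which set of observations DataShop computes the mean and standard deviation, or whether it uses the sample or population form, so applying a sample standard deviation to a single list is an assumption here. The durations are made up.

```python
import statistics


def step_duration(transaction_durations):
    """Step duration = sum of the durations of the transactions on that step."""
    return sum(transaction_durations)


def apply_sd_cutoff(durations, cutoff=2.5):
    """Split durations into (kept, dropped) using a standard deviation cutoff.

    Assumption: the cutoff is applied to the list as a whole, using the
    sample standard deviation.
    """
    if len(durations) < 2:
        return list(durations), []
    mean = statistics.mean(durations)
    sd = statistics.stdev(durations)
    if sd == 0:
        return list(durations), []
    kept = [d for d in durations if abs(d - mean) <= cutoff * sd]
    dropped = [d for d in durations if abs(d - mean) > cutoff * sd]
    return kept, dropped


# Hypothetical observations: per-transaction durations in seconds.
OBSERVATIONS = [
    [4, 6], [12], [5, 6], [9], [3, 7], [13], [11],
    [10], [8, 4], [9], [6, 5], [10], [120, 80],
]
DURATIONS = [step_duration(obs) for obs in OBSERVATIONS]
KEPT, DROPPED = apply_sd_cutoff(DURATIONS)
print(f"included: {len(KEPT)}, dropped: {len(DROPPED)}")
```

Error Step Duration and Correct Step Duration would use the same sum, restricted to observations whose first attempt was an error or was correct, respectively.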
Dataset Info: KC Models
Handy information displayed for each KC Model:
–Name
–# of KCs in the model
–Created By
–Mapping Type
–AIC & BIC Values
Click to view the list of KCs for this model.
DataShop generates two KC models for free:
–Single-KC
–Unique-step
These provide upper and lower bounds for AIC/BIC.
Toolbox allows you to export one or more KC models, work with them, then reimport into the Dataset.
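For readers unfamiliar with AIC and BIC, the sketch below uses the standard textbook definitions, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of fitted parameters, n the number of observations, and ln L the log-likelihood of the fitted model. The slides do not state which statistical model DataShop fits or how it counts parameters, so the numbers below are hypothetical and only show how two KC models would be compared (lower is better).

```python
import math


def aic(log_likelihood, num_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, num_observations):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return num_params * math.log(num_observations) - 2 * log_likelihood


# Hypothetical fits of two KC models to the same 1,000 observations.
MODELS = {
    "fewer KCs": {"log_likelihood": -520.0, "num_params": 12},
    "more KCs": {"log_likelihood": -505.0, "num_params": 30},
}
N_OBSERVATIONS = 1000

for name, fit in MODELS.items():
    a = aic(fit["log_likelihood"], fit["num_params"])
    b = bic(fit["log_likelihood"], fit["num_params"], N_OBSERVATIONS)
    print(f"{name}: AIC={a:.1f} BIC={b:.1f}")
```

Both criteria reward fit and penalize parameters, with BIC penalizing extra parameters more heavily once n is large, so a model with more KCs has to improve the likelihood enough to justify its size.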
Dataset Info: Export a KC Model
Export multiple models at once.
Select the models you wish to export and click the “Export” button. Model information as well as other useful information is provided in a tab-delimited text file.
Selecting the “export” option next to a KC Model will auto-select the model for you in the export toolbox.
Dataset Info: Import a KC Model
When you are ready to import, upload your file to DataShop for verification. Once verification is successful, click the “Import” button. Your new or updated model will be available shortly (depending on the size of the dataset).
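The export, edit, and re-import cycle works on a tab-delimited text file, so the editing step can be scripted. The sketch below reads tab-delimited text, adds a column for a new KC model derived from an existing one, and writes tab-delimited text back out. The column names ("Step Name", "KC (Original)", "KC (Merged)") and the file contents are hypothetical; check the header of an actual export for the real column layout before adapting this.

```python
import csv
import io

# Hypothetical export contents; a real export contains more columns.
EXPORTED = (
    "Step Name\tKC (Original)\n"
    "Multiplier1\tMultiplier\n"
    "ExpandedPower1\tExpandedPower\n"
    "Base1\tBase\n"
    "Exponent1\tExponent\n"
)


def add_kc_model(tsv_text, source_column, new_column, relabel):
    """Add `new_column`, copying `source_column` and renaming KCs via `relabel`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames) + [new_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        old_kc = row[source_column]
        # KCs not mentioned in `relabel` are carried over unchanged.
        row[new_column] = relabel.get(old_kc, old_kc)
        writer.writerow(row)
    return out.getvalue()


# Example: merge the Base and Exponent KCs into a single hypothetical "Power" KC.
UPDATED = add_kc_model(EXPORTED, "KC (Original)", "KC (Merged)",
                       {"Base": "Power", "Exponent": "Power"})
print(UPDATED)
```

With real files, replace the `io.StringIO` objects with files opened using `newline=""`, as the `csv` module documentation recommends.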
Web Services
To access the data from a program
–New visualization tools
–Data mining
–or other application

Get Web Services Download

Getting Credentials

To get more details… WebServicesDemoClient_src.zip
KDD Cup 2010 EDM Challenge
Awarded to the PSLC and DataShop
First time the challenge used education data
This year’s challenge asked participants to predict student performance on mathematical problems from logs of student interaction with Intelligent Tutoring Systems.
The competition addressed questions of both scientific and practical importance. Improved models could save millions of hours of students' time (and effort) in learning algebra. These models should both increase achievement levels and reduce time needed to learn.
The competition ended on June 8,
There were:
–655 registered teams
–130 teams who submitted predictions
–3,400 submissions
The datasets used for the challenge were:
Dataset              Students   Steps        File size
Algebra I            ,310       9,426,966    3 GB
Bridge to Algebra    ,043       20,768,      GB
Improving learning by improving the cognitive model: A data-driven approach
Why we need better expert & student models in ITS
Two key premises
Expert & student models drive instruction
–Cognitive model in Cognitive Tutors determines much of ITS behavior; same for constraints…
These models are sometimes wrong & almost always imperfect
–ITS developers often build models rationally
–But such models may not be empirically accurate
A correct cognitive model should predict task difficulty and transfer
=> generate smooth learning curves
=> Huge opportunity for ITS/EDM researchers to improve their tutors
If you change the cognitive model, you change instruction
Problem creation, selection, & sequencing
–New skills or concepts (= “knowledge components” or “KCs”) require:
  New kinds of problems & instructional activities
Changes to student modeling – skillometer, knowledge tracing
Feedback and hint message content
–One skill becomes two => need new hint messages for new skill
–New bug rules may be needed
Even interface design – “make thinking visible”
–If multiple skills per step => break down by adding new intermediate steps to interface
Expert & student models are imperfect in most ITS
How can we tell?
Don’t get learning curves
–If we know tutor works (get pre to post gains), but “learning curves don’t curve”, then the model is wrong
Don’t get smooth learning curves
–Even when every KC has a good learning curve (error rate goes down as student gets more opportunities to practice), model still may be imperfect when it has significant deviations from student data
Smooth Learning Curves
Redesign based on New Model
Our discovery suggested changes needed to be made to the tutor
–Resequencing – put problems requiring fewer skills first
–Knowledge Tracing – adding new skills
–Creating new tasks – new problems
–Changing instructional messages, feedback or hints
Example Geometry Area – Compose by addition
“Close the Loop” experiment
–5 Classes at a local middle school (2 teachers)
–Students took the pre test together and started unit together
–Students were allowed to finish the unit at their own pace
–Post test immediately followed the completion of the unit
–Delayed post test was available but not administered due to teacher’s schedule
–80 Students completed the unit and pre/post test and had valid transaction data (missing 1 student’s data)
New Model is better
DataShop - What’s in it for me?
Free tools to analyze your data
Free researchers to analyze your data
Real opportunities to validate ideas across multiple data sets
Thanks! - The DataShop Team
John Stamper – DataShop Technical Director
Alida Skogsholm – DataShop Manager, Developer
Brett Leber – Interaction Designer
Shanwen Yu – DataShop Developer
Sandy Demi – QA (Quality Assurance – Testing)