Web Tools to Help Students Get Individualized Datasets on a Common Theme
Robin Lock, Ivan Ramler, Choong-Soo Lee
Department of Math, CS and Statistics, St. Lawrence University
Joint Mathematics Meetings, Atlanta, GA, January 2017
Goal
Allow students (or groups of students) to easily create individual datasets with a similar structure on a common theme.
Criteria:
- Easy, electronic access
- Real (and relatively interesting) data
- Let students personalize what they select
- Let the instructor anticipate what they get
Three Examples (Web Scraping)
- IMDb ratings of TV series episodes
- Prices of used cars
- Prices of houses
IMDb TV Series Ratings
Ivan Ramler, Tenzin Choeyang
Shiny app: shiny.stlawu.edu:3838/TVSeries
User enters: the name of a TV series (from a list of series with at least 10 rated episodes)
Output includes:
- Episode
- Average user rating
- Season
- Number of raters
- Episode within season
- Episode name
Opens with a random choice
Autocomplete as you type
Preview of the data
What Can Students Do With IMDb TV Series Datasets?
- Graphical (side-by-side plots) and numerical comparison of ratings by season
- Graphical and numerical comparison of ratings within seasons
- Time series plots of ratings
- Scatterplot/correlation/regression to see any relationship between ratings and number of raters
- Comparisons of ratings between two (or more) shows
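The by-season comparison and the ratings-versus-raters correlation take only a few lines of code. This is a minimal Python sketch (the talk's own tools are R and Shiny). The field names `season`, `rating`, and `raters` are hypothetical stand-ins for the app's output columns, and the four episodes are made-up numbers, not real IMDb data.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median

# Made-up episode records for illustration only (not real IMDb data).
# The keys "season", "rating", "raters" are hypothetical column names.
EPISODES = [
    {"season": 1, "rating": 8.0, "raters": 4000},
    {"season": 1, "rating": 8.4, "raters": 3600},
    {"season": 2, "rating": 8.8, "raters": 3400},
    {"season": 2, "rating": 9.2, "raters": 3000},
]


def season_summary(rows):
    """Return {season: (n_episodes, mean_rating, median_rating)}, sorted by season."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["season"]].append(row["rating"])
    return {s: (len(r), mean(r), median(r)) for s, r in sorted(groups.items())}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    for season, (n, avg, med) in season_summary(EPISODES).items():
        print(f"Season {season}: n={n}, mean={avg:.2f}, median={med:.2f}")
    r = pearson_r([e["raters"] for e in EPISODES], [e["rating"] for e in EPISODES])
    print(f"r(raters, rating) = {r:.3f}")
```

The same `season_summary` output feeds a side-by-side plot directly, and comparing two shows is just two calls on two files.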
Used Car Prices from Cars.com
Choong-Soo Lee
myslu.stlawu.edu/~clee/dataset/cars
User enters:
- Make/Model (from a list of 100+ popular models)
- Zip code (to search near)
- Max sample size
- File name to store data
Output includes:
- Year
- Price (in $1,000s)
- Mileage (in 1,000s of miles)
myslu.stlawu.edu/~clee/dataset/cars
What Can Students Do With a Cars.com Dataset?
Intro stat:
- Predict price based on age (or mileage). Is there an age where the predicted price is zero (the car is free)?
- Find a CI for the mean price (or mileage) of a car model.
- Test for a difference in mean price between two different car models (or two different zip codes).
- Assess regression conditions.
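The "free car" question amounts to solving the fitted line for a predicted price of zero. If price = b0 + b1·age with b1 < 0, the prediction hits zero at age = -b0/b1. The Python sketch below assumes age is computed from the Year column against a reference year the instructor chooses. `fit_line`, `ages_from_years`, and `zero_price_age` are hypothetical helper names, and the data are made up.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def ages_from_years(years, reference_year):
    """Convert model years to ages; reference_year is an assumed choice (e.g. year of data collection)."""
    return [reference_year - y for y in years]


def zero_price_age(b0, b1):
    """Solve b0 + b1 * age = 0 for age; only meaningful when price decreases with age."""
    if b1 >= 0:
        return None  # no depreciation trend, so no "free" age to report
    return -b0 / b1


if __name__ == "__main__":
    # Made-up model years and prices ($1,000s); not real Cars.com data.
    years = [2016, 2015, 2014, 2013, 2012]
    prices = [20.0, 18.0, 16.0, 14.0, 12.0]
    b0, b1 = fit_line(ages_from_years(years, 2017), prices)
    print(f"price = {b0:.2f} + ({b1:.2f}) * age; predicted free at age {zero_price_age(b0, b1):.1f}")
```

With these made-up numbers the fit is price = 22 - 2·age, so the "free" age is 11 years. That answer lies well outside the observed ages of 1 to 5, and this kind of extrapolation is itself worth discussing with students.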
What Else Can Students Do With a Cars.com Dataset?
Stat2:
- Try a quadratic model for price based on age.
- Use multiple regression with both age and mileage to predict price.
- Find a prediction interval for the price at a particular age and/or mileage.
- Compare the Price~Age regression lines for two different car models (or zip codes).
- Compare mean price (ANOVA) for several different models.
[Figure: two fitted regression models, with R² = 84.5% and R² = 89.0%]
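The quadratic and multiple-regression items both reduce to ordinary least squares with an extra column in the design matrix: age² for the quadratic model, mileage for the multiple regression. Comparing R² across the fits is one quick way to see what the extra term buys. This pure-Python sketch solves the normal equations directly for transparency; statistical software is numerically more robust (R's `lm()`, for instance, uses a QR decomposition). The helper names and the data are hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, ys):
    """OLS coefficients [b0, b1, ...]; each row holds one observation's predictor values."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(rows, ys, coefs):
    """Proportion of the variability in ys explained by the fitted model."""
    fitted = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    ybar = sum(ys) / len(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sst = sum((y - ybar) ** 2 for y in ys)
    return 1 - sse / sst


if __name__ == "__main__":
    # Made-up ages (years), mileages (1,000s), prices ($1,000s); not real Cars.com data.
    age = [1, 2, 3, 4, 5, 6, 7, 8]
    miles = [12, 25, 33, 50, 58, 70, 85, 90]
    price = [24.0, 21.5, 19.0, 17.5, 15.5, 14.5, 13.0, 12.5]
    models = {
        "Price ~ Age": [(a,) for a in age],
        "Price ~ Age + Age^2": [(a, a * a) for a in age],
        "Price ~ Age + Mileage": list(zip(age, miles)),
    }
    for name, rows in models.items():
        b = least_squares(rows, price)
        print(name, [round(v, 3) for v in b], f"R^2 = {r_squared(rows, price, b):.3f}")
```

Because the linear model is nested inside the quadratic one, the quadratic fit's R² can never be lower. A gain in R² alone therefore doesn't justify the extra term, which is a natural lead-in to checking residual plots.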
House Prices from Zillow.com
Choong-Soo Lee
myslu.stlawu.edu/~clee/dataset/zillow
User enters: zip code, max sample size, file name to store data
Default output includes:
- Price (in $1,000s)
- Bedrooms
- Bathrooms
- Square footage
- Lot size (acres)
- URL to property
Optional (slower):
- Address (street, city, state, zip)
- Year built
- Zestimate
- Exact price
myslu.stlawu.edu/~clee/dataset/zillow
Default Housing Data: Canton, NY
Full Housing Data: Canton, NY
What Can Students Do With a Zillow Housing Dataset?
- Predict price based on bedrooms, bathrooms, square footage, and/or lot size as single predictors.
- Use multiple regression to predict price.
- Compare mean house price between different zip codes.
- Detect outliers and influential points.
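For the outliers and influential points item, a single-predictor fit (price on square footage, say) is enough to compute leverage, standardized residuals, and Cook's distance. The cutoffs below are common rules of thumb assumed here, not values from the talk: leverage above 2p/n, |standardized residual| above 2, and Cook's distance above 0.5. `influence_diagnostics` and `flag_unusual` are hypothetical names, and the demo data are made up.

```python
from math import sqrt
from statistics import mean


def influence_diagnostics(xs, ys):
    """Leverage, standardized residuals, and Cook's distance for y = b0 + b1 * x."""
    n = len(xs)
    p = 2  # number of fitted coefficients (intercept and slope)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s2 = sum(e ** 2 for e in resid) / (n - p)  # residual variance estimate
    lev = [1 / n + (x - mx) ** 2 / sxx for x in xs]
    std_resid = [e / sqrt(s2 * (1 - h)) for e, h in zip(resid, lev)]
    # Cook's D written in terms of the standardized residual: r^2 * h / (p * (1 - h))
    cooks = [r ** 2 * h / (p * (1 - h)) for r, h in zip(std_resid, lev)]
    return lev, std_resid, cooks


def flag_unusual(xs, ys, p=2):
    """Indices exceeding assumed rule-of-thumb cutoffs; adjust the cutoffs to taste."""
    n = len(xs)
    lev, std_resid, cooks = influence_diagnostics(xs, ys)
    return {
        "high_leverage": [i for i, h in enumerate(lev) if h > 2 * p / n],
        "large_residual": [i for i, r in enumerate(std_resid) if abs(r) > 2],
        "influential": [i for i, d in enumerate(cooks) if d > 0.5],
    }


if __name__ == "__main__":
    # Made-up square footage (1,000s) and prices ($1,000s); not real Zillow data.
    sqft = [1.2, 1.5, 1.8, 2.0, 2.4, 2.6, 3.0, 6.5]
    price = [110, 125, 140, 150, 175, 180, 205, 160]
    print(flag_unusual(sqft, price))
```

Flagged indices point students back to the URL column, so they can look up why a particular property sits far from the trend.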
What’s up with these houses?
Default Housing Data: Canton, NY
New Medical Building
Benefits of Individualized Datasets
- Each student (group) has their own data.
- Students can personalize the data (picking a TV show, car model, or housing area of interest).
- Data collection is relatively easy and convenient.
- The instructor can anticipate the general features of the dataset (e.g., a decreasing trend for price vs. age of car).
Drawbacks?
- Grading is challenging when each student (group) has a different dataset.
Grading Tip: RMarkdown
- Students turn in a .csv file with their data.
- Prepare a generic R script that reads the data from a .csv file and does many of the relevant calculations: summary statistics, fitted models, and graphics. This works because the variable names and format are standardized.
- Embed the R commands in a Markdown document (RMarkdown) that formats the results into a student-specific key.
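A Python analog of the generic grading script might look like the sketch below. The talk uses R embedded in RMarkdown. Here the standardized column names `Price` and `Mileage` are assumptions, so check the headers your tool actually writes. `make_key` is a hypothetical helper that returns a Markdown snippet you could render once per student file.

```python
import csv
import io
import sys
from statistics import mean, stdev


def read_columns(csv_text, names):
    """Parse CSV text and return {column_name: [float, ...]} for the requested columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cols = {name: [] for name in names}
    for row in reader:
        for name in names:
            cols[name].append(float(row[name]))
    return cols


def make_key(csv_text, x="Mileage", y="Price"):
    """Build a Markdown answer key: summary statistics plus the least-squares line y ~ x."""
    cols = read_columns(csv_text, [x, y])
    xs, ys = cols[x], cols[y]
    mx, my = mean(xs), mean(ys)
    b1 = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    b0 = my - b1 * mx
    lines = [
        f"## Key ({len(ys)} observations)",
        "",
        "| Variable | Mean | SD |",
        "|---|---|---|",
    ]
    for name, vals in ((x, xs), (y, ys)):
        lines.append(f"| {name} | {mean(vals):.2f} | {stdev(vals):.2f} |")
    lines += ["", f"Fitted line: {y} = {b0:.3f} + ({b1:.3f}) * {x}"]
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python make_key.py student_data.csv
    with open(sys.argv[1], newline="") as f:
        print(make_key(f.read()))
```

Looping this over a folder of submitted .csv files gives one key per student. Because the column names are standardized, the script itself never needs to change.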
Link to all three: myslu.stlawu.edu/~clee/dataset