Robin Lock, Ivan Ramler, Choong-Soo Lee

1 Web Tools to Help Students Get Individualized Datasets on a Common Theme
Robin Lock, Ivan Ramler, Choong-Soo Lee
Department of Math, CS and Statistics
St. Lawrence University
Joint Mathematics Meetings, Atlanta, GA
January 2017

2 Goal
Allow students (or groups of students) to easily create individual datasets with a similar structure on a common theme.
Criteria:
  Easy, electronic access
  Real data (and relatively interesting)
  Let students personalize what they select
  Let instructor anticipate what they get

3 Three Examples (Web Scraping)
IMDb Ratings of TV Series Episodes
Prices of Used Cars
Prices of Houses

4 IMDb TV Series Ratings
Ivan Ramler, Tenzin Choeyang
Shiny app: shiny.stlawu.edu:3838/TVSeries
User enters: the name of a TV series (from a list of series with at least 10 rated episodes)
Output includes:
  Episode
  Season
  Episode within Season
  Average User Rating
  Number of Raters
  Episode name

5 Opens with a random choice

6 Autocomplete as you type

7 Preview of the data

9 What Can Students Do With IMDb TV Series Datasets?
Graphical (side-by-side plots) and numerical comparison of ratings by season
Graphical and numerical comparison of ratings within seasons
Time series plots of ratings
Scatterplot/correlation/regression to see any relationship between ratings and number of raters
Comparisons of ratings between two (or more) shows
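Two of the analyses listed above, comparing ratings by season and checking the correlation between ratings and number of raters, can be sketched in a few lines. This is only an illustration: the rows below are made-up values, not real IMDb data, and the tuple layout (season, rating, number of raters) is an assumption rather than the format the Shiny app actually exports.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical rows: (season, average user rating, number of raters).
# These values are invented for illustration only.
episodes = [
    (1, 8.0, 1200),
    (1, 8.4, 1100),
    (2, 7.6, 900),
    (2, 8.0, 950),
    (3, 9.0, 1500),
]


def mean_rating_by_season(rows):
    """Return {season: mean rating} for the given episode rows."""
    by_season = defaultdict(list)
    for season, rating, _raters in rows:
        by_season[season].append(rating)
    return {s: sum(r) / len(r) for s, r in sorted(by_season.items())}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


season_means = mean_rating_by_season(episodes)
r = pearson_r([e[1] for e in episodes], [e[2] for e in episodes])
print(season_means)
print(round(r, 3))
```

In a classroom setting the same quantities would more likely come from a statistics package; the point here is only that the standardized columns make the calculation identical for every student's show.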

11 Used Car Prices from Cars.com
Choong-Soo Lee
myslu.stlawu.edu/~clee/dataset/cars
User enters:
  Make/Model (from a list of 100+ popular models)
  Zip Code (to search near)
  Max sample size
  File name to store data
Output includes:
  Year
  Price (in $1,000s)
  Mileage (in 1,000s of miles)

12 myslu.stlawu.edu/~clee/dataset/cars

14 What Can Students Do With a Cars.com Dataset?
Intro stat:
  Predict price based on age (or mileage)
  Is there an age at which the predicted price reaches zero (the car is "free")?
  Find a confidence interval for the mean price (or mileage) of a car model
  Test for a difference in mean price between two car models (or two zip codes)
  Assess regression conditions
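The first two questions above can be worked through with a simple least-squares line. The sketch below fits price on age and solves for the age at which the fitted line crosses zero. The ages and prices are invented for illustration (prices in $1,000s, matching the units on the previous slide); they are not scraped Cars.com data.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical sample: car age in years and asking price in $1,000s.
ages = [1, 2, 3, 4, 5]
prices = [18.5, 15.5, 14.0, 12.5, 9.5]

intercept, slope = fit_line(ages, prices)

# Predicted price for a 3-year-old car.
predicted_at_3 = intercept + slope * 3

# Age at which the fitted line predicts a price of zero.
# Only meaningful when the slope is negative.
zero_price_age = -intercept / slope

print(f"price = {intercept:.2f} + ({slope:.2f}) * age")
print(f"predicted price at age 3: {predicted_at_3:.2f}")
print(f"predicted price hits zero at age {zero_price_age:.1f}")
```

The zero-price age usually lies well outside the observed ages, which makes it a convenient prompt for discussing extrapolation.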

15 What Else Can Students Do With a Cars.com Dataset?
Stat2:
  Try a quadratic model (for price) based on age
  Multiple regression using both age and mileage to predict price
  Find a prediction interval for the price for a particular age and/or mileage
  Compare the Price~Age regression lines for two different car models (or zip codes)
  Compare mean price (ANOVA) for several different models
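As a rough sketch of the first item above, the code below fits a linear and a quadratic model for price on age by solving the normal equations, then compares their R² values. The data are invented (prices in $1,000s), and the helper names `solve`, `least_squares`, and `r_squared` are hypothetical, written here only to keep the example free of external libraries.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(design, ys):
    """Coefficients minimizing squared error, via the normal equations X'X b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(design, ys, coefs):
    """Proportion of variability in ys explained by the fitted model."""
    mean_y = sum(ys) / len(ys)
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in design]
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical sample: age in years, price in $1,000s.
ages = [1, 2, 3, 4, 6, 8, 10]
prices = [22.0, 18.5, 16.0, 14.0, 11.0, 9.5, 9.0]

linear_design = [[1.0, a] for a in ages]
quadratic_design = [[1.0, a, a * a] for a in ages]

linear_coefs = least_squares(linear_design, prices)
quadratic_coefs = least_squares(quadratic_design, prices)

r2_linear = r_squared(linear_design, prices, linear_coefs)
r2_quadratic = r_squared(quadratic_design, prices, quadratic_coefs)

print(f"linear R^2:    {r2_linear:.3f}")
print(f"quadratic R^2: {r2_quadratic:.3f}")
```

The same `least_squares` helper handles the multiple regression item if each design row is built as `[1.0, age, mileage]`. Because the linear model is nested in the quadratic one, R² cannot decrease when the squared term is added, so the comparison is a natural lead-in to adjusted R² or a test of the extra term.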

16 R² = 84.5%; R² = 89.0%

17 House Prices from Zillow.com
Choong-Soo Lee
myslu.stlawu.edu/~clee/dataset/zillow
User enters: Zip Code, Max sample size, File name to store data
Default output includes:
  Price (in $1,000s)
  Bedrooms
  Bathrooms
  Square footage
  Lot size (acres)
  URL to property
Optional (slower):
  Address (street, city, state, zip)
  Year built
  Zestimate
  Exact Price

18 myslu.stlawu.edu/~clee/dataset/zillow

19 Default Housing Data: Canton, NY

20 Full Housing Data: Canton, NY

21 What Can Students Do With a Zillow Houses Dataset?
Predict price based on bedrooms, bathrooms, square footage, and/or lot size as single predictors
Use multiple regression to predict price
Compare mean house price between different zip codes
Detect outliers and influential points

22 What’s up with these houses?

23 Default Housing Data: Canton, NY

26 New Medical Building

27 Benefits of Individualized Datasets
Each student (group) has their own data
Students can personalize the data (picking a TV show, car model, or housing area of interest)
Data collection is relatively easy/convenient
Instructor can anticipate the general features of the dataset (e.g., a decreasing trend for price vs. age of car)
Drawbacks?
  Challenging to grade when each student (group) has a different dataset

28 Grading Tip: RMarkdown
Students turn in a .csv file with their data
Prepare a generic R script that reads the data from a .csv file and produces the relevant calculations: summary statistics, fitted models, and graphics
Note that variable names and format are standardized
Embed the R commands in a Markdown document that formats the results to make a student-specific key
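The slide describes an R script embedded in R Markdown. As a language-neutral illustration of the same idea, here is a minimal sketch in Python: one generic function reads any student's file and produces a short key. The column names `Year`, `Price`, and `Mileage` are taken from the Cars.com output list but are an assumption about the actual file headers, and `summarize` and `format_key` are hypothetical helper names. An in-memory string stands in for a submitted .csv file.

```python
import csv
import io
import statistics


def summarize(csv_file, columns=("Year", "Price", "Mileage")):
    """Read one student's dataset and return summary statistics per column.

    Relies on every submission using the same (assumed) column names.
    """
    rows = list(csv.DictReader(csv_file))
    summary = {"n": len(rows)}
    for col in columns:
        values = [float(row[col]) for row in rows]
        summary[col] = {
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values),  # sample SD; needs at least 2 rows
            "min": min(values),
            "max": max(values),
        }
    return summary


def format_key(student, summary):
    """Format the summary as a plain-text, student-specific answer key."""
    lines = [f"Key for {student} (n = {summary['n']})"]
    for col, stats in summary.items():
        if col == "n":
            continue
        lines.append(
            f"  {col}: mean={stats['mean']:.2f}, sd={stats['sd']:.2f}, "
            f"range={stats['min']:.1f} to {stats['max']:.1f}"
        )
    return "\n".join(lines)


# Made-up stand-in for a submitted file; a real run would use open("student.csv").
sample_csv = io.StringIO(
    "Year,Price,Mileage\n"
    "2012,9.5,88.0\n"
    "2014,12.0,61.5\n"
    "2016,15.5,30.0\n"
)

summary = summarize(sample_csv)
key_text = format_key("student_01", summary)
print(key_text)
```

Because the variable names and format are standardized, the same script can be looped over every submitted file; the fitted models and graphics mentioned on the slide would be added to the same template.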

29 Link to All Three
myslu.stlawu.edu/~clee/dataset

