1
Bike Sharing Demand
Belinda Boateng, Kara Johnson, Hassan Riaz
2
Business Objective
Bicycle Rental Service in DC Metro Area
Customers rent bicycles from unmanned kiosks around the city
Casual vs. Registered Renters
Objective: a model to predict how many rentals will occur at a given date and time, based on factors such as date/time, season, holiday, working day, etc.
3
Data Description
Source: Kaggle.com competition
Predictors
Ordinal: datetime
Nominal: season, holiday, workingday, weather
Continuous: temp, atemp, humidity, windspeed
Response
count: the total number of rentals (casual + registered)
NOTE: This slide has been changed from the version used in the audio presentation. The audio version incorrectly lists 'casual' and 'registered' as predictor variables, when in fact they are components of the response variable.
Talk about Kaggle.com. Talk in depth about each variable. Talk about the number of records and how they are split into training vs. validation sets. Discuss how the observations are split in each set.
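Because count is the sum of casual and registered, feeding either component to a model as a predictor would leak the answer. The sketch below shows one way to guard against that. It uses the column names listed on this slide; the example row is illustrative, with made-up values.

```python
# Columns that make up the response. Using any of them as a predictor would
# leak the answer into the model, because count = casual + registered.
RESPONSE_COLUMNS = {"casual", "registered", "count"}

def predictor_columns(columns):
    """Return only the columns that may legitimately be used as predictors."""
    return [c for c in columns if c not in RESPONSE_COLUMNS]

def check_response(row):
    """Verify that the total response equals the sum of its two components."""
    return row["casual"] + row["registered"] == row["count"]

# Illustrative row: column names from the slide, values made up.
row = {"datetime": "2011-07-05 17:00:00", "season": 3, "holiday": 0,
       "workingday": 1, "weather": 1, "temp": 31.2, "atemp": 35.6,
       "humidity": 55, "windspeed": 12.0, "casual": 120, "registered": 210,
       "count": 330}

print(predictor_columns(row))  # datetime through windspeed
print(check_response(row))     # True
```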
4
Data Exploration
To explore and visualize the data, we used JMP's Graph Builder to gain insight into trends, behaviors, patterns and tendencies.
Before the data exploration phase, we created additional variables from the "datetime" variable. Originally, datetime was a single variable containing the day, month, year and hour. We split it into multiple variables, each holding one extracted component, so that we could find trends within trends, including:
hourly trends within the day
daily trends within the week
weekly trends within the month
monthly trends within the year
We began the data exploration phase by creating a distribution and summary statistics for each variable. (Hassan: please note that the findings explained below are all captured in the above chart.)
While exploring the data, we found some interesting trends. Registered rentals peak at 8am, 5pm and 6pm on working days. Based on this observation, we hypothesize that most of these individuals use the bike share as their mode of transportation to and from work. On working days, casual rentals are skewed toward the end of the day.
The average working day for registered users has three main peaks: 8am, with an average of 480 rentals; noon, with 199 rentals; and 5pm, with 529 rentals. The peaks at 8am and 5pm further indicate that most registered users ride to and from work, at the beginning and close of the business day. The noon peak may be explained by the free 30-minute rentals that the bike share provides to registered users: we hypothesize that registered users take these free rides during their lunch break.
For registered users on non-working days, rentals peak at noon, with an average of 388 rentals.
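The datetime split and the hourly averages described above can be sketched in a few lines. This is not the JMP workflow we used. It is a hedged stand-in that assumes datetime strings in the form "YYYY-MM-DD HH:MM:SS" and uses hypothetical helper names (time_features, mean_by_hour, peak_hours). The rows are toy data, not values from the Kaggle file.

```python
from collections import defaultdict
from datetime import datetime

def time_features(stamp):
    """Split one datetime string (assumed 'YYYY-MM-DD HH:MM:SS') into separate predictors."""
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return {
        "year": dt.year,
        "month": dt.month,        # 1-12: monthly trends within the year
        "day": dt.day,            # 1-31: daily trends within the month
        "weekday": dt.weekday(),  # 0 = Monday ... 6 = Sunday: trends within the week
        "hour": dt.hour,          # 0-23: hourly trends within the day
    }

def mean_by_hour(rows, column, workingday):
    """Average of `column` at each hour, restricted to working (1) or non-working (0) days."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row["workingday"] != workingday:
            continue
        hour = time_features(row["datetime"])["hour"]
        totals[hour] += row[column]
        counts[hour] += 1
    return {h: totals[h] / counts[h] for h in sorted(totals)}

def peak_hours(hourly_means, k=3):
    """The k hours with the largest averages, highest first."""
    return sorted(hourly_means, key=hourly_means.get, reverse=True)[:k]

# Toy rows (made up for illustration).
rows = [
    {"datetime": "2012-06-04 08:00:00", "workingday": 1, "registered": 500},
    {"datetime": "2012-06-05 08:00:00", "workingday": 1, "registered": 460},
    {"datetime": "2012-06-04 12:00:00", "workingday": 1, "registered": 200},
    {"datetime": "2012-06-04 17:00:00", "workingday": 1, "registered": 530},
    {"datetime": "2012-06-04 03:00:00", "workingday": 1, "registered": 4},
    {"datetime": "2012-06-09 12:00:00", "workingday": 0, "registered": 390},
]
means = mean_by_hour(rows, "registered", workingday=1)
print(means)              # {3: 4.0, 8: 480.0, 12: 200.0, 17: 530.0}
print(peak_hours(means))  # [17, 8, 12]
```

Grouping on the extracted hour, rather than on the raw timestamp, is what makes repeated daily patterns such as commuting peaks visible.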
5
Data Exploration continued
Key Findings from Data Exploration
Most registered users rent bikes on working days, with peak rentals at 8am, noon and 5pm.
Most casual users rent bikes on non-working days.
Registered users outnumber casual users on both working and non-working days.
Season 3 (Fall) has the largest count of bike rentals.
Listed on this slide are the key findings from our data exploration phase. (Read findings on slide.)
6
Multivariate Regression Model
Initial Model
Used all possible predictor variables except datetime
Poor results
RSquare: 0.277
RMSE:
Explain why we could not use the datetime variable.
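For reference, the two fit statistics quoted on the regression slides can be computed as shown below. This is a generic sketch with made-up numbers, not output from our JMP models. One caveat: regression reports such as JMP's Summary of Fit usually divide by the error degrees of freedom (n minus the number of parameters) rather than by n, so their RMSE will be slightly larger than this plain version.

```python
import math

def r_squared(actual, predicted):
    """Share of the variance in the response explained by the model: 1 - SSE/SST."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1 - sse / sst

def rmse(actual, predicted):
    """Root mean squared error, in the response's own units (rentals); divides by n."""
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sse / len(actual))

# Made-up hourly counts and predictions.
actual = [10, 40, 30, 15, 5]
predicted = [14, 35, 28, 17, 9]
print(round(r_squared(actual, predicted), 3))  # 0.924
print(round(rmse(actual, predicted), 3))       # 3.606
```

An RSquare of 0.277 therefore means the initial model explained only about 28% of the variation in hourly rentals.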
7
Improved Regression Model
Addition of hour variable
Computed from the datetime variable
Significantly better results
RSquare: 0.632
RMSE:
The coefficients are informative
Explain the purpose of the hour variable and how it was derived. Talk about how the other variables (season, holiday, working day) capture parts of datetime, but nothing accounted for the hour of the day. Go into detail on the coefficients on the key findings slide.
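The slides do not say whether hour entered the model as a continuous or a nominal variable. The distinction matters. A single linear slope in hour cannot represent separate peaks at 8am, noon and 5pm, whereas one indicator per hour can. The sketch below compares the two encodings on toy data. It relies on a standard least-squares fact: for a one-way model with an intercept and a full set of hour indicators, the fitted value for each observed hour is simply that hour's mean count. The helper names are hypothetical.

```python
def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1 - sse / sst

def hour_dummies(hour):
    """One-hot encode hour 0-23 as 23 indicator columns; hour 0 is the baseline."""
    return [1 if hour == h else 0 for h in range(1, 24)]

def fit_hour_linear(hours, counts):
    """Least squares count = a + b * hour (hour treated as one continuous slope)."""
    n = len(hours)
    mean_h = sum(hours) / n
    mean_c = sum(counts) / n
    sxy = sum((h - mean_h) * (c - mean_c) for h, c in zip(hours, counts))
    sxx = sum((h - mean_h) ** 2 for h in hours)
    b = sxy / sxx
    a = mean_c - b * mean_h
    return lambda h: a + b * h

def fit_hour_nominal(hours, counts):
    """Least squares on the hour indicators: the fitted value for each
    observed hour equals the mean count at that hour."""
    totals, seen = {}, {}
    for h, c in zip(hours, counts):
        totals[h] = totals.get(h, 0) + c
        seen[h] = seen.get(h, 0) + 1
    return lambda h: totals[h] / seen[h]

# Toy data (not from the Kaggle file): two observations at each of five hours.
hours = [3, 3, 8, 8, 12, 12, 17, 17, 22, 22]
counts = [4, 6, 470, 490, 190, 210, 520, 540, 80, 100]

linear = fit_hour_linear(hours, counts)
nominal = fit_hour_nominal(hours, counts)
r2_linear = r_squared(counts, [linear(h) for h in hours])
r2_nominal = r_squared(counts, [nominal(h) for h in hours])
print(round(r2_linear, 3), round(r2_nominal, 3))
```

On data with several daily peaks, the linear term explains almost nothing, while the indicator encoding captures nearly all of the hourly pattern.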
8
Time Series Analysis
Compared several models:
Seasonal ARIMA
Seasonal Exponential Smoothing
Various seasonal ARIMA variations
Various Transfer Functions
Seasonal ARIMA: RSquare 0.885, MAPE 53.08%, MAE 38.86
It was surprising that the seasonal ARIMA performed better than the transfer function model, which explicitly took temperature, humidity and windspeed into account. On further reflection, we concluded that this result suggests the variation in temperature, windspeed and humidity is largely accounted for by the variation in datetime. This is likely because each season has a relatively stable weather pattern, so the seasonal model can absorb weather outliers in its error term. MAPE still seems high; from our research, we would ideally want a MAPE below 25%.
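MAE and MAPE can tell different stories on this data. MAPE divides each error by the actual count, so quiet hours with only a handful of rentals can dominate it even when the absolute errors are small. That is one plausible reason MAPE stays high here while MAE is under 39 rentals. The numbers below are toy values chosen only to show the effect.

```python
def mae(actual, forecast):
    """Mean absolute error, in rentals."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

# The same 10-rental miss at a busy hour and at a quiet hour.
actual = [500, 5]
forecast = [510, 15]
print(mae(actual, forecast))  # 10.0
print(mape([500], [510]))     # 2.0
print(mape([5], [15]))        # 200.0
```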
9
Neural Network Analysis
Final Chosen Model!
Advantage: improved accuracy
RSquare: 0.99
RMSE: 0.101
Performed similarly on training and validation sets
Disadvantage: lack of interpretability
Accuracy is more important for this problem!
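The slide does not describe the network's architecture or how the data were split. The sketch below therefore covers only the training-vs-validation check: fit on one set, compute RMSE on both, and confirm the two are close. The 70/30 random split, the seed and the hour-mean stand-in model are all hypothetical choices for illustration. They are not our neural network, and the data are toy values.

```python
import math
import random

def holdout_split(rows, validation_fraction=0.3, seed=0):
    """Shuffle reproducibly, then cut into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def fit_hour_means(train):
    """Hypothetical stand-in model: predict the training mean for each hour,
    falling back to the overall training mean for any hour never seen."""
    totals, seen = {}, {}
    for hour, y in train:
        totals[hour] = totals.get(hour, 0) + y
        seen[hour] = seen.get(hour, 0) + 1
    overall = sum(y for _, y in train) / len(train)
    return lambda hour: totals[hour] / seen[hour] if hour in seen else overall

def evaluate(model, data):
    """RMSE of the model's predictions on (hour, rentals) pairs."""
    return rmse([y for _, y in data], [model(h) for h, _ in data])

# Toy (hour, rentals) pairs: a daily profile with peaks at 8, 12 and 17, plus small noise.
profile = {h: 50 for h in range(24)}
profile.update({8: 480, 12: 200, 17: 530})
rows = [(h, profile[h] + (d % 5) - 2) for d in range(20) for h in range(24)]

train, valid = holdout_split(rows)
model = fit_hour_means(train)
train_rmse = evaluate(model, train)
valid_rmse = evaluate(model, valid)
print(round(train_rmse, 2), round(valid_rmse, 2))
```

A validation RMSE far above the training RMSE would signal overfitting; similar values, as reported for our network, suggest the model generalizes to unseen rows.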
10
Model Comparison
Regression: lowest accuracy, higher interpretability
Time Series: better accuracy, better suited to time series, problems with missing values
Neural Network: best accuracy, black box
11
Suggestions for Capital Bikeshare
Corporate discounts, based on commuting patterns
Offer discounted bike rentals to organizations that operate between 10pm and 7am
Approach tourist organizations to set up bulk rentals over the summer
More data-related research
12
Conclusions
Performed exploratory analysis on the data
Created and improved a regression model whose coefficients are informative
Time-series model was more accurate than regression; the neural network was the most accurate overall
Learned a lot about the business from the data