Creative Activity and Research Day (CARD)

Creative Activity and Research Day (CARD)
Forecasting Smart Meter Energy Usage using Distributed Systems and Machine Learning Chris Dong, Lingzhi Du, Feiran Ji, Amber Song, Vanessa Zheng (USF Master of Data Science) Feiran We are team sparkling smartwater, and we are going to talk about our project on . I am Feiran. (...)

Why? Data | MongoDB | Spark | ML Outcome | Conclusion
We choose smart meter data because… Smart meters in every home in London Optimize energy usage High electricity prices Helps consumer understand their own energy consumption Feiran The government plans to install smart meters in every home in London Optimize energy usage High electricity prices Helps consumer understand their own energy consumption Data | MongoDB | Spark | ML Outcome | Conclusion

Importing data from S3 to EC2
aws s3 cp s3://smart-meters/halfhourly/ . --recursive --acl public-read aws s3 cp s3://smart-meters/hhblock_dataset/ . --recursive --acl public- read aws s3 cp s3://smart-meters/other/ . --recursive --acl public-read Chris Here we see the data loaded into EC2, which goes from block 0 to block 111. Data | MongoDB | Spark | ML Outcome | Conclusion

Importing data into MongoDB from S3
Add a new column indicating the filename for i in *.csv; do mongoimport -d smart -c energy --type csv --file $i -- headerline ; done for i in *.csv; do mongoimport -d wide -c energy --type csv --file $i -- headerline ; done for i in *.csv; do mongoimport -d other -c $i - -type csv --file $i -- headerline ; done Chris Here we import the data into separate MongoDB databases and run a simple query. I also add a new column indicating the filename Data | MongoDB | Spark | ML Outcome | Conclusion

Creating RDD from MongoDB on EC2
Amber Data | MongoDB | Spark | ML Outcome | Conclusion

Querying Spark DataFrame data on EC2
Amber Data | MongoDB | Spark | ML Outcome | Conclusion

AWS Instance Specs Data | MongoDB | Spark | ML Outcome | Conclusion
EC2: 1 Instance: t2.large 2 cores 8 GB RAM 300 GB Storage 0.09/hour Standalone：5 instances, 4 workers C3.2xlarge 8 cores 15 GB RAM 160 GB Storage 2.1/hour YARN：5 instances, 4 workers C3.2xlarge 16 cores 15 GB RAM 160 GB Storage 2.1/hour YARN: 1 Instance C3.8xlarge 32 cores 60 GB RAM 640 GB Storage 1.68/hour Amber 4.26 3.07 Data | MongoDB | Spark | ML Outcome | Conclusion

Data Overview Data | MongoDB | Spark | ML Outcome | Conclusion

Feature Engineering Data | MongoDB | Spark | ML Outcome | Conclusion
* day1 = one day before label day * day2 = two days before label day * e.g., if label day is , day1 = , day2 = Lingzhi. Let me talk about how we do feature engineering and build our training dataset. Basically we pick one timestamp as our response and use the data before this timestamp to build our features. Features we built include daily average energy usage for some days before the timestamp. This is used to capture the trend of the time series. We also include weather data in these days because temperature is highly correlated to energy usage. Then we add hourly average energy usage. For example, we calculated the average energy usage at 12 am for day 1 to day 7 as a feature. There are 48 half-hours so there are 48 features here. This work as a seasonal component to capture the daily seasonality of the time series. We also include neighborhood info as a categorical feature and feed it into random forest with label encoding. For training response, as we are predicting for every 2 hours, we actually have 12 values at that date. So we are building 12 different random forest models and they share the same features but are trained separately. Data | MongoDB | Spark | ML Outcome | Conclusion

Energy Usage Predictions
Data | MongoDB | Spark | ML Outcome | Conclusion

Time per model Data | MongoDB | Spark | ML Outcome | Conclusion
Attempt Data preprocessing Model training Number of Trees in RF 1 Spark SQL Spark ML 10 2 Pandas 3 300 Attempt single ec2 instance Standalone 5 instances Yarn 5 instances 3 instances 1 instances 1 forever 3602 s 3478 s 3186 s 2 34.8 s 36.7 s 35.3 s 43.6 s 3 error 500.0 s 546.0 s 937.4 s Amber 6.52 Data | MongoDB | Spark | ML Outcome | Conclusion

Github link: https://github.com/LenzDu/Smart-Meter
Thanks! Github link:

Creative Activity and Research Day (CARD)

Similar presentations

Presentation on theme: "Creative Activity and Research Day (CARD)"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Creative Activity and Research Day (CARD)

Similar presentations

Presentation on theme: "Creative Activity and Research Day (CARD)"— Presentation transcript:

Similar presentations

About project

Feedback