Earthquake Prediction (https://athena.ecs.csus.edu/~kokkiras/index.html). Team 18. Mentor: Prof. Meiliu Lu. Done by: Akhil Madineni and Suraj Krishna Kokkirala.
Overview: Problem Statement, Goal, Data Overview, Data Preprocessing, Model Implementation, Demo, Results, Learnings, References
Why Predict Earthquakes? Forecasting earthquakes is one of the most important problems in science because of their catastrophic consequences. Current research on earthquake forecasting focuses on three key questions: When will the event occur? Where will it occur? How large will it be?
Goal: The goal is to predict the timing of laboratory earthquakes using seismic signals. The data is taken from an experimental set-up used to study earthquake physics. The acoustic_data input signal is used to predict the time remaining before the next laboratory earthquake (time_to_failure).
Data Overview. File descriptions: train.csv - a single, continuous training segment of experimental data. test - a collection of many small segments of acoustic data signals. sample_submission.csv - shows the expected submission format: we need to predict time_to_failure for each segment of the test data.
Data Pre-Processing - I. Raw data fields: acoustic_data - the seismic signal [int16]; time_to_failure - the time (in seconds) until the next laboratory earthquake [float64].
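A minimal sketch of loading the raw data with the field types listed above, assuming Python with pandas and a train.csv whose header row is exactly `acoustic_data,time_to_failure` (the column names come from the slide; the header layout and the helper name `load_train` are assumptions). Declaring int16 up front matters because pandas would otherwise infer int64 for the signal column, using four times the memory.

```python
import io

import numpy as np
import pandas as pd

# Column types as listed on the slide: int16 for the signal, float64 for the target.
TRAIN_DTYPES = {"acoustic_data": np.int16, "time_to_failure": np.float64}


def load_train(path_or_buffer, nrows=None):
    """Hypothetical helper: read train.csv (or a file-like object) with compact dtypes."""
    return pd.read_csv(path_or_buffer, dtype=TRAIN_DTYPES, nrows=nrows)


# Tiny in-memory stand-in for train.csv (made-up values, for illustration only).
sample = io.StringIO(
    "acoustic_data,time_to_failure\n"
    "12,1.4690999832\n"
    "6,1.4690999821\n"
    "8,1.4690999810\n"
)
df = load_train(sample)
print(df.dtypes)
```

On the real file, `load_train("train.csv", nrows=1_000_000)` would read only the first million rows, which is handy for quick exploration.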
Data Pre-Processing - II. Analyzing the acoustic_data and time_to_failure signals.
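Reading "all rows between earthquakes" first requires knowing where each earthquake falls in the training data. The sketch below assumes, as an illustration, that time_to_failure counts down toward zero and jumps back up right after each laboratory earthquake, so every upward step marks the start of a new cycle. The function name `find_quake_indices` is hypothetical.

```python
import numpy as np


def find_quake_indices(time_to_failure):
    """Return the row indices where a new earthquake cycle starts.

    Assumption: time_to_failure decreases toward zero and resets (jumps up)
    right after each laboratory earthquake, so a positive step marks a quake.
    """
    ttf = np.asarray(time_to_failure, dtype=np.float64)
    jumps = np.diff(ttf) > 0          # True where the countdown resets
    return np.flatnonzero(jumps) + 1  # +1: first row of the new cycle


# Made-up countdown with two resets (i.e. two earthquakes).
ttf = [0.3, 0.2, 0.1, 1.5, 1.4, 0.05, 2.0, 1.9]
starts = find_quake_indices(ttf)
cycles = np.split(np.asarray(ttf), starts)  # one array per earthquake cycle
print(starts)       # [3 6]
print(len(cycles))  # 3
```

Splitting at these indices yields one array per cycle, which can then be windowed independently so that no chunk straddles an earthquake.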
Data Pre-Processing - III. Steps for converting the data into X and y for our prediction model:
1. Read all rows between earthquakes and create folds.
2. For chunks of size 150,000, 75,000, 50,000 and 30,000, extract a couple of features and store them in a row of a matrix X.
3. Store the response in a vector y (the time_to_failure value at the last time step of each chunk).
4. Move forward by "stride" positions and repeat steps 2 and 3 until the earthquake happens.
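The windowing steps above can be sketched as follows for a single earthquake cycle. The slides do not list the actual features, so the mean, standard deviation, minimum and maximum used here are illustrative assumptions, as is the function name `make_xy`. Whether the original stride equaled the chunk size is also not stated; setting both to 150,000 would give non-overlapping windows.

```python
import numpy as np


def make_xy(signal, ttf, chunk_size, stride):
    """Slide a window of `chunk_size` rows over one earthquake cycle.

    Each window adds one row to X (illustrative summary statistics of the
    acoustic signal) and one entry to y (time_to_failure at the window's
    last row). Windows stop once a full chunk no longer fits before the quake.
    """
    signal = np.asarray(signal, dtype=np.float64)
    ttf = np.asarray(ttf, dtype=np.float64)
    rows, targets = [], []
    for start in range(0, len(signal) - chunk_size + 1, stride):
        chunk = signal[start:start + chunk_size]
        rows.append([chunk.mean(), chunk.std(), chunk.min(), chunk.max()])
        targets.append(ttf[start + chunk_size - 1])  # response = last time step
    return np.array(rows), np.array(targets)


# Toy cycle: 10 rows, windows of 4 rows, moving 2 rows at a time.
sig = np.arange(10)
ttf = np.linspace(1.0, 0.1, 10)
X, y = make_xy(sig, ttf, chunk_size=4, stride=2)
print(X.shape)  # (4, 4)
print(y)        # approximately [0.7 0.5 0.3 0.1]
```

With the real data one would call this once per cycle (for example with `chunk_size=150_000`) and stack the resulting X and y arrays.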
Model Implementation. Random Forest: an ensemble learning method for classification, regression and other tasks that constructs a multitude of decision trees at training time and outputs the class that is the mode of the individual trees' classes (classification) or the mean of their predictions (regression). Because time_to_failure is a continuous value, our task is regression, so the forest's prediction is the mean over its trees.
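The final combining step described above can be illustrated in plain Python. The per-tree outputs below are made-up numbers rather than predictions from trained trees, and the helper names are hypothetical; a library implementation (for example R's randomForest package or scikit-learn's RandomForestRegressor) performs this step internally.

```python
from collections import Counter
from statistics import mean


def aggregate_regression(tree_predictions):
    """Random-forest regression output: the mean of the individual trees' predictions."""
    return mean(tree_predictions)


def aggregate_classification(tree_votes):
    """Random-forest classification output: the mode (most common class) of the trees' votes."""
    return Counter(tree_votes).most_common(1)[0][0]


# Made-up per-tree outputs for a single input row.
print(aggregate_regression([2.0, 3.0, 4.0]))                 # 3.0
print(aggregate_classification(["quake", "calm", "quake"]))  # quake
```

Averaging many decorrelated trees is what reduces variance compared with a single deep tree.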
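| Model | Trees | Stride | Mean Absolute Error (MAE) |
|---|---|---|---|
| Model-1 | 500 | 150000 | 2.267461 |
| Model-2 | 2000 | 150000 | 2.265719 |
| Model-3 | 5000 | 150000 | 2.264752 |
| Model-4 | 500 | 75000 | 2.282808 |
| Model-5 | 2000 | 75000 | 2.282194 |
| Model-6 | 5000 | 75000 | 2.282426 |
| Model-7 | 500 | 50000 | 2.284351 |
| Model-8 | 2000 | 50000 | 2.282518 |
| Model-9 | 5000 | 50000 | 2.282003 |
| Model-10 | 500 | 30000 | 2.291800 |
| Model-11 | 2000 | 30000 | 2.289706 |
| Model-12 | 5000 | 30000 | 2.289344 |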
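Mean absolute error, the metric in the table, is the average of |y_i - ŷ_i| over all predictions. Because time_to_failure is measured in seconds, an MAE of about 2.27 means the predictions are off by roughly 2.27 seconds on average. A minimal sketch of the computation, with made-up inputs (the function name is chosen for illustration):

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/n) * sum(|y_i - yhat_i|): the average absolute gap between truth and prediction."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Errors are 0.5, 0.0 and 1.0 seconds, so the MAE is 1.5 / 3.
print(mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0]))  # 0.5
```

Unlike squared error, MAE does not amplify a few large misses, so it reads directly as a typical error in seconds.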
Learnings. Data pre-processing is vital to the accuracy of the models. Choosing appropriate machine learning techniques and algorithms matters when modeling the system. Plotting the data provides useful insights and can lead to better models. Learnings related to R: to resolve the error "cannot allocate vector of size xxx MB", use the memory.limit() function to assign more RAM to the R session (this applies only to R on Windows; since R 4.2.0, memory.limit() is no longer supported).
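A different way to stay within memory, not the memory.limit() fix described above, is to stream train.csv in pieces instead of loading it whole. A minimal sketch, assuming Python with pandas and the column names from the raw-data slide; `stream_train` and the chunk size are illustrative choices.

```python
import io

import numpy as np
import pandas as pd


def stream_train(path_or_buffer, rows_per_chunk=1_000_000):
    """Hypothetical helper: yield train.csv in pieces so the whole file never sits in RAM at once."""
    with pd.read_csv(
        path_or_buffer,
        dtype={"acoustic_data": np.int16, "time_to_failure": np.float64},
        chunksize=rows_per_chunk,
    ) as reader:
        for chunk in reader:
            yield chunk


# Five made-up rows read two at a time.
sample = io.StringIO(
    "acoustic_data,time_to_failure\n"
    + "".join(f"{i},{1.0 - i / 10}\n" for i in range(5))
)
sizes = [len(part) for part in stream_train(sample, rows_per_chunk=2)]
print(sizes)  # [2, 2, 1]
```

Each chunk can be windowed and reduced to feature rows before the next one is read, so peak memory depends on the chunk size rather than the file size.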
References https://en.wikipedia.org/wiki/Earthquake https://paperswithcode.com/