Chronological age prediction based on DNA methylation: Massive parallel sequencing and random forest regression
Jana Naue, Huub C.J. Hoefsloot, Olaf R.F. Mook, Laura Rijlaarsdam-Hoekstra, Marloes C.H. van der Zwalm, Peter Henneman, Ate D. Kloosterman, Pernette J. Verschure
Forensic Science International: Genetics, Volume 31, Pages 19-28 (November 2017)
DOI: 10.1016/j.fsigen.2017.07.015
Copyright © 2017 Terms and Conditions
Fig. 1 Final age-dependent markers. DNAm results for the 208 samples of the training set are plotted. DNAm increases with age for seven markers (red) and decreases with age for six markers (blue). SAMD10_2, GRM2_9, LDB2_3, and KLF14_2 are neighboring CpG sites of the original 450K site. The Spearman correlation coefficient is given for each marker (S.rho: Spearman’s rho). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
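The S.rho values in Fig. 1 are rank correlations between age and DNAm. A minimal sketch of how such a value can be computed is shown below, assuming Spearman's rho is taken as the Pearson correlation of average ranks. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical toy data (ages in years, DNAm as a fraction), not from the study.
ages = [20, 35, 50, 65, 80]
rising_marker = [0.10, 0.18, 0.30, 0.33, 0.52]   # DNAm increases with age
falling_marker = [0.90, 0.70, 0.65, 0.40, 0.35]  # DNAm decreases with age

rho_up = spearman_rho(ages, rising_marker)
rho_down = spearman_rho(ages, falling_marker)
```

A positive rho corresponds to the markers drawn in red and a negative rho to those drawn in blue.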
Fig. 2 Age prediction using the final 13 markers. a) Cross-validation of the training set. b) Test dataset predicted with the model trained on all 208 training set samples. The orange and yellow bands represent the RMSE range (containing 70.19% of the data) and the 2×RMSE range (94.2%) of the test set. The maximum observed deviation from the chronological age was 12.5 years. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
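The quantities quoted in Fig. 2 (RMSE, the share of samples within the RMSE and 2×RMSE ranges, and the maximum deviation) can be sketched as follows. This is an illustration only: the helper names are hypothetical and the ages are made-up toy values, not the study's data.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between chronological and predicted age."""
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def share_within(actual, predicted, limit):
    """Fraction of samples whose absolute error is at most `limit` years."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= limit)
    return hits / len(actual)

# Hypothetical toy ages in years, not taken from the study.
actual = [22, 30, 41, 55, 63, 70]
predicted = [25, 28, 45, 50, 64, 62]

error = rmse(actual, predicted)
within_1x = share_within(actual, predicted, error)      # inner (orange) band
within_2x = share_within(actual, predicted, 2 * error)  # outer (yellow) band
max_deviation = max(abs(p - a) for a, p in zip(actual, predicted))
```

For roughly normally distributed errors, about 68% and 95% of samples would be expected within the RMSE and 2×RMSE ranges, which is close to the 70.19% and 94.2% reported in the caption.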
Fig. 3 Deviation of the prediction from the chronological age. The number of samples per prediction error is plotted for all 104 samples, grouped into three age groups of approximately the same size (34–36). The ages of individuals in the young age group tend to be overestimated, whereas the ages of older individuals are more often underestimated. The two lines show the RMSE range (orange, dashed) and the 2×RMSE range (yellow). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
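The age-group trend described in Fig. 3 can be checked by comparing the mean signed error (predicted minus chronological age) per group. The sketch below assumes the groups are formed by sorting samples by age and cutting the list into thirds; the caption does not state the actual group boundaries, so this split, the function name, and the toy values are all hypothetical.

```python
from statistics import mean

def mean_signed_error_by_third(actual, predicted):
    """Mean of (predicted - actual) for the youngest, middle, and oldest third."""
    pairs = sorted(zip(actual, predicted))  # order samples by chronological age
    n = len(pairs)
    bounds = [0, n // 3, 2 * n // 3, n]     # assumed cut points, not the authors'
    result = []
    for lo, hi in zip(bounds, bounds[1:]):
        group = pairs[lo:hi]
        result.append(mean(p - a for a, p in group))
    return result

# Hypothetical toy ages in years, not taken from the study.
actual = [20, 25, 40, 45, 70, 75]
predicted = [24, 27, 41, 43, 66, 70]

young, middle, old = mean_signed_error_by_third(actual, predicted)
```

A positive value indicates overestimation and a negative value underestimation, so the pattern in the caption would appear as a positive mean for the young group and a negative mean for the old group.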
Fig. 4 Increase in mean squared error (MSE), in %, if the marker were assigned random values (no age dependency). ELOVL2_6 contributes the most information to the model, followed by TRIM59_5, F5_2, and KLF14_2.
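The idea behind Fig. 4 can be illustrated with a permutation-style importance: shuffle one marker's values across samples, re-predict, and report the relative increase in MSE. The sketch below is not the authors' implementation. It assumes a plain shuffle on a fixed dataset, and it uses a hypothetical stand-in predictor in place of a trained random forest so that the example stays self-contained.

```python
import random

def mse(model, rows, targets):
    """Mean squared error of `model` over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def percent_mse_increase(model, rows, targets, column, rng, repeats=20):
    """Average % increase in MSE after shuffling one column across samples."""
    baseline = mse(model, rows, targets)  # must be non-zero for the ratio below
    increases = []
    for _ in range(repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)  # breaks the link between this marker and age
        permuted = [r[:column] + [v] + r[column + 1:]
                    for r, v in zip(rows, shuffled)]
        increases.append(mse(model, permuted, targets) - baseline)
    return 100 * (sum(increases) / repeats) / baseline

# Hypothetical toy data: column 0 tracks age, column 1 is unrelated to it.
rows = [[0.1, 0.5], [0.2, 0.1], [0.3, 0.9], [0.4, 0.3], [0.5, 0.7], [0.6, 0.2]]
targets = [11, 19, 31, 38, 52, 59]  # toy ages in years

def model(row):
    """Stand-in predictor (an assumption, not a random forest): uses column 0 only."""
    return 100 * row[0]

informative = percent_mse_increase(model, rows, targets, 0, random.Random(0))
uninformative = percent_mse_increase(model, rows, targets, 1, random.Random(0))
```

A marker the model relies on yields a large increase, while a marker the model ignores yields none, which is the ordering the bar chart in Fig. 4 conveys.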