Published by Simon Griffin. Modified over 6 years ago.
Deep Learning Cascading Failure Prediction in a High Performance Computing System
Eric Abreut (1), Zhongbo Li (2)
(1) Florida International University
(2) The University of Tennessee, Knoxville

Introduction

Purpose: To apply deep learning methods to predict cascading failures in high performance computing systems.

Titan can tolerate many different physical and cyber failures during daily operations. A system log is kept of these errors, recording component information as well as the kind of error. Deep learning methods can help parse the logs and keep the essential information: the type of each error and the number of times it occurred.

Concept

Sample of system log:

:58:53 Node c5-2c2s4n1 DBE detected on GPU SerialNum
:33:07 Warmswap adding c7-4c2s0
:39:23 Node c8-7c2s0n3 DBE detected on GPU SerialNum

Remove timestamps, whitespace, and 'Node', 'Link', and 'Module' IDs:

Node DBE detected on GPU SerialNum
Warmswap adding

Extract the first letter of each word to generate a short signature for each line:

NDdoGS
Wa

Record the number of occurrences of each error:

Error Type    Occurrences
NDdoGS        2
Wa            1

Results

Figure: Test file loaded into the program, using 2 different error cases (NDdoGS and Wa) and 40 occurrences.

Figure: Console output of all the cases.

The results matched the information parsed by the program, which uses basic Python functions.

Future Steps

Use the PySpark library to replicate the Python code, and add deep learning methods to facilitate the error calculation.

Reduce the time complexity of the program so that parsing large files becomes faster and more efficient.

References

Rajman, M., & Besançon, R. (1998). Text mining: Natural language techniques and text mining applications. In S. Spaccapietra & F. Maryanski (Eds.), Data mining and reverse engineering: Searching for semantics. IFIP TC2 WG2.6 IFIP Seventh Conference on Database Semantics (DS-7), 7–10 October 1997, Leysin, Switzerland (pp ). Boston, MA: Springer US.

Acknowledgments

This work was supported primarily by the ERC Program of the National Science Foundation and DOE under NSF Award Number EEC. Other US government and industrial sponsors of CURENT research are also gratefully acknowledged.
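
The parsing steps described under Concept (strip timestamps and IDs, take the first letter of each remaining word, count occurrences) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' program. The function names `signature` and `count_signatures` are hypothetical, and the rule "drop every token that contains a digit" is an assumed stand-in for the poster's timestamp and ID removal. The sample lines are the three log lines shown on the poster, in the truncated form in which they appear.

```python
from collections import Counter


def signature(line):
    """Reduce one log line to the first letters of its remaining words.

    Assumption: timestamps and Node/Link/Module IDs are the only tokens
    that contain digits, so dropping digit-bearing tokens removes them.
    """
    words = [tok for tok in line.split()
             if not any(ch.isdigit() for ch in tok)]
    return "".join(word[0] for word in words)


def count_signatures(lines):
    """Count how often each error signature occurs, skipping empty results."""
    return Counter(sig for sig in map(signature, lines) if sig)


# The three sample lines from the poster, as shown there.
sample_log = [
    ":58:53 Node c5-2c2s4n1 DBE detected on GPU SerialNum",
    ":33:07 Warmswap adding c7-4c2s0",
    ":39:23 Node c8-7c2s0n3 DBE detected on GPU SerialNum",
]

counts = count_signatures(sample_log)
for sig, n in counts.most_common():
    print(sig, n)
# prints:
# NDdoGS 2
# Wa 1
```

On the sample lines this reproduces the table above (NDdoGS twice, Wa once). A real log would likely need a stricter pattern for timestamps and component IDs, since other messages may contain digits that carry meaning.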