Application of Machine Learning Patterns and Behaviors in Complex Systems
SSCI
James M. Brase, Deputy Associate Director, Computation
Lawrence Livermore National Laboratory
LLNL-PRES
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA
Lawrence Livermore National Security, LLC
Machine learning is applied to a broad set of applications at LLNL
- Document analysis: Is this document relevant to topic Y? Topics are defined as distributions of terms, phrases, phrase graphs ….
- Cybersecurity: How many network connections do we expect node A to make in the next minute?
- Materials science: Discovery of patterns in component material attributes and critical reaction parameters to produce custom-designed properties
- Adaptive mesh simulation: Will this simulation parameter set cause the mesh to tangle?
- Image and multimedia analysis: Can we label the objects in this image? Can we find other, similar videos?
Machine learning: statistical inference of patterns in data
- Supervised learning: mapping feature vectors to labels
  - Discrete labels: classifiers
  - Continuous labels: regression
  - Function mapping: logistic regression, random forests, neural networks
- Unsupervised learning: finding structure in data
  - Association rules
  - Clustering
  - Density estimation
  - Autoencoders
[Diagram labels. Training: training data, feature vectors, labels, training set. Applying: new data, feature vector.]
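The supervised case on this slide (train on feature vectors with discrete labels, then apply the learned mapping to a new feature vector) can be sketched as follows. This is a minimal illustration using logistic regression fitted by batch gradient descent in plain Python. The toy data, the learning rate, the epoch count, and the helper names `train_logistic` and `predict` are assumptions made for the example and do not come from the talk.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of the log loss w.r.t. the score
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Apply the learned mapping to one feature vector; returns 0 or 1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical training set: 2-D feature vectors with discrete labels.
train_x = [[0.0, 0.2], [0.3, 0.1], [0.2, 0.4],
           [0.9, 0.8], [0.8, 1.0], [1.0, 0.7]]
train_y = [0, 0, 0, 1, 1, 1]

# Training step, then the applying step on a new feature vector.
w, b = train_logistic(train_x, train_y)
new_x = [0.95, 0.9]
print(predict(w, b, new_x))
```

Swapping in a random forest or a neural network changes only the function family being fitted; the train-then-apply structure stays the same.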
Learning language models for estimating document relevance
[Pipeline diagram labels: new documents; keyphrase extractor; weak filtering; entity extractor; collocation filter; new document graph; training graph models; graph classifier (relevant graphs vs. background graphs); relevance score; forced migration reference documents.]
Document relevance for the NYT corpus
Relevance to forced migration reference document set
Cybersecurity uses machine learning and graph analysis to model network behavior
Applications
- Inferring node and group roles
- Prediction of activity distributions
- Cueing analysts to anomalous behaviors
- Functional network discovery and characterization
Approach
- Collect packets, flow and process data from the full physical network
- Build a dynamic graph representation of activity
- Machine learning on the dynamic graph
  - Node and group classification algorithms
  - Temporal activity models: dynamic Bayesian networks
  - Anomaly detection algorithms
- Stream processing for feature and signature extraction
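The collect, build-graph, detect sequence on this slide can be illustrated with a small sketch. It groups flow records into fixed time windows, treats each window as one snapshot of a dynamic graph, and flags a node whose out-degree jumps well above its own history. The `(timestamp, src, dst)` record layout, the 60-second window, the z-score rule, and all function names are assumptions for illustration; the z-score is a deliberately simple stand-in and not the dynamic Bayesian network models named on the slide.

```python
from collections import defaultdict


def build_window_graphs(flows, window_seconds=60):
    """Group (timestamp, src, dst) flow records into per-window edge sets."""
    graphs = defaultdict(set)
    for timestamp, src, dst in flows:
        window = int(timestamp // window_seconds)
        graphs[window].add((src, dst))  # a set keeps each edge once per window
    return dict(graphs)


def out_degree_series(graphs, node):
    """Number of distinct destinations contacted by `node` in each window."""
    windows = range(min(graphs), max(graphs) + 1)
    return [sum(1 for s, _ in graphs.get(w, ()) if s == node) for w in windows]


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it exceeds the historical mean by `threshold` std devs."""
    mean = sum(history) / len(history)
    var = sum((v - mean) ** 2 for v in history) / len(history)
    std = max(var ** 0.5, 1.0)  # floor keeps flat histories from over-triggering
    return (latest - mean) / std > threshold


# Hypothetical flow records: node "A" is quiet for four minutes, then fans out.
flows = [(5, "A", "B"), (20, "A", "C"), (70, "A", "B"),
         (130, "A", "B"), (140, "A", "D"), (190, "A", "C")]
flows += [(245 + i, "A", f"H{i}") for i in range(8)]

graphs = build_window_graphs(flows)
series = out_degree_series(graphs, "A")
print(series, is_anomalous(series[:-1], series[-1]))
```

In a streaming setting the same per-window features would be computed incrementally as records arrive rather than from a stored list.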
Learning Markov models for behavior forecasting
- Host role learning
- Anomaly detection in host role distribution
- Dynamic IP-IP graph
- Reduced prediction error using host roles
- Host roles are local characteristics of the IP-IP graph structure, e.g. “center of star”, end node, …
Reference: Ryan Rossi, Brian Gallagher, Jennifer Neville, Keith Henderson. Modeling Dynamic Behavior in Large Evolving Graphs. ACM International Conference on Web Search and Data Mining (WSDM).
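As a rough illustration of forecasting behavior from host roles, the sketch below fits a first-order Markov chain to per-host role sequences and reads off the distribution over the next role. This is a generic Markov model chosen for brevity and is not claimed to be the method of the cited paper. The role names, the example sequences, and the functions `fit_transitions` and `forecast` are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(role_sequences, smoothing=0.0):
    """Estimate P(next role | current role) from observed role sequences."""
    counts = defaultdict(Counter)
    roles = set()
    for seq in role_sequences:
        roles.update(seq)
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    roles = sorted(roles)
    model = {}
    for r in roles:
        total = sum(counts[r].values()) + smoothing * len(roles)
        if total == 0:
            continue  # role never observed as a starting state
        model[r] = {s: (counts[r][s] + smoothing) / total for s in roles}
    return model


def forecast(model, current_role):
    """Return the predicted distribution over the host's next role."""
    return model[current_role]


# Hypothetical role assignments for two hosts across successive time steps.
sequences = [
    ["end_node", "end_node", "star_center", "star_center"],
    ["end_node", "star_center", "star_center", "end_node"],
]

model = fit_transitions(sequences)
print(forecast(model, "end_node"))
```

A host whose observed next role has low probability under its forecast distribution is a natural candidate for the role-distribution anomaly check mentioned on the slide.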
Some R&D directions in machine learning
- Features have traditionally been hand-engineered. Is there a principled approach to finding a good set of features? Deep learning.
- We usually deal with N >> D. In emerging applications (e.g. genomics, ...) we can have N << D. Can we regularize (constrain the solutions) with mechanistic models?
[Diagram labels. Training: training data, feature vectors, labels, training set with dimensions marked N and D.]
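To make the N << D question concrete, the block below writes out a penalized least-squares fit. It assumes a linear model with N training examples as the rows of X and D features as its columns, which is one reading of the N and D labels on the slide. The ridge penalty and the prior vector w_0 are illustrative assumptions; the talk does not say what form a mechanistic constraint would take.

```latex
\begin{align}
\hat{w} &= \arg\min_{w \in \mathbb{R}^{D}} \; \lVert y - Xw \rVert_2^2 + \lambda\, R(w),
  \qquad X \in \mathbb{R}^{N \times D},\; y \in \mathbb{R}^{N},\; \lambda > 0 \\
% Ridge penalty: shrink the weights toward zero
R(w) = \lVert w \rVert_2^2
  \;&\Longrightarrow\;
  \hat{w} = \left( X^{\top} X + \lambda I_D \right)^{-1} X^{\top} y \\
% Hypothetical mechanistic prior: shrink toward a model-derived guess w_0
R(w) = \lVert w - w_0 \rVert_2^2
  \;&\Longrightarrow\;
  \hat{w} = \left( X^{\top} X + \lambda I_D \right)^{-1} \left( X^{\top} y + \lambda w_0 \right)
\end{align}
```

When N < D the matrix X^T X has rank at most N and is singular, so the unpenalized problem has infinitely many minimizers. Adding λI with λ > 0 makes the system invertible and selects a single solution, which is the sense in which regularization constrains the solutions.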
Deep learning provides an unsupervised approach to learning feature sets from data
Deep machine learning research is extending pattern recognition and discovery beyond human capabilities
- Learning patterns in 100M random images from Flickr
  [Figure labels: airplanes neuron; “Fireworks” neuron; images with text neuron.]
- Discovering complex patterns in massive multisource intelligence data sets guided by science-based models, not exact keywords
- Image recognition performance now surpasses human accuracy
- Partnership with Stanford and UC Berkeley on algorithms, NVIDIA on large GPU implementations, and IBM on neurosynaptic architectures
- 100B synapse deep learning networks
Data movement is the limiting factor for analytics: supplementing the memory hierarchy
- Partnership with Intel and Cray to develop a 150 TF/s data analytics computer
- Technical focus on NVRAM layers in memory hierarchy supporting 24-core node; prototyping analytics in new environment
- Initial applications will focus on
  - Prototyping exascale simulation analysis architectures
  - Bioinformatics algorithms
  - Graph analytics
- Over 5 GB DRAM and 36 GB NVRAM per core