Modeling and Detecting Anomalous Topic Access. Siddharth Gupta 1, Casey Hanson 2, Carl A. Gunter 3, Mario Frank 4, David Liebovitz 5, Bradley Malin 6. 1,2,3,4 Department of Computer Science; 3,5 Department of Medicine; 6 Department of Biomedical Informatics. 1,2,3 University of Illinois at Urbana-Champaign; 4 University of California, Berkeley; 5 Northwestern University; 6 Vanderbilt University
Outline of the talk: Motivation and Challenges; Our Contributions; Dataset Description; Random Topic Access (RTA) Model; Random Topic Access Detection (RTAD) Model; Evaluation and Results
EMR Access Breach. Reported in April 2013, University of Florida: two offenders illegitimately accessed the records of 15,000 patients over 3 years (March October 2012). Personal information, including names, addresses, dates of birth, medical record numbers, and Social Security numbers, was compromised for the purpose of billing fraud. One of the offenders was an insider at the hospital without prior. How can we efficiently model and detect these types of attacks in the healthcare system?
Motivation. Two broad classes of threats. Insider threats: behaviors of hospital users (staff) that adversely affect the healthcare institution, such as financial fraud, medical identity theft, and curiosity-driven access to EMRs. Outsider threats: an outside entity hires an insider to commit fraud; a visitor accesses records on open computers in some scenarios; an untrustworthy patient seeks information from other patients' records. Ramifications: irreversible violation of patient privacy and subsequent high costs for hospitals. Deterrent: the current legal deterrent is a set of regulations, such as HIPAA and HITECH, which impose specific privacy rules for patients and financial penalties for violating them.
Classical Detection Methodologies. Build a classifier on labeled data to differentiate anomalous users from legitimate users. Real healthcare data, however, is not labeled, so current methods inject synthetic anomalous users and evaluate detection on them.
Random Object Access. In healthcare information systems, the primary mechanism for generating anomalous users is to associate users with random patients in the dataset. We call this approach random object access (ROA). The resulting user does not resemble a plausible attacker in a real hospital setting.
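As a rough illustration of random object access, the sketch below builds one synthetic user whose accesses each hit a uniformly random patient. The patient identifiers, the number of accesses, and the helper name `make_roa_user` are hypothetical and chosen only for this example; the slides do not specify them.

```python
import random

def make_roa_user(patient_ids, n_accesses, rng):
    """Build one synthetic ROA user: each access hits a uniformly random patient."""
    return [rng.choice(patient_ids) for _ in range(n_accesses)]

rng = random.Random(0)                            # fixed seed for repeatability
patient_ids = [f"P{i:03d}" for i in range(200)]   # hypothetical patient identifiers
synthetic_user = make_roa_user(patient_ids, n_accesses=25, rng=rng)

print(synthetic_user[:5])
```

Because the patients are drawn without regard to diagnosis or department, such a user has no semantic coherence, which is the weakness noted above.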
Our Contributions. Random Topic Access (RTA): we introduce and study a random topic access (RTA) model aimed at users whose access may be illegitimate but is not fully random, because it is focused on common semantic themes. User Simulation: we use the latent topic framework to simulate illegitimate users, modeling them as samples from a Dirichlet distribution over topic multinomials. Anomaly Detection Framework: we use RTA to detect and evaluate users with suspicious access patterns.
Data Set. Fig. a) Summary statistics for audit logs. Fig. b) Summary statistics for patient records.
Random Topic Access (RTA) Model. A mechanism that uses latent topic structures to represent real users in the population and to allow the synthetic generation of semantically relevant anomalous users. Topic modeling can provide a concise description of how a user behaves in the context of their peers and the meaning of that behavior. Users are modeled as samples from a Dirichlet distribution over topic multinomials.
Latent Dirichlet Allocation (LDA). [Diagram: patient by raw diagnosis feature matrix, transformed by LDA into a patient by diagnosis topic feature matrix]
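To make the raw-feature to topic-feature step concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. The slides do not say which inference method or hyperparameters were used, so the sampler, the values of `alpha` and `beta`, and the toy diagnosis codes below are all assumptions for illustration. Each patient is treated as a "document" whose "words" are diagnosis-code ids.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of lists of word ids (here, diagnosis-code ids per patient).
    Returns (theta, phi): per-document topic mixtures and per-topic word distributions.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # counts n_dk
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # counts n_kw
    topic_total = [0] * n_topics                              # counts n_k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed estimates of the topic mixtures and topic-word distributions.
    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
             for d, doc in enumerate(docs)]
    phi = [[(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
           for k in range(n_topics)]
    return theta, phi

# Hypothetical toy data: codes 0-2 and codes 3-5 tend to co-occur.
patients = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
theta, phi = lda_gibbs(patients, n_topics=2, vocab_size=6)
for d, mix in enumerate(theta):
    print(d, [round(x, 2) for x in mix])
```

Each row of `theta` plays the role of the diagnosis topic feature vector for one patient, replacing the much wider raw diagnosis vector.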
Topic Distributions
Topic Distributions. Diagnosis topics: Neoplasm Topic, Obstetric Topic, Kidney Topic
Characterizing Users
Multidimensional Scaling: Patient Diagnosis
RTA: Simulating Users. a) Directed or masquerading user (α < 1): an anomalous user of some specialty gains sole access to the terminal of another user in the hospital. b) Purely random user (α = 1): the user is characterized by completely random behavior, with little semantic congruence to the hospital setting. c) Indirect user: the user type resembles an even blend of the topics of many specialized users.
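The three user types can be illustrated by drawing topic mixtures from a symmetric Dirichlet with different concentration values α. The sketch below is only an assumption-laden illustration: the number of topics, the seed, and the helper name `sample_dirichlet` are hypothetical, and treating the indirect user as the α > 1 case is an inference from the α = 100 panel of the population distribution slide rather than something stated on this slide.

```python
import random

def sample_dirichlet(alpha, n_topics, rng):
    """Draw one topic mixture from a symmetric Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_topics)]
    total = sum(draws)
    if total == 0.0:
        # All draws underflowed (possible for very small alpha): put all mass on one topic.
        mix = [0.0] * n_topics
        mix[rng.randrange(n_topics)] = 1.0
        return mix
    return [g / total for g in draws]

rng = random.Random(42)
n_topics = 10  # hypothetical number of latent topics
mixtures = {alpha: sample_dirichlet(alpha, n_topics, rng)
            for alpha in (0.01, 0.1, 1.0, 100.0)}

for alpha, mix in mixtures.items():
    print(f"alpha={alpha:<6} largest topic weight={max(mix):.3f}")
```

Small α tends to concentrate the mixture on one or a few topics (a directed user), α = 1 draws uniformly over all possible mixtures, and large α pushes the mixture toward an even blend of all topics.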
Population Distribution. [Figure panels: α = 0.01, α = 0.1, α = 1, α = 100. Groups: A. Directed Users, B. Purely Random Users, C. Indirect Users]
Role Distribution. [Figure: role NMH Resident Fellow CPOE; real users compared with anomalous users (masquerading, purely random, indirect)]
Random Topic Access Detection (RTAD)
Results - I
Results - II
Thank You!