Data Warehousing, Data Mining, and Privacy
Data Warehousing
- Repository of data providing organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
- Data mart (single subject area)
- Enterprise data warehouse (integrated data marts)
- Metadata
Farkas CSCE 824
Data Mining
- DM: search for correlations, sequences, and trends
- Prediction tasks
  - Use some variables to predict unknown or future values of other variables
- Description tasks
  - Find human-interpretable patterns that describe the data
Knowledge Discovery in Databases: Process
[Figure: KDD process flow] Data -> Selection -> Target Data -> Preprocessing -> Preprocessed Data -> Data Mining -> Patterns -> Interpretation/Evaluation -> Knowledge
Mine for: Selection, Aggregation, Abstraction, Visualization, Transformation/Conversion, Statistical Analysis, “Cleaning”
Adapted from: U. Fayyad, et al. (1995), “From Knowledge Discovery to Data Mining: An Overview,” Advances in Knowledge Discovery and Data Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press
Data Mining Technologies
- Clustering: find groups of similar data items
- Classification: separate data items into predefined groups
- Association rule mining: find dependencies in data
- Sequential associations: identify event sequences that are likely
- Detect deviations: find outliers
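As a small illustration of association rule mining, the sketch below computes the two usual rule measures, support and confidence, and lists single-item rules that pass both thresholds. The transactions, the function names, and the threshold values are all invented for this example; it is a brute-force sketch, not an efficient miner such as Apriori.

```python
from itertools import combinations

# Hypothetical market-basket transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    base = count(antecedent, transactions)
    joint = count(set(antecedent) | set(consequent), transactions)
    return joint / base if base else 0.0


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Brute-force search for single-item rules A -> B meeting both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            c = confidence({lhs}, {rhs}, transactions)
            if s >= min_support and c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules


for lhs, rhs, s, c in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

On this toy data only the bread/milk pair passes both thresholds, in both directions. Lowering `min_confidence` would admit weaker rules.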
DM Issues: Integrity
- Poor-quality data: inaccurate, incomplete, missing metadata
- Loss of traditional consistency, e.g., keys
- Source data quality vs. derived data quality
- Trust in the result of analysis?
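Two of the integrity issues listed above, incomplete records and lost key consistency, can be checked mechanically before mining. The following is a minimal sketch under assumed conditions: the records, the field names, and the `quality_report` helper are hypothetical, and real pipelines would check far more.

```python
from collections import Counter

# Hypothetical records merged from two sources; field names are invented.
records = [
    {"id": 1, "name": "Ann", "zip": "29208"},
    {"id": 2, "name": "Raj", "zip": None},
    {"id": 2, "name": "Raj K.", "zip": "29201"},
    {"id": 3, "name": "", "zip": "29205"},
]


def quality_report(records, key, required):
    """Report key values that repeat and rows with empty required fields."""
    key_counts = Counter(r.get(key) for r in records)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)
    incomplete_rows = [
        i for i, r in enumerate(records)
        if any(r.get(field) in (None, "") for field in required)
    ]
    return {"duplicate_keys": duplicate_keys, "incomplete_rows": incomplete_rows}


print(quality_report(records, key="id", required=["name", "zip"]))
```

Here `id` 2 appears twice (the key no longer identifies a single row after the merge), and rows 1 and 3 have an empty required field.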
Big Data Security and Privacy
- Large amount of data being considered
- Probabilistic inference
- Existing inference prevention: guaranteed truth
- Privacy-preserving analytics
Big Data Integrity
- Data-poisoning
- Data accuracy
- Source provenance
- End-point filtering and validation
Inference Problem
- DM discovers “new knowledge”: how to evaluate security risks?
- Example security risks:
  - Prediction of sensitive information
  - Misuse of information
- Assurance of “discovery”
Privacy and Sensitivity
- Large volume of private (personal) data
- Need:
  - Proper acquisition, maintenance, usage, and retention policy
  - Integrity verification
  - Control of analysis methods (aggregation may reveal sensitive data)
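The remark that aggregation may reveal sensitive data can be made concrete with a differencing attack: two aggregate queries that are each allowed can be subtracted to isolate one individual. The salary table, the `total_salary` interface, and its minimum-group-size rule are all assumptions made for this sketch.

```python
# Hypothetical salary table, invented for illustration.
salaries = {"alice": 82000, "bob": 67000, "carol": 91000, "dave": 58000}


def total_salary(names, min_group_size=2):
    """Aggregate-only interface: refuses queries over too few people."""
    names = set(names)
    if len(names) < min_group_size:
        raise ValueError("query set too small")
    return sum(salaries[n] for n in names)


def differencing_attack(target):
    """Recover one person's salary from two permitted aggregate queries."""
    everyone = set(salaries)
    return total_salary(everyone) - total_salary(everyone - {target})


# A direct single-person query is rejected, but the difference still leaks it.
print(differencing_attack("carol"))
```

The size restriction blocks the direct query yet does nothing against the pair of larger queries, which is why the slide asks for control over the analysis methods and not only over the raw data.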
Privacy
- What is the difference between confidentiality and privacy?
- Identity, location, activity, etc.
- Anonymity vs. accountability
Social Network Analysis
- The mapping and measuring of relationships and flows between people, groups, organizations, computers, or other information-processing entities
- Behavioral profiling
- Note: social network signatures. User names may change; family and friends are more difficult to change
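The note on social network signatures suggests that an account can be re-identified by its contacts even after the user name changes. One simple way to sketch this is Jaccard similarity between contact sets. The account names, contact lists, and the `best_match` helper below are hypothetical, and Jaccard is only one of many similarity measures that could be used.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical known accounts and their contacts, invented for illustration.
known_accounts = {
    "jdoe_1987": {"ann", "raj", "mei", "tom"},
    "skater42": {"lee", "kim", "raj"},
}

# Contacts of a newly observed account under an unfamiliar user name.
new_contacts = {"ann", "raj", "mei", "zoe"}


def best_match(new_contacts, known_accounts):
    """Return the known account whose contact set is most similar."""
    return max(
        known_accounts,
        key=lambda name: jaccard(new_contacts, known_accounts[name]),
    )


print(best_match(new_contacts, known_accounts))
```

The new account shares three of five distinct contacts with `jdoe_1987` and only one of six with `skater42`, so the contact set points to the former despite the changed name.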
DM for Security
- Large-scale data analytics
- Fraud detection
- Intrusion detection
- Insider misuse detection
- User/group/web site profiling
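Fraud and misuse detection often reduce to deviation detection: flag values that sit far from typical behavior. The sketch below uses the modified z-score based on the median absolute deviation (MAD), with the commonly used scaling constant 0.6745 and cutoff 3.5. The transaction amounts and the `flag_outliers` name are invented for the example, and this is one possible technique, not the method prescribed by the slides.

```python
from statistics import median

# Hypothetical card transaction amounts, invented for illustration.
amounts = [20, 22, 19, 21, 23, 20, 500]


def flag_outliers(values, threshold=3.5):
    """Flag values whose MAD-based modified z-score exceeds the threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # All deviations collapse to zero at the median; the score is undefined.
        return []
    return [v for v in values if abs(0.6745 * (v - center) / mad) > threshold]


print(flag_outliers(amounts))
```

A median-based score is used here because a single extreme value inflates the mean and standard deviation enough to hide itself in a sample this small.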
Next Class
- Continue on cloud and DM