1
Data Warehousing, Data Mining, Privacy
2
Data Warehousing
Repository of data providing organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
Data mart (single subject area)
Enterprise data warehouse (integrated data marts)
Metadata
Farkas CSCE 824
3
Data Mining
DM: search for correlations, sequences, and trends
Prediction tasks: use some variables to predict unknown or future values of other variables
Description tasks: find human-interpretable patterns that describe the data
4
Knowledge Discovery in Databases: Process
[Figure: the KDD process. Data -> Selection -> Target Data -> Preprocessing -> Preprocessed Data -> Data Mining -> Patterns -> Interpretation/Evaluation -> Knowledge. Other labels in the figure: "Mine for:", Selection, Aggregation, Abstraction, Visualization, Transformation/Conversion, Statistical Analysis, “Cleaning”.]
Adapted from: U. Fayyad, et al. (1995), “From Knowledge Discovery to Data Mining: An Overview,” Advances in Knowledge Discovery and Data Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press
5
Data Mining Technologies
Clustering: find groups of similar data items
Classification: separate data items into predefined groups
Association rule mining: find dependencies in data
Sequential associations: identify event sequences that are likely
Deviation detection: find outliers
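To make the association rule item above concrete, here is a minimal sketch that computes support and confidence for single-item rules. The transactions, the thresholds, and the function names (`count`, `support`, `confidence`, `pair_rules`) are all illustrative assumptions, not material from the slides, and the brute-force enumeration is only meant for tiny examples.

```python
from itertools import combinations

# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Enumerate rules {a} -> {b} that meet both thresholds (brute force)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            if s < min_support:
                continue
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules


for lhs, rhs, s, c in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

With this toy data only the rules between bread and milk pass both thresholds. Practical miners such as Apriori prune the search space rather than enumerating every pair.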
6
DM Issues: Integrity
Poor quality data: inaccurate, incomplete, missing meta-data
Loss of traditional consistency, e.g., keys
Source data quality vs. derived data quality
Trust in the result of analysis?
7
Big Data Security and Privacy
Large amount of data being considered
Probabilistic inference
Existing inference prevention: guaranteed truth
Privacy-preserving analytics
8
Big Data Integrity
Data-poisoning
Data accuracy
Source provenance
End-point filtering and validation
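One way to picture end-point filtering and validation is a gate that checks each incoming record before it reaches the analysis. The sketch below assumes a hypothetical allow-list of sources and a hypothetical plausible value range; the names `TRUSTED_SOURCES`, `VALID_RANGE`, `validate`, and `filter_records` are invented for illustration.

```python
from statistics import mean

TRUSTED_SOURCES = {"sensor-a", "sensor-b"}  # hypothetical allow-list of end points
VALID_RANGE = (-40.0, 60.0)                 # hypothetical plausible range (Celsius)


def validate(record):
    """Return None if the record is accepted, else a reason for rejection."""
    if record.get("source") not in TRUSTED_SOURCES:
        return "unknown source"
    value = record.get("value")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "non-numeric value"
    if not (VALID_RANGE[0] <= value <= VALID_RANGE[1]):
        return "value out of range"
    return None


def filter_records(records):
    """Split records into accepted ones and (record, reason) rejections."""
    accepted, rejected = [], []
    for record in records:
        reason = validate(record)
        if reason is None:
            accepted.append(record)
        else:
            rejected.append((record, reason))
    return accepted, rejected


# Illustrative input: two clean readings, one unknown source, one implausible value.
records = [
    {"source": "sensor-a", "value": 21.5},
    {"source": "sensor-b", "value": 22.0},
    {"source": "sensor-x", "value": 21.0},
    {"source": "sensor-a", "value": 9999.0},
]

accepted, rejected = filter_records(records)
print("mean over all records:     ", mean(r["value"] for r in records))
print("mean over accepted records:", mean(r["value"] for r in accepted))
for record, reason in rejected:
    print("rejected:", record, "because", reason)
```

A range check like this only catches crude poisoning. Values crafted to stay inside the plausible range pass through, which is where source provenance becomes relevant.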
9
Inference Problem
DM: discover “new knowledge”; how to evaluate security risks?
Example security risks:
Prediction of sensitive information
Misuse of information
Assurance of “discovery”
10
Privacy and Sensitivity
Large volume of private (personal) data
Need:
Proper acquisition, maintenance, usage, and retention policy
Integrity verification
Control of analysis methods (aggregation may reveal sensitive data)
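The remark that aggregation may reveal sensitive data can be illustrated with a differencing attack: two aggregate queries that are individually harmless but whose difference isolates one person. The table, the names, and the `sum_salary` interface below are hypothetical, chosen only to show the effect.

```python
# Hypothetical salary table; the query interface exposes sums, never rows.
employees = [
    {"name": "Alice", "dept": "R&D", "salary": 98000},
    {"name": "Bob", "dept": "R&D", "salary": 87000},
    {"name": "Carol", "dept": "R&D", "salary": 105000},
    {"name": "Dave", "dept": "Sales", "salary": 64000},
]


def sum_salary(rows, predicate):
    """Aggregate-only interface: total salary of the rows matching predicate."""
    return sum(row["salary"] for row in rows if predicate(row))


# Query 1: total salary of the whole R&D department.
total_rd = sum_salary(employees, lambda r: r["dept"] == "R&D")

# Query 2: the same total, excluding one named individual.
total_rd_without_carol = sum_salary(
    employees, lambda r: r["dept"] == "R&D" and r["name"] != "Carol"
)

# The difference of the two aggregates is exactly Carol's salary.
inferred_salary = total_rd - total_rd_without_carol
print("inferred salary for Carol:", inferred_salary)
```

Neither query returns an individual record, yet together they disclose one. Controlling which analyses may be combined matters as much as controlling access to raw rows.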
11
Privacy
What is the difference between confidentiality and privacy?
Identity, location, activity, etc.
Anonymity vs. accountability
12
Social Network Analysis
The mapping and measuring of relationships and flows between people, groups, organizations, computers, or other information processing entities
Behavioral profiling
Note: social network signatures
User names may change; family and friends are more difficult to change
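The note on social network signatures suggests that an account can be linked to a known profile through its contacts even after the user name changes. A minimal sketch, assuming contact lists are available as sets and using Jaccard similarity as one possible (not the only) measure; the account names and contacts are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A and B| / |A or B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(contacts, known_profiles):
    """Return (profile_name, score) of the known profile most similar to contacts."""
    score, name = max(
        (jaccard(contacts, profile), name)
        for name, profile in known_profiles.items()
    )
    return name, score


# Hypothetical known profiles, keyed by an old user name.
known_profiles = {
    "jsmith": {"ann", "bob", "carl", "dina"},
    "mlee": {"eve", "frank", "gina"},
}

# A new account under a different user name, with mostly the same contacts.
new_account_contacts = {"ann", "bob", "dina", "hank"}

name, score = best_match(new_account_contacts, known_profiles)
print(f"closest known profile: {name} (similarity {score:.2f})")
```

The new account shares three of five distinct contacts with "jsmith", so the contact set acts as a signature that survives the name change.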
13
DM for Security
Large-scale data analytics
Fraud detection
Intrusion detection
Insider misuse detection
User/group/web site profiling
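Fraud detection is often framed as deviation detection: flag observations that sit far from typical behavior. The sketch below uses the modified z-score based on the median absolute deviation (MAD), with the commonly quoted cutoff of 3.5; the transaction amounts and the function name `flag_deviations` are illustrative assumptions, and real systems combine many more signals.

```python
from statistics import median


def flag_deviations(amounts, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    Modified z-score: 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation from the median.
    """
    values = list(amounts)
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        # Over half the values equal the median; the score is undefined,
        # so this sketch flags nothing.
        return []
    return [x for x in values if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical card transactions for one user, with one unusual amount.
amounts = [42.0, 38.5, 45.0, 40.0, 41.5, 39.0, 43.0, 950.0]

print("flagged amounts:", flag_deviations(amounts))
```

Median-based statistics are used here because a single extreme value inflates the mean and standard deviation, which can hide the very outlier being sought.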
14
Next Class
Continue on cloud and DM