Data Warehousing, Data Mining, Privacy
Reading
Farkas CSCE Spring 2011
Data Warehousing
  Repository of data providing organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
    –Data mart (single subject area)
    –Enterprise data warehouse (integrated data marts)
    –Metadata
OLAP Analysis
  Aggregation functions
  Factual data access
  Complex criteria
  Visualization
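As a rough illustration of the aggregation functions and complex criteria listed above, the sketch below rolls up a tiny fact table along a chosen dimension. The table contents, the column names (region, product, year, amount), and the helper roll_up are all hypothetical, invented for this example; a real OLAP engine would work on a much larger cube.

```python
from collections import defaultdict

# Hypothetical fact table: one row per sale (values invented for illustration).
SALES = [
    {"region": "East", "product": "A", "year": 2010, "amount": 100},
    {"region": "East", "product": "B", "year": 2010, "amount": 50},
    {"region": "West", "product": "A", "year": 2010, "amount": 70},
    {"region": "West", "product": "A", "year": 2011, "amount": 30},
]


def roll_up(rows, dimensions, measure, where=None):
    """Sum `measure` grouped by `dimensions`, optionally filtering rows first."""
    totals = defaultdict(int)
    for row in rows:
        if where is not None and not where(row):
            continue
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Simple aggregation: total amount per region.
by_region = roll_up(SALES, ["region"], "amount")

# A more complex criterion: only product A sold in 2010, grouped by region.
a_2010 = roll_up(
    SALES,
    ["region"],
    "amount",
    where=lambda r: r["product"] == "A" and r["year"] == 2010,
)
```

Grouping keys are tuples so that the same helper can roll up along several dimensions at once, for example ["region", "year"].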
Warehouse Evaluation
  Enterprise-wide support
  Consistency and integration across diverse domains
  Security support
  Support for operational users
  Flexible access for decision makers
Data Integration
  Data access
  Data federation
  Change capture
  Need ETL (extraction, transformation, load)
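The three ETL stages can be sketched as follows. This is a minimal illustration under stated assumptions: the two source lists, their formats (source B is assumed to use MM/DD/YYYY dates and untrimmed names), and the sales table are hypothetical. The in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical operational records from two sources with inconsistent formats.
SOURCE_A = [("alice", "2011-01-05", "100.50"), ("bob", "2011-01-06", "20")]
SOURCE_B = [("CAROL ", "01/07/2011", "75.25")]


def extract():
    """Extraction: pull raw rows from each source, tagged with their origin."""
    for row in SOURCE_A:
        yield ("A", row)
    for row in SOURCE_B:
        yield ("B", row)


def transform(source, row):
    """Transformation: normalize names, dates (ISO format), and amounts."""
    name, date, amount = row
    if source == "B":  # assumption: source B stores dates as MM/DD/YYYY
        month, day, year = date.split("/")
        date = f"{year}-{month}-{day}"
    return (name.strip().lower(), date, float(amount))


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, day TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, [transform(src, row) for src, row in extract()])
warehouse_rows = conn.execute(
    "SELECT customer, day, amount FROM sales ORDER BY day"
).fetchall()
```

After the load, every row has the same standardized shape regardless of which source it came from.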
Data Warehouse Users
  Internal users
    –Employees
    –Managerial
  External users
    –Reporting and auditing
    –Research
Data Mining
  Databases to be mined
  Knowledge to be mined
  Techniques used
  Applications supported
Data Mining Task
  Prediction Tasks
    –Use some variables to predict unknown or future values of other variables
  Description Tasks
    –Find human-interpretable patterns that describe the data
Common Tasks
  Classification [Predictive]
  Clustering [Descriptive]
  Association Rule Mining [Descriptive]
  Sequential Pattern Mining [Descriptive]
  Regression [Predictive]
  Deviation Detection [Predictive]
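To make one of the descriptive tasks concrete, the sketch below computes the two standard measures used in association rule mining, support and confidence, for a single candidate rule. The transactions and item names are hypothetical. It only evaluates a given rule and is not a full mining algorithm such as Apriori.

```python
# Hypothetical market-basket transactions (invented for illustration).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A).

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


# Evaluate the candidate rule {bread} -> {milk}.
rule_support = support({"bread", "milk"}, TRANSACTIONS)
rule_confidence = confidence({"bread"}, {"milk"}, TRANSACTIONS)
```

Here three of the five baskets contain both bread and milk, and four contain bread, so the rule has support 3/5 and confidence 3/4.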
Security for Data Warehousing
  Establish the organization's security policies and procedures
  Implement logical access control
  Restrict physical access
  Establish internal control and auditing
Security for Data Warehousing (cont.)
  Security Issues in Data Warehousing and Data Mining: Panel Discussion
  Panel discussion of Bhavani Thuraisingham, The MITRE Corporation; Linda Schlipper, The MITRE Corporation; Pierangela Samarati, SRI International; T. Y. Lin, San Jose State University; Sushil Jajodia, George Mason University; Chris Clifton, The MITRE Corporation
  xanadu.cs.sjsu.edu/~tylin/publications/paperList/109_security.ps
Integrity
  Poor quality data: inaccurate, incomplete, missing meta-data
  Source data quality vs. derived data quality
Access Control
  Layered defense:
    –Access to processes that extract operational data
    –Access to data and processes that transform operational data
    –Access to data and meta-data in the warehouse
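One way to picture the layered defense is a separate, default-deny check at each of the three access points. The sketch below is only an illustration: the layer names, role names, and grant table are hypothetical, and a real warehouse would derive the grants from the organization's policy rather than from a hard-coded dictionary.

```python
# Hypothetical layers, one per access point named in the layered defense.
LAYERS = ("extract", "transform", "warehouse")

# Hypothetical role grants (invented for illustration).
GRANTS = {
    "etl_operator": {"extract", "transform"},
    "analyst": {"warehouse"},
}


def may_access(role, layer):
    """Return True only if `role` has been granted `layer` explicitly."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    # Default deny: unknown roles get an empty grant set.
    return layer in GRANTS.get(role, set())


analyst_reads_warehouse = may_access("analyst", "warehouse")
analyst_runs_extract = may_access("analyst", "extract")
```

Checking each layer independently means that a grant on the warehouse data does not imply any access to the extraction or transformation processes.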
Access Control Issues
  Mapping from local to warehouse policies
  How to handle “new” data
  Scalability
  Identity Management
Inference Problem
  Data Mining: discover “new knowledge”; how to evaluate security risks?
  Example security risks:
    –Prediction of sensitive information
    –Misuse of information
  Assurance of “discovery”
  Interesting Read: C. C. Aggarwal and P. S. Yu, PRIVACY-PRESERVING DATA MINING: MODELS AND ALGORITHMS
Privacy
  Large volume of private (personal) data
  Need:
    –Proper acquisition, maintenance, usage, and retention policy
    –Integrity verification
    –Control of analysis methods (aggregation may reveal sensitive data)
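The remark that aggregation may reveal sensitive data can be illustrated with a simple differencing attack: two aggregate queries, each harmless on its own, are subtracted to isolate one person's value. The employee table and the total_salary interface below are hypothetical, invented for this sketch.

```python
# Hypothetical salary table (invented for illustration).
EMPLOYEES = [
    {"name": "Ann", "dept": "Sales", "salary": 50000},
    {"name": "Bob", "dept": "Sales", "salary": 60000},
    {"name": "Cy", "dept": "Sales", "salary": 90000},
]


def total_salary(rows, predicate):
    """Aggregate-only interface: returns a sum, never an individual row."""
    return sum(r["salary"] for r in rows if predicate(r))


# Query 1: total for the whole department.
all_sales = total_salary(EMPLOYEES, lambda r: r["dept"] == "Sales")

# Query 2: the same total with one person excluded.
all_but_cy = total_salary(
    EMPLOYEES, lambda r: r["dept"] == "Sales" and r["name"] != "Cy"
)

# The difference of the two aggregates is exactly Cy's salary.
inferred_cy_salary = all_sales - all_but_cy
```

This is why controlling the analysis methods matters: restricting users to aggregates does not by itself protect individual records.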
Privacy
  What is the difference between confidentiality and privacy?
  Identity, location, activity, etc.
  Anonymity vs. accountability
Legislation
  Privacy Act of 1974, U.S. Department of Justice
  Family Educational Rights and Privacy Act (FERPA), U.S. Department of Education ( dex.html )
  Health Insurance Portability and Accountability Act of 1996 (HIPAA) ( http://en.wikipedia.org/wiki/Health_Insurance_Portability_and_Accountability_Act )
  Telecommunications Consumer Privacy Act ( communications-privacy-act )
Online Social Network
  Social Relationship
  Communication context changes social relationships
  Social relationships maintained through different media grow at different rates and to different depths
  No clear consensus on which medium is best
Internet and Social Relationships
  Internet:
  Bridges distance at a low cost
  New participants tend to “like” each other more
  Less stressful than face-to-face meeting
  People focus on communicating their “selves” (except a few malicious users)
Social Network
  Description of the social structure between actors
  Connections: various levels of social familiarity, e.g., from casual acquaintance to close familial bonds
  Support online interaction and content sharing
Social Network Analysis
  The mapping and measuring of relationships and flows between people, groups, organizations, computers or other information processing entities
  Behavioral Profiling
  Note: Social Network Signatures
    –User names may change, family and friends are more difficult to change
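The note on social network signatures suggests that a set of contacts can identify a user even after a name change. A minimal sketch of that idea, assuming friend lists are available as plain sets, compares a new account's contacts against known profiles using Jaccard similarity. The account names, friend sets, and the best_match helper are hypothetical, and the slide does not prescribe this particular measure.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical known profiles and their contacts (invented for illustration).
KNOWN_PROFILES = {
    "alice_1985": {"bob", "carol", "dave", "erin"},
    "frank_x": {"gina", "hal", "bob"},
}

# A new account under a different user name, with mostly the same contacts.
NEW_ACCOUNT_FRIENDS = {"bob", "carol", "dave", "ivy"}


def best_match(friends, known_profiles):
    """Return the known profile whose contact set is most similar to `friends`."""
    return max(known_profiles, key=lambda name: jaccard(friends, known_profiles[name]))


likely_same_user = best_match(NEW_ACCOUNT_FRIENDS, KNOWN_PROFILES)
```

The new account shares three of five distinct contacts with alice_1985 but only one of six with frank_x, so the contact set alone points to the former.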
Interesting Read:
  M. Chew, D. Balfanz, B. Laurie, (Under)mining Privacy in Social Networks, oc/summary?doi=
Next
  Hippocratic Databases
Next Class
  Stream Data