Data and Applications Security Developments and Directions Dr. Bhavani Thuraisingham The University of Texas at Dallas Lecture #17 Data Warehousing, Data.


Data and Applications Security Developments and Directions
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
Lecture #17: Data Warehousing, Data Mining and Security
March 23, 2009

Outline
• Background on Data Warehousing
• Security Issues for Data Warehousing
• Data Mining and Security

What is a Data Warehouse?
• A Data Warehouse is a:
  - Subject-oriented
  - Integrated
  - Nonvolatile
  - Time-variant
  - Collection of data in support of management’s decisions
  - From: Building the Data Warehouse by W. H. Inmon, John Wiley and Sons
• Integration of heterogeneous data sources into a repository
• Summary reports, aggregate functions, etc.

Example Data Warehouse
[Figure: three source databases feeding one warehouse that users query.]
• Sources: Oracle DBMS for Employees; Sybase DBMS for Projects; Informix DBMS for Medical
• Data Warehouse: data correlating Employees with Medical Benefits and Projects
• Could be any DBMS; usually based on the relational data model
• Users query the warehouse
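To make the example concrete, here is a minimal sketch of the correlation step. It assumes the three sources have already been extracted into Python lists and that they share an `emp_id` key; the field names, the sample rows, and the `build_warehouse` helper are all hypothetical and are not taken from the lecture.

```python
# Hypothetical extracts from three independent source systems.
employees = [  # stands in for the employee DBMS
    {"emp_id": 1, "name": "Alice"},
    {"emp_id": 2, "name": "Bob"},
]
projects = [  # stands in for the project DBMS
    {"emp_id": 1, "project": "P1"},
    {"emp_id": 2, "project": "P2"},
    {"emp_id": 1, "project": "P3"},
]
medical = [  # stands in for the medical DBMS
    {"emp_id": 1, "plan": "PlanA"},
    {"emp_id": 2, "plan": "PlanB"},
]


def build_warehouse(employees, projects, medical):
    """Correlate the three sources on emp_id into one row per employee."""
    plan_by_emp = {m["emp_id"]: m["plan"] for m in medical}
    projects_by_emp = {}
    for p in projects:
        projects_by_emp.setdefault(p["emp_id"], []).append(p["project"])
    return [
        {
            "emp_id": e["emp_id"],
            "name": e["name"],
            "plan": plan_by_emp.get(e["emp_id"]),  # None if no medical record
            "projects": sorted(projects_by_emp.get(e["emp_id"], [])),
        }
        for e in employees
    ]


warehouse = build_warehouse(employees, projects, medical)
```

Each warehouse row now combines data that no single source held, which is why the security discussion later in the lecture treats the warehouse as needing its own policy.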

Some Data Warehousing Technologies
• Heterogeneous Database Integration
• Statistical Databases
• Data Modeling
• Metadata
• Access Methods and Indexing
• Language Interface
• Database Administration
• Parallel Database Management

Data Warehouse Design
• An appropriate data model is key to designing the warehouse
• Higher-level model in stages
  - Stage 1: Corporate data model
  - Stage 2: Enterprise data model
  - Stage 3: Warehouse data model
• Middle-level data model
  - A model, possibly one for each subject area in the higher-level model
• Physical data model
  - Includes features such as keys in the middle-level model
• Need to determine appropriate levels of granularity of data in order to build a good data warehouse

Distributing the Data Warehouse
• Issues similar to distributed database systems
[Figure: two configurations for a Central Bank with Branch A and Branch B. Non-distributed warehouse: a single Central Warehouse. Distributed warehouse: a Central Warehouse plus a Branch A Warehouse and a Branch B Warehouse.]

Multidimensional Data Model

Indexing for Data Warehousing
• Bit-Maps
• Multi-level indexing
• Storing parts or all of the index files in main memory
• Dynamic indexing
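As an illustration of the first item, the sketch below builds a toy bitmap index using Python integers as bit vectors. The `build_bitmap_index` and `matching_rows` helpers and the sample rows are assumptions made for this example; production warehouses use compressed bitmaps, which are not shown here.

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int used as a bit vector.

    Bit i is set when row i holds that value.
    """
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return dict(index)


def matching_rows(bitmap):
    """Return the row positions whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


rows = [
    {"region": "north", "product": "a"},
    {"region": "south", "product": "b"},
    {"region": "north", "product": "b"},
    {"region": "south", "product": "a"},
]
region_idx = build_bitmap_index(rows, "region")
product_idx = build_bitmap_index(rows, "product")

# A conjunctive predicate becomes a single bitwise AND of two bitmaps.
hits = matching_rows(region_idx["north"] & product_idx["b"])
```

Bitmaps suit warehouse queries because predicates on low-cardinality columns combine with cheap bitwise operations before any row is fetched.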

Metadata Mappings

Data Warehousing and Security
• Security for integrating the heterogeneous data sources into the repository
  - e.g., Heterogeneous Database System Security, Statistical Database Security
• Security for maintaining the warehouse
  - Query, Updates, Auditing, Administration, Metadata
• Multilevel Security
  - Multilevel Data Models, Trusted Components

Example Secure Data Warehouse

Secure Data Warehouse Technologies

Security for Integrating Heterogeneous Data Sources
• Integrating multiple security policies into a single policy for the warehouse
  - Apply techniques for federated database security?
  - Need to transform the access control rules
• Security impact on schema integration and metadata
  - Maintaining transformations and mappings
• Statistical database security
  - Inference and aggregation
  - e.g., Average salary in the warehouse could be unclassified while the individual salaries in the databases could be classified
• Administration and auditing
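The salary example can be illustrated with a small sketch of the inference problem. It shows how two permitted averages can reveal one protected salary (a differencing attack), followed by a minimum query-set-size check as one classic but incomplete control. The data, the `MIN_QUERY_SET` threshold, and the function names are hypothetical.

```python
# Hypothetical individual salaries; assume these are classified.
salaries = {"ann": 50000, "bob": 60000, "cat": 70000, "dan": 80000}


def average(names):
    """Aggregate that the warehouse is assumed to release as unclassified."""
    return sum(salaries[n] for n in names) / len(names)


# Differencing: two permitted aggregates reveal one individual value.
all_names = list(salaries)
without_dan = [n for n in all_names if n != "dan"]
inferred_dan = (average(all_names) * len(all_names)
                - average(without_dan) * len(without_dan))

MIN_QUERY_SET = 3  # hypothetical threshold


def guarded_average(names):
    """Refuse aggregates over very small groups."""
    if len(names) < MIN_QUERY_SET:
        raise PermissionError("query set too small")
    return average(names)
```

Both queries in the differencing step cover at least three people, so the size check alone would not stop them. This is one reason inference control is listed among the topics that need further investigation.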

Security Policy for the Warehouse
• Federated policies become warehouse policies?
[Figure: security policy integration and transformation. Layers shown: component policies for Components A, B and C; generic policies for Components A, B and C; export policies for Components A, B and C; federated policies for Federations F1 and F2.]

Security Policy for the Warehouse - II

Secure Data Warehouse Model

Methodology for Developing a Secure Data Warehouse

Multi-Tier Architecture
[Figure: a tiered stack shown twice, once ending in a data warehouse and once ending in a secure data warehouse.]
• Tier 1: Secure data sources
• Tier 2: Builds on Tier 1
• (intermediate tiers)
• Tier N: Data warehouse (secure data warehouse in the second stack); builds on Tier N-1
• Each layer builds on the previous layer: schemas, metadata, policies

Administration
• Roles of Database Administrators, Warehouse Administrators, Database System Security Officers, and Warehouse System Security Officers?
• When databases are updated, can a trigger mechanism be used to automatically update the warehouse?
  - i.e., will the individual database administrators permit such a mechanism?
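A trigger-based refresh might look like the following sketch. It uses Python's built-in `sqlite3` module with the source table and the warehouse summary table in the same in-memory database, which is a simplifying assumption: in the setting described above the sources are separate DBMSs, so a real trigger would have to propagate changes across systems. The table, column, and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, dept TEXT, salary REAL);

-- Hypothetical warehouse table holding a per-department summary.
CREATE TABLE wh_dept_summary (
    dept TEXT PRIMARY KEY,
    headcount INTEGER,
    total_salary REAL
);

-- Every insert into the source table refreshes the summary row.
CREATE TRIGGER employee_insert AFTER INSERT ON employee
BEGIN
    INSERT OR IGNORE INTO wh_dept_summary (dept, headcount, total_salary)
    VALUES (NEW.dept, 0, 0);
    UPDATE wh_dept_summary
       SET headcount = headcount + 1,
           total_salary = total_salary + NEW.salary
     WHERE dept = NEW.dept;
END;
""")

conn.executemany(
    "INSERT INTO employee (emp_id, dept, salary) VALUES (?, ?, ?)",
    [(1, "eng", 100.0), (2, "eng", 150.0), (3, "ops", 90.0)],
)

summary = {
    dept: (headcount, total)
    for dept, headcount, total in conn.execute(
        "SELECT dept, headcount, total_salary FROM wh_dept_summary"
    )
}
```

The trigger runs with whatever privileges the source database grants it, which is exactly the administrative question the slide raises.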

Auditing
• Should the warehouse be audited?
  - Advantages
    • Keep up-to-date information on access to the warehouse
  - Disadvantages
    • May need to keep unnecessary data in the warehouse
    • May need a lower-level granularity of data
    • May cause changes to the timing of data entry to the warehouse as well as backup and recovery restrictions
• Need to determine the relationships between auditing the warehouse and auditing the databases

Multilevel Security
• Multilevel data models
  - Extensions to the data warehouse model to support classification levels
• Trusted Components
  - How much of the warehouse should be trusted?
  - Should the transformations be trusted?
• Covert channels, inference problem
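One way to picture a multilevel extension is to attach a classification label to each warehouse row and filter reads by the user's clearance. The sketch below assumes four totally ordered levels and row-level labels; the `Level` enum, the sample rows, and `read_view` are hypothetical, and real multilevel models may also label individual attributes or elements.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical, totally ordered classification levels."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


warehouse_rows = [
    {"project": "P1", "budget": 10, "label": Level.UNCLASSIFIED},
    {"project": "P2", "budget": 40, "label": Level.SECRET},
    {"project": "P3", "budget": 25, "label": Level.CONFIDENTIAL},
]


def read_view(rows, clearance):
    """Return only the rows labeled at or below the reader's clearance."""
    return [r for r in rows if r["label"] <= clearance]


confidential_view = read_view(warehouse_rows, Level.CONFIDENTIAL)
```

Filtering of this kind does nothing about covert channels or inference from aggregates, which is why the slide lists them separately.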

Inference Controller

Status and Directions
• Commercial data warehouse vendors are incorporating role-based security (e.g., Oracle)
• Many topics need further investigation
  - Building a secure data warehouse
  - Policy integration
  - Secure data model
  - Inference control

Data Mining for Counter-terrorism

Data Mining Needs for Counterterrorism: Non-real-time Data Mining
• Gather data from multiple sources
  - Information on terrorist attacks: who, what, where, when, how
  - Personal and business data: place of birth, ethnic origin, religion, education, work history, finances, criminal record, relatives, friends and associates, travel history, ...
  - Unstructured data: newspaper articles, video clips, speeches, emails, phone records, ...
• Integrate the data, build warehouses and federations
• Develop profiles of terrorists, activities/threats
• Mine the data to extract patterns of potential terrorists and predict future activities and targets
• Find the “needle in the haystack” - suspicious needles?
• Data integrity is important
• Techniques have to SCALE

Data Mining for Non Real-time Threats
[Figure: processing flow with the following stages.]
• Data sources with information about terrorists and terrorist activities
• Integrate data sources
• Clean/modify data sources
• Build profiles of terrorists and activities
• Mine the data
• Examine results/prune results
• Report final results

Data Mining Needs for Counterterrorism: Real-time Data Mining
• Nature of data
  - Data arriving from sensors and other devices
    • Continuous data streams
  - Breaking news, video releases, satellite images
  - Some critical data may also reside in caches
• Rapidly sift through the data and discard unwanted data for later use and analysis (non-real-time data mining)
• Data mining techniques need to meet timing constraints
• Quality of service (QoS) tradeoffs among timeliness, precision and accuracy
• Presentation of results, visualization, real-time alerts and triggers
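The sifting step and its timing constraint can be sketched as follows. This is only an illustration under stated assumptions: items arrive as an iterable, a caller-supplied `score` function rates each item, and anything not raised as an alert (including items that arrive after the time budget is spent) is set aside for later non-real-time analysis. The `sift` function, its parameters, and the sample scores are hypothetical.

```python
import time


def sift(stream, score, threshold, budget_s, clock=time.monotonic):
    """Split a stream into immediate alerts and items deferred for offline mining.

    Scoring stops once budget_s seconds have elapsed; unscored items are deferred.
    """
    deadline = clock() + budget_s
    alerts, deferred = [], []
    for item in stream:
        if clock() >= deadline:
            deferred.append(item)  # out of time: leave for non-real-time analysis
        elif score(item) >= threshold:
            alerts.append(item)    # urgent enough to report right away
        else:
            deferred.append(item)
    return alerts, deferred


# Toy usage: each item is its own score, and the budget is generous.
alerts, deferred = sift(
    [0.1, 0.9, 0.4, 0.95], score=lambda x: x, threshold=0.8, budget_s=1.0
)
```

The budget makes the QoS tradeoff explicit: a shorter deadline improves timeliness but leaves more items unscored, which lowers the accuracy of the real-time alerts.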

Data Mining for Real-time Threats
[Figure: processing flow with the following stages.]
• Data sources with information about terrorists and terrorist activities
• Integrate data sources in real time
• Rapidly sift through data and discard irrelevant data
• Build real-time models
• Mine the data
• Examine results in real time
• Report final results

Data Mining Outcomes and Techniques for Counter-terrorism

Example Success Story - COPLINK
• COPLINK developed at University of Arizona
  - Research transferred to an operational system currently in use by law enforcement agencies
• What does COPLINK do?
  - Provides an integrated system for law enforcement; integrating law enforcement databases
  - If a crime occurs in one state, this information is linked to similar cases in other states
  - It has been stated that the sniper shooting case may have been solved earlier if COPLINK had been operational at that time

Where are we now?
• We have some tools for
  - building data warehouses from structured data
  - integrating structured heterogeneous databases
  - mining structured data
  - forming some links and associations
  - information retrieval tools
  - image processing and analysis
  - pattern recognition
  - video information processing
  - visualizing data
  - managing metadata

What are our challenges?
• Do the tools scale for large heterogeneous databases and petabyte-sized databases?
• Building models in real time; need training data
• Extracting metadata from unstructured data
• Mining unstructured data
• Extracting useful patterns from knowledge-directed data mining
• Rapidly forming links and associations; get the big picture for real-time data mining
• Detecting/preventing cyber attacks
• Mining the web
• Evaluating data mining algorithms
• Conducting risk analysis / economic impact
• Building testbeds

IN SUMMARY:
• Data mining is very useful for solving security problems
  - Data mining tools could be used to examine audit data and flag abnormal behavior
  - Much recent work in intrusion detection (unit #18)
    • e.g., neural networks to detect abnormal patterns
  - Tools are being examined to determine abnormal patterns for national security
    • Classification techniques, link analysis
  - Fraud detection
    • Credit cards, calling cards, identity theft, etc.
• BUT CONCERNS FOR PRIVACY
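As a small illustration of examining audit data to flag abnormal behavior, the sketch below applies a z-score rule to per-user access counts. The z-score rule is a deliberately simple stand-in for the neural-network and classification techniques mentioned above, not the method used in the lecture; the audit counts, the threshold of 2.0, and the `flag_abnormal` helper are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical audit summary: number of warehouse accesses per user in one day.
access_counts = {"u1": 10, "u2": 12, "u3": 11, "u4": 9, "u5": 13, "u6": 95}


def flag_abnormal(counts, z_threshold=2.0):
    """Return users whose count is more than z_threshold standard deviations above the mean."""
    mu = mean(counts.values())
    sigma = pstdev(counts.values())
    if sigma == 0:
        return []  # every user behaved identically; nothing stands out
    return sorted(u for u, c in counts.items() if (c - mu) / sigma > z_threshold)


flagged = flag_abnormal(access_counts)
```

A rule this crude produces false alarms and misses slow, low-volume misuse, and the audit records it reads are themselves personal data, which ties back to the privacy concern that closes the slide.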