CSG – Data Management & Governance at Berkeley Shel Waggener – CIO/AVC IT UC Berkeley
Common Data Management Challenges Data is replicated over and over again, then modified and fragmented by the many independent systems and the organizations that manage them. Security weaknesses exist as a result of both the spread of the data and the vast number of varying architectures used to collect, distribute, and manage it. The administrative and academic units that manage duplicated data (or create duplication for the purpose of local management) all assert some level of ownership over each data element, and those claims often overlap. Considerable and overlapping investments in technologies and staff time continue to be made across campus silos so that both functional and technical groups can maintain this “spaghetti architecture” of legacy, current, and emerging technologies. This represents a divergent strategy at a time when, more than ever, the campus needs greater visibility into, access to, and sharing of data within and across silos to make better decisions and work together as a campus. The opportunity here is not only for administrative data but also for scholarly information, and not only for structured but also for unstructured data.
Berkeley Data Services Four groups in support of Institutional and Scholarly Data: Data Repository Management: Supports data modeling, data warehousing, data repositories, data integration, collections and archives/media vault. Collaboration, Presentation, Analysis: Supports tools for data capture and collection, user interface tools for reporting, decision support, visualization and collaboration. Data Architecture: Facilitates the definition of architectural standards for campus Information Services and coordinates architectural planning with the CIO Office’s Architecture group. Social Sciences Computing Laboratory: Operates instructional facilities and consultative services for academic use. Operates environments to provide on-line access to large-scale collections of quantitative, structured, or image-type data. Provides specialized research services, including custom application for data collection and data management for the collection and processing of survey-type data.
Source: NSF Atkins Report on Cyberinfrastructure NSF Cyberinfrastructure View Scholarly Data
Case Study - Decision making UC Berkeley Enterprise Data Warehouse An opportunity analysis and conceptual EDW Architecture study has been performed over the last 30 months. From the inception, it was universally recognized that the core decision making structures and processes necessary to support prioritization and commitment beyond existing data were absent at a campus level. We have not made progress in developing an implementation plan for the EDW beyond tactical enhancements. This has contributed to parallel implementation efforts and investments, with separate data warehouses and reporting systems (and tools) in use across the same university community. The issues involved are complex; however, inadequate decision making and the pace of that decision making equate to missed opportunities and greater risk. Enterprise Data Warehousing and reporting are disconnected from other major campus initiatives.
Case Study – Policy and Compliance UC Berkeley Requests for data across organizational boundaries are increasing dramatically. However, we have not adequately invested in the tools, infrastructure, processes, or people needed to make the data safely available. The data itself may be sensitive, with “ownership” or stewardship issues associated with it. While we have no perceived shortage of policies and regulations, we do lack a compliance program to measure how effectively the policies are… Compliance is a “best effort” approach rather than a systematic program.
UCB Data Management Governance Status A new “Campus Technology Council” has been assembled, and general IT Governance processes are in development. The campus “Data Stewardship Council” exists as an advisory body. The 2007 target is to establish a formal Data Management Governance strategy, moving beyond an advisory role and into an operational mode with decision making abilities. Proposals under consideration, including…
UCB - DM Governance Structure Proposal
A Use Case – Restricted Data Data management for restricted, sensitive, or personally identifiable information (PII) IS different. Stolen laptops with no encryption are a constant problem. A security breach of sensitive data is a big $$$$$ loss. Most centrally managed data is the greatest target but is also the best protected. Most distributed data is a smaller target but much easier to get to. Berkeley is trying a carrot-and-stick approach.
What We Provide Simply tell us where the PII is and we… Add rules to all the scanning tools to monitor your server. Set up 7x24 response procedures for notification of any identified issues. Scan all registered applications for vulnerabilities. Provide augmented security training to administrators of those departments. Provide security tools and licenses at no cost.
What We Do if You DON’T tell us Possible approaches… Once a system is identified, give you 72 hours to get it registered before we block traffic. Once blocked, require training and penetration testing before the system is allowed back on the network.
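The escalation rule on this slide can be sketched in a few lines of Python. This is only an illustration of one of the "possible approaches", not a description of any deployed Berkeley tooling: the 72-hour window comes from the slide, while the `UnregisteredSystem` record, its field names, and the status labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Grace period taken from the slide; everything else here is an assumption.
GRACE_PERIOD = timedelta(hours=72)


@dataclass
class UnregisteredSystem:
    """Hypothetical record for a host found holding unregistered restricted data."""
    host: str
    identified_at: datetime
    registered_at: Optional[datetime] = None
    training_done: bool = False
    pen_test_passed: bool = False


def block_deadline(system: UnregisteredSystem) -> datetime:
    """Moment at which traffic would be blocked if the system is still unregistered."""
    return system.identified_at + GRACE_PERIOD


def network_status(system: UnregisteredSystem, now: datetime) -> str:
    """Return 'allowed', 'grace_period', or 'blocked' for the given moment."""
    deadline = block_deadline(system)
    # Registered inside the window: no penalty.
    if system.registered_at is not None and system.registered_at <= deadline:
        return "allowed"
    # Still inside the 72-hour window: traffic is not yet blocked.
    if now < deadline:
        return "grace_period"
    # Window missed: training and penetration testing are both required.
    if system.training_done and system.pen_test_passed:
        return "allowed"
    return "blocked"
```

Keeping the deadline in its own function makes the 72-hour figure easy to change if a different window is eventually adopted.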
Validation Thank you for using the Restricted Data Management (RDM) system at ! We appreciate your efforts to safeguard restricted university information assets by letting us know about your data systems and the machines that host them. You are receiving this because you have logged in to RDM, or have been set up as a user of the system by another member of your department. We want to follow up with all users of RDM and find out if you have experienced any difficulty or need further information or assistance. Please reply to this with any questions or comments about RDM, or if you would like to arrange an in-person meeting with us to demonstrate RDM. To ensure that your systems are being protected by SNS, please make sure that you complete the following steps: 1) Add your systems to RDM and check the appropriate data elements (if you aren't sure about any elements, just skip them for now, as you can always update this information later). 2) Let us know where the data is stored -- a local machine in your department or an IST supported service. 3) For local machines, please BE SURE TO INCLUDE AN IP ADDRESS. This is critical to the SNS monitoring systems and will result in a higher level of SNS services for your host machine. 4) Register a security plan for your system. Thanks again, and please contact us with any additional comments or concerns.
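The four registration steps in this message amount to a completeness check on each RDM entry. A minimal sketch of such a check follows, assuming a hypothetical `RdmRegistration` record; the field names and the `"local"` / `"ist_service"` labels are illustrative and do not reflect the actual RDM schema. Data elements are deliberately not enforced, since the message says they can be filled in later.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RdmRegistration:
    """Hypothetical shape of one RDM entry (not the real RDM schema)."""
    system_name: str
    data_elements: List[str] = field(default_factory=list)  # step 1, optional for now
    storage: str = "local"                # step 2: "local" or "ist_service" (assumed labels)
    ip_address: Optional[str] = None      # step 3: needed for local machines
    security_plan: Optional[str] = None   # step 4


def missing_steps(reg: RdmRegistration) -> List[str]:
    """List the registration steps that still need attention."""
    problems = []
    if reg.storage not in ("local", "ist_service"):
        problems.append("storage location must be 'local' or 'ist_service'")
    if reg.storage == "local":
        if not reg.ip_address:
            problems.append("local machines need an IP address")
        else:
            try:
                # Standard-library parser; accepts IPv4 and IPv6 literals.
                ipaddress.ip_address(reg.ip_address)
            except ValueError:
                problems.append("IP address is not valid")
    if not reg.security_plan:
        problems.append("no security plan registered")
    return problems
```

An empty list means the entry satisfies steps 2 through 4; anything else can be sent back to the department as a follow-up.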
Additional materials Refer to: UCB CIO’s Data Management Governance Proposal 2007; UCB EDW Process Architecture Presentation 2006; UCB EDW Process Architecture Report 2006.