
1 The Content Intelligence Company
Eric Rossborough
Bytes, Basics and Beyond, March 2017

2 About Us: Haystac, The Content Intelligence Company
- Privately held and self-funded
- Launched in 2014
- Headquarters in Newton, Massachusetts
- ~20 employees
Working with Engineering and Operations, we identified and classified a large set of scanned documents to address regulatory compliance requirements around key topics. We developed a County-wide solution to classify and extract data points from a large volume of scanned images as well as electronically stored information (including s).

3 Situation analysis – Some goofy math
For a sense of scale, some goofy math (just for fun). In shared files today at a large US bank:
- Estimates in PB = 10,000 TB = 10,000,000 GB = 150,000,000,000 pages of “Dark Data”
- 1 box of documents = ft high
- 150,000,000,000 pages = 75,000,000 boxes
- Bank building in Atlanta = 310 ft = 372 boxes
- 10 PB = 75,000,000 boxes = 201,612 bank towers = 31 miles
- Distance from the surface of the earth to the stratosphere = 30 miles
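Taking the slide's figures at face value, the chain can be re-derived in a few lines of Python. This is a sketch for checking the arithmetic only, assuming decimal units (1 PB = 10^15 bytes); it computes the box height implied by "310 ft = 372 boxes" rather than supplying the value missing from the slide.

```python
# Re-derive the slide's "goofy math" from its own figures.
# Assumption: decimal units (1 PB = 10**15 bytes).

TOTAL_PB = 10                    # "10PB" on the slide
TOTAL_PAGES = 150_000_000_000    # pages of "Dark Data"
TOTAL_BOXES = 75_000_000         # boxes needed for those pages
TOWER_FT = 310                   # bank building in Atlanta, feet
BOXES_PER_TOWER = 372            # boxes stacked to the tower's height
FT_PER_MILE = 5_280


def goofy_math() -> dict[str, float]:
    bytes_per_page = TOTAL_PB * 10**15 / TOTAL_PAGES
    pages_per_box = TOTAL_PAGES / TOTAL_BOXES
    # Box height implied by "310 ft = 372 boxes", converted to inches.
    box_height_in = TOWER_FT / BOXES_PER_TOWER * 12
    # Whole towers, truncated as on the slide.
    towers = TOTAL_BOXES // BOXES_PER_TOWER
    # Height of all boxes stacked end to end, in miles.
    stack_miles = TOTAL_BOXES * TOWER_FT / BOXES_PER_TOWER / FT_PER_MILE
    return {
        "bytes_per_page": bytes_per_page,
        "pages_per_box": pages_per_box,
        "box_height_in": box_height_in,
        "towers": towers,
        "stack_miles": stack_miles,
    }


for name, value in goofy_math().items():
    print(f"{name}: {value:,.1f}")
```

The figures are mutually consistent for pages per box (2,000), box height (about 10 inches) and tower count (201,612), but stacking 75,000,000 boxes of that height end to end gives roughly 11,800 miles, so the 31-mile total does not follow from the slide's other numbers and is worth re-checking before reuse.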

4 Why Content Analytics
- Reduce information security risk
  - Reduce the potential hack “strike zone”: PII, PCI, HCI, etc.
  - Confidential or restricted content
- Lower storage management costs
  - Reliably identify relevant vs. Redundant, Obsolete, and Trivial (ROT) content
- Improve accuracy and speed of content searches
  - Consistently apply best practices for Information Governance
  - Minimize end-user impact on content indexing
  - Eliminate ROT data
- Accelerate document-based business processes
  - Commercial/retail loan origination
  - Forensic accounting
- Dynamically classify content according to business value and events
  - Mergers and acquisitions
  - Litigation and e-discovery
  - Audits
- Report on content for advanced analytics

5 Cross-Industry Use Cases
- Storage management and legacy information cleanup
- IT cost reduction
- Information governance
- Corporate and regulatory compliance
- Information security
- Sensitive PII/PCI/PHI content identification and remediation
- Content retention/disposition
- Data monetization
- Data migration
- Litigation and e-discovery acceleration
- Process improvement initiatives
- Mergers and acquisitions
- Document analytics
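Identifying PCI content commonly starts with pattern matching plus a checksum. Below is a minimal Python sketch of that heuristic, assuming card numbers of 13 to 19 digits optionally separated by single spaces or hyphens; the pattern and function names are hypothetical and are not taken from Indāgō.

```python
import re

# Hypothetical pattern: 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit-only candidates that look like card numbers and pass Luhn."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits


print(find_card_numbers("Card 4111 1111 1111 1111 on file; ref 1234567890123"))
```

A Luhn pass only means a digit string could be a card number; production scanners typically add issuer-prefix checks and nearby keywords to reduce false positives.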

6 Cross-Industry Use Cases - Examples
Large US Bank – Cost reduction and information security
- 25 PB of content in file shares; $100M/year expenditure and growing
- 6 versions of FileNet, SharePoint: expensive to maintain, poor user value
- Large stream of digitized paper coming from the business (retail banking in particular)
Large Electrical Utility – Information security and governance
- 6 PB of content in OpenText Content Server + x PB in file shares
- Under corporate mandate to universally develop and apply retention and disposition policies
Integrated Oil and Gas – Acquisition (data migration)
- Mandate to migrate from ECM (OpenText, SharePoint) and file shares to corporate ECM (Documentum)
Large Canadian Bank – Migration and governance
- Over 3,000 applications running on Notes
- Corporate mandate to migrate content to corporate ECM (Documentum)
- Over 5 PB of content derived from acquisitions
- Unknown value and risk of content
- Large volume of PST files

7 Our discussion today – Large US Bank
Historically, long-term storage of XXXX’s information assets (data and e-documents) has supported an environment where structured and unstructured information is over-retained and disposed of infrequently and inconsistently.
- User-created records can be stored anywhere
- Little or no retention or lifecycle governance (value vs. risk)
- Poor findability in search
- Not always secure: can contain PHI, PII or other sensitive information
- Increased cost for e-discovery, storage, and backups
- Increased RISK!

8 Current Unstructured Content Environment
Problem Statement
Using Indāgō Content Analytics, we crawled and indexed all NAS drives, both personal and shared, and presented our findings from a high-level review of the primary NAS storage environments:
- Surfaced a current storage size of ~1.9 PB and corresponding managed storage costs of ~$18MM/yr.
- Initial estimates suggest that operationalizing the disposal of unnecessary data could reduce storage expenses by ~$10MM in year one (with organic growth / ROT reduction assumptions).
- To validate the size of the opportunity, the storage environments need to be interrogated and the business benefits of disposing of “ROT” data (Redundant, Obsolete, Transient), as defined by corporate policies, need to be quantified.
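The figures above imply a unit storage cost, which is a handy sanity check when sizing savings. A minimal Python sketch, assuming decimal units (1.9 PB = 1,900 TB) and that cost scales linearly with volume; both are assumptions, not statements from the deck.

```python
STORAGE_TB = 1_900                  # ~1.9 PB, decimal units assumed
ANNUAL_COST_USD = 18_000_000        # ~$18MM/yr managed storage cost
YEAR_ONE_SAVINGS_USD = 10_000_000   # ~$10MM estimated year-one reduction


def implied_unit_economics(
    storage_tb: float = STORAGE_TB,
    annual_cost: float = ANNUAL_COST_USD,
    savings: float = YEAR_ONE_SAVINGS_USD,
) -> tuple[float, float, float]:
    """Return (cost per TB per year, savings as a share of cost, TB to dispose)."""
    cost_per_tb = annual_cost / storage_tb
    savings_share = savings / annual_cost
    # Under the linear-cost assumption: volume that must go to save `savings`.
    tb_to_dispose = savings / cost_per_tb
    return cost_per_tb, savings_share, tb_to_dispose


cost_per_tb, share, tb = implied_unit_economics()
print(f"${cost_per_tb:,.0f}/TB/yr, {share:.1%} of cost, {tb:,.0f} TB")
```

Read this way, ~$18MM/yr works out to roughly $9,500 per TB per year, and the ~$10MM year-one figure corresponds to disposing of about 1,056 TB, a little over half the environment.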

9 High Level Findings Environment ROT Summary – 6/08/2016

10 Business Case: Unstructured Cleansing*
1) Prohibited File Types: 7%
   - Review file list
   - Haystac to ID data
   - Mitigate
2) Non-Accountable Data: 8%
   - Abandoned home shares
   - Orphaned home shares
   - N/A data in common shares
3) Aging Data: 25%
   - Home share data 2+ years (85 TB)
   - Records past retention
   - Common shares or shared drives
4) Duplicate Data: 10%
   - Home shares
   - Shared drives
   - Across the enterprise
Financial Impact
- Using the same data that was originally presented: $10M/y × 50% = $5M/y
- Using a 40% New Data Growth Rate (NDGR): $25M saved over 5 years
* Very conservative percentages with Content Analytics
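The 50% on this slide is the sum of the four category percentages (7 + 8 + 25 + 10). The slide does not spell out how the 40% NDGR enters the five-year figure; $25M equals a flat $5M/y over five years, so this Python sketch computes both a flat total and a compounding variant, the latter being an assumption about how growth might be applied.

```python
# Reproduce the slide's financial impact from its category percentages.
# Percentages are kept as integers to avoid floating-point rounding.

CATEGORY_PCT = {
    "Prohibited file types": 7,
    "Non-accountable data": 8,
    "Aging data": 25,
    "Duplicate data": 10,
}
BASELINE_USD_PER_YEAR = 10_000_000  # "$10M/y", the originally presented figure
YEARS = 5


def annual_savings() -> int:
    """Baseline multiplied by the summed category percentages."""
    return BASELINE_USD_PER_YEAR * sum(CATEGORY_PCT.values()) // 100


def total_flat(years: int = YEARS) -> int:
    """Savings held constant every year."""
    return annual_savings() * years


def total_with_growth(rate: float = 0.40, years: int = YEARS) -> float:
    """Hypothetical variant: savings grow each year at the data growth rate."""
    return sum(annual_savings() * (1 + rate) ** year for year in range(years))


print(sum(CATEGORY_PCT.values()), annual_savings(), total_flat(), round(total_with_growth()))
```

The flat total matches the slide's $25M; if savings instead tracked 40% annual data growth, the same percentages would give roughly $54.7M, so $25M is the lower of the two readings.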

11 What is Haystac Indāgō?
Comprehensive and scalable Content Analytics
- Machine learning and Visual Content Intelligence
- Searches, crawls, profiles and clusters unstructured data repositories
  - File shares, , Google Drive, Enterprise Content Management (ECM), SharePoint, Office365, etc.
- Identifies ROT and sensitive data
- Automatically profiles and clusters relevant data
- Manages content and metadata in place within ECM
  - Connectors to FileNet, Documentum, OpenText Content Server, SharePoint, Google Drive, etc.
- 600+ file types, including scanned and PDF documents
- Dynamically applies a known or derived classification model
- Applies visual classification to scanned and PDF content
- Applies retention policies to content
- Automatically extracts data points from content
  - Auto-indexes electronic documents
  - Targeted OCR (visual anchor) for scanned and PDF documents
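To make the profiling step concrete, here is a minimal Python sketch of three checks that correspond to the Prohibited File Types, Aging Data and Duplicate Data categories in the business case: walk a file share, flag prohibited extensions, flag files not modified in over two years, and group byte-identical files by SHA-256. The extension list and the use of modification time as the age signal are hypothetical choices; the deck does not describe how Indāgō performs these checks.

```python
import hashlib
import os
import time
from collections import defaultdict
from pathlib import Path
from typing import Optional

# Hypothetical policy values for illustration; real values come from corporate policy.
PROHIBITED_EXTENSIONS = {".exe", ".mp3", ".mp4"}
AGING_SECONDS = 2 * 365 * 24 * 60 * 60  # "2+ years", ignoring leap days


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def profile_share(root: str, now: Optional[float] = None) -> dict:
    """Flag prohibited types, aging files and byte-identical duplicates under root."""
    now = time.time() if now is None else now
    prohibited, aging = [], []
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() in PROHIBITED_EXTENSIONS:
                prohibited.append(str(path))
            # Last-modified time is used as the age signal (an assumption).
            if now - path.stat().st_mtime > AGING_SECONDS:
                aging.append(str(path))
            by_hash[sha256_of(path)].append(str(path))
    duplicates = sorted(sorted(paths) for paths in by_hash.values() if len(paths) > 1)
    return {"prohibited": sorted(prohibited), "aging": sorted(aging), "duplicates": duplicates}
```

Hashing every file is the expensive part; a common refinement is to hash only files whose sizes collide, since files of different sizes cannot be byte-identical.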

12 What is Move to Manage – Process
© 2012 Capgemini – All rights reserved
- Identifies ROT, non-records, dups and near-dups to reduce the volume of content to be moved
- Tags sensitive content
- Leverages existing protocols, connectors and accelerators
[Diagram, steps 1 and 2: File Share and FileNet sources; ROT, Non-records, Dups; Records; SharePoint]
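The slide does not say how near-duplicates are detected. One widely used technique is word shingling with Jaccard similarity; the Python sketch below illustrates it, with the shingle size k = 3 and the 0.8 threshold chosen arbitrarily for illustration rather than taken from the source.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict[str, str], threshold: float = 0.8, k: int = 3) -> list[tuple[str, str, float]]:
    """Return document pairs whose shingle sets meet the similarity threshold."""
    sets = {name: shingles(text, k) for name, text in docs.items()}
    pairs = []
    for x, y in combinations(sorted(sets), 2):
        score = jaccard(sets[x], sets[y])
        if score >= threshold:
            pairs.append((x, y, round(score, 3)))
    return pairs
```

Comparing every pair is quadratic in the number of documents; at file-share scale, MinHash with locality-sensitive hashing is the usual way to find candidate pairs without an all-pairs comparison.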

13 What is Manage in Place – Process
© 2012 Capgemini – All rights reserved
- Crawl-based inventory of content and metadata
  - Google Drive, AODocs
  - Likely daily syndication
  - Published reports of metadata updates
- Haystac Indāgō integrates with key ECM systems and classifies content, providing decision support
- Disposition or management of content will happen at the system of record
- System of management responsible for CRUD actions (Create, Read, Update, Delete)
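A crawl-based inventory with daily syndication and published reports of metadata updates can be pictured as diffing two successive crawl snapshots. A minimal Python sketch, assuming each snapshot is a mapping from a document ID to its metadata fields; the class and function names are hypothetical and do not reflect Indāgō's data model. As on the slide, the report only supports decisions; any disposition would still be carried out in the system of record.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataReport:
    """Hypothetical report of changes between two crawl snapshots."""
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    # doc_id -> {field_name: (old_value, new_value)}
    changed: dict[str, dict[str, tuple]] = field(default_factory=dict)


def diff_inventories(previous: dict[str, dict], current: dict[str, dict]) -> MetadataReport:
    """Compare yesterday's and today's inventories, document by document."""
    report = MetadataReport()
    report.added = sorted(current.keys() - previous.keys())
    report.removed = sorted(previous.keys() - current.keys())
    for doc_id in sorted(previous.keys() & current.keys()):
        old, new = previous[doc_id], current[doc_id]
        deltas = {
            key: (old.get(key), new.get(key))
            for key in sorted(old.keys() | new.keys())
            if old.get(key) != new.get(key)
        }
        if deltas:
            report.changed[doc_id] = deltas
    return report
```

Keeping the diff read-only fits the split described on the slide: the inventory reports what changed, and the system of management performs any create, update or delete.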

14 Unleash the Power of Content
Understand, Classify, Act
Director of Sales: Eric Rossborough

