Published by Charla Cathleen Burns. Modified over 6 years ago.
1
Unstructured Data Is Costing You More Than You Think
The High Cost of Finding Information & How to Fix It
Tuesday, January 17, :00 pm ET
Sandra Serkes, President & CEO, Valora Technologies, Inc.
Eugenia Brumm, Ph.D., CRM, FAI, HBR Consulting
2
Before we begin, let’s play a little game…
3
How much are you worth to your organization every minute?
Write down your annual salary (include any bonus, commission, etc.).
Multiply that number by 1.3.
Divide by 52.
Divide by 40.
Divide by 60 and hold onto this number.
Example at a $50,000 annual salary: $65,000, then $1,250 per week, then $31.25 per hour, then $0.52 per minute <- your worth per minute
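The steps on this slide can be sketched as a small function, so you can plug in your own salary. The function name and parameter names are illustrative, not from the presentation; the 1.3 multiplier, 52 weeks, and 40 hours come from the slide.

```python
def worth_per_minute(annual_salary, overhead_factor=1.3,
                     weeks_per_year=52, hours_per_week=40):
    """Follow the slide's steps: load the salary, then divide down to one minute."""
    loaded_salary = annual_salary * overhead_factor   # $50,000 -> $65,000
    per_week = loaded_salary / weeks_per_year         # -> $1,250
    per_hour = per_week / hours_per_week              # -> $31.25
    return per_hour / 60                              # -> about $0.52


if __name__ == "__main__":
    print(f"${worth_per_minute(50_000):.2f} per minute")
```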
4
How long to find the green belt?
Did you find it? How about now? Welcome to content classification! Link to Stopwatch app
5
What was the cost of your inability to find the belt?
At $0.52 per minute, and ~ 1 minute 30 seconds to find the belt, you cost your organization about $0.78 just to perform this task!
How many times a day are you actively searching for a file?
How long does it take you to find it?
Are you asking others to assist?
Do you sometimes just give up and recreate the content?
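A minimal sketch of the arithmetic, so you can plug in your own search times. At $0.52 per minute, 90 seconds costs about $0.78 and three minutes about $1.56. The function names and the example of ten searches per day are assumptions for illustration.

```python
def search_cost(seconds, rate_per_minute=0.52):
    """Cost of one search that takes `seconds`, at a per-minute rate."""
    return seconds / 60 * rate_per_minute


def daily_search_cost(searches_per_day, seconds_per_search, rate_per_minute=0.52):
    """Cost of repeating the same kind of search several times a day."""
    return searches_per_day * search_cost(seconds_per_search, rate_per_minute)


if __name__ == "__main__":
    print(f"One 90-second search: ${search_cost(90):.2f}")
    # Hypothetical workload: ten such searches per day.
    print(f"Ten searches a day:   ${daily_search_cost(10, 90):.2f}")
```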
6
Important Metrics
An engineer’s time spent searching for information has increased 13% since 2002.
Workers take up to 8 searches to find the right document and information.
McKinsey report: employees spend 1.8 hours every day (9.3 hours per week, on average) searching and gathering information.
19.8% of business time, the equivalent of one day per working week, is wasted by employees searching for information to do their job effectively.
59% of 1,000 middle managers say they miss important information almost every day because it exists within the company but they cannot find it.
The average information worker spends 2 hours per day searching for information, plus 1.5 hours in meetings, and another hour lost scheduling meetings. We waste another 2 hours reading and sending emails. Adding these numbers (2 + 1.5 + 1 + 2), we end up with 6.5 hours per day lost to planning and talking rather than actually doing our job.
7
High Cost of Not Finding Information
Knowledge workers spend from 15% to 35% of their time searching for information. (IDC 2004, 2009, Delphi Group 2011, Working Council of CIOs, Ford, AIIM)
Knowledge workers spend more time recreating existing information than they do creating new information.
Some studies indicate that 90% of the time, knowledge workers are recreating information that already exists. (IDC, “High Cost of Not Finding Information”, 2004)
Only 21% of respondents said they found the information they needed 85% to 100% of the time. (IDC 2001)
At $0.52 per minute, 25% of your time (120 minutes of an 8-hour day) = $62.40 every single day! That’s $16,473 per person per year!
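The daily and annual figures can be reproduced with a short calculation. This is a sketch under stated assumptions: an 8-hour day, and a working-day count that the slide does not give. The slide's $16,473 corresponds to roughly 264 working days; 260 days (52 weeks of 5 days) gives $16,224.

```python
RATE_PER_MINUTE = 0.52  # from the earlier slide, at a $50,000 salary


def daily_cost(share_of_time=0.25, hours_per_day=8, rate=RATE_PER_MINUTE):
    """Cost per day of spending `share_of_time` of the workday searching."""
    minutes_searching = share_of_time * hours_per_day * 60  # 25% of 8 h = 120 min
    return minutes_searching * rate


def annual_cost(working_days=260, **kwargs):
    """Annual cost; `working_days` is an assumption, not stated on the slide."""
    return daily_cost(**kwargs) * working_days


if __name__ == "__main__":
    print(f"Per day:            ${daily_cost():,.2f}")
    print(f"Per year, 260 days: ${annual_cost(260):,.0f}")
    print(f"Per year, 264 days: ${annual_cost(264):,.0f}")
```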
8
Welcome to the notion of Findability & the cost of “Dark Data”
To get a complete picture of the business of an organization, you must combine structured database information with unstructured content. Information is scattered in multiple repositories and databases all over most organizations. No one knows what exists or where it is, and there is no single unified access to it. There are collections of information everywhere and not knowing what the organization knows has become a major barrier to conducting business, keeping customers, avoiding risks, and growing the company.
9
A Real-World Example
Large pharmaceutical organization of over 2,300 knowledge workers (218 R&D scientists)
Average salary = $130,000 including benefits
Time spent looking for and not finding information costs $6 to $11 million/yr. This does not include opportunity costs or costs of reworking information that exists but can’t be located.
Cost of reworking information because it can’t be found: an additional $4,250,000/yr. (15% of time spent duplicating existing information)
Not locating and retrieving information has an opportunity cost of $10 to $15 million annually
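The slide does not show how these figures were derived. One reading that reproduces the rework figure, and it is only an assumption, applies the 15% duplication share to the 218 R&D scientists at the $130,000 average salary. The helper name below is illustrative.

```python
def time_cost(headcount, avg_loaded_salary, share_of_time):
    """Annual cost of a share of working time across a group of employees."""
    return headcount * avg_loaded_salary * share_of_time


if __name__ == "__main__":
    # Assumption: the rework cost is computed over the R&D scientists only.
    rework = time_cost(headcount=218, avg_loaded_salary=130_000, share_of_time=0.15)
    print(f"Estimated rework cost: ${rework:,.0f} per year")
```

Under that assumption the result is about $4.25 million per year, in line with the slide's rework figure.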
10
No Metrics Available
We cannot measure the increase in creativity and original thinking that might be unleashed if knowledge workers had more time to think and were not frustrated looking for and not finding information.
There is no metric we can use to compare the value of a good decision to a bad one.
How do we know that a project has taken twice as long as it should have for lack of good access to information?
11
Outdated Productivity Measures
Although we live in an information age, we still draw measures of productivity from studies of industrial workers making tangible products at the turn of the last century.
Current productivity measures only capture the process and structure that is a part of an organization’s work: physical counts off production lines, and the time spent to produce those items.
We do not capture performance-improvement factors such as making higher quality decisions, better products, faster response times and accelerated business innovation.
12
We Now Live in a Sea of Information
The typical large corporation has: email servers, shared file servers, archive servers, and loose desktop, mobile & social media files
Client 1: 30 TB of stored email
Client 2: 125,000 “obsolete” files
Client 3: 400,000 contracts
13
Litigation & eDiscovery
Whose Job Is It to Solve These Data, Document & Content (DDC) Convergence Problems?
Records Management
Business Units
Information Governance
Corporate Legal
Litigation & eDiscovery
14
Corporate Legal is Taking Control
Risk and exposure have reached epic proportions
Their eDiscovery training sets them up well to take control of a larger document picture
Records and Information Management (RIM) now reports up through Legal & encompasses paper and electronic records
RIM is being supplanted by Information Governance
IT is only concerned with functionality, not exposure
To save costs and gain power, corporations have brought legal counsel inside their organizations
15
Emerging Best Practices
Centralized DDC management
DMS/ECM platforms
Cloud-based storage (and applications)
Information Governance as umbrella organization over Compliance, Litigation, Security, Records
Policy-based management
Automation of Information Gathering & Analysis
AutoClassification
De-duplication
Crawls & Processing in place
Tools & ownership of process
PowerHouse
16
Understanding AutoClassification
Custom-configured software with recognition algorithms for:
  Document Type
  Content analytics
  Indexing & Tagging
  Recommended locations & naming
“Middleware” that sits between current file locations (includes Archive) and intended file locations (includes deletion), providing an intelligent filter and metadata creation/enhancement for all content.
AutoClassification does not rely on people to create their own metadata. Instead, a set of linguistic and pattern-matching rules creates the classification content (metadata) and schema (storage hierarchy).
AutoClassification is a much better option for:
  Large volumes of data
  Consistent labelling and file storage
  People or circumstances where manual metadata creation is tedious, wasteful, time-consuming or impossible
  Sensitive information
  IG & Records scenarios with high scrutiny (litigation, investigatory, regulatory, etc.)
  Cost reduction
Valora’s AutoClassification engine is called
17
Metadata is the magic link…
… that brings disparate or “blind” content together
KM, DM, and RM are all variations on the same theme
Storage of document content for different purposes
No reason for 3 separate content stores
Instead, have one content store for multiple purposes using pointers
The utensil analogy
18
How does AutoClassification Work?
Processing (aka Intake) is the process of “ingesting” data into an analytics engine
  Creating OCR for scanned images
  Extracting text for native files & Speech to text for audio/video files
  Translating content to English
  Re-ordering or re-aligning pages
  Applying redactions
Tagging (aka Coding, Indexing, Sequencing) is the process of extracting key information and attributes about each document
  Document Type, Important Dates
  Key Names & Phrases
  Topics, Keywords & Themes
  File, Content and DocType attributes
  Relation to other documents (duplicate, related, attached, contradictory, etc.)
Disposition (rules) is the process of creating a destination or status for each document
  Retention status & duration
  Folder (taxonomy) location
  Labelling & keywords
(Diagram labels: display, native, text, fielded data, disposition)
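The three stages can be sketched as a tiny rule-based pipeline. This is an illustrative toy, not Valora's implementation: the rule patterns, folder names, and retention periods below are all hypothetical, and the processing step only normalizes whitespace where a real system would run OCR or text extraction.

```python
import re
from dataclasses import dataclass, field

# Hypothetical document-type rules, checked in order: (label, pattern).
DOC_TYPE_RULES = [
    ("Patent Application", re.compile(r"\bpatent application\b", re.I)),
    ("Contract", re.compile(r"\b(agreement|contract)\b", re.I)),
    ("Email", re.compile(r"^(from|to|subject):", re.I | re.M)),
]

# Hypothetical disposition table: doc type -> (folder, retention in years).
DISPOSITION_RULES = {
    "Patent Application": ("IP Matters", 20),
    "Contract": ("Contracts", 7),
    "Email": ("Correspondence", 3),
}

DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


@dataclass
class Document:
    name: str
    text: str
    tags: dict = field(default_factory=dict)


def process(raw):
    """Processing: stand-in for OCR/text extraction; strips blank lines and padding."""
    return "\n".join(line.strip() for line in raw.splitlines() if line.strip())


def tag(doc):
    """Tagging: derive a document type and dates from pattern-matching rules."""
    doc.tags["doc_type"] = next(
        (label for label, pattern in DOC_TYPE_RULES if pattern.search(doc.text)),
        "Unknown",
    )
    doc.tags["dates"] = DATE_PATTERN.findall(doc.text)
    return doc


def dispose(doc):
    """Disposition: assign a folder and retention period from the doc type."""
    folder, years = DISPOSITION_RULES.get(doc.tags["doc_type"], ("Exceptions", None))
    doc.tags["folder"] = folder
    doc.tags["retention_years"] = years
    return doc


def classify(name, raw):
    return dispose(tag(Document(name, process(raw))))


if __name__ == "__main__":
    doc = classify("scan_001.txt", "  Patent Application \n\n Filed 10/18/2007 ")
    print(doc.tags)
```

Documents that match no rule fall through to an "Exceptions" folder, so unclassified content is surfaced for review rather than silently filed.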
19
AutoClassifying an email with data mining (analytics)
Author
Doc Type & Implied attachment range
Matter indicator & validation
Author Validation & Contact Info
Implied matter: Passaro ( )
20
AutoClassifying an attachment (patent application)
Date Format = US
DocType = Patent Application
Date = 10/18/2007
Author = Patent Authors, Author City, Author Country
Assignee = RIM
Tone = Neutral to slightly positive
Embedded Graphic with Title
Other Data
Capturable Data Elements:
  Patent Number
  Filing Date
  Key Phrases & Terms
  Managing PTO
  Implied/Attached Docs
  Bar Code Present
  And many more . . .
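The "Date Format = US" attribute matters because a string like 10/18/2007 has to be read as month/day/year before it can be stored as fielded data. A minimal sketch of that one step, using only the Python standard library; the function name is illustrative, and a real engine would handle many more date formats.

```python
import re
from datetime import datetime

US_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def extract_us_date(text):
    """Return the first MM/DD/YYYY date in `text` as a date object, or None."""
    match = US_DATE.search(text)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(0), "%m/%d/%Y").date()
    except ValueError:
        # Looks like a date but is not valid as month/day/year (e.g. 18/10/2007).
        return None


if __name__ == "__main__":
    found = extract_us_date("Patent Application filed 10/18/2007")
    print(found.isoformat() if found else "no US-format date found")
```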
21
10 Use Cases for AutoClassification with PowerHouse
1. Migration (fix metadata and general clean-up/removal as part of effort to move data from one location to another)
2. Remediation (fixing bad/no metadata, overall file organization)
3. Upgrade (same idea as above, but due to some system or platform upgrade)
4. Compliance/Monitoring (needing a filter on communications/docs to monitor for topics, phrases, PII, etc.)
5. Security (identify, protect, and suppress sensitive data)
6. Orphaned Data (email/files left over from departed staff, acquired or divested departments, old records)
7. Conversion (converting paper files to electronic/imaging, and adding additional smarts to that activity: unitization, coding, etc.)
8. Litigation/Investigation (analyzing files for responsive/relevant content, attempting to find patterns of behavior)
9. Repository/Knowledge Management (store documents by topic, keywords and content)
10. Workflow (bring documents & files into a corporate workflow via portal, bounce, server crawl, etc.)
22
Intake – PowerHouse – Output
PH Web Portal Hosted Repository OCR/Text Extraction Translation/Transcription Unitization Coding/Tagging Rules/Disposition Redaction Exceptions Shared Server Poll Folder Taxonomy Transatlantic Corp. 2016 IP Matters CX4 redesign Caro Prototype COI Database Billing System Docket Tracking
23
PowerHouse™ & BlackCat™ Architecture
BlackCat Presentation Layer
PowerHouse Platform Layer
  PH AutoProcessors (Pattern-Matching Algorithms & Rules)
  PH Quality Control User Interface (QCUI)
  PH Admin Console (Admin)
SQL Server Database Layer
24
PowerHouse Architecture With BlackCat
Administration Console PowerHouse Configuration Editor PowerHouse Quality Control BlackCat End-users User Interface SharePoint/O365 SharePoint iManage OpenText Intake Transfer Agent Export Transfer Agent Relativity Exchange BlackCat Data Transfer Filesystem PowerHouse Processing Components PowerHouse Controller BlackCat Controller BlackCat Web Server Processing BlackCat Document DataBase BlackCat Filesystem Cache PH Tracking DataBase PH Temporary Filesystem Storage
25
BlackCat Screen Shots
26
Things to Take Away
It’s not all about searching. It’s really about content organization.
With file AutoClassification you will
27
About Valora Technologies
Bedford, MA software firm specializing in machine-assisted document processing capabilities (aka analytics)
World experts in the automated analysis, indexing, mining and presentation of documents, data & content
20 staff, 200+ clients, 1,500,000+ pages every week
Customers: corporate legal departments, government agencies, and their professional advisory colleagues (law firms & consultancies)
Target market: those who wish to harness and profit from the 2.5 quintillion bytes of document & content data being created each day, aka “Big Data”
Objective: to overtake traditional information repository creation (manual data entry), management, analysis (search, review) and workflow (retention, production, routing) with high quality, low cost, scalable technology & best practices in analytics.
Provide cost competitive document analytics solutions in the United States
Provide efficient, world-class, targeted solutions to data, document & content utilization problems
“The power of Big Data is the story about the ability to compete and win with few resources and limited dollars” Forbes, March 2012 (this is Valora’s story, too)
28
About HBR Consulting
HBR Consulting LLC provides independent and objective records and information management consulting services tailored to address the increasingly complex and demanding regulatory and technological challenges of today’s information management environment.
HBR designs, develops and helps implement all aspects of effective Information Governance Programs, such as:
  Records Retention Schedules: domestic, international and global
  Remediation of legacy electronic and hard copy records
  Information Governance Program Development: policies, procedures
  Digital Workspace: management, ECM systems, O365
  Privacy, Protection: assessment, policies, procedures, implementation
HBR has an outstanding reputation for developing innovative and creative solutions that are practical and sustainable.
HBR partners with its clients to help them achieve their compliance and governance objectives through its strategic vision, in-depth knowledge of records and information management, and a total commitment to excellence.
29
Valora Technologies, Inc.
Thank You!
For More Information:
Valora Technologies, Inc., 101 Great Road, Suite 220, Bedford, MA 01730
HBR Consulting, One Financial Place, 440 South LaSalle Street, Chicago, IL 60605