1
AutoClassification 101 What is AutoClassification and Why Should I Care?
Wednesday, November 15, 2017, 1:00pm Eastern
Housekeeping: (Elise?)
  All attendees are on mute. Use the chat box if you need to reach the moderators.
  We will hold Q&A at the end, but if you have a question at any time, type it into the question box as it comes up and we’ll get to as many as we can.
  We are recording this webinar. At the conclusion, we will send the link to the recording, a copy of the slides, and a feedback survey link to all registrants.
  Thank you for attending!
(Hand off to Julie)
2
Facilitators
Sandra E. Serkes, President & CEO, Valora Technologies, Inc.
Julie J. Colgan, CRM, IGP, Sr. Dir. Strategy & Innovation, DTI Global
(Julie) Introductions: Julie and Sandy introduce themselves with enough fodder to establish credibility as SMEs.
3
We Live in a Sea of Information
The typical large corporation has: servers, shared file servers, archive servers, loose desktop, mobile & social media files
Client 1: 30 TB of stored
Client 2: 125,000 “obsolete” files
Client 3: 400,000 contracts
(Julie) Define the problem and how we got here.
4
High Cost of Not Finding Information
Knowledge workers spend from 15% to 35% of their time searching for information. (IDC 2004, 2009; Delphi Group 2011; Working Council of CIOs; Ford; AIIM)
Knowledge workers spend more time recreating existing information than they do creating new information. Some studies indicate that 90% of the time, knowledge workers are recreating information that already exists. (IDC, “High Cost of Not Finding Information”, 2004)
Only 21% of respondents said they found the information they needed 85% to 100% of the time. (IDC 2001)
(Julie) Briefly touch on the risks of not fixing the problem:
  Inefficient operations
  Non-compliance (laws, regs, outside counsel guidelines)
  Legal exposure
  Exacerbates breach response and potential for serious reputational damage and costs
At $0.52 per minute x 25% of your time = $62.40 every single day! That’s $16,473 per person per year!
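The cost figures on this slide can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the slide gives the $0.52 per minute rate and the 25% share, but the 8-hour workday and the 264 working days per year (22 days x 12 months) are assumptions chosen because they match the quoted totals.

```python
# Figures taken from the slide.
COST_PER_MINUTE = 0.52   # dollars per minute of a knowledge worker's time
SEARCH_SHARE = 0.25      # share of working time spent searching

# Assumptions (not stated on the slide): they happen to reproduce its totals.
MINUTES_PER_DAY = 8 * 60   # assumed 8-hour workday
WORKDAYS_PER_YEAR = 264    # assumed 22 working days x 12 months


def daily_search_cost(cost_per_minute: float = COST_PER_MINUTE,
                      share: float = SEARCH_SHARE,
                      minutes_per_day: int = MINUTES_PER_DAY) -> float:
    """Dollars per person per day spent searching for information."""
    return cost_per_minute * share * minutes_per_day


daily = daily_search_cost()
yearly = daily * WORKDAYS_PER_YEAR
print(f"${daily:.2f} per day, ${yearly:,.2f} per person per year")
```

Changing the share to 15% or 35% (the range cited above) gives the low and high ends of the same estimate.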
5
What is AutoClassification?
AutoClassification = Rich Metadata + Rules
Computer software that performs automated analysis and disposition of file/document content
Software contains recognition algorithms for:
  Document Type
  Content analytics
  Indexing & Tagging
  Recommended locations & naming
“Middleware” that sits between file locations (storage) and file uses (applications), providing an intelligent filter and control system for all content.
(Julie) Define autoclassification at the very highest level (level setting with audience)
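One way to picture "Rich Metadata + Rules" is as a small rule table evaluated against each document's metadata. The sketch below is a minimal illustration, not how any particular product works: the field names (`legal_hold`, `pii_present`, `doc_type`), the rules, and the disposition strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metadata = Dict[str, object]


@dataclass
class Rule:
    name: str
    applies: Callable[[Metadata], bool]  # test against a document's metadata
    disposition: str                     # what to do when the test passes


# Hypothetical rules, listed in priority order (first match wins).
RULES: List[Rule] = [
    Rule("legal hold", lambda m: m.get("legal_hold") is True, "retain: legal hold"),
    Rule("contains PII", lambda m: m.get("pii_present") is True, "move to secure store"),
    Rule("contract", lambda m: m.get("doc_type") == "Contract", "file under Contracts"),
]


def classify(metadata: Metadata, rules: List[Rule] = RULES,
             default: str = "needs review") -> str:
    """Return the disposition of the first rule that matches the metadata."""
    for rule in rules:
        if rule.applies(metadata):
            return rule.disposition
    return default


print(classify({"doc_type": "Contract", "pii_present": False}))
```

Ordering the rules by priority keeps conflicts simple: a contract under legal hold is retained rather than filed, because the legal-hold rule is checked first.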
6
AutoClassification as Traffic Cop
Data Silos: File shares; Personal/Group Work Product; Databases, Repositories & Collaborative Cloud Apps
Traffic Cop
Data Needs: Retention & Legal Hold; Data Lifecycle Planning & Mgmt; Search & Retrieval; Data Privacy & Security; Migration & Archival
(Julie) This is the case because there isn’t anything that is enabling the translation of data into actionable information. That’s where autoclassification comes in. Autoclassification tools, like Valora’s PowerHouse and Black Cat, take the context and content of data so organizations can make eyes-wide-open decisions about it (and preferably do it in an automated way … but that’s a topic for a future webinar in this series). Without the insight autoclassification brings to the table, it’s impossible to make sense of all of that data at scale.
7
About Valora
Bedford, MA software firm specializing in machine-assisted document processing capabilities (aka analytics)
World experts in the automated analysis, indexing, mining and presentation of content
20 staff, 200+ clients, 1,500,000+ pages every week
Customers: corporate legal departments, government agencies, and their professional advisory colleagues (law firms & consultancies)
Objective: to overtake traditional information repository creation (manual data entry), management, analysis (search, review) and workflow (retention, production, routing) with high quality, low cost, scalable technology & best practices in analytics.
The power of Big Data is the story about the ability to compete and win with few resources and limited dollars. - Forbes, March 2012 (this is Valora’s story, too)
(Julie) So Sandy, technologies that can do “file analysis” have been around for years, many of them arising out of the onset of eDiscovery as an industry in the mid-2000s. Is that where Valora also got its start?
(Sandy) Talks about Valora heritage, dealing with the tough file types and culling requirements.
(Sandy) Built technology to help create structure around unstructured content via rich metadata creation, reducing the need for Valora staff to do classification manually.
8
What is Rich Metadata?
Last modified date, File extension (file type), Path name, Hash value
Author, Custodian, Content Owner, Recipient, Copyees (CC & BCC), Assignee, Signatories, Patient Name
Creation Date, Signature Date, Effective Date, Last Activity Date, Expiration Date, Destruction Date
Language, PII present, Word & page count, Tone/Sentiment, Security Level, Attachment Range, Legal Hold, Audience, Brand, Employee, Document Type
(Julie) Yes, so let’s talk about that: the role of metadata in our ability to make smart decisions about content. Tell me more about what Valora does and how that is different and brings more value than just using document properties, for example. Why isn’t system-generated metadata enough?
(Sandy) Talks about reasons why properties aren’t enough: we need to look inside the document. Unless the file name tells you what the content is, you have no idea what it is … and who trusts file names? You have to look inside.
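To make the point about system-generated metadata concrete, the sketch below collects the properties a file system can supply on its own (path name, extension, hash value, last modified date). It is an illustration using only the Python standard library; the function name and dictionary keys are hypothetical. Nothing in the result says what kind of document the file is, which is why the content itself has to be read.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def system_metadata(path: Path) -> dict:
    """Collect properties available without interpreting the file's content."""
    data = path.read_bytes()
    stat = path.stat()
    return {
        "path_name": str(path),
        "file_extension": path.suffix.lower(),
        # The hash identifies exact duplicates but says nothing about meaning.
        "hash_value": hashlib.sha256(data).hexdigest(),
        "last_modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
        "size_bytes": stat.st_size,
    }
```

A file named `scan0001.txt` could hold a signed contract or a lunch menu; both produce the same kind of record here, so fields such as Document Type or Effective Date have to come from analysis of the text.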
9
How does AutoClassification Work?
Processing (aka Intake) is ingesting data or processing in place
  Creating OCR for scanned images or applying redactions
  Extracting text for native files & speech to text for audio/video files, or translating content to English
  Re-ordering or re-aligning pages
Tagging (aka Coding, Indexing, Sequencing) is the process of extracting key information and attributes about each document
  Document Type, Important Dates, File/Content/DocType attributes
  Key Names, Phrases, Topics, Keywords, Themes
  Relation to other documents (duplicate, related, attached, contradictory, etc.)
Disposition (rules) is the process of creating a destination or status for each document
  Retention status & duration
  Folder (taxonomy) location
  Labelling & keywords display
Diagram labels: native, text (Processing); text, fielded data (Tagging); fielded data, disposition (Disposition)
(Sandy) Gives brief overview of how it works.
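The three stages above can be sketched as a chain of functions, each consuming the previous stage's output: native file to text, text to fielded data, fielded data to disposition. This is a toy illustration under stated assumptions: the stage bodies are simple stand-ins (real processing would involve OCR, transcription, and so on), and the keyword test, folder names, and retention strings are hypothetical.

```python
import re
from typing import Dict


def process(native: bytes) -> str:
    """Processing/intake: native file -> plain text (stand-in for OCR/extraction)."""
    text = native.decode("utf-8", errors="replace")
    return re.sub(r"\s+", " ", text).strip()


def tag(text: str) -> Dict[str, object]:
    """Tagging: text -> fielded data (document type, dates, word count)."""
    lowered = text.lower()
    is_contract = "contract" in lowered or "agreement" in lowered
    return {
        "doc_type": "Contract" if is_contract else "Unknown",
        "dates": re.findall(r"\b\d{1,2}/\d{1,2}/\d{4}\b", text),
        "word_count": len(text.split()),
    }


def dispose(fields: Dict[str, object]) -> Dict[str, str]:
    """Disposition: fielded data -> destination and status (hypothetical rules)."""
    if fields["doc_type"] == "Contract":
        return {"folder": "Contracts", "retention": "review at expiration"}
    return {"folder": "Unsorted", "retention": "pending review"}


def autoclassify(native: bytes) -> Dict[str, str]:
    """Run the three stages in order."""
    return dispose(tag(process(native)))


print(autoclassify(b"This  Contract is effective\n2/1/2005."))
```

Keeping the stages separate means each can be replaced independently, for example swapping the text extraction step without touching the tagging or disposition rules.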
10
AutoClassifying an email with data mining (analytics)
Author; Doc Type & Implied attachment range; Matter indicator & validation; Author Validation & Contact Info
Implied matter: Passaro ( )
(Sandy) Let’s go through a few examples or use cases … first
11
AutoClassifying “Junk” (aka R.O.T.)
“Watch list” terms
Date + No further content + Heavy graphics + 4,000 ID copies
Implied status: Junk/Remove All
(Sandy) … Now ROT
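Two of the junk signals on this slide, watch-list terms and large numbers of identical copies, lend themselves to a short sketch. The code below is an illustration only: the watch-list terms and the copy threshold are hypothetical, identical copies are detected by hashing the text, and the other signals (heavy graphics, a date with no further content) are not modeled.

```python
import hashlib
from collections import Counter
from typing import Dict, List

WATCH_LIST = {"newsletter", "lunch menu", "out of office"}  # hypothetical terms
COPY_THRESHOLD = 3                                          # hypothetical cutoff


def content_hash(text: str) -> str:
    """Hash of the text, used to spot exact duplicates."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def flag_junk(docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each suspect document name to the reasons it was flagged."""
    copies = Counter(content_hash(text) for text in docs.values())
    flagged: Dict[str, List[str]] = {}
    for name, text in docs.items():
        reasons: List[str] = []
        lowered = text.lower()
        hits = sorted(term for term in WATCH_LIST if term in lowered)
        if hits:
            reasons.append("watch-list terms: " + ", ".join(hits))
        count = copies[content_hash(text)]
        if count >= COPY_THRESHOLD:
            reasons.append(f"{count} identical copies")
        if reasons:
            flagged[name] = reasons
    return flagged
```

Returning the reasons alongside each flagged file keeps the result reviewable, which matters when the implied status is removal.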
12
AutoClassifying a Contract
DocType = Contract
Effective Date = 2/1/2005
Party One = Office of the Commissioner of Insurance
Party Two = AMI Risk Consultants, Inc.
Keywords = Actuarial Services
Term = 1 year
Renewals = two 1-year terms
(Sandy) Now contracts
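Fields like those above can be pulled from contract text with pattern matching. The sketch below uses regular expressions on an invented sample sentence; the sentence wording and the patterns are assumptions made for illustration (only the field values come from the slide), and real contracts vary far more than these patterns allow.

```python
import re
from typing import Dict

# Invented sample text; only the field values are taken from the slide.
SAMPLE = (
    "This Contract is entered into effective 2/1/2005 between "
    "Office of the Commissioner of Insurance and AMI Risk Consultants, Inc. "
    "for Actuarial Services. The term is 1 year."
)


def extract_contract_fields(text: str) -> Dict[str, str]:
    """Extract a few fielded values from contract-like text."""
    fields: Dict[str, str] = {}
    if re.search(r"\bcontract\b", text, re.IGNORECASE):
        fields["DocType"] = "Contract"
    match = re.search(r"effective\s+(\d{1,2}/\d{1,2}/\d{4})", text, re.IGNORECASE)
    if match:
        fields["Effective Date"] = match.group(1)
    # Parties: the text between "between ... and ... for".
    match = re.search(r"between\s+(.+?)\s+and\s+(.+?)\s+for\b", text, re.IGNORECASE)
    if match:
        fields["Party One"] = match.group(1)
        fields["Party Two"] = match.group(2)
    match = re.search(r"term is\s+(\d+\s+years?)", text, re.IGNORECASE)
    if match:
        fields["Term"] = match.group(1)
    return fields


for key, value in extract_contract_fields(SAMPLE).items():
    print(f"{key} = {value}")
```

Fields that are not found are simply omitted, so a downstream rule can tell "no effective date detected" apart from a wrong guess.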
13
Understanding Context: Wonder if anyone would mind…
(Sandy) It is so important to capture the context … straight text analysis won’t give you the whole picture.
14
10 Use Cases for AutoClassification
Migration (fix metadata and general clean-up as part of effort to move data from one location to another)
Remediation (fixing bad/no metadata, overall file organization)
Upgrade (same idea as above, but due to some system or platform upgrade)
Compliance/Monitoring (needing a filter on communications/docs to monitor for topics, phrases, PII, etc.)
Security (identify, protect, and suppress sensitive data)
Orphaned Data ( /files leftover from departed staff, acquired or divested departments, old records)
Conversion (converting paper files to electronic/imaging, and adding additional smarts to that activity: unitization, coding, etc.)
Litigation/Investigation (analyzing files for responsive/relevant content, attempting to find patterns of behavior)
Repository/Knowledge Management (store documents by topic, keywords and content)
Workflow (bring documents & files into a corporate workflow via portal, bounce, server crawl, etc.)
+ 1 more: GDPR Compliance (ensuring proper treatment of PII for EU Citizens’ data)
(Julie) So the ROT use case is a popular one. I think the phrase “defensible deletion” has been around for at least the last 5-7 years, if not more. But what other use cases are popular targets for autoclassification technologies?
(Sandy and Julie) Talk through some of the use cases and why we see clients asking for them. Key use cases include: Conversion, Orphaned Data, Security. Leave migration as the last one to discuss so it flows to the next question …
(Julie) This migration use case raises an important point: it’s not enough to just know what information you have, you have to then be able to do something about it. Migration is an obvious one, but so are most, if not all, of the other use cases.
(Sandy) Finding things is only the beginning! What orgs need to do is act on that insight.
15
Intake – PowerHouse – Output
Diagram labels: PH Web Portal; Hosted Metadata Repository; Scanned Paper Files; Shared Server Poll; Cloud File Storage; HR Database; Billing System; ERP
PowerHouse: OCR/Text Extraction; Translation/Transcription; Unitization; Coding/Tagging; Rules/Disposition; Redaction; Exceptions; Journaling
Folder Taxonomy: Transatlantic Corp. 2016 IP Matters CX4 redesign Caro Prototype
(Julie) Another key aspect, and one that has been mostly elusive to date, is the ability to operationalize this kind of work. Until now, most initiatives around these kinds of things have been project driven. And while there are certainly important reasons to take on projects, for me as a Certified Records Manager, I need to find a way to embed this capability into the day-forward management of content. Thoughts about that?
(Sandy) Gives her thoughts.
16
(Julie) Ok, shifting gears a bit, let’s talk about who the stakeholders are around autoclassification, and what their stake is.
CONSIDER SHOWING EDRM IGRM MODEL ON SLIDE?
(Julie) Julie talks through the IGRM stakeholders and discusses who cares about what and how autoclassification helps them get where they want to go. Sandy to chime in as she is inspired.
(Julie) So Sandy, in your experience, which of these stakeholders is most likely to champion autoclassification investments, and why?
(Sandy) Sandy says whatever she says and Julie chimes in as inspired.
(Julie) Do you find that budget exists for autoclassification technology and services, or is this something net new to hit folks’ budgets that they should start planning for?
(Sandy) Sandy shares her thoughts on budget.
17
Closing Comments
What is AutoClassification?
  Software that performs automated analysis and disposition of file/document content using advanced techniques such as “entity extraction”
  An intelligent filter and control system for all content
  The precursor step to automated disposition of content
Why should I care?
  Get rid of the junk
  Save on storage costs
  Leverage and re-use valuable information
  Find and manage official records to policy
  Meet compliance obligations for: Retention; Protection
Coming up next … Part 2: Creating Structure From Unstructured Data, January 17, pm Eastern
Parting thoughts
Set up for next in the series – deep dive into metadata
18
Q&A
For more information:
Valora Technologies, Inc., 101 Great Road, Suite 220, Bedford, MA 01730
DTI, 2 Ravinia Dr., Suite 850, Atlanta, GA 30346
19
Where to Get More Info on AutoClassification
Info & advice from Valora & DTI
  Read Valora’s blog
  Download Valora’s White Papers, Articles & Datasheets
  Visit Valora’s YouTube Channel for recorded demos, Q&A, etc.
  Join us for our continued web series parts 2, 3, & 4!
Info & advice from other sources
  Join the Information Coalition (it’s free) that hosts InfoGovCon each year
  Visit the Information Governance Initiative website
  See the EDRM Information Governance Reference Model
  Check out the Predictive Coding & Analytics in eDiscovery and the GRC LinkedIn group
  Score yourself on the ARMA InfoGov Maturity Model