Data Mining Documents for Corporate Legal

Data Mining Documents for Corporate Legal
Wednesday, October 29, 2014, 2:00 pm ET
Speaker: Sandra Serkes, President, Valora Technologies, Inc.

Valora Technologies
Bedford, MA software firm specializing in machine-assisted document processing capabilities (aka analytics)
World experts in the automated analysis, indexing, mining and presentation of documents, data & content
20 staff, 200+ clients, 1,500,000+ pages every week
Customers: corporate legal departments, government agencies, and their professional advisory colleagues (law firms & consultancies)
Target market: those who wish to harness and profit from the 2.5 quintillion bytes of document & content data being created each day, aka "Big Data"
Objective: to overtake traditional information repository creation (manual data entry), management, analysis (search, review) and workflow (retention, production, routing) with high-quality, low-cost, scalable technology & best practices in analytics
Provide cost-competitive document analytics solutions in the United States
Provide efficient, world-class, targeted solutions to data, document & content utilization problems

"The power of Big Data is the story about the ability to compete and win with few resources and limited dollars." - Forbes, March 2012 (this is Valora's story, too)

Why Data Mine Documents?
2 Examples of Data Mining in Use by Corporate Legal
Document Data Visualization
Valora's Analytics & Data Mining Solutions

Why Data Mine Documents?
"Document" is a loose term here. Really, it means any structured or unstructured form of textual or metadata content. Voicemails, tweets, texts, websites, audio & video files, receipts and transactions are all "documents" as far as data analytics are concerned.
Litigation: doc review & productions
Investigations
Compliance: legal, regulatory & ethics; financial & investor; health & safety
Business Intelligence
Information Governance: management & control; cost savings; exposure mitigation
We data mine documents to learn where they are & what they say. Ultimately, we gain management and control over the contents, storage, access, retrieval, use and exposure of our information.

Does your organization know exactly what your documents hold?

How wise is either strategy?
"We know what's in our data, but we aren't dealing with it." VERSUS "We don't know what's in our data."
Courts, shareholders, consumers, government agencies, watchdog groups, media spotlight reports and more demand that we DO know what is in our data and that we actively manage and control it responsibly.

Why talk about this now?
Big Data is a big deal. Everyone is talking about it: clients, investors, media, employees, management, government.
Increasing data breach events keep the conversation alive, as do exploitations of information power and reactions to that power.
People now expect that organizations are routinely collecting and mining data on their behavior, purchases, searches, posts, etc., and they are starting to demand ethical, compliant & competent management of that data.
Costs to perform large-scale, complex analytics & hosting have come down enough to be financially viable for most organizations.
608,087,870: total number of records containing sensitive personal information involved in security breaches in the United States since January 2005 (Source: Privacy Rights Clearinghouse, June 2013)
Headlines: "Facebook Tinkers With Users' Emotions in News Feed Experiment" (6/24/2014); "Google removes search results in wake of EU privacy ruling" (6/26/2014)

Typical Corporate Legal Document Populations
Email: messages + attachments, calendar entries, notifications…
Business documents: contracts, financials, reports, sales & marketing materials
Client & patient data
Supplier information
Media & social media

Content & Context
You know what content is: WHAT does the document say?
Good for searching & comparing body text
Good basis for data analytics
Context is everything else:
WHO wrote this, saw this, knew this, received this, authorized this?
WHEN was it sent, received, modified, copied, stored, deleted?
WHY was this created? What was the intent, basis, plan, pattern of behavior?
WHICH version of this content is the most important, recent, comprehensive, damaging?
Predictive analytics goes beyond content analysis into context and trend analysis.
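To make the WHICH question concrete: a content-only classifier treats exact duplicates as interchangeable, while context can single out one instance as the "best" copy. The minimal Python sketch below groups instances by normalized content and keeps the most recently modified one, breaking ties by how widely it was distributed. The `DocInstance` fields and the ranking rule are illustrative assumptions, not Valora's method.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DocInstance:
    doc_id: str
    text: str            # content: WHAT the document says
    custodian: str       # context: WHO holds this copy
    modified: datetime   # context: WHEN this copy last changed
    recipients: int = 0  # context: how widely it went out (hypothetical weight)


def content_key(text: str) -> str:
    """Hash of case- and whitespace-normalized text, so trivial variants collide."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def best_instances(docs: list[DocInstance]) -> list[DocInstance]:
    """Keep one instance per distinct content, chosen by context rather than content."""
    groups: dict[str, list[DocInstance]] = {}
    for doc in docs:
        groups.setdefault(content_key(doc.text), []).append(doc)
    # Assumed ranking: latest modification wins; wider distribution breaks ties.
    return [max(group, key=lambda d: (d.modified, d.recipients)) for group in groups.values()]
```

Any other context signal (custodian seniority, privilege status, approval stamps) could replace the sort key; the point is that the choice is made on context, not content.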

Drawbacks to Classification without Context (Classification alone)
Treats all document contents the same
Misses the notion of document context (who, what, where, when and how)
Email particularly has important attributes as a communication mechanism
Makes retain/delete decisions on content only
Oversimplifies inherent or explicit decisioning hierarchies
Duplicative content vs. "best" content
Assumes all content instances are equal
Assumes Backfile/Batch methodology only
Weak solution for Day Forward document creation or intake
Does not inform creation or approval of email content
Focused on cleanup, rather than asset management
Assumes no future value of past email or content, ignores business value
Does not prioritize results for future use
Relegates to IT tool, rather than IG strategy
Unable to adapt to ongoing maintenance or contextual changes
Assumes technology is independent of policy creation & enforcement
Ignores evolving capabilities to define policy and ensure compliance
Ignores PII, PHI & other content sensitivities
Exists outside of data visualization
Missed opportunity for information presentation, forward-use asset management

First Example of Data Mining Documents for Corporate Legal
Data Mining for Information Governance: Enterprise Email Management, Classification & Control

Enterprise Email Management is a very good case in point
Universal issue
Involves several key IG problems:
Storage/hosting
Content analysis & classification
Context: correspondence, notification & record, date/time/file signatures, transmission & attachments, custodianship, etc.
Administration, management & maintenance
Elements of Backfile and Day Forward records management
ESI is generally easier & lower cost to tackle than paper files
Because of context, EEM is a hot-button issue with real budgets available:
Investor & media attention
Customer concerns
Risk & compliance danger zone
Predecessor to managing social media

INDEXING/TAGGING for Records & IG
How a computer classifies an email with data mining (analytics)
[Figure callouts: Author; Doc Type & implied attachment range; Matter indicator & validation; Author validation & contact info; Implied matter: Passaro (34-6788)]

Additional Info Data Analytics Determine
What DocType is this? An email with an attachment
Who created this? Who is the author? Stuart Trumbull, Partner at DCH
Who is receiving this? Why? Roberta Halstrom, paralegal; work instruction/direction
What is the Author-Recipient relationship? Supervisor-subordinate
What are important words, patterns & concepts? "please file", "Motion in Limine", "Passaro matter", "34-6788"
How is the attachment related? Author match; Passaro match; key Motion content
What else is known about this party? Wrote 14 emails that day; 94% of "Passaro" mentions include him as author/recipient/cc; 7 instances as Pleading Author with Passaro matter(s); Halstrom & assistant/associate on 48% of Trumbull+Passaro content
What other context can be inferred? Tuesday = 5/13/14; 15 date-correlated instances of 5/13/14 with the Passaro docket; tone is neutral-friendly, professionally appropriate
What presents better visually? Topics over time; relationship between Trumbull & others; Passaro matter against other matters
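Several of these determinations reduce to simple pattern matching and counting. The minimal Python sketch below pulls matter numbers and watch-list phrases out of an email body and computes a statistic like "94% of 'Passaro' mentions include him as author/recipient/cc". The `Email` record, the `NN-NNNN` matter-number pattern (modeled on "34-6788") and the function names are hypothetical, chosen for illustration only.

```python
import re
from dataclasses import dataclass, field

# Hypothetical matter-number format, modeled on "34-6788" in the example email.
MATTER_NUMBER = re.compile(r"\b\d{2}-\d{4}\b")


@dataclass
class Email:
    author: str
    recipients: list[str]
    ccs: list[str] = field(default_factory=list)
    body: str = ""

    def participants(self) -> set[str]:
        return {self.author, *self.recipients, *self.ccs}


def matter_numbers(body: str) -> list[str]:
    """All substrings shaped like a matter number."""
    return MATTER_NUMBER.findall(body)


def key_phrases(body: str, watch_list: list[str]) -> list[str]:
    """Watch-list phrases present in the body, compared case-insensitively."""
    lowered = body.lower()
    return [phrase for phrase in watch_list if phrase.lower() in lowered]


def participation_share(emails: list[Email], term: str, person: str) -> float:
    """Fraction of emails mentioning `term` in which `person` is author, recipient or cc."""
    mentions = [e for e in emails if term.lower() in e.body.lower()]
    if not mentions:
        return 0.0
    return sum(person in e.participants() for e in mentions) / len(mentions)
```

A real system would resolve name variants (e.g. "S. Trumbull" vs. an email address) before counting; this sketch compares participant strings exactly.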

Second Example of Data Mining Documents for Corporate Legal
Data Mining for Litigation & eDiscovery: Document Review & Production

INDEXING/TAGGING for eDiscovery
DocType = Patent Application
Date Format = US
Date = 10/18/2007
Author = Patent Authors, Author City, Author Country
Assignee = RIM
Tone = Neutral to slightly positive
Embedded Graphic with Title
Other Capturable Data Elements: Patent Number, Filing Date, Key Phrases & Terms, Managing PTO, Implied/Attached Docs, Bar Code Present, and many more…
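The "Date Format = US" tag matters because a slash date such as 03/04/2007 means March 4 when read month-first and April 3 when read day-first. A minimal Python sketch of that step, assuming dates are written as N/N/NNNN; the function name and the `us_format` flag are illustrative, not part of any Valora interface.

```python
import re
from datetime import date, datetime

# Candidate dates written as N/N/NNNN; which number is the month depends on the date format.
SLASH_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def extract_dates(text: str, us_format: bool = True) -> list[date]:
    """Return every valid slash date in `text`, read month-first when us_format is True."""
    fmt = "%m/%d/%Y" if us_format else "%d/%m/%Y"
    found = []
    for match in SLASH_DATE.finditer(text):
        try:
            found.append(datetime.strptime(match.group(0), fmt).date())
        except ValueError:
            # Has the right shape but is not a real calendar date, e.g. 13/45/2007.
            continue
    return found
```

For example, `extract_dates("Filed 10/18/2007")` yields `[date(2007, 10, 18)]`, the Date value shown on the slide.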

ANALYSIS/RULES for eDiscovery
Litigation Document Review Manual: Determining Responsiveness
The document should be marked responsive if any of the following conditions are present:
Mentions or discusses the specific protocol for handling simultaneous voice and data actions
Is a design document or graphic that shows the specific protocol for handling simultaneous voice and data actions
Discusses or is related to patent '009
Mentions Apple Inc. or Apple Computer, Inc., or is a communication from/to anyone at Apple Computer, Inc., or apple.com
And so on…
Rule: Responsive for Protocol Discussion
When: [FullText] contains any of <Voice protocol key phrases 12>
and [FullText] contains any of <Data protocol key phrases 25>
and [DocType] is not any of [Brochure, Press Release, Website], ...
Rule: Responsive for Patent '009
When: Any document in the Attachment Family matches: [FullText] contains any of <Patent '009 key phrase list 4>,
or Parent of Attachment Family matches: Any of [Author, Recipient, CCs] contains any of <Patent '009 experts contact list 23>, …
Rule: Responsive for Apple
When: [FullText] contains (fuzzy match) any of <Apple key phrase list 7>,
or Any of [Authors, Recipients, CCs] contains any of <Apple contact list 15>,
or [Author] matches "*@apple.com" …
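Rules of this kind can be sketched as plain predicates over document fields, each named after the rule it implements. This is a minimal Python sketch, not Valora's rule syntax or engine: the phrase and contact lists are hypothetical placeholders for the slide's named lists (<Voice protocol key phrases 12> and so on), and the Apple rule's fuzzy match is swapped for a case-insensitive substring test.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

# Hypothetical placeholders for the slide's named phrase and contact lists.
VOICE_PHRASES = ["voice call", "circuit-switched"]
DATA_PHRASES = ["packet data", "data session"]
EXCLUDED_DOC_TYPES = {"Brochure", "Press Release", "Website"}
PATENT_009_PHRASES = ["patent '009", "'009 patent"]
PATENT_009_EXPERTS = {"expert@example.com"}
APPLE_PHRASES = ["apple inc", "apple computer"]
APPLE_CONTACTS = {"contact@example.com"}


@dataclass
class Doc:
    full_text: str
    doc_type: str = "Email"
    author: str = ""
    recipients: list[str] = field(default_factory=list)
    ccs: list[str] = field(default_factory=list)
    attachments: list["Doc"] = field(default_factory=list)

    def people(self) -> set[str]:
        return {p.lower() for p in (self.author, *self.recipients, *self.ccs) if p}


def contains_any(text: str, phrases: list[str]) -> bool:
    lowered = text.lower()
    return any(p.lower() in lowered for p in phrases)


def protocol_discussion(doc: Doc) -> bool:
    return (contains_any(doc.full_text, VOICE_PHRASES)
            and contains_any(doc.full_text, DATA_PHRASES)
            and doc.doc_type not in EXCLUDED_DOC_TYPES)


def patent_009(doc: Doc) -> bool:
    # `doc` is treated as the parent of its attachment family.
    family = [doc, *doc.attachments]
    return (any(contains_any(d.full_text, PATENT_009_PHRASES) for d in family)
            or bool(doc.people() & PATENT_009_EXPERTS))


def apple(doc: Doc) -> bool:
    # Substring test stands in for the fuzzy match named on the slide.
    return (contains_any(doc.full_text, APPLE_PHRASES)
            or bool(doc.people() & APPLE_CONTACTS)
            or fnmatchcase(doc.author.lower(), "*@apple.com"))


RULES = {
    "Responsive for Protocol Discussion": protocol_discussion,
    "Responsive for Patent '009": patent_009,
    "Responsive for Apple": apple,
}


def responsive_reasons(doc: Doc) -> list[str]:
    """Names of every rule that fires; an empty list means not responsive."""
    return [name for name, rule in RULES.items() if rule(doc)]
```

Returning the names of the rules that fired, rather than a bare yes/no, keeps the reason for each responsiveness call available for QC and privilege logging.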

Data Presentation & Visualization

What is Data Visualization?
Simple visual representation of relationships and patterns in document data
Common examples: graphing sales over time, distribution by ethnicity, word clouds & heat maps, USA Today-style graphics
Use of charts, graphs, dashboards, animation and sound to help convey important connections
"Data visualization is a general term that describes any effort to help people understand the significance of data by placing it in a visual context." - TechTarget
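Behind a "topics over time" chart sits a simple aggregation: count the documents mentioning a term, bucketed by period. A minimal Python sketch, assuming each document is reduced to a (date, text) pair; the text-bar renderer is only a stand-in for a real charting layer, and both function names are illustrative.

```python
from collections import Counter
from datetime import date


def mentions_by_month(docs: list[tuple[date, str]], term: str) -> dict[str, int]:
    """Count documents mentioning `term` (case-insensitive), bucketed by YYYY-MM."""
    counts = Counter(
        day.strftime("%Y-%m") for day, text in docs if term.lower() in text.lower()
    )
    return dict(sorted(counts.items()))


def text_bars(series: dict[str, int]) -> str:
    """One '#' per mention: a placeholder for a chart or dashboard widget."""
    return "\n".join(f"{month} {'#' * n} {n}" for month, n in series.items())
```

Months with no mentions are simply absent from the result; a charting layer would normally fill them in as zeros.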

How Valora Technologies Data Mines Documents & Presents Information Visually
PowerHouse™: Automated Data Mining Services Platform
BlackCat™: Hosting & Data Visualization Presentation Layer

PowerHouse & BlackCat Architecture
BlackCat Presentation Layer
PowerHouse Platform Layer: PH AutoProcessors (Pattern-Matching Algorithms & Rules); PH Quality Control User Interface (QCUI); PH Admin Console (Admin)
SQL Server Database Layer

PowerHouse Platform Capabilities
AutoUnitization: Ability to distinguish the beginning & end of documents, as well as determine which documents incorporate other documents as attachments
AutoCoding: Identify and label documents by type (balance sheet, tax form, memo, etc.), relevant people (authors, recipients, cc/bcc), date and subject/title
AutoReview: Identify and label documents by groupings (dupes/near dupes, conversation threads, issues/clustering) and disposition (responsive, privileged, "hot," etc.)
AutoRedaction: Ability to identify & mark up documents to "black out" select information (such as personally identifiable information (PII), patient data or privileged information)
AutoTranslation: Automatic translation of non-English documents to English text; supports dozens of originating languages
Electronic File Processing (EFP): File conversion to TIF/PDF format, text and metadata extraction, de-NISTing, cross-custodian de-duplication, filtering/culling, analytics
OCR: Optical Character Recognition for converting images to searchable text
NearDuplicateDetection: Identify documents that are highly similar, if not identical, across custodians and the entire population; includes cross-correlation of paper & electronic documents
EmailThreadGrouping: Join separated email conversation threads into a consistent stream from start to finish
AutoBusinessRules: Identify and label documents by workflow treatment, retention plans, compliance audit or other groupings; initiate notifications or other actions based on incoming data
Audio & Video Files Mining: Identify key data elements from audio or video content
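One common way to implement near-duplicate detection is to compare documents by the overlap of their word shingles (runs of k consecutive words) using Jaccard similarity. The minimal Python sketch below shows that general technique; it is not a description of PowerHouse's NearDuplicateDetection, and the shingle size and threshold defaults are assumptions.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of k-word shingles from lowercased text; short texts become one shingle."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(docs: dict[str, str], threshold: float = 0.8,
                         k: int = 3) -> list[tuple[str, str, float]]:
    """All (id, id, similarity) pairs at or above `threshold`, compared pairwise."""
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    ids = sorted(sets)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = jaccard(sets[a], sets[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs
```

The pairwise loop is quadratic in the number of documents; at collection scale, techniques such as MinHash with locality-sensitive hashing are typically used to find candidate pairs without comparing everything to everything.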

Selected Valora Client Use Cases of Document Data Mining for Corporate Legal (Records Management; Litigation & eDiscovery)
AutoIndex 400,000 files per day for 4 months
AutoRedact SSN & TID from credit applications
Host online "Bidder's Library" of 100 years of scanned records
AutoBusiness Rules for document retention & compliance
Convert paper medical records to digital format with embedded indexing
AutoReview 1.5M files for responsiveness, privilege, & hot docs
AutoIndex 3M FOIA request documents
AutoTranslate Japanese, Spanish, French & German docs to English
Oversee & manage 6-city simultaneous data collection & conversion
AutoRedact personally identifying information (PII)
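The simplest form of SSN redaction is a pattern match plus substitution. This minimal Python sketch assumes SSNs are written as three-two-four digit groups separated consistently by dashes or spaces, and skips number ranges the Social Security Administration never issues (area 000, 666 or 900-999, group 00, serial 0000). It is an illustration only: it does not handle tax IDs (TID) or unformatted nine-digit runs, which are left alone deliberately to avoid redacting account and reference numbers.

```python
import re

# Assumed layout: NNN-NN-NNNN or NNN NN NNNN, with the same separator used twice (\1).
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}([- ])(?!00)\d{2}\1(?!0000)\d{4}\b")


def redact_ssns(text: str, mask: str = "[REDACTED SSN]") -> tuple[str, int]:
    """Return the redacted text and the number of SSNs replaced."""
    return SSN_PATTERN.subn(mask, text)
```

Returning the count alongside the redacted text gives a QC step something to reconcile against, e.g. flagging pages where the expected number of redactions does not match.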

Typical Problems Valora Solves
Legal/Litigation/eDiscovery Problems: too many documents to review, cull & produce by hand; cost-effective alternative solutions to contract attorney & offshore labor "armies"; missing, poor, or ineffective metadata; re-unitization, organization, indexing & redaction of documents; bridging multi-language document populations to English
Records Management Problems: help automate defensible deletion efforts for IG; organize & control loose documents on shared drives, desktops, networks & devices; eliminate expensive and information-poor storage options; serve as automated intake for multiple content generation sources
Business Intelligence Problems: organize & control decades of contracts & agreements; provide brand integrity/protection data mining of public/private documents; forecasting & trending of topics, people & locations over time; loose, shared files analysis & control
Health Care Problems: heavy expense & time converting hardcopy medical records to EMRs/EHRs; cannot keep up with fax server data collection; cost-effective alternative solutions to "armies" of temp data entry coders

Why Valora?
Mature & Stable Company: brings years of experience and subject matter expertise to every engagement, small or large; long-term Government contract (Mega 2-4) requires annual financial and security review
Owner-Operated Domestic Business: simple to get things done! Your data and documents stay here in the US
Time-Tested Proprietary Technology: our software has been in use since 2003, giving you the advantage of a system that has learned thousands of document types and document attributes; easily customized to meet your needs and demands
Lightning-Fast Processing: automated processes mean fast turnaround, allowing you to meet unrealistic deadlines; same-day service: in by 9, out by 5!
Unique Service Offerings: we automate our solutions, allowing you to virtually eliminate manual labor, reducing time and cost (redaction, audio/video coding, translation, review); we process across paper and electronic collections, giving you the advantage of working with only one service provider
Flexible Pricing: pricing models designed to meet your specific document processing needs; subscription-based models that offer predictability

Valora Technologies, Inc.
Thank You!
For More Information:
Valora Technologies, Inc.
101 Great Road, Suite 220
Bedford, MA 01730
781.229.2265
www.valoratech.com
info@valoratech.com