1
Data Mining Documents for Corporate Legal
Wednesday, October 29, 2014, 2:00 pm ET
Speaker: Sandra Serkes, President, Valora Technologies, Inc.
2
Valora Technologies
– Bedford, MA software firm specializing in machine-assisted document processing capabilities (aka analytics)
– World experts in the automated analysis, indexing, mining and presentation of documents, data & content
– 20 staff, 200+ clients, 1,500,000+ pages every week
– Customers: corporate legal departments, government agencies, and their professional advisory colleagues (law firms & consultancies)
– Target market: those who wish to harness and profit from the 2.5 quintillion bytes of document & content data being created each day, aka “Big Data”
– Objective: to overtake traditional information repository creation (manual data entry), management, analysis (search, review) and workflow (retention, production, routing) with high-quality, low-cost, scalable technology & best practices in analytics
– Provide cost-competitive document analytics solutions in the United States
– Provide efficient, world-class, targeted solutions to data, document & content utilization problems
“The power of Big Data is the story about the ability to compete and win with few resources and limited dollars.” – Forbes, March 2012 (this is Valora’s story, too)

Data Mining for Corporate Legal
– Why Data Mine Documents?
– Two Examples of Data Mining in Use by Corporate Legal
– Document Data Visualization
– Valora’s Analytics & Data Mining Solutions
3
Why Data Mine Documents?
“Document” is a loose term here. Really, it means any structured or unstructured form of textual or metadata content. Voicemails, tweets, texts, websites, audio & video files, receipts and transactions are all “documents” as far as data analytics are concerned.
– Litigation
  • Doc Review & Productions
  • Investigations
– Compliance
  • Legal, regulatory & ethics
  • Financial & investor
  • Health & safety
– Business Intelligence
– Information Governance
  • Management & control
  • Cost savings
  • Exposure mitigation
We data mine documents to learn where they are & what they say. Ultimately, we gain management and control over the contents, storage, access, retrieval, use and exposure of our information.
4
Does your organization know exactly what your documents hold?
5
How wise is either strategy?
“We don’t know what’s in our data.”
VERSUS
“We know what’s in our data, but we aren’t dealing with it.”
Courts, shareholders, consumers, government agencies, watchdog groups, media spotlight reports and more demand that we DO know what is in our data and that we actively manage and control it responsibly.
6
Why talk about this now?
608,087,870 – total number of records containing sensitive personal information involved in security breaches in the United States since January 2005 (Source: Privacy Rights Clearinghouse, June 2013)
– Big Data is a big deal
  • Everyone is talking about it: clients, investors, media, employees, management, government
  • Increasing data breach events keep the conversation alive
  • So do exploitations of information power
  • And reactions to that power
– People now expect that organizations are routinely collecting and mining data on their behavior, purchases, searches, posts, etc.
– They are starting to demand ethical, compliant & competent management of that data
– Costs to perform large-scale, complex analytics & hosting have come down enough to be financially viable for most organizations
“Facebook Tinkers With Users’ Emotions in News Feed Experiment” (6/24/2014)
“Google removes search results in wake of EU privacy ruling” (6/26/2014)
7
Typical Corporate Legal Document Populations
Email: messages + attachments, calendar entries, notifications…
Business Documents
– Contracts
– Financials
– Reports
– Sales & marketing materials
– Client & patient data
– Supplier information
– Media & social media
8
Content & Context
You know what content is: WHAT does the document say?
– Good for searching & comparing body text
– Good basis for data analytics
Context is everything else:
– WHO wrote this, saw this, knew this, received this, authorized this?
– WHEN was it sent, received, modified, copied, stored, deleted?
– WHY was this created? What was the intent, basis, plan, pattern of behavior?
– WHICH version of this content is the most important, recent, comprehensive, damaging?
Predictive analytics goes beyond content analysis into context and trend analysis.
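To make the content/context split concrete, here is a minimal sketch using only Python's standard `email` package. It assumes a single-part message. The sample message, its addresses, and the helper `split_content_and_context` are hypothetical and only illustrate the idea; this is not how Valora's platform works internally. The body answers WHAT; the headers answer WHO and WHEN.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message; the names echo the deck's example, the addresses are invented.
RAW_EMAIL = """\
From: Stuart Trumbull <strumbull@example.com>
To: Roberta Halstrom <rhalstrom@example.com>
Cc: Records Desk <records@example.com>
Date: Tue, 13 May 2014 09:15:00 -0400
Subject: Passaro matter

Please file the attached Motion in Limine for the Passaro matter.
"""


def split_content_and_context(raw: str) -> tuple[str, dict]:
    """Split a single-part email into content (WHAT it says) and context (WHO/WHEN)."""
    msg = message_from_string(raw)
    content = msg.get_payload().strip()  # body text; assumes a non-multipart message
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    context = {
        "author": getaddresses(msg.get_all("From", []))[0][1],
        "recipients": [address for _, address in recipients],
        "sent": parsedate_to_datetime(msg["Date"]),
        "subject": msg["Subject"],
    }
    return content, context


content, context = split_content_and_context(RAW_EMAIL)
print(context["author"], context["sent"].strftime("%A"))  # strumbull@example.com Tuesday
```

Content search sees only the body sentence. Context adds author, recipients, and a timezone-aware send date, and later analysis can correlate on these fields.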
9
Drawbacks to Classification without Context (Classification alone)
– Treats all document contents the same
  • Misses the notion of document context (who, what, where, when and how); email in particular has important attributes as a communication mechanism
– Makes retain/delete decisions on content only
  • Oversimplifies inherent or explicit decisioning hierarchies
  • Duplicative content vs. “best” content
  • Assumes all content instances are equal
– Assumes Backfile/Batch methodology only
  • Weak solution for Day Forward document creation or intake
  • Does not inform creation or approval of content
– Focused on cleanup, rather than asset management
  • Assumes no future value of past content; ignores business value
  • Does not prioritize results for future use
– Relegates to IT tool, rather than IG strategy
  • Unable to adapt to ongoing maintenance or contextual changes
  • Assumes technology is independent of policy creation & enforcement
  • Ignores evolving capabilities to define policy and ensure compliance
– Ignores PII, PHI & other content sensitivities
  • Assumes all content instances are equal
– Exists outside of data visualization
  • Missed opportunity for information presentation, forward-use asset management
10
First Example of Data Mining Documents for Corporate Legal
Data Mining for Information Governance: Enterprise Management, Classification & Control
11
Enterprise Email Management is a very good Case In Point
– Universal issue
– Involves several key IG problems:
  • Storage/hosting
  • Content analysis & classification
  • Context: correspondence, notification & record, date/time/file signatures, transmission & attachments, custodianship, etc.
  • Administration, management & maintenance
  • Elements of Backfile and Day Forward records management
– ESI is generally easier & lower cost to tackle than paper files
– Because of context, EEM is a hot-button issue with real budgets available
  • Investor & media attention
  • Customer concerns
  • Risk & compliance danger zone
  • Predecessor to managing social media
12
How a computer classifies an email with data mining (analytics)
INDEXING/TAGGING for Records & IG
Elements identified on the sample email:
– Author
– Doc Type & implied attachment range
– Matter indicator & validation
– Author validation & contact info
– Implied matter: Passaro
13
Additional Info Data Analytics Can Determine
– What DocType is this? An email with an attachment
– Who created this? Who is the author? Stuart Trumbull, Partner at DCH
– Who is receiving this? Why? Roberta Halstrom, paralegal; work instruction/direction
– What is the Author-Recipient relationship? Supervisor-subordinate
– What are important words, patterns & concepts? “please file,” “Motion in Limine,” “Passaro matter”
– How is the attachment related? Author match; Passaro match; key Motion content
– What else is known about this party?
  • Wrote 14 emails that day
  • 94% of “Passaro” mentions include him as author/recipient/cc
  • 7 instances as Pleading Author with Passaro matter(s)
  • Halstrom & assistant/associate on 48% of Trumbull + Passaro content
– What other context can be inferred?
  • Tuesday = 5/13/14
  • 15 date-correlated instances of 5/13/14 with Passaro docket
  • Tone is neutral-friendly, professionally appropriate
– What presents better visually?
  • Topics over time
  • Relationship between Trumbull & others
  • Passaro matter against other matters
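A statistic like “94% of ‘Passaro’ mentions include him as author/recipient/cc” is a simple ratio over tagged messages. The sketch below computes that kind of share. The `Message` record, the `participation_share` helper, and the four sample messages are hypothetical illustrations, and the match is a plain case-insensitive substring test rather than real entity resolution.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    author: str
    recipients: list[str]
    cc: list[str] = field(default_factory=list)
    text: str = ""


def participation_share(messages: list[Message], term: str, person: str) -> float:
    """Fraction of messages mentioning `term` in which `person` is author, recipient or cc."""
    mentions = [m for m in messages if term.lower() in m.text.lower()]
    if not mentions:
        return 0.0
    involved = [m for m in mentions if person in {m.author, *m.recipients, *m.cc}]
    return len(involved) / len(mentions)


# Hypothetical mini-collection; names echo the slide, the messages are invented.
messages = [
    Message("Trumbull", ["Halstrom"], text="Please file the Passaro motion"),
    Message("Halstrom", ["Ortiz"], cc=["Trumbull"], text="Passaro docket updated"),
    Message("Chen", ["Ortiz"], text="Passaro invoice attached"),
    Message("Trumbull", ["Ortiz"], text="Lunch on Friday?"),
]
print(f"{participation_share(messages, 'Passaro', 'Trumbull'):.0%}")  # 67%
```

Three messages mention Passaro, and Trumbull takes part in two of them. A production system would normalize name variants (for example “pasaro”) before counting.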
14
Second Example of Data Mining Documents for Corporate Legal
Data Mining for Litigation & eDiscovery: Document Review & Production
15
INDEXING/TAGGING for eDiscovery
– DocType = Patent Application
– Date Format = US
– Date = 10/18/2007
– Author = Patent Authors, Author City, Author Country
– Assignee = RIM
– Tone = Neutral to slightly positive
– Embedded Graphic with Title
Other Capturable Data Elements:
– Patent Number
– Filing Date
– Key Phrases & Terms
– Managing PTO
– Implied/Attached Docs
– Bar Code Present
– And many more . . .
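Tagging the date format (here “US”) before capturing the date matters because strings like 03/04/2007 are ambiguous. The minimal sketch below assumes a hypothetical mapping from format tags to `strptime` patterns. Only the “US” tag comes from the slide; the other tags and the `normalize_date` helper are illustrative.

```python
from datetime import date, datetime

# Hypothetical mapping from a detected "Date Format" tag to a strptime pattern.
DATE_PATTERNS = {
    "US": "%m/%d/%Y",   # month/day/year, as in 10/18/2007
    "EU": "%d/%m/%Y",   # day/month/year
    "ISO": "%Y-%m-%d",
}


def normalize_date(raw: str, date_format: str) -> date:
    """Parse a captured date string using the document's detected date format."""
    return datetime.strptime(raw.strip(), DATE_PATTERNS[date_format]).date()


print(normalize_date("10/18/2007", "US"))  # 2007-10-18
print(normalize_date("03/04/2007", "US"), normalize_date("03/04/2007", "EU"))
```

The same string yields March 4 under the US tag and April 3 under the EU tag. Parsing 10/18/2007 with the EU tag raises `ValueError`, which is a useful signal that the format tag is wrong.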
16
Indexing/Tagging ANALYSIS/RULES for eDiscovery

Litigation Document Review Manual: Determining Responsiveness
The document should be marked responsive if any of the following conditions are present:
– Mentions or discusses the specific protocol for handling simultaneous voice and data actions
– Is a design document or graphic that shows the specific protocol for handling simultaneous voice and data actions
– Discusses or is related to patent ‘009
– Mentions Apple Inc. or Apple Computer, Inc., or is a communication from/to anyone at Apple Computer, Inc., or apple.com
– And so on…

Rule: Responsive for Protocol Discussion
When: [FullText] contains any of <Voice protocol key phrases 12> and [FullText] contains any of <Data protocol key phrases 25> and [DocType] is not any of [Brochure, Press Release, Website], ...
Rule: Responsive for Patent ‘009
When: Any document in the Attachment Family matches: [FullText] contains any of <Patent '009 key phrase list 4>, or Parent of Attachment Family matches: Any of [Author, Recipient, CCs] contains any of <Patent '009 experts contact list 23>, …
Rule: Responsive for Apple
When: [FullText] contains (fuzzy match) any of <Apple key phrase list 7>, or Any of [Authors, Recipients, CCs] contains any of <Apple contact list 15>, or [Author] matches …
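The review-manual rules above translate naturally into named predicates over tagged fields. The sketch below encodes two of them. The phrase and contact lists are hypothetical short stand-ins for the slide's lists (which the deck only counts, e.g. 12 voice phrases), matching is exact rather than fuzzy, and the attachment-family rule for patent ‘009 is left out. It illustrates the rule structure and does not reproduce Valora's rule engine.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    doc_type: str
    full_text: str
    participants: list[str] = field(default_factory=list)  # authors, recipients, CCs


def contains_any(text: str, phrases: list[str]) -> bool:
    """Case-insensitive substring test; exact matching only, no fuzzy match."""
    lowered = text.lower()
    return any(phrase.lower() in lowered for phrase in phrases)


# Hypothetical, heavily abbreviated stand-ins for the slide's key phrase and contact lists.
VOICE_PHRASES = ["voice call", "circuit-switched"]
DATA_PHRASES = ["data session", "packet data"]
APPLE_PHRASES = ["apple inc", "apple computer"]
APPLE_CONTACTS = ["@apple.com"]
EXCLUDED_TYPES = {"Brochure", "Press Release", "Website"}

RULES: list[tuple[str, Callable[[Doc], bool]]] = [
    ("Responsive for Protocol Discussion",
     lambda d: contains_any(d.full_text, VOICE_PHRASES)
     and contains_any(d.full_text, DATA_PHRASES)
     and d.doc_type not in EXCLUDED_TYPES),
    ("Responsive for Apple",
     lambda d: contains_any(d.full_text, APPLE_PHRASES)
     or any(contains_any(p, APPLE_CONTACTS) for p in d.participants)),
]


def matching_rules(doc: Doc) -> list[str]:
    """Names of every rule the document satisfies; any match marks it responsive."""
    return [name for name, rule in RULES if rule(doc)]


memo = Doc("Memo", "Handling a voice call during an active data session")
brochure = Doc("Brochure", "Take a voice call while in a data session!")
note = Doc("Email", "Meeting notes attached", ["jdoe@apple.com"])
for doc in (memo, brochure, note):
    print(doc.doc_type, matching_rules(doc))
```

Returning the names of all matching rules, not just a yes/no flag, keeps a record of why each document was marked responsive.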
17
Data Presentation & Visualization
18
What is Data Visualization?
Simple visual representation of relationships and patterns in document data
– Common examples:
  • Graph of sales over time
  • Distribution by ethnicity
  • Word clouds & heat maps
  • USA Today-style graphics
– Use of charts, graphs, dashboards, animation and sound to help convey important connections
“Data visualization is a general term that describes any effort to help people understand the significance of data by placing it in a visual context.” – TechTarget
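A word cloud is usually drawn from a table of term frequencies. The sketch below builds that table with `collections.Counter`. The stop-word list and sample texts are hypothetical, and a real deployment would use a fuller stop-word list and probably stemming.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real deployments would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "for", "in", "on", "is", "please"}


def term_frequencies(texts: list[str], top_n: int = 5) -> list[tuple[str, int]]:
    """Count non-stop-word terms across documents: the input to a word cloud."""
    counts = Counter(
        word
        for text in texts
        for word in re.findall(r"[a-z']+", text.lower())
        if word not in STOP_WORDS
    )
    return counts.most_common(top_n)


texts = [
    "Please file the Passaro motion",
    "Passaro docket update",
    "Motion to compel filed in Passaro",
]
print(term_frequencies(texts, top_n=2))  # [('passaro', 3), ('motion', 2)]
```

In the cloud, each term's font size would scale with its count. Grouping the same counts by document date gives the “topics over time” view.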
20
How Valora Technologies Data Mines Documents & Presents Information Visually
– PowerHouse™: Automated Data Mining Services Platform
– BlackCat™: Hosting & Data Visualization Presentation Layer
21
PowerHouse & BlackCat Architecture
BlackCat Presentation Layer
PowerHouse Platform Layer
– PH AutoProcessors (Pattern-Matching Algorithms & Rules)
– PH Quality Control User Interface (QCUI)
– PH Admin Console (Admin)
SQL Server Database Layer
22
PowerHouse Platform Capabilities
– AutoUnitization: Ability to distinguish the beginning & end of documents, as well as determine which documents incorporate other documents as attachments
– AutoCoding: Identify and label documents by type (balance sheet, tax form, memo, etc.), relevant people (authors, recipients, cc/bcc), date and subject/title
– AutoReview: Identify and label documents by groupings (dupes/near dupes, conversation threads, issues/clustering) and disposition (responsive, privileged, “hot,” etc.)
– AutoRedaction: Ability to identify & mark up documents to “black out” select information (such as personally identifiable information [PII], patient data or privileged information)
– AutoTranslation: Automatic translation of non-English documents to English text; supports dozens of originating languages
– Electronic File Processing (EFP): File conversion to TIF/PDF format, text and metadata extraction, de-NISTing, cross-custodian de-duplication, filtering/culling, analytics
– OCR: Optical Character Recognition for converting images to searchable text
– NearDuplicateDetection: Identify documents that are highly similar, if not identical, across custodians and the entire population; includes cross-correlation of paper & electronic documents
– ThreadGrouping: Join separated conversation threads into a consistent stream from start to finish
– AutoBusinessRules: Identify and label documents by workflow treatment, retention plans, compliance audit or other groupings; initiate notifications or other actions based on incoming data
– Audio & Video Files Mining: Identify key data elements from audio or video content
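One common, textbook way to implement near-duplicate detection is to break each document into overlapping word sequences (“shingles”) and compare the sets with Jaccard similarity. The sketch below follows that approach. The deck does not say which technique PowerHouse uses. The shingle size k = 3, the 0.8 default threshold, and the sample documents are assumptions, and the pairwise loop is only practical for small sets; large collections typically use MinHash or similar indexing.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Overlapping k-word sequences ("shingles") from normalized text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 1.0


def near_duplicates(docs: dict[str, str], threshold: float = 0.8, k: int = 3):
    """All document pairs whose shingle sets overlap at or above `threshold`."""
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    pairs = []
    for (id_a, sa), (id_b, sb) in combinations(sets.items(), 2):
        score = jaccard(sa, sb)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 3)))
    return pairs


docs = {
    "A": "Please file the attached motion for the Passaro matter today",
    "B": "Please file the attached motion for the Passaro matter by Friday",
    "C": "Lunch is at noon on Friday in the main conference room",
}
print(near_duplicates(docs, threshold=0.6))  # [('A', 'B', 0.7)]
```

A and B share 7 of 10 distinct shingles, so they score 0.7: flagged at a 0.6 threshold but not at the 0.8 default. Where the cutoff sits is a policy decision as much as a technical one.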
23
Selected Valora Client Use Cases of Document Data Mining for Corporate Legal: Records Management and Litigation & eDiscovery
– AutoIndex 400,000 files per day for 4 months
– AutoRedact SSN & TID from credit applications
– Host online “Bidder’s Library” of 100 years of scanned records
– AutoBusinessRules for document retention & compliance
– Convert paper medical records to digital format with embedded indexing
– AutoReview 1.5M files for responsiveness, privilege & hot docs
– AutoIndex 3M FOIA request documents
– AutoTranslate Japanese, Spanish, French & German docs to English
– Oversee & manage 6-city simultaneous data collection & conversion
– AutoRedact personally identifying information (PII)
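The simplest form of AutoRedact for SSNs and tax IDs is pattern matching. The sketch below assumes SSNs appear as 123-45-6789 and that “TID” means an EIN-style tax ID written as 12-3456789. Both formats are assumptions, and the `redact` helper is hypothetical. Real credit applications also contain unformatted or OCR-damaged numbers that simple regular expressions will miss.

```python
import re

# Assumed formats: SSN as 123-45-6789, EIN-style tax ID (TID) as 12-3456789.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "TID": re.compile(r"\b\d{2}-\d{7}\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a labeled placeholder; return text and per-label counts."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[REDACTED {label}]", text)
        counts[label] = n
    return text, counts


sample = "Applicant SSN 123-45-6789, employer TID 12-3456789."
redacted, counts = redact(sample)
print(redacted)  # Applicant SSN [REDACTED SSN], employer TID [REDACTED TID].
print(counts)    # {'SSN': 1, 'TID': 1}
```

The per-label counts are returned with the text so that a reviewer can audit how much was blacked out on each page.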
24
Typical Problems Valora Solves
Legal/Litigation/eDiscovery Problems
– Too many documents to review, cull & produce by hand
– Cost-effective alternative solutions to contract attorney & offshore labor “armies”
– Missing, poor, or ineffective metadata
– Re-unitization, organization, indexing & redacting of documents
– Bridging multi-language document populations to English
Records Management Problems
– Help automate defensible deletion efforts for IG
– Organize & control loose documents on shared drives, desktops, networks & devices
– Eliminate expensive and information-poor storage options
– Serve as automated intake for multiple content generation sources
Business Intelligence Problems
– Organize & control decades of contracts & agreements
– Provide brand integrity/protection data mining of public/private documents
– Forecasting & trending of topics, people & locations over time
– Loose, shared files analysis & control
Health Care Problems
– Heavy expense & time converting hardcopy medical records to EMRs/EHRs
– Cannot keep up with fax server data collection
– Cost-effective alternative solutions to “armies” of temp data entry coders
25
Why Valora?
Mature & Stable Company
– Brings years of experience and subject matter expertise to every engagement, small or large
– Long-term government contract (Mega 2-4) requires annual financial and security review
Owner-Operated Domestic Business
– Simple to get things done!
– Your data and documents stay here in the US
Time-Tested Proprietary Technology
– Our software has been in use since 2003, giving you the advantage of a system that has learned thousands of document types and document attributes
– Easily customized to meet your needs and demands
Lightning-Fast Processing
– Automated processes mean fast turnaround, allowing you to meet unrealistic deadlines
– Same-day service: in by 9, out by 5!
Unique Service Offerings
– We automate our solutions, allowing you to virtually eliminate manual labor, reducing time and cost
  • Redaction, Audio/Video coding, Translation, Review
– We process across paper and electronic collections, giving you the advantage of working with only one service provider
Flexible Pricing
– Pricing models designed to meet your specific document processing needs
– Subscription-based models that offer predictability
26
Valora Technologies, Inc.
Thank You!
For More Information:
Valora Technologies, Inc.
101 Great Road, Suite 220
Bedford, MA 01730