How AutoIndexing Works: The Steps before BlackCat Data Visualization
13 Years: 11/2000 – 11/2013
January 2014
Confidential
Who We Serve
  Corporate Legal Departments with complex document/data/content management needs
    Litigation
    Compliance
    Records
    Information Governance
  Government Agencies with limited resources for document/data/content monitoring, analysis, management
    Investigations
  Law firms and Service Providers who support these entities
What We Do
  Utilize technology to understand and interpret documents (or files, or records, or streamed text, etc.)
    Probabilistic Hierarchical Context-Free Grammars
    Statistical Pattern-Matching
  Tag documents with as many attributes and indices as possible
  Analyze those tags, along with text and other clues, to provide a disposition on documents
  Report the results in a variety of ways
  This presentation centers on Tagging.
  Ultimately, Valora is a Consulting Service Provider, utilizing our own, highly customized tools to deliver excellent, timely and highly cost-effective work product to our clients.
Many Levels of Analytics & Data Mining
  Character (e.g. @)
  Word (e.g. Covenant)
  Phrase (e.g. Attorney-client privilege)
  Line (e.g. SKU 2465 @ $3.41 ea.)
  Paragraph
  Page
  Document
  Multiple Documents
    Sub-population
    Population
    Cross-Population
Valora’s Proprietary Technology
  Valora’s view of the document review process: Index, Analysis, Reporting
    INDEXING/TAGGING: Date, Author, Patent # …
    ANALYSIS/RULES: Year Total, Hot Doc, Priv…
    REPORTING: BlackCat, Relativity, .CSV …
  Valora’s underlying technology
    Developed in 2001
    Used in thousands of projects
    Applied against millions of pages/documents over the years
    Works off extracted text or OCR
    Pattern Matching Algorithms
    Automated determination of document attributes (DNA)
      Doc Type
      Doc Date
      Doc Sentiment (neutral, hostile, etc.)
      Doc Author
  [Diagram label: PowerHouse]
INDEXING/TAGGING
  Example tags captured from one document:
    DocType = Patent Application
    Date = 10/18/2007
    Date Format = US
    Author = Patent Authors, Author City, Author Country
    Assignee = RIM
    Tone = Neutral to slightly positive
    Embedded Graphic with Title
    Other Data
  Capturable Data Elements:
    Patent Number
    Filing Date
    Key Phrases & Terms
    Managing PTO
    Implied/Attached Docs
    Bar Code Present
    And many more . . .
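The tagging idea on this slide can be illustrated with a small pattern-matching sketch. It is not Valora's implementation: the regular expressions, the cue phrases in `DOC_TYPE_CUES`, the `tag_document` function and the sample text are all hypothetical, chosen only to show how tags such as DocType, Date, Date Format and Assignee might be pulled from extracted or OCR'ed text.

```python
import re
from datetime import datetime

# Hypothetical patterns and cue phrases, for illustration only.
US_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
ASSIGNEE = re.compile(r"^Assignee:\s*(.+)$", re.MULTILINE)
DOC_TYPE_CUES = {
    "Patent Application": ("patent application",),
    "Press Release": ("for immediate release",),
}


def tag_document(text: str) -> dict:
    """Return the tags that simple pattern matching can find in the text."""
    tags = {}
    lowered = text.lower()

    # Document type: first type whose cue phrase appears in the text.
    for doc_type, cues in DOC_TYPE_CUES.items():
        if any(cue in lowered for cue in cues):
            tags["DocType"] = doc_type
            break

    # Date: first MM/DD/YYYY match that is a valid calendar date.
    match = US_DATE.search(text)
    if match:
        month, day, year = (int(part) for part in match.groups())
        try:
            tags["Date"] = datetime(year, month, day).date().isoformat()
            tags["Date Format"] = "US"
        except ValueError:
            pass  # looks like a date but is not one, e.g. 13/45/2007

    # Assignee: text following an "Assignee:" label on its own line.
    match = ASSIGNEE.search(text)
    if match:
        tags["Assignee"] = match.group(1).strip()

    return tags


sample = "UNITED STATES PATENT APPLICATION\nFiled: 10/18/2007\nAssignee: RIM"
print(tag_document(sample))
```

A production tagger would need many more patterns plus the statistical and grammar-based techniques named earlier in the deck; this sketch covers only the simplest pattern-matching case.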
AutoCoding Defined
  AutoCoding is the application of software and technique to capture information about a document.
    Bibliographic Fields: Author, Recipient, CC/BCC, Date, Subject/Title, Document Type
    Characteristic Fields: Draft, Confidential, Foreign Language, Pages Missing, Duplicate/NearDuplicate, Conversation Thread
  Many flavors of AutoCoding
    All use software to some degree
    All use OCR and extracted text from e-docs
  It is generally accepted that AutoCoding is faster and lower cost than manual coding, but sometimes lower quality.
How AutoCoding Works
  Docs enter the system as extracted or OCR’ed text
  Data is extracted from each document into a database table
  [Diagram labels: PowerHouse, AutoIndexing, AutoBusinessRules, Analytics, Database Prep]
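As a rough illustration of "data is extracted from each document into a database table", the sketch below stores one row per extracted field in an in-memory SQLite database. The table name `doc_tags`, its columns, the helper functions and the document IDs are assumptions made for this example; the slides do not describe the actual schema.

```python
import sqlite3


def load_tags(conn, rows):
    """Create the (hypothetical) tag table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_tags ("
        "doc_id TEXT NOT NULL, field TEXT NOT NULL, value TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO doc_tags (doc_id, field, value) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def docs_with(conn, field, value):
    """Return the IDs of documents whose given field has the given value."""
    cursor = conn.execute(
        "SELECT doc_id FROM doc_tags WHERE field = ? AND value = ? "
        "ORDER BY doc_id",
        (field, value),
    )
    return [row[0] for row in cursor]


conn = sqlite3.connect(":memory:")
load_tags(
    conn,
    [
        ("DOC-001", "DocType", "Patent Application"),
        ("DOC-001", "Date", "2007-10-18"),
        ("DOC-002", "DocType", "Press Release"),
    ],
)
print(docs_with(conn, "DocType", "Patent Application"))
```

One row per (document, field, value) keeps the table open-ended, which suits the goal of tagging documents with as many attributes as possible; a fixed-column layout would be an equally plausible choice.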
Litigation Document Review Manual
Manual Indexing/Tagging
ANALYSIS/RULES

Determining Responsiveness
  The document should be marked responsive if any of the following conditions are present:
    Mentions or discusses the specific protocol for handling simultaneous voice and data actions
    Is a design document or graphic that shows the specific protocol for handling simultaneous voice and data actions
    Discusses or is related to patent '009
    Mentions Apple Inc. or Apple Computers, Inc. or is a communication from/to anyone at Apple Computer, Inc., or
    And so on…

Rule: Responsive for Protocol Discussion
  When:
    [FullText] contains any of <Voice protocol key phrases 12> and
    [FullText] contains any of <Data protocol key phrases 25> and
    [DocType] is not any of [Brochure, Press Release, Website], ...

Rule: Responsive for Patent '009
  When:
    Any document in the Attachment Family matches:
      [FullText] contains any of <Patent '009 key phrase list 4>, or
    Parent of Attachment Family matches:
      Any of [Author, Recipient, CCs] contains any of <Patent '009 experts contact list 23>, …

Rule: Responsive for Apple
  When:
    [FullText] contains (fuzzy match) any of <Apple key phrase list 7>, or
    Any of [Authors, Recipients, CCs] contains any of <Apple contact list 15>, or
    [Author] matches "*“ …
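The "Responsive for Protocol Discussion" rule above can be sketched as a simple predicate over tagged documents. Only the three conditions visible on the slide are covered; the rule is truncated there and may have more. The phrase lists are placeholders, since the slide names the lists but not their contents, and the `Document` class and function names are hypothetical.

```python
from dataclasses import dataclass

# Placeholder phrase lists: the slide names these lists but does not show
# their entries, so the phrases below are invented for illustration.
VOICE_PHRASES = ("voice call", "voice channel")
DATA_PHRASES = ("data session", "packet data")
EXCLUDED_DOC_TYPES = {"Brochure", "Press Release", "Website"}


@dataclass
class Document:
    full_text: str
    doc_type: str


def contains_any(text, phrases):
    """Case-insensitive check for any phrase occurring in the text."""
    lowered = text.lower()
    return any(phrase.lower() in lowered for phrase in phrases)


def responsive_for_protocol_discussion(doc):
    """Apply the three conditions of the rule that appear on the slide."""
    return (
        contains_any(doc.full_text, VOICE_PHRASES)
        and contains_any(doc.full_text, DATA_PHRASES)
        and doc.doc_type not in EXCLUDED_DOC_TYPES
    )


memo = Document("Handling a voice call during an active data session.", "Memo")
brochure = Document("Voice call and data session support!", "Brochure")
print(responsive_for_protocol_discussion(memo))      # True
print(responsive_for_protocol_discussion(brochure))  # False
```

Keeping each rule as a small function over already-extracted fields mirrors the slide's split between tagging and rule evaluation: the rules never re-parse the document, they only read tags and text.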
REPORTING
What would you like to know?
[Diagram label: Analysis/Rules]
  Numerous Reporting Options
    Hosting, Early Case Assessment & DataVisualization
      DataVisualization & Hosting in BlackCat™
      Hosting in other industry platforms
    Load File Import
      Opticon/LFP, Summation DII
      .CSV, other delimited file
      Loading to proprietary platforms
    Render to File
      PDF, Excel, HTML
      Comprehensive Report
Popular Valora Services
  AutoUnitization: Ability to distinguish the beginning & end of documents, as well as determine which documents incorporate other documents as attachments.
  AutoCoding: Identify and label documents by type (balance sheet, tax form, memo, etc.), relevant people (authors, recipients, cc/bcc), date and subject/title.
  AutoReview: Identify and label documents by groupings (dupes/near dupes, conversation threads, issues/clustering) and disposition (responsive, privileged, “hot,” etc.).
  AutoRedaction: Ability to identify & mark up documents to “black out” select information (such as PII, i.e. personally identifiable information, patient data or privileged information).
  AutoTranslation: Automatic translation of non-English documents to English text. Supports dozens of originating languages.
  DataVisualization: Presentation of data in intuitive, graphical ways with easy navigation, understanding and manipulation of document subsets. Often used for Early Case Assessment.
  Hosting & Database Creation: Hosting of pre- or post-processed documents and files in Valora’s BlackCat database or others (iConect, Relativity, etc.).
  Electronic File Processing (EFP): File conversion to TIF/PDF format, text and metadata extraction, de-NISTing, cross-custodian de-duplication, filtering/culling, analytics.
  OCR: Optical Character Recognition for converting images to searchable text.
  NearDuplicateDetection: Identify documents that are highly similar, if not identical, across custodians and the entire population. Includes cross-correlation of paper & electronic documents.
  EmailThreadGrouping: Join separated email conversation threads into a consistent stream from start to finish.
  Scanning: Image conversion for paper documents into electronic image format (TIF, PDF, JPEG, etc.).
  AutoBusinessRules: Identify and label documents by workflow treatment, retention plans, compliance audit or other groupings.
  Professional Services: Options for Project Management, Technical data/file manipulation, Subject Matter Expertise, Resources & Workflow Design & Management.
  Most services are available as Auto, Auto+Manual “Hybrid,” and Manual-Only. Call for specifics.
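To make the NearDuplicateDetection service in the list above more concrete, here is a minimal sketch of one common textbook approach: comparing sets of word shingles with Jaccard similarity. The slides do not say how Valora detects near duplicates, so this is a stand-in technique, and the shingle size `k=3` and the `0.7` threshold are arbitrary assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (word windows) in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(a), shingles(b)
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def is_near_duplicate(a, b, threshold=0.7):
    """Hypothetical cutoff: at or above the threshold counts as a near dupe."""
    return jaccard(a, b) >= threshold


original = "please review the attached draft agreement before friday"
revised = "please review the attached draft agreement before monday"
unrelated = "quarterly balance sheet for the tax filing"

print(is_near_duplicate(original, revised))    # True
print(is_near_duplicate(original, unrelated))  # False
```

Pairwise comparison like this scales poorly to large populations; techniques such as MinHash are typically used to approximate the same similarity at scale.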
Don’t take our word for it, take theirs…
Thank You!
For More Information:
Valora Technologies, Inc.
101 Great Road, Suite 220
Bedford, MA 01730
781.229.2265