How AutoIndexing Works: The Steps before BlackCat Data Visualization

How AutoIndexing Works: The Steps before BlackCat Data Visualization
13 Years (11/2000 – 11/2013)
January 2014. Confidential.

Who We Serve
- Corporate legal departments with complex document/data/content management needs: litigation, compliance, records/information governance
- Government agencies with limited resources for document/data/content monitoring, analysis and management: investigations
- Law firms and service providers who support these entities

What We Do
- Utilize technology to understand and interpret documents (or files, records, streamed text, etc.), using Probabilistic Hierarchical Context-Free Grammars and Statistical Pattern-Matching
- Tag documents with as many attributes and indices as possible
- Analyze those tags, along with text and other clues, to provide a disposition on documents
- Report the results in a variety of ways
This presentation centers on tagging.
Ultimately, Valora is a consulting service provider, using our own highly customized tools to deliver excellent, timely and highly cost-effective work product to our clients.

Many Levels of Analytics & Data Mining
Within a document:
- Character: @
- Word: Covenant
- Phrase: Attorney-client privilege
- Line: SKU 2465 @ $3.41 ea.
- Paragraph
- Page
- Document
Across multiple documents:
- Cross-Population
- Sub-population
- Population
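The line level is the easiest of these to illustrate. The sketch below assumes a hypothetical regular expression for order lines shaped like the slide's example. It scans a document line by line and returns SKU and unit-price pairs. It only shows the idea of line-level extraction and is not Valora's pattern-matching engine.

```python
import re
from decimal import Decimal

# Hypothetical line-level pattern for order lines such as "SKU 2465 @ $3.41 ea."
LINE_ITEM = re.compile(
    r"SKU\s+(?P<sku>\d+)\s*@\s*\$(?P<price>\d+(?:\.\d{2})?)\s*ea\.?",
    re.IGNORECASE,
)


def extract_line_items(text: str) -> list[tuple[str, Decimal]]:
    """Scan a document line by line and return (sku, unit_price) pairs."""
    items = []
    for line in text.splitlines():
        for match in LINE_ITEM.finditer(line):
            # Decimal avoids binary floating-point rounding on currency values
            items.append((match.group("sku"), Decimal(match.group("price"))))
    return items


sample = "Order confirmation\nSKU 2465 @ $3.41 ea.\nSKU 118 @ $12.00 ea."
print(extract_line_items(sample))
```

Character-, word- and phrase-level tags work the same way with narrower patterns. Page- and population-level analysis aggregate these results instead.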

Valora’s Proprietary Technology
Valora’s view of the document review process: Index → Analysis → Reporting
- INDEXING/TAGGING: Date, Author, Patent #, …
- ANALYSIS/RULES: Year Total, Hot Doc, Priv, …
- REPORTING: BlackCat, Relativity, .CSV, …
Valora’s underlying technology, PowerHouse:
- Developed in 2001
- Used in thousands of projects
- Applied against millions of pages/documents over the years
- Works off extracted text or OCR
- Pattern-matching algorithms
- Automated determination of document attributes (“DNA”): Doc Type, Doc Date, Doc Sentiment (neutral, hostile, etc.), Doc Author
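Automated determination of a document attribute such as Doc Type can be pictured as scoring the text against cue phrases for each candidate type. In the sketch below, both the cue lists and the simple count-the-hits scoring are assumptions for illustration. PowerHouse's actual pattern-matching algorithms are not described in the slides.

```python
# Hypothetical cue phrases per document type (assumptions for illustration only)
DOC_TYPE_CUES = {
    "Patent Application": ["patent application", "claims", "inventor", "assignee"],
    "Memo": ["memorandum", "to:", "from:", "re:"],
    "Press Release": ["for immediate release", "press release", "media contact"],
}


def guess_doc_type(text: str) -> str:
    """Score each document type by how many of its cue phrases appear in the text."""
    lowered = text.lower()
    scores = {
        doc_type: sum(cue in lowered for cue in cues)
        for doc_type, cues in DOC_TYPE_CUES.items()
    }
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    # Fall back to "Unknown" when no cue phrase matched at all
    return best_type if best_score > 0 else "Unknown"


print(guess_doc_type("United States Patent Application. Inventor: J. Smith. Assignee: RIM"))
```

A production system would weight cues and use layout clues as well. The structure stays the same: text goes in and a single attribute value comes out.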

INDEXING/TAGGING (example: a patent application)
- DocType = Patent Application
- Date = 10/18/2007
- Date Format = US
- Author = Patent Authors, Author City, Author Country
- Assignee = RIM
- Tone = Neutral to slightly positive
- Embedded Graphic with Title
Other capturable data elements:
- Patent Number
- Filing Date
- Key Phrases & Terms
- Managing PTO
- Implied/Attached Docs
- Bar Code Present
- And many more…
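The "Date Format = US" tag implies deciding whether a slash-delimited date is month-first or day-first. The sketch below makes that decision only when one component exceeds 12. For ambiguous dates it assumes US order and marks the tag as assumed. That fallback is a hypothetical policy, not one stated in the slides.

```python
import re
from datetime import date

DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def extract_dates(text: str) -> list[tuple[date, str]]:
    """Return (date, format_label) for each slash-delimited date found in the text."""
    results = []
    for match in DATE_PATTERN.finditer(text):
        first, second, year = (int(g) for g in match.groups())
        if first > 12 and second <= 12:
            label, month, day = "International", second, first
        elif second > 12 and first <= 12:
            label, month, day = "US", first, second
        else:
            # Ambiguous (e.g. 03/04/2007): assume US order (hypothetical default)
            label, month, day = "US (assumed)", first, second
        try:
            results.append((date(year, month, day), label))
        except ValueError:
            continue  # skip strings that are not real calendar dates, e.g. 13/13/2007
    return results


print(extract_dates("Filed 10/18/2007"))
```

The calendar check in `datetime.date` rejects impossible values such as 13/13/2007. This keeps OCR noise from becoming a tagged date.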

AutoCoding Defined
AutoCoding is the application of software and techniques to capture information about a document.
- Bibliographic fields: Author, Recipient, CC/BCC, Date, Subject/Title, Document Type
- Characteristic fields: Draft, Confidential, Foreign Language, Pages Missing, Duplicate/Near-Duplicate, Conversation Thread
There are many flavors of AutoCoding:
- All use software to some degree
- All use OCR and text extracted from e-docs
- It is generally accepted that AutoCoding is faster and lower-cost than manual coding, but sometimes of lower quality
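For electronic email, several bibliographic fields can be read directly from the message header. The sketch below uses only Python's standard `email` package. It covers emails only; scanned paper documents would need pattern matching over OCR text instead. The field names in the returned dictionary are assumptions chosen to mirror the slide.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def autocode_email(raw: str) -> dict:
    """Capture bibliographic fields from an RFC 5322 message header."""
    msg = message_from_string(raw)

    def addresses(header: str) -> list[str]:
        # get_all returns every occurrence of the header; getaddresses splits address lists
        return [addr for _, addr in getaddresses(msg.get_all(header, []))]

    date_header = msg.get("Date")
    return {
        "Author": addresses("From"),
        "Recipient": addresses("To"),
        "CC": addresses("Cc"),
        "Date": parsedate_to_datetime(date_header).date().isoformat() if date_header else None,
        "Subject": msg.get("Subject", ""),
        "DocType": "Email",
    }


sample = (
    "From: Jane Roe <jane@example.com>\n"
    "To: bob@example.com, Ann Lee <ann@example.com>\n"
    "Cc: legal@example.com\n"
    "Date: Thu, 18 Oct 2007 09:30:00 -0400\n"
    "Subject: Protocol review\n"
    "\n"
    "Body text.\n"
)
print(autocode_email(sample))
```

BCC recipients usually do not appear in a received message's header. That field typically comes from the sender's copy or from mail-server metadata.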

How AutoCoding Works
Docs enter the system as extracted or OCR’ed text, then pass through the PowerHouse stages:
- AutoIndexing
- AutoBusinessRules
- Analytics
- Database Prep: data is extracted from each document into a database table
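The first and last stages can be sketched together: index each document's text into fields, then load one row per document into a table. The sketch below makes several assumptions. It uses an in-memory SQLite database and a hypothetical two-field extractor (first date, first patent number). The column names are illustrative.

```python
import re
import sqlite3

DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
PATENT_RE = re.compile(r"\bPatent\s+No\.?\s*([\d,]*\d)", re.IGNORECASE)


def index_document(doc_id: str, text: str) -> dict:
    """Hypothetical AutoIndexing step: pull a few fields out of extracted/OCR text."""
    date_match = DATE_RE.search(text)
    patent_match = PATENT_RE.search(text)
    return {
        "doc_id": doc_id,
        "doc_date": date_match.group(0) if date_match else None,
        "patent_no": patent_match.group(1) if patent_match else None,
        "full_text": text,
    }


def build_table(docs: dict[str, str]) -> sqlite3.Connection:
    """Database Prep: one row per document, one column per extracted field."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "doc_id TEXT PRIMARY KEY, doc_date TEXT, patent_no TEXT, full_text TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents VALUES (:doc_id, :doc_date, :patent_no, :full_text)",
        (index_document(doc_id, text) for doc_id, text in docs.items()),
    )
    conn.commit()
    return conn


conn = build_table({
    "DOC-001": "See Patent No. 7,123,456 dated 10/18/2007.",
    "DOC-002": "Meeting notes, no dates recorded.",
})
print(conn.execute("SELECT doc_id, doc_date, patent_no FROM documents ORDER BY doc_id").fetchall())
```

Missing fields are stored as NULL rather than guessed. Later rules can then distinguish "no date found" from an actual value.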

ANALYSIS/RULES: Litigation Document Review

Manual: Determining Responsiveness
The document should be marked responsive if any of the following conditions are present:
- Mentions or discusses the specific protocol for handling simultaneous voice and data actions
- Is a design document or graphic that shows the specific protocol for handling simultaneous voice and data actions
- Discusses or is related to patent ’009
- Mentions Apple Inc. or Apple Computer, Inc., or is a communication from/to anyone at Apple Computer, Inc. or apple.com
- And so on…

The same criteria expressed as rules:
Rule: Responsive for Protocol Discussion
When: [FullText] contains any of <Voice protocol key phrases 12> and [FullText] contains any of <Data protocol key phrases 25> and [DocType] is not any of [Brochure, Press Release, Website], …
Rule: Responsive for Patent ’009
When: Any document in the Attachment Family matches: [FullText] contains any of <Patent '009 key phrase list 4>, or the Parent of the Attachment Family matches: Any of [Author, Recipient, CCs] contains any of <Patent '009 experts contact list 23>, …
Rule: Responsive for Apple
When: [FullText] contains (fuzzy match) any of <Apple key phrase list 7>, or Any of [Authors, Recipients, CCs] contains any of <Apple contact list 15>, or [Author] matches "*@apple.com" …
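Two of the rules above can be expressed as plain predicates over a document's tagged fields. In the sketch below, the short phrase and contact lists are hypothetical stand-ins for the slide's numbered lists. The fuzzy match is simplified to a case-insensitive substring test, and the attachment-family rule for patent ’009 is omitted. The author wildcard uses the standard `fnmatch` module.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-ins for the slide's numbered lists (e.g. <Voice protocol key phrases 12>)
VOICE_PHRASES = ["voice call", "circuit-switched"]
DATA_PHRASES = ["packet data", "data session"]
EXCLUDED_DOC_TYPES = {"Brochure", "Press Release", "Website"}
APPLE_PHRASES = ["apple inc", "apple computer"]
APPLE_CONTACTS = {"outside.counsel@example.com"}


def contains_any(text: str, phrases: list[str]) -> bool:
    """Case-insensitive substring test (a simplification of the slide's fuzzy match)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in phrases)


def responsive_for_protocol(doc: dict) -> bool:
    return (
        contains_any(doc["FullText"], VOICE_PHRASES)
        and contains_any(doc["FullText"], DATA_PHRASES)
        and doc.get("DocType") not in EXCLUDED_DOC_TYPES
    )


def responsive_for_apple(doc: dict) -> bool:
    author = doc.get("Author", "").lower()
    people = [author] + [p.lower() for p in doc.get("Recipients", []) + doc.get("CCs", [])]
    return (
        contains_any(doc["FullText"], APPLE_PHRASES)
        or any(person in APPLE_CONTACTS for person in people)
        or fnmatchcase(author, "*@apple.com")
    )


RULES = {
    "Responsive for Protocol Discussion": responsive_for_protocol,
    "Responsive for Apple": responsive_for_apple,
}


def apply_rules(doc: dict) -> list[str]:
    """Return the name of every rule the document satisfies."""
    return [name for name, rule in RULES.items() if rule(doc)]


memo = {"FullText": "Handling a voice call during a packet data session",
        "DocType": "Memo", "Author": "eng@example.com"}
print(apply_rules(memo))
```

Keeping each rule as a named, independent predicate mirrors the slide's format. Every responsive call can be traced back to the rule that produced it.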

REPORTING: What would you like to know?
Numerous reporting options:
- Hosting, Early Case Assessment & DataVisualization: DataVisualization & Hosting in BlackCat™; hosting in other industry platforms
- Load File Import: Opticon/LFP, Summation DII; .CSV or other delimited file; loading to proprietary platforms
- Render to File: PDF, Excel, HTML
- Comprehensive Report
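Of the load-file options, a delimited .CSV is the simplest to sketch. The code below writes one row per tagged document using Python's standard `csv` module. The column names and the "; " separator for multi-valued fields are assumptions. Opticon/LFP and Summation DII are separate formats and are not reproduced here.

```python
import csv
import io


def to_csv_load_file(rows: list[dict], fields: list[str]) -> str:
    """Write one line per document; columns not listed in `fields` are dropped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        # Multi-valued fields are joined with "; " (an assumed convention, not a standard)
        flat = {key: "; ".join(value) if isinstance(value, list) else value
                for key, value in row.items()}
        writer.writerow(flat)
    return buffer.getvalue()


tagged = [
    {"DocID": "DOC-001", "DocType": "Memo", "Recipients": ["a@example.com", "b@example.com"],
     "Responsive": "Y", "FullText": "..."},
    {"DocID": "DOC-002", "DocType": "Press Release", "Recipients": [], "Responsive": "N"},
]
print(to_csv_load_file(tagged, ["DocID", "DocType", "Recipients", "Responsive"]))
```

The `csv` writer quotes any value that contains a comma, quote or line break. Free-text fields therefore survive the round trip into a review platform.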

Popular Valora Services
- AutoUnitization: Ability to distinguish the beginning and end of documents, as well as determine which documents incorporate other documents as attachments
- AutoCoding: Identify and label documents by type (balance sheet, tax form, memo, etc.), relevant people (authors, recipients, cc/bcc), date and subject/title
- AutoReview: Identify and label documents by grouping (dupes/near-dupes, conversation threads, issues/clustering) and disposition (responsive, privileged, “hot,” etc.)
- AutoRedaction: Ability to identify and mark up documents to “black out” select information (such as PII, personally identifiable information, patient data or privileged information)
- AutoTranslation: Automatic translation of non-English documents to English text; supports dozens of originating languages
- DataVisualization: Presentation of data in intuitive, graphical ways with easy navigation, understanding and manipulation of document subsets; often used for Early Case Assessment
- Hosting & Database Creation: Hosting of pre- or post-processed documents and files in Valora’s BlackCat database or others (iConect, Relativity, etc.)
- Electronic File Processing (EFP): File conversion to TIF/PDF format, text and metadata extraction, de-NISTing, cross-custodian de-duplication, filtering/culling, analytics
- OCR: Optical Character Recognition for converting images to searchable text
- NearDuplicateDetection: Identify documents that are highly similar, if not identical, across custodians and the entire population; includes cross-correlation of paper and electronic documents
- EmailThreadGrouping: Join separated email conversation threads into a consistent stream from start to finish
- Scanning: Image conversion of paper documents into electronic image formats (TIF, PDF, JPEG, etc.)
- AutoBusinessRules: Identify and label documents by workflow treatment, retention plans, compliance audit or other groupings
- Professional Services: Options for Project Management, Technical Data/File Manipulation, Subject Matter Expertise, Resources, and Workflow Design & Management
Most services are available as Auto, Auto+Manual “Hybrid,” and Manual-Only. Call for specifics.
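NearDuplicateDetection can be illustrated with a common textbook technique: break each document into overlapping word triples ("shingles") and compare the sets with Jaccard similarity. This is a named stand-in, not Valora's method. The shingle size and the 0.8 threshold are assumptions. The all-pairs loop is quadratic, and large populations would typically use MinHash or locality-sensitive hashing instead.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in the normalized text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Compare every pair of documents and report pairs at or above the threshold."""
    signatures = {doc_id: shingles(text) for doc_id, text in docs.items()}
    ids = sorted(signatures)
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            score = jaccard(signatures[first], signatures[second])
            if score >= threshold:
                pairs.append((first, second, round(score, 2)))
    return pairs


docs = {
    "DOC-1": "the quick brown fox jumps over the lazy dog today",
    "DOC-2": "The quick brown fox jumps over the lazy dog tonight",
    "DOC-3": "quarterly revenue grew in every region",
}
print(near_duplicates(docs, threshold=0.7))
```

Changing one word in a ten-word text alters only the shingles that contain it. DOC-1 and DOC-2 share 7 of 9 distinct shingles and score about 0.78, so the threshold choice decides whether they are flagged.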

Don’t take our word for it, take theirs…

Thank You!
For more information: Valora Technologies, Inc., 101 Great Road, Suite 220, Bedford, MA 01730. 781.229.2265. www.valoratech.com. info@valoratech.com