Tagging documents made easy, using machine learning

Presentation transcript:

Tagging documents made easy, using machine learning Brendan Clarke brendan@termset.com www.termSet.com

PART ONE – APPROACHES FOR BUILDING TAXONOMIES

TOP DOWN - APPROACH
Defines top-level containers and works downwards
Usually broad (3-10 wide) and shallow (3-4 deep)
Simple, high-level classification (functional)
A top-down approach defines containers for terms, usually starting with some global taxonomies such as locations, departments or products (used throughout the business). Lots of level 1 and 2 term sets define the function of the document, for example Departments -> HR. Level 3 may begin to define the content itself, for example Departments -> HR -> Policy Documents. This works well to classify content into the right areas. This is functional classification.
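
The Departments -> HR -> Policy Documents example can be pictured as a small tree. The following is a minimal sketch in Python, assuming a nested-dictionary representation; it is not how SharePoint stores term sets, and every term other than the three named in the talk is a made-up placeholder.

```python
# Hypothetical top-down taxonomy: broad and shallow.
# Only "Departments", "HR" and "Policy Documents" come from the talk;
# the other terms are illustrative assumptions.
TAXONOMY = {
    "Departments": {
        "HR": {"Policy Documents": {}},
        "Finance": {},
    },
    "Locations": {
        "London": {},
    },
}


def term_paths(tree, prefix=()):
    """Yield every term as a full path from its top-level container."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield " -> ".join(path)
        yield from term_paths(children, path)


paths = list(term_paths(TAXONOMY))
for p in paths:
    print(p)
```

Listing the full paths makes it easy to check that the structure stays within the 3-4 levels suggested above.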

TOP DOWN – TERMS
Manually defined or replicated from existing structures
Imported from other systems
Industry standards / purchased taxonomies
Terms are often defined by committees that involve specialist groups. Line-of-business systems or databases may contain data that can be imported (http://www.termset.com/blog/2016/8/25/loading-metadata-terms-into-sharepoint-using-powershell). SKOS is an interesting option for advanced taxonomies (https://www.w3.org/2001/sw/wiki/SKOS/Datasets); WAND is available off the shelf (http://www.wandinc.com/wand-taxonomy-library-portal.aspx).

TOP DOWN – SUMMARY
People / committee-driven approach
Some guesswork about what the terms should be
Simple, high-level classification (functional) – way better than folders!
Deciding terms without looking at your documents means guessing which terms will be effective. That said, a simple top-down taxonomy is 10x better than a folder structure: there is no duplication, because a document can be tagged within multiple areas.

BOTTOM UP - APPROACH
Terms driven by the words and phrases within your content
More complex taxonomies
Detailed, accurate terms that are subject or facet level
Bottom up means looking at the information you have in your content (usually documents and e-mails) and building taxonomies that are based on how you actually describe information. Bottom up results in a taxonomy that can describe the subject or facet of the document.

BOTTOM UP - TERMS
Manual analysis of the documents
Statistical analysis of terms and phrases
Natural language processing
How long does it take for people to read and process documents? See http://www.termset.com/calc/. Getting a working team of people to actually read documents is time-consuming and expensive, but if the information is valuable it may sometimes be worth it. There are tools that can analyse the frequency of words or phrases in your documents. They can be highly effective but need a lot of consultancy to make sense of the results. NLP is the future of text analysis (more later).
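
The statistical option, counting how often words or phrases occur, can be sketched in a few lines. This is a minimal illustration in Python, assuming plain-text input and a naive tokenizer; the function name `phrase_counts` and the sample documents are hypothetical, and real tools would also filter stop words and handle document formats.

```python
import re
from collections import Counter


def phrase_counts(documents, n=2):
    """Count n-word phrases across a list of document texts."""
    counts = Counter()
    for text in documents:
        # Naive tokenizer: lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


# Made-up sample content for illustration only.
docs = [
    "The annual leave policy covers annual leave requests.",
    "Submit annual leave requests to HR.",
]
top = phrase_counts(docs).most_common(1)
print(top)
```

The most frequent phrases are only candidate terms; as noted above, someone still has to interpret the results before they become a taxonomy.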

BOTTOM UP - SUMMARY
Technology-driven approach (or a very tough people process)
Produces detailed taxonomies that reflect the actual content
Extra granularity of tagging
A bottom-up approach can be used to describe the contents of the documents (not just the area).

AND THE WINNER IS…
Combining top down and bottom up is the best approach
Top down classifies the type of document
Bottom up classifies the subject of the document
New technology allows bottom up to be realistic

TermSet adds accurate, consistent metadata without placing any burden on end users or your IT team.
Builds taxonomies (bottom up) using NLP
Applies tags
Metadata as a service™
TermSet has a different approach. It manages every step of adding metadata to your SharePoint content. Projects can be completed in days or weeks instead of months or years. The application uses machine learning that can build over 400 taxonomies that relate to your data. You can also easily train it to apply tags that are important to you. A full list of features is available at http://www.termset.com/platform/

WHAT EXACTLY IS NLP?
Natural language processing is at the core of TermSet. We have an engine trained to recognise entities within documents. (First click) This is a BBC news article; when our engine reads the text it identifies entities such as people, locations and organisations. (Second click) In fact, we identify a vast array of information inside the documents, including concepts, sentiment and relationships.

DEMO – CREATING TERMS FROM YOUR DOCUMENTS USING NLP

PART TWO – APPLYING YOUR TAGS

MANUAL TAGGING
Adoption problem
Asbestos problem / GIGO
Challenging to do retrospectively (migration tools can help)
Every time you add a field that needs to be completed in order to save a document, you are impeding adoption of a new DMS. If you do mandate fields, many users will pick the first item on the list or just randomly pick anything in order to save the document. What do you do with the 1 million documents that came from a file share (or any other source without metadata)?

MANUAL TAGGING
Infer as many terms as possible from: document types, location, function
Mandate as few tags as possible
Stay shallow or flat with hierarchies
Manually tagging new content can work well. Always use default values to answer as many questions as possible before the user is involved (infer the metadata wherever possible). Keeping it simple is a good plan. Single lookup columns may be better than deep hierarchies.
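
Inferring metadata from location and document type might look something like the sketch below. It is written in Python under stated assumptions: the library names, the default values and the `infer_tags` helper are all hypothetical, and in SharePoint the equivalent would normally be configured as column defaults rather than coded.

```python
from pathlib import PurePosixPath

# Hypothetical defaults keyed by library (top-level folder) name.
LIBRARY_DEFAULTS = {
    "hr": {"Department": "HR"},
    "finance": {"Department": "Finance"},
}

# Hypothetical mapping from file extension to document type.
EXTENSION_TYPES = {".docx": "Document", ".xlsx": "Spreadsheet", ".pdf": "PDF"}


def infer_tags(path):
    """Pre-fill metadata from where a file lives and what kind of file it is."""
    p = PurePosixPath(path)
    tags = {}
    library = p.parts[0].lower() if p.parts else ""
    tags.update(LIBRARY_DEFAULTS.get(library, {}))
    doc_type = EXTENSION_TYPES.get(p.suffix.lower())
    if doc_type:
        tags["Document Type"] = doc_type
    return tags


tags = infer_tags("HR/Policies/annual-leave.docx")
print(tags)
```

Anything that cannot be inferred is left blank rather than guessed, so the user is only asked about the fields that genuinely need a human answer.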

MACHINE TAGGING
Simple machine tagging can use search to match taxonomy terms to the content of documents
More advanced taggers allow rules or weights to be assigned to each tag (tags not context aware)
New technologies (NLP) provide a new approach to creating taxonomies
There are a number of taggers for SharePoint that will look at your documents and apply tags from a taxonomy that you have defined. Some taggers ask for rules to be defined for each term (this can work well, but takes forever to get right).
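
A rule-and-weight tagger of the kind described above can be approximated as follows. This is a rough Python sketch, not the behaviour of any particular SharePoint product; the rules, weights, threshold and the `suggest_tags` function are assumptions chosen for illustration.

```python
import re

# Hypothetical rules: each tag lists phrases that signal it, with a weight.
TAG_RULES = {
    "Annual Leave": {"annual leave": 2.0, "holiday": 1.0},
    "Expenses": {"expense claim": 2.0, "receipt": 1.0},
}


def suggest_tags(text, rules=TAG_RULES, threshold=2.0):
    """Return tags whose weighted phrase matches reach the threshold, best first."""
    lowered = text.lower()
    scores = {}
    for tag, phrases in rules.items():
        score = 0.0
        for phrase, weight in phrases.items():
            # Whole-phrase matches only, counted once per occurrence.
            hits = len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
            score += hits * weight
        if score >= threshold:
            scores[tag] = score
    return sorted(scores, key=scores.get, reverse=True)


tags = suggest_tags("Please book annual leave before the holiday. Keep the receipt.")
print(tags)
```

The sketch shows why such taggers are not context aware: a phrase scores the same wherever it appears, and tuning the weights and threshold per term is the part that takes a long time to get right.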

TERMSET TAGGING
TermSet recommends the right taxonomies for each library (context-aware tagging)
TermSet automates building the underlying IA in SharePoint
Extra cool NLP tags can be added (summaries, sentiment and language)
Monitors for new documents and terms arriving into your world

DEMO – TAGGING DOCUMENTS

WRAP UP
TermSet automates a bottom-up approach to create and use taxonomies for SharePoint
Visit www.termset.com or e-mail brendan@termset.com for a free licence
If you need assistance with top-down taxonomies or you use a different DMS, please e-mail me to join the beta program for www.taxononica.com