1 ROI & Impact: Quantitative & Qualitative Measures for Taxonomies
Wednesday, 11 February 2009, 12:00 – 12:30 PM MST
Presented by Jay Ven Eman, Ph.D., CEO, Access Innovations, Inc. / Data Harmony
505.998.0800
DHUG 2009

2 First, some questions
• Do you know what a taxonomy is?
• Does your boss’s boss know? Care?
• What are YOU trying to accomplish?
• What are your objectives?
• What isn’t working? What is? How badly? How much? Who? Where?
Copyright © 2007 Access Innovations, Inc.

3 First, some questions - 2
• Who are your searchers?
  Internal? Intranet? External? Web? Fee based (commercial)? How many? What do they do? How do they do it?
• What are they seeking?
• Why?

4 First, some questions - 3
• Where are they looking?
• How many searching environments?
  Physical? Internal resources? External resources? Search interfaces?
• And so on…

5 “Meaning” starts with a knowledge organization system (KOS)
• Uncontrolled list
• Name authority file
• Synonym set/ring
• Controlled vocabulary
• Taxonomy
• Thesaurus
Scale: Not complex ($) to Highly complex ($$$$). LOTS OF OVERLAP!
Topic Map, Ontology, SKOS

6 The Pain of Search
Assumes 1000 employees and a 10% search time reduction. Times are in hours per week; salary is in $ per hour.
Mission critical | Percent | Number of Employees | Search & Use Time Per Week | Time Searching Per Week | Time Analysing Per Week | Average Loaded Salary | Annual Cost of Looking | After Search Time Reduction | Difference
High | 10 | 100 | 14 | 8.4 | 5.6 | 200 | 8,736,000 | 7,862,400 | 873,600
Medium | 80 | 800 | 12 | 7.2 | 4.8 | 150 | 44,928,000 | 40,435,200 | 4,492,800
Low | 10 | 100 | 10 | 6 | 4 | 100 | 3,120,000 | 2,808,000 | 312,000
Total | | | | | | | $56,784,000 | $51,105,600 | $5,678,400
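
The annual figures in the table can be reproduced with a short sketch. It assumes a 52-week year (this matches the slide's totals but is not stated there), and the names `Tier` and `annual_search_cost` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

WEEKS_PER_YEAR = 52  # assumption: reproduces the slide's annual figures


@dataclass
class Tier:
    name: str
    employees: int
    search_hours_per_week: float
    loaded_rate: float  # dollars per hour


def annual_search_cost(tier: Tier) -> float:
    """Yearly cost of the time one tier of employees spends searching."""
    return (tier.employees * tier.search_hours_per_week
            * WEEKS_PER_YEAR * tier.loaded_rate)


# Values taken from the table rows.
tiers = [
    Tier("High", 100, 8.4, 200),
    Tier("Medium", 800, 7.2, 150),
    Tier("Low", 100, 6.0, 100),
]

REDUCTION = 0.10  # search time reduction used on the slide

total_cost = sum(annual_search_cost(t) for t in tiers)
total_savings = total_cost * REDUCTION

print(f"Annual cost of looking: ${total_cost:,.0f}")
print(f"Savings at a 10% reduction: ${total_savings:,.0f}")
```

Changing `REDUCTION` or the per-tier hours shows how sensitive the savings estimate is to each assumption.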

7 ROI - Segments
• Cost of taxonomy system
• Indexing costs
• Cost of getting system ready
• Ongoing maintenance
• Increased efficiency
• Increased quality of retrieval
• Cost of legacy system maintenance

8 Taxonomy construction
Process | Terms/hr | # of terms | Cost/hr | Cost
From scratch | 4 | 5000 | $75 | $93,750
License | | | | $0 - $100K
License & customize | 6 | 5000 | $75 | $62,500 + $5,000
Auto-generate/cleanup + tool | 6 | 5000 | $75 | $62,500 + $100,000
Mapping | 8 | 5000 | $75 | $46,875
Copyright © 2005 Access Innovations, Inc.
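
The labor figures in this table follow from number of terms divided by terms per hour, times the hourly rate. The sketch below reproduces them; the function name `build_cost` and the reading of the "+ 5,000" as a fixed fee added to labor are assumptions, not something the slide states.

```python
def build_cost(terms: int, terms_per_hour: float, hourly_rate: float,
               fixed_fee: float = 0.0) -> float:
    """Labor cost to build a taxonomy, plus an optional fixed fee."""
    hours = terms / terms_per_hour
    return hours * hourly_rate + fixed_fee


scenarios = {
    "From scratch": build_cost(5000, 4, 75),
    # Assumption: the "+ 5,000" on the slide is a fee added to labor.
    "License & customize": build_cost(5000, 6, 75, fixed_fee=5000),
    "Mapping": build_cost(5000, 8, 75),
}

for name, cost in scenarios.items():
    print(f"{name}: ${cost:,.0f}")
```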

9 Indexing & Search Metrics
• Hit, Miss, Noise
• Subjective
  - Relevance
  - Aboutness
• Statistical
  - Precision
  - Recall
  - Level of effort

10 Hit, Miss, Noise
• Hit – exactly what a human indexer would use
• Miss – human indexer would use but system did not assign
• Noise – system assigned but human did not
  - Relevant noise – could have been assigned
  - Irrelevant noise – just plain wrong
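
One way to apply these definitions is to compare, per document, the set of terms a human indexer chose with the set the system assigned. The sketch below does this with set operations; the function name `classify` and the sample terms are hypothetical. Splitting noise into relevant and irrelevant still needs a human judgment and is not attempted here.

```python
def classify(human_terms: set[str], system_terms: set[str]) -> dict[str, set[str]]:
    """Compare system-assigned index terms with a human indexer's terms."""
    return {
        "hit": human_terms & system_terms,    # assigned by both
        "miss": human_terms - system_terms,   # human only
        "noise": system_terms - human_terms,  # system only
    }


# Hypothetical index terms for a single document.
human = {"taxonomies", "return on investment", "indexing"}
system = {"taxonomies", "indexing", "search engines"}

result = classify(human, system)
for label, terms in result.items():
    print(label, sorted(terms))
```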

11 Subjective
• Relevance
  - Reflects how akin it is to the user's request
• Aboutness
  - Reflects the topical match between the document content and the term
  - How well the topic describes what the document is about
• Varies with level of conceptual terms vs. factual terms in the thesaurus

12 Subjective
• “There is now a 92% accuracy rating on accounting and regulatory document search based on hit, miss and noise or relevance, precision and recall statistics…Access Innovations.” USGAO
• “IEEE had their system up and running in three days, in full production in less than two weeks.” Institute of Electrical and Electronics Engineers (IEEE)
• “The American Economic Association said its editors think using it is fun and makes time fly!” American Economic Association (AEA)
• “ProQuest CSA have achieved a 7-fold increase in productivity – thus they have four licenses.” ProQuest CSA
• “Weather Channel finds things 50% faster using Data Harmony. A significant saving in time.” The Weather Channel

13 Statistical
• Precision
  - Correct retrieval / Total retrieval
  - Hits / (Hits + Noise)
• Recall
  - Correct retrieval / Total correct in system
  - Hits / (Hits + Misses)
• Level of effort
  - Hits / (Hits + Misses + Noise)
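
The three ratios can be computed directly from hit, miss, and noise counts. This is a minimal sketch: the function names and the sample counts (85 hits, 10 misses, 15 noise) are made up for illustration and do not come from the presentation.

```python
def precision(hits: int, noise: int) -> float:
    """Hits / (Hits + Noise)."""
    total = hits + noise
    return hits / total if total else 0.0


def recall(hits: int, misses: int) -> float:
    """Hits / (Hits + Misses)."""
    total = hits + misses
    return hits / total if total else 0.0


def level_of_effort(hits: int, misses: int, noise: int) -> float:
    """Hits / (Hits + Misses + Noise)."""
    total = hits + misses + noise
    return hits / total if total else 0.0


# Hypothetical counts from a test batch.
hits, misses, noise = 85, 10, 15

print(f"Precision:       {precision(hits, noise):.1%}")
print(f"Recall:          {recall(hits, misses):.1%}")
print(f"Level of effort: {level_of_effort(hits, misses, noise):.1%}")
```

Each function returns 0.0 when its denominator is zero, a choice made here only to keep the sketch from raising on an empty batch.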

14 Cost Goals
• Cost Savings
  - Software/hardware
  - More efficient delivery systems
  - Retirement of legacy systems
• Cost Avoidance
  - Additional staff not needed to scale
  - Lower training costs

15 Productivity Goals
• Productivity gains
  - Employee productivity – fourfold
  - Get up to speed faster
  - Learn vocabulary faster
  - Able to capture people's knowledge in the rule base
  - Staff savings / redeployment
  - Elimination of new hires

16 Additional Benefits
• Revenue Generation
  - Higher hit rates
    • More purchases off the site
• Competitive advantage
  - Shorter product / sales cycles
  - Faster implementation
  - Better search experience
• Ability to meet regulatory requirements

17 Go – No Go
• Reach 85% precision to launch for productivity - assisted
• Reach 85% for filtering or categorization
  - Sorting for production
• Level of effort to get to 85%
• Integration into the workflow is efficient

18 Benchmarks
• 15 – 20% irrelevant returns / noise
• Amount of work needed to achieve 85% level
• How good is good enough?
  - Satisfice = satisfy + suffice
  - How much error can you put up with?

19 Example ROI Calculation
• Assume – 5,000 term thesaurus
  - 1.5 synonyms per term
  - 7,500 terms total
• Assume 85% accuracy
  - Use assisted for indexing
  - Use automatically for filtering
• Assume $75 per hour for staff
• Assume 10,000 records for test batch

20 Indexing costs with Data Harmony
• 80% of rules built automatically
  - 7,500 x 0.8 = 6,000
• 20% require complex rules
  - Average rule takes 5 minutes (actually MUCH faster using M.A.I. GUI)
  - 5 x 1,500 = 7,500 minutes
  - 7,500 minutes / 60 = 125 hours
  - 125 hours x $75 = $9,375
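
The rule-building estimate above can be checked step by step. In this sketch the constant names are hypothetical; the values are the ones on the slide.

```python
TOTAL_TERMS = 7_500        # thesaurus terms including synonyms
AUTO_PERCENT = 80          # share of rules built automatically
MINUTES_PER_RULE = 5       # average time for a complex rule
HOURLY_RATE = 75           # staff cost in dollars per hour

auto_rules = TOTAL_TERMS * AUTO_PERCENT // 100
manual_rules = TOTAL_TERMS - auto_rules
manual_hours = manual_rules * MINUTES_PER_RULE / 60
rule_building_cost = manual_hours * HOURLY_RATE

print(f"Automatic rules: {auto_rules}")
print(f"Manual rules:    {manual_rules}")
print(f"Manual hours:    {manual_hours:g}")
print(f"Cost:            ${rule_building_cost:,.0f}")
```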

21 Indexing Costs
• Base cost of MAIstro EE - $60,000
• Cost of getting system ready
  - Programming support and integration
    • Estimated at 2 weeks (80 hours) programming at $125 / hour = $10,000
  - Rule building
    • Estimated at 125 hours at $75 / hour = $9,375
    • Possible need to re-run training set several times
• Ongoing maintenance
  - Estimated at 15% of purchase price for license = $9,000
  - Rule building for new terms, 50 terms per quarter
    • 200 terms per year x 0.8 = 160 automatic
    • 40 at 5 minutes per term = 200 minutes / 60 = 3.33 hours x $75 = $250
• Targeted initial accuracy at 85%

22 Indexing costs
• Year one
  - $60,000 + $10,000 + $9,375 = $79,375
• Years thereafter
  - $9,000 + $250 = $9,250
• 85% accuracy
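
The year-one and recurring figures can be rolled into a multi-year total. The sketch below follows the slide in leaving maintenance out of year one; the 40-hour work week behind the $10,000 integration estimate is an assumption, and `cumulative_cost` is a hypothetical helper.

```python
LICENSE = 60_000                 # base cost of MAIstro EE
INTEGRATION = 80 * 125           # assumption: two 40-hour weeks at $125/hour
RULE_BUILDING = 125 * 75         # 125 hours at $75/hour

MAINTENANCE = LICENSE * 15 // 100            # 15% of the license price
MANUAL_NEW_TERMS = 200 * 20 // 100           # 20% of 200 new terms per year
NEW_TERM_RULES = MANUAL_NEW_TERMS * 5 * 75 / 60  # 5 minutes each at $75/hour

year_one = LICENSE + INTEGRATION + RULE_BUILDING
later_years = MAINTENANCE + NEW_TERM_RULES


def cumulative_cost(years: int) -> float:
    """Total indexing cost over `years` years, year one included."""
    if years < 1:
        raise ValueError("years must be at least 1")
    return year_one + later_years * (years - 1)


print(f"Year one:        ${year_one:,.0f}")
print(f"Each later year: ${later_years:,.0f}")
print(f"Three-year cost: ${cumulative_cost(3):,.0f}")
```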

23 ROI
• Taxonomy costs = $67,500
• Indexing costs = $79,375
• Pain of search – difference = $5,678,400
• If off by factor of 4, then a positive ROI of 241%
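
The slide does not show how the 241% is derived. One reading that reproduces it, offered here only as an assumption, is that the savings are overstated by a factor of 4 and the costs are understated by a factor of 4, with the result expressed as savings divided by costs. The variable names below are hypothetical.

```python
taxonomy_cost = 67_500
indexing_cost = 79_375
search_savings = 5_678_400   # "pain of search" difference

total_cost = taxonomy_cost + indexing_cost

# Savings-to-cost ratio using the figures exactly as stated.
stated_ratio = search_savings / total_cost

# Assumed reading of "off by factor of 4": savings are 4x too high
# AND costs are 4x too low.
ERROR_FACTOR = 4
pessimistic_ratio = (search_savings / ERROR_FACTOR) / (total_cost * ERROR_FACTOR)

print(f"Total cost:        ${total_cost:,}")
print(f"Stated ratio:      {stated_ratio:.1f}x")
print(f"Pessimistic ratio: {pessimistic_ratio:.1%}")
```

Under this reading the figure is a savings-to-cost ratio; subtracting the cost first (a net return) would give a lower percentage.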

