Taxonomies: Hidden but Critical Tools Marjorie M.K. Hlava President Access Innovations, Inc.


1 Taxonomies: Hidden but Critical Tools Marjorie M.K. Hlava President Access Innovations, Inc.

2 Industry in change Technology changes Evolving standards Mergers New buzzwords Hard to tell what is real

3 Popular Misconceptions Computers can do it all No need to index No need for thesauri or subject headings Full text gives all we need Automatic full text User friendly search engines Search engines are indexes User profiles provide the right context Data filters give right answers

4 Some of it is true What can we use? Automatic - semi - classification Depends….. Size of collection Cost of the effort

5 What’s in?? Taxonomies –thesauri –hierarchies - classification –categorization –browsing Wellformedness Bricks and mortar, i.e., profit

6 Options for Access/Control Keep track of the input –Thesaurus –Authority file Maximize the access –Search engine –Browse list Power of the word –McCain

7 What do we need? The basics... Authority file –People, places, things Taxonomy –Thesaurus* with authority file or document instance “Automatic” Classification

8 Thesaurus Construction Parts of a whole Noun and noun phrases People, places, things Actions and reactions Concepts and processes

9 Term Records - Thesaurus - format Main Entries Top Terms - TT Broader Terms - BT Narrower Terms - NT Scope Notes - SN History - HI Date Term - added/changed - DA

10 Thesaurus - Format Related Terms - RT See - S See Also - SA Use - U Use For - UF “Wellformedness” = W3C
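As an illustration of the term record format on slides 9 and 10, here is a minimal Python sketch. The class names `TermRecord` and `Thesaurus`, the method names, and the sample terms are assumptions made for this example; they do not come from the presentation or from any particular thesaurus product. The sketch keeps BT/NT links reciprocal and resolves a Use reference to its preferred term.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TermRecord:
    """One main entry; field names follow the tags on the slides."""
    term: str
    tt: List[str] = field(default_factory=list)  # Top Terms
    bt: List[str] = field(default_factory=list)  # Broader Terms
    nt: List[str] = field(default_factory=list)  # Narrower Terms
    rt: List[str] = field(default_factory=list)  # Related Terms
    uf: List[str] = field(default_factory=list)  # Use For (non-preferred variants)
    sn: str = ""  # Scope Note
    hi: str = ""  # History
    da: str = ""  # Date term added/changed


class Thesaurus:
    """Hypothetical container that keeps reciprocal links consistent."""

    def __init__(self) -> None:
        self.records: Dict[str, TermRecord] = {}
        self.use: Dict[str, str] = {}  # non-preferred term -> preferred term

    def entry(self, term: str) -> TermRecord:
        # Create the record on first use, otherwise return the existing one.
        return self.records.setdefault(term, TermRecord(term))

    def add_broader(self, term: str, broader: str) -> None:
        # A BT on one record implies an NT on the other.
        child, parent = self.entry(term), self.entry(broader)
        if broader not in child.bt:
            child.bt.append(broader)
        if term not in parent.nt:
            parent.nt.append(term)

    def add_use_for(self, preferred: str, variant: str) -> None:
        # UF on the preferred term pairs with a Use reference from the variant.
        record = self.entry(preferred)
        if variant not in record.uf:
            record.uf.append(variant)
        self.use[variant] = preferred

    def preferred(self, term: str) -> str:
        # Follow a Use reference if one exists; otherwise the term stands.
        return self.use.get(term, term)


# Illustrative terms only.
thesaurus = Thesaurus()
thesaurus.add_broader("Thesauri", "Controlled vocabularies")
thesaurus.add_use_for("Thesauri", "Thesauruses")
```

Storing the Use references in a separate lookup keeps non-preferred terms out of the main entry list while still letting an indexer or searcher land on the preferred term.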

11 What are the parts? Natural Language Processing Term forms Term Relationships Term Associations

12 Natural Language Processing Morphological Lexical Analysis Syntactic Numerical Phraseological Semantic Analysis Pragmatic

13 Seven Major Parts of NLP 1. Morphological – plural – past tense to present

14 Seven Major Parts of NLP 2. Lexical Analysis – part of speech tagging 3. Syntactic analysis – noun phrase id –proper name boundary

15 Seven Major Parts of NLP 4. Numeric concept boundary 5. Semantic analysis –Proper name concept categorization –Numeric concept categorization –Semantic relation extraction 6. Phraseological - discourse analysis –Text structure identification

16 Seven Major Parts of NLP 7. Pragmatic analysis –Cause and effect relationships –Nurse and nursing –Common sense reasoning (buy → possess) –Who has x ? –These are the people who brought you.....

17 Say it another way Term standardization Term forms Term relationships Term associations Rule building / domain creation

18 Word Standardization Split out chemical & drug terms – Separates chemical & drug terms for special treatment Split out homonyms, non-English terms, and authority terms – Separates objects, proper names, place names, and dates for special treatment Run spelling standardization program – Identifies variant spellings

19 Word Standardization Run word standardization program – ie, -ing, -ed, -s, -es, pre-, non-, and “-” Match preferred terms and synonyms
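The two steps on slide 19 can be sketched in Python as below. This is a deliberately crude affix stripper written for illustration, not the word standardization program the slides refer to; the function names, the minimum stem length of three letters, and the sample vocabulary are all assumptions.

```python
from typing import Dict, List, Optional

# Affixes named on the slide; longer suffixes are tried before shorter ones.
SUFFIXES = ("ing", "ed", "es", "s")
PREFIXES = ("pre-", "non-")


def standardize(word: str) -> str:
    """Lowercase, drop pre-/non-, replace hyphens, strip one suffix."""
    w = word.lower().strip()
    for prefix in PREFIXES:
        if w.startswith(prefix):
            w = w[len(prefix):]
    w = w.replace("-", " ")
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def build_lookup(vocabulary: Dict[str, List[str]]) -> Dict[str, str]:
    """Map the standardized form of every preferred term and synonym
    to its preferred term."""
    lookup: Dict[str, str] = {}
    for preferred, synonyms in vocabulary.items():
        for variant in [preferred, *synonyms]:
            lookup[standardize(variant)] = preferred
    return lookup


def match(word: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the preferred term for a word, or None if nothing matches."""
    return lookup.get(standardize(word))


# Illustrative vocabulary only.
lookup = build_lookup({"Indexing": ["indexes", "non-indexed"]})
```

With this lookup, `match("indexes", lookup)` and `match("Indexed", lookup)` both resolve to `"Indexing"`, while an unknown word such as `"browsing"` returns `None`. A production system would use a proper stemmer or lemmatizer in place of the suffix list.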

20 Term Forms Noun Adjective Verb, adverb Singular, plural Initial articles Spelling variants

21 Term Forms Punctuation Capitalization Abbreviations

22 Term Relationships Generic Hierarchical Systematic Alphabetic Instance Poly-hierarchical

23 Term Associations Cross references All and some rule Associative terms Related terms

24 “Rule building”* process Put terms in context Group like categories Consider relationships Standardize variants Meld to a single concept rule How much is really automatic???

25 Domains Taxonomy Term Record - thesaurus Hierarchical Browse-able list Handout in Booth 150

26 What else can we have? Proximity Stemming (lemmatization) Truncation Statistical clustering Bayesian and others

27 Other terms and tools Neural networks Word normalization Lexical (word) networks Distance mapping Pattern recognition

28 Moving toward the search engines Term weighting Frequency counts Relevance Precision Recall
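Precision and recall on slide 28 have standard set-based definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A small Python sketch follows; the function names and the document IDs are made up for the example.

```python
from typing import Set


def precision(retrieved: Set[int], relevant: Set[int]) -> float:
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: Set[int], relevant: Set[int]) -> float:
    """Fraction of relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical search result: four documents returned, three truly relevant.
retrieved_docs = {1, 2, 3, 4}
relevant_docs = {2, 4, 6}

p = precision(retrieved_docs, relevant_docs)  # 2 of 4 retrieved are relevant
r = recall(retrieved_docs, relevant_docs)     # 2 of 3 relevant were retrieved
```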

29 Classification of Evolving model… Noun Extractors Rule Based Systems Semantic Processors Fuzzy Search Systems Filtering Systems “Automatic Classification Systems”

30 (Semi) Automatic Indexing Basic theories Thesaurus construction Natural language processing Domain specific

31 Noun extractors Noun Extractors Use stop word list and frequency counts –Semio –Word Perfect 5.0 –Recon Prebuilt domains –Autonomy –Net Owl –Newsindexer
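The "stop word list and frequency counts" approach on slide 31 can be sketched generically as follows. This does not reproduce how any of the products named on the slide work; the function name, the tiny stop word list, and the sample text are assumptions for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def candidate_terms(
    text: str, stop_words: Iterable[str], top_n: int = 5
) -> List[Tuple[str, int]]:
    """Return the most frequent words that are not on the stop word list."""
    stops = {w.lower() for w in stop_words}
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in stops)
    return counts.most_common(top_n)


# Illustrative input only.
sample = "The thesaurus controls the index. The index uses the thesaurus terms."
top_terms = candidate_terms(sample, stop_words=["the", "uses"], top_n=2)
```

Here the two highest-frequency candidates are "thesaurus" and "index", each occurring twice. A real extractor would also need phrase detection and part-of-speech filtering, since frequency alone cannot tell nouns from other word classes.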

32 Rules Based Systems Rule Based –Data Harmony –API –DTIC –Mapit

33 Semantic Processors Synth Bank n-Stein - expected Quiver - beta

34 Fuzzy Search Systems Dr. Link Sovereign Hill

35 Filtering Systems Screaming Media Data Harmony

36 New Directions Topic Maps - TAO –Topic –Associations –Occurrences Relational Indexing Index Visualization Based on term records Add the search engines….
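The TAO of Topic Maps on slide 36 (Topics, Associations, Occurrences) maps naturally onto a small data model. The Python sketch below is a simplified illustration, not the Topic Maps standard data model; the class names, the `related` helper, and the example URL are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple


@dataclass
class Topic:
    """A subject, plus occurrences: pointers to resources about it."""
    name: str
    occurrences: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class Association:
    """A typed relationship between two or more topics, by name."""
    kind: str
    members: Tuple[str, ...]


def related(topic_name: str, associations: Iterable[Association]) -> Set[str]:
    """Names of all topics that share an association with the given topic."""
    names: Set[str] = set()
    for assoc in associations:
        if topic_name in assoc.members:
            names.update(m for m in assoc.members if m != topic_name)
    return names


# Illustrative data only; the URL is a placeholder.
thesauri = Topic("Thesauri", occurrences=["https://example.org/thesauri-intro"])
associations = [
    Association("broader-narrower", ("Controlled vocabularies", "Thesauri")),
    Association("related", ("Thesauri", "Taxonomies")),
    Association("related", ("Search engines", "Relevance")),
]
neighbors = related("Thesauri", associations)
```

The associations play the same role as the BT/NT and RT links in a thesaurus term record, which is why the slide describes the approach as based on term records.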

37 What’s a user to do? Enjoy the presentation What about a database producer? –Look at the options –Build from the basics –Evaluate the new tools –See it work before you buy

38 Give me your card I will email the presentation tonight

39 Thank You Marjorie M.K. Hlava President, Access Innovations, Inc. www.accessinn.com Chairman, Data Harmony mhlava@accessinn.com 505-998-0800 Booth 150

