1
NITISH MANOCHA
2
Platforms
- AIX workstation
- OS/390
- Sun Solaris
- Windows NT
3
Tools to Use
- Topic Categorization Tool
  - Categorizing emails
  - Categorizing web pages
4
Text Analysis Tool
- Topic Categorization Tool
5
Text Analysis Tool
- Topic Categorization Tool
  - Category 1 (AI Schedule)
6
Text Analysis Tool
- Category 2 (Database Schedule)
7
Text Analysis Tool
- Target category (Data Mining Schedule)
8
Text Analysis Tool
- Result: Category 2 (Databases)
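The categorization walk-through above (two sample categories, a target document, and a resulting category) can be pictured with a small sketch. This is not how the Topic Categorization Tool is implemented; it is a generic word-count and cosine-similarity categorizer, and the sample texts, category names, and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples):
    """Build one word-count profile per category from labeled sample texts."""
    profiles = {}
    for category, text in samples:
        profiles.setdefault(category, Counter()).update(tokenize(text))
    return profiles


def categorize(profiles, text):
    """Return the best-matching category and the score for every category."""
    doc = Counter(tokenize(text))
    scores = {cat: cosine(profile, doc) for cat, profile in profiles.items()}
    return max(scores, key=scores.get), scores


# Hypothetical stand-ins for the AI and database schedule pages.
samples = [
    ("AI", "search heuristics planning neural networks machine learning agents"),
    ("Databases", "relational model sql queries transactions indexing data storage"),
]
profiles = train(samples)

# Hypothetical stand-in for the data mining schedule page.
target = "data mining over relational data with sql queries and indexing"
best, scores = categorize(profiles, target)
print(best)
```

With these made-up texts the target shares vocabulary only with the database samples, so it lands in that category, which mirrors the result shown on the slide.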
9
Tools to Use
- Clustering Tool (finding similar information)
  - Dividing documents into groups
  - Identifying hidden similarities in documents
  - Identifying duplicate documents in a collection
  - Finding documents that are out of place
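The clustering uses listed above can be illustrated with a minimal sketch. It assumes a simple Jaccard similarity over word sets and a greedy single-pass grouping; this is a swapped-in textbook technique, not the algorithm used by the Clustering Tool, and the documents and thresholds are hypothetical.

```python
import re


def word_set(text):
    """Lowercased set of the words in a document."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold):
    """Greedy grouping: a document joins the first cluster whose seed
    (first member) is at least `threshold` similar, else starts a new one."""
    sets = [word_set(d) for d in docs]
    clusters = []
    for i, current in enumerate(sets):
        for members in clusters:
            if jaccard(sets[members[0]], current) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def duplicates(docs, threshold=0.9):
    """Pairs of document indices that are near-identical."""
    sets = [word_set(d) for d in docs]
    return [(i, j)
            for i in range(len(sets))
            for j in range(i + 1, len(sets))
            if jaccard(sets[i], sets[j]) >= threshold]


docs = [
    "course schedule for the database systems class",
    "Course schedule for the database systems class.",
    "schedule for the artificial intelligence class",
    "recipe for chocolate cake",
]
groups = cluster(docs, threshold=0.4)
dupes = duplicates(docs)
out_of_place = [g[0] for g in groups if len(g) == 1]
print(groups, dupes, out_of_place)
```

Singleton clusters are a cheap stand-in for "documents that are out of place"; a real tool would likely use a richer similarity measure.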
10
Text Analysis Tool
- Hierarchical clustering: imzhclst
11
Text Analysis Tool
- Binary clustering: imzcrlst
12
Text Analysis Tool
- Results
13
Text Analysis Tool
- Results
14
Tools to Use
- Feature Extraction Tool
  - Name extraction
  - Abbreviation extraction
  - Relation extraction
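As a rough illustration of what abbreviation extraction involves, the sketch below pairs a parenthesized uppercase abbreviation with the words just before it when their initials match. This is a simple heuristic chosen for the example, not the Feature Extraction Tool's method; the function name and the sample sentence are hypothetical.

```python
import re


def extract_abbreviations(text):
    """Map each '(ABBR)' to the preceding words whose initials spell it."""
    pairs = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        abbr = match.group(1)
        # Words that appear before the opening parenthesis.
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words[-len(abbr):]
        if len(candidate) == len(abbr) and all(
            word[0].upper() == letter for word, letter in zip(candidate, abbr)
        ):
            pairs[abbr] = " ".join(candidate)
    return pairs


sample = (
    "The Text Search Engine (TSE) runs on Windows NT. "
    "Structured Query Language (SQL) is not covered (TBD)."
)
found = extract_abbreviations(sample)
print(found)
```

The unmatched "(TBD)" is skipped because the three words before it do not start with T, B, and D.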
15
Text Analysis Tool
- Using the Feature Extraction Tool to extract names
  - imzxrun -b 2 -f C -x n -o faculty.out faculty.htm
16
Text Analysis Tool
17
Tools to Use
- Language Identification Tool
  - Organize a collection of documents by language
  - Restrict search results to documents in a particular language
18
Text Analysis Tool
- Using the Language Identification Tool
  - imzlgini -b 2 -v < mydoc.htm
19
Text Analysis Tool
- Language Identification Tool results
  - Supports 13 languages; new languages can be trained
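To show what "training a new language" could mean in practice, here is a minimal sketch of a trainable identifier based on character-trigram profiles and cosine similarity. The slides do not say how the Language Identification Tool works internally, so the technique, the function names, and the tiny training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def trigrams(text):
    """Count character trigrams after lowercasing and collapsing whitespace."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram-count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(profiles, language, sample_text):
    """Add (or extend) the trigram profile for one language."""
    profiles.setdefault(language, Counter()).update(trigrams(sample_text))


def identify(profiles, text):
    """Return the language whose profile is most similar to the text."""
    doc = trigrams(text)
    return max(profiles, key=lambda lang: cosine(profiles[lang], doc))


profiles = {}
# Hypothetical, deliberately tiny training samples.
train(profiles, "en", "the quick brown fox jumps over the lazy dog "
                      "and the cat sleeps in the house")
train(profiles, "de", "der schnelle braune fuchs springt über den faulen hund "
                      "und die katze schläft im haus")
# Adding another language is just one more call to train().
train(profiles, "fr", "le renard brun rapide saute par dessus le chien "
                      "paresseux et le chat dort dans la maison")

guess_en = identify(profiles, "the dog sleeps in the house")
guess_de = identify(profiles, "die katze und der hund")
print(guess_en, guess_de)
```

Real profiles would be built from far more text per language; a few sentences are only enough to separate inputs that reuse the training vocabulary.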
20
Text Analysis Tool
- Using the Summarizer tool
  - imzsum -l 4 project.html
21
Text Analysis Tool
- Summarizer tool: results
22
Tools to Use
- Web Crawler
  - Follows the link topology for a fast search
  - Produces a web site map
  - Can be used to recognize authoritative pages
  - Provides a filtered collection of pages
23
Web Crawler
- imyclean: to define a web space
  - Created include.re, exclude.re, types.re
- imycrawl: to crawl a defined web space
  - imycrawl url webspace
- imystat: to track what happens during a crawl
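The idea of bounding a web space with include and exclude patterns can be sketched as follows. The actual format and semantics of include.re, exclude.re, and types.re are not given in the slides, so this example simply assumes that a URL is crawled when it matches at least one include pattern and no exclude pattern; the patterns, URLs, and function name are hypothetical.

```python
import re

# Hypothetical patterns standing in for the contents of include.re / exclude.re.
INCLUDE = [re.compile(p) for p in [r"^https?://www\.example\.edu/"]]
EXCLUDE = [re.compile(p) for p in [r"/cgi-bin/", r"\.(gif|jpg|zip)$"]]


def in_web_space(url, include=INCLUDE, exclude=EXCLUDE):
    """True when the URL matches an include pattern and no exclude pattern."""
    if not any(pattern.search(url) for pattern in include):
        return False
    return not any(pattern.search(url) for pattern in exclude)


urls = [
    "http://www.example.edu/faculty.htm",
    "http://www.example.edu/cgi-bin/search",
    "http://www.example.edu/images/logo.gif",
    "http://other.example.com/index.htm",
]
kept = [url for url in urls if in_web_space(url)]
print(kept)
```

A crawler would apply a check like this to every link it discovers before adding the link to its queue.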
24
Tools to Use
- Text Search Engine
  - Complicated text searches
  - Powerful linguistic capabilities
  - Fuzzy searches
  - Queries based on the structure of a document
25
Text Search Engine
- Operates on a previously built index
26
Text Search Engine
- Types of index
  - Linguistic index ("bought" indexed as "buy")
  - Feature index (linguistics + names)
  - Precise index ("bought" indexed as "bought")
  - Normalized precise index (case insensitive)
  - Ngram index
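The difference between these index types comes down to which key a word is stored under. The sketch below builds toy inverted indexes with four key functions; the feature index is omitted. The lemma table, the choice of trigrams, the lowercasing in the linguistic index, and all names are assumptions made for illustration, not the Text Search Engine's actual behavior.

```python
import re
from collections import defaultdict

# Hypothetical, tiny lemma table; a real linguistic index uses a full dictionary.
LEMMAS = {"bought": "buy", "buys": "buy", "buying": "buy"}


def precise_keys(token):
    """Precise index: store the word exactly as written."""
    return [token]


def normalized_keys(token):
    """Normalized precise index: ignore case."""
    return [token.lower()]


def linguistic_keys(token):
    """Linguistic index: reduce the word to its base form."""
    lowered = token.lower()
    return [LEMMAS.get(lowered, lowered)]


def ngram_keys(token, n=3):
    """Ngram index: store every run of n characters."""
    lowered = token.lower()
    if len(lowered) <= n:
        return [lowered]
    return [lowered[i:i + n] for i in range(len(lowered) - n + 1)]


def build_index(docs, key_fn):
    """Inverted index mapping each key to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text):
            for key in key_fn(token):
                index[key].add(doc_id)
    return index


def fuzzy_lookup(ngram_index, word):
    """Documents sharing at least one n-gram with the (possibly misspelled) word."""
    hits = set()
    for key in ngram_keys(word):
        hits |= ngram_index.get(key, set())
    return hits


docs = {1: "She bought a Database book", 2: "They buy books"}
precise = build_index(docs, precise_keys)
normalized = build_index(docs, normalized_keys)
linguistic = build_index(docs, linguistic_keys)
ngram = build_index(docs, ngram_keys)

print(precise.get("buy"), linguistic.get("buy"))
print(precise.get("database"), normalized.get("database"))
print(fuzzy_lookup(ngram, "bougt"))
```

A query for "buy" finds only document 2 in the precise index but both documents in the linguistic index, and the misspelled "bougt" still reaches document 1 through shared trigrams.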
27
Combining Tools for Solutions
- Searching with categories
  - By combining the Text Search Engine and the Topic Categorization Tool
- Surviving a flood of email
  - By using the Topic Categorization Tool
- Selectively indexing web pages
  - By combining the Web Crawler, Topic Categorization Tool, and Text Search Engine
28
Views of the Tool
- Command line (good for Unix)
- Not very useful on Windows NT
- Not a good stand-alone tool
- Should be viewed as a library