Knowledge Management Systems: Development and Applications Part II: Techniques and Examples Hsinchun Chen, Ph.D. McClelland Professor, Director, Artificial.

Slides:



Advertisements
Similar presentations
GMD German National Research Center for Information Technology Darmstadt University of Technology Perspectives and Priorities for Digital Libraries Research.
Advertisements

Knowledge Management Systems: Development and Applications Part II: Techniques and Examples Hsinchun Chen, Ph.D. McClelland Professor, Director, Artificial.
INFO624 - Week 2 Models of Information Retrieval Dr. Xia Lin Associate Professor College of Information Science and Technology Drexel University.
Bringing Order to the Web: Automatically Categorizing Search Results Hao Chen SIMS, UC Berkeley Susan Dumais Adaptive Systems & Interactions Microsoft.
New Technologies Supporting Technical Intelligence Anthony Trippe, 221 st ACS National Meeting.
Taxonomies, Lexicons and Organizing Knowledge Wendi Pohs, IBM Software Group.
Image Information Retrieval Shaw-Ming Yang IST 497E 12/05/02.
Web- and Multimedia-based Information Systems. Assessment Presentation Programming Assignment.
Information Retrieval in Practice
WebMiningResearch ASurvey Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007.
Intelligent Information Retrieval CS 336 Lisa Ballesteros Spring 2006.
Web Mining Research: A Survey
Automating Keyphrase Extraction with Multi-Objective Genetic Algorithms (MOGA) Jia-Long Wu Alice M. Agogino Berkeley Expert System Laboratory U.C. Berkeley.
1 Information Retrieval and Web Search Introduction.
WebMiningResearchASurvey Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007 Revised.
Information Visualization for Digital Library Hsinchun Chen McClelland Professor University of Arizona PI, NSF DLI-1, DLI-2
High-Performance Digital Library Classification Systems: PI: Hsinchun Chen, The University of Arizona From Information Retrieval to Knowledge Management.
Introduction Web Development II 5 th February. Introduction to Web Development Search engines Discussion boards, bulletin boards, other online collaboration.
Basic IR Concepts & Techniques ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Overview of Search Engines
Tamas Doszkocs, Ph.D. Computer Scientist Meta Searching and Clustering.
Chapter 11 Managing Knowledge. Dimensions of Knowledge.
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
Automatic Subject Classification and Topic Specific Search Engines -- Research at KnowLib Anders Ardö and Koraljka Golub DELOS Workshop, Lund, 23 June.
Dataware Products Direction Presented to BRS North American Users Group Meeting August 27th, 1999 Dave Schubmehl.
Chapter 7 Web Content Mining Xxxxxx. Introduction Web-content mining techniques are used to discover useful information from content on the web – textual.
Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet.
IEEE Knowledge Media Networking KMN’02 Keynote Address, CRL, Kyoto Japan, July 11, 2002 Concept Switching in the Interspace: Networking Infrastructure.
ICS-FORTH January 11, Thesaurus Mapping Martin Doerr Foundation for Research and Technology - Hellas Institute of Computer Science Bath, UK, January.
Knowledge Representation and Indexing Using the Unified Medical Language System Kenneth Baclawski* Joseph “Jay” Cigna* Mieczyslaw M. Kokar* Peter Major.
WebMining Web Mining By- Pawan Singh Piyush Arora Pooja Mansharamani Pramod Singh Praveen Kumar 1.
Information Retrieval and Web Search Cross Language Information Retrieval Instructor: Rada Mihalcea Class web page:
1 Information Retrieval Acknowledgements: Dr Mounia Lalmas (QMW) Dr Joemon Jose (Glasgow)
C11- Managing Knowledge.
29-30 October, 2006, Estonia 1 IST4Balt Information analysis using social bookmarking and other tools IST4Balt Information analysis using social bookmarking.
The Internet 8th Edition Tutorial 4 Searching the Web.
Collocations and Information Management Applications Gregor Erbach Saarland University Saarbrücken.
1 CS 430: Information Discovery Lecture 25 Cluster Analysis 2 Thesaurus Construction.
Indexing Mathematical Abstracts by Metadata and Ontology IMA Workshop, April 26-27, 2004 Su-Shing Chen, University of Florida
How Do We Find Information?. Key Questions  What are we looking for?  How do we find it?  Why is it difficult? “A prudent question is one-half of wisdom”
Introduction to Information Retrieval Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
CODE (Committee on Digital Environment) July 26, 2000 Rice University THE NET OF THE 21st CENTURY: Concepts across the Interspace Bruce Schatz CANIS Laboratory.
Graduate School of Informatics Kyoto University, November 21, 2001 Technologies of the Interspace Peer-Peer Semantic Indexing Bruce Schatz CANIS Laboratory.
Introduction to Information Retrieval Example of information need in the context of the world wide web: “Find all documents containing information on computer.
Revolutionary System Models, The Net, & The Public Interest The Interspace Prototype ( ) Digital Libraries Initiative ( ) Worm Community.
Revolution & Kids: Building the Future of the Net & Understanding the Structures of the World Bruce R. Schatz CANIS - Community Systems Laboratory University.
Information Retrieval
IT and Network Organization Ecommerce. IT and Network Organization OPTIMIZING INTERNAL COLLABORATIONS IN NETWORK ORGANIZATIONS.
DANIELA KOLAROVA INSTITUTE OF INFORMATION TECHNOLOGIES, BAS Multimedia Semantics and the Semantic Web.
A Portrait of the Semantic Web in Action Jeff Heflin and James Hendler IEEE Intelligent Systems December 6, 2010 Hyewon Lim.
Text Information Management ChengXiang Zhai, Tao Tao, Xuehua Shen, Hui Fang, Azadeh Shakery, Jing Jiang.
Semantic Interoperability for Geographic Information Systems Tobun Dorbin Ng Artificial Intelligence Lab The University of Arizona.
General Architecture of Retrieval Systems 1Adrienn Skrop.
Trends in NL Analysis Jim Critz University of New York in Prague EurOpen.CZ 12 December 2008.
Search Engine Architecture
Information Retrieval and Web Search
Lecture #11: Ontology Engineering Dr. Bhavani Thuraisingham
Personalized Social Image Recommendation
Information Retrieval and Web Search
Federated & Meta Search
Information Retrieval and Web Search
Taxonomies, Lexicons and Organizing Knowledge
CSE 635 Multimedia Information Retrieval
Web Mining Research: A Survey
Information Retrieval and Web Design
Information Retrieval and Web Search
Introduction to Search Engines
Presentation transcript:

Knowledge Management Systems: Development and Applications Part II: Techniques and Examples Hsinchun Chen, Ph.D. McClelland Professor, Director, Artificial Intelligence Lab and Hoffman E- Commerce Lab The University of Arizona Founder, Knowledge Computing Corporation 美國亞歷桑那大學, 陳炘鈞 博士 Acknowledgement: NSF DLI1, DLI2, NSDL, DG, ITR, IDM, CSS, NIH/NLM, NCI, NIJ, CIA, NCSA, HP, SAP

Knowledge Management Systems: Overview

KMS Root: Intersection of IR and AI Information Retrieval (IR) and Gerald Salton Inverted Index, Boolean, and Probabilistic, 1970s Expert Systems, User Modeling and Natural Language Processing, 1980s Machine Learning for Information Retrieval, 1990s Internet Search Engines, late 1990s

KMS Root: Intersection of IR and AI Artificial Intelligence (AI) and Herbert Simon General Problem Solvers, 1970s Expert Systems, 1980s Machine Learning and Data Mining, 1990s Autonomous Agents, late 1990s

Representing Knowledge IR Approach Indexing and Subject Headings Dictionaries, Thesauri, and Classification Schemes AI Approach Cognitive Modeling Semantic Networks, Production Systems, Logic, Frames, and Ontologies

Knowledge Retrieval Vendor Direction (Source: GartnerGroup) grapeVINE Sovereign Hill CompassWare Intraspect KnowledgeX WiseWire Lycos Autonomy Perspecta Lotus Netscape* Technology Innovation Niche Players IR Leaders Verity Fulcrum Excalibur Dataware Microsoft Content Experience IDI Oracle Open Text Folio IBM InText PCDOCS Documentum Knowledge Retrieval NewBies Newbies: IR Leaders: Niche Players: Market Target * Not yet marketed

KM Software Vendors Ability to Execute Completeness of Vision Niche Players Visionaries Challengers Leaders Microsoft * Lotus * Dataware * * Verity * Excalibur Netscape * Documentum* * IBM Inference* Lycos/InMagic* CompassWare* KnowledgeX* SovereignHill* Semio* IDI* PCDOCS/* Fulcrum OpenText* Autonomy* GrapeVINE* * InXight WiseWire* *Intraspect

Competitive Analysis: Text Analysis

Competitive Analysis: Collection Creation

Competitive Analysis: Retrieval/Display

Knowledge Management Systems: Techniques

KMS Techniques: Linguistic analysis/NLP: identify key concepts (who/what/where…) Statistical/co-occurrence analysis: create automatic thesaurus, link analysis Statistical and neural networks clustering/categorization: identify similar documents/users/communities and create knowledge maps Visualization and HCI: tree/network, 1/2/3D, zooming/detail-in-context

KMS Techniques: Linguistic Analysis Word and inverted index: stemming, suffixes, morphological analysis, Boolean, proximity, range, fuzzy search Phrasal analysis: noun phrases, verb phrases, entity extraction, mutual information Sentence-level analysis: context-free grammar, transformational grammar Semantic analysis: semantic grammar, case-based reasoning, frame/script

Illinois DLI-1 project: “Federated Search of Scientific Literature” Research goal: Semantic interoperability across subject domain Technologies: Semantic retrieval and analysis technologies Natural Language Processing Text Tokenization Part-of-speech-tagging Noun phrase generation Techniques Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

Natural Language Processing Text Tokenization Part-of-speech-tagging Noun phrase generation Techniques Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

KMS Techniques: Statistical/Co- Occurrence Analysis Similarity functions: Jaccard, Cosine Weighting heuristics Bi-gram, tri-gram, N-gram Finite State Automata (FSA) Dictionaries and thesauri

Illinois DLI project: “Federated Search of Scientific Literature” Research goal: Semantic interoperability across subject domain Technologies: Semantic retrieval and analysis technologies Natural Language Processing Heuristic term weighting Weighted co-occurrence analysis Co-occurrence analysis Techniques Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

Co-occurrence analysis Heuristic term weighting Weighted co-occurrence analysis Techniques Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

KMS Techniques: Clustering/Categorization Hierarchical clustering: single-link, multi- link, Ward’s Statistical clustering: multi-dimensional scaling (MDS), factor analysis Neural network clustering: self-organizing map (SOM) Ontologies: directories, classification schemes

Illinois DLI project: “Federated Search of Scientific Literature” Research goal: Semantic interoperability across subject domain Technologies: Semantic retrieval and analysis technologies Natural Language Processing Document clustering Category labeling Optimization and parallelization Co-occurrence analysisNeural Network Analysis Techniques Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

Neural Network Analysis Document clustering Category labeling Optimization and parallelization Techniques Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

KMS Techniques: Visualization/HCI Structures: trees/hierarchies, networks Dimensions: 1D, 2D, 2.5D, 3D, N-D (glyphs) Interactions: zooming, spotlight, fisheye views, fractal views

Illinois DLI project: “Federated Search of Scientific Literature” Research goal: Semantic interoperability across subject domain Technologies: Semantic retrieval and analysis technologies Natural Language Processing 1D: alphabetic listing of categories 2D: semantic map listing of categories 3D: interactive, helicopter fly- through using VRML Co-occurrence analysisNeural Network AnalysisAdvanced Visualization Techniques Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

Advanced Visualization 1D, 2D, 3D Techniques Automatic Generation of CL:

Entity Extraction and Co-reference based on TREC and MUG Visualization techniques based on Fisheye, Fractal, and Spotlight Text segmentation and summarization based on Textile and Wavelets Advanced Techniques Automatic Generation of CL: (Continued)

Lexicon-enhanced indexing (e.g., UMLS Specialist Lexicon) Ontology-enhanced semantic tagging (e.g., UMLS Semantic Nets) Ontology-enhanced query expansion (e.g., WordNet, UMLS Metathesaurus) Advanced Techniques Integration of CL: Spreading-activation based term suggestion (e.g., Hopfield net)

YAHOO vs. OOHAY: YAHOO: manual, high-precision OOHAY: automatic, high-recall Acknowledgements: NSF, NIH, NLM, NIJ, DARPA

Knowledge Computing Approach From YAHOO! To OOHAY? YAHOO! AHYOO AHYOO AHYOO AHYOO AHYOO AHYOO AHYOO O O HA Y ? O bject O riented H ierarchical A utomatic Y ellowpage

Knowledge Management Systems: Examples

Web Analysis (1M): Web pages, spidering, noun phrasing, categorization

OOHAY : Visualizing the Web Arizona DLI-2 project: “From Interspace to OOHAY?” Research goal: automatic and dynamic categorization and visualization of ALL the web pages in US (and the world, later) Technologies: OOHAY techniques Multi-threaded spiders for web page collectionHigh-precision web page noun phrasing and entity identificationMulti-layered, parallel, automatic web page topic directory/hierarchy generation Dynamic web search result summarization and visualizationAdaptive, 3D web-based visualization Research Status

MUSIC ROCK OOHAY : Visualizing the Web … 50 6 Research Status

Lessons Learned: Web pages are noisy: need filtering Spidering needs help: domain lexicons, multi-threads SOM is computational feasible for large- scale application SOM performance for web pages = 50% Web knowledge map (directory) is interesting for browsing, not for searching Techniques applicable to Intranet and marketing intelligence

News Classification (1M): Chinese news content, mutual information indexing, PAT tree, categorization

Lessons Learned: News readers are not knowledge workers News articles are professionally written and precise. SOM performance for news articles = 85% Statistical indexing techniques perform well for Chinese documents Corporate users may need multiple sources and dynamic search help Techniques applicable to eCommerce (eCatalogs) and ePortal

Personal Agents (1K): Web spidering, meta searching, noun phrasing, dynamic categorization

2. Search results from spiders are displayed dynamically 1. Enter Starting URLs and Key Phrases to be searched OOHAY : CI Spider For project information and free download: Research Status

2. Search results from spiders are displayed dynamically 1. Enter Starting URLs and Key Phrases to be searched OOHAY : CI Spider, Meta Spider, Med Spider For project information and free download: Research Status

OOHAY : Meta Spider, News Spider, Cancer Spider For project information and free download:

4. SOM is generated based on the phrases selected. Steps 3 and 4 can be done in iterations to refine the results. 3. Noun Phrases are extracted from the web ages and user can selected preferred phrases for further summarization. OOHAY : CI Spider, Meta Spider, Med Spider For project information and free download: Research Status

Lessons Learned: Meta spidering is useful for information consolidation Noun phrasing is useful for topic classification (dynamic folders) SOM usefulness is suspect for small collections Knowledge workers like personalization, client searching, and collaborative information sharing Corporate users need multiple sources and dynamic search help Techniques applicable to marketing and competitive analyses

CRM Data Analysis (5K): Call center Q/A, noun phrasing, dynamic categorization, problem analysis, agent assistance

Lessons Learned: Call center data are noisy: typos and errors Noun phrasing useful for Q/A classification Q/A classification could identify problem areas Q/A classification could improve agent productivity: , online chat, and VoIP Q/A classification could improve new agent training Techniques applicable to virtual call center and CRM applications

Newsgroup Categorization (1K): Workgroup communication, noun phrasing, dynamic categorization, glyphs visualization

Thread Disadvantages: No sub-topic identification Difficult to identify experts Difficult to learn participants’ attitude toward the community

Thread Representation Time Message Person Length of Time

People Representation Time Message Thread Length of Time

Visual Effects: Thickness = how active a subtopic is Length in x- dimension = the time duration of a sub-topic

Proposed Interface (Interaction Summary) Visual Effects: Healthy sub- garden with many blooming high flowers = popular active sub-topic A long, blooming flower is a healthy thread

Proposed Interface (Expert Indicator) Visual Effects: Healthy sub- garden with many blooming high flowers = popular sub-topic A long, blooming people flower is a recognized expert.

Lessons Learned: P1000: A picture is indeed worth 1000 words Expert identification is critical for KM support Glyphs are powerful for capturing multi- dimensional data Techniques applicable to collaborative applications, e.g., , online chats, newsgroup, and such

GIS Multimedia Data Mining (10GBs): Geoscience data, texture image indexing, multimedia content

Airphoto analysis: Texture (Gabor filter)

AVHRR satellite data: Temperature/vegetation

Lessons Learned: Image analysis techniques are application dependent (unlike text analysis) Image killer apps not found yet Multimedia applications require integration of data, text, and image mining techniques Multimedia KMS not ready for prime-time consumption yet