Information Organization: Overview


1 Information Organization: Overview

2 IO: What
What is Information Organization?
- Systematic arrangement of items
  - group similar items together
  - assign meaning to groups
  - determine relationships between groups
  - assign items to groups
[Figure: Grouping 1, Grouping 2, and Grouping 3 arrange the same items by size (Big, Small), shape (Square, Circle), and color (Blue, Red).]
Search Engine
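The idea that the same items can be grouped in several ways can be sketched in a few lines of Python. This is only an illustration: the item list, the attribute names, and the `group_by` helper are assumptions made for the example, not part of the slides.

```python
from collections import defaultdict

# Hypothetical items; the attribute values mirror the labels in the slide's figure.
ITEMS = [
    {"size": "Big", "shape": "Square", "color": "Blue"},
    {"size": "Big", "shape": "Circle", "color": "Red"},
    {"size": "Small", "shape": "Square", "color": "Red"},
    {"size": "Small", "shape": "Circle", "color": "Blue"},
]


def group_by(items, *attributes):
    """Group items by the given attributes; keys are tuples of attribute values."""
    groups = defaultdict(list)
    for item in items:
        key = tuple(item[name] for name in attributes)
        groups[key].append(item)
    return dict(groups)


# Two different organizations of the same four items.
by_size = group_by(ITEMS, "size")
by_shape_then_color = group_by(ITEMS, "shape", "color")
```

Grouping by one attribute yields a few large groups, while grouping by two attributes yields more, smaller groups; which organization is useful depends on how the items will be looked up.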

3 IO: Why
Why organize information?
Why do we put certain things in certain places?
- Closet
  - Seasonal groups
  - Pants vs. Shirts
  - Color groups
  - Favorite vs. non-favorite
  To find things easier → Information Retrieval (IR)
- Taxonomy
  - Food
    - Good: sweet taste, smell like milk
    - Bad: too hot, hard to chew
  To make sense of the world → Knowledge Discovery (KD)

4 IO: How
How do we organize information?
- General Approach
  - anticipate how an item is searched for (e.g. by subject, date, author)
  - look for common features among items
  - determine what an item is about
- Classification
  - Identification/creation of classes
  - Assignment of items into classes
- Clustering
  - group similar items together
What to do when the information to organize is massive?
- 10,000 books
- 100,000 journal papers
- 1,000,000 web pages

5 Machine Learning: Introduction
What is Machine Learning?
- A computer program learns if it improves its performance at some task through experience (T. Mitchell, 1997).
- Any change in a system that allows it to perform better the second time on repetition of the same task or on a task drawn from the same population (H. Simon, 1983).
How can systems improve?
- By acquiring new knowledge
  - Acquiring new facts
  - Acquiring new skills
- By adapting their behavior
  - Solving problems more accurately
  - Solving problems more efficiently

6 Machine Learning: Introduction
Which is different? Which are similar?
How is learning possible?
- Because there are regularities in the world.

7 ML: Classification vs. Clustering
- Classification
  - Task is to learn to assign instances to predefined classes
  - Supervised Learning
    - data has to specify what we are trying to learn (the classes)
    - requires training data: predefined classes and classified items
- Clustering
  - Task is to learn a classification from the data
    - no predefined classification is required
  - Unsupervised Learning
    - data doesn’t specify what we are trying to learn (the clusters)
  - Clustering algorithms divide a data set into natural groups (clusters)
    - items in the same cluster are similar to each other and share certain properties
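The contrast between the two tasks can be made concrete with a small sketch. The slides do not name specific algorithms, so the choices here are assumptions: a nearest-centroid classifier stands in for supervised learning and a bare-bones k-means stands in for clustering. The data points and all function names are hypothetical.

```python
import math


def centroid(points):
    """Mean of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def train_classifier(labeled):
    """Supervised: labeled is a list of (point, class_label) pairs."""
    by_label = {}
    for point, label in labeled:
        by_label.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in by_label.items()}


def classify(model, point):
    """Assign point to the predefined class with the nearest centroid."""
    labels = list(model)
    return labels[nearest(point, [model[label] for label in labels])]


def kmeans(points, k, iterations=10):
    """Unsupervised: no labels are given. Centers start at the first k points,
    a simplification; real k-means is sensitive to initialization."""
    centers = list(points[:k])
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = centroid(members)
    return assignment


POINTS = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8)]

# Classification: classes ("low", "high") are defined before learning.
model = train_classifier(list(zip(POINTS, ["low", "high", "low", "high"])))
predicted = classify(model, (0.9, 1.1))

# Clustering: the same points, with no class labels supplied.
clusters = kmeans(POINTS, k=2)
```

The classifier returns one of the class names it was trained on, whereas the clustering result is only a list of anonymous group indices; giving those groups a meaning is left to whoever inspects them.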

8 IO for IR: Clustering
- Document Clustering
  - Cluster Hypothesis
    - Documents having similar contents tend to be relevant to the same query
    - Rank clusters by Query-Cluster Similarity
    - Cluster documents based on vector similarity
  - Post-retrieval clustering
    - Scatter-Gather
- Keyword Clustering
  - Automatic Thesaurus Construction
  - Query Expansion
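Ranking clusters by query-cluster similarity can be sketched as follows. The slides do not specify a similarity measure or term weighting, so this example assumes raw term-frequency vectors and cosine similarity. The clusters are given up front rather than computed, and the documents, cluster names, and function names are all hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Term-frequency vector: counts of lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_vector(docs):
    """Sum of the member vectors; cosine ignores scale, so no averaging is needed."""
    total = Counter()
    for doc in docs:
        total.update(vectorize(doc))
    return total


def rank_clusters(query, clusters):
    """Return cluster names ordered by decreasing similarity to the query."""
    q = vectorize(query)
    scored = [(cosine(q, cluster_vector(docs)), name) for name, docs in clusters.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical, already-formed clusters of short documents.
CLUSTERS = {
    "cooking": ["bake bread with flour", "bread and soup recipes"],
    "search": ["web search engine ranking", "query ranking for search"],
}

ranking = rank_clusters("search engine query", CLUSTERS)
```

Under the cluster hypothesis, the documents in the top-ranked cluster are the most likely to be relevant, so a system can return or expand that cluster first instead of scoring every document individually.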

9 IO for IR: Classification
- Document Categorization
  - classify documents into manually defined categories
  - supports hierarchical browsing, query expansion via relevance feedback
- Document Indexing
  - assign keywords to documents
  - automatic indexing with controlled vocabulary, metadata generation
- Document Filtering
  - e.g. news delivery, spam filtering
- Query Classification
  - collection selection
  - algorithm selection
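Document filtering, such as spam filtering, is a categorization task with manually defined categories. The slides do not say which classifier is used, so the sketch below assumes a small multinomial Naive Bayes model with add-one smoothing. The training texts, category names, and function names are hypothetical.

```python
import math
from collections import Counter


def train(examples):
    """examples: list of (text, category) pairs with manually assigned categories.
    Returns per-category word counts and per-category document counts."""
    word_counts = {}
    doc_counts = Counter()
    for text, category in examples:
        doc_counts[category] += 1
        word_counts.setdefault(category, Counter()).update(text.lower().split())
    return word_counts, doc_counts


def categorize(model, text):
    """Return the category with the highest log-probability for the text."""
    word_counts, doc_counts = model
    vocabulary = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category, counts in word_counts.items():
        # Log prior: fraction of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best, best_score = category, score
    return best


# Hypothetical training data for a two-category filter.
TRAINING = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes", "ham"),
]

spam_model = train(TRAINING)
label = categorize(spam_model, "cheap money")
```

The same pattern extends to more than two categories, for example routing news stories to topics, as long as each category has labeled training documents.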

