Building an Intelligent Web: Theory and Practice
Pawan Lingras, Saint Mary’s University
Rajendra Akerkar, American University of Armenia and SIBER, India
Information Retrieval
–Create a list of words
–Remove stop words
–Stem words
–Calculate the frequency of each stemmed word
Figure 2.1 Transforming a text document into a weighted list of keywords
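The four steps in Figure 2.1 can be sketched in Python as follows. This is a minimal illustration, not the book's implementation. The stop-word list is a tiny hypothetical sample. `crude_stem` is a hypothetical, naive suffix-stripping stand-in for a real stemmer such as the Porter stemmer, so its stems are cruder than a real stemmer would produce: for example, it reduces both "mining" and "miners" to "min".

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def crude_stem(word: str) -> str:
    """Naive suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ers", "s"):
        # Keep at least three letters so short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def weighted_keywords(text: str) -> Counter:
    # Step 1: create a list of words (lower-cased alphabetic tokens).
    words = re.findall(r"[a-z]+", text.lower())
    # Step 2: remove stop words.
    content_words = [w for w in words if w not in STOP_WORDS]
    # Step 3: stem the remaining words.
    stems = [crude_stem(w) for w in content_words]
    # Step 4: calculate the frequency of each stemmed word.
    return Counter(stems)


print(weighted_keywords("The miners are mining web pages and web logs."))
# Counter({'min': 2, 'web': 2, 'page': 1, 'log': 1})
```

The resulting counts serve as keyword weights. Many systems go on to normalize them, for example with TF-IDF.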
Data mining has emerged as one of the most exciting and dynamic fields in computing science. Its driving force is the existence of petabyte-scale online archives that may hold valuable pieces of information hidden within them. Commercial enterprises have been quick to recognize the value of this idea. As a result, the market for data mining software alone is expected to exceed $10 billion within a few years. Data mining refers to a family of techniques used to detect interesting nuggets of relationships or knowledge in data. The theoretical underpinnings of the field have existed for some time, in the form of pattern recognition, statistics, data analysis and machine learning. In practice, however, these techniques have largely been applied in an ad hoc manner.

Large databases are now available to store, manage and assimilate data. The new thrust of data mining therefore lies at the intersection of database systems, artificial intelligence and algorithms that analyze data efficiently. The distributed nature of many databases, their size and the high complexity of many techniques present interesting computational challenges.
Figure 2.43 Relationship between precision and recall
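Precision is the fraction of retrieved documents that are relevant, |relevant ∩ retrieved| / |retrieved|. Recall is the fraction of relevant documents that are retrieved, |relevant ∩ retrieved| / |relevant|. The minimal sketch below shows how both measures change as a ranked result list is cut off at increasing depths k. The document IDs and relevance judgments are hypothetical.

```python
def precision_recall_at_k(ranked, relevant):
    """Yield (k, precision, recall) for each cutoff k of a ranked result list."""
    if not relevant:
        raise ValueError("recall is undefined without relevant documents")
    hits = 0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        # precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant
        yield k, hits / k, hits / len(relevant)


ranked = ["d1", "d2", "d3", "d4", "d5", "d6"]  # hypothetical search results, best first
relevant = {"d1", "d3", "d4"}                  # hypothetical relevance judgments
for k, p, r in precision_recall_at_k(ranked, relevant):
    print(f"k={k}  precision={p:.2f}  recall={r:.2f}")
```

In this toy ranking, recall can only grow as k increases. Precision rises or falls depending on whether each newly added document is relevant. This is the kind of trade-off a precision-recall plot captures.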
Semantic Web
Semantic Web: the layered language model (Berners-Lee, 2001; Broekstra et al., 2001)
Figure 3.4 Representing classes and instances (Noy et al., 2001)
Queries 1 and 2
Queries 3 and 4
An RDF model for automobiles
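An RDF model is a set of subject-predicate-object triples. The sketch below stores a few triples about automobiles as plain Python tuples and answers simple pattern queries. It is not the model shown on the slide. The `ex:` names (`ex:Car`, `ex:myCar`, `ex:manufacturer` and so on) are hypothetical. By contrast, `rdf:type` and `rdfs:subClassOf` are the standard RDF and RDF Schema terms. The helper `instances_of` follows `rdfs:subClassOf` chains, which covers a small slice of what an RDFS reasoner would infer.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical automobile data; the ex: names are illustrative only, while
# rdf:type and rdfs:subClassOf are standard RDF / RDF Schema terms.
TRIPLES: list[Triple] = [
    ("ex:Car", "rdfs:subClassOf", "ex:Automobile"),
    ("ex:Truck", "rdfs:subClassOf", "ex:Automobile"),
    ("ex:SportsCar", "rdfs:subClassOf", "ex:Car"),
    ("ex:myCar", "rdf:type", "ex:SportsCar"),
    ("ex:myCar", "ex:manufacturer", "ex:AcmeMotors"),
    ("ex:myTruck", "rdf:type", "ex:Truck"),
]


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def instances_of(cls: str) -> set[str]:
    """Instances of cls, including instances of its direct and indirect subclasses."""
    classes, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, _, _ in match(p="rdfs:subClassOf", o=current):
            if sub not in classes:
                classes.add(sub)
                frontier.append(sub)
    return {s for s, _, o in match(p="rdf:type") if o in classes}


print(sorted(instances_of("ex:Automobile")))
```

In practice, an RDF library (for example rdflib in Python) and a query language such as SPARQL would replace these hand-written helpers.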
Classification and Association
Data Preparation
–Database Theory
–SQL
–Data Transformation
–http://www.ecn.purdue.edu/KDDCUP/data/
Classification
Find a rule, a formula, or a black-box classifier for organizing data into classes:
–Classify clients requesting loans into categories based on the likelihood of repayment
–Classify customers as big or moderate spenders based on what they buy
–Classify customers as loyal, semi-loyal or infrequent based on the products they buy
The classifier is developed from the data in the training set.
The reliability of the classifier is evaluated using the test set.
Classification
ID3 Algorithm
–Numerical Illustration
–Application to a Small E-commerce Dataset
C4.5 for Experimentation
Other approaches
–Neural Networks
–Fuzzy Classification
–Rough Set Theory
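ID3 grows a decision tree by repeatedly splitting on the attribute with the highest information gain, that is, the largest drop in the entropy of the class label. The compact sketch below assumes categorical attributes and a majority-vote leaf when no attributes remain. The four-row "spender" dataset is hypothetical and is not the e-commerce dataset used in the book.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attributes, target):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # pure node: every row has the same class
        return labels[0]
    if not attributes:                 # no attributes left: use the majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attributes if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}


# Hypothetical toy data: is a visitor a big or a moderate spender?
ROWS = [
    {"visits": "frequent", "member": "yes", "spender": "big"},
    {"visits": "frequent", "member": "no", "spender": "big"},
    {"visits": "rare", "member": "yes", "spender": "moderate"},
    {"visits": "rare", "member": "no", "spender": "moderate"},
]
print(id3(ROWS, ["visits", "member"], "spender"))
# {'visits': {'frequent': 'big', 'rare': 'moderate'}}
```

On this toy data, `visits` separates the classes perfectly (a gain of 1.0 bit), while `member` carries no information (a gain of 0.0). The tree therefore has a single split. C4.5 builds on this idea with the gain ratio, support for continuous attributes and pruning.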
Association
Market basket analysis: determine which things go together
Transactions might reveal that
–customers who buy bananas also buy candles
–cheese and pickled onions frequently occur together in a shopping cart
The information can be used for
–arranging a physical shop or structuring a Web site
–targeted advertising campaigns
Association
Apriori Algorithm
–Demonstration for an E-commerce Application
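Apriori finds frequent itemsets level by level. Because a k-itemset can only be frequent if all of its (k-1)-subsets are frequent, candidates are built from the previous level and pruned before their support is counted. The minimal sketch below uses hypothetical basket data that echoes the banana/candle and cheese/pickled-onion examples in the association slide. The minimum support count of 3 is chosen arbitrarily, and `confidence` is an illustrative helper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        # Prune step: drop candidates with an infrequent (k-1)-subset.
        candidates = {
            a | b
            for i, a in enumerate(prev)
            for b in prev[i + 1:]
            if len(a | b) == k
            and all(frozenset(sub) in level for sub in combinations(a | b, k - 1))
        }
        # Count support only for the surviving candidates.
        level = {}
        for cand in candidates:
            count = sum(1 for basket in baskets if cand <= basket)
            if count >= min_support:
                level[cand] = count
        frequent.update(level)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical shopping baskets.
BASKETS = [
    {"banana", "candles", "milk"},
    {"banana", "candles"},
    {"cheese", "pickled onions", "bread"},
    {"cheese", "pickled onions"},
    {"banana", "candles", "cheese", "pickled onions"},
]
freq = apriori(BASKETS, min_support=3)
print({tuple(sorted(s)): c for s, c in freq.items() if len(s) > 1})
```

Here {banana, candles} and {cheese, pickled onions} are the only frequent pairs. `confidence(freq, {"banana"}, {"candles"})` evaluates to 1.0 because every basket containing bananas also contains candles.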
Clustering
Breaks a large database into subgroups, or clusters
Unlike classification, there are no predefined classes
Records are grouped into clusters on the basis of their similarity to each other
The data miners then determine whether the clusters offer any useful insight
Statistical Methods
k-means
–Numerical Example
–Implementation
Data Preparation
Clustering
Other Methods
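k-means alternates between two steps until the centroids stop changing: it assigns each point to its nearest centroid, then moves each centroid to the mean of the points assigned to it. The minimal sketch below uses Euclidean distance. Initializing with the first k points is an assumption made for simplicity. The two-attribute customer data (visits per month, average spend) is hypothetical and is not the book's numerical example.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with Euclidean distance.

    Initial centroids are simply the first k points (an assumption; random or
    k-means++ seeding is more common in practice).
    """
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:   # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend).
customers = [(1, 20), (2, 25), (1, 22), (10, 200), (12, 210), (11, 190)]
centroids, clusters = kmeans(customers, k=2)
print(centroids)
```

With k=2, this run converges after a few passes to centroids at about (1.33, 22.33) and (11, 200). This separates occasional low spenders from frequent high spenders. Because k-means is sensitive to its starting centroids, practical runs usually try several initializations.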
Neural-Network-Based Approaches
Kohonen Self-Organising Maps
–Numerical Demonstration
–Application to Web Data Collection
Other Neural-Network-Based Approaches
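A Kohonen self-organising map keeps a grid of weight vectors. For each input, it finds the best-matching unit (BMU) and pulls that node and its grid neighbours toward the input. The learning rate and the neighbourhood width both shrink over time. The minimal sketch below implements a one-dimensional map. It assumes a Gaussian neighbourhood, linear decay schedules with a floor of 0.1 on the width, and hand-picked starting weights. The data points are hypothetical and are not the book's numerical demonstration.

```python
import math


def train_som(data, weights, epochs=50, lr0=0.5, sigma0=1.0):
    """Train a one-dimensional Kohonen map; weights[i] is node i's weight vector."""
    n = len(weights)
    for epoch in range(epochs):
        # Learning rate and neighbourhood width shrink linearly (assumed schedule).
        frac = epoch / epochs
        lr = lr0 * (1 - frac)
        sigma = max(sigma0 * (1 - frac), 0.1)
        for x in data:
            bmu = best_matching_unit(x, weights)
            for i in range(n):
                # Gaussian neighbourhood over the grid distance |i - bmu|.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                weights[i] = [w + lr * h * (xv - w) for w, xv in zip(weights[i], x)]
    return weights


def best_matching_unit(x, weights):
    """Index of the node whose weight vector is closest to input x."""
    return min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))


# Hypothetical 2-D inputs forming two groups, and hand-picked initial weights.
data = [(0.1, 0.1), (0.15, 0.1), (0.9, 0.9), (0.85, 0.95)]
weights = train_som(data, [[0.4, 0.4], [0.6, 0.6]])
print([best_matching_unit(x, weights) for x in data])
```

After training, the two groups of points map to different nodes. A two-dimensional grid, which is the usual choice for visualization, changes only how the grid distance between nodes is computed.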
Clustering of customers
Web Usage Mining
High-level web usage mining process (Srivastava et al., 2000)
Applications of web usage mining (Romanko, 2006; Srivastava et al., 2000)
Clustering exercise
Classification exercise
Association exercise
Sequence Pattern Analysis of Web Logs
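A first step in sequence analysis of web logs is to split each user's requests into sessions, then count which page-to-page transitions recur. The minimal sketch below assumes the log entries have already been parsed into (user, timestamp in seconds, page) tuples. It uses a 30-minute inactivity timeout, which is a common but arbitrary heuristic. Counting consecutive page pairs is a simplification: full sequential pattern mining (for example, the GSP algorithm) also finds longer and non-contiguous sequences.

```python
from collections import Counter, defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed)


def sessionize(entries):
    """Split (user, timestamp, page) entries into per-user page sequences."""
    by_user = defaultdict(list)
    for user, ts, page in sorted(entries, key=lambda e: (e[0], e[1])):
        by_user[user].append((ts, page))
    sessions = []
    for visits in by_user.values():
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > SESSION_TIMEOUT:   # long gap: start a new session
                sessions.append(current)
                current = []
            current.append(page)
        sessions.append(current)
    return sessions


def frequent_transitions(sessions, min_count):
    """Page-to-page transitions that occur at least min_count times."""
    pairs = Counter()
    for session in sessions:
        pairs.update(zip(session, session[1:]))
    return {pair: n for pair, n in pairs.items() if n >= min_count}


# Hypothetical, already-parsed log entries: (user, seconds, page).
LOG = [
    ("u1", 0, "/home"), ("u1", 60, "/products"), ("u1", 120, "/cart"),
    ("u2", 10, "/home"), ("u2", 70, "/products"), ("u2", 4000, "/home"),
    ("u3", 5, "/search"), ("u3", 50, "/products"), ("u3", 100, "/cart"),
]
sessions = sessionize(LOG)
print(frequent_transitions(sessions, min_count=2))
# {('/home', '/products'): 2, ('/products', '/cart'): 2}
```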
Web Content Mining
Data Collection
Web Crawlers
Public Domain Web Crawlers
An Implementation of a Web Crawler
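At its core, a crawler performs a breadth-first traversal of the link graph. It takes a URL from the frontier, fetches the page, extracts its links and queues the ones it has not yet seen. The minimal sketch below uses only the Python standard library. The page fetcher is passed in as a function, so the example runs against a hypothetical in-memory site rather than the network. `crawl` and `LinkExtractor` are illustrative names, not the book's implementation.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; fetch(url) returns HTML text or None."""
    frontier = deque([seed])
    visited = set()
    crawled = []
    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)
        if html is None:               # unreachable or non-HTML page
            continue
        crawled.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before queueing.
            absolute = urldefrag(urljoin(url, href)).url
            if absolute not in visited:
                frontier.append(absolute)
    return crawled


# Hypothetical in-memory "web" standing in for real HTTP fetches.
PAGES = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="/">Home</a>',
    "http://example.com/b": '<a href="http://example.com/c#top">C</a>',
    "http://example.com/c": "<p>No links here.</p>",
}
print(crawl("http://example.com/", PAGES.get))
```

To crawl real sites, `fetch` could wrap `urllib.request.urlopen`. A well-behaved crawler should also respect robots.txt (the standard library provides `urllib.robotparser`), limit its request rate and restrict itself to HTML content.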
Architecture of a search engine (Romanko, 2006)
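The indexer and query processor of a search engine revolve around an inverted index, which maps each term to the documents that contain it. The minimal sketch below assumes lower-cased alphanumeric tokens, conjunctive (AND) queries and ranking by summed term frequency. Real engines use far richer ranking signals. The three documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-cased alphanumeric tokens (no stop-word removal or stemming here)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """AND query: documents containing every query term, best score first."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    matches = set(postings[0]).intersection(*postings[1:])
    # Score = summed term frequency; ties are broken by document id.
    return sorted(matches, key=lambda d: (-sum(p[d] for p in postings), d))


# Hypothetical document collection.
DOCS = {
    "d1": "web mining and web crawling",
    "d2": "data mining of customer data",
    "d3": "web usage mining from web logs of the web server",
}
index = build_index(DOCS)
print(search(index, "web mining"))  # ['d3', 'd1']
```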
Other Topics in Web Content Mining
Search Engines
–How to prepare for and set up a search engine
–Types and listings of search engines (freeware, remote hosting services, commercial)
Multimedia Information Retrieval
Web Structure Mining
http://www.iprcom.com/papers/pagerank/
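The resource above concerns PageRank, the best-known web structure mining algorithm. A page's rank is the probability that a random surfer is on that page. This surfer follows a random out-link with probability d and jumps to a random page otherwise. Written out, PR(p) = (1 - d)/N + d · Σ PR(q)/L(q), where the sum runs over pages q linking to p, N is the number of pages and L(q) is the number of out-links of q. The minimal power-iteration sketch below assumes the commonly used damping factor d = 0.85. It redistributes the rank of pages without out-links evenly over all pages, which is one common convention. The three-page graph is hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=200):
    """Power iteration for PageRank.

    links maps each page to the pages it links to. Rank held by pages with no
    out-links is spread evenly over all pages (one common convention).
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Random-jump term plus each page's share of the dangling rank.
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes damping * rank / out-degree to every page it links to.
        for p, targets in links.items():
            for q in targets:
                new[q] += damping * rank[p] / len(targets)
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical three-page web: A links to B and C, B to C, C back to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({p: round(r, 3) for p, r in sorted(ranks.items())})
```

In this graph, C ends up with the highest rank (about 0.40) because it receives links from both A and B. A (about 0.39) outranks B (about 0.21) because A receives all of C's rank, while B gets only half of A's.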
Index quality for different search engines (Henzinger et al., 1999)
Index quality per page for different search engines (Henzinger et al., 1999)