Data mining in web applications
Web mining: Data mining in web applications
1. What is Data mining?
Data mining is the practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis. Data mining uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. Data mining is also known as Knowledge Discovery in Data (KDD).
2. What is Web mining?
Web mining is the use of data mining techniques to automatically discover and extract information from Web documents and services.
3. Data mining versus Web mining
Scale: In traditional data mining, processing 1 million records from a database would be a large job. In web mining, even 10 million pages would not be a big number.
Access: When doing data mining of corporate information, the data is private and often requires access rights to read. For web mining, the data is public and rarely requires access rights.
Structure: A traditional data mining task gets information from a database, which provides some level of explicit structure. A typical web mining task processes unstructured or semi-structured data from web pages.
4. Types
Web usage mining: from server logs and Web browser activity tracking.
Web structure mining: from links between pages, people and other data.
Web content mining: for the data found on Web pages and inside of documents.
4.1. Web usage mining
Web Usage Mining is the application of data mining techniques to discover interesting usage patterns from Web data in order to understand and better serve the needs of Web-based applications. Usage data captures the identity or origin of Web users along with their browsing behavior at a Web site.
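As a rough illustration of usage mining, the sketch below counts page hits and groups requested pages by visitor origin from server log lines. The sample lines, the Common Log Format layout, and the function names are assumptions made for this example; real server logs may use a different format.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample lines in Common Log Format (assumed, not from the slides).
SAMPLE_LOG = """\
192.0.2.1 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
192.0.2.1 - - [10/Oct/2024:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 1043
192.0.2.7 - - [10/Oct/2024:13:57:12 +0000] "GET /index.html HTTP/1.1" 200 2326
192.0.2.7 - - [10/Oct/2024:13:58:40 +0000] "GET /missing.html HTTP/1.1" 404 512
"""

# host, two ignored fields, [timestamp], "METHOD path protocol", status, size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_log(text):
    """Yield (host, path, status) for every line that matches the pattern."""
    for line in text.splitlines():
        match = LOG_PATTERN.match(line)
        if match:
            yield match["host"], match["path"], int(match["status"])


def page_hits(text):
    """Count successful (status 200) requests per page."""
    return Counter(path for _, path, status in parse_log(text) if status == 200)


def paths_by_visitor(text):
    """Group requested paths by origin host, in log order."""
    visits = defaultdict(list)
    for host, path, _ in parse_log(text):
        visits[host].append(path)
    return dict(visits)


if __name__ == "__main__":
    print(page_hits(SAMPLE_LOG).most_common())
    print(paths_by_visitor(SAMPLE_LOG))
```

The per-visitor path lists are the kind of raw material that pattern discovery (for example, frequent navigation sequences) would start from; the host address is only a crude stand-in for user identity.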
4.2. Web structure mining
Web structure mining is the process of using graph theory to analyze the node and connection structure of a website. According to the type of web structural data, web structure mining can be divided into two kinds:
Extracting patterns from hyperlinks in the web: a hyperlink is a structural component that connects the web page to a different location.
Mining the document structure: analysis of the tree-like structure of pages to describe HTML or XML tag usage.
Facts: PageRank by Larry Page (Google): a page is important if many important pages link to it.
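The PageRank idea ("a page is important if many important pages link to it") can be sketched as a simple power iteration over a link graph. This is a minimal illustration, not Google's implementation: the damping factor of 0.85, the iteration count, the handling of pages without outgoing links, and the four-page sample graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small base share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly across its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page with no outgoing links: spread its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page site used only for illustration.
SAMPLE_LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(SAMPLE_LINKS).items()):
        print(page, round(score, 4))
```

In the sample graph, page C is linked from three pages and ends up with the highest score, while page D, which nothing links to, keeps only the base share.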
4.3. Web content mining
Web content mining is the mining, extraction and integration of useful data, information and knowledge from Web page content. How is it done?
Collect: fetch the content from the Web.
Parse: extract usable data from formatted data (HTML, PDF, etc.).
Analyze: tokenize, rate, classify, cluster, filter, sort, etc.
Produce: turn the results of analysis into something useful (report, search index, etc.).
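The parse, analyze and produce steps can be sketched with the Python standard library alone. To keep the example self-contained, the collect step is replaced by a hard-coded HTML string; the sample page, the stop-word list, and the function names are assumptions for illustration only.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Stand-in for the "collect" step: a hypothetical page instead of a real fetch.
SAMPLE_HTML = """\
<html><head><title>Web mining</title><style>p { color: red; }</style></head>
<body><h1>Web mining</h1>
<p>Web mining applies data mining techniques to Web documents.</p>
<script>var mining = 1;</script></body></html>
"""

STOP_WORDS = {"a", "and", "of", "the", "to"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def parse(html):
    """Parse step: reduce HTML to plain text."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


def analyze(text):
    """Analyze step: tokenize, filter stop words, count term frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token not in STOP_WORDS)


def produce(counts, top=3):
    """Produce step: turn the counts into a small report."""
    return [f"{term}: {count}" for term, count in counts.most_common(top)]


if __name__ == "__main__":
    print("\n".join(produce(analyze(parse(SAMPLE_HTML)))))
```

Term counting is only one possible analysis; classification or clustering would plug into the same place in the pipeline.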
5. Crawling the web
A Web crawler is an Internet bot which systematically browses the World Wide Web, typically for the purpose of Web indexing (web spidering). Web search engines and some other sites use Web crawling or spidering software to update their web content or indices of other sites' web content. Including a robots.txt file can request bots to index only parts of a website, or nothing at all.
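A polite crawler checks robots.txt before fetching a page. The sketch below uses Python's standard `urllib.robotparser` module on an in-memory robots.txt so that it runs without network access; the rules, the `ExampleBot` user agent, and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every bot out of /private/, allow the rest.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_text):
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs that the rules permit this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    candidates = [
        "https://example.com/index.html",
        "https://example.com/private/data.html",
    ]
    rules = build_parser(ROBOTS_TXT)
    print(allowed_urls(rules, "ExampleBot", candidates))
```

A real crawler would load the file from the target site (for example with `set_url` and `read`) and would still need its own fetch queue and rate limiting, which are left out here.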
6. Application areas of web mining
E-commerce: personalized marketing; Fight against terrorism: classify threats; Prediction; And others :)
7. Future research directions
Multimedia data mining: a picture is worth a thousand words; Multilingual knowledge extraction: web page translations; Semantic web mining.
Thank you!