Information Overload on the Internet: The Web Mining Techniques Approach UNIVERSITI UTARA MALAYSIA COLLEGE OF ARTS AND SCIENCES RESEARCH METHODOLOGY (SZRZ6014)

1 Information Overload on the Internet: The Web Mining Techniques Approach. UNIVERSITI UTARA MALAYSIA, COLLEGE OF ARTS AND SCIENCES, RESEARCH METHODOLOGY (SZRZ6014). Prepared by: Ahmed Ghazi Hameed (812517), 2013. Prepared for: Dr. Farzana binti Kabir Ahmad

2 Overview
• Introduction
• Background to the Study
• Problem Statement
• Research Questions
• Research Objective
• Significance of the Research
• Literature Review
• The Architecture of the Web Log Mining
• Related Work
• Methodology
• Summary
• References

3 Introduction The Internet has presented new opportunities to the world. This has made it more popular than ever, and it has become a necessity for everyone. The information on the Web is growing like never before. This growth is rapid and significant, and it has become a challenge for users: over time, information overload eventually drowns them. The World Wide Web has driven this growth by providing a powerful platform where information can be stored, disseminated, and retrieved. This platform also helps to mine useful knowledge.

4 Background to the Study The Internet consists of a variety of data. These data are stored in a big repository, which also contains a large amount of unseen knowledge. This unseen knowledge can be discovered through data mining. The approaches commonly used in database research for information retrieval are intelligent computing and computational intelligence.

5 Problem Statement Information overload has brought about the challenge of how to find relevant information. To find particular information on the Internet, a user can use a search engine, employ a search assistant, or browse Web documents directly. This is the problem: a user typically types several keywords as a query into the search engine, and the search engine then returns a number of pages, ranked by their relevance to the query.

6 Research Questions The following research questions will be answered to achieve the objectives of this research work.
1- In what ways can the latent semantic factor space be revealed and discovered?
2- How will Web pages be grouped based on their usage-oriented similarities?
3- Why is it important to predict Web users' task preference distributions for Web recommendations?

7 Research Objective The main objective is to develop ways in which the needed information can be found accurately on the Internet in the midst of information overload. The objectives to be achieved are as follows:
1- To discover the latent semantic factor space and Web user preference in Web search using the Probabilistic Latent Semantic Analysis model.
2- To group Web pages based on their usage-oriented similarities.
3- To forecast Web users' task preferences for Web recommendations. The usage pattern knowledge will be used for Web recommendations.

8 Significance of the Research The research will focus on ways to help Web users find the exact information they need during Web information retrieval. This will be achieved by improving the performance of retrieval systems in Web applications and Web presentation, through employing and developing Web data mining paradigms. In addition, capturing the interests or usage patterns of Web users will facilitate a better understanding of how users navigate the Web.

9 Literature Review The earlier chapter gives an understanding of the topic being discussed and of the aims and objectives to be achieved in this research. This chapter begins by exploring the characteristics of Web data. It explains the concept of searching the Web and the issue of information overload. Then, the architecture of Web data mining is discussed to give an understanding of Web data. Both Web mining techniques and Web capturing are evaluated in this chapter. Data on the Web has unique features compared to the data available in conventional database management systems. In particular, Web data is huge in terms of size. http://searchenginewatch.com/.

10 The Architecture of the Web Log Mining The probability inference approach is used in Web usage mining for grouping Web pages and profiling users. The approach is useful for revealing the implicit associations between Web users and the pages they visit. At the same time, it can capture the latent task space, which corresponds to the users' mode of navigation and the functionality of the Web site.

11 Related Work Two types of clustering methods are performed on usage data in the field of Web usage mining: i. Web page clustering and ii. Web transaction clustering (8). Web page clustering has been applied in various ways. It has been used in adaptive Web sites, where it has proved successful. One example is PageGather, a Web page clustering algorithm (46, 81). The PageGather algorithm is used to synthesize index pages that did not previously exist. This is achieved by sorting Web pages into different groups.
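As a rough illustration of Web page clustering on usage data, the sketch below groups pages that tend to be visited in the same sessions. It is only loosely inspired by the co-occurrence idea and is not the published PageGather algorithm. The function name `cluster_pages`, the threshold value, and the sample sessions are all assumptions made for this example.

```python
from itertools import combinations


def cluster_pages(sessions, threshold=0.5):
    """Group pages whose co-occurrence similarity reaches the threshold.

    Hypothetical helper: similarity is min(P(a|b), P(b|a)) estimated from
    session counts, and clusters are connected components of the graph
    that links sufficiently similar pages.
    """
    visits = [set(session) for session in sessions]
    pages = sorted(set().union(*visits))
    count = {p: sum(p in v for v in visits) for p in pages}

    # Link two pages when they co-occur often enough in both directions.
    neighbours = {p: set() for p in pages}
    for a, b in combinations(pages, 2):
        both = sum(a in v and b in v for v in visits)
        similarity = min(both / count[a], both / count[b])
        if similarity >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Collect connected components with a depth-first search.
    clusters, seen = [], set()
    for page in pages:
        if page in seen:
            continue
        stack, component = [page], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(neighbours[node] - seen)
        clusters.append(sorted(component))
    return clusters


# Made-up sessions: each inner list is the set of pages seen in one visit.
sessions = [
    ["/courses", "/exams"],
    ["/courses", "/exams", "/home"],
    ["/library", "/journals"],
    ["/library", "/journals", "/home"],
]
clusters = cluster_pages(sessions, threshold=0.6)
```

Each resulting cluster could serve as the candidate content of one synthesized index page. The threshold controls how strongly pages must co-occur before they are grouped.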

12 Methodology The methodology will focus on discovering Web usage patterns by applying Web usage mining. The discovered usage knowledge will then be applied to present Web users with personalized Web content. This is a form of Web recommendation. There is a need to establish a mathematical framework that will help in analysing Web user behaviour. The framework will be referred to as the usage data analysis model. The model will in turn help to categorize the observed Web log data according to what occurs together. The mathematical model will then capture the relationship between Web pages and users. It will be based on a matrix representation of the usage data.
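One plausible reading of a matrix-based usage data model is a session-by-page count matrix. The sketch below builds such a matrix under that assumption. The helper name `build_usage_matrix` and the sample sessions are hypothetical, and the sessions are assumed to have already been extracted from the Web log.

```python
from collections import Counter


def build_usage_matrix(sessions):
    """Return (pages, matrix); matrix[i][j] counts hits on pages[j] in session i."""
    pages = sorted({page for session in sessions for page in session})
    column = {page: j for j, page in enumerate(pages)}
    matrix = []
    for session in sessions:
        row = [0] * len(pages)
        for page, hits in Counter(session).items():
            row[column[page]] = hits
        matrix.append(row)
    return pages, matrix


# Made-up sessions: one list of requested pages per user session.
sessions = [
    ["/home", "/courses", "/courses"],
    ["/home", "/library"],
    ["/courses", "/exams"],
]
pages, usage = build_usage_matrix(sessions)
# pages -> ['/courses', '/exams', '/home', '/library']
# usage -> [[2, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0]]
```

Rows correspond to user sessions and columns to pages, so the same matrix can be read row-wise to profile users or column-wise to compare pages.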

13 Methodology After the data model is created, algorithms that detect mutual associations between Web pages and users will be developed. The access patterns hidden in the Web log data of user sessions will be uncovered. Three types of latent analytical techniques based on statistical models will be used: traditional Latent Semantic Indexing, Probabilistic Latent Semantic Analysis, and the Latent Dirichlet Allocation model. These will help to show the mutual relationships between Web objects, such as Web sites and user sessions. The techniques will uncover the Web page categories and the usage patterns from the Web log files.
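To make one of these techniques concrete, here is a minimal sketch of Probabilistic Latent Semantic Analysis fitted with expectation-maximisation on a small session-by-page count matrix. It assumes the standard symmetric formulation P(s, p) = sum over z of P(z) P(s|z) P(p|z). The function names, the number of latent factors, the iteration count, and the sample counts are all illustrative assumptions, not values taken from the study.

```python
import math
import random


def normalise(values):
    """Scale values to sum to 1; fall back to uniform if they sum to 0."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit P(z), P(session|z) and P(page|z) with EM on a count matrix."""
    rng = random.Random(seed)
    n_s, n_p = len(counts), len(counts[0])
    p_z = [1.0 / n_topics] * n_topics
    p_s_z = [normalise([rng.random() + 0.1 for _ in range(n_s)])
             for _ in range(n_topics)]
    p_p_z = [normalise([rng.random() + 0.1 for _ in range(n_p)])
             for _ in range(n_topics)]
    history = []  # log-likelihood of the parameters entering each iteration
    for _ in range(n_iter):
        new_z = [0.0] * n_topics
        new_s = [[0.0] * n_s for _ in range(n_topics)]
        new_p = [[0.0] * n_p for _ in range(n_topics)]
        log_lik = 0.0
        for s in range(n_s):
            for p in range(n_p):
                n = counts[s][p]
                if n == 0:
                    continue
                # E-step: responsibility of each latent factor for (s, p).
                joint = [p_z[z] * p_s_z[z][s] * p_p_z[z][p]
                         for z in range(n_topics)]
                total = sum(joint)
                log_lik += n * math.log(total)
                for z in range(n_topics):
                    weight = n * joint[z] / total
                    new_z[z] += weight
                    new_s[z][s] += weight
                    new_p[z][p] += weight
        # M-step: re-estimate the distributions from the expected counts.
        p_z = normalise(new_z)
        p_s_z = [normalise(row) for row in new_s]
        p_p_z = [normalise(row) for row in new_p]
        history.append(log_lik)
    return p_z, p_s_z, p_p_z, history


# Made-up usage counts: rows are sessions, columns are pages.
counts = [
    [3, 2, 1, 0],
    [2, 3, 1, 0],
    [0, 0, 1, 3],
    [0, 1, 1, 2],
]
p_z, p_s_z, p_p_z, history = plsa(counts, n_topics=2)
```

Under these assumptions, pages with high P(page|z) for the same latent factor z can be read as one page category, and sessions with high P(session|z) as users sharing that task. Latent Semantic Indexing and Latent Dirichlet Allocation would need different estimation procedures and are not shown here.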

14 Summary This chapter discussed the methodology for discovering Web usage patterns by applying Web usage mining. It established the need for a mathematical framework and discussed the data analysis model that will help categorize the Web log data according to what occurs together. It also introduced the three types of latent analytical techniques that are based on statistical models.

15 References
Agarwal, R., C. Aggarwal, and V. Prasad, A Tree Projection Algorithm for Generation of Frequent Itemsets. Journal of Parallel and Distributed Computing, 1999. 61(3): p. 350-371.
Agrawal, R. and R. Srikant. Mining Sequential Patterns. in Proceedings of the International Conference on Data Engineering (ICDE). 1995, p. 3-14, Taipei, Taiwan: IEEE Computer Society Press.
Asano, Y., et al. Finding Neighbor Communities in the Web Using Inter-site Graph. in Proc. of the 14th International Conference on Database and Expert Systems Applications (DEXA'03). 2003, p. 558-568, Prague, Czech Republic.
Baeza-Yates, R. and B. Ribeiro-Neto, Modern Information Retrieval. 1999: Addison Wesley, ACM Press.
Borodin, A., et al. Finding Authorities and Hubs from Hyperlink Structures on the World Wide Web. in Proceedings of the 10th International World Wide Web Conference. 2001, p. 415-429, Hong Kong, China.
Brin, S. and L. Page, The PageRank Citation Ranking: Bringing Order to the Web (http://www-db.stanford.edu/~backrub/pageranksub.ps.). 1998.
Büchner, A.G. and M.D. Mulvenna, Discovering Internet Marketing Intelligence through Online Analytical Web Usage Mining. SIGMOD Record, 1998. 27(4): p. 54-61.
Chakraborty, S., Data mining for hypertext: A Tutorial Survey. ACM SIGKDD Explorations Newsletter, 2000. 1(2): p. 1-11.

16 Thank you very much

