Supervisor: Mr. Sayed Morteza Zaker
Presenter: Fateme Hadinezhad
1) Introduction
2) Definitions
3) Problem Statement
4) Results
5) Contribution
6) Conclusion
7) References
1) The rapid diffusion of the Internet and of the World Wide Web infrastructure is producing a considerable growth in the demand for new Web sites and Web applications (WAs).
2) To reduce time-to-market further, new pages are often obtained by reusing the code of existing pages through simple copy-and-paste operations.
3) Duplicated Web pages, which have the same structure and differ only in the data they include, can be considered clones.
4) This paper proposes an approach to detect duplicated pages in WAs.
5) The validity of the proposed approach has been assessed through experiments involving several WAs.
6) Paper structure: Section 1: clone analysis; Section 2: identification of duplicated WA pages; Section 3: experiments carried out; Section 4: concluding remarks.
Web sites and Web applications (WAs): a Web site may be thought of as a static site that may also provide dynamic information; a Web application additionally provides the Web user with a means to modify the site status.
Clones: duplicated or similar portions of code in software artifacts.
Near-miss clone: a fragment of code that partially coincides with another one; the concept was introduced in [Bax98].
Levenshtein distance: the minimum number of character insertions, deletions and substitutions needed to transform one string into another.
Clone analysis: the research area that investigates methods and techniques for automatically detecting clones.
The detection of duplicated WA pages based on the Levenshtein distance is, in general, computationally very expensive: the standard dynamic-programming algorithm for computing the Levenshtein distance has complexity $O(n^2)$, where $n$ is the length of the longer string.
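The quadratic cost comes from filling one table cell per pair of characters. The following is a minimal Python sketch of the standard dynamic-programming algorithm; the function name `levenshtein` is ours and is not taken from the paper.

```python
def levenshtein(u: str, v: str) -> int:
    """Classic dynamic-programming (Wagner-Fischer) edit distance.

    Returns the minimum number of single-character insertions, deletions
    and substitutions that turn u into v. Time is O(len(u) * len(v)),
    i.e. O(n^2) for n = max(len(u), len(v)); only two rows are kept.
    """
    if len(u) < len(v):
        u, v = v, u  # distance is symmetric; keep the shorter string inner
    previous = list(range(len(v) + 1))  # distances from "" to prefixes of v
    for i, cu in enumerate(u, start=1):
        current = [i]  # distance from u[:i] to ""
        for j, cv in enumerate(v, start=1):
            cost = 0 if cu == cv else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Encoded control components u and v from the example in these slides.
    print(levenshtein("hifgieb", "hidcfgieab"))  # 3
```

For two pages of a few thousand tags each, this inner loop runs millions of times per pair, which is why comparing every pair of pages in a WA this way becomes costly.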
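Example: two HTML fragments, their control components (sequences of tags and attributes) and the encoded strings obtained by mapping each distinct tag or attribute to one character (h = td, i = width, f = img, g = src, e = height, b = /td, d = div, c = align, a = /div):
Fragment 1: <img src="../images/Nuovo.jpg" width="92" height="27">
Sequence: (td, width, img, src, width, height, /td)
u = hifgieb
Fragment 2: <img src="../pic1.jpg" width="92" height="27">
Sequence: (td, width, div, align, img, src, width, height, /div, /td)
v = hidcfgieab
Levenshtein distance: $D(u, v) = 3$ (v is obtained from u by inserting d, c and a).
Euclidean distance between the frequency vectors of the two sequences: $ED = \sqrt{3} \approx 1.732$ (the vectors differ by one occurrence each of div, align and /div).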
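The frequency-based metric avoids the quadratic comparison: each page is reduced to a vector of item counts, built in a single pass, and two pages are compared with the Euclidean distance between their vectors. The sketch below assumes that every tag and attribute in the sequence is counted (this reproduces ED = √3 for the example above); the `CODES` table is simply read off the example, and all names are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical one-character codes, taken from the example mapping.
CODES = {"td": "h", "width": "i", "img": "f", "src": "g", "height": "e",
         "/td": "b", "div": "d", "align": "c", "/div": "a"}


def encode(sequence: list[str]) -> str:
    """Map a control-component sequence (tags and attributes) to a string."""
    return "".join(CODES[item] for item in sequence)


def frequency_distance(seq_a: list[str], seq_b: list[str]) -> float:
    """Euclidean distance between the item-frequency vectors of two sequences."""
    freq_a, freq_b = Counter(seq_a), Counter(seq_b)
    items = set(freq_a) | set(freq_b)  # missing items count as 0
    return sqrt(sum((freq_a[k] - freq_b[k]) ** 2 for k in items))


page1 = ["td", "width", "img", "src", "width", "height", "/td"]
page2 = ["td", "width", "div", "align", "img", "src", "width", "height", "/div", "/td"]

print(encode(page1), encode(page2))                # hifgieb hidcfgieab
print(round(frequency_distance(page1, page2), 3))  # 1.732
```

Counting is linear in the page length, but the metric ignores item order: two pages with the same tags arranged differently get distance 0, which the Levenshtein distance would not.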
Background and motivations:
1) Software clones and clone analysis
2) Web applications and software clones
   Client pages: a) static pages, b) dynamic pages
   Web page components: a) control component, b) data component
Metrics to detect duplicated Web pages:
1) Detecting duplicated Web pages by the Levenshtein distance
   a) Detecting duplicated client pages
2) Detecting duplicated client pages using a frequency-based metric
3) Detecting duplicated server pages
Case studies:
Clone detection within a WA
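The split of a page into a control component and a data component can be sketched with Python's standard `html.parser` module. Which items belong to the control component is our assumption (tag names, attribute names and closing tags, with attribute values and text treated as data), chosen so that it reproduces the sequences of the example slide; the class, function and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class ControlComponentExtractor(HTMLParser):
    """Collect a page's control component and data component.

    Assumed split: tag names, attribute names and closing tags ("/tag")
    form the control component; attribute values and text form the data.
    """

    def __init__(self) -> None:
        super().__init__()
        self.control: list[str] = []
        self.data: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.control.append(tag)
        for name, value in attrs:
            self.control.append(name)
            if value is not None:
                self.data.append(value)

    def handle_endtag(self, tag):
        self.control.append("/" + tag)

    def handle_data(self, data):
        if data.strip():
            self.data.append(data.strip())


def split_page(html: str) -> tuple[list[str], list[str]]:
    parser = ControlComponentExtractor()
    parser.feed(html)
    parser.close()
    return parser.control, parser.data


# Hypothetical markup: same structure, different image.
page_a = '<td width="50%"><img src="../images/Nuovo.jpg" width="92" height="27"></td>'
page_b = '<td width="50%"><img src="../pic1.jpg" width="92" height="27"></td>'
control_a, data_a = split_page(page_a)
control_b, data_b = split_page(page_b)
print(control_a)  # ['td', 'width', 'img', 'src', 'width', 'height', '/td']
print(control_a == control_b, data_a == data_b)  # True False
```

Self-closing tags such as `<img ... />` would also emit a closing item here, because `HTMLParser.handle_startendtag` calls both handlers by default; a real tool would need to decide how to normalize them.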
In this paper an approach to clone analysis in the context of Web systems has been proposed. Pages of a WA having the same control component were considered clones, even if they differed in their data component. Two methods for detecting duplicated WA pages have been defined and experimented with: one exploiting the Levenshtein distance, and the other based on the frequency of the HTML tags in a page.
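Under this definition (same control component, possibly different data), exact clones can be found without any pairwise distance computation, by using each page's control component as a dictionary key. The sketch below takes already-extracted control components as input; page names and sequences are hypothetical. Near-duplicates (a small but non-zero distance) would still need one of the two metrics and a threshold, which these slides do not specify.

```python
from collections import defaultdict


def exact_clone_classes(control: dict[str, tuple[str, ...]]) -> list[list[str]]:
    """Group pages whose control components are identical.

    `control` maps a page name to its control component (a tuple of tag
    and attribute names). Pages in one group differ at most in their data.
    """
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for page, sequence in control.items():
        groups[sequence].append(page)
    # Only groups with two or more pages form a clone class.
    return [sorted(pages) for pages in groups.values() if len(pages) > 1]


pages = {  # hypothetical page names and control components
    "news1.html": ("td", "width", "img", "src", "width", "height", "/td"),
    "news2.html": ("td", "width", "img", "src", "width", "height", "/td"),
    "home.html": ("td", "width", "div", "align", "img", "src",
                  "width", "height", "/div", "/td"),
}
print(exact_clone_classes(pages))  # [['news1.html', 'news2.html']]
```

Hashing makes this linear in the total size of the pages, so it is a cheap first pass before the more expensive near-duplicate comparisons.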
The proposed approach has also been applied successfully to identify a case of plagiarism. Further experimentation should be carried out to validate the proposed methods more thoroughly.
[Bak93] Baker B. S., A theory of parametrized pattern matching: algorithms and applications, in Proceedings of the 25th Annual ACM Symposium on Theory of Computing, 71-80, May 1993.
[Bak95] Baker B. S., On finding duplication and near duplication in large software systems, in Proceedings of the 2nd Working Conference on Reverse Engineering, IEEE Computer Society Press, 1995.
[Bak95b] Baker B. S., Parametrized pattern matching via Boyer-Moore algorithms, in Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, Jan. 1995.
[Bal00] Balazinska M., Merlo E., Dagenais M., Lagüe B., Kontogiannis K., Advanced clone-analysis to support object-oriented system refactoring, in Proceedings of the Seventh Working Conference on Reverse Engineering, Nov. 2000.
[Bal99] Balazinska M., Merlo E., Dagenais M., Lagüe B., Kontogiannis K., Measuring clone based reengineering opportunities, in Proceedings of the International Symposium on Software Metrics (METRICS’99), IEEE Computer Society Press, Nov. 1999.
[Bax98] Baxter I. D., Yahin A., Moura L., Sant’Anna M., Bier L., Clone Detection Using Abstract Syntax Trees, in Proceedings of the International Conference on Software Maintenance, IEEE Computer Society Press, 1998.
[Ber84] Berghel H. L., Sallach D. L., Measurements of program similarity in identical task environments, SIGPLAN Notices, 19(8):65-76, Aug. 1984.
[Frak92] Frakes W. B., Baeza-Yates R., Information Retrieval: Data Structures and Algorithms, Prentice-Hall, Englewood Cliffs, NJ, 1992.
[Gri81] Grier S., A tool that detects plagiarism in PASCAL programs, SIGCSE Bulletin, 13(1), 1981.
[Hor90] Horwitz S., Identifying the semantic and textual differences between two versions of a program, in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, June 1990.
[Jan88] Jankowitz H. T., Detecting plagiarism in student PASCAL programs, Computer Journal, 31(1):1-8, 1988.
[Kon96] Kontogiannis K., De Mori R., Merlo E., Galler M., Bernstein M., Pattern matching for clone and concept detection, Journal of Automated Software Engineering, 3:77-108, Mar. 1996.
[Kon95] Kontogiannis K., De Mori R., Bernstein M., Merlo E., Pattern matching for design concept localization, in Proceedings of the 2nd Working Conference on Reverse Engineering, IEEE Computer Society Press, 1995.