The use of an intelligent forum crawler for data retrieval from e-learning portals Miloš Pavković and Jelica Protić, University of Belgrade School of.

Presentation transcript:

The use of an intelligent forum crawler for data retrieval from e-learning portals
Miloš Pavković and Jelica Protić, University of Belgrade, School of Electrical Engineering, Belgrade, Serbia
6th International Conference on Education and New Learning Technologies, Barcelona, 7th - 9th of July 2014

Introduction
There is a large number of forums covering different topics.
Students often use forums during their studies.
A large amount of relevant information is scattered across different forums inside one university domain.
Forums are based on different technologies.

Issues
The same topic can appear across different forums inside one university domain.
Official school forums vs. independent department forums.
The same documents can be uploaded as post attachments to several different web forums.
Similar courses are taught at different schools.

Solution – Specialized crawler
A specialized forum crawler.
Aggregation of crawled data from multiple forums of a single university domain.
Storing the data in a database.
Forum modules that use this database to help students.

Forum structure
Always defined by the presented implicit paths.
Figure: example of a) a forum, b) a thread, c) attachments inside a post.

Crawler algorithm
FCbRE – Forum Crawler based on Regular Expressions.
Automated system.
Identifies the DOM structure and basic forum elements with regular expressions.
Identifies forum implicit paths using regex.
Example: >>index\.php\?showforum\==\digit+!>+>\P=!<+
Extracts post content and stores it in the database.
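The idea of locating forum and thread links with regular expressions can be sketched in Python. This is only an illustration, not the FCbRE implementation: the slide's own pattern notation is not reproduced, the `showtopic` parameter is an assumed Invision-style thread URL, and `extract_links` is a hypothetical helper name.

```python
import re

# Assumed URL scheme: only "showforum" appears on the slide;
# "showtopic" is a hypothetical thread parameter used for illustration.
FORUM_LINK = re.compile(
    r'href="([^"]*index\.php\?showforum=(\d+))"[^>]*>([^<]+)<')
THREAD_LINK = re.compile(
    r'href="([^"]*index\.php\?showtopic=(\d+))"[^>]*>([^<]+)<')


def extract_links(html, pattern):
    """Return (link, numeric id, anchor text) for every match in the page."""
    return [(m.group(1), int(m.group(2)), m.group(3).strip())
            for m in pattern.finditer(html)]


# A tiny hand-written page fragment standing in for a crawled forum index.
sample = (
    '<a href="index.php?showforum=12">Algorithms</a>'
    '<a href="index.php?showtopic=345">Exam questions 2013</a>'
)
forums = extract_links(sample, FORUM_LINK)
threads = extract_links(sample, THREAD_LINK)
```

A real crawler would apply such patterns to downloaded pages and follow the extracted links; regular expressions are brittle against markup changes, so each forum technology would likely need its own pattern set.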

Crawler database
Essential in the FCbRE model.
Forum threads and posts are stored separately.
Similarity tables contain unique pairs of identifiers of forums, threads and attachments.
Tables and their fields, with the + and - markers as shown on the slide:
Web Forum: - site id, - site name, - site link
Forums: + site id, - forum id, - forum name, - forum link
Threads: + forum id, - thread id, - thread name, - thread link
Posts: + thread id, - post id, - post info
Attach: + post id, - attach id, - attach name, - attach link
F – Simil.: + forum id (1), + forum id (2)
T – Simil.: + thread id (1), + thread id (2)
F/T – Simil.: + forum id, + thread id
A – Simil.: + attach id (1), + attach id (2)
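A minimal sketch of this schema using Python's built-in `sqlite3` module is given below. Table and column names follow the slide. The column types, the reading of "+" as a reference to a parent table, the composite primary keys used to keep pairs unique, and the helper names `open_db` and `add_forum_similarity` are all assumptions.

```python
import sqlite3

# Names follow the slide; types, keys, and constraints are assumptions.
SCHEMA = """
CREATE TABLE web_forum (
    site_id INTEGER PRIMARY KEY, site_name TEXT, site_link TEXT);
CREATE TABLE forums (
    forum_id INTEGER PRIMARY KEY,
    site_id INTEGER REFERENCES web_forum(site_id),
    forum_name TEXT, forum_link TEXT);
CREATE TABLE threads (
    thread_id INTEGER PRIMARY KEY,
    forum_id INTEGER REFERENCES forums(forum_id),
    thread_name TEXT, thread_link TEXT);
CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY,
    thread_id INTEGER REFERENCES threads(thread_id),
    post_info TEXT);
CREATE TABLE attach (
    attach_id INTEGER PRIMARY KEY,
    post_id INTEGER REFERENCES posts(post_id),
    attach_name TEXT, attach_link TEXT);
CREATE TABLE f_simil (
    forum_id_1 INTEGER REFERENCES forums(forum_id),
    forum_id_2 INTEGER REFERENCES forums(forum_id),
    PRIMARY KEY (forum_id_1, forum_id_2));
CREATE TABLE t_simil (
    thread_id_1 INTEGER REFERENCES threads(thread_id),
    thread_id_2 INTEGER REFERENCES threads(thread_id),
    PRIMARY KEY (thread_id_1, thread_id_2));
CREATE TABLE f_t_simil (
    forum_id INTEGER REFERENCES forums(forum_id),
    thread_id INTEGER REFERENCES threads(thread_id),
    PRIMARY KEY (forum_id, thread_id));
CREATE TABLE a_simil (
    attach_id_1 INTEGER REFERENCES attach(attach_id),
    attach_id_2 INTEGER REFERENCES attach(attach_id),
    PRIMARY KEY (attach_id_1, attach_id_2));
"""


def open_db(path=":memory:"):
    """Create the tables in a new database and enable foreign-key checks."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_forum_similarity(conn, forum_a, forum_b):
    """Store an unordered forum pair once, smaller id first."""
    if forum_a == forum_b:
        return
    low, high = sorted((forum_a, forum_b))
    conn.execute("INSERT OR IGNORE INTO f_simil VALUES (?, ?)", (low, high))


conn = open_db()
conn.execute("INSERT INTO web_forum (site_id, site_name, site_link) "
             "VALUES (1, 'School forum', 'http://forum.example/')")
conn.execute("INSERT INTO forums (forum_id, site_id, forum_name, forum_link) "
             "VALUES (10, 1, 'Algorithms', 'index.php?showforum=10')")
conn.execute("INSERT INTO forums (forum_id, site_id, forum_name, forum_link) "
             "VALUES (11, 1, 'Algorithm', 'index.php?showforum=11')")
add_forum_similarity(conn, 11, 10)
add_forum_similarity(conn, 10, 11)  # same pair in the other order, ignored
pairs = conn.execute("SELECT forum_id_1, forum_id_2 FROM f_simil").fetchall()
```

Ordering each pair before insertion is one simple way to satisfy the "unique pairs" requirement, since (10, 11) and (11, 10) then map to the same row.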

Finding similarities
The goal is to determine similarities between forum, thread or document names.
Comparing the words literally is not enough, because of grammatical errors, singular/plural forms, and different forms that carry the same semantic meaning.
Existing search engines can be used to distinguish semantics.
FCbRE uses low-level semantic difference.
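The slides do not specify how FCbRE measures similarity, so the following Python sketch substitutes a simple stand-in: names are normalized (lowercased, punctuation dropped, a trailing plural "s" stripped) and then compared with `difflib.SequenceMatcher`, which tolerates small spelling errors. The function names and the 0.85 threshold are assumptions, and the approach does not handle names that differ in wording but share a meaning.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop punctuation, and strip a trailing plural 's' per word."""
    words = re.findall(r"\w+", name.lower())
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w
                    for w in words)


def similarity(a, b):
    """Ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_similar(a, b, threshold=0.85):
    """Treat two names as similar when the ratio reaches the threshold."""
    return similarity(a, b) >= threshold


same_plural = is_similar("Operating Systems", "operating system")
with_typo = is_similar("Opreating system", "Operating Systems")
unrelated = is_similar("Operating Systems", "Linear Algebra")
```

The crude plural rule is tuned for English names only; forums in other languages would need a language-specific stemmer.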

Module plugins
Two module plugins:
FCbRE-S (FCbRE Search plugin)
FCbRE-DP (FCbRE Duplicate Prevention plugin)
Both are used for experimental purposes.
Both are written for vBulletin technology.
They can be adapted for any other forum technology.

FCbRE-S (FCbRE Search plugin)
Designed for standard forum searches.
Forwards the requested query to the FCbRE database for similarity comparison.
All similarities are shown in addition to the standard search results.
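How a search plugin might append similar entries to the standard results can be sketched as follows. This is not the vBulletin plugin itself: `augmented_search`, the dictionary of crawled thread names, and the 0.8 cutoff are hypothetical, and `difflib.get_close_matches` stands in for whatever comparison FCbRE performs.

```python
from difflib import get_close_matches


def augmented_search(query, local_results, crawled_threads, cutoff=0.8):
    """Return the forum's own results plus similar crawled threads.

    crawled_threads maps a thread name to its link (assumed layout).
    """
    by_key = {name.lower(): name for name in crawled_threads}
    close = get_close_matches(query.lower(), list(by_key), n=5, cutoff=cutoff)
    extra = [(by_key[k], crawled_threads[by_key[k]])
             for k in close if by_key[k] not in local_results]
    return {"standard": list(local_results), "similar": extra}


# Hypothetical data: one local hit and three threads from the crawler database.
crawled = {
    "Exam questions 2013": "http://a.example/t/1",
    "Exam question 2013": "http://b.example/t/7",
    "Lab schedule": "http://b.example/t/9",
}
result = augmented_search("exam questions 2013",
                          ["Exam questions 2013"], crawled)
```

Keeping the standard results separate from the similar ones mirrors the slide's description, where similarities are shown as an addition rather than mixed into the ranking.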

FCbRE-DP (Duplicate Prevention plugin)
Implemented in the section where users can create a topic or forum.
Monitors the field that holds the name of the new thread or forum.
Notifies the user that a similar item already exists.
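The duplicate check could look roughly like the Python sketch below, which compares the words of a new title with existing titles using Jaccard overlap. The measure, the 0.6 threshold, and the names `jaccard` and `duplicate_warning` are assumptions chosen for illustration, not the plugin's actual logic.

```python
import re


def tokens(title):
    """Split a title into a set of lowercase words."""
    return set(re.findall(r"\w+", title.lower()))


def jaccard(a, b):
    """Share of words the two titles have in common, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def duplicate_warning(new_title, existing_titles, threshold=0.6):
    """Return a warning naming the closest existing title, or None."""
    best = max(existing_titles,
               key=lambda t: jaccard(new_title, t), default=None)
    if best is None or jaccard(new_title, best) < threshold:
        return None
    return f'A similar thread already exists: "{best}"'


existing = ["Exam questions 2013", "Lab schedule"]
warning = duplicate_warning("2013 exam questions", existing)
no_warning = duplicate_warning("Project ideas", existing)
```

In a forum, such a function would run while the user types the new name, with the existing titles taken from the crawler database.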

Results
The test set consisted of 9 web forums from the University of Belgrade, gathered manually.
This group is a mixture of forums from different sources.
The percentage of similar items is smallest for forums and highest for documents.
The reported percentage of "useful" duplicates should be treated with caution.

Conclusion
The proposed solution aggregates information from related forums.
It has the potential to reduce duplication of forums, topics and posts.
The use of the plugins would result in higher forum content quality.

Thank you!
Feel free to contact us with any questions you may have.