A Brief Survey of Web Data Extraction Tools (WDET)
Laender et al.
Introduction
- Web data is hard to query: much of it is unstructured.
- Wrappers can help extract this data.
- Wrappers can be generated in several ways.
- A wrapper maps the data on a Web page into a structured repository (e.g., a database).
- This paper surveys tools for generating wrappers.
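To make the idea of a wrapper concrete, here is a minimal hand-written sketch in Python. The page layout, the regular expression, and the names `wrap`, `store`, and `item` are all hypothetical; the sketch only illustrates a wrapper turning one page into records and loading them into a repository (an in-memory SQLite table). It does not reproduce any tool from the survey.

```python
import re
import sqlite3
from typing import Dict, List

# Hypothetical page layout: each record is an <li> holding a <b>title</b> and an <i>price</i>.
RECORD = re.compile(r"<li><b>(?P<title>.*?)</b>\s*<i>(?P<price>.*?)</i></li>")


def wrap(page: str) -> List[Dict[str, str]]:
    """The wrapper proper: map one page to a list of records."""
    return [m.groupdict() for m in RECORD.finditer(page)]


def store(records: List[Dict[str, str]], conn: sqlite3.Connection) -> None:
    """Load extracted records into a relational repository."""
    conn.execute("CREATE TABLE IF NOT EXISTS item (title TEXT, price TEXT)")
    conn.executemany("INSERT INTO item VALUES (:title, :price)", records)


page = "<ul><li><b>Book A</b> <i>$10</i></li><li><b>Book B</b> <i>$12</i></li></ul>"
conn = sqlite3.connect(":memory:")
store(wrap(page), conn)
print(conn.execute("SELECT title, price FROM item ORDER BY rowid").fetchall())
# [('Book A', '$10'), ('Book B', '$12')]
```

A regular expression keyed to one layout breaks as soon as the layout changes, which is the brittleness that the survey's criterion of resilience and adaptiveness is concerned with.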
Taxonomy of WDET
- Languages for Wrapper Development
- HTML-aware Tools
- NLP-based Tools
- Wrapper Induction Tools
- Modeling-based Tools
- Ontology-based Tools
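Overview of WDET
- Languages for Wrapper Development: special-purpose languages for writing wrappers by hand, as alternatives to general-purpose languages (Minerva, TSIMMIS)
- HTML-aware Tools: exploit the structure of HTML pages (W4F, XWRAP, RoadRunner)
- NLP-based Tools: extract data from free text (RAPIER, SRV, WHISK)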
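HTML-aware tools start by turning a page into a parse tree and then extract data from parts of that tree. The sketch below, assuming only Python's standard `html.parser` module, builds a simple tag tree and pulls out table cells by following a tag path. The `Node`, `TreeBuilder`, and `select` names and the list-of-tags path syntax are hypothetical illustrations; W4F, XWRAP, and RoadRunner each rely on their own, more robust machinery.

```python
from html.parser import HTMLParser
from typing import List, Optional


class Node:
    """A node of a simple tag tree (hypothetical stand-in for a tool's parse tree)."""

    def __init__(self, tag: str, parent: Optional["Node"] = None) -> None:
        self.tag = tag
        self.parent = parent
        self.children: List["Node"] = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating void tags and stray end tags."""

    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:  # void elements never become the open element
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def select(node: Node, path: List[str]) -> List[Node]:
    """Return the nodes reached from `node` by following a path of child tags."""
    if not path:
        return [node]
    found: List[Node] = []
    for child in node.children:
        if child.tag == path[0]:
            found.extend(select(child, path[1:]))
    return found


builder = TreeBuilder()
builder.feed("<table><tr><td>Alice</td><td>30</td></tr><tr><td>Bob</td><td>25</td></tr></table>")
builder.close()
rows = [[td.text for td in select(tr, ["td"])] for tr in select(builder.root, ["table", "tr"])]
print(rows)  # [['Alice', '30'], ['Bob', '25']]
```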
Taxonomy of WDET (continued)
- Wrapper Induction Tools: generate wrappers from a set of training examples (WIEN, SoftMealy, STALKER)
- Modeling-based Tools: locate data that conforms to a user-defined hierarchy of objects (NoDoSE, DEByE)
- Ontology-based Tools: use conceptual models or ontologies (BYU tool)
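Wrapper induction can be illustrated with a simplified version of the LR (left-right delimiter) wrapper class used by WIEN: for each attribute, learn a left delimiter as the longest common suffix of the text preceding its labeled examples, and a right delimiter as the longest common prefix of the text following them. This is a sketch under stated assumptions: the functions `induce` and `execute` and the labeling format are hypothetical, and WIEN's actual induction checks further validity constraints that are omitted here.

```python
import os
from typing import List, Sequence, Tuple

Wrapper = List[Tuple[str, str]]  # one (left, right) delimiter pair per attribute


def common_prefix(texts: Sequence[str]) -> str:
    return os.path.commonprefix(list(texts))


def common_suffix(texts: Sequence[str]) -> str:
    return os.path.commonprefix([t[::-1] for t in texts])[::-1]


def induce(page: str, labels: Sequence[Tuple[str, ...]]) -> Wrapper:
    """Learn LR delimiters from one labeled page (simplified, hypothetical helper).

    labels holds one tuple of attribute values per record, in document order.
    """
    spans = []  # (start, end) of every labeled value, in document order
    pos = 0
    for record in labels:
        for value in record:
            start = page.find(value, pos)
            if start == -1:
                raise ValueError(f"label {value!r} not found after position {pos}")
            spans.append((start, start + len(value)))
            pos = start + len(value)
    k = len(labels[0])
    wrapper: Wrapper = []
    for attr in range(k):
        befores, afters = [], []
        for idx in range(attr, len(spans), k):
            start, end = spans[idx]
            prev_end = spans[idx - 1][1] if idx > 0 else 0
            next_start = spans[idx + 1][0] if idx + 1 < len(spans) else len(page)
            befores.append(page[prev_end:start])  # text between previous value and this one
            afters.append(page[end:next_start])   # text between this value and the next one
        left, right = common_suffix(befores), common_prefix(afters)
        if not left or not right:
            raise ValueError(f"no common delimiters for attribute {attr}")
        wrapper.append((left, right))
    return wrapper


def execute(wrapper: Wrapper, page: str) -> List[Tuple[str, ...]]:
    """Apply the learned delimiters repeatedly, one record per pass."""
    records = []
    pos = 0
    while True:
        record = []
        for left, right in wrapper:
            i = page.find(left, pos)
            if i == -1:
                return records  # no further records; a partial record is dropped
            start = i + len(left)
            end = page.find(right, start)
            if end == -1:
                return records
            record.append(page[start:end])
            pos = end  # resume at the value's end: a right and the next left delimiter may overlap
        records.append(tuple(record))


training = "<ul><li><b>Book A</b> <i>$10</i></li><li><b>Book B</b> <i>$12</i></li></ul>"
wrapper = induce(training, [("Book A", "$10"), ("Book B", "$12")])
print(execute(wrapper, "<ul><li><b>Book C</b> <i>$7</i></li></ul>"))
# [('Book C', '$7')]
```

Because a right delimiter and the next left delimiter can be drawn from the same stretch of markup (here both are `</b> <i>`), `execute` resumes scanning at the end of each value rather than after the right delimiter.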
Qualitative Analysis
- Degree of Automation
- Support for Complex Objects
- Page Contents: Semistructured Data or Text
- Ease of Use
- XML Output
- Support for Non-HTML Sources
- Resilience and Adaptiveness
Conclusions
Questions