1
Extracting Semistructured Information from the Web
J. Hammer, H. Garcia-Molina, J. Cho, R. Aranha, A. Crespo (Stanford University)
Presented by: Wei Mao
2
Introduction: Background
- Fast growth of the WWW
- Semistructured data in web pages
- Difficulty of manipulating web data
One solution
- A configurable extraction program
- The extraction result is represented in OEM
- A wrapper is used to query the result
3
A detailed example: Weather table
Can we ask the query “What is the forecast for Vienna for Jan. 28, 1997?”
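Once the table rows have been extracted into structured records, the question on this slide reduces to a simple lookup, which is not possible against the raw HTML. The sketch below assumes a hypothetical record layout (`Forecast` with `city`, `day`, and `forecast` fields) and uses placeholder forecast strings rather than real data from the slides.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Forecast:
    # Hypothetical layout for one extracted row of the weather table.
    city: str
    day: date
    forecast: str


def forecast_for(rows: list[Forecast], city: str, day: date) -> Optional[str]:
    """Return the forecast text for (city, day), or None if no row matches."""
    for row in rows:
        if row.city == city and row.day == day:
            return row.forecast
    return None


# Placeholder rows for illustration only; the values are not real forecasts.
rows = [
    Forecast("Vienna", date(1997, 1, 28), "forecast-A"),
    Forecast("Vienna", date(1997, 1, 29), "forecast-B"),
]
print(forecast_for(rows, "Vienna", date(1997, 1, 28)))  # prints forecast-A
```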
4
Extraction process
- Input: an HTML file
- A specification file made up of commands of the form [variables, source, pattern]
- Package the result into an OEM object
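A minimal Python sketch of how a single [variables, source, pattern] command might be evaluated. It assumes a pattern notation in which `#` marks text to capture into the next variable, `*` marks text to skip, and every other character is matched literally; the names `pattern_to_regex` and `apply_command` are hypothetical, and the actual extractor's syntax and semantics may differ.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern: '#' captures text, '*' skips text, the rest is literal."""
    parts = []
    for ch in pattern:
        if ch == "#":
            parts.append("(.*?)")    # capture the shortest span that lets the match succeed
        elif ch == "*":
            parts.append("(?:.*?)")  # skip text without capturing it
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def apply_command(env: dict[str, str], variables: list[str], source: str, pattern: str) -> None:
    """Run one [variables, source, pattern] command, binding captures into env."""
    regex = pattern_to_regex(pattern)
    if regex.groups != len(variables):
        raise ValueError("pattern must contain exactly one '#' per variable")
    match = regex.fullmatch(env[source])  # the pattern must cover the whole source text
    if match is None:
        raise ValueError(f"pattern did not match source {source!r}")
    env.update(zip(variables, match.groups()))


html = "<html><body><table><tr><td>Vienna</td></tr></table></body></html>"
env = {"root": html}
apply_command(env, ["table"], "root", "*<table>#</table>*")  # narrow to the table body
apply_command(env, ["city"], "table", "*<td>#</td>*")        # then to a single cell
print(env["city"])  # prints Vienna
```

In this sketch each bound variable can serve as the source of a later command, so nested structure such as a table inside a page is narrowed step by step.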
5
The HTML for the weather table
6
A sample specification file
7
Extraction result
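The extracted variables are packaged into OEM (Object Exchange Model), where each object carries an object identifier, a label, and a value that is either atomic or a set of sub-objects. The class below is a simplified, hypothetical Python rendering of that idea: the explicit OEM type field is omitted (it is implied by the Python value), and `OEMObject` and `package` are illustrative names, not part of any existing library.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from typing import Union

# Hypothetical oid generator: each new object gets the next integer id.
_next_oid = itertools.count(1)


@dataclass
class OEMObject:
    """Simplified OEM object: an oid, a label, and an atomic or complex value."""
    label: str
    value: Union[str, int, float, list[OEMObject]]
    oid: int = field(default_factory=lambda: next(_next_oid))

    def children(self, label: str) -> list[OEMObject]:
        """Return sub-objects with the given label (empty for atomic objects)."""
        if not isinstance(self.value, list):
            return []
        return [child for child in self.value if child.label == label]

    def render(self, depth: int = 0) -> str:
        """Indented text dump with one line per object."""
        pad = "  " * depth
        if isinstance(self.value, list):
            lines = [f"{pad}&{self.oid} {self.label}"]
            lines += [child.render(depth + 1) for child in self.value]
            return "\n".join(lines)
        return f"{pad}&{self.oid} {self.label}: {self.value!r}"


def package(label: str, bindings: dict[str, str]) -> OEMObject:
    """Wrap extracted variable bindings as a complex object with atomic children."""
    return OEMObject(label, [OEMObject(name, text) for name, text in bindings.items()])


obj = package("weather", {"city": "Vienna", "forecast": "forecast-A"})
print(obj.render())
```

Keeping labels on every object is what lets a query language address parts of the result by path (for example, the `city` child of a `weather` object) without a fixed schema.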
8
Customizing the extraction result
9
Additional capabilities
- Extract_table construct
- Case operator
- Get(url) operator
Query the extracted result
- Use an existing wrapper generation tool
- Only a simple interface is required
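The extract_table construct pulls a whole HTML table out in one step. As a rough stand-in, the sketch below uses Python's standard `html.parser.HTMLParser` to collect cell text row by row; `extract_table` here is a hypothetical Python function named after the construct, not the tool's actual syntax, and it does not handle nested tables.

```python
from html.parser import HTMLParser
from typing import Optional


class _TableCollector(HTMLParser):
    """Collects the text of <td>/<th> cells, grouped by <tr>, across the document."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: Optional[list[str]] = None
        self._cell: Optional[list[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_table(html: str) -> list[list[str]]:
    """Return every <tr> row in the document as a list of cell strings."""
    parser = _TableCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


html = ("<table><tr><th>City</th><th>Date</th></tr>"
        "<tr><td>Vienna</td><td>Jan. 28, 1997</td></tr></table>")
header, *body = extract_table(html)
records = [dict(zip(header, row)) for row in body]  # use the header row as field names
print(records)
```

Pairing the header row with each data row turns a presentation table into labeled records, which is the kind of structure the extraction result needs before it can be queried.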
10
Advantages
- Manipulates web data efficiently
- Flexible
- Easy to use
- Reuses existing systems (OEM, Lorel, HTML parser)
11
Disadvantages
- Depends on outside input
- Requires prior knowledge of the structure of the HTML file
- Requires writing a specification file