Published by Звонимир Бошковић. Modified over 5 years ago.
1
Measuring Complexity of Web Pages Using GATE
Prepared by: The Who
2
Subject 1: Can more meaningful indicators be extracted from the resources (web pages), e.g. a more informative measure of complexity or diversity, or other indicators such as sentiment?
3
Complexity Definition: How can we learn the features associated with how difficult a resource is to understand?
4
Our Vision
To employ entities linked to diverse contexts as a basis for determining the complexity of a web page by:
- Gathering sets of web pages from different domains
- Annotating the complexity of the pages (crowdsourcing)
- Obtaining the set of named entities on each page (GATE)
- Determining a complexity score for each entity based on the pages in which it appears (centrality / text ranking / entity authority metrics: how many times it appears in the page versus how many entities are in that page, and what the page complexity score is)
- Employing the set of weighted entities to predict a score for new pages
- Correlating the outputs with the commonly employed sentence metrics
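One possible reading of the entity authority idea above can be sketched in Python. This is an illustrative sketch only, not the authors' implementation: the function name `weighted_entity_scores`, the input layout, and the choice to weight each page's complexity by the entity's share of mentions on that page are all assumptions, and the sample pages are made up.

```python
from collections import Counter, defaultdict


def weighted_entity_scores(pages):
    """Score each entity as a weighted average of page complexity.

    pages: list of (entity_mentions, page_complexity) pairs, where
    entity_mentions is the list of entity strings found on the page
    (repeats included). The weight of a page for an entity is the
    entity's share of all mentions on that page (an assumption).
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for mentions, complexity in pages:
        if not mentions:
            continue  # a page without entities contributes nothing
        counts = Counter(mentions)
        total = len(mentions)
        for entity, count in counts.items():
            share = count / total
            weighted_sum[entity] += share * complexity
            weight_total[entity] += share
    return {e: weighted_sum[e] / weight_total[e] for e in weighted_sum}


# Made-up sample: two annotated pages with complexity scores in [0, 1].
pages = [
    (["GATE", "GATE", "ontology"], 0.8),
    (["GATE", "football"], 0.2),
]
scores = weighted_entity_scores(pages)
```

With this weighting, an entity that dominates a complex page is pulled toward that page's score more strongly than an entity mentioned only in passing.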
5
Proposed approach
6
Run entity and term recognition on a sample from the data set.
1. Create a datastore for the sample.
7
2. Populate the corpus with the sample and save it to the datastore.
8
3. Run TermRaider (it already contains the ANNIE Gazetteer for entity recognition).
9
4. Search for a specific annotation type.
10
5. Export the terms and the annotation set.
11
Scoring
Score the complexity of the entities
This score is based on the average complexity score of the documents in which the entity appears.
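The averaging rule for entities can be sketched as follows. The names `entity_scores`, `doc_entities`, and `doc_complexity` are hypothetical, the sample documents are made up, and counting each document once per entity (regardless of how often the entity is repeated in it) is an assumption.

```python
from collections import defaultdict
from statistics import mean


def entity_scores(doc_entities, doc_complexity):
    """Average the complexity of every document an entity appears in.

    doc_entities:   dict mapping document id -> list of entity strings
    doc_complexity: dict mapping document id -> annotated complexity
    """
    per_entity = defaultdict(list)
    for doc_id, entities in doc_entities.items():
        # set() so a document counts once per entity (an assumption)
        for entity in set(entities):
            per_entity[entity].append(doc_complexity[doc_id])
    return {entity: mean(values) for entity, values in per_entity.items()}


# Made-up sample input.
doc_entities = {"a.html": ["GATE", "ontology", "GATE"], "b.html": ["GATE"]}
doc_complexity = {"a.html": 0.8, "b.html": 0.4}
result = entity_scores(doc_entities, doc_complexity)
```

Here `GATE` appears in both documents and receives the mean of their two scores, while `ontology` simply inherits the score of the one document it appears in.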
12
Calculate the page score based on the scores of the entities that appear in it
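A minimal sketch of this page-level step, assuming the page score is the plain mean of the scores of its recognised entities. The slides do not state the aggregation function, so the mean, the function name `page_score`, and the handling of unseen entities are assumptions.

```python
def page_score(page_entities, entity_scores):
    """Predict a page's complexity from the entities found on it.

    page_entities: list of entity strings found on the page
                   (repeats count as extra weight)
    entity_scores: dict mapping entity -> learned complexity score
    Entities without a learned score are skipped; returns None when
    no entity on the page has a score.
    """
    known = [entity_scores[e] for e in page_entities if e in entity_scores]
    if not known:
        return None
    return sum(known) / len(known)


# Made-up entity scores and a new, unannotated page.
learned = {"GATE": 0.6, "ontology": 0.8}
predicted = page_score(["GATE", "ontology", "unseen entity"], learned)
```

Returning `None` rather than 0 for a page with no known entities keeps "no evidence" distinct from "very simple page".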
13
Compare scores by the two methods
Site Vanilla Score Proposed Score 0.6 .475 0.796 0.568 .75 0.536 .45 0.52 .55 iswc2013_demo_36.html .375 0.504 .775 0.464 .6 0.48 .725 0.528 .5
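One way to put a number on such a comparison is the Pearson correlation between the two score columns. The sketch below is an assumption about how the comparison could be done, not something the slides specify, and the lists in the usage lines are toy values rather than the figures from the table.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / sqrt(var_x * var_y)


# Toy values only: one score per page from each method, in the same order.
vanilla = [1.0, 2.0, 3.0]
proposed = [2.0, 4.0, 6.0]
r = pearson(vanilla, proposed)
```

A value near 1 would mean the two methods rank pages similarly, a value near 0 would mean they are unrelated, and a value near -1 would mean they rank pages in opposite order.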
14
Thank You! Gracias! Ευχαριστώ! Prepared by: The Who