Measuring the Quality of Web Content using Factual Information (16 April 2012). Funded by the Kompetenzzentrenprogramm (competence centers program), www.know-center.at, © Know-Center 2012.


Slide 1: Measuring the Quality of Web Content using Factual Information
16 April 2012, WebQuality 2012 workshop at WWW 2012
Elisabeth Lex, Michael Voelske, Marcelo Errecalde, Edgardo Ferretti, Leticia Cagnina, Christopher Horn, Benno Stein and Michael Granitzer

Slide 2: Agenda
- Motivation
- Approach
- Results
- Summary and Outlook

Slide 3: Motivation
- People's decisions are often based on Web content, which has:
  - no quality control and no verification
  - inaccurate, incorrect information
  - no fact checking
- Measures are needed to capture credibility and quality aspects
  - with respect to facts

Slide 4: Approach
- Measure information quality based on factual information
- Three approaches:
  - use simple statistics about the facts obtained from text
  - exploit the relational information contained in facts
  - use semantic relationships such as meronymy and hypernymy
- First approach:
  - use simple statistical features about the facts in a document
  - these indicate how informative a document is
  - derive facts from Web content using Open Information Extraction

Slide 5: Definition of Factual Density
- Fact count
- Factual density
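The slide names the two measures but not their formulas. A minimal sketch, assuming fact count is the number of (e1, relation, e2) triples an Open IE system such as ReVerb returns for a document, and factual density is that count divided by the document's length in words, a normalization that addresses the length bias noted in the experiments. The example extractions are hypothetical, not ReVerb output.

```python
from typing import List, Tuple

# A fact as an Open IE extractor emits it: (argument 1, relation phrase, argument 2).
Fact = Tuple[str, str, str]


def fact_count(facts: List[Fact]) -> int:
    """Number of facts extracted from one document."""
    return len(facts)


def factual_density(facts: List[Fact], text: str) -> float:
    """Fact count divided by document length in words (assumed normalization)."""
    n_words = len(text.split())
    if n_words == 0:
        return 0.0
    return fact_count(facts) / n_words


if __name__ == "__main__":
    text = "Graz is a city in Austria. The city hosts several universities."
    # Hypothetical extractions; a real pipeline would obtain these from ReVerb.
    facts = [
        ("Graz", "is a city in", "Austria"),
        ("The city", "hosts", "several universities"),
    ]
    print(fact_count(facts))                        # 2
    print(round(factual_density(facts, text), 3))   # 0.182 (2 facts / 11 words)
```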

Slide 6: Experiments
- Wikipedia: 1000 featured and good articles versus 1000 non-featured articles (randomly selected)
  - Featured: "a comprehensive coverage of the major facts in the context of the article's subject"
- Baseline: word count [Blumenstock 2008]
  - Featured articles are longer than non-featured ones
  - Bias: longer documents contain more facts
- Evaluation on two datasets:
  - Unbalanced: articles differ in length
  - Balanced: articles are similar in length
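A minimal sketch of a word-count baseline in the spirit of [Blumenstock 2008]: an article is labeled featured/good when its length reaches a threshold, and the threshold is chosen as the observed length that maximizes accuracy on labeled data. The function names, the tuning rule, and the toy data are hypothetical; they are not taken from the slides or from the cited paper.

```python
from typing import Sequence


def word_count(text: str) -> int:
    """Whitespace-delimited word count."""
    return len(text.split())


def predict_featured(text: str, threshold: int) -> bool:
    """Baseline rule: featured/good if the article has at least `threshold` words."""
    return word_count(text) >= threshold


def best_threshold(lengths: Sequence[int], labels: Sequence[bool]) -> int:
    """Return the observed length that, used as threshold, maximizes accuracy."""
    if not lengths or len(lengths) != len(labels):
        raise ValueError("need equally long, non-empty lengths and labels")
    best_t, best_acc = 0, -1.0
    for t in sorted(set(lengths)):
        correct = sum((n >= t) == y for n, y in zip(lengths, labels))
        acc = correct / len(lengths)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


if __name__ == "__main__":
    # Hypothetical article lengths in words; True marks featured/good articles.
    lengths = [5200, 4100, 800, 650, 3900, 300]
    labels = [True, True, False, False, True, False]
    print(best_threshold(lengths, labels))  # 3900
```

On an unbalanced corpus, where featured articles are systematically longer, such a threshold separates the classes easily. On a balanced corpus, where both classes have similar lengths, no single cutoff does, which is the setting a length-normalized measure is meant to address.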

Slide 7: Distribution of documents in both datasets with respect to word count

Slide 8: Precision/recall curves of factual density
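A precision/recall curve of this kind can be obtained by ranking articles by a score such as factual density and moving a cutoff down the ranking, treating the top k articles as predicted featured/good. A minimal sketch with hypothetical scores and labels; it does not group tied scores, whereas scikit-learn's `sklearn.metrics.precision_recall_curve` computes one point per distinct threshold.

```python
from typing import List, Sequence, Tuple


def pr_curve(scores: Sequence[float], labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (recall, precision) points, one per cutoff position in the ranking.

    Articles are sorted by descending score; for k = 1..n the top k are
    predicted featured/good. Tied scores are not grouped.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    tp = 0
    for k, (_, is_pos) in enumerate(ranked, start=1):
        tp += int(is_pos)
        points.append((tp / total_pos, tp / k))
    return points


if __name__ == "__main__":
    # Hypothetical factual-density scores and gold labels (True = featured/good).
    scores = [0.12, 0.09, 0.15, 0.04, 0.07]
    labels = [True, False, True, False, True]
    for recall, precision in pr_curve(scores, labels):
        print(f"recall={recall:.2f} precision={precision:.2f}")
```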

Slide 9: Results: factual density on the balanced corpus

Slide 10: Experiments (Relational Features)
- Approach 2: exploit the relational information contained in facts
- Extract relational features from articles
  - use relations from ReVerb: binary relations (e1, relation, e2)
- Use them to train a classifier that discriminates between featured/good and non-featured articles
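The slides do not specify the exact relational feature set or classifier. One plausible encoding, sketched here as an assumption, is a bag of relation phrases: each article becomes a vector of counts over the (lowercased) relation strings found in its ReVerb-style triples. The function names and the lowercasing step are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Fact = Tuple[str, str, str]  # (e1, relation, e2)


def relation_counts(facts: Iterable[Fact]) -> Counter:
    """Count how often each normalized relation phrase occurs in one article."""
    return Counter(rel.strip().lower() for _, rel, _ in facts)


def build_vocabulary(docs: List[List[Fact]]) -> Dict[str, int]:
    """Assign a column index to every relation phrase seen in the corpus."""
    vocab: Dict[str, int] = {}
    for facts in docs:
        for rel in relation_counts(facts):
            vocab.setdefault(rel, len(vocab))
    return vocab


def vectorize(facts: List[Fact], vocab: Dict[str, int]) -> List[int]:
    """Dense count vector; relations outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for rel, n in relation_counts(facts).items():
        if rel in vocab:
            vec[vocab[rel]] = n
    return vec


if __name__ == "__main__":
    # Hypothetical triples for two articles.
    docs = [
        [("Graz", "is a city in", "Austria"), ("Graz", "Is a city in ", "Styria")],
        [("The river", "flows through", "Graz")],
    ]
    vocab = build_vocabulary(docs)
    print(vocab)                        # {'is a city in': 0, 'flows through': 1}
    print(vectorize(docs[0], vocab))    # [2, 0]
    print(vectorize(docs[1], vocab))    # [0, 1]
```

The resulting vectors, paired with featured/good versus non-featured labels, could then be passed to any off-the-shelf classifier.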


Slide 12: Summary
- A simple fact-related measure: factual density
- Based on factual density, featured/good articles can be separated from non-featured ones if the articles are similar in length
- If the articles differ in length, word count is the stronger signal
  - Future work: a combination of both
- Plan to incorporate edit history (hypothesis: more editors, higher factual density)
- Preliminary experiments with relational features
  - Promising results; more work planned in this direction
  - The goal is to bring semantics into the field of information quality (IQ)
  - We expect this to unlock several IQ dimensions, e.g., generality vs. specificity

Slide 13: Thank you for your attention!
Elisabeth Lex, elex@know-center.at


Download ppt "Gefördert durch das Kompetenzzentrenprogramm www.know-center.at © Know-Center 2012 Measuring the Quality of Web Content using Factual Information 16. April."

Similar presentations


Ads by Google