Published by Gladys Melton. Modified over 9 years ago.
1
Domain-Specific Iterative Readability Computation
Jin Zhao
13/05/2011
2
Domain-Specific Resources
Jin Zhao and Min-Yen Kan, WING, NUS, 13/05/2011
3
Domain-Specific Resources
– Modular arithmetic page from Wikipedia
– Modular arithmetic page from Interactivate.com
Domain-specific resources target varying audiences.
4
Challenge for a Domain-Specific Search Engine
How to measure readability for domain-specific resources?
5
Literature Review
Heuristic-based Readability Measures
– Weighted sum of text feature values
– Examples:
  Flesch Kincaid Reading Ease (FKRE) [Flesch48]
  Dale-Chall readability formula [Dale&Chall48]
Quick and indicative, but these measures often oversimplify.
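To make "weighted sum of text feature values" concrete, here is a minimal sketch of the standard Flesch Reading Ease formula, assuming that this is what FKRE denotes on the slide. The vowel-group syllable counter and the regex-based sentence and word splitting are rough stand-ins chosen for illustration, not the tooling used in the talk.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: a weighted sum of two surface features of the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

easy = flesch_reading_ease("The cat sat. The dog ran.")
hard = flesch_reading_ease("Modular arithmetic generalizes congruence relations.")
print(easy > hard)  # higher scores mean easier text
```

Both features are purely surface-level, which illustrates why such measures cannot tell a short sentence about ring theory from a short sentence about triangles.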
6
Literature Review
Natural Language Processing and Machine Learning Approaches
– Extract deep text features and use supervised learning methods to generate models for readability measurement
– Text features: unigrams [Collins-Thompson04], parse tree height [Schwarm05], discourse relations [Pitler08]
– Supervised learning techniques: Support Vector Machine (SVM) [Schwarm05], k-Nearest Neighbor (KNN) [Heilman07]
More accurate, but they require an annotated corpus and are unaware of domain-specific concepts.
7
Literature Review
Domain-Specific Readability Measures
– Derive information about domain-specific concepts from expert knowledge sources
– Examples:
  Open Access and Collaborative Consumer Health Vocabulary [Kim07]
  Medical Subject Headings ontology [Yan06]
– Handle domain-specific concepts, but expert knowledge sources are still expensive and not always available
Key qualities of a good readability measure: effective, portable and domain-aware.
8
Intuitions
– A domain-specific resource is less readable if it contains more difficult concepts
– A domain-specific concept is more difficult if it appears in less readable resources
Use an iterative computation algorithm to estimate these two scores from each other
Example:
– Pythagorean theorem vs. ring theory
9
Iterative Computation (IC) Algorithm
Graph Construction
– Construct a graph representing resources, concepts and occurrence information
Score Computation
– Initialize and iteratively compute the readability score of domain-specific resources and the difficulty score of domain-specific concepts
– Two versions: heuristic and probabilistic
Required Input
– A collection of domain-specific resources
– A list of domain-specific concepts
10
Graph Construction
Resource 1: "…Pythagorean theorem can be written as a² + b² = c², where c represents the length of the hypotenuse…"
Resource 2: "…The sine function (sin) can be defined as the ratio of the side opposite the angle to the hypotenuse…"
Concept List: …, right triangle, Pythagorean theorem, hypotenuse, sine function, cosine function, …
Graph: resource nodes (Resource 1, Resource 2) and concept nodes (Pythagorean theorem, hypotenuse, sine function, right triangle, cosine function)
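The graph construction step can be sketched as a bipartite mapping between resources and the concepts that occur in them. The function name `build_graph` and the case-insensitive substring matching are assumptions made for illustration; the slides do not say how concept occurrences are detected.

```python
from collections import defaultdict

def build_graph(resources: dict[str, str], concepts: list[str]):
    """Link each resource to every listed concept whose name occurs in its text.

    Occurrence is tested by case-insensitive substring match (an assumption).
    """
    resource_to_concepts = {}
    concept_to_resources = defaultdict(set)
    for resource_id, text in resources.items():
        lowered = text.lower()
        found = {c for c in concepts if c.lower() in lowered}
        resource_to_concepts[resource_id] = found
        for concept in found:
            concept_to_resources[concept].add(resource_id)
    return resource_to_concepts, dict(concept_to_resources)

# The two excerpts and the concept list from the slide.
resources = {
    "Resource 1": "Pythagorean theorem can be written as a^2 + b^2 = c^2, "
                  "where c represents the length of the hypotenuse",
    "Resource 2": "The sine function (sin) can be defined as the ratio of "
                  "the side opposite the angle to the hypotenuse",
}
concepts = ["right triangle", "Pythagorean theorem", "hypotenuse",
            "sine function", "cosine function"]

r2c, c2r = build_graph(resources, concepts)
print(r2c)
print(c2r)
```

On these excerpts, "hypotenuse" links both resources, while "right triangle" and "cosine function" stay unconnected because neither excerpt mentions them. Plain substring matching would also match "sine function" inside "cosine function", so a real implementation would likely need token-aware matching.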
11
Score Computation (Heuristic)
Initialization
– Resource node: FKRE
– Concept node: average score of its adjacent nodes
Iterative Computation
– Each node: original score + average of the original scores of its adjacent nodes

                 Resource Nodes              Concept Nodes
                 w      x      y      z      a      b      c
Initialization   1.00   3.00   2.00   4.00   2.00   2.50   3.00
Iteration 1      3.00   5.25   4.75   7.00   4.00   5.00   6.00
12
Score Computation (Heuristic)

                 Resource Nodes                  Concept Nodes
                 w       x       y       z       a       b       c
Iteration 2      7.00    9.75    10.25   13.00   8.13    10.00   11.88
Iteration 3      15.13   18.82   21.19   24.88   16.51   20.00   23.51

Termination Condition
– The rank order of the resource nodes stabilizes
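The worked example on these two slides can be reproduced with a short sketch. The edge list below is an assumption: the transcript does not preserve the drawing, so the edges were inferred as the only set consistent with the printed scores (w–a, x–a, x–b, y–b, y–c, z–c). Here "original score" is read as the score from the previous iteration, which is what the numbers imply. The slides stop when the rank order of the resource nodes stabilizes but do not give the exact test, so the sketch simply runs a fixed number of steps.

```python
# Hypothetical edge list: inferred from the example's scores, not stated on the slides.
EDGES = [("w", "a"), ("x", "a"), ("x", "b"), ("y", "b"), ("y", "c"), ("z", "c")]
# Initial resource scores from the slide (there they come from FKRE).
RESOURCE_INIT = {"w": 1.00, "x": 3.00, "y": 2.00, "z": 4.00}

def adjacency(edges):
    """Build an undirected adjacency list from (resource, concept) pairs."""
    adj = {}
    for resource, concept in edges:
        adj.setdefault(resource, []).append(concept)
        adj.setdefault(concept, []).append(resource)
    return adj

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def initialise(resource_scores, adj):
    """Concept nodes start at the average score of their adjacent resources."""
    scores = dict(resource_scores)
    for node in adj:
        if node not in resource_scores:
            scores[node] = mean(resource_scores[r] for r in adj[node])
    return scores

def step(scores, adj):
    """Synchronous update: previous score plus the average previous score of the neighbours."""
    return {n: scores[n] + mean(scores[m] for m in adj[n]) for n in adj}

def run(resource_scores, edges, steps):
    adj = adjacency(edges)
    history = [initialise(resource_scores, adj)]
    for _ in range(steps):
        history.append(step(history[-1], adj))
    return history

history = run(RESOURCE_INIT, EDGES, steps=3)
for i, scores in enumerate(history):
    print(i, {n: round(s, 2) for n, s in sorted(scores.items())})
```

With exact arithmetic a few third-iteration values differ from the slide in the last digit (for example 18.81 rather than 18.82 for x), apparently because the slide carries rounded intermediate values forward. The resource ranking also changes between iteration 1 (w, y, x, z) and iteration 2 (w, x, y, z), which is why a stopping test based on rank order needs more than a single unchanged step.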
13
Score Computation (Heuristic)
Single-valued score for each node
– Unable to handle concepts of varying difficulties
Simple averaging in score computation
– Difficult to incorporate sophisticated computational mechanisms
14
Score Computation (Probabilistic)
Initialization
– Resource node: sentence sampling
– Concept node: resource sampling
[Figure: resource nodes w, x, y, z and concept nodes a, b, c at initialization]
15
Score Computation (Probabilistic)
Iterative Computation
– Modified Naïve Bayes classification
  [Equations: original, modified and direct-adaptation forms; not preserved in the transcript]
[Figure: resource nodes and concept nodes]
16
Evaluation
Key qualities of a good readability measure
– Effectiveness
– Portability
– Domain-awareness
17
Effectiveness
Corpus of math webpages
Metrics:
– Pairwise accuracy
– Spearman's rho
Baselines:
– Heuristic: FKRE
– Supervised learning: NB, SVM, MaxEnt (binary concept features only)

Method   Pairwise   Spearman   Iterations
FKRE     .72        .48        -
NB       .72        .52        -
SVM      .80        .70        -
MaxEnt   .82        .67        -
HIC      .87        .75        18
PIC      .85        .73        7
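The two metrics can be sketched in plain Python. The definitions below are common ones and are assumptions about the exact variants used in the evaluation: pairwise accuracy is taken as the share of item pairs with different gold levels that the predicted scores put in the same order, and Spearman's rho as the Pearson correlation of average ranks.

```python
from itertools import combinations
from math import sqrt

def pairwise_accuracy(gold, predicted):
    """Share of pairs with different gold values that the prediction orders the same way."""
    agree = total = 0
    for i, j in combinations(range(len(gold)), 2):
        if gold[i] == gold[j]:
            continue  # pairs tied in the gold standard are skipped
        total += 1
        if (gold[i] - gold[j]) * (predicted[i] - predicted[j]) > 0:
            agree += 1
    return agree / total if total else 0.0

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the average ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)

gold = [1, 2, 3, 4]               # illustrative gold readability levels
predicted = [0.1, 0.4, 0.3, 0.9]  # illustrative predicted scores
print(pairwise_accuracy(gold, predicted), spearman_rho(gold, predicted))
```

Both metrics depend only on the ordering of the scores, so measures on different scales, such as FKRE and the IC scores, can be compared directly.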
18
Portability
Different selection strategies
– Resource selection at random
– Concept selection at random
– Resource selection by quality
– Concept selection by TF.IDF
Performance measurement at 5 levels
– 20%, 40%, 60%, 80% and 100% of the original resource collection / concept list
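The random selection strategies at the five levels can be sketched as below. The helper name `random_subsets`, the fixed seed, and the choice to make the subsets nested (each level contains the smaller ones) are assumptions for illustration; the slides do not say how the samples were drawn.

```python
import random

LEVELS = (0.2, 0.4, 0.6, 0.8, 1.0)

def random_subsets(items, levels=LEVELS, seed=0):
    """Return one random subset per level, as prefixes of a single shuffled copy.

    Using prefixes of one shuffle makes the subsets nested (an assumption),
    so each larger level only adds items to the previous one.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return {level: shuffled[:round(level * len(shuffled))] for level in levels}

# Works the same way for a resource collection or a concept list.
subsets = random_subsets(range(10))
for level, subset in subsets.items():
    print(f"{level:.0%}: {len(subset)} items")
```

The quality-based and TF.IDF-based strategies would replace the shuffle with a sort by the corresponding score and then take the same prefixes.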
19
Portability
[Figures: Resource Selection Strategies; Concept Selection Strategies]
20
Portability

Method   Pairwise   Spearman
FKRE     .63        .28
NB       .73        .53
SVM      .82        .70
MaxEnt   .76        .60
HIC      .74        .49
PIC      .75        .55
21
Domain-awareness
Handling of domain-specific concepts
– Simple yet effective
– Concepts of multiple difficulty levels?
  Scores converge to a single value even in PIC
  Splitting? (K-Means, GMM, etc.)
  Other computational mechanisms?
22
Conclusion
Iterative Computation
– Estimate the readability of domain-specific resources and the difficulty of domain-specific concepts in an iterative manner
– Effective, portable and domain-aware
Future Work
– Handling of concepts of multiple difficulty levels