Published by Nicholas Haynes. Modified over 9 years ago.
1
Intelligent Database Systems Lab
Visualizing Topic Flow in Students’ Essays
Authors: Stephen T. O’Rourke, Rafael A. Calvo, and Danielle S. McNamara (2011, EST)
Presenter: Wu, Min-Cong
2
Outline
Motivation, Objectives, Methodology, Experiments, Conclusions, Comments
3
Motivation
Writing, and essay writing in particular, is an important learning activity. Visualization is important because it can help people assess and improve the quality of essays.
4
Objectives
This paper presents a novel document visualization technique and a measure of quality based on the average semantic distance between parts of a document.
5
Methodology: Mathematical Framework
Visualization requires reducing the dimensionality of the document representation.
Preprocessing: stop-words and low-frequency words are removed, and stemming is applied.
Visualizing Topic Flow: a term-by-paragraph matrix is built, a topic model is created using NMF, and the topic model is projected into a 2-dimensional space to give a visualization of the document’s paragraphs.
Quantifying Topic Flow: a term-by-sentence matrix is built and a topic model is created, which is used to identify features in the topic model of the document.
6
Methodology: Visualizing Topic Flow (term-by-paragraph matrix)
Rows are indexed by term i (i1 ... in) and columns by paragraph j (p1 ... pn). Each entry is weighted by Log-Entropy, which combines the term’s frequency in the paragraph with the term’s entropy over the document. The larger a word’s Log-Entropy weight, the more important the word.
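The weighting formula itself did not survive on this slide, so the sketch below assumes the commonly used log-entropy scheme: a local weight log(1 + tf) multiplied by a global weight 1 + sum_j p_ij log(p_ij) / log(n), where p_ij is the share of term i's occurrences that fall in paragraph j. The function name `log_entropy` and the toy counts are illustrative only, and the paper's exact variant may differ.

```python
import numpy as np

def log_entropy(tf):
    """Weight a (terms x paragraphs) count matrix with log-entropy.

    Assumed formulation (not taken from the slide):
      local  = log(1 + tf_ij)
      global = 1 + sum_j p_ij * log(p_ij) / log(n),  p_ij = tf_ij / sum_j tf_ij
    Requires at least two paragraphs (n > 1).
    """
    tf = np.asarray(tf, dtype=float)
    n = tf.shape[1]
    gf = tf.sum(axis=1, keepdims=True)  # total count of each term
    p = np.divide(tf, gf, out=np.zeros_like(tf), where=gf > 0)
    # p * log(p), with 0 * log(0) treated as 0
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    global_w = 1.0 + plogp.sum(axis=1, keepdims=True) / np.log(n)
    local_w = np.log1p(tf)
    return local_w * global_w

# Toy example: term 0 occurs in one paragraph only, term 1 is spread evenly.
counts = [[3, 0, 0],
          [1, 1, 1]]
weights = log_entropy(counts)
```

Under this assumed scheme, a term concentrated in a single paragraph keeps its full local weight, while a term spread evenly over all paragraphs is weighted down to zero.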
7
Methodology: Visualizing Topic Flow (NMF dimensionality reduction technique)
X ≈ W H, where X is the term-by-paragraph matrix (m × n), W is the term-by-topic matrix (m × r), H is the topic-by-paragraph matrix (r × n), and r is the number of latent topics.
Example: X (6 × 2) ≈ W (6 × 3) H (3 × 2).
The factorization can be approximated by minimizing the squared error of the Frobenius norm of X − WH.
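As a rough illustration of minimizing the squared Frobenius norm of X − WH, the sketch below uses the standard multiplicative update rules for NMF. The slide does not say which solver the authors used, so the update rule, the function name `nmf`, the iteration count, and the random test matrix are all assumptions.

```python
import numpy as np

def nmf(X, r, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative X (m x n) as W (m x r) @ H (r x n).

    Uses multiplicative updates, which keep W and H non-negative and
    reduce ||X - WH||_F^2. This is one common solver, assumed here.
    """
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    rng = np.random.default_rng(seed)
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update topic-by-paragraph
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update term-by-topic
        errors.append(np.linalg.norm(X - W @ H, "fro") ** 2)
    return W, H, errors

# Shapes follow the slide's example: X is 6 x 2 and r = 3 latent topics.
X = np.random.default_rng(1).random((6, 2))
W, H, errors = nmf(X, r=3)
```

Each column of H then describes one paragraph as a mixture of the r topics, which is the representation used in the later projection step.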
8
Methodology: Visualizing Topic Flow (2-dimensional representation)
A paragraph-by-paragraph triangular distance table is computed for paragraphs P1 ... Pj.
Multidimensional Scaling, a technique used for similarity comparison, then maps the paragraphs into 2 dimensions. An iterative majorization algorithm (least squares) minimizes a loss function (Stress) between the vector dissimilarities and the approximated distances in the low-dimensional space.
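A minimal sketch of stress minimization by iterative majorization is given below, using the unweighted Guttman transform (the update used in SMACOF-style algorithms). The function names, the raw-stress definition, the random initialization, and the toy distance table are assumptions made for illustration; the slide gives no implementation details.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(D, X):
    """Raw stress: squared mismatch between target and embedded distances."""
    E = pairwise_dist(X)
    iu = np.triu_indices_from(D, k=1)
    return float(((D[iu] - E[iu]) ** 2).sum())

def smacof(D, dim=2, n_iter=200, seed=0):
    """Embed a symmetric distance table D in `dim` dimensions."""
    n = D.shape[0]
    X = np.random.default_rng(seed).normal(size=(n, dim))
    history = [stress(D, X)]
    for _ in range(n_iter):
        E = pairwise_dist(X)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(E > 0, D / E, 0.0)
        B = -ratio
        B[np.diag_indices(n)] = -B.sum(axis=1)  # rows of B sum to zero
        X = B @ X / n                           # Guttman transform
        history.append(stress(D, X))
    return X, history

# Hypothetical distance table built from five points in the plane.
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 2.0]])
D = pairwise_dist(pts)
coords, history = smacof(D)
```

The majorization step never increases the stress, so `history` is a non-increasing sequence; the result can still be a local minimum, depending on the starting configuration.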
9
Methodology: Visualizing Topic Flow (Visualizing Flow)
Each paragraph is drawn as a node, and a link joins each paragraph to the next one (a node-link diagram), running from the introduction to the conclusion. The diameter of the grid equals the maximum possible distance between any two paragraphs.
The slide compares the map of a low-grade essay with that of a high-grade essay. Why the high grade? Because (1) the paragraphs appear close together, and (2) the ‘introduction’ and ‘conclusion’ are similar. The degree to which the path deviates from a circle is also noted.
10
Methodology: Quantifying Topic Flow
DI is computed from two quantities: the semantic distances between consecutive pairs of sentences or paragraphs, and the double average over all pairs of sentences or paragraphs.
DI ≤ 0 indicates a random topic flow.
DI > 0 indicates the presence of topic flow.
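The DI formula is not legible on this slide, so the sketch below is an assumption chosen only to agree with the stated sign convention: it compares the mean distance over all pairs with the mean distance over consecutive pairs, normalized by the former. Euclidean distance stands in for the semantic distance, and `distance_index` is a hypothetical name; the paper's definition may differ.

```python
import numpy as np

def distance_index(vectors):
    """Assumed form: DI = (mean over all pairs - mean over consecutive pairs)
    / mean over all pairs.

    `vectors` holds one row per sentence or paragraph, in document order.
    Needs at least two rows that are not all identical.
    """
    V = np.asarray(vectors, dtype=float)
    n = V.shape[0]
    diff = V[:, None, :] - V[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))        # Euclidean, as a stand-in
    consecutive = np.mean([D[i, i + 1] for i in range(n - 1)])
    overall = D[np.triu_indices(n, k=1)].mean()  # average over all pairs
    return (overall - consecutive) / overall

# Toy 1-D "topic vectors": an ordered progression versus a shuffled one.
ordered = [[0.0], [1.0], [2.0], [3.0]]
shuffled = [[0.0], [3.0], [1.0], [2.0]]
di_ordered = distance_index(ordered)    # positive under this assumed form
di_shuffled = distance_index(shuffled)  # negative under this assumed form
```

With this form, neighbouring units that are closer to each other than an average pair push DI above zero, which matches the interpretation of DI > 0 as topic flow.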
11
Experiment - Evaluation 1: Flow and Grades (Experiment Dataset)
Dataset: 120 essays written for assignments by undergraduate students at Mississippi State University.
Essay grades: levels 1-6.
Subsets: High: 67 essays (1-3); Low: 53 essays (3.2-6).
k (number of topics): 5.
Averages per essay: words 726.60 (114.37); sentences 40.03 (8.29); paragraphs 5.55 (1.32).
12
Experiment - Evaluation 1: Flow and Grades (Measuring Topic Flow)
[Results table not recovered. Annotations on the slide: "less present using either of the dimensionality reduction techniques"; p < 0.05; p > 0.05.]
13
Experiment - Evaluation 1: Flow and Grades (Measuring Topic Flow)
Measure the correlation.
14
Experiment - Evaluation 2: Supporting Assessment (Methodology)
1. Measure the inter-rater agreement that the tutors had with two expert raters.
2. The two tutors independently marked assignments with the map and without the map.
Hypothesis: with the map, essays can be subjectively assessed faster, more accurately, and more consistently.
15
Experiment - Evaluation 2: Supporting Assessment (Essay Subset Preparation)
The remaining 40 essays were divided into two subsets of 20 essays each (subset 1 and subset 2), to be assessed according to the MASUS procedure.
16
Experiment - Evaluation 2: Supporting Assessment (Results)
Rater 1: native English speaker.
Rater 2: non-native English speaker.
[Results table not recovered. Annotation on the slide: "in order to eliminate the effect of essay length".]
17
Conclusions
Tutors assess the essays faster, more accurately, and more consistently with the aid of topic flow visualization.
18
Comments
Advantages: effectively discover market intelligence (MI) for supporting decision-makers.
Applications: document visualization.