1
Automatic Identification of Discourse Moves in Scientific Article Introductions
Nick Pendar and Elena Cotos, Iowa State University
The 3rd Workshop on Innovative Use of NLP for Building Educational Applications, June 19, 2008
2
Outline
- Background and motivation
- Discourse move identification
- Data and annotation scheme
- Feature selection
- Sentence representation
- Classifier
- Evaluation
- Inter-annotator agreement
- Further work
3
Automated evaluation: Background
- Automated essay scoring (AES) in performance-based and high-stakes standardized tests (e.g., ACT, GMAT, TOEFL)
- Automated error detection in L2 output (Burstein and Chodorow, 1999; Chodorow et al., 2007; Han et al., 2006; Leacock and Chodorow, 2003)
- Assessment of various constructs, e.g., topical content, grammar, style, mechanics, syntactic complexity, and deviance or plagiarism (Burstein, 2003; Elliott, 2003; Landauer et al., 2003; Mitchell et al., 2002; Page, 2003; Rudner and Liang, 2002)
- Evaluation of text organization limited to recognizing the five-paragraph essay format, thesis, and topic sentences
- AntMover (Anthony and Lashkia, 2003)
4
Automated evaluation: CALI Motivation
- Wide range of possibilities for high-quality evaluation and feedback (Criterion; Burstein, Chodorow, & Leacock, 2004)
- Potential in formative assessment, but the effects of intelligent formative feedback have not been fully investigated
- Warschauer and Ware (2006) call for the development of a classroom research agenda that would help evaluate and guide the application of AES in writing pedagogy
- “the potential of automated essay evaluation for improving student writing is an empirical question, and virtually no peer-reviewed research has yet been published” (Hyland and Hyland, 2006, p. 109)
5
Automated evaluation: EAP Motivation
- EAP pedagogical approaches (Cortes, 2006; Levis & Levis-Muller, 2003; Vann & Myers, 2001) fail to provide NNSs with sufficient academic writing practice and remedial guidance
- Problem of disciplinarity
- An NLP-based academic discourse evaluation application could address this gap
- Such an application has not yet been developed
6
Automated evaluation: Research Motivation
Long-term research goals:
- Design and implementation of IADE (Intelligent Academic Discourse Evaluator)
- Analysis of IADE effectiveness for formative assessment purposes
7
IADE
- Evaluates students’ research article introductions in terms of moves/steps (Swales, 1990, 2004)
- Draws from:
  - SLA models: interactionist views (Carroll, 1999; Gass, 1997; Long, 1996; Long & Robinson, 1998; Mackey, Gass, & McDonough, 2000; Swain, 1993)
  - Systemic Functional Linguistics (Martin, 1992; Halliday, 1985)
  - Skill Acquisition Theory of learning (DeKeyser, 2007)
- Is informed by empirical research on the provision of feedback
- Is informed by Evidence Centered Design principles (Mislevy et al., 2006)
8
Discourse Move Identification
- Approached as a classification problem (similar to Burstein et al., 2003): given a sentence and a finite set of moves and steps, which move/step does the sentence signify?
- ISUAW corpus: 1,623 articles; 1,322,089 words; average article length 814.09 words
- Stratified sampling of 401 introduction sections representative of 20 academic disciplines
- Sub-corpus: 267,029 words; average length 665.91 words; 11,149 sentences
- Manual annotation
9
Discourse Move Identification
Annotation scheme (Swales, 1990, 2004)
10
Discourse Move Identification
Multiple layers of annotation for cases in which the same sentence signified more than one move or more than one step
11
Feature Selection
- Goal: features that reliably indicate a move/step
- Text-categorization approach (see Sebastiani, 2002)
- Each sentence is treated as a data item to be classified and is represented as an n-dimensional vector in Euclidean space
- The task of the learning algorithm is to find a function F : S → M that maps the sentences in the corpus S to the classes in M = {m1, m2, m3}
- Identification of moves only; steps are not yet identified
12
Feature Selection
- Extraction of word unigrams, bigrams, and trigrams from the annotated corpus
- Preprocessing:
  - All tokens stemmed using the NLTK port of the Porter Stemmer algorithm (Porter, 1980)
  - All numbers in the texts replaced by the string _number_
  - Tokens inside each bigram and trigram alphabetized
  - All n-grams with a frequency of less than five excluded
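The preprocessing steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' code: the tokenizer regex, the lowercasing, and the helper names (`preprocess`, `ngrams`, `extract_features`) are assumptions. The slides say tokens were stemmed with NLTK's port of the Porter stemmer; to keep the sketch dependency-free, stemming is a pluggable `stem` argument that defaults to the identity function, and `nltk.stem.porter.PorterStemmer().stem` can be passed in its place.

```python
import re
from collections import Counter
from typing import Callable, Iterable

# Assumed tokenizer: alphabetic runs, or numbers such as 42, 3.5, 1,623.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+(?:[.,]\d+)*")
NUMBER_RE = re.compile(r"^\d+(?:[.,]\d+)*$")


def identity(word: str) -> str:
    return word


def preprocess(sentence: str, stem: Callable[[str], str] = identity) -> list[str]:
    """Lowercase, tokenize, map numbers to _number_, and stem other tokens."""
    tokens = []
    for tok in TOKEN_RE.findall(sentence.lower()):
        tokens.append("_number_" if NUMBER_RE.match(tok) else stem(tok))
    return tokens


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Contiguous n-grams with the tokens inside each n-gram alphabetized."""
    return [tuple(sorted(tokens[i:i + n])) for i in range(len(tokens) - n + 1)]


def extract_features(sentences: Iterable[str], n: int, min_freq: int = 5,
                     stem: Callable[[str], str] = identity) -> Counter:
    """Count n-grams over all sentences and drop those seen fewer than min_freq times."""
    counts = Counter()
    for s in sentences:
        counts.update(ngrams(preprocess(s, stem), n))
    return Counter({g: c for g, c in counts.items() if c >= min_freq})
```

Alphabetizing makes "results show" and "show results" the same bigram, which trades word-order information for denser counts.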
13
Feature Selection
- Odds ratio
- Conditional probabilities calculated as maximum likelihood estimates
- N-grams with the highest odds ratios selected as features
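For a term t and a move c, the odds ratio as usually defined in the text-categorization literature (Sebastiani, 2002) is OR(t, c) = P(t|c)(1 − P(t|¬c)) / ((1 − P(t|c)) P(t|¬c)), where ¬c stands for all other moves. The sketch below estimates these probabilities as relative frequencies of sentences (maximum likelihood) and ranks n-grams by their best odds ratio across the moves. Taking the maximum over moves, the smoothing constant `eps`, and the function names are assumptions; the slides do not specify how scores were combined across moves or how zero counts were handled.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def odds_ratio(p_c: float, p_not_c: float, eps: float = 1e-6) -> float:
    """OR = P(t|c)(1 - P(t|not c)) / ((1 - P(t|c)) P(t|not c))."""
    # Clamp both probabilities into [eps, 1 - eps] so the ratio stays finite.
    # (Assumption: the slides do not say how zero counts were smoothed.)
    p_c = min(max(p_c, eps), 1 - eps)
    p_not_c = min(max(p_not_c, eps), 1 - eps)
    return (p_c * (1 - p_not_c)) / ((1 - p_c) * p_not_c)


def select_by_odds_ratio(docs: Sequence[set], labels: Sequence[Hashable], k: int) -> list:
    """Return the k n-grams with the highest odds ratio for any move.

    docs:   one set of n-grams per sentence (presence only).
    labels: the manually assigned move of each sentence.
    """
    n_docs = len(docs)
    class_size = Counter(labels)
    df_in_class = defaultdict(Counter)  # df_in_class[c][t]: sentences of move c containing t
    df_total = Counter()                # df_total[t]: all sentences containing t
    for doc, label in zip(docs, labels):
        df_in_class[label].update(doc)
        df_total.update(doc)

    scores = {}
    for term in df_total:
        best = float("-inf")
        for c, n_c in class_size.items():
            n_not_c = n_docs - n_c
            if n_not_c == 0:
                continue  # undefined when every sentence has the same move
            # Maximum likelihood estimates: relative frequencies of sentences.
            p_c = df_in_class[c][term] / n_c
            p_not_c = (df_total[term] - df_in_class[c][term]) / n_not_c
            best = max(best, odds_ratio(p_c, p_not_c))
        scores[term] = best
    # Highest score first; ties broken alphabetically for a deterministic result.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

A term spread evenly over all moves gets an odds ratio of 1 and sinks to the bottom of the ranking, while a term concentrated in one move rises to the top.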
14
Sentence Representation
- Each sentence represented as a vector
- Presence or absence of terms in sentences recorded as Boolean values (0 for the absence of the corresponding term, 1 for its presence)
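A minimal sketch of the Boolean representation, assuming the selected features are held in an ordered list and each sentence has already been reduced to its set of preprocessed n-grams (the helper names are hypothetical):

```python
from typing import Hashable, Iterable, Sequence


def to_boolean_vector(sentence_ngrams: set, features: Sequence[Hashable]) -> list[int]:
    """1 if the feature occurs in the sentence, 0 otherwise (presence, not counts)."""
    return [1 if f in sentence_ngrams else 0 for f in features]


def to_matrix(sentences: Iterable[set], features: Sequence[Hashable]) -> list[list[int]]:
    """One Boolean row vector per sentence, columns in the order of `features`."""
    return [to_boolean_vector(s, features) for s in sentences]
```

For example, with features ["propos", "previous", "we"], the sentence n-grams {"we", "propos", "method"} map to [1, 0, 1]; "method" is ignored because it was not selected as a feature.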
15
Classifier
- Support Vector Machines (SVM) (Basu et al., 2003; Burges, 1998; Cortes and Vapnik, 1995; Joachims, 1998; Vapnik, 1995)
- Five-fold cross-validation
- Machine learning environment: RapidMiner (Mierswa et al., 2006)
- RBF kernel; parameters found by trying a set of different settings on the feature set with 3,000 unigrams
- These parameters are not necessarily optimal; exhaustive searches will be performed on the other feature sets
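The slides used RapidMiner; the sketch below only illustrates two ingredients in plain Python: the RBF kernel K(x, z) = exp(−γ‖x − z‖²) and a five-fold split in which every sentence is tested exactly once. The γ value and the shuffling seed are placeholders, since the slides do not report the parameter settings, and the folds here are not stratified by move (an assumption).

```python
import math
import random
from typing import Iterator, Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists; every item is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test
```

With Boolean sentence vectors, ‖x − z‖² is simply the number of features present in exactly one of the two sentences, so the kernel value shrinks as two sentences share fewer indicative n-grams. In scikit-learn, a comparable setup would be `SVC(kernel="rbf", gamma=..., C=...)` evaluated with `cross_val_score(..., cv=5)`, with the values filled in by a parameter search.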
16
Evaluation
Five-fold cross-validation was performed on 14 different feature sets
17
Evaluation
Accuracy: the proportion of classifications that agreed with the manually assigned labels
18
Evaluation
- Precision: the proportion of items assigned to a given category that actually belong to it
- Recall: the proportion of items actually belonging to a category that were assigned to it
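The three measures defined above can be computed per move as in the sketch below, with `gold` holding the manually assigned moves and `pred` the classifier's output. Returning 0.0 when a denominator is zero is a convention chosen here, and the slides do not say how per-move scores were averaged.

```python
from typing import Hashable, Sequence


def accuracy(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Share of sentences whose predicted move matches the manual label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def precision_recall(gold: Sequence[Hashable], pred: Sequence[Hashable],
                     move: Hashable) -> tuple[float, float]:
    """Precision and recall for a single move."""
    tp = sum(g == move and p == move for g, p in zip(gold, pred))
    assigned = sum(p == move for p in pred)  # items the classifier put in `move`
    actual = sum(g == move for g in gold)    # items that truly belong to `move`
    precision = tp / assigned if assigned else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall
```

For example, with gold = [1, 1, 2, 2, 3] and pred = [1, 2, 2, 2, 1], accuracy is 0.6, and Move 2 has precision 2/3 and recall 1.0.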
19
Evaluation
- Trigram models yield the best precision
- Unigram models yield the best recall
20
Evaluation
- Error analysis reveals that Move 2 is the most difficult to identify: Move 2 sentences are misclassified as Move 1
- To address this:
  - Use the relative position of the sentence in the text to disambiguate the move involved
  - Check what percentage of the Move 2 sentences identified as Move 1 by the system have also been labeled Move 1 by the annotator
- Extracted features are not discipline-dependent
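Both follow-up ideas can be prototyped cheaply. The sketch below computes a relative-position feature for each sentence of an introduction (0 for the first sentence, 1 for the last) and the share of Move 2 sentences predicted as Move 1 that carry a Move 1 label in the additional annotation layer. The scaling to [0, 1], the `extra_layer` structure (one set of additional move labels per sentence), and the function names are assumptions for illustration.

```python
from typing import Sequence, Set


def relative_positions(n_sentences: int) -> list[float]:
    """Position of each sentence in an introduction, scaled to [0, 1]."""
    if n_sentences == 1:
        return [0.0]
    return [i / (n_sentences - 1) for i in range(n_sentences)]


def move2_as_move1_overlap(gold: Sequence[int], extra_layer: Sequence[Set[int]],
                           pred: Sequence[int]) -> float:
    """Share of gold Move 2 sentences, predicted as Move 1, whose extra
    annotation layer also contains Move 1."""
    confused = [i for i, (g, p) in enumerate(zip(gold, pred)) if g == 2 and p == 1]
    if not confused:
        return 0.0
    also_move1 = sum(1 for i in confused if 1 in extra_layer[i])
    return also_move1 / len(confused)
```

A high overlap would suggest that many of these "errors" are sentences the annotator also saw as Move 1, rather than outright misclassifications.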
21
This just in…
- Built a model with the top 3,000 unigrams and the top 3,000 trigrams
- Precision: 91.14%
- Recall: 82.98%
- Kappa: 87.57
22
Inter-annotator agreement
- A second annotation of a sample of files across all 20 disciplines (487 sentences)
- κ = (P(A) − P(E)) / (1 − P(E)), where
  - κ: inter-annotator agreement
  - P(A): observed probability of agreement
  - P(E): expected probability of agreement
- Average κ = 0.945 over the three moves
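Kappa is conventionally computed as κ = (P(A) − P(E)) / (1 − P(E)), using the observed and expected agreement P(A) and P(E). In the sketch below, P(E) is computed from each annotator's own label distribution (Cohen's kappa); other variants pool the two annotators' distributions, and the slides do not say which was used. Averaging one binary kappa per move is likewise only one plausible reading of "average over the three moves"; the helper names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """kappa = (P(A) - P(E)) / (1 - P(E)) for two annotators' labels."""
    n = len(a)
    p_a = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Expected agreement: chance that both pick the same label independently.
    p_e = sum(count_a[lab] * count_b[lab] for lab in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_a - p_e) / (1 - p_e)


def average_move_kappa(a: Sequence[int], b: Sequence[int], moves=(1, 2, 3)) -> float:
    """Mean of one binary (move vs. not-move) kappa per move."""
    scores = [kappa([x == m for x in a], [y == m for y in b]) for m in moves]
    return sum(scores) / len(scores)
```

For example, labels [1, 1, 2, 2] and [1, 2, 2, 2] agree on three of four sentences (P(A) = 0.75), chance agreement is P(E) = 0.5, and κ = 0.25 / 0.5 = 0.5.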
23
Further work on IADE
- Ongoing experiments to improve accuracy: experimenting with different kernel parameters to find optimal models
- More annotation
- Inter-annotator agreement (3 annotators)
- Identification of steps
- Development of intelligent feedback
- Web interface design
24
Further research with IADE
- Evaluation of IADE effectiveness (Chapelle, 2001):
  - Learning potential
  - Learner fit
  - Meaning focus
  - Authenticity
  - Impact
  - Practicality
- Process/product research direction: interaction between use and outcome (Warschauer & Ware, 2006)
- Target for evaluation: “what is taught through technology” (Chapelle, 2007, p. 30)
25
THANK YOU! Questions? Suggestions?