Applying some Developments in Corpus Building Technology to Language Teaching and Learning TALC 2006 Paris.


1 Applying some Developments in Corpus Building Technology to Language Teaching and Learning TALC 2006 Paris

2 Brought to you by … James Thomas and Jan Pomikálek, Department of Information Technology, Faculty of Informatics, Masaryk University, Brno, Czech Republic

3 Data-Driven Learning
• Doctoral students of the Faculty of Informatics
• Faith and skills
  ◦ needed to ask questions
  ◦ needed to be able to create queries
  ◦ needed to believe the answers
  ◦ needed to trust descriptive accounts

4 What changed …
• Web-based interface: Bonito became the WSE (Word Sketch Engine), more user-friendly
• CQL now optional
• New features, new results!
  ◦ word sketches
  ◦ sketch differences
  ◦ thesaurus (statistical)
  ◦ frequency distribution (chunks/patterns)
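
The slide only names the "frequency distribution (chunks/patterns)" feature. As a rough illustration of the idea, the minimal sketch below counts which word sequences immediately follow a node word in a tokenised text. It is not the WSE's implementation; the function name `pattern_distribution` and its `width` parameter are hypothetical.

```python
from collections import Counter


def pattern_distribution(tokens, node, width=2):
    """Count the word sequences of length `width` that immediately follow `node`.

    Hypothetical helper for illustration only: matching is case-insensitive
    and tokens are assumed to be pre-split words.
    """
    counts = Counter()
    lowered = [t.lower() for t in tokens]
    target = node.lower()
    for i, tok in enumerate(lowered):
        if tok == target:
            chunk = tuple(lowered[i + 1 : i + 1 + width])
            if len(chunk) == width:  # skip matches too close to the end of the text
                counts[chunk] += 1
    return counts


tokens = "the robust method gives a robust estimate and a robust method again".split()
print(pattern_distribution(tokens, "robust", width=1).most_common())
```

Sorting the resulting counts with `most_common()` gives the kind of ranked pattern list a learner can scan instead of reading every concordance line.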

5 TALC 2004 (2)
• Corpus consultation hampered by students' limited vocabulary
  ◦ different tasks needed
  ◦ concordances need to be sorted
• Background: TALC 2002
• Readability: average word frequency of each concordance line
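
The readability measure on this slide can be sketched as follows: give each concordance line the mean frequency of its words, then show lines made of more frequent (assumed easier) words first. This is a minimal sketch under stated assumptions: the frequency list `freq`, the punctuation stripping and the function names are hypothetical, and words missing from the list simply count as 0.

```python
from statistics import mean


def average_word_frequency(line, freq):
    """Mean corpus frequency of the words in one concordance line.

    `freq` is a hypothetical word -> frequency mapping (e.g. taken from a
    reference-corpus frequency list); unknown words count as 0.
    """
    words = [w.strip('.,;:!?"()').lower() for w in line.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return float(mean(freq.get(w, 0) for w in words))


def sort_by_readability(lines, freq):
    """Order lines so that those built from more frequent words come first."""
    return sorted(lines, key=lambda line: average_word_frequency(line, freq), reverse=True)


freq = {"the": 1000, "method": 50, "is": 900, "robust": 20, "heteroscedasticity": 1}
lines = ["Robust to heteroscedasticity.", "The method is robust."]
for line in sort_by_readability(lines, freq):
    print(line)
```

A raw mean is dominated by function words; averaging log frequencies, or ignoring a stop list, would be reasonable variants if that distorts the ranking.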

6 Addressing issues of faith and skills
• Classroom use of concordance printouts
• Activities set for corpus use
• Worksheets including instructions
• Website of sample searches
• Moodle's glossary module

7 Addressing Problem 1 (cont.)
• Lack of faith in general corpus use (3)
  ◦ students find the results convincing
  ◦ error correction of each other's written work
• Feedback from students: qualitative feedback only (see abstract)
• BNC not "computer savvy"

8 Success created Problem 2: BNC not "computer savvy"

9 BNC: limited application
• Dated: 94% of texts from 1985 to 1993
  ◦ modern technology not accounted for
• Technical vocabulary missing
• Differences in word usage
  ◦ higher frequency of academic vocabulary not represented (Coxhead), e.g. robust
• Solution: revisit an old idea …

10 TALC 2004
• Each department at FI MU was invited to contribute academic papers to a new Informatics Corpus
• Sections metatagged to serve as models for one's own writing
• Language differences between introductions, methodology sections and conclusions
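
One way to look at language differences between metatagged sections is to compare how often a word occurs per million tokens in each section type. The minimal sketch below assumes the tagged sections can be exported as hypothetical `(section_label, text)` pairs; the function name and the whitespace tokenisation are illustrative, not part of Corpus Builder.

```python
from collections import Counter


def per_million_by_section(documents, word):
    """Normalised frequency (per million tokens) of `word` in each section type.

    `documents` is a hypothetical iterable of (section_label, text) pairs,
    e.g. ("introduction", "..."); tokens are whitespace-split and lower-cased.
    """
    hits, sizes = Counter(), Counter()
    target = word.lower()
    for section, text in documents:
        tokens = [t.strip('.,;:!?"()').lower() for t in text.split()]
        tokens = [t for t in tokens if t]
        sizes[section] += len(tokens)
        hits[section] += tokens.count(target)
    # Normalising by section size makes sections of different lengths comparable.
    return {s: hits[s] * 1_000_000 / n for s, n in sizes.items() if n}


docs = [
    ("introduction", "We propose a robust method."),
    ("conclusion", "The method is robust and robust."),
]
print(per_million_by_section(docs, "robust"))
```

The same normalisation would apply when setting the Informatics Corpus against the BNC, e.g. for academic vocabulary such as robust.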

11 Ran aground
1. Demand for metadata
  ◦ too fine-grained, too labour-intensive
  ◦ few could see the point, so were unable to give it priority
2. Convoluted uploading interface
  ◦ no Windows version ???
  ◦ time-consuming procedure for uploading

12 Addressing this Problem
• Much improved interface
• "Build Corp"
• "Corpus Builder": http://corpora/cb/
• Configurable metadata list
• Corpus configuration
• POS tagging, lemmatization
• Other transformations can be incorporated, e.g. HTML to text
• Notes on Corpus Builder: http://www.fi.muni.cz/~thomas/corpora/cb_text_upload.htm
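
As an example of the kind of HTML-to-text transformation the slide mentions, here is a minimal sketch using Python's standard `html.parser` module. It is an illustration, not Corpus Builder's own converter; the class `_TextExtractor` and the function `html_to_text` are hypothetical names, and only `script` and `style` content is discarded.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of `html` with whitespace collapsed to single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(" ".join(parser.parts).split())


print(html_to_text("<h1>Title</h1><p>Robust &amp; fast</p><script>var x = 1;</script>"))
```

The plain-text output could then go on to POS tagging and lemmatization like any other uploaded text.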

13 Solutions (3)
• The time demanded of individuals
  ◦ interface for converting PDFs
  ◦ save set in folder
  ◦ upload quickly
• Metalanguage (ACM)
• DEMO

14 Much improved interface
• Building word sketches
• Statistical thesaurus
• User account management
• More user-friendly
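
The slides only name the statistical thesaurus. To give a rough idea of how a distributional thesaurus can rank similar words, the minimal sketch below compares words by the cosine similarity of their co-occurrence counts within a small window. This is a swapped-in textbook technique, not the WSE's own algorithm (which these slides do not describe); the function names and the window size are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        toks = sent.lower().split()
        for i, w in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][toks[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def thesaurus(word, vectors, top=3):
    """Return the `top` words whose contexts are most similar to those of `word`."""
    target = vectors.get(word, Counter())
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top]


vecs = context_vectors(["the robust parser works", "the stable parser works", "cats sleep"])
print(thesaurus("robust", vecs, top=2))
```

Real thesauri are built from far larger corpora and usually weight contexts (for example by association scores) rather than using raw counts.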

15 Enter the Informatics Corpus
• Currently contains
• Uses to date
  ◦ illustrative sentences
  ◦ some worksheets on the subjunctive, etc.

16 What the future holds
• Language acquisition
  ◦ consulting resources doesn't necessarily lead to retention
  ◦ lookups logged and automatically converted into interactive revision activities
• Researching the effectiveness of DDL
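
The planned conversion of logged lookups into revision activities could take many forms. One simple possibility is a gap-fill exercise built from corpus sentences containing each looked-up word. The sketch below is hypothetical throughout: the log format (a plain list of words), the sentence pool and `gap_fill_items` are illustrative assumptions, not a description of the authors' system.

```python
import random
import re


def gap_fill_items(lookups, sentences, per_word=1, seed=0):
    """Turn logged lookups into (gapped_sentence, answer) pairs.

    Hypothetical sketch: `lookups` is a list of looked-up words and
    `sentences` a pool of corpus sentences; a fixed seed keeps output repeatable.
    """
    rng = random.Random(seed)
    items = []
    # Lower-case and de-duplicate the log while keeping first-seen order.
    for word in dict.fromkeys(w.lower() for w in lookups):
        pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
        matches = [s for s in sentences if pattern.search(s)]
        for s in rng.sample(matches, min(per_word, len(matches))):
            items.append((pattern.sub("_____", s), word))
    return items


log = ["robust", "Robust", "deprecated"]
pool = ["A robust parser recovers from errors.", "This API is deprecated.", "Nothing here."]
for gapped, answer in gap_fill_items(log, pool):
    print(gapped, "->", answer)
```

Words looked up repeatedly are collapsed to one entry here; weighting them by lookup count instead would be an equally plausible design.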

