Published by Rosamund Farmer. Modified over 9 years ago.
1
Applying some Developments in Corpus Building Technology to Language Teaching and Learning. TALC 2006, Paris.
2
James Thomas & Jan Pomikálek, Department of Information Technology, Faculty of Informatics, Masaryk University, Brno, Czech Republic
3
Data Driven Learning: doctoral students of the Faculty of Informatics. Training and trust: needed to ask questions; needed to be able to create queries; needed to believe answers; needed to trust descriptive accounts.
4
TALC 2002: Corpus consultation hampered by students’ limited vocabulary; different tasks needed; concordances need to be sorted. Readability: average word frequency of each concordance. The design of a Lexical Difficulty Filter for language learning on the Internet (pdf).
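The readability idea on this slide, ranking concordance lines by the average frequency of their words, can be sketched as follows. This is only a minimal illustration and not the Lexical Difficulty Filter itself: the function names, the tokenisation, and the toy frequency table are all assumptions.

```python
import string


def average_word_frequency(line, frequencies):
    """Mean corpus frequency of the words in one concordance line.

    `frequencies` is a hypothetical word -> frequency table; words that
    are missing from it count as frequency 0.
    """
    words = [w.strip(string.punctuation).lower() for w in line.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(frequencies.get(w, 0) for w in words) / len(words)


def sort_by_readability(lines, frequencies):
    """Order lines so those built from more frequent words come first."""
    return sorted(
        lines,
        key=lambda line: average_word_frequency(line, frequencies),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy frequency table, invented for illustration only.
    toy_frequencies = {"the": 100, "cat": 10, "sat": 5, "ontology": 1}
    concordances = ["Ontology subsumes taxonomy.", "The cat sat."]
    for line in sort_by_readability(concordances, toy_frequencies):
        print(line)
```

Lines made of common words sort to the top, so learners with a limited vocabulary meet the easier examples first.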
5
What changed …: Web-based interface; Bonito became Word Sketch Engine (WSE); user friendly; CQL now optional (example). New features - new results! (example): word sketches; sketch differences; thesaurus (statistical); frequency distribution (chunks/patterns).
6
Addressing issues of faith and skills: worksheets including instructions (example relating to the textbook); classroom use of concordance printouts (prepositions); activities set for corpus use (example relating to the textbook); error correction of each other’s written work.
7
Addressing Problem 1 (cont): Faith in general corpus use: students find the results convincing and useful. Feedback from students: qualitative feedback only (see abstract). BNC not “computer savvy”.
8
BNC - limited application. Dated: 94% of texts from 1985 to 1993; modern technology not accounted for; technical vocabulary missing. Differences between word usage: higher frequency of academic vocabulary not represented (Coxhead), see key words list. Solution: revisit an old idea …
9
TALC 2004: Each dept at FI MU was invited to contribute academic papers to a new Informatics Corpus. Metatag sections to serve as models for own writing. Language differences between introductions, methodology, conclusions.
10
Ran aground. Demand for metadata: too fine-grained; too labour-intensive; few could see the point, unable to give priority to it. Convoluted uploading interface.
11
Addressing Problem 2: “Build Corp”, “Corpus Builder”. Configurable metadata list; POS tagging, lemmatization; other transformations can be incorporated, e.g., HTML text; corpus configuration; building word sketches; compiling statistical thesaurus; user accounts management.
12
Simplified user’s procedure: interface for converting pdfs (Abbyy FineReader); save set in folder; upload files; metadata (ACM); notes provided to users. Demo.
13
An Informatics Corpus is born. Currently contains: 202 documents; 2,763,259 tokens; 18 ACM categories (over half of the documents in one category).
14
Uses to date: key term extraction (here); illustrative sentences; Moodle’s glossary module; words in need of pronunciation attention; some worksheets of adjectives with prepositions; website of sample searches.
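The slides do not say how key terms were extracted from the Informatics Corpus. One common approach, shown here purely as an assumed illustration, scores each word by the ratio of its smoothed per-million frequency in the specialised corpus to that in a general reference corpus such as the BNC. The function names, the smoothing constant, and the toy token lists are all hypothetical.

```python
from collections import Counter


def keyness_scores(focus_tokens, reference_tokens, smoothing=1.0):
    """Score each focus-corpus word against a reference corpus.

    Score = (focus frequency per million + smoothing)
            / (reference frequency per million + smoothing).
    A score well above 1 suggests the word is typical of the focus corpus.
    """
    if not focus_tokens or not reference_tokens:
        raise ValueError("both corpora must contain at least one token")
    focus_counts = Counter(focus_tokens)
    reference_counts = Counter(reference_tokens)
    scores = {}
    for word, count in focus_counts.items():
        focus_pm = count * 1_000_000 / len(focus_tokens)
        reference_pm = reference_counts[word] * 1_000_000 / len(reference_tokens)
        scores[word] = (focus_pm + smoothing) / (reference_pm + smoothing)
    return scores


def top_key_terms(focus_tokens, reference_tokens, n=10, smoothing=1.0):
    """Return the n highest-scoring words from the focus corpus."""
    scores = keyness_scores(focus_tokens, reference_tokens, smoothing)
    return sorted(scores, key=scores.get, reverse=True)[:n]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    focus = "the parser builds the parse tree".split()
    reference = "the dog chased the cat up the tree".split()
    print(top_key_terms(focus, reference, n=3))
```

A larger smoothing value damps the scores of rare words, which shifts the resulting list towards more frequent terms.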
15
What the future holds: Language acquisition. Consulting resources doesn’t guarantee retention; log corpus consultation; converted into interactive revision activities, automatically; researching the effectiveness of DDL.
16
What the future holds: Corpus Builder. Single click keywords extraction; automatic conversion from various formats to plain text; POS tagging for LOTE; log user’s use.