1
Future challenges of Corpus Linguistics
Voltaire comment from earlier: we see things from our own perspective
How to “harness the power” of text archives, the Web, social media, etc.
Making our “structured” corpora relevant (even historical ones, with LION, etc.)
Non-English corpora (simple Unicode compliance; my experience with the Corpus del Español)
Copyright: Google Books; which laws apply?
Quantitative methods: how are we doing? (cf. Stefan Gries)
Teaching: is it really making a difference? (DDL; beyond materials development)
Reaching beyond our base; making our tools accessible to:
– Non-specialists: journalists, political scientists, historians, etc. (see next page; users of COCA; McEnery’s plenary at AACL 2008 on British history)
– Teachers and learners: how user-friendly are the interfaces?
– Translation
How to “bring in” all of those who do CL but don’t consider themselves CL-ists
2
Corpus Linguistics (lite) in the popular media: NY Times; Bush’s speeches
3
The role of new technologies in CL
Advantages/disadvantages of using the Web as a new tool in CL
Google-related
– Google Books (Stanford; Jockers); how to mine these
– Google News Archive (but dating issues; see next page)
Other text archives
– Contemporary: Lexis-Nexis, ProQuest, EBSCO; tens of billions of words, BVC
– Historical: e.g. TIME, Atlantic, New Republic, Sports Illustrated, ~500m words
New media
– Twitter (via fire hose): 15m English each day; 5m Spanish, 5m Portuguese
– Facebook (via Bing; fire hose in near future)
– Blogs
Challenge #1: Processing the texts
– Getting exact queries to serve as a “proxy” in unannotated texts (e.g. “end up doing”)
– Writing an interface to query these by genre and date (for balance); inserting the data into a database; annotating it and then presenting it via KWIC, collocates, charts, etc.
– Copyright / licensing issues (the “snippets” defense; my use of magazines and news)
Challenge #2: Making more static, “structured” corpora relevant
– How can you justify a stale, 15-20 (or even 2-3) year old corpus when you can be creating and updating an ever-expanding corpus? (cf. COCA, updated once a year)
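The processing steps listed under Challenge #1 (select texts by genre and date, then present the results as KWIC lines, collocates and charts) can be sketched in a few lines of Python. This is a minimal in-memory illustration, assuming plain-text documents already labelled with a genre and a year; the `Text` record and all function names are hypothetical, and a regex word match stands in for the database, annotation and real tokenization a production system would use.

```python
import re
from collections import Counter
from dataclasses import dataclass


# Hypothetical record for one archived text: genre label, year, raw body.
@dataclass
class Text:
    genre: str
    year: int
    body: str


# Crude word tokenizer; a stand-in for real tokenization/annotation.
TOKEN_RE = re.compile(r"[A-Za-z']+")


def tokenize(s: str) -> list[str]:
    return [t.lower() for t in TOKEN_RE.findall(s)]


def select(texts, genre=None, years=None):
    """Keep texts matching a genre and an inclusive (start, end) year range."""
    return [
        t for t in texts
        if (genre is None or t.genre == genre)
        and (years is None or years[0] <= t.year <= years[1])
    ]


def kwic(texts, node, width=4):
    """Key Word In Context: one (left, node, right) tuple per hit."""
    lines = []
    for t in texts:
        toks = tokenize(t.body)
        for i, tok in enumerate(toks):
            if tok == node:
                left = " ".join(toks[max(0, i - width):i])
                right = " ".join(toks[i + 1:i + 1 + width])
                lines.append((left, tok, right))
    return lines


def collocates(texts, node, span=4):
    """Raw co-occurrence counts within +/- span words of the node word."""
    counts = Counter()
    for t in texts:
        toks = tokenize(t.body)
        for i, tok in enumerate(toks):
            if tok == node:
                counts.update(toks[max(0, i - span):i] + toks[i + 1:i + 1 + span])
    return counts


def per_million_by_year(texts, node):
    """Normalized frequency per million words, by year (data for a chart)."""
    hits, totals = Counter(), Counter()
    for t in texts:
        toks = tokenize(t.body)
        totals[t.year] += len(toks)
        hits[t.year] += toks.count(node)
    return {y: hits[y] * 1_000_000 / totals[y] for y in sorted(totals) if totals[y]}


if __name__ == "__main__":
    corpus = [
        Text("news", 2010, "The economy grew slowly last year."),
        Text("fiction", 2011, "She said the economy was a mystery."),
        Text("news", 1995, "A weak economy hurt sales."),
    ]
    recent_news = select(corpus, genre="news", years=(2000, 2012))
    for left, node, right in kwic(recent_news, "economy", width=2):
        print(f"{left:>20} [{node}] {right}")
    print(collocates(corpus, "economy", span=1).most_common(3))
    print(per_million_by_year(corpus, "economy"))
```

The collocates here are raw window counts; ranking them by an association measure such as mutual information would be the usual next step before display.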
4
Google News Archive: “end up doing” (cf. 1923 TIME Corpus)