
Presentation transcript:

Future challenges of Corpus Linguistics

Voltaire comment from earlier: we see things from our own perspective
How to “harness the power” of text archives, the Web, social media, etc.
Making our “structured” corpora relevant (even historical, with LION, etc.)
Non-English (simple Unicode compliance; my experience with the Corpus del Español)
Copyright: Google Books; which laws apply?
Quantitative: how are we doing? (cf. Stefan Gries)
Teaching: is it really making a difference? (DDL; beyond materials development)
Reaching beyond our base; making our tools accessible to:
– Non-specialists, journalists, political scientists, historians, etc.
  (Next page // users of COCA // McEnery’s plenary at AACL 2008 (British history))
– Teachers and learners: how user-friendly are the interfaces?
– Translation
How to “bring in” all of those who do CL, but don’t consider themselves CL-ists

Corpus Linguistics (lite) in the popular media: NY Times; Bush’s speeches

The role of new technologies in CL

Advantages/disadvantages of using the Web as a new tool in CL
Google-related
– Google Books (Stanford; Jockers); how to mine these
– Google News Archive (but dating issues; next page)
Other text archives
– Contemporary: Lexis-Nexis, ProQuest, EBSCO; tens of billions of words, BVC
– Historical: e.g. TIME, Atlantic, New Republic, Sports Illustrated, ~500m words
New Media
– Twitter (via fire hose): 15m English each day; 5m Spanish, 5m Portuguese
– Facebook (via Bing; fire hose in near future)
– Blogs
Challenge #1: Processing the texts
– Getting exact queries to serve as “proxy” in unannotated texts (end up doing)
– Write interface to query these by genre and date (for balance); insert data into database; annotate it and then present via KWIC, collocates, charts, etc.
– Copyright / licensing issues (“snippets” defense; my use of magazines, news)
Challenge #2: Making more static, “structured” corpora relevant
– How can you justify a stale corpus (even one only 2-3 years old) when you can be creating and updating an ever-expanding corpus? (cf. COCA; updated once a year)
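
The processing pipeline in Challenge #1 (select texts by genre and date, then present KWIC lines and collocates) can be sketched roughly as follows. This is only a minimal illustration in Python, not the interface used for any actual corpus: the record layout (`genre`, `year`, `text`), the function names, and the simple regex tokenizer are all assumptions made for the example, and a real system would store the data in a database and annotate it first.

```python
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase words, keeping internal apostrophes.
    return re.findall(r"\w+(?:'\w+)?", text.lower())

def select_texts(records, genre=None, start=None, end=None):
    # Filter hypothetical records of the form {"genre": ..., "year": ..., "text": ...}
    # by genre and by an inclusive year range.
    selected = []
    for rec in records:
        if genre is not None and rec["genre"] != genre:
            continue
        if start is not None and rec["year"] < start:
            continue
        if end is not None and rec["year"] > end:
            continue
        selected.append(rec)
    return selected

def kwic(tokens, node, window=4):
    # Return (left context, node word, right context) for every hit of `node`.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

def collocates(tokens, node, window=4):
    # Count words that occur within `window` tokens on either side of `node`.
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts

# Invented sample data, for illustration only.
records = [
    {"genre": "magazine", "year": 1923, "text": "The corpus is updated once a year."},
    {"genre": "news", "year": 2008, "text": "A stale corpus is hard to justify."},
]

for rec in select_texts(records, genre="news", start=2000):
    for left, node, right in kwic(tokenize(rec["text"]), "corpus", window=2):
        print(f"{left:>20}  [{node}]  {right}")
```

Raw collocate counts like these are dominated by function words; in practice one would rank collocates with an association measure rather than by frequency alone.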

Google News Archive: end up doing (cf. 1923 TIME Corpus)