Introduction to Florian Jaeger, For the Methods class, December 3rd, 2003.

Some basic questions
Where are our corpora? Where is the software?
– Is there a list of all the stuff we have?
– How can I access the software?
Where do I start? What information is available where?
– Are there tutorials for the available software?
What kind of corpus work is supported at Stanford?
– Corpora are only for those computational folks … ;-)
And the most important question:

Why bother at all …
Because we are often wrong about our (ad hoc) intuitions
– linguistic methodology is …
– well, let’s not go there.
While corpora have a lot of drawbacks (no negative evidence, genre-specific, etc.), they offer a lot of opportunities.
To illustrate my point, a little case study …

Hagit Borer: “Some notes on the Syntax and Semantics of Quantity”
Talk for the Sem. Workshop, 10/31/2002
Claim: “The interpretation of bare plurals does not, actually, consist of any subset of (well-defined) singulars.”
– 0.5 apples/apple
– 1.0 apples/apple
– 1.5 apples/apple
– zero apples/apple

Hagit Borer’s judgments:
– 0.5 apples/*apple
– 1.0 apples/*apple
– 1.5 apples/*apple
– zero apples/*apple

Google’s count:
– 0.5 apples (120)/*apple (179)
– 1.0 apples (42)/*apple (23,600)
– 1.5 apples (59)/*apple (362)
– zero apples (194)/*apple (124)
This also makes clear some of the problems, so let’s take pears

Google’s count:
– 0.1 pears (32)/*pear (118)
– 0.5 pears (37)/*pear (50)
– 0.7 pears (9)/*pear (14)
– 1.0 pears (14)/*pear (24,000)
– 1 pears (14)/?pear (7,480)
– One pears (1,130)/?pear (3,060)
– 1.5 pears (28)/*pear (316)
– zero pears (3)/*pear (0)
Conclusion:
– It is amazing how many programs or computer products use fruit names.
– The original judgments seem questionable.
BUT: can we trust Google?
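The counts on the two Google slides can be put side by side to see how often the plural actually wins. The following is a minimal Python sketch, not part of the original talk: the numbers are copied from the slides, the two rows marked "?" ("1 pears", "One pears") are left out, and the name `plural_share` is made up for illustration.

```python
# Hit counts copied from the two "Google's count" slides: (plural hits, singular hits).
counts = {
    "0.5 apples/apple": (120, 179),
    "1.0 apples/apple": (42, 23600),
    "1.5 apples/apple": (59, 362),
    "zero apples/apple": (194, 124),
    "0.1 pears/pear": (32, 118),
    "0.5 pears/pear": (37, 50),
    "0.7 pears/pear": (9, 14),
    "1.0 pears/pear": (14, 24000),
    "1.5 pears/pear": (28, 316),
    "zero pears/pear": (3, 0),
}


def plural_share(plural, singular):
    """Fraction of all hits that use the plural noun; None if there are no hits."""
    total = plural + singular
    return plural / total if total else None


shares = {label: plural_share(p, s) for label, (p, s) in counts.items()}

for label, share in shares.items():
    print(f"{label:20s} plural share = {share:.2f}")
```

A share below 0.5 means the starred singular form is more frequent than the plural. As the slide itself warns, raw hit counts include product names and other noise, so these shares are only a rough indication.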

…
GSearch Tutorial
Grep Tutorial
Tgrep Tutorial
CQP Tutorial
In addition to the indicated structure, all pages offer links to external pages, including corpora, software, tutorials, demos, etc.
Local Support
E-list & Corpus TA

Looking for a corpus
There are several sites on the web that can help you to find out if what you are looking for exists:
– Databases like David Lee’s site (see also our Top 10 list)
– The LDC database
– Our list of corpora (next page) lists, see our site under ‘Support’
– Local:
– Global:

Types of corpora
Different languages
Different media (speech, video, text)
Different levels of annotation
– No annotation
– Transcribed speech or video
– Sociological annotation (gender of speaker, average age of audience, dialect of speaker, etc.)
– Discourse and textual information (publication date, number of discourse participants, discussion panel vs. novel, etc.)
– Linguistic annotation (phonemes, prosody, syntax, morpho-syntax, lexemes, phonological segments & syllables, etc.)

Looking for a specific corpus
List of available corpora
– If the corpus is on AFS
– If the corpus is on the Corpus Computer
– If the corpus is on CD
– If the corpus is on the WWW
– If the corpus has special license conditions
– If we don’t have the corpus

Tools & software
General
Where to start:
– Local online tutorials (see also external references and manuals)
– The corpus TA
– Little helpers

A brief look at some tools
BNC Web
– Problem: Superiority “who the hell …”
– Problem: Distribution of “… is like …” – age dependent?
General information
Age (easy export to e.g. Excel)
Crosstabs
TGrep2 and Tgrep
– Tutorial
– Examples:
tgrep2 -c wsj_mrg.t2c.gz -l 'VP < (NP $. NP)'
tgrep2 -c wsj_mrg.t2c.gz -l 'VP < (NP $. PP-DTV)'
tgrep2 -c wsj_mrg.t2c.gz -l 'VP=foo < (/VB*/ < gave) & < (NP $ NP)'
tgrep2 -c wsj_mrg.t2c.gz -l 'VP=foo < (/VB*/ < gave) & < (NP $ PP-DTV)'
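To make concrete what the two "gave" queries are looking for, here is a small Python sketch that classifies VPs headed by "gave" in bracketed trees. It is an illustration only and does not call tgrep2: the three toy sentences, the helper names (`parse`, `subtrees`, `dative_frame`) and the simplification of `/VB*/` to "label starts with VB" are all assumptions made for this example.

```python
import re
from collections import Counter


def parse(s):
    """Parse a Penn-Treebank-style bracketed string into (label, children) tuples.
    Leaves (words) are kept as plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def subtrees(t):
    """Yield the tree itself and every non-leaf node below it."""
    yield t
    for c in t[1]:
        if isinstance(c, tuple):
            yield from subtrees(c)


def dative_frame(vp):
    """Classify a VP whose verb is 'gave' as double object or prepositional dative."""
    if vp[0] != "VP":
        return None
    kids = [c for c in vp[1] if isinstance(c, tuple)]
    # Simplification of (/VB*/ < gave): a child labelled VB... that directly holds "gave".
    if not any(k[0].startswith("VB") and "gave" in k[1] for k in kids):
        return None
    labels = [k[0] for k in kids]
    if labels.count("NP") >= 2:                    # roughly: < (NP $ NP)
        return "double object"
    if "NP" in labels and "PP-DTV" in labels:      # roughly: < (NP $ PP-DTV)
        return "prepositional"
    return None


# Hypothetical toy corpus, not taken from the WSJ treebank.
toy_corpus = [
    "(S (NP-SBJ (NNP Kim)) (VP (VBD gave) (NP (NNP Sandy)) (NP (DT a) (NN book))))",
    "(S (NP-SBJ (NNP Kim)) (VP (VBD gave) (NP (DT a) (NN book)) (PP-DTV (TO to) (NP (NNP Sandy)))))",
    "(S (NP-SBJ (NNP Kim)) (VP (VBD read) (NP (DT a) (NN book))))",
]

frames = Counter()
for sentence in toy_corpus:
    for t in subtrees(parse(sentence)):
        frame = dative_frame(t)
        if frame:
            frames[frame] += 1

print(dict(frames))
```

On a real treebank the tgrep2 queries above do this work directly; the sketch is only meant to show which tree shapes the two patterns distinguish.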

Note: Tgrep is right-headed
The following pattern matches an S that has a child A and another child C, where the A has a child B:
– S < (A < B) < C
However, this pattern means that S has a child A and that A has children B and C:
– S < ((A < B) < C)
It is equivalent to this:
– S < (A < B < C)
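The difference between the two groupings can be checked on two toy trees. The following Python sketch hand-codes both patterns rather than parsing tgrep syntax; the tuple representation and the names `has_child`, `flat` and `nested` are assumptions made for this illustration.

```python
# A tree is (label, [children]); this representation is made up for the example.
tree_a = ("S", [("A", [("B", [])]), ("C", [])])      # S has children A and C; A has child B
tree_b = ("S", [("A", [("B", []), ("C", [])])])      # S has child A; A has children B and C


def has_child(node, label, also=lambda child: True):
    """True if node has a child with this label that also satisfies `also`."""
    return any(c[0] == label and also(c) for c in node[1])


def flat(s):
    """S < (A < B) < C : both relations are about S."""
    return (s[0] == "S"
            and has_child(s, "A", lambda a: has_child(a, "B"))
            and has_child(s, "C"))


def nested(s):
    """S < (A < B < C), i.e. S < ((A < B) < C) : B and C are both children of A."""
    return (s[0] == "S"
            and has_child(s, "A", lambda a: has_child(a, "B") and has_child(a, "C")))


print(flat(tree_a), flat(tree_b))      # True False
print(nested(tree_a), nested(tree_b))  # False True
```

Each pattern accepts exactly one of the two trees, which is why the placement of the parentheses matters.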

Some more Tgrep2 syntax
A < B     A is the parent of (immediately dominates) B.
A > B     A is the child of B.
A <N B    B is the Nth child of A (the first child is <1).
A >N B    A is the Nth child of B (the first child is >1).
A <, B    Synonymous with A <1 B.
A >, B    Synonymous with A >1 B.
A <-N B   B is the Nth-to-last child of A (the last child is <-1).
A >-N B   A is the Nth-to-last child of B (the last child is >-1).
A <- B    B is the last child of A (synonymous with A <-1 B).
A >- B    A is the last child of B (synonymous with A >-1 B).
A <` B    B is the last child of A (also synonymous with A <-1 B).
A >` B    A is the last child of B (also synonymous with A >-1 B).
A <: B    B is the only child of A.
A >: B    A is the only child of B.
A << B    A dominates B (A is an ancestor of B).

Some more TGrep2 syntax
A >> B    A is dominated by B (A is a descendant of B).
A <<, B   B is a left-most descendant of A.
A >>, B   A is a left-most descendant of B.
A <<` B   B is a right-most descendant of A.
A >>` B   A is a right-most descendant of B.
A <<: B   There is a single path of descent from A and B is on it.
A >>: B   There is a single path of descent from B and A is on it.
A . B     A immediately precedes B.
A , B     A immediately follows B.
A .. B    A precedes B.
A ,, B    A follows B.
A $ B     A is a sister of B (and A ≠ B).
A $. B    A is a sister of and immediately precedes B.
A $, B    A is a sister of and immediately follows B.
A $.. B   A is a sister of and precedes B.
A $,, B   A is a sister of and follows B.
A = B     The node matched by A is also matched by B.
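A handful of these relations can be spelled out as plain tree predicates. The Python sketch below is one possible reading of five of them (`<`, `<N` and `<-N`, `<<`, `$`, `$.`) on a toy tree; the `Node` class and all function names are hypothetical and have nothing to do with how TGrep2 is implemented.

```python
class Node:
    """Minimal tree node with a parent pointer (made up for this illustration)."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in children:
            child.parent = self


def is_parent(a, b):
    """A < B : A immediately dominates B."""
    return b.parent is a


def is_nth_child(a, b, n):
    """A <N B for n > 0, A <-N B for n < 0 (1-based; -1 is the last child)."""
    kids = a.children
    if n == 0 or abs(n) > len(kids):
        return False
    return kids[n - 1 if n > 0 else n] is b


def dominates(a, b):
    """A << B : A is an ancestor of B."""
    p = b.parent
    while p is not None:
        if p is a:
            return True
        p = p.parent
    return False


def is_sister(a, b):
    """A $ B : A and B share a parent and are different nodes."""
    return a is not b and a.parent is not None and a.parent is b.parent


def sister_immediately_precedes(a, b):
    """A $. B : A is a sister of B and comes directly before it."""
    if not is_sister(a, b):
        return False
    kids = a.parent.children
    return kids.index(a) + 1 == kids.index(b)


# Toy tree: (VP (VBD gave) (NP Sandy) (NP books))
gave = Node("gave")
vbd = Node("VBD", gave)
np1 = Node("NP", Node("Sandy"))
np2 = Node("NP", Node("books"))
vp = Node("VP", vbd, np1, np2)

print(is_parent(vp, np1), dominates(vp, gave), sister_immediately_precedes(np1, np2))
```

The remaining relations in the table (only child, left-most descendant, linear precedence and so on) could be written in the same style.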

The alternative with windows
TigerSearch 2.1; screen shots:
– Grammar search
– Collocation search

The end, my friends
Want to help?
– The website can always use additions (short blurbs about software, your opinion about the user-friendliness of a certain web interface, etc.)
Bye!