Topics in Linguistics ENG 331

Rania Al-Sabbagh
Department of English, Faculty of Al-Alsun (Languages), Ain Shams University
rsabbagh@alsun.asu.edu.eg
Week 5

Offline Corpus Processors
Offline corpus processors are software programs that you download, install, and run on your local machine. They do not usually come with corpora. Instead, you load your own corpus into them. Pros: you work on your own corpus, you do not need to be connected to the internet, and there are many offline corpus processors to choose from. Cons: they are not always free.

AntConc: An Overview
AntConc is an offline corpus processor that is available for free. Its functions include wordlists, concordances, keyword lists, collocates, and data visualization. Furthermore, it supports regular expressions. AntConc is one of many corpus tools made available by Laurence Anthony.
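To show what a regular-expression query does, here is a minimal sketch using Python's `re` module as a stand-in. AntConc's own regex syntax and options may differ in details, and the sample sentence is made up for illustration. The pattern matches "president" and any word that starts with it.

```python
import re

# Made-up sample text, for illustration only.
text = "The President shall nominate; the presidential term and the Presidents' duties."

# \bpresident\w* : "president" at the start of a word, followed by any word characters.
# re.IGNORECASE makes the search case-insensitive.
pattern = re.compile(r"\bpresident\w*", re.IGNORECASE)

matches = pattern.findall(text)
print(matches)
```

A plain (non-regex) search for "president" would only find the exact word; the `\w*` part is what lets one query cover "presidential" and "Presidents" as well.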

AntConc: Getting Started
The download page is available from Laurence Anthony's website. On Windows, just double-click the executable file and the software will start. AntConc accepts three types of files: text files, HTML files, and XML files. For illustration purposes, we will use a text file of the US Constitution. To load your file(s), choose File > Open File(s) or File > Open Dir.

AntConc: Wordlist 1
The first thing you would like to know about your corpus is how many words it has. Through the Wordlist function you can count tokens and types. Tokens: the total number of words, counting every occurrence (including repetitions). Types (aka vocabulary size): the number of distinct words, counting each word only once. Lexical richness is commonly measured with the type/token ratio: TTR = Types ÷ Tokens. The Wordlist function in AntConc gives you (1) the total number of tokens, (2) the total number of types, and (3) the raw frequency of every word. The Wordlist can be sorted by frequency or alphabetically.
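The token, type, and TTR definitions can be illustrated with a minimal Python sketch. It assumes a simplified tokenizer (a token is any run of letters, case-folded); AntConc's token definition is configurable and may count differently. The sample is the opening of the Constitution's Preamble.

```python
import re
from collections import Counter

def wordlist_stats(text):
    # Simplifying assumption: a token is a run of letters a-z, lowercased.
    tokens = re.findall(r"[a-z]+", text.lower())
    freqs = Counter(tokens)            # raw frequency of every type
    n_tokens = len(tokens)             # every occurrence counts
    n_types = len(freqs)               # each distinct word counts once
    ttr = n_types / n_tokens if n_tokens else 0.0
    return n_tokens, n_types, ttr, freqs

sample = "We the People of the United States, in Order to form a more perfect Union"
n_tokens, n_types, ttr, freqs = wordlist_stats(sample)
print(n_tokens, n_types, round(ttr, 3))

# Wordlist sorted by frequency, like AntConc's default view.
for word, freq in freqs.most_common(3):
    print(word, freq)
```

Here "the" occurs twice, so there are 15 tokens but only 14 types, and the TTR is 14 ÷ 15.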

AntConc: Wordlist 2
As we can see in the Wordlist, function words are typically the most frequent. How can we exclude these words from the Wordlist? In CL, these words are referred to as stop words. You can make a list of them (a stoplist) and load it into AntConc so that these words are not shown to you. To add your stoplist, go to Tool Preferences > Wordlist > Use a stoplist below.
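The effect of a stoplist can be sketched in a few lines of Python. The stoplist below is a tiny hypothetical one (real stoplists are much longer), and the tokenizer is the same simplified letters-only assumption as before. The sample sentence is from Article II of the Constitution.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stoplist of function words.
STOPLIST = {"the", "of", "and", "to", "a", "in", "shall", "be"}

def filtered_wordlist(text, stoplist):
    tokens = re.findall(r"[a-z]+", text.lower())
    # Count only tokens that are not on the stoplist.
    return Counter(t for t in tokens if t not in stoplist)

sample = "The President shall be Commander in Chief of the Army and Navy"
wl = filtered_wordlist(sample, STOPLIST)
print(wl.most_common())
```

Only the content words (president, commander, chief, army, navy) remain in the list.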

Quiz
1. How many words (tokens) are in the US Constitution?
2. What is the vocabulary size of the US Constitution?
3. What is the raw frequency of each of the following words: president, legislature, congress?
4. What is the normalized frequency of each of the words from the previous question?
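Question 4 asks for normalized frequencies. A common definition is normalized frequency = (raw frequency ÷ total tokens) × base, where the base (for example per 10,000 or per million words) is a convention you choose so that corpora of different sizes can be compared. A minimal Python sketch, with hypothetical numbers rather than real counts from the Constitution:

```python
def normalized_frequency(raw_freq, total_tokens, base=10_000):
    """Frequency per `base` tokens (e.g. per 10,000 or per million words)."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return raw_freq / total_tokens * base

# Hypothetical counts: a word seen 120 times in an 8,000-token text.
print(normalized_frequency(120, 8_000))             # per 10,000 words
print(normalized_frequency(120, 8_000, 1_000_000))  # per million words
```

Plug in the token count from question 1 and the raw frequencies from question 3 to answer question 4.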

AntConc: Concordance
The concordance function allows you to see your query in context. You can set the search window size, that is, how much text is displayed on either side of your query.
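A concordance is usually displayed as key word in context (KWIC) lines. The following Python sketch produces such lines; it assumes the window is measured in characters on each side of the hit, which is a simplification and not necessarily how AntConc lays out its results. The sample sentence is from Article II of the Constitution.

```python
import re

def kwic(text, query, window=30):
    """Return KWIC lines with `window` characters of context on each side of each hit."""
    lines = []
    # \b ... \b matches the query as a whole word, case-insensitively.
    for m in re.finditer(r"\b" + re.escape(query) + r"\b", text, re.IGNORECASE):
        left = text[max(0, m.start() - window):m.start()]
        right = text[m.end():m.end() + window]
        # Right-align the left context so all hits line up in one column.
        lines.append(f"{left:>{window}} [{m.group()}] {right}")
    return lines

sample = ("The executive Power shall be vested in a President of the United States "
          "of America. He shall hold his Office during the Term of four Years.")
for line in kwic(sample, "president", window=25):
    print(line)
```

Increasing `window` shows more context around each hit, which is the same trade-off as changing the window size in the concordance view.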

AntConc: Concordance Plot
The concordance plot is a visualization function. It shows how your query is distributed across the different parts of the file. The plot of 'president' is as follows:
[Figure: concordance plot of 'president' in the US Constitution]
It shows that the query is most frequent towards the end of the file. You can click on any line in the plot to see the actual passage in which the query is used.
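The idea behind a concordance plot can be sketched in text mode: divide the file into equal slices and mark every slice that contains at least one hit. This is a hypothetical illustration, not AntConc's drawing code, and the sample text is synthetic so that the hits fall near the end.

```python
import re

def concordance_plot_row(text, query, width=50):
    """Return one text row of `width` cells spanning the file ('|' = slice with a hit)
    and the total number of hits."""
    cells = ["-"] * width
    n = len(text)
    hits = [m.start() for m in re.finditer(r"\b" + re.escape(query) + r"\b",
                                           text, re.IGNORECASE)]
    for pos in hits:
        # Map the character offset to a cell; min() guards the last cell.
        cells[min(width - 1, pos * width // n)] = "|"
    return "".join(cells), len(hits)

# Synthetic text: both hits occur in the last part of the file.
sample = "word " * 80 + "president " + "word " * 10 + "president"
plot, count = concordance_plot_row(sample, "president", width=20)
print(plot, count)
```

Marks clustered on the right-hand side of the row correspond to a query that is concentrated towards the end of the file.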

AntConc: Collocates
AntConc comes with a Collocates function for identifying collocations. Similar to COCA, AntConc allows you to set the window span to the left and right of your query, so you can find both adjacent and non-adjacent collocations. Furthermore, AntConc has a threshold option so that only words that meet the set threshold (for example, a minimum frequency) are considered.
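Window-based collocate counting with a frequency threshold can be sketched in Python as follows. This assumption-laden sketch ranks collocates by raw co-occurrence frequency only; AntConc can also rank collocates by association statistics, which are not shown here. The sample text is a simplified, made-up variant of constitutional phrasing, and `collocates` is a hypothetical helper name.

```python
import re
from collections import Counter

def collocates(text, node, left=1, right=1, min_freq=1):
    """Count words within `left` words before and `right` words after each
    occurrence of `node`; keep only collocates seen at least `min_freq` times."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Words in the left span plus words in the right span.
        span = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        counts.update(span)
    return {w: f for w, f in counts.most_common() if f >= min_freq}

sample = ("The President shall nominate. The President shall receive. "
          "The Vice President shall be President of the Senate.")
print(collocates(sample, "president", left=1, right=1, min_freq=2))
```

Setting `left=0, right=1` restricts the count to the immediately following word (1R), and `left=1, right=0` to the immediately preceding word (1L), which matches the two contexts in the next quiz.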

Quiz
What are the most frequent collocates of 'president' in:
1. the immediately following context?
2. the immediately preceding context?

AntConc: Keyword
One last function in AntConc is the keyword list. In CL, a word is considered a keyword if its normalized frequency is exceptionally high in comparison with a reference corpus. The reference corpus is typically a very large, general corpus. To load your reference corpus, go to Tool Preferences > Keyword > Use raw files OR Tool Preferences > Keyword > Use word lists.
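Keyword tools compare each word's frequency in your corpus with its frequency in the reference corpus using a keyness statistic. Log-likelihood is one widely used choice; the statistic and settings AntConc applies are configurable and may differ from this sketch. The counts below are hypothetical, and `log_likelihood` is an illustrative helper, not an AntConc function.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) keyness of one word: target corpus vs. reference corpus."""
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    # Cells with an observed frequency of 0 contribute 0 (the limit of x*ln x).
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

# Hypothetical counts: 50 hits in a 5,000-token target corpus,
# 100 hits in a 1,000,000-token reference corpus.
score = log_likelihood(50, 5_000, 100, 1_000_000)
overused = 50 / 5_000 > 100 / 1_000_000   # higher normalized frequency in the target?
print(round(score, 2), overused)
```

A higher score means a larger difference between the two corpora; with one degree of freedom, a score above 3.84 corresponds to p < 0.05. Comparing the two normalized frequencies tells you whether the word is unusually frequent (a positive keyword) or unusually rare in your corpus.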