Topics in Linguistics ENG 331

1 Topics in Linguistics ENG 331
Rania Al-Sabbagh, Department of English, Faculty of Al-Alsun (Languages), Ain Shams University. Week 11

2 Installation Prerequisites
We need to download and install the following before we start: Visual C++ Build Tools; Editra (a Python editor); Notepad++.

3 Corpus Compilation
It is always a good idea to look for a ready-made corpus, either from sources such as the LDC and ELRA or from individual researchers. However, sometimes you have to compile your own corpus. As you compile the corpus, you need to make sure that it meets the criteria of a well-designed corpus. Do you remember what those criteria are? In corpus and computational linguistics, corpus compilation is also referred to as corpus harvesting.

4 Resources for Corpus Harvesting: Print Books
Depending on your study, you may compile your corpus from print books, online written resources, or audiovisual resources. For print books, one can check the following for a machine-readable text version of the books: Project Gutenberg, Oxford Internet Archive. If such a version does not exist, one may need to work on a scanned version of the book and use an Optical Character Recognition (OCR) software program. OCR programs convert scanned images into text files. They are never 100% accurate, but they save a great deal of typing time. Many free online OCR tools are available.

5 Resources for Corpus Harvesting: Web as Corpus
When we compile data from online resources, we are using the “Web as Corpus”. This term was coined a few years ago, and there is an entire series of workshops that carries the same name, as well as a special interest group (SIG). Software programs used to compile corpora from the Web are referred to as scrapers, spiders, or crawlers.
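To make the idea of a scraper concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not the tool used in this course: the names TextExtractor, extract_text, and fetch_page are hypothetical, and the sketch assumes the page is UTF-8 HTML. It downloads a page and keeps only the visible text, dropping tags, scripts, and style sheets.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []        # text chunks found so far
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Normalize runs of whitespace and ignore empty chunks.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(html):
    """Return the visible text of an HTML string as one line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch_page(url):
    """Download a page and decode it (assumption: UTF-8 encoding)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "<html><body><p>Hello   corpus</p><script>var x = 1;</script></body></html>"
    print(extract_text(sample))
```

To harvest a real page, one would call extract_text(fetch_page(url)) with a URL of one's own choosing and save the result to a text file. A crawler or spider goes one step further by also following the links it finds on each page.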

