Constructing Your Own Corpus from Written Language.

Some likely sources for your corpus
1. From MS Word files
2. From the World Wide Web
3. From scanned books
4. From speech audio files

What you need
- MS Word
- Notepad
- A PDF-to-text conversion program (Simpo PDF to Text is a very good one)

Convert your files into plain text files
- Prefer UTF-8 encoding, because it can represent every character defined in Unicode, which covers virtually all written languages, such as Chinese, Russian, and Turkish.
- Give your files and folders meaningful names (consistent and systematic).
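A minimal Python sketch of both points on this slide: re-saving a text file as UTF-8 and building systematic file names. The source encoding (cp1252) and the naming scheme are assumptions for illustration, not part of the original slides.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Re-save a text file as UTF-8.

    source_encoding is an assumption: cp1252 is a common default for
    older Windows text files, but check what your own files really use.
    """
    text = src.read_text(encoding=source_encoding)
    dst.write_text(text, encoding="utf-8")


def corpus_name(group: str, item: int, draft: int) -> str:
    """Build a consistent file name (hypothetical scheme), e.g. student_007_draft2.txt."""
    return f"{group}_{item:03d}_draft{draft}.txt"
```

Zero-padded numbers keep the files in the right order when a folder is sorted by name.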

1. From MS Word Files to Text
- Open your MS Word document.
- File (or the MS Word symbol in the top left corner) -> Save As -> Other Formats -> Save as type = Plain Text -> Save.
- The File Conversion window will pop up. Select Other Encoding -> highlight Unicode (UTF-8) -> OK.
- Close MS Word.
- Go back to the folder and find the text file you just saved.
- Double-click it and it will open in Notepad. Check how it looks.
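The same conversion can be sketched in Python for .docx files, which are zip archives containing word/document.xml. This is a rough sketch using only the standard library, and the function names are made up for illustration. It does not handle the older .doc format, and it ignores footnotes, headers, tabs, and manual line breaks.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_path):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in paragraph.iter(W + "t")))
    return "\n".join(paragraphs)


def save_as_utf8(docx_path, txt_path):
    """Write the extracted text to a UTF-8 plain text file."""
    Path(txt_path).write_text(docx_to_text(docx_path), encoding="utf-8")
```

As with the manual route, open the resulting text file and check how it looks.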

An easier way
Use a Word-to-text converter. For example, Zilla: onverter.html

Clean the tables
Check the file and remove the parts that you do not want to include in your research. For example, you might want to exclude student names, tables, figures, and references.
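Part of this cleaning can be sketched in Python. The patterns below are assumptions about how captions and the reference list look in your files ("Table 1." or "Figure 2:" at the start of a line, and a heading line such as "References"). Student names and table bodies still need to be removed by hand.

```python
import re

# Caption lines such as "Table 1. Scores" or "Figure 2: Results" (assumed format).
CAPTION = re.compile(r"^\s*(Table|Figure)\s+\d+\s*[.:]", re.IGNORECASE)
# A heading line that starts the reference list (assumed wording).
REFERENCES = re.compile(r"^\s*(References|Works Cited|Bibliography)\s*$", re.IGNORECASE)


def clean_paper(text):
    """Drop caption lines and everything from the references heading onward."""
    kept = []
    for line in text.splitlines():
        if REFERENCES.match(line):
            break  # the rest of the file is the reference list
        if CAPTION.match(line):
            continue  # skip the caption line itself
        kept.append(line)
    return "\n".join(kept)
```

Always compare the cleaned file with the original, since a heuristic like this can remove too much or too little.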

2. From the World Wide Web to Notepad, and Notepad to Text
- Find an article on the internet, for example in an online newspaper.
- Using the mouse, left-click and drag to highlight the part of the text you want, then press Ctrl + C.
- Open Notepad. Press Ctrl + V to paste it.
- File -> Save As -> Encoding = UTF-8 -> Save
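If you have saved a page's HTML instead of copying and pasting, a rough Python sketch like the one below can strip the tags. It uses the standard library's html.parser. The class and function names are made up for illustration, and the output will still contain menus and other page clutter that you need to delete.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text of an HTML string with tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Save the returned string in UTF-8, just as you would from Notepad.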

3. From scanned books
- Scan every page and save the scans as searchable PDF files.
- Convert your PDF files to text files (you can use Simpo PDF to Text, Adobe Reader, or PDF Creator).
- Correct the mistakes (sometimes there are a great many of them).
- Save the text files in UTF-8 encoding.
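One common mistake in text converted from scanned pages is a word split by a hyphen at the end of a line. A small Python sketch for rejoining such words is shown below. It is a heuristic, not a full correction tool: it will also remove the hyphen from a genuine compound (such as "well-known") that happens to break across lines.

```python
import re


def fix_line_break_hyphens(text):
    """Rejoin words split across lines, e.g. "lin-\\nguistics" -> "linguistics"."""
    # A word character, a hyphen, a newline, then another word character.
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)
```

Other recognition errors, such as "rn" read as "m", still have to be corrected by reading the text.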

Tag Your Corpus for Other Information
You may want to tag your corpus for information other than part of speech (POS): for example, hedges, pauses, disagreement, metaphors, grammar mistakes, and so on. You can enter these annotations by hand, or you can use a software program designed to make this process faster.
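As a rough illustration of how a program can speed up tagging, the Python sketch below wraps words from a short list in <hedge> tags. The word list and the tag format are assumptions made up for this example. The output still needs checking by hand, since a word such as "may" is not always a hedge.

```python
import re

# Hypothetical, deliberately short list of hedging words.
HEDGES = ["perhaps", "possibly", "might", "may", "seems", "somewhat"]


def tag_hedges(text, hedges=HEDGES):
    """Wrap each whole-word match in <hedge>...</hedge>, keeping its original case."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(h) for h in hedges) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(r"<hedge>\1</hedge>", text)
```

For example, tag_hedges("This might work.") gives "This <hedge>might</hedge> work."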

Scenario 1
You have decided to create a corpus out of your students' papers. You asked your students to submit their papers to you in MS Word format, and they did. You want to study the types of contexts in which they prefer the passive voice.

Scenario 2
You have decided to create a corpus out of the applied linguistics books and articles that you have read. You want to compare the lexical bundles in them with the ones you use in your academic papers. Luckily, some of the articles were already in PDF format, but you had to scan some of your books.

Scenario 3
You want to create a corpus of newspaper headlines from The New York Times and USA Today to compare their lengths.
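Once the headlines are saved as plain text, the comparison itself is simple. A minimal Python sketch, assuming one headline per item and measuring length in words (splitting on whitespace):

```python
from statistics import mean


def mean_headline_length(headlines):
    """Average number of whitespace-separated words per headline."""
    return mean([len(headline.split()) for headline in headlines])


# Invented placeholder headlines, not taken from either newspaper.
paper_a = ["first sample headline here", "another sample"]
paper_b = ["short one"]
difference = mean_headline_length(paper_a) - mean_headline_length(paper_b)
```

Length could also be counted in characters. Whichever unit you choose, use the same one for both newspapers.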

Scenario 4
You have decided to create a corpus out of your own writing. You want to use all of the academic papers you wrote during your MA years.

Is there a faster way to follow these procedures?
Yes! If you know a programming language, such as Perl, you can write a script that automates most of the procedures described above.
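As a sketch of what such a script might look like, here is the idea in Python rather than Perl. It applies one cleaning step to every .txt file in a folder and saves the results in UTF-8. The function names and the default cleaning step are assumptions for illustration, and the input files are assumed to be UTF-8 already.

```python
import re
from pathlib import Path


def collapse_whitespace(text):
    """Example cleaning step: squeeze runs of spaces and tabs into one space."""
    return re.sub(r"[ \t]+", " ", text).strip()


def process_folder(in_dir, out_dir, step=collapse_whitespace):
    """Apply `step` to every .txt file in in_dir and write the result to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    done = []
    for src in sorted(Path(in_dir).glob("*.txt")):
        cleaned = step(src.read_text(encoding="utf-8"))
        (out / src.name).write_text(cleaned, encoding="utf-8")
        done.append(src.name)
    return done
```

Writing to a separate output folder keeps the original files untouched, so a mistake in the cleaning step can be undone.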