Published by Erick Jonas Powers. Modified over 9 years ago.
1
How to Tag a Corpus Using Stanford Tagger
2
Accuracy
All tokens: 97.32%
Unknown words: 90.79%
3
What You Need
JRE: http://www.java.com/en/download/ie_manual.jsp?locale=en
4
To make sure that Windows can find the Java compiler and interpreter:
Select Start -> Computer -> System Properties -> Advanced system settings -> Environment Variables -> System variables -> PATH.
[ In Vista, select Start -> My Computer -> Properties -> Advanced -> Environment Variables -> System variables -> PATH. ]
[ In Windows XP, select Start -> Control Panel -> System -> Advanced -> Environment Variables -> System variables -> PATH. ]
Prepend C:\Program Files\Java\jdk1.6.0_27\bin; to the beginning of the PATH variable (use the bin folder of the Java version actually installed on your computer).
Click OK three times.
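To double-check the PATH edit, a small Python sketch like the one below can test whether a folder is listed in a PATH-style string. The function name is_on_path is hypothetical, and the comparison ignores case and trailing slashes on the assumption that Windows treats such paths as the same folder.

```python
import os


def is_on_path(directory, path_value=None, sep=os.pathsep):
    """Return True if `directory` is one of the entries in a PATH-style string.

    Assumption: entries are compared case-insensitively and without
    trailing slashes, which matches how Windows treats folder names.
    """
    if path_value is None:
        # Fall back to the PATH of the current process.
        path_value = os.environ.get("PATH", "")
    wanted = directory.rstrip("\\/").lower()
    entries = [
        entry.strip().strip('"').rstrip("\\/").lower()
        for entry in path_value.split(sep)
    ]
    return wanted in entries
```

Calling is_on_path(r"C:\Program Files\Java\jdk1.6.0_27\bin") in a newly opened Command Prompt session checks the live PATH. A window that was already open before the change will still see the old value.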
5
Installing Java (JRE) on your computer
After installing, check that it worked:
Click Start, type cmd, and press Enter. This opens the Command Prompt window.
Type java -version and press Enter.
You should get a message such as: java version "1.7.0" (or possibly an older version).
If you do not get this message, Java was not installed correctly. Ask for help.
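The same check can be scripted. The sketch below runs java -version and pulls the version number out of the message. The helper names are hypothetical, and it assumes the message has the form shown on this slide (java prints this message to the error stream, so both streams are read).

```python
import re
import subprocess


def parse_java_version(text):
    """Extract the quoted version from a message like: java version "1.7.0"."""
    match = re.search(r'version "([^"]+)"', text)
    return match.group(1) if match else None


def installed_java_version():
    """Run `java -version`; return the version string, or None if Java is not found."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        # Windows could not find java on the PATH.
        return None
    # The version banner normally arrives on stderr; check stdout as well.
    return parse_java_version(result.stderr + result.stdout)
```

If installed_java_version() returns None, either Java is not installed or its folder is missing from PATH.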
6
Install the Stanford POS Tagger
Basic English Stanford Tagger Version 3.1.3: http://nlp.stanford.edu/software/stanford-postagger-2012-07-09.tgz
7
Installing Basic English Stanford Tagger Version 3.1.3
Click on the link provided above and download the compressed file (.tgz).
Extract the file to Documents using archive manager software, such as WinRAR, 7-Zip, or WinZip.
You might want to rename the extracted folder to stanTagger. I do this because the original name is too long: stanford-postagger-2012-07-09
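If no archive manager is available, Python's standard tarfile module can also unpack a .tgz file. This is only a sketch: the function name extract_tagger is hypothetical, and it assumes the archive has already been downloaded from the Stanford site.

```python
import tarfile
from pathlib import Path


def extract_tagger(archive_path, destination):
    """Unpack a .tgz archive into `destination` and list what was created there."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    # "r:gz" opens a gzip-compressed tar archive, which is what .tgz means.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(destination)
    return sorted(p.name for p in destination.iterdir())
```

For example, extract_tagger("stanford-postagger-2012-07-09.tgz", "C:/") should leave a stanford-postagger-2012-07-09 folder under C:, which can then be renamed to stanTagger.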
8
Create a Corpus Folder
In the stanTagger folder, create two folders to hold your files. I name them myCorpus and myTaggedCorpus.
Now put some text files (or your corpus) in myCorpus.
Make sure there are no spaces in your file names. For example, use writtenArgument.txt instead of written Argument.txt.
Move your stanTagger folder directly under C: so that you can find it easily.
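For a large corpus, removing spaces from file names by hand is tedious. The sketch below does it for every .txt file in a folder. Both function names are hypothetical, and a file is skipped if its new name is already taken.

```python
from pathlib import Path


def remove_spaces(name):
    """Turn 'written Argument.txt' into 'writtenArgument.txt'."""
    return name.replace(" ", "")


def rename_corpus_files(folder):
    """Rename every .txt file in `folder` whose name contains spaces.

    Returns a list of (old_name, new_name) pairs for the files renamed.
    """
    renamed = []
    for path in sorted(Path(folder).glob("*.txt")):
        new_name = remove_spaces(path.name)
        if new_name == path.name:
            continue  # nothing to change
        target = path.with_name(new_name)
        if target.exists():
            continue  # do not overwrite an existing file
        path.rename(target)
        renamed.append((path.name, new_name))
    return renamed
```

Running rename_corpus_files(r"C:\stanTagger\myCorpus") before tagging should leave only space-free names in the corpus folder.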
9
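Tagging Files
Start your command window as described above.
Go to C: by typing the command cd.. twice (this assumes the window opens in your user folder, two levels below C:; typing cd \ once also takes you there).
Go into stanTagger by typing cd stanTagger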
10
Tagging files
To be able to use the Stanford Tagger on every file automatically, we need to do some programming. We can do this with Perl or other programming languages, such as Java, PHP, Python, and so on. However, I found programming the Command Prompt to be the simplest and will share the code I prepared.
11
Tagging files
Code to be used in Command Prompt:
FOR %a IN (C:\stanTagger\myCorpus\*.txt) DO stanford-postagger models\left3words-wsj-0-18.tagger myCorpus\%~nxa >myTaggedCorpus\%~nxa
You can simply copy the above code and paste it in the Command Prompt. (If you save it in a .bat file instead, write %%a and %%~nxa in place of %a and %~nxa.)
12
New Code!
FOR %a IN (C:\stanTagger\myCorpus\*.txt) DO stanford-postagger models\wsj-0-18-left3words.tagger myCorpus\%~nxa >myTaggedCorpus\%~nxa
13
Newest Code!
FOR %a IN (C:\stanTagger\myCorpus\*.txt) DO stanford-postagger models\english-left3words-distsim.tagger myCorpus\%~nxa >myTaggedCorpus\%~nxa
14
Each file may take about 2-3 seconds. When tagging finishes, you will see that the myTaggedCorpus folder contains the tagged files.
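For readers who prefer Python to the Command Prompt loop, here is a rough equivalent. It is a sketch under assumptions, not the tagger's official interface: the function names are hypothetical, and the java command is modeled on what the bundled stanford-postagger script is understood to run (the MaxentTagger class with -model and -textFile), so check it against the script in your own download. The model file name differs between releases, as the three versions of the code show.

```python
import subprocess
from pathlib import Path

# Assumption: main class used by the bundled stanford-postagger script.
TAGGER_CLASS = "edu.stanford.nlp.tagger.maxent.MaxentTagger"


def build_command(model, text_file, jar="stanford-postagger.jar"):
    """Build the java command that tags one text file with one model."""
    return [
        "java", "-mx300m", "-cp", jar, TAGGER_CLASS,
        "-model", str(model), "-textFile", str(text_file),
    ]


def tag_corpus(tagger_dir, model_name, run=subprocess.run):
    """Tag every .txt file in myCorpus and write the result to myTaggedCorpus.

    `run` can be replaced by a stand-in function when trying the loop
    without Java installed.
    """
    tagger_dir = Path(tagger_dir)
    out_dir = tagger_dir / "myTaggedCorpus"
    out_dir.mkdir(exist_ok=True)
    tagged = []
    for text_file in sorted((tagger_dir / "myCorpus").glob("*.txt")):
        command = build_command(
            Path("models") / model_name, Path("myCorpus") / text_file.name
        )
        # Like ">myTaggedCorpus\file" in the Command Prompt version:
        # the tagger's standard output goes into a file of the same name.
        with open(out_dir / text_file.name, "wb") as out:
            run(command, cwd=tagger_dir, stdout=out, check=True)
        tagged.append(text_file.name)
    return tagged
```

A call such as tag_corpus(r"C:\stanTagger", "english-left3words-distsim.tagger") would mirror the newest version of the Command Prompt code.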