Published by Virgil Hoover. Modified over 6 years ago.
1
Lin Advanced Statistical Methods in NLP: Project Presentation
Team Members: Anna Tinnemore, Gabriel Neer, Yow-Ren Chiang
2018/12/4
2
PART 3: MaxEnt (yippee!)
3
The Good Stuff:
Simple feature templates and extraction
Elegant data structures for storage and easy access
Pretty good results!
4
The Bad Stuff: Hmmm
5
Features
A few short loops collected the most relevant context features
No long-winded feature templates
Easy-access hashes
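To make the idea of "a few short loops" filling "easy-access hashes" concrete, here is a minimal sketch. The slides do not list the actual feature templates, so the feature names (`cur_word`, `prev_word`, `next_word`), the `BOS`/`EOS` padding symbols, and the `extract_features` helper are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collect simple context features for the word at
# position $i into a hash. Feature names are illustrative assumptions,
# not the templates the team actually used.
sub extract_features {
    my ($words, $i) = @_;
    my $last = scalar(@$words) - 1;
    my %feat;
    $feat{ 'cur_word=' . $words->[$i] } = 1;
    $feat{ 'prev_word=' . ($i > 0     ? $words->[ $i - 1 ] : 'BOS') } = 1;
    $feat{ 'next_word=' . ($i < $last ? $words->[ $i + 1 ] : 'EOS') } = 1;
    return \%feat;
}

my @sentence = qw(The cat sat);
for my $i (0 .. $#sentence) {
    my $feat = extract_features(\@sentence, $i);
    print join(' ', sort keys %$feat), "\n";
}
```

Storing each feature as a hash key makes a lookup such as `$feat->{'prev_word=The'}` a single operation, which is presumably the kind of easy access the slide refers to.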
6
Decent Results
Accuracy climbs into the mid-nineties as the size of the training data increases

Result          1K        5K              10K             40K
Accuracy        88.31%    93.55%          94.63%          96.34%
Training Time   24 sec.   2 min 27 sec.   4 min 28 sec.   18 min 34 sec.
7
PART 4: Task 2, Bagging
8
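Tie Function
    use Tie::File;
    use Fcntl;

    # Treat the file as an array of lines (the tie call is not shown on
    # the slide and is assumed here).
    tie my @lines, 'Tie::File', $file_name, mode => O_RDONLY
        or die "Can't tie $file_name: $!";

    for my $bag_num (1 .. $B) {
        # The Nth bag from file foo.txt becomes foo.txt-bagN, etc.
        my $bag_name = "$file_name-bag$bag_num";
        open(my $bag, '>', $bag_name)
            or die "Can't open $bag_name for writing: $!";
        # The loop bounds and the index are missing from the slide; assumed:
        # draw as many lines as the file has, with replacement.
        for (1 .. scalar @lines) {
            # Pick random line of file.
            my $line = $lines[ int rand @lines ];
            print $bag "$line\n";    # Output to the bag.
        }
        close $bag;
    }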
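The sampling step can also be tried without any files. The sketch below draws a bootstrap sample from an in-memory array; the `bootstrap_sample` helper, the toy tagged lines, and the choice to draw exactly as many lines as the input contains are assumptions for illustration, not details taken from the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: draw one bootstrap sample, i.e. as many lines as
# the input has, chosen uniformly at random with replacement.
sub bootstrap_sample {
    my ($lines) = @_;
    my $n = scalar @$lines;
    my @sample;
    for (1 .. $n) {
        push @sample, $lines->[ int rand $n ];
    }
    return \@sample;
}

# Toy training data (made up for this example).
my @lines = ('The/DT cat/NN', 'Dogs/NNS bark/VBP', 'It/PRP rains/VBZ');

for my $bag_num (1 .. 2) {
    my $bag = bootstrap_sample(\@lines);
    print "bag $bag_num:\n";
    print "  $_\n" for @$bag;
}
```

Because lines are drawn with replacement, a bag will usually repeat some sentences and omit others, which is what makes the taggers trained on different bags disagree enough for voting to help.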
9
Combination: VOTING!!
10
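Step 1:
    # Loop through file and remember words. Keep them grouped by sentence.
    # (Variable names lost from the slide are assumed.)
    while (<FILE>) {
        my @sentence;
        foreach (split) {
            my @wordtag = split /\//;
            push(@sentence, $wordtag[0]);
        }
        push(@sentences, \@sentence);
    }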
11
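Step 2:
    # Go through file and for each word, increase the count of its tag
    # (The outer loop header and the open call are missing from the slide
    # and are assumed.)
    for my $file (@ARGV) {
        open(FILE, '<', $file) or die "Can't open $file: $!";
        my $tag_index = 0;
        while (<FILE>) {
            foreach (split) {
                my @wordtag = split /\//;
                my $tag = $wordtag[1];
                $tags[$tag_index]->{$tag}++;
                $tag_index++;
            }
        }
        close(FILE);
    }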
12
Step 3:
    # Go through the sentences and print out each word/tag pair.
    # (Loop lists lost from the slide are assumed.)
    my $tag_index = 0;
    foreach my $sent (@sentences) {
        foreach my $word (@$sent) {
            my $tag = max_tag($tags[$tag_index]);
            $tag_index++;
            print "$word/$tag ";
        }
        print "\n";
    }
13
Finding the “Best Tag”
    # Find the tag with the highest count.
    sub max_tag {
        my $tag_hash = shift;
        (my $tag) = keys %$tag_hash;
        my $tag_count = $tag_hash->{$tag};
        foreach (keys %$tag_hash) {
            if ($tag_hash->{$_} > $tag_count) {
                $tag = $_;
                $tag_count = $tag_hash->{$tag};
            }
        }
        return $tag;
    }
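A quick way to check the behavior of `max_tag` is to call it on a small hash of vote counts. The sub below repeats the slide's logic with its closing braces restored; the vote counts are made up for the example.

```perl
use strict;
use warnings;

# Find the tag with the highest count (same logic as on the slide).
sub max_tag {
    my $tag_hash = shift;
    (my $tag) = keys %$tag_hash;
    my $tag_count = $tag_hash->{$tag};
    foreach (keys %$tag_hash) {
        if ($tag_hash->{$_} > $tag_count) {
            $tag       = $_;
            $tag_count = $tag_hash->{$tag};
        }
    }
    return $tag;
}

# Made-up votes for a single word position.
my %votes = (NN => 3, VB => 1, JJ => 1);
print max_tag(\%votes), "\n";    # prints NN
```

Note that the comparison is strict (`>`), so when two tags share the highest count the winner is whichever of them comes first in Perl's hash key order, which is not defined. If reproducible output matters, a deterministic tie-break would have to be added.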
14
Procedure
Creating Bootstrap samples
    The file is treated as an array of lines.
    N random array indices are selected and each corresponding line is output to a file.
Combine_tool.pl
    Opens the file corresponding to its first argument and reads in all words, aggregated by sentence.
    An array of tag hashes is created.
    For each file in its argument list, opens that file and reads the tags sequentially.
    The hash entry for each tag, at the corresponding index of the tag array, is incremented.
    For each index, the hash key with the highest count is chosen as the correct tag.
    The tags are re-associated with their words.
    The word/tag pairs are printed out.
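The combination procedure above can be sketched end to end on in-memory data. This is an illustration under stated assumptions, not the team's Combine_tool.pl: the three tagger outputs are made up, strings stand in for the files named on the command line, and ties are broken alphabetically, which the slides do not specify.

```perl
use strict;
use warnings;

# Hypothetical outputs of three taggers on the same two sentences.
my @outputs = (
    [ 'The/DT can/NN rusts/VBZ', 'I/PRP can/MD' ],
    [ 'The/DT can/MD rusts/VBZ', 'I/PRP can/MD' ],
    [ 'The/DT can/NN rusts/NNS', 'I/PRP can/MD' ],
);

my (@sentences, @tags);

# Step 1: remember the words of the first output, grouped by sentence.
for my $line (@{ $outputs[0] }) {
    my @words;
    for my $pair (split ' ', $line) {
        my ($word) = split m{/}, $pair;
        push @words, $word;
    }
    push @sentences, \@words;
}

# Step 2: for every output, count the tag seen at each word position.
for my $output (@outputs) {
    my $tag_index = 0;
    for my $line (@$output) {
        for my $pair (split ' ', $line) {
            my (undef, $tag) = split m{/}, $pair;
            $tags[$tag_index]{$tag}++;
            $tag_index++;
        }
    }
}

# Step 3: re-associate each word with its most-voted tag.
my $tag_index = 0;
my @voted;
for my $sent (@sentences) {
    my @pairs;
    for my $word (@$sent) {
        my $counts = $tags[$tag_index];
        $tag_index++;
        # Highest count first; alphabetical tie-break is an assumption.
        my ($best) = sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b }
                     keys %$counts;
        push @pairs, "$word/$best";
    }
    push @voted, join(' ', @pairs);
}
print "$_\n" for @voted;
```

Here "can" in the first sentence gets two votes for NN and one for MD, so the combined output keeps NN, and "rusts" keeps VBZ by the same margin.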
15
Result
Training Data

Method    1K                      5K                      10K
Trigram   85.68 / 83.35 / 85.46   92.12 / / 91.90         93.44 / / 93.32
TBL       90.64 / / 91.70         93.97 / 93.75 / 94.91   94.91 / / 95.60
MaxEnt    88.31 / / 89.85         93.55 / / 94.05         94.63 / / 95.10
Comb      91.39 / / 92.45         94.87 / / 95.21         95.61 / / 95.55