1
More Xkwic and Tgrep
LING 5200 Computational Corpus Linguistics
Martha Palmer
March 2, 2006
2
LING 5200, 2006 BASED on Kevin Cohen’s LING 5200
Resources
Laura is bugging me to make a CU Corpora page… Like this:
http://www.stanford.edu/dept/linguistics/corpora/cas-home.html
TGREP
http://www.stanford.edu/dept/linguistics/corpora/cas-tut-tgrep.html
3
Searching with pos tags and !
[word = "[tT]he" & !( pos = "DT" ) ];
wsj
[ !(word = "water" | pos = "NN")];
[ !(word = "water") & !( pos = "NN")];
[ word != "water" & pos != "NN" ];
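The three "water" queries on this slide select the same tokens (De Morgan's law). The sketch below checks that on a made-up list of (word, pos) pairs. It is plain Python, not CQP: the token list, the tags, and the function names are assumptions for illustration, and simple string equality stands in for CQP's regular-expression matching.

```python
# Toy corpus of (word, pos) pairs; tokens and tags are invented for illustration.
tokens = [
    ("The", "DT"), ("water", "NN"), ("is", "VBZ"), ("cold", "JJ"),
    ("They", "PRP"), ("water", "VBP"), ("the", "DT"), ("garden", "NN"),
]

def q_not_or(word, pos):
    # [ !(word = "water" | pos = "NN") ]
    return not (word == "water" or pos == "NN")

def q_and_nots(word, pos):
    # [ !(word = "water") & !(pos = "NN") ]
    return (not word == "water") and (not pos == "NN")

def q_not_equals(word, pos):
    # [ word != "water" & pos != "NN" ]
    return word != "water" and pos != "NN"

# Corpus positions matched by each formulation.
hits = [
    [i for i, (w, p) in enumerate(tokens) if query(w, p)]
    for query in (q_not_or, q_and_nots, q_not_equals)
]

print(hits[0] == hits[1] == hits[2])  # the three result lists agree
print(hits[0])
```

Each formulation rejects a token as soon as it is either the word "water" or tagged NN, so the verb use of "water" and the noun "garden" are both excluded.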
4
Operator precedence
The precedence of the (logical) operators is defined by the following list: if operator x is listed before operator y, operator x has precedence over y. Operators are evaluated left to right.
=, !=, !, &, |
[ ! word = "water" & ! pos = "NN" ];
is parsed as
[ !(word = "water") & !( pos = "NN")];
5
Searching sequences with | and ?
"Bill" [pos = "NP"];
[pos = "NP"] [pos = "NP"] [pos = "NP"];
([pos = "NP"] [pos = "NP"]) | ([pos = "NP"] "of" [pos = "NP"]);
([pos = "NP"] "of"? [pos = "NP"]);
Note: First match applies
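The optional-token query ([pos = "NP"] "of"? [pos = "NP"]) covers both alternatives of the | query in one pattern. A minimal Python sketch of that idea follows, assuming a hand-made list of (word, pos) pairs; the function name np_of_np and the sample tokens are hypothetical, and the shorter alternative is tried first as a simplification.

```python
def np_of_np(tokens):
    """Return (start, end) spans matching [pos="NP"] "of"? [pos="NP"]."""
    spans = []
    for i, (_, pos) in enumerate(tokens):
        if pos != "NP":
            continue
        # gap 0: the second NP follows directly; gap 1: exactly one "of" intervenes.
        for gap in (0, 1):
            j = i + 1 + gap
            if j >= len(tokens):
                continue
            if gap == 1 and tokens[i + 1][0] != "of":
                continue
            if tokens[j][1] == "NP":
                spans.append((i, j))
                break  # keep the first alternative that matches
    return spans

# Invented example; "NP" is used as the proper-noun tag, as on the slide.
sample = [("Bank", "NP"), ("of", "IN"), ("America", "NP"),
          ("hired", "VBD"), ("Bill", "NP"), ("Clinton", "NP")]

print(np_of_np(sample))  # [(0, 2), (4, 5)]
```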
6
Corpus Position: wild cards and contexts
"give" []* "up";
"give" []{0,5} "up";
"give" []* "up" within 7;
"Clinton" expand to 5;
"Clinton" expand left to 5;
"Clinton" expand right to 5;
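To make the bounded wildcard concrete, here is a rough Python sketch of what "give" []{0,5} "up" asks for: the word "give", then at most five arbitrary tokens, then "up". The function find_gapped and the sample sentence are assumptions for illustration. The sketch keeps only the shortest match from each starting position, which may differ from the matching strategy configured in CQP.

```python
def find_gapped(tokens, first, last, max_gap):
    """Return (start, end) index pairs where `first` is followed by `last`
    with at most `max_gap` tokens in between."""
    matches = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        # `last` may sit 1 .. max_gap + 1 positions to the right of `first`.
        for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
            if tokens[j] == last:
                matches.append((i, j))
                break  # keep only the shortest match from this start
    return matches

sentence = "they will not give the whole plan up until they give up hope".split()

print(find_gapped(sentence, "give", "up", 5))  # [(3, 7), (10, 11)]
print(find_gapped(sentence, "give", "up", 0))  # [(10, 11)]
```

With a gap limit of 0 only the adjacent "give up" survives, which is the difference between an unbounded []* and a bounded []{0,5}.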
7
Assignments and Intersect
Q1 = "rain";
Q2 = [pos="NN"];
intersect Q1 Q2;
Q1 = [pos = "JJ"] [pos = "NN"];
Q2 = "acid" "rain";
intersect Q1 Q2;
[word = "acid" & pos = "JJ"] [word = "rain" & pos = "NN"]
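The intersect command keeps the matches that two named queries have in common, so the second intersect on this slide should agree with the single combined query on its last line. The Python sketch below illustrates this with sets of (start, end) spans. The token list and the helper bigram_matches are hypothetical, and exact string equality stands in for CQP's pattern matching.

```python
# Invented (word, pos) tokens for illustration.
tokens = [("acid", "JJ"), ("rain", "NN"), ("falls", "VBZ"), ("on", "IN"),
          ("green", "JJ"), ("fields", "NNS"), ("and", "CC"),
          ("heavy", "JJ"), ("rain", "NN")]

def bigram_matches(tokens, test_first, test_second):
    """Set of (start, end) spans where two adjacent tokens pass the two tests."""
    return {(i, i + 1)
            for i in range(len(tokens) - 1)
            if test_first(tokens[i]) and test_second(tokens[i + 1])}

# Q1 = [pos = "JJ"] [pos = "NN"];
q1 = bigram_matches(tokens, lambda t: t[1] == "JJ", lambda t: t[1] == "NN")
# Q2 = "acid" "rain";
q2 = bigram_matches(tokens, lambda t: t[0] == "acid", lambda t: t[0] == "rain")
# intersect Q1 Q2;
both = q1 & q2
# [word = "acid" & pos = "JJ"] [word = "rain" & pos = "NN"]
combined = bigram_matches(tokens,
                          lambda t: t == ("acid", "JJ"),
                          lambda t: t == ("rain", "NN"))

print(sorted(q1))        # [(0, 1), (7, 8)]
print(both == combined)  # True
```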
8
Structural restrictions
"give" []* "up" within s;
("gain" []* "profit") | ("profit" []* "gain") within 3 s;
("gain" []* "profit") | ("profit" []* "gain") within article;
"Clinton" expand left to 2 s;
9
Defining structural restrictions
Nounphrase = [pos = "DT"] [pos = "JJ"] [pos = "NN"];
Nounphrase;
[pos = "JJ"]
Go back to select
10
For fun
[pos = "V.*"][pos = "PN.*"]
[]* [pos = "V.*"][pos = "PN.*"]
( [pos = "V.*"] [pos = "PN.*"]) within s
Not a question, not beginning of sentence…
11
less is more
less
cat ??/* | less
Switches
SPACE: next screenful
b: previous screenful
/pattern (e.g. /RNR): search for pattern
?pattern: search backwards for pattern
q: quit
12
Searching for a word
tgrep Halloween
What happens? Why don’t you have to specify a file?
babel> grep tgrep .cshrc
# tgrep stuff
#setenv TGREP_CORPUS /corpora/treebank2/tbl_075/tgrepabl/brwn_cmb.crp
setenv TGREP_CORPUS /corpora/treebank2/tgrepabl/wsj_mrg.crp
Count results:
tgrep research | wc -l
cat ??/* | grep Halloween | wc -l
13
Tgrep Switches
-a  Match on all patterns in a sentence
-w  Return the whole sentence
-n  Put the entire string on one line
-t  Print only the terminals
14
Viewing it in sentential context
tgrep -wn Halloween | more
tgrep -wn research | more (20,865 hits)
Can also use less
15
Viewing it in sentential context
tgrep -wn research | more
16
Searching by POS
tgrep NNS | more
Another way to do your sanity check
17
See more data?
tgrep NNS | grep . | more
18
Sentential context (again)
tgrep -wn NNS | more
19
Searching by syntactic constituent
tgrep NP | more
20
Single-line outputs
tgrep -n NP | more
21
Viewing tree-like output
tgrep -w NP | head -20
22
Searching for relations between nodes
tgrep 'NP < CC' | head -16
23
tgrep -g (whole language)
A < B: A immediately dominates B
A > B: A is immediately dominated by B
A << B: A dominates B
A >> B: A is dominated by B
A . B: A immediately precedes B
A .. B: A precedes B
A <<, B: B is the leftmost descendant of A
A <<` B: B is the rightmost descendant of A
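The dominance relations can be made concrete on a small hand-built tree. The sketch below is plain Python, not tgrep: the nested-tuple tree format, the example sentence, and all function names are assumptions for illustration. It covers only the immediate-dominance, dominance, and leftmost-descendant relations.

```python
# A tiny hand-built parse; a node is (label, child, ...) and a leaf is a string.
TREE = ("S",
        ("NP", ("NNP", "Clinton"), ("CC", "and"), ("NNP", "Gore")),
        ("VP", ("VBD", "won")))

def label(node):
    return node if isinstance(node, str) else node[0]

def children(node):
    return [] if isinstance(node, str) else list(node[1:])

def subtrees(node):
    """Yield the node itself and every node below it, leaves included."""
    yield node
    for child in children(node):
        yield from subtrees(child)

def immediately_dominates(node, b):
    # Immediate dominance: some child of the node is labelled b.
    return any(label(c) == b for c in children(node))

def dominates(node, b):
    # Dominance: some node strictly below this one is labelled b.
    return any(label(d) == b for c in children(node) for d in subtrees(c))

def leftmost_descendant(node, b):
    # Leftmost descendant: b is found by following first children only.
    kids = children(node)
    while kids:
        if label(kids[0]) == b:
            return True
        kids = children(kids[0])
    return False

def find(tree, a, relation, b):
    """All nodes labelled a that stand in `relation` to some node labelled b."""
    return [n for n in subtrees(tree) if label(n) == a and relation(n, b)]

print(len(find(TREE, "NP", immediately_dominates, "CC")))  # 1
print(len(find(TREE, "S", immediately_dominates, "CC")))   # 0
print(len(find(TREE, "S", dominates, "CC")))               # 1
```

The S node dominates CC without immediately dominating it, which is the distinction between the single and double angle brackets.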
24
Alternation
Node names can be ORed, e.g.
tgrep 'Clinton|Gore' | head
25
Character classes
Regular expressions
tgrep '/[Cc]hild/' | egrep . | head
26
Working towards that weird example…
tgrep '/[Pp]resident/' | head
27
Combining alternation and a regular expression
tgrep 'Clinton|Gore|/[Pp]resident/' | head
28
Searching for a transitive verb
tgrep -w 'VP << like < NP << DT' | more
29
Verbs + Particles
tgrep -w 'VP kick
tgrep 'VP << /kick.*/ <2 PRT' kick
tgrep 'VP <1 VB <2 PRT' kick
tgrep -nw 'VP <1 /VB.*/ <2 PRT' kick
tgrep 'VP <1 (VB < kick) <2 PRT' kick
tgrep 'VP <1 (/VB.*/ < kick) <2 PRT' kick
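The last pattern asks for a VP whose first child is a verb tag over the word "kick" and whose second child is PRT. The following is a rough Python sketch of that check on two invented trees. The tree format, the function verb_particle_vps, and the use of re.fullmatch are assumptions for illustration and may not mirror how tgrep anchors its regular expressions.

```python
import re

def label(node):
    return node if isinstance(node, str) else node[0]

def children(node):
    return [] if isinstance(node, str) else list(node[1:])

def subtrees(node):
    yield node
    for child in children(node):
        yield from subtrees(child)

def verb_particle_vps(tree, tag_pattern, word_pattern):
    """Find VPs whose 1st child matches tag_pattern and sits over a word
    matching word_pattern, and whose 2nd child is labelled PRT."""
    hits = []
    for node in subtrees(tree):
        kids = children(node)
        if label(node) != "VP" or len(kids) < 2:
            continue
        first, second = kids[0], kids[1]
        verb_ok = (re.fullmatch(tag_pattern, label(first)) is not None
                   and any(re.fullmatch(word_pattern, label(w))
                           for w in children(first)))
        if verb_ok and label(second) == "PRT":
            hits.append(node)
    return hits

# Invented trees: one with a particle after the verb, one without.
WITH_PRT = ("S", ("NP", ("PRP", "They")),
            ("VP", ("VBD", "kicked"), ("PRT", ("RP", "off")),
             ("NP", ("DT", "the"), ("NN", "game"))))
NO_PRT = ("S", ("NP", ("PRP", "They")),
          ("VP", ("VBD", "kicked"), ("NP", ("DT", "the"), ("NN", "ball"))))

print(len(verb_particle_vps(WITH_PRT, r"VB.*", r"kick.*")))  # 1
print(len(verb_particle_vps(WITH_PRT, r"VB", r"kick")))      # 0
print(len(verb_particle_vps(NO_PRT, r"VB.*", r"kick.*")))    # 0
```

The second call finds nothing because the exact tag VB and the exact word "kick" do not cover the inflected form, which is why the slide moves from VB to /VB.*/.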