Published by Rachel Erickson. Modified over 10 years ago.
1
Regular expressions are regular
Marek Pawelec
wasaty@wasaty.pl
2
Outline
1. Regex vocabulary
2. Segmentation rules
3. Regex tagger
4. Regex text filter
5. Auto-translatables
3
(?<!(,|\.|\d|\d\s|\d'|\d))([-|\u2212]?[\d]{2,3})(?:\.|,|\s|'|)(\d\d\d)(?:\.|,)([\d]{1,2}|[\d]{4,})(?!(,\d|\.\d|\d|\s\d|'\d|\d))
4
Wildcards...
Wildcards used in regular search:
* – any text string
? – any single character
...but somewhat different.
5
Regular expressions
. – any character (or symbol, digit...)
[ ] – a set or range of characters
[123] – digit 1 or 2 or 3
[1-3] – any digit from 1 to 3
[A-Za-z] – any unaccented Latin letter
[^A] – any character except A
| – or
1|2|3 – 1 or 2 or 3
6
Ranges
Both [ ] and | mean "or". What is the difference?
[USDEUR] matches a single character: U or S or D or E or U or R
USD|EUR matches the whole string USD or the whole string EUR
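The difference can be sketched in a few lines of Python. Python's `re` module is used here only for illustration (memoQ itself uses .NET regular expressions), and the sample text is made up.

```python
import re

text = "Price: 100 EUR"  # made-up sample text

# Character class: matches ONE character that is U, S, D, E or R.
class_hits = re.findall(r"[USDEUR]", text)

# Alternation: matches the whole string USD or the whole string EUR.
alt_hits = re.findall(r"USD|EUR", text)

print(class_hits)  # ['E', 'U', 'R']
print(alt_hits)    # ['EUR']
```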
7
Special symbols
\ – modifier (escape character)
. means any character, but \. means a literal dot
\\ matches a backslash
\d – digit [0-9]
\s – white space
\w – any word character [A-Za-z0-9_]
\u#### – Unicode character, e.g. \u2212: − (minus sign)
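A short Python sketch of these symbols in action. The sample strings are invented for the demonstration, and Python's `re` stands in for the .NET engine that memoQ uses.

```python
import re

# '.' matches any character, '\.' only a real dot.
any_char = re.findall(r"a.c", "abc a.c")
literal_dot = re.findall(r"a\.c", "abc a.c")

# \d matches single digits, \w+ matches runs of word characters.
digits = re.findall(r"\d", "A4 paper, 80 g")
words = re.findall(r"\w+", "cat_1 dog-2")

# \u2212 is the Unicode minus sign, which is not the same as the hyphen '-'.
minus = re.search(r"\u2212\d", "\u22125 degrees")

print(any_char)     # ['abc', 'a.c']
print(literal_dot)  # ['a.c']
print(digits)       # ['4', '8', '0']
print(words)        # ['cat_1', 'dog', '2']
```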
8
Quantifiers
? – 0 or 1: \d? means zero or one digit
* – 0 or more: \d* means zero or more digits
+ – 1 or more: \d+ means at least one digit
These quantifiers are greedy: they match as much as possible.
*? – 0 or more, but as few as possible (lazy)
+? – 1 or more, but as few as possible (lazy)
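Greedy versus lazy matching is easiest to see on text with repeated delimiters. A minimal Python sketch, with an invented sample string:

```python
import re

text = "<b>bold</b> and <i>italic</i>"  # made-up sample

# Greedy: .+ runs to the LAST '>' it can reach.
greedy = re.findall(r"<.+>", text)

# Lazy: .+? stops at the FIRST '>' it can reach.
lazy = re.findall(r"<.+?>", text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```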
9
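Quantifiers cont.
{num} – value or range
\d{4} = exactly 4 digits
\d{2,4} = 2, 3 or 4 digits
\d{1,4} = from 1 to 4 digits (write the lower bound explicitly: the .NET regex engine used by memoQ does not treat {,4} as a quantifier)
\d{4,} = 4 or more digits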
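The counted quantifiers can be checked against a handful of digit strings. This is a small Python sketch; the helper `full_matches` is a hypothetical name introduced only for this example.

```python
import re

samples = ["1", "12", "123", "1234", "12345"]

def full_matches(pattern):
    """Return the samples that the pattern matches in their entirety."""
    return [s for s in samples if re.fullmatch(pattern, s)]

exactly_four = full_matches(r"\d{4}")
two_to_four = full_matches(r"\d{2,4}")
four_or_more = full_matches(r"\d{4,}")

print(exactly_four)  # ['1234']
print(two_to_four)   # ['12', '123', '1234']
print(four_or_more)  # ['1234', '12345']
```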
10
Groups
( ) – creates a group ($num recalls it)
(?: ) – passive group (not numbered)
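A brief Python sketch of numbered versus passive groups. Note one syntax difference: the .NET engine behind memoQ recalls a group as `$1` in the replacement, while Python writes `\1`. The sample text is invented.

```python
import re

text = "Total: 250 EUR"  # made-up sample

# (\d+) is numbered group 1; (?:EUR|USD) only groups, it gets no number.
m = re.search(r"(\d+)\s(?:EUR|USD)", text)
captured = m.groups()

# With two numbered groups, the replacement can reorder them.
swapped = re.sub(r"(\d+)\s(EUR|USD)", r"\2 \1", text)

print(captured)  # ('250',)
print(swapped)   # Total: EUR 250
```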
11
Assertions
(?= ) – lookahead assertion
memo(?=Q) will match memo in memoQ, but not in memory
(?! ) – negative lookahead assertion
memo(?!Q) will match memo in memory, but not in memoQ
(?<! ) – negative lookbehind assertion
(?<!s)and will match and in band, but not in sand
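The three examples from this slide can be tried directly. A minimal Python sketch (Python's `re` handles these particular assertions the same way as .NET):

```python
import re

# Lookahead: 'memo' only when followed by 'Q'.
lookahead = [w for w in ["memoQ", "memory"] if re.search(r"memo(?=Q)", w)]

# Negative lookahead: 'memo' only when NOT followed by 'Q'.
neg_lookahead = [w for w in ["memoQ", "memory"] if re.search(r"memo(?!Q)", w)]

# Negative lookbehind: 'and' only when NOT preceded by 's'.
neg_lookbehind = [w for w in ["band", "sand"] if re.search(r"(?<!s)and", w)]

# The assertion itself is not part of the match.
matched_text = re.search(r"memo(?=Q)", "memoQ").group()

print(lookahead)       # ['memoQ']
print(neg_lookahead)   # ['memory']
print(neg_lookbehind)  # ['band']
print(matched_text)    # memo
```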
12
#lists#
A list is a named set of alternatives:
#currency# = (EUR|USD|GBP|HUF)
#cap# = (A|B|C|D) = [ABCD]
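One way to picture a list is as a placeholder that gets expanded into an alternation before the rule is run. The Python sketch below only imitates that idea: the `LISTS` dictionary and the `expand` helper are hypothetical, and how memoQ expands lists internally is not shown in the slides.

```python
import re

# Hypothetical list definitions, taken from the examples on the slide.
LISTS = {
    "currency": "EUR|USD|GBP|HUF",
    "cap": "A|B|C|D",
}

def expand(rule):
    """Replace each #name# placeholder with a parenthesised alternation."""
    return re.sub(r"#(\w+)#", lambda m: "(" + LISTS[m.group(1)] + ")", rule)

expanded = expand(r"\d+\s#currency#")
found = re.search(expanded, "Fee: 100 HUF").group()

print(expanded)  # \d+\s(EUR|USD|GBP|HUF)
print(found)     # 100 HUF
```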
13
Regular expressions in memoQ
Segmentation rules
Regexp tagger
Regexp text filter
Auto-translatables
14
Segmentation rules
19
#end##!#[\s]+#cap#
#end##!#[\s]+[\d]
#end##!#[\s]+#lpar#[\s]*#cap#
#end##!#[\s]+#lpar#[\s]*[\d]
#end#[\s]*#rpar##!#[\s]+#cap#
#end#[\s]*#rpar##!#[\s]+[\d]
25
#end##!#[\s]+#cap# = [:\!\?\.]#!#\s+[A-Z]
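The expanded rule can be approximated in Python to see where breaks would fall. This is only a sketch: it assumes that `#!#` marks the break position, emulates it with a lookbehind and a lookahead, and drops the whitespace between segments. The sample sentence is made up.

```python
import re

# End punctuation, then whitespace, then a capital letter.
# The split happens on the whitespace between the two assertions.
BREAK = re.compile(r"(?<=[:!?.])\s+(?=[A-Z])")

def split_segments(text):
    """Split text wherever the break pattern matches."""
    return BREAK.split(text)

segments = split_segments("It works. Does it? Yes! Fine.")
print(segments)  # ['It works.', 'Does it?', 'Yes!', 'Fine.']
```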
27
#end##!#[\s]+#cap#
Unless:
#abbr_long##!#[\s]+#cap#
[\s]#abbr_short##!#[\s]+#cap#
\s#cap#\.#!#[\s]+#cap#
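The interplay of a break rule and its exceptions can be sketched as follows. The `ABBREVIATIONS` tuple is a hypothetical stand-in for the abbreviation lists, and the check is simplified compared with the rules above (it only looks at how the candidate segment ends).

```python
import re

# Hypothetical stand-in for the abbreviation lists.
ABBREVIATIONS = ("Dr.", "Mr.", "Mrs.")

# Break rule: end punctuation, whitespace, capital letter.
BREAK = re.compile(r"(?<=[:!?.])\s+(?=[A-Z])")

def segment(text):
    """Split at break candidates, skipping those right after an abbreviation."""
    segments = []
    start = 0
    for m in BREAK.finditer(text):
        candidate = text[start:m.start()]
        if candidate.endswith(ABBREVIATIONS):
            continue  # exception applies: no break here
        segments.append(candidate)
        start = m.end()
    segments.append(text[start:])
    return segments

result = segment("See Dr. Smith today. He is in.")
print(result)  # ['See Dr. Smith today.', 'He is in.']
```

Without the exception, the same text would be broken after "Dr." as well.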
30
Regex tagger
35
\<C:.*\>
42
0990-4905 / N537-0392
\d{4}-\d{4}
[A-Z]\d{3}-\d{4}
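The two patterns can be tested on the sample from the slide. In this Python sketch the patterns are written without spaces around the hyphen, and the curly braces in the last step are only a hypothetical marker for "this would become a tag", not memoQ's tag format.

```python
import re

sample = "0990-4905 / N537-0392"

# Four digits, hyphen, four digits.
numeric = re.findall(r"\d{4}-\d{4}", sample)

# One capital letter, three digits, hyphen, four digits.
prefixed = re.findall(r"[A-Z]\d{3}-\d{4}", sample)

# Both shapes in one pattern; \g<0> re-inserts the whole match.
tagged = re.sub(r"[A-Z\d]\d{3}-\d{4}", r"{\g<0>}", sample)

print(numeric)   # ['0990-4905']
print(prefixed)  # ['N537-0392']
print(tagged)    # {0990-4905} / {N537-0392}
```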
46
ERR_GRP_NO_SAMPLE
[A-Z]+(_[A-Z]+)+
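Assuming the fragments on this slide combine into `[A-Z]+(_[A-Z]+)+` (capital letters, followed by one or more groups of underscore plus capital letters), the pattern can be tried in Python. The group is made passive here so that `findall` returns the whole match; the surrounding text is invented.

```python
import re

# Capitals, then at least one "_CAPITALS" part.
CODE = re.compile(r"[A-Z]+(?:_[A-Z]+)+")

codes = CODE.findall("ERR_GRP_NO_SAMPLE: see ERROR log")
print(codes)  # ['ERR_GRP_NO_SAMPLE']
```

A plain word in capitals such as ERROR is not matched, because at least one underscore part is required.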
49
Tip: Regex tagger without regex
53
Regexp text filter
56
*Popup "Putty" "c:\util\putty.exe"
\s*\*(.*)
60
*Popup.icon="$IconDir$\Fav_Star.ico" "Quick" "!DynamicFolder:$QuickLaunch$*.lnk"
\w+(\s+\w+)* "
\w = [A-Za-z0-9_]
62
Auto-translatables
66
Rule for EN/DE/FR to HU number format conversion
(?<!(,|\.|\d|\d\s|\d'|\d))([-|\u2212]?[\d]{2,3})(?:\.|,|\s|'|)(\d\d\d)(?:\.|,)([\d]{1,2}|[\d]{4,})(?!(,\d|\.\d|\d|\s\d|'\d|\d))
$2 $3,$4
74
12 345,67
12,345,67
12,345.67
12.345,67
12.345.67
12 345,67
12 345.67
12345,67
12345.67
.12,345,67
,12,345.67
0 12.345,67
012.345.67
12 345,67,0
12 345.67.0
12345,67 0
12345.670
76
Red elements are not necessary:
(?<!(,|\.|\d|\d\s|\d'|\d))([-|\u2212]?[\d]{2,3})(?:\.|,|\s|'|)(\d\d\d)(?:\.|,)([\d]{1,2}|[\d]{4,})(?!(,\d|\.\d|\d|\s\d|'\d|\d))
$1 $2,$3
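A possible slimmed-down version of the rule, ported to Python, is sketched below. Several points are assumptions: which elements were marked as unnecessary is not visible here, so the simplification is a guess; the `|` inside the sign class is dropped; the lookbehind is split in two because Python's `re`, unlike .NET, requires fixed-width lookbehinds; and the replacement `$1 $2,$3` becomes `\1 \2,\3`.

```python
import re

NUMBER = re.compile(
    r"(?<![,.\d])(?<!\d[\s'])"   # not preceded by , . digit, or digit + space/apostrophe
    r"([-\u2212]?\d{2,3})"       # group 1: optional sign, 2-3 leading digits
    r"[.,\s']?"                  # optional thousands separator
    r"(\d{3})"                   # group 2: thousands block
    r"[.,]"                      # decimal separator
    r"(\d{1,2}|\d{4,})"          # group 3: 1-2 or 4+ decimal digits
    r"(?!,\d|\.\d|\d|\s\d|'\d)"  # not followed by more number parts
)

def to_hu(text):
    """Rewrite a matching number as 'NN NNN,NN'."""
    return NUMBER.sub(r"\1 \2,\3", text)

print(to_hu("12,345.67"))  # 12 345,67
print(to_hu("12.345,67"))  # 12 345,67
print(to_hu("12345.67"))   # 12 345,67
```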
77
The same rule for EN to HU only
(?<!\d,|\d\.|\d)([-–]?\d{2,3}),(\d{3})\.(\d+)(?!,\d|\.\d|\d)
12,345.67 → 12 345,67
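The EN-only rule is short enough to try out directly. In this Python sketch the lookbehind is split into two fixed-width parts (a Python `re` requirement that .NET does not have), and the replacement `\1 \2,\3` is an assumption inferred from the sample conversion on the slide.

```python
import re

EN_NUMBER = re.compile(
    r"(?<!\d[,.])(?<!\d)"     # not in the middle of a longer number
    r"([-\u2013]?\d{2,3})"    # group 1: optional hyphen or en dash, 2-3 digits
    r",(\d{3})"               # comma, then group 2: thousands block
    r"\.(\d+)"                # dot, then group 3: decimals
    r"(?!,\d|\.\d|\d)"        # not followed by more number parts
)

def en_to_hu(text):
    """Convert 12,345.67 style numbers to 12 345,67 style."""
    return EN_NUMBER.sub(r"\1 \2,\3", text)

print(en_to_hu("Total: 12,345.67 USD"))  # Total: 12 345,67 USD
print(en_to_hu("1,112,345.67"))          # 1,112,345.67 (left alone by the guards)
```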
79
Day of the week, Month Day number (st, nd, rd, th) Year
→ day of the week day number. month year
82
(#day#),?\s(#month#)\s(\d{1,2})(?:st|nd|rd|th)?\s(\d{4})
$1 $3. $2 $4
83
(#day#),?\s(#month#)\s(\d{1,2})(?:st|nd|rd|th)?\s(\d{4})
#day#: Friday → piątek ($1)
#month#: May → maja ($2)
11th → 11 ($3)
2012 → 2012 ($4)
$1 $3. $2 $4
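The date rule can be imitated in Python with two small translation tables. The `DAYS` and `MONTHS` dictionaries are hypothetical and contain only the single pair shown on the slide; a real list would cover all days and months.

```python
import re

# Hypothetical translation lists with the slide's example entries only.
DAYS = {"Friday": "piątek"}
MONTHS = {"May": "maja"}

day_alt = "|".join(map(re.escape, DAYS))
month_alt = "|".join(map(re.escape, MONTHS))

DATE = re.compile(
    r"(" + day_alt + r"),?\s(" + month_alt + r")\s"
    r"(\d{1,2})(?:st|nd|rd|th)?\s(\d{4})"
)

def convert_date(text):
    """Reorder to 'day-of-week day. month year', translating via the lists."""
    def repl(m):
        return f"{DAYS[m.group(1)]} {m.group(3)}. {MONTHS[m.group(2)]} {m.group(4)}"
    return DATE.sub(repl, text)

converted = convert_date("Friday, May 11th 2012")
print(converted)  # piątek 11. maja 2012
```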
84
http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
http://www.regular-expressions.info/tutorial.html
http://regexlib.com