Regular expressions are regular, by Marek Pawelec


1 Regular expressions are regular
Marek Pawelec, wasaty@wasaty.pl

2 Outline
1. Regex vocabulary
2. Segmentation rules
3. Regex tagger
4. Regex text filter
5. Auto-translatables

3 (?<!(,|\.|\d|\d\s|\d'|\d))([-|\u2212]?[\d]{2,3})(?:\.|,|\s|'|)(\d\d\d)(?:\.|,)([\d]{1,2}|[\d]{4,})(?!(,\d|\.\d|\d|\s\d|'\d|\d))

4 Wildcards...
Wildcards used in regular search:
* – any text string
? – any single character
...but somewhat different.

5 Regular expressions
. – any character (letter, symbol, digit...)
[ ] – a range (character set): one character out of those listed
[123] – digit 1 or 2 or 3
[1-3] – any digit from 1 to 3
[A-Za-z] – any unaccented Latin letter
[^A] – any character except A
| – or
1|2|3 – 1 or 2 or 3

6 Ranges
Both [ ] and | mean "or". What is the difference?
[USDEUR] matches a single character: U or S or D or E or U or R
USD|EUR matches the whole string USD or EUR
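The difference between a range and alternation can be checked with a short sketch. Python's re module is used here only for illustration, and the sample sentence is made up.

```python
import re

sample = "Price: 100 USD or 90 EUR"

# A character set matches exactly one character out of those listed.
set_hits = re.findall(r"[USDEUR]", sample)

# Alternation matches one of the complete alternatives.
alt_hits = re.findall(r"USD|EUR", sample)

print(set_hits)
print(alt_hits)
```

The set finds six separate letters, while the alternation finds the two currency codes as whole strings.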

7 Special symbols
\ – modifier (escape character)
. means any character, but \. means a literal dot
\\ matches a backslash
\d – digit [0-9]
\s – white space
\w – any word character, [A-Za-z0-9_] for ASCII text
\u#### – Unicode character, e.g. \u2212 is the minus sign −
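A minimal sketch of these symbols, using Python's re module as a stand-in engine; the sample strings are invented for the illustration.

```python
import re

# An unescaped dot matches any character; an escaped dot matches only a dot.
any_char = re.findall(r"1.5", "1.5 1x5")
literal_dot = re.findall(r"1\.5", "1.5 1x5")

# \w+ is a run of word characters, \s one white space, \d+ a run of digits.
parts = re.findall(r"\w+\s\d+", "page 12, line 7")

# \u2212 is the typographic minus sign, distinct from the ASCII hyphen.
minus = re.findall(r"\u2212\d", "\u22123 and -3")

print(any_char, literal_dot, parts, minus)
```

Only the first number in the last sample is found, because the second one uses an ordinary hyphen.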

8 Quantifiers
Greedy:
? – 0 or 1: \d? means zero or one digit
* – 0 or more: \d* means zero or more digits
+ – 1 or more: \d+ means at least one digit
Lazy:
*? – zero or more, but as few as possible
+? – one or more, but as few as possible
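Greedy and lazy quantifiers are easiest to compare on the same input. A small sketch in Python, matching from the start of the made-up string "2012":

```python
import re

s = "2012"

# Greedy quantifiers take as much as they can.
greedy_plus = re.match(r"\d+", s).group()
greedy_star = re.match(r"\d*", s).group()
optional = re.match(r"\d?", s).group()

# Lazy quantifiers take as little as they can.
lazy_plus = re.match(r"\d+?", s).group()
lazy_star = re.match(r"\d*?", s).group()

print(greedy_plus, greedy_star, optional, lazy_plus, repr(lazy_star))
```

The lazy star is satisfied with nothing at all, so it returns an empty match.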

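9 Quantifiers cont.
{num} – value or range
\d{4} = exactly 4 digits
\d{2,4} = 2, 3 or 4 digits
\d{1,4} = from 1 to 4 digits (the shorthand \d{,4} is not portable: .NET treats it as literal text, and engines that accept it, such as Python, read it as 0 to 4)
\d{4,} = 4 or more digits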
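The counted quantifiers can be tried against digit strings of different lengths. This sketch uses Python's re module, where the open-start form {,4} means 0 to 4 repetitions, so an explicit {1,4} is the safer way to ask for 1 to 4 digits. The helper name accepted is made up for the example.

```python
import re

samples = ["", "1", "12", "123", "1234", "12345"]

def accepted(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

exactly_four = accepted(r"\d{4}")
two_to_four = accepted(r"\d{2,4}")
one_to_four = accepted(r"\d{1,4}")
four_or_more = accepted(r"\d{4,}")
up_to_four = accepted(r"\d{,4}")  # in Python: 0 to 4, so "" is accepted too

print(exactly_four, two_to_four, one_to_four, four_or_more, up_to_four)
```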

10 Groups
( ) – creates a numbered group ($num recalls it, e.g. $1)
(?: ) – passive (non-capturing) group, not numbered
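A short sketch of numbered and passive groups. It uses Python, where a group is recalled in the replacement as \1 rather than $1; the sample strings are invented.

```python
import re

# Three numbered groups, recalled as \1, \2, \3 in the replacement.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3.\2.\1", "2012-05-11")

# The passive group (?:...) groups the alternatives without taking a number,
# so the surname is group 1.
match = re.search(r"(?:Mr|Ms)\.\s(\w+)", "Dear Mr. Pawelec")
surname = match.group(1)
group_count = len(match.groups())

print(swapped, surname, group_count)
```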

11 Assertions
(?= ) – lookahead assertion: memo(?=Q) will match memo in memoQ, but not in memory
(?! ) – negative lookahead assertion: memo(?!Q) will match memo in memory, but not in memoQ
(?<! ) – negative lookbehind assertion: (?<!s)and will match and in band, but not in sand
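The three examples from this slide can be run as written. A sketch in Python, whose syntax for these assertions is the same:

```python
import re

words = ["memoQ", "memory"]

# Lookahead: "memo" only when followed by Q.
ahead = [w for w in words if re.search(r"memo(?=Q)", w)]

# Negative lookahead: "memo" only when NOT followed by Q.
neg_ahead = [w for w in words if re.search(r"memo(?!Q)", w)]

# Negative lookbehind: "and" only when NOT preceded by s.
neg_behind = [w for w in ["band", "sand"] if re.search(r"(?<!s)and", w)]

# The asserted character is checked but is not part of the match.
matched_text = re.search(r"memo(?=Q)", "memoQ").group()

print(ahead, neg_ahead, neg_behind, matched_text)
```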

12 #lists#
A list contains variables:
#currency# = (EUR|USD|GBP|HUF)
#cap# = (A|B|C|D) = [ABCD]
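One way to picture a #list# is as a placeholder that is replaced by a group of alternatives before the expression is run. The sketch below imitates that in Python; the LISTS dictionary and the expand helper are hypothetical and say nothing about how memoQ implements lists internally.

```python
import re

# Hypothetical stand-in for the lists defined on the slide.
LISTS = {
    "currency": "(EUR|USD|GBP|HUF)",
    "cap": "(A|B|C|D)",
}

def expand(rule):
    """Replace every #name# placeholder with its group of alternatives."""
    return re.sub(r"#(\w+)#", lambda m: LISTS[m.group(1)], rule)

expanded = expand(r"\d+\s#currency#")
currency = re.search(expanded, "Pay 100 EUR today").group(1)

# A list of single letters behaves like a character set.
same = all(
    bool(re.fullmatch(LISTS["cap"], c)) == bool(re.fullmatch("[ABCD]", c))
    for c in "ABCDEab"
)

print(expanded, currency, same)
```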

13 Regular expressions in memoQ
Segmentation rules
Regexp tagger
Regexp text filter
Auto-translatables

14 Segmentation rules

19 #end##!#[\s]+#cap#
#end##!#[\s]+[\d]
#end##!#[\s]+#lpar#[\s]*#cap#
#end##!#[\s]+#lpar#[\s]*[\d]
#end#[\s]*#rpar##!#[\s]+#cap#
#end#[\s]*#rpar##!#[\s]+[\d]

25 #end##!#[\s]+#cap# = [:\!\?\.]#!#\s+[A-Z]
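The expanded rule reads: break (#!#) after a colon, exclamation mark, question mark or period when white space and a capital letter follow. A rough Python equivalent is sketched below. It is an approximation only: the break marker is modelled with lookarounds, and the white space at the break is discarded, which may differ from what memoQ does with it.

```python
import re

# Break position: after end punctuation, before white space + capital letter.
BREAK = re.compile(r"(?<=[:!?.])\s+(?=[A-Z])")

def segment(text):
    """Split text into segments at every break position."""
    return BREAK.split(text)

segments = segment("It works. Does it? Yes! See: Example one")
unbroken = segment("This is e.g. fine.")

print(segments)
print(unbroken)
```

The second sample is not split, because the period in "e.g." is followed by a lower-case letter.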

27 #end##!#[\s]+#cap#
Unless:
#abbr_long##!#[\s]+#cap#
[\s]#abbr_short##!#[\s]+#cap#
\s#cap#\.#!#[\s]+#cap#
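The exceptions can be imitated by checking the word in front of each candidate break. This is a simplified Python sketch: the ABBREVIATIONS tuple is a made-up stand-in for the #abbr_long# and #abbr_short# lists, and the function name is hypothetical.

```python
import re

BREAK = re.compile(r"(?<=[:!?.])\s+(?=[A-Z])")

# Hypothetical stand-in for the abbreviation lists.
ABBREVIATIONS = ("Prof.", "Dr.")

def segment_with_exceptions(text):
    """Split at break positions unless an exception rule applies."""
    segments, start = [], 0
    for m in BREAK.finditer(text):
        before = text[start:m.start()]
        last_word = before.split()[-1]
        is_abbreviation = last_word in ABBREVIATIONS
        is_initial = re.fullmatch(r"[A-Z]\.", last_word) is not None
        if is_abbreviation or is_initial:
            continue  # exception: no break here
        segments.append(before)
        start = m.end()
    segments.append(text[start:])
    return segments

result = segment_with_exceptions("Dr. Smith met J. Kowalski. They talked.")
print(result)
```

The breaks after "Dr." and after the initial "J." are suppressed; only the one after "Kowalski." remains.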

30 Regex tagger

35 \<C:.*\>
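A point worth checking with a pattern of this shape is that .* is greedy: if a line holds more than one such tag, the match runs from the first < to the last >. The sketch below shows this in Python on an invented sample line (the real file format is not shown in the transcript) and compares it with a lazy variant.

```python
import re

# Hypothetical sample line with two tags.
line = "Press <C:Bold>Start<C:Plain> to begin"

# Greedy: one match from the first "<C:" to the last ">".
greedy = re.findall(r"<C:.*>", line)

# Lazy: each tag is matched separately.
lazy = re.findall(r"<C:.*?>", line)

print(greedy)
print(lazy)
```

Escaping the angle brackets, as on the slide, does not change the result.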

41

42 0990-4905 / N537-0392
\d{4}-\d{4}
[A-Z]\d{3}-\d{4}
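Both patterns can be checked against the sample codes. The sketch assumes that the spaces around the hyphen in the slide's patterns are layout artifacts, since the sample codes contain none; the combined pattern at the end is an addition, not something from the slide.

```python
import re

sample = "0990-4905 / N537-0392"

numeric = re.findall(r"\d{4}-\d{4}", sample)
lettered = re.findall(r"[A-Z]\d{3}-\d{4}", sample)

# One possible way to cover both code types with a single expression.
both = re.findall(r"[A-Z\d]\d{3}-\d{4}", sample)

print(numeric, lettered, both)
```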

46 ERR_GRP_NO_SAMPLE
[A-Z]+(_[A-Z]+)+
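Reading the fragments on this slide as [A-Z]+(_[A-Z]+)+ (an assumption, since the layout is lost in the transcript), the expression describes capital-letter words joined by underscores. A Python sketch on an invented sentence, with the group made passive so that findall returns whole matches:

```python
import re

IDENTIFIER = re.compile(r"[A-Z]+(?:_[A-Z]+)+")

text = "Error ERR_GRP_NO_SAMPLE occurred in GRP"
found = IDENTIFIER.findall(text)

print(found)
```

"Error" and "GRP" are left alone because at least one underscore-joined part is required.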

49 Tip: Regex tagger without regex

53 Regexp text filter

56 *Popup "Putty" "c:\util\putty.exe"
\s*\*(.*)

60 *Popup.icon="$IconDir$\Fav_Star.ico" "Quick" "!DynamicFolder:$QuickLaunch$*.lnk"
\w+(\s+\w+)* "
\w = [A-Za-z0-9_]

62 Auto-translatables

66 Rule for EN/DE/FR to HU number format conversion
(?<!(,|\.|\d|\d\s|\d'|\d))([-|\u2212]?[\d]{2,3})(?:\.|,|\s|'|)(\d\d\d)(?:\.|,)([\d]{1,2}|[\d]{4,})(?!(,\d|\.\d|\d|\s\d|'\d|\d))
$2 $3,$4
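The rule can be tried outside memoQ with some adaptation. The Python sketch below is not a literal copy: Python's re module accepts only fixed-width lookbehind, so the lookbehind is split into three; the capturing parentheses inside the assertions are dropped, which renumbers the groups to 1 to 3; the literal | inside the sign set is left out; and Python writes the replacement as \1 instead of $1. The function name is made up.

```python
import re

NUMBER = re.compile(
    r"(?<![,.\d])(?<!\d\s)(?<!\d')"   # not the tail of a longer number
    r"([-\u2212]?\d{2,3})"            # optional sign, 2-3 leading digits
    r"(?:\.|,|\s|'|)"                 # thousands separator, may be absent
    r"(\d{3})"                        # three digits of the thousands block
    r"(?:\.|,)"                       # decimal separator
    r"(\d{1,2}|\d{4,})"               # 1-2 or 4+ decimals, never exactly 3
    r"(?!,\d|\.\d|\d|\s\d|'\d)"       # not the head of a longer number
)

def to_hu(text):
    """Rewrite matching numbers as '12 345,67'."""
    return NUMBER.sub(r"\1 \2,\3", text)

for sample in ["12,345.67", "12.345,67", "12345.67", "-12,345.67"]:
    print(sample, "->", to_hu(sample))
```

Exactly three digits after the last separator are rejected on purpose: 12.345.678 could be a number with two thousands separators, so it is left unchanged.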

74 12 345,67 12,345,67 12,345.67 12.345,67 12.345.67 12 345,67 12 345.67 12345,67 12345.67.12,345,67,12,345.67 0 12.345,67 012.345.67 12 345,67,0 12 345.67.0 12345,67 0 12345.670

76 Red elements are not necessary:
(?<!(,|\.|\d|\d\s|\d'|\d))([-|\u2212]?[\d]{2,3})(?:\.|,|\s|'|)(\d\d\d)(?:\.|,)([\d]{1,2}|[\d]{4,})(?!(,\d|\.\d|\d|\s\d|'\d|\d))
$1 $2,$3

77 The same rule for EN to HU only
(?<!\d,|\d\.|\d)([-–]?\d{2,3}),(\d{3})\.(\d+)(?!,\d|\.\d|\d)
12,345.67 -> 12 345,67
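A Python version of the EN-only rule, for experimenting with inputs. As assumptions: the variable-width lookbehind is split into three fixed-width ones because Python's re module requires that, and the replacement \1 \2,\3 is inferred from the sample output on the slide. The function name is hypothetical.

```python
import re

EN_NUMBER = re.compile(
    r"(?<!\d,)(?<!\d\.)(?<!\d)"             # the slide's lookbehind, split by width
    r"([-\u2013]?\d{2,3}),(\d{3})\.(\d+)"   # sign (hyphen or en dash), 12,345.67
    r"(?!,\d|\.\d|\d)"
)

def en_to_hu(text):
    """Convert 12,345.67 to 12 345,67."""
    return EN_NUMBER.sub(r"\1 \2,\3", text)

print(en_to_hu("12,345.67"))
print(en_to_hu("Total: -12,345.6 USD"))
print(en_to_hu("1,234,567.89"))
```

The last sample stays as it is: the rule only covers numbers with a single thousands separator, and the lookbehind stops it from converting just the tail of a longer number.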

79 Source: Day of the week, Month Day number (st, nd, rd, th) Year
Target: day of the week day number. month year

82 (#day#),?\s(#month#)\s(\d{1,2})(?:st|nd|rd|th)?\s(\d{4})
$1 $3. $2 $4

83 (#day#),?\s(#month#)\s(\d{1,2})(?:st|nd|rd|th)?\s(\d{4})
#day#: Friday -> piątek ($1)
#month#: May -> maja ($2)
11th -> 11 ($3)
2012 -> 2012 ($4)
$1 $3. $2 $4
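The date rule combines groups with translation lists. A Python sketch of the same idea follows; the DAYS and MONTHS dictionaries are stand-ins for the #day# and #month# lists and hold only the pair shown on the slide, and the function name is made up.

```python
import re

# Stand-ins for the translation lists; extend as needed.
DAYS = {"Friday": "piątek"}
MONTHS = {"May": "maja"}

DATE = re.compile(
    r"(%s),?\s(%s)\s(\d{1,2})(?:st|nd|rd|th)?\s(\d{4})"
    % ("|".join(DAYS), "|".join(MONTHS))
)

def convert_date(text):
    """Rewrite 'Friday, May 11th 2012' in the order $1 $3. $2 $4."""
    def replacement(m):
        day, month = DAYS[m.group(1)], MONTHS[m.group(2)]
        return f"{day} {m.group(3)}. {month} {m.group(4)}"
    return DATE.sub(replacement, text)

converted = convert_date("Friday, May 11th 2012")
print(converted)
```

The ordinal suffix sits in a passive group, so it is matched but never copied to the output.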

84 http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
http://www.regular-expressions.info/tutorial.html
http://regexlib.com
