Regular expressions are regular Marek Pawelec


Outline
1. Regex vocabulary
2. Segmentation rules
3. Regex tagger
4. Regex text filter
5. Auto-translatables

(?<!(,|\.|\d|\d\s|\d'|\d))([-|\u2212]?[\d]{2,3})(?:\.|,|\s|'|)(\d\d\d)(?:\.|,)([\d]{1,2}|[\d]{4,})(?!(,\d|\.\d|\d|\s\d|'\d|\d))

Wildcards...
Wildcards used in regular search:
* – any text string
? – any single character
...but somewhat different.

Regular expressions
. – any character (letter, symbol, digit...)
[ ] – a character set or range
[123] – digit 1 or 2 or 3
[1-3] – any digit from 1 to 3
[A-Za-z] – any unaccented Latin letter
[^A] – any character except A
| – or
1|2|3 – 1 or 2 or 3
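The sets and alternation listed above can be tried out in a few lines. This is a minimal sketch in Python's `re` module, not memoQ itself (memoQ's engine may differ in details); the sample strings are made up for illustration.

```python
import re

sample = "A1b2C3d4"  # hypothetical test string

digits_1_to_3 = re.findall(r"[1-3]", sample)   # any digit from 1 to 3
letters = re.findall(r"[A-Za-z]", sample)      # unaccented Latin letters
not_a = re.findall(r"[^A]", "ABA")             # any character except A
one_two_three = re.findall(r"1|2|3", "4321")   # 1 or 2 or 3

print(digits_1_to_3, letters, not_a, one_two_three)
```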

Ranges
Both [ ] and | mean "or". What is the difference?
[USDEUR] matches a single character: U or S or D or E or U or R
USD|EUR matches the whole string USD or EUR
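A quick way to see the difference, sketched in Python (the test strings are illustrative only): the set picks out single characters, so it also fires on letters of an unrelated code such as RUB, while the alternation only accepts the complete codes.

```python
import re

text = "USD EUR RUB"  # hypothetical input

# Character set: every single U, S, D, E or R is a match on its own.
set_hits = re.findall(r"[USDEUR]", text)

# Alternation: only the complete strings USD or EUR match.
alt_hits = re.findall(r"USD|EUR", text)

print(set_hits)
print(alt_hits)
```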

Special symbols
\ – modifier (escape character)
. matches any character, but \. means a literal dot
\\ matches a backslash
\d – digit [0-9]
\s – white space
\w – any word character [A-Za-z0-9_]
\u#### – Unicode character, e.g. \u2212: − (minus sign)

Quantifiers
? – 0 or 1: \d? means zero or one digit
* – 0 or more: \d* means zero or more digits
+ – 1 or more: \d+ means at least one digit
The quantifiers above are greedy (they take as much as possible); adding ? makes them lazy:
*? – zero or more, but as few as possible
+? – one or more, but as few as possible

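Quantifiers cont.
{num} – value or range
\d{4} = exactly 4 digits
\d{2,4} = 2, 3 or 4 digits
\d{1,4} = from 1 to 4 digits (avoid the \d{,4} shorthand: depending on the engine it means 0 to 4 digits or is read as literal text)
\d{4,} = 4 or more digits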
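The quantifiers from the last two slides, sketched in Python for experimentation. The helper `first` and the sample strings are hypothetical, introduced only for this illustration.

```python
import re


def first(pattern, text):
    """Return the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None


optional = first(r"\d?", "12345")    # zero or one digit
greedy = first(r"\d+", "12345")      # as many digits as possible
lazy = first(r"\d+?", "12345")       # as few digits as possible, at least one
lazy_star = first(r"\d*?", "12345")  # as few as possible, may be empty

# Counted quantifiers: which strings consist of 2 to 4 digits?
two_to_four = [bool(re.fullmatch(r"\d{2,4}", s)) for s in ("1", "12", "1234", "12345")]
four_or_more = bool(re.fullmatch(r"\d{4,}", "12345"))

print(optional, greedy, lazy, repr(lazy_star), two_to_four, four_or_more)
```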

Groups
( ) – creates a numbered group ($num recalls it in the replacement)
(?: ) – passive (non-capturing) group, not numbered
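A small sketch of both group types in Python. Note one difference from the slide: Python's `re.sub` recalls groups as `\1`, `\2` in the replacement where the slide's notation uses `$1`, `$2`. The sample strings are made up.

```python
import re

# Numbered groups: group 1 is the amount, group 2 the currency code.
swapped = re.sub(r"(\d+)\s(EUR|USD)", r"\2 \1", "Price: 100 EUR")

# Passive group: (?:...) groups the alternatives without taking a number,
# so the amount is still group 1.
m = re.search(r"(?:EUR|USD)\s(\d+)", "USD 250")
amount = m.group(1)
group_count = m.re.groups  # number of numbered groups in the pattern

print(swapped, amount, group_count)
```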

Assertions
(?= ) – lookahead assertion: memo(?=Q) will match memo in memoQ, but not in memory
(?! ) – negative lookahead assertion: memo(?!Q) will match memo in memory, but not in memoQ
(?<! ) – negative lookbehind assertion: (?<!s)and will match and in band, but not in sand
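The three examples from this slide can be checked directly. A minimal sketch in Python; the helper name `found` is an assumption made for this illustration.

```python
import re


def found(pattern, text):
    """True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None


results = {
    "memo(?=Q) in memoQ": found(r"memo(?=Q)", "memoQ"),
    "memo(?=Q) in memory": found(r"memo(?=Q)", "memory"),
    "memo(?!Q) in memory": found(r"memo(?!Q)", "memory"),
    "memo(?!Q) in memoQ": found(r"memo(?!Q)", "memoQ"),
    "(?<!s)and in band": found(r"(?<!s)and", "band"),
    "(?<!s)and in sand": found(r"(?<!s)and", "sand"),
}

for label, ok in results.items():
    print(label, "->", ok)

# The assertion itself is not part of the match: only "memo" is returned.
matched_text = re.search(r"memo(?=Q)", "memoQ").group()
```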

#lists#
A list contains variables:
#currency# = (EUR|USD|GBP|HUF)
#cap# = (A|B|C|D) = [ABCD]
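Outside memoQ, the list mechanism can be imitated by substituting each #name# placeholder before compiling the pattern. The following is only a sketch of the idea, not memoQ's implementation: the function `expand_lists`, the `LISTS` dictionary and the choice to wrap each list in a passive group are all assumptions.

```python
import re

# Hypothetical list definitions, using the items shown on the slide.
LISTS = {
    "currency": ["EUR", "USD", "GBP", "HUF"],
    "cap": ["A", "B", "C", "D"],
}


def expand_lists(pattern, lists=LISTS):
    """Replace every #name# placeholder with an alternation of the list items."""
    def replace(m):
        items = lists[m.group(1)]
        # A passive group keeps the alternation together without
        # shifting the numbering of the rule's own groups.
        return "(?:" + "|".join(re.escape(item) for item in items) + ")"
    return re.sub(r"#(\w+)#", replace, pattern)


amount_rule = expand_lists(r"(\d+)\s(#currency#)")
cap_rule = expand_lists(r"#cap#")

print(amount_rule)
print(re.fullmatch(amount_rule, "100 EUR").groups())
```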

Regular expressions in memoQ
Segmentation rules
Regexp tagger
Regexp text filter
Auto-translatables

Segmentation rules

#end##!#[\s]+#cap#
#end##!#[\s]+[\d]
#end##!#[\s]+#lpar#[\s]*#cap#
#end##!#[\s]+#lpar#[\s]*[\d]
#end#[\s]*#rpar##!#[\s]+#cap#
#end#[\s]*#rpar##!#[\s]+[\d]


#end##!#[\s]+#cap# = [:\!\?\.]#!#\s+[A-Z]

#end##!#[\s]+#cap#
Unless:
#abbr_long##!#[\s]+#cap#
[\s]#abbr_short##!#[\s]+#cap#
\s#cap#\.#!#[\s]+#cap#
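The break rule and its exceptions can be approximated outside memoQ as follows. This is a rough sketch under stated assumptions, not memoQ's segmenter: the #!# break marker is modelled as the position right after the end punctuation, the abbreviation set is a hypothetical stand-in for #abbr_long# and #abbr_short#, and whitespace around segments is simply stripped.

```python
import re

# End punctuation followed by whitespace and a capital letter,
# i.e. [:\!\?\.]#!#\s+[A-Z] with the break right after the punctuation.
BREAK_RULE = re.compile(r"[:!?.](?=\s+[A-Z])")

# Hypothetical abbreviation list standing in for #abbr_long# / #abbr_short#.
ABBREVIATIONS = {"Dr.", "Prof.", "No."}

# Single capital letter plus dot (an initial), as in \s#cap#\.
INITIAL = re.compile(r"[A-Z]\.")


def split_segments(text):
    """Split text at break-rule positions unless an exception applies."""
    segments, start = [], 0
    for m in BREAK_RULE.finditer(text):
        candidate = text[start:m.end()]
        last_word = candidate.split()[-1]
        if last_word in ABBREVIATIONS or INITIAL.fullmatch(last_word):
            continue  # exception rule: do not break here
        segments.append(candidate.strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


print(split_segments("Ask Dr. Smith. He knows J. Kowalski well. Really? Yes."))
```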

Regex tagger

\<C:.*\>
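To see what such a tagging rule catches, the pattern can be tried in Python. This is a sketch only: the sample line is invented, the `{1}`, `{2}` placeholders merely stand in for tags, and the lazy variant is an assumption added here for lines holding more than one code (with the greedy `.*` a single match runs from the first `<C:` to the last `>`).

```python
import itertools
import re

CODE_GREEDY = re.compile(r"\<C:.*\>")  # pattern as shown on the slide
CODE_LAZY = re.compile(r"\<C:.*?\>")   # lazy variant (assumption, see above)

sample = "<C:Bold>Warning<C:Plain> check the log"  # hypothetical source line

greedy_hits = CODE_GREEDY.findall(sample)
lazy_hits = CODE_LAZY.findall(sample)


def tag_codes(text):
    """Replace each matched code with a numbered placeholder."""
    counter = itertools.count(1)
    return CODE_LAZY.sub(lambda m: "{%d}" % next(counter), text)


print(greedy_hits)
print(lazy_hits)
print(tag_codes(sample))
```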

/ N \d{4} - \d{4} [A-Z]\d{3} - \d{4}

ERR_GRP_NO_SAMPLE [A-Z]+ _[A-Z]+( )+

Tip: Regex tagger without regex

Regexp text filter

*Popup "Putty" "c:\util\putty.exe"
\s*\*(.*)
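Applied to the sample line from this slide, the rule captures everything after the leading asterisk in group 1. A minimal Python sketch for checking that; how memoQ then uses the captured group is not shown here.

```python
import re

FILTER_RULE = re.compile(r"\s*\*(.*)")

# Sample line from the slide, kept as a raw string because of the backslashes.
line = r'*Popup "Putty" "c:\util\putty.exe"'

m = FILTER_RULE.match(line)
captured = m.group(1) if m else None

# A line without a leading asterisk is not matched at all.
skipped = FILTER_RULE.match("Popup without asterisk")

print(captured)
```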

*Popup.icon="$IconDir$\Fav_Star.ico" "Quick" "!DynamicFolder:$QuickLaunch$*.lnk"
\w+(\s+\w+)* "
\w = [A-Za-z0-9_]

Auto-translatables

Rule for EN/DE/FR to HU number format conversion
(?<!(,|\.|\d|\d\s|\d'|\d))([-|\u2212]?[\d]{2,3})(?:\.|,|\s|'|)(\d\d\d)(?:\.|,)([\d]{1,2}|[\d]{4,})(?!(,\d|\.\d|\d|\s\d|'\d|\d))
$2 $3,$4


12 345,67 12,345,67 12, , , , ,345,67,12, , ,67, ,


Red elements are not necessary:
(?<!(,|\.|\d|\d\s|\d'|\d))([-|\u2212]?[\d]{2,3})(?:\.|,|\s|'|)(\d\d\d)(?:\.|,)([\d]{1,2}|[\d]{4,})(?!(,\d|\.\d|\d|\s\d|'\d|\d))
$1 $2,$3

The same rule for EN to HU only
(?<!\d,|\d\.|\d)([-–]?\d{2,3}),(\d{3})\.(\d+)(?!,\d|\.\d|\d)
12, ,67
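The EN-only rule can be tested in Python with two adaptations, both of which are assumptions of this sketch rather than part of the slide: Python's `re` does not accept alternatives of different lengths inside one lookbehind, so the lookbehind is split into three consecutive ones, and the replacement is written as `\1 \2,\3`, taking the slide's replacement for this rule to be `$1 $2,$3`. The function name `en_to_hu` and the test values are illustrative.

```python
import re

EN_NUMBER = re.compile(
    r"(?<!\d,)(?<!\d\.)(?<!\d)"        # not preceded by "digit,", "digit." or a digit
    r"([-–]?\d{2,3}),(\d{3})\.(\d+)"   # e.g. 12,345.67 gives groups 12 / 345 / 67
    r"(?!,\d|\.\d|\d)"                 # not followed by further parts of a number
)


def en_to_hu(text):
    """Rewrite English-style numbers such as 12,345.67 as 12 345,67."""
    return EN_NUMBER.sub(r"\1 \2,\3", text)


print(en_to_hu("Total: 12,345.67 units"))
print(en_to_hu("1,234,567.89"))  # left alone: the rule covers one thousands separator only
```

The lookarounds are what keep the rule from rewriting a fragment of a longer number, which is why the second call returns its input unchanged.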


Day of the week, Month Day number (st, nd, rd, th) Year
becomes
day of the week day number. month year

(#day#),?\s(#month#)\s(\d{1,2})(?:st|nd|rd|th)?\s(\d{4})
$1 $3. $2 $4

(#day#),?\s(#month#)\s(\d{1,2})(?:st|nd|rd|th)?\s(\d{4})
#day#: Friday piątek ($1)
#month#: May maja ($2)
11th 11 ($3)
($4)
$1 $3. $2 $4
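The date rule can be imitated in Python with two small translation tables. This is a sketch under assumptions: the dictionaries hold only the Friday/piątek and May/maja pairs shown on the slide, the year in the test string is made up, and `convert_date` is a hypothetical helper, not a memoQ function.

```python
import re

# Translation lists; only the pairs shown on the slide are filled in.
DAYS = {"Friday": "piątek"}
MONTHS = {"May": "maja"}

DATE_RULE = re.compile(
    r"(" + "|".join(DAYS) + r"),?\s"      # $1: day of the week, optional comma
    r"(" + "|".join(MONTHS) + r")\s"      # $2: month
    r"(\d{1,2})(?:st|nd|rd|th)?\s"        # $3: day number, ordinal suffix dropped
    r"(\d{4})"                            # $4: year
)


def convert_date(text):
    """Reorder and translate the date as in the replacement $1 $3. $2 $4."""
    def replace(m):
        return f"{DAYS[m.group(1)]} {m.group(3)}. {MONTHS[m.group(2)]} {m.group(4)}"
    return DATE_RULE.sub(replace, text)


print(convert_date("Friday, May 11th 2012"))
```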

eat-sheets/regular-expressions/
expressions.info/tutorial.html