Published by Jordan Cunningham. Modified over 8 years ago.
1
Editing Tons of Text? RegEx to the Rescue! Eric Cressey Senior UX Content Writer Symantec Corporation
2
Why regular expressions? http://xkcd.com/208/
3
Content maintenance adds up Properties and source code XML, HTML, structured content Localization Bugs
4
Sometimes we solve big problems Remove legacy HTML from 10,000+ page Flare project – One week of work instead of more than a month – Saved 4 weeks of work Update KB URLs in 20,000+ emails and files – Two weeks of work instead of months – Saved 10 weeks of work across multiple departments – No errors or missed references
5
Agenda 1. Basics 2. Syntax and examples 3. Tips for massive projects
6
Encountering regex in the wild can be scary ^[2-9]\d{2}-\d{3}-\d{4}$ ^#?([a-f0-9]{6}|[a-f0-9]{3})$ ^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
7
What are they? Searches and regular expressions find patterns in text Add logic, precision, and flexibility to searches
8
Why use them?
9
Are there prerequisites? Your text editor must support regular expressions
10
How do you make them? 1. Start with the text you’re looking for 2. Identify a pattern 3. Add special characters 4. Test the regular expression to see if it matches what you want
11
Best practices 1. Use version control 2. Use a basic text editor 3. Test before replacing 4. Test again before committing
12
Syntax and examples 1. Dealing with variance 2. Using positional context 3. Matching unknown content 4. Putting it all together: HTML patterns Copyright © 2014 Symantec Corporation
13
Matching name variations
14
Start with what you know Regex Eric Text to match Eric Erik
15
Identify a pattern Regex Eri Text to match Eric Erik
16
Add special characters and syntax Regex Eri[ck] Square brackets define a set of allowed characters Text to match Eric Erik
17
This pattern also works Regex Eri(c|k) or (Eric|Erik) Parentheses group content together The pipe specifies OR logic Text to match Eric Erik
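The two patterns above can be compared side by side. This is a minimal Python sketch; the extra name "Erin" is an assumption added only to show a non-match.

```python
import re

# "Erin" is a hypothetical extra name, included to show a non-match.
names = ["Eric", "Erik", "Erin"]

# Character set: "Eri" followed by either "c" or "k".
char_set = re.compile(r"Eri[ck]")
# Group with OR logic: the same idea written with a pipe.
alternation = re.compile(r"Eri(c|k)")

set_matches = [n for n in names if char_set.fullmatch(n)]
alt_matches = [n for n in names if alternation.fullmatch(n)]

print(set_matches)
print(alt_matches)
```

Both patterns accept the same two spellings. The character set is shorter when the variation is a single character, while the group form extends to longer alternatives.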
18
Matching URL variations
19
Start with what you know Regex https://www.symantec.com Include optional content Text to match www.symantec.com http://www.symantec.com https://www.symantec.com symantec.com http://symantec.com https://symantec.com
20
Identify a pattern in the text you want to match www.symantec.com http://www.symantec.com https://www.symantec.com symantec.com http://symantec.com https://symantec.com
21
Escape special characters with a backslash Regex https:\/\/www\.symantec\.com Text to match www.symantec.com http://www.symantec.com https://www.symantec.com symantec.com http://symantec.com https://symantec.com
22
Add groups to logical sections with parentheses Regex (https:\/\/)(www\.)symantec\.com Text to match www.symantec.com http://www.symantec.com https://www.symantec.com symantec.com http://symantec.com https://symantec.com
23
Indicate number of times to match each group Regex (https?:\/\/)?(www\.)?symantec\.com +, *, or ? specifies how many times to match a group or character: + one or more, * zero or more, ? zero or one Text to match www.symantec.com http://www.symantec.com https://www.symantec.com symantec.com http://symantec.com https://symantec.com
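To check that the finished pattern covers every variation listed on the slide, it can be run against all six URLs. A minimal Python sketch; the `ftp://` URL is an assumed counterexample, not from the slides.

```python
import re

# Optional scheme (http or https) and optional "www." prefix.
url_pattern = re.compile(r"(https?:\/\/)?(www\.)?symantec\.com")

urls = [
    "www.symantec.com",
    "http://www.symantec.com",
    "https://www.symantec.com",
    "symantec.com",
    "http://symantec.com",
    "https://symantec.com",
]

# fullmatch requires the whole string to fit the pattern.
all_match = all(url_pattern.fullmatch(u) is not None for u in urls)
# Hypothetical counterexample: an unsupported scheme should not match.
ftp_match = url_pattern.fullmatch("ftp://symantec.com")

print(all_match, ftp_match)
```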
24
Find first name when followed by last name
25
Start with what you know Regex Eric Text to match Eric Eric Creasey Eric Cressey Eric C
26
Add special characters and syntax Regex Eric(?= Cressey) (?=) is a positive lookahead. Eric is returned only if the next characters match the lookahead content Text to match Eric Eric Creasey Eric Cressey Eric C
27
How do positive lookaheads work? Eric(?= Cressey) 1. Finds “Eric” as usual 2. Evaluates the following content to see if it matches the lookahead content 3. If the content matches, “Eric” is a match Eric Eric Creasey Eric Cressey Eric C
28
Find first name not followed by last name
29
There are negative lookaheads Regex Eric(?! Cressey) (?!) is a negative lookahead Eric is matched if the next characters do not match the lookahead content Text to match Eric Eric Creasey Eric Cressey Eric C
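The positive and negative lookaheads can be tried on the same four lines of sample text. A minimal Python sketch; the variable names are illustrative.

```python
import re

lines = ["Eric", "Eric Creasey", "Eric Cressey", "Eric C"]

# Positive lookahead: "Eric" only when " Cressey" comes next.
followed = re.compile(r"Eric(?= Cressey)")
# Negative lookahead: "Eric" only when " Cressey" does not come next.
not_followed = re.compile(r"Eric(?! Cressey)")

with_last_name = [l for l in lines if followed.search(l)]
without_last_name = [l for l in lines if not_followed.search(l)]

# The lookahead content is checked but not included in the match.
matched_text = followed.search("Eric Cressey").group()

print(with_last_name)
print(without_last_name)
print(matched_text)
```

In both cases the match itself is only "Eric", which matters when the match is later replaced: the last name is left untouched.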
30
Find last name when it follows first name
31
There are also lookbehinds Regex (?<=Eric )Cressey (?<=) is a positive lookbehind Cressey is matched if the previous characters match the lookbehind content Text to match Eric Eric Creasey Eric Cressey Eric C Erik Cressey
32
How do positive lookbehinds work? (?<=Eric )Cressey 1. Evaluates each character to see if it follows “Eric ” 2. It gets to “C” and then evaluates the rest of the expression 3. Only the match outside the lookbehind is returned Eric Eric Creasey Eric Cressey Eric C Erik Cressey
33
Find last name when it doesn’t follow first name
34
There are negative lookbehinds Regex (?<!Eric )Cressey (?<!) is a negative lookbehind Cressey is matched if the previous characters do not match the lookbehind content Text to match Eric Eric Creasey Eric Cressey Eric C Bill Cressey Cressey
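Lookbehinds behave the same way in the other direction. This minimal Python sketch runs the positive and negative lookbehinds over the combined sample text from the lookbehind slides.

```python
import re

lines = [
    "Eric",
    "Eric Creasey",
    "Eric Cressey",
    "Eric C",
    "Erik Cressey",
    "Bill Cressey",
    "Cressey",
]

# Positive lookbehind: "Cressey" only when preceded by "Eric ".
after_eric = re.compile(r"(?<=Eric )Cressey")
# Negative lookbehind: "Cressey" only when not preceded by "Eric ".
not_after_eric = re.compile(r"(?<!Eric )Cressey")

preceded = [l for l in lines if after_eric.search(l)]
not_preceded = [l for l in lines if not_after_eric.search(l)]

print(preceded)
print(not_preceded)
```

Note that a bare "Cressey" at the start of a line satisfies the negative lookbehind, because there is nothing before it that could match "Eric ".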
35
Get the value for a given string ID
36
Start with what you know Regex stringID= Text to match stringID=Hello, world! stringID= 안녕하세요, 세계 stringID=Hola món
37
Add special characters and syntax Regex (?<=stringID=).* The positive lookbehind means the content must follow the string ID The period (.) matches any character except a line break The asterisk (*) is greedy and matches the previous character as many times as possible Text to match stringID=Hello, world! stringID= 안녕하세요, 세계 stringID=Hola món
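Applied to the three sample strings, the pattern returns only the value after the ID. A minimal Python sketch, assuming each string sits on its own line:

```python
import re

lines = [
    "stringID=Hello, world!",
    "stringID= 안녕하세요, 세계",
    "stringID=Hola món",
]

# The lookbehind anchors the match right after "stringID=";
# ".*" then consumes the rest of the line.
value_pattern = re.compile(r"(?<=stringID=).*")

values = [value_pattern.search(line).group() for line in lines]

for value in values:
    print(repr(value))
```

Everything after the equals sign is captured, including the leading space in the second string, so trim the values afterwards if whitespace matters.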
38
Make sure your ampersands are encoded
39
Start with what you know Regex & Text to match & &amp;
40
Add special characters and syntax Regex &(?!amp;) Only matches an ampersand when it is not followed by amp; Useful if you don’t want to replace all occurrences Text to match & &amp;
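The negative lookahead makes the replacement safe to run more than once, since ampersands that are already encoded are skipped. A minimal Python sketch; the sample sentence is an assumption made up for illustration.

```python
import re

# Hypothetical sample text: one raw ampersand, one already encoded.
text = "Tom & Jerry &amp; friends"

# Match "&" only when it is not the start of "&amp;".
raw_ampersand = re.compile(r"&(?!amp;)")

encoded = raw_ampersand.sub("&amp;", text)
# Running the replacement again changes nothing.
encoded_twice = raw_ampersand.sub("&amp;", encoded)

print(encoded)
```

This pattern only knows about `&amp;`. Other entities such as `&lt;` would still have their ampersand re-encoded, so the lookahead would need to be widened for content that uses them.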
41
Get the content in an HTML tag
42
Start with what you know Regex.* Text to match Hello, world This is an example
43
Add special characters and syntax Regex (? ).*(?= ) You can use lookaheads and lookbehinds together Text to match Hello, world This is an example
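Combining a lookbehind for the opening tag with a lookahead for the closing tag returns only the text between them. This is a minimal Python sketch that assumes the tags are `<p>` paragraphs, one per line; the tag name is an assumption, not taken from the slide.

```python
import re

# Assumed markup: two <p> elements on separate lines.
html = "<p>Hello, world</p>\n<p>This is an example</p>"

# Lookbehind for the opening tag, lookahead for the closing tag.
# Neither tag is part of the returned match.
inner_text = re.compile(r"(?<=<p>).*(?=</p>)")

contents = inner_text.findall(html)

print(contents)
```

Because `.` does not cross line breaks by default, each paragraph is matched separately here. Two paragraphs on one line would need a lazy quantifier.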
44
Get a paragraph with a specific class
45
Start with what you know Regex Text to match Hello, world This is the second paragraph
46
Add syntax to match unknown content Regex.* Greedy matches return the longest match Text to match Hello, world This is the second paragraph
47
Temper greedy matches Regex.*? *? Lazy matches return the shortest match Text to match Hello, world This is the second paragraph
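The difference between greedy and lazy matching shows up when two paragraphs share a line. A minimal Python sketch, assuming the target paragraph carries a hypothetical `class="intro"` attribute:

```python
import re

# Assumed markup: the class name "intro" is hypothetical.
html = '<p class="intro">Hello, world</p><p>This is the second paragraph</p>'

# Greedy: ".*" runs to the LAST closing tag on the line.
greedy = re.compile(r'<p class="intro">.*</p>')
# Lazy: ".*?" stops at the FIRST closing tag it can.
lazy = re.compile(r'<p class="intro">.*?</p>')

greedy_match = greedy.search(html).group()
lazy_match = lazy.search(html).group()

print(greedy_match)
print(lazy_match)
```

The greedy version swallows both paragraphs, while the lazy version returns just the one with the class.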
48
Get a paragraph based on one of many attributes
49
Use lazy matches to fill in unknown content Regex.*? Text to match Hello, world This is the second paragraph Goodbye
50
Multi-line replacements
51
Sometimes you want to insert multiple lines of text Find Hello Text to match Hello Replace with Hello Hi What’s up
52
You can use whitespace special characters in replacement text Find Hello Result Hello Hi What’s up Replace with Hello\nHi\nWhat’s up
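The same replacement can be scripted. In this minimal Python sketch, `re.sub` processes the `\n` escapes in the replacement string and turns them into real line breaks.

```python
import re

text = "Hello"

# A raw string keeps the backslashes literal; re.sub then converts
# each "\n" in the replacement into a newline character.
replacement = r"Hello\nHi\nWhat's up"

result = re.sub(r"Hello", replacement, text)
result_lines = result.splitlines()

print(result)
```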
53
Add tags around content
54
You can reference groups in replacement text Regex (.*?) Text to match This sentence has some legacy content we want to replace. Replacement $1 Updated text This sentence has some legacy content we want to replace.
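A group reference lets the replacement reuse whatever the group captured. This minimal Python sketch assumes the legacy markup is a `<b>` tag being swapped for `<strong>`; both tag names and the position of the tagged phrase are assumptions. Python writes the reference as `\1` where many editors use `$1`.

```python
import re

# Assumed legacy markup: <b> tags around part of the sentence.
text = "This sentence has some <b>legacy content</b> we want to replace."

# The lazy group captures the text between the tags.
legacy_tag = re.compile(r"<b>(.*?)</b>")

# \1 inserts the captured text ($1 in editors such as Notepad++).
updated = legacy_tag.sub(r"<strong>\1</strong>", text)

print(updated)
```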
55
Updating URLs
56
Groups are numbered sequentially Regex http:\/\/www\.verisign(\.com)?(\.|\/)?(..\/)?(support\/mpki-for-ssl-support\/) Text to match http://www.verisign.de/support/mpki-for-ssl-support/ http://www.verisign.com/support/mpki-for-ssl-support/ http://www.verisign.com.au/support/mpki-for-ssl-support/ Replacement https://knowledge.symantec.com/$3$4index.html Updated text https://knowledge.symantec.com/de/support/mpki-for-ssl-support/index.html https://knowledge.symantec.com/support/mpki-for-ssl-support/index.html https://knowledge.symantec.com/au/support/mpki-for-ssl-support/index.html
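The URL update can be reproduced in a few lines. In this minimal Python sketch, groups 3 and 4 hold the optional country code and the support path; the replacement appends `index.html` so the output matches the updated text listed on the slide. Python writes group references as `\g<3>` rather than `$3`.

```python
import re

pattern = re.compile(
    r"http:\/\/www\.verisign(\.com)?(\.|\/)?(..\/)?(support\/mpki-for-ssl-support\/)"
)

# Group 3: optional country code plus slash. Group 4: the support path.
# An unmatched optional group is inserted as an empty string.
replacement = r"https://knowledge.symantec.com/\g<3>\g<4>index.html"

urls = [
    "http://www.verisign.de/support/mpki-for-ssl-support/",
    "http://www.verisign.com/support/mpki-for-ssl-support/",
    "http://www.verisign.com.au/support/mpki-for-ssl-support/",
]

updated = [pattern.sub(replacement, url) for url in urls]

for url in updated:
    print(url)
```

Groups 1 and 2 exist only to absorb the varying domain endings; they are matched but never referenced in the replacement.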
57
Let’s recap. Here’s what we’ve learned so far. Groups – OR logic – Using groups in replacement text Lookaheads and lookbehinds Special characters – frequency (*, +, ?) – newlines (\n) – any character (.) – escape with backslash (\) if necessary
58
Tips for massive projects
59
The manual approach doesn’t scale well when… Multiple regex operations are needed Regex must be applied in a specific order You need to match a pattern within a pattern You are working with many files in many directories
60
Steps for manually editing files in a directory
1. Get all files in a directory
2. For each file, if the extension is .properties, .xml, or .txt:
   a. Get the text
   b. Use regex to find and update URLs
   c. Save the file
3. For each directory, repeat the steps above
61
Pseudo code for programmatically editing files
Get all files in a directory
For each file in directory
    If the extension is .properties, .xml, or .txt
        Get the text
        Use regex to find and update URLs
        Save the file
For each directory, repeat directory steps above
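The pseudo code above could be turned into a short script along these lines. This is a minimal Python sketch, not the program referenced in the talk: the URL pattern, the replacement, the function names, and the UTF-8 encoding are all assumptions chosen for illustration.

```python
import re
from pathlib import Path

# File types to edit, as listed in the pseudo code.
EXTENSIONS = {".properties", ".xml", ".txt"}

# Hypothetical pattern and replacement; substitute your own.
URL_PATTERN = re.compile(r"http:\/\/www\.verisign\.com\/")
REPLACEMENT = "https://knowledge.symantec.com/"


def update_text(text):
    """Apply the regex replacement to one file's text."""
    return URL_PATTERN.sub(REPLACEMENT, text)


def update_directory(root):
    """Update matching files under root, including subdirectories.

    Returns the list of files that were changed.
    """
    changed = []
    # rglob("*") walks the directory tree recursively.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in EXTENSIONS:
            # Assumes UTF-8; adjust if your files use another encoding.
            original = path.read_text(encoding="utf-8")
            updated = update_text(original)
            if updated != original:
                path.write_text(updated, encoding="utf-8")
                changed.append(path)
    return changed
```

Calling `update_directory("path/to/project")` on a copy of the files, or on a working tree under version control, makes it easy to review the changes before committing them. Returning the list of changed files gives a quick way to spot-check the results.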
62
Benefits of the programmatic approach Write each regex once You can perform them in a specific order Agile! Easy to update the program when requirements evolve Easy to test and iterate
63
You don’t have to start from scratch Get my basic program on GitHub Add regular expressions Visit eric.cressey.org for helpful resources Feel free to ask me if you have questions
64
Takeaways If there’s a pattern, use regular expressions You only need to know a small part of regex syntax to automate most repetitive tasks You can save days or weeks of time on large projects
65
Resources Notepad++ - free text editor with regex support regex101.com - great for writing and testing your regex eric.cressey.org - more regex tutorials
66
Thank you! Copyright © 2015 Symantec Corporation. All rights reserved. Symantec and the Symantec Logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. This document is provided for informational purposes only and is not intended as advertising. All warranties relating to the information in this document, either express or implied, are disclaimed to the maximum extent allowed by law. The information in this document is subject to change without notice. Eric Cressey eric_cressey@symantec.com eric.cressey.org @Eric_Cressey