Editing Tons of Text? RegEx to the Rescue! Eric Cressey Senior UX Content Writer Symantec Corporation.


1 Editing Tons of Text? RegEx to the Rescue! Eric Cressey Senior UX Content Writer Symantec Corporation

2 Why regular expressions? http://xkcd.com/208/

3 Content maintenance adds up Properties and source code XML, HTML, structured content Localization Bugs

4 Sometimes we solve big problems Remove legacy HTML from 10,000+ page Flare project – One week of work instead of more than a month – Saved 4 weeks of work Update KB URLs in 20,000+ emails and files – Two weeks of work instead of months – Saved 10 weeks of work across multiple departments – No errors or missed references

5 Agenda 1. Basics 2. Syntax and examples 3. Tips for massive projects

6 Encountering regex in the wild can be scary ^[2-9]\d{2}-\d{3}-\d{4}$ ^#?([a-f0-9]{6}|[a-f0-9]{3})$ ^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$

7 What are they? Searches and regular expressions find patterns in text Add logic, precision, and flexibility to searches

8 Why use them?

9 Are there prerequisites? Your text editor must support regular expressions

10 How do you make them? 1. Start with the text you’re looking for 2. Identify a pattern 3. Add special characters 4. Test the regular expression to see if it matches what you want

11 Best practices 1. Use version control 2. Use a basic text editor 3. Test before replacing 4. Test again before committing

12 Syntax and examples 1. Dealing with variance 2. Using positional context 3. Matching unknown content 4. Putting it all together: HTML patterns

13 Matching name variations

14 Start with what you know Regex Eric Text to match Eric Erik

15 Identify a pattern Regex Eri Text to match Eric Erik

16 Add special characters and syntax Regex Eri[ck] Square brackets define a set of allowed characters Text to match Eric Erik

17 This pattern also works Regex Eri(c|k) or (Eric|Erik) Parentheses group content together The pipe specifies OR logic Text to match Eric Erik
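A minimal Python sketch of the two approaches from slides 16 and 17. The sample string and the extra non-matching name "Eria" are assumptions added for illustration. The group is written as non-capturing, `(?:...)`, only because Python's `re.findall` returns group contents instead of whole matches when a pattern contains a capturing group.

```python
import re

# Sample text; "Eria" is a hypothetical name that should NOT match.
names = "Eric Erik Eria"

# Character set: "Eri" followed by exactly one of c or k.
with_set = re.findall(r"Eri[ck]", names)

# Alternation inside a non-capturing group: "Eri" followed by c OR k.
with_group = re.findall(r"Eri(?:c|k)", names)

print(with_set)    # ['Eric', 'Erik']
print(with_group)  # ['Eric', 'Erik']
```

Both patterns return the same matches; the character set is shorter when the variation is a single character, while alternation also handles multi-character alternatives such as `(Eric|Erik)`.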

18 Matching URL variations

19 Start with what you know Regex https://www.symantec.com Include optional content Text to match www.symantec.com http://www.symantec.com https://www.symantec.com symantec.com http://symantec.com https://symantec.com

20 Identify a pattern in the text you want to match www.symantec.com http://www.symantec.com https://www.symantec.com symantec.com http://symantec.com https://symantec.com

21 Escape special characters with a backslash Regex https:\/\/www\.symantec\.com Text to match www.symantec.com http://www.symantec.com https://www.symantec.com symantec.com http://symantec.com https://symantec.com

22 Add groups to logical sections with parentheses Regex (https:\/\/)(www\.)symantec\.com Text to match www.symantec.com http://www.symantec.com https://www.symantec.com symantec.com http://symantec.com https://symantec.com

23 Indicate number of times to match each group Regex (https?:\/\/)?(www\.)?symantec\.com +, *, or ? specifies how many times to match a group or character + one or more * zero or more ? zero or one Text to match www.symantec.com http://www.symantec.com https://www.symantec.com symantec.com http://symantec.com https://symantec.com
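The finished URL pattern can be checked against all six variations with a short Python sketch. The use of `fullmatch` (which requires the whole string to match) and the extra negative test case `symantec.org` are assumptions added here for illustration.

```python
import re

# The pattern from slide 23: optional scheme (http or https), optional "www.".
url_pattern = re.compile(r"(https?:\/\/)?(www\.)?symantec\.com")

variants = [
    "www.symantec.com",
    "http://www.symantec.com",
    "https://www.symantec.com",
    "symantec.com",
    "http://symantec.com",
    "https://symantec.com",
]

# fullmatch succeeds only if the entire string fits the pattern.
results = {v: bool(url_pattern.fullmatch(v)) for v in variants}

for variant, matched in results.items():
    print(f"{variant}: {matched}")

# A hypothetical non-matching URL, to confirm the pattern is not too loose.
rejected = url_pattern.fullmatch("symantec.org") is None
```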

24 Find first name when followed by last name

25 Start with what you know Regex Eric Text to match Eric Eric Creasey Eric Cressey Eric C

26 Add special characters and syntax Regex Eric(?= Cressey) (?=) is a positive lookahead. Eric is returned only if the next characters match the lookahead content Text to match Eric Eric Creasey Eric Cressey Eric C

27 How do positive lookaheads work? Eric(?= Cressey) 1. Finds “Eric” as usual 2. Evaluates the following content to see if it matches the lookahead content 3. If the content is the same, “Eric” is a match Eric Eric Creasey Eric Cressey Eric C

28 Find first name not followed by last name

29 There are negative lookaheads Regex Eric(?! Cressey) (?!) is a negative lookahead Eric is matched if the next characters do not match the lookahead content Text to match Eric Eric Creasey Eric Cressey Eric C
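The positive and negative lookaheads can be compared side by side in Python. This is a sketch that treats each sample name from the slides as a separate line; the variable names are illustrative only.

```python
import re

lines = ["Eric", "Eric Creasey", "Eric Cressey", "Eric C"]

# Positive lookahead: "Eric" only when " Cressey" comes next.
followed = [s for s in lines if re.search(r"Eric(?= Cressey)", s)]

# Negative lookahead: "Eric" only when " Cressey" does NOT come next.
not_followed = [s for s in lines if re.search(r"Eric(?! Cressey)", s)]

# The lookahead content is checked but never included in the match.
matched_text = re.search(r"Eric(?= Cressey)", "Eric Cressey").group()

print(followed)      # ['Eric Cressey']
print(not_followed)  # ['Eric', 'Eric Creasey', 'Eric C']
print(matched_text)  # Eric
```

Because the match itself is only "Eric", a replacement using either pattern changes the first name and leaves the surrounding text alone.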

30 Find last name when it follows first name

31 There are also lookbehinds Regex (?<=Eric )Cressey (?<=) is a positive lookbehind Cressey is matched if the previous characters match the lookbehind content Text to match Eric Eric Creasey Eric Cressey Eric C Erik Cressey

32 How do positive lookbehinds work? (?<=Eric )Cressey 1. Evaluates each character to see if it follows “Eric ” 2. It gets to “C” and then evaluates the rest of the expression 3. Only the match outside the lookbehind is returned Eric Eric Creasey Eric Cressey Eric C Erik Cressey

33 Find last name when it doesn’t follow first name

34 There are negative lookbehinds Regex (?<!Eric )Cressey (?<!) is a negative lookbehind Cressey is matched if the previous characters do not match the lookbehind content Text to match Eric Eric Creasey Eric Cressey Eric C Bill Cressey Cressey
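A Python sketch of both lookbehinds, using sample names from slides 31 and 34. One caveat worth knowing: Python's `re` module only accepts lookbehind content of a fixed length, which "Eric " satisfies. The replacement text "C." is a made-up example.

```python
import re

lines = ["Eric Cressey", "Erik Cressey", "Bill Cressey", "Cressey"]

# Positive lookbehind: "Cressey" only when preceded by "Eric ".
after_eric = [s for s in lines if re.search(r"(?<=Eric )Cressey", s)]

# Negative lookbehind: "Cressey" only when NOT preceded by "Eric ".
not_after_eric = [s for s in lines if re.search(r"(?<!Eric )Cressey", s)]

# Only the text outside the lookbehind is replaced.
abbreviated = re.sub(r"(?<=Eric )Cressey", "C.", "Eric Cressey")

print(after_eric)      # ['Eric Cressey']
print(not_after_eric)  # ['Erik Cressey', 'Bill Cressey', 'Cressey']
print(abbreviated)     # Eric C.
```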

35 Get the value for a given string ID

36 Start with what you know Regex stringID= Text to match stringID=Hello, world! stringID= 안녕하세요, 세계 stringID=Hola món

37 Add special characters and syntax Regex (?<=stringID=).* Positive lookbehind means content must follow the string ID . (period) matches any character except a line break * is greedy and matches the previous character as many times as possible Text to match stringID=Hello, world! stringID= 안녕하세요, 세계 stringID=Hola món
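A sketch of the string ID pattern in Python, assuming the three sample strings sit on separate lines of one file (the `properties` variable is hypothetical). Because the period does not match a line break by default, `.*` stops at the end of each line.

```python
import re

# Hypothetical file content: one stringID entry per line.
properties = "stringID=Hello, world!\nstringID=안녕하세요, 세계\nstringID=Hola món"

# The lookbehind anchors the match right after "stringID=";
# .* then takes everything up to the end of that line.
values = re.findall(r"(?<=stringID=).*", properties)

print(values)  # ['Hello, world!', '안녕하세요, 세계', 'Hola món']
```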

38 Make sure your ampersands are encoded

39 Start with what you know Regex & Text to match &amp; &

40 Add special characters and syntax Regex &(?!amp;) Only matches ampersand when not followed by amp; Useful if you don’t want to replace all occurrences Text to match &amp; &
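In Python, the negative lookahead makes the ampersand replacement safe to run more than once. The sample sentence below is an assumption for illustration.

```python
import re

# Hypothetical text with one bare ampersand and one already-encoded one.
text = "Tom & Jerry &amp; friends"

# Replace only ampersands that are not already the start of "&amp;".
encoded = re.sub(r"&(?!amp;)", "&amp;", text)

# Running the same replacement again changes nothing.
encoded_twice = re.sub(r"&(?!amp;)", "&amp;", encoded)

print(encoded)  # Tom &amp; Jerry &amp; friends
```

Note that this pattern only recognizes `&amp;`. Other entities, such as `&lt;`, would still have their ampersand re-encoded, so the lookahead would need extending if the text contains them.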

41 Get the content in an HTML tag

42 Start with what you know Regex.* Text to match Hello, world This is an example

43 Add special characters and syntax Regex (? ).*(?= ) You can use lookaheads and lookbehinds together Text to match Hello, world This is an example
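A sketch of a lookbehind and a lookahead used together in Python. The markup from the slide did not survive in this transcript, so the `<p>` tags below are an assumption, chosen only to show the technique.

```python
import re

# Assumed markup: each sample sentence wrapped in <p> tags, one per line.
html = "<p>Hello, world</p>\n<p>This is an example</p>"

# Lookbehind requires "<p>" before the match, lookahead requires "</p>" after;
# neither tag is part of the returned text.
contents = re.findall(r"(?<=<p>).*(?=</p>)", html)

print(contents)  # ['Hello, world', 'This is an example']
```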

44 Get a paragraph with a specific class

45 Start with what you know Regex Text to match Hello, world This is the second paragraph

46 Add syntax to match unknown content Regex.* Greedy matches return the longest match Text to match Hello, world This is the second paragraph

47 Temper greedy matches Regex.*? *? Lazy matches return the shortest match Text to match Hello, world This is the second paragraph
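The difference between greedy and lazy matching shows up when two paragraphs share a line. In this Python sketch the `<p>` tags and the class name `intro` are assumptions, since the original markup is missing from the transcript.

```python
import re

# Assumed markup: two paragraphs on one line; only the first has the class.
html = '<p class="intro">Hello, world</p><p>This is the second paragraph</p>'

# Greedy: .* runs to the LAST </p> it can reach.
greedy = re.search(r'<p class="intro">.*</p>', html).group()

# Lazy: .*? stops at the FIRST </p> that completes the match.
lazy = re.search(r'<p class="intro">.*?</p>', html).group()

print(greedy)  # the whole line, both paragraphs
print(lazy)    # <p class="intro">Hello, world</p>
```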

48 Get a paragraph based on one of many attributes

49 Use lazy matches to fill in unknown content Regex.*? Text to match Hello, world This is the second paragraph Goodbye

50 Multi-line replacements

51 Sometimes you want to insert multiple lines of text Find Hello Text to match Hello Replace with Hello Hi What’s up

52 You can use whitespace special characters in replacement text Find Hello Replace with Hello\nHi\nWhat’s up Result Hello Hi What’s up
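The same multi-line replacement can be sketched in Python, where `re.sub` converts `\n` in the replacement text into a real line break. Text editors differ in how they handle this, so the behavior shown applies to Python only.

```python
import re

# \n in the replacement template becomes an actual newline.
result = re.sub(r"Hello", r"Hello\nHi\nWhat's up", "Hello")

print(result)
# Hello
# Hi
# What's up
```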

53 Add tags around content

54 You can reference groups in replacement text Regex (.*?) Text to match This sentence has some legacy content we want to replace. Replacement $1 Updated text This sentence has some legacy content we want to replace.

55 Updating URLs

56 Groups are numbered sequentially Regex http:\/\/www\.verisign(\.com)?(\.|\/)?(..\/)?(support\/mpki-for-ssl-support\/) Text to match http://www.verisign.de/support/mpki-for-ssl-support/ http://www.verisign.com/support/mpki-for-ssl-support/ http://www.verisign.com.au/support/mpki-for-ssl-support/ Replacement https://knowledge.symantec.com/$3$4 Updated text https://knowledge.symantec.com/de/support/mpki-for-ssl-support/index.html https://knowledge.symantec.com/support/mpki-for-ssl-support/index.html https://knowledge.symantec.com/au/support/mpki-for-ssl-support/index.html
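A Python sketch of the URL update. Python writes group references as `\3` and `\4` instead of `$3` and `$4`. The input URLs here end in `index.html`, an assumption made so the output lines up with the slide's updated text; the pattern leaves anything after the matched portion untouched.

```python
import re

# Groups: 1 = ".com", 2 = "." or "/", 3 = two-letter country code plus "/",
# 4 = the support path.
pattern = re.compile(
    r"http:\/\/www\.verisign(\.com)?(\.|\/)?(..\/)?(support\/mpki-for-ssl-support\/)"
)

# Assumed inputs: the slide's URLs with a trailing index.html.
old_urls = [
    "http://www.verisign.de/support/mpki-for-ssl-support/index.html",
    "http://www.verisign.com/support/mpki-for-ssl-support/index.html",
    "http://www.verisign.com.au/support/mpki-for-ssl-support/index.html",
]

# A group that did not participate in the match is replaced with an empty
# string (Python 3.5 and later), so the plain .com URL gets no country code.
new_urls = [
    pattern.sub(r"https://knowledge.symantec.com/\3\4", url) for url in old_urls
]

for url in new_urls:
    print(url)
```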

57 Let’s recap. Here’s what we’ve learned so far. Groups – OR logic – Using groups in replacement text Lookaheads and lookbehinds Special characters – frequency (*,+,?) – newlines (\n) – any character (.) – escape with backslash (\) if necessary

58 Tips for massive projects

59 The manual approach doesn’t scale well when… Multiple regex operations are needed Regex must be applied in a specific order You need to match a pattern within a pattern You are working with many files in many directories

60 Steps for manually editing files in a directory 1. Get all files in a directory 2. For each file: – If the extension is .properties, .xml, or .txt 1. Get the text. 2. Use regex to find and update URLs. 3. Save the file. 3. For each directory: 1. Repeat directory steps above.

61 Pseudo code for programmatically editing files Get all files in a directory For each file in directory If the extension is .properties, .xml, or .txt Get the text Use regex to find and update URLs Save the file For each directory, repeat directory steps above
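The pseudo code above could be sketched in Python roughly as follows. This is not the author's GitHub program; the function names, the `RULES` list, and the single sample rule (upgrading symantec.com links to https) are all hypothetical, and UTF-8 file encoding is assumed.

```python
import re
from pathlib import Path

# File types to edit, taken from the pseudo code.
EXTENSIONS = {".properties", ".xml", ".txt"}

# Ordered (pattern, replacement) pairs; rules run in list order.
# The single rule here is a made-up example.
RULES = [
    (re.compile(r"http:\/\/(www\.)?symantec\.com"), r"https://\1symantec.com"),
]


def update_text(text):
    """Apply every rule, in order, to one file's text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def update_directory(root):
    """Edit matching files under root, including all subdirectories.

    Returns the list of files that were actually changed.
    """
    changed = []
    # rglob("*") walks nested directories, replacing the manual recursion.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in EXTENSIONS:
            original = path.read_text(encoding="utf-8")
            updated = update_text(original)
            if updated != original:
                path.write_text(updated, encoding="utf-8")
                changed.append(path)
    return changed
```

Calling `update_directory("path/to/project")` on a copy of the project that is under version control makes it easy to review the changes before committing them.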

62 Benefits of the programmatic approach Write each regex once You can perform them in a specific order Agile! Easy to update the program when requirements evolve Easy to test and iterate

63 You don’t have to start from scratch Get my basic program on GitHub Add regular expressions Visit eric.cressey.org for helpful resources Feel free to ask me if you have questions

64 Takeaways If there’s a pattern, use regular expressions You only need to know a small part of regex syntax to automate most repetitive tasks You can save days or weeks of time on large projects

65 Resources Notepad++ - free text editor with regex support regex101.com - great for writing and testing your regex eric.cressey.org - more regex tutorials

66 Thank you! Copyright © 2015 Symantec Corporation. All rights reserved. Symantec and the Symantec Logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. This document is provided for informational purposes only and is not intended as advertising. All warranties relating to the information in this document, either express or implied, are disclaimed to the maximum extent allowed by law. The information in this document is subject to change without notice. Eric Cressey eric_cressey@symantec.com eric.cressey.org @Eric_Cressey

