1
Strings and RegEx
Processing and Manipulating Text
SoftUni Team, Technical Trainers
Software University
© Software University Foundation – This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike license.
2
Table of Contents
What is a String?
Manipulating Strings
  Comparing, Concatenating, Searching
  Extracting Substrings, Splitting
Building and Modifying Strings
  Why Is the + Operator Slow?
  Using the StringBuilder Class
3
Table of Contents (2)
Regular Expressions
  Characters
  Operators
  Constructs
Regular Expressions in C#
4
Questions sli.do #Tech
5
What is a String?
Strings are represented by System.String objects in the .NET Framework
String objects contain an immutable (read-only) sequence of characters
An uninitialized string field has the value null (a local variable must be assigned before it is read)
6
Manipulating Strings
Comparing, Concatenating, Searching, Extracting Substrings, Splitting
(c) 2007 National Academy for Software Development - All rights reserved. Unauthorized copying or re-distribution is strictly prohibited.
7
Comparing Strings
Several ways to compare two strings:
Dictionary-based string comparison
Case-insensitive:
int result = string.Compare(str1, str2, true);
// result == 0 if str1 equals str2
// result < 0 if str1 is before str2
// result > 0 if str1 is after str2
Case-sensitive:
int result = string.Compare(str1, str2, false);
8
Comparing Strings (2)
Equality checking by the operator ==
Performs case-sensitive comparison
if (str1 == str2) { … }
Using the case-sensitive Equals() method
Has the same effect as the operator ==
if (str1.Equals(str2)) { … }
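The two slides above cover case-sensitive equality and dictionary-based comparison. As a minimal sketch, the snippet below puts them side by side and adds a case-insensitive equality check. The helper name EqualsIgnoreCase and the sample strings are assumptions made for illustration; string.Equals with StringComparison.OrdinalIgnoreCase is a standard .NET overload.

```csharp
using System;

public static class CompareDemo
{
    // Hypothetical helper: case-insensitive equality using an explicit comparison mode.
    public static bool EqualsIgnoreCase(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        string str1 = "softuni"; // assumed sample values
        string str2 = "SoftUni";

        Console.WriteLine(str1 == str2);                     // False (case-sensitive)
        Console.WriteLine(str1.Equals(str2));                // False (case-sensitive)
        Console.WriteLine(EqualsIgnoreCase(str1, str2));     // True
        Console.WriteLine(string.Compare(str1, str2, true)); // 0 (equal when case is ignored)
    }
}
```

Passing an explicit StringComparison value makes the intent visible at the call site, which is why it is used here instead of lowercasing both strings first.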
9
Concatenating Strings
There are two ways to combine strings: Using the Concat() method Using the + or the += operators Any object can be appended to a string string str = string.Concat(str1, str2); string str = str1 + str2 + str3; string str += str1; string name = "Peter"; int age = 22; string s = name + " " + age; // "Peter 22"
10
Searching in Strings
Finding a character or substring within a given string
str.IndexOf(string term) - returns the index of the first occurrence of term in str
Returns -1 if there is no match
str.LastIndexOf(string term) - returns the index of the last occurrence of term in str
string = int firstIndex = // 5
int noIndex = .IndexOf("/"); // -1
string verse = "To be or not to be..";
int lastIndex = verse.LastIndexOf("be"); // 16
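As a rough illustration of IndexOf and LastIndexOf, the sketch below reuses the verse from the slide and adds a helper that collects every position of a term by calling IndexOf repeatedly with a start index. The helper name FindAll is hypothetical, and it assumes a non-empty search term.

```csharp
using System;
using System.Collections.Generic;

public static class SearchDemo
{
    // Hypothetical helper: returns the index of every occurrence of term in text.
    // Assumes term is not empty.
    public static List<int> FindAll(string text, string term)
    {
        var positions = new List<int>();
        int index = text.IndexOf(term, StringComparison.Ordinal);
        while (index != -1)
        {
            positions.Add(index);
            // Continue the search one character after the previous hit.
            index = text.IndexOf(term, index + 1, StringComparison.Ordinal);
        }
        return positions;
    }

    public static void Main()
    {
        string verse = "To be or not to be..";
        Console.WriteLine(verse.IndexOf("be"));      // 3
        Console.WriteLine(verse.LastIndexOf("be"));  // 16
        Console.WriteLine(verse.IndexOf("/"));       // -1
        Console.WriteLine(string.Join(", ", FindAll(verse, "be"))); // 3, 16
    }
}
```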
11
Problem: Count substring occurrences
You are given a text and a pattern
Find how many times the pattern occurs in the text
Overlapping matches are allowed
Examples (text, pattern => count):
ababa caba, aba => 3
aaaaaa, aa => 5
Welcome to SoftUni, Java => 0
12
Solution: Count substring occurrences
string input = Console.ReadLine().ToLower();
string pattern = Console.ReadLine().ToLower();
int counter = 0;
int index = input.IndexOf(pattern);
while (index != -1)
{
    counter++;
    // Advance by one character so that overlapping matches are counted
    index = input.IndexOf(pattern, index + 1);
}
Console.WriteLine(counter);
13
Extracting Substrings
str.Substring(int startIndex, int length)
string filename
string name = filename.Substring(8, 8);
// name is Rila2005
str.Substring(int startIndex)
string filename
string nameAndExtension = filename.Substring(8);
// nameAndExtension is Summer2009.jpg
(Slide figure: the characters of the file path, each labeled with its index.)
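The two Substring overloads can be sketched as follows. The sample path is an assumption chosen so that the file name starts at index 8, and FileNameOf is a hypothetical helper that avoids hard-coding that index.

```csharp
using System;

public static class SubstringDemo
{
    // Hypothetical helper: returns everything after the last backslash.
    // If there is no backslash, LastIndexOf returns -1 and the whole string is returned.
    public static string FileNameOf(string path)
    {
        int lastSeparator = path.LastIndexOf('\\');
        return path.Substring(lastSeparator + 1);
    }

    public static void Main()
    {
        string filename = @"C:\Pics\Rila2005.jpg"; // assumed sample path

        Console.WriteLine(filename.Substring(8, 8)); // Rila2005
        Console.WriteLine(filename.Substring(8));    // Rila2005.jpg
        Console.WriteLine(FileNameOf(filename));     // Rila2005.jpg
    }
}
```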
14
Splitting Strings
To split a string by given separator(s) use the following method:
string[] Split(params char[] separator)
Example:
string listOfBeers = "Amstel, Zagorka, Tuborg, Becks.";
string[] beers = listOfBeers.Split(' ', ',', '.');
Console.WriteLine("Available beers are:");
foreach (string beer in beers)
{
    Console.WriteLine(beer);
}
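One detail worth seeing in practice: when two separators are adjacent (such as ", "), Split produces empty strings between them. The sketch below compares the slide's call with the overload that takes StringSplitOptions.RemoveEmptyEntries; the helper name SplitWords is hypothetical.

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper: splits on space, comma and period, dropping empty entries.
    public static string[] SplitWords(string text)
    {
        char[] separators = { ' ', ',', '.' };
        return text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string listOfBeers = "Amstel, Zagorka, Tuborg, Becks.";

        string[] raw = listOfBeers.Split(' ', ',', '.');
        Console.WriteLine(raw.Length);   // 8 (four names and four empty strings)

        string[] beers = SplitWords(listOfBeers);
        Console.WriteLine(beers.Length); // 4
        Console.WriteLine(string.Join("|", beers)); // Amstel|Zagorka|Tuborg|Becks
    }
}
```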
15
Other String Operations
Replacing Substrings, Deleting Substrings, Changing Character Casing, Trimming
16
Replacing and Deleting Substrings
str.Replace(string match, string term) - replaces all occurrences of a given string with another
The result is a new string (strings are immutable)
string cocktail = "Vodka + Martini + Cherry";
string replaced = cocktail.Replace("+", "and");
// Vodka and Martini and Cherry
str.Remove(int index, int length) - deletes part of a string and produces a new string as a result
string price = "$ ";
string lowPrice = price.Remove(2, 3);
// $ 4567
17
Problem: Text filter
You are given a text and a string of banned words
Replace every banned word in the text with asterisks ("*"), one for each character of the word
Banned words: Linux, Windows
Text: It is not Linux, it is GNU/Linux. Linux is merely the kernel, while GNU adds the functionality...
Result: It is not *****, it is GNU/*****. ***** is merely the kernel, while GNU adds the functionality...
18
Solution: Text Filter
string[] banWords = Console.ReadLine()
    .Split(…); // TODO: Add separators
string text = Console.ReadLine();
foreach (var banWord in banWords)
{
    if (text.Contains(banWord))
    {
        text = text.Replace(banWord, new string('*', banWord.Length));
    }
}
Console.WriteLine(text);
Contains(…) checks if a string contains another string
19
Changing Character Casing
Using the method ToLower()
string alpha = "aBcDeFg";
string lowerAlpha = alpha.ToLower(); // abcdefg
Console.WriteLine(lowerAlpha);
Using the method ToUpper()
string alpha = "aBcDeFg";
string upperAlpha = alpha.ToUpper(); // ABCDEFG
Console.WriteLine(upperAlpha);
20
Trimming White Space
str.Trim() - trims white space at the start and end of the string
string s = " example of white space ";
string clean = s.Trim();
Console.WriteLine(clean); // example of white space
str.Trim(params char[] chars)
string s = " \t\nHello!!! \n";
string clean = s.Trim(' ', ',' ,'!', '\n','\t');
Console.WriteLine(clean); // Hello
str.TrimStart() and str.TrimEnd()
string s = " C# ";
string clean = s.TrimStart(); // clean = "C# "
21
String Operations Exercises in class
22
Building and Modifying Strings
Using the StringBuilder Class
23
StringBuilder: How Does It Work?
StringBuilder keeps a buffer memory, allocated in advance
Most operations use the buffer memory and do not allocate new objects
(Slide figure: a StringBuilder holding "Hello,C#!" with Length = 9 as the used buffer and Capacity = 15; the rest is unused buffer.)
24
Changing the Contents of a String
Use the System.Text.StringBuilder class for modifiable strings of characters:
public static string ReverseString(string s)
{
    StringBuilder sb = new StringBuilder();
    for (int i = s.Length - 1; i >= 0; i--)
    {
        sb.Append(s[i]);
    }
    return sb.ToString();
}
Use StringBuilder if you need to keep adding characters to a string
Introducing the StringBuffer Class
StringBuffer represents strings that can be modified and extended at run time. The following example creates three new String objects, and copies all the characters each time a new String is created:
String quote = "Fasten your seatbelts, ";
quote = quote + "it's going to be a bumpy night.";
It is more efficient to preallocate the amount of space required using the StringBuffer constructor, and to use its append() method, as follows:
StringBuffer quote = new StringBuffer(60); // allocate 60 chars
quote.append("Fasten your seatbelts, ");
quote.append(" it's going to be a bumpy night. ");
StringBuffer also provides a number of overloaded insert() methods for inserting various types of data at a particular location in the string buffer.
Instructor Note
The example in the slide uses StringBuffer to reverse the characters in a string. A StringBuffer object is created, with the same length as the string. The loop traverses the String parameter in reverse order and appends each of its characters to the StringBuffer object by using append(). The StringBuffer therefore holds a reverse copy of the String parameter. At the end of the method, a new String object is created from the StringBuffer object, and this String is returned from the method.
25
The StringBuilder Class
StringBuilder(int capacity) constructor allocates in advance buffer of size capacity Capacity holds the currently allocated space (in characters) this[int index] (indexer in C#) gives access to the char value at given position Length holds the length of the string in the buffer
26
The StringBuilder Class (2)
sb.Append(…) - appends a string or another object after the last character in the buffer
sb.Remove(int startIndex, int length) - removes the characters in a given range
sb.Insert(int index, string str) - inserts a given string (or object) at a given position
sb.Replace(string oldStr, string newStr) - replaces all occurrences of a substring
sb.ToString() - converts the StringBuilder to a String
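The members listed on these two slides can be exercised together in a short sketch. Repeat and EditInPlace are hypothetical helper names, the sample text is an assumption, and Repeat assumes a non-negative count.

```csharp
using System;
using System.Text;

public static class StringBuilderDemo
{
    // Hypothetical helper: builds s repeated count times using one preallocated buffer.
    // Assumes count >= 0.
    public static string Repeat(string s, int count)
    {
        var sb = new StringBuilder(s.Length * count); // capacity known in advance
        for (int i = 0; i < count; i++)
        {
            sb.Append(s);
        }
        return sb.ToString();
    }

    // Hypothetical helper: shows Insert, Replace, Remove and the indexer on one buffer.
    public static string EditInPlace()
    {
        var sb = new StringBuilder("Hello, C#!");
        sb.Insert(0, ">> ");          // ">> Hello, C#!"
        sb.Replace("C#", "SoftUni");  // ">> Hello, SoftUni!"
        sb.Remove(0, 3);              // "Hello, SoftUni!"
        sb[0] = 'h';                  // "hello, SoftUni!"
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Repeat("ab", 3)); // ababab
        Console.WriteLine(EditInPlace());   // hello, SoftUni!
    }
}
```

Each call modifies the same buffer, so no intermediate string objects are created until ToString() is called.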
27
Demo: String concatenation
Given the code below, try to optimize it so that it runs in under 10 seconds. Do not change the loop or the Convert.ToString() method
28
Regular Expressions
Groups, Quantifiers, Anchors
29
Character Classes
[nvj] - matches any character that is either n, v or j
Example text: node.js v0.12.2
[^abc] - matches any character that is not a, b or c
Example text: Abraham Lincoln
[0-9] - character range: matches any digit from 0 to 9
Example text: In 1519 Leonardo da Vinci died at the age of 67.
30
Character Classes (2)
\w - matches any word character (a-z, A-Z, 0-9, _)
\W - matches any non-word character (the opposite of \w)
\s - matches any white-space character
\S - matches any non-white-space character (the opposite of \s)
\d - matches any decimal digit
\D - matches any non-digit character (the opposite of \d)
Example text: aBcd 09_ &*^ Ю-Я
Note: in .NET, \w and \d also match Unicode letters and digits by default, so Ю and Я in the example text are word characters
31
Quantifiers
* - matches the previous element zero or more times
+ - matches the previous element one or more times
? - matches the previous element zero or one time
+ \+\d* => + \+\d+ => \+\d? => +
32
Character Escapes
Sometimes you will need to look for special characters like new lines or tabulations
Name: Peter
Phone:
Slide callouts: This is a “tab” (after Name:); Then we have a new line (after Peter)
We can use character escapes in our RegEx like this:
Name:\t\w+\nPhone:\s*\+\d+
33
Anchors
^ - the match must start at the beginning of the string or line
$ - the match must occur at the end of the string
Example - username validation pattern:
^\w{6,12}$
Test strings: jeff_butt, short, johnny, too_long_username
Note: Test one by one, $ asserts string end
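The username pattern from this slide can be tried directly with Regex.IsMatch. This is a minimal sketch; the helper name IsValidUsername is hypothetical, while the pattern and the four test strings come from the slide.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: true when the whole input is 6 to 12 word characters.
    public static bool IsValidUsername(string name)
    {
        return Regex.IsMatch(name, @"^\w{6,12}$");
    }

    public static void Main()
    {
        string[] candidates = { "jeff_butt", "short", "johnny", "too_long_username" };
        foreach (string candidate in candidates)
        {
            // Each candidate is tested on its own, as the slide note suggests.
            Console.WriteLine("{0}: {1}", candidate, IsValidUsername(candidate));
        }
        // jeff_butt: True
        // short: False
        // johnny: True
        // too_long_username: False
    }
}
```

Without the two anchors the pattern would also succeed on too_long_username, because any 6 to 12 consecutive word characters inside it would count as a match.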
34
Problem: Match full name
You are given a sequence of words
Find those that are full names
You can use RegExr or Regex101
35
Grouping Constructs
(subexpression) - captures the matched subexpression and assigns it a number
\d{2}-(\w{3})-\d{4} => 22-Jan-2015
(?:subexpression) - defines a non-capturing group
^(?:Hi|hello),\s*(\w+)$ => Hi, Peter
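To see what the groups in these two patterns capture, here is a small sketch using the patterns and sample texts from the slide. The helper names ExtractMonth and ExtractGreetedName are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical helper: returns the text captured by group 1 (the month), or null.
    public static string ExtractMonth(string text)
    {
        Match match = Regex.Match(text, @"\d{2}-(\w{3})-\d{4}");
        return match.Success ? match.Groups[1].Value : null;
    }

    // Hypothetical helper: (?:Hi|hello) is non-capturing, so the name is group 1.
    public static string ExtractGreetedName(string text)
    {
        Match match = Regex.Match(text, @"^(?:Hi|hello),\s*(\w+)$");
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractMonth("22-Jan-2015"));     // Jan
        Console.WriteLine(ExtractGreetedName("Hi, Peter")); // Peter
    }
}
```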
36
Backreference Constructs
37
Backreference Constructs
\number - matches the value of a numbered subexpression
\d{2}(-|\/)\d{2}\1\d{4} => 05/08/2016
\k<name> - matches the value of a named expression
\d{2}(?<del>-|\/)\d{2}\k<del>\d{4} =>
Example text: 05/08/2016
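The point of a backreference here is that the second delimiter must be the same character as the first. The sketch below checks that with both slide patterns; the ^ and $ anchors, the helper names, and the extra test dates are additions made for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // Hypothetical helper: numbered backreference (\1 repeats whatever group 1 matched).
    public static bool HasConsistentDelimiter(string date)
    {
        return Regex.IsMatch(date, @"^\d{2}(-|\/)\d{2}\1\d{4}$");
    }

    // Hypothetical helper: the same check with a named group and \k<del>.
    public static bool HasConsistentDelimiterNamed(string date)
    {
        return Regex.IsMatch(date, @"^\d{2}(?<del>-|\/)\d{2}\k<del>\d{4}$");
    }

    public static void Main()
    {
        Console.WriteLine(HasConsistentDelimiter("05/08/2016"));      // True
        Console.WriteLine(HasConsistentDelimiter("05-08-2016"));      // True
        Console.WriteLine(HasConsistentDelimiter("05-08/2016"));      // False (mixed delimiters)
        Console.WriteLine(HasConsistentDelimiterNamed("05/08/2016")); // True
    }
}
```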
38
Playing with RegEx Exercises in class
39
Regular Expressions
Using Built-In Regex Classes
40
Regex
C# supports a built-in regular expression class - Regex
Located in the System.Text.RegularExpressions namespace
string pattern
Regex regex = new Regex(pattern);
41
Validating String By Pattern
IsMatch(string text) - determines whether the text matches the pattern
string text = "Today is ";
string pattern
Regex regex = new Regex(pattern);
bool containsValidDate = regex.IsMatch(text);
Console.WriteLine(containsValidDate); // True
42
Checking for a Single Match
Match(string text) - returns the first match that corresponds to the pattern
string text = "Nakov: 123";
string pattern (\d+)";
Regex regex = new Regex(pattern);
Match match = regex.Match(text);
Console.WriteLine(match.Groups.Count); // 3
Console.WriteLine("Matched text: \"{0}\"", match.Groups[0]);
Console.WriteLine("Name: {0}", match.Groups[1]); // Nakov
Console.WriteLine("Number: {0}", match.Groups[2]); // 123
43
Checking for Matches
Matches(string text) - returns a collection of all matches (Match objects) that correspond to the pattern
string text = "Nakov: 123, Branson: 456";
string pattern (\d+)";
Regex regex = new Regex(pattern);
// The instance method takes only the text; the pattern is already stored in regex
MatchCollection matches = regex.Matches(text);
Console.WriteLine("Found {0} matches", matches.Count);
foreach (Match match in matches)
{
    Console.WriteLine("Name: {0}", match.Groups[1]);
}
// Found 2 matches
// Name: Nakov
// Name: Branson
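Because the pattern literal did not survive in the slide text, the following sketch uses an assumed pattern, (\w+): (\d+), which is not necessarily the one from the original slide. It shows iterating a MatchCollection and reading the captured groups; the helper name ExtractNames is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchesDemo
{
    // Hypothetical helper: collects group 1 (the name) from every match.
    public static List<string> ExtractNames(string text)
    {
        // Assumed pattern: a word, a colon and a space, then a run of digits.
        var regex = new Regex(@"(\w+): (\d+)");
        var names = new List<string>();
        foreach (Match match in regex.Matches(text))
        {
            names.Add(match.Groups[1].Value);
        }
        return names;
    }

    public static void Main()
    {
        List<string> names = ExtractNames("Nakov: 123, Branson: 456");
        Console.WriteLine("Found {0} matches", names.Count); // Found 2 matches
        Console.WriteLine(string.Join(", ", names));         // Nakov, Branson
    }
}
```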
44
Replacing With Regex
Replace(string text, string replacement) - replaces all strings that match the pattern with the provided replacement
string text = "Nakov: 123, Branson: 456";
string pattern
string replacement = "999";
Regex regex = new Regex(pattern);
string result = regex.Replace(text, replacement);
Console.WriteLine(result);
// Nakov: 999, Branson: 999
45
Problem: Replace <a> tag
You are given an HTML text. Replace all <a> tags in it with [URL].
Input:
<ul> <li> <a href=" </li> </ul>
Output:
<ul> <li> [URL href=" </li></ul>
46
Solution: Replace <a> tag
string text = Console.ReadLine();
while (text != "end")
{
    string pattern = @"<a.*?href.*?=(.*)>(.*?)<\/a>";
    string replace href=$1]$2[/URL]";
    string replaced = Regex.Replace(text, pattern, replace);
    Console.WriteLine(replaced);
    text = Console.ReadLine();
}
You can also try:
@"<a.*href=((?:.|\n)*?(?=>))>((?:.|\n)*?(?=<))<\/a>"
47
Splitting With Regex
Split(string text) - splits the text by the pattern
Returns string[]
string text = " ";
string pattern
string[] results = Regex.Split(text, pattern);
Console.WriteLine(string.Join(", ", results));
// 1, 2, 3, 4
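The input text and pattern are missing from the slide text above, so this sketch assumes numbers separated by runs of white space and the pattern \s+, which reproduces the slide's output. The helper name SplitOnWhitespace is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // Hypothetical helper: splits on one or more white-space characters.
    public static string[] SplitOnWhitespace(string text)
    {
        return Regex.Split(text, @"\s+");
    }

    public static void Main()
    {
        string text = "1   2 3      4"; // assumed input
        string[] results = SplitOnWhitespace(text);
        Console.WriteLine(string.Join(", ", results)); // 1, 2, 3, 4
    }
}
```

Unlike string.Split with single-character separators, the + quantifier treats a whole run of spaces as one delimiter, so no empty entries appear between the numbers. Leading or trailing white space would still produce an empty first or last element.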
48
Built-in RegEx Exercises in class
49
Summary
Strings are immutable sequences of characters
Changes to the string create a new object, instead of modifying the old one
StringBuilder offers good performance when a string is built up from many pieces
Regular expressions describe patterns for searching through text
Define special characters, operators and constructs for building complex patterns
50
Version control systems
51
License
This course (slides, examples, demos, videos, homework, etc.) is licensed under the "Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International" license
Attribution: this work may contain portions from
"Fundamentals of Computer Programming with C#" book by Svetlin Nakov & Co. under CC-BY-SA license
"C# Part I" course by Telerik Academy under CC-BY-NC-SA license
"C# Part II" course by Telerik Academy under CC-BY-NC-SA license
52
Free Trainings @ Software University
Software University Foundation – softuni.org
Software University – High-Quality Education, Profession and Job for Software Developers
softuni.bg
Software Facebook
facebook.com/SoftwareUniversity
Software YouTube
youtube.com/SoftwareUniversity
Software University Forums – forum.softuni.bg