Lecture 19 Strings and Regular Expressions


Lecture 19: Strings and Regular Expressions (D&D Chapter 14)

Goals

By the end of this lesson, you should:
- Be able to compare strings and extract substrings
- Be able to use regular expressions to check whether a string contains certain patterns
- Be able to use regular expression-based pattern replacement

Substring extraction

Lecture outline: Substring extraction, String comparison, Search, Regular expressions, Pattern and Match, Summary

    import java.util.Scanner;
    import java.io.PrintStream;

    public class TestRoomDemo {
        public static void main(String[] args) {
            PrintStream p = System.out; // Lazy programmers...
            Scanner s = new Scanner(System.in);
            p.println("Please enter your first name: ");
            String firstName = s.next();
            p.println("Please enter your last name: ");
            String lastName = s.next();
            String upi = firstName.substring(0, 1) + lastName.substring(0, 3) + "123";
            upi = upi.toLowerCase();
            p.println("Your UPI could be: " + upi);
            …

The second parameter to substring() is the index of the character after the last character we want, so lastName.substring(0, 3) returns the characters at indices 0, 1 and 2. Note that this call throws a StringIndexOutOfBoundsException if the last name has fewer than three characters.
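As a minimal sketch of the substring() rule above, the following wraps the UPI construction in a helper. The class and method names (UpiSketch, makeUpi) are hypothetical and not part of the lecture code, and the guard for short last names is an added assumption about how one might handle them. It also assumes a non-empty first name.

```java
// Hypothetical helper for illustration; not part of the lecture code.
class UpiSketch {
    // Builds a UPI-like string: first initial + up to three letters of the
    // last name + the fixed suffix "123", all in lowercase.
    static String makeUpi(String firstName, String lastName) {
        // substring(begin, end) excludes the character at index 'end'.
        String initial = firstName.substring(0, 1);
        // Guard (assumption): avoid the exception substring(0, 3) would
        // throw for last names shorter than three characters.
        int end = Math.min(3, lastName.length());
        return (initial + lastName.substring(0, end) + "123").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(makeUpi("William", "Hsu"));
        System.out.println(makeUpi("Ada", "Ng"));
    }
}
```

Taking the minimum of 3 and the name length keeps the call safe without changing the result for longer names.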

String comparison

    if (upi.compareTo("whsu014") == 0) {
        p.println("You're William and meant to supervise MLT3");
    } else if (upi.compareTo("csee015") <= 0) {
        p.println("You should have been in 206-220.");
    }
    …
    else if (upi.compareTo("vwon320") >= 0) {
        p.println("You should have been in 423-348.");
    } else {
        p.println("Were you really meant to sit the test?");
    }

The compareTo() method returns 0 if its parameter equals the string that the method is called on, a negative value if the string precedes the parameter, and a positive value if the string comes after the parameter. The ordering is lexicographic by character code rather than strictly alphabetical: all uppercase letters sort before all lowercase letters, which is one reason to normalise case (for example with toLowerCase()) before comparing.
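The sign convention of compareTo() is easy to get backwards, so here is a small sketch that turns the result into a word. The class and method names (CompareSketch, order) are made up for illustration.

```java
// Hypothetical helper for illustration; not part of the lecture code.
class CompareSketch {
    // Says where 'a' sorts relative to 'b', based on the sign of compareTo().
    static String order(String a, String b) {
        int c = a.compareTo(b);
        if (c < 0) return "before"; // a precedes b
        if (c > 0) return "after";  // a follows b
        return "same";              // a equals b
    }

    public static void main(String[] args) {
        System.out.println(order("csee015", "whsu014"));
        System.out.println(order("whsu014", "whsu014"));
        // Uppercase letters have smaller character codes than lowercase ones.
        System.out.println(order("Zed", "apple"));
    }
}
```

If case should not matter, String also offers compareToIgnoreCase(), which would place "Zed" after "apple".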

Searching in strings

    …
    else if (upi.compareTo("jpet145") <= 0) {
        p.print("You should have been in ");
        String theatre = "HSB370/201N-370";
        int slashPos = theatre.indexOf("/");
        p.println(theatre.substring(slashPos + 1));
    }

The indexOf() method finds the first occurrence of its argument in the string that we invoke the method on, and returns -1 if there is none. If we need a subsequent occurrence, we can add a second (int) parameter that gives the index at which to start the search.

Note that we can also use the substring() method with only one parameter. In this case, we get the substring from the given position to the end of the string.
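The two-parameter form of indexOf() and the one-parameter form of substring() can be sketched as follows. The helper names (secondIndexOf, afterSlash) are hypothetical, and returning the whole string when there is no slash is an assumption made for this example.

```java
// Hypothetical helpers for illustration; not part of the lecture code.
class IndexOfSketch {
    // Returns the index of the second occurrence of 'needle', or -1 if
    // there are fewer than two occurrences.
    static int secondIndexOf(String text, String needle) {
        int first = text.indexOf(needle);
        if (first < 0) {
            return -1;
        }
        // Start the second search just past the start of the first match.
        return text.indexOf(needle, first + 1);
    }

    // Returns everything after the first "/", or the whole string if
    // there is no "/" (an assumption for this sketch).
    static String afterSlash(String s) {
        int slashPos = s.indexOf("/");
        return slashPos < 0 ? s : s.substring(slashPos + 1);
    }

    public static void main(String[] args) {
        System.out.println(secondIndexOf("a/b/c", "/"));
        System.out.println(afterSlash("HSB370/201N-370"));
    }
}
```

Checking for -1 before calling substring() matters: with no slash present, slashPos + 1 would be 0 and the call would silently return the whole string, which may or may not be what is intended.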

Regular expressions

    public static void main(String[] args) {
        PrintStream p = System.out; // Lazy programmers...
        Scanner s = new Scanner(System.in);
        p.println("Please enter a UPI to check: ");
        String upiCandidate = s.next();
        p.println("You entered \"" + upiCandidate + "\".");
        if (upiCandidate.matches("^[a-z]{3,4}\\d{3}$")) {
            p.println("Looks like a UPI to me!");
        } else {
            p.println("Sorry, that's not a UPI.");
        }
    }

The matches() method returns true if the regular expression that is passed as the parameter matches the string that the method is called on as a whole, and false if it doesn't.

Regular expressions (regexes) are a very powerful way of checking the format of strings, or finding whether a string contains a particular type of substring.

Regular expressions

Regular expressions in Java are just packaged in strings. This means that we also need to escape backslashes (\):

    …
    if (upiCandidate.matches("^[a-z]{3,4}\\d{3}$")) {

So the actual regular expression here is:

    ^[a-z]{3,4}\d{3}$

Translated into English, this means:
- ^: the pattern to match must start at the beginning of the string (note: redundant with matches(), which always has to match the whole string)
- [a-z]: a lowercase character from a to z
- {3,4}: the previous character pattern (or subpattern in parentheses) occurring 3 to 4 times
- \d: a digit from 0 to 9
- {3}: the previous character pattern (or subpattern in parentheses) occurring 3 times
- $: the pattern to match must end at the end of the string (note: likewise redundant with matches())
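To illustrate why the anchors are redundant with matches(), the sketch below uses the UPI pattern without ^ and $ and contrasts whole-string matching with searching via Matcher.find(). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helpers for illustration; not part of the lecture code.
class AnchorSketch {
    // The UPI pattern from the slide, deliberately without ^ and $.
    private static final Pattern UPI = Pattern.compile("[a-z]{3,4}\\d{3}");

    // String.matches() must match the whole string, anchors or not.
    static boolean isUpi(String s) {
        return s.matches("[a-z]{3,4}\\d{3}");
    }

    // Matcher.find() looks for the pattern anywhere inside the string.
    static boolean containsUpi(String s) {
        return UPI.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isUpi("whsu014"));
        System.out.println(isUpi("id: whsu014"));
        System.out.println(containsUpi("id: whsu014"));
    }
}
```

The anchors do matter with find(): there, ^ and $ are what restrict a match to the whole string.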

Useful regex examples

A NZ car number plate:

    ^[A-Z]{2,3}\d{1,4}$

Note this also matches plates such as ABC1234, which it really shouldn't. Better:

    ^(([A-Z]{2}\d{1,4})|([A-Z]{3}\d{1,3}))$

The parentheses form subpatterns, and the | means OR: start of the string, followed by a parenthesised pattern, followed by the end of the string. The parenthesised pattern consists of one of two alternative subpatterns, also parenthesised:
- Two uppercase characters followed by 1-4 digits, OR
- Three uppercase characters followed by 1-3 digits
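The difference between the two number plate patterns can be checked with a short sketch. The class, constant and method names are hypothetical; the two regexes are the ones from the slide, with backslashes doubled for the Java string literals.

```java
// Hypothetical helpers for illustration; not part of the lecture code.
class PlateSketch {
    // First attempt: 2-3 uppercase letters, then 1-4 digits.
    static final String LOOSE = "^[A-Z]{2,3}\\d{1,4}$";
    // Improved: either 2 letters + 1-4 digits, or 3 letters + 1-3 digits.
    static final String STRICT = "^(([A-Z]{2}\\d{1,4})|([A-Z]{3}\\d{1,3}))$";

    static boolean looseMatch(String plate) {
        return plate.matches(LOOSE);
    }

    static boolean strictMatch(String plate) {
        return plate.matches(STRICT);
    }

    public static void main(String[] args) {
        // ABC1234 slips through the first pattern but not the second.
        System.out.println(looseMatch("ABC1234") + " " + strictMatch("ABC1234"));
        System.out.println(strictMatch("AB1234") + " " + strictMatch("ABC123"));
    }
}
```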

Useful regex examples

A COMPSCI course number:

    ^COMPSCI\d{3}$

An IPv4 network address (four integer numbers between 0 and 255 separated by dots):

    ^(\d{1,3}\.){3}\d{1,3}$

This also matches nonsensical addresses such as 999.000.0.555 though. Better (all in one line):

    ^((\d|([1-9]\d)|(1\d{2})|(2[0-4]\d)|(25[0-5]))\.){3}(\d|([1-9]\d)|(1\d{2})|(2[0-4]\d)|(25[0-5]))$

A \. means "match a dot". We need a backslash escape here because a dot on its own means "match any character".
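The long IPv4 pattern is easier to read when the repeated octet subpattern is factored out into its own string. The sketch below builds the same regex as on the slide from a hypothetical OCTET constant; the class and method names are also made up for illustration.

```java
// Hypothetical helpers for illustration; not part of the lecture code.
class Ipv4Sketch {
    // One number from 0 to 255: 0-9, 10-99, 100-199, 200-249 or 250-255.
    static final String OCTET = "(\\d|([1-9]\\d)|(1\\d{2})|(2[0-4]\\d)|(25[0-5]))";
    // Three "octet plus dot" groups followed by a final octet.
    static final String STRICT = "^(" + OCTET + "\\.){3}" + OCTET + "$";
    // The simpler pattern, which accepts any 1-3 digits per part.
    static final String LOOSE = "^(\\d{1,3}\\.){3}\\d{1,3}$";

    static boolean looseMatch(String address) {
        return address.matches(LOOSE);
    }

    static boolean strictMatch(String address) {
        return address.matches(STRICT);
    }

    public static void main(String[] args) {
        System.out.println(strictMatch("192.168.0.1"));
        System.out.println(looseMatch("999.000.0.555") + " " + strictMatch("999.000.0.555"));
    }
}
```

Because Java regexes are ordinary strings, concatenation like this is a simple way to avoid typing the same subpattern twice.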

Regex multipliers

A "?" means that the previous character / parenthesised pattern may occur 0 or 1 time. Match "user" or "users":

    ^users?$

A "+" means "one or more". Match a non-empty string:

    .+

A "*" means "zero or more times", and a "[^x]" means "anything but x". Match a pair of curly braces in a string:

    \{[^\}]*\}

Backslash escapes are generally required for characters that are part of regular expression syntax, but may often be omitted when the syntax element would make no sense in the position otherwise. E.g., here the syntax has no opening curly brace, so we need not worry about escaping closing braces:

    \{[^}]*}
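The three multipliers can be tried out with a short sketch. The class and method names are hypothetical, and the curly brace example uses the fully escaped form of the pattern to stay on the safe side.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers for illustration; not part of the lecture code.
class QuantifierSketch {
    // "?" makes the trailing "s" optional.
    static boolean isUserOrUsers(String s) {
        return s.matches("^users?$");
    }

    // "+" requires at least one character.
    static boolean isNonEmpty(String s) {
        return s.matches(".+");
    }

    // Returns the first "{...}" section found in the text, or null if none.
    static String firstBraces(String text) {
        // "[^\\}]*" consumes any run of characters that are not "}".
        Matcher m = Pattern.compile("\\{[^\\}]*\\}").matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isUserOrUsers("users"));
        System.out.println(isNonEmpty(""));
        System.out.println(firstBraces("int f() { return 1; } // {x}"));
    }
}
```

Note that find() is used for the braces because the pattern should match a part of the string, whereas matches() would require the whole string to be a brace pair.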

Regex macros

We have already met "\d", which represents "a digit". There are more:
- \n, \t, \r are the usual backslash escapes for newline, tab and carriage return.
- \w is a "word character": by default any ASCII letter, digit or underscore (roughly what you might find in a Java variable name)
- \s is any whitespace character
- \D means "any character that is not a digit"
- \W means "any character that is not a word character"
- \S means "any character that is not whitespace"
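A few everyday uses of these macros, as a rough sketch. The class and method names are hypothetical, and the sample phone number is made up for illustration.

```java
// Hypothetical helpers for illustration; not part of the lecture code.
class MacroSketch {
    // \D matches every non-digit, so replacing it with "" keeps digits only.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // \s+ matches any run of whitespace (spaces, tabs, newlines).
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // \w+ matches one or more letters, digits or underscores.
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("(503) 555-0100"));
        System.out.println(words("  a \t b\nc ").length);
        System.out.println(isWord("first_name2") + " " + isWord("first-name"));
    }
}
```

As with \d, each macro needs a doubled backslash inside a Java string literal.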

Regex character classes

We have already met these in the UPI and IP address examples. Further examples:
- [a-zA-Z]: any upper- or lowercase alphabet character
- [a-c]: any lowercase character a to c
- [^xyz]: anything but x, y, or z
- [bcdfjlqsvxyz]: any character listed (you can use this class to see whether a word could be from te reo Māori, where these letters don't occur)
- [-abc]: a, b, c or a hyphen
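The letter check mentioned for [bcdfjlqsvxyz] could be sketched like this. The method name is hypothetical, and the test is only the rough heuristic described above (it rules words out; it cannot confirm that a word is from te reo Māori).

```java
// Hypothetical helpers for illustration; not part of the lecture code.
class CharClassSketch {
    // Rough heuristic: a word containing any of the listed letters is
    // ruled out; everything else merely "could be" a match.
    static boolean couldBeTeReoWord(String word) {
        // ".*" on both sides lets the class match anywhere in the word.
        return !word.toLowerCase().matches(".*[bcdfjlqsvxyz].*");
    }

    // [a-zA-Z]+ accepts only unaccented upper- and lowercase letters.
    static boolean isAsciiLetters(String s) {
        return s.matches("[a-zA-Z]+");
    }

    public static void main(String[] args) {
        System.out.println(couldBeTeReoWord("Aroha"));
        System.out.println(couldBeTeReoWord("jazz"));
        // A hyphen listed first in a class is a literal hyphen.
        System.out.println("-".matches("[-abc]"));
    }
}
```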

More on regexes

This was just a rough introduction. More under: https://docs.oracle.com/javase/tutorial/essential/regex/index.html

Test your own regexes on your own string examples with the RegexChecker lecture example.

Regex replaceAll()

    String input = "We have 4,999 apples at only $4.99 a kg. …";
    String newPrice = "5.49";
    input = input.replaceAll("4\\.99", newPrice);
    System.out.println(input);

The replaceAll() method returns a string in which all substrings matching the regular expression in the first parameter are replaced by the string in the second parameter. The dot is escaped here on purpose: an unescaped dot matches any character, so the pattern 4.99 would also match the "4,99" in "4,999". See also replaceFirst().
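The effect of the escaped dot can be seen by running both variants side by side. This is a sketch with hypothetical method names, using a shortened version of the slide's input string. The last method illustrates a related pitfall: "$" and "\" have special meaning in the replacement string, and Matcher.quoteReplacement() is one way to use them literally.

```java
import java.util.regex.Matcher;

// Hypothetical helpers for illustration; not part of the lecture code.
class ReplaceSketch {
    // Escaped dot: only a literal "4.99" is replaced.
    static String escapedDot(String s) {
        return s.replaceAll("4\\.99", "5.49");
    }

    // Unescaped dot: "." matches any character, so "4,99" is replaced too.
    static String unescapedDot(String s) {
        return s.replaceAll("4.99", "5.49");
    }

    // A "$" in the replacement would be read as a group reference, so the
    // replacement text is quoted first.
    static String withDollarSign(String s) {
        return s.replaceAll("4\\.99", Matcher.quoteReplacement("$5.49"));
    }

    public static void main(String[] args) {
        String input = "We have 4,999 apples at only $4.99 a kg.";
        System.out.println(escapedDot(input));
        System.out.println(unescapedDot(input));
        System.out.println(withDollarSign("now 4.99"));
    }
}
```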

Pattern and Matcher

A regular expression specified as a string needs to be compiled into a Pattern object before it can be used. Together with the string that is to be matched, the Pattern is then used to generate a Matcher object that takes care of the actual matching.

In the case of the matches() and replaceAll() methods etc., the methods perform these two steps internally for us. If we need to match multiple times with the same expression, this is inefficient because the expression is recompiled on every call. In these cases, it is better to pre-compile the expression into a Pattern object and re-use it.

A Pattern object also allows more flexible matching with various flags that let us modify the matching behaviour.
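A minimal sketch of the compile-once approach, reusing the UPI pattern from earlier in the lecture. The class and method names are hypothetical, and the CASE_INSENSITIVE flag is added only to show how a flag changes matching behaviour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helpers for illustration; not part of the lecture code.
class PatternSketch {
    // Compiled once and shared by every call; the flag makes [a-z] accept
    // uppercase letters as well.
    private static final Pattern UPI =
            Pattern.compile("[a-z]{3,4}\\d{3}", Pattern.CASE_INSENSITIVE);

    // Counts the candidates that are a UPI as a whole.
    static int countUpis(List<String> candidates) {
        int count = 0;
        for (String candidate : candidates) {
            // A fresh Matcher per input string; the Pattern is reused.
            if (UPI.matcher(candidate).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("whsu014", "WHSU014", "ab12", "csee015x");
        System.out.println(countUpis(inputs));
    }
}
```

Pattern objects are immutable and can be shared freely; Matcher objects hold per-match state, which is why a new one is created for each input string.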

What do we know

- We can extract substrings with substring(), find substrings with indexOf(), and compare strings lexicographically with compareTo().
- Regular expressions are a powerful way to search for and manipulate complex patterns in strings.
- In regular expression syntax, syntax characters must be backslash-escaped if they are meant to represent their literal character. In a Java string, the backslash from the escape needs a second backslash!
- If we want to use a regular expression many times, we should use a Pattern object.

Resources & Homework

- D&D Chapter 14
- https://docs.oracle.com/javase/tutorial/essential/regex/
- https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

Homework: Write a Java program that takes a String input from the console and checks whether it is a UPI (3 to 4 letters followed by 3 digits), an AUID (student ID number, 7 or 9 digits), a name, or an e-mail address. Names for this purpose can contain any letters from the English alphabet, apostrophes or hyphens between letters, and spaces between parts of the name.

Next Lecture File I/O (Chapter 15)