CSCE 121:509-512 Introduction to Program Design and Concepts J. Michael Moore Spring 2015 Set 20: Regular Expressions 1.

Slides:



Advertisements
Similar presentations
Chapter 23 Text Processing Bjarne Stroustrup
Advertisements

“Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom.
National Constitution Day 1 st Amendment White Out! Activity.
Chapter 14 Perl-Compatible Regular Expressions Part 1.
Introduction to First Amendment Law. The First Amendment “Congress shall make no law respecting an establishment of religion, or prohibiting the free.
LING 388: Language and Computers Sandiway Fong Lecture 2: 8/23.
“TIPS” PRESENTATION BY BILL MULQUEEN MAY 16 & 17, 2000.
IS 1181 IS 118 Introduction to Development Tools Chapter 4 String Manipulation and Regular Expressions.
First Amendment of the United States Constitution (1791) “Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise.
The First Amendment. Actual Text Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging.
Constitution Sydney Werlein, Ali Voss, Brian Jones.
What you will learn today: 1 What is the Bill of Rights? 2 What does the 1 st Amendment to the Constitution say about Freedom of Speech? 3 What are Civil.
The First Amendment Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom.
UNIX Filters.
Regex Wildcards on steroids. Regular Expressions You’ve likely used the wildcard in windows search or coding (*), regular expressions take this to the.
Regular Expressions. String Matching The problem of finding a string that “looks kind of like …” is common  e.g. finding useful delimiters in a file,
The First Amendment.
Last Updated March 2006 Slide 1 Regular Expressions.
CSCE 121: Introduction to Program Design and Concepts J. Michael Moore Fall 2014 Set 11: More Input and Output 1 Based on slides created by Bjarne.
MODULE 3: RESPONSIBILITY. As responsible journalists, staffs have obligations. Legal decisions have affected students’ rights. Statement of policy can.
Regular Expressions Dr. Ralph D. Westfall May, 2011.
PHP Using Strings 1. Replacing substrings (replace certain parts of a document template; ex with client’s name etc) mixed str_replace (mixed $needle,
American Government and Politics (POLS 122) Professor Jonathan Day.
2.6 Protecting Individual Citizens 1 st & 4 th Amendments In Depth Government & Citizenship Timpanogos High School.
CIS 451: Regular Expressions Dr. Ralph D. Westfall January, 2009.
SIXTH GRADE WRITING CLASS “FREEDOM OF SPEECH” IN THE.
BANNED BOOKS. #1! 2CvlU.
Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of.
REGEX. Problems Have big text file, want to extract data – Phone numbers (503)
When you read a sentence, your mind breaks it into tokens—individual words and punctuation marks that convey meaning. Compilers also perform tokenization.
GREP. Whats Grep? Grep is a popular unix program that supports a special programming language for doing regular expressions The grammar in use for software.
The Bill of Rights. Congress shall make no law The Bill of Rights Congress shall make no law a) respecting an establishment of religion,
The Constitution and your First Amendment Rights.
Basics of Religious Rights. 1 st Amendment Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof;
Daily Agenda Eng IV 6/19/13. Bellwork Take 5 minutes to respond to the quote. We will share after that. For John…Solve World Hunger in 3 steps or less.
The first amendment What it is and how it affects American media today.
CGS – 4854 Summer 2012 Web Site Construction and Management Instructor: Francisco R. Ortega Chapter 5 Regular Expressions.
Operators Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Amendment a·mend·ment P Pronunciation Key ( -m nd m nt) n. Pronunciation Key 1. The act of changing for the better; improvement:
MODULE 3: RESPONSIBILITY Responsibility Student journalists on the yearbook staff should follow important legal and ethical GUIDELINES. AS RESPONSIBLE.
Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of.
The first amendment What it is and how it affects American journalism.
The 1 st Amendment. Brainstorm… Imagine you are in a club or a group and you have a super important message. You need as many people as possible to hear.
Amendment I Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech,
Chapter 23 Text Processing Bjarne Stroustrup
Civics. 1 st amendment Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the.
What does it mean to be an American? In one word … From Sam Chaltain’s First Amendment 101 Part 1 80:Video:1236.
Objective  The students will:  Understand the differences between the “establishment” clause and the “free exercise” clause. Agenda  BOR review  1.
LEA 2 Cours de civilisation américaine J. Kempf Americans and religion 1.Centrality in American life 2.An ambiguous separation of churches and State 3.The.
Lesson 4 String Manipulation. Lesson 4 In many applications you will need to do some kind of manipulation or parsing of strings, whether you are Attempting.
How does the 1 st Amendment apply today? SS8. Warm-Up: Re-write the following text in your own words… (pg. 34, left of your notebook) Congress shall make.
THE FIRST AMENDMENT EXPLAINED.
The First Amendment Journalism I Mr. Bruno. First Amendment to the Constitution Congress shall make no law respecting an establishment of religion, or.
Regular Expressions Upsorn Praphamontripong CS 1110
Perl-Compatible Regular Expressions Part 1
What is it and how does it affect American journalism?
The First Amendment.
1st Amendment Court Cases
Chapter 23 Text Processing
Chapter 23 Text Processing
Personal protections and liberties added to the Constitution for you!
Using Church Records for Historical and Genealogical Research
Chapter 23 Text Processing
Limiting Constitutional Rights: A Balancing Act
Americans and religion
The First Amendment Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom.
The First Amendment!.
REGEX.
Newspaper bhspioneerspirit.
Presentation transcript:

CSCE 121: Introduction to Program Design and Concepts J. Michael Moore Spring 2015 Set 20: Regular Expressions 1

CSCE 121: Set 20: Regular Expressions Text Processing 2 “all we know can be represented as text” – And often is Books, articles Transaction logs ( , phone, bank, sales, …) Web pages (even the layout instructions) Tables of figures (numbers) Graphics (vectors) Mail Programs Measurements Historical data Medical records … Amendment I Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the government for a redress of grievances.

CSCE 121: Set 20: Regular Expressions A problem: Read a ZIP code U.S. state abbreviation and ZIP code – two letters followed by five digits string s; while (cin>>s) { if (s.size()==7 && isletter(s[0]) && isletter(s[1]) && isdigit(s[2]) && isdigit(s[3]) && isdigit(s[4]) && isdigit(s[5]) && isdigit(s[6])) cout << "found " << s << '\n'; } Brittle, messy, unique code 3

CSCE 121: Set 20: Regular Expressions Problems with simple solution It’s verbose (4 lines, 8 function calls) We miss (intentionally?) every ZIP code number not separated from its context by whitespace – "TX77845", TX , and ATM77845 We miss (intentionally?) every ZIP code number with a space between the letters and the digits – TX We accept (intentionally?) every ZIP code number with the letters in lower case – tx77845 If we decided to look for a postal code in a different format we would have to completely rewrite the code – CB3 0DS, DK-8000 Arhus 4

CSCE 121: Set 20: Regular Expressions TX st try:wwddddd 2 nd (remember -1234):wwddddd-dddd What’s “special”? 3 rd :\w\w\d\d\d\d\d-\d\d\d\d 4 th (make counts explicit):\w2\d5-\d4 5 th (and “special”): \w{2}\d{5}-\d{4} But was optional? 6 th : \w{2}\d{5}(-\d{4})? We wanted an optional space after TX 7 th (invisible space): \w{2} ?\d{5}(-\d{4})? 8 th (make space visible): \w{2}\s?\d{5}(-\d{4})? 9 th (lots of space – or none):\w{2}\s*\d{5}(-\d{4})? 5

CSCE 121: Set 20: Regular Expressions Delimiters and Special Characters C++ strings mark beginning and end with " – What if we actually want to use the " symbol? – Escape… use \ before So use \" What if we actually want to use the \ symbol? – Use \\ Regular Expressions has it’s own rules – Put a regular expression in to a c++ string?? – Combine rules… 6

CSCE 121: Set 20: Regular Expressions Not standard C++ Regex is not part of c++ standard… yet Use external Libraries, e.g. Boost Usage varies slightly… check documentation 7

CSCE 121: Set 20: Regular Expressions #include using namespace std; int main() { ifstream in("file.txt");// input file if (!in) cerr << "no file\n"; regex pat ("\\w{2}\\s*\\d{5}(-\\d{4})?");// ZIP pattern // cout << "pattern: " << pat << '\n'; // printing of patterns is not C++11 // … } 8

CSCE 121: Set 20: Regular Expressions int lineno = 0; string line;// input buffer while (getline(in,line)) { ++lineno; smatch matches;// matched strings go here if (regex_search(line, matches, pat)) { cout << lineno << ": " << matches[0] << '\n'; // whole match if (1<matches.size() && matches[1].matched) cout << "\t: " << matches[1] << '\n‘; // sub-match } 9

CSCE 121: Set 20: Regular Expressions Results Input:address TX77845 ffff tx asasasaa ggg TX howdy zzz TX sss ggg TX cvzcv TX sdsas xxxTx77845xxx TX Output:pattern: "\w{2}\s*\d{5}(-\d{4})?" 1: TX : tx : TX : : TX : : Tx : TX :

CSCE 121: Set 20: Regular Expressions Regular Expression Syntax Regular expressions have a thorough theoretical foundation based on state machines – You can mess with the syntax, but not much with the semantics The syntax is terse, cryptic, boring, useful – Go learn it 11

CSCE 121: Set 20: Regular Expressions Characters with Special Meaning.Any single character (a ‘wildcard’) [Character class ( )Begin / end grouping \Next character has special meaning |Alternative (or) ^start of line; negation $end of line 12

CSCE 121: Set 20: Regular Expressions Repetition *zero or more +one or more ?Optional (zero or one) {n}exactly n times {n,}n or more times {n,m}at least n and at most m times 13

CSCE 121: Set 20: Regular Expressions Character Classes \da decimal digit \la lowercase letter \sa space (space, tab, etc.) \ua uppercase character \wa letter (a-z or A-Z) or digit (0-9) or an underscore (_) 14

CSCE 121: Set 20: Regular Expressions Character Classes \Dnot \d (a decimal digit) \Lnot \l (a lowercase letter) \Snot \s [a space (space, tab, etc.)] \Unot \u (a uppercase character) \Wnot \w [a letter (a-z or A-Z) or digit (0-9) or an underscore (_)] 15

CSCE 121: Set 20: Regular Expressions Examples – Xa{2,3}// Xaa Xaaa – Xb{2}// Xbb – Xc{2,}// Xcc Xccc Xcccc Xccccc … – \w{2}-\d{4,5}// \w is letter \d is digit – (\d*:)?(\d+) // 124: : – Subject: (FW:|Re:)?(.*) //. (dot) matches any character – [a-zA-Z] [a-zA-Z_0-9]*// identifier – [^aeiouy] // not an English vowel 16

CSCE 121: Set 20: Regular Expressions 17

CSCE 121: Set 20: Regular Expressions Acknowledgments Photo on title slide: paper-old / paper-old / Slides are based on those for the textbook: _text.ppt _text.ppt Regex cartoon: 18