Jordi Cortadella Department of Computer Science

Slides:



Advertisements
Similar presentations
Strings Testing for equality with strings.
Advertisements

Chapter 3 DATA: TYPES, CLASSES, AND OBJECTS. Chapter 3 Data Abstraction Abstract data types allow you to work with data without concern for how the data.
Character and String definitions, algorithms, library functions Characters and Strings.
C++ Programming: Program Design Including Data Structures, Third Edition Chapter 8: User-Defined Simple Data Types, Namespaces, and the string Type.
Chapter 7. 2 Objectives You should be able to describe: The string Class Character Manipulation Methods Exception Handling Input Data Validation Namespaces.
The Data Element. 2 Data type: A description of the set of values and the basic set of operations that can be applied to values of the type. Strong typing:
Introduction to Programming (in C++) Data types and visibility Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. Computer Science, UPC.
CPS120: Introduction to Computer Science Lecture 8.
The Data Element. 2 Data type: A description of the set of values and the basic set of operations that can be applied to values of the type. Strong typing:
1 Lab Session-III CSIT-120 Fall 2000 Revising Previous session Data input and output While loop Exercise Limits and Bounds Session III-B (starts on slide.
Elements of a C++ program 1. Review Algorithms describe how to solve a problem Structured English (pseudo-code) Programs form that can be translated into.
Reasoning with invariants Jordi Cortadella Department of Computer Science.
Data types and their representation Jordi Cortadella Department of Computer Science.
STRING Dong-Chul Kim BioMeCIS UTA 10/7/
Dr. Yang, QingXiong (with slides borrowed from Dr. Yuen, Joe) LT8: Characters and Strings CS2311 Computer Programming.
Chars and strings Jordi Cortadella Department of Computer Science.
Introduction to Programming (in C++) Data structures Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC.
Matrices Jordi Cortadella Department of Computer Science.
Numeric Types, Expressions, and Output ROBERT REAVES.
Prime numbers Jordi Cortadella Department of Computer Science.
CS1 Lesson 2 Introduction to C++ CS1 Lesson 2 -- John Cole1.
Program A computer program (also software, or just a program) is a sequence of instructions written in a sequence to perform a specified task with a computer.
Introduction to Programming (in C++) Algorithms on sequences. Reasoning about loops: Invariants. Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept.
Sequences Jordi Cortadella Department of Computer Science.
Search algorithms for vectors Jordi Cortadella Department of Computer Science.
CPS120: Introduction to Computer Science
Introduction to Programming (in C++) Loops Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC.
Data & Data Types & Simple Math Operation 1 Data and Data Type Standard I/O Simple Math operation.
Data structure design Jordi Cortadella Department of Computer Science.
C++ Programming: From Problem Analysis to Program Design, Fifth Edition Arrays.
Sudoku Jordi Cortadella Department of Computer Science.
First steps Jordi Cortadella Department of Computer Science.
C++ Programming: Program Design Including Data Structures, Fourth Edition Chapter 8: User-Defined Simple Data Types, Namespaces, and the string Type.
Characters and Strings. Characters  New primitive char  char letter; letter = ‘a’; char letter2 = ‘C’;  Because computers can only represent numbers,
CS31: Introduction to Computer Science I Discussion 1A 4/16/2010 Sungwon Yang
Introduction to Programming (in C++) Numerical algorithms Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC.
ICT Introduction to Programming Chapter 4 – Control Structures I.
CPS120: Introduction to Computer Science Variables and Constants.
Vectors Jordi Cortadella Department of Computer Science.
Characters and Strings
A FIRST BOOK OF C++ CHAPTER 14 THE STRING CLASS AND EXCEPTION HANDLING.
Instructor: Alexander Stoytchev CprE 185: Intro to Problem Solving (using C)
ICS102 Lecture 8 : Boolean Expressions King Fahd University of Petroleum & Minerals College of Computer Science & Engineering Information & Computer Science.
Chapter 4: Control Structures I (Selection). Objectives In this chapter, you will: – Learn about control structures – Examine relational operators – Discover.
 Type Called bool  Bool has only two possible values: True and False.
7 - Programming 7J, K, L, M, N, O – Handling Data.
C++ Programming: From Problem Analysis to Program Design, Fifth Edition Chapter 8: Namespaces, the class string, and User-Defined Simple Data Types.
Lecture 19 Strings and Regular Expressions
Reasoning with invariants
Data Transfer ASCII FILES.
Chapter 3 Branching Statements
Compiler Design First Lecture.
COMP3221: Microprocessors and Embedded Systems
Primitive Types Vs. Reference Types, Strings, Enumerations
Logical Operators & Truth Tables.
String Encodings and Penny Math
Jordi Cortadella Department of Computer Science
CEV208 Computer Programming
Chapter 4 Selection.
Search algorithms for vectors
Jordi Cortadella Department of Computer Science
The Data Element.
Introduction to Computer Science
CS150 Introduction to Computer Science 1
String Encodings and Penny Math
The Data Element.
In your notebook… Purpose: What types of data can we store in C
An Introduction to STL.
3.0 - Design A software design specifies how a program will accomplish its requirements A design includes one or more algorithms to accomplish its goal.
COMPUTING.
Presentation transcript:

Jordi Cortadella Department of Computer Science Chars and strings Jordi Cortadella Department of Computer Science

Representation of characters (char) Character (char). Represent letters, digits, punctuation marks and control characters. Every character is represented by a code (integer number). There are various standard codes: American Standard Code for Information Interchange (ASCII) Unicode (wider than ASCII) Some characters are grouped by families (uppercase letters, lowercase letters and digits). Characters in a family have consecutive codes: 'a'…'z', 'A'…'Z', '0'…'9' Operators: given the integer encoding, arithmetic operators can be used, even though only addition and subtraction make sense, e.g. 'C'+1='D', 'm'+4='q', 'G'-1='F'. Introduction to Programming © Dept. CS, UPC

Representation of characters (char) ASCII code Introduction to Programming © Dept. CS, UPC

Strings Represent sequences of characters. Examples "Hello, world!", "This is a string", ":-)", "3.1416" "" is the empty string (no characters) 'A' is a character, "A" is a string Introduction to Programming © Dept. CS, UPC

Strings Strings can be treated as vectors of characters. Variables can be declared as follows: string s1; string s2 = “abc”; string s3(10,'x'); Note: use #include <string> in the header of a program using strings. Introduction to Programming © Dept. CS, UPC

Strings Examples of the operations we can do on strings: Comparisons: ==, !=, <, >, <=, >= Order relation assuming lexicographical order. Access to an element of the string: s3[i] Length of a string: s.size() Introduction to Programming © Dept. CS, UPC

String matching String x appears as a substring of string y at position i if y[i…i+x.size()-1] = x Example: “tree” is the substring of “the tree there” at position 4. Problem: given x and y, return the smallest i such that x is the substring of y at position i. Return -1 if x does not appear in y. Introduction to Programming © Dept. CS, UPC

String matching Solution: search for such i For every i, check whether x = y[i..i+x.size()-1] In turn, this is a search for a possible mismatch between x and y: a position j where x[j]  y[i+j] If there is no mismatch, we have found the desired i. As soon as a mismatch is found, we proceed to the next i. Introduction to Programming © Dept. CS, UPC

String matching // Returns the smallest i such that x == y[i..i+x.size()-1]. // Returns -1 if x does not appear in y int substring(const string& x, const string& y); y: X: Introduction to Programming © Dept. CS, UPC

String matching y: X: i j // Returns the smallest i such that x == y[i..i+x.size()-1]. // Returns -1 if x does not appear in y int substring(const string& x, const string& y); i y: X: j Introduction to Programming © Dept. CS, UPC

Beware of the lazy evaluation when j == x.size() String matching int substring(const string& x, const string& y) { // Inv: x is not a substring of y at positions 0..i-1 for (int i = 0; i <= y.size() – x.size(); ++i) { int j = 0; // Inv: x[0..j-1] == y[i..i+j-1] while (j < x.size() and x[j] == y[i + j]) ++j; if (j == x.size()) return i; } return -1; } Beware of the lazy evaluation when j == x.size() Introduction to Programming © Dept. CS, UPC

String matching A more compact solution with only one loop (but less redable). int substring(const string& x, const string& y) { int i = 0, j = 0; // Inv: x[0..j-1] == y[i..i+j-1] // and x does not appear in y in positions 0..i-1 while (i + x.size() <= y.size() and j < x.size()) { if (x[j] == y[i + j]) ++j; else { j = 0; ++i; } } if (j == x.size()) return i; return -1; } Introduction to Programming © Dept. CS, UPC

Anagrams An anagram is a pair of sentences that contain exactly the same letters, even though they may appear in a different order. Non-alphabetic characters are ignored. Example: AVE MARIA, GRATIA PLENA, DOMINUS TECUM VIRGO SERENA, PIA, MUNDA ET IMMACULATA Introduction to Programming © Dept. CS, UPC

Anagrams Design a function that checks that two strings are an anagram. The function has the following specification: // Returns true if s1 and s2 are an anagram, // and false otherwise. bool anagram(const string& s1, const string& s2); Introduction to Programming © Dept. CS, UPC

Anagrams A possible strategy for solving the problem could be as follows: First, we read the first sentence and count the number of occurrences of each letter. The occurrences can be stored in a vector. Next, we read the second sentence and discount the appearance of each letter. If a counter becomes negative, the sentences are not an anagram. At the end, all occurrences must be zero. Introduction to Programming © Dept. CS, UPC

Anagrams bool anagram(const string& s1, const string& s2) { const int N = int('z') - int('a') + 1; vector<int> count(N, 0); // Read the first sentence for (int i = 0; i < s1.size(); ++i) { char c = s1[i]; if (c >= 'a' and c <= 'z') ++count[int(c)-int('a')]; else if (c >= 'A' and c <= 'Z') ++count[int(c)-int('A')]; } // Read the second sentence for (int i = 0; i < s2.size(); ++i) { char c = s2[i]; if (c >= 'a' and c <= 'z') c = c – int('a') + int('A'); if (c >= 'A' and c <= 'Z') { // Discount if it is a letter int k = int(c) - int(‘A’); --count[k]; if (count[k] < 0) return false; } } // Check that the two sentences are an anagram for (int i = 0; i < N; ++i) { if (count[i] != 0) return false; } return true; } Introduction to Programming © Dept. CS, UPC

Summary Strings can be accessed as arrays of chars. String matching (or string searching) is a very frequent operation in file editing and web searching. Recommendation: design algorithms that are independent from the specific encoding of chars. Introduction to Programming © Dept. CS, UPC