1
ISBN 0-321-19362-8
Chapter 6: Data Types
Character Strings
Pattern Matching
2
Copyright © 2004 Pearson Addison-Wesley. All rights reserved.
Character String Types
Values are sequences of characters
Design issues:
1. Is it a primitive type or just a special kind of array?
2. How is it stored in memory?
3. Is the length of a string variable static or dynamic?
3
Character String Operations
Assignment
Comparison (==, >, etc.)
Catenation
  – Sometimes an operator is provided (+ in Java, . in Perl)
  – Some languages have a repetition operator (x in Perl, * in Python and Ruby)
Substring reference
Pattern matching
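As a rough illustration, the operations listed on this slide can be sketched in Python, one of the languages the slide mentions. The variable names are invented for the example, and other languages spell these operations differently.

```python
import re

first = "data"                      # assignment
second = "types"

is_equal = first == second          # comparison for equality
comes_first = first < second        # lexicographic comparison

joined = first + " " + second       # catenation with the + operator
repeated = "ab" * 3                 # repetition with the * operator
piece = joined[5:10]                # substring reference via slicing

# Pattern matching goes through the standard re module in Python.
has_digit = re.search(r"\d", joined) is not None
```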
4
Actual String Implementations
Pascal
  – Not primitive; assignment and comparison only (of packed arrays)
Ada, FORTRAN 90, and BASIC
  – Somewhat primitive
  – Assignment, comparison, catenation, substring reference
  – FORTRAN has an intrinsic for pattern matching
  – Ada code:
      N := N1 & N2   (catenation)
      N(2..4)        (substring reference)
5
Character String Implementations
C and C++
  – Not primitive: implemented as arrays of characters terminated by a null character
  – Use char arrays and a library of functions that provide operations
SNOBOL4 (a string manipulation language)
  – Primitive
  – Many operations, including elaborate pattern matching
6
Character String Implementations
Java: String class (not arrays of char)
  – Objects cannot be changed (immutable)
  – StringBuffer is a class for changeable string objects
JavaScript, Ruby: strings are objects with many operations
Perl, PHP: strings are primitive
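The split between immutable string objects and a separate changeable buffer class can be sketched in Python, whose str type is also immutable. This is only an analogy: io.StringIO is used here as a stand-in for the role StringBuffer plays in Java, and the names are made up for the example.

```python
import io

s = "string"
try:
    s[0] = "S"          # str objects are immutable, so this raises TypeError
    mutated = True
except TypeError:
    mutated = False

# io.StringIO acts as a changeable text buffer that can be appended to
# without creating a new string object for every step.
buf = io.StringIO()
for word in ("immutable", "vs", "mutable"):
    buf.write(word)
    buf.write(" ")
result = buf.getvalue().rstrip()
```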
7
String Length Options
1. Static - FORTRAN 77, Ada, COBOL
   e.g. (FORTRAN 90) CHARACTER (LEN = 15) NAME;
2. Limited Dynamic Length - C and C++
   actual length is indicated by a null character
3. Dynamic - SNOBOL4, Perl, JavaScript
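The limited dynamic option can be simulated with a minimal sketch: a fixed-capacity buffer whose actual length is found by scanning for a null byte, in the spirit of C's strlen. CAPACITY, store, and length are hypothetical names chosen for the illustration, not part of any library.

```python
CAPACITY = 8  # hypothetical fixed buffer size, decided when the buffer is created

def store(text: str) -> bytearray:
    """Copy text into a fixed-size buffer, leaving a null byte as terminator."""
    data = text.encode("ascii")
    if len(data) > CAPACITY - 1:          # one byte is reserved for the terminator
        raise ValueError("string exceeds the declared maximum length")
    buf = bytearray(CAPACITY)             # all bytes start as 0
    buf[:len(data)] = data
    return buf

def length(buf: bytearray) -> int:
    """Scan for the first null byte; its index is the current length."""
    return buf.index(0)
```

The buffer never grows, so the length can vary only between 0 and CAPACITY - 1, and no separate length field is stored.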
8
Evaluation of String Types
Aid to writability
As a primitive type with static length, they are inexpensive to provide, so why not have them?
Dynamic length is nice, but is it worth the expense?
9
Character String Implementation
Static length: compile-time descriptor
Limited dynamic length: may need a run-time descriptor for length (but not in C and C++)
Dynamic length: needs a run-time descriptor; allocation/deallocation is the biggest implementation problem
10
Character String Descriptors
Compile-time descriptor for static strings
Run-time descriptor for limited dynamic strings
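The slide's figures did not survive in this transcript, so the following is only a sketch of what the two descriptors might hold. The field names (length, address, max_length, current_length) are assumptions based on the usual textbook layout, not a reproduction of the original figures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StaticStringDescriptor:
    """Needed only at compile time: the length never changes."""
    length: int
    address: int

@dataclass
class LimitedDynamicStringDescriptor:
    """Kept at run time: the current length varies up to a fixed maximum."""
    max_length: int
    current_length: int
    address: int

    def set_length(self, new_length: int) -> None:
        # Reject any length outside the bound fixed at declaration.
        if not 0 <= new_length <= self.max_length:
            raise ValueError("length outside the declared bound")
        self.current_length = new_length
```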
11
Character String Interpolation
Two literal representations of strings in many scripting languages
  – Single-quoted strings are literals
      Every character inside is stored as written. (In some languages, a few characters may be treated specially.)
      These are like the double-quoted strings in Java: no variable names are expanded.
  – Double-quoted strings are interpolated
      Special characters have their regular meaning unless they have a backslash in front of them.
      Variable names are expanded, that is, replaced by the value of the variable.
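Python does not tie interpolation to the quote character, so the closest analogue is a raw literal versus an f-string. The sketch below uses that substitution to show the same contrast; it is not how Perl or PHP spell it.

```python
name = "World"

# Raw literal: every character is stored as written, including { } and the backslash.
literal = r'Hello, {name}\n'

# f-string: {name} is replaced by the variable's value and \n becomes a newline.
interpolated = f"Hello, {name}\n"
```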
12
Pattern Matching
A useful operation for strings
Usually based on regular expressions
Some languages have pattern matching built into the language syntax (Perl, Ruby, …)
Other languages implement pattern matching via library modules or classes
  – Python has the re module
  – Java has Pattern and Matcher classes
13
Recursive Definition of Regular Expressions
Individual terminals are regular expressions
If a and b are regular expressions, so are
  – a | b    choice
  – ab       sequence
  – (a)      grouping
  – a*       zero or more repetitions
Nothing else is a regular expression
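The recursive definition maps directly onto a recursive data type. The sketch below is one possible encoding, with hypothetical class names (Terminal, Choice, Sequence, Star) and a deliberately simple matcher; grouping needs no class of its own because it is expressed by nesting.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Terminal:
    symbol: str          # a single character

@dataclass(frozen=True)
class Choice:
    left: object         # a | b
    right: object

@dataclass(frozen=True)
class Sequence:
    first: object        # ab
    second: object

@dataclass(frozen=True)
class Star:
    inner: object        # a*

def ends(rx, text, start):
    """Return every index at which a match of rx beginning at start can end."""
    if isinstance(rx, Terminal):
        return {start + 1} if text[start:start + 1] == rx.symbol else set()
    if isinstance(rx, Choice):
        return ends(rx.left, text, start) | ends(rx.right, text, start)
    if isinstance(rx, Sequence):
        return {e for mid in ends(rx.first, text, start)
                for e in ends(rx.second, text, mid)}
    # Star: zero repetitions, then keep extending until no new end appears.
    reached, frontier = {start}, {start}
    while frontier:
        frontier = {e for pos in frontier
                    for e in ends(rx.inner, text, pos)} - reached
        reached |= frontier
    return reached

def matches(rx, text):
    """True when rx matches the whole of text."""
    return len(text) in ends(rx, text, 0)

# (0 | 1)*0 built from the four constructors
bit = Choice(Terminal("0"), Terminal("1"))
even_binary = Sequence(Star(bit), Terminal("0"))
```

Tracking sets of end positions keeps the matcher short and guarantees termination, at the cost of efficiency compared with a real regular-expression engine.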
14
Examples
Identifiers
  – letter(letter | digit)*
Binary strings (non-empty)
  – (0 | 1)(0 | 1)*
Binary numbers divisible by 2
  – (0 | 1)*0
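These three expressions can be tried out with Python's re module. In this sketch the abstract names letter and digit are assumed to mean ASCII letters and decimal digits, and is_match is a helper invented for the example.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # letter(letter | digit)*
BINARY = re.compile(r"(0|1)(0|1)*")                # (0 | 1)(0 | 1)*
EVEN_BINARY = re.compile(r"(0|1)*0")               # (0 | 1)*0

def is_match(pattern, text):
    """True when the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None
```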
15
Regular Expressions and Pattern Matching in Perl
The operators =~ and !~ check for match and no match, respectively.
A pattern is enclosed between slashes, as in /pattern/
If a pattern appears by itself, the variable $_ is checked for a match
^ at the beginning of the pattern means it must start at the beginning of the string
$ at the end of a pattern means it must end at the end of the string
16
Example
The following Perl script checks for the regular expression a+b+
    #!/usr/bin/perl
    # This is a Perl script
    $_ = <STDIN>;       # Read into $_
    if (/^a+b+$/)       # Match $_ with a+b+
    { print "yes\n"; }
    else
    { print "no\n"; }
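For comparison, a rough Python counterpart of the same check is sketched below. It is written as a function named check (a name chosen for the example) that takes the input line as an argument instead of reading it itself.

```python
import re

def check(line: str) -> str:
    """Answer "yes" if the line is one or more a's followed by one or more b's."""
    # ^ and $ anchor the pattern to the whole line; Python's $ also matches
    # just before a single trailing newline, so a line read from input still works.
    return "yes" if re.search(r"^a+b+$", line) else "no"
```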
17
Pattern Symbols
Symbol    Meaning
.         Any single character (except '\n')
*         0 or more occurrences
+         1 or more occurrences
?         0 or 1 occurrences of the previous character
[abc]     One of the enclosed characters
[^abc]    None of the enclosed characters
{i,j}     Between i and j occurrences
|         Choice
()        Grouping
i         Case insensitive (a modifier written after the closing slash, as in /pattern/i)
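Most of these symbols carry over unchanged to Python's re module, so a small table of checks can show each one at work. The helper full and the sample strings are made up for this sketch; case-insensitive matching is requested with the re.IGNORECASE flag.

```python
import re

def full(pattern, text, flags=0):
    """True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text, flags) is not None

results = {
    "any character":    full(r"c.t", "cat"),
    "zero or more":     full(r"ab*", "a"),
    "one or more":      full(r"ab+", "a"),          # needs at least one b
    "optional":         full(r"colou?r", "color"),
    "one of":           full(r"[abc]", "b"),
    "none of":          full(r"[^abc]", "b"),       # b is excluded
    "between 2 and 3":  full(r"a{2,3}", "aaaa"),    # four a's is too many
    "group and choice": full(r"(ab|cd)+", "abcd"),
    "ignore case":      full(r"perl", "PERL", re.IGNORECASE),
}
```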
18
Character Classes
There are several classes of characters that have special names
Match   Meaning                             Exclude
\d      Any digit                           \D
\w      Any letter, digit, or underscore    \W
\s      Any whitespace                      \S
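The same class names are available in Python's re module, which makes for a quick demonstration. The sample string is invented for the example.

```python
import re

sample = "Room 42, east_wing"

digits = re.findall(r"\d+", sample)        # runs of digits
words = re.findall(r"\w+", sample)         # runs of letters, digits, underscores
gaps = re.findall(r"\s+", sample)          # runs of whitespace
non_digits = re.findall(r"\D+", sample)    # runs of anything that is not a digit
```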
19
Anchors
Used to specify position within a string
Symbol    Position
^         Beginning of string
$         End of string
\b        Word boundary
\B        Not at word boundary
\bpattern\b matches the word pattern but not patterned
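The anchors behave the same way in Python's re module, so the word-boundary example from the slide can be checked directly. The test strings are assumptions added for illustration.

```python
import re

word = re.compile(r"\bpattern\b")
whole_word = word.search("a pattern here") is not None      # stands alone as a word
inside_word = word.search("patterned cloth") is not None    # followed by more letters

at_start = re.search(r"^abc", "abcdef") is not None         # anchored at the beginning
at_end = re.search(r"abc$", "abcdef") is not None           # abc is not at the end

# \B requires a position that is NOT a word boundary, so the match must
# begin in the middle of a word.
mid_word = re.search(r"\Bat", "cat") is not None
word_start = re.search(r"\Bat", "at") is not None
```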
20
String Manipulation
split(/pattern/, string) splits a string into tokens using the pattern as the delimiter
  – split(/, /, $fullName)
tr/a-z/A-Z/ transliterates characters
s/pattern/replacement/ replaces the first occurrence of pattern in a string with replacement. A g after the last slash replaces all occurrences.
  – Use this same syntax for substitution in the vi editor
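Python has no split, tr, or s operators, but its standard library offers close counterparts. The sketch below pairs each Perl operation with one assumed equivalent; the sample data is invented.

```python
import re
import string

full_name = "Lovelace, Ada"
last, first = re.split(r", ", full_name)            # split on the pattern ", "

# Transliteration of lowercase letters to uppercase, character by character.
to_upper = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
shouted = "perl".translate(to_upper)

text = "one fish two fish"
first_only = re.sub(r"fish", "cat", text, count=1)  # replace the first occurrence
every = re.sub(r"fish", "cat", text)                # replace all occurrences
```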
21
Pattern Memory
Any part of a pattern that has parentheses around it will cause the matching text to be stored in pattern memory
Within a pattern, you can use \1, \2, \3 to refer back to an earlier part of the pattern. This is called a back reference.
After a match has completed, the variables $1, $2, … $9 contain the pieces from the last match.
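Python's re module supports the same idea: \1 works inside the pattern, and the stored pieces are read from the match object rather than from variables such as $1. The sample strings below are made up for this sketch.

```python
import re

# \1 inside the pattern refers back to whatever group 1 matched,
# so this finds a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "it was the the best")
repeated_word = doubled.group(1)

# After a match, each parenthesized group holds its piece of the text.
date = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", "2004-06-21")
year, month, day = date.groups()
```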