LING/C SC/PSYC 438/538 Lecture 8 Sandiway Fong
Today's Topics: a note on UTF-8 and PowerShell in Windows 10; review of Homework 6; Homework 7
Unicode and PowerShell Windows 10: the default console is not UTF-8. It uses ancient codepage technology (codepage 437 = US)! Set it to UTF-8 (chcp 65001) and note the codepage change. Unfortunately, the console now understands UTF-8 but still fails to print the character!
Unicode and PowerShell Right-click the menu bar and choose Properties > Font. Consult https://docs.microsoft.com/en-us/typography/font-list/ for the codepages that each font supports.
Unicode and PowerShell The default console font is actually called Consolas. Even the Lucida Console font family has limited character coverage.
Unicode and PowerShell Pick a known Japanese font that Microsoft licensed from Ricoh (Japan): MS Mincho.
Unicode and PowerShell Et voilà!
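Homework 6 Review
Question 1: what's the difference between a) and b)?
a) my @a = 4 x 4;
b) my @a = (4) x 4;
perl -le '@a = 4 x 4; print $#a; print "@a"'
0
4444
perl -le '@a = (4) x 4; print $#a; print "@a"'
3
4 4 4 4
In a), the left operand of x is a plain scalar, so x repeats the string "4" and @a gets a single element, "4444" ($#a is 0). In b), the parenthesized left operand makes x repeat the list, so @a gets four elements ($#a is 3).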
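For comparison, here is a minimal Python sketch (an analogy only, not part of the homework). Python's * operator behaves much like Perl's x: it repeats a string when the left operand is a string and repeats a list when the left operand is a list.

```python
# String repetition: one string, like Perl's 4 x 4
# (Python needs the string "4"; Perl converts the number 4 to a string itself)
a = "4" * 4
print(a)           # 4444

# List repetition: four elements, like Perl's (4) x 4
b = [4] * 4
print(b)           # [4, 4, 4, 4]
print(len(b) - 1)  # last index, the analogue of Perl's $#a: 3
```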
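Homework 6 Review
Read https://perldoc.perl.org/functions/split.html
Question 2: what does split do here for a) vs. b)?
a) my @a = split " ", 'this is a sentence.';
b) my @a = split //, 'this is a sentence.';
perl -le "@a = split \" \", 'this is a sentence.'; print \$#a; print \"@a\""
3
this is a sentence.
perl -le "@a = split //, 'this is a sentence.'; print \$#a; print \"@a\""
18
t h i s   i s   a   s e n t e n c e .
In a), the special pattern " " splits on runs of whitespace (ignoring leading whitespace), giving 4 words. In b), the empty pattern // splits between every pair of characters, giving 19 single characters, including the three spaces.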
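A rough Python analogue, assuming Python 3: str.split() with no argument splits on runs of whitespace much like Perl's split " ", and list() breaks a string into its characters much like split //.

```python
s = 'this is a sentence.'

words = s.split()        # split on whitespace, like Perl's split " "
print(len(words) - 1)    # 3
print(" ".join(words))   # this is a sentence.

chars = list(s)          # one element per character, spaces included, like split //
print(len(chars) - 1)    # 18
print(" ".join(chars))   # t h i s   i s   a   s e n t e n c e .
```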
Homework 7
Read https://en.wikipedia.org/wiki/Disemvoweling
Q1: Write a Perl program to remove the vowels a, e, i, o, u from words typed on the command line. (Don't worry about y.)
Hint: use split from HW 6 Question 2.
Homework 7 A possible template for your code: test each character with exists $vowel{$char}, where %vowel is a hash whose keys are the vowels.
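The hash-lookup idea in the template carries over directly to Python. The sketch below is a Python illustration only (the homework itself asks for Perl); VOWELS and disemvowel are hypothetical names chosen here, with a set standing in for the Perl hash %vowel.

```python
import sys

# Hypothetical lookup table playing the role of the Perl hash %vowel
VOWELS = {"a", "e", "i", "o", "u"}

def disemvowel(word):
    """Return word with the lowercase vowels a, e, i, o, u removed."""
    # list(word) splits the word into characters, like Perl's split //;
    # "ch not in VOWELS" is the analogue of !exists $vowel{$char}
    return "".join(ch for ch in list(word) if ch not in VOWELS)

if __name__ == "__main__":
    # Disemvowel each word typed on the command line
    print(" ".join(disemvowel(w) for w in sys.argv[1:]))
```

For example, python disemvowel.py education rhythm (file name hypothetical) would print dctn rhythm, since y is left alone.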
Homework 7
Q2: Suppose we modified the program to print underscores instead of deleting vowels. Which quote below do you find easier to read? Translate the quote back into regular English orthography.
All h_m_n b__ngs _r_ b_rn fr__ _nd _q__l _n d_gn_ty _nd r_ghts. Th_y _r_ _nd_w_d w_th r__s_n _nd c_nsc__nc_ _nd sh__ld _ct t_w_rds _n_ _n_th_r _n _ sp_r_t _f br_th_rh__d.
All hmn bngs r brn fr nd ql n dgnty nd rghts. Thy r ndwd wth rsn nd cnscnc nd shld ct twrds n nthr n sprt f brthrhd.
Due date: Monday night. One PDF file. Submit your code and example runs.
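One way to sketch the Q2 variant in Python is a regular-expression substitution (an illustrative choice; your Perl solution may well use a different approach). Only lowercase vowels are replaced, which is why "All" keeps its capital A in the quotes above.

```python
import re

def underscore_vowels(text):
    """Replace each lowercase vowel a, e, i, o, u with an underscore."""
    return re.sub(r"[aeiou]", "_", text)

print(underscore_vowels("All human beings are born free and equal in dignity and rights."))
# All h_m_n b__ngs _r_ b_rn fr__ _nd _q__l _n d_gn_ty _nd r_ghts.
```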
Python list ranges Perl has a range operator, .., which is less powerful than Python's range() in some ways and more powerful in others. See https://perldoc.perl.org/perlop.html#Range-Operators
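A small Python sketch to make the comparison concrete. Python's range() excludes its stop value and takes an optional step, including negative steps. Perl's .. includes both ends, has no step argument (10..1 yields an empty list), but also works on strings ('a'..'e'), which Python has to emulate, for example with ord() and chr().

```python
# Python range(): start, stop (exclusive), optional step
odds = list(range(1, 10, 2))    # [1, 3, 5, 7, 9]
down = list(range(10, 0, -2))   # [10, 8, 6, 4, 2]

# Emulating Perl's 'a'..'e' (both ends included) with ord()/chr()
letters = [chr(c) for c in range(ord("a"), ord("e") + 1)]

print(odds)
print(down)
print(letters)                  # ['a', 'b', 'c', 'd', 'e']
```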
Perl list ranges
Python equivalent:
for i in range(1,1000001):
    # code
The Perl loop iterates, setting $_ (the default variable) to 1, 2, ..., 1000000.
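A short Python sketch filling in the loop body with a running sum (the sum is just an illustrative workload). Because range(1, 1000001) stops before its second argument, it covers 1 through 1000000; in Python 3 a range object produces its numbers lazily instead of building a million-element list first.

```python
total = 0
for i in range(1, 1000001):    # i takes the values 1, 2, ..., 1000000
    total += i

print(total)                   # 500000500000, i.e. n(n+1)/2 with n = 1000000
print(len(range(1, 1000001)))  # 1000000
```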
Perl: useful string functions chomp (useful with file I/O) removes a trailing newline, while chop removes the last character, whatever it is. Note: multiple spaces are OK with the " " variant of split.
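Python has no chomp or chop, so the hypothetical helpers below sketch their behavior. Unlike Perl's versions, they return a new string instead of modifying their argument in place. The last lines restate the "multiple spaces" point in Python terms: str.split() with no argument, like Perl's split " ", treats a run of spaces as one separator, while str.split(" ") does not.

```python
def chomp(line):
    """Hypothetical helper: drop one trailing newline, if there is one."""
    return line[:-1] if line.endswith("\n") else line

def chop(line):
    """Hypothetical helper: drop the last character unconditionally."""
    return line[:-1]

print(repr(chomp("hello\n")), repr(chomp("hello")))  # 'hello' 'hello'
print(repr(chop("hello\n")), repr(chop("hello")))    # 'hello' 'hell'

collapsed = "a  b".split()     # ['a', 'b']
literal = "a  b".split(" ")    # ['a', '', 'b']
print(collapsed, literal)
```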
Python: .split()
String (sentence) splitting is an important part of text processing.
Often we split strings by a regular expression:
import re
re.split(regex, s)
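A brief sketch of re.split() in use; the sample strings and patterns are illustrative choices. Note that, unlike str.split() with no argument, re.split() returns an empty first field when the string starts with a separator.

```python
import re

s = "apples, pears  and plums"
fields = re.split(r"[,\s]+", s)           # split on runs of commas and/or whitespace
print(fields)                             # ['apples', 'pears', 'and', 'plums']

t = "  this is"
with_leading = re.split(r"\s+", t)        # leading separator gives an empty first field
plain = t.split()                         # str.split() drops it
print(with_leading, plain)                # ['', 'this', 'is'] ['this', 'is']
```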
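Perl: useful string functions Transliterate: tr/matchingcharacters/replacementcharacters/modifiers
The modifiers are optional: c (complement the matching characters), d (delete matched characters that have no replacement), s (squash runs of the same replaced character into one), r (return the modified string and leave the original unchanged).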
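For comparison, Python's nearest equivalent of tr is str.translate() with a table built by str.maketrans(). Characters listed in maketrans's optional third argument are deleted, which is roughly comparable to tr with the d modifier. This is a sketch of the Python side only; Python's maketrans has no a-z range syntax, so ranges are spelled out with the string module.

```python
import string

# Map e -> i and l -> p, like tr/el/ip/
swap = str.maketrans("el", "ip")
swapped = "hello".translate(swap)
print(swapped)      # hippo

# Delete characters, comparable to tr/aeiou//d
no_vowels = str.maketrans("", "", "aeiou")
stripped = "education".translate(no_vowels)
print(stripped)     # dctn

# Case conversion, like tr/a-z/A-Z/
upper = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
shouted = "tucson".translate(upper)
print(shouted)      # TUCSON
```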
Perl: useful string functions Perl doesn't have a built-in function to trim whitespace from both ends of a string. It can be mimicked using a regex (more later). Python: s.strip(), plus .lstrip() and .rstrip() for one end only.
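A minimal Python sketch contrasting the built-in strip methods with the regex approach you would use in Perl. The pattern ^\s+|\s+$ is one common way to write the trim, not the only one.

```python
import re

s = "  hi there \n"

both = s.strip()                          # 'hi there'
left = s.lstrip()                         # 'hi there \n'
right = s.rstrip()                        # '  hi there'

# Regex version: delete leading and trailing runs of whitespace
via_regex = re.sub(r"^\s+|\s+$", "", s)   # 'hi there'

print(repr(both), repr(left), repr(right), repr(via_regex))
```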
Python: strings Many operations that work on lists also work on strings: indexing, slicing, len(), in, iteration, and methods such as .index() and .count().
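A quick sketch showing the same operations applied to a string and to the list of its characters. One difference to keep in mind: for strings, in tests for a substring, while for lists it tests for an element.

```python
word = "Tucson"
letters = list(word)    # ['T', 'u', 'c', 's', 'o', 'n']

for seq in (word, letters):
    # index, slice, length, membership, position of 's', count of 'o'
    print(seq[0], seq[1:3], len(seq), "c" in seq, seq.index("s"), seq.count("o"))
# T uc 6 True 3 1
# T ['u', 'c'] 6 True 3 1
```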
Python: strings
List comprehension:
sentence = ['A', 'big', 'cat', 'in', 'Tucson']
[x.lower() for x in sentence]
Suppose we want to use .endswith() in a list comprehension:
Reference: https://docs.python.org/3.7/library/stdtypes.html#text-sequence-type-str
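A minimal sketch of both comprehensions on the slide's sentence. The suffix 'n' passed to .endswith() is an arbitrary choice for illustration; used this way, the comprehension maps each word to True or False.

```python
sentence = ['A', 'big', 'cat', 'in', 'Tucson']

lowered = [x.lower() for x in sentence]
print(lowered)     # ['a', 'big', 'cat', 'in', 'tucson']

ends_in_n = [x.endswith('n') for x in sentence]
print(ends_in_n)   # [False, False, False, True, True]
```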
Python: strings conditional list comprehensions
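A short sketch of conditional list comprehensions on the same sentence (the conditions are illustrative). An if after the for filters items; an if/else before the for is a conditional expression that transforms every item.

```python
sentence = ['A', 'big', 'cat', 'in', 'Tucson']

# Filter: keep only the words that end in 'n'
ending_in_n = [x for x in sentence if x.endswith('n')]
print(ending_in_n)    # ['in', 'Tucson']

# Filter and transform together
three_letter = [x.upper() for x in sentence if len(x) == 3]
print(three_letter)   # ['BIG', 'CAT']

# Conditional expression: every item kept, some replaced
masked = [x if x.endswith('n') else '_' for x in sentence]
print(masked)         # ['_', '_', '_', 'in', 'Tucson']
```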