Computation with strings 1 Day 2 - 8/31/16

Computation with strings 1 Day 2 - 8/31/16 LING 3820 & 6820 Natural Language Processing Harry Howard Tulane University

Course organization
http://www.tulane.edu/~howard/NLP/
Is there anyone here that wasn't here on Monday?
I have changed the dates of the last two quizzes in 1.1.7. Schedule of assignments.
I have updated 2. An Introduction to Python. Please go over it at your leisure and familiarize yourself with the command-line interface.
I have probably done as much work as I have time for on 3. Natural language processing. We have a lot to do in a short amount of time this semester, so treat the section on computational culture as a bit of inspiration for the final project.
NLP, Prof. Howard, Tulane University 31-Aug-2016

Computer hygiene
You must turn your computer off every now and then, so that it can clean itself. By the same token, you should close applications every now and then.
http://nomorehairloss.org/wp-content/uploads/2014/06/Hygiene.jpg

Installation of Python
Can anyone NOT get Spyder to do this?

Test
>>> 237 + 9075
9312
Be sure to try the other arithmetic operators, subtraction (-), multiplication (*), and division (/). Does division work the way you expect?
After you have tired of playing with math, play with some text:
>>> word = 'msinairatnemhsilbatsesiditna'
>>> 'anti' in word
False
>>> 'itna' in word
True
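The question about division has a version-dependent answer. The sketch below assumes Python 3, where / always gives a float; under Python 2, 7 / 2 evaluates to 3. The variable names are illustrative and do not come from the slides.

```python
# Assumes Python 3. In Python 2, 7 / 2 would give 3 (integer division).
total = 237 + 9075         # 9312, as on the slide
true_quotient = 7 / 2      # 3.5: "/" is true division in Python 3
floor_quotient = 7 // 2    # 3: "//" discards the fractional part

# The "in" operator tests whether one string occurs inside another.
word = 'msinairatnemhsilbatsesiditna'
has_anti = 'anti' in word  # False: the letters never occur in that order
has_itna = 'itna' in word  # True: these are the last four characters

print(total, true_quotient, floor_quotient, has_anti, has_itna)
```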

§4. Computation with strings
What is a string?
A string is a sequence of characters delimited by single or double quotes.

Examples
>>> monty = 'Monty Python'
>>> monty
>>> doublemonty = "Monty Python"
>>> doublemonty
>>> circus = 'Monty Python's Flying Circus'
  File "<stdin>", line 1
    circus = 'Monty Python's Flying Circus'
                           ^
SyntaxError: invalid syntax
>>> circus = "Monty Python's Flying Circus"
>>> circus
"Monty Python's Flying Circus"
>>> circus = 'Monty Python\'s Flying Circus'
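A minimal sketch of the two ways around the apostrophe problem, assuming Python 3. Both spellings produce the same string, since the backslash is not stored in the result. The variable names are illustrative.

```python
# Double quotes let an apostrophe appear unescaped inside the string.
double_quoted = "Monty Python's Flying Circus"

# Inside single quotes, the apostrophe must be escaped with a backslash.
escaped = 'Monty Python\'s Flying Circus'

# The quoting style is not part of the string, so the two values are equal.
same = double_quoted == escaped
print(double_quoted)
print(same)
```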

The + and * operators
>>> B = 'balloon'
>>> B
>>> B+'s'
>>> B+s
>>> 'red '+B
>>> 'red '+B+'s'
>>> B*2
>>> B+'s'*2
>>> (B+'s')*2
>>> B-'n'
>>> B+2
>>> B+'2'
A new string can be formed by concatenating two strings with + or by repeating a string a number of times with *. Unfortunately, a character cannot be deleted with -, as B-'n' shows.
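A hedged sketch, assuming Python 3, of which combinations work and which fail. Subtracting a string and adding a number to a string both raise TypeError, so they are wrapped in try/except here to keep the script running. The list name failures is illustrative.

```python
B = 'balloon'

plural = B + 's'       # concatenation: 'balloons'
doubled = B * 2        # repetition: 'balloonballoon'
with_digit = B + '2'   # '2' in quotes is a string, so this works: 'balloon2'
print(plural, doubled, with_digit)

failures = []
try:
    B - 'n'            # strings do not support subtraction
except TypeError as exc:
    failures.append('subtraction')
    print('B - n failed:', exc)

try:
    B + 2              # a string and an integer cannot be concatenated
except TypeError as exc:
    failures.append('adding an int')
    print('B + 2 failed:', exc)
```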

4.1.1. Operator precedence
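As an illustration of precedence, assuming Python 3: * binds more tightly than + for strings just as it does for numbers, so parentheses are needed to repeat the whole concatenation. The variable names are illustrative.

```python
B = 'balloon'

# Without parentheses, 's' * 2 is evaluated first, then concatenated.
unparenthesized = B + 's' * 2      # 'balloonss'

# Parentheses force the concatenation to happen before the repetition.
parenthesized = (B + 's') * 2      # 'balloonsballoons'

# The same rule applies to ordinary arithmetic.
arithmetic = (2 + 3 * 2, (2 + 3) * 2)   # (8, 10)

print(unparenthesized, parenthesized, arithmetic)
```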

4.1.2. Data type
>>> type(B)
>>> type(2)
Look at B in Variable Explorer.
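A small sketch, assuming Python 3, of why B+2 fails while B+'2' works: the quotes change the data type. Python 3 prints types as <class 'str'>, while Python 2 prints <type 'str'>.

```python
B = 'balloon'

type_of_text = type(B)            # str
type_of_number = type(2)          # int
type_of_quoted_digit = type('2')  # str: the quotes make it a string

print(type_of_text, type_of_number, type_of_quoted_digit)
```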

4.2. Basic string methods
Python supplies several methods that can be applied to strings to perform tasks. Some of them are illustrated below. The input code is given, without the corresponding output. It is up to you to type them in to see what they do:
>>> len(B)
>>> len(B+'s')
>>> len(B*2)
>>> sorted(B)
>>> len(sorted(B))
>>> set(B)
>>> sorted(set(B))
>>> len(set(B))

4.2.1. Nested or embedded operations
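A sketch of how a nested call is evaluated, assuming Python 3. It uses a different word from the slides (a hypothetical example) so that the exercise with B stays open. Nested calls run from the inside out, which the intermediate variables make explicit.

```python
word = 'mississippi'   # hypothetical example word, not from the slides

# Step by step, innermost operation first.
unique_chars = set(word)               # the distinct characters, unordered
ordered_chars = sorted(unique_chars)   # ['i', 'm', 'p', 's']
how_many = len(ordered_chars)          # 4

# The same computation written as one nested expression.
nested = len(sorted(set(word)))

print(ordered_chars, how_many, nested)
```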

Tokens vs. types
set(B) produces the set of characters in string B. One useful property of sets is that they do not contain duplicate elements.
The process of removing repetitions performed by set() touches on a fundamental concept in language computation, that of the distinction between a token and a type. A representation in which repetitions are allowed is said to consist of tokens, while one in which there are no repetitions is said to consist of types. Thus set() converts the tokens of a string into types.
There is one type of 'o' in 'balloon', but two tokens of 'o'.
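The token/type distinction can be counted directly. A minimal sketch, assuming Python 3; the variable names are illustrative.

```python
B = 'balloon'

token_count = len(B)        # 7 character tokens: b, a, l, l, o, o, n
types = sorted(set(B))      # ['a', 'b', 'l', 'n', 'o']
type_count = len(set(B))    # 5 character types

print(token_count, type_count, types)
```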

4.2.3. Dot and method notation
The material passed to a method in parentheses is called its argument(s). In the examples above, the argument B can be thought of linguistically as the object of a noun: the length of B, the alphabetical sorting of B, the set of B.
But what if two pieces of information are needed for a method to work, for instance, to count the number of o’s in otolaryngologist? To do so, Python allows for information to be prefixed to a method with a dot:
>>> B.count('o')
The example can be read as “in B, count the o’s”, with the argument being the substring to be counted, 'o', and the attribute being the string over which the count progresses, or more generally:
attribute.method(argument)
What can serve as attribute and argument varies from method to method and so has to be memorized.
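A sketch of the otolaryngologist question, assuming Python 3. The last line shows that the prefixed string is really just the first piece of information handed to count(); the variable names are illustrative.

```python
word = 'otolaryngologist'

# One piece of information: the function takes the string as its argument.
length = len(word)               # 16

# Two pieces of information: the string before the dot, the substring inside.
o_count = word.count('o')        # 4
l_count = word.count('l')        # 2

# Equivalent spelling that passes the string explicitly as the first argument.
explicit = str.count(word, 'o')  # also 4

print(length, o_count, l_count, explicit)
```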

4.2.4. How to clean up a string
There is a group of methods for modifying the properties of a string, illustrated below. You can guess what they do from their names:
>>> L = 'i lOvE yOu'
>>> L
>>> L.lower()
>>> L.upper()
>>> L.swapcase()
>>> L.capitalize()
>>> L.title()
>>> L.replace('O','o')
>>> L.strip('i')
>>> L2 = ' '+L+' '
>>> L2
>>> L2.strip()
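A hedged sketch of the same methods on a different string (a hypothetical example, so the guessing exercise with L stays open), assuming Python 3. Two points are easy to miss: each method returns a new string and leaves the original untouched, and strip() only removes characters from the two ends.

```python
messy = '  nATURAL lANGUAGE  '   # hypothetical example string

trimmed = messy.strip()                 # 'nATURAL lANGUAGE'
lowered = trimmed.lower()               # 'natural language'
titled = trimmed.title()                # 'Natural Language'
capitalized = trimmed.capitalize()      # 'Natural language'
replaced = trimmed.replace('A', 'a')    # 'naTURaL laNGUaGE'

# strip() with an argument removes that character from the ends only;
# the a's in the middle of the word survive.
ends_only = 'banana'.strip('a')         # 'banan'

# The original string is unchanged by all of the calls above.
print(repr(messy), repr(trimmed), lowered, titled, capitalized, replaced, ends_only)
```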

4.3. How to find your way around a string
>>> E = 'abcde'
>>> E[0]
>>> E[1]
>>> E[4]
>>> E[5]

0 = 1, Zero-based indexation
You probably thought that the first character in a string should be given the number 1, but Python actually gives it 0, and the second character gets 1.
There are some advantages to this format which do not concern us here, but we will mention a real-world example. In Europe, the floors of buildings are numbered in such a way that the ground floor is considered the zeroth one, so that the first floor up from the ground is the first floor, though in the USA, it would be called the second floor.
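A minimal sketch of the consequence of counting from zero, assuming Python 3: the last valid index is one less than the length, and asking for index len(E) raises IndexError. The variable names are illustrative.

```python
E = 'abcde'

first = E[0]             # 'a': the first character has index 0
last = E[len(E) - 1]     # 'e': the last index is len(E) - 1, here 4

try:
    E[len(E)]            # index 5 does not exist in a 5-character string
    out_of_range = False
except IndexError:
    out_of_range = True

print(first, last, out_of_range)
```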

In a picture

What does a negative index mean?
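One way to answer the question, sketched for Python 3: a negative index counts from the end of the string, so E[-k] is the same character as E[len(E) - k]. The variable names are illustrative.

```python
E = 'abcde'

last = E[-1]     # 'e': -1 is the final character
first = E[-5]    # 'a': -len(E) is the first character

# Check the correspondence for every valid negative index.
pairs_match = all(E[-k] == E[len(E) - k] for k in range(1, len(E) + 1))

print(last, first, pairs_match)
```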

Slicing
>>> E[2:5]
>>> E[-6:-3]

No beginning or end
>>> E[2:]
>>> E[-2:]

A slice is a string
>>> type(E[2:])
>>> E[:-1] + '!'
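A sketch of the slices from the last three slides, assuming Python 3. A slice runs from the start index up to, but not including, the end index. Unlike single indices, slice bounds that fall outside the string are quietly clipped rather than raising an error.

```python
E = 'abcde'

middle_to_end = E[2:5]     # 'cde': indices 2, 3, 4
clipped = E[-6:-3]         # 'ab': -6 is before the start, so it is clipped to 0
from_two = E[2:]           # 'cde': an omitted end means "to the end"
last_two = E[-2:]          # 'de'
exclaimed = E[:-1] + '!'   # 'abcd!': a slice is a string, so + works on it
slice_type = type(E[2:])   # str

print(middle_to_end, clipped, from_two, last_two, exclaimed, slice_type)
```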

4.3.1.4. Extended slicing
>>> K = 'abcdefghijk'

Format of a slice
string[start:end:step]
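A few illustrative uses of the three-part format on K, assuming Python 3. The particular slices are examples chosen here, not taken from the slides.

```python
K = 'abcdefghijk'

every_other = K[::2]        # 'acegik': indices 0, 2, 4, 6, 8, 10
odd_positions = K[1::2]     # 'bdfhj': indices 1, 3, 5, 7, 9
every_third = K[2:9:3]      # 'cfi': indices 2, 5, 8

print(every_other, odd_positions, every_third)
```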

4.3.1.5. How to reverse a string
>>> K[::-1]
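A sketch of reversal with a step of -1, assuming Python 3. It also resolves the mystery string from the Test slide, which is an English word written backwards.

```python
K = 'abcdefghijk'
backwards = K[::-1]      # 'kjihgfedcba'

# The test string from earlier in the lecture, read in reverse.
word = 'msinairatnemhsilbatsesiditna'
unscrambled = word[::-1]          # 'antidisestablishmentarianism'
has_anti = 'anti' in unscrambled  # True once the word is reversed

print(backwards, unscrambled, has_anti)
```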

4.3.2. How to find an index given a character

>>> D = 'abcdabc'
index() & rindex()
>>> D.index('d')
>>> D.rindex('d')
>>> D.index('a')
>>> D.rindex('a')
find() & rfind()
>>> D.find('d')
>>> D.rfind('d')
>>> D.find('a')
>>> D.rfind('a')

index() or find()
They differ in how they handle null responses:
>>> D.find('z')
-1
>>> D.index('z')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
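A hedged sketch of what the difference means in practice, assuming Python 3. Because -1 is itself a valid index (the last character), the result of find() should be checked before it is used. The helper position_or_none is hypothetical, written only for this illustration.

```python
D = 'abcdabc'

missing = D.find('z')     # -1 signals "not found"
pitfall = D[missing]      # 'c': -1 silently indexes the last character

try:
    D.index('z')          # index() refuses to return a misleading number
    raised = False
except ValueError:
    raised = True


def position_or_none(text, piece):
    """Hypothetical helper: return the first index of piece, or None if absent."""
    hit = text.find(piece)
    return hit if hit != -1 else None


found = position_or_none(D, 'd')       # 3
not_found = position_or_none(D, 'z')   # None
print(missing, pitfall, raised, found, not_found)
```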

Find substrings
index() & rindex()
>>> D.index('cda')
>>> D.index('abc')
find() & rfind()
>>> D.find('cda')
>>> D.find('abc')

4.3.2.1. How to limit search to a substring
index() & rindex()
>>> D.index('ab', 0, 3)
>>> D.index('ab', 3)
find() & rfind()
>>> D.find('ab', 0, 3)
>>> D.find('ab', 3)

Format
index/find(string, beginning, end)
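A sketch of the beginning and end arguments, assuming Python 3. The number returned is always a position in the whole string, which is the main difference from slicing first and searching afterwards. The variable names are illustrative.

```python
D = 'abcdabc'

in_first_three = D.find('ab', 0, 3)   # 0: 'ab' occurs inside D[0:3]
from_three_on = D.find('ab', 3)       # 4: position counted in the whole string
outside_window = D.find('ab', 1, 4)   # -1: D[1:4] is 'bcd', no 'ab' there

# Slicing first gives a position relative to the slice instead.
relative = D[3:].find('ab')           # 1: position inside 'dabc'

print(in_first_three, from_three_on, outside_window, relative)
```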

4.3.3. Operator iteration
>>> L = 'i lOvE yOu'
>>> L[2:6].capitalize().upper()
>>> L[-3:].capitalize().lower()
>>> (L[:4].upper()+L[4:].lower()).swapcase()
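A sketch of how the first chain is evaluated, assuming Python 3. Operations in a chain apply left to right, each one working on the string returned by the one before it. The step variables are illustrative.

```python
L = 'i lOvE yOu'

# The chain, unrolled one operation at a time.
step1 = L[2:6]               # 'lOvE'
step2 = step1.capitalize()   # 'Love'
step3 = step2.upper()        # 'LOVE'

# The same thing written as a single chained expression.
chained = L[2:6].capitalize().upper()

print(step1, step2, step3, chained)
```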

4.2.5. Practice 1
1) What types are output by len(), sorted(), set()?
2) Here are two real-life strings to work with:
>>> mail = 'howard@tulane.edu'
>>> url = 'http://www.tulane.edu/~howard/CompCultEN/'
a) How would you strip out the user name and the server name from my email address?
b) Internet addresses start with the transfer protocol that the site uses. For web pages, this is usually the hypertext transfer protocol, http. How would you strip this information out to leave just the address of the book?
c) Following up on (b), how would you extract just Tulane’s server address?

4.3.4. Practice 1, cont.
3) Write the code to perform the changes given below on these two strings:
>>> S = 'ABCDEFGH'
>>> s = 'abcdefgh'
a) Make the first 3 characters of S lowercase.
b) Make the last 4 characters of s uppercase.
c) Create a string from the first 4 characters of S and the last 4 characters of s and then switch its case.
d) Join both strings and find every even character.
e) Join both strings and reverse the order of the characters.
f) Retrieve the index of 'E' and 'h'.

4.3.4. Practice 1, cont.
What is the longest sequence of operators that you can make?

Next time
4.4.4. Practice 2
4.5. How to make a string longer than one line
4.6. Assignment and mutability
4.7. Date and time strings, which has a practice.