Announcements All groups have been assigned Homework: By this evening email everyone in your group and set up a meeting time to discuss project 4 Project.

Presentation transcript:

Announcements. All groups have been assigned. Homework: By this evening, email everyone in your group and set up a meeting time to discuss Project 4. Project 4 will be released tomorrow. You will have roughly 3 weeks to work on it.

How do I work in a team? Communication: teams that do not communicate well do poorly on the project. Understanding the assignment: teams that sit down and go over the assignment together do well. Battle plan: outline the project in your own English text. Code together: difficult parts of the project are best done together.

Parsing Text. The vast majority of the information present on the internet is in text form: data, webpages, etc. We want to transform the data into a more usable form. Examples we have seen thus far: encoding of a matrix; encoding of a tree; Project 3, changing text (encrypting and decrypting).

Example: Finding a nucleotide sequence. We can find DNA sequences of parasites on the internet (typically in databases). Problem: we want to know if a sequence of nucleotides is in a particular parasite. We want to know not only “yes” or “no” but also which parasite.

What the data looks like
>Schisto unique AA
gcttagatgtcagattgagcacgatgatcgattgaccgtgagatcgacga
gatgcgcagatcgagatctgcatacagatgatgaccatagtgtacg
>Schisto unique mancons0736
ttctcgctcacactagaagcaagacaatttacactattattattattatt
accattattattattattattactattattattattattactattattta
ctacgtcgctttttcactccctttattctcaaattgtgtatccttccttt

How are we going to do it? First, we get the sequences in a big string. Next, we find where the small subsequence is in the big string. From there, we need to work backwards until we find “>”, which is the beginning of the line with the sequence name. From there, we need to work forwards to the end of the line. From “>” to the end of the line is the name of the sequence. Yes, this is hard to get right.

Let's Review Some Python. string.find(sub) returns the lowest index where the substring sub is found, or -1. string.find(sub, start) is the same, except it searches only the slice string[start:]. string.find(sub, start, end) is the same, except it searches only the slice string[start:end]. In every case the returned index counts from the start of the whole string, not from the start of the slice.

Let's Review Some Python. string.rfind(sub) returns the highest index where the substring sub is found, or -1. string.rfind(sub, start) is the same, except it searches only the slice string[start:]. string.rfind(sub, start, end) is the same, except it searches only the slice string[start:end]. As with find, the returned index counts from the start of the whole string.
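The start and end arguments are easy to misread, so here is a small sketch of all the forms above. The string "banana" is a made-up example, not one from the slides.

```python
# A made-up example string; any string would do.
s = "banana"

first = s.find("an")          # lowest index where "an" starts
last = s.rfind("an")          # highest index where "an" starts
after = s.find("an", 2)       # search only s[2:], index still counts from the start of s
before = s.rfind("an", 0, 3)  # search only s[0:3]
missing = s.find("x")         # not present, so -1

print(first, last, after, before, missing)
```

Running this prints 1 3 3 1 -1: the third value is 3 rather than 1 because the result is a position in the whole string, even though the search began at index 2.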

Clicker Question: are these programs equivalent? Given String = "two plus two is four". Program 1: String.find("two"). Program 2: String.rfind("two"). A: yes B: no

Let's solve the problem!

def findSequence(seq):
    sequencesFile = "parasites.txt"
    file = open(sequencesFile, "r")
    sequences = file.read()
    file.close()
    # Look for the subsequence anywhere in the big string
    seqloc = sequences.find(seq)
    if seqloc != -1:
        # Now, find the ">" with the name of the sequence
        nameloc = sequences.rfind(">", 0, seqloc)  # using rfind() here!!
        # The name runs from ">" to the end of that line
        endline = sequences.find("\n", nameloc)
        print("Found in", sequences[nameloc:endline])
    else:
        print("Not found")

Why -1? If .find or .rfind don’t find something, they return -1. If they return 0 or more, then it’s the index where the search string is found. Note: last week we saw the urllib module. It contains a method that lets you download a file from the internet. How might you modify your program to first download the file from the internet prior to opening it?
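One possible answer to that question, as a sketch only. It assumes Python 3, where the download call is `urllib.request.urlopen`. The function names `name_of_sequence` and `find_sequence_at` and the `url` parameter are invented for illustration; the slides do not say where the parasite file lives.

```python
from urllib.request import urlopen


def name_of_sequence(sequences, seq):
    """Return the '>' header line of the record containing seq, or None."""
    seqloc = sequences.find(seq)
    if seqloc == -1:
        return None
    nameloc = sequences.rfind(">", 0, seqloc)   # start of the header line
    endline = sequences.find("\n", nameloc)     # end of the header line
    return sequences[nameloc:endline]


def find_sequence_at(url, seq):
    """Download the text at url (a hypothetical location), then search it."""
    with urlopen(url) as response:
        sequences = response.read().decode("utf-8")
    return name_of_sequence(sequences, seq)
```

Separating the search from the download means the find/rfind logic can be tried on any string, without a network connection.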

Running the program
>>> findSequence("tagatgtcagattgagcacgatgatcgattgacc")
Found in >Schisto unique AA
>>> findSequence("agtcactgtctggttgaaagtgaatgcttccaccgatt")
Found in >Schisto unique mancons0736

One More Note on Parsing. We saw how to read a file as a string or list of strings. We saw how to leverage how data was structured to find specific information we were interested in. What if there are many pieces we want to extract?

Revisiting Split. String.split(delimiter) breaks the string String into parts, separated by the delimiter. print("a b c d".split(" ")) would print: ['a', 'b', 'c', 'd']. Some quirky cases for string.split() are explained in pre-lab 10.
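The pre-lab is not reproduced in this transcript, so the cases below are only a guess at the kind of quirk it has in mind: what happens when separators sit next to each other, and how split() with no argument differs from split(" ").

```python
line = "a b  c d"   # note the two spaces between b and c

by_space = line.split(" ")    # every single space is a separator
by_default = line.split()     # no argument: any run of whitespace is one separator
by_comma = "1,2,,3".split(",")

print(by_space)
print(by_default)
print(by_comma)
```

With an explicit delimiter, two delimiters in a row produce an empty string in the result; with no argument, runs of whitespace are collapsed and no empty strings appear.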

Why is this useful? When reading in a file, we may have many interesting data items on a given line (or in the file). Example: Lab 10.

How to glue everything together
Step 1) Get some interesting data.
Step 2) Open the file.
Step 3) Read the data from the file, either as one large string or a list of strings.
Step 4) Break this string (or list of strings) into the data we want (rfind, find, split).

Abstract Example: Getting values from a text file
    text = file.read()
    lines = text.split('\n')          # -> list of strings
    for element in lines:
        items = element.split(' ')    # -> list of strings

Concrete Example
    foo = "bab cad eag"
    elem = foo.split(" ")
    for i in elem:
        print(i.split("a"))
Prints:
    ['b', 'b']
    ['c', 'd']
    ['e', 'g']

CQ: How can I parse all the words in a file? Assume we have read the file in as one big string (we used file.read()) and the file contains no punctuation. A) First split on "\n" and, for each element in the result, split on " ". B) Only split on " ".

Concrete Clicker Example
The file text.txt contains:
    This is
    a file
Code:
    file = open("text.txt", "r")
    content = file.read()
    line = content.split("\n")
    for i in line:
        print(i.split(" "))
Prints:
    ['This', 'is']
    ['a', 'file']
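To collect every word into one list rather than print a list per line, the two-level split can be written out as below. This is a sketch: the string `content` stands in for what file.read() would return, and the trailing newline is an assumption about how the file ends.

```python
content = "This is\na file\n"   # stands in for file.read() on text.txt

# Option A from the clicker question: split into lines, then split each line.
words_a = []
for line in content.split("\n"):
    for word in line.split(" "):
        if word != "":          # the trailing newline leaves an empty piece
            words_a.append(word)

# Alternative: split() with no argument treats spaces and newlines alike.
words_b = content.split()

print(words_a)
print(words_b)
```

Both approaches give the same four words here. The second is shorter, but the first keeps the line structure available if it matters.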

Example: Get the temperature The weather is always available on the Internet. Can we write a function that takes the current temperature out of a source like or

The Internet is mostly text Web pages are actually text in the format called HTML (HyperText Markup Language) HTML isn’t a programming language, it’s an encoding language. It defines a set of meanings for certain characters, but one can’t program in it. We can ignore the HTML meanings for now, and just look at patterns in the text.

Where’s the temperature? The word “temperature” doesn’t really show up. But the temperature always follows the word “Currently”, and always comes before the “ ° ”.
    <img src="/shared-local/weather/images/ps.gif" width="48" height="48" border="0">
    <font size="-1" face="Arial, Helvetica, sans-serif"> Currently Partly sunny 54 ° F

We can use the same algorithm we’ve seen previously. Grab the content out of a file in a big string. (We’ve saved the HTML page previously. We’ve seen how to grab it directly.) Find the starting indicator (“Currently”). Find the ending indicator (“ °”). Read the previous characters.

def findTemperature():
    weatherFile = "ajc-weather.html"
    file = open(weatherFile, "r")
    weather = file.read()
    file.close()
    # Find the Temperature
    curloc = weather.find("Currently")
    if curloc != -1:
        # Now, find the " °" following the temp
        temploc = weather.find(" °", curloc)
        # The temperature starts right after the closest ">" before it
        tempstart = weather.rfind(">", 0, temploc)
        print("Current temperature:", weather[tempstart+1:temploc])
    else:
        print("Can't find the temp")
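The function above needs the saved page on disk. For experimenting without that file, the same find/rfind steps can be run on a string passed in as an argument. This is a sketch under assumptions: the name `temperature_in` is invented, the sample markup is made up and is not the real weather page, and the extra check for a missing degree sign is an addition.

```python
def temperature_in(page):
    """Return the text just before the degree sign after 'Currently', or None."""
    curloc = page.find("Currently")
    if curloc == -1:
        return None
    temploc = page.find("°", curloc)
    if temploc == -1:
        return None
    # The temperature starts right after the closest ">" before the degree sign.
    tempstart = page.rfind(">", 0, temploc)
    return page[tempstart + 1:temploc].strip()


# Made-up markup for illustration, not the real page.
sample = "<td>Currently</td><td>Partly sunny</td><td>54 °F</td>"
print(temperature_in(sample))
```

Returning None instead of printing lets the caller decide what to do when either marker is missing.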

Homework: Email your group members. Read through the Project 4 description when it becomes available.

Announcements

Dictionaries in Python. Useful analogy: an actual dictionary! English dictionaries provide an association between a Word and a Definition. We use the Word to look up the Definition. Given a definition, it would be very hard to look up the word.

Dictionaries in Python. Much like a dictionary for the English language, Python dictionaries create an association between a key and a value. Key corresponds to a Word in our analogy. Value corresponds to a Definition.

Dictionary Syntax. A dictionary is a collection of elements. Each element is a key/value pair, written key : value. Just like a list is defined by [ ], a dictionary is defined by { }: {'key1': value1, 'key2': value2, 'key3': value3}

Keys. A key can be any immutable type (we will consider two types: strings and integers). Much like [index] is used to select an element from a list, for a dictionary we use [key].
    A = {'key1': value1, 'key2': value2, 'key3': value3}
    print(A['key2'])

Example: Simple Phone Book. Names are keys, phone numbers are values.
    phoneBook = {'Luke': ' ', 'Dr. Martino': ' '}
    def lookup(key):
        return phoneBook[key]
    lookup('Dr. Martino')
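Indexing with [key] raises a KeyError when the key is not in the dictionary. A sketch of a lookup that tolerates missing names, using the dictionary method get: the phone numbers here are placeholders invented for this example, since the transcript does not include the original ones.

```python
# Placeholder numbers, invented for this sketch.
phone_book = {"Luke": "555-0100", "Dr. Martino": "555-0101"}


def lookup(name):
    """Return the number stored for name, or None if the name is not a key."""
    return phone_book.get(name)


print(lookup("Dr. Martino"))
print(lookup("Nobody"))
print("Luke" in phone_book)   # membership tests look at keys, not values
```

The `in` operator is the other common way to check for a key before indexing.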

Clicker Question: are these programs equivalent?
Program 1:
    A = ['mike', 'mary', 'marty']
    print(A[1])
Program 2:
    A = {0: 'mike', 1: 'mary', 2: 'marty'}
    print(A[1])
A: yes B: no

Clicker Question: are these programs equivalent?
Program 1:
    A = ['mike', 'mary', 'marty']
    print(A[1])
Program 2:
    A = {1: 'mary', 2: 'marty', 0: 'mike'}
    print(A[1])
A: yes B: no

Key Differences from Lists. Lists are ordered: the index is implicit, based on the list ordering. Dictionaries have no positional index: keys are specified explicitly and do not depend on order (Python 3.7 and later remember insertion order when iterating, but lookup is still by key only). Lists are useful for storing ordered data; dictionaries are useful for storing relational data. Motivating example from book: databases!
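A short sketch of the difference, reusing the names from the clicker questions. The variable names `names_list` and `names_dict` are invented here.

```python
names_list = ["mike", "mary", "marty"]
names_dict = {1: "mary", 2: "marty", 0: "mike"}   # keys written out of order

# Both lookups give the same answer: the dictionary uses the key 1, not a position.
same = names_list[1] == names_dict[1]

# Removing the first entry shifts every list index, but dictionary keys stay put.
del names_list[0]
del names_dict[0]

print(same, names_list[1], names_dict[1])
```

After the deletions, index 1 of the list refers to a different name, while key 1 of the dictionary still maps to the same value as before.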

Updating a Dictionary. Much like a list, we can assign to a dictionary. Abstract: dictionary[key] = newValue. Concrete example:
    A = {0: 'mike', 1: 'mary', 2: 'marty'}
    print(A[1])
    A[1] = 'alex'
    print(A[1])

Adding to a Dictionary. Much like a list, we can append to a dictionary. Abstract: dictionary[newKey] = newValue. Concrete example:
    A = {0: 'mike', 1: 'mary', 2: 'marty'}
    print(A[1])
    A[3] = 'alex'
    print(A)
The last line prints:
    {0: 'mike', 1: 'mary', 2: 'marty', 3: 'alex'}

Clicker Question: What is the output of this code?
    A = {0: 'mike', 1: 'mary', 2: 'marty', 'marty': 2, 'mike': 0, 'mary': 1}
    A[3] = 'mary'
    A['mary'] = 5
    A[2] = A[0] + A[1]
A: {'mike': 0, 'marty': 2, 3: 'mary', 'mary': 5, 2: 'mikemary', 1: 'mary', 0: 'mike'}
B: {'mike': 0, 'marty': 2, 'mary': 3, 'mary': 5, 2: 'mikemary', 1: 'mary', 0: 'mike'}
C: {'mike': 0, 'marty': 2, 'mary': 3, 'mary': 5, 2: 1, 1: 'mary', 0: 'mike'}

Printing a Dictionary
    A = {0: 'mike', 1: 'mary', 2: 'marty'}
    for k, v in A.items():
        print(k, ":", v)
Prints (in insertion order on Python 3.7 and later; older versions may use a different order):
    0 : mike
    1 : mary
    2 : marty
    A = {0: 'mike', 1: 'mary', 2: 'marty'}
    for k in A:
        print(k)
Prints:
    0
    1
    2

Project 4: Frequency Analysis Intuition We can leverage a dictionary to calculate the number of times a particular letter occurs in a message We can use characters as the keys The number of times that character occurs is the value Increment the value each time we see a character Initially the value starts at 0
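The intuition above can be sketched in a few lines. This is only an illustration of the counting idea, not the project's required solution; the function name `letter_counts` and the sample message are invented.

```python
def letter_counts(message):
    """Count how many times each character occurs in message."""
    counts = {}
    for ch in message:
        if ch not in counts:
            counts[ch] = 0      # the value starts at 0 the first time we see ch
        counts[ch] = counts[ch] + 1
    return counts


counts = letter_counts("hello")
print(counts)
```

Each character becomes a key the first time it appears, and its value is incremented on every occurrence, so "l" ends up with a count of 2.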

Some Additional Notation: Pairs in Python. We can create pairs in Python. Example: pair = ('name', 3). Such pairs are called tuples (see page 291). Tuples support [] for selecting their elements. Tuples are immutable (like strings). Further reading (section 5.3): and-sequences

Tuples. We can think of tuples as immutable lists. They do not support assignment. Example:
    A = ('me', 5, 32, 'joe')
    print(A[0])
    print(A[3])
    A[2] = 4    # <--- this raises a TypeError
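A sketch of what the failed assignment looks like in practice, and one way to get a modified copy instead. The variable names are invented for illustration.

```python
pair = ("me", 5, 32, "joe")

first = pair[0]      # indexing works just like a list
last = pair[3]

try:
    pair[2] = 4      # tuples do not support item assignment
    outcome = "assigned"
except TypeError:
    outcome = "TypeError"

# To "change" a tuple, build a new one from slices of the old one.
changed = pair[:2] + (4,) + pair[3:]

print(first, last, outcome, changed)
```

The original tuple is untouched; `changed` is a separate tuple with the third element replaced.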

Creating a dictionary from a list. Python provides the dict function to create a dictionary out of a list of pairs. Example: dict([(0, 'mike'), (1, 'mary'), (2, 'marty')]). Why do I care? We can leverage list creation shortcuts to populate dictionaries! Example: dict([(x, x**2) for x in range(10)])
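Both examples above can be run as written. The sketch below also shows the built-in zip, which is not mentioned in the slides but is another common way to produce the pairs that dict expects. The lists `keys` and `values` are made up.

```python
pairs = [(0, "mike"), (1, "mary"), (2, "marty")]
names = dict(pairs)

squares = dict([(x, x**2) for x in range(10)])

# zip pairs up two lists, which is another way to get a sequence of pairs.
keys = ["a", "b", "c"]
values = [1, 2, 3]
zipped = dict(zip(keys, values))

print(names[1], squares[9], zipped["b"])
```

Each pair contributes one key and one value, so `squares` maps every number from 0 to 9 to its square.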