Introduction to Data Programming

Slides:



Advertisements
Similar presentations
Helping your student with homework
Advertisements

The Information School of the University of Washington University of Washington1 Introduction INFO/CSE 100, Spring 2005.
A-1 © 2000 UW CSE University of Washington Computer Programming I Lecture 1: Overview and Welcome Dr. Martin Dickey University of Washington.
The Information School of the University of Washington Sept University of Washington1 Introduction INFO/CSE 100, Fall 2006 Fluency.
PLEASE GRAB A SEAT ANYWHERE FOR NOW. Welcome to the CMSC 201 Class!!! Mr. Lupoli ITE 207.
CSE 501N Fall ‘09 00: Introduction 27 August 2009 Nick Leidenfrost.
Welcome to Physics 2215! Physics Lab for Scientist & Engineers 1 Fall 2012.
1 Agenda Administration Background Our first C program Working environment Exercise Memory and Variables.
CSE 140 wrapup Michael Ernst CSE 140 University of Washington.
Computing Fundamentals Module Lesson 19 — Using Technology to Solve Problems Computer Literacy BASICS.
+ Introduction to Class IST210 Class Lecture. + Course Objectives Understand the importance of data, databases, and database management Design and implement.
Principles of Computer Science I Honors Section Note Set 1 CSE 1341 – H 1.
Institute for Personal Robots in Education (IPRE)‏ CSC 170 Computing: Science and Creativity.
CSE 1105 Week 1 CSE 1105 Course Title: Introduction to Computer Science & Engineering Classroom Lecture Times: Section 001 W 4:00 – 4:50, 202 NH Section.
CSE 1105 Week 1 CSE 1105 Introduction to Computer Science & Engineering Time: Wed 4:00 – 4:50 Thurs 9:30 – 10:20 Thurs 4:00 – 4:50 Place: 100 Nedderman.
CSE 160 Wrap-Up CSE 160 Spring 2015 University of Washington 1.
Introduction to Data Programming CSE 160 University of Washington Spring 2015 Ruth Anderson 1 Slides based on previous versions by Michael Ernst and earlier.
Welcome to Financial Algebra CP Mr. Bill Mulligan Room D-24 Mathematics Department Newbury Park High School Please sign in and pickup a quarter sheet handout.
Welcome to Physics 2215! Physics Lab for Scientist & Engineers 1 Spring 2013.
CSC 241: Introduction to Computer Science I
CSc 120 Introduction to Computer Programing II
CITS1001 Object Oriented Programming and Software Engineering
Course Overview - Database Systems
CMSC 201 Computer Science I for Majors Lecture 01 – Introduction
CS101 Computer Programming I
Course Information and Introductions
Web: Parallel Computing Rabie A. Ramadan , PhD Web:
Course Information and Introductions
EECS 110: Introduction to Programming for Non-Majors
Ms. Christine (Chris) Wieland
Introduction to Data Programming
Welcome To Mrs. Zughayyer’s Fourth Grade Classroom
Welcome to Mrs. McGovern’s 7th Grade Level 1 Mathematics Class.
University of Washington Computer Programming I
Introduction to Programmng in Python
UNIT 2 – CHAPTER 2 – LESSON 7 Introduction to Data.
Computer Science 102 Data Structures CSCI-UA
CSE1320 INTERMEDIATE PROGRAMMING
CSE1320 INTERMEDIATE PROGRAMMING
CS410: Text Information Systems (Spring 2018)
Week 1 Gates Introduction to Information Technology cosc 010 Week 1 Gates
Course Information and Introductions
CS Programming I Jim Williams, PhD.
CSE1320 INTERMEDIATE PROGRAMMING
CSE1320 INTERMEDIATE PROGRAMMING
Introduction to Data Programming
Course Overview - Database Systems
Introduction to Data Programming
Mr. Smith’s Schedule 1st Period Computer Science I
Introduction to Research & Scientific Literature
EECE 310 Software Engineering
Using Excel for Quantitative Skill Development
This Class This is a graduate level spatial modeling class in natural resources This will be one of the most challenging classes you’ll probably take You’ll.
CSE1320 INTERMEDIATE PROGRAMMING
CSE1320 INTERMEDIATE PROGRAMMING
Accelerated Introduction to Computer Science
CSE 444 Database Management Systems Spring 1999 University of Washington Introduction and Welcome © 1999 UW CSE 4/4/2019.
CMSC201 Computer Science I for Majors Final Exam Information
Computer Literacy BASICS
Lecture 1a- Introduction
Welcome to Physics 4302 Statistical & Thermal Physics!
Welcome to Physics 5305!!.
CSE 143: Computer Programming II
CSE 451: Operating Systems Winter 2007 Module 1 Course Introduction
Lecture 1a- Introduction
Introduction INFO/CSE 100, Spring 2006
Course Introduction Data Visualization & Exploration – COMPSCI 590
CS201 – Course Expectations
CSC 241: Introduction to Computer Science I
CSE 444 Database Management Systems Autumn 1997 University of Washington Introduction and Welcome © 1997 UW CSE 12/12/2019.
Presentation transcript:

Introduction to Data Programming CSE 160 University of Washington Spring 2018 Ruth Anderson Slides based on previous versions by Michael Ernst and earlier versions by Bill Howe

Agenda for Today What is this course? Course logistics Python!

Welcome to CSE 160! CSE 160 teaches core programming concepts with an emphasis on real data manipulation tasks from science, engineering, and business Goal by the end of the quarter: Given a data source and a problem description, you can independently write a complete, useful program to solve the problem

Course staff Lecturer: TAs: Ask us for help! Ruth Anderson Ollin Boer Bohan Linxing Jiang Lauren Martini Zhiheng Qin Siyu Wang Lingyue Zhang Alex Zhou Ask us for help!

Learning Objectives Computational problem-solving Writing a program will become your “go-to” solution for data analysis tasks Basic Python proficiency Including experience with relevant libraries for data manipulation, scientific computing, and visualization. Experience working with real datasets astronomy, biology, linguistics, oceanography, open government, social networks, and more. You will see that these are easy to process with a program, and that doing so yields insight.

What this course is not A “skills course” in Python …though you will become proficient in the basics of the Python programming language …and you will gain experience with some important Python libraries A data analysis / “data science” / data visualization course There will be very little statistics knowledge assumed or taught A “project” course the assignments are “real,” but are intended to teach specific programming concepts A “big data” course Datasets will all fit comfortably in memory No parallel programming

“It’s a great time to be a data geek.” -- Roger Barga, Microsoft Research “The greatest minds of my generation are trying to figure out how to make people click on ads” -- Jeff Hammerbacher, co-founder, Cloudera

All of science is reducing to computational data manipulation Old model: “Query the world” (Data acquisition coupled to a specific hypothesis) New model: “Download the world” (Data acquisition supports many hypotheses) Astronomy: High-resolution, high-frequency sky surveys (SDSS, LSST, PanSTARRS) Biology: lab automation, high-throughput sequencing, Oceanography: high-resolution models, cheap sensors, satellites 40TB / 2 nights Slide from Bill Howe, eScience Institute, his notes: Drowning in data; starving for information We’re at war with these engineering companies. FlowCAM is bragging about the amount of data they can spray out of their device. How to use this enormous data stream to answer scientific questions is someone else’s problem. “Typical large pharmas today are generating 20 terabytes of data daily. That’s probably going up to 100 terabytes per day in the next year or so.” “tens of terabytes of data per day” -- genome center at Washignton University Increase data collection exponentially with flowcam ~1TB / day 100s of devices Slide from Bill Howe, eScience Institute

Example: Assessing treatment efficacy number of follow ups within 16 weeks after treatment enrollment. Zip code of clinic Zip code of patient Question: Does the distance between the patient’s home and clinic influence the number of follow ups, and therefore treatment efficacy?

Python program to assess treatment efficacy # This program reads an Excel spreadsheet whose penultimate # and antepenultimate columns are zip codes. # It adds a new last column for the distance between those zip # codes, and outputs in CSV (comma-separated values) format. # Call the program with two numeric values: the first and last # row to include. # The output contains the column headers and those rows. # Libraries to use import random import sys import xlrd # library for working with Excel spreadsheets import time from gdapi import GoogleDirections # No key needed if few queries gd = GoogleDirections('dummy-Google-key') wb = xlrd.open_workbook('mhip_zip_eScience_121611a.xls') sheet = wb.sheet_by_index(0) # User input: first row to process, first row not to process first_row = max(int(sys.argv[1]), 2) row_limit = min(int(sys.argv[2]+1), sheet.nrows) def comma_separated(lst): return ",".join([str(s) for s in lst]) headers = sheet.row_values(0) + ["distance"] print comma_separated(headers) for rownum in range(first_row,row_limit): row = sheet.row_values(rownum) (zip1, zip2) = row[-3:-1] if zip1 and zip2: # Clean the data zip1 = str(int(zip1)) zip2 = str(int(zip2)) row[-3:-1] = [zip1, zip2] # Compute the distance via Google Maps try: distance = gd.query(zip1,zip2).distance except: print >> sys.stderr, "Error computing distance:", zip1, zip2 distance = "" # Print the row with the distance print comma_separated(row + [distance]) # Avoid too many Google queries in rapid succession time.sleep(random.random()+0.5) 23 lines of executable code!

Course logistics Website: http://www.cs.washington.edu/cse160 See the website for all administrative details Take notes! Homework 1 part 1 is due Friday As is a survey You get 5 late days throughout the quarter No assignment may be submitted more than 3 days late. (contact the instructor if you are hospitalized) If you want to join the class, sign sheet at front of class, email rea@cs.washington.edu, from your @u address

Academic Integrity Honest work is required of a scientist or engineer Collaboration policy on the course web. Read it! Discussion is permitted Carrying materials from discussion is not permitted Everything you turn in must be your own work Cite your sources, explain any unconventional action You may not view others’ work If you have a question about the policy, just ask us I trust you completely I have no sympathy for trust violations – nor should you

How to succeed No prerequisites Non-predictors for success: Past programming experience Enthusiasm for games or computers Programming and data analysis are challenging Every one of you can succeed There is no such thing as a “born programmer” Work hard Follow directions Be methodical Think before you act Try on your own, then ask for help Start early

Me (Ruth Anderson) Grad Student at UW: in Programming Languages, Compilers, Parallel Computing Taught Computer Science at the University of Virginia for 5 years PhD at UW: in Educational Technology, Pen Computing Current Research: Computing and the Developing World, Computer Science Education

Introductions Name Email address Major Year (1,2,3,4,5) Hometown Interesting Fact or what I did over break.