
Maximal D-segments
  Maximal-scoring:
    No subsegment has a higher score
    No segment properly containing the segment satisfies the above
    No supersegment has a higher score: NOT TRUE (see next slide for an example)
  Maximum allowed dropoff D < 0
    No subsegment has score < D
  Score >= S, where S >= -D
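To make the two D-segment conditions concrete, here is a brute-force checker. It is a sketch that assumes the segment's per-position scores are given as a Python list; it tests only that no subsegment scores below D and that the total score is at least S, and it does not test maximality (that no containing segment also qualifies). The helper name is_d_segment and the example scores are hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def is_d_segment(segment: Sequence[float], D: float, S: float) -> bool:
    """Brute-force check of the two listed D-segment conditions.

    (1) no subsegment of `segment` has score < D, and
    (2) the whole segment has score >= S.
    Maximality (no containing segment also qualifies) is NOT checked.
    """
    if D >= 0:
        raise ValueError("D must be negative")
    prefix = [0.0, *accumulate(segment)]  # prefix[k] = sum of the first k scores
    n = len(segment)
    for i in range(n):
        for j in range(i + 1, n + 1):
            if prefix[j] - prefix[i] < D:  # score of subsegment i..j-1
                return False
    return prefix[n] >= S


print(is_d_segment([0.52, 1.10, -0.05, 1.70, 0.52, 1.10], D=-3.0, S=3.0))  # True
print(is_d_segment([2.0, -4.0, 3.0], D=-3.0, S=1.0))  # False: subsegment -4 < D
```

The O(n²) loop over all subsegments is only meant to mirror the definition; it is not how D-segments would be found in a long sequence.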

[Figure: cumulative score vs. sequence position, with levels 0, D, and S marked]

Worked example, D = 3 (figure rows: position, # read starts, score)

max    start  end  cumul
0      2      2    -0.05
0      3      3    -0.05
0      4      4    -0.05
0      5      5    -0.05
0.52   5      5     0.52
1.62   5      6     1.62
1.62   5      6     1.57
3.27   5      8     3.27
3.79   5      9     3.79
4.89   5      10    4.89
4.89   5      10    4.84
4.89   5      10    4.79
4.89   5      10    4.74
4.89   5      10    4.69
4.89   5      10
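The trace above is consistent with a single left-to-right scan that keeps a running score (cumul), the highest running score since the current start (max), and the candidate's start and end. The sketch below is a reconstruction under stated assumptions, not necessarily the exact algorithm from the slides: D is negative (the trace's "D = 3" is read as a dropoff of magnitude 3, so D = -3), S = 3 satisfies S >= -D, positions are 1-based, and the per-position scores are hypothetical values chosen to reproduce the first 14 rows of the trace. The function name find_d_segments is also hypothetical.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, float]  # (start, end, score), 1-based, inclusive


def find_d_segments(scores: Sequence[float], D: float, S: float) -> List[Segment]:
    """Single-pass scan for D-segments (a reconstruction; see lead-in)."""
    if D >= 0:
        raise ValueError("D must be negative (maximum allowed dropoff)")
    if S < -D:
        raise ValueError("S must be >= -D")
    segments: List[Segment] = []
    cumul = 0.0   # running score since `start`
    best = 0.0    # 'max' in the trace: highest cumul seen since `start`
    start = end = 1
    for i, s in enumerate(scores, start=1):
        cumul += s
        if cumul >= best:            # new high point: extend the candidate
            best, end = cumul, i
        if cumul <= 0 or cumul <= best + D:
            # Score fell to zero, or dropped at least |D| below the peak:
            # close the candidate, report it if it reaches S, and restart.
            if best >= S:
                segments.append((start, end, best))
            cumul = best = 0.0
            start = end = i + 1
    if best >= S:                    # flush a candidate still open at the end
        segments.append((start, end, best))
    return segments


# Hypothetical per-position scores that reproduce the trace's cumul column.
scores = [-0.05] * 4 + [0.52, 1.10, -0.05, 1.70, 0.52, 1.10] + [-0.05] * 4
print(find_d_segments(scores, D=-3.0, S=3.0))
```

With these inputs the scan reports one segment, positions 5 to 10 with a score of about 4.89, matching the final max, start, and end in the trace. Restarting at i + 1 after a dropoff loses nothing useful, because every suffix of the stretch from end + 1 to i has negative score.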

Parameters
  N = expected length of normal copy number region (state 1)
  E = expected length of elevated copy number region (state 2)
Transition probabilities:
  $a_{12} = 1/N$, $a_{11} = 1 - 1/N$
  $a_{21} = 1/E$, $a_{22} = 1 - 1/E$
Emission probabilities:
  Symbols: 0, 1, 2, >=3 (number of read starts)
  Poisson-distributed with given means
  $m_1$ = average number of read starts per site across the chromosome
  $m_2 = 1.5\,m_1$
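A state whose self-transition probability is a has geometrically distributed run lengths with mean 1/(1 - a), so setting a_11 = 1 - 1/N gives normal regions an expected length of N, and likewise E for state 2. A minimal sketch of the resulting transition matrix, assuming state 1 is index 0 and state 2 is index 1; transition_matrix, expected_run_length, and the example values of N and E are hypothetical.

```python
from typing import List


def transition_matrix(N: float, E: float) -> List[List[float]]:
    """Transition matrix for the two-state copy-number HMM.

    Index 0 = state 1 (normal copy number), index 1 = state 2 (elevated).
    """
    # Require N, E > 1 so that a11 and a22 stay strictly positive.
    if N <= 1 or E <= 1:
        raise ValueError("expected region lengths must be greater than 1")
    a12 = 1.0 / N
    a21 = 1.0 / E
    return [[1.0 - a12, a12],   # from state 1: stay (a11), switch (a12)
            [a21, 1.0 - a21]]   # from state 2: switch (a21), stay (a22)


def expected_run_length(self_transition: float) -> float:
    """Mean of the geometric run length for a state with this self-transition."""
    return 1.0 / (1.0 - self_transition)


T = transition_matrix(N=10000, E=1000)  # hypothetical expected lengths
print(T[0][1], expected_run_length(T[0][0]))
```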

Poisson-distributed emission probabilities
  $p = \dfrac{m^{r} e^{-m}}{r!}$, where $m$ is the state's mean and $r$ the number of read starts
  How do you calculate this?
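One way to answer "How do you calculate this?" is to work in log space: m^r and r! overflow quickly for large r, whereas r ln m - m - ln r! stays well scaled, with ln r! available as math.lgamma(r + 1). Because the symbols here are 0, 1, 2, and >=3, the last probability is the tail 1 - P(0) - P(1) - P(2). A minimal sketch; the function names and the example mean m_1 = 0.1 are assumptions for illustration.

```python
import math
from typing import List


def poisson_pmf(m: float, r: int) -> float:
    """P(R = r) for R ~ Poisson(m), computed in log space for stability."""
    if m <= 0:
        raise ValueError("mean must be positive")
    return math.exp(r * math.log(m) - m - math.lgamma(r + 1))


def emission_probs(m: float) -> List[float]:
    """Probabilities of the symbols 0, 1, 2, >=3 read starts at a site."""
    p = [poisson_pmf(m, r) for r in range(3)]
    # Lump the tail r >= 3 into one symbol; clamp tiny negative rounding error.
    p.append(max(0.0, 1.0 - sum(p)))
    return p


m1 = 0.1                       # hypothetical average read starts per site
e1 = emission_probs(m1)        # state 1 (normal)
e2 = emission_probs(1.5 * m1)  # state 2 (elevated), m2 = 1.5 m1
print(e1)
print(e2)
```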

Scoring function
  Maximum dropoff:
  Minimum segment score:
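The formulas for the scoring function, maximum dropoff, and minimum segment score did not survive in this transcript. Purely as an illustration, the sketch below assumes a per-site log-odds score that folds in the self-transitions, score(r) = ln(a_22 e_2(r)) - ln(a_11 e_1(r)), together with D = ln(a_12 a_21 / (a_11 a_22)) and S = -D. These are assumptions, not the slide's definitions; they are chosen because they reproduce every trend listed under "What happens when we…", and the final lines check those trends for hypothetical values of N, E, and m_1.

```python
import math
from typing import List, Tuple


def poisson_pmf(m: float, r: int) -> float:
    """P(R = r) for R ~ Poisson(m), computed in log space."""
    return math.exp(r * math.log(m) - m - math.lgamma(r + 1))


def emissions(m: float) -> List[float]:
    """Probabilities of the symbols 0, 1, 2, >=3 read starts."""
    p = [poisson_pmf(m, r) for r in range(3)]
    return p + [max(0.0, 1.0 - sum(p))]


def assumed_scoring(N: float, E: float, m1: float) -> Tuple[List[float], float, float]:
    """ASSUMED per-symbol scores, maximum dropoff D, and minimum score S.

    score(r) = ln(a22 * e2(r)) - ln(a11 * e1(r))
    D        = ln(a12 * a21 / (a11 * a22))   (negative)
    S        = -D
    These formulas are illustrative assumptions, not taken from the slides.
    """
    a12, a21 = 1.0 / N, 1.0 / E
    a11, a22 = 1.0 - a12, 1.0 - a21
    e1, e2 = emissions(m1), emissions(1.5 * m1)
    scores = [math.log(a22 * q2) - math.log(a11 * q1) for q1, q2 in zip(e1, e2)]
    D = math.log((a12 * a21) / (a11 * a22))
    return scores, D, -D


# Check the listed trends with hypothetical parameter values.
base = assumed_scoring(N=10000, E=1000, m1=0.1)
more_N = assumed_scoring(N=20000, E=1000, m1=0.1)
more_E = assumed_scoring(N=10000, E=2000, m1=0.1)
print(all(b < a for a, b in zip(base[0], more_N[0])), more_N[1] < base[1], more_N[2] > base[2])
print(all(b > a for a, b in zip(base[0], more_E[0])), more_E[1] < base[1], more_E[2] > base[2])
```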

[Figure: cumulative score vs. sequence position, with levels 0, D, and S marked]
Why does S need to be >= -D?

What happens when we…
  increase N (expected length of normal copy number region)?
    decrease $a_{12}$ and increase $a_{11}$
    decrease scores
    decrease D (allow larger dropoff)
    increase S (require higher score)
  increase E (expected length of elevated copy number region)?
    decrease $a_{21}$ and increase $a_{22}$
    increase scores
    decrease D (allow larger dropoff)
    increase S (require higher score)
  $a_{12} = 1/N$, $a_{11} = 1 - 1/N$
  $a_{21} = 1/E$, $a_{22} = 1 - 1/E$
  $e_1(r) = \mathrm{Poisson}(m_1, r)$
  $e_2(r) = \mathrm{Poisson}(1.5\,m_1, r)$

late.txt

“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” - Donald Knuth, 1974

What to do when Python is too slow
  profile
  use Cython
  use a different programming language

Cython
  Superset of Python
  Static type declarations
  Source code is translated into optimized C code, then compiled as Python extension modules
  (examples)

Profiling
  Which parts of the code are taking the most time?
  Python line_profiler example
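line_profiler is a third-party package (typically run as kernprof -l -v script.py on functions decorated with @profile) that reports time per line. As a dependency-free stand-in, the sketch below uses the standard library's cProfile and pstats, which report time per function rather than per line. The workload function count_read_starts and the helper profile_call are hypothetical.

```python
import cProfile
import io
import pstats


def count_read_starts(positions, n_sites):
    """Deliberately simple loop that gives the profiler something to measure."""
    counts = [0] * n_sites
    for p in positions:
        counts[p] += 1
    return counts


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    buf = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first.
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


counts, report = profile_call(count_read_starts, [1, 3, 3, 7], 10)
print(report)
```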

Unix tools
  Regular expressions
  grep
  sed
  AWK

Regular expressions
  Sequence of characters that defines a search pattern
  banana matches the text banana
  … matches addresses
  Easier to write than read...
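Python's re module is a convenient place to try patterns. The sketch below shows a literal pattern matching only itself, and a hypothetical pattern (not from the slides) that uses a group and a repetition operator to match a whole family of strings, which illustrates why patterns are easier to write than to read. Python's regex syntax differs in details from the POSIX syntax used by grep and sed; find_all is a hypothetical helper name.

```python
import re
from typing import List


def find_all(pattern: str, text: str) -> List[str]:
    """Return every non-overlapping match of `pattern` in `text`."""
    return re.findall(pattern, text)


# A literal pattern matches only that exact text.
print(find_all(r"banana", "banana bread, banana split"))  # ['banana', 'banana']

# 'b', then one or more 'an', then 'a': matches bana, banana, bananana, ...
# (?:...) groups without capturing, so findall returns whole matches.
print(find_all(r"b(?:an)+a", "ba bana banana bananana"))  # ['bana', 'banana', 'bananana']
```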

grep (globally search a regular expression and print)
  grep '>' sequence.fasta prints all lines containing '>' in sequence.fasta
  grep -c … things.txt prints the number of lines containing addresses in things.txt
  (examples)
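For comparison with the grep commands above, here is a rough Python analogue. It is a sketch that assumes the file has already been read into a list of lines; grep and grep_count are hypothetical helpers, and Python's re syntax is not identical to grep's default (basic) regular expressions. Lines read from a real file, e.g. via open("sequence.fasta"), would also carry trailing newlines.

```python
import re
from typing import Iterable, List


def grep(pattern: str, lines: Iterable[str]) -> List[str]:
    """Rough analogue of `grep PATTERN file`: lines that contain a match."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def grep_count(pattern: str, lines: Iterable[str]) -> int:
    """Rough analogue of `grep -c PATTERN file`: number of matching lines."""
    return len(grep(pattern, lines))


fasta = [">seq1 first", "ACGTACGT", ">seq2 second", "TTGACA"]  # hypothetical
print(grep(">", fasta))        # ['>seq1 first', '>seq2 second']
print(grep_count(">", fasta))  # 2
```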

sed (stream editor)
  Makes changes in a file
  s for substitution
  sed 's/day/night/' old > new changes the first occurrence of day on each line of old to night, writing the result to new
  (examples)
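A rough Python analogue of sed's s command, assuming the input is a list of lines; sed_substitute is a hypothetical helper. Passing count=1 to re.sub mirrors sed's default of replacing only the first match on each line, and count=0 (replace all) corresponds to adding sed's g flag. Replacement-string syntax differs: sed uses & for the whole match, Python uses \g<0>.

```python
import re
from typing import Iterable, Iterator


def sed_substitute(pattern: str, replacement: str, lines: Iterable[str]) -> Iterator[str]:
    """Rough analogue of sed 's/pattern/replacement/': first match per line only."""
    regex = re.compile(pattern)
    for line in lines:
        yield regex.sub(replacement, line, count=1)


old = ["day by day", "no match here", "Monday"]  # hypothetical input
print(list(sed_substitute("day", "night", old)))
# ['night by day', 'no match here', 'Monnight']
```

As with sed, the pattern is matched anywhere in the line, which is why "Monday" also changes.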

AWK
  Data extraction and reporting
  pattern { action }
  pattern specifies a test that is performed on each line read as input; action runs on each line that passes the test
  Useful for processing tables of data
  (examples)
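The pattern { action } model maps onto a loop over lines with a test and a transformation. The sketch below assumes whitespace-separated fields and a hypothetical three-column table (chromosome, position, read starts); awk is a hypothetical helper name, and fields[0] plays the role of awk's $1. It mimics a command such as awk '$3 > 0 { print $1, $2 }'.

```python
from typing import Callable, Iterable, List


def awk(lines: Iterable[str],
        pattern: Callable[[List[str]], bool],
        action: Callable[[List[str]], str]) -> List[str]:
    """Rough analogue of awk 'pattern { action }' on whitespace-separated fields.

    fields[0] corresponds to awk's $1, fields[1] to $2, and so on.
    """
    out = []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines (a simplification)
            continue
        if pattern(fields):
            out.append(action(fields))
    return out


table = ["chr1 100 0", "chr1 101 3", "chr2 50 1"]  # hypothetical table
# Like: awk '$3 > 0 { print $1, $2 }'
print(awk(table, lambda f: int(f[2]) > 0, lambda f: f"{f[0]} {f[1]}"))
# ['chr1 101', 'chr2 50']
```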

If you were a soon-to-graduate college senior or Ph.D. and you didn't have any "baggage", what kind of research would you want to do? Or would you even choose research again? I think the most exciting computer research now is partly in robotics, and partly in applications to biochemistry…It is hard for me to say confidently that, after fifty more years of explosive growth of computer science, there will still be a lot of fascinating unsolved problems at peoples' fingertips, that it won't be pretty much working on refinements of well-explored things. Maybe all of the simple stuff and the really great stuff has been discovered. It may not be true, but I can't predict an unending growth. I can't be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on, it's at that level. - Donald Knuth, 2006