Discussion Section 3 HW1 comments HW2 questions

Slides:



Advertisements
Similar presentations
Indexing DNA Sequences Using q-Grams
Advertisements

Combinatorial Pattern Matching CS 466 Saurabh Sinha.
Contest format 5 hours, around 8-12 problems One computer running (likely)Linux, plus printer 3 people on one machine No cell phones, calculators, USB.
Max Cut Problem Daniel Natapov.
Analysis of Algorithms Depth First Search. Graph A representation of set of objects Pairs of objects are connected Interconnected objects are called “vertices.
Gene Prediction: Similarity-Based Approaches (selected from Jones/Pevzner lecture notes)
NP-Complete Problems Reading Material: Chapter 10 Sections 1, 2, 3, and 4 only.
NP-Complete Problems Problems in Computer Science are classified into
SubSea: An Efficient Heuristic Algorithm for Subgraph Isomorphism Vladimir Lipets Ben-Gurion University of the Negev Joint work with Prof. Ehud Gudes.
A452 – Programming project – Mark Scheme
D1: Matchings.
CS 330 Algorithms: Lecture 1 Shang-Hua Teng Department of Computer Science, Boston University.
LECTURE 2 Splicing graphs / Annoteted transcript expression estimation.
Lecture 13 Graphs. Introduction to Graphs Examples of Graphs – Airline Route Map What is the fastest way to get from Pittsburgh to St Louis? What is the.
1 Programming Languages Tevfik Koşar Lecture - II January 19 th, 2006.
1 3-COLOURING: Input: Graph G Question: Does there exist a way to 3-colour the vertices of G so that adjacent vertices are different colours? 1.What could.
Review for Final Andy Wang Data Structures, Algorithms, and Generic Programming.
Memory Management during Run Generation in External Sorting – Larson & Graefe.
ITEC 2620A Introduction to Data Structures Instructor: Prof. Z. Yang Course Website: 2620a.htm Office: TEL 3049.
Sequencing The most simple type of program uses sequencing, a set of instructions carried out one after another. Start End Display “Computer” Display “Science”
Quiz Week 8 Topical. Topical Quiz (Section 2) What is the difference between Computer Vision and Computer Graphics What is the difference between Computer.
HW4: sites that look like transcription start sites Nucleotide histogram Background frequency Count matrix for translation start sites (-10 to 10) Frequency.
Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From
Homework 2 (due:April 17th) Deadline : April 17th 11:59pm Where to submit? eClass “ 과제방 ” ( How to submit? Create a folder. The.
Graphs and Shortest Paths Using ADTs and generic programming.
Maximal D-segments Maximal-scoring No subsegment has higher score No segment properly containing the segment satisfies the above No supersegment has higher.
ICS 353: Design and Analysis of Algorithms NP-Complete Problems King Fahd University of Petroleum & Minerals Information & Computer Science Department.
BCA-II Data Structure Using C Submitted By: Veenu Saini
C++ Memory Management – Homework Exercises
Chapter 10 NP-Complete Problems.
Homework 3 (due:May 27th) Deadline : May 27th 11:59pm
SAGExplore web server tutorial for Module III:
Week 10.
Advanced Algorithms and DS CMPS 3013
CSE 2331/5331 Topic 9: Basic Graph Alg.
Week 7 Part 2 Kyle Dewey.
Homework 3 (due:June 5th)
week 1 - Introduction Goals
Introduction to Python
The Taxi Scheduling Problem
Discussion section #2 HW1 questions?
First discussion section agenda
Introduction to Python
ICS 353: Design and Analysis of Algorithms
Finding a Eulerian Cycle in a Directed Graph
Chapter 2 Lin and Dyer & MapReduce Basics Chapter 2 Lin and Dyer &
Elementary Graph Algorithms
What is a Graph? a b c d e V= {a,b,c,d,e} E= {(a,b),(a,c),(a,d),
GreedyMaxCut a b w c d e 1.
ITEC 2620M Introduction to Data Structures
Warm-Up 1) Write the Now-Next equation for each sequence of numbers. Then find the 10th term of the sequence. a) – 3, 5, 13, 21, … b) 2, – 12, 72, – 432,
Week 4 Genome 540: Discussion
Topic 5: Heap data structure heap sort Priority queue
Lecture 11 CSE 331 Sep 23, 2011.
Week 5 Discussion Section
Y x Linear vs. Non-linear.
Homework 2 (due:May 15th) Deadline : May 15th 11:59pm
Homework 2 (due:May 5th) Deadline : May 5th 11:59pm
Homework Assignment #2 SIC/XE Assembler
Important Problem Types and Fundamental Data Structures
Genome 540: Discussion Section Week 3
Discussion Section Week 9
Chapter 2 Lin and Dyer & MapReduce Basics Chapter 2 Lin and Dyer &
Analysis and design of algorithm
Huffman Coding Greedy Algorithm
Lecture 10 Graph Algorithms
Homework 2 (due:May 13th) Deadline : May 11th 11:59pm
Introduction to Python
Lecture 11 CSE 331 Sep 20, 2013.
Finding the maximum matching of a bipartite graph
Presentation transcript:

Discussion Section 3 HW1 comments HW2 questions Maximal D-Segment algorithm Useful data structures

HW1 comments Testing Working together Formatting Test cases with known output Built-in checks (e.g. match locations for a string and its reverse complement should have the same position, on opposite strands) Working together It’s okay to compare final output, just not code Formatting Submit a plain text file (not rtf or Word) Include your name in the filename

HW2 questions? Notes: Assume that any input graph text file lists the vertices in depth order Write your representation of the graph image in depth order Make sure you write the sequence graph file in depth order How do you find the vertex at the beginning of the path?

Maximal segment vs. Maximal D-segment No subsegment has a higher score No segment properly containing the segment satisfies the above condition Does NOT imply that no supersegment has a higher score Maximal D-segment: No subsegment has score < D, where dropoff D < 0 No D-segment properly containing the D-segment satisfies the above condition The segment score must be >= S, where S >= -D

D cumulative score S find the maximal d segments sequence position

D cumulative score S sequence position

D cumulative score S sequence position

Pseudo-code for the D-segment algorithm:

D = -3 S = 3 max = 0 start = 1 end = 1 cumul = 0 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 0 start = 1 end = 1 cumul = 0

D = -3 S = 3 max = 0 start = 2 end = 2 cumul = 0 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 0 start = 2 end = 2 cumul = 0

D = -3 S = 3 max = 0 start = 2 end = 2 cumul = 0 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 0 start = 2 end = 2 cumul = 0

D = -3 S = 3 max = 0 start = 3 end = 3 cumul = 0 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 0 start = 3 end = 3 cumul = 0

D = -3 S = 3 max = 0 start = 3 end = 3 cumul = 0 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 0 start = 3 end = 3 cumul = 0

D = -3 S = 3 max = 0 start = 4 end = 4 cumul = 0 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 0 start = 4 end = 4 cumul = 0

D = -3 S = 3 max = 0 start = 4 end = 4 cumul = 0 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 0 start = 4 end = 4 cumul = 0

D = -3 S = 3 max = 0 start = 5 end = 5 cumul = 0 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 0 start = 5 end = 5 cumul = 0

D = -3 S = 3 max = 0 start = 5 end = 5 cumul = 0 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 0 start = 5 end = 5 cumul = 0

D = -3 S = 3 max = 0.52 start = 5 end = 5 cumul = 0.52 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 0.52 start = 5 end = 5 cumul = 0.52

D = -3 S = 3 max = 0.52 start = 5 end = 5 cumul = 0.52 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 0.52 start = 5 end = 5 cumul = 0.52

D = -3 S = 3 max = 1.62 start = 5 end = 6 cumul = 1.62 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 1.62 start = 5 end = 6 cumul = 1.62

D = -3 S = 3 max = 1.62 start = 5 end = 6 cumul = 1.62 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 1.62 start = 5 end = 6 cumul = 1.62

D = -3 S = 3 max = 1.62 start = 5 end = 6 cumul = 1.12 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 1.62 start = 5 end = 6 cumul = 1.12

D = -3 S = 3 max = 1.62 start = 5 end = 6 cumul = 1.12 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 1.62 start = 5 end = 6 cumul = 1.12

D = -3 S = 3 max = 2.82 start = 5 end = 8 cumul = 2.82 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 2.82 start = 5 end = 8 cumul = 2.82

D = -3 S = 3 max = 2.82 start = 5 end = 8 cumul = 2.82 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 2.82 start = 5 end = 8 cumul = 2.82

D = -3 S = 3 max = 3.34 start = 5 end = 9 cumul = 3.34 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 3.34 start = 5 end = 9 cumul = 3.34

D = -3 S = 3 max = 3.34 start = 5 end = 9 cumul = 3.34 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 3.34 start = 5 end = 9 cumul = 3.34

D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 4.44 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 4.44

D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 4.44 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 4.44

D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 3.94 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 3.94

D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 3.94 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 3.94

D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 3.44 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 3.44

D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 3.44 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 3.44

D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 2.94 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 2.94

D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 2.94 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 2.94

D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 2.44 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 2.44

D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 2.44 position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 2.44

D-segment: 5, 10, 4.44 (start, end, max) position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # read starts 0 0 0 0 1 2 0 4 1 2 0 0 0 0 score -0.5 -0.5 -0.5 -0.5 0.52 1.1 -0.5 1.7 0.52 1.1 -0.5 -0.5 -0.5 -0.5 D = -3 S = 3 max = 4.44 start = 5 end = 10 cumul = 2.44 D-segment: 5, 10, 4.44 (start, end, max)

HW3 Due 11:59pm on Sunday, January 28 Assignment: use D-segment algorithm to identify sequence segments with high copy number. Input: Count file reporting number of read starts at each location Scoring scheme Output: Number of normal and elevated copy-number segments List of elevated copy-number segments (start, end, score) Annotations for the first three segments (look up using UCSC genome browser) Histograms of read-start counts (i.e. number of positions with 0, 1, 2, and >=3 read-starts) for non-elevated and elevated segments

Useful data structures Arrays Fast indexing, pointer math is easy Linked lists Inserting/deleting/reordering is easy Hash tables/maps Good for looking up things Trees Useful for sorting/searching More on pros/cons of different data structures, and the runtimes of typical operations here.