
Discussion Section 3 HW1 comments HW2 questions


1 Discussion Section 3
- HW1 comments
- HW2 questions
- Maximal D-segment algorithm
- Useful data structures

2 HW1 comments
Testing
- Test cases with known output
- Built-in checks (e.g. match locations for a string and its reverse complement should have the same position, on opposite strands)
Working together
- It's okay to compare final output, just not code
Formatting
- Submit a plain text file (not RTF or Word)
- Include your name in the filename
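A built-in check of the kind mentioned under Testing could look like the sketch below. It is illustrative only: the helper names (`reverse_complement`, `find_all`, `check_strand_symmetry`) and the toy sequence are assumptions, not part of the HW1 specification, and positions here are 0-based.

```python
# Complement table for the four DNA bases (assumes an uppercase ACGT alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text, pattern):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    positions = []
    i = text.find(pattern)
    while i != -1:
        positions.append(i)
        i = text.find(pattern, i + 1)
    return positions


def check_strand_symmetry(genome, pattern):
    """Matches of pattern on the forward strand should coincide with matches
    of its reverse complement on the opposite strand, once the reverse-strand
    coordinates are converted back to forward-strand coordinates."""
    n, m = len(genome), len(pattern)
    forward = find_all(genome, pattern)
    reverse = find_all(reverse_complement(genome), reverse_complement(pattern))
    reverse_as_forward = sorted(n - m - j for j in reverse)
    return forward == reverse_as_forward


genome = "ACGTTGCAACGT"  # toy sequence, made up for illustration
print(find_all(genome, "ACG"), check_strand_symmetry(genome, "ACG"))
```

If the check ever returns False, the bug is in the matching or reverse-complement code rather than in the data.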

3 HW2 questions?
Notes:
- Assume that any input graph text file lists the vertices in depth order
- Write your representation of the graph image in depth order
- Make sure you write the sequence graph file in depth order
How do you find the vertex at the beginning of the path?

4 Maximal segment vs. Maximal D-segment
Maximal segment:
- No subsegment has a higher score
- No segment properly containing the segment satisfies the above condition
- Does NOT imply that no supersegment has a higher score
Maximal D-segment:
- No subsegment has score < D, where dropoff D < 0
- No D-segment properly containing the D-segment satisfies the above condition
- The segment score must be >= S, where S >= -D
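The D-segment conditions can be checked by brute force on small inputs, which is handy for building test cases with known output. The sketch below is an assumption-laden illustration: the function name `is_d_segment` and the toy scores are made up, positions are 1-based and inclusive, and it tests only the dropoff and threshold conditions, not maximality.

```python
def is_d_segment(scores, start, end, D, S):
    """Check the two non-maximality conditions for scores[start..end]
    (1-based, inclusive): no subsegment scores below D, and the whole
    segment scores at least S. Quadratic time; for small tests only."""
    segment = scores[start - 1:end]
    for i in range(len(segment)):
        running = 0.0
        for j in range(i, len(segment)):
            running += segment[j]
            if running < D:  # found a subsegment that drops below D
                return False
    return sum(segment) >= S


toy_scores = [2.0, -1.0, -1.0, -1.5, 4.0]  # made-up values
print(is_d_segment(toy_scores, 1, 5, D=-3, S=3))  # positions 2-4 sum to -3.5
print(is_d_segment(toy_scores, 5, 5, D=-3, S=3))
```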

5

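6 [Figure: cumulative score plotted against sequence position, with D and S marked. Task: find the maximal D-segments]

7 [Figure: cumulative score plotted against sequence position, with D and S marked]

8 [Figure: cumulative score plotted against sequence position, with D and S marked]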

9 Pseudo-code for the D-segment algorithm:
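The pseudo-code itself is not preserved in this transcript. The following is a minimal single-pass sketch that is consistent with the variables (max, start, end, cumul) and the state changes shown in the walkthrough on slides 10-39; it should not be read as the slide's exact pseudo-code. The function name, the reset conditions, and the example scores are assumptions: the scores for positions 5-14 are inferred from the differences in cumul in the walkthrough, and the scores for positions 1-4 are assumed to be -0.5 (the walkthrough only shows that they are non-positive).

```python
def maximal_d_segments(scores, D, S):
    """Return a list of (start, end, score) tuples, 1-based and inclusive.

    D is the (negative) dropoff and S the minimum reported score.
    """
    segments = []
    cumul = 0.0          # cumulative score since the current start
    best = 0.0           # "max" on the slides: highest cumul seen so far
    start = end = 1
    n = len(scores)
    for i, s in enumerate(scores, start=1):
        cumul += s
        if cumul >= best:            # new high point: extend the segment end
            best = cumul
            end = i
        # Close the candidate when the score falls to zero or below, drops
        # by |D| or more from its maximum, or the sequence ends.
        if cumul <= 0 or cumul <= best + D or i == n:
            if best >= S:
                segments.append((start, end, best))
            cumul = best = 0.0
            start = end = i + 1
    return segments


# Scores inferred from the walkthrough (positions 1-4 are assumed values).
walkthrough_scores = [-0.5] * 4 + [0.52, 1.10, -0.5, 1.70, 0.52, 1.10] + [-0.5] * 4
for start, end, score in maximal_d_segments(walkthrough_scores, D=-3, S=3):
    print(start, end, f"{score:.2f}")  # prints: 5 10 4.44
```

Each position is visited once, so the running time is linear in the sequence length.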

10-39 Walkthrough of the D-segment algorithm, with D = -3 and S = 3 throughout. Each slide also shows a table of position, # read starts, and score; its values are not preserved in this transcript.

slides   max    start  end  cumul
10       0      1      1    0
11-12    0      2      2    0
13-14    0      3      3    0
15-16    0      4      4    0
17-18    0      5      5    0
19-20    0.52   5      5    0.52
21-22    1.62   5      6    1.62
23-24    1.62   5      6    1.12
25-26    2.82   5      8    2.82
27-28    3.34   5      9    3.34
29-30    4.44   5      10   4.44
31-32    4.44   5      10   3.94
33-34    4.44   5      10   3.44
35-36    4.44   5      10   2.94
37-38    4.44   5      10   2.44

39 D-segment: 5, 10, 4.44 (start, end, max)

40 HW3 Due 11:59pm on Sunday, January 28
Assignment: use the D-segment algorithm to identify sequence segments with high copy number.
Input:
- Count file reporting number of read starts at each location
- Scoring scheme
Output:
- Number of normal and elevated copy-number segments
- List of elevated copy-number segments (start, end, score)
- Annotations for the first three segments (look up using the UCSC Genome Browser)
- Histograms of read-start counts (i.e. number of positions with 0, 1, 2, and >=3 read starts) for non-elevated and elevated segments
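The histogram output could be computed along the lines of the sketch below. It assumes the read-start counts are already loaded into a list indexed by position and that elevated segments are given as 1-based inclusive (start, end, score) tuples; the function names and the toy data are hypothetical, and the count-file format is not specified here.

```python
from collections import Counter


def read_start_histogram(counts):
    """Count positions with 0, 1, 2, and >=3 read starts."""
    hist = Counter(min(c, 3) for c in counts)  # lump everything >= 3 together
    return {"0": hist[0], "1": hist[1], "2": hist[2], ">=3": hist[3]}


def split_histograms(counts, elevated_segments):
    """Return (non_elevated_histogram, elevated_histogram)."""
    in_elevated = [False] * len(counts)
    for start, end, *_ in elevated_segments:
        for pos in range(start, end + 1):      # 1-based, inclusive
            in_elevated[pos - 1] = True
    elevated = [c for c, flag in zip(counts, in_elevated) if flag]
    background = [c for c, flag in zip(counts, in_elevated) if not flag]
    return read_start_histogram(background), read_start_histogram(elevated)


toy_counts = [0, 0, 1, 3, 5, 2, 0]             # made-up read-start counts
toy_segments = [(4, 6, 9.9)]                   # made-up elevated segment
print(split_histograms(toy_counts, toy_segments))
```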

41 Useful data structures
- Arrays: fast indexing, pointer math is easy
- Linked lists: inserting/deleting/reordering is easy
- Hash tables/maps: good for looking up things
- Trees: useful for sorting/searching
More on pros/cons of different data structures, and the runtimes of typical operations here.
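For a rough feel of how these map onto a scripting language, here is a small Python sketch. The mapping is loose and partly an assumption: `collections.deque` stands in for a linked list, and a sorted list searched with `bisect` stands in for a search tree, since the Python standard library has no built-in tree. All values and keys are made up.

```python
from bisect import bisect_left, insort
from collections import deque

# Array-like: a list gives constant-time indexing by position.
counts = [0, 2, 1, 0, 3]
third = counts[2]

# Linked-list-like: a deque gives cheap insertion/removal at either end.
queue = deque(["a", "c"])
queue.insert(1, "b")      # insert in the middle
queue.popleft()           # remove from the front

# Hash table/map: a dict gives fast lookup by key.
annotations = {"chr1:100": "geneA"}   # hypothetical position -> annotation
hit = annotations.get("chr1:100")

# Tree-like searching: keep values sorted and binary-search them.
sorted_scores = []
for score in [3.34, 0.52, 4.44]:
    insort(sorted_scores, score)      # insert while keeping sorted order
index = bisect_left(sorted_scores, 3.34)

print(third, list(queue), hit, sorted_scores, index)
```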

