Topics read length distribution genome coverage

Presentation transcript:

Topics
  read length distribution
  genome coverage
Practical Biocomputing 2018 Week 12

Insert length distribution
  map reads to reference (bowtie2)
  select reads where both mates map with high quality (samtools)
  python
    calculate mean, standard deviation
    plot histogram
    example: https://matplotlib.org/1.2.1/examples/pylab_examples/histogram_demo.html

insert_size.py

    """=================================================================================================
    insert_size.py

    Calculate insert size based on a SAM file of mapped reads. To get only high quality mapped
    read pairs use the samtools command

        samtools view -q 20 -f 0x82 SRR5295840.bam > SRR5295840.mapped

    SAM format is (all one line, whitespace separated fields)

    SRR5295840.120 163 AT1G07250.1 101 44 1S150M = 347 398 NCT...CAG #A<...FJF AS:i:300 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:150 YS:i:300 YT:Z:CP

    the insert length is field 8 (counting from 0), the last field before the sequence

    Michael Gribskov 1 April 2018
    ================================================================================================="""
    import sys

    map = None
    try:
        map = open(sys.argv[1], 'r')
    except OSError:
        print('unable to open input file ({})'.format(sys.argv[1]))
        sys.exit(1)

    # first test: echo the first few lines to check that the file is read
    nread = 0
    for line in map:
        nread += 1
        print(line)
        if nread > 10:
            break

    print('{} reads read from {}'.format(nread, sys.argv[1]))
    sys.exit(0)
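As a minimal sketch of the "field 8" remark in the docstring above: the insert length can be pulled out of a single SAM record by splitting on whitespace. The helper name `insert_length` and the shortened record (SEQ and QUAL abbreviated, optional tags dropped) are illustrative assumptions, not part of insert_size.py.

```python
def insert_length(sam_line):
    """Return TLEN, field 8 counting from 0, of one SAM record as an int."""
    field = sam_line.split()
    return int(field[8])


# shortened record modeled on the docstring example; SEQ and QUAL are abbreviated
record = 'SRR5295840.120 163 AT1G07250.1 101 44 1S150M = 347 398 NCT #A<'
print(insert_length(record))
```

Converting to a number at this point avoids carrying strings into the later statistics and plotting steps.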

insert_size.py
  histogram
    boilerplate from example
    difficult to decipher error

    import sys
    import matplotlib.mlab as mlab
    import matplotlib.pyplot as plt

    map = None
    try:
        map = open(sys.argv[1], 'r')
    except:
        print('unable to open input file ({}'.format(sys.argv[1]))
        exit(1)

    nread = 0
    lendata = []
    for line in map:
        nread += 1
        field = line.split()
        print('{}\t{}'.format(field[0], field[8]))
        lendata.append(field[8])
        if nread > 1000:
            break

    print('\n{} reads read from {}'.format(nread, sys.argv[1]))

    n, bins, patches = plt.hist(lendata, bins=100, normed=1, facecolor='blue', alpha=0.75)

    plt.xlabel('Length')
    plt.ylabel('Probability')
    plt.title('Library Insert Length')
    plt.axis([40, 160, 0, 0.03])
    plt.grid(True)
    plt.show()

    exit(0)

    Traceback (most recent call last):
      File "/scratch/snyder/m/mgribsko/biocomputing/utils/insert_size.py", line 38, in <module>
        n, bins, patches = plt.hist(lendata, bins=100, normed=1, facecolor='blue', alpha=0.75)
      File "/apps/rhel6/Anaconda/4.4.0-py36/lib/python3.6/site-packages/matplotlib/pyplot.py", line 3081, in hist
        stacked=stacked, data=data, **kwargs)
      File "/apps/rhel6/Anaconda/4.4.0-py36/lib/python3.6/site-packages/matplotlib/__init__.py", line 1898, in inner
        return func(ax, *args, **kwargs)
      File "/apps/rhel6/Anaconda/4.4.0-py36/lib/python3.6/site-packages/matplotlib/axes/_axes.py", line 6180, in hist
        if len(xi) > 0:
    TypeError: len() of unsized object

insert_size.py
  it turns out that the data vector (lendata) must be floats; actually they were strings
  now i get a plot but there's nothing in it
  but if i run in the debugger i get something different: the plot disappears when the plt.axis() command runs
  duh, in the example the data is in a different range; axis() sets the plot limits

    nread = 0
    lendata = []
    for line in map:
        nread += 1
        field = line.split()
        print('{}\t{}'.format(field[0], field[8]))
        lendata.append(float(field[8]))     # convert the string to a number
        if nread > 1000:
            break

    print('\n{} reads read from {}'.format(nread, sys.argv[1]))

    # density=True replaces normed=1, which was removed in Matplotlib 3.1
    n, bins, patches = plt.hist(lendata, bins=100, density=True, facecolor='blue', alpha=0.75)

    plt.xlabel('Length')
    plt.ylabel('Probability')
    plt.title('Library Insert Length')
    # these limits were copied from the example and hide this data; drop or adjust them
    plt.axis([40, 160, 0, 0.03])
    plt.grid(True)
    plt.show()
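Both problems described above (string data, and axis limits copied from an example with a different data range) can be handled before any plotting call. The following is a small sketch under stated assumptions: `plot_limits` is a hypothetical helper, and the `raw` values are made up for illustration.

```python
def plot_limits(data, margin=0.05):
    """Return (low, high) limits that enclose data with a fractional margin."""
    lo, hi = min(data), max(data)
    pad = (hi - lo) * margin
    return lo - pad, hi + pad


# values as they come out of line.split(): strings, not numbers
raw = ['398', '402', '-398', '350']
lendata = [float(x) for x in raw]

# limits derived from the data instead of hardwired from an example
xmin, xmax = plot_limits(lendata, margin=0.25)
print(xmin, xmax)
```

The resulting pair could be passed to `plt.xlim(xmin, xmax)`; leaving the limits out entirely and letting Matplotlib autoscale is the simpler option.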

insert_size.py
  it works!
  problems
    i have some negative lengths
    i want the bars to have a black outline

    # the insert length in the SAM file is signed; keep its magnitude
    insert = abs(float(field[8]))
    lendata.append(insert)

    # density=True replaces normed=1, which was removed in Matplotlib 3.1
    n, bins, patches = plt.hist(lendata, bins=100, density=True, facecolor='blue',
                                edgecolor='black', linewidth=0.25, alpha=0.75)

insert_size.py
  all 15.7 M reads

insert_size.py
  bells and whistles
    calculate mean and standard deviation

    import statistics as stat
    lenmean = stat.mean(lendata)
    lensd = stat.stdev(lendata)

    write mean and standard deviation on plot
    draw mean line on plot

    # the following is for adding annotation to the figure
    # must do it before plotting the histogram
    fig = plt.figure()
    ax = fig.add_subplot(111)

    # density=True replaces normed=1, which was removed in Matplotlib 3.1
    n, bins, patches = plt.hist(lendata, bins=100, density=True, facecolor='blue',
                                edgecolor='black', linewidth=0.25, alpha=0.75)

    plt.xlabel('Length')
    plt.ylabel('Probability')
    plt.title('Library Insert Length')
    plt.grid(True, linestyle='-', linewidth=0.1)
    # transform=ax.transAxes places the text in axes coordinates (0 to 1), not data coordinates
    plt.text(0.02, 0.9, 'mean: {:.1f}\nstandard deviation: {:.1f}'.format(lenmean, lensd),
             fontsize=10, transform=ax.transAxes)
    plt.axvline(lenmean, color='red', linewidth=1.5)
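A worked example of the mean and standard deviation step, using four made-up insert lengths so the arithmetic can be checked by hand. The `tlen` values are assumptions for illustration only; `statistics.stdev` is the sample standard deviation (divides by n - 1).

```python
import statistics as stat

# made-up signed insert lengths, two of them negative
tlen = [398, -402, 350, -350]
lendata = [abs(float(t)) for t in tlen]

lenmean = stat.mean(lendata)    # (398 + 402 + 350 + 350) / 4
lensd = stat.stdev(lendata)     # sqrt((23**2 + 27**2 + 25**2 + 25**2) / 3)

label = 'mean: {:.1f}\nstandard deviation: {:.1f}'.format(lenmean, lensd)
print(label)
```

The `label` string has the same format as the annotation written on the plot above.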

read_dist.py
  plot the read distribution on a reference sequence
  uses SAM file again
  make an array that corresponds to the sequence
  increment the positions based on the beginning position of the read (POS) and the alignment (CIGAR)

    """=================================================================================================
    Plot the distribution of reads on a reference sequence based on mapped reads in a SAM file

    SAM format is (all one line, whitespace separated fields)

    SRR5295840.120 163 AT1G07250.1 101 44 1S150M = 347 398 NCT...CAG #A<...FJF AS:i:300 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:150 YS:i:300 YT:Z:CP

    SRR5295840.120  QNAME  read name
    163             FLAG   mapping bit flags
    AT1G07250.1     RNAME  reference sequence name
    101             POS    leftmost position of mapped read
    44              MAPQ   mapping quality
    1S150M          CIGAR  alignment
    =               RNEXT  name of mate/next read
    347             PNEXT  position of mate/next read
    398             TLEN   inferred insert size
    NCT...CAG       SEQ    sequence
    #A<...FJF       QUAL   quality

    the remaining columns are application specific
    AS:i:300 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:150 YS:i:300 YT:Z:CP
    ================================================================================================="""
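The eleven mandatory fields listed above can be given names instead of being addressed by index. This is a sketch, not part of read_dist.py: `SamRecord` and `parse_sam` are hypothetical names, and the example record abbreviates SEQ and QUAL.

```python
from collections import namedtuple

# the eleven mandatory SAM columns, in file order
SamRecord = namedtuple(
    'SamRecord', 'qname flag rname pos mapq cigar rnext pnext tlen seq qual')


def parse_sam(line):
    """Split one SAM alignment line and convert the numeric columns to int."""
    f = line.split()
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                     f[6], int(f[7]), int(f[8]), f[9], f[10])


rec = parse_sam('SRR5295840.120 163 AT1G07250.1 101 44 1S150M = 347 398 NCT #A<')
print(rec.pos, rec.cigar, rec.tlen)

# FLAG is a bit field: 163 = 1 + 2 + 32 + 128, so both bits of 0x82 are set
print(rec.flag & 0x82 == 0x82)
```

The last line checks the same two bits that the `samtools view -f 0x82` filter requires (properly paired, second read of the pair).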

read_dist.py

    import sys

    # --------------------------------------------------------------------------------------------------
    # main
    # --------------------------------------------------------------------------------------------------
    if __name__ == '__main__':
        map = None
        try:
            map = open(sys.argv[1], 'r')
        except OSError:
            print('unable to open input file ({})'.format(sys.argv[1]))
            sys.exit(1)

        nread = 0
        seq = []

        # assume that the mapped read begins at POS
        # use the CIGAR string to increment counts in the seq array
        for line in map:
            nread += 1
            field = line.split()
            # print('{}\t{}'.format(field[0], field[8]))
            pos = int(field[3])
            cigar = field[5]

            if nread > 1000:
                break

read_dist.py
  CIGAR string, examples
    151M: perfect match, 151 matching bases
    75S57M19S: 75 do not match, 57 match, 19 do not match (151 bases)
    48S102M1S: 48 do not match, 102 match, 1 does not match
    8S82M8I2M1I36M14S: 8 no match, 82 match, 8 base insertion in read, 2 base match, 1 base insertion in read, 36 base match, 14 base no match

  CIGAR codes
    M  alignment match (can be a sequence match or mismatch)
    I  insertion to the reference
    D  deletion from the reference
    N  skipped region from the reference
    S  soft clipping (clipped sequences present in SEQ)
    H  hard clipping (clipped sequences NOT present in SEQ)
    P  padding (silent deletion from padded reference)
    =  sequence match
    X  sequence mismatch
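The examples above can be split into (length, code) pairs with a regular expression. This is an alternative sketch to a character-by-character loop; `parse_cigar` and `read_length` are hypothetical names. The read length check relies on the fact that M, I, S, = and X are the operations that use up bases of the read.

```python
import re

# one or more digits followed by a single CIGAR operation code
CIGAR_OP = re.compile(r'(\d+)([MIDNSHP=X])')


def parse_cigar(cigar):
    """Return the CIGAR string as a list of (length, code) tuples."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def read_length(cigar):
    """Number of read bases described by the CIGAR string."""
    return sum(n for n, op in parse_cigar(cigar) if op in 'MIS=X')


print(parse_cigar('75S57M19S'))
print(read_length('8S82M8I2M1I36M14S'))
```

Each of the example strings above describes a 151 base read, which gives a quick sanity check on the parser.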

read_dist.py
  CIGAR codes, mostly M and S, a few D or I
  Actions
    S  increment sequence positions but do not count as a match
    M  increment sequence positions and count as a match
    I  (insertion in read) ignore, do not increment position
    D  (deletion in read) ignore, increment sequence position

  Note: in the SAM specification S uses up read bases only, and POS is the position of the first aligned (M) base. Advancing the position for S is therefore a simplification that shifts reads with a leading soft clip to the right by the length of the clip.
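For comparison with the action list above, here is a sketch of how the covered reference interval follows from POS and CIGAR when the SAM specification's rules are applied directly: only M, D, N, = and X advance along the reference, while S, I, H and P do not. This differs from the simplified S rule on the slide, and `reference_span` is a hypothetical helper.

```python
import re


def reference_span(pos, cigar):
    """Return (first, last) reference positions covered, 1-based and inclusive.

    Only operations that advance along the reference (M, D, N, =, X) are counted.
    """
    ops = re.findall(r'(\d+)([MIDNSHP=X])', cigar)
    length = sum(int(n) for n, op in ops if op in 'MDN=X')
    return pos, pos + length - 1


# the record from the docstring: POS 101, CIGAR 1S150M
print(reference_span(101, '1S150M'))
```

With this rule the soft clipped base in `1S150M` does not move the start of the alignment away from POS.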

read_dist.py
  first version, one branch per CIGAR code

    def add_cigar(seq, pos, cigar):
        istr = ''
        m = 0
        for char in cigar:
            if char.isdigit():
                istr += char
                continue

            i = int(istr)
            istr = ''               # start a new count for the next operation
            if char == 'M':
                # matching positions
                for j in range(pos, pos + i):
                    m += 1
                    seq[j] += 1
                pos += i
            elif char == 'S':
                # soft clipped positions
                pos += i
            elif char == 'I':
                # insertions in read, do nothing
                pass
            elif char == 'D':
                # deletion in read, increment pos
                pos += i
            else:
                # must be a character we don't care about,
                # we'll just ignore these for now
                pass

        return m

    # end of add_cigar

  second version, branches merged

    def add_cigar(seq, pos, cigar):
        istr = ''
        m = 0
        for char in cigar:
            if char.isdigit():
                istr += char
                continue

            i = int(istr)
            istr = ''               # start a new count for the next operation
            if char == 'M':
                # matching positions
                for j in range(pos, pos + i):
                    m += 1
                    seq[j] += 1
            elif char not in 'SD':
                # must be a character we don't care about,
                # we'll just ignore these for now
                # I also gets dealt with here
                continue

            # M, S, and D increment the position, S and D do nothing else
            pos += i

        return m

    # end of add_cigar

read_dist.py
  problem with seq list since i did not initialize it
  it's ugly if i hardwire an arbitrary large value such as 50 M
    changes with the genome
    could read from command line
    i just have to know what the biggest sequence might be
  maybe i can do it on the fly?

    if __name__ == '__main__':
        map = None
        try:
            map = open(sys.argv[1], 'r')
        except OSError:
            print('unable to open input file ({})'.format(sys.argv[1]))
            sys.exit(1)

        nread = 0
        seq = []            # empty list: no position can be incremented yet

        # assume that the mapped read begins at POS
        # use the CIGAR string to increment counts in the seq array
        bases_mapped = 0
        for line in map:
            nread += 1
            field = line.split()
            # print('{}\t{}'.format(field[0], field[8]))
            pos = int(field[3])
            cigar = field[5]
            bases_mapped += add_cigar(seq, pos, cigar)

            if nread > 1000:
                break

        print('{} bases mapped'.format(bases_mapped))

        sys.exit(0)

    Traceback (most recent call last):
      File "/scratch/snyder/m/mgribsko/biocomputing/utils/read_dist.py", line 94, in <module>
        bases_mapped += add_cigar( seq, pos, cigar)
      File "/scratch/snyder/m/mgribsko/biocomputing/utils/read_dist.py", line 52, in add_cigar
        seq[pos] += 1
    IndexError: list index out of range
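One way to avoid knowing the sequence length in advance is to key the counts by position instead of indexing a list. The sketch below swaps in `collections.Counter` for the seq list; it is an alternative technique, not the approach the slides take, and `add_match` is a hypothetical helper.

```python
from collections import Counter


def add_match(coverage, pos, length):
    """Add one read's worth of coverage for length positions starting at pos."""
    for j in range(pos, pos + length):
        coverage[j] += 1        # missing keys start at 0, so no IndexError
    return length


coverage = Counter()
bases_mapped = add_match(coverage, 101, 150)     # covers 101 to 250
bases_mapped += add_match(coverage, 200, 100)    # covers 200 to 299

print(bases_mapped, coverage[150], coverage[225], coverage[300])
```

The trade-off is memory and speed: a dictionary entry per covered base costs more than a list element, so for dense coverage of a whole genome a preallocated or growing list is likely the better fit.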

read_dist.py
  fixing the array overrun problem
  detecting the problem
    check the current end of the array versus the new alignment

    def add_cigar(seq, pos, cigar):
        istr = ''
        m = 0
        for char in cigar:
            if char.isdigit():
                istr += char
                continue

            i = int(istr)
            istr = ''               # start a new count for the next operation
            if char == 'M':
                # matching positions
                # check to make sure seq list is big enough, if not add some more elements
                if pos + i + 1 > len(seq):
                    extend_list(seq, pos + i + 10000)

                # range() excludes its end point, so this covers pos .. pos + i - 1
                for j in range(pos, pos + i):
                    m += 1
                    seq[j] += 1
            elif char not in 'SD':
                # must be a character we don't care about, we'll just ignore these for now
                # I gets dealt with here
                continue

            # only M, S, and D fall through to here
            # M, S, and D increment the position, S and D do nothing else
            pos += i

        return m

    # end of add_cigar

read_dist.py
  fixing the array overrun problem
  extend_list() function

    def extend_list(arr, end, init=0):
        """---------------------------------------------------------------------------------------------
        extend the existing list arr by adding indices from the current end of the list to the
        specified end pos (the new last index in list)

        :param arr: list
        :param end: last index to create
        :param init: value to initialize elements with
        :return: int, new list size
        ---------------------------------------------------------------------------------------------"""
        begin = len(arr)
        arr += [init for k in range(begin, end + 1)]

        return len(arr)
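A short usage sketch for extend_list(), with the function restated (docstring omitted) so the block runs on its own. The point it illustrates is that `+=` on a list modifies it in place, so the caller's list grows without being returned or reassigned.

```python
def extend_list(arr, end, init=0):
    """Grow arr in place so that index end exists; return the new length."""
    begin = len(arr)
    arr += [init for k in range(begin, end + 1)]
    return len(arr)


seq = []
size = extend_list(seq, 9)      # indices 0 to 9 now exist
seq[9] += 1                     # no IndexError

same = extend_list(seq, 4)      # already long enough, nothing is added
print(size, same, seq)
```

When `end` is smaller than the current last index the range is empty, so calling the function more often than necessary is harmless.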

read_dist.py
  in testing, i find a few sam lines cause problems
  filter them out (main program)

    bases_mapped = 0
    for line in map:
        if line.startswith('@'):
            # skip header lines
            continue

        field = line.split()
        # print('{}\t{}'.format(field[0], field[8]))
        pos = int(field[3])
        mapq = int(field[4])        # convert, otherwise the comparison with 0 is never true
        cigar = field[5]

        # filter some lines: mapping quality 0 or no alignment
        if mapq == 0 or cigar == '*':
            continue

        nread += 1
        bases_mapped += add_cigar(seq, pos, cigar)

        if nread > 100000:
            break

    print('{} bases mapped'.format(bases_mapped))

    sys.exit(0)
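The filtering rules above (skip header lines, skip reads with mapping quality 0, skip reads without an alignment) can be collected into one predicate that is easy to test on its own. This is a sketch: `keep_record` and its `min_mapq` parameter are hypothetical, and the example lines are made up.

```python
def keep_record(line, min_mapq=1):
    """Return True if a SAM line is an alignment worth counting."""
    if line.startswith('@'):
        return False                # header line
    field = line.split()
    if len(field) < 11:
        return False                # not a complete alignment record
    mapq = int(field[4])            # MAPQ is text in the file; compare as a number
    cigar = field[5]
    return mapq >= min_mapq and cigar != '*'


lines = [
    '@HD VN:1.0 SO:unsorted',
    'r1 4 * 0 0 * * 0 0 ACGT FFFF',
    'r2 163 Chr4 101 44 4M = 347 398 ACGT FFFF',
]
print([keep_record(x) for x in lines])
```

Note that `'0' == 0` is False in Python, which is why the MAPQ field has to be converted before it is compared.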

read_dist.py
  the seq array is pretty sparse so i wrote a function to compress it

  condensed output (position, coverage)

    125166 35     125169 34     125173 37     125174 39
    125176 40     125179 38     125180 37     125184 39
    125185 40     125190 42     125191 43     125192 44
    125193 48     125194 49     125195 50     125197 52
    125202 56     125204 57     125210 60     125213 61
    125214 62     125215 73     125216 76     125217 77
    125218 84     125219 85     125220 92     125222 91
    125223 93     125224 95     125226 97     125227 99
    125228 100    125229 102    125236 114    125238 110
    125240 112    125241 116    125243 118    125249 116
    125250 121    125252 124    125253 127    125255 125
    125258 126    125261 125    125265 126    125270 125
    125272 123    125290 124    125294 122    125295 121
    125298 119    125302 117    125304 118    125305 117
    125308 4      125342 3      125368 4      125401 3
    125422 4      125452 3      125492 2      125551 1
    127052 0      127136 1      127181 0      127238 1
    128630 0      128631 2      128633 6      128637 7
    128650 9      128654 10     128655 14     128659 16
    128660 21     128661 31     128662 33     128664 35
    128665 36     128667 43     128668 44     128670 46
    128671 48     128672 55     128673 59     128674 60

    def condense_seq(seq):
        """---------------------------------------------------------------------------------------------
        condense the seq array by removing consecutive positions with the same value. This allows
        easier printing. For plotting, each interval ends at the indicated position.
            pos1, val1
            pos2, val2
            ...
        SAM files use a 1 origin, so the first interval is 1 to pos1 with value val1, the second
        is pos1 + 1 to pos2 with val2, etc.

        :param seq: list of int, coverage at each base
        :return: list of tuples, (pos, val)
        ---------------------------------------------------------------------------------------------"""
        compressed = []
        if len(seq) < 2:
            # nothing beyond the unused index 0
            return compressed

        cover = seq[1]
        for pos in range(2, len(seq)):
            if seq[pos] != cover:
                compressed.append((pos - 1, cover))
                cover = seq[pos]

        # close the final interval, which ends at the last position
        compressed.append((len(seq) - 1, cover))

        return compressed

  annotation of the region (reverse strand)

    Chr4 gene  122851 125591 ID=AT4G00290;Name=AT4G00290
    Chr4 mRNA  122851 125591 ID=AT4G00290.1;Parent=AT4G00290
    Chr4 3'UTR 122851 123096 ID=AT4G00290:three_prime_UTR:1;Parent=AT4G00290.1;Name=AT4G00290:three_prime_UTR:1
    Chr4 exon  122851 123207 ID=AT4G00290:exon:7;Parent=AT4G00290.1;Name=AT4G00290:exon:7
    Chr4 exon  123362 123583 ID=AT4G00290:exon:6;Parent=AT4G00290.1;Name=AT4G00290:exon:6
    Chr4 exon  123730 123837 ID=AT4G00290:exon:5;Parent=AT4G00290.1;Name=AT4G00290:exon:5
    Chr4 exon  123966 124066 ID=AT4G00290:exon:4;Parent=AT4G00290.1;Name=AT4G00290:exon:4
    Chr4 exon  124271 124548 ID=AT4G00290:exon:3;Parent=AT4G00290.1;Name=AT4G00290:exon:3
    Chr4 exon  124627 125304 ID=AT4G00290:exon:2;Parent=AT4G00290.1;Name=AT4G00290:exon:2
    Chr4 5'UTR 125301 125304 ID=AT4G00290:five_prime_UTR:2;Parent=AT4G00290.1;Name=AT4G00290:five_prime_UTR:2
    Chr4 exon  125477 125591 ID=AT4G00290:exon:1;Parent=AT4G00290.1;Name=AT4G00290:exon:1
    Chr4 5'UTR 125477 125591 ID=AT4G00290:five_prime_UTR:1;Parent=AT4G00290.1;Name=AT4G00290:five_prime_UTR:1
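The condensing step above is a run-length encoding of the coverage list, and the same (end position, value) pairs can be produced with `itertools.groupby`. This is an alternative sketch rather than the function from the slide; `condense` is a hypothetical name, and index 0 of the list is treated as unused because SAM positions start at 1.

```python
from itertools import groupby


def condense(seq):
    """Return [(last position of run, coverage), ...] for seq[1:], 1-based."""
    out = []
    pos = 0
    for value, run in groupby(seq[1:]):
        pos += len(list(run))       # advance to the last position of this run
        out.append((pos, value))
    return out


# index 0 unused; positions 1-2 have coverage 0, 3-5 have 3, position 6 has 1
print(condense([0, 0, 0, 3, 3, 3, 1]))
```

Because groupby emits every run, the final interval is included without a separate step after the loop.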

read_dist.py

    import matplotlib.pyplot as plt
    from matplotlib.ticker import MultipleLocator, FormatStrFormatter

    majorlocator = MultipleLocator(1000)
    minorlocator = MultipleLocator(100)
    majorformatter = FormatStrFormatter('%d')

    fig, ax = plt.subplots(1, 1, figsize=(15, 3))
    # seq is indexed directly by the 1-based SAM position, so index k is position k
    pos = list(range(len(seq)))
    ticks = [i for i in range(3000, 8000) if i % 100 == 0]

    ax.fill(pos, seq, linewidth=0.75)
    # plt.yscale('log')
    plt.xticks(ticks)
    plt.xlim(3000, 8000)
    plt.ylim(0, 150)
    ax.xaxis.set_major_locator(majorlocator)
    ax.xaxis.set_major_formatter(majorformatter)
    ax.xaxis.set_minor_locator(minorlocator)

    ax.set(xlabel='position', ylabel='Read Count', title='Read Distribution')
    ax.grid()
    plt.show()

read_dist.py

    Chr4 exon 4127 4149 ID=AT4G00020:exon:27;Parent=AT4G00020.2;Name=BRCA2(IV):exon:27
    Chr4 CDS  4127 4149 ID=AT4G00020:CDS:27;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:27
    Chr4 exon 4227 4438 ID=AT4G00020:exon:26;Parent=AT4G00020.2;Name=BRCA2(IV):exon:26
    Chr4 CDS  4227 4438 ID=AT4G00020:CDS:25;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:25
    Chr4 CDS  4545 4749 ID=AT4G00020:CDS:22;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:22
    Chr4 CDS  4839 4901 ID=AT4G00020:CDS:21;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:21
    Chr4 CDS  4977 5119 ID=AT4G00020:CDS:19;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:19
    Chr4 CDS  5406 5588 ID=AT4G00020:CDS:18;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:18
    Chr4 CDS  5657 5855 ID=AT4G00020:CDS:17;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:17
    Chr4 CDS  6605 6676 ID=AT4G00020:CDS:16;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:16
    Chr4 CDS  6760 6871 ID=AT4G00020:CDS:15;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:15
    Chr4 CDS  6975 7056 ID=AT4G00020:CDS:14;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:14
    Chr4 CDS  7144 7194 ID=AT4G00020:CDS:13;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:13
    Chr4 CDS  7294 7375 ID=AT4G00020:CDS:12;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:12
    Chr4 CDS  7453 7638 ID=AT4G00020:CDS:11;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:11
    Chr4 CDS  7712 7813 ID=AT4G00020:CDS:10;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:10
    Chr4 CDS  7914 7947 ID=AT4G00020:CDS:8;Parent=AT4G00020.2;Name=BRCA2(IV):CDS:8

read_dist.py
  With exons marked

    # add exon locations
    exon = [('Chr4', 'CDS', 4127, 4149), ('Chr4', 'CDS', 4227, 4438),
            ('Chr4', 'CDS', 4545, 4749), ('Chr4', 'CDS', 4839, 4901),
            ('Chr4', 'CDS', 4977, 5119), ('Chr4', 'CDS', 5406, 5588),
            ('Chr4', 'CDS', 5657, 5855), ('Chr4', 'CDS', 6605, 6676),
            ('Chr4', 'CDS', 6760, 6871), ('Chr4', 'CDS', 6975, 7056),
            ('Chr4', 'CDS', 7144, 7194), ('Chr4', 'CDS', 7294, 7375),
            ('Chr4', 'CDS', 7453, 7638), ('Chr4', 'CDS', 7712, 7813),
            ('Chr4', 'CDS', 7914, 7947)]

    # axhline() takes its x range as fractions of the axis width (0 to 1),
    # so convert from sequence positions using the plotted range 3000 to 8000
    span = 8000 - 3000
    for chrom, feature, first, last in exon:
        begin = (first - 3000) / span
        end = (last - 3000) / span
        plt.axhline(5.0, begin, end, color='black', linewidth=6.0)
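The begin and end values above are fractions of the axis width, which is what `axhline()` expects for its `xmin` and `xmax` arguments. A worked sketch of the conversion for the first exon, with `axes_fraction` as a hypothetical helper and the plotted range 3000 to 8000 taken from the code above:

```python
def axes_fraction(x, xmin, xmax):
    """Convert a data coordinate to a fraction (0 to 1) of the plotted x range."""
    return (x - xmin) / (xmax - xmin)


# first exon: 4127 to 4149 on an axis that runs from 3000 to 8000
begin = axes_fraction(4127, 3000, 8000)     # 1127 / 5000
end = axes_fraction(4149, 3000, 8000)       # 1149 / 5000
print(begin, end)
```

The fractions are only valid for the limits they were computed with, so they must be recalculated if `plt.xlim()` changes. Matplotlib's `hlines()` takes its x range in data coordinates and may be a simpler fit for marking features.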