Object-oriented Design and Programming Conrad Huang PC204, Fall 2004.


1 Object-oriented Design and Programming Conrad Huang PC204, Fall 2004

2 Requirements
I have a bunch of mRNA sequences and I want to know where they are located on the mouse genome. I want a fast method because I plan to do this with lots of data sets.
Note: This is probably better than most requirement statements that you’re likely to get.

3 Specification
Genomic and mRNA sequences are stored in FASTA format
Matches between genomic and mRNA (sub)sequences are found using a localization program, e.g., BLAT, SSAHA or MegaBLAST
Each match is defined by a (start, end) pair on the mRNA and a (chromosome, start, end) 3-tuple on the genomic sequence
The genomic location of an mRNA sequence is defined by a set of matches that maximally covers the mRNA
Note: Imprecise, but workable. Needs statement of what constitutes acceptable results.
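The match definition above can be written down as a small data structure. This is a minimal sketch, not code from the course: the field names, the half-open interval convention, and the helper `mrna_coverage` are all assumptions made for illustration. The helper counts how many mRNA bases are covered by at least one match, which is one way to make "maximally covers" measurable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    # Field names are hypothetical; intervals are assumed half-open [start, end).
    mrna_start: int
    mrna_end: int
    chromosome: str
    genomic_start: int
    genomic_end: int


def mrna_coverage(matches):
    """Return the number of mRNA bases covered by at least one match."""
    covered = 0
    current_end = None
    for m in sorted(matches, key=lambda m: m.mrna_start):
        if current_end is None or m.mrna_start >= current_end:
            # Starts a new covered stretch (no overlap with the previous one).
            covered += m.mrna_end - m.mrna_start
            current_end = m.mrna_end
        elif m.mrna_end > current_end:
            # Overlaps the current stretch: count only the new bases.
            covered += m.mrna_end - current_end
            current_end = m.mrna_end
    return covered
```

A specification of "acceptable results" could then be phrased as a threshold on `mrna_coverage(matches)` relative to the mRNA length.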

4 Design (OO)
Objects
–FASTA file
–Genomic sequence
–mRNA sequence
–Match
–Genomic location
Operations
–Read sequences from file
–Get matches from localization program
–Collate matches into genomic location
Note: Not uh-oh. O-O.

5 Relationships
[Diagram with boxes: FASTA file, Genomic sequences, FASTA file, mRNA sequences, Matches, Genomic locations]
Arrow direction indicates reference or composition

6 Classes
FastaFile
–used for reading both genomic and mRNA sequences
Sequence
–represents either genomic or mRNA sequence
Match
–obtained from localization program output
GenomicLocation
–either obtained from localization program output (BLAT) or composed from matches using our own algorithm (SSAHA or BLAT)
Localization program output parser
Note: Some classes, like FastaFile, can serve as the implementation of more than one concept (file of genomic sequences and file of mRNA sequences)

7 Class Methods
FastaFile
–read(filename)
  parse FASTA file content into a list of Sequence instances
Sequence
–None
  data derived by localization program parser
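As a rough illustration of the `read(filename)` method listed above, here is a minimal sketch. Only the class names and `read(filename)` come from the slides; the attribute names (`name`, `residues`, `sequences`) and the separate `parse` helper are assumptions, and the parser handles only plain FASTA (a `>` header line followed by sequence lines).

```python
class Sequence:
    """Plain data holder; attribute names are assumed for this sketch."""

    def __init__(self, name, residues):
        self.name = name
        self.residues = residues


class FastaFile:
    def __init__(self):
        self.sequences = []

    def read(self, filename):
        """Parse the named FASTA file into a list of Sequence instances."""
        with open(filename) as f:
            return self.parse(f)

    def parse(self, lines):
        """Parse an iterable of lines; split out so it can be tested without a file."""
        name, chunks = None, []
        for line in lines:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A header line closes the previous record, if there was one.
                if name is not None:
                    self.sequences.append(Sequence(name, "".join(chunks)))
                header = line[1:].split()
                name = header[0] if header else ""
                chunks = []
            elif name is not None:
                chunks.append(line)
        if name is not None:
            self.sequences.append(Sequence(name, "".join(chunks)))
        return self.sequences
```

Keeping `parse` separate from `read` means the same class can be exercised with a list of strings in test code, and it serves both genomic and mRNA files unchanged.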

8 Class Methods (cont.)
LocalizationOutputParser
–localize(sequence)
  run localization program and parse output
  if using BLAT, identify genomic location and matches from output
  if using SSAHA or MegaBLAST, get list of matches from output and compute genomic location
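The shape of `localize(sequence)` described above (run the program, parse its output, derive a location) can be sketched as a base class with overridable steps. Everything except the names `LocalizationOutputParser` and `localize` is an assumption: the helper methods, the `Match` fields, and especially the five-column tab-separated text, which is a made-up stand-in and not the real output format of BLAT, SSAHA or MegaBLAST.

```python
from collections import namedtuple

# Hypothetical field names for this sketch.
Match = namedtuple("Match", "mrna_start mrna_end chromosome genomic_start genomic_end")


class LocalizationOutputParser:
    def localize(self, sequence):
        """Run the localization program, parse its output, derive a location."""
        output = self.run_program(sequence)
        matches = self.parse_matches(output)
        return self.compute_location(matches)

    def run_program(self, sequence):
        # Subclasses would launch the external program here and return its text output.
        raise NotImplementedError

    def parse_matches(self, output):
        # Made-up format: mrna_start, mrna_end, chromosome, genomic_start, genomic_end
        matches = []
        for line in output.splitlines():
            fields = line.split("\t")
            if len(fields) != 5:
                continue  # skip anything that is not a five-column record
            m_start, m_end, chrom, g_start, g_end = fields
            matches.append(Match(int(m_start), int(m_end), chrom, int(g_start), int(g_end)))
        return matches

    def compute_location(self, matches):
        # Placeholder: keep every match, ordered along the mRNA.
        return sorted(matches, key=lambda m: m.mrna_start)


class CannedParser(LocalizationOutputParser):
    """Test double that returns fixed text instead of running a program."""

    def __init__(self, canned_output):
        self.canned_output = canned_output

    def run_program(self, sequence):
        return self.canned_output
```

A subclass for a program that already reports the genomic location could override `compute_location`, while one that reports only matches would supply its own collation algorithm there.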

9 Class Methods (cont.)
Match
–None
  data filled in by LocalizationOutputParser
GenomicLocation
–None
  data filled in by LocalizationOutputParser

10 Instance Attributes
Some attributes are dictated by the class
–FastaFile must have a list or dictionary of sequences
–These may be accessible externally
Some attributes are dictated by the operation
–Reading a FASTA file might use a variable to keep track of the last line read
–These are often for internal use only
Defining actual attributes in our classes is left as an exercise for the reader
Note: Yeah, I ran out of steam here, and there are already too many slides.

11 Module Organization
sequence.py
–defines FastaFile and Sequence
location.py
–defines Match and GenomicLocation
blat.py, ssaha.py, megablast.py
–defines BlatParser, SsahaParser and MegablastParser, respectively
Note: Keep classes that cannot stand independently in the same module

12 Design (finishing touches)
Select implementation algorithms
–file parsers (FASTA or localization output) are sometimes available on the Internet
–recursion and result caching for generating genomic location from list of matches
Note: If this task is too big, you need to partition the problem further
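The slides do not spell out the "recursion and result caching" algorithm, so the following is only one plausible reading: pick the set of matches that do not overlap on the mRNA and together cover the most bases, using a recursive function memoized with `functools.lru_cache`. The function name `best_cover`, the `(start, end)` tuple representation, and the half-open interval convention are assumptions.

```python
from bisect import bisect_right
from functools import lru_cache


def best_cover(matches):
    """Choose non-overlapping (start, end) mRNA intervals covering the most bases.

    Returns (covered_bases, chosen_matches). Intervals are half-open.
    """
    ordered = sorted(matches, key=lambda m: m[1])  # sort by end position
    ends = [m[1] for m in ordered]

    @lru_cache(maxsize=None)  # cache results so each subproblem is solved once
    def solve(n):
        # Best answer using only the first n matches of `ordered`.
        if n == 0:
            return 0, ()
        start, end = ordered[n - 1]
        # p = how many of the earlier matches end at or before this start.
        p = bisect_right(ends, start, 0, n - 1)
        with_cov, with_set = solve(p)
        with_cov += end - start
        without_cov, without_set = solve(n - 1)
        if with_cov > without_cov:
            return with_cov, with_set + (ordered[n - 1],)
        return without_cov, without_set

    covered, chosen = solve(len(ordered))
    return covered, list(chosen)
```

For example, `best_cover([(0, 50), (40, 100), (100, 150), (0, 90)])` selects `(0, 90)` and `(100, 150)`. Because the recursion is one level deep per match, very long match lists would need an iterative version or a raised recursion limit.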

13 Implementation
Coding, testing and debugging
Start with the class skeletons
Write test code for each module
Test modules separately when possible
Test early and often
Note: Well, it’s about time! Project is due in 10 minutes.
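To make "start with the class skeletons" and "write test code for each module" concrete, here is a small sketch using the standard `unittest` module. The class names `Match` and `GenomicLocation` come from the slides; their constructor arguments, attribute names, the test cases, and the `run_tests` helper are assumptions for illustration.

```python
import unittest


class Match:
    """Skeleton only: data is filled in by the localization output parser."""

    def __init__(self, mrna_range, chromosome, genomic_range):
        self.mrna_range = mrna_range
        self.chromosome = chromosome
        self.genomic_range = genomic_range


class GenomicLocation:
    """Skeleton only: holds the matches that make up a location."""

    def __init__(self, matches=()):
        self.matches = list(matches)


class LocationTests(unittest.TestCase):
    def test_match_keeps_its_data(self):
        m = Match((0, 100), "chr1", (5000, 5100))
        self.assertEqual(m.mrna_range, (0, 100))
        self.assertEqual(m.chromosome, "chr1")

    def test_location_starts_empty(self):
        self.assertEqual(GenomicLocation().matches, [])

    def test_location_copies_matches(self):
        source = [Match((0, 100), "chr1", (5000, 5100))]
        loc = GenomicLocation(source)
        source.clear()  # changing the caller's list must not affect the location
        self.assertEqual(len(loc.matches), 1)


def run_tests():
    """Run this module's tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LocationTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Each module can carry its own test case class like this, so modules are tested separately before being combined.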

14 Roll Out
Release product to user
–include
  User’s Guide
  description of options
  example usage
  test cases
Note: The code is, of course, already completely documented.

