Biology is the science of reverse-engineering life


1 Biology is the science of reverse-engineering life
Living organisms are molecular machines capable of replicating themselves. The functional unit of life is the “cell”: 1-100 µm in diameter, containing a primary information store, the genome.

2 The Structure of a Genome
Generally one or several strands of a polymer, called DNA, packaged into “chromosomes”. Information is encoded as the order of the monomer sub-units (of types “A”, “C”, “G”, “T”) in the linear polymer. Each cell carries the entire genome of the organism.

3 The Nature of DNA
Linear, water-soluble, molecular data storage. The polymer is actually “double-stranded”: each strand is the “reverse-complement” of the other. The double strand is 2 nm in diameter. Each monomer unit (“base”) added lengthens the strand by 0.34 nm.
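The “reverse-complement” relationship can be sketched in a few lines of Perl. This is only an illustration: the function names `reverse_complement` and `strand_length_nm` are hypothetical, and the pairing rule (A with T, C with G) is standard biology rather than something stated on the slide.

```perl
use strict;
use warnings;

# Return the reverse-complement of a DNA string (A<->T, C<->G).
# Characters other than A, C, G, T (for example N or X) pass through unchanged.
sub reverse_complement {
    my ($seq) = @_;
    my $rc = reverse $seq;           # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;    # swap each base for its partner
    return $rc;
}

# Strand length in nanometres, using the 0.34 nm-per-base figure above.
sub strand_length_nm {
    my ($seq) = @_;
    return length($seq) * 0.34;
}

print reverse_complement("GATTACA"), "\n";           # prints TGTAATC
printf "%.2f nm\n", strand_length_nm("GATTACA");     # prints 2.38 nm
```

Applying `reverse_complement` twice returns the original string, which is a handy sanity check.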

4 DNA Storage Density
The genome length of an average bacterium is 2 megabases (Mb); the human genome is 3 gigabases (Gb). A typical DNA “prep” solution contains about 25 petabytes/ml. (A ml is about 20 drops of a liquid.)
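A rough back-of-the-envelope version of these numbers, as a sketch. It assumes 2 bits per base (four possible bases), which is not stated on the slide, and it reuses the 0.34 nm-per-base figure from the previous slide. The function names are hypothetical, and the sketch does not try to reproduce the 25 petabytes/ml figure, which depends on the DNA concentration of the prep.

```perl
use strict;
use warnings;

# Assumption (not on the slide): four possible bases means
# log2(4) = 2 bits of information per base position.
my $BITS_PER_BASE = 2;
my $NM_PER_BASE   = 0.34;    # rise per base, from the previous slide

# Bytes needed to store a genome of the given length in bases.
sub genome_bytes {
    my ($bases) = @_;
    return $bases * $BITS_PER_BASE / 8;
}

# Stretched-out length of the double strand in metres.
sub genome_length_m {
    my ($bases) = @_;
    return $bases * $NM_PER_BASE * 1e-9;
}

for my $g ( [ 'bacterium (2 Mb)', 2e6 ], [ 'human (3 Gb)', 3e9 ] ) {
    my ( $name, $bases ) = @$g;
    printf "%-18s %12.0f bytes  %.5f m\n",
        $name, genome_bytes($bases), genome_length_m($bases);
}
```

Under these assumptions a 3 Gb genome is about 750 megabytes of information and about 1 m long when stretched out, packed into a cell 1-100 µm across.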

5 Perl and Genomics
Good: Perl is quick to write; excellent for parsing; DWIM (“do what I mean”) is good for the typical biologist.
Bad: Not as fast running.
Result: frequently middleware.

6 My Perl scripts
167 in my /bin. Most are for either dealing with system stuff or parsing output from other programs. A few are meant to directly analyze “sequence data”.

7 Example Sequence Analysis Program
ssr3.pl: “ssr” is “simple sequence repeat”, aka “microsatellite”. E.g.:
>Echinomicrosat_01_B04_T7
XXXXXCAGAAGCGCTTCACAATTAAAAGCAAATCATACAAATATGATCAT
CAGGCAGGCTATTTGAACACACTGTTTCGCACTGAACTCATAGTCACATT
TCAGTCGTTCAGTGAGATGATTCATATGGCATAATTTGAACTGACGTTCG
CTCTGACTATCGTTCAGCTCGTTGTGGGCACAATCGTTAGTCAGTTCGTT
CACTCAACCACACACACACACACACACACGGAAACATCAGATTCGAGCTA
AGCTCTTATTACAGCTGATCAGTAGGAGCACTGTTAGACAGTCTACTAAA
TCAATATCAATTATCCCCCCCACACAACCATGGCTTCTGXXXXX
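The record above is in FASTA format: a header line starting with “>” followed by wrapped sequence lines. How ssr3.pl reads its input is not shown in the slides, so the following is only a minimal sketch of one way to load such a file into a hash keyed by sequence name. The function name `read_fasta` and the shortened demo record are hypothetical.

```perl
use strict;
use warnings;

# Parse FASTA text from a filehandle into a hash: name => sequence.
# The name is the first whitespace-delimited word after ">".
sub read_fasta {
    my ($fh) = @_;
    my ( %sequences, $name );
    while ( my $line = <$fh> ) {
        $line =~ s/\s+\z//;    # strip newline and trailing blanks
        next if $line eq '';
        if ( $line =~ /^>(\S+)/ ) {
            $name = $1;
            $sequences{$name} = '';
        }
        elsif ( defined $name ) {
            $sequences{$name} .= $line;    # join wrapped sequence lines
        }
    }
    return %sequences;
}

# A shortened, made-up record, read via an in-memory filehandle.
my $fasta = ">demo_read\nXXXXXCAGAAGC\nACACACACGGA\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
my %sequences = read_fasta($fh);
close $fh;

print "$_\t", length( $sequences{$_} ), "\n" for sort keys %sequences;
```

Joining the wrapped lines into one string matters for repeat finding, since a repeat may span a line break in the file.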

8 Example run of ssr3.pl
% ssr3.pl Echinomicrosat_01_B04_T7.fasta
Name    Seq Len    Range    # of repetitions of sub unit    Sub unit
Echinomicrosat_01_B04_T of repeat "CA"
>Echinomicrosat_01_B04_T7
XXXXXCAGAAGCGCTTCACAATTAAAAGCAAATCATACAAATATGATCAT
CAGGCAGGCTATTTGAACACACTGTTTCGCACTGAACTCATAGTCACATT
TCAGTCGTTCAGTGAGATGATTCATATGGCATAATTTGAACTGACGTTCG
CTCTGACTATCGTTCAGCTCGTTGTGGGCACAATCGTTAGTCAGTTCGTT
CACTCAACCACACACACACACACACACACGGAAACATCAGATTCGAGCTA
AGCTCTTATTACAGCTGATCAGTAGGAGCACTGTTAGACAGTCTACTAAA
TCAATATCAATTATCCCCCCCACACAACCATGGCTTCTGXXXXX

9 ssr3.pl core routine:
    while (
        $sequences{$x} =~ m/
            # Capture each ssr sub-unit within tolerance.
            # Note "?" for lazy capture. Ensures "AC" is
            # the repeat unit instead of "ACAC" for example.
            ([ACGT]{$min_repeat_unit_len,$max_repeat_unit_len}?)
            \1{$min_repeat_num,}
        /gix
      )
    {
        my $repeat_unit  = $1;
        my $start_of_ssr = $-[0] + 1;    # 1-based start of the whole match
        my $end_of_ssr   = $+[0];        # 1-based end of the whole match
        my $ssr          = $&;
        # Number of copies of the repeat unit, not a length in bases.
        my $ssr_length   = length($ssr) / length($repeat_unit);
        # (the rest of the loop body is not shown on the slide)
    }
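To try the idea without the rest of ssr3.pl, here is a self-contained sketch built around the same regular expression. The function name `find_ssrs`, the parameter values, and the short demo sequence are assumptions for illustration, and match offsets are used in place of `$&`. As in the routine above, `\1{$min_repeat_num,}` counts copies after the first one, so the total number of copies is at least `$min_repeat_num + 1`.

```perl
use strict;
use warnings;

# Scan one sequence for simple sequence repeats (SSRs).
# Returns a list of hash refs with keys:
#   unit, start, end (1-based, inclusive), repeats (total copies of unit).
sub find_ssrs {
    my ( $seq, $min_repeat_unit_len, $max_repeat_unit_len, $min_repeat_num ) = @_;
    my @hits;

    # The lazy quantifier keeps the captured unit as short as possible,
    # so a run of ACACAC... is reported with unit AC rather than ACAC.
    while (
        $seq =~ m/
            ([ACGT]{$min_repeat_unit_len,$max_repeat_unit_len}?)
            \1{$min_repeat_num,}
        /gix
      )
    {
        my $unit = $1;
        push @hits, {
            unit    => $unit,
            start   => $-[0] + 1,
            end     => $+[0],
            repeats => ( $+[0] - $-[0] ) / length($unit),
        };
    }
    return @hits;
}

# Made-up demo sequence: unit length 2..6, at least 3 further copies.
for my $hit ( find_ssrs( 'TTGACACACACACGGATT', 2, 6, 3 ) ) {
    print join( "\t", @{$hit}{qw(unit start end repeats)} ), "\n";
}
# prints (tab-separated): AC  4  13  5
```

The reported unit depends on where the leftmost match starts, so the same microsatellite can appear as “AC” or “CA” depending on the flanking bases.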

