Pattern Recognition and Gene Finding

Presentation transcript:

Welcome to Advanced Molecular Genetics, Bioinformatics, and Computational Genomics: Pattern Recognition and Gene Finding

Pattern Recognition and Gene Finding (through software tools): an alternative

Lives of the Scientist

World’s Greatest Explorer

World’s Greatest Musicologist Expect = 4e-98

World’s Greatest Microbiologist

3337901 TACACCAGAT ATTGATGTCG TTTTGATGGA TGTAATGATG CCAGAAATGG
3337951 ACGGTTACGA AACAACAAGC TTAATCCGCC AAAACGAGCA ATTTAAATCT
3338001 TTGCCGATTA TTGCACTGAC AGCTAAAGCC ATGCAAGGCG ATCGCGAGAA
3338051 GTGTATTGAA GCGGGTGCAT CAGACTACAT CACCAAACCC GTAGATACTG
3338101 AACAACTGCT TTCACTCTTG CGTGTTTGGC TATACCGTTA ATTGGGGCAG
3338151 GGGGCAGGGA GCCGTTGCAA CTATTTCAAC CCTAATAGGG ATTTTGATGA
3338201 ATTGCAATTC CTCCTTCCTC TGGCTCTGCC ACCGTTCAGC AACTTGGTTT
3338251 CAATCCCTGA TAGGGATTTT GATGAATTGC AATATATTAT TTCACAACTG
3338301 GTAAAAACGC TAAAGGTTTA GTTTCAATCC CTGATAGGGA TTTTGATGAA
3338351 TTGCAATGTT AAACTGGTCT GCTTTGCCGA TACCCAAATA TTGCTAGGTT
3338401 TCAATCCCTG ATAGGGATTT TGATGAATTG CAATGAAATC AGAAACATCT
3338451 TTGATTTTTT TGACCATGTT TCAATCCCTG ATAGGGATTT TGATGAATTG
3338501 CAATTTTTTG GGGAAGAGGT AATCTGAAAC AGAATTTAGT ATTTGTTTCA
3338551 ATCCCTGATA GGGATTTTGA TGAATTGCAA TGTTGTTACT TAATCCGTCA
3338601 AATAGTCCCA TTAGATGTTT CAATCCCTGA TAGGGATTTT GATGAATTGC
3338651 AATTTTGTGT TACTTGAATT ACTTTGTTGT AATATGCTGG TTTCAATCCC
3338701 TGATAGGGAT TTTGATGAAT TGCAATCAGC AACGTATGCT GTGGGATGCT
3338751 GGATATGCAC GTTTCAATCC CTGATAGGGA TTTTGATGAA TTGCAATTTG
3338801 CATATCTCCA TCCAACTGTA TTCAGCTGAA AAGTTTCAAT CCCTGATAGG
3338851 GATTTTGATG AATTGCAATC TTCGGCATAA CCATTCTTCC ACCTCCAGTA

AATAAAGCTTTACAAACCAAACTCTGGCTTCAATTGTGTAACCCAAGCTTTGATTCTTTCCTCTGTTAAATCGGATTGATTATCTTCATCAAGGGCAAGACCTACAAATTTACCATCACGAACAGCTTTAGACTCACTGAATTCATAACCTTCTGTAGGCCAATAGCCAACTGTTTCACCACCATTTTCTGAAATTTTTTCCTCTAGAATACCGAGGGCATCTTGAAATGTATCAGGATAACCAACCTGGTCTCCAGGAGCAAAATAAGCAACTTTTTTGCCGATGAAGTCAATGTTATCTAACTCATCATAAAAATTTTCCCAATCACTTTGCAATTCTCCAACATTCCAGGTAGGACAACCAACAACGATATAATCGTAGTTATTGAAATCACTTGGTTCAGCTTGTGAAATATCATATAAAGTTACAACACTATCACCACCAAACTCCTTCTGAATTATTTCTGATTCAGTTTGGGTATTGCCTGTTTGAGTACCAAAAAATAAACCAATATTAGACATTTTTACTCCTTTTATGTATTTGCAAAATTATTTCAATTAAAATATTTAGTAATAATTAATTGTTAGCTAGCTAATAATTAAATTTTTATTACAATCATTGTAAAAGGCATTGAAAAAGTAAATAAAAATTTTTATTCTACGTTATTTCAAAAATATTTACTTACATATACTTAACCTTTATAGTGATGTAATATACTCTAATTCCTATTTTACTTATAAATACCATCTCAGCTTAATGTAACGAATTTTTCTGTTTATCTTTAAATACAAAAAATTCAACAAAACTACAGAAAATTAATCTTAATAACACAAAACAAGTATCAATCTGTAATACAACTAAGCTTAAATAAATTAATAGAAAGCTTCATCTATCTAATAGGTTGAGAATAGTTTATGTCTAATGACATAAATTCATTCGTGTTGATTTCATTTGGGTATATTCATCTGATTTAGGATTTACTCCATTAAGTTTGTACTCATCAATGCCCGCCTGTTGGTATCCACAATTCTCATACAGTGCGCGAGCAAAGTAATCAATCGTTCGTCGCCATATCTAACTTTGAGTCAAACAAACCAGTTGGATTACCAACCCTCAACTAATCGCTTCTTTAAGGCGAGCGATCGCACATTTAACTGTTGGTTGTCACAAGAGAACTAATACTACAGCAGTATATTTAACAACTAAGGGTGGTTCAACTTTCGCTGCGACTCCTCCAACGCGCTGAAATACACAGGACTGATGCGATCGCAAACTCTTTGACTAAATTCCATACATTATCATGACCATCTCCCAAACAAACAAGTGGGTTAACCAGATGCTGACTATTAACATCCCCTGAGTTCGGAGTTGTAGGTCTATTTGACTGGTTCAAAGCGATGATGGAACGGCTTTGTTGCATGAATTAAAAAAAGACACACCATCACCTACTTCTAGGATAGACACATCAAACGTCCCACCGCCTAAGTCAAATACCAAGATAATTTCGTTAGTTTTCTTGTCAAGTCCGTAAGCGAGGGCCGCCGCCGTGGGCTAGTTGATAATTCGCAGAACTTTAATCCCGGCAATTCTACTGGCATCTTTGGTAGCCTGCCGTTGAGAGTCATTGAAATAGGCAGGGGTGGTAATTACCGCTTGCCTCACTGGTTCCCCCAGATATGTGCTGGCATCATCTATCAGCTTGCGGACTACCTCATACCATTTCACGAAAAACCTGATACACATGTAAACTCTGAAACCCTTGCTGTATCAAAGTTTTGTAATTACGAATTACGAATTACGAATTGATATCAGCCGAGATTTCTTCGGGTGAAAATTCCTTGTTCAGAGCGGGACAGTGTAGCTTGACATTGCCATTACTGTCACGTACCACTTTGTAAGTAACTTGTTTTGCCTCTTGCGTAACTTCATCATACCTGCGCCCGATGAACCGCTTCACAGAATAAAAAGTGTTTTCTGGGTTCATTACACCCTGGCGCTT

BLAST: globin (Expect = 4e-98)
TCTACTTATA TTCAATCCAC AGGGCTACAC AAGAGTCTGT TGAATGAACA CATACATGGT
TTCTGTCTGC TCTGACCTCT GGCAGCTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC
GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCGTAAAC CTCTAACATG ATGTCAGCAA
TGAATAAACT TTGTTAAAGG TACAAATGAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT
AAACCTGTAT GGTTACATGA ACTGCCTAAA TTATATATTT TAAGAAATTA ATTGCAATTA
CCCCAGCTGT CATTAAAAAG AGGCAAATAC GACAGCACTG ACCCTCAAGA AGGCACCGGC
GCTGAAATTC CGCTGAGAGC AGAGTGGTAC CCCTGCACCA GGTCTTTCCT GTGGGCACTG
ATGAATGACT GAACGAACGA TTGAATGAAA

Working Together Towards Discovery: Surprise!

Surprise!

Program the computer. Surprise!

Biology researchers do not program. Program the computer. 10 Biology and Microbiology departments at major universities

Why hasn't it happened? Programming languages. An alternative.

Lives of the Scientist (Part II)

Repeated sequences in bacterial genomes: genome of E. coli K-12 str. MG1655 (figure labels: genes, REP sequences)

Algorithm to extract REP sequences
Pattern: "repeat_region ...(#...)**(#...)*..'(*..)'"
Special symbols:
...  As many of the previous character as possible
#    A single digit
()   Capture what's inside
*    Any character
..   As few of the previous character as necessary
'    Either quote mark: ' or "
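For readers who know a conventional regular-expression language, here is a minimal sketch of how the pattern above might translate into Python's `re` module. The mapping of each BioBIKE symbol is an assumption read off the legend (in particular, "..." is taken as "one or more", like `+`), and the GenBank-style sample text is made up for illustration; the function name `extract_rep_sequences` is hypothetical.

```python
import re

# Assumed translation of the slide's BioBIKE pattern into Python re syntax:
#   "repeat_region "  literal text
#   " ..."            a space, as many as possible    -> " +"
#   "(#...)"          capture a run of digits          -> "(\d+)"
#   "**"              two arbitrary characters ("..")  -> ".."
#   "*.."             any characters, as few as needed -> ".*?"
#   "'"               either quote mark                -> "[\"']"
REP_PATTERN = re.compile(
    r"repeat_region +(\d+)..(\d+).*?[\"'](.*?)[\"']",
    re.DOTALL,  # let "." cross the line break between a feature line and its /note
)


def extract_rep_sequences(genbank_text):
    """Return (start, end, note) for each repeat_region with plain start..end coordinates."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3))
        for m in REP_PATTERN.finditer(genbank_text)
    ]


# Hypothetical GenBank-style feature lines (not real E. coli annotations).
sample = (
    '     repeat_region   1000..1035\n'
    '                     /note="hypothetical REP element A"\n'
    '     gene            2000..2900\n'
    '     repeat_region   3100..3138\n'
    "                     /note='hypothetical REP element B'\n"
)
```

Like the original pattern, this sketch is unforgiving: a feature written as `complement(...)` never matches, and a repeat_region with no quoted note would borrow the next quoted text it finds.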

To start: go to www.people.vcu.edu/~elhaij and click MICR 653.

Using Firefox, go to www.people.vcu.edu/~elhaij and click MICR 653.

biobike.csbc.vcu.edu

BioBIKE interface: function palette, workspace, results window

General Syntax of BioBIKE (diagram: Function-name, Argument (object), Keyword object, Flag). The basic unit of BioBIKE is the function box. It consists of the name of a function, perhaps one or more required arguments, and optional keywords and flags. A function may be thought of as a black box: you feed it information and it produces a product.

General Syntax of BioBIKE. Function boxes contain the following elements:
Function-name (e.g., SEQUENCE-OF or LENGTH-OF)
Argument: required; the object the function acts on
Keyword clause: optional; supplies more information
Flag: optional; supplies more (yes/no) information

General Syntax of BioBIKE. Function boxes also carry icons to help you work with functions:
Option icon: brings up a menu of keywords and flags
Action icon: brings up a menu enabling you to execute a function, copy and paste, get information or help, etc.
Clear/Delete icon: removes information you entered, or removes the box entirely
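As an analogy only (this is Python, not BioBIKE), the built-in `sorted()` has the same anatomy as a function box: one required argument, an optional keyword clause that carries an object, and an optional yes/no flag. The list of short sequences is made up for illustration.

```python
seqs = ["AT", "GGA", "C"]

# Function name with only its required argument.
plain = sorted(seqs)

result = sorted(
    seqs,          # argument: required, the object the function acts on
    key=len,       # keyword clause: optional, more information (sort by length)
    reverse=True,  # flag: optional yes/no information (largest first)
)
```

Leaving out the keyword and flag falls back to the defaults (alphabetical, ascending), much as an unopened option menu leaves a BioBIKE function box at its defaults.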

Functions: Sin (angle)

Functions: Length (entity)

Functions: variable vs. literal. Length (entity): the literal "icahLnlna bormA" gives 14; the variable Abraham Lincoln gives 192 (a property of the entity the name refers to); the literal "Abraham Lincoln" gives 14.

Functions: list vs. single value. Length (entity): "icahLnlna bormA" gives 14; Abraham Lincoln gives 192; "Abraham Lincoln" gives 14; the list US-presidents gives 44 (the number of items in the list).

Functions: single application of a function vs. iteration of a function. Applied once to the list US-presidents, Length gives 44; iterated over each president in the list, it gives a list of values (188 170 189 163 …).
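The same three distinctions (literal vs. variable, single value vs. list, single application vs. iteration) can be sketched in Python. The three-name list is a hypothetical stand-in for the slide's US-presidents data, and `len` counts characters, including the space, so it returns 15 for "Abraham Lincoln" where the slide shows 14 (the number of letters).

```python
# Literal: the quoted characters are themselves the value.
literal_length = len("Abraham Lincoln")

# Variable: an unquoted name stands for whatever value is bound to it.
# Hypothetical stand-in for the slide's US-presidents list (first three only).
presidents = ["George Washington", "John Adams", "Thomas Jefferson"]

# Single application: the function acts once, on the list as a whole.
list_length = len(presidents)

# Iteration: the function is applied to each element in turn,
# producing a list of results rather than a single value.
item_lengths = [len(name) for name in presidents]
```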

Functions: Arcsin (angle) and Sin (angle)

Functions: Arcsin (Sin (angle)). Nested functions are evaluated from the inside out; a box is replaced by its value.
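A minimal Python sketch of inside-out evaluation: the inner call runs first and is replaced by its value, which the outer call then receives. The input angle of 0.5 radians is arbitrary; `asin(sin(x))` returns `x` only for angles between -π/2 and π/2.

```python
import math

angle = 0.5                   # radians; an arbitrary example input
sine = math.sin(angle)        # inner box evaluated first, replaced by its value
recovered = math.asin(sine)   # outer box then acts on that value

# The same computation as one nested expression, evaluated inside out.
nested = math.asin(math.sin(angle))
```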

Functions "transposase" Gene (npf0076)

Functions: Gene (npf0076) nested inside another function. Nested functions are evaluated from the inside out; a box is replaced by its value.

Pitfalls (the most common error in the language): CLOSE BOXES BEFORE EXECUTING. White is incompatible with execution. Example: Gene (npf0076)

Algorithm to extract REP sequences: Pattern "repeat_region ...(#...)**(#...)*..'(*..)'"


Mining files for data. Pattern matching: quick and easy, highly flexible, works great. BUT... unforgiving (1 mismatch → death).

Conserved motifs of methyltransferases: Pattern "[DS]PP[YF]". Special symbols: [ ] Character set (matches any one of the enclosed characters)
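In Python's `re` syntax, character sets are written the same way, so the motif can be used as-is. The peptide strings below are made up for illustration; the second one shows the "1 mismatch → death" behavior, since a single W in place of Y or F removes the hit entirely.

```python
import re

# [DS] = D or S; PP must match exactly; [YF] = Y or F.
MOTIF = re.compile(r"[DS]PP[YF]")


def find_motif(protein):
    """Return (0-based position, matched text) for every non-overlapping hit."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(protein)]


hits = find_motif("MKLDPPYAGSPPFQ")   # hypothetical peptide with two hits
misses = find_motif("MKLDPPWAG")      # one mismatch (W), so no hit at all
```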

Searching for conserved motifs. Pattern matching: quick and easy; unforgiving (1 mismatch → death); ignores lots of information. Position-specific scoring matrices (PSSMs): need a training set. What if you don't have one?
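To make the contrast with pattern matching concrete, here is a minimal PSSM sketch in Python. It builds log2-odds scores from a tiny training set and scores candidate words, so a single mismatch lowers the score instead of eliminating the match. The training set is hypothetical, and the add-one pseudocount and uniform 0.25 background are assumptions chosen for simplicity, not part of the slides.

```python
import math

# Hypothetical toy training set of aligned 7-mers (not data from the slides).
TRAINING = ["TATATAA", "TATATAA", "TATATAA", "TATAAAA"]
BASES = "ACGT"
PSEUDOCOUNT = 1.0   # assumption: add-one smoothing so no base gets probability 0
BACKGROUND = 0.25   # assumption: uniform base composition


def build_pssm(sites):
    """Log2-odds score for each base at each position of the aligned sites."""
    n = len(sites)
    pssm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        scores = {}
        for b in BASES:
            p = (column.count(b) + PSEUDOCOUNT) / (n + PSEUDOCOUNT * len(BASES))
            scores[b] = math.log2(p / BACKGROUND)
        pssm.append(scores)
    return pssm


def score(word, pssm):
    """Sum of per-position scores; higher means more like the training set."""
    return sum(column[base] for base, column in zip(word, pssm))


PSSM = build_pssm(TRAINING)
```

Here "TATAAAG", which differs from every training site at its last position, still scores well above zero, while an unrelated word such as "GCGCGCG" scores below zero.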

Lives of the Scientist (Part III)

What to do with no training set? New pattern discovery (Meme, Gibbs sampler, BioProspector)
Human sequences 5’ to the transcriptional start: a “TATA box”?
snRNA U1 (pU1-6)   AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t        GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14             CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1       CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
nucleolin          GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
snRNP E            TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
rp S14             GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
rp S17             TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
ribosomal p. S19   ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
α-tubulin bα1      GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
β-tubulin β2       GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
α-actin skel-m.    CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
α-cardiac actin    TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
β-actin            CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA
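One way to see why a fixed pattern is a poor tool here: the sketch below searches the fifteen sequences above for TATAWAW (W = A or T), a commonly cited TATA-box consensus. Treating that consensus as an exact regular expression is an assumption made for illustration, not the method the slides describe; the sequence names are ASCII versions of the slide's labels ("a-" for α, "b-" for β), and `tata_hits` is a hypothetical helper.

```python
import re

# Commonly cited TATA-box consensus TATAWAW, W = A or T (assumed for illustration).
TATA_CONSENSUS = re.compile(r"TATA[AT]A[AT]")

UPSTREAM = {
    "snRNA U1 (pU1-6)": "AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC",
    "histone H1t": "GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT",
    "HMG-14": "CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG",
    "TP1": "GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT",
    "protamine P1": "CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT",
    "nucleolin": "GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG",
    "snRNP E": "TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT",
    "rp S14": "GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC",
    "rp S17": "TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT",
    "ribosomal p. S19": "ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT",
    "a-tubulin ba1": "GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG",
    "b-tubulin b2": "GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA",
    "a-actin skel-m.": "CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC",
    "a-cardiac actin": "TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC",
    "b-actin": "CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA",
}


def tata_hits(sequences):
    """Names of sequences containing an exact match to the consensus."""
    return [name for name, seq in sequences.items() if TATA_CONSENSUS.search(seq)]
```

Only four of the fifteen sequences match. The α-cardiac actin sequence contains TATAAAG, which fails at the final position alone, and protamine P1's TATAACA fails one position earlier.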

How does Meme work?
snRNA U1 (pU1-6)   AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t        GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14             CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1       CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT

Step 1. Arbitrarily choose a candidate pattern from a sequence.
Step 2. Find the best matches to the pattern in all sequences.
Step 3. Construct a position-dependent frequency table based on the matches.
Step 4. Calculate the relative probability of the matches from the frequency table.
Step 5. If the probability score is high, remember the pattern and its score.
Step 6. Repeat Steps 1-5.

Matches:
GACAGGGCAGAA
GCCCGGGTGTTT
GCCGGGGACGCG
GCCCCCGGGCCT
GCCGCAGAGCTG

Frequency table:
Pos   1    2    3    4    5    6    7    8    9    10   11   12
A    0.0  0.2  0.0  0.2  0.0  0.2  0.0  0.4  0.2  0.0  0.2  0.2
C    0.0  0.8  1.0  0.4  0.4  0.2  0.0  0.2  0.2  0.4  0.4  0.0
G    1.0  0.0  0.0  0.4  0.6  0.6  1.0  0.2  0.6  0.4  0.0  0.4
T    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.2  0.0  0.2  0.4  0.4
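Steps 3 and 4 can be sketched in a few lines of Python, using the five matches listed above; the code reproduces the slide's frequency table. The probability function is a simplified reading of Step 4 (a plain product of per-position frequencies). MEME itself uses expectation maximization with background models and pseudocounts, so treat this as an illustration of the idea rather than MEME's actual scoring.

```python
from collections import Counter

# The five 12-mer matches from the slide.
MATCHES = ["GACAGGGCAGAA", "GCCCGGGTGTTT", "GCCGGGGACGCG",
           "GCCCCCGGGCCT", "GCCGCAGAGCTG"]


def frequency_table(matches):
    """Step 3: fraction of each base at each position of the matches."""
    n = len(matches)
    table = []
    for i in range(len(matches[0])):
        counts = Counter(m[i] for m in matches)
        table.append({b: counts[b] / n for b in "ACGT"})
    return table


def match_probability(word, table):
    """Step 4 (simplified): product of per-position base frequencies."""
    p = 1.0
    for base, column in zip(word, table):
        p *= column[base]
    return p


TABLE = frequency_table(MATCHES)
```

For the first match, GACAGGGCAGAA, the product is 0.2^6 × 0.6^2 × 0.4 ≈ 9.2 × 10^-6. Any word containing a base with frequency 0.0 at some position (for example, a T at position 1) scores exactly zero, which is one reason practical tools add pseudocounts.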

What to do with no training set?
New pattern discovery (MEME, Gibbs sampler, BioProspector)
snRNA U1 (pU1-6)            AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t                 GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14                      CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                         GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1                CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
nucleolin                   GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
snRNP E                     TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
rp S14                      GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
rp S17                      TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
ribosomal p. S19            ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
α-tubulin bα1               GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
β-tubulin β2                GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
α-actin (skeletal muscle)   CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
α-cardiac actin             TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
β-actin                     CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA
Human sequences upstream (5′) of the transcriptional start
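
A Gibbs sampler, one of the approaches named above, can be sketched in textbook form. This minimal version assumes a fixed motif width of 8, exactly one site per sequence, a pseudocount of 1 and a uniform background; the width, iteration count, seed, and function names are hypothetical choices, not the settings of BioProspector or any published tool. Each iteration leaves one sequence out, builds a frequency table from the current sites in the other sequences, and resamples the left-out sequence's site in proportion to how well each window fits that table.

```python
import random

# Human sequences upstream of the transcriptional start (from the slide)
SEQS = [
    "AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC",  # snRNA U1 (pU1-6)
    "GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT",  # histone H1t
    "CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG",  # HMG-14
    "GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT",  # TP1
    "CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT",  # protamine P1
    "GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG",  # nucleolin
    "TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT",  # snRNP E
    "GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC",  # rp S14
    "TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT",  # rp S17
    "ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT",  # ribosomal p. S19
    "GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG",  # alpha-tubulin
    "GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA",  # beta-tubulin
    "CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC",  # alpha-actin (skeletal muscle)
    "TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC",  # alpha-cardiac actin
    "CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA",  # beta-actin
]


def frequency_table(sites, pseudo=1.0):
    """Base frequencies at each motif position, with pseudocounts."""
    table = []
    for i in range(len(sites[0])):
        counts = {b: pseudo for b in "ACGT"}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        table.append({b: c / total for b, c in counts.items()})
    return table


def gibbs_sampler(seqs, width, iterations=2000, seed=0):
    rng = random.Random(seed)
    # Start from a random site in every sequence
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        k = rng.randrange(len(seqs))  # sequence whose site is resampled
        others = [s[p:p + width] for j, (s, p) in enumerate(zip(seqs, starts)) if j != k]
        table = frequency_table(others)
        seq = seqs[k]
        weights = []
        for i in range(len(seq) - width + 1):
            ratio = 1.0
            for b, col in zip(seq[i:i + width], table):
                ratio *= col[b] / 0.25  # odds versus a uniform background
            weights.append(ratio)
        # Draw the new start position with probability proportional to its odds
        starts[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


starts = gibbs_sampler(SEQS, width=8)
for seq, p in zip(SEQS, starts):
    print(p, seq[p:p + 8])
```

The result depends on the seed and can settle in a local optimum, so in practice a sampler like this is restarted several times and the best-scoring alignment is kept.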

Searching for conserved motifs
Pattern matching: quick and easy, but unforgiving (1 mismatch → death) and ignores lots of information.
Position-specific scoring matrices (PSSMs): need a training set.
MEME, Gibbs sampler, et al. (PSSM in reverse): relatively unbiased, but can't easily handle variable-length gaps.
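
The contrast between exact pattern matching and a PSSM can be made concrete. This sketch is illustrative only: the four training sites are hypothetical TATA-box-like strings chosen for the example, and the pseudocount of 0.5 and uniform background are assumptions. The query contains TATAGA, one mismatch away from the pattern TATAAA.

```python
import math

TRAINING_SITES = ["TATAAA", "TATAAA", "TATAAG", "TATATA"]  # hypothetical training set
QUERY = "CCGTATAGACC"


def exact_hits(seq, pattern):
    """Pattern matching: start positions of perfect matches only."""
    return [i for i in range(len(seq) - len(pattern) + 1) if seq.startswith(pattern, i)]


def build_pssm(sites, pseudo=0.5, background=0.25):
    """Log2-odds score for each base at each position of the motif."""
    pssm = []
    for i in range(len(sites[0])):
        column = {}
        for b in "ACGT":
            count = sum(s[i] == b for s in sites)
            freq = (count + pseudo) / (len(sites) + 4 * pseudo)
            column[b] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def pssm_scores(seq, pssm):
    """Score every window of seq; a mismatch lowers the score instead of rejecting the window."""
    w = len(pssm)
    return [
        sum(col[b] for b, col in zip(seq[i:i + w], pssm))
        for i in range(len(seq) - w + 1)
    ]


pssm = build_pssm(TRAINING_SITES)
scores = pssm_scores(QUERY, pssm)
best = max(range(len(scores)), key=scores.__getitem__)
print(exact_hits(QUERY, "TATAAA"))                       # []
print(best, QUERY[best:best + 6], round(scores[best], 2))  # 3 TATAGA 5.98
```

The exact matcher reports nothing, while the PSSM still ranks the one-mismatch window TATAGA highest: the trade-off summarized on the slide.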

Moral of the Stories

Biology researchers do not program
[Chart: "Program the computer"; 10 Biology and Microbiology Depts at major universities]

Are you comfortable using programming in the service of your research?
Responses:
I have no experience in computer programming
I am marginally experienced with programming
I have extremely limited experience in computer programming
I have very little experience
I used to work a lot with programs such as Matlab and R
I have never learned it before
I have very little experience in computer programming
I’m using now iTol service, uniprot, and DEG
Minimal programming in actual languages
I have no experience in computer programming

Using Firefox, go to www.people.vcu.edu/~elhaij and click MICR 653.

Scientific Questions I. What determines the beginning of a gene?

Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
[Figure: HIV]

Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
III. Determination of short tandem repeats (STRs)
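
Question III can be illustrated with a toy detector for perfect short tandem repeats. This sketch assumes repeat units of 1 to 6 nt and at least 3 consecutive copies; both thresholds and the function name are hypothetical, and real STR callers also tolerate imperfect copies, which this does not.

```python
def find_strs(seq, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem runs, scanning left to right."""
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for unit_len in range(1, max_unit + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len:
                break  # not enough sequence left for a unit this long
            copies = 1
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            # Keep the run covering the most bases; ties go to the shorter unit
            if copies >= min_copies and (best is None or copies * unit_len > len(best[1]) * best[2]):
                best = (i, unit, copies)
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # jump past the reported run
        else:
            i += 1
    return hits


print(find_strs("GCAGCAGCAGTTACGATATATATGC"))
# [(0, 'GCA', 3), (15, 'AT', 4)]
```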

Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
III. Determination of short tandem repeats (STRs)
IV. Analysis of gene expression data

Metabolic correlates to N-deprivation
What enzymes of carbon metabolism are affected by N-starvation?
Pentose Phosphate Pathway
Glycogen metabolism
Carbon fixation
Cyanobacteria primarily use the reactions of the Pentose Phosphate Pathway to break down glucose derivatives, and carbon fixation reactions to build glucose. These two sets of reactions overlap a great deal.

Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
III. Determination of short tandem repeats (STRs)
IV. Analysis of gene expression data (RNAseq)

Measuring RNA through Microarrays
RNA from cell type #1 + RNA from cell type #2 → spot on the array
Scan for red fluorescence; scan for green fluorescence; combine the images
Spot color indicates: Type #1 RNA > Type #2 RNA; Type #2 RNA > Type #1 RNA; or Type #1 RNA ≈ Type #2 RNA
Courtesy of Inst. für Hormon- und Fortpflanzungsforschung, Universität Hamburg
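
Reading a two-color spot comes down to comparing the two scans. This sketch assumes, hypothetically, that cell type #1 RNA carries the red dye and type #2 the green dye, and treats a twofold difference as the cutoff; the slide specifies neither choice. It uses the log ratio M = log2(red/green), so M > 0 means more type #1 RNA.

```python
import math


def call_spots(red, green, fold=2.0):
    """Classify each spot from its red and green intensities."""
    cutoff = math.log2(fold)
    calls = []
    for r, g in zip(red, green):
        m = math.log2(r / g)  # log ratio: positive when red (assumed type #1) is brighter
        if m >= cutoff:
            calls.append("type #1 > type #2")
        elif m <= -cutoff:
            calls.append("type #2 > type #1")
        else:
            calls.append("type #1 ~ type #2")
    return calls


print(call_spots([400.0, 100.0, 210.0], [100.0, 400.0, 200.0]))
# ['type #1 > type #2', 'type #2 > type #1', 'type #1 ~ type #2']
```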

Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
III. Determination of short tandem repeats (STRs)
IV. Analysis of gene expression data: differences in intensity from chip to chip (different conditions or different replicates)
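
Differences in overall intensity from chip to chip are usually removed by normalization before chips are compared. One common choice, assumed here because the slide does not name a method, is quantile normalization: every chip is given the same intensity distribution, namely the mean of the sorted intensities across chips, while each spot keeps its rank within its own chip. This sketch breaks ties by position rather than averaging them, a simplification.

```python
def quantile_normalize(chips):
    """Give every chip the same intensity distribution (mean of sorted values)."""
    n_spots = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Mean intensity at each rank, taken across all chips
    rank_means = [sum(sc[r] for sc in sorted_chips) / len(chips) for r in range(n_spots)]
    normalized = []
    for chip in chips:
        # Spot indices ordered from dimmest to brightest (ties broken by position)
        order = sorted(range(n_spots), key=chip.__getitem__)
        out = [0.0] * n_spots
        for rank, spot in enumerate(order):
            out[spot] = rank_means[rank]
        normalized.append(out)
    return normalized


chip_a = [5.0, 2.0, 3.0]
chip_b = [4.0, 1.0, 4.5]
print(quantile_normalize([chip_a, chip_b]))
# [[4.75, 1.5, 3.5], [3.5, 1.5, 4.75]]
```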

Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
III. Determination of short tandem repeats (STRs)
IV. Analysis of gene expression data
V. CRISPRs in enteric bacteria
GTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACAATAACTGGATAGCACTAGCAGAAGGGCTAGAAGGTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACGTAT
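
The CRISPR sequence above contains a direct repeat separated by a spacer. As a toy illustration, not how dedicated CRISPR finders work (they search for regularly spaced repeats and tolerate mismatches), this sketch finds the longest substring that occurs at least twice and then cuts the sequence at each copy. The function names and the minimum length of 10 are hypothetical.

```python
SEQ = "GTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACAATAACTGGATAGCACTAGCAGAAGGGCTAGAAGGTTTCAATCCCTGATAGGGATTTTAGAGGGTTTTAACGTAT"


def longest_repeat(seq, min_len=10):
    """Longest substring that occurs at least twice, or None if shorter than min_len."""
    for k in range(len(seq) - 1, min_len - 1, -1):
        seen = set()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in seen:
                return kmer
            seen.add(kmer)
    return None


def split_at_repeats(seq, repeat):
    """Start positions of each repeat copy and the spacers between copies."""
    starts = [i for i in range(len(seq) - len(repeat) + 1) if seq.startswith(repeat, i)]
    spacers = [seq[a + len(repeat):b] for a, b in zip(starts, starts[1:])]
    return starts, spacers


repeat = longest_repeat(SEQ)
starts, spacers = split_at_repeats(SEQ, repeat)
print(len(repeat), repeat)
print(starts, spacers)
```

On this sequence the sketch recovers a 37-nt repeat at positions 0 and 71 with a 34-nt spacer between the copies.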

Scientific Questions VI. Finding targets for DNA-binding proteins

Scientific Questions
I. What determines the beginning of a gene?
II. Where in a bacterial genome are viruses integrated?
III. Determination of short tandem repeats (STRs)
IV. Analysis of gene expression data
V. CRISPRs in enteric bacteria
VI. Finding targets for DNA-binding proteins (targets known)
VII. Finding targets for DNA-binding proteins (genes known)