Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.


1 Topics in Bioinformatics CS832b Bin Ma

2 Lecture 1: Basic

3 Three molecules we will study
DNA
  A string over alphabet {A,C,G,T}
RNA
  Primary structure – a string over alphabet {A,C,G,U}
  Secondary and tertiary structures
Protein
  Primary structure – a string over alphabet {A,R,N,D,C,Q,E,G,H,I,L,K,M,F,P,S,T,W,Y,V}
  Secondary and tertiary structures
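The three alphabets on this slide can be written down directly as sets. The following is a minimal sketch; the names `DNA_ALPHABET`, `RNA_ALPHABET`, `PROTEIN_ALPHABET` and `is_over` are illustrative and not part of the lecture.

```python
# Alphabets as listed on the slide (primary structure only).
DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")
PROTEIN_ALPHABET = set("ARNDCQEGHILKMFPSTWYV")  # the 20 amino acid letters


def is_over(alphabet: set, sequence: str) -> bool:
    """Return True if every character of `sequence` belongs to `alphabet`."""
    return set(sequence.upper()) <= alphabet


print(is_over(DNA_ALPHABET, "AGTAGCCTATGCGA"))  # True
print(is_over(DNA_ALPHABET, "AUUCAC"))          # False: U is an RNA letter
```

Note that a short string such as "ACG" is valid over all three alphabets, so a check like this can only rule alphabets out, not identify the molecule.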

4

5

6 5’ 3’

7

8 DNA
5’…AGTAGCCTATGCGA…3’
  …::::::::::::::…
3’…TCATCGGATACGCT…5’

5’…AGTAGCCTATGCGA…3’
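Because the two strands pair A with T and C with G, either strand determines the other, which is why a single 5’ to 3’ string is enough to describe double-stranded DNA. A small sketch of that relationship, using the strand from the slide (the function names are illustrative):

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIRING = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Base-by-base partner strand, written in the same left-to-right order."""
    return strand.upper().translate(PAIRING)


def reverse_complement(strand: str) -> str:
    """Partner strand written in its own 5' -> 3' direction."""
    return complement(strand)[::-1]


top = "AGTAGCCTATGCGA"
print(complement(top))          # TCATCGGATACGCT (read 3' -> 5', as on the slide)
print(reverse_complement(top))  # TCGCATAGGCTACT (same strand read 5' -> 3')
```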

9 >CHRX GATCACCTGACATCAGGAGTTCAAGACCAGCCTGCCAACGTGGTGAAACC CCATCTCTACTAAAAATAGGAAATTCACCTGGTGGCAGGTGCCTGTAATC CCAGCTACTCGGGAGGCTGAGGCAGAAGAATCGCTTGAACCCAGGAGGTG GAGATTGCACTGAGCTGAGATCACGCCACTGCGCTCCAGCCTGGGTGACA GAGCAAGACTCCATAAAAAAAAAAATTATAACCTAATGATTAAATACTGT AGGGAAGAGCTTACCACAATTGCTGGCCCATGGCCAATGCTGGGTATAAG ACAGCTACTGCAAACAACCATGATGATGATACATCTCTTGTGTAGGGTTA GGTTGTTTGAGACACATTCTATGCTCCTTGATTTGATTGGAAGGTACCTT GGTTCCTTGGGGACTTGGAGGTGACGAAAGCCTCCCTGGGGACAAAACTC ACCTTCACTTCTCTAATATCAAGCTTCAGCAACCTGCTCCAGCTACAGCA CAGGGTTGGACAGGCCCAACAACAGAGGAAATCCACAAAGTGTGTCTTGA CACATACATCCACGGGGTCTAACGAGGTGAGGCCAATGACTGCTTCCACA CACCCCAGCCAGACTCTGACTTCACTCCCGGCAGGTTTCAGTAGACTTGG CAGCAGTTGGAGCGAGCTGGCTTCTTGCGGTAGGCAGCCATGTTGGAAGA GCTCCCAATAGTCCTCGTTTCCTGGTAATCTCATGCTTGGATCATCTTCT TCTCTTGAGTGAAGAGAAGAACTGCAGAGAGAGACAGAGACAGAGAGACA GATCACAGGGGCAGTTTCCCCCATACTGTTCTCAAGATAAATGAGTCAAC TCTTACACCTCTTTTCTCTGGTGTAAAACAAGGCTGGTGAACAGGCAGAG AGAACTGGGGTGTTGGAGTAGCATTGACCTTCCTTCTTCATCCCTCTATA ATCTCTCCTAGTGCAGGAGTAGGAAAACTAAAAATCACACGTCTGATCAT CTGTGATCTCAGAGTCTTGGACAAGCCTTGCTTGCCAATCAGCAGGGATG GGAGTTGGAGCCATCTCCAAGTGTCCCCCCACAAATCTATGTCCACCTGG AAGTTTCAAATGCAACTTTATTTGGGAAAGGCAATTTTGCAAATGTTATT AAGTGAAGGATCTAGGGATGAGATCATCCTGGAGTAGGGTGGGTCCTAGG TCAAATGACAGGAAATCTGCCCACCTCGGCCTCCCAAAGTGCTGGGATTA CAGGCATGAGCCACCAAACCTGGCCTATCATTGATTTAATGATTAATACG GTTAGGCTCTGTGTCCCCACCCAAATCTCATCTCAAATTGTAATTCCCAT GTGTCCAGGGAGGGAGCTTGTGGAAGGTGATTGGATCACAGGGGCAGTTT TTGTCATGCTGTTCTCATGATAAATGAGTCAATTCTCAGAAGAGATGATG GTTTTAAAGTGTGGCACTTCTTTGCTCTCTTGCTCTCTCTCTCTCCTGAG TAGACTGGCTCATTCTTTCTACTGGTTACAAGCAATAGAAGTGATAACAA AATTGATGGTTTCTCATTTCCTAAATGGTACCAGTGGATTCCTGGTTTCC TCTCTCTCTCTTCTCTCTCTCTATCAACTTTTCCCTCAATCTCTCTATCA ACCTCCCTCTCTCTCAATCTCAATCTCTCTCAGTCTCATTCTCAATCTCT TTTGCTCAATCTCTTTCTCAGCTTCTCTCCCTCAATTTCTCTTTTGCAAC TTCTCTCTCTCAGTCTGTGTCTCTCAATCTCCCTCTCTCAATCTCTCTTG TAGTCTCCCTGTCTCTCATACTCTCTCTGTTTCTGTCTGTCTCTGCCCTT GCTCTAGGGAAAGCAAGTTCTTATGCTGTAAGTTCTCCTGTAAAAAGGTC CACATGATACGGAACTGGCCATCTTTGGCCAACATGAGTGAGTTTAGAAG 
TGTGCCTTTCACCAGTTGAGCCTTCAAATGAGATCCCAGCCCTGGATGAC ACAGTGACAGTAACCTGCTAGGAACTGTGAACCAGAGGCACCCAGCCAAG CTGCTCCCAGACTCCCAACCCAGTGAAACCATAAGATAATAAATGCATGT TGTTTTAAGCTGCTAAGTTTGGGGGTCACTTGTTACACAGCAACAGCTGA CTCATACATTTTCTTTGAAATTGATTTCCACTTCTGTCACCAGCATCATT CCATAAATTTGCTCTATGTGCATTGCTGACCTGCAGTAGAAGTTTTGGAG AAGTGAACCACATCCCCTTATCTGCCATTTGACAGCAAGCAGCCTCAAAC ATTCATAATTTCTTTCCTGACTCTCCACTCCACACTGTTGCCTGCCTTCC TGGTTCCAGATCTTTGGATCTGGACTGACACCTGGGCACTGTCATAGGCA TCCGTGTGAAGAGACCACCAACAGGCTCTGTGTGAGCAATAAAGCTTTTT AATCACCTGGGTGCAGGTGGGCTGATTCTGAAAAGAGAGTCAGCAAAGAG TGGTGGGATTATCATTAGTTCTTATAGGTTCGGGATAGGTGGTGGAGTTA GGAGCAATTTTTTGTGGGCAGGGAGTGGATCTTACAAAGGACATTCTCAA GGGTGGGGATGATTTTACAAAGTACCTTCTTAAGGGCGGGGGAGGATATT ACAAAGTACCTTCTCAAGGGTGGGGATGATTTTACAAAGTACCTTCTTAA GGGCGGGGGAGGATATTACAAAGTACCTTCTCAAGGGTGGGGGTGGATAT TACAAAGTACCTTCTTAAGGGCAGGGGAGGATATTACAAAGTACCTTCTC AAGGGGGGGGATGATTTTACAAAGTACCTTCTTAAGGGCGGGGGAGGATA TTACAAAGTACCTTCTCAAGGGTGGGGGTGGATATTAGAAAGTACCTTCT
Chromosome X is one of the two sex chromosomes; the human genome is organized into 23 pairs of chromosomes. Chromosome X has about 156 million base pairs.
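The ">CHRX" line followed by sequence text is the FASTA layout. A minimal reader for it might look like the sketch below; it assumes the usual convention of one ">" header line followed by one or more sequence lines, and `parse_fasta` is an illustrative name rather than a library function.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each header (without the leading '>') to its concatenated sequence."""
    chunks: Dict[str, List[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            chunks[header] = []
        elif header is not None:
            # Sequence lines may be wrapped; whitespace inside them is dropped.
            chunks[header].append(line.replace(" ", ""))
    return {name: "".join(parts) for name, parts in chunks.items()}


example = """>CHRX
GATCACCTGACATCAGGAGTTCAAGACCAGCCTGCCAACGTGGTGAAACC
CCATCTCTACTAAAAATAGGAAATTCACCTGGTGGCAGGTGCCTGTAATC
"""
records = parse_fasta(example)
print(len(records["CHRX"]))
```

For real genome files, which hold many millions of characters per record, a streaming reader would be preferable to building the whole dictionary in memory.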

10 Genome Sizes
Species                                Size in bps
Amoeba dubia                           670,000,000,000
Homo sapiens                           3,400,000,000
Drosophila melanogaster                180,000,000
Mycoplasma genitalium                  580,000
Human immunodeficiency virus type 1    9,750

11 Protein and Amino Acids

12 Protein

13 GOT Ecoli

14 A protein sequence
>gi|7228451|dbj|BAA92411.1| EST AU055734(S20025) corresponds to a region …
MCSYIRYDTPKLFTHVTKTPPKNQVSNSINDVGSRRATDRSVASCSSEKSVGTMSVKNASSISFEDIEKSISNWKIPKVN
IKEIYHVDTDIHKVLTLNLQTSGYELELGSENISVTYRVYYKAMTTLAPCAKHYTPKGLTTLLQTNPNNRCTTPKTLKWD
EITLPEKWVLSQAVEPKSMDQSEVESLIETPDGDVEITFASKQKAFLQSRPSVSLDSRPRTKPQNVVYATYEDNSDEPSI
SDFDINVIELDVGFVIAIEEDEFEIDKDLLKKELRLQKNRPKMKRYFERVDEPFRLKIRELWHKEMREQRKNIFFFDWYE
SSQVRHFEEFFKGKNMMKKEQKSEAEDLTVIKKVSTEWETTSGNKSSSSQSVSPMFVPTIDPNIKLGKQKAFGPAISEEL
VSELALKLNNLKVNKNINEISDNEKYDMVNKIFKPSTLTSTTRNYYPRPTYADLQFEEMPQIQNMTYYNGKEIVEWNLDG
FTEYQIFTLCHQMIMYANACIANGNKEREAANMIVIGFSGQLKGWWNNYLNETQRQEILCAVKRDDQGRPLPDRDGNGNP
TELKEGFHMEEKDEPIQEDDQVVGTIQKYTKQKWYAEVMYRFIDGSYFQHITLIDSGADVNCIREDEILDQLVQTKREQV
VNSIYLHDNSFPKSMDLPDQKITEKRAKLQDIPHHEERLLDYREKKSRDGQDKLPMEVEQSMATNKNTKILLRAWLLST
A protein sequence may contain from a few hundred to several thousand amino acids.

15 RNA

16 Animal cell Nucleus Chromatin Mitochondrion Nucleolus (rRNA synthesized) Plasma membrane Cell coat Cytoplasm

17 Protein synthesis

18

19 Genetic code
..ATTCACAGTGGA..
..ATT CAC AGT GGA..
   I   H   S   G
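The mapping on this slide (ATT, CAC, AGT, GGA to I, H, S, G) follows the standard genetic code. The sketch below builds that table compactly and translates the slide's example; the names `CODON_TABLE` and `translate` are illustrative, and the code assumes the input is already in the correct reading frame.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA codon by codon, ignoring any trailing partial codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


print(translate("ATTCACAGTGGA"))  # IHSG
```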

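20 Notes on translation
Reading frame: the same sequence can be split into codons at three different offsets, and each offset gives a different protein.
Start and stop codons: translation begins at a start codon and ends at a stop codon.
Third base: the third base of a codon often does not change the amino acid.
Direction: the sequence is read 5’ -> 3’.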
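Two of these notes, the reading frame and the third base, are easy to see in code. This sketch reuses the standard codon table and the example sequence from the previous slide; `frames` is an illustrative helper, and only the three forward frames of one strand are shown.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate from position 0, dropping any trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def frames(dna: str) -> list:
    """Translations of the three forward reading frames (offsets 0, 1, 2)."""
    return [translate(dna[offset:]) for offset in range(3)]


print(frames("ATTCACAGTGGA"))  # ['IHSG', 'FTV', 'SQW']

# Third base: all four GGx codons give the same amino acid (glycine, G).
print({CODON_TABLE["GG" + base] for base in "ACGT"})  # {'G'}
```

Shifting the start by a single base changes every codon, which is why finding the correct frame matters before translating.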

21 DNA replication

22 The Central Dogma of Molecular Biology
DNA (genotype) → RNA → Protein (phenotype)
DNA → DNA: replication
DNA → RNA: transcription
RNA → Protein: translation

23 Exception – retroviruses
DNA (genotype) → RNA → Protein (phenotype)
DNA → DNA: replication
DNA → RNA: transcription
RNA → Protein: translation
RNA → DNA: reverse transcription (retroviruses)
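At the level of strings, the arrows in these two slides are simple mappings. The sketch below treats its input as the coding strand, so transcription is just T to U, and it models the retroviral step as the inverse mapping. The function names are illustrative, and the biology is of course far richer than a string replacement.

```python
from itertools import product

# Standard genetic code over the RNA alphabet; '*' marks a stop codon.
RNA_BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
RNA_CODON_TABLE = {"".join(c): a for c, a in zip(product(RNA_BASES, repeat=3), AMINO)}


def transcribe(coding_strand: str) -> str:
    """DNA -> RNA: the mRNA matches the coding strand with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """RNA -> DNA, the step retroviruses add to the central dogma."""
    return rna.upper().replace("U", "T")


def translate_rna(rna: str) -> str:
    """RNA -> protein, codon by codon, dropping a trailing partial codon."""
    return "".join(RNA_CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


dna = "ATTCACAGTGGA"
rna = transcribe(dna)
print(rna, translate_rna(rna))         # AUUCACAGUGGA IHSG
print(reverse_transcribe(rna) == dna)  # True
```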

24 Protein Phenotype DNA (Genotype) Biology

25 Genes
One gene encodes one protein (or sometimes an RNA).
Like a program, a gene starts with a start codon (e.g. ATG); after that, every three bases code for one amino acid, until a stop codon (e.g. TGA) signals the end of the gene.
Genes are dense in prokaryotes and sparse in eukaryotes.
In the middle of a eukaryotic gene there are introns, which are spliced out (as junk) after transcription. The parts that are kept are called exons.
Locating genes and their exons in a genome is the task of gene finding.
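The "start codon, triplets, stop codon" description suggests a naive scan for open reading frames, and splicing can be pictured as joining exon intervals. The sketch below is only that naive picture: real gene finders rely on statistical models, and the sequences, coordinates and function names here are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 1) -> list:
    """Return (start, end) half-open spans of ATG...stop in the 3 forward frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count the coding codons before the stop, including ATG.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def splice(pre_mrna: str, exons: list) -> str:
    """Join the exon intervals (half-open) and drop everything between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


print(find_orfs("CCATGATTCACTGAGG"))  # [(2, 14)]

# Hypothetical gene: two exons separated by a 12-base intron.
gene = "ATGATT" + "GTAAGTTTTCAG" + "CACTGA"
print(splice(gene, [(0, 6), (18, 24)]))  # ATGATTCACTGA
```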

26 Introns and Exons

27 Jumping genes
Some genes (transposable elements) can move to a new position in the genome, jumping over other genes.

28 Gene related diseases
Hemophilia: the gene is on the X chromosome.
Sickle-cell anemia: a single nucleotide mutation in the first exon of the beta-globin gene (it removes a cutting site). 1 in 12 African Americans is a carrier; homozygotes are sick.
BRCA1 gene (chr. 17q): responsible for ½ of inherited breast cancer (10% of breast cancer).
Fragile X syndrome (intellectual disability): 1 in 1,250 males and 1 in 2,500 females (dominant, but females have a partially expressed good gene). FMR-1 gene: more than 200 tri-nucleotide repeats cause the disease.
P53 gene: chr. 17p, responsible for ½ of all cancers.
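The tri-nucleotide repeat criterion for FMR-1 can be checked by counting consecutive copies of a triplet. In the sketch below the repeat unit CGG is an assumption (the slide only says "tri-nucleotide repeats"), the threshold of 200 is the one given on the slide, and the function name and test sequence are illustrative.

```python
import re


def longest_repeat_run(dna: str, unit: str = "CGG") -> int:
    """Largest number of consecutive copies of `unit` found in `dna`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", dna.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


DISEASE_THRESHOLD = 200  # from the slide: more than 200 repeats

sample = "AT" + "CGG" * 5 + "TA" + "CGG" * 2  # made-up sequence
count = longest_repeat_run(sample)
print(count, count > DISEASE_THRESHOLD)  # 5 False
```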

29

