[BejeranoWinter12/13] 1 MW 11:00-12:15 in Beckman B302 Prof: Gill Bejerano TAs: Jim Notwell & Harendra Guturu CS173 Lecture 3:


1 http://cs173.stanford.edu [BejeranoWinter12/13] 1 MW 11:00-12:15 in Beckman B302 Prof: Gill Bejerano TAs: Jim Notwell & Harendra Guturu CS173 Lecture 3: Protein coding genes

2 Announcements http://cs173.stanford.edu/ is up – Course guidelines, lecture slides, etc. Communications via Piazza – Private Q: post to “instructors” not “class” – Auditors sign up too – Office hours TBA before HW1 Project groups: TBD after “shopping season” Tutorials: first three Wednesdays – Recommended to bring your laptop to UCSC tutorial 1/16 We will be recruiting for our lab from class – Many other labs on campus would love to have you too!

3 http://cs173.stanford.edu [BejeranoWinter12/13] 3 TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATA CATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTC AGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTC CGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACT AGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATG ATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAA AAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAAT TGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAA TTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGG ATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGAT TTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAAT CTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATG AACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATC ATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAA AAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCA GCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAA CTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGA TAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTT GGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTTGCGAAGTT CTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGT TTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATAC 
CTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCT TGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAATAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACATTTA AGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGA GTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATACA GCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACAAC CAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCAA CACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTTG GTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTC TCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAAT GCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCT TGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTT TCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCT ATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTT TCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGA GATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTA TCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTT CATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTT CAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAA TAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGT ATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATAAAG

4 Central Dogma of Biology

5 Genomes, Genes & Proteins The most visible instructions in our genome are genes. Genes explain exactly HOW to synthesize any protein. Proteins are the workhorses of every living cell. [Figure: genome sequence ...ACGTACGACTGACTAGCATCGACTACGACTAGCAC... with a gene marked; labels: genome, gene, cell, protein]

6 Gene Structure

7 Gene Processing

8 Translation: The Genetic Code
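The slide itself is a figure, so here is a minimal Python sketch of what translating with the genetic code means. It assumes the standard nuclear code and a coding sequence that is already spliced and in frame; the names `CODON_TABLE` and `translate` are illustrative and not from the lecture.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        # Codons containing anything other than A/C/G/T become "X".
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # M (Met), A (Ala), then a stop codon
```

The table is built from a 64-letter string rather than typed out codon by codon, which keeps the sketch short; a real pipeline would more likely rely on an established library.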

9 The gene centric genome “The Genetic code”: a gene centric term, for a gene centric world. But fashions change. Controlled by mass media, technology, money, and a bit of scientific truth.

10 Visualizing Gene Structure

11 Genes in the Human Genome There are ~25,000 protein coding genes in the human genome. (Even halfway through sequencing the human genome, researchers thought there would be well over 100,000 genes.)

12 Everything in Genomics is a Moving Target: the genomes (i.e., assemblies), their annotations, our understanding of biology, the portals. Conclusion: write code that can be run... and rerun. Why ~25,000?

13 Gene Finding I: ab initio Challenge: “Find the genes, the whole genes, and nothing but the genes” Understand biology -> Write discovery tools. (Our) answer depends on our understanding, data & tools.
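As a toy illustration of the ab initio idea (finding genes from sequence alone), the sketch below scans the forward strand for open reading frames: an ATG followed in frame by a stop codon. This is an assumption-laden simplification, not the method taught in the course: it ignores introns, the reverse strand, and any statistical model of coding sequence, and `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for forward-strand open reading frames.

    start is the 0-based index of the ATG, end is exclusive and includes
    the stop codon. min_codons counts codons from ATG up to, but not
    including, the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGGCCAAATAGCC"))
```

Changing `min_codons` changes which candidates are reported, which is a small instance of the slide's point that the answer depends on our tools and assumptions.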

14 Gene (Protein really) Functions The most visible instructions in our genome are genes. Genes explain exactly HOW to synthesize any protein. Proteins are the workhorses of every living cell. [Figure: genome sequence ...ACGTACGACTGACTAGCATCGACTACGACTAGCAC... with a gene marked; labels: genome, gene, cell, protein] Just look at the cell. Lots and lots of different functions to perform. (“Only 20,000 genes”...)

15 First full draft of the Human Genome, 2001: Human Genome Consortium (HGC) and Celera

16 Biological Functions of the Human Gene Set [HGC, 2001] Focus on the X axis.

17 Molecular Functions of the Human Gene Set [Celera, 2001]

18 Biological vs. Molecular Function: Pathways Proteins with very different molecular functions work together to carry out a single biological function, for example a pathway.

19 “Special” Function: Gene Regulation 2,000 different proteins can bind specific DNA sequences. Proteins that regulate the transcription of other proteins are called transcription factors. [Figure labels: Gene, Proteins, DNA, Protein binding site]

20 The Importance of Gene Regulation The looks & capabilities of different cells are determined by the subset of genes they express. Different cell types express very different gene repertoires (from the same genome). To change its behavior a cell can change its transcriptional program. Think of it as a giant state machine…
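One way to make the state machine picture concrete is a tiny Boolean network, sketched below in Python. Everything in it is invented for illustration: the genes `tfA`, `tfB`, `tfC`, the `signal` input, and the update rules are hypothetical, and real regulation is neither Boolean nor synchronous.

```python
def step(state, rules):
    """Compute the next on/off state of every gene from the current state."""
    return {gene: bool(rule(state)) for gene, rule in rules.items()}


# Hypothetical wiring: an external signal switches on tfA, tfA activates
# tfB unless tfC represses it, and tfB activates tfC.
RULES = {
    "signal": lambda s: s["signal"],            # external input, held constant
    "tfA": lambda s: s["signal"],
    "tfB": lambda s: s["tfA"] and not s["tfC"],
    "tfC": lambda s: s["tfB"],
}

state = {"signal": True, "tfA": False, "tfB": False, "tfC": False}
for t in range(5):
    expressed = sorted(g for g, on in state.items() if on)
    print(t, expressed)
    state = step(state, RULES)
```

The same genome (the same `RULES`) yields different expressed subsets depending on the input and the current state, which is the point of the slide.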

21 “Special” Function: Cell Signaling Cells also talk with each other. They send and receive messages, and change their behavior according to messages they receive.

22 Signal Transduction Now it’s an even bigger state machine of individual state machines (= cells) talking with each other, orchestrating their individual activities.

23 Back to Genes & Their Functions Gene (DNA) sequence determines protein (AA) sequence, which determines protein (3D) structure, which determines protein’s function.

24 Protein Folding Protein folding is the challenge of deducing protein structure from protein sequence. It’s a tough one…

25 Gene Families, Gene Names Genes (proteins) come in families. Genes of the same family have similar sequences, which is why they fold into similar structures and perform similar functions. Genes of the same family will typically have a “family name” followed by a (sequential) number or “first name”.

26 Alternative Splicing

27 Genes in the Human Genome When you only show one transcript per gene locus: [figure]. If you ask the GUI to show you all well-established gene variants: [figure].

28 http://cs173.stanford.edu [BejeranoWinter12/13] 28 Protein Domains A protein domain is a subsequence of the protein that folds independently of the other portions of the sequence, and often confers to the protein one or more specific functions. SKSHSEAGSAFIQTQQLHAAMADTFLEHMCRLDIDSAPITARNTG IICTIGPASRSVETLKEMIKSGMNVARMNFSHGTHEYHAETIKNV RTATESFASDPILYRPVAVALDTKGPEIRTGLIKGSGTAEVELKK GATLKITLDNAYMAACDENILWLDYKNICKVVEVGSKVYVDDGLI SLQVKQKGPDFLVTEVENGGFLGSKKGVNLPGAAVDLPAVSEKDI QDLKFGVDEDVDMVFASFIRKAADVHEVRKILGEKGKNIKIISKI ENHEGVRRFDEILEASDGIMVARGDLGIEIPAEKVFLAQKMIIGR CNRAGKPVICATQMLESMIKKPRPTRAEGSDVANAVLDGADCIML SGETAKGDYPLEAVRMQHLIAREAEAAMFHRKLFEELARSSSHST DLMEAMAMGSVEASYKCLAAALIVLTESGRSAHQVARYRPRAPII AVTRNHQTARQAHLYRGIFPVVCKDPVQEAWAEDVDLRVNLAMNV GKAAGFFKKGDVVIVLTGWRPGSGFTNTMRVVPVP

29 Alt. Splicing and Protein Repertoire Alternative splicing often produces protein variants that have a different domain composition, and thus perform different functions.
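The link between splicing and domain composition can be sketched in a few lines of Python. The four-exon gene below is entirely hypothetical (made-up sequences and domain labels), and every exon length is a multiple of three so that skipping an exon keeps the reading frame, which is not guaranteed in real genes.

```python
from typing import NamedTuple, Optional


class Exon(NamedTuple):
    name: str
    seq: str
    domain: Optional[str]  # protein domain encoded by this exon, if any


# Hypothetical gene, exons listed in genomic order.
GENE = [
    Exon("e1", "ATGGCT", None),
    Exon("e2", "GGAGGA", "kinase"),
    Exon("e3", "TTCTTC", "DNA-binding"),
    Exon("e4", "GCTTAA", None),
]


def splice(exons, skip=frozenset()):
    """Join the retained exons; return the mRNA and its encoded domains."""
    kept = [e for e in exons if e.name not in skip]
    mrna = "".join(e.seq for e in kept)
    domains = [e.domain for e in kept if e.domain is not None]
    return mrna, domains


print(splice(GENE))               # full-length isoform, both domains
print(splice(GENE, skip={"e3"}))  # exon-skipping isoform, kinase domain only
```

One gene locus thus yields two transcripts whose proteins differ in domain composition.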

30 Retroposed Genes and Pseudogenes Pseudogenes (“dead genes”): genomic sequences that resemble (and originated from) genes but no longer make proteins. Retrogenes (“retrotranscribed”): protein coding RNA that was reverse transcribed and inserted back into the genome. The RNA can be grabbed at any stage (partial/full transcript, before/during/after all introns are spliced).

31 Gene Ontologies 1. Make a controlled vocabulary of gene functions. 2. Annotate all genes using this vocabulary. Map: genes -> papers -> biological functions. (Plenty of room for Natural Language Processing.) Used to catalog human gene functions, and also which genes are expressed where, what defects have been found when certain genes are mutated, etc.
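The two steps above can be sketched as a small data structure in Python: a controlled vocabulary whose terms point to more general parent terms, plus gene annotations that use only those terms. The term names, gene names, and helper functions here are hypothetical and are not real Gene Ontology identifiers or an existing API.

```python
# Step 1: controlled vocabulary. Each term lists its more general parents.
PARENTS = {
    "biological process": [],
    "gene regulation": ["biological process"],
    "transcription regulation": ["gene regulation"],
    "signaling": ["biological process"],
}

# Step 2: annotate genes using only terms from the vocabulary.
ANNOTATIONS = {
    "geneA": {"transcription regulation"},
    "geneB": {"signaling"},
    "geneC": {"gene regulation"},
}


def ancestors(term, parents):
    """Return every term that is more general than the given term."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for(term, annotations, parents):
    """Genes annotated with the term or with any more specific term."""
    return sorted(
        gene
        for gene, terms in annotations.items()
        if any(t == term or term in ancestors(t, parents) for t in terms)
    )


print(genes_for("gene regulation", ANNOTATIONS, PARENTS))
```

Because annotations propagate up to more general terms, a query for a broad function also returns genes annotated with a narrower one.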

32 Review Lecture 3
Central dogma recap
– Focus on protein coding genes
Gene structure
– exon, intron, 3’/5’ UTR, CDS recap
– The genetic code
– UCSC genome browser sneak peek
– human genome stats
– Gene finding I: ab initio
Gene (protein) function
– Cell structure, chemical reactions, etc.
– Pathways (vs. function)
– Information processing roles: TFs; signaling (ligands, receptors, kinases)
Gene families
– similar sequence -> structure -> function
– protein domains
– splice variants, alt promoters
Special cases
– Pseudogenes
– Retroposed genes (and the distinction between the two)
Gene ontologies

