[BejeranoFall13/14] 1 MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano TAs: Harendra Guturu & Panos.

Presentation transcript:

[BejeranoFall13/14] 1 MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano TAs: Harendra Guturu & Panos Achlioptas CS273A Lecture 3: Protein coding genes

[BejeranoFall13/14] 2 Announcements: The course website is up at http://cs273a.stanford.edu/ (course guidelines, lecture slides, etc.). Communications are via Piazza; auditors, please sign up too. TA office hours: TBA before HW1. Project groups: TBD after “shopping season”. Tutorials: the first three Fridays; we recommend bringing your laptop to the UCSC tutorial on 10/4. Lots of genomics research is happening on campus; if you enjoy this class, many labs would love to have you!

TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATA CATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTC AGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTC CGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACT AGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATG ATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAA AAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAAT TGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAA TTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGG ATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGAT TTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAAT CTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATG AACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATC ATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAA AAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCA GCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAA CTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGA TAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTT GGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAG...TTGCGAA GTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAA TGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGA TACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGT 
TCTTGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAATAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACAT TTAAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAA AGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAAT ACAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTAC AACCAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATAT CAACACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCG TTGGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTC TTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATT AATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGT TCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATG TTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATA CCTATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATG TTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTA AGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGA TTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATA GTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATG CTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACT TAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGAT TGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAAT 3

[BejeranoFall13/14] 4 The Biggest Challenge in Genomics… is computational: how does this (the program) encode this (the output)? This “coding” question has profound implications for our lives.

[BejeranoFall13/14] 5 The Biggest Challenge in Genomics… is computational: how does this (the program) encode this (the output)? What genomic mutations predispose us to disease? (Bugs)

[BejeranoFall13/14] 6 The Biggest Challenge in Genomics… is computational: how does this (the program) encode this? What genomic mutations determine our drug response? (Bugs; debugging)

[BejeranoFall13/14] 7 The Biggest Challenge in Genomics… is computational: how does this (the program) encode this (the output)? What in our genomes makes us different from each other?

[BejeranoFall13/14] 8 The Biggest Challenge in Genomics… is computational: how does this (the program) encode this (the output)? What in our genomes makes us different from related species?

[BejeranoFall13/14] 9 The Biggest Challenge in Genomics… is computational: how does this (the program) encode this (the output)? Why is our genome full of “memory leaks”?

[BejeranoFall13/14] 10 Genomics will affect multiple fields of CS: storage, compression, architecture, databases, HCI, etc.
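To make the storage and compression point concrete: DNA has only four letters, so each base needs 2 bits rather than the 8 bits of an ASCII character, a 4x saving before any real compression. The sketch below assumes an uppercase A/C/G/T-only sequence, and the function names are hypothetical; real formats such as UCSC's .2bit additionally record runs of N and soft-masked (lowercase) regions, which this ignores.

```python
# Map each base to a 2-bit code (assumes only uppercase A/C/G/T; N and lowercase are not handled).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"


def pack(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base, first base in the high bits."""
    value = 0
    for base in seq:
        value = (value << 2) | ENCODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover the DNA string. The length must be stored separately because
    leading A's (code 0) leave no trace in the integer."""
    bases = []
    for _ in range(length):
        bases.append(DECODE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


seq = "GATTACA"
packed = pack(seq)
print(packed, packed.bit_length(), "bits vs", 8 * len(seq), "bits as ASCII")
print(unpack(packed, len(seq)))
```

Storing the length alongside the packed value is the price of this simple scheme; a file format would also need a header describing where each sequence starts.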

[BejeranoFall13/14] 11 We need to understand the genome

TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATA CATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTC AGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTC CGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACT AGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATG ATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAA AAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAAT TGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAA TTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGG ATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGAT TTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAAT CTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATG AACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATC ATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAA AAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCA GCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAA CTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGA TAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTT GGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTTGCGAAGTT CTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGT TTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATAC CTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCT 
TGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAATAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACATTTA AGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGA GTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATACA GCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACAAC CAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCAA CACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTTG GTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTC TCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAAT GCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCT TGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTT TCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCT ATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTT TCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGA GATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTA TCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTT CATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTT CAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAA TAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGT ATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATAAAG 12

Central Dogma of Biology

[BejeranoFall13/14] 14 Genomes, Genes & Proteins: The most visible instructions in our genome are genes. Genes explain exactly HOW to synthesize each protein. Proteins are the workhorses of every living cell. (Figure labels: genome ...ACGTACGACTGACTAGCATCGACTACGACTAGCAC...; gene; protein; cell.)
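The gene-to-protein flow starts with transcription. As a minimal sketch (function names hypothetical; introns and all regulation ignored): the mRNA has the same sequence as the gene's coding (sense) strand with U in place of T, which is equivalently the reverse complement of the template strand that RNA polymerase actually reads.

```python
# Complement each DNA base into the RNA base that pairs with it (A pairs with U in RNA).
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def transcribe_coding_strand(coding: str) -> str:
    """mRNA from the coding (sense) strand, both written 5'->3': swap T for U."""
    return coding.upper().replace("T", "U")


def transcribe_template_strand(template: str) -> str:
    """mRNA from the template (antisense) strand written 5'->3':
    complement each base, then reverse so the mRNA also reads 5'->3'."""
    return template.upper().translate(DNA_TO_RNA_COMPLEMENT)[::-1]


coding = "ATGGCCTAA"
template = "TTAGGCCAT"  # reverse complement of the coding strand
print(transcribe_coding_strand(coding))
print(transcribe_template_strand(template))
```

Both calls yield the same mRNA, which is the point: the two strands carry the same information.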

Gene Structure [BejeranoFall13/14] 15

Gene Processing 16 [BejeranoFall13/14]

Translation: The Genetic Code 17 [BejeranoFall13/14]
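The genetic code maps each of the 64 codons to an amino acid or a stop signal. The sketch below encodes the standard code as one 64-character string indexed by codon (bases ordered T, C, A, G), the compact form NCBI uses for its translation table 1. It assumes an uppercase DNA coding sequence that is already in frame, and stops at the first stop codon; the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code: one-letter amino acid for each codon, codons ordered
# TTT, TTC, TTA, TTG, TCT, ..., GGG; '*' marks a stop codon.
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: STANDARD_CODE[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence up to (not including) the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[pos:pos + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGTTTTGGTAAGGG"))
```

The code is redundant (61 sense codons for 20 amino acids), which is why several codons share a letter in the table.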

The gene-centric genome 18 [BejeranoFall13/14] “The Genetic Code”: a gene-centric term, for a gene-centric world. There are in fact a number of additional genetic codes encoded in our genome.

Visualizing Gene Structure [BejeranoFall13/14] 19

Genes in the Human Genome 20 [BejeranoFall13/14] There are ~25,000 protein-coding genes in the human genome. (Even halfway through sequencing the human genome, researchers thought there would be well over 100,000 genes.) [UCSC primer]

[BejeranoFall13/14] 21 Gene Finding I: ab initio. Computational challenge: “Find the genes, the whole genes, and nothing but the genes.” Understand biology → write discovery tools. (Our) answer depends on our understanding, data & tools. (See CS262, Winter.)
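The simplest caricature of ab initio gene finding is scanning for open reading frames (ORFs): a start codon ATG followed, in the same frame, by a stop codon, long enough to be unlikely by chance. This sketch only scans the forward strand, ignores introns and ORFs that run off the end of the sequence, and uses an arbitrary `min_codons` threshold; real ab initio gene finders (for example, HMM-based tools such as GENSCAN) also model splice sites, exon/intron structure and codon statistics.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int]]:
    """Forward-strand ORFs as (start, end) pairs: 0-based, end-exclusive,
    stop codon included. Within one frame, the first ATG after a stop opens
    an ORF; later in-frame ATGs are treated as internal methionines."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:  # codons before the stop
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATAGGG", min_codons=1))
```

On a real genome, the threshold matters a lot: random sequence contains many short ORFs, which is one reason ORF length alone is a weak gene signal.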

22 Everything in Genomics Is a Moving Target: the genomes (i.e., assemblies), their annotations, our understanding of biology, and the portals. Conclusion: write code that can be run... and rerun.

[BejeranoFall13/14] 23 Gene (Protein, Really) Functions: The most visible instructions in our genome are genes. Genes explain exactly HOW to synthesize each protein. Proteins are the workhorses of every living cell. (Figure labels: genome ...ACGTACGACTGACTAGCATCGACTACGACTAGCAC...; gene; protein; cell.) Just look at the cell: lots and lots of different functions to perform. (“Only 20,000 genes”…)

[BejeranoFall13/14] 24 First full draft of the human genome (2001): Human Genome Consortium (HGC) and Celera. Serafim discussed the current state of sequencing.

[BejeranoFall13/14] 25 Biological Functions of the Human Gene Set [HGC, 2001] Focus on the X axis:

[BejeranoFall13/14] 26 Molecular Functions of the Human Gene Set [Celera, 2001]

Gene Ontologies 27 [BejeranoFall13/14] 1. Make a controlled vocabulary of gene functions. 2. Annotate all genes using this vocabulary. Map: genes → papers → biological functions (plenty of room for natural language processing). Used to catalog human gene functions, and also which genes are expressed where, what defects have been found when certain genes are mutated, etc.
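Gene Ontology terms form a hierarchy (a directed acyclic graph), and annotating a gene to a specific term implies annotating it to all of that term's more general ancestors; GO documentation calls this the true path rule. A minimal sketch of that propagation, over a small hand-written fragment whose edges are simplified for illustration (no GO IDs) and with a hypothetical gene name GENE_A:

```python
# Simplified ontology fragment: each term maps to its parent (more general) terms.
PARENTS = {
    "glucose metabolic process": ["carbohydrate metabolic process"],
    "carbohydrate metabolic process": ["metabolic process"],
    "metabolic process": ["biological_process"],
    "biological_process": [],
}


def ancestors(term, parents):
    """All terms reachable by following parent edges upward; each visited once,
    so shared ancestors in a DAG are not re-expanded."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct_annotations, parents):
    """Expand each gene's direct annotations with every ancestor term."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in direct_annotations.items()
    }


annotations = {"GENE_A": {"glucose metabolic process"}}  # hypothetical gene
print(sorted(propagate(annotations, PARENTS)["GENE_A"]))
```

Propagation is what lets a query for a general function (say, any metabolic process) retrieve genes that were only annotated to a specific subtype.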

[BejeranoFall13/14] 28 Genes & Their Functions Gene (DNA) sequence determines protein (AA) sequence, which determines protein (3D) structure, which determines protein’s function.

[BejeranoFall13/14] 29 Protein Folding Protein folding is the challenge of deducing protein structure from protein sequence. New CS faculty joining in February ’14: Ron Dror

Gene Families, Gene Names 30 [BejeranoFall13/14] Genes (proteins) come in families. Genes of the same family have similar sequences, which is why they fold into similar structures and perform similar functions. Genes of the same family will typically have a “family name” followed by a (sequential) number or “first name”.
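“Similar sequences” is usually quantified by aligning them. Below is a minimal global alignment (Needleman-Wunsch) score with toy scoring (match +1, mismatch -1, gap -1; these values are assumptions chosen for illustration). Practical homology search uses substitution matrices such as BLOSUM62 for proteins, affine gap penalties, and heuristics like BLAST for speed.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch optimal global alignment score with a linear gap penalty.
    Keeps only the previous DP row, so memory is O(len(b))."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag,               # align a[i-1] with b[j-1]
                         prev[j] + gap,      # a[i-1] against a gap
                         cur[j - 1] + gap)   # b[j-1] against a gap
        prev = cur
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))
```

This returns only the score; recovering the alignment itself would require keeping the full matrix (or traceback pointers).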

[BejeranoFall13/14] 31 Biological vs. Molecular Function: Pathways. Proteins with very different molecular functions participate together to manifest a single biological function, for example, a pathway.

[BejeranoFall13/14] 32 Some “Special” Functions: Gene Regulation. ~2,000 different proteins can bind specific DNA sequences. Proteins that regulate the transcription of other genes are called transcription factors. (Figure labels: gene; proteins; DNA; protein binding site.)
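A transcription factor's binding site can sit on either DNA strand, so a scan must also look for the motif's reverse complement. A minimal exact-match sketch, where the motif GATAAG is just an example string and the function names are hypothetical; real binding-site searches usually score windows against a position weight matrix, because TFs tolerate variation at many positions.

```python
# Complement table for uppercase A/C/G/T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """The opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_sites(seq: str, motif: str) -> list:
    """Exact matches of motif on either strand, as (start, strand) pairs.
    Positions are 0-based on the given (+) strand; a palindromic motif is
    reported on both strands at the same position."""
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        if window == rc:
            hits.append((i, "-"))
    return hits


print(scan_sites("AGATAAGCCTTATCT", "GATAAG"))
```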

[BejeranoFall13/14] 33 The Importance of Gene Regulation: The looks & capabilities of different cells are determined by the subset of genes they express. Different cell types express very different gene repertoires (from the same genome). To change its behavior, a cell can change its transcriptional program. Think of it as a giant state machine…
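One common way to make the “state machine” picture concrete is a Boolean network: each gene is ON or OFF, and its next state is a logical function of the current states of its regulators. The three genes and their rules below are hypothetical (A activates B, B activates C, C represses A); with synchronous updates, this tiny negative-feedback loop cycles through a sequence of expression states rather than settling.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]
Rules = Dict[str, Callable[[State], bool]]

# Hypothetical regulatory rules: each gene's next ON/OFF state given the current state.
RULES: Rules = {
    "A": lambda s: not s["C"],  # C represses A
    "B": lambda s: s["A"],      # A activates B
    "C": lambda s: s["B"],      # B activates C
}


def step(state: State, rules: Rules) -> State:
    """Synchronous update: every gene reads the same current state."""
    return {gene: rule(state) for gene, rule in rules.items()}


def trajectory(state: State, rules: Rules, n_steps: int) -> List[State]:
    """The initial state followed by n_steps successive states."""
    states = [state]
    for _ in range(n_steps):
        state = step(state, rules)
        states.append(state)
    return states


start = {"A": False, "B": False, "C": False}
for s in trajectory(start, RULES, 6):
    print("".join("1" if s[g] else "0" for g in "ABC"))
```

With 3 genes there are only 8 possible states; with thousands of regulators the state space is astronomically large, which is the “giant” part of the analogy.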

[BejeranoFall13/14] 34 “Special” Function: Cell Signaling Cells also talk with each other. They send and receive messages, and change their behavior according to messages they receive.

[BejeranoFall13/14] 35 Signal Transduction: Now it’s an even bigger state machine of individual state machines (= cells) talking with each other, orchestrating their individual activities.

Alternative Splicing 36 [BejeranoFall13/14]

Genes in the Human Genome 37 [BejeranoFall13/14] When you show only one transcript per gene locus: … If you ask the GUI to show you all well-established gene variants: …

[BejeranoFall13/14] 38 Protein Domains A protein domain is a subsequence of the protein that folds independently of the other portions of the sequence, and often confers to the protein one or more specific functions. SKSHSEAGSAFIQTQQLHAAMADTFLEHMCRLDIDSAPITARNTG IICTIGPASRSVETLKEMIKSGMNVARMNFSHGTHEYHAETIKNV RTATESFASDPILYRPVAVALDTKGPEIRTGLIKGSGTAEVELKK GATLKITLDNAYMAACDENILWLDYKNICKVVEVGSKVYVDDGLI SLQVKQKGPDFLVTEVENGGFLGSKKGVNLPGAAVDLPAVSEKDI QDLKFGVDEDVDMVFASFIRKAADVHEVRKILGEKGKNIKIISKI ENHEGVRRFDEILEASDGIMVARGDLGIEIPAEKVFLAQKMIIGR CNRAGKPVICATQMLESMIKKPRPTRAEGSDVANAVLDGADCIML SGETAKGDYPLEAVRMQHLIAREAEAAMFHRKLFEELARSSSHST DLMEAMAMGSVEASYKCLAAALIVLTESGRSAHQVARYRPRAPII AVTRNHQTARQAHLYRGIFPVVCKDPVQEAWAEDVDLRVNLAMNV GKAAGFFKKGDVVIVLTGWRPGSGFTNTMRVVPVP

Alt. Splicing and Protein Repertoire 39 [BejeranoFall13/14] Alternative splicing often produces protein variants that have a different domain composition, and thus perform different functions.
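Splicing removes introns and joins exons; alternative splicing chooses which exons to join. Below is a minimal sketch of one pattern, cassette-exon skipping, where exons are given as 0-based, end-exclusive intervals on the pre-mRNA and the flagged exons may be included or skipped. It is a hypothetical toy: it ignores splice-site signals and other patterns such as alternative 5'/3' splice sites or retained introns, and it does not check whether the reading frame survives.

```python
from itertools import product
from typing import List, Set, Tuple

Exon = Tuple[int, int]  # 0-based start, end-exclusive, on the pre-mRNA


def splice(pre_mrna: str, exons: List[Exon]) -> str:
    """Join the exon intervals in order; everything between them (introns) is dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def cassette_isoforms(pre_mrna: str, exons: List[Exon], cassette: Set[int]) -> List[str]:
    """One spliced transcript per include/skip choice of each cassette exon index."""
    optional = sorted(cassette)
    transcripts = []
    for keep_flags in product([True, False], repeat=len(optional)):
        skipped = {idx for idx, keep in zip(optional, keep_flags) if not keep}
        kept = [exon for i, exon in enumerate(exons) if i not in skipped]
        transcripts.append(splice(pre_mrna, kept))
    return transcripts


# Lowercase x/y stand in for intron sequence in this toy pre-mRNA.
pre = "AAAxxCCCyyGGG"
print(cassette_isoforms(pre, [(0, 3), (5, 8), (10, 13)], cassette={1}))
```

With n independent cassette exons there are 2^n possible transcripts, one way a single gene can yield proteins with different domain compositions.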

[BejeranoFall13/14] 40 Retroposed Genes and Pseudogenes. Pseudogenes (“dead genes”): genomic sequences that resemble (and originated from) genes but no longer make proteins. Retrogenes (“retrotranscribed”): protein-coding RNA that was reverse-transcribed and inserted back into the genome. The RNA can be grabbed at any stage (partial/full transcript, before/during/after all introns are spliced).

Review Lecture 3
Central dogma recap
– Focus on protein coding genes
Gene structure
– Exon, intron, 5’/3’ UTR, CDS recap
– The genetic code
– UCSC genome browser sneak peek
– Human genome stats
– Gene finding I: ab initio
Gene (protein) function
– Cell structure, chemical reactions, etc.
– Pathways (vs. function)
– Information processing roles: TFs; signaling (ligands, receptors, kinases)
Gene families
– Similar sequence → structure → function
– Protein domains
– Splice variants, alt. promoters
Special cases
– Pseudogenes
– Retroposed genes (and the distinction between the two)
Gene ontologies
[BejeranoFall13/14] 41