Bioinformatics PhD course. Xavier Messeguer Peypoch (http://www.lsi.upc.es/~alggen). LSI, Dep. de Llenguatges i Sistemes Informàtics. BSC, Barcelona.


Bioinformatics PhD course
Xavier Messeguer Peypoch
LSI, Dep. de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya
BSC, Barcelona Supercomputing Center

Contents
1. Biological introduction
2. Comparison of short sequences (up to bps)
   Dot matrix
   Pairwise alignment
   Multiple alignment
3. Comparison of large sequences (more than bps)
   Data structures
   Suffix trees
   MUMs
4. String matching
   Exact
   Extended
   Approximate
5. Sequence assembly
6. Projects: PROMO, MREPATT, …


Genome
The chromosomes are the volumes of an encyclopaedia called the Genome.
Tissue, Cell, Nucleus
The chromosomes contain the instructions needed to live and reproduce.
What are the letters, the words and the sentences like?

DNA structure
1953: Watson and Crick discover the structure of DNA
1953: Rosalind Franklin, X-ray diffraction image of DNA

Chromosomes: the letters
Two DNA strands over an alphabet of four bases {A, C, G, T}, complementary in pairs (A with T, G with C).
A chromosome is then: ... A T A G G C T A C G C A A A C C G G T C T A ...
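The base pairing on this slide (A with T, G with C) can be sketched in a few lines of Python. The names `PAIR`, `complement` and `reverse_complement` are illustrative choices, not part of the course material.

```python
# Watson-Crick pairing for the four-letter DNA alphabet {A, C, G, T}.
PAIR = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return strand.translate(PAIR)


def reverse_complement(strand: str) -> str:
    """Return the complementary strand read in the opposite direction."""
    return complement(strand)[::-1]


# The chromosome fragment shown on the slide, written without spaces.
fragment = "ATAGGCTACGCAAACCGGTCTA"
print(complement(fragment))
```

Using a translation table keeps the pairing rule in one place, so the same two functions work for any strand over the four bases.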

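Chromosomes: the letters
... G A C T ...
... C T G A ...
So G A C T = A G T C: read in the opposite direction on the complementary strand, GACT becomes AGTC.
If we search for GACT in the sequence CACGACTATACGATATCGACTCATACGAGTCGTACGTA, we must therefore look for AGTC as well.
What are the words and the sentences like?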
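A minimal sketch of the search this slide describes: look for a word on the given strand and, through its reverse complement, on the opposite strand. The helper names (`find_all`, `find_on_both_strands`) are hypothetical; the slides do not prescribe an implementation.

```python
PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Complement every base, then reverse the reading direction."""
    return strand.translate(PAIR)[::-1]


def find_all(text: str, word: str) -> list:
    """Return every 0-based start position of word in text (overlaps allowed)."""
    hits = []
    start = text.find(word)
    while start != -1:
        hits.append(start)
        start = text.find(word, start + 1)
    return hits


def find_on_both_strands(text: str, word: str) -> dict:
    """Positions of word itself and of its reverse complement within text."""
    return {
        "forward": find_all(text, word),
        "reverse": find_all(text, reverse_complement(word)),
    }


sequence = "CACGACTATACGATATCGACTCATACGAGTCGTACGTA"  # sequence from the slide
hits = find_on_both_strands(sequence, "GACT")
print(hits)  # {'forward': [3, 17], 'reverse': [27]}
```

Positions are 0-based offsets into the sequence as written; a hit in `reverse` marks where AGTC appears, meaning GACT is present on the opposite strand.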

Chromosomes: the sentences and the words
DNA: Promoter, Gene
DNA: (hunger, morning, summer) (Toast with butter and jam)

Chromosomes: activating an instruction
Steps: Transcription
DNA: Promoter, Gene
DNA: (hunger, morning, summer) (Toast with butter and jam)
RNA:

Transcription

Chromosomes: activating an instruction
Steps: Transcription, Maturation, Translation
DNA: Promoter, Gene
DNA: (hunger, morning, summer) (Toast with butter and jam)
RNA:

Chromosomes: activating an instruction
Steps: Transcription, Maturation, Translation, Synthesis
DNA: Promoter, Gene
DNA: (hunger, morning, summer) (Toast with butter and jam)
RNA:
How does it happen inside the cell?

Central dogma of molecular biology

Protein synthesis
Inside the cell (molecules/cell, types): mRNA, tRNA, >3000, proteins

Chromosomes: activating an instruction
Steps: Transcription, Maturation, Translation, Folding
DNA: Promoter, Gene
DNA: (hunger, morning, summer) (Toast with butter and jam)
RNA:
Folding: what are its phases?

Protein folding

Proteins

QIKDLLVSSSTDLDTTLVLVNAIYFKGMW KTAFNAEDTREMPFHVTKQESKPVQMMCM NNSFNVATLPAEKMKILELPFASGDLSML VLLPDEVSDLERIEKTINFEKLTEWTNPN TMEKRRVKVYLPQMKIEEKYNLTSVLMAL GMTDLFIPSANLTGISSAESLKISQAVHG AFMELSEDGIEMAGSTGVIEDIKHSPESE QFRADHPFLFLIKHNPTNTIVYFGRYWSP

Activating an instruction (in reverse)
Steps: Folding, Translation, Maturation, Transcription
DNA: Promoter, Gene
DNA: (hunger, morning, summer) (Toast with butter and jam)
RNA:

Translation
RNA alphabet: {A, C, G, U}
Protein alphabet: {A, I, H, …}
The information is then encoded as
ACUCCAUUCUUUAACAGGGCCAUAUCGGCUAUAGGCCGAGUUAGGUA
CGAUUAGCACGGAUACUAGCAUAUGCAUCGUAUAGCAUCGAUUAGAA
ACUCCAUUCUUUAACAGGGCCAUAUCGGCUAAGGCCGAGUUAGGUACGAUUAGCACGGAUAUAGCAUAUGCAUCGUAUAGCAUCGAUUAG
whose translation is
LRRLPGAATXXYRTFAAGTRRRXXXWA
Steps: Transcription, Maturation, Translation
Gene: RNA:
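As a sketch of how the RNA alphabet maps onto the protein alphabet, the following Python builds the standard genetic code and reads an RNA string three letters at a time. The names `CODON_TABLE` and `translate` are illustrative, and reading starts at the first letter, which is an assumption. The protein string on the slide appears to be schematic (it contains the placeholder letter X), so the output here is not expected to match it.

```python
from itertools import product

# Standard genetic code, listed in the usual U, C, A, G order for each
# of the three codon positions; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate an RNA string codon by codon, ignoring a trailing partial codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        protein.append(CODON_TABLE[rna[i:i + 3]])
    return "".join(protein)


# First three codons of the RNA shown on the slide: ACU CCA UUC.
print(translate("ACUCCAUUC"))  # TPF
```

The function does not stop at a stop codon; it emits `*` and continues, which keeps the sketch short and makes stop positions visible in the output.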

Maturation
The information is distributed in pieces.
RNA (before maturation):
ACUCCAUUCUUUAAACCGUACUACACACACUACUGAUCGACGAUUACGACGACGAAAGGGCCAUAUCGGCUAACUACAUCAUAGACAACAUC
ACGGAUCGUCUAAGGCCGAGUUAGGUACGAUUAACGUACGACUACCUAUCGUAUAUACAUCACGGAUAUAACCUAUCUACUACGAUUAACAC
GAUCUAUCGUACGGCAUAUGCAUCGUAUAGCAUCGAUUAGAAU
RNA (after maturation):
UCUCCAUUCUUUAACAGGAUAUCGGCUAAGGCCGAGUUAGGUACGAUUAGCACGGAUAUAGCAUAUGCAUCGUAUAGCAUCGAUUAGAAU
LRRLPGAATXXYRTFAAGTRRRXXXWA
Steps: Transcription, Maturation, Translation
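The idea that the information is spread over pieces that maturation joins together can be sketched as follows. The toy sequence and the exon coordinates are hypothetical, chosen only for illustration; the slides do not give the piece boundaries of their example.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join the exon intervals (0-based, end-exclusive) and drop everything else."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript: two informative pieces (upper case) separated by a
# stretch that maturation removes (lower case). Hypothetical data.
pre_mrna = "ACUCCAUUC" + "guaaguuuag" + "GGCCAUAUC"
exons = [(0, 9), (19, 28)]

mature = splice(pre_mrna, exons)
print(mature)  # ACUCCAUUCGGCCAUAUC
```

Representing the pieces as intervals keeps the original transcript untouched, so the same pre-RNA can be spliced with a different list of intervals.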


Transcription
DNA: (hunger, morning, summer) (Toast with butter and jam)
DNA (promoter in lower case, gene in upper case):
………………...cagctcgatacgttacgatctacgattacgatcatctatactatactatacgatatatctagatatcgatcta.ACTCCATTCTTTAAACCGTACTACACACACTAC
TGATCGACGATTACGACGACGAAAGGGCCATATCGGCTAACTACATCATAGACAACATCACGGATCGTCTAAGGCCGAGTTAGGTACGATTAACGTACGAC
TACCTATCGTATATACATCACGGATATAACCTATCTACTACGATTAACACGATCTATCGTACGGCATATGCATCGTATAGCATCGATTAGAAT………………..
Gene (Toast with butter and jam):
…………….ACTCCATTCTTTAAACCGTACTACACACACTACTGATCGACGATTACGACGACGAAAGGGCCATATCGGCTAACTACATCATAGACA
ACATCACGGATCGTCTAAGGCCGAGTTAGGTACGATTAACGTACGACTACCTATCGTATATACATCACGGATATAACCTATCTACTACGATTAAC
ACGATCTATCGTACGGCATATGCATCGTATAGCATCGATTAGAAT…………………...
RNA (before maturation):
ACUCCAUUCUUUAAACCGUACUACACACACUACUGAUCGACGAUUACGACGACGAAAGGGCCAUAUCGGCUAACUACAUCAUAGACAACAUC
ACGGAUCGUCUAAGGCCGAGUUAGGUACGAUUAACGUACGACUACCUAUCGUAUAUACAUCACGGAUAUAACCUAUCUACUACGAUUAACAC
GAUCUAUCGUACGGCAUAUGCAUCGUAUAGCAUCGAUUAGAAU
RNA (after maturation):
ACUCCAUUUAACAGGGCCAUAUCGGCUAAGGCCGAGUUAGGUACGAUUAGCACGGAUAUAGCAUAUGCAUCGUAUAGCAUCGAUUAGAAU
Steps: Transcription, Maturation, Translation
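On this slide the RNA has the same letters as the gene, with U in place of T. A minimal sketch of that step, assuming the DNA shown is the coding strand (the function name `transcribe` is an illustrative choice):

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA copy of a coding strand: same letters, U instead of T."""
    return coding_strand.upper().replace("T", "U")


# Beginning of the gene as it appears on the slide.
gene_start = "ACTCCATTCTTTAAACCGTACTACACACACTAC"
print(transcribe(gene_start))  # ACUCCAUUCUUUAAACCGUACUACACACACUAC
```

In the cell the enzyme reads the opposite (template) strand; working from the coding strand, as here, gives the same RNA letters and matches how the slide lays out the sequences.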

Chromosome
DNA (promoter in lower case, gene in upper case):
………………...cagctcgatacgttacgatctacgattacgatcatctatactatactatacgatatatctagatatcgatcta.ACTCCATTCTTTAAA
CCGTACTACACACACTACTGATCGACGATTACGACGACGAAAGGGCCATATCGGCTAACTACATCATAGACAACATCACGGATC
GTCTAAGGCCGAGTTAGGTACGATTAACGTACGACTACCTATCGTATATACATCACGGATATAACCTATCTACTACGATTAACAC
GATCTATCGTACGGCATATGCATCGTATAGCATCGATTAGAAT………………..
Gene (Toast with butter and jam):
…………….ACTCCATTCTTTAAACCGTACTACACACACTACTGATCGACGATTACGACGACGAAAGGGCCATATCGGCTAACTACATCATAGACA
ACATCACGGATCGTCTAAGGCCGAGTTAGGTACGATTAACGTACGACTACCTATCGTATATACATCACGGATATAACCTATCTACTACGATTAAC
ACGATCTATCGTACGGCATATGCATCGTATAGCATCGATTAGAAT…………………...
RNA (before maturation):
ACUCCAUUCUUUAAACCGUACUACACACACUACUGAUCGACGAUUACGACGACGAAAGGGCCAUAUCGGCUAACUACAUCAUAGACAACAUC
ACGGAUCGUCUAAGGCCGAGUUAGGUACGAUUAACGUACGACUACCUAUCGUAUAUACAUCACGGAUAUAACCUAUCUACUACGAUUAACAC
GAUCUAUCGUACGGCAUAUGCAUCGUAUAGCAUCGAUUAGAAU
RNA (after maturation):
ACUCCAUUUAACAGGGCCAUAUCGGCUAAGGCCGAGUUAGGUACGAUUAGCACGGAUAUAGCAUAUGCAUCGUAUAGCAUCGAUUAGAAU
Steps: Transcription, Maturation, Translation
Do genes occupy 8% of the genome?

Part of a chromosome
TACGTATACTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTACGATGCGACGATGCGACGATCGT ACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGCGATGCGACGATGCGACGATCGTACGACTGCTAG CTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCA CACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACG GTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGC GACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCCGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATC GATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGC GACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATC ACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCT AGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTAGCTGCATGCTAGCGATGCT ACGATCGATGCTATACGACGATCGTAGCTTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTCGATG CGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGT 
ACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTG CATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACAC CGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTAC GATCGATGCTATACGACGATCGTAGCTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACG ATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTAC GTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGAT GCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGGTACGATCGTCGATCGTcagctcgatacgttacgatctacgattacgatcatctatactatactatacgatatatctagatatcgatcta.ACTCCATTCTTTAAACCGTACTACACACACTA CTGATCGACGATTACGACGACGAAAGGGCCATATCGGCTAACTACATCATAGACAACATCACGGATCGTCTAAGGCCGAGTTAGGTACGATTAACGTACGACTACCTATCGTATATACATCACGGATATAACCTATCTACTACGATTAACACGATC TATCGTACGGCATATGCATCGTATAGCATCGATTAGAATACGTATACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACG ATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACC GCGCACGATCACACGATGCGACGATGCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCT ACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACA 
CGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTA CACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGAC GATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGCGATGCGACGATGCGACGATCGTACGA CTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGC ACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGAT CGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGA CGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGT AGCTATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATG CGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCG TACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGGTACGTATCCTACGTACGATCGTGCAGCATCGATGC TACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGC AGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGCTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGATGCATGCTAGCGATGCTACGA 
CGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATG CGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGAT CGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGT ACGACTGCTAGCTACGCATGCCTACTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGC ATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTGTCACGTAGCATGCTGACGTACGATCGATTCGATCGATCGTACGATCGTAGCTAGCTA GTCGTAGCGACGTAGGATTCACGTAGCGATGCGTAGCGTAGCATGCTGACGATGCATCGATCGATGCATCATGCTAGCGTAGCTAGCTAGCATGACTGATCGATTAACGGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACG ACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGA TGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGCTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGT ACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAG CATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGATCGTATGCTAGCTAGCATGCATGCATGCATGCAT

Where is it?
TACGTATACTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTACGATGCGACGATGCGACGATCGT ACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGCGATGCGACGATGCGACGATCGTACGACTGCTAG CTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCA CACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACG GTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGC GACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCCGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATC GATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGC GACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATC ACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCT AGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTAGCTGCATGCTAGCGATGCT ACGATCGATGCTATACGACGATCGTAGCTTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTCGATG CGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGT 
ACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTG CATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACAC CGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTAC GATCGATGCTATACGACGATCGTAGCTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACG ATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTAC GTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGAT GCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGGTACGATCGTCGATCGTCAGCTCGATACGTTACGATCTACGATTACGATCATCTATACTATACTATACGATATATCTAGATATCGATCTA.ACTCCATT CTTTAAACCGTACTACACACACTACTGATCGACGATTACGACGACGAAAGGGCCATATCGGCTAACTACATCATAGACAACATCACGGATCGTCTAAGGCCGAGTTAGGTACGATTAACGTACGACTACCTATCGTATATACATCACGGATATAAC CTATCTACTACGATTAACACGATCTATCGTACGGCATATGCATCGTATAGCATCGATTAGAATACGTATACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACC GCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACG TTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATG CGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGT 
ACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAG CGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCAC GATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGCGA TGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTAC GTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGC TGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTAC ACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCT ACGATCGATGCTATACGACGATCGTAGCTATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACG GTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATC ACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGGTACGTATCCT ACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTAC GTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGCTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGT 
ACGATGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACG GTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGA TGCTACGATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCA CACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCG ACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTGTCACGTAGCATGCTGACGTACGATCGATTCG ATCGATCGTACGATCGTAGCTAGCTAGTCGTAGCGACGTAGGATTCACGTAGCGATGCGTAGCGTAGCATGCTGACGATGCATCGATCGATGCATCATGCTAGCGTAGCTAGCTAGCATGACTGATCGATTAACGGTACGTATCCTACGTACGA TCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTAT CCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGCTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCGATCGATATTAATGCAATCATG CAGCTGCATGCTAGCGATGCTACGTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGACTGCTAGCTACGCATGCCTACGT ACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGATCGTATGCT AGCTAGCATGCATGCATGCATGCAT

Human genome
Chromosome 1: 246 Mb
…….
Chromosome 22: 47 Mb
Chromosome X: 149 Mb
Chromosome Y: 58 Mb
3,000 million bases (27 Catalan encyclopaedias)
2001: draft of the human genome
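To give a feeling for the size quoted above, this short calculation estimates how much storage 3,000 million bases need. The encoding choices (one byte per letter, or two bits per base) are assumptions made for illustration, not figures from the slides.

```python
BASES_IN_GENOME = 3_000_000_000  # "3,000 million bases" from the slide

# Plain text: one 8-bit character per base.
text_bytes = BASES_IN_GENOME

# Packed: a four-letter alphabet needs log2(4) = 2 bits per base.
BITS_PER_BASE = 2
packed_bytes = BASES_IN_GENOME * BITS_PER_BASE // 8

print(f"as text:  {text_bytes / 1e9:.1f} GB")    # as text:  3.0 GB
print(f"packed:   {packed_bytes / 1e6:.0f} MB")  # packed:   750 MB
```

The factor of four between the two figures comes only from the alphabet size, before any real compression is applied.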

Human chromosomes

What’s in the human genome?
gene non-coding part
gene coding part (2%)
“parasitic” repetitive elements
microsatellites
DNA long repeats

Annotation

Comparison with other genomes
Organism                             Genome size (bases)   Estimated genes
Human (Homo sapiens)                 3000 million          30,000
Laboratory mouse (M. musculus)       2600 million          30,000
Mustard weed (A. thaliana)           100 million           25,000
Roundworm (C. elegans)               97 million            19,000
Fruit fly (D. melanogaster)          137 million           13,000
Yeast (S. cerevisiae)                12.1 million          6,000
Bacterium (E. coli)                  4.6 million           3,200
Human immunodeficiency virus (HIV)
Genbank:
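One way to read the table is to compare how densely genes are packed in each genome. The sketch below computes genes per million bases from the figures on the slide; the HIV row is left out because its values are missing, and the names `GENOMES` and `genes_per_megabase` are illustrative.

```python
# (genome size in bases, estimated genes), copied from the slide.
GENOMES = {
    "Human (Homo sapiens)": (3000e6, 30000),
    "Laboratory mouse (M. musculus)": (2600e6, 30000),
    "Mustard weed (A. thaliana)": (100e6, 25000),
    "Roundworm (C. elegans)": (97e6, 19000),
    "Fruit fly (D. melanogaster)": (137e6, 13000),
    "Yeast (S. cerevisiae)": (12.1e6, 6000),
    "Bacterium (E. coli)": (4.6e6, 3200),
}


def genes_per_megabase(size_bases: float, genes: int) -> float:
    """Estimated number of genes per million bases of genome."""
    return genes / (size_bases / 1e6)


density = {name: genes_per_megabase(*row) for name, row in GENOMES.items()}

# Print from the most compact genome to the least compact one.
for name, value in sorted(density.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:32s} {value:8.1f} genes/Mb")
```

With these figures the bacterial genome comes out as by far the most compact and the human genome as the least, which fits the later point that only a small fraction of the human genome is coding.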

Homework
1.  TGDS        Javier
2.  NR1H2       Dmitry
3.  ATP5L2      Ana Iris
4.  MYCL3       David
5.  ETAA16      Patricia
6.  CRYBA2      Rogeli
7.  LOC389199   Atif
8.  NOS3        Aina
9.  FSCN3       Isaac
10. C9orf122    Maria Merce
11. MTTS1       Romina
12. AMELY       Guillem
13. BiT1        Raul
14. ZFP161
15. PROZ