1
The BIG Goal
“The greatest challenge, however, is analytical. … Deeper biological insight is likely to emerge from examining datasets with scores of samples.” Eric Lander, “Array of hope”, Nat. Genet. 21 (supplement), pp. 3-4, 1999.
Bio-informatics: provide methodologies for elucidating biological knowledge from biological data.
2
Central Paradigm of Bio-informatics: Genetic Information
3
Central Paradigm of Bio-informatics: Genetic Information -> Molecular Structure
4
Central Paradigm of Bio-informatics: Genetic Information -> Molecular Structure -> Biochemical Function
5
Central Paradigm of Bio-informatics: Genetic Information -> Molecular Structure -> Biochemical Function -> Symptoms
7
Computer Science Tools are Crucial
http://www.sanger.ac.uk/PostGenomics/S_pombe/presentations/EMBOCopenhagenWebsite.pdf
8
New bio-technologies create huge amounts of data. Analyzing these data by manual inspection is impossible. Novel mathematical, statistical, algorithmic and computational tools are necessary!
9
Automated Sequencing
http://cbms.st-and.ac.uk/academics/ryan/Teaching/SB&Bioinf/lecture1.htm
10
What is Bio-Informatics?
A field of science in which Biology, Computer Science and Information Technology merge into a single discipline. Computers (and software tools) are used to collect, analyze and interpret biological information at the molecular level.
Goal: to enable the discovery of new biological insights and create a global perspective for biologists.
11
Disciplines
Development of new algorithms and statistical methods to assess relationships among members of large data sets.
Analysis and interpretation of various types of data.
Development and implementation of tools to efficiently access and manage different types of information.
12
Why Use Bio-Informatics?
An explosive growth in the amount of biological information necessitates the use of computers for cataloging and retrieving data (the human genome alone has > 3 billion bp and > 30,000 genes).
The human genome project. Automated sequencing.
GenBank has over 16 billion bases and is doubling every year!
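To make the "doubling every year" claim concrete, here is a minimal sketch of the arithmetic. It assumes the doubling rate stays constant, which the slide does not promise, and the function name `projected_bases` is hypothetical.

```python
def projected_bases(current_bases: float, years: float,
                    doubling_time_years: float = 1.0) -> float:
    """Extrapolate database size, assuming steady exponential doubling."""
    return current_bases * 2 ** (years / doubling_time_years)


if __name__ == "__main__":
    start = 16e9  # "over 16 billion bases", the figure quoted on the slide
    for year in range(0, 6):
        # Each extra year multiplies the total by two under this assumption.
        print(f"year +{year}: {projected_bases(start, year):.3g} bases")
```

Under this assumption, every additional year adds as much sequence as all previous years combined, which is one way to see why manual cataloging does not scale.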
13
New Types of Biological Data
Micro arrays: gene expression.
Multi-level maps: genetic, physical, sequence, annotation.
Networks of protein-protein interactions.
Cross-species relationships: homologous genes, chromosome organization.
http://www.the-scientist.com/yr2002/apr/research 020415.html
14
Why Bio Informatics? (cont.)
A more global view of experimental design (from the “one scientist = one gene/protein/disease” paradigm to whole-organism consideration).
Data mining: functional/structural information is important for studying the molecular basis of diseases, diagnostics, drug development (personalized medicine), evolutionary patterns, etc.
15
Why Bio Informatics? (cont.)
http://www.library.csi.cuny.edu/~davis/Bioinfo_326/lectures/lect14/lect_14.html
16
Future of Genomic Research
Principal milestones in data mining and genome analysis: the Sanger method for sequencing, invented in 1977 (Nobel Prize in 1980), and the polymerase chain reaction (PCR), invented in 1983 (Nobel Prize in 1993).
http://www.usgenomics.com/technology/index.shtml
17
The next step: locate all the genes and understand their function. This will probably take another 15-20 years!
18
Disease Genes Discovered
20
The job of biologists is changing…
One can efficiently find information using databases and software on the web.
Question: How likely are you to use a free bio-informatics library of accessible software?
http://www.cryst.bbk.ac.uk/classlib/BBSRC_poster/potential.html
21
Molecular Biology Analysis Software Tools Freely Available on the Web: Highlights
22
Broad Classification of Biological Databases http://www.mrc-lmb.cam.ac.uk/genomes/madanm/pres/biodb.htm
23
ENTREZ - PubMed NCBI
24
http://www3.ncbi.nlm.nih.gov/Entrez/index.html
25
Post-genomic terms (Oct. 2002)
Term            Google search   PubMed hits
Genome          2.1x10^6        76,566
Proteome        89,300          1,701
Transcriptome   9,960           229
Gene function   1.2x10^6        6.5x10^5
Metabolome      1,170           29
Glycome         138             6
From: Computational Proteomics, Mark B. Gerstein, Yale U.
26
http://cbms.st-and.ac.uk/academics/ryan/Teaching/SB&Bioinf/lecture1.htm
30
Similarity / Analogy
Examples: If it looks like an elephant and smells like an elephant, it’s an elephant. If it walks like a duck and quacks like a duck, it’s a duck.
http://cbms.st-and.ac.uk/academics/ryan/Teaching/molbiol/Bioinf_files/v3_document.htm
31
Similarity Search in Databanks
Find sequences similar to a working draft. As databanks grow, true homologies get harder to detect and data quality is reduced.
Alignment tools: BLAST and FASTA (time-saving heuristic approximations).
Pairwise alignment:
>gb|BE588357.1|BE588357 194087 BARC 5BOV Bos taurus cDNA 5'.
Length = 369
Score = 272 bits (137), Expect = 4e-71
Identities = 258/297 (86%), Gaps = 1/297 (0%)
Strand = Plus / Plus

Query: 17  aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca 76
           |||||||||||||||| | ||| | ||| || ||| | ||||  ||||| |||||||||
Sbjct: 1   aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc 59

Query: 77  agcaagggcttgcaggacctgaagcaacaggtggaggggaccgcccaggaagccgtgtca 136
           |||||||||||||||||||||||| | || ||||||||| | ||||||||||| ||| ||
Sbjct: 60  agcaagggcttgcaggacctgaagaagcaagtggagggggcggcccaggaagcggtgaca 119

Query: 137 gcggccggagcggcagctcagcaagtggtggaccaggccacagaggcggggcagaaagcc 196
            |||||||| | || | ||||||||||||||| ||||||||||| || ||||||||||||
Sbjct: 120 tcggccggaacagcggttcagcaagtggtggatcaggccacagaagcagggcagaaagcc 179

Query: 197 atggaccagctggccaagaccacccaggaaaccatcgacaagactgctaaccaggcctct 256
           ||||||||| | |||||||| |||||||||||||||||| ||||||||||||||||||||
Sbjct: 180 atggaccaggttgccaagactacccaggaaaccatcgaccagactgctaaccaggcctct 239

Query: 257 gacaccttctctgggattgggaaaaaattcggcctcctgaaatgacagcagggagac 313
           || || ||||| ||  ||||||||||| | |||||||||||||||||| ||||||||
Sbjct: 240 gagactttctcgggttttgggaaaaaacttggcctcctgaaatgacagaagggagac 296
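The "Identities" and "Gaps" figures in the alignment header can be recomputed from the two aligned rows. The sketch below does this for the first 60 columns of the hit shown on the slide. The function names `alignment_stats` and `match_line` are hypothetical helpers, not part of BLAST or FASTA, and the code only summarizes an existing alignment. It does not search a databank or compute scores.

```python
def alignment_stats(query: str, subject: str, gap: str = "-") -> dict:
    """Summarize a gapped pairwise alignment given as two equal-length rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    columns = len(query)
    # A column is an identity when both rows carry the same non-gap character.
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != gap)
    gaps = sum(1 for q, s in zip(query, subject) if q == gap or s == gap)
    return {
        "columns": columns,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / columns,
    }


def match_line(query: str, subject: str, gap: str = "-") -> str:
    """Rebuild the '|' line printed between the Query and Sbjct rows."""
    return "".join("|" if q == s and q != gap else " "
                   for q, s in zip(query, subject))


# First 60 alignment columns of the Bos taurus hit shown on the slide.
QUERY_ROW = "aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca"
SBJCT_ROW = "aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc"

if __name__ == "__main__":
    print(QUERY_ROW)
    print(match_line(QUERY_ROW, SBJCT_ROW))
    print(SBJCT_ROW)
    print(alignment_stats(QUERY_ROW, SBJCT_ROW))
```

Summing the identities over all five blocks of the alignment gives the 258/297 reported in the header.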
32
Multiple Sequence Alignment Multiple alignment: find protein families and functional domains.
33
Structure - Function Relationships: sequence, structure, function
34
Protein Structure (domains)
35
Phylogeny
Evolution: a process in which small changes occur within species over time. These changes can be monitored today using molecular techniques.
The Tree of Life: a classical, basic science problem since Darwin’s 1859 “Origin of Species”.
36
Tree of Life: Searching Protein Sequence Databases - How far back can we see?
Origin of the universe?
Formation of the solar system
First self-replicating systems
Prokaryotes / eukaryotes
Plants / animals
Invertebrates / vertebrates
Mammalian radiation
37
The Human Genome Project (HGP)
Write down all of human DNA on a single CD (“completed” 2001).
Identify all genes, their location and function (far from completion).
38
Example for Gene Localization Bio-Tool (FISH).
39
FISH - Fluorescence In-Situ Hybridization
Fluorescently labeled probes hybridize to specific chromosomal locations.
Example application: low-resolution localization of a gene.
40
Sequencing Genes & Gene Assembly Automated sequencing
41
Gene Finding
Only 2-3% of the human genome encodes functional genes. Genes lie scattered along large non-coding DNA regions. Repeats, pseudo-genes, introns and vector contamination make them hard to recognize.
42
Gene Finding - cont.
Find special gene patterns:
Translation start and stop sites (open reading frames, ORFs).
Transcription factor binding sites, promoters.
Intron splice sites.
Etc.
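The first pattern on the list, translation start and stop sites, can be illustrated with a naive ORF scan. This is a minimal sketch under stated assumptions: it reads only the forward strand, treats ATG as the only start codon, and ignores introns, promoters and splice sites, all of which real gene finders must model. The function name `find_orfs` and its `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"taa", "tag", "tga"}


def find_orfs(dna: str, min_codons: int = 2) -> list:
    """Return (start, end, frame) for each ATG...stop run on the forward strand.

    `start` is the index of the ATG, `end` is one past the stop codon, and
    `min_codons` counts the codons from ATG up to, but excluding, the stop.
    """
    dna = dna.lower()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "atg":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    toy = "ccatgaaatagg"  # made-up sequence, for illustration only
    for start, end, frame in find_orfs(toy):
        print(frame, start, end, toy[start:end])
```

A scan like this finds many spurious ORFs in a long genomic sequence, which is why the other signals on the slide are needed as additional evidence.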
44
Micro Arrays (“DNA Chips”) New biotechnology breakthrough: measure RNA expression levels of thousands of genes (in one experiment).
45
The Idea Behind Micro Arrays
46
Clustering Analysis of Gene Expression Data DNA chips and personalized medicine (leading edge, future technologies).
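As an illustration of what clustering expression data involves, the sketch below groups genes whose expression profiles rise and fall together. It is one common choice (average-linkage agglomerative clustering with distance 1 - Pearson r), not necessarily the method behind this slide. The gene names and expression values are made up, and `cluster_genes` is a hypothetical helper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_genes(profiles: dict, k: int) -> list:
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    clusters = [[gene] for gene in profiles]

    def distance(c1, c2):
        # Average linkage: mean of 1 - r over all cross-cluster gene pairs.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(1 - pearson(profiles[a], profiles[b])
                   for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical expression levels measured under four conditions.
PROFILES = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.5, 2.0],
}

if __name__ == "__main__":
    print(cluster_genes(PROFILES, 2))
```

Correlation-based distance groups genes by the shape of their profile rather than by absolute level, so geneA and geneB cluster together even though their magnitudes differ.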
47
Pharmaco-genomics
Use DNA information to measure and predict the reaction to drugs.
Personalized medicine.
Faster clinical trials on selected populations.
Fewer drug side effects.
48
Protein and Other Arrays
Sequencing the human genome => finite problem.
Studying the proteome => endless possible variations, dynamic.
Protein array
Future fields of study:
Proteins + Genomics = Proteomics
Lipids + Genomics = Lipomics
Sugars + Genomics = Glycomics
49
Understanding Mechanisms of Disease (figure labels: EC number, compound)
50
Putting it all together: Bio-Informatics
SEQUENCES & LITERATURE
SEQUENCE ALIGNMENT, ORTHOLOG GENES (Taxonomy), CONSERVED DOMAINS, CODING REGIONS, 3-D STRUCTURE, GENE FAMILIES, MUTATIONS & POLYMORPHISM, GENOME MAPS, CELLULAR LOCATION, SIGNAL PEPTIDE
51
Putting it all together: Bio-Informatics
GENE EXPRESSION, GENE FUNCTION, DRUG & PERSONAL THERAPY
CODING REGIONS, SEQUENCE ALIGNMENT, ORTHOLOG GENES (Taxonomy), CONSERVED DOMAINS, GENE FAMILIES, MUTATIONS & POLYMORPHISM, GENOME MAPS, CELLULAR LOCATION, SIGNAL PEPTIDE, 3-D STRUCTURE