Bioinformatics Basics
Cyrus
Courtesy of LO Leung Yau’s original presentation



Outline
- Biological Background
  - Cell
  - Protein
  - DNA & RNA
  - Central Dogma
  - Gene Expression
- Bioinformatics
  - Sequence Analysis
  - Phylogenetic Trees
  - Data Mining

Biological Background – Cell
- Basic unit of organisms
  - Prokaryotic
  - Eukaryotic
- A bag of chemicals
- Metabolism controlled by various enzymes
- Correct working needs
  - Suitable amounts of various proteins

Biological Background – Protein
- Polymer of 20 types of amino acids
- Folds into a 3D structure
- Shape determines the function
- Many types
  - Transcription Factors
  - Enzymes
  - Structural Proteins
  - …

Biological Background – DNA & RNA
- DNA
  - Double stranded
  - Bases: Adenine, Cytosine, Guanine, Thymine
  - Base pairing: A-T, G-C
  - The parts that code for proteins are called genes
- RNA
  - Single stranded
  - Bases: Adenine, Cytosine, Guanine, Uracil
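The pairing rule on this slide (A-T, G-C) and the thymine/uracil difference between DNA and RNA can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the input is assumed to contain only the letters A, C, G and T.

```python
# Watson-Crick pairing from the slide: A pairs with T, G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the strand that pairs with `dna`, read in the opposite direction."""
    # Assumption: `dna` holds only A, C, G, T (any case); other letters raise KeyError.
    return "".join(DNA_PAIRS[base] for base in reversed(dna.upper()))


def transcribe(coding_strand: str) -> str:
    """Return the RNA version of a coding strand: uracil (U) replaces thymine (T)."""
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(transcribe("ATGC"))
```

The same pairing rule is what lets a microarray probe bind its matching RNA or DNA target.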

Biological Background – Genes
- Genes: protein-coding regions
- 3 nucleotides (a codon) code for one amino acid
- There are also start and stop codons
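As a rough sketch of the codon idea, the Python below reads a DNA string three letters at a time, starting at the first ATG (the usual start codon) and stopping at the first stop codon. It uses the standard genetic code written as a compact 64-letter string; the function name `translate` and the choice to work on the DNA coding strand directly (rather than on RNA) are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G
# for the first, second and third codon position. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate from the first ATG up to (not including) the first stop codon."""
    # Assumption: `dna` is a coding strand made only of A, C, G, T.
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon reached
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ccATGGCCTAAgg"))
```

Each letter of the result is the one-letter code of an amino acid, so three nucleotides in the input give one letter in the output.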

Biological Background – In a Nutshell
Abstractions
- Functional units: proteins
- Templates: RNAs
- Blueprints: DNAs
  - Carry not only the information (data), but also the control signals for what data is to be sent and how much
- Proteins (TFs) also help with this control

Biological Background – Sequences
Abstractions
- Sequences
    acatggccgatcaggctgtttttgtgtgcctgtttttctattttacgtaaatcaccctgaacatgtTTGCATCAacctact
    ggtgatgcacctttgatcaatacattttagacaaacgtggtttttgagtccaaagatcagggctgggttgacctgaatact
    ggatacagggcatataaaacaggggcaaggcacagactc
- Annotations
    FT intron <1..28
    FT /gene="CREB"
    FT /number=3
    FT /experiment="experimental evidence …
    FT recorded"
    FT exon
    FT /gene="CREB"
    FT /number=4
    FT /experiment="experimental evidence …
    FT recorded"
    FT intron 175..>189
    FT /gene="CREB"
    FT /number=4
- Visualizations

Biological Background – DNA → RNA → Protein
[Picture: gene]

Biological Background – DNA → RNA → Protein
- A Transcriptional Regulatory Network is the complex interaction between genes, transcription factors (TFs) and transcription factor binding sites (TFBSs).
[Diagram labels: Transcription Factors, Binding sites, Promoter regions, Genes, Other functions]

Complex Interactions between Genes, TFs and TFBSs


Gene Expression Microarray Data
- High throughput
- Measures RNA levels
- Relies on A-T, G-C pairing
- Can monitor the expression of many genes

Gene Expression Microarray Data
[Picture: axes are genes and time points/conditions; colors show expression (RNA) levels]

Bioinformatics – Sequence Analysis
- Alignments
  - A way of arranging DNA, RNA, or protein sequences to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences

Bioinformatics – Sequence Analysis
- Pair-wise alignments
  - Method: dynamic programming!
  - No penalty for the consecutive ‘-’s (gaps) before and after the sequence to be aligned
\\Pc91106\Old_FYP\Bioinformatics for FYPs\CSC3220 Lectures
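A minimal sketch of the dynamic-programming idea with free end gaps is given below. It assumes a simple scoring scheme (match +1, mismatch -1, gap -1) that is not taken from the slides, and the function name `semiglobal_score` is made up for this example. Gaps placed before and after the shorter `query` cost nothing; gaps inside the alignment are penalized.

```python
def semiglobal_score(query: str, target: str,
                     match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score for aligning all of `query` somewhere inside `target`.

    score[i][j] is the best score for aligning query[:i] against an
    alignment that ends at target[:j].
    """
    rows, cols = len(query) + 1, len(target) + 1
    # Row 0 stays at 0: skipping target letters before the query is free.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # query letters with nothing to pair with

    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == target[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap    # query letter aligned to a gap
            left = score[i][j - 1] + gap  # target letter aligned to a gap
            score[i][j] = max(diagonal, up, left)

    # Taking the best cell of the last row makes trailing target letters free.
    return max(score[-1])


if __name__ == "__main__":
    print(semiglobal_score("CAGCT", "ATCAGCTGG"))
```

Filling the table takes time proportional to the product of the two sequence lengths, which is why the method is practical for pairs of sequences.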

Bioinformatics – Sequence Analysis
- Multiple (global) sequence alignment
  - Also dynamic programming (but it can’t scale up: the table grows exponentially with the number of sequences!)

Bioinformatics – Sequence Analysis
- Multiple local sequence alignment
  - i.e. motif (pattern) discovery
    >seq1
    acatggccgatcagctggtttttgtgtgcctgtttctgaatc
    >seq2
    ttctattttacgtaaatcagcttgaacatgtacctactggtg
    >seq3
    atgcacctttgatcaataccagctagacaaacgtgtgttg
    >seq4
    agtccaaagatcagggctggctgaatactggatcagct
    >seq5
    cagctacagggcatataaaggggcaaggcacagactc
- Such overrepresented patterns are often important components (e.g. TFBSs if the sequences are promoters of similar genes).
- TFBSs are the controlling keyholes in gene regulation!
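One very simplified way to look for an overrepresented pattern is to list the fixed-length substrings (k-mers) that occur in every sequence. The sketch below does this for the five example sequences from the slide. It only finds exact matches, whereas real motif discovery tools also allow mismatches and score patterns statistically; the function names and the choice k = 5 are assumptions for illustration.

```python
def kmers(sequence: str, k: int) -> set:
    """All distinct substrings of length k in `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmers(sequences, k: int) -> set:
    """K-mers that occur (exactly) in every one of the sequences."""
    kmer_sets = [kmers(seq.lower(), k) for seq in sequences]
    if not kmer_sets:
        return set()
    return set.intersection(*kmer_sets)


# The five example sequences from the slide.
SEQUENCES = [
    "acatggccgatcagctggtttttgtgtgcctgtttctgaatc",
    "ttctattttacgtaaatcagcttgaacatgtacctactggtg",
    "atgcacctttgatcaataccagctagacaaacgtgtgttg",
    "agtccaaagatcagggctggctgaatactggatcagct",
    "cagctacagggcatataaaggggcaaggcacagactc",
]

if __name__ == "__main__":
    print(sorted(shared_kmers(SEQUENCES, 5)))
```

On these sequences the 5-mer "cagct" is among the patterns present in all five.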

DNA motifs
- Similar DNA fragments across individuals and/or species
  - TFBS motifs: DNA fragments similar to “TATAA” are common because they are needed for genes to function
  - Trying a large set of candidates in biological experiments is expensive and time-consuming
[Diagram: a TF (Transcription Factor) binds the TFBS (controlling, e.g. TATAA) on the DNA; the Gene (functioning) goes through Transcription to RNA and Translation to Protein]
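"Similar to TATAA" can be made concrete by counting mismatches between the motif and each window of a sequence (the Hamming distance). The sketch below reports the windows that differ from the motif in at most a chosen number of positions. The function names and the default of one allowed mismatch are assumptions for this example, not values from the slides.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approximate_hits(sequence: str, motif: str = "TATAA", max_mismatches: int = 1):
    """Return (position, fragment) pairs whose fragment is close to `motif`."""
    sequence, motif = sequence.upper(), motif.upper()
    k = len(motif)
    hits = []
    for i in range(len(sequence) - k + 1):
        fragment = sequence[i:i + k]
        if hamming(fragment, motif) <= max_mismatches:
            hits.append((i, fragment))
    return hits


if __name__ == "__main__":
    print(approximate_hits("ccTATTAcc"))
```

Screening candidates this way on a computer is cheap, which is the point of the slide: only the most promising fragments need to go to wet-lab experiments.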

Motif discovery
- TFBS motif discovery: find a motif (e.g. CGATTGA) that maximizes a score f over sequences with similar controlled functions, e.g. cancer gene activities
- SNP (single nucleotide polymorphism) motif discovery: given DNA from different people (normal vs. disease), find the positions that maximize a score f distinguishing normal from disease

Bioinformatics – Data mining
- Classification
  - To predict!
  - Pre-processing: tidy up your materials!
  - Feature selection: the key points to go over
  - Classifier: the thinking style/manner of how to combine the key points and get an answer
  - Training: practising your thinking manner with the answers known
  - Validation: a mock quiz to evaluate what you’ve learnt from the training
  - Testing: your examination!
- Underfitting & Overfitting
\\Pc91106\Old_FYP\Bioinformatics for FYPs\CSC5180 Data Mining Notes\c3class1.pdf
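The training / validation / testing steps rely on keeping three separate groups of samples. A minimal sketch of such a split is shown below; the 60/20/20 proportions, the fixed seed and the function name `split_dataset` are assumptions chosen for illustration, not recommendations from the slides.

```python
import random


def split_dataset(samples, train: float = 0.6, validation: float = 0.2, seed: int = 0):
    """Shuffle the samples and cut them into training, validation and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    training_set = shuffled[:n_train]
    validation_set = shuffled[n_train:n_train + n_validation]
    test_set = shuffled[n_train + n_validation:]  # whatever is left over
    return training_set, validation_set, test_set


if __name__ == "__main__":
    training_set, validation_set, test_set = split_dataset(range(10))
    print(len(training_set), len(validation_set), len(test_set))
```

The test part should be touched only once, at the very end, just like a real examination.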

TRANSFAC Project
- TF: Transcription Factors, important regulators
- TFBS: Transcription Factor Binding Sites, major regulatory elements
- TRANSFAC: the most representative DB for TFs and TFBSs
1. Modeling: statistical models, representations, Markov chains; Discovery: stochastic searching, indexing (suffix trees)
2. Relationship: TF-TFBS, TFBS-Gene… (understanding, prediction); Mining: text mining, approximate matching
3. Annotations: accurate wet-lab candidates (reduced labor and costs); Computation: large-scale data processing, parallel computing
Representative Publications
[1] Gang Li, Tak-Ming Chan, Kwong-Sak Leung and Kin-Hong Lee, A Cluster Refinement Algorithm for Motif Discovery, IEEE/ACM Transactions on Computational Biology and Bioinformatics (accepted)
[2] Tak-Ming Chan, Kwong-Sak Leung, Kin-Hong Lee, TFBS identification based on genetic algorithm with combined representations and adaptive post-processing. Bioinformatics, 2008, 24(3), pp

Bioinformatics – Data mining
- Evaluation (scores!)
  - Confusion Matrix
  - Binary Classification Performance Evaluation Metrics
    - Accuracy
    - Sensitivity/Recall/TP Rate
    - Specificity/TN Rate
    - Precision/PPV
    - …
\\Pc91106\Old_FYP\Bioinformatics for FYPs\CSC5180 Data Mining Notes\c3class3.pdf
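The metrics listed above all come from the four cells of a binary confusion matrix. The sketch below counts those cells and applies the standard formulas; the function names, and the convention that labels are coded as 1 (positive) and 0 (negative), are assumptions for this example.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for labels coded as 1 (positive) and 0 (negative)."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(y_true, y_pred):
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ratio(numerator: int, denominator: int) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall, TP rate
        "specificity": ratio(tn, tn + fp),  # TN rate
        "precision": ratio(tp, tp + fp),    # PPV
    }


if __name__ == "__main__":
    counts = confusion_counts([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
    print(counts, binary_metrics(*counts))
```

Reporting several of these together matters when classes are unbalanced, since accuracy alone can look good for a classifier that never predicts the rare class.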

Bioinformatics – Data mining
- Evaluation
  - ROC (Receiver Operating Characteristics)
  - Trade-off between positive hits (TP) and false alarms (FP)
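An ROC curve can be traced by lowering a decision threshold over the classifier's scores and recording the false-positive rate and true-positive rate at each step. The sketch below does this and estimates the area under the curve with the trapezoid rule. The function names are made up for this example, labels are assumed to be coded 1/0, and both classes are assumed to be present.

```python
def roc_points(y_true, scores):
    """Return (FP rate, TP rate) points, starting at (0, 0) and ending at (1, 1)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives  # assumption: both counts are non-zero
    points = [(0.0, 0.0)]
    # Predict "positive" for every sample whose score is at least the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points) -> float:
    """Trapezoid-rule area under a curve given as points sorted by x."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    curve = roc_points(labels, scores)
    print(curve, area_under_curve(curve))
```

An area of 1.0 means every positive is scored above every negative, while 0.5 is what random scoring would give on average.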

Not The End
- Your corresponding tutor will have more project-specific stuff to tell you
Thanks
Q & A