Data-intensive Computing: Case Study Area 1: Bioinformatics

Data-intensive Computing: Case Study Area 1: Bioinformatics B. Ramamurthy 6/3/2018

Human Genetics: Genomics, Human Genome Project, Proteomics, Diseasome, Tree of Life project, Phylogenetics.

Human cell. Base pairs of DNA: CG and AT (C: cytosine, G: guanine, A: adenine, T: thymine). The human genome contains approximately 3 billion base pairs, and nearly every human cell carries it (in two copies, one from each parent). The DNA of a single cell contains so much information that, if it were represented in printed words, simply listing the first letter of each base would require over 1.5 million pages of text! Laid end to end, the DNA of one cell measures about 2-3 meters. In the cell nucleus, DNA is organized into chromosomes, each a single very large molecule coiled into a double helix. Each strand of the DNA molecule is a string of A, C, G and T; for example, AAAGTTCTTAATTA is matched on the other strand by the complementary bases TTTCAAGAATTAAT. These strings of letters contain all the code needed for human functions. Ref. text: Bioinformatics: Databases, Tools and Algorithms, by O. Bosu and S.K. Thukral.
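The base-pairing rule on this slide (A with T, C with G) is essentially a lookup table. A minimal Python sketch that reproduces the example above; the name `complement` is illustrative and not taken from the slides:

```python
# Watson-Crick pairing: A pairs with T, C pairs with G
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (same order, not reversed)."""
    return "".join(PAIR[base] for base in strand.upper())

print(complement("AAAGTTCTTAATTA"))  # TTTCAAGAATTAAT
```

The slide pairs bases position by position. Because the two strands run in opposite directions (they are antiparallel), bioinformatics tools usually report the reverse complement instead, which here is `complement(strand)[::-1]`.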

More details. Sequences of base pairs are grouped into functional units called genes. When a gene needs to be activated, the DNA molecule in the cell nucleus uncoils and unwinds just enough to expose that gene. From the exposed DNA section an RNA copy is formed: messenger RNA (mRNA) carries with it the "print" of the open DNA section. A key difference between RNA and DNA is that RNA does not contain thymine (T); it has uracil (U) instead. RNA is short-lived. Once the mRNA is formed, the open section of the DNA closes off.
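The copying step described above (transcription) can be sketched as a base-by-base mapping in which U takes the place of T. This is a simplified sketch under stated assumptions: it ignores strand direction, promoters and RNA splicing, and both function names are hypothetical:

```python
# Each template-strand DNA base is paired with its RNA complement.
# RNA uses U (uracil) where DNA would use T (thymine).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe_template(template: str) -> str:
    """Build mRNA by pairing each base of the template strand (orientation ignored)."""
    return "".join(DNA_TO_RNA[base] for base in template.upper())

def transcribe_coding(coding: str) -> str:
    """Shortcut: the mRNA matches the coding (non-template) strand with T replaced by U."""
    return coding.upper().replace("T", "U")

print(transcribe_template("TTTCAAGAATTAAT"))  # AAAGUUCUUAAUUA
print(transcribe_coding("AAAGTTCTTAATTA"))    # AAAGUUCUUAAUUA
```

Both routes give the same mRNA, which is why the "print" of the open DNA section can be read off either strand.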

Protein formation. The mRNA travels to the cytoplasm, where it meets a ribosome (built from ribosomal RNA, or rRNA, and proteins). The ribosome reads the mRNA three bases at a time; each triplet is a codon, and for each codon the ribosome adds the corresponding amino acid to a growing chain. Twenty standard amino acids make up human proteins. Example: the codons GCU, GCC, GCA and GCG all correspond to alanine. In effect, the ribosome is a process-control computer that takes codons as input and produces amino acids as output. The amino acids are linked into polypeptide chains, which fold into proteins; proteins form basic structures such as skin and hair. Even though the brain controls major human functions, at the cell level it is the DNA that has command and control. DNA is fixed code for a given human (WORM: write once, read many).
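The "process-control computer" analogy maps directly onto a loop that reads codons and looks each one up in a table. A minimal sketch, assuming a small subset of the standard genetic code (a real translator needs all 64 codons; this one raises `KeyError` on any codon not listed) and starting at the first base rather than searching for a start codon:

```python
# A small subset of the standard genetic code, using mRNA codons and one-letter amino acid codes.
CODON_TABLE = {
    "AUG": "M",                                      # methionine (usual start codon)
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "UUU": "F", "UUC": "F",                          # phenylalanine
    "AAA": "K", "AAG": "K",                          # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",              # stop codons
}

def translate(mrna: str) -> str:
    """Read codons left to right and emit amino acids until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGGCUUUUAAAUAA"))  # MAFK
```

Note that the codon table is many-to-one: four different codons all yield alanine, as the slide points out.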

Life's processes. DNA is the "program" that controls the functions, operations and structure of a cell and, in turn, those of our life processes. Life processes depend on the program in the DNA and on the enormous number of ribosomes that execute it. In this context, life appears as an immense distributed system.

Bioinformatics. Can we study, understand and analyze this immensely complex system, its structure and its programs? University of Arizona's Tree of Life project (ToL): http://tolweb.org. Human Genome Project (NIH and DOE): identifying the approximately 30,000 genes in human DNA and determining the sequence of the three billion bases that make up human DNA. Of these 30,000 genes, we do not know the functions of more than 50%. About 99.9% of the nucleotide sequence is the same for all of us; the remaining 0.1% accounts for individual differences such as race, skin color and disposition to diseases. High-throughput sequencing is generating ultra-scale biological data. How do we analyze this data? That is a data-intensive problem.

Existing solutions? Traditional databases to store, retrieve, analyze and/or make predictions from huge volumes of biological data; software tools for implementing algorithms and developing applications for in-silico experiments; visualization tools, user interfaces and web access for searching through the data; machine learning and data mining methodologies.
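As a small illustration of the "traditional database" approach, the sketch below stores sequences in an in-memory SQLite table and retrieves those containing a given motif. The schema and the sample rows are hypothetical, chosen only for illustration:

```python
import sqlite3

# In-memory database with a hypothetical schema: one row per sequence.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (id TEXT PRIMARY KEY, organism TEXT, seq TEXT)")
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    [
        ("seq1", "example", "AAAGTTCTTAATTA"),
        ("seq2", "example", "TTTCAAGAATTAAT"),
        ("seq3", "example", "GCTGCCGCAGCG"),
    ],
)

def find_motif(motif: str) -> list[str]:
    """Return ids of stored sequences that contain the motif (SQLite instr() > 0)."""
    rows = conn.execute(
        "SELECT id FROM sequences WHERE instr(seq, ?) > 0 ORDER BY id", (motif,)
    )
    return [row[0] for row in rows]

print(find_motif("GAATT"))  # ['seq2']
```

This query scans every row, which is fine for a toy table but not for genome-scale collections; that gap is part of why specialized sequence databases and search tools exist.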

Databases: taxonomy DB, genomics DB, sequence DB, structure DB, proteomic database (e.g., PDB, the Protein Data Bank), micro-array DB, expression DB, enzyme DB, disease DB, molecular biology DB.

Tools: data analysis tools (MySQL, Perl); prediction tools (clustering); modeling tools (surface prediction, predicting areas of interest, protein-protein interaction); alignment tools. Many more: http://galaxyproject.org/
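Alignment tools are built on dynamic programming. A minimal sketch of global alignment scoring in the style of Needleman-Wunsch, assuming a simple linear scheme (match +1, mismatch -1, gap -1) chosen only for illustration; practical tools typically use substitution matrices and affine gap penalties, and also trace back the alignment itself rather than returning just the score:

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b via dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # gap inserted in b
            left = score[i][j - 1] + gap  # gap inserted in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]

print(needleman_wunsch_score("ACGT", "AGT"))  # 2 (ACGT vs A-GT: three matches, one gap)
```

The table has (len(a) + 1) x (len(b) + 1) cells, so time and memory grow with the product of the sequence lengths; this quadratic cost is one reason aligning against large databases becomes a data-intensive task.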

How can we help? How can we leverage our knowledge of large-scale data management to address bioinformatics problems? Data-intensive computing (DC) methods. There is a large number of tools and data sets: how do we standardize the efforts so that they are complementary rather than repetitive? Cloud computing.
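One way DC methods apply here is the map/reduce pattern: count k-mers (length-k substrings) in each sequencing read independently, then merge the partial counts. The sketch below runs in a single process and the reads are made up; in a real deployment a framework such as Hadoop or Spark (an assumption, the slides do not name one) would run the map calls on separate nodes:

```python
from collections import Counter
from functools import reduce

def map_kmers(read: str, k: int) -> Counter:
    """Map step: count every length-k substring in a single read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))

def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge two partial counts by summing per k-mer."""
    return a + b

reads = ["AAAGTTCTTAATTA", "TTTCAAGAATTAAT", "GCTGCCGCAGCG"]
totals = reduce(reduce_counts, (map_kmers(r, 3) for r in reads), Counter())
print(totals.most_common(3))
```

Because the reduce step is associative and commutative, partial counts can be merged in any order, which is what lets the work be split across many machines.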

Text Mining vs Genetic Sequence Mining (Dot plot)
[Figure: dot plot examples; axis letters C O R E L A T I N S H P (text) and A C T G (DNA)]
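A dot plot places one sequence along each axis and marks every cell where the two characters match; diagonal runs of marks reveal shared substrings. The same idea applies to text and to DNA. A minimal text-based sketch (the `*`/`.` rendering and the name `dot_plot` are arbitrary choices for illustration):

```python
def dot_plot(a: str, b: str) -> list[str]:
    """Return the rows of a dot plot: '*' where a[i] == b[j], '.' otherwise."""
    return ["".join("*" if x == y else "." for y in b) for x in a]

# Comparing a sequence with itself: the main diagonal is always marked,
# and the repeated "ACT" shows up as shorter off-diagonal runs.
for row in dot_plot("ACTGACT", "ACTGACT"):
    print(row)
```

With only four letters, DNA produces many random single matches, so practical dot-plot tools usually apply a sliding window and a match threshold to suppress that noise.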