Sequencing Data Analysis


Sequencing Data Analysis Debashis Sahoo Department of Computer Science CSE291 – H00 – Lecture 17

Sanger dideoxy sequencing: basic method
[Figure: single-stranded DNA template (3’ to 5’) with primer (5’ to 3’). Step a) Anneal the primer.]

[Figure: an automated sequencer and its output]

Sequence output (raw data and computer calls)
Computer calls:
GNNTNNTGTGNCGGATACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGCACCACCAC
CACCACCACCCCATGGGTATGAATAAGCAAAAGGTTTGTCCTGCTTGTGAATCTGCGGAACTTATTTATGATCCAGAAAG
GGGGGAAATAGTCTGTGCCAAGTGCGGTTATGTAATAGAAGAGAACATAATTGATATGGGTCCTAAGTGGCGTGCTTTTG
ATGCTTCTCAAAGGGAACGCAGGTCTAGAACTGGTGCACCAGAAAGTATTCTTCTTCATGACAAGGGGCTTTCAACTGCA
ATTGGAATTGACAGATCGCTTTCCGGATTAATGAGAGAGAAGATGTACCGTTTGAGGAAGTGGCANTCCANATTANGAGT
TAGTGATGCAGCANANAGGAACCTAGCTTTTGCCCTAAGTGAGTTGGATAGAATTNCTGCTCAGTTAAAACTTCCNNGAC
ATGTAGAGGAAGAAGCTGCAANGCTGNACANAGANGCAGNGNGANAGGGACTTATTNGANGCAGATCTATTGAGAGCGTT
ATGGCGGCANGTGTTTACCCTGCTTGTAGGTTATTAAAAGNTCCCGGGACTCTGGATGAGATTGCTGATATTGCTAGAGC
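The N characters in the computer calls mark positions where the base caller could not assign a base. A minimal sketch of how such a read might be summarized, assuming the read is available as a plain string; the function name and the returned fields are hypothetical, chosen only for illustration:

```python
import re

def summarize_base_calls(read):
    """Return the N count, N fraction, and longest N-free stretch of a read."""
    read = read.upper()
    n_count = read.count("N")
    # Split on runs of N and keep the longest unambiguous segment.
    segments = re.split(r"N+", read)
    longest = max(segments, key=len)
    return {
        "length": len(read),
        "n_count": n_count,
        "n_fraction": n_count / len(read) if read else 0.0,
        "longest_clean": longest,
    }

# The first 24 base calls of the read on this slide.
print(summarize_base_calls("GNNTNNTGTGNCGGATACAATTCC"))
```

Ambiguous calls tend to cluster at the start and end of a Sanger read, so the longest N-free stretch is one rough way to pick the usable region.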

Amplifying DNA in Vitro: The Polymerase Chain Reaction (PCR)
The polymerase chain reaction (PCR) can produce many copies of a specific target segment of DNA.
A three-step cycle of heating, cooling, and replication brings about a chain reaction that produces an exponentially growing population of identical DNA molecules.
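The exponential growth can be made concrete with a little arithmetic. The sketch below assumes an idealized reaction in which each cycle multiplies the copy number by (1 + efficiency); the `efficiency` parameter and the function name are illustrative assumptions, not part of the lecture:

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Expected copy number after `cycles` rounds of PCR.

    efficiency=1.0 is perfect doubling per cycle; real reactions are
    usually somewhat less efficient and eventually plateau.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles

# Copies produced from a single template molecule under perfect doubling.
for n in (1, 10, 20, 30):
    print(n, pcr_copies(1, n))
```

Under perfect doubling, 30 cycles turn one template into 2^30 copies, roughly a billion.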

The three main steps of PCR
Step 1: Denature DNA. At 95 °C, the DNA is denatured (i.e., the two strands are separated).
Step 2: Primers anneal. At 40 °C to 65 °C, the primers anneal (bind) to their complementary sequences on the single strands of DNA.
Step 3: DNA polymerase extends the DNA chain. At 72 °C, DNA polymerase extends the DNA chain by adding nucleotides to the 3’ ends of the primers.
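The annealing temperature falls in a range because it depends on the primer sequence. One common rule of thumb for short primers is the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) in °C. The sketch below applies it; the rule is only a rough approximation, it is not taken from the slides, and the example primer is hypothetical:

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) of a short primer via the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("primer must contain only A, C, G, T")
    # Each A/T pair contributes about 2 deg C, each G/C pair about 4 deg C.
    return 2 * at + 4 * gc

# Hypothetical 18-mer used only to show the call.
print(wallace_tm("ATGCACCACCACCACCAC"))
```

In practice the annealing step is usually run a few degrees below the estimated melting temperature and then tuned empirically.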

PCR: Polymerase Chain Reaction
Step 1: denaturation
Step 2: annealing
Step 3: extension

[Figure: PCR tubes and a C1000 Thermal Cycler]

Step 1: Denaturation of DNA
This occurs at 95 °C; the heat separates the strands, mimicking the function of helicase in the cell.

Step 2: Annealing (Primer Binding)
The forward and reverse primers bind to their complementary sequences on the target DNA. The primers are chosen so that one is complementary to one strand at one end of the target sequence and the other is complementary to the opposite strand at the other end of the target sequence.

Step 3: Extension (Primer Extension)
DNA polymerase catalyzes the extension of the strand in the 5’ to 3’ direction, starting at the primers and attaching the appropriate complementary nucleotide (A pairs with T, C pairs with G).

The next cycle will begin by denaturing the new DNA strands formed in the previous cycle

The Size of the DNA Fragment Produced in PCR Depends on the Primers
PCR amplifies the DNA section between the two primers. If the DNA sequence is known, primers can be designed to amplify any piece of an organism’s DNA.
[Figure: forward primer and reverse primer flanking the fragment that is amplified]
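How the primers determine the product can be sketched as an "in silico PCR": find the forward primer on the top strand, find where the reverse primer anneals (its reverse complement appears on the top strand), and return everything in between. This is a simplified illustration that assumes exact primer matches and a single binding site each; the function names and the toy template are hypothetical:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def amplicon(template, forward, reverse):
    """Return the amplified region of `template` (top strand, 5'->3'), or None.

    Both primers are given 5'->3'. The reverse primer binds the top strand,
    so its reverse complement is what appears in `template`.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    rc = reverse_complement(reverse)
    end = template.find(rc, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(rc)]

# Toy template: flanking bases, forward site, insert, reverse site, flanking bases.
template = "TT" + "ACGTTG" + "CAAAAAAAAAAG" + "GCTTAC" + "GG"
product = amplicon(template, "ACGTTG", "GTAAGC")
print(product, len(product))
```

The product length is fixed by the positions of the two primer sites, which is why the fragment size depends on the primers.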

FASTA
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
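A FASTA record is a header line starting with ">" followed by one or more sequence lines. A minimal sketch of a reader, assuming the whole file fits in memory as a string; the function name is hypothetical, and for real work an established parser such as Biopython's would be the safer choice:

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header (without '>') to sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines of one record are concatenated.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}

example = ">SEQUENCE_1\nMTEITAAMVK\nELRESTGAGM\n>SEQUENCE_2\nSATVSEINSE\n"
for name, seq in parse_fasta(example).items():
    print(name, len(seq), seq)
```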

FASTQ
@HWUSI-EAS466_0001:1:1:6:1464#0/1
CAAATGTCTATTTTNTCCGTCAATCTGTGAGTGNCA
+HWUSI-EAS466_0001:1:1:6:1464#0/1
abaa`[]aaaaaaaB_aa`_aaaaa^W]\_VV^Ba`
@HWUSI-EAS466_0001:1:1:6:579#0/1
ACCTGGTCCTCTTTNAAGACGCGATGTGTCACGNTG
+HWUSI-EAS466_0001:1:1:6:579#0/1
`aa_YWY`abaa`aBa_T`_O`Y__VYQ[aaBBBBB
@HWUSI-EAS466_0001:1:1:6:1050#0/1
CGAATATCGTGACCNACCGCGGTACAATTGCATNCT
+HWUSI-EAS466_0001:1:1:6:1050#0/1
a``aaaa`Y\T`aaBaa_^_``\`a```[O]__Ba`
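Each FASTQ record has four lines: an "@" header, the bases, a "+" separator, and one quality character per base. A minimal sketch of a reader that also decodes the quality characters to Phred scores. The quality characters in the example look like the older Illumina Phred+64 encoding; that is an inference from the characters, not something the slide states, so the offset is left as a parameter (current data typically uses 33). The function name and the toy record are hypothetical:

```python
def parse_fastq(text, offset=64):
    """Parse FASTQ text into a list of (name, sequence, phred_scores) tuples."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ records must have exactly four lines each")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Phred score = ASCII code of the quality character minus the offset.
        scores = [ord(c) - offset for c in qual]
        records.append((header[1:], seq, scores))
    return records

# Toy record (not from the slide): 'h' is Q40 under Phred+64, 'B' is Q2.
for name, seq, scores in parse_fastq("@read1\nACGTN\n+read1\nhhhhB\n"):
    print(name, seq, scores)
```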

The different types of BLAST
BLAST = Basic Local Alignment Search Tool
“The most popular data mining tool ever”
BLASTN: DNA sequence vs. DNA sequence database
BLASTP: protein sequence vs. protein sequence database
BLASTX: DNA sequence translated in 6 reading frames vs. protein sequence database
TBLASTX: DNA sequence translated in 6 reading frames vs. DNA sequence database translated in 6 frames
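"Translated in 6 reading frames" means three frames on the given strand plus three on its reverse complement. The sketch below illustrates that idea with the standard genetic code; it is not BLAST's own implementation, and the function names are hypothetical:

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq):
    """Translate complete codons; codons with non-ACGT letters become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_translation(dna):
    """Return a dict of the six translations keyed '+1'..'+3' and '-1'..'-3'."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames

for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

A "*" in the output marks a stop codon.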

Steps to use BLAST
#1) Paste the sequence into the query box
#2) Choose the search set (either the nucleotide collection or the Protein Data Bank)
#3) Select the program to use
#4) Push the BLAST button

An example of aligning text strings
Raw data: T C A T G and C A T T G

2 matches, 0 gaps
T C A T G
      | |
C A T T G

3 matches (2 end gaps)
T C A T G .
  | | |
. C A T T G

4 matches, 1 insertion
T C A - T G
  | |   | |
. C A T T G

T C A T - G
  | | |   |
. C A T T G
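The match counts in this example can be checked mechanically. The sketch below counts identical columns in an alignment that has already been written out with gap characters, and separately slides one string along the other to find the best ungapped placement. The function names are hypothetical, and this is a counting aid rather than a full alignment algorithm:

```python
def count_matches(top, bottom, gap_chars="-."):
    """Count columns where both rows carry the same non-gap character."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for x, y in zip(top, bottom) if x == y and x not in gap_chars)

def best_ungapped_offset(a, b):
    """Slide b along a without gaps; return (offset, matches) of the best shift.

    offset is the index in `a` that lines up with b[0] (it may be negative).
    """
    best = (0, -1)
    for offset in range(-(len(b) - 1), len(a)):
        matches = sum(1 for i, ch in enumerate(b)
                      if 0 <= i + offset < len(a) and a[i + offset] == ch)
        if matches > best[1]:
            best = (offset, matches)
    return best

print(count_matches("TCATG", "CATTG"))         # no shift, no gaps
print(count_matches("TCATG.", ".CATTG"))       # shifted by one (end gaps)
print(count_matches("TCA-TG", ".CATTG"))       # one internal gap
print(best_ungapped_offset("TCATG", "CATTG"))  # best shift without internal gaps
```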

Terminology of sequence comparison
Sequence identity: exactly the same amino acid or nucleotide in the same position.
Sequence similarity: substitutions with similar chemical properties.
Sequence homology: a general term that indicates evolutionary relatedness among sequences; in practice, percentage identity is usually used as the measure when assessing homology.
Pairwise alignment: used to find the best-matching local (piecewise) or global alignment of two sequences. Pairwise alignment compares only two sequences at a time.
Multiple sequence alignment: tries to align all of the sequences in a given query set.
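The difference between global and local pairwise alignment can be illustrated with the two classic dynamic-programming recurrences (Needleman-Wunsch for global, Smith-Waterman for local). The sketch below computes only the optimal score, not the alignment itself, and the scoring scheme (match +1, mismatch -1, gap -1) is an assumption chosen for illustration:

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: best score aligning all of a against all of b."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman: best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero, so an alignment can restart anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

print(global_alignment_score("TCATG", "CATTG"))
print(local_alignment_score("TCATG", "CATTG"))
```

The global score pays for every unaligned end and internal gap, while the local score reports only the best-matching region, which is the behavior BLAST's "local" in its name refers to.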

Where are the coding regions? TCAGCGAAGATGAGATAGTTTTTAAAGGTGGGATTTCCCCACCTTTAAAAAGCGAGAAGTCCCGGTTTTAAAGAGGAGTAAAATCCTCTTTTTCTAGCCCACTCAGGTGGTTTTTTTGGTTTTCGCTCCTTGCCGCATCTTCTGTGCCTTTGATGGCGGCTGGTTGGGGTGAAAGGCTGCATATTCCAGAATTTCAGACAGTAGATTGTTTTTGAAATCTTCCGTTTTATCGTTGACGAACTTAACCATCCTGTTGAAATCATCTTCCTTTGATACACCTTCAGGAAATGCCTTAGGAACTGATGTTTGGCTATCCAAGGCATCTTGCAATATCTGCACGATCTCCGAATTCATTGATCGCCCATTGGCCTTTGCTCTGGCGGCAACTGCGTCACGCATACCGTCAGGCATCCTAACTGTAAATCTCTCAATGAAAGCTGGATCTTCTTTTTCAGTCATCATCTTAAACCATAAAAATTTATACAAAACACACTAGCATCATATTGACATTACCCACAATGACATCATAATGGTGTCAGGCATCAAAATGATGTCATCATGACAAGGGGAAAGTAAATGCAAGATGTTCTCTATACAGGTCGTAAGAACGACAGCTTTCAGCTTCGTCTGCCTGAGCGAATGAAAGAAGAGATCCGTCGCATGGCAGAGATGGACGGCATTTCGATTAATTCTGCAATCGTGCAGCGCCTTGCTAAAAGCTTGCGTGAGGAAAGAGTTAATGGGCAGTAAAAACAGCGAAGCCCGGAAGTGTGGGGACACTAACCGGGCTTCTAATGTCAGTTACCTAGCGGGAAACCAACAATGACCAGTATAGCAATCTTTGAAGCAGTAAACACTATCTCTCTTCCATTCCACGGACAGAAGATCATAACTGCGATGGTGGCGGGTGTGGCGTATGTGGCAATGAAGCCCATCGTGGAAAACATCGGTTTAGACTGGAAGAGCCAGTATGCCAAGCTCGTTAGTCAGCGTGAAAAGTTCGGGTGTGGTGATATCACCATACCTACCAAAGGTGGTGTTCAGCAGATGCTTTGCATCCCTTTGAAGAAACTGAATGGATGGCTCTTCAGCATTAACCCAGCAAAAGTACGTGATGCAGTTCGTGAAGGTTTAATTCGCTATCAAGAAGAGTGTTTTACAGCTTTGCACGATTACTGGAGCAAAGGTGTTGCAACGAATCCCCGGACACCGAAGAAACAGGAAGACAAAAAGTCACGCTATCACGTTCGCGTTATTGTCTATGACAACCTGTTTGGTGGATGCGTTGAATTTCAGGGGCGTGCGGATACGTTTCGGGGGATTGCATCGGGTGTAGCAACCGATATGGGATTTAAGCCAACAGGATTTATCGAGCAGCCTTACGCTGTTGAAAAAATGAGGAAGGTCTACTGATTGGCGTATTGGAAGGCGCAAAAAGAAAAGCCAGCAGATGGGCTGCTGGCATTCATTGGGTATATGAACTTTCGGAGAACATATGAAGTCAATTATCAAGCATTTTGAGTTTAAGTCAAGTGAAGGGCATGTAGTGAGCCTTGAGGCTGCAAGCTTTAAAGGCAAGCCAGTTTTTTTAGCAATTGATTTGGCTAAGGCTCTCGGGTACTCAAATCCGTCA
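A crude first answer to this question is to scan for open reading frames (ORFs): stretches that begin with ATG and run in frame to a stop codon. The sketch below does this in all six frames. It is an illustration only: the minimum length is an arbitrary assumption, the function name is hypothetical, and gene finders for eukaryotic DNA also have to model introns, splice sites, and other signals that a plain ORF scan ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def find_orfs(dna, min_codons=30):
    """Return (strand, start, end, n_codons) for ATG..stop ORFs in six frames.

    start/end are 0-based, end-exclusive, and include the stop codon. For the
    '-' strand they are coordinates on the reverse complement, not the input.
    n_codons excludes the stop codon.
    """
    dna = dna.upper()
    results = []
    for strand, seq in (("+", dna), ("-", dna.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        results.append((strand, start, i + 3, n_codons))
                    start = None
    return results

# Toy sequence: ATG followed by three GCC codons and a TAA stop.
print(find_orfs("CC" + "ATG" + "GCC" * 3 + "TAA" + "GG", min_codons=2))
```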

Exon prediction in eukaryotic DNA using GENSCAN
The net result is a predicted protein sequence. GENSCAN looks for start and stop codons, promoters, splice sites, and poly-A signals, and provides statistics for coding potential.

NGS sequencing pipeline http://www.slideshare.net/mkim8/a-comparison-of-ngs-platforms

Sequencing steps
Library preparation
Library amplification
Parallel sequencing
Voelkerding KV et al., J Mol Diagn (2010) 12, 539-51.

NGS Applications
Whole genome sequencing
Whole exome sequencing
RNA sequencing
ChIP-seq/ChIP-exo
CLIP-seq
GRO-seq/PRO-seq
Bisulfite-Seq

Patient → Technologies → Data Analysis → Integration and interpretation
Genomics (WGS, WES): point mutation, small indels, copy number variation, structural variation
Transcriptomics (RNA-Seq): differential expression, gene fusion, alternative splicing, RNA editing
Epigenomics (Bisulfite-Seq, ChIP-Seq): methylation, histone modification, transcription factor binding
Integration and interpretation: functional effect of mutation, network and pathway analysis, integrative analysis
Further understanding of cancer and clinical applications
Shyr D, Liu Q. Biol Proced Online. (2013) 15, 4