Bioinformatics: Theory and Practice (生物信息学理论和实践), Jijun Tang (唐继军), Center for Computational Biology, Beijing Forestry University


Download and install programs
- Unzip or untar the archive: use unzip for a zip file; if the file is file.tar.gz, use tar xvfz file.tar.gz
- Go to the extracted directory and run "./configure"
- Then run "make"

System subroutine
system("ls -ltr");
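The slide only shows the call itself. As a minimal sketch, the exit status of the external command can be checked as follows; the helper name run_command is hypothetical and is not part of the original slides.

```perl
use strict;
use warnings;

# Hypothetical helper: run an external command and return its exit code.
# The list form of system() bypasses the shell, so arguments need no quoting.
sub run_command {
    my @cmd = @_;
    my $status = system(@cmd);
    # system() returns -1 if the command could not be started at all.
    return -1 if $status == -1;
    # The exit code of the child process is stored in the high byte.
    return $status >> 8;
}

# $^X is the path of the running perl, so no other tool is required here.
my $code = run_command($^X, "-e", "exit 3");
print "exit code: $code\n";    # prints "exit code: 3"
```

A non-zero exit code usually means the command failed, so a script that calls clustalw2 or a similar tool can stop early instead of processing missing output.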

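# Read a FASTA file and return a list with one element per record
# (header line plus sequence lines). The array name @dnas is reconstructed;
# it was lost in the slide text.
sub ReadFasta {
    my ($fname) = @_;
    open(my $fh, '<', $fname) or die "Cannot open $fname\n";
    my $data = "";
    my @dnas = ();
    while (my $line = <$fh>) {
        if ($line =~ /^>/) {
            # A new header starts: store the previous record, if any.
            if ($data ne "") { push(@dnas, $data); }
            $data = "";
        }
        $data .= $line;
    }
    # Store the last record.
    if ($data ne "") { push(@dnas, $data); }
    close $fh;
    return @dnas;
}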

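print "Please input file name:\n";
my $fname = <STDIN>;
chomp $fname;    # remove the trailing newline from the typed name
my @dnas = ReadFasta($fname);
my $len = $#dnas + 1;
# Write every combination of three sequences to its own file and align it.
for (my $i = 0; $i < $len; $i++) {
    for (my $j = $i+1; $j < $len; $j++) {
        for (my $k = $j+1; $k < $len; $k++) {
            $fname = "$i\_$j\_$k";
            print "$fname\n";
            open(OUT, ">$fname") or die "Cannot write $fname\n";
            print OUT $dnas[$i];
            print OUT $dnas[$j];
            print OUT $dnas[$k];
            close OUT;
            system ("./clustalw2 $i\_$j\_$k");
        }
    }
}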

Debugging
- Noticing that a program has a problem is hard
- Finding the source of the problem is even harder
- Good debugging tool: print
- Better tool: a debugger

Perl debugger
- Start it with: perl -d program arguments
- n: next line
- s: step into a subroutine
- r: run until the end of the current sub
- Enter: repeat the last n or s
- c: continue to the next breakpoint

Check source
- l: list the next several lines
- l 8-10: list lines 8 to 10
- l 100: list line 100
- l subname: list subroutine subname
- f restrict.pl: switch to viewing restrict.pl

Breakpoints
- b 100: add a breakpoint at line 100 of the current file
- b subname: add a breakpoint at this subroutine
- B: remove a breakpoint
- B 100: remove the breakpoint at line 100
- B *: remove all breakpoints

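See variables
- p $var: print the value of the variable
- y var: display the my (lexical) variable var
- V: display package variables
- X var: display the package variable var in the current package (V expects a package name first, as in V main var)
- w $var: watch this variable and stop when its value changes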

Working with Single DNA Sequences

Learning Objectives
- Discover how to manipulate your DNA sequence on a computer, analyze its composition, predict its restriction map, and amplify it with PCR
- Find out about gene-prediction methods, their potential, and their limitations
- Understand how genomes and sequences are assembled

Outline
1. Cleaning your DNA of contaminants
2. Digesting your DNA in the computer
3. Finding protein-coding genes in your DNA sequence
4. Assembling a genome

Cleaning DNA Sequences
- To sequence genomes, DNA sequences are often cloned in a vector (plasmid, YAC, or cosmid)
- Sequences of the vector can end up mixed with your DNA sequence
- Before working with your DNA sequence, you should always clean it with VecScreen

VecScreen
- /VecScreen.html
- Runs a special version of BLAST
- A system for quickly identifying segments of a nucleic acid sequence that may be of vector origin

What to do if hits are found
- If the hits are at the extremities of the sequence, you can simply remove them
- If they are in the middle, or the vectors are not the ones you are using, the safest thing is to throw the sequence away
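The rule above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the helper name trim_vector, the [start, end] hit format (1-based, inclusive), and the $margin tolerance are all hypothetical and do not come from VecScreen itself.

```perl
use strict;
use warnings;

# Hypothetical helper. $seq is the raw sequence, @hits are [start, end]
# pairs (1-based, inclusive) copied by hand from a vector screen report.
# $margin is an assumed tolerance: a hit that begins within $margin bases
# of an end of the sequence is treated as terminal.
# Returns the trimmed sequence, or undef if any hit lies in the middle.
sub trim_vector {
    my ($seq, $margin, @hits) = @_;
    my $len = length $seq;
    my ($left, $right) = (1, $len);    # region to keep, 1-based
    for my $hit (@hits) {
        my ($start, $end) = @$hit;
        if ($start <= 1 + $margin) {
            $left = $end + 1 if $end + 1 > $left;          # trim the 5' end
        }
        elsif ($end >= $len - $margin) {
            $right = $start - 1 if $start - 1 < $right;    # trim the 3' end
        }
        else {
            return;    # internal hit: safest to discard the sequence
        }
    }
    return "" if $left > $right;
    return substr($seq, $left - 1, $right - $left + 1);
}

# Vector at both ends: positions 1-5 and 16-20 are removed.
print trim_vector("AAAAACCCCCGGGGGTTTTT", 0, [1, 5], [16, 20]), "\n";
```

The example prints CCCCCGGGGG. A hit such as [8, 12] in the same sequence would make the function return undef, which corresponds to throwing the sequence away.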

Computing a Restriction Map
- It is possible to cut DNA sequences using restriction enzymes
- Each type of restriction enzyme recognizes and cuts a different sequence:
  EcoRI: GAATTC
  BamHI: GGATCC
- There are more than 900 different restriction enzymes, each with a different specificity
- The restriction map is the list of all potential cleavage sites in a DNA molecule
- You can compile a restriction map with
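A restriction map for the two enzymes named on the slide can be sketched as a plain string search. The function name restriction_map and the test sequence are made up for illustration, and the sketch reports where each recognition site starts rather than the exact cut position inside the site.

```perl
use strict;
use warnings;

# Recognition sequences taken from the slide. Both are palindromic, so
# searching one strand also covers the complementary strand.
my %enzymes = (EcoRI => "GAATTC", BamHI => "GGATCC");

# Return a hash: enzyme name => reference to a list of 1-based positions
# at which the recognition site starts.
sub restriction_map {
    my ($dna) = @_;
    $dna = uc $dna;
    my %map;
    for my $name (sort keys %enzymes) {
        my $site = $enzymes{$name};
        my @positions;
        my $from = 0;
        while ((my $at = index($dna, $site, $from)) != -1) {
            push @positions, $at + 1;
            $from = $at + 1;    # continue just after this match start
        }
        $map{$name} = \@positions;
    }
    return %map;
}

my %map = restriction_map("ttGAATTCaaGGATCCccGAATTC");
for my $name (sort keys %map) {
    print "$name: @{ $map{$name} }\n";
}
```

For this made-up sequence the output lists BamHI at position 11 and EcoRI at positions 3 and 19. Supporting more enzymes only requires adding entries to %enzymes.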

Cannot get it to work!

Making PCR with a Computer
- Polymerase Chain Reaction (PCR) is a method for amplifying DNA
- PCR is used for many applications, including
  Gene cloning
  Forensic analysis
  Paternity tests
- PCR amplifies the DNA between two anchors
- These anchors are called the PCR primers

Designing PCR Primers
- PCR primers are typically 20 nucleotides long
- The primers must hybridize well with the DNA
- On biotools.umassmed.edu, find the best location for the primers:
  Most stable
  Longest extension
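As a rough illustration of how primer stability can be estimated, the sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is a common approximation for short oligonucleotides. It is an assumption of this example and not necessarily the method used by the biotools.umassmed.edu server; the function names are hypothetical.

```perl
use strict;
use warnings;

# Wallace rule estimate of the melting temperature in degrees Celsius.
sub wallace_tm {
    my ($primer) = @_;
    $primer = uc $primer;
    my $at = ($primer =~ tr/AT//);    # tr returns the number of matches
    my $gc = ($primer =~ tr/GC//);
    return 2 * $at + 4 * $gc;
}

# Reverse complement, needed to write the reverse primer 5' to 3'.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;    # scalar context reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $primer = "ATGGCTGACTATGGCTGACT";    # made-up 20-nucleotide primer
print "Tm estimate: ", wallace_tm($primer), "\n";    # prints 60
print "Reverse complement: ", revcomp($primer), "\n";
```

Two primers of a pair are usually chosen so that their estimated melting temperatures are close to each other.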

Analyzing DNA Composition
- DNA composition varies a lot
- The stability of a DNA sequence depends on its G+C content (total guanine and cytosine)
- High G+C content makes very stable DNA molecules
- Online resources are available to measure the GC content of your DNA sequence
- They can also count words and internal repeats
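GC content is simple enough to compute directly in Perl. The following is a minimal sketch; the function name gc_content is hypothetical, and characters other than A, C, G, and T (for example N) are ignored, which is a choice made for this example.

```perl
use strict;
use warnings;

# Fraction of G and C among the A, C, G, T characters of a sequence.
sub gc_content {
    my ($dna) = @_;
    $dna = uc $dna;
    my $gc    = ($dna =~ tr/GC//);      # count G and C
    my $total = ($dna =~ tr/ACGT//);    # count all unambiguous bases
    return $total ? $gc / $total : 0;   # avoid dividing by zero
}

printf "%.2f\n", gc_content("ATGGCTGACT");    # prints 0.50
```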

Counting words
Sequence: ATGGCTGACT
1-letter words: A, T, G, G, C, T, G, A, C, T
2-letter words: AT, TG, GG, GC, CT, TG, GA, AC, CT
3-letter words: ATG, TGG, GGC, GCT, CTG, TGA, GAC, ACT
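The word lists above come from sliding a window of fixed length along the sequence one base at a time. A minimal sketch of the counting, with the hypothetical function name count_words:

```perl
use strict;
use warnings;

# Count every overlapping word of length $k in $dna.
sub count_words {
    my ($dna, $k) = @_;
    my %count;
    # The last window starts at length - k; the range is empty if k is
    # longer than the sequence.
    for my $i (0 .. length($dna) - $k) {
        $count{ substr($dna, $i, $k) }++;
    }
    return %count;
}

my %words = count_words("ATGGCTGACT", 2);
for my $w (sort keys %words) {
    print "$w $words{$w}\n";
}
```

For the slide's sequence and words of length 2, CT and TG are each counted twice and the other five words once.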

EMBOSS servers European Molecular Biology Open Software Suite

ORF EMBOSS NCBI

ncbi.nlm.nih.gov/gorf/gorf.html
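To show what an ORF finder looks for, here is a simplified sketch that scans the three forward reading frames for an ATG followed in frame by a stop codon. It is not the algorithm of the EMBOSS or NCBI tools: the function name find_orfs and the $min_codons threshold are assumptions of this example, and the reverse strand is not searched.

```perl
use strict;
use warnings;

# Return a list of [frame, start, end] triples (1-based, end includes the
# stop codon) for open reading frames on the forward strand.
sub find_orfs {
    my ($dna, $min_codons) = @_;
    $dna = uc $dna;
    my @orfs;
    for my $frame (0 .. 2) {
        my $start;    # position of the ATG that opened the current ORF
        for (my $i = $frame; $i + 3 <= length $dna; $i += 3) {
            my $codon = substr($dna, $i, 3);
            if (!defined $start) {
                $start = $i if $codon eq "ATG";
            }
            elsif ($codon =~ /^(?:TAA|TAG|TGA)$/) {
                my $codons = ($i - $start) / 3;    # stop codon not counted
                push @orfs, [$frame + 1, $start + 1, $i + 3]
                    if $codons >= $min_codons;
                undef $start;
            }
        }
    }
    return @orfs;
}

for my $orf (find_orfs("CCATGAAATTTTAGCC", 1)) {
    print "frame $orf->[0]: $orf->[1]-$orf->[2]\n";
}
```

For the made-up sequence above, the only ORF reported is in frame 3, from position 3 to position 14.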

Internal repeats
- A word repeated within the sequence, long enough not to occur by chance
- Repeats can be imperfect (they can be described with a regular expression)
- A dot plot is the best way to spot them
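A dot plot of a sequence against itself can be sketched as text output. In this illustration a star marks positions where the two windows of length $w are identical; the function name dot_plot and the exact-match criterion are assumptions, since real dot plot tools usually allow mismatches.

```perl
use strict;
use warnings;

# Compare every window of length $w with every other window of the same
# sequence; return one string per row, "*" for a match and "." otherwise.
sub dot_plot {
    my ($seq, $w) = @_;
    my $n = length($seq) - $w + 1;    # number of windows
    my @rows;
    for my $i (0 .. $n - 1) {
        my $row = "";
        for my $j (0 .. $n - 1) {
            $row .= substr($seq, $i, $w) eq substr($seq, $j, $w) ? "*" : ".";
        }
        push @rows, $row;
    }
    return @rows;
}

print "$_\n" for dot_plot("ATGCATGC", 3);
```

The main diagonal is always filled because each window matches itself. Stars away from the diagonal, here caused by the repeated ATGC, reveal the internal repeat.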

arbl.cvmbs.colostate.edu/molkit

Predicting Genes
- The most important analysis carried out on DNA sequences is gene prediction
- Gene prediction requires different methods for eukaryotes and prokaryotes
- Most gene-prediction methods use hidden Markov models

Predicting Genes in Prokaryotic Genomes
- In prokaryotes, protein-coding genes are uninterrupted: there are no introns
- Predicting protein-coding genes in prokaryotes is considered a solved problem
- You can expect 99% accuracy

Finding Prokaryotic Genes with GeneMark
- GeneMark is the state of the art for microbial genomes
- GeneMark can
  Find short proteins
  Resolve overlapping genes
  Identify the best start codon
- Use exon.gatech.edu/GeneMark
- Click the “heuristic models”

Predicting Eukaryotic Genes
- Eukaryotic genes (human, for example) are very hard to predict
- Precise and accurate eukaryotic gene prediction is still an open problem
- ENSEMBL contains 21,662 genes for the human genome
- There may well be more genes than that in the genome, as yet unpredicted
- You can expect 70% accuracy on the human genome with automatic methods
- Experimental information is still needed to predict eukaryotic genes

Finding Eukaryotic Genes with GenomeScan
- GenomeScan is the state of the art for eukaryotic genes
- GenomeScan works best with
  Long exons
  Genes with a low GC content
- It can incorporate experimental information
- Use genes.mit.edu/genomescan

Producing Genomic Data
- Until recently, sequencing an entire genome was very expensive and difficult
- Only major institutes could do it
- Today, scientists estimate that in 10 years, it will cost about $1000 to sequence a human genome
- With sequencing so cheap, assembling your own genomes is becoming an option
- How could you do it?

Sequencing and Assembling a Genome (I)
- To sequence a genome, the first task is to cut it into many small, overlapping pieces
- Then clone each piece

Sequencing and Assembling a Genome (II)
- Each piece must be sequenced
- Sequencing machines cannot do an entire sequence at once
- They can only produce short sequences smaller than 1 Kb
- These pieces are called reads
- It is necessary to assemble the reads into contigs
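The core idea of joining two reads into a contig can be sketched with an exact suffix-prefix overlap. This is a teaching illustration only: real assemblers tolerate sequencing errors and handle many reads at once, and the function names and the $min overlap threshold are assumptions of this example.

```perl
use strict;
use warnings;

# Length of the longest suffix of $left that equals a prefix of $right,
# provided it is at least $min bases long; 0 if there is none.
sub overlap_length {
    my ($left, $right, $min) = @_;
    my $max = length($left) < length($right) ? length($left) : length($right);
    for (my $len = $max; $len >= $min && $len > 0; $len--) {
        return $len if substr($left, -$len) eq substr($right, 0, $len);
    }
    return 0;
}

# Merge two reads into one contig, or return nothing if they do not overlap.
sub merge_reads {
    my ($left, $right, $min) = @_;
    my $len = overlap_length($left, $right, $min);
    return unless $len;
    return $left . substr($right, $len);    # append the non-overlapping tail
}

print merge_reads("ATGGCTGAC", "CTGACTTA", 3), "\n";    # prints ATGGCTGACTTA
```

The two made-up reads share the five bases CTGAC, so they merge into a single 12-base contig.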

Sequencing and Assembling a Genome (III)
- The most popular program for assembling reads is PHRAP
- Available at
- Other programs exist for joining smaller datasets
- For example, try CAP3 at pbil.univ-lyon1.fr/cap3.php