Published by Makayla Goodger. Modified over 10 years ago.
1
Online Counseling Resource YCMOU ELearning Drive… School of Architecture, Science and Technology Yashwantrao Chavan Maharashtra Open University, Nashik – 422222, India
2
Introduction Programmes and Courses SEP–SBI081– U01-CP1 OC-SEP –SBI081–CP1-01
3
School of Science and Technology, Online Counseling Resource… Credits Academic Inputs by Sonali Alkari Faculty YCMOU Nagpur Centre, Faculty LAD college P.G. D of Biotechnology Research officer Ankur Seeds Pvt Ltd sonalisa_alkari@yahoo.co.in Sonalisaal@rediffmail.com © 2007, YCMOU. All Rights Reserved.
4
How to Use This Resource
Counselors at each study center should use this presentation to deliver a lecture of 40-60 minutes during face-to-face counseling.
The lecture should be followed by about 40-60 minutes of discussion of students' difficulties, or a tutorial with assignments.
Handouts of this presentation (6 slides on each A4 page) should be provided to each student.
Each student should raise on the discussion forum every term that they could not understand. This improves writing skills and deepens knowledge of the topics, which is very useful for the end exam.
Attempt all the Self-Tests available for this course several times.
Students can use the handouts for last-minute preparation just before the end exam.
5
Learning Objectives
After studying this module, you should be able to: explain what a genome and genomics are; describe genome projects; explain genome assembly; and describe assembly statistics, algorithms, and software.
6
What is GENOMICS?
The word "genome" was coined in 1920, even though scientists didn't know then what the genome was made of. Genomics is the study of an organism's genome and the use of its genes. Genomics deals with the systematic use of genome information, associated with other data, to provide answers in biology, medicine, and industry. Genomics has the potential to offer new therapeutic methods for the treatment of some diseases, as well as new diagnostic methods. Other applications are in the food and agriculture sectors.
7
What are Genome Projects?-1
The major tools and methods related to genomics are bioinformatics, genetic analysis, measurement of gene expression, and determination of gene function. Genomics appeared in the 1980s and took off in the 1990s with the initiation of genome projects for several species. Genome projects are scientific endeavors that ultimately aim to determine the complete genome sequence of an organism (be it an animal, a plant, a fungus, a bacterium, an archaeon, a protist or a virus).
8
What are Genome Projects?-2
Determining the genome sequence of any organism requires determining the DNA sequence of each of its chromosomes. For bacteria, which usually have just one chromosome, a genome project will aim to map the sequence of that chromosome. Humans, with 22 pairs of autosomes and 2 sex chromosomes (X and Y), require 24 separate chromosome sequences to represent the completed genome.
9
Applications of Genome Research
Some current and potential applications of genome research include: molecular medicine; energy sources and environmental applications; risk assessment; bioarchaeology, anthropology, evolution, and human migration; DNA forensics (identification); agriculture, livestock breeding, and bioprocessing.
10
Genome Assembly-1
The process through which scientists decode the DNA sequence of an organism is called sequencing. In 1975 Frederick Sanger developed the basic sequencing technology that is still widely used today. While this technology has been continuously improved over the past 30 years, we can only decode between 1,000 and 2,000 base-pairs of DNA at a time. This is a significant limitation, given that even the simplest viruses contain tens of thousands of base-pairs, bacteria contain millions, and mammalian genomes contain billions of base-pairs.
11
Genome Assembly-2
To overcome this limitation, scientists have developed a technique called shotgun sequencing, in which the DNA of an organism is sheared into a large number of small fragments (Figure 1).
Figure 1. Original DNA is broken into a collection of fragments.
Figure 2. The ends of each fragment (drawn in green) are sequenced.
12
Genome Assembly-3
Next, the ends of the fragments are sequenced (Figure 2). The resulting sequences are then joined together using a computer program called an assembler (Figure 3).
13
Genome Assembly-4
Genome assembly refers to the process of taking a large number of short DNA sequences, all of which were generated by a shotgun sequencing project, and putting them back together to create a representation of the original chromosomes from which the DNA originated. In a shotgun sequencing project, all the DNA from a source (usually a single organism, anything from a bacterium to a mammal) is first fractured into millions of small pieces. These pieces are then "read" by automated sequencing machines, which can read up to 900 nucleotides or bases at a time. The four bases are adenine, guanine, cytosine, and thymine, represented as AGCT.
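The fragmentation step described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the function name `shotgun_reads`, the toy genome of 200 bases, and the fixed read length of 30 are all hypothetical choices for illustration, and real shotgun fragments vary in length and are sequenced from their ends.

```python
import random

def shotgun_reads(genome, num_reads, read_length, seed=0):
    """Sample fixed-length reads from random positions in the genome."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    last_start = len(genome) - read_length
    reads = []
    for _ in range(num_reads):
        start = rng.randint(0, last_start)  # inclusive on both ends
        reads.append(genome[start:start + read_length])
    return reads

# Toy "genome" built from the four bases A, G, C and T (illustrative only).
toy_rng = random.Random(1)
genome = "".join(toy_rng.choice("AGCT") for _ in range(200))

reads = shotgun_reads(genome, num_reads=20, read_length=30)
print(len(reads), "reads; first read:", reads[0])
```

The assembler never sees where each read came from; it only receives the list of reads.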
15
Genome Assembly-6
A genome assembly algorithm works by taking all the pieces, aligning them to one another, and detecting all places where two of the short sequences, or reads, overlap. These overlapping reads can be merged together, and the process continues. Genome assembly is a very difficult computational problem, made more difficult because genomes contain large numbers of identical sequences, known as repeats. These repeats can be thousands of nucleotides long, and some occur in thousands of different locations, especially in the large genomes of plants and animals.
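The overlap-and-merge step can be sketched as follows. This is a minimal illustration, assuming exact (error-free) matches between the end of one read and the start of another; the function names and the `min_overlap` parameter are hypothetical, and real assemblers use approximate alignment to tolerate sequencing errors.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def merge_reads(left, right, min_overlap=3):
    """Merge two reads if they overlap; return None otherwise."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]

print(merge_reads("AGCTTGCA", "TGCAGGTC"))  # AGCTTGCAGGTC
```

A repeat causes trouble here because two reads from different locations can share the same overlapping string, so the merge looks valid even though it joins unrelated parts of the genome.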
16
Assembly Statistics-1
The assembler relies on the basic assumption that two sequence reads (two strings of letters produced by the sequencing machine) that share the same string of letters originated from the same place in the genome (Figure 3). Using such overlaps between the sequences, the assembler can join the sequences together in a manner similar to solving a jigsaw puzzle. It is important to note that the shotgun sequencing process is inherently "wasteful": because the shearing process is random, assembly is only possible once enough sequences are generated to cover the genome 8 to 10 times.
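To make "cover the genome 8 to 10 times" concrete, coverage is usually computed as the total number of sequenced bases divided by the genome size. The sketch below assumes a 1 Mbp genome and a read length of 800 bases; both numbers and the function names are illustrative assumptions.

```python
import math

def coverage(num_reads, read_length, genome_size):
    """Average number of times each base of the genome is sequenced."""
    return num_reads * read_length / genome_size

def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target coverage."""
    return math.ceil(target_coverage * genome_size / read_length)

genome_size = 1_000_000  # 1 Mbp, assumed for illustration
read_length = 800        # assumed average read length

for target in (8, 10):
    n = reads_needed(target, read_length, genome_size)
    print(f"{target}x coverage needs about {n} reads")
```

Under these assumptions, 8-fold coverage means sequencing 8 Mbp of reads to recover a 1 Mbp genome, which is the sense in which the method is "wasteful".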
17
Assembly Statistics-2
Intuitively, this phenomenon can be understood by thinking of a sidewalk as it begins to rain. As raindrops fall randomly across the sidewalk, dry spots persist for quite a while, corresponding to regions of the genome that are not represented in the set of shotgun reads. Mathematically, this phenomenon was modeled by Eric Lander and Michael Waterman in 1988. They examined the correlation between the oversampling of the genome (also called coverage) and the number of contiguous pieces of DNA (commonly called contigs) that can be reconstructed by an idealized assembly program.
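The "dry spots" in the sidewalk analogy can be checked with a small simulation. The sketch below drops reads at random positions and compares the fraction of uncovered bases with exp(-c), the standard Poisson approximation for coverage c. The genome size, read length, and the circular-genome simplification (used to avoid edge effects) are assumptions made for illustration.

```python
import math
import random

def uncovered_fraction(genome_size, read_length, num_reads, seed=0):
    """Simulate random reads and return the fraction of bases left uncovered."""
    rng = random.Random(seed)
    covered = [False] * genome_size
    for _ in range(num_reads):
        start = rng.randrange(genome_size)
        for offset in range(read_length):
            # Wrap around the end: the genome is treated as circular.
            covered[(start + offset) % genome_size] = True
    return covered.count(False) / genome_size

genome_size, read_length = 100_000, 100  # toy values
for c in (1, 2, 4, 8):
    num_reads = c * genome_size // read_length
    simulated = uncovered_fraction(genome_size, read_length, num_reads)
    print(f"coverage {c}x: simulated {simulated:.4f}, exp(-c) {math.exp(-c):.4f}")
```

Even at several-fold coverage a small fraction of bases stays "dry", which is why the gaps close slowly as more reads are added.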
18
Assembly Statistics-3
Figure 4 shows a plot of the Lander-Waterman equation for a genome of 1 Mbp (mega base pairs = 1,000,000 base pairs). Between 8- and 10-fold coverage, the model predicts that most of the genome will be assembled into a small number of contigs (approx. 5 for a 1 Mbp genome).
Figure 4. Lander-Waterman estimation of the number of contigs with respect to genome coverage.
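The shape of this curve can be reproduced from the Lander-Waterman estimate of the expected number of contigs, N * exp(-c * sigma), where N is the number of reads, c is the coverage, and sigma = 1 - T/L for read length L and minimum detectable overlap T. The slide does not give L or T, so the values below (L = 800, T = 40) are assumptions chosen for illustration.

```python
import math

def expected_contigs(genome_size, read_length, coverage, min_overlap):
    """Lander-Waterman estimate of the expected number of contigs."""
    num_reads = coverage * genome_size / read_length
    sigma = 1 - min_overlap / read_length
    return num_reads * math.exp(-coverage * sigma)

genome_size = 1_000_000  # 1 Mbp, as in the slide
read_length = 800        # assumed
min_overlap = 40         # assumed

for c in (1, 2, 4, 6, 8, 10):
    estimate = expected_contigs(genome_size, read_length, c, min_overlap)
    print(f"coverage {c:>2}x: about {estimate:.1f} contigs")
```

With these assumed parameters the estimate at 8-fold coverage is close to 5 contigs, in line with the figure quoted on the slide; other read lengths or overlap thresholds shift the numbers.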
19
Assembly Algorithms
The many assembly programs available to researchers differ in the details of their implementation and of the algorithms employed. However, they all primarily fall into four general categories.
Greedy assemblers: The first assembly programs followed a simple but effective strategy in which the assembler greedily joins together the reads that are most similar to each other. One disadvantage of the simple greedy approach is that, because only local information is considered at each step, the assembler can easily be confused by complex repeats, leading to mis-assemblies.
20
Greedy Assemblers
An example is shown in Figure 4, where the assembler joins, in order, reads 1 and 2 (overlap = 200 bp), then reads 3 and 4 (overlap = 150 bp), then reads 2 and 3 (overlap = 50 bp), thereby creating a single contig from the four reads provided in the input.
Figure 4. Greedy assembly of four reads.
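The greedy strategy can be sketched on toy data. The four reads below are hypothetical and much shorter than real reads; their overlaps (6, 5 and 3 bases) are chosen so that the merges happen in the same order as in the example above: reads 1 and 2 first, then reads 3 and 4, then the two resulting contigs. Exact matching and the `min_overlap` threshold are simplifying assumptions.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # no remaining overlaps: stop
        merged = contigs[best_i] + contigs[best_j][best_size:]
        print(f"merge with overlap {best_size}: {merged}")
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

reads = ["ATGGCGTACC", "CGTACCTTGA", "TGAGCATC", "GCATCGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTTGAGCATCGA']
```

Because each step looks only at the single best overlap, a repeat that produces a large but misleading overlap would be merged first, which is the weakness described for greedy assemblers.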
21
Eulerian Path-1
Eulerian path approaches are based on early attempts to sequence genomes through a technique called sequencing by hybridization. In this technique, instead of generating a set of reads, scientists identified all strings of length k (k-mers) contained in the original genome. While this experimental method did not produce a viable alternative to Sanger sequencing, it led to the development of an elegant approach to sequence assembly. This approach, also based on a graph-theoretic model, breaks up each read into a collection of overlapping k-mers.
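Breaking a read into overlapping k-mers is a one-line operation. The sketch below uses a hypothetical helper `kmers` and a toy read with k = 3; real assemblers typically use much larger values of k.

```python
def kmers(read, k):
    """Return all overlapping substrings of length k, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

print(kmers("ATGGCGT", 3))  # ['ATG', 'TGG', 'GGC', 'GCG', 'CGT']
```

A read of length n yields n - k + 1 k-mers, and consecutive k-mers overlap by k - 1 bases.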
22
Eulerian Path-2
Each k-mer is represented in a graph as an edge connecting two nodes corresponding to its k-1 bp prefix and suffix, respectively. It is easy to see that, in the graph containing the information obtained from all the reads, a solution to the assembly problem corresponds to a path in the graph that uses all the edges, that is, an Eulerian path. One advantage of the Eulerian approach is that repeats are immediately recognizable, while in an overlap graph they are more difficult to identify.
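The construction can be sketched in a few lines: each k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, and an Eulerian path is found with Hierholzer's algorithm (a standard choice, not one named on the slide). The toy sequence, k = 3, and all function names are assumptions for illustration, and the code assumes that an Eulerian path exists.

```python
from collections import defaultdict

def kmers(read, k):
    """Return all overlapping substrings of length k, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def de_bruijn_graph(kmer_list):
    """Each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for kmer in kmer_list:
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_degree = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_degree[target] += 1
    # Start where out-degree exceeds in-degree by one, if such a node exists.
    start = next(iter(graph))
    for node, targets in graph.items():
        if len(targets) - in_degree[node] == 1:
            start = node
            break
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]

def assemble(kmer_list):
    """Spell the sequence along an Eulerian path of the k-mer graph."""
    path = eulerian_path(de_bruijn_graph(kmer_list))
    return path[0] + "".join(node[-1] for node in path[1:])

original = "ATGGCGTGCA"
result = assemble(kmers(original, 3))
print(result, sorted(kmers(result, 3)) == sorted(kmers(original, 3)))
```

In this toy graph the nodes TG and GC each have two outgoing edges, which is how a repeat shows up. More than one Eulerian path exists, so the reconstructed sequence can differ from the original while still containing exactly the same k-mers.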
23
BAC-by-BAC (hierarchical) Sequencing-1
In order to avoid some of the complexity involved in assembling large genomes, scientists developed a hierarchical approach. First, the genome is broken up into a collection of large fragments (between 40 and 200 kbp), which are cloned into Bacterial Artificial Chromosomes, or BACs. The location of each BAC along the genome is then mapped using specialized laboratory experiments. A minimal tiling path of BACs is chosen such that each base in the genome is covered by at least one BAC and the overlap between BACs is minimized.
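Choosing a tiling path can be viewed as an interval-covering problem. The sketch below uses the classic greedy rule: among the BACs that start at or before the first uncovered position, pick the one that extends furthest. This selects the fewest BACs, which is used here as a stand-in for "minimal overlap"; the BAC names and coordinates (in kbp) are invented for illustration, and real mapping data is far noisier.

```python
def minimal_tiling_path(bacs, genome_length):
    """Pick the fewest BACs covering [0, genome_length).

    `bacs` is a list of (name, start, end) tuples with half-open intervals.
    """
    ordered = sorted(bacs, key=lambda bac: bac[1])  # sort by start position
    path, covered, i = [], 0, 0
    while covered < genome_length:
        best = None
        # Consider every BAC that starts at or before the first uncovered base.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"no BAC covers position {covered}")
        path.append(best)
        covered = best[2]
    return path

# Hypothetical mapped BACs on a 500 kbp region (coordinates in kbp).
bacs = [("A", 0, 150), ("B", 50, 200), ("C", 120, 300),
        ("D", 180, 330), ("E", 280, 450), ("F", 300, 500)]
print([name for name, _, _ in minimal_tiling_path(bacs, 500)])  # ['A', 'C', 'F']
```

Only the BACs on the chosen path then need to be sequenced, which is what keeps the hierarchical approach manageable for large genomes.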
24
BAC-by-BAC (hierarchical) Sequencing-2
Each BAC is then sequenced through the standard shotgun method, and the resulting assemblies are combined into an assembly for each chromosome using the information provided by the tiling paths (Figure 5).
Figure 5. BAC-by-BAC approach. The long lines represent individual BACs. The minimal tiling path is represented by thick lines. Each BAC in the tiling path is then sequenced through the shotgun method.
25
Assembly Software-1
AMOS (A Modular, Open-Source assembler) is a well-known open-source assembly project. It was initiated at The Institute for Genomic Research by Steven Salzberg, Mihai Pop, and Art Delcher. The home of AMOS is currently http://amos.sourceforge.net/.
26
Assembly Software-2
The Celera Assembler was developed by Gene Myers, Granger Sutton, Art Delcher, and others at Celera Genomics. Celera Assembler demonstrated the applicability of the shotgun method to the assembly of a whole eukaryotic genome by successfully assembling the genome of the fruit fly Drosophila melanogaster. Until this achievement, the assembly of large genomes was done using BAC-by-BAC sequencing. Celera Assembler was a key element in the successful assembly of the human genome by Celera Genomics and is currently used in numerous bacterial and eukaryotic projects.
27
Assembly Software-3
Atlas is an assembly program developed at the Baylor College of Medicine. Atlas is specifically designed to optimize the assembly of BAC-by-BAC projects and uses a hybrid approach combining some of the advantages of BAC-by-BAC sequencing and whole-genome shotgun. Like Phusion, Atlas uses phrap as a low-level assembly tool.
28
What You Learn-1
You have learnt:
A genome is all of a living thing's genetic material.
Genomics is the study of an organism's genome and the use of its genes.
Genome projects are scientific endeavors that ultimately aim to determine the complete genome sequence of an organism.
Genome assembly refers to the process of taking a large number of short DNA sequences, all of which were generated by a shotgun sequencing project, and putting them back together to create a representation of the original chromosomes from which the DNA originated.
29
What You Learn-2
You have learnt:
Genome assembly relies on statistical principles.
The assembly programs available to researchers use approaches such as greedy assembly, Eulerian path, overlap-layout-consensus, and BAC-by-BAC (hierarchical) sequencing.
Most research institutes that sequence DNA use their own software for assembling the sequences that they produce.
Commonly used genome assembly software includes Phred/Phrap, AMOS, Celera Assembler, TIGR Assembler, etc.
30
Critical Thinking Questions
1. Describe the genome and genomics.
2. State the various types of genome assembly algorithms.
3. Write a short note on genome assembly.
4. Describe different types of genome assembly software.
31
Hints For Critical Thinking Questions
1. An organism's hereditary material; the study of an organism's genome and the use of its genes.
2. Greedy assemblers, Eulerian path, overlap-layout-consensus, and BAC-by-BAC (hierarchical) sequencing.
3. The process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which the DNA originated.
4. Phred/Phrap, AMOS, Celera Assembler, TIGR Assembler, etc.
32
Study Tips:1
Book 1. Title: Principles of Genome Analysis and Genomics. Author: Primrose, S.B. & Twyman, R.M. Publisher: Blackwell Publishing Company.
Book 2. Title: Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Author: Andreas D. Baxevanis, B. F. Francis Ouellette. Publisher: John Wiley and Sons, New York.
33
Study Tips:2
Book 3. Title: Bioinformatics: Sequence and Genome Analysis. Author: David W. Mount. Publisher: Cold Spring Harbor Laboratory Press.
Book 4. Title: Bioinformatics Concepts, Skills and Application. Author: Rastogi, S.C., Mendiratta N, Rastogi. Publisher: CBS Publishers & Distributors.
34
Study Tips
Microsoft Encarta Encyclopedia
Wikipedia, the free encyclopedia: http://en.wikipedia.org/wiki/
35
End of the Presentation
Thank You