BIOS816/VBMS818 Lecture 6 – Sequence Assembly
Guoqing Lu
Office: E115 Beadle Center
Tel: (402) 472-4982
Website:



A Whole Genome Shotgun Sequencing Project NATURE August 2000 pp. 801.

Introduction to Sequence Assembly
Sequence assembly
–also known as fragment assembly
–assembles DNA fragments (both text sequences and chromatograms) from automated sequencers into longer contiguous sequences, or “contigs”

Introduction to Sequence Assembly
Raw sequence data come from the sequencer in the form of graphical trace files
Trace files are viewed and converted into textual sequence files
Fragments are aligned to create assemblies
Note that
–Not all bases can be read correctly
–Not all bases are equally reliable
–Current sequencing methods allow reading of ~1000 bases per gel
–Reads may contain vector contamination

Available Sequence Assembly Systems
GCG Fragment Assembly Package
VNTI ContigExpress
Staden GAP4
Phred/phrap/consed
TIGR
Web Contig Assembly Program
…

GCG Fragment Assembly Package
Only works with text-based sequence files
Does not work directly with automated sequencer trace files
Can generate sequence files from trace files using FromTrace

A GCG Fragment Assembly Project
GelStart: initializes a new project
GelEnter: incorporates individual sequence files into the project
GelMerge: automatically identifies overlaps and arranges ordered contigs
GelAssemble: multiple sequence editor
GelView: presents the user with a graphical representation

Sequence Data Import
Must be in a supported format
–GCG
–FastA
–Staden
Enter in SeqEd
Enter in GelEnter

New Project
Create a new fragment assembly project
–Creates a new set of directories and files
–DO NOT alter these files and directories
Example entries: vector sequences GB:M13mp18, GB:SynpBR322; restriction sites GAATTC, GGATCC

GelEnter
Sequence editor
Works like SeqEd
For entering new fragments or importing fragments
Existing fragments are modified with GelAssemble

GelMerge
Finds overlaps between fragments and contigs
Compares every fragment with every other fragment
Settings determine the stringency necessary for an overlap
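The all-against-all comparison described above can be sketched in Python. This is not GelMerge's implementation: it is a minimal illustration that assumes error-free reads on the same strand, exact suffix/prefix matches, and a greedy merge order. The function names and the toy reference sequence are hypothetical.

```python
from itertools import permutations


def exact_overlap(left, right, min_overlap=14):
    """Length of the longest exact suffix(left)/prefix(right) match, or 0."""
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(fragments, min_overlap=14):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(fragments)
    while len(contigs) > 1:
        best = (0, None, None)
        # Compare every sequence with every other sequence, in both orders.
        for i, j in permutations(range(len(contigs)), 2):
            length = exact_overlap(contigs[i], contigs[j], min_overlap)
            if length > best[0]:
                best = (length, i, j)
        length, i, j = best
        if length == 0:
            break  # no remaining pair overlaps; leave the contigs separate
        merged = contigs[i] + contigs[j][length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)
    return contigs


# Toy data: three overlapping slices of a made-up reference sequence.
reference = "ATGGCGTACGTTAGCCATGACCGGTTAACGTAGCTAGGCTAAGCTTCGA"
fragments = [reference[:25], reference[10:38], reference[22:]]
contigs = greedy_assemble(fragments)
print(len(contigs), contigs[0] == reference)
```

Real assemblers tolerate mismatches and consider reverse complements; the exact-match test here is only meant to make the pairwise search concrete.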

Calculating an Overlap
Word Size (* 7 *)
Stringency (* 0.80 *)
–What fraction of words must match?
Minimum overlap length (* 14 *)
[Figure: two overlapping sequences]
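A rough Python sketch of how these three settings could interact is given here. It is an assumption-laden illustration rather than the GCG algorithm: the suffix of one fragment is compared with the prefix of another without gaps, and the stringency is taken to be the fraction of aligned words of length `word_size` that match exactly. The function names and example reads are hypothetical.

```python
def words(seq, word_size):
    """Return the consecutive overlapping words (k-mers) of a sequence."""
    return [seq[i:i + word_size] for i in range(len(seq) - word_size + 1)]


def best_overlap(left, right, word_size=7, stringency=0.80, min_overlap=14):
    """Find the longest suffix of `left` that overlaps a prefix of `right`.

    Returns (overlap_length, fraction_of_matching_words) or None.
    Assumption: an ungapped, position-by-position comparison of words.
    """
    longest = min(len(left), len(right))
    for length in range(longest, min_overlap - 1, -1):
        suffix_words = words(left[-length:], word_size)
        prefix_words = words(right[:length], word_size)
        if not suffix_words:
            continue  # overlap shorter than one word
        matches = sum(a == b for a, b in zip(suffix_words, prefix_words))
        fraction = matches / len(suffix_words)
        if fraction >= stringency:
            return length, fraction
    return None


# Two made-up reads sharing a 17-base overlap.
read1 = "TTGACCGTAGGCTAACGTTAGCCATG"
read2 = "GGCTAACGTTAGCCATGCCGTATTA"
print(best_overlap(read1, read2))
```

With a word size of 7, a single mismatch spoils up to seven words, so short overlaps must be nearly perfect to reach a stringency of 0.80.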

GelView
Displays the structure of the fragments and contigs graphically
Shows the current state of the fragment assembly project

ContigExpress
A program for assembling DNA fragments (both text sequences and chromatograms) from automated sequencers into longer contiguous sequences, or “contigs”

Launch ContigExpress (CE)
From the Start menu choose Programs | InforMax | Vector NTI Suite 8 | ContigExpress
NOTE: CE can be launched from most other Vector NTI Suite applications
Download Demo Projects, then open it

End Trimming By Sequence Characteristics
With sequences highlighted, choose Edit | Trim Selected Fragment Ends
Click Settings and review the options:
–For 5’ End; For 3’ End
Leave all settings as the default, click OK and then click Calculate!
Any regions meeting the trim criteria defined above will be in red and lowercase
Click OK then right-click on the gray column heading bar and choose Columns
Double-click each of Length, 3’ Trimmed bases and 5’ Trimmed bases
Click OK

Trimming Using Phred Quality Values
Select all fragments in the Project pane then right-click and choose Load phred quality values
Click Quality Values
If you have data with associated Phred quality values, navigate to the .qual file and click Open
Click OK
Once Phred data are imported, the scores may be used to trim sequence data
Select sequences in the right-hand pane and choose Edit | Trim Selected Fragments Ends Using Phred QVs
Review the Settings options:
–Trim bases with QV less than: Select the threshold below which bases will be trimmed
–Trim 5’/3’ bases: Specify which end(s) you wish to trim
Click OK
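The idea behind threshold-based end trimming can be shown with a short Python sketch. How ContigExpress actually decides where to stop is not stated on the slide, so the rule used here (stop at the first base from each end whose quality value reaches the threshold) and the default threshold of 20 are assumptions, and the function name is hypothetical.

```python
def trim_by_quality(seq, quals, threshold=20, trim_5prime=True, trim_3prime=True):
    """Trim low-quality bases from the ends of a read.

    Assumption: trimming stops at the first base from each end whose
    quality value is at least `threshold`; internal bases are untouched.
    Returns (trimmed_sequence, start, end) where seq[start:end] is kept.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have equal length")
    start, end = 0, len(seq)
    if trim_5prime:
        while start < end and quals[start] < threshold:
            start += 1
    if trim_3prime:
        while end > start and quals[end - 1] < threshold:
            end -= 1
    return seq[start:end], start, end


# Made-up read with poor quality at both ends.
read = "NNACGTACGTNA"
qvs = [4, 6, 25, 30, 32, 40, 38, 35, 22, 21, 8, 9]
print(trim_by_quality(read, qvs))  # ('ACGTACGT', 2, 10)
```

Setting `trim_5prime` or `trim_3prime` to `False` mirrors the slide's option of choosing which end(s) to trim.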

Selecting Plasmid Regions for Vector Trimming
From the Vector NTI Explorer, open the DNA molecule pUC19
Set the selection to 351 bp to 500 bp (to include the polylinker)
Choose Tools | Send to | Polylinker to Contig Express
Check Selection Only and Direct then click OK
Name the file ‘pUC19 ( )direct.seq’ then click Save
Repeat for the complement and name the file ‘pUC19 ( )comp.seq’

Trimming for Vector Contamination
Highlight the sequences in the right-hand pane
Choose Edit | Trim Selected Fragments For Vector Contamination…
Click Settings
In the Polylinker list, check the sequences defined earlier (pUC19 ( )direct and pUC19 ( )comp)
Highlight the name pUC19 ( )direct, click Add REN Sites, choose Enzlist25.dat then click Open
Click HindIII (it will change color from gray to blue)
Repeat for pUC19 ( )comp
Click OK then click Calculate!
Any contaminated regions will be in red and lowercase
Click OK
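To make the notion of vector contamination concrete, the Python sketch below marks a leading stretch of a read that also occurs in a polylinker sequence, using lowercase for the contaminated region as the viewer does. It assumes exact matching on one strand and a hypothetical minimum match of 12 bases; the ContigExpress procedure, which also uses restriction sites, is not reproduced here. The polylinker string and read are illustrative only.

```python
def mark_vector_5prime(read, vector, min_match=12):
    """Lowercase the leading part of `read` that occurs exactly in `vector`.

    Returns (marked_read, n_contaminated). Assumption: contamination is an
    exact, same-strand match of at least `min_match` bases at the 5' end.
    """
    read_u, vector_u = read.upper(), vector.upper()
    n = 0
    # Grow the prefix for as long as it is still found in the vector.
    for length in range(min_match, len(read_u) + 1):
        if read_u[:length] in vector_u:
            n = length
        else:
            break
    return read_u[:n].lower() + read_u[n:], n


# Illustrative polylinker fragment and a read that starts inside it.
polylinker = "GAATTCGAGCTCGGTACCCGGGGATCC"
contaminated = "GGTACCCGGGGATCC" + "ATGCTAGCTAGGCTTA"
print(mark_vector_5prime(contaminated, polylinker))
```

Running the same function against the reverse complement of the polylinker would cover the other orientation, which is why the slides define both a direct and a complement file.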

Calling Secondary Peaks
With all 12 sequences highlighted, choose Edit | Call Secondary Peaks For Selected Fragments
Review the settings (Allow Ns to be Replaced, Allow Edited Bases to be Replaced, Set Threshold)
Click Unselect All Fragments
Check Allow Ns to be replaced
Check the box next to ONE4KANR in the left-hand pane (ensure this is the only fragment checked), move the sliding bar to choose the threshold and observe the result in the sequence window
Choose 85%; the viewer will display secondary bases whose peaks are at least 85% as tall as the higher peak
Click OK
This tool can be used to resolve double peaks in a chromatogram
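The threshold rule can be illustrated in Python. The sketch assumes that each chromatogram position is summarized by four peak heights and that a double peak is reported as an IUPAC ambiguity code; how ContigExpress represents the secondary base internally is not stated in the slides, and the function name is hypothetical.

```python
# IUPAC ambiguity codes for two-base mixtures.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_position(peaks, threshold=0.85):
    """Call one chromatogram position from its four peak heights.

    `peaks` maps each base (A, C, G, T) to a peak height. If the second
    tallest peak is at least `threshold` times the tallest, an IUPAC
    ambiguity code is returned; otherwise the tallest base is returned.
    """
    ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
    (base1, height1), (base2, height2) = ranked[0], ranked[1]
    if height1 > 0 and height2 >= threshold * height1:
        return IUPAC[frozenset(base1 + base2)]
    return base1


# Made-up peak heights: the G peak is 89% of the A peak in the first case.
print(call_position({"A": 900, "C": 30, "G": 800, "T": 10}))  # R
print(call_position({"A": 900, "C": 30, "G": 700, "T": 10}))  # A
```

Lowering the threshold reports more positions as mixed, which is what moving the slider in the viewer lets you explore.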

Saving a Project
Choose Project | Save As... and save the Project to your desktop as ‘Tutorial.cep’
Note: Tools such as BLAST Search and BioPlot are available from the menu bar of all the ContigExpress viewers

Assembly Setup
From the Contig Express Project Window, choose Assemble | Assembly Setup
–Contig Assembly Tab: Definition of various parameters such as length and % identity allowed for overlap
–Alignment Tab: Define parameters for the alignments generated between fragments in contig creation (e.g. the score assigned to matching nucleotides or a mismatch). These are greyed out when using Linear Assembly
–Algorithm Tab: Two algorithms are available
–Light Settings Tab: Light contigs disregard chromatogram data, and editing done on light contigs isn’t reflected in the original fragment sequences. Light contig assembly is preferred for assembling very large projects
Leave all selections as the defaults and click OK

Pairwise Assembly Linear Assembly

Assembling Contigs
From the List Pane on the right-hand side, highlight all 12 fragments
Choose Assemble | Assemble Selected Fragments and click OK when the assembly is complete
The Tree Pane on the left-hand side shows the Assembly (Assembly 1)
Click the Content View icon to show the tree/branching of contigs
Click the History View icon
In the List pane, the arrows indicate whether a fragment was included (blue) or only attempted (gray) in the assembly
Highlight the name of the Contig containing most fragments (Contig1) in the List pane
Click the Show Unassembled Fragments icon to deselect it and thus view only those fragments that are part of the contig
Click the icon again to return to the original view

Exporting the Contig Consensus Sequence from Vector NTI
In the Contig Express Project Viewer, highlight the name Contig 2 in the List pane
Right-click and choose Export Contig | To GenBank file
Save the file to your desktop
With Contig 2 still highlighted, choose Edit | Copy
Return to the Vector NTI Explorer
Choose Edit | Paste
In the New DNA/RNA Molecule dialog box click OK (leave the name as Contig 2)
In the Vector NTI Explorer, the Contig 2 molecule should now be present
Double-click the name Contig 2 to open the Molecule Viewer
The consensus sequence is now available for restriction mapping, editing, annotation and other analyses in Vector NTI

phred/phrap/consed
Developed at the University of Washington
–Phil Green (phrap)
–Brent Ewing (phred)
–David Gordon (consed)

Sequence Assembly
PHRED
–Base calling with quality scores
PHRAP
–Sequence assembly
CONSED
–Assembly visualization/editing

Quality Scores
Phred assigns a quality value to each called base
Phrap uses the quality values during automated assembly
Consed displays the qualities in different shades
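A Phred quality value Q is related to the estimated probability P that the base call is wrong by Q = -10 log10(P), so Q20 corresponds to a 1 in 100 chance of error and Q30 to 1 in 1000. The short Python sketch below converts between the two; the function names and the example quality values are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality value to an estimated error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred quality value."""
    return -10 * math.log10(p)


def expected_errors(quals):
    """Expected number of miscalled bases in a read, given its quality values."""
    return sum(phred_to_error_prob(q) for q in quals)


for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))

# A made-up read of five bases with mixed quality.
print(expected_errors([10, 20, 30, 40, 40]))
```

Summing the per-base probabilities, as `expected_errors` does, gives a quick feel for how much of a read is trustworthy before it goes into an assembly.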

Exercise
Log onto your biocomp2 account
Create a directory: $ mkdir sequenceAssembly
Fetch all mu* sequence files to the directory
–$ cd sequenceAssembly
–$ fetch mu*.seq
Add fetched sequences to seqlab working.list
Highlight all mu*.seq sequence files and run Gelstart, Gelenter, Gelmerge, Gelassemble, Gelview
Export the fetched sequences to a file called mu.genbank in GenBank format
Use ftp to transfer mu.genbank to your desktop computer
Drag and drop the mu.genbank file to the ContigExpress Project Window
Select all fragments and run Assemble Selected Fragments

Answer the following questions
Summarize the outcome of the assembly and compare the results generated from the two sequence assembly systems
How many contigs resulted?
What were the lengths of the contigs?
What were the sequences of the contigs?
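One way to tabulate the answers is sketched below in Python. It assumes the contig consensus sequences from each system have been saved in FASTA format, which is not a step in the exercise itself; the function names and the example records are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {name: sequence} dictionary."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Use the first word of the header, or a placeholder if empty.
            name = fields[0] if fields else f"record{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def summarize(records):
    """Return (number_of_contigs, {contig_name: length})."""
    lengths = {name: len(seq) for name, seq in records.items()}
    return len(lengths), lengths


# Made-up consensus sequences standing in for an exported file.
example = [">Contig1 demo", "ACGTACGTAC", "GGTT", ">Contig2", "TTGACA"]
count, lengths = summarize(read_fasta(example))
print(count, lengths)
```

For a real file, pass an open file handle to `read_fasta`, for example `with open(path) as handle: read_fasta(handle)`, and run it once per assembly system to compare the two summaries.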