(1) Store fragment sequences; (2) Recognize overlapping sequences and create aligned assemblies, called contigs; (3) Display, edit and output the contigs.

Slides:



Advertisements
Similar presentations
Database Basics. What is Access? Database management system Computer-based equivalent of a manual database Makes it easy to organize and update information.
Advertisements

CC SQL Utilities.
ResponseCard XR Creating Tests and Homework ®. Navigating the Menu Press the MENU button to bring up the Main Menu. Press the Down Arrow twice to select.
MS-Access XP Lesson 1. Introduction to MS-Access Database Management System Software (DBMS) Store data in databases Database is a collection of table.
Benchmark Series Microsoft Access 2010 Level 1
 Use the Left and Right arrow keys or the Page Up and Page Down keys to move between the pages. You can also click on the pages to move forward.  To.
CPIT 102 CPIT 102 CHAPTER 1 COLLABORATING on DOCUMENTS.
Understanding Microsoft Excel
Objectives 1.Identify the functions of a spreadsheet 2.Identify how spreadsheets can be used. 3.Explain the difference in columns and rows. 4.Locate specific.
Shape Editor Programming Example
Chapter 5 Editing Text Files
BIOS816/VBMS818 Lecture 6 – Sequence Assembly Guoqing Lu Office: E115 Beadle Center Tel: (402) Website:
1 Using Editors Editors let you create and edit ASCII files UNIX normally includes two editors: vi and Emacs Vi and Emacs are screen editors: they display.
Guide To UNIX Using Linux Third Edition
Microsoft Office Word 2013 Core Microsoft Office Word 2013 Core Courseware # 3250 Lesson 8: Using Productivity Tools.
Ch 71 Using ATTRIB, SUBST, XCOPY, DOSKEY, and the Text Editor.
8 Copyright © 2004, Oracle. All rights reserved. Creating LOVs and Editors.
Advanced File Processing
Advanced Shell Programming. 2 Objectives Use techniques to ensure a script is employing the correct shell Set the default shell Configure Bash login and.
Lesson 1 – Microsoft Excel The goal of this lesson is for students to successfully explore and describe the Excel window and to create a new worksheet.
1 Lesson 22 Getting Started with Access Essentials Computer Literacy BASICS: A Comprehensive Guide to IC 3, 3 rd Edition Morrison / Wells.
Lesson 11-Locating, Printing, and Archiving User Files.
Lecture DNA Sequence Analysis and Fragment Assembly System (FAS)
REVIEW 2 Exam History of Computers 1. CPU stands for _______________________. a. Counter productive units b. Central processing unit c. Copper.
Creating a Web Site to Gather Data and Conduct Research.
Copyright © 2012 Pearson Education, Inc. Publishing as Pearson Addison-Wesley C H A P T E R 2 Input, Processing, and Output.
Chapter Three The UNIX Editors. 2 Lesson A The vi Editor.
T HE VI EDITOR. vi has 2 modes: command mode (initial or "default" mode) insert mode [Esc] is used to switch to command mode. In general, vi commands:
Chapter 8 iComponents and Parameters. After completing this chapter, you will be able to perform the following: –Create iMates –Change the display of.
Lesson 17 Getting Started with Access Essentials
1 The EDIT Program The Edit program is a full screen text editor that allows you to: Create text files Create text files Edit an existing text files Edit.
® Microsoft Office 2010 Excel Tutorial 1: Getting Started with Excel.
Cell Alignment By default, text is left aligned and values are right aligned. You can also adjust vertical alignment.
Advanced File Processing. 2 Objectives Use the pipe operator to redirect the output of one command to another command Use the grep command to search for.
Chapter Five Advanced File Processing Guide To UNIX Using Linux Fourth Edition Chapter 5 Unix (34 slides)1 CTEC 110.
Computer Literacy BASICS: A Comprehensive Guide to IC 3, 5 th Edition Lesson 23 Getting Started with Access Essentials 1 Morrison / Wells / Ruffolo.
Lesson 11: Looking at Files and Folders what a file or folder is on the computer how to recognize a file or folder on the desktop how to recognize the.
Microsoft Office 2007 Access Chapter 3 Maintaining a Database.
1 ADVANCED MICROSOFT WORD Lesson 14 – Editing in Workgroups Microsoft Office 2003: Advanced.
Computing Fundamentals Module Lesson 10 — File Management with Windows Explorer Computer Literacy BASICS.
Basic Editing Lesson 2.
Microsoft Access 2010 Chapter 10 Administering a Database System.
Copyright 2007, Paradigm Publishing Inc. ACCESS 2007 Chapter 3 BACKNEXTEND 3-1 LINKS TO OBJECTIVES Modify a Table – Add, Delete, Move Fields Modify a Table.
Maintaining a Database Access Project 3. 2 What is Database Maintenance ?  Maintaining a database means modifying the data to keep it up-to-date. This.
Microsoft Access. Microsoft access is a database programs that allows you to store retrieve, analyze and print information. Companies use databases for.
Unix Session IV.
Chapter Five Advanced File Processing. 2 Lesson A Selecting, Manipulating, and Formatting Information.
Ch 91 Pipes, Filters and Redirection. Ch 92 Overview Will use redirection to redirect standard input and standard output.
Key Applications Module Lesson 17 — Organizing Worksheets Computer Literacy BASICS.
Lesson 1 – Microsoft Excel * The goal of this lesson is for students to successfully explore and describe the Excel window and to create a new worksheet.
1 © 2012 John Urrutia. All rights reserved. Chapter 6 The vi Editor.
Introduction to Unix (CA263) File Editing By Tariq Ibn Aziz.
Chapter Three The UNIX Editors.
CREATING DATABASE Presenter: Jolanta Soltis. When to use Excel Use Excel when you: –Require a flat or non-relational view of your data (you do not need.
Relational Database Techniques
Work with Tables and Database Records Lesson 3. NAVIGATING AMONG RECORDS Access users who prefer using the keyboard to navigate records can press keys.
8 Chapter Eight Server-side Scripts. 8 Chapter Objectives Create dynamic Web pages that retrieve and display database data using Active Server Pages Process.
Understanding Microsoft Excel Lesson 1 – Microsoft Excel 2013.
PestPac Software. Leads The Leads Module allows you to track all of your pending sales for your company from the first contact to the close. By the end.
Understanding Microsoft Excel
Understanding Microsoft Excel
Understanding Microsoft Excel
Creating LOVs and Editors
Guide To UNIX Using Linux Third Edition
Vi Editor.
Chapter 7: Introduction to CLIPS
Understanding Microsoft Excel
Understanding Microsoft Excel
Lesson 23 Getting Started with Access Essentials
Grauer and Barber Series Microsoft Access Chapter One
Presentation transcript:

(1) Store fragment sequences; (2) Recognize overlapping sequences and create aligned assemblies, called contigs; (3) Display, edit and output the contigs for further analysis. Fragment Assembly System (FAS) Assemble overlapping fragment sequences from a sequencing project. Contig 1 Contig 2 Consensus A contig may not contain more than 1,650 fragments and may not be longer than 200,000 bases. No single fragment may be longer than 2,500 bases

GelStartGelEnterGelMergeGelAssembleGelViewGelDisassemble Begins a fragment assembly session bycreating a new fragment assembly project or by identifying an existing project. Enters a fragment sequences to a fragment assembly project from your terminal keyboard, a digitizer, or existing sequence files. Aligns the sequences in a fragment assembly project into assemblies called contigs. A multiple sequence editor for viewing and editing contigs assembled by GelMerge. Displays the structure of the contigs in a fragment assembly project. Breaks up the contigs in a fragment assembly project into single fragments.

Use GelStart to create a new project database for each sequencing project. For each new project, GelStart creates a new directory, named after the project, as a subdirectory of your current working directory. gcg% gelstart -check Minimal Syntax: % gelstart [-NAME=]MyProject -Default Prompted Parameters: -NEWproject begins a new sequencing project -VECtors=GB:M13mp18,GB:SynpBR322 highlights specified sequences in GELENTER -SITes=GAATTC,GGATCC highlights specified patterns in GELENTER Local Data Files: None Optional Parameters: -DELete deletes a whole project! -NOMONitor suppresses the screen monitorGelStart

SeqEd is an interactive editor for entering and modifying sequences and for assembling parts of existing sequences into new genetic constructs. You can enter sequences from the keyboard or from a digitizer.SeqED AGTCTTAGTCGATCGTAcTGCATRCGA....|: :| i :.| | | | | "sample.seq" 27 nucleotides d screen mode command mode

Screen Mode G, A, T,... - insert a sequence character - delete a sequence character H - delete a sequence character /TAACG - find the next occurrence of TAACG (last pattern entered is the default) 1 - move to start of the sequence E - move to end of the sequence [n] - go ahead n characters [n] - go back n characters - go up to check sequence - go down to original sequence 'markcharacter - go to marked position 37 - go to position 37 (any positive integer) < - go back 50 characters > - go ahead 50 characters R - redraw the screen D - enter command mode [n] is an optional numeric parameter.

Command Mode EDit seqname - get a new sequence file to edit [n] Include [seqname] - insert another sequence [at position n] (SeqEd prompts for range and strand) s,f Delete - delete a range of bases [s] Check [/Blind] - check a range of bases [beginning at s] 37 - go to base 37 REDraw - redraw the screen [n] COmment comment - insert a comment [at position n] [n] COmment - enter comment editing mode [at position n] [n] HEAding - edit documentary heading [at line n] change - enter screen mode ( is sufficient) screen - enter screen mode ( is sufficient) OVERstrike - enter overstrike mode INSert - enter insert mode [n] Mark markcharacter - mark the sequence [at position n] PERFect - require finds to be perfect matches PROtein - set sequence type to PROTEIN NUCleotide - set sequence type to NUCLEOTIDE [s,f] Write [seqname] - write [a part of] the sequence to a file DIGitizer - enter digitizer mode RELoad - enter reload mode ACCept - terminate reload mode Help - show commands in screen and command modes [s,f] EXit [seqname] - write [a part of] the sequence and quit Quit - quit the editor without writing the sequence [n] indicates an optional parameter. s and f are numbers for start and finish of a range of interest

%gcg1 gelstart GelStart begins a fragment assembly session by creating a new fragment assembly project or by identifying an existing project. What is the name of your fragment assembly project? bio GELSTART cannot find this project. Is it a new one (* No *) ? y You have a new project named "bio". Which vector sequence(s) would you like highlighted? gb:m13mp18 Which restriction site(s) would you like highlighted ? GAATTC Project BIO has 0 fragments in 0 contigs. You are ready to run the other fragment assembly programs.GelStart

GelEnter is a sequence editor that accepts sequence data.GelEnter gcg% gelenter –check Minimal Syntax: % gelenter [-INfile1=]mu*.seq Prompted Parameters: None Local Data Files: set.keys (must be in your current working directory to be used) Optional Parameters: -ENTER=mu*.seq enters existing files into the database -STAden enters existing Staden format files into the database -FASTA enters existing FASTA format files into the database -SINGlecommand automatically returns to screen mode after each command -PERFect sets find to search for perfect symbol matches -VECtors=gb:synpbr322 highlights sequences from pBR322 -SITes=gaattc highlights GAATTC patterns -LANes=g,A,T,C sets lane order for digitizer -MINOverlap=10 sets minimum overlap length for Reload command -PCTOverlap=95 sets stringency for the Reload command -TOLerance=0.4 sets tolerance for digitizing ambiguity (0 to 1), with 1 being the most tolerant

GelEnter accepts any valid GCG sequence character. Once you enter sequences into a project database, you can no longer edit them with GelEnter.GelEnter gcg2 21% gelenter seq02.dat GelEnter adds fragment sequences to a fragment assembly project. It accepts sequence data from your terminal keyboard, a digitizer, or existing sequence files. "seq02" 593 nucleotides IUB/GCG Meaning A A C C G G T/U T M A or C R A or G W A or T S C or G Y C or T K G or T V A or C or G H A or C or T D A or G or T B C or G or T X/N G or A or T or C./~ gap character

GelMerge automatically recognizes overlaps among all of the sequences in a project database and creates aligned assemblies, called contigs, from the overlapping sequences. These contigs are stored in the project database. As you add new sequences that connect separate contigs to the project database, GelMerge aligns the contigs into larger assemblies.GelMerge % GelMerge What word size (* 7 *) ? What fraction of the words in an overlap must match (* 0.80 *) ? What is the minimum overlap length (* 14 *) ? Reading Comparing Aligning Writing... Input Contigs: 12 Output Contigs: 3 CPU time: (seconds)

Minimal Syntax: % gelmerge -Default Prompted Parameters: -WORdsize=7 sets word size for overlap determination -STRIngency=0.8 sets minimum fraction of matching words in overlap -MINOverlap=14 sets minimum length of overlap Local Data Files: -MATRix1=gelmergedna.cmp assigns the scoring matrix for contig assembly -MATRix2=gelmergelocaldna.cmp assigns the scoring matrix for vector recognition Optional Parameters: -MINIdentity=14 sets minimum run of identical bases found at least once in an overlap between two contigs -MAXGap=10 sets maximum gap size for overlap determination -GAPweight=8 sets gap creation penalty in contig assembly -LENgthweight=2 sets gap extension penalty in contig assembly -ARChive creates contigs from the original gel readings -WORKing creates contigs from individual working fragment (with gaps removed) -REPortfile[=Filename] writes report of recognized vector sequences -EXCise removes vector sequences from single-fragment contigs -VECTORSTrigency=0.8 sets minimum fraction of matches in vector recognition -VECTORMINIdentity=12 sets minimum run of identical bases found at least once in a match between vector and fragment -VECTORMAXGap=5 sets maximum gap size in first step of vector recognition -VECTORGAPweight=30 sets gap creation penalty in vector recognition -VECTORLENgthweight=3 sets gap extension penalty in vector recognition -NOMERge suppresses contig assembly -NOMONitor suppresses screen trace of program progress -NOSUMmary suppresses screen summary at the end of the program -BATch submits program to the batch queue

After assembling contigs with GelMerge, use the contig editor, GelAssemble, to review and modify the alignments. After choosing a contig for review, GelAssemble lets you edit the individual sequences in that contig to resolve inconsistencies. GelAssemble creates a consensus sequence that uses the IUB nucleotide ambiguity codes. You can modify a sequence and change the alignment in the same way you edit text with a text editor. Although GelMerge assembles and aligns contigs automatically, you can assemble contigs manually using GelAssemble. For example, you could manually assemble separate contigs that do not share sufficient overlap for GelMerge to assemble automatically. You can also separate fragments from a contig if you believe they should not be included. Once you are satisfied with a contig, you can store it in the sequencing project database.GelAssemble seq03 > GTTCATCAGTCTTGGTGGAGAAGTTCGACAGATGCCATTGGCAGATTTCACCGATGGTTC 220 seq01 > GTTCATCAGTCTTGGTGGAGAAGTTCGACAGATGCCATTGGCAGATTTCACCGATGGTTC 540 CONSENSUS > GTTCATCAGTCTTGGTGGAGAAGTTCGACAGATGCCATTGGCAGATTTCACCGATGGTTC Screen mode Command mode D

Keys Pressed Action [n] move ahead [n bases] [n] move back [n bases] [n] move up [to row n] [n] move down [to row n] > scroll one screen to the right < scroll one screen to the left 1 move to start of the sequence E move to end of the sequence 165 move to base 165 in sequence /GATTC find next occurrence of GATTC A move to next ambiguity in alignment R move to next ambiguity in sequence V move to next gap in consensus D enter Command Mode L toggle alignment display enlargement W redraw the screen O toggle INSERT/OVERSTRIKE mode ! summary of current sequence ? display these help screens G recalculate the consensus G A T C.... add base at the cursor delete a base, or move sequence left H delete a base, or move sequence left move the sequence to the right X delete alignment column I restore alignment column B begin selecting a range for removal N remove the selected range P insert the removed range - reject current fragment Gelassemble Screen Mode

Gelassemble Command Mode [a,b] specifies a range of fragments. [x,y] specifies a range of bases. [n] is an optional numeric parameter. EDit [ContigName] replace current contig with a new contig CONTIGs select another contig for editing WRite write a contig to the database EXit write the contig and quit QUIT quit without writing ERASE delete current contig from the database 238 move to position 238 in the current fragment [x,y] PRETTYout [FileName] write the sequence alignment [position x - y] [a,b] SEQOUT write fragments [a - b] to sequence files BIGPICture [FileName] write bar schematic to an output file OVERstrike select OVERSTRIKE sequence edit mode NOOVERstrike select INSERT sequence edit mode [x,y] CONSensus recalculate the consensus sequence [a,b] LOCk lock strands [a through b] [a,b] Unlock unlock strands [a through b] [x,y] SELect select bases [x through y] REMove remove the selected bases [n] INSert insert the removed bases [at position n] CAncel cancel the selection

[x,y] DElete delete bases [x through y] GOTo [FragmentName] move to strand by name FInd GAATC find the next occurrence of GAATC DIfferences show differences from the consensus MAtches show matches with the consensus Neither show neither matches nor differences REDraw redraw the screen Help display these help screens SORt [DEScending] sorts strands by their offsets in alignment [a,b] MOve moves a strand [from line a to line b] OPen opens a blank line at the cursor position [a,b] ANChor anchors strands [a through b] [a,b] NOANchor unanchors strands [a through b] LOad [ContigName] loads another contig into the Edit Screen REVerse reverse-complement the (anchored) strand(s) [n] Offset shifts the current fragment [to begin at n] REJect removes the current fragment from the screen NODUPlicate removes a duplicated fragment from the screen SPAWN renames a duplicated fragment SEParate makes two contigs from anchored and unanchored strands

GelView displays bar diagrams that show the overlaps among the fragments in each contig, providing a schematic view of the whole sequencing project.GelView Gelview  filename.vew. cat/more filename.view GELVIEW Fragment Assembly contig display of Project: bio May 4, :42 Contig: seq01 3 seq > 2 seq > C CONSENSUS > | | | | | | Contig: seq04 3 seq02 < seq > C CONSENSUS > | | | | | | Contig: seq05 2 seq > C CONSENSUS > | | | | | | Fragments in 3 Contigs

GelDisassemble breaks up the contigs in a sequencing project, thus recreating the database as a collection of single fragments.GelDisassemble % geldisassemble Are you sure you want to disassemble your project (* No *) ? Yes 1) Emptying "relation" directory.... 2) Emptying "consensus directory.... 3) Copying "working" to "consensus".... 4) Creating "relation".... Gel Project Disassembled