DAY 2. GETTING FAMILIAR WITH NGS SANGREA SHIM

INDEX
- Day 2: Get familiar with NGS
  - Understanding the NGS raw read file
  - Quality issues
  - Alignment/mapping against a reference sequence
  - Understanding alignments
  - Calling variations from the alignment result
  - Understanding the Variant Call Format (VCF)

FLOW CHART
This is what we are going to do in this course:
DNA/RNA -> NGS platform -> raw read sequences -> quality trimming (SolexaQA) -> alignment (bwa, bowtie2) -> SAM -> BAM (samtools) -> sorted BAM (samtools) -> pileup (samtools, bcftools) -> VCF -> selection -> map construction (JoinMap4)

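RAW READS – FASTQ FORMAT
Each read is stored as four lines:
1. Read ID (starts with @; includes the machine ID and flow cell number)
2. Read sequence
3. + (separator)
4. Quality sequence (one character per base)

Phred score: Q = -10 × log10(P), where P is the probability of an incorrect base call

Phred score   Probability of incorrect base call   Accuracy
10            1/10                                  90%
20            1/100                                 99%
30            1/1000                                99.9%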
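The Phred relationship above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the function names are made up for illustration, and the code uses Python 3 syntax (the practice slides later use Python 2).

```python
import math


def phred_to_error_probability(q):
    """Return P, the probability of an incorrect base call, for Phred score q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Return Q = -10 * log10(P) for an error probability p."""
    return -10 * math.log10(p)


# Reproduce the three rows of the table: score, error probability, accuracy.
for q in (10, 20, 30):
    p = phred_to_error_probability(q)
    print(q, p, "accuracy: {:.1%}".format(1 - p))
```

Going the other way, `error_probability_to_phred(0.01)` gives a score of about 20.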

ASCII CODE
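Each character of a FASTQ quality line stands for one Phred score through its ASCII code. The sketch below assumes the Phred+33 encoding (Sanger and Illumina 1.8+); older Illumina pipelines used an offset of 64, so check which one your data uses. The function name and the example quality string are hypothetical, and the code is written for Python 3.

```python
def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    offset=33 is an assumption (Phred+33); pass offset=64 for old Illumina data.
    """
    return [ord(character) - offset for character in quality_string]


# Hypothetical quality string for a five-base read.
example_quality = "II5+!"
print(decode_quality(example_quality))
```

With Phred+33, `I` (ASCII 73) decodes to 40 and `!` (ASCII 33) decodes to 0.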

QUALITY TRIMMED FASTQ
[Figure: FASTQ reads before and after quality trimming]

ALIGNMENT (BOWTIE2)
- Uses an FM index, which is based on the Burrows-Wheeler Transform
- Reduces turnaround time in sequence alignment
- Often faster than bwa, depending on the data and settings
- Small insertions/deletions can be detected
- It is free to use!

BURROWS-WHEELER TRANSFORM
- Performed by the bowtie2-build command
- Reference sequences must be transformed (indexed) before alignment
- Command
  $ bowtie2-build [reference.fa] [index_basename]
- Usually the same name is used for input and output
  $ bowtie2-build Gmax_189.fa Gmax_189.fa
- Index files named [index_basename].?.bt2 and [index_basename].rev.?.bt2 will be created, here Gmax_189.fa.?.bt2 and Gmax_189.fa.rev.?.bt2

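CREATING SAM FILE
- Command
  $ bowtie2 -x [index_basename] -U [reads.fastq] -S [output.sam]
  - -x: basename of the index made by bowtie2-build
  - -U: FASTQ file with unpaired reads
  - -S: SAM file to write
- It will take some time
- A SAM file will be created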

SAM FILE
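A SAM alignment line is tab-separated, with eleven mandatory columns followed by optional tags. The sketch below splits one line into named fields; the read name, chromosome and values in the example line are invented for illustration, and the helper name is hypothetical. It is written for Python 3.

```python
# The eleven mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Split one SAM alignment line into a dictionary of named fields."""
    columns = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, columns[:11]))
    # These mandatory columns hold integers.
    for name in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[name] = int(record[name])
    # FLAG is a bit field: 0x4 means unmapped, 0x10 means reverse strand.
    record["is_unmapped"] = bool(record["FLAG"] & 0x4)
    record["is_reverse"] = bool(record["FLAG"] & 0x10)
    # Anything after column 11 is an optional TAG:TYPE:VALUE field.
    record["optional"] = columns[11:]
    return record


# Hypothetical alignment line (not taken from the course data).
example_line = "read_001\t16\tGm05\t1042\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0"
record = parse_sam_line(example_line)
print(record["RNAME"], record["POS"], record["CIGAR"], record["is_reverse"])
```

Header lines, which start with `@`, are not handled here and would need to be skipped before calling the function.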

SAM TO BAM
- SAM
  - is a human-readable text format
- BAM
  - is a binary file that humans cannot read directly
  - is computer readable
  - is much more compact in file size
- samtools
  $ samtools view -bS [input.sam] > [output.bam]

BAM FILE  Looks like this  Can you read this?

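SORTING ALIGNMENT
- BAM sort with samtools
  $ samtools sort [input.bam] [output_prefix]
- E.g.) $ samtools sort cheongja3.bam cheongja3.bam.sorted
  - This creates cheongja3.bam.sorted.bam; the second argument is an output prefix, not a file name
- Alignments will be sorted by position
- This is the samtools 0.1.x syntax; samtools 1.x uses: samtools sort -o [output.bam] [input.bam]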

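CALLING VARIATION
- The reference FASTA file should be indexed
  $ samtools faidx [reference.fa]
- Use samtools mpileup and bcftools
  $ samtools mpileup -DSuf [reference.fa] [input.bam] | bcftools view -vcg - > [output.vcf]
- These options belong to samtools/bcftools 0.1.x; in version 1.x, variant calling is done with bcftools mpileup and bcftools call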

VCF FORMAT

FILTERING OUT SNP
- Remove INDEL records so that only SNPs remain
  $ grep -v 'INDEL' [input.vcf] > [output.vcf]
- Filter by read depth and mapping quality
  $ vcfutils.pl varFilter -d [integer] -D [integer] -Q [integer] [input.vcf] > [output.vcf]
  - -Q INT  minimum RMS mapping quality for SNPs [10]
  - -d INT  minimum read depth [2]
  - -D INT  maximum read depth [ ]
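The INDEL filter can also be written in Python, which is a useful exercise for the practice session. The sketch below is one possible approach, not the course's script: it drops records whose INFO column carries the INDEL flag. Unlike the plain grep command, it keeps every header line, including a header that merely mentions INDEL. The function names and the example VCF lines are hypothetical, and the code is written for Python 3.

```python
def is_indel_record(vcf_line):
    """Return True if a VCF data line is flagged as INDEL in its INFO column."""
    if vcf_line.startswith("#"):
        return False  # header lines are never variant records
    columns = vcf_line.rstrip("\n").split("\t")
    info_entries = columns[7].split(";")  # INFO is the 8th column
    return "INDEL" in info_entries


def keep_snps(vcf_lines):
    """Return the header lines plus all records that are not INDELs."""
    return [line for line in vcf_lines if not is_indel_record(line)]


# Hypothetical VCF content (not taken from the course data).
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "Gm05\t100\t.\tA\tG\t50\t.\tDP=12;MQ=40",
    "Gm05\t250\t.\tAT\tA\t60\t.\tINDEL;DP=9;MQ=38",
]
snps_only = keep_snps(example_vcf)
for line in snps_only:
    print(line)
```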

TODAY’S PRACTICE  Real data analysis  Basic python class

THANK YOU  Q & A

DAY 2. PRACTICE- BASIC PYTHON LANGUAGE CLASS TAEYOUNG LEE

VARIABLE

VARIABLE TYPE
- String type
  - All characters are string type
  - 'a', 'b', 'c', 'd', '0', '1', '2', '3', '0.1', ...
  - You have to use '' or "" for string type
  - A : variable A
  - 'A' : string value A
- Special characters (\ is displayed as ₩ on Korean keyboards)
  - '\n' : newline character
  - '\t' : tab
  - '\'' : '
  - '\"' : "
  - '\\' : \

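VARIABLE TYPE
- Integer type
  - All whole numbers are integer type
  - 1, 2, 3, 4, 100, 72038, ...
- Float type
  - Represents decimal or fractional numbers
  - 0.23, 1.8, or the value of one third (0.333...)
  - In Python 2 the expression 1/3 itself evaluates to 0, so write 1.0/3 to get a float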

CHARACTERISTICS OF VARIABLE
- You cannot add a str and an int
  - 'Crop' + ' Genomics' = 'Crop Genomics'
  - 'Crop' + '4555' = 'Crop4555'
  - '880' + '4555' = '8804555'
  - 880 + 4555 = 5435
  - '880' + 4555 = error
  - 'Crop' + 4555 = error
- The same applies between str and float
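The rule about mixing strings and numbers can be tried out directly. In this minimal sketch (Python 3 syntax, variable names chosen for illustration), the failing addition is wrapped in try/except so the script keeps running, and the last two lines show the usual fix: convert one side explicitly.

```python
joined_words = "Crop" + " Genomics"    # str + str concatenates
joined_digits = "880" + "4555"         # still concatenation, not arithmetic
number_sum = 880 + 4555                # int + int adds

try:
    broken = "880" + 4555              # str + int is not allowed
except TypeError as error:
    broken = None
    print("TypeError:", error)

# Convert explicitly to choose between arithmetic and concatenation.
as_number = int("880") + 4555
as_text = "Crop" + str(4555)

print(joined_words, joined_digits, number_sum, as_number, as_text)
```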

CHARACTERISTICS OF VARIABLE
- If a float appears at least once in an expression, the result is a float (Python 2 results shown)
  - 5/2 = 2
  - 1+2 = 3
  - 5.0/2 = 2.5
  - 5/2.0 = 2.5
  - 1+2.0 = 3.0

CHARACTERISTICS OF VARIABLE
- You can multiply a string by an integer
  - 2*3 = 6
  - '2'*3 = '222'
  - 'hello'*3 = 'hellohellohello'
  - Hello*3 vs. 'Hello'*3: Hello without quotes is a variable name, 'Hello' is a string value
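The difference between a variable name and a quoted string is easy to see when both are multiplied. A short sketch in Python 3 syntax follows; the variable `Hello` and its value 5 are made up for the example.

```python
product = 2 * 3                 # int * int multiplies
repeated_digit = "2" * 3        # str * int repeats the string
repeated_word = "hello" * 3

Hello = 5                       # hypothetical variable named Hello
from_variable = Hello * 3       # uses the value stored in the variable
from_string = "Hello" * 3       # repeats the text itself

print(product, repeated_digit, repeated_word, from_variable, from_string)
```

If `Hello` had never been assigned, `Hello * 3` would raise a NameError, while `'Hello' * 3` always works.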

CHARACTERISTICS OF VARIABLE
- You can use these operators on integer and float variables
  - +, -, *, /
  - // (floor division), % (remainder)
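The arithmetic operators can be compared side by side. Note that this sketch is written for Python 3, where `/` between two integers gives a float (2.5); in Python 2, which the slides use, `5/2` gives 2. The `//` operator gives the floored result in both versions.

```python
true_division = 5 / 2       # 2.5 in Python 3 (2 in Python 2)
floor_division = 5 // 2     # whole-number part of the division
remainder = 5 % 2           # what is left over after floor division
mixed_sum = 1 + 2.0         # one float operand makes the result a float
float_division = 5.0 / 2    # 2.5 in both Python 2 and Python 3

print(true_division, floor_division, remainder, mixed_sum, float_division)
```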

OTHER VARIABLES
- List
- Dictionary

OTHER VARIABLES
- List
  - Is written with []
  - Holds other values or variables
  - List_a = [1, 2, 'a', 'b', [a, b]]
  - A list can also contain another list
  - A list can be empty
  - List_b = []

OTHER VARIABLES
- Dictionary
  - Is written with {}
  - Like a real dictionary, it has keys and values
  - Dic_a = {'English': '영어'} → Dic_a['English'] = '영어' ('영어' is Korean for "English")
  - Each key has exactly one value, which may be a list, string, integer or dictionary
  - Usage)
    - Dic_amino_acid = {'ATG': 'Met', 'TGA': '*'}
    - Dic_amino_acid = {}
      Dic_amino_acid['ATG'] = 'Met'
    - key = ['ATG', 'TGA']
      value = ['Met', '*']
      Dic_amino_acid = dict(zip(key, value))
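The three ways of filling a dictionary can be combined in one short script. This is a sketch in Python 3 syntax; the variable names are lower-cased versions of those on the slide, and the extra TGG entry and the 'unknown' default are additions for illustration.

```python
# A list can mix values and can contain another list.
list_a = [1, 2, "a", "b", ["a", "b"]]
list_b = []                      # empty list
list_b.append("ATG")             # lists can grow after creation

# Build a codon dictionary from two parallel lists with zip().
codons = ["ATG", "TGA"]
amino_acids = ["Met", "*"]
dic_amino_acid = dict(zip(codons, amino_acids))

# Add one more pair by assigning to a new key.
dic_amino_acid["TGG"] = "Trp"

print(dic_amino_acid["ATG"])                 # look up by key
print(dic_amino_acid.get("CCC", "unknown"))  # .get() avoids a KeyError
print(list_a[4], list_b)
```

Looking up a missing key with square brackets raises a KeyError, so `.get()` with a default is a common choice when the input comes from a user.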

START PYTHON CODING
- $ vi filename.py
- Python code files have .py as their extension

BASIC FUNCTIONS
- What is a function?
  - A function already defined by other programmers
  - Ex) print, if, for, open, etc.
- print (standard output function)
  - Prints something
  - Usage) (Python 2 syntax; names are case sensitive, so write print, not Print)
    - print a
    - print 'a'
    - print 'a'*3
    - print 3*4
  - print ends its output with a newline character
    - print 'a\n' prints a followed by an extra blank line

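BASIC FUNCTIONS
- Standard input functions
  - input
    - For integers (Python 2 evaluates what you type, so 3 becomes an integer)
  - raw_input
    - For strings
  - Usage)
    - A = input("enter some integers")
    - B = raw_input("enter some words")
  - In Python 3, raw_input was renamed input and always returns a string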

BASIC FUNCTIONS
- if
  - For making decisions
  - If the condition is satisfied, some commands are executed
  - If not, the other commands are executed

Meaning                  Math symbol   Python symbol
Less than                <             <
Greater than             >             >
Less than or equal       ≤             <=
Greater than or equal    ≥             >=
Equals                   =             ==
Not equal                ≠             !=
Contain                                in
Not contain                            not in

BASIC FUNCTIONS
- if ... elif ... else
  - if condition is True: Status A
  - otherwise, elif condition is True: Status B
  - otherwise (else): Status C
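The if ... elif ... else branches and the comparison operators can be combined as below. This is only a sketch in Python 3 syntax: the function name, the depth thresholds and the example sequence are hypothetical and chosen to echo the read-depth filter from the NGS part.

```python
def classify_depth(depth, min_depth=2, max_depth=100):
    """Return a label for a read depth (thresholds are illustrative only)."""
    if depth < min_depth:          # first condition: Status A
        return "too low"
    elif depth > max_depth:        # checked only if the first was False: Status B
        return "too high"
    else:                          # neither condition was True: Status C
        return "ok"


# 'in' and 'not in' test whether a string contains another string.
sequence = "ATGCCGTGA"
has_start_codon = "ATG" in sequence
has_unknown_base = "N" in sequence

print(classify_depth(1), classify_depth(30), classify_depth(500))
print(has_start_codon, has_unknown_base)
```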

BASIC FUNCTIONS
- Loop functions
  - for
    - Useful for loops with a limited number of repetitions
    - Usage) for variable_name in list_name:
- range()
  - Makes a list of integers
  - Ex) range(2) = [0,1]
        range(1,5) = [1,2,3,4]
        range(1,5,2) = [1,3]
- len()
  - Calculates length
  - Ex) len('ABC') = 3
        len([1,2]) = 2
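A for loop, range() and len() are often used together to walk through a sequence by position. The sketch below uses Python 3, where range() returns a lazy object instead of a list, so it is wrapped in list() to show the same values as on the slide. The example sequence and the GC-counting task are illustrative additions.

```python
# range() values shown on the slide (list() is needed in Python 3).
print(list(range(2)))
print(list(range(1, 5)))
print(list(range(1, 5, 2)))

# Count G and C bases by looping over every position of the string.
sequence = "ATGCGC"
gc_count = 0
for index in range(len(sequence)):
    if sequence[index] in "GC":
        gc_count += 1

print(gc_count, "of", len(sequence), "bases are G or C")
```

Looping directly over the characters (`for base in sequence:`) is shorter when the position itself is not needed.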

BASIC FUNCTIONS
- Loop functions
  - while
    - Useful when the number of repetitions is not known in advance, including infinite loops
    - Usage) while conditional_sentence:
    - The loop keeps running as long as the condition is true
    - while 1 is always true, so it is an infinite loop

BASIC FUNCTIONS
- break & continue
  - They are used inside loops, usually together with if
  - break
    - If the condition is true, the loop is terminated
  - continue
    - If the condition is true, the rest of the current iteration is skipped and the loop moves on to the next one
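The following sketch shows while, break and continue working together (Python 3 syntax). The codon list is invented for the example: codons containing an unknown base N are skipped with continue, and the loop ends with break at the stop codon TGA or at the end of the list.

```python
# Hypothetical list of codons read from a sequence.
codons = ["ATG", "GCN", "TGG", "TGA", "AAA"]

kept_codons = []
position = 0
while True:                       # infinite loop; exits only through break
    if position >= len(codons):
        break                     # no codons left
    codon = codons[position]
    position += 1
    if "N" in codon:
        continue                  # skip this codon, go to the next iteration
    if codon == "TGA":
        break                     # stop codon reached: leave the loop
    kept_codons.append(codon)

print(kept_codons)
```

The counter is increased before `continue` is reached; otherwise the loop would examine the same codon forever.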

PRACTICE
1. Make a file that contains Gm05's gene information using /data2/python_study/Gmax_109_gene_exons.gff3
2. Write a Python script that prints "This is sequence file"
3. Write a Python script that prints whatever you entered through standard input
4. Read two integers from standard input, save them into variables A and B, save their sum into C, and print C
5. Read two integers from standard input and print the bigger one
6. Make a dictionary of codon and amino acid pairs, then print the amino acid that matches a codon entered through standard input
7. Same as 2, but repeat infinitely using a loop

INDEXING
- For choosing a single element of a list or string

SLICING
- For choosing a range of a list or string

EXPANDED SLICING

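SLICING LIST
- List1 = List2 vs. List1 = List2[:]
  - List1 = List2 makes both names refer to the same list, so a change through one name is visible through the other
  - List1 = List2[:] makes a copy, so the two lists can change independently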
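Indexing, slicing, expanded slicing and list copying can all be tried on small values. This is a sketch in Python 3 syntax; the example sequence and list are made up, and positions start at 0.

```python
sequence = "ATGCGT"

first_base = sequence[0]          # indexing: first element
last_base = sequence[-1]          # negative index counts from the end
middle = sequence[1:4]            # slicing: positions 1, 2 and 3
every_second = sequence[::2]      # expanded slicing with a step of 2
backwards = sequence[::-1]        # a step of -1 walks the string in reverse

# Assignment shares one list; slicing with [:] makes a copy.
list2 = [1, 2, 3]
alias_list = list2                # same list under a second name
copied_list = list2[:]            # new list with the same elements
list2.append(4)

print(first_base, last_base, middle, every_second, backwards)
print(alias_list, copied_list)
```

After `list2.append(4)`, `alias_list` also shows the new element while `copied_list` keeps its three original elements.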

PRACTICE
1. Read a number from standard input and print the multiplication table of that number. For example, if you input 3, you should get this result:
   3 * 1 = 3, 3 * 2 = 6, 3 * 3 = 9, 3 * 4 = 12, 3 * 5 = 15, 3 * 6 = 18, 3 * 7 = 21, 3 * 8 = 24, 3 * 9 = 27
   Hint) you need to use a loop
2. Read a string from standard input and print its length
3. Read a string and an integer from standard input and print the string repeated that many times
4. Read two strings from standard input and save them into s1 and s2. If the two strings are different, concatenate them and print the result; otherwise, print 'same'
5. Read two strings from standard input and save them into s1 and s2. If s2 is longer than s1 and the length of s1 is an odd number, concatenate s1 and s2 and print; otherwise, concatenate s2 and s1 and print
6. Read a string from standard input and print it reversed

THAT’S IT FOR TODAY  Q & A