DAY 2. GETTING FAMILIAR WITH NGS SANGREA SHIM
INDEX
Day 2: Getting familiar with NGS
  Understanding the NGS raw read file
  Quality issues
  Alignment/mapping against a reference sequence
  Understanding alignments
  Calling variations from the alignment result
  Understanding the variant call format
FLOW CHART
This is what we are going to do in this course:
DNA/RNA → NGS platform → raw read sequences → quality trimming (SolexaQA) → alignment (bwa, bowtie2) → SAM → BAM → sorted BAM (samtools) → pileup (samtools, bcftools) → VCF → selection → map construction (JoinMap4)
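RAW READS – FASTQ FORMAT
Each read has four lines: read ID (machine ID, flow cell no.), read sequence, "+", quality sequence
Phred score: Q = -10 log10 P, where P is the probability of an incorrect base call
Phred score | Probability of incorrect base call | Accuracy
10 | 1/10 | 90%
20 | 1/100 | 99%
30 | 1/1000 | 99.9%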
ASCII CODE
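As a minimal sketch of how the ASCII quality line relates to the Phred formula, the Python 3 snippet below decodes quality characters into scores and error probabilities. It assumes the Phred+33 offset (Sanger / Illumina 1.8+), which the slides do not state; older Illumina files use an offset of 64. The function names are hypothetical.

```python
import math

def phred_to_error_prob(q):
    """Probability of an incorrect base call, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10.0)

def error_prob_to_phred(p):
    """Phred score for a given error probability."""
    return -10 * math.log10(p)

def decode_quality(quality_string, offset=33):
    """Turn a FASTQ quality line into a list of Phred scores.

    offset=33 is an assumption (Phred+33 encoding).
    """
    return [ord(ch) - offset for ch in quality_string]

scores = decode_quality("I5+!")
for q in scores:
    print(q, phred_to_error_prob(q))
```

Each character's ASCII code minus the offset gives the Phred score for the base at the same position in the read.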
QUALITY TRIMMED FASTQ
Before / After
ALIGNMENT (BOWTIE2)
Uses an FM index, which is based on the Burrows-Wheeler Transform
Reduces turnaround time in sequence alignment
Faster than bwa
Small insertions/deletions can be detected
It is free!
BURROWS-WHEELER TRANSFORM
Done with bowtie2-build
Reference sequences must be transformed (indexed) before alignment
Command
  $ bowtie2-build [reference.fa] [index_name]
Usually the same name is used for input and output
  $ bowtie2-build Gmax_189.fa Gmax_189.fa
Gmax_189.fa.?.bt2 and Gmax_189.fa.rev.?.bt2 will be created
CREATING SAM FILE
Command
  $ bowtie2 -x [index_name] -U [reads.fastq] -S [output.sam]
This will take some time
A SAM file will be created
SAM FILE
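To make the layout of a SAM alignment line concrete, here is a minimal Python 3 sketch that splits one tab-separated line into the 11 mandatory SAM fields. The example read is made up for illustration, and the helper name is hypothetical.

```python
SAM_FIELDS = ['QNAME', 'FLAG', 'RNAME', 'POS', 'MAPQ', 'CIGAR',
              'RNEXT', 'PNEXT', 'TLEN', 'SEQ', 'QUAL']

def parse_sam_line(line):
    """Split one alignment line into the 11 mandatory SAM fields."""
    columns = line.rstrip('\n').split('\t')
    record = dict(zip(SAM_FIELDS, columns[:11]))
    # These mandatory fields are integers in the SAM format.
    for name in ('FLAG', 'POS', 'MAPQ', 'PNEXT', 'TLEN'):
        record[name] = int(record[name])
    # Bit 0x4 of FLAG means the read is unmapped.
    record['is_unmapped'] = bool(record['FLAG'] & 4)
    return record

# Made-up example line, not real data.
example = 'read1\t0\tGm01\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII'
record = parse_sam_line(example)
print(record['RNAME'], record['POS'], record['CIGAR'])
```

Header lines, which start with "@", would need to be skipped before calling this helper.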
SAM TO BAM
SAM is a human-readable format
BAM is a binary file which
  is not readable for humans
  is computer readable
  is much more compact in file size
samtools
  $ samtools view -bS [input.sam] > [output.bam]
BAM FILE Looks like this Can you read this?
SORTING ALIGNMENT
Sort the BAM file with samtools
  $ samtools sort [input.bam] [output_prefix]
E.g.)
  $ samtools sort cheongja3.bam cheongja3.bam.sorted
The alignments will be sorted (with this samtools 0.1.x syntax, the second argument is a prefix and the result is written to cheongja3.bam.sorted.bam)
CALLING VARIATION
The reference FASTA file should be indexed
  $ samtools faidx [reference.fa]
Using samtools mpileup and bcftools
  $ samtools mpileup -DSuf [reference.fa] [input.bam] | bcftools view -vcg - > [output.vcf]
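The steps so far can be collected into one ordered list of shell commands. The Python 3 sketch below only builds the command strings and does not run anything. The commands and flags are the ones shown on the slides, which follow legacy samtools 0.1.x syntax; the function name, the sample name and the reads file name are hypothetical.

```python
def build_commands(reference, reads, sample):
    """Return the slide commands, in order, for one sample."""
    sam = sample + '.sam'
    bam = sample + '.bam'
    # samtools 0.1.x takes an output prefix and writes <prefix>.bam
    sorted_prefix = sample + '.sorted'
    sorted_bam = sorted_prefix + '.bam'
    vcf = sample + '.vcf'
    return [
        'bowtie2-build %s %s' % (reference, reference),
        'bowtie2 -x %s -U %s -S %s' % (reference, reads, sam),
        'samtools view -bS %s > %s' % (sam, bam),
        'samtools sort %s %s' % (bam, sorted_prefix),
        'samtools faidx %s' % reference,
        'samtools mpileup -DSuf %s %s | bcftools view -vcg - > %s'
        % (reference, sorted_bam, vcf),
    ]

# Hypothetical file names for illustration.
commands = build_commands('Gmax_189.fa', 'cheongja3.fastq', 'cheongja3')
for command in commands:
    print(command)
```

Printing the commands first makes it easy to check file names before running them by hand; newer samtools and bcftools releases use different options, so the strings would need adjusting there.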
VCF FORMAT
FILTERING OUT SNP
Remove INDEL lines so that only SNPs remain
  $ grep -v 'INDEL' [input.vcf] > [output.vcf]
Filter by depth and mapping quality
  vcfutils.pl varFilter -d [integer] -D [integer] -Q [integer]
  -Q INT minimum RMS mapping quality for SNPs [10]
  -d INT minimum read depth [2]
  -D INT maximum read depth [ ]
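The grep step can also be sketched in Python 3. Unlike `grep -v`, which drops every line containing the text INDEL, this hypothetical helper keeps all header lines and only drops records whose INFO column carries the INDEL flag that samtools adds. The VCF lines are made up for illustration.

```python
def drop_indels(vcf_lines):
    """Keep header lines and SNP records; drop records flagged INDEL."""
    kept = []
    for line in vcf_lines:
        if line.startswith('#'):
            kept.append(line)      # header lines are always kept
            continue
        info = line.rstrip('\n').split('\t')[7]   # 8th column is INFO
        if 'INDEL' in info.split(';'):
            continue               # skip indel records
        kept.append(line)
    return kept

# Made-up example lines, not real data.
vcf_lines = [
    '##fileformat=VCFv4.1',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
    'Gm01\t100\t.\tA\tG\t50\t.\tDP=12',
    'Gm01\t200\t.\tAT\tA\t40\t.\tINDEL;DP=9',
]
snps_only = drop_indels(vcf_lines)
print(len(snps_only))
```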
TODAY’S PRACTICE Real data analysis Basic python class
THANK YOU Q & A
DAY 2. PRACTICE- BASIC PYTHON LANGUAGE CLASS TAEYOUNG LEE
VARIABLE
VARIABLE TYPE
String type
All characters are string type: 'a', 'b', 'c', 'd', '0', '1', '2', '3', '0.1', ...
You have to use '' or "" for string type
  A : variable A
  'A' : string value A
Special characters (\ is displayed as ₩ on Korean keyboards)
  '\n' : newline character
  '\t' : tab
  '\'' : '
  '\"' : "
  '\\' : \
VARIABLE TYPE
Integer type
All whole numbers are integer type: 1, 2, 3, 4, 100, 72038
Float type
Represents decimal or fractional numbers: 1/3, 0.23, 1.8
CHARACTERISTICS OF VARIABLE
You cannot add a str and an int
  'Crop' + ' Genomics' = 'Crop Genomics'
  'Crop' + '4555' = 'Crop4555'
  '880' + '4555' = '8804555'
  880 + 4555 = 5435
  '880' + 4555 = error
  'Crop' + 4555 = error
The same applies between str and float
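A short Python 3 sketch of the same rule: adding a str and an int raises a TypeError, and an explicit conversion with int() or str() is needed first. The helper name is hypothetical.

```python
def join_label(text, number):
    """Concatenate a string and a number by converting the number first."""
    return text + str(number)

total = int('880') + 4555          # convert the string, then add numbers
label = join_label('Crop', 4555)   # convert the number, then join strings

try:
    'Crop' + 4555                  # str + int is not allowed
    raised = False
except TypeError:
    raised = True

print(total, label, raised)
```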
CHARACTERISTICS OF VARIABLE
If a float is used at least once in an expression, the result is a float
  5/2 = 2 (integer division in Python 2; in Python 3, 5/2 = 2.5 and 5//2 = 2)
  1+2 = 3
  5.0/2 = 2.5
  5/2.0 = 2.5
  1.0+2 = 3.0
CHARACTERISTICS OF VARIABLE
You can multiply a string by an integer
  2*3 = 6
  '2'*3 = '222'
  'hello'*3 = 'hellohellohello'
  Hello*3 vs. 'Hello'*3 (Hello is a variable, 'Hello' is a string value)
CHARACTERISTICS OF VARIABLE
These operators can be used with integer and float variables
  +, -, *, /, // (floor division), % (remainder)
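The operators can be tried out directly. This is a small Python 3 sketch; note that in Python 3 the `/` operator always gives a float, while the slides show Python 2, where `5/2` is 2. The variable names are only for illustration.

```python
true_division = 5 / 2        # Python 3: always a float
floor_division = 5 // 2      # whole-number part of the division
remainder = 5 % 2            # what is left after floor division
float_floor = 5.0 // 2       # a float operand gives a float result
mixed_sum = 1 + 2.0          # int + float gives a float
repeated = 'hello' * 3       # string repeated three times

print(true_division, floor_division, remainder)
print(float_floor, mixed_sum, repeated)
```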
OTHER VARIABLES
List, Dictionary
OTHER VARIABLES
List
Written with []
A list of other values or variables
  List_a = [1,2,'a','b',[a,b]]
A list can also be a value inside a list
A list can be empty
  List_b = []
OTHER VARIABLES
Dictionary
Written with {}
Like a dictionary, it has keys and values
  Dic_a = {'English':'영어'} → Dic_a['English'] = '영어'
Each key has exactly one value, which may be a list, string, integer or dictionary
Usage)
  Dic_amino_acid = {'ATG':'Met', 'TGA':'*'}
  Dic_amino_acid = {}
  Dic_amino_acid['ATG'] = 'Met'
  key = ['ATG','TGA']
  value = ['Met', '*']
  Dic_amino_acid = dict(zip(key,value))
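The three ways of filling a dictionary can be combined in one runnable Python 3 sketch. The extra codons (GCC, TTT, AAA) and the lookup helper are added here for illustration and are not on the slide.

```python
# 1. Literal with keys and values
codon_table = {'ATG': 'Met', 'TGA': '*'}

# 2. Assigning one key at a time
codon_table['GCC'] = 'Ala'

# 3. Pairing a list of keys with a list of values
keys = ['TTT', 'AAA']
values = ['Phe', 'Lys']
codon_table.update(dict(zip(keys, values)))

def translate_codon(codon, table):
    """Look up a codon; return '?' when the codon is not in the table."""
    return table.get(codon, '?')

print(translate_codon('ATG', codon_table))
print(translate_codon('NNN', codon_table))
```

Using `.get()` with a default avoids a KeyError when the codon is missing from the table.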
START PYTHON CODING
  vi filename.py
Python code files have .py as their extension
BASIC FUNCTIONS
What is a function?
A function already written by another programmer
Ex) print, if, for, open, etc.
print (standard output function)
Prints something
Usage)
  print a
  print 'a'
  print 'a'*3
  print 3*4
print ends its output with a newline character
  print 'a\n'
BASIC FUNCTIONS
Standard input functions
  input : for integers
  raw_input : for strings
Usage)
  A = input("enter some integers")
  B = raw_input("enter some words")
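For readers on Python 3, a minimal sketch of the same idea: there `input()` always returns a string (it behaves like the Python 2 `raw_input`), so integers have to be converted with `int()`. The `reader` parameter and the fake reader are hypothetical additions that let the example run without typing anything.

```python
def read_int(prompt, reader=input):
    """Ask for text with `reader` and convert it to an integer."""
    return int(reader(prompt))

def fake_reader(prompt):
    """Stand-in for keyboard input, used only for this demonstration."""
    return '42'

value = read_int('enter an integer: ', reader=fake_reader)
print(value + 1)
```

In a real script, calling `read_int('enter an integer: ')` without the second argument reads from the keyboard.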
BASIC FUNCTIONS
if
For making decisions
If the condition is satisfied, one block of commands is executed
If not, the other block is executed
Meaning | Math symbol | Python symbol
Less than | < | <
Greater than | > | >
Less than or equal | ≤ | <=
Greater than or equal | ≥ | >=
Equals | = | ==
Not equal | ≠ | !=
Contains | | in
Does not contain | | not in
BASIC FUNCTIONS
if … elif … else
  if condition is True: Status A
  elif condition is True (the if condition was False): Status B
  else (all conditions were False): Status C
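The three branches can be sketched in Python 3 with a small codon example that also uses the `==` and `in` operators. The function name and the codon example are illustrative choices, not part of the slides.

```python
STOP_CODONS = ['TAA', 'TAG', 'TGA']

def classify_codon(codon):
    """Return 'start', 'stop' or 'other' for a three-letter codon."""
    if codon == 'ATG':            # first condition
        return 'start'
    elif codon in STOP_CODONS:    # checked only if the first was False
        return 'stop'
    else:                         # reached when all conditions are False
        return 'other'

for codon in ['ATG', 'TGA', 'GCC']:
    print(codon, classify_codon(codon))
```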
BASIC FUNCTIONS
Loop functions: for
Useful for a loop with a limited number of iterations
Usage) for variable_name in list_name:
range() makes a list of integers
Ex) range(2) = [0,1]
    range(1,5) = [1,2,3,4]
    range(1,5,2) = [1,3]
len() calculates length
Ex) len('ABC') = 3
    len([1,2]) = 2
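A minimal Python 3 sketch of `for`, `range()` and `len()` together. In Python 3, `range()` returns a lazy sequence rather than a list, so `list()` is used here to show its contents. The GC-counting helper is a hypothetical example.

```python
def gc_count(sequence):
    """Count G and C bases by looping over every character."""
    count = 0
    for base in sequence:
        if base in 'GC':
            count += 1
    return count

print(list(range(2)))         # [0, 1]
print(list(range(1, 5)))      # [1, 2, 3, 4]
print(list(range(1, 5, 2)))   # [1, 3]
print(len('ABC'), len([1, 2]))

sequence = 'ATGC'
for position in range(len(sequence)):
    print(position, sequence[position])

print(gc_count(sequence))
```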
BASIC FUNCTIONS
Loop functions: while
Useful for an open-ended (possibly infinite) loop
Usage) while conditional_sentence:
As long as the condition is true, the loop keeps running
while 1 is always true, so it is an infinite loop
BASIC FUNCTIONS
break & continue
They are used together with if inside loops
break
  If the condition is true, the loop is terminated
continue
  If the condition is true, the rest of the current iteration is skipped and the loop moves on to the next one
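The following Python 3 sketch combines `while`, `break` and `continue`: it walks along a sequence codon by codon, skips codons containing N, and stops at the first stop codon. The function name and the example sequence are made up for illustration.

```python
def collect_codons(sequence):
    """Collect codons until a stop codon or the end of the sequence."""
    stops = ('TAA', 'TAG', 'TGA')
    position = 0
    codons = []
    while True:                       # infinite loop, ended by break
        codon = sequence[position:position + 3]
        position += 3                 # advance before any continue
        if len(codon) < 3:
            break                     # ran out of sequence
        if 'N' in codon:
            continue                  # skip ambiguous codons
        if codon in stops:
            break                     # stop codon ends the loop
        codons.append(codon)
    return codons

print(collect_codons('ATGNNNGCCTGAAAA'))
```

The position is advanced before `continue` is reached; otherwise the loop would keep reading the same codon forever.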
PRACTICE
1. Make a file which contains Gm05's gene information using /data2/python_study/Gmax_109_gene_exons.gff3
2. Write a python script that prints "This is sequence file"
3. Write a python script that prints whatever you enter using standard input
4. Get two integers from standard input, save them into variables A and B, and save their sum into C. Print variable C.
5. Get two integers from standard input and print the bigger one
6. Make a dictionary of codon and amino acid pairs. Print the amino acid that matches a codon entered by standard input
7. Same as 2, but repeat infinitely using a loop
INDEXING
For choosing elements of a list or string
SLICING
For choosing a range of a list or string
EXPANDED SLICING
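Indexing, slicing and expanded (stepped) slicing can be compared side by side in a short Python 3 sketch. The example sequence is made up for illustration.

```python
seq = 'ATGCGTTGA'

first_base = seq[0]        # indexing starts at 0
last_base = seq[-1]        # negative indexes count from the end
first_codon = seq[0:3]     # slice: start included, end excluded
tail = seq[3:]             # from index 3 to the end
every_third = seq[::3]     # expanded slicing with a step of 3
reversed_seq = seq[::-1]   # a step of -1 walks backwards

print(first_base, last_base, first_codon)
print(tail, every_third, reversed_seq)
```

The same notation works on lists, for example `[10, 20, 30, 40][1:3]` gives `[20, 30]`.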
List1 = List2 vs. List1 = List2[:] SLICING LIST
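The difference between the two assignments can be seen in a minimal Python 3 sketch: plain assignment makes a second name for the same list, while `[:]` makes a new (shallow) copy. The variable names are illustrative.

```python
list2 = [1, 2, 3]

alias = list2          # same list object under a second name
copied = list2[:]      # new list with the same elements

list2.append(4)        # change the original list

print(alias)           # the alias sees the change
print(copied)          # the copy does not
print(alias is list2, copied is list2)
```

The copy is shallow: nested lists inside it are still shared with the original.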
PRACTICE
1. Enter a number by standard input and print the multiplication table of that number. For example, if you input 3, you have to get this result: 3 * 1 = 3, 3 * 2 = 6, 3 * 3 = 9, 3 * 4 = 12, 3 * 5 = 15, 3 * 6 = 18, 3 * 7 = 21, 3 * 8 = 24, 3 * 9 = 27. Hint) you need to use a loop
2. Enter a string by standard input and print the length of the string
3. Enter a string and an integer by standard input and print the string repeated the number of times you entered
4. Enter two strings using standard input and save them into s1 and s2. If the two strings are different, concatenate them and print the result; otherwise print 'same'
5. Enter two strings using standard input and save them into s1 and s2. If s2 is longer than s1 and the length of s1 is an odd number, concatenate s1 and s2 and print; otherwise concatenate s2 and s1 and print
6. Enter a string using standard input and print its reverse
THAT’S IT FOR TODAY Q & A