BNFO 615 Usman Roshan

Short read alignment
Input:
– Reads: short DNA sequences (up to a few hundred base pairs (bp)) produced by a sequencing machine
  – Reads are fragments of a longer DNA sequence present in the sample given as input to the machine
  – Usually in the millions
– Genome sequence: a reference DNA sequence much longer than the read length

Short read alignment
[Figure: short reads from a single individual aligned to the human genome reference. Labels in the figure: heterozygous SNP (ACCAG, ACCCG); heterozygous indel (ATT--A, ATTGA); homozygous SNP (ATTGA, ATTAA), "encoded as 2"; the vector (2, 1, 0, 1); and "Here no variant is reported" (ACCAG).]

Applications
– Genome assembly
– RNA splicing studies
– Gene expression studies
– Discovery of new genes
– Discovery of cancer-causing mutations

Short read alignment
Two approaches
– Hashing-based algorithms
  – BFAST
  – SHRIMP
  – MAQ
  – STAMPY (statistical alignment)
– Burrows-Wheeler transform
  – Bowtie
  – BWA
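As a rough illustration of the second approach, the sketch below does exact matching by backward search over a Burrows-Wheeler transform. It is a toy under stated assumptions: the suffix array is built by naive sorting, occurrence counts are recomputed on every step, and all function names are hypothetical. It is not how Bowtie or BWA are implemented, and it handles no mismatches or gaps.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text, sa):
    """The BWT character of each sorted suffix is the character preceding it."""
    return "".join(text[i - 1] for i in sa)


def backward_search(pattern, reference):
    """Return the sorted start positions of exact matches of pattern."""
    text = reference + "$"  # sentinel; sorts before A, C, G, T
    sa = suffix_array(text)
    bwt = bwt_from_sa(text, sa)

    # first[c] = number of characters in text that are smaller than c
    first = {}
    total = 0
    for c in sorted(set(text)):
        first[c] = total
        total += text.count(c)

    def occ(c, i):
        # Occurrences of c in bwt[:i]; a real index precomputes these counts.
        return bwt[:i].count(c)

    lo, hi = 0, len(text)  # half-open range of suffix-array rows
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

For example, `backward_search("ATT", "GATTACATTA")` narrows the row range one pattern character at a time, from the last character to the first, and returns the positions where the pattern starts.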

BFAST overview PLoS ONE 4(11): e7767.

BFAST algorithm PLoS ONE 4(11): e7767.

BFAST masked keys
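The idea behind a masked key can be sketched as follows: a binary mask selects which positions of a window contribute to the hash key, so a mismatch that falls on a masked-out position does not prevent a lookup hit. This is a minimal sketch, not BFAST's actual index layout; the mask, the function names, and the dictionary-based index are all illustrative assumptions.

```python
from collections import defaultdict


def masked_key(seq, mask):
    """Keep only the bases at positions where the mask has a '1'."""
    return "".join(base for base, bit in zip(seq, mask) if bit == "1")


def build_index(reference, mask):
    """Map each masked key to the reference positions where it occurs."""
    index = defaultdict(list)
    width = len(mask)
    for pos in range(len(reference) - width + 1):
        index[masked_key(reference[pos:pos + width], mask)].append(pos)
    return index


def candidate_locations(read, index, mask):
    """Candidate reference start positions for the read.

    Every window of the read is looked up; each hit is shifted back by the
    window offset so that it refers to the start of the read.
    """
    width = len(mask)
    hits = set()
    for offset in range(len(read) - width + 1):
        key = masked_key(read[offset:offset + width], mask)
        for pos in index.get(key, []):
            start = pos - offset
            if start >= 0:
                hits.add(start)
    return sorted(hits)
```

With the hypothetical mask `"11011"`, a read carrying a substitution at the third base of a window still produces the same key as the reference window it came from. The candidate locations would then be passed to a full alignment step, which is omitted here.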

Short read alignment
Empirical performance
Simulated data:
– Extract random substrings of fixed length with random mutations and gaps
– Realign back to reference genome
Real data:
– Paired reads: two ends of the same sequence
– Count number of paired reads within 500 to 10000 bases of each other
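Both evaluations can be sketched in a few lines. The first function draws a fixed-length substring and applies random substitutions; gaps are left out to keep the sketch short, which is a simplification of the procedure on the slide. The second counts read pairs whose two mapped positions fall within the stated distance range. Function names, the substitution-rate parameter, and the use of `None` for an unmapped end are assumptions made for illustration, and both ends are assumed to map to the same chromosome.

```python
import random


def simulate_read(reference, length, sub_rate, rng):
    """Extract a random substring and apply random substitutions.

    Returns (true_start, read). Gaps are omitted in this sketch.
    """
    start = rng.randrange(len(reference) - length + 1)
    bases = list(reference[start:start + length])
    for i, base in enumerate(bases):
        if rng.random() < sub_rate:
            # Replace with one of the three other bases.
            bases[i] = rng.choice([b for b in "ACGT" if b != base])
    return start, "".join(bases)


def count_concordant_pairs(pair_positions, min_dist=500, max_dist=10000):
    """Count pairs whose mapped positions are min_dist..max_dist bases apart.

    pair_positions holds (pos1, pos2) tuples; None marks an unmapped end.
    """
    count = 0
    for pos1, pos2 in pair_positions:
        if pos1 is None or pos2 is None:
            continue
        if min_dist <= abs(pos1 - pos2) <= max_dist:
            count += 1
    return count
```

Because `simulate_read` returns the true start position, an aligner's output can be scored by comparing its reported position with that value.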

Short read alignment
Courtesy of Genome Res. June 2011 21: 936-939

Metagenomics
Study of DNA sequences (fragments) from environmental samples
Bioinformatics problem: classify the DNA sequence of an organism
Methods:
– Sequence similarity
– Machine learning
Key papers

Metagenomics
Project: use an advanced GPU sequence similarity program to classify simulated reads
Data: available from a published study
Steps:
– Align simulated metagenomic reads to a set of genomes with the programs MaxSSmap and NextGenMap
– Since the reads are simulated, we know the true genome names
– Determine the number of correctly aligned pairs
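The final step can be sketched as a simple count. The definition used here, that a pair is correct only when both ends align to the genome it was simulated from, is an assumption, as are the function name and the dictionary layout; parsing of the aligners' actual output files is not shown.

```python
def count_correct_pairs(true_genome, aligned_genomes):
    """Count read pairs whose two ends both align to the true genome.

    true_genome:     pair id -> name of the genome the pair was simulated from
    aligned_genomes: pair id -> (genome hit by end 1, genome hit by end 2);
                     None marks an end the aligner left unaligned
    """
    correct = 0
    for pair_id, truth in true_genome.items():
        end1, end2 = aligned_genomes.get(pair_id, (None, None))
        if end1 == truth and end2 == truth:
            correct += 1
    return correct
```

Dividing the result by `len(true_genome)` gives the fraction of correctly classified pairs, which makes runs of different aligners comparable.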

Genome alignment
Comparison of two large genome sequences
Bioinformatics problem: comparison of distantly related genome sequences like chicken and mouse
Methods:
– Mainly hash table based methods
Key papers

Genome alignment
Project: develop a GPU program for accurate genome alignment
Data: simulated genomes available from the published Alignathon study
Steps:
– Make fragments of one genome
– Align each fragment to the other genome (with short read mapping)
– Determine accuracy with the mafComparator program
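The first step, making fragments of one genome, can be sketched as a sliding window. The fragment length and step size are hypothetical parameters that the slide does not specify; a step smaller than the length gives overlapping fragments.

```python
def make_fragments(genome, length, step):
    """Yield (start, fragment) for windows of `length` bases taken every `step` bases."""
    for start in range(0, len(genome) - length + 1, step):
        yield start, genome[start:start + length]
```

Keeping the start offset with each fragment makes it possible to translate a fragment's mapped position back into a pair of coordinates, one in each genome, before accuracy is assessed.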

Whole genome phylogeny
Whole genome phylogeny to understand evolution at the genome level
Bioinformatics problem: compare phylogeny from whole genome data vs concatenated data vs multiple gene trees
Methods:
– Concatenate alignment
– Multiple gene trees
Key papers

Whole genome phylogeny
Project: determine accuracy of phylogenies from simulated whole genome data with three different methods
– Gene concatenation
– Tree reconciliation
– Whole genome distance based methods (new)
Data
– Simulated genome sequences from Evolver program
Steps
– Construct trees with three methods
– Determine their Robinson-Foulds distance
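The Robinson-Foulds distance counts the bipartitions (splits) of the leaf set that appear in one tree but not the other. The sketch below computes it for trees written as nested tuples of leaf names, treating each tree as unrooted. The tuple representation and the function names are assumptions for illustration; a real analysis would more likely parse Newick files with a phylogenetics library.

```python
def leaf_set(tree):
    """All leaf names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the tree, viewed as unrooted."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # fixed leaf used to pick one side of each split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Represent each split by the side that does not contain the anchor.
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits that separate zero or one leaf from the rest.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have the same leaves")
    return len(splits(tree_a) ^ splits(tree_b))
```

A distance of 0 means the two trees have the same unrooted topology, so each reconstructed tree can be compared against the true simulated tree in this way.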

Variant detection
Unmapped reads could contain variants
Bioinformatics problem: are there variants in unmapped reads?
Methods:
– GATK for variant detection
Key papers

Variant detection
Project: determine variants in unmapped reads
Data: reads from a 1000 Genomes human sample (NA12878) that are unmapped by BWA
Steps:
– Map reads with MaxSSmap and NextGenMap programs
– Run GATK to determine variants

Projects
– Metagenomics: Clarissa and Komal
– Genome alignment: Kayla and Kathryn
– Risk prediction webserver: Edward and Ittehad
– Variant detection: Shijie and Sruti
– Whole genome phylogeny: Catherine and Samantha
