Elephant Seg Dup Analysis: 1. Genome 2. Parameters for Pipeline 3. Analysis

Elephant Genome: The genome assembly was downloaded from ftp://ftp.ncbi.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Loxodonta_africana/Loxafr3.0/ and contains 693 scaffolds (GL…) and 1,658 contigs (AAGU…), but they are not mapped to chromosomes. The total gapped length is 3,196 Mb and the ungapped sequence length is 3,118 Mb.
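Gapped length counts every base, including the N runs that stand in for gaps; ungapped length leaves them out. A minimal sketch of how the two figures can be computed, assuming a plain FASTA file in which gaps are encoded as N/n characters (the function name `assembly_lengths` is hypothetical):

```python
from typing import Iterable, Tuple


def assembly_lengths(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (gapped, ungapped) lengths for FASTA-formatted lines.

    Assumes gaps are stored as N/n: the gapped length counts all sequence
    characters, the ungapped length excludes N/n.
    """
    gapped = 0
    ungapped = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip header and blank lines
        gapped += len(line)
        ungapped += len(line) - line.count("N") - line.count("n")
    return gapped, ungapped


if __name__ == "__main__":
    demo = [">scaffold_1", "ACGTNNNNAC", "GTAC", ">contig_1", "acgtn"]
    print(assembly_lengths(demo))  # (19, 14)
```

On a real file, pass the open file handle directly, e.g. `with open(path) as fh: assembly_lengths(fh)`.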

Seg Dup detection pipelines: WGAC (whole-genome assembly comparison) detects Seg Dups in genome assemblies by looking for homologous pairs (>1 kb in length and >90% identity).
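The WGAC criterion is a simple two-threshold filter on pairwise alignments. A minimal sketch, assuming the alignments have already been produced and parsed into records; the `AlignmentPair` layout is hypothetical and is not WGAC's own file format:

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds from the slide: homologous pairs >1 kb long and >90% identical.
MIN_LENGTH_BP = 1_000
MIN_IDENTITY = 0.90


@dataclass
class AlignmentPair:
    """Hypothetical record for one pairwise alignment."""
    seq1: str
    start1: int
    end1: int
    seq2: str
    start2: int
    end2: int
    aligned_length: int  # alignment length in bp
    identity: float      # fraction of identical aligned positions, 0.0-1.0


def is_segdup(pair: AlignmentPair) -> bool:
    """True if the pair passes both the length and the identity threshold."""
    return pair.aligned_length > MIN_LENGTH_BP and pair.identity > MIN_IDENTITY


def filter_segdups(pairs: Iterable[AlignmentPair]) -> List[AlignmentPair]:
    """Keep only the pairs that qualify as segmental duplications."""
    return [p for p in pairs if is_segdup(p)]


if __name__ == "__main__":
    pairs = [
        AlignmentPair("scaf1", 0, 1500, "scaf2", 100, 1600, 1500, 0.97),        # kept
        AlignmentPair("scaf1", 5000, 5800, "scaf3", 0, 800, 800, 0.99),          # too short
        AlignmentPair("scaf2", 9000, 12000, "scaf2", 20000, 23000, 3000, 0.88),  # too divergent
    ]
    print(len(filter_segdups(pairs)))  # 1
```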

Parameters and notes for WGAC pipeline
Repeats:
– Because an elephant repeat library is not available, we masked out the combined sequence space of winMask and RepeatMasker.
– RepeatMasker alone with default settings does not mask enough (tested by BLAST).
– The combined masking space is good enough.
BLAST parsing seeds in WGAC pipeline:
– The seed size is 500 bp.
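Masking the "combined sequence space" of two tools means masking the union of their calls. A minimal sketch, assuming each tool's output has already been converted to 0-based, half-open intervals on a single sequence, and assuming hard-masking with N (the pipeline may instead soft-mask in lowercase); `hard_mask` is a hypothetical name:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def hard_mask(seq: str, repeatmasker: List[Interval], windowmasker: List[Interval]) -> str:
    """Replace every base covered by either masking track with N.

    Masking each interval from both lists masks their union, so overlapping
    calls from the two tools need no special handling.
    """
    chars = list(seq)
    for start, end in repeatmasker + windowmasker:
        start, end = max(start, 0), min(end, len(chars))  # clip to the sequence
        chars[start:end] = "N" * max(end - start, 0)
    return "".join(chars)


if __name__ == "__main__":
    print(hard_mask("ACGTACGTAC", [(0, 2)], [(1, 4), (8, 10)]))  # NNNNACGTNN
```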

Result from WGAC pipeline
Total pairs of WGAC detected (>1 kb and >90% identity), interchromosome pairs, intrachromosome pairs: 5709
Total WGAC NR (bp): 128,672,221
NR inter (bp): 97,156,068
NR intra (bp): 55,296,067
Total genome size, with gaps (bp): 3,196,721,236
Note: inter and intra are based on scaffolds and contigs rather than chromosomes.
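NR (nonredundant) bp counts each duplicated base once, even when several alignments cover it. A minimal sketch of how NR totals can be split into inter and intra sets, assuming pairs are given as hypothetical `(seq1, start1, end1, seq2, start2, end2)` tuples with 0-based half-open coordinates, and classifying a pair as intra when both ends lie on the same scaffold or contig, as in the note above:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical pair layout: (seq1, start1, end1, seq2, start2, end2).
Pair = Tuple[str, int, int, str, int, int]


def nr_bases(intervals: Iterable[Tuple[str, int, int]]) -> int:
    """Count bases covered at least once, merging overlaps within each sequence."""
    by_seq: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for name, start, end in intervals:
        by_seq[name].append((start, end))
    total = 0
    for spans in by_seq.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start > cur_end:           # gap: close the current block
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:                         # overlap or adjacency: extend the block
                cur_end = max(cur_end, end)
        total += cur_end - cur_start
    return total


def nr_summary(pairs: Iterable[Pair]) -> Dict[str, int]:
    """NR bp over all pairs, over inter-sequence pairs and over intra-sequence pairs."""
    all_iv: List[Tuple[str, int, int]] = []
    inter_iv: List[Tuple[str, int, int]] = []
    intra_iv: List[Tuple[str, int, int]] = []
    for s1, a1, b1, s2, a2, b2 in pairs:
        ends = [(s1, a1, b1), (s2, a2, b2)]
        all_iv.extend(ends)
        (intra_iv if s1 == s2 else inter_iv).extend(ends)
    return {"total": nr_bases(all_iv), "inter": nr_bases(inter_iv), "intra": nr_bases(intra_iv)}


if __name__ == "__main__":
    demo = [("scaf1", 0, 1500, "scaf1", 5000, 6500), ("scaf1", 1000, 2500, "scaf2", 0, 1500)]
    print(nr_summary(demo))  # {'total': 5500, 'inter': 3000, 'intra': 3000}
```

Because a base can lie in both an inter and an intra pair, NR inter plus NR intra can exceed the NR total, which is what the table shows (97.2 Mb + 55.3 Mb > 128.7 Mb).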

General analysis of WGAC length and identity distribution
1. The length distribution peaks at 1-2 kb, with intra > inter and 87% of WGAC related to chrUn.
2. The identity distribution peaks at 97-98%. Few are higher than 99%.
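Distributions like these come from binning the alignment lengths and identities. A minimal sketch, assuming 1-kb length bins and 1% identity bins (bin widths chosen to match the ranges quoted above, not taken from the pipeline) and identities given in percent:

```python
import math
from collections import Counter
from typing import Dict, Iterable


def length_histogram(lengths_bp: Iterable[int], bin_bp: int = 1_000) -> Dict[int, int]:
    """Count alignments per length bin; each key is the bin's lower bound in bp."""
    counts = Counter((n // bin_bp) * bin_bp for n in lengths_bp)
    return dict(sorted(counts.items()))


def identity_histogram(identities_pct: Iterable[float]) -> Dict[int, int]:
    """Count alignments per 1% identity bin; each key is the bin's lower bound in percent."""
    counts = Counter(math.floor(pct) for pct in identities_pct)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(length_histogram([1200, 1800, 2500, 15000]))  # {1000: 2, 2000: 1, 15000: 1}
    print(identity_histogram([97.2, 97.9, 98.1, 99.5]))  # {97: 2, 98: 1, 99: 1}
```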

NR distribution (AllDupLen.xls): Because the scaffolds and contigs are not mapped to chromosomes, there is no per-chromosome NR distribution. In general, large scaffolds have a lower SD content and smaller scaffolds a higher SD content, especially those shorter than 1 Mb. All contigs have a high percentage of SDs.
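The size comparison amounts to dividing each sequence's NR SD bases by its length and pooling by a size cut-off. A minimal sketch, assuming per-sequence lengths and NR SD bp are already available as dictionaries (the column layout of AllDupLen.xls is not shown, so these inputs and the function names are hypothetical) and using the 1 Mb boundary mentioned above:

```python
from typing import Dict


def sd_fraction(lengths: Dict[str, int], sd_bp: Dict[str, int]) -> Dict[str, float]:
    """Fraction of each scaffold/contig covered by NR SD bases (0 if none listed)."""
    return {name: sd_bp.get(name, 0) / size for name, size in lengths.items() if size > 0}


def pooled_fraction_by_size(
    lengths: Dict[str, int], sd_bp: Dict[str, int], cutoff_bp: int = 1_000_000
) -> Dict[str, float]:
    """Pooled SD fraction for sequences >= cutoff_bp ("large") and < cutoff_bp ("small")."""
    sums = {"large": [0, 0], "small": [0, 0]}  # [SD bases, total bases]
    for name, size in lengths.items():
        group = "large" if size >= cutoff_bp else "small"
        sums[group][0] += sd_bp.get(name, 0)
        sums[group][1] += size
    return {g: (sd / total if total else 0.0) for g, (sd, total) in sums.items()}
```

Pooling (total SD bp over total length per group) keeps many tiny contigs from dominating the comparison, whereas the per-sequence fractions show the spread within each group.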

Initial stats are in allstat.xls.

WGAC page, not yet set up

WSSD analysis (done by Tin): not yet done. Downloaded the WGS reads: about 11,683,735 reads from the NCBI Trace Archive. Downloaded finished zebra finch (zfinch) BACs; these BACs are used to determine the threshold for WGS depth coverage. For a 5-kb window, the average number of reads is 59. The threshold is 110 for a 5-kb window and 22 for a 1-kb window. Used the UCSC taeGut1 database rmsk table as input to mask the genome for repeats with divergence <=10%. (UCSC rmsk options: RepeatMasker -align -s -species 'Taeniopygia guttata')
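The depth thresholds work by counting WGS reads per window and flagging windows whose count is well above the average (59 reads per 5-kb window here). A simplified sketch of that thresholding step only, assuming read placements are given as start positions on one sequence and using non-overlapping windows; the published WSSD method works on repeat-masked sequence and differs in detail, and `flag_duplicated_windows` as well as the strict "exceeds threshold" rule are assumptions:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def flag_duplicated_windows(
    read_starts: Iterable[int],
    seq_length: int,
    window: int = 5_000,
    threshold: int = 110,
) -> List[Tuple[int, int, int]]:
    """Return (start, end, read_count) for windows whose read count exceeds threshold."""
    # Assign each read to the non-overlapping window containing its start position.
    counts = Counter(pos // window for pos in read_starts if 0 <= pos < seq_length)
    flagged = []
    n_windows = -(-seq_length // window)  # ceiling division
    for idx in range(n_windows):
        n = counts.get(idx, 0)
        if n > threshold:
            start = idx * window
            flagged.append((start, min(start + window, seq_length), n))
    return flagged
```

For 1-kb windows, call it with `window=1_000, threshold=22`, matching the second threshold above.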

WSSD results: not yet available. A total of 16,076 regions totaling 44,218,871 bp were found in wssdGE10K_nogap.tab (10-kb cut-off); 13,782 of them are on chrUn. A summary table of the WGAC and WSSD intersection is at
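The WGAC/WSSD intersection reduces to counting bases covered by both call sets. A minimal sketch for one sequence, assuming both sets are 0-based half-open intervals (run it per scaffold or chromosome and sum); the function names are hypothetical:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on one sequence


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def shared_bases(a: List[Interval], b: List[Interval]) -> int:
    """Bases covered by both interval sets, via a two-pointer sweep."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            total += end - start
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


if __name__ == "__main__":
    wgac = [(0, 10_000), (20_000, 30_000)]
    wssd = [(5_000, 25_000)]
    print(shared_bases(wgac, wssd))  # 10000
```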

General view showing WGAC (>5 kb) and WSSD on all chromosomes: not done yet; may be done on large scaffolds. Grey (above the lines): WSSD; brown (below the lines): WGAC.

Union of WSSD and WGAC; genes intersecting Seg Dups: not available. A nonredundant union of WGAC and WSSD was generated with a 10-kb size cut-off (AllDup10kb.tab). There are 3,839 NR regions totaling 50,902,487 bp, about 6.7 Mb more than WSSD alone (44,218,871 bp). However, be aware that there may be false-positive sites, especially on chrUn, since WGAC is known to have a high false-positive rate on both chromosomes and chrUn.
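A nonredundant union with a size cut-off can be sketched as merging both call sets and keeping merged regions of at least 10 kb. This assumes the cut-off is applied after merging (the slide does not say whether it is applied before or after) and works on one sequence at a time; `union_with_cutoff` is a hypothetical name:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on one sequence


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union_with_cutoff(
    wgac: List[Interval], wssd: List[Interval], min_len_bp: int = 10_000
) -> Tuple[int, int, List[Interval]]:
    """Merge the two call sets, keep regions >= min_len_bp.

    Returns (number of NR regions, total NR bp, regions).
    """
    regions = [(s, e) for s, e in merge(wgac + wssd) if e - s >= min_len_bp]
    return len(regions), sum(e - s for s, e in regions), regions


if __name__ == "__main__":
    print(union_with_cutoff([(0, 6_000), (30_000, 36_000)], [(4_000, 12_000), (50_000, 55_000)]))
    # (1, 12000, [(0, 12000)])
```

Merging first means two short, overlapping calls from different methods can together pass the cut-off, as the first region in the example does.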

Summary table 1: not available
Category | Total | chrN | chrUn | No. NR intervals | File
WSSD (bp) | 44,218,871 | 11,237,985 | 35,080,886 | 729 | wssdGE10K_nogap.tab
WGAC (bp) | 384,501,909 | 232,493,308 | 152,008,… | | oo.weild10kb.join.all.cull
AllDup (bp) | 394,988,746 | 235,022,961 | 159,965,… | | allDUP
WSSD and WGAC shared (bp) | 8,195,577 | 3,182,128 | 5,013,449 | |
Genome (bp) | 1,233,186,341 | 1,057,961,026 | 175,225,315 | |

Large SDs (>=10 kb): SDs >=10 kb in size were extracted. There are a total of 3,839 intervals totaling 50,902,487 bp in allDup.tab.

result