Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome.

2 Why Is This Study Impressive?
Lower sequencing cost + more sophisticated assembly algorithms = more species with genome assemblies
The problem? These assemblies are fragmented and contain gaps and errors, which make downstream applications difficult.
This study combines three technologies to produce the most continuous de novo mammalian assembly to date:
1. Long reads for contig formation
2. Short reads for consensus validation
3. Scaffolding by optical and chromatin interaction mapping
Result: a 400-fold improvement in continuity over the previous goat assembly, with only 649 gaps.
De novo = assembled from scratch, without relying on an existing reference genome.
Contig = a continuous stretch of sequence assembled from overlapping reads.
Consensus validation = confirming the bases of the assembly by checking them against additional, independent (here, short) reads.
Scaffolds = ordered and oriented contigs separated by gaps.
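Continuity and gap counts can be illustrated with a small Python sketch. It assumes, as FASTA files commonly do, that each gap in a scaffold is written as a run of "N" characters, and it uses contig N50, one widely used continuity metric (not necessarily the exact measure behind the paper's 400-fold figure). The function names and the toy sequence are made up for illustration.

```python
import re
from typing import List


def count_gaps(scaffold: str) -> int:
    """Count gaps, assuming each gap is one run of 'N'/'n' characters."""
    return len(re.findall(r"[Nn]+", scaffold))


def split_contigs(scaffold: str) -> List[str]:
    """Split a scaffold into its contigs (the stretches between gaps)."""
    return [piece for piece in re.split(r"[Nn]+", scaffold) if piece]


def n50(lengths: List[int]) -> int:
    """Smallest length L such that pieces of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Hypothetical scaffold: three contigs (10, 4 and 8 bp) separated by two gaps.
scaffold = "ACGTACGTAC" + "N" * 5 + "GGCC" + "N" * 3 + "TTTTTTAA"
contig_lengths = [len(c) for c in split_contigs(scaffold)]
print(count_gaps(scaffold))   # 2
print(contig_lengths)         # [10, 4, 8]
print(n50(contig_lengths))    # 8
```

Fewer gaps and a larger N50 both mean a more continuous assembly; closing gaps merges contigs, which raises N50.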

3 How Is This Applicable in the Real World?
Agriculture  accurate genome reference is essential for plants and animal species Researchers of such organism GMO = is the result of a laboratory process where genes from the DNA of one species are extracted and artificially forced into the genes of an unrelated plant or animal. The foreign genes may come from bacteria, viruses, insects, animals or even humans.

4 Current Progress of Gene Sequencing
Progress has been made in techniques for generating contigs, but finishing the genome remains the challenge; this is extremely difficult for repetitive genomes.
The human genome: a draft was published in 2001, followed by 3 years of curation by 18 institutions.
Short-read sequencing: inexpensive and yields draft genome assemblies, but these are highly fragmented (hence this paper's combination of 3 techniques).

5 Current Progress of Gene Sequencing
Repetition in the genome is the biggest challenge in assembly and leads to gaps.
Scaffolding technologies (used in this paper) order and orient the assembled contigs:
- Chromatin interaction mapping: identifies long-range chromosome interactions
- Optical mapping: provides inexpensive, HD scaffolding data
Both methods have limited ability to scaffold the small contigs of fragmented short-read assemblies.
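A minimal Python sketch of what "order and orient" means: given contigs and a layout (an ordered list of contig names, each with a "+" or "-" strand, which is what a scaffolding map effectively supplies), the contigs are joined into one scaffold, and "-" contigs are reverse-complemented first. The layout format, contig names and the fixed gap of 100 Ns are assumptions for illustration; the fixed gap is a placeholder convention, and in practice gap lengths may instead be estimated.

```python
from typing import Dict, List, Tuple

# Complement table for DNA bases; N (unknown base) stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (the contig read from the opposite strand)."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(
    contigs: Dict[str, str],
    layout: List[Tuple[str, str]],
    gap_size: int = 100,
) -> str:
    """Join contigs in layout order, flipping '-' contigs, with N-runs as gaps."""
    pieces = []
    for name, strand in layout:
        seq = contigs[name]
        if strand == "-":
            seq = reverse_complement(seq)  # orient: use the opposite strand
        elif strand != "+":
            raise ValueError(f"unknown strand {strand!r} for contig {name}")
        pieces.append(seq)
    # Each gap of unknown sequence is represented by gap_size 'N' characters.
    return ("N" * gap_size).join(pieces)


contigs = {"ctgA": "AAACCC", "ctgB": "GGGTTA"}   # hypothetical contigs
layout = [("ctgA", "+"), ("ctgB", "-")]          # hypothetical map output
print(build_scaffold(contigs, layout, gap_size=3))  # AAACCCNNNTAACCC
```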

6 Current Progress of Gene Sequencing
Single-molecule sequencing  produces reads of 10’s kb but has high error rate The Pacific Biosciences sequencing platform  produces reads at an average of 14 kb (peak over 60 kb)  used to construct bacterial and continuous eukaryotic genomes Combination of long –read sequencing + long-range scaffolding = most efficient way to produce near-complete genome reference assemblies


8 Online Methods Listed
- Animals: handled under an IACUC-approved protocol and other federal regulations
- Reference individual selection: a DNA panel of 96 US goats was assembled to find the most homozygous goat, determined by raw count of homozygous genotypes
- Genome sequencing, analysis and sequencing
- Conflict resolution: to resolve misassemblies from prior steps
- Assembly polishing and contaminant identification
- Assembly annotation
- Gap resolution and repeat analysis
- Centromeric and telomeric repeat analysis
- Fosmid end sequencing and analysis
- Statistical analysis
- Code availability
- Data availability
These methods are detailed and beyond an undergraduate level, but students who recognize the methods and steps listed above and want to know more may refer to the Online Methods.
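The reference-individual selection step amounts to a simple count. This Python sketch assumes genotypes are coded as allele dosages (0 or 2 = homozygous, 1 = heterozygous, None = missing call); the animal IDs and genotypes are hypothetical, and the genotyping platform and coding used in the study may differ.

```python
from typing import Dict, List, Optional


def homozygous_count(genotypes: List[Optional[int]]) -> int:
    """Raw count of homozygous calls (dosage 0 or 2); 1 and None are not counted."""
    return sum(1 for g in genotypes if g in (0, 2))


def most_homozygous(panel: Dict[str, List[Optional[int]]]) -> str:
    """Return the animal with the highest raw homozygous count (first one wins a tie)."""
    return max(panel, key=lambda animal: homozygous_count(panel[animal]))


panel = {  # hypothetical panel: animal ID -> genotype at each marker
    "goat_01": [0, 1, 2, 2, None],
    "goat_02": [2, 2, 0, 0, 1],
    "goat_03": [1, 1, 0, 2, 2],
}
print(most_homozygous(panel))  # goat_02
```

A raw count (rather than a proportion) favors animals with fewer missing calls, which is one reason to keep the choice explicit.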

9 Key methods
- Genome sequencing, analysis and sequencing
- Conflict resolution: to resolve misassemblies from prior steps
- Gap resolution and repeat analysis

10 Results: An adult male goat (San Clemente breed) was sequenced
The goat had a high degree of homozygosity, which minimizes heterozygous alleles and simplifies genome assembly. Data generated:
- Long-read single-molecule sequencing
- High-fidelity short-read sequencing
- Optical mapping (scaffolding technology)
- Chromatin interaction mapping (scaffolding technology)
These complementary data were assembled stepwise, as shown in Table 1.

11 Stepwise assembly of complementary data
Validated with statistical methods

12 Research Limitations
- RH mapping was used to maximize the accuracy of the final reference assembly
- Corrected 21 inversions comprising 83 scaffolds
- Corrected 4 misplacements before final gap filling
- ARS1 (the final assembly): after error correction and validation, ARS1 still contains 4 discrepancies with the RH map, which need further research to resolve (Figure 3)
- ARS1 compares favorably with the human genome!

Metric      ARS1   Human genome
Scaffolds   31     24
Gaps        649    832

RH map = radiation hybrid map. Radiation hybrid mapping is a technique used to map mammalian chromosomes: it uses X-ray breakage of chromosomes to determine the distances between DNA markers as well as their order on the chromosomes.
ARS1 = the name of the final goat reference assembly (named for the USDA Agricultural Research Service, not "autonomously replicating sequence 1")
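Inversions relative to the RH map can be illustrated with a toy Python sketch: number the markers by their order on the RH map, list those numbers in the order the markers appear in the assembly, and look for runs where the numbers count down instead of up. Reversing such a run restores agreement with the map. The function names and the minimum run length of 3 are assumptions; the study's actual conflict-resolution procedure is described in its Online Methods.

```python
from typing import List, Tuple


def find_inverted_runs(rh_ranks: List[int], min_len: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of runs where the RH-map rank
    decreases at every step, i.e. segments in reverse order relative to the map."""
    runs = []
    start = 0
    for i in range(1, len(rh_ranks) + 1):
        # The current run continues while each rank is smaller than the previous one.
        if i < len(rh_ranks) and rh_ranks[i] < rh_ranks[i - 1]:
            continue
        if i - start >= min_len:
            runs.append((start, i - 1))
        start = i  # position i begins the next candidate run
    return runs


def correct_inversions(items: List[int], runs: List[Tuple[int, int]]) -> List[int]:
    """Reverse each flagged run so its order matches the RH map."""
    fixed = list(items)
    for start, end in runs:
        fixed[start:end + 1] = fixed[start:end + 1][::-1]
    return fixed


ranks = [1, 2, 3, 7, 6, 5, 4, 8, 9]   # hypothetical: RH-map rank of each marker, in assembly order
runs = find_inverted_runs(ranks)
print(runs)                            # [(3, 6)]
print(correct_inversions(ranks, runs)) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```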


15 Implications
This paper presents a near-finished reference genome for the domestic goat using:
- Long-read single-molecule sequencing
- High-fidelity short-read sequencing
- Optical mapping (scaffolding technology)
- Chromatin interaction mapping (scaffolding technology)
Unlike cattle, which descend from two different subspecies, domestic goats appear to descend from a single wild ancestor, the bezoar (ref. 33).
This new assembly strategy is more accurate and more cost-effective than previous approaches.
It provides a new standard reference for ruminant genetics.
A reference goat genome could make it easier to identify adaptive variants in the sequence data of descendant breeds.

16 Adult male San Clemente breed

17 Discussion: Long-read sequencing has improved mammalian genome assemblies
- Complex genomic regions continue to interfere with complete assembly
- Current long-read technologies still fall short: they cannot routinely produce completely assembled chromosomes
- Scaffolding technologies must therefore be reliable and affordable; they are important for generating HD, finished reference genomes because current long reads alone are not enough
- This is why this paper combined all three methods!
- The study demonstrated that optical and chromatin interaction mapping are complementary and useful in conjunction with long-read assemblies

18 Discussion: The methods of this study reduced the cost of genome finishing
- Performing a similar genome assembly with the current PacBio RS II and the scaffolding techniques used in this study would cost around $100,000
- That is about 3X the cost of a short-read assembly, but it provides an unparalleled gain in the continuity and quality of the genome assembly
- Based on this study, these methods are expected to allow de novo assembly of many vertebrate species without compromising quality!
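Taking the slide's two figures at face value, the implied cost of a comparable short-read-only assembly is roughly:

```latex
C_{\text{short-read}} \approx \frac{C_{\text{this study}}}{3} = \frac{\$100{,}000}{3} \approx \$33{,}000
```

So the extra spend of roughly $67,000 buys the gain in continuity and quality described above.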

19 Questions?

20 References
Bickhart, Derek M., et al. "Single-Molecule Sequencing and Chromatin Conformation Capture Enable De Novo Reference Assembly of the Domestic Goat Genome." Nature Genetics, 6 Mar. 2017.
