Cotton Marker Database (CMD) for Genetic and Genome Research
www.cottonmarker.org


1 Cotton Marker Database (CMD) for Genetic and Genome Research
Anna Blenda, Pengfei Xuan, David Camak, Feng Luo, Don Jones (presenting author)
ICGI-2010, Canberra, Australia

2 CMD Objectives
- Collect and integrate all public cotton molecular markers (SSRs and SNPs) as a cotton community resource.
- Accelerate the use of molecular markers in cotton breeding.
- Provide data retrieval and search tools.
- Provide stand-alone data mining tools.
- Facilitate domestic and international collaboration.
The CMD database resource is available for deposit and data mining of cotton genome sequencing data.

3 CMD New Features
1. Primer Redundancy
2. Traits
3. Updated CMap Viewer with QTLs
4. New System Platform - powerful computing
5. Future/Work in Progress

4 1. Primer Redundancy
View section: SSR Projects; SNP Projects; SSRs; Markers Homology; SSR Primers; Primer Redundancy; Panel; Publications; Maps

5 Importance of the Primer Redundancy Check:
- Initial step in analyzing the redundancy of the CMD cotton SSR collection.
- Avoids generating redundant markers.
- The financial component is critical (money is spent on non-redundant SSR markers only).
- Direct effect on the efficiency of molecular breeding research.

6 Primer Redundancy Summary Page:
- 18,002 primer sequences analyzed;
- 2,570 (14.2%) redundant primer sequences;
- Types of primer sequence match: forward-forward; reverse-reverse; forward-reverse; reverse-forward.

7 Threshold value for primer sequence match: 81%
- The threshold value (a sequence match of 81% or higher) was chosen based on threshold value analyses covering matches from 70% to 100%;
- below an 81% match, primer redundancy increases dramatically.
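As a rough illustration of the redundancy check, the sketch below compares every pair of markers in the four orientations listed on the summary page and reports pairs at or above the 81% threshold. The slides do not say how the percent match was computed, so the position-by-position identity used here, the function names, and the sample primers are all assumptions made for illustration, not the CMD implementation.

```python
from itertools import combinations


def percent_match(a: str, b: str) -> float:
    """Percent of identical positions, relative to the longer primer.

    Assumed metric: the slides do not define how 'sequence match' is measured.
    """
    a, b = a.upper(), b.upper()
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / longest


def redundant_pairs(primers, threshold=81.0):
    """Return (marker1, marker2, match_type, percent) for pairs >= threshold.

    primers maps a marker name to a (forward, reverse) primer tuple.
    """
    hits = []
    for (m1, (f1, r1)), (m2, (f2, r2)) in combinations(sorted(primers.items()), 2):
        comparisons = (
            ("forward-forward", f1, f2),
            ("reverse-reverse", r1, r2),
            ("forward-reverse", f1, r2),
            ("reverse-forward", r1, f2),
        )
        for match_type, s1, s2 in comparisons:
            pct = percent_match(s1, s2)
            if pct >= threshold:
                hits.append((m1, m2, match_type, round(pct, 1)))
    return hits


# Hypothetical primers, invented for this example only.
example = {
    "SSR_A": ("ACGTACGTACGTACGTACGT", "TTTTGGGGCCCCAAAATTTT"),
    "SSR_B": ("ACGTACGTACGTACGTACGA", "GGGGGGGGGGAAAAAAAAAA"),
}
print(redundant_pairs(example))
```

Re-running `redundant_pairs` with thresholds between 70 and 100 would mimic the kind of threshold analysis described on the slide, under the same assumed metric.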

8 List of Redundant Sequences

9 Primer Redundancy Individual Pages

10 Redundant primer Info from View/Search SSR pages

11 Downloads Page
CMD SSR Primer Redundancy results are available from the Downloads page:
- Excel format

12 Search by Primer Redundancy

13 2. Traits in Cotton Linked to the Genetically Mapped SSRs
David Camak, undergraduate student (Erskine College)
Example of published traits and QTLs associated with traits

14 Spreadsheet with Annotated Trait Data
Columns: Trait Symbol; Trait Name; Trait Description; QTL/gene Name; QTL/gene?; Marker Interval for QTL; R2 Value; Trait-linked SSR; Marker Type; Cross; Linkage Group; SSR Genetic Position; QTL Start & Stop Positions; QTL Span; Publication Reference
David Camak annotated into the Excel spreadsheet the information from current publications on agriculturally important traits in cotton and on the mapped cotton SSRs linked with those traits.

15 Results
- Twenty-nine agriculturally important traits were analyzed overall.
- The total number of SSR markers associated with those traits was 142.
- The total number of crosses/genetic maps analyzed was 15.
These are the initial results of David Camak's undergraduate research project. The annotation of traits is continuing.
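One way to picture a row of the annotated spreadsheet, and how summary counts like those above could be derived from it, is the sketch below. The field names are adapted from the column headers, centimorgan units are assumed, and the records are invented placeholders rather than CMD data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraitRecord:
    """One annotated row; field names are adapted from the spreadsheet columns."""
    trait_symbol: str
    trait_name: str
    qtl_name: str
    ssr: str
    cross: str
    linkage_group: str
    qtl_start: float  # assumed to be a genetic position in cM
    qtl_stop: float   # assumed to be a genetic position in cM
    reference: str

    @property
    def qtl_span(self) -> float:
        """QTL span as the distance between stop and start positions."""
        return self.qtl_stop - self.qtl_start


def summarize(records):
    """Count distinct traits, SSR markers and crosses in a set of records."""
    return {
        "traits": len({r.trait_name for r in records}),
        "ssrs": len({r.ssr for r in records}),
        "crosses": len({r.cross for r in records}),
    }


# Placeholder rows, invented for illustration only.
rows = [
    TraitRecord("FS", "Fiber Strength", "qFS-1", "SSR_0001", "Cross A", "LG1", 10.0, 12.5, "Ref 1"),
    TraitRecord("FS", "Fiber Strength", "qFS-2", "SSR_0002", "Cross B", "LG2", 30.0, 34.0, "Ref 2"),
    TraitRecord("FE", "Fiber Elongation", "qFE-1", "SSR_0002", "Cross B", "LG2", 31.0, 33.0, "Ref 2"),
]
print(summarize(rows))
print(rows[0].qtl_span)
```

Counting distinct values rather than rows matters here because a single SSR can be linked to more than one trait, as the second and third placeholder rows show.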

16 Agriculturally Important Traits Annotated
Boll Size Boll Weight Bolls per Plant Color Components Yellowness Fiber Span Length (2.5%, 50%) Fiber Elongation Fiber Fineness Fiber Maturity Fiber Perimeter Fiber Strength Fiber Micronaire Lint Index Lint Percentage Lint Yield Number of Seed/Boll Reflectance Seed Cotton Yield Seed Index Seed Weight Spiny Bollworm Resistance Short Fiber Index Wall Thickness Weight Fitness Uniformity Index Genic Male Sterility

17 Number of Cotton SSRs Associated/Linked with the Analyzed Agriculturally Important Cotton Traits

18 Agriculturally Important Traits (1)
Columns: Agriculturally Important Traits; Number of SSRs Associated with Each Trait
Fiber Elongation 15 Fiber Length Yellowness 14 Fiber Strength 13 Fiber Strength (kNm/kg) 12 Fiber Reflectance Micronaire Boll Size (g) 11 Lint Percentage 10 Short Fiber Index Lint Cotton Yield (kg/ha) 9 Maturity 2.5% Fiber Span Length (mm) 8 Fiber Maturity Seed Cotton Yield (kg/ha) Seed Index (g) Fiber Elongation Percentage 7

19 Agriculturally Important Traits (2)
Columns: Agriculturally Important Traits; Number of Unique SSRs Associated with Each Trait
Wall Thickness 7 50% Fiber Span Length (mm) 6 Boll Weight Genic Male Sterility Micronaire Reading 5 Weight Fitness Bolls per Plant 4 Fiber Perimeter Fiber Strength (cN/tex) 2 Spiny Bollworm Resistance Fiber Length (mm) 1 Fiber Length Uniformity
*These numbers are continually updated as molecular research and breeding uncover more trait-linked SSRs.

20 View Traits
Go to CMD and click on Traits.

21 Search Results
- SSRs listed by traits
- Traits by published symbol
Click on any trait.

22 SSRs Linked with Selected Trait
- List of SSRs
- Choose trait based on SSR (click)

23 Trait Data From Spreadsheet
- (1) Trait information
- (2) QTL and marker information; positions for genetic mapping
All relevant data from the spreadsheet are shown. Click on 1 or 2.

24 (1) Specific Molecular Marker Source Page
- Molecular marker (SSR)
- Forward and reverse primer sequences
- Other useful information related to the specific SSR

25 (2) Trait Search Page
- Simple search for agriculturally important traits in cotton
- Search feature available on any page, including the homepage

26 3. Updated CMap Viewer with QTLs
26 cotton genetic maps are available to view and compare in the CMap Viewer; QTL information was added.
- Consensus map: BC1-RIL: ("Guazuncho2" (G. hirsutum) x "VH8-4602" (G. barbadense)) 2009
- Reference map: Comprehensive Reference Map (CRM) CottonDB 2010
- BC1: (G. hirsutum "Emian22" x G. barbadense "Pima3-79") x "Emian22" 2008
- BC1: Hai-7124 x Junmian
- BC1: [(G. hirsutum "TM-1" x G. barbadense "Hai7124") x "TM-1"] 2007
- BC1: [(G. hirsutum "TM-1" x G. barbadense "Hai7124") x "TM-1"] 2006
- BC1: (("Guazuncho2" (G. hirsutum) x "VH8-4602" (G. barbadense)) x "Guazuncho2") 2004
- BC2: (("Guazuncho2" (G. hirsutum) x "VH8-4602" (G. barbadense)) x "Guazuncho2") 2005
- DH: Vgs x (TM-1 x Hai-7124) 2005
- F2: Deltapine x Giza
- F2: Deltapine-61 x Texas
- F2: G. arboreum ("Jianglingzhongmian" x "Zhejiangxiaoshanlushu") 2008
- F2: Xinluzao-1 x Hai
- F2: G. hirsutum "CRI36" x G. barbadense "Hai7124" 2007
- F2: G. hirsutum "Handan208" x G. barbadense "Pima90" 2007
- F2: G. hirsutum "Handan208" x G. barbadense "Pima90" 2005
- F2: Hai-7124 x Junmian
- F2: G. hirsutum race "Palmeri" x G. barbadense Acc. "K101" 2007
- F2: G. hirsutum race "Palmeri" x G. barbadense Acc. "K101" 2004
- F2: TM-1 x WT
- F2: Yumian-1 x T
- F2: Acala-44 x Pima S
- RIL: 7235 x TM
- RIL: Zhongmiansuo-12 x
- RIL: "TM-1" (G. hirsutum (AD1)) x "3-79" (G. barbadense (AD2)) 2006
- RIL: "TM-1" (G. hirsutum (AD1)) x "3-79" (G. barbadense (AD2)) 2005
- 4WC: (Simian-3 x Sumian-12) x (Zhong-4133 x 8891) 2008

27 4. New Virtualization/HPC System Platform
- CMD was moved to virtual machines for high-performance computing (HPC). The live and development websites are hosted on virtual machines based on virtualization technology (ready for cloud computing).
- Computing jobs submitted by users are transferred to the Clemson Palmetto HPC.
- Very powerful computing resource (more than 5000 computing nodes).
- Daily remote snapshot backup.
[Diagram: Xen Virtual Machine Monitor over hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE), with virtual CPU, virtual MMU, event channel, control interface and safe hardware interface. Guest virtual machines run CentOS 5.4: VM0 (device manager and control software), VM1 (cmdweb: CMD web page, CMap, cgi-bin, CMBioTools), VM2 (databases: MySQL, PostgreSQL), VM3 (gmod: unmodified user software). Also shown: ssh access, front-end and back-end device drivers, the SMP Warriors HPC cluster, Palmetto HPC, and the remote daily backup snapshot.]

28 Future
- 100 SSRs from Siva Kumpatla (Dow Agro): a collaborative project with Texas A&M, SSRs mapped on the TM-1 x 3-79 map;
- 200 SSRs from Ramesh Kantety;
- Updating of the mapped SSR data is in progress;
- More SNP data is coming;
- Annotation of traits/genes that are mapped and linked to SSRs/SNPs is in progress.

29 Future (cont.)
3 pipelines were designed (Pengfei Xuan):
1. Eukaryotic Automated Structural Annotation Pipeline
2. Transposable elements de novo
3. Transposable elements annotation

30 1). Eukaryotic Automated Structural Annotation Pipeline
(work in progress)
[Diagram labels, Phases 1 to 3: Genome Sequence; Repeat Masker; Preliminary gene finding; Manually build gene models (200 genes); Use gene models as training set; Gene Finders; Database Comparisons; EST Database (PASA); Consensus prediction; EST based refinement (PASA); Manual check; Finalize best annotation]
- Aimed at identifying the vast majority of genes;
- raw sequences are run through a series of programs and scripts (a "pipeline") in an automated way;
- generates a basic working gene set as a starting point for further work.
The pipeline was initially designed by Pengfei for CUGI, but we plan to implement it in CMD. It will be very handy when genome sequences are available.

31 2). Transposable Elements Computational Identification
(work in progress)
This pipeline searches the genome sequences for TEs and creates a library file of TEs for a genome of interest.
The pipeline starts by comparing the genome with itself using BLASTER. It then clusters the matches with GROUPER, RECON and PILER, clustering programs specific to interspersed repeats. For each cluster, it builds a multiple alignment from which a consensus sequence is derived. Finally, these consensus sequences are classified according to TE features and redundancy is removed. The result is a library of classified, non-redundant consensus sequences.
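The consensus step can be pictured with a small sketch. The slide does not say how the consensus is derived from each multiple alignment, so the column-wise majority rule below, its tie-breaking, and the function name are assumptions for illustration and not the pipeline's actual method.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Derive a consensus from equal-length aligned sequences by majority rule.

    Columns where the gap character wins are dropped from the consensus.
    Ties are broken by taking the alphabetically first symbol, which means
    a gap wins a tie against a base (an arbitrary choice for this sketch).
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(symbol.upper() for symbol in column)
        # sorted() makes tie-breaking deterministic; max() keeps the first maximum
        symbol, _ = max(sorted(counts.items()), key=lambda item: item[1])
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Toy alignment of three copies in one hypothetical cluster.
cluster = ["ACGT-A", "ACGTTA", "ACCT-A"]
print(majority_consensus(cluster))
```

For the toy cluster, the third column resolves to G (two of three copies) and the fifth column is dropped because the gap is in the majority.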

32 3). Transposable Elements Annotation
(work in progress)
TEannot: this pipeline mines a genome with a library of TE sequences, for instance the one produced by the TEdenovo pipeline, using BLASTER, RepeatMasker and CENSOR. The identified TEs are filtered and annotated: an empirical statistical filter is applied to discard false-positive matches. Short simple repeats (SSRs) are annotated along the way with TRF, RepeatMasker and MREPS. The pipeline then uses MATCHER, via dynamic programming, to chain TE fragments belonging to the same disrupted copy. A "long join" procedure is subsequently applied to connect distant fragments. Finally, annotations are exported into GFF3 and gameXML files.
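To make the last two steps more concrete, the sketch below joins nearby fragments of the same TE and writes each result as a GFF3 line. It uses a simple greedy distance rule, which is a stand-in and not the dynamic-programming chaining that MATCHER performs; the `max_gap` value, the `Fragment` type, the `source` label and the ID scheme are all hypothetical. Only the nine tab-separated GFF3 columns follow the GFF3 format itself.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    seqid: str
    start: int   # 1-based, inclusive, as in GFF3
    end: int
    strand: str
    te_name: str


def join_fragments(fragments, max_gap=500):
    """Greedily merge same-TE, same-strand fragments closer than max_gap.

    A simplified stand-in for chaining; not the algorithm used by the pipeline.
    """
    joined = []
    ordered = sorted(fragments, key=lambda f: (f.seqid, f.te_name, f.strand, f.start))
    for frag in ordered:
        last = joined[-1] if joined else None
        if (last is not None
                and (last.seqid, last.te_name, last.strand)
                == (frag.seqid, frag.te_name, frag.strand)
                and frag.start - last.end <= max_gap):
            joined[-1] = last._replace(end=max(last.end, frag.end))
        else:
            joined.append(frag)
    return joined


def to_gff3(frag, source="sketch"):
    """Format one fragment as a GFF3 feature line (nine tab-separated columns)."""
    attributes = f"ID={frag.te_name}_{frag.seqid}_{frag.start};Name={frag.te_name}"
    return "\t".join([
        frag.seqid, source, "transposable_element",
        str(frag.start), str(frag.end), ".", frag.strand, ".", attributes,
    ])


# Hypothetical fragments on one sequence.
hits = [
    Fragment("chr1", 100, 300, "+", "TE1"),
    Fragment("chr1", 450, 700, "+", "TE1"),
    Fragment("chr1", 5000, 5200, "+", "TE1"),
]
print("##gff-version 3")
for feature in join_fragments(hits):
    print(to_gff3(feature))
```

With these inputs the first two fragments are merged into one feature spanning 100 to 700, while the distant third fragment stays separate.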

33 CMD TEAM
- Anna Blenda, PI: Research Assistant Professor, Genetics and Biochemistry, Clemson University
- Feng Luo, collaborator: Assistant Professor, School of Computing, Clemson University
- Pengfei Xuan: M.S. student, Computer Science, Clemson University
- David Camak: former member, currently M.S. student, Biology, SELU

34 Acknowledgements
Cotton Incorporated

35 Thank you!

