Cotton Marker Database (CMD) for Genetic and Genome Research (www.cottonmarker.org)

Presentation transcript:

Cotton Marker Database (CMD) for Genetic and Genome Research
www.cottonmarker.org
Anna Blenda, Pengfei Xuan, David Camak, Feng Luo, Don Jones (presenting author)
ICGI-2010, Canberra, Australia

CMD Objectives
- Collect and integrate all public cotton molecular markers (SSRs and SNPs) as a cotton community resource.
- Accelerate utilization of molecular markers in cotton breeding.
- Provide data retrieval and search tools.
- Provide stand-alone data mining tools.
- Facilitate collaboration domestically and internationally.
The CMD database resource is available for deposit and data mining of cotton genome sequencing data.

CMD New Features
1. Primer Redundancy
2. Traits
3. Updated CMap Viewer with QTLs
4. New System Platform (powerful computing)
5. Future/Work in Progress

1. Primer Redundancy
View section: SSR Projects; SNP Projects; SSRs; Markers Homology; SSR Primers; Primer Redundancy; Panel; Publications; Maps

Importance of the Primer Redundancy Check:
- Initial step in analyzing redundancy in the CMD cotton SSR collection.
- Avoids generating redundant markers.
- The financial component is critical (money is spent on non-redundant SSR markers only).
- Direct effect on the efficiency of molecular breeding research.

Primer Redundancy Summary Page:
- 18,002 primer sequences analyzed;
- 2,570 (14.2%) redundant primer sequences;
- Types of primer sequence match: forward-forward; reverse-reverse; forward-reverse; reverse-forward.

Threshold value for primer sequence match: 81%
- The threshold value (a sequence match of 81% or higher) was chosen based on threshold value analyses (from 70% to 100% match);
- below an 81% match, primer redundancy increases dramatically.
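The slides do not say how the percent match between two primers is computed, so the following is only a minimal sketch of a threshold-based redundancy check. It assumes an ungapped, position-by-position identity divided by the length of the longer primer; the function names `percent_match` and `redundancy_matches` are hypothetical, and reverse complements are not considered. It compares two primer pairs in the four orientations listed on the summary page and reports those at or above the 81% threshold.

```python
from typing import List, Tuple

PrimerPair = Tuple[str, str]  # (forward primer, reverse primer)


def percent_match(a: str, b: str) -> float:
    """Ungapped identity: matching positions / length of the longer primer.

    This scoring rule is an assumption; CMD's actual method is not stated.
    """
    a, b = a.upper(), b.upper()
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


def redundancy_matches(pair_a: PrimerPair, pair_b: PrimerPair,
                       threshold: float = 81.0) -> List[Tuple[str, float]]:
    """Return (match type, score) for each orientation scoring >= threshold."""
    labels = ("forward", "reverse")
    hits = []
    for i, primer_a in enumerate(pair_a):
        for j, primer_b in enumerate(pair_b):
            score = percent_match(primer_a, primer_b)
            if score >= threshold:
                hits.append((f"{labels[i]}-{labels[j]}", round(score, 1)))
    return hits


# Made-up sequences for illustration only; they are not CMD primers.
pair_a = ("ACGTACGTACGTACGTACGT", "TTTTGGGGCCCCAAAATTTT")
pair_b = ("ACGTACGTACGTACGTACGA", "GGGGGGGGGGGGGGGGGGGG")
print(redundancy_matches(pair_a, pair_b))
```

Lowering `threshold` in this sketch makes more pairs qualify, which is the trade-off the threshold analysis on the slide addresses.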

List of Redundant Sequences

Primer Redundancy Individual Pages

Redundant Primer Info from View/Search SSR Pages

Downloads Page
CMD SSR Primer Redundancy results are available from the Downloads page in Excel format.

Search by Primer Redundancy

2. Traits in Cotton Linked to the Genetically Mapped SSR
David Camak, undergraduate student (Erskine College)
Example of published traits and QTLs associated with traits

Spreadsheet with Annotated Trait Data
Columns: Trait Symbol; Trait Name; Trait Description; QTL/gene Name; QTL/gene?; R2 Value; Trait-linked SSR; Cross; Marker Interval for QTL; QTL Start & Stop Positions; QTL Span; SSR Genetic Position; Linkage Group; Marker Type; Publication Reference
David Camak annotated, in an Excel spreadsheet, the information from current publications on agriculturally important traits in cotton and the mapped cotton SSRs linked with those traits.
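One way to picture a row of this spreadsheet is as a typed record. The sketch below is illustrative only: the field names follow the column headings on the slide, but the types, the defaults, the `TraitAnnotation` class, the `unique_ssrs_per_trait` helper, and the idea of deriving the QTL span from the start and stop positions are all assumptions, not the CMD schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class TraitAnnotation:
    """One spreadsheet row; types and defaults are assumed, not documented."""
    trait_symbol: str
    trait_name: str
    trait_linked_ssr: str
    trait_description: str = ""
    qtl_or_gene_name: str = ""
    is_qtl: Optional[bool] = None          # the "QTL/gene?" column
    r2_value: Optional[float] = None
    cross: str = ""
    marker_interval: str = ""
    qtl_start: Optional[float] = None
    qtl_stop: Optional[float] = None
    ssr_genetic_position: Optional[float] = None
    linkage_group: str = ""
    marker_type: str = ""
    publication_reference: str = ""

    def qtl_span(self) -> Optional[float]:
        """Span derived as stop - start (an assumption about the column)."""
        if self.qtl_start is None or self.qtl_stop is None:
            return None
        return self.qtl_stop - self.qtl_start


def unique_ssrs_per_trait(rows: Iterable[TraitAnnotation]) -> Dict[str, int]:
    """Count distinct trait-linked SSRs for each trait name."""
    ssrs = defaultdict(set)
    for row in rows:
        ssrs[row.trait_name].add(row.trait_linked_ssr)
    return {trait: len(names) for trait, names in ssrs.items()}


# Placeholder rows: the SSR names and positions are invented for illustration.
rows = [
    TraitAnnotation("FS", "Fiber Strength", "SSR_A", qtl_start=10.0, qtl_stop=25.5),
    TraitAnnotation("FS", "Fiber Strength", "SSR_A"),
    TraitAnnotation("FS", "Fiber Strength", "SSR_B"),
    TraitAnnotation("LP", "Lint Percentage", "SSR_A"),
]
print(unique_ssrs_per_trait(rows))
```

Counting distinct SSRs per trait, as the helper does, is one plausible way to arrive at per-trait tallies like those reported in the results slides.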

Results
- Twenty-nine agriculturally important traits were analyzed overall.
- The total number of SSR markers associated with those traits was 142.
- The total number of crosses/genetic maps analyzed was 15.
These are the initial results of David Camak's undergraduate research project. The annotation of traits is being continued.

Agriculturally Important Traits Annotated Boll Size Boll Weight Bolls per Plant Color Components Yellowness Fiber Span Length (2.5%, 50%) Fiber Elongation Fiber Fineness Fiber Maturity Fiber Perimeter Fiber Strength Fiber Micronaire Lint Index Lint Percentage Lint Yield Number of Seed/Boll Reflectance Seed Cotton Yield Seed Index Seed Weight Spiny Bollworm Resistance Short Fiber Index Wall Thickness Weight Fitness Uniformity Index Genic Male Sterility

Number of Cotton SSRs Associated/Linked with the Analyzed Agriculturally Important Cotton Traits

Agriculturally Important Traits (1)
Columns: Agriculturally Important Traits; Number of SSRs Associated with Each Trait
Fiber Elongation 15 Fiber Length Yellowness 14 Fiber Strength 13 Fiber Strength (kNm/kg) 12 Fiber Reflectance Micronaire Boll Size (g) 11 Lint Percentage 10 Short Fiber Index Lint Cotton Yield (kg/ha) 9 Maturity 2.5% Fiber Span Length (mm) 8 Fiber Maturity Seed Cotton Yield (kg/ha) Seed Index (g) Fiber Elongation Percentage 7

Agriculturally Important Traits (2)
Columns: Agriculturally Important Traits; Number of Unique SSRs Associated with Each Trait
Wall Thickness 7 50% Fiber Span Length (mm) 6 Boll Weight Genic Male Sterility Micronaire Reading 5 Weight Fitness Bolls per Plant 4 Fiber Perimeter Fiber Strength (cN/tex) 2 Spiny Bollworm Resistance Fiber Length (mm) 1 Fiber Length Uniformity
*These numbers are continually updated as molecular research and breeding uncover more trait-linked SSRs.

View Traits
Go to the CMD homepage at www.cottonmarker.org and click on Traits.

Results
Search SSRs Listed by Traits; Traits by Published Symbol
Click on any trait.

SSR Linked with Selected Trait
List of SSRs; Choose Trait Based on SSR (click)

Trait Data From Spreadsheet
Trait Information; QTL and Marker Information; Positions for Genetic Mapping; All Relevant Data from Spreadsheet
Click on 1 or 2.

1. Specific Molecular Marker Source Page
Molecular Marker (SSR); Forward and Reverse Primer Sequences; Other Useful Information Related to Specific SSR

2. Trait Search Page
Simple search for agriculturally important traits in cotton. The search feature is available on any page, including the homepage.

3. Updated CMap Viewer with QTLs
26 cotton genetic maps are available to view and compare in the CMap Viewer; QTL information was added.
Consensus map: BC1-RIL: ("Guazuncho2" (G. hirsutum) x "VH8-4602" (G. barbadense)) 2009
Reference map: Comprehensive Reference Map (CRM) CottonDB 2010
BC1: (G. hirsutum "Emian22" x G. barbadense "Pima3-79") x "Emian22" 2008
BC1: Hai-7124 x Junmian-1 2007
BC1: [(G. hirsutum "TM-1" x G. barbadense "Hai7124") x "TM-1"] 2007
BC1: [(G. hirsutum "TM-1" x G. barbadense "Hai7124") x "TM-1"] 2006
BC1: (("Guazuncho2" (G. hirsutum) x "VH8-4602" (G. barbadense)) x "Guazuncho2") 2004
BC2: (("Guazuncho2" (G. hirsutum) x "VH8-4602" (G. barbadense)) x "Guazuncho2") 2005
DH: Vgs x (TM-1 x Hai-7124) 2005
F2: Deltapine x Giza-83 2008
F2: Deltapine-61 x Texas-701 2007
F2: G. arboreum ("Jianglingzhongmian" x "Zhejiangxiaoshanlushu") 2008
F2: Xinluzao-1 x Hai-7124 2008
F2: G. hirsutum "CRI36" x G. barbadense "Hai7124" 2007
F2: G. hirsutum "Handan208" x G. barbadense "Pima90" 2007
F2: G. hirsutum "Handan208" x G. barbadense "Pima90" 2005
F2: Hai-7124 x Junmian-1 2007
F2: G. hirsutum race "Palmeri" x G. barbadense Acc. "K101" 2007
F2: G. hirsutum race "Palmeri" x G. barbadense Acc. "K101" 2004
F2: TM-1 x WT-936 2005
F2: Yumian-1 x T586 2005
F2: Acala-44 x Pima S-7 2004
RIL: 7235 x TM-1 2007
RIL: Zhongmiansuo-12 x 8891 2007
RIL: "TM-1" (G. hirsutum (AD1)) x "3-79" (G. barbadense (AD2)) 2006
RIL: "TM-1" (G. hirsutum (AD1)) x "3-79" (G. barbadense (AD2)) 2005
4WC: (Simian-3 x Sumian-12) x (Zhong-4133 x 8891) 2008

4. New Virtualization/HPC System Platform
- CMD was moved to virtual machines for high-performance computing (HPC). The live and development websites are hosted on virtual machines based on virtualization technology (ready for cloud computing).
- Computing jobs submitted by users are transferred to the Clemson Palmetto HPC.
- Very powerful computing resource (more than 5000 computing nodes).
- Daily remote snapshot backup.
[Diagram: Xen Virtual Machine Monitor on the hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE), hosting CentOS 5.4 guest virtual machines: VM0 (device manager and control software), VM1 (cmdweb: CMD web page, CMap, cgi-bin, CMBioTools), VM2 (databases: MySQL, PostgreSQL), VM3 (gmod); linked via ssh to the Palmetto HPC cluster, with a remote daily backup snapshot.]

Future
- 100 SSRs from Siva Kumpatla (Dow Agro): a collaborative project with Texas A&M; SSRs mapped on the TM-1 x 3-79 map;
- 200 SSRs from Ramesh Kantety;
- Updating of the mapped SSR data is in progress;
- More SNP data is coming;
- Annotation of traits/genes that are mapped and linked to SSRs/SNPs is in progress.

Future (cont.)
3 pipelines were designed (Pengfei Xuan):
1. Eukaryotic Automated Structural Annotation Pipeline
2. Transposable elements de novo
3. Transposable elements annotation

1) Eukaryotic Automated Structural Annotation Pipeline (work in progress)
[Flowchart in three phases (Phase 1, Phase 2, Phase 3); labels: Genome Sequence; Repeat Masker; Preliminary gene finding; Gene Finder; Manually build gene models (200 genes); Use gene models as training set; Gene Finders; EST Database (PASA); Database Comparisons; Consensus prediction; EST-based refinement (PASA); Manual check; Finalize best annotation.]
- Aimed at identifying the vast majority of genes;
- raw sequences are run through a series of programs and scripts (a "pipeline") in an automated way;
- generates a basic working gene set as a starting point for further work.
The pipeline was initially designed by Pengfei for CUGI, but we are planning to implement it in CMD. It will be very useful once genome sequences are available.

2) Transposable Elements Computational Identification (work in progress)
This pipeline searches the genome sequences for TEs and creates a library file of TEs for a genome of interest. It starts by comparing the genome with itself using BLASTER. It then clusters the matches with GROUPER, RECON and PILER, clustering programs specific to interspersed repeats. For each cluster, it builds a multiple alignment from which a consensus sequence is derived. Finally, these consensus sequences are classified according to TE features and redundancy is removed. The result is a library of classified, non-redundant consensus sequences.
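The step "builds a multiple alignment from which a consensus sequence is derived" can be illustrated with a simple majority vote per alignment column. This is only a sketch of the general idea, assuming pre-aligned sequences of equal length with "-" as the gap character; the `consensus` function is hypothetical and is not the method the pipeline's tools actually use.

```python
from collections import Counter
from typing import Sequence


def consensus(aligned: Sequence[str], gap: str = "-") -> str:
    """Majority-vote consensus of equal-length aligned sequences.

    Columns where the gap character is the most common symbol are dropped.
    Ties are resolved by Counter ordering (first symbol seen in the column).
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*aligned):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            result.append(symbol)
    return "".join(result)


# Toy alignment for illustration; not real TE copies.
copies = ["ACGT-A", "ACGTTA", "ACCT-A"]
print(consensus(copies))
```

In this toy cluster, the fifth column is mostly gaps and is left out of the consensus, while the single mismatch in the third column is outvoted.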

3) Transposable Elements Annotation (work in progress)
TEannot: this pipeline mines a genome with a library of TE sequences, for instance the one produced by the TEdenovo pipeline, using BLASTER, RepeatMasker and CENSOR. Identified TEs are filtered and annotated: an empirical statistical filter is applied to discard false-positive matches. Short simple repeats (SSRs) are annotated along the way with TRF, RepeatMasker and MREPS. The pipeline then uses MATCHER to chain, via dynamic programming, TE fragments belonging to the same disrupted copy. A "long join" procedure is subsequently applied to connect distant fragments. Finally, annotations are exported into GFF3 and gameXML files.
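For readers unfamiliar with the GFF3 export format mentioned above, the sketch below writes one annotation as a GFF3 feature line: nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes) with 1-based inclusive coordinates. The `gff3_line` helper and the example values (sequence name, source label, IDs) are hypothetical and are not output of the pipeline.

```python
from typing import Dict, Optional
from urllib.parse import quote


def gff3_line(seqid: str, source: str, feature_type: str,
              start: int, end: int, strand: str,
              attributes: Dict[str, str],
              score: Optional[float] = None) -> str:
    """Format one GFF3 feature line (coordinates are 1-based, inclusive)."""
    if start < 1 or end < start:
        raise ValueError("GFF3 requires 1 <= start <= end")
    if strand not in {"+", "-", ".", "?"}:
        raise ValueError("strand must be one of '+', '-', '.', '?'")
    # Percent-encode reserved characters (such as ';', '=', ',', '&', tab)
    # in attribute values; spaces are left as-is.
    attr_text = ";".join(
        f"{key}={quote(str(value), safe=' ')}"
        for key, value in attributes.items()
    )
    columns = [
        seqid, source, feature_type, str(start), str(end),
        "." if score is None else str(score),
        strand,
        ".",  # phase applies only to CDS features
        attr_text,
    ]
    return "\t".join(columns)


# Illustrative values only.
print("##gff-version 3")
print(gff3_line("chr1", "TEannot", "transposable_element", 100, 250, "+",
                {"ID": "te1", "Name": "copy 1"}))
```

Keeping the attributes in a dictionary makes it straightforward to add tags such as `Parent` when fragments of one disrupted copy are chained together.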

CMD TEAM
- Anna Blenda, PI: Research Assistant Professor, Genetics and Biochemistry, Clemson University
- Feng Luo, collaborator: Assistant Professor, School of Computing, Clemson University
- Pengfei Xuan: M.S. student, Computer Science, Clemson University
- David Camak: former member, currently M.S. student, Biology, SELU

Acknowledgements
Cotton Incorporated

www.cottonmarker.org Thank you!