PI & Project Coordinator: Calvin Qualset Project Manager: Patrick McGuire Objectives 1 and 2. EST Production Coordinator: Olin Anderson Objective 5. Bioinformatics.

PI & Project Coordinator: Calvin Qualset
Project Manager: Patrick McGuire
Objectives 1 and 2. EST Production. Coordinator: Olin Anderson
Objective 3. Mapping. Coordinator: Bikram Gill
Objective 4. Functional Genomics. Coordinator: Mark Sorrells
Objective 5. Bioinformatics. Coordinator: Olin Anderson
Objective 6. Genome Structure & Evolution. Coordinator: Jan Dvořák

Figure: Objectives and coordination structure (diagram labels: Objs. 1 & 2, EST Production: cDNA libraries, screening/normalizations, sequencing, data analysis, DNA storage/distribution; Obj. 3, Mapping; Obj. 4, Functional Genomics; Obj. 6, Genome Structure & Evolution; EST arrays; SAGE; sequence matching; deletion mapping; comparative mapping)

b. The second round of global phrap assembly was done on April 1, 2002, on 77,022 Project ESTs (70,074 5' ESTs and 6,948 3' ESTs). They were assembled into 11,758 contigs.
c. ESTs selected from each contig, together with the unassembled ESTs, form the resource pool for singleton selection. Altogether, about 32,000 ESTs are in this pool for further screening.
d. ESTs containing sequences that match retroelements or E. coli, phage, mitochondrial, or chloroplast gene sequences are removed. Sequence comparison is done with the cross_match program.
e. Validation: remaining redundant ESTs are screened out by comparing their 3' sequences with the 3' sequences of previously identified singletons. The resulting singletons were rearrayed for probe distribution.

2. Mapping results
As of Aug. 27, 8,789 probes had been sent to the 10 mapping labs, and mapping data had been returned for 46% of the distributed probes. At Albany, the mapping data are processed to display the mapped probes by chromosome bin position, with bins defined by the deletion-line breakpoints. For each homoeologous chromosome group, the Project has assigned a coordinator from among the investigators to review and validate the probe-location assignments.
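Assigning a probe fragment to a bin from deletion-line scores can be sketched as below. This is a minimal illustration, not the Project's scoring software: it assumes simple terminal deletions on one chromosome arm, each breakpoint given as the fraction length (FL) of the arm retained, measured from the centromere, and the line names and FL values are hypothetical. A fragment detected in a deletion line must lie in the retained segment; a fragment missing from a line must lie in the deleted segment. The bin is therefore the interval between the largest breakpoint among lines lacking the fragment and the smallest breakpoint among lines carrying it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeletionLine:
    name: str          # hypothetical deletion-line label
    breakpoint: float  # FL of the arm retained: 0.0 = centromere, 1.0 = telomere


def assign_bin(lines, present):
    """Return the (lower, upper] FL interval that must contain a probe fragment.

    `present[name]` is True if the fragment is detected in that deletion line.
    Assumes terminal deletions: a line with breakpoint b keeps (0, b] and lacks (b, 1].
    """
    lower, upper = 0.0, 1.0
    for line in lines:
        if present[line.name]:
            # Fragment detected, so it lies in the retained segment, at or below b.
            upper = min(upper, line.breakpoint)
        else:
            # Fragment missing, so it lies in the deleted segment, above b.
            lower = max(lower, line.breakpoint)
    if lower >= upper:
        raise ValueError("inconsistent scores: no bin fits every deletion line")
    return lower, upper


if __name__ == "__main__":
    arm = [DeletionLine("del-1", 0.30), DeletionLine("del-2", 0.55), DeletionLine("del-3", 0.80)]
    scores = {"del-1": False, "del-2": False, "del-3": True}
    print(assign_bin(arm, scores))  # (0.55, 0.8)
```

A fragment present in every line falls in the most proximal bin, (0, smallest breakpoint]. A probe that detects several fragments, and so several loci, would be scored and binned once per fragment.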
Each probe may identify more than one locus; at this point, the validated probe locations account for 7,985 individual loci mapped to chromosome bins.

4. To determine functional activity of the mapped ESTs relevant to reproductive biology of wheat.
Approach: Initially, the plan was for 10 labs to produce microarrays of the mapped EST singletons and analyze them with respect to function, focusing on five aspects of wheat reproduction. As a result of an NSF mid-term site review, the plan now is to develop a test array, organize and hold a training workshop in microarray construction and analysis for Project personnel, and evaluate microarray production strategies for wheat.
Status: Technology development for using cDNAs in microarray analysis has begun at Albany. All equipment (arrayer and scanner) is in place and operational in the Albany labs of O.D. Anderson and D. Laudencia-Chingcuanco, and printing of a limited number of test arrays is underway. RNA has been prepared to test this initial array and to begin evaluating analysis software options. The training workshop was held in August (see Training). The suitability of long-oligonucleotide arrays for transcriptional analysis in wheat is also being evaluated (PIs Steber, Sorrells, and K.S. Gill). The information sought includes an estimate of the optimal oligo length for representing the wheat ESTs and a comparison of an oligo microarray with a cDNA array.

Objective 5 (bioinformatics). Approach: Develop and enhance means to analyze, interpret, and visualize Project data (data processing, database modifications, and web page maintenance).
Status: Protocols were established for data entry and for linking the EST data to the records for the mapped loci. All the mapping laboratories submit hybridization results through a web-based interface to the central bioinformatics site in Albany, where Perl scripts parse the submitted information and prepare it for database entry.
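The path from a lab's submission to a queryable database can be sketched as follows. This is a hedged illustration only: the Project used Perl scripts and mySQL, whereas this sketch uses Python's standard `csv` and `sqlite3` modules as stand-ins, and the tab-separated submission format, column names, probe IDs, and bin labels are all hypothetical. Each row records one scored locus, so a probe that identifies several loci contributes several rows.

```python
import csv
import io
import sqlite3

# Hypothetical submission: one tab-separated row per scored locus.
SUBMISSION = (
    "probe\tlab\tchromosome\tdeletion_bin\tstatus\n"
    "PROBE-001\tlab_A\t1AS\tbin-2\tUnconfirmed\n"
    "PROBE-001\tlab_A\t1BS\tC-bin\tConfirmed\n"
    "PROBE-002\tlab_B\t3DL\tbin-4\tConfirmed\n"
)

COLUMNS = ("probe", "lab", "chromosome", "deletion_bin", "status")
STATUSES = {"Confirmed", "Unconfirmed"}


def parse_submission(text):
    """Validate a submission and return one tuple per locus, in COLUMNS order."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for line_no, record in enumerate(reader, start=2):  # line 1 is the header
        row = tuple((record.get(c) or "").strip() for c in COLUMNS)
        if row[-1] not in STATUSES:
            raise ValueError(f"line {line_no}: unknown status {row[-1]!r}")
        rows.append(row)
    return rows


def load(conn, rows):
    """Create the (hypothetical) locus table if needed and insert the parsed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locus "
        "(probe TEXT, lab TEXT, chromosome TEXT, deletion_bin TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO locus VALUES (?, ?, ?, ?, ?)", rows)


def query(conn, chromosome=None, status=None, lab=None):
    """Filter loci by any combination of chromosome, verification status, and lab."""
    sql = "SELECT probe, chromosome, deletion_bin, status, lab FROM locus WHERE 1 = 1"
    args = []
    for column, value in (("chromosome", chromosome), ("status", status), ("lab", lab)):
        if value is not None:
            sql += f" AND {column} = ?"  # column names come from the fixed tuple above
            args.append(value)
    return conn.execute(sql + " ORDER BY probe, chromosome", args).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, parse_submission(SUBMISSION))
    for row in query(conn, status="Confirmed"):
        print(row)
```

Rejecting unknown status values at parse time keeps the "Confirmed" versus "Unconfirmed" distinction reliable for any later display that separates validated from preliminary locations.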
All hybridizations were scanned by each submitting lab, formatted to an image template, and submitted to the central database. To date, 3,408 images are online. Data are viewable at the public website (http://wheat.pw.usda.gov/NSF/). An interface was developed for the mapping coordinators to survey results and verify their scoring. Both validated (“Confirmed”) and nonvalidated (“Unconfirmed”) data are presented, along with a disclaimer making clear the preliminary nature of the unconfirmed locations.
Two databases hold the data: one uses the biologically oriented database program ACEDB, and the other uses the relational database program mySQL. Information from ACEDB is available through the webace/AceBrowser interface, which also includes links to EST and contig assembly information; the ACEDB display is familiar to the many Triticeae laboratories that use GrainGenes. The mySQL database has the same links and, in addition, contains specialized constructions for mining the relationships of loci to ESTs and to contig assembly information; this version is built for efficient mining of the archived information. Through it, users have several display options, querying on such criteria as location, verification status, or mapping lab of origin. A user-friendly link from map information to EST data was also created. The databases link the mapped information to other information associated with the EST project and, in some cases, to external resource sites and related projects.
Co-PI Close has contributed to the annotations of the Project cDNA libraries that are available from the Project website. In addition, he and a programmer (Steve Wanamaker) have developed a stand-alone tool for creating contig assemblies of EST data (HarvEST, http://harvest.ucr.edu).
This database integrates all Triticeae EST data, including the wheat and rye ESTs generated by this Project and those from the CUGI Barley EST Project (http://www.genome.clemson.edu/projects/barley/), and allows analysis of the relationships between ESTs assembled into contigs and their cDNA libraries of origin.

5. To process, analyze, and display data accumulated in this project (bioinformatics).

6. To analyze gene density and distribution of mapped ESTs, and thus of genes, in the wheat genomes (genome structure and evolution).
Approach: Analyze densities and distributions of ESTs in deletion maps.
Status: In this past year, the database of mapped ESTs became large enough to allow (1) study of wheat transcriptome structure and evolution and (2) comparison of wheat ESTs with sequence information from other taxa. For (1), one manuscript, directed by co-PI Dvořák and postdoc E. Akhunov, has been submitted and a second is in preparation. For (2), a manuscript directed by co-PI Sorrells is in preparation.
1. An analysis of 3,977 ESTs mapped into chromosome deletion bins found that single-gene loci not subjected to gene duplication, and loci ancestral to duplicated loci, are most frequently found in proximal chromosome regions, whereas multigene loci and loci derived by duplication are most frequently found in distal regions. This distribution correlated with increasing recombination rates from centromere to telomere along chromosome arms. It is suggested that recombination has played a central role in the evolution of wheat transcriptome structure and that microsynteny of the wheat transcriptome is diverging faster where recombination is higher.
2. An analysis of 2,835 ESTs mapped into chromosome deletion bins and segregating populations, compared with the public rice genome sequence data from ordered BAC/PAC clones, revealed strong similarities between the resulting DNA sequence-based comparative map and previously published comparative maps based on RFLPs.
While there appears to be extensive conservation of both gene content and order at the resolution conferred by the physical chromosome deletions in the wheat genome, there has also been an abundance of rearrangements, insertions, deletions, and duplications that may complicate the use of rice as a model for cross-species transfer of information in nonconserved regions.

Bioinformatics personnel: Data Curator Shiaoman Chao, based at Albany, and Bioinformatics Programmer Hugh Edwards, based at Cornell Univ., are supported by collaboration with USDA ARS bioinformatics specialists in Albany (G. Lazo) and at Cornell (D. Matthews).

Objectives, approaches, and status after 36 months (9/1/99–8/31/02)

2. To determine the base-pair sequence of these cDNAs, yielding ESTs.
Approach: In-house, single-site 5' sequencing of approx. 3,000 clones in at least 30 libraries, with 3' sequencing of putative singletons.
Status: Sequencing has been carried out in O. Anderson's lab, Albany, CA. To date, over 90,000 5'-sequenced ESTs have been generated from 41 of the libraries (Table 1). Library quality was evaluated on the basis of (1) the number of empty clones or clones containing only vector or short adapter sequence, (2) the number of clones with ribosomal RNA sequence contamination, and (3) the number of clones in reversed orientation (most of the libraries were made with directionally cloned cDNAs). Library complexity was evaluated from the level of clone redundancy, determined by comparing all the 5' ESTs within each library. ESTs are considered redundant if they are similar to and overlap other ESTs; such ESTs can be grouped and assembled into a contig. Representatives from each contig, together with the ESTs that do not form contigs, are singleton candidates. Libraries with the highest proportions of singleton candidates are considered to be of higher complexity and thus worth extensive sampling.
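Grouping redundant ESTs and picking singleton candidates can be sketched with single-linkage clustering over pairs of ESTs already judged redundant (for example, by a similarity search). This is a simplified stand-in, not phrap assembly or the Project's pipeline; the EST IDs, read lengths, redundant pairs, and the rule of taking the longest read as each contig's representative are all assumptions for illustration.

```python
from collections import defaultdict


def _root(parent, x):
    """Find the cluster root of x, halving the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def singleton_candidates(lengths, redundant_pairs):
    """Return (sorted candidate IDs, number of contigs, number of unassembled ESTs).

    lengths: EST ID -> read length; redundant_pairs: (a, b) pairs judged to overlap.
    """
    parent = {est: est for est in lengths}
    for a, b in redundant_pairs:
        root_a, root_b = _root(parent, a), _root(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters
    clusters = defaultdict(list)
    for est in lengths:
        clusters[_root(parent, est)].append(est)
    contigs = [c for c in clusters.values() if len(c) > 1]
    unassembled = [c[0] for c in clusters.values() if len(c) == 1]
    # Assumed rule: represent each contig by its longest read (ties go to the larger ID).
    representatives = [max(c, key=lambda est: (lengths[est], est)) for c in contigs]
    return sorted(representatives + unassembled), len(contigs), len(unassembled)


if __name__ == "__main__":
    reads = {"e1": 520, "e2": 610, "e3": 480, "e4": 700, "e5": 390, "e6": 655}
    pairs = [("e1", "e2"), ("e2", "e3"), ("e4", "e5")]
    print(singleton_candidates(reads, pairs))  # (['e2', 'e4', 'e6'], 2, 1)
```

Here 3 of the 6 ESTs are singleton candidates; a library with a higher proportion would be judged more complex.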
EST assembly analysis was also carried out among libraries. This analysis indicated that, of the 90,000 ESTs generated so far, about 22,000 are singleton candidates (Table 1). More analysis is underway to characterize and identify unique gene sequences.

1. To produce cDNA libraries from as many tissue and condition combinations as possible.
Approach: Produce multiple cDNA libraries from mRNAs isolated in several labs, with a target of 30 libraries in total.
Status: This work is essentially complete. Fifty cDNA libraries are now available to the Project: 28 were made in T. Close's lab at the University of California, Riverside, eight in H. Nguyen's lab at Texas Tech University, and 14 were contributed from other sources. Tissue sources included spikes sampled at various developmental stages, anther, embryo, endosperm, young seedling, root, crown, and flag leaf and sheath. Tissues were sampled under various treatments, such as drought, cold, salt, and aluminum stress, ABA treatment, and vernalization. Of these libraries, 41 have been used to date for ESTs (Table 1).

Training
In year 2, the B. Gill lab (KSU) held a workshop (Feb. 11–16, 2001) for the postdocs from the 10 mapping labs to ensure standard mapping and data entry protocols. Also in year 2, Project PIs were successful with an NSF REU proposal to support the participation of 13 undergraduates in Project labs. In year 3, a microarray production and analysis workshop was held for 8 Project postdocs and graduate students in the D. Laudencia-Chingcuanco lab (USDA-ARS, Albany) (Aug. 12–16, 2002).

The Project's goal is to generate and map a large number of unique DNA sequences from the bread wheat genomes. The assumption is that these unique DNA sequences correspond to individual genes of wheat and that their identification is a first step in determining gene function.
The ultimate use of this information is the improvement of wheat quality, yield, and adaptability to new and marginal environments, thus increasing production. Because of the large size of the wheat genomes, it is unlikely that their complete base-pair sequence will be determined in the near future. This Project takes an alternative strategy to realize the benefits of new techniques for discovering genes and learning their function. Once 10,000 unique wheat DNA sequences (termed ESTs, expressed sequence tags) have been identified, they will be mapped to their physical locations on wheat chromosomes using a set of deletion stocks. The information gathered on the sequence and chromosomal position of these genes is publicly available, distributed through the website created for this Project. The results from this Project will be immediately applicable to other crops because of the close relationship of wheat to other species in the Triticeae tribe and to other grass species, especially corn and rice. The diversity of experimental techniques and traits pursued in the individual laboratories collaborating on this Project makes it an ideal training ground for graduate students and postdoctoral scientists. The large pool of well-characterized, mapped unique DNA sequences available in the public domain will be an exceedingly important resource for future Triticeae research and for basic functional genomics research.

The Structure and Function of the Expressed Portion of the Wheat Genomes
Contract Agreement DBI-9975989

3. To map into wheat deletion stocks a set of 10,000 unique ESTs.
Approach: Map EST singletons into bins defined by wheat deletion stocks; the target is 10,000 mapped singletons.
Status:
1. Singleton selection strategy:
a. Processed 5' ESTs were searched against NCBI's nonredundant nucleotide (blastn) and protein (blastx) databases.

Distribution of research investigators by objective
(Obj. 1 cDNA libraries; Obj. 2 cDNA sequencing; Obj. 3 Mapping; Obj. 4 Functional genomics; Obj. 5 Bioinformatics; Obj. 6 Genome structure & evolution)
OD Anderson, UC Davis/ARS: X* X
TJ Close, UC Riverside: X X
HT Nguyen, U Missouri: X X X
BS Gill, Kansas State U: X* X X
ME Sorrells, Cornell U: X X* X
J Dvořák, UC Davis: X X
J Dubcovsky, UC Davis: X X
KS Gill, Wash State U: X X
JP Gustafson, U Mo/ARS: X X
SF Kianian, N Dak State U: X X
JA Anderson, U Minn: X
NLV Lapitan, Colo State U: X
CM Steber, Wash State U/ARS: X
* designates coordinator for the corresponding objective.

Table 1. Sequencing status by library
Name* | Tissue | Within a library: No. ESTs | No. contigs | No. unassembled ESTs | Among all libraries: No. ESTs (unique to library)
TA001E1X | endosperm (Cheyenne) | 2,728 | 417 | 1,125 | 305
TA001E1S | endosperm subtracted (Cheyenne) | 269 | 23 | 218 | 55
TA005E1X | dehydrated seedling | 795 | 82 | 622 | 168
TA006E1X | unstressed shoot | 2,261 | 375 | 1,224 | 433
TA006E2N | unstressed shoot normalized | 1,686 | 33626852 (remaining three values run together in source; split unclear)
TA006E3N | unstressed shoot normalized | 1,672 | 139 | 1,338 | 836
TA007E1X | cold-stressed seedling | 938 | 107 | 696 | 181
TA007E3S | cold-stressed seedling subtracted | 1,555 | 203 | 956 | 816
TA008E1X | etiolated root | 4,017 | 643 | 2,143 | 747
TA008E3N | etiolated root normalized | 4,308 | 963 | 1,739 | 702
TA009XXX | spike (Sumai3) | 10,287 | 1,854 | 4,881 | 3,003
TA012XXX | ABA-treated embryo (Brevor) | 2,207 | 264 | 1,491 | 625
TA015E1X | heat-stressed seedling | 821 | 100 | 567 | 200
TA016E1X | vernalized crown | 2,286 | 283 | 1,555 | 496
TA017E1X | 20 to 45 DAP spike | 1,076 | 127 | 422 | 119
TA018E1X | 5 to 15 DAP spike | 2,860 | 415 | 1,581 | 499
TA019E1X | pre-anthesis spike | 11,194 | 1,754 | 5,201 | 2,766
TA027E1X | drought-stressed leaf (TAM W101) | 905 | 94 | 635 | 231
TA031E1X | heat-stressed flag leaf | 973 | 86 | 710 | 243
TA032E1X | heat-stressed spike | 1,012 | 97 | 716 | 259
TA036E1X | drought-stressed leaf | 641 | 55 | 485 | 165
TA037E1X | salt-stressed sheath | 964 | 123 | 559 | 166
TA038E1X | salt-stressed crown | 943 | 75 | 743 | 207
TA047E1X | root tip | 959 | 125 | 682 | 178
TA048E1X | Al-stressed root tip (BH1146) | 991 | 143 | 646 | 214
TA049E1X | dormant embryo (Brevor) | 2,927 | 438 | 1,519 | 714
TA055E1X | drought-stressed root | 1,023 | 116 | 769 | 345
TA056E1X | Al-stressed root tip | 1,032 | 174 | 657 | 219
TA058E1X | unstressed root at tiller stage | 1,025 | 127 | 770 | 286
TA059E1X | whole grain (Butte) | 3,649 | 624 | 1451509 (last two values run together in source; split unclear)
TA065E1X | salt-stressed root | 2,055 | 288 | 1,385 | 585
TA066E1X | mixed tissue | 1,404 | 211 | 864 | 303
TM011XXX | vegetative apex (acc. DV92) | 3,031 | 432 | 1,906 | 937
TM043E1X | early reproductive apex (acc. DV92) | 2,647 | 382 | 1,516 | 673
TT039E1X | whole plant (Langdon-16) | 1,194 | 123 | 765 | 241
SC010XXX | Al-stressed root tip (Blanco) | 1,198 | 105 | 905 | 457
SC013XXX | control root tip (Blanco) | 778 | 57 | 649 | 319
SC024E1X | anther (Blanco) | 4,631 | 639 | 1,994 | 987
AS040E1X | anther | 2,466 | 330 | 1,408 | 591
AS067E1X | anther | 1,044 | 134 | 695 | 231
Total (9/23/02) | | 91,715 | | | 22,001
*In the Name field, TA indicates Triticum aestivum, TM is T. monococcum, TT is T. turgidum, SC is Secale cereale, and AS is Aegilops speltoides. All of the TA libraries are from the Chinese Spring genotype except where indicated otherwise in parentheses in the Tissue field.
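The within-library counts in Table 1 can be turned into a simple complexity score: the proportion of singleton candidates, counted as one representative per contig plus each unassembled EST, divided by the number of ESTs. This sketch assumes that definition, taken from the Objective 2 status description; the Project's exact complexity measure is not stated, and only four libraries whose Table 1 values are unambiguous are included.

```python
# Within-library counts from Table 1: (No. ESTs, No. contigs, No. unassembled ESTs).
TABLE_1 = {
    "TA001E1X": (2728, 417, 1125),    # endosperm (Cheyenne)
    "TA006E3N": (1672, 139, 1338),    # unstressed shoot, normalized
    "TA009XXX": (10287, 1854, 4881),  # spike (Sumai3)
    "SC024E1X": (4631, 639, 1994),    # anther (Blanco)
}


def candidate_fraction(n_ests, n_contigs, n_unassembled):
    """Fraction of ESTs that are singleton candidates: one per contig plus each unassembled EST."""
    return (n_contigs + n_unassembled) / n_ests


def rank_by_complexity(table):
    """Library names sorted from highest to lowest singleton-candidate fraction."""
    return sorted(table, key=lambda name: candidate_fraction(*table[name]), reverse=True)


if __name__ == "__main__":
    for name in rank_by_complexity(TABLE_1):
        print(f"{name}: {candidate_fraction(*TABLE_1[name]):.3f}")
```

With these four rows the normalized shoot library TA006E3N ranks first (about 0.88), which is consistent with normalization reducing clone redundancy.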
