Some Jolly Fun with Barley ESTs
David Marshall & All the Folks in Computational Biology

BLAST for Recognition of Undesirable Clones
Summary of 84 Barley Libraries (ver. 0.90). Columns: # and %
»High quality sequences: 282,720
»E. coli genome
»Lambda genome
»rRNA: 6,
»Chloroplast: 2,
»Mitochondrion
»Fungal cDNA
»Repetitive Elements
»Low complexity: 1,
»Odd vector
»Both polyA & polyT
»Total Good: 271,

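One way such a screen could be tallied is sketched below. The slide does not say how the BLAST output was processed, so everything here is an assumption for illustration: the subject IDs in `SUBJECT_CATEGORY`, the identity and length thresholds, and the function names are all hypothetical. The only fixed point is the standard BLAST tabular column order (query ID, subject ID, percent identity, alignment length, ...).

```python
from collections import Counter

# Hypothetical mapping from BLAST subject IDs to the slide's categories.
SUBJECT_CATEGORY = {
    "ecoli_genome": "E. coli genome",
    "lambda_genome": "Lambda genome",
    "rRNA_set": "rRNA",
}


def flag_clones(tabular_lines, min_identity=95.0, min_length=100):
    """Map each query ID to the category of its first qualifying hit.

    The thresholds are illustrative assumptions, not values from the slides.
    """
    flagged = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        category = SUBJECT_CATEGORY.get(subject)
        if category is None or query in flagged:
            continue  # unknown subject, or query already flagged
        if identity >= min_identity and length >= min_length:
            flagged[query] = category
    return flagged


def summarize(flagged, total_sequences):
    """Count flagged clones per category plus the remaining good sequences."""
    counts = Counter(flagged.values())
    counts["Total good"] = total_sequences - len(flagged)
    return counts


# Made-up hits in BLAST tabular format (12 tab-separated columns).
hits = [
    "est001\tecoli_genome\t99.2\t450\t3\t0\t1\t450\t1000\t1449\t0.0\t810",
    "est002\trRNA_set\t97.0\t300\t9\t0\t5\t304\t20\t319\t1e-150\t500",
    "est003\tlambda_genome\t88.0\t60\t7\t0\t1\t60\t500\t559\t1e-10\t70",
]
flagged = flag_clones(hits)
summary = summarize(flagged, total_sequences=3)
print(dict(summary))
```

In this toy input the third EST stays in the good set because its hit is too short and too divergent under the assumed thresholds.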
Unigenes in ESTs in Current Assembly
Ideally: one “unigene” per gene in the genome, expecting ~50,000 based on rice.
Maximum unigene count in ESTs: the sum of the number of contigs and singletons following assembly:
»Contigs: 24,208
»Singletons: 24,899
»Total: 49,107
Minimum unigene count in ESTs: the sum of the number of contigs and singletons that have good 3’ ends:
»Contigs: 14,589
»Singletons: 7,219
»Total: 21,880

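The arithmetic behind these bounds can be reproduced in a few lines. This is an illustrative sketch only: the names are made up here, and 50,000 is the rough rice-based expectation quoted on the slide. Note that the listed minimum-case contigs and singletons sum to 21,808, whereas the slide gives 21,880 as the total.

```python
EXPECTED_GENES = 50_000  # rough expectation based on rice, as quoted on the slide


def unigene_count(contigs, singletons):
    """Unigene estimate: contigs plus singletons from one assembly."""
    return contigs + singletons


# Upper bound: every contig and singleton counted as a distinct gene.
maximum = unigene_count(contigs=24_208, singletons=24_899)

# Lower bound: only contigs and singletons with good 3' ends.
minimum = unigene_count(contigs=14_589, singletons=7_219)

print(maximum)  # 49107
print(minimum)  # 21808 (the slide lists 21,880)

# Share of the expected gene set covered under each bound.
print(f"{minimum / EXPECTED_GENES:.0%} to {maximum / EXPECTED_GENES:.0%}")
```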
The Immediate Objective
[Figure labels: Microarray Chip; Gene Expression Data]

Barley 2H Caleosins
[Figure labels: Hvcal1, Hvcal2; Barley 2H (Steptoe x Morex); Rice R4 Gene Map; Oscal1, Oscal2; BAC OSJB; cM, 0cM, 77cM; EST alignment]

TIGR Rice Caleosin Gene Models
[Figure labels: OSCal01 (R4); OSCal03 (R3); OSCal02 (R4)]

Comparison of Gene Structures of Barley and Rice Caleosins
Homology of Wheat G3 Deletion line mapped ESTs to Rice Chromosomes
General Conclusions
EST sequence
»May lack polyA
»Reading frame may be ambiguous
»Exon/intron boundaries may not be obvious
We don’t have all barley genes despite >330,000 ESTs (probably between 33% and 50%).
Value of comparative studies with rice, BUT poor annotation (actually appalling).
Rice genomic sequencing is a work in progress.
Comparative route is OK but can’t be the only game in town.
Several examples of genes not being there!

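The point above about ambiguous reading frames can be illustrated with a small sketch. It simply checks which of the six possible frames of a fragment contain no in-frame stop codon; the function names and the toy sequences are assumptions made for this example, not anything taken from the barley data.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stop_free_frames(seq):
    """Return labels (+1..+3, -1..-3) of frames with no in-frame stop codon."""
    seq = seq.upper()
    frames = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Walk complete codons only, starting at this frame's offset.
            codons = [
                strand_seq[i:i + 3]
                for i in range(offset, len(strand_seq) - 2, 3)
            ]
            if not any(codon in STOP_CODONS for codon in codons):
                frames.append(f"{strand}{offset + 1}")
    return frames


# A short fragment can be open in every frame, so frame choice is ambiguous.
print(stop_free_frames("ATGGCCGCCGGC"))

# An in-frame stop rules out only the frame it falls in.
print(stop_free_frames("ATGTAAGCC"))
```

With fragments this short, several frames typically remain open, which is why a stop-codon check alone rarely settles the frame for a partial EST.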
Major Issues
Data validation
»Errors in public database sequence
»Errors in annotation
»‘Chinese whispers’ – anchoring annotation in biochemistry
Comparative Data
»Rice > wheat > maize – but also Arabidopsis
»When is homology actually orthology?
»Partial data sets
»% match is only part of the story
»Need for domain/feature information – mammalian/bacterial bias
»Is everything a work in progress?
Where are the data sources?
»dbEST
»Nr nucleotide database at NCBI
»Gramene at CSHL
»TIGR
»GrainGenes/wEST at USDA, Albany
»CUGI > AGI
»Iowa State/USDA
»Harvest/Foxpro
»ContEST at SCRI
»The horse’s mouth

Phenotype Sequence
Sd1 – green revolution gene in rice.
»Mutation in gibberellin 20-oxidase (plant hormone production pathway)
»One member of a small gene family
»Other members have subtly different patterns of expression and are able to partially compensate for the mutation
Rht1 – green revolution gene in wheat.
»Mutation in receptor response pathway
»Copies in all 3 wheat genomes
Barley – commercially significant dwarfs arise from both of these and from several other pathway or response genes.

Acknowledgements
Robbie Waugh, Peter Hedley, David Caldwell, Luke Ramsay, Hui Liu, Linda Cardle, Paul Shaw, Arnise Druker, Doreen Ware, Dave Mathews, Tim Close, Olin Anderson