1
ProRepeat: a comprehensive directory of exact tandem repeats in proteins
2
PolyQ and neurodegenerative diseases
9 diseases caused by polyQ repeats:
HD
DRPLA
SCA 1, 2, 3, 6, 7, 17
Kennedy’s disease (SBMA)
3
Androgen receptor (AR)
Transcription factor, mediating the effect of androgens on gene expression
Gene located on the X chromosome, divided into three functional regions: one variable, the N-terminal transregulation domain (NTD), and two highly conserved, the DNA-binding domain and the C-terminal ligand-binding domain (LBD)
Involved in differentiation between male and female phenotype
Responsible for SBMA (spinal and bulbar muscular atrophy), or Kennedy’s disease
Céline Poux, RU
4
Androgen receptor (AR)
Polymorphic polyQ repeat in the NTD ranges between 9 and 35 residues, with an average of 20 to 25 depending on ethnic origin
Transcriptional activity depends on the intramolecular interaction between the NTD and the LBD
■ inversely correlated with length and flexible structure of the polyQ tract
Differences in polyQ tract length can have important consequences
■ longer tracts: feminization syndromes
■ shorter tracts: prostate cancer susceptibility
■ repeat exceeds 40 residues: SBMA
NTD can contain other repeat tracts in mammals, such as polyP, polyG or polyQ
Céline Poux, RU
5
Androgen receptor (AR)
[Diagram: AR as a transcription factor, drawn from NH3- to -COOH with Region 1, Region 2 and Region 3 (T1, T2, T3); domain labels: transcriptional regulation, DNA binding, hormone binding]
PolyQ tract: 9-35 residues, average of 20 to 25 depending on ethnic origin
polyQ tract length has important consequences
■ shorter tracts: prostate cancer susceptibility
■ longer tracts: feminization syndromes
■ over 40 residues: SBMA (spinal and bulbar muscular atrophy), or Kennedy’s disease
I will present one of these genes: the androgen receptor gene. We chose this gene because it was one of the two polyQ genes involved in a neurodegenerative disease for which the function of the protein was known. The androgen receptor is a transcription factor that mediates the effect of androgens on gene expression. Its action is important in the differentiation process between male and female phenotype. This protein is encoded by a gene situated on the X chromosome and divided into three main regions: the transregulation domain, encoded by the first exon, the DNA-binding domain and the ligand-binding domain. No deleterious mutation has been recorded in the first exon except in the CAG repeat domain. The length of the repeat seems to modulate the activity of the protein in such a way that the shorter the tract is, the stronger the effect of the protein will be. When the number of repeats increases beyond the normal length, besides feminization problems, individuals have a high chance of developing spinal and bulbar muscular atrophy, or Kennedy’s disease. A few poly-amino-acid repeats were already known in the first exon: the polyQ responsible for the disease, a polyproline, and a polyglycine at the end of the exon. After sequencing more than 30 mammalian species, it turned out that the androgen receptor is a real “slippery protein” with a lot of repeats, often short and occurring in few species, like a polyalanine repeat in the kangaroo.
6
PolyQ in AR: collection of polyQ repeats
792 human individuals available from earlier study (Edwards, 1992)
26 armadillo individuals sequenced by CP
77 mammals and marsupials from protein database
Céline Poux, RU
7
What about repeats in other proteins?
ProRepeat database
Data sources: UniProt and RefSeq
Limited to exact tandem repeats
Standard, linear-time suffix tree algorithm
Stored in Oracle 10g
Interface in PHP5

unit length   repetitions
1             ≥ 5
2             ≥ 4
3             ≥ 3
4 .. N        ≥ 2

Maarten van den Bosch, WUR
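The unit-length thresholds above can be illustrated with a small sketch. It uses a naive left-to-right scan, not the linear-time suffix tree algorithm named on the slide, and the function names and the `max_unit_length` cut-off are assumptions made for illustration only.

```python
def min_repetitions(unit_length):
    """Minimum repetition count per unit length, following the slide's table."""
    return {1: 5, 2: 4, 3: 3}.get(unit_length, 2)


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter unit."""
    return unit not in (unit + unit)[1:-1]


def find_exact_tandem_repeats(seq, max_unit_length=10):
    """Naive scan for exact tandem repeats (illustrative, not a suffix tree).

    Returns (start, unit, repetitions) tuples with 0-based start positions.
    """
    hits = []
    n = len(seq)
    for unit_len in range(1, max_unit_length + 1):
        i = 0
        while i + unit_len <= n:
            unit = seq[i:i + unit_len]
            reps = 1
            # Count how many consecutive copies of the unit follow position i.
            while seq[i + reps * unit_len:i + (reps + 1) * unit_len] == unit:
                reps += 1
            if reps >= min_repetitions(unit_len) and is_primitive(unit):
                hits.append((i, unit, reps))
                i += reps * unit_len  # skip past the reported repeat
            else:
                i += 1
    return hits


print(find_exact_tandem_repeats("MAQQQQQQDEDEDEDEK"))
```

The primitivity check keeps a run such as QQQQQQ from also being reported with unit QQ or QQQ.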
8
Simple query syntax
e.g. “Q” or “DE”
DE is equivalent to ED; DEF is equivalent to EFD and FDE
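One way to treat rotated repeat units as equivalent is to reduce each unit to a canonical rotation before comparing. The sketch below picks the lexicographically smallest rotation; whether ProRepeat normalises queries this way is an assumption, and both function names are hypothetical.

```python
def canonical_unit(unit):
    """Return the lexicographically smallest rotation of a repeat unit."""
    if not unit:
        return unit
    return min(unit[i:] + unit[:i] for i in range(len(unit)))


def same_repeat_unit(a, b):
    """True if two units are rotations of each other (e.g. DE and ED)."""
    return len(a) == len(b) and canonical_unit(a) == canonical_unit(b)


print(canonical_unit("ED"), canonical_unit("EFD"), canonical_unit("FDE"))
```

Note that DFE is not a rotation of DEF, so the two would remain distinct units under this scheme.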
10
Or use ProSite syntax: e.g. “[DE]-{P}-X(0,1).”
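As a rough illustration of what such a query means, the sketch below translates a small subset of PROSITE pattern notation into a Python regular expression: `x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repetition, `-` as separator and a closing period. The function name is hypothetical and this is not ProRepeat's own parser.

```python
import re

# One pattern element: optional '<' anchor, a core, optional (n) or (n,m), optional '>' anchor.
ELEMENT = re.compile(
    r"(<?)(\[[A-Za-z]+\]|\{[A-Za-z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a regex string."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        start, core, low, high, end = m.groups()
        if core in ("x", "X"):
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


regex = prosite_to_regex("[DE]-{P}-X(0,1).")
print(regex)
print(re.search(regex, "AADGK").group())
```

Read this way, the example query asks for D or E, followed by any residue except P, followed by at most one further residue.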
11
Taxonomic distributions of hits
13
Sorting/grouping options
Identifier, Repeat unit, Repetitions, Unit length, Length, Start location, End location, Protein, Taxonomy, Ontology
14
Link to DNA data
DNA coding sequences of available repeats also stored in the database
Extracted from EMBL and/or RefSeq
Hong Luo, WUR
15
Link to DNA data / errors
Approximately 3% of corresponding nucleotide sequences cannot be retrieved
Errors caused by:
No links to nucleotide database (35%)
■ NO_ANNOTATED_CDS
■ No EMBL links
Annotation errors in the nucleotide database (65%)
■ Error type III: join of the complements of exon1, exon2, exon3, etc. instead of the complement of the join
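The difference between the two location forms in the last error type can be shown on a toy sequence. This is one reading of the annotation error, with made-up coordinates and hypothetical helper names; exon spans are 0-based and half-open, and "complement" is taken to mean reverse complement, as for a feature on the opposite strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract(genome, spans):
    """Cut the exon sequences out of the genome, in the listed order."""
    return [genome[start:end] for start, end in spans]


def complement_of_join(genome, spans):
    """Intended form: join the exons first, then reverse-complement the whole."""
    return revcomp("".join(extract(genome, spans)))


def join_of_complements(genome, spans):
    """Erroneous form: reverse-complement each exon, then join in listed order."""
    return "".join(revcomp(exon) for exon in extract(genome, spans))


genome = "AAACCCGGG"        # toy sequence
exons = [(0, 3), (6, 9)]    # exon1 = AAA, exon2 = GGG
print(complement_of_join(genome, exons))   # exon order is reversed as well
print(join_of_complements(genome, exons))  # exon order is kept, so the CDS differs
```

With more than one exon the two forms give different coding sequences, which is why such a record cannot be translated back to the stored protein repeat.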
16
Guido Kappé, RU
17
[Figure labels: T, S, Q, G, P, A, E]
18
Distribution of amino acids in subgroups of peptides within a proteome. Considered are single amino acid repeats (SAA), all repeats minus the SAAs, and all proteins (composition) minus the repeats.
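A minimal sketch of how such a three-way composition could be tallied is given below. The input layout, a list of (sequence, repeats) pairs with repeats as (start, unit, repetitions) tuples, and the group names are assumptions for illustration, not the format used in the actual analysis.

```python
from collections import Counter


def composition_by_subgroup(proteins):
    """Count residues in SAA repeats, other repeats, and non-repeat sequence.

    proteins: list of (sequence, repeats); repeats is a list of
    (start, unit, repetitions) tuples with 0-based start positions.
    """
    counts = {"saa": Counter(), "other_repeats": Counter(), "non_repeat": Counter()}
    for seq, repeats in proteins:
        label = ["non_repeat"] * len(seq)
        for start, unit, reps in repeats:
            group = "saa" if len(unit) == 1 else "other_repeats"
            # If repeats overlap, the one listed last decides the label.
            for pos in range(start, start + len(unit) * reps):
                label[pos] = group
        for residue, group in zip(seq, label):
            counts[group][residue] += 1
    return counts


example = [("MAQQQQQQDEDEDEDEK", [(2, "Q", 6), (8, "DE", 4)])]
for group, counter in composition_by_subgroup(example).items():
    print(group, dict(counter))
```

Dividing each counter by its total would give the per-group amino acid fractions that a distribution plot compares.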
19
For comparison, Arabidopsis, where Ser is the most abundant residue in SAAs.
20
Current work
Annotation of repeats versus function
Adding imperfect tandem repeats, a.k.a. approximate tandem repeats (ATR), to the database
Offering remote access via web services (WSDL and BioMoby)
Expansion of the analysis capabilities of the interface
21
PolyQ in AR (reprise)
Impure tracts (interrupted mainly by CAA, CCG, and CGG) are longer and more variable than pure CAG tracts
Presence of other codons better explained by codon duplication than by multiple point mutations
■ interrupting codons are part of the elongation process, rather than hampering tract dynamics as proposed previously
Negative correlation between lengths of the different CAG tracts
■ maximal expansion length that the protein can handle without being deleterious
Céline Poux, RU
22
Acknowledgements
Wageningen University and Research Centre
Maarten van den Bosch, Hong Luo, Mark Kramer, Harm Nijveen
Radboud University, Nijmegen
Guido Kappé, Céline Poux, Wilfried W. de Jong
This work was supported in part by project grants from NWO/BMI (GK, CP) and the NBIC/BioAssist program (HN)
23
Thank you for your attention!
See also our posters on phylogenetic domain visualisation (TreeDomViewer) and microarray (re)annotation at the ISMB
Post-doc positions available: contact or