Gene Discovery by Use of MySQL
  Background – myself
  NsGene – DTU satellite
  Parkinson's Disease (Affymetrix GeneChip)
  Analysis of fetal brain tissue
  Search for new protein families
  MySQL & bioinformatic tools
Background
  Thomas Nordahl Petersen
  Chemist, Ph.D. in protein crystallography, University of Copenhagen
  Computational Scientist, SBI-AT (Hørsholm)
  Prediction of protein structure: secondary structure, fold recognition, homology modeling
  Bioinformatics - gene discovery, NsGene
  Develop novel cell- and gene-based products for the treatment of neurological diseases
ECT Products
  Growth of cells in a capsule matrix
  The therapeutic protein is released directly in the relevant brain area
  Safe delivery across the blood-brain barrier
  ECT for Parkinson's Disease
  The Michael J. Fox Foundation granted US $3 million to support a clinical "proof-of-concept" (May 2004)
Factor Products
  Identification of novel genes by use of bioinformatics
  NBN (GDNF family – potent neuroprotective effects)
  Scanning the human genome or assembled protein sets for different features of interest
A case study
  Search for Parkinson-related gene(s)
  Affymetrix GeneChip experiments
  Fetal brain tissue
Parkinson's Disease
  Degenerative central nervous system (CNS) disorder
Parkinson's Disease
  Loss of dopamine-producing brain cells
Parkinson's Disease
  Dopamine from the substantia nigra activates neurons in the striatum/basal ganglia
  Important for initiation of movement
Cure for Parkinson's Disease?
  Parkinson's disease may be cured provided that new dopamine-producing cells replace the dead ones.
  Dopamine-producing brain cells from aborted fetuses have been transplanted into the brains of Parkinson's patients and in some cases cured the disease.
  Brain tissue from approximately 6 fetuses was needed.
  Major ethical problems!
  The search for a protein drug is the only valid option
Parkinson's Disease
  Dopamine-producing cells
  Dopaminergic neurons can be found in the ventral part of the mesencephalon (VM) from approximately 6 weeks
  No dopaminergic neurons can be found in the neighbouring dorsal part (DM)
  Study dopaminergic differentiation by using GeneChips to compare the expression profiles of VM and DM
Fetal brain tissue
  Midbrain (mesencephalon)
  Vm: + dopamine-producing cells
  Dm: - dopamine-producing cells
  Aborted fetus brain tissue – Karolinska Hospital
  Fetuses aged 6-10 weeks, 2 cases
Midbrain (mesencephalon)
  Vm: + dopamine-producing cells
  Dm: - dopamine-producing cells
  Isolate the two samples (Vm/Dm)
  RNA purification + amplification
  Affymetrix GeneChip analysis
  Dopamine-producing cells at the interface?
GenePublisher (program by Steen Knudsen)
  Scale and normalize the Affymetrix GeneChip experiments
  [Table: columns A1, A2, A2, B1, B2, B2, P-value]
Volcano plot
  [Figure: P-value versus log2 fold change]
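The coordinates of one point in such a plot can be sketched as follows. This is a minimal illustration, not GenePublisher's implementation: the function name, the intensities and the p-value are hypothetical, and putting -log10 of the p-value on the vertical axis is the usual convention, assumed here.

```python
import math


def volcano_point(mean_vm, mean_dm, p_value):
    """Return (log2 fold change, -log10 p-value) for one probe.

    mean_vm / mean_dm are hypothetical normalized mean intensities
    for the Vm and Dm samples; a positive fold change means the
    probe is up-regulated in Vm.
    """
    log2_fold_change = math.log2(mean_vm / mean_dm)
    neg_log10_p = -math.log10(p_value)
    return log2_fold_change, neg_log10_p


# Hypothetical probe: four-fold higher in Vm, p = 0.001
x, y = volcano_point(mean_vm=400.0, mean_dm=100.0, p_value=0.001)
```

Probes far to the right and high up are the ones that are both strongly and significantly up-regulated in Vm.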
Assigning Affymetrix GeneChip probes to a protein sequence
  ~ probes on each of the A/B Affymetrix chips.
  The probes are normally not part of a protein sequence.
  [Diagram: Affymetrix probe; Blast; IPI protein sequence; Blast; inferred; Unigene sequence (cDNA); 5' 3']
Internal database
Signal Peptide prediction
Conclusion – so far
  The most up-regulated genes include several 'known' genes such as the dopamine transporter (a good positive control)
  The most interesting genes are the 'unknowns' that were up-regulated in Vm.
  Further analysis is ongoing.
  Roland JR et al., Exp Neur (2006) Vol 198, 2, "Identification of novel genes regulated in the developing human ventral mesencephalon"
A new growth factor family
  Criteria
    'Unknown' family of protein sequences
    Growth-factor-like (Cys-Cys, SigP)
  Data source
    Assembled protein set/genomic data
  Search criteria are dynamic
    Use of MySQL
MySQL – a relational database system
  Data are stored in tables as a 'black box'
  Data are physically separated from the user
  The query language (SQL) is easy to read and understand
  Complex search queries
  Combine data in different tables/databases
  Results can be obtained in seconds
  Search criteria can be changed
Parsing Blast files (Preparing data for MySQL)
  # Qname Dname Mlen Alen Qlen %a_id %q_id e-value Qfrom Qto Dlen Dfrom Dto
  [Table rows: IPI query accessions against the targets STAU_HUMAN, RASN_HUMAN, RASH_HUMAN, RASK_HUMAN, RASL_HUMAN, ZNT1_MOUSE, CSL2_HUMAN, SFR4_HUMAN, LMA3_MOUSE]
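A parser for this kind of whitespace-separated hit table might look like the sketch below. It is not the script used for the talk: the column names are lower-cased guesses taken from the header line, the example row is invented, and the derived value minus_ln_e is assumed to be the negative natural logarithm of the e-value.

```python
import math

# Column order assumed from the header line of the parsed table.
COLUMNS = ["qname", "dname", "mlen", "alen", "qlen", "pct_a_id",
           "pct_q_id", "evalue", "qfrom", "qto", "dlen", "dfrom", "dto"]
TEXT_COLS = {"qname", "dname"}
FLOAT_COLS = {"pct_a_id", "pct_q_id", "evalue"}


def parse_hit(line):
    """Turn one table line into a dict with typed values."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = {}
    for name, raw in zip(COLUMNS, values):
        if name in TEXT_COLS:
            row[name] = raw
        elif name in FLOAT_COLS:
            row[name] = float(raw)
        else:
            row[name] = int(raw)
    # An e-value of 0 has no logarithm; this sketch stores None for it.
    e = row["evalue"]
    row["minus_ln_e"] = -math.log(e) if e > 0 else None
    return row


def parse_hits(lines):
    """Parse all data lines, skipping blank lines and '#' comment lines."""
    return [parse_hit(ln) for ln in lines
            if ln.strip() and not ln.lstrip().startswith("#")]


# Hypothetical input: the header plus one invented hit.
hits = parse_hits([
    "# Qname Dname Mlen Alen Qlen %a_id %q_id e-value Qfrom Qto Dlen Dfrom Dto",
    "QUERY_1 RASN_HUMAN 150 160 189 93.8 79.4 1e-50 1 160 189 1 160",
])
```

Each resulting dict can then be written to the database with one INSERT per hit.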
Storing data from blast alignments
  Field            Type
  query_db         enum('hs_2_18','hs_2_23','affym','mm_1_11','affym_mouse')
  query_acc        varchar(20)
  target_db        enum('swissp','mm_1_11','sid','sid_mouse')
  target_acc       varchar(20)
  align_len        smallint(6)
  match_len        smallint(6)
  query_len        smallint(6)
  perc_align_len   float(5,1)
  perc_query_len   float(5,1)
  minus_ln_e       float(6,2)
  query_from       smallint(6)
  query_to         smallint(6)
  target_from      smallint(6)
  target_to        smallint(6)
  target_len       int(11)
MySQL example
  SELECT a.query_db, a.query_acc, a.target_db, a.target_acc,
         a.perc_align_len, a.minus_ln_e,
         b.target_db, b.target_acc, c.cleavage_site
  FROM blastdb AS a, blastdb AS b, signalp AS c
  WHERE a.query_db = 'hs_2_23'
    AND a.target_db = 'mm_1_11'
    AND a.target_acc != 'NULL'
    AND b.target_db = 'swissp'
    AND a.query_acc = b.query_acc
    AND b.target_acc = 'NULL'
    AND c.query_db = 'hs_2_23'
    AND c.query_acc = a.query_acc
    AND c.cleavage_site >= 15
    AND c.cleavage_site <= 45;
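To see what this self-join selects, the query can be tried on a few invented rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, keeps only the columns the query touches, and assumes that a missing hit is stored as the literal string 'NULL', as the comparisons in the query suggest. All accessions and values are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE blastdb (query_db TEXT, query_acc TEXT, target_db TEXT,
                      target_acc TEXT, perc_align_len REAL, minus_ln_e REAL);
CREATE TABLE signalp (query_db TEXT, query_acc TEXT, cleavage_site INTEGER);
""")

# One best-hit row per query and target database (invented data).
blast_rows = [
    ("hs_2_23", "Q1", "mm_1_11", "M1", 95.0, 120.5),   # mouse homologue
    ("hs_2_23", "Q1", "swissp", "NULL", None, None),   # no SwissProt hit
    ("hs_2_23", "Q2", "mm_1_11", "M2", 90.0, 80.0),
    ("hs_2_23", "Q2", "swissp", "RASN_HUMAN", 88.0, 75.0),  # known protein
    ("hs_2_23", "Q3", "mm_1_11", "NULL", None, None),  # no mouse homologue
    ("hs_2_23", "Q3", "swissp", "NULL", None, None),
    ("hs_2_23", "Q4", "mm_1_11", "M4", 70.0, 60.0),
    ("hs_2_23", "Q4", "swissp", "NULL", None, None),
]
signalp_rows = [
    ("hs_2_23", "Q1", 21),
    ("hs_2_23", "Q2", 30),
    ("hs_2_23", "Q3", 25),
    ("hs_2_23", "Q4", 60),  # cleavage site outside the 15-45 window
]
con.executemany("INSERT INTO blastdb VALUES (?, ?, ?, ?, ?, ?)", blast_rows)
con.executemany("INSERT INTO signalp VALUES (?, ?, ?)", signalp_rows)

QUERY = """
SELECT a.query_db, a.query_acc, a.target_db, a.target_acc,
       a.perc_align_len, a.minus_ln_e,
       b.target_db, b.target_acc, c.cleavage_site
FROM blastdb AS a, blastdb AS b, signalp AS c
WHERE a.query_db = 'hs_2_23' AND a.target_db = 'mm_1_11'
  AND a.target_acc != 'NULL'
  AND b.target_db = 'swissp' AND a.query_acc = b.query_acc
  AND b.target_acc = 'NULL'
  AND c.query_db = 'hs_2_23' AND c.query_acc = a.query_acc
  AND c.cleavage_site >= 15 AND c.cleavage_site <= 45
"""
rows = con.execute(QUERY).fetchall()
```

Only Q1 survives: it has a mouse homologue, no SwissProt hit, and a predicted cleavage site between 15 and 45. The table blastdb is joined to itself so that one row per query can be tested against two different target databases.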
Output from MySQL
  query_db  query_acc  target_db  target_acc  perc_align_len  minus_ln_e  target_db  target_acc  cleavage_site
  hs_2_23   IPI        mm_1_11    IPI                                     swissp     NULL        35
  hs_2_23   IPI        mm_1_11    IPI                                     swissp     NULL        26
  hs_2_23   IPI        mm_1_11    IPI                                     swissp     NULL        21
  hs_2_23   IPI        mm_1_11    IPI                                     swissp     NULL        45
  hs_2_23   IPI        mm_1_11    IPI                                     swissp     NULL        30
  hs_2_23   IPI        mm_1_11    IPI                                     swissp     NULL        38
  hs_2_23   IPI        mm_1_11    IPI                                     swissp     NULL        44
  hs_2_23   IPI        mm_1_11    IPI                                     swissp     NULL        44
  hs_2_23   IPI        mm_1_11    IPI                                     swissp     NULL        42
Clustering of protein sequences
  Tribe-MCL: sequences grouped into clusters
  Store in MySQL
    1) Cluster size
    2) Cys-Cys (3)
         ACPGICSKSCCPF
         LTPALCSRTCCPY
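Counting conserved cysteines for a cluster can be sketched as below, using the two example fragments from this slide. The sketch assumes the sequences are already aligned and of equal length; the function name is hypothetical.

```python
def conserved_cys_columns(aligned):
    """Return the 0-based alignment columns where every sequence has a Cys."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length)
            if all(seq[i] == "C" for seq in aligned)]


columns = conserved_cys_columns(["ACPGICSKSCCPF", "LTPALCSRTCCPY"])
# columns == [5, 9, 10], i.e. 3 conserved Cys
```

The first sequence has a fourth Cys at column 1 that the second lacks, so it does not count as conserved.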
Conserved Cys-Cys
  Many growth factor families have their own specific Cys pattern, e.g. the TGF-β family.
  Transforming growth factor-β is a multifunctional peptide that controls proliferation, differentiation and other functions in many cell types.
  Search for Cys patterns without any a priori knowledge
Search criteria
  Family cluster size > 1
  No SwissProt homologues
  Cys count > 4
  Signal peptide
  Mouse homologue/orthologue
  Result: 48 families
  Manual inspection of alignments (isoforms removed)
  Upload remaining sequences to internal database
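The five automatic criteria can be read as one combined filter, sketched below. The record layout, field names and example families are hypothetical, and whether the Cys count refers to conserved or total cysteines is left open here.

```python
from dataclasses import dataclass


@dataclass
class Family:
    name: str
    size: int                  # number of sequences in the cluster
    swissprot_hits: int        # SwissProt homologues found
    cys_count: int
    has_signal_peptide: bool
    has_mouse_homologue: bool


def passes_search_criteria(f):
    """Apply the five automatic criteria from the slide."""
    return (f.size > 1
            and f.swissprot_hits == 0
            and f.cys_count > 4
            and f.has_signal_peptide
            and f.has_mouse_homologue)


# Invented example families.
families = [
    Family("fam_a", 5, 0, 8, True, True),    # passes all criteria
    Family("fam_b", 1, 0, 8, True, True),    # singleton cluster
    Family("fam_c", 3, 2, 6, True, True),    # already known in SwissProt
    Family("fam_d", 2, 0, 4, True, True),    # too few Cys
]
candidates = [f.name for f in families if passes_search_criteria(f)]
```

Because the thresholds are plain comparisons, they are easy to change, which is the point of keeping the search criteria dynamic.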
Internal database
Tissue-specific expression
  [Gel lanes: 100 bp ladder, Universal ref, Whole brain, Heart, Kidney, Liver, Lung, Placenta, Prostate, Salivary gland, Skeletal muscle, Spleen, Testis, 100 bp ladder, Thymus, Thyroid gland, Trachea, Uterus, Colon, Small intestine, Spinal cord, Fetal liver, Fetal brain, Pancreas, Neurosphere, ctrl dH2O, 100 bp ladder]
Outcome from Gene Search
  Family including 5 sequences
  At least 8 Cys
  Predicted as growth factors/hormones
  ~125-140 amino acids
Outcome from Gene Search
  Family including 2 sequences, approx. 30% sequence identity
  11 of 16 Cys are conserved
  Effect on cultured neural cells