Gene Discovery by use of MySQL
Outline: background (myself; NsGene, a DTU satellite); Parkinson's disease (Affymetrix GeneChip analysis of fetal brain tissue); search for new protein families (MySQL and bioinformatic tools).

Background
Thomas Nordahl Petersen: chemist, Ph.D. in protein crystallography, University of Copenhagen.
Computational scientist, SBI-AT (Hørsholm): prediction of protein structure, secondary structure, fold recognition, homology modeling.
Bioinformatics and gene discovery at NsGene, which develops novel cell- and gene-based products for the treatment of neurological diseases.

ECT Products
Cells grow in a capsule matrix, and the therapeutic protein is released directly in the relevant brain area, giving safe delivery across the blood-brain barrier.
ECT for Parkinson's Disease: the Michael J. Fox Foundation granted US $3 million to support a clinical "proof-of-concept" (May 2004).

Factor Products
Identification of novel genes by use of bioinformatics, e.g. NBN (GDNF family, potent neuroprotective effects): scanning the human genome or assembled protein sets for different features of interest.

A case study
Search for Parkinson-related gene(s): Affymetrix GeneChip experiments on fetal brain tissue.

Parkinson's Disease
A degenerative central nervous system (CNS) disorder.

Parkinson's Disease
Loss of dopamine-producing brain cells.

Parkinson's Disease
Dopamine from the substantia nigra activates neurons in the striatum/basal ganglia, which is important for the initiation of movement.

Cure for Parkinson's Disease?
Parkinson's disease may be cured provided that new dopamine-producing cells replace the dead ones. Dopamine-producing brain cells from aborted foetuses have been transplanted into the brains of Parkinson's patients and in some cases cured the disease. Brain tissue from approximately 6 foetuses was needed. Major ethical problems! The search for a protein drug is the only valid option.

Parkinson's Disease: dopamine-producing cells
Dopaminergic neurons can be found in the ventral part of the mesencephalon (VM) from approximately 6 weeks of fetal age; no dopaminergic neurons are found in the neighbouring dorsal part (DM). Dopaminergic differentiation is studied by using GeneChips to compare the expression profiles of VM and DM.

Fetal brain tissue
[Figure: midbrain (mesencephalon) divided into Vm (+ dopamine-producing cells) and Dm (- dopamine-producing cells)]
Aborted foetus brain tissue from Karolinska Hospital; foetuses aged 6-10 weeks, 2 cases.

[Figure: midbrain (mesencephalon), Vm (+ dopamine-producing cells) and Dm (- dopamine-producing cells)]
Isolate the two samples (Vm/Dm), then RNA purification and amplification, then Affymetrix GeneChip analysis. Are dopamine-producing cells present at the interface?

GenePublisher (program by Steen Knudsen)
Scale and normalize the Affymetrix GeneChip experiments.
[Table: chip columns A1, A2, B1, B2 and a P-value column]
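GenePublisher's exact procedure is not described on the slide. As an assumption, the sketch below uses MAS5-style global scaling: each chip is multiplied by one factor so that its trimmed mean intensity (2% trimmed at each end, assumed) reaches a common target (500 is an assumed value). The chip names follow the slide; the intensities are made up.

```python
import statistics


def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.fmean(kept)


def global_scale(chips, target=500.0, trim=0.02):
    """Scale every chip so that its trimmed mean intensity equals `target`.

    chips: dict mapping a chip name (e.g. "A1") to a list of probe-set intensities.
    Returns a new dict with the scaled intensities.
    """
    scaled = {}
    for name, values in chips.items():
        factor = target / trimmed_mean(values, trim)
        scaled[name] = [v * factor for v in values]
    return scaled
```

For example, `global_scale({"A1": [100.0, 200.0, 300.0], "A2": [50.0, 100.0, 150.0]})` brings both chips to the same scale. Global scaling only corrects overall brightness differences between chips. Methods such as quantile normalization also adjust the shape of each chip's distribution.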

Volcano plot
[Figure: volcano plot of P-value against log2 fold change]
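The slide's axes are P-value and log2 fold change. By common convention the P-value is plotted as -log10(P), so significant genes rise towards the top and strongly regulated genes move left or right. The sketch computes one point per probe set from Vm and Dm mean intensities and a precomputed P-value. The 2-fold and P <= 0.01 thresholds are assumptions, not values from the study.

```python
import math


def volcano_point(mean_vm, mean_dm, p_value):
    """Return (log2 fold change Vm/Dm, -log10 P) for one probe set."""
    log2_fc = math.log2(mean_vm / mean_dm)
    return log2_fc, -math.log10(p_value)


def is_hit(point, min_abs_log2_fc=1.0, min_neg_log10_p=2.0):
    """Flag points in the outer corners: |log2 FC| >= 1 (2-fold) and P <= 0.01."""
    log2_fc, neg_log10_p = point
    return abs(log2_fc) >= min_abs_log2_fc and neg_log10_p >= min_neg_log10_p
```

A probe set four times higher in Vm with P = 0.001 lands at (2, 3) and is flagged as a hit. A 5% change is not flagged, however small its P-value.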

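Assigning Affymetrix GeneChip probes to a protein sequence
There are ~ probes on each of the A/B Affymetrix chips. The probes are normally not part of the protein sequence, so each probe is matched by Blast to an inferred UniGene sequence (cDNA), which is in turn matched by Blast to an IPI protein sequence.
[Figure: Affymetrix probe, Blast, inferred UniGene sequence (cDNA, 5' to 3'), Blast, IPI protein sequence]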
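The two Blast steps can be chained as two lookup tables, assuming only the best hit (lowest E-value) of each search is kept. All identifiers in the example are hypothetical placeholders, not real Affymetrix, UniGene or IPI accessions.

```python
def best_hits(hits):
    """Keep the target with the lowest E-value for each query.

    hits: iterable of (query, target, evalue) tuples from one Blast search.
    """
    best = {}
    for query, target, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (target, evalue)
    return {query: target for query, (target, _) in best.items()}


def map_probes_to_proteins(probe_to_unigene, unigene_to_ipi):
    """Chain probe -> UniGene cDNA -> IPI protein; drop probes with no full path."""
    mapping = {}
    for probe, unigene in probe_to_unigene.items():
        ipi = unigene_to_ipi.get(unigene)
        if ipi is not None:
            mapping[probe] = ipi
    return mapping
```

Keeping only the best hit per step is a simplification. In practice a probe may match several transcripts, and some may translate to more than one protein isoform.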

Internal database

Signal Peptide prediction

Conclusion (so far)
The most up-regulated genes include several 'known' genes such as the dopamine transporter (a good positive control). The most interesting genes are the 'unknowns' that were up-regulated in Vm. Further analysis is ongoing.
Roland JR et al., Exp Neurol (2006) Vol 198(2), "Identification of novel genes regulated in the developing human ventral mesencephalon".

A new growth factor family
Criteria: an 'unknown' family of protein sequences that is growth-factor-like (Cys-Cys, SigP).
Data source: assembled protein set/genomic data.
The search criteria are dynamic, hence the use of MySQL.

MySQL: a relational database management system, queried with the SQL language
Data are stored in tables as a 'black box', physically separated from the user. The language is easy to read and understand. Complex search queries can combine data in different tables/databases, results can be obtained in seconds, and the search criteria can be changed.

Parsing Blast files (preparing data for MySQL)
Columns: # Qname Dname Mlen Alen Qlen %a_id %q_id e-value Qfrom Qto Dlen Dfrom Dto
Example hits (Qname = IPI accessions), Dname: STAU_HUMAN, RASN_HUMAN, RASH_HUMAN, RASK_HUMAN, RASL_HUMAN, ZNT1_MOUSE, CSL2_HUMAN, SFR4_HUMAN, LMA3_MOUSE
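A parser for the column layout above might look like this. It assumes one whitespace-separated line per hit in the slide's column order, with '#' marking header lines. The meaning of each column is also assumed. Following the storage table, it adds minus_ln_e = -ln(E-value) and caps E = 0 at 9999.99, the largest value a MySQL float(6,2) column can hold. The cap is an assumed convention.

```python
import math

# Column order as listed on the slide; the exact meaning of each field is assumed.
COLUMNS = ["Qname", "Dname", "Mlen", "Alen", "Qlen", "pct_a_id", "pct_q_id",
           "evalue", "Qfrom", "Qto", "Dlen", "Dfrom", "Dto"]
INT_COLUMNS = {"Mlen", "Alen", "Qlen", "Qfrom", "Qto", "Dlen", "Dfrom", "Dto"}
FLOAT_COLUMNS = {"pct_a_id", "pct_q_id", "evalue"}


def parse_hit_line(line, max_minus_ln_e=9999.99):
    """Turn one summary line into a dict ready to be inserted into MySQL."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = {}
    for name, value in zip(COLUMNS, fields):
        if name in INT_COLUMNS:
            row[name] = int(value)
        elif name in FLOAT_COLUMNS:
            row[name] = float(value)
        else:
            row[name] = value
    # Store -ln(E) so that larger numbers mean better hits; E = 0 is capped.
    e = row["evalue"]
    row["minus_ln_e"] = max_minus_ln_e if e == 0 else round(-math.log(e), 2)
    return row


def parse_hits(lines):
    """Parse all non-empty, non-comment lines of a summary file."""
    return [parse_hit_line(line) for line in lines
            if line.strip() and not line.startswith("#")]
```

Storing -ln(E) instead of E keeps the values within a small fixed-precision column. It also lets queries rank hits with a plain `>=` comparison.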

Storing data from blast alignments
Field           Type
query_db        enum('hs_2_18','hs_2_23','affym','mm_1_11','affym_mouse')
query_acc       varchar(20)
target_db       enum('swissp','mm_1_11','sid','sid_mouse')
target_acc      varchar(20)
align_len       smallint(6)
match_len       smallint(6)
query_len       smallint(6)
perc_align_len  float(5,1)
perc_query_len  float(5,1)
minus_ln_e      float(6,2)
query_from      smallint(6)
query_to        smallint(6)
target_from     smallint(6)
target_to       smallint(6)
target_len      int(11)
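For experimenting without a MySQL server, the table can be approximated in SQLite, which ships with Python. SQLite has no ENUM type, so each enum is emulated with a CHECK constraint. This substitution is a sketch and not the original setup.

```python
import sqlite3

# SQLite stand-in for the MySQL table; ENUM columns become CHECK constraints.
SCHEMA = """
CREATE TABLE blastdb (
    query_db       TEXT CHECK (query_db IN ('hs_2_18','hs_2_23','affym','mm_1_11','affym_mouse')),
    query_acc      VARCHAR(20),
    target_db      TEXT CHECK (target_db IN ('swissp','mm_1_11','sid','sid_mouse')),
    target_acc     VARCHAR(20),
    align_len      SMALLINT,
    match_len      SMALLINT,
    query_len      SMALLINT,
    perc_align_len REAL,
    perc_query_len REAL,
    minus_ln_e     REAL,
    query_from     SMALLINT,
    query_to       SMALLINT,
    target_from    SMALLINT,
    target_to      SMALLINT,
    target_len     INTEGER
)
"""


def create_blast_table(conn):
    """Create the blastdb table in an open SQLite connection."""
    conn.execute(SCHEMA)
```

As with the MySQL enum, a value outside the allowed set (for example `query_db = 'rat'`) is rejected, which in SQLite raises `sqlite3.IntegrityError`. Queries later join the table to itself on query_acc. An index on (query_db, query_acc) would probably help, but that is a suggestion and not part of the slide.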

MySQL example
SELECT a.query_db, a.query_acc, a.target_db, a.target_acc,
       a.perc_align_len, a.minus_ln_e,
       b.target_db, b.target_acc,
       c.cleavage_site
FROM blastdb AS a, blastdb AS b, signalp AS c
WHERE a.query_db = 'hs_2_23'       -- human protein set
  AND a.target_db = 'mm_1_11'      -- with a mouse hit ...
  AND a.target_acc != 'NULL'       -- 'NULL' compared as text: no-hit rows are assumed to store the string 'NULL'
  AND b.target_db = 'swissp'       -- ... but no SwissProt hit
  AND a.query_acc = b.query_acc
  AND b.target_acc = 'NULL'
  AND c.query_db = 'hs_2_23'       -- and a predicted signal peptide
  AND c.query_acc = a.query_acc
  AND c.cleavage_site >= 15
  AND c.cleavage_site <= 45;
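To see what the self-join selects, the same query can be run against an in-memory SQLite database loaded with a few hypothetical toy rows (q1 to q4, m1 to m3 and SP_HIT are placeholders). As in the query, "no hit" is assumed to be stored as the text 'NULL'. With a real SQL NULL the conditions would have to be written as `IS NULL` / `IS NOT NULL`.

```python
import sqlite3

# The slide's query, unchanged apart from layout; SQLite accepts the same syntax.
QUERY = """
SELECT a.query_db, a.query_acc, a.target_db, a.target_acc,
       a.perc_align_len, a.minus_ln_e,
       b.target_db, b.target_acc, c.cleavage_site
FROM blastdb AS a, blastdb AS b, signalp AS c
WHERE a.query_db = 'hs_2_23' AND a.target_db = 'mm_1_11' AND a.target_acc != 'NULL'
  AND b.target_db = 'swissp' AND a.query_acc = b.query_acc AND b.target_acc = 'NULL'
  AND c.query_db = 'hs_2_23' AND c.query_acc = a.query_acc
  AND c.cleavage_site >= 15 AND c.cleavage_site <= 45
"""


def run_demo():
    """Load hypothetical toy rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blastdb (query_db TEXT, query_acc TEXT, target_db TEXT, "
                 "target_acc TEXT, perc_align_len REAL, minus_ln_e REAL)")
    conn.execute("CREATE TABLE signalp (query_db TEXT, query_acc TEXT, cleavage_site INTEGER)")
    conn.executemany("INSERT INTO blastdb VALUES (?, ?, ?, ?, ?, ?)", [
        ("hs_2_23", "q1", "mm_1_11", "m1", 80.0, 150.0),
        ("hs_2_23", "q1", "swissp", "NULL", None, None),     # q1: novel, should be kept
        ("hs_2_23", "q2", "mm_1_11", "m2", 70.0, 120.0),
        ("hs_2_23", "q2", "swissp", "SP_HIT", 90.0, 200.0),  # q2: already in SwissProt
        ("hs_2_23", "q3", "mm_1_11", "m3", 60.0, 100.0),
        ("hs_2_23", "q3", "swissp", "NULL", None, None),     # q3: cleavage site too late
        ("hs_2_23", "q4", "mm_1_11", "NULL", None, None),    # q4: no mouse hit
        ("hs_2_23", "q4", "swissp", "NULL", None, None),
    ])
    conn.executemany("INSERT INTO signalp VALUES (?, ?, ?)", [
        ("hs_2_23", "q1", 25), ("hs_2_23", "q2", 30),
        ("hs_2_23", "q3", 60), ("hs_2_23", "q4", 20),
    ])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

Only q1 survives. It has a mouse hit, no SwissProt hit and a cleavage site between 15 and 45. Each of the other toy proteins fails exactly one condition.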

Output from MySQL
Columns: query_db, query_acc, target_db, target_acc, perc_align_len, minus_ln_e, target_db, target_acc, cleavage_site
Nine rows were returned, each of the form hs_2_23 IPI… | mm_1_11 IPI… | … | swissp | NULL, with cleavage sites 35, 26, 21, 45, 30, 38, 44, 44 and 42.

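Clustering of protein sequences
The sequences are clustered with TRIBE-MCL and the clusters are stored in MySQL together with (1) the cluster size and (2) the Cys-Cys count, i.e. the number of conserved cysteines (3 in the example alignment below).
ACPGICSKSCCPF
LTPALCSRTCCPY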
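The two per-cluster values can be computed from a cluster's aligned sequences, assuming the alignment is given as equal-length strings. Here "conserved" is taken to mean a column in which every member has a cysteine. That definition is an assumption.

```python
def conserved_cys(aligned):
    """Count alignment columns in which every sequence has a cysteine."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for column in zip(*aligned) if all(res == "C" for res in column))


def cluster_summary(aligned):
    """Values to store per cluster: cluster size and conserved Cys count."""
    return {"cluster_size": len(aligned), "conserved_cys": conserved_cys(aligned)}
```

For the two segments on the slide, `cluster_summary(["ACPGICSKSCCPF", "LTPALCSRTCCPY"])` gives a cluster size of 2 and 3 conserved cysteines, consistent with the "(3)" shown. The cysteine in column 2 of the first segment is not counted because the second segment has T there.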

Conserved Cys-Cys
Many growth factor families have their own specific Cys pattern, e.g. the TGF-β family. Transforming growth factor-β is a multifunctional peptide that controls proliferation, differentiation and other functions in many cell types.
Search for Cys patterns without any a priori knowledge.
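One way to search for Cys patterns without a priori knowledge is to describe each sequence only by the spacing of its cysteines and group sequences that share a spacing. The PROSITE-like notation (C-x(3)-C) and the exact-spacing grouping below are illustrative assumptions. Real families usually tolerate some variation in the gaps.

```python
def cys_pattern(seq):
    """Describe cysteine spacing in PROSITE-like notation, e.g. C-x(3)-C-C."""
    positions = [i for i, res in enumerate(seq) if res == "C"]
    if not positions:
        return ""
    parts = ["C"]
    for prev, cur in zip(positions, positions[1:]):
        gap = cur - prev - 1
        if gap > 0:
            parts.append(f"x({gap})")
        parts.append("C")
    return "-".join(parts)


def group_by_pattern(sequences):
    """Group sequence names (dict name -> sequence) by shared Cys spacing pattern."""
    groups = {}
    for name, seq in sequences.items():
        groups.setdefault(cys_pattern(seq), []).append(name)
    return groups
```

For the first segment on the slide, `cys_pattern("ACPGICSKSCCPF")` returns `"C-x(3)-C-x(3)-C-C"`, which ends in a Cys-Cys pair.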

Search criteria
Family cluster size > 1; no SwissProt homologues; Cys count > 4; signal peptide; mouse homologue/orthologue.
Result: 48 families. The alignments are inspected manually (removing isoforms), and the remaining sequences are uploaded to the internal database.
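The five criteria combine into a single filter. In the sketch, each family is represented by a hypothetical record whose fields stand for the precomputed MySQL values. Storing one Cys count per family is an assumed simplification.

```python
from dataclasses import dataclass


@dataclass
class Family:
    """Hypothetical per-family summary assembled from the MySQL tables."""
    name: str
    cluster_size: int
    has_swissprot_homologue: bool
    cys_count: int
    has_signal_peptide: bool
    has_mouse_homologue: bool


def passes_criteria(f):
    """Apply the slide's criteria: size > 1, novel, Cys > 4, SigP, mouse homologue."""
    return (f.cluster_size > 1
            and not f.has_swissprot_homologue
            and f.cys_count > 4
            and f.has_signal_peptide
            and f.has_mouse_homologue)


def select_families(families):
    """Names of the families that pass, for manual inspection afterwards."""
    return [f.name for f in families if passes_criteria(f)]
```

Keeping the criteria in one function makes them as easy to change as the SQL version, and the families that pass can go straight to manual alignment inspection.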

Internal database

Tissue-specific expression
[Figure: gel with lanes 100 bp ladder, universal reference, whole brain, heart, kidney, liver, lung, placenta, prostate, salivary gland, skeletal muscle, spleen, testis, 100 bp ladder, thymus, thyroid gland, trachea, uterus, colon, small intestine, spinal cord, fetal liver, fetal brain, pancreas, neurosphere, ctrl dH2O, 100 bp ladder]

Outcome from Gene Search
A family of 5 sequences with at least 8 Cys, predicted as growth factors/hormones, of ~125-140 amino acids.

Outcome from Gene Search
A family of 2 sequences with approximately 30% sequence identity; 11 of 16 Cys are conserved. Effect on cultured neural cells.
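A figure such as "approx 30% seqid" depends on how identity is counted. The sketch uses one common convention: identical residues divided by the aligned columns in which neither sequence has a gap ('-'). Other tools divide by alignment length or by the shorter sequence, so the study's exact definition may differ.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    identical = sum(1 for a, b in compared if a == b)
    return 100.0 * identical / len(compared)
```

For the two 13-residue segments shown earlier, 6 of the 13 columns are identical, about 46%. That value applies to the short segments only, not to the full-length family.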