Microarray experiments. Database and Analysis Tools. Kate Milova, cDNA Microarray Facility. MolGen retreat, March 24, 2005.


1 Microarray experiments. Database and Analysis Tools. Kate Milova, cDNA Microarray Facility, March 24, 2005

2 Outline.
- Microarray platforms and services available at AECOM:
  - cDNA
  - Long Oligo
  - Affymetrix
- Database (cDNA & Long Oligo) structure and content:
  - Printing information
  - Chip layout
  - Annotation
- Annotation algorithms and data mining
- On-line Analysis Tools:
  - Normalization
  - Signal filtering
  - Comparison
- Statistical packages and Analysis software
- Summary

3 Microarray Platforms at AECOM.

4 How to choose a microarray platform.

5 Before starting your microarray experiment.

6 Microarray Experts.

7 cDNA Microarray Facility. Home page.
- Standard & Custom Arrays. Description & Prices
- Hybridization, labeling, bioinformatics, workshops
- Database for cDNA & Long Oligo Arrays. Analysis Pipeline
- AECOM cDNA microarray facility. Supported publications
- Useful links to analysis tools

8 cDNA Microarray Facility. Database.

9 Database for Analysis of Microarrays at AECOM. Contents.

Printing Information
- Chip name
- Species
- Number of spots
- Number of controls
- Number of pen domains
- Number of slides
- Printing pattern
- Distance between spots
- Number of rows
- Number of columns
- Printing date
- Master chip

Chip layout
- Chip name
- Spot information (Accession or clone ID or bacterial control)
- Spot location
- Library name
- Clone location on 384 plate
- Clone location on 96 plate

Gene Annotation
- Accession
- Clone ID
- Clone end
- Vector name
- Clone name
- UniGene cluster ID
- Best blast hit
- Main blast parameters (score, E-value, % identity, blast date, etc.)
- Gene ID
- Gene symbol
- Gene synonyms
- Chromosome
- Map location
- GO IDs
- GO Annotation

10 Annotation Extraction Algorithm.
1. Start from the database of cDNA & Long Oligo sequences.
2. Run a Blast search against the RefSeq & NT databases.
3. Examine all hits with an alignment quality check:
   - All hits are divided into two groups: group 1 with > 80% overlap, and group 2 with < 80% overlap (partially similar).
   - Only hits with > 90% identity are kept.
4. Pass all remaining hits through a linguistic filter.
5. Hits that pass the two tests are defined as 'Good Hits'.
6. The 'Best blast Hit' is:
   1. the first 'good' RefSeq hit from group 1; OR
   2. the first 'good' NT hit from group 1; OR
   3. the first 'good' RefSeq hit from group 2; OR
   4. the first 'good' NT hit from group 2.
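The selection order on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the facility's implementation: the `Hit` record, the function names, and the `passes_linguistic_filter` callable are hypothetical; a 'good' hit is assumed to mean one with > 90% identity that also passes the linguistic filter (the slide does not name the two tests); and a hit with exactly 80% overlap, which the slide leaves undefined, is placed in group 2.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Hit:
    """Hypothetical record for one parsed Blast hit."""
    subject_id: str
    database: str       # "refseq" or "nt"
    identity: float     # percent identity of the alignment
    overlap: float      # percent of the query covered by the alignment
    description: str    # free-text description of the hit


def overlap_group(hit: Hit) -> int:
    """Group 1: > 80% overlap. Group 2: everything else (assumption for == 80%)."""
    return 1 if hit.overlap > 80.0 else 2


def pick_best_hit(
    hits: Iterable[Hit],
    passes_linguistic_filter: Callable[[str], bool],
) -> Optional[Hit]:
    """Return the 'best blast hit' following the priority order on the slide."""
    # Assumed meaning of 'good': > 90% identity AND accepted by the linguistic filter.
    good = [
        h for h in hits
        if h.identity > 90.0 and passes_linguistic_filter(h.description)
    ]
    # Priority: RefSeq group 1, NT group 1, RefSeq group 2, NT group 2.
    for group in (1, 2):
        for database in ("refseq", "nt"):
            for hit in good:
                if overlap_group(hit) == group and hit.database == database:
                    return hit
    return None
```

Within each category the hits are scanned in their input order, so "first" here means first in whatever order the Blast results were parsed.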

11 Annotation Extraction Algorithm. [Flow diagram; labels: Sequences; Raw Data; Database of cDNA & Long Oligo sequences; Formatted Data; Homology search against RefSeq & NT; 80%; 90%; Alignment quality check]

12 Annotation sources: NCBI.
- Entrez Gene: UniGene ID -> Gene ID -> GO ID
- UniGene: UniGene ID <- Accession; UniGene ID <- Blast against UniGene clusters
- RefSeq & NT databases -> Annotation
- Blast Software: Blast Search

13 Annotation sources: NCBI. UniGene.
- NCBI -> UniGene -> UniGene ID:
  - The UniGene ID for cDNA arrays is obtained from the UniGene source file for each particular accession number of the clone.
- NCBI -> UniGene -> Blast:
  - The UniGene ID for Long Oligo arrays is obtained from blast results.
  - The blast search was done with the set of oligo sequences against UniGene clusters, with cutoffs of 99% for sequence identity and 90% for overlap.
  - An oligo that hits multiple UniGene clusters has its UniGene ID marked as an "Ambiguous cluster ID".
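The cluster assignment rule for Long Oligo arrays can be illustrated with a short Python sketch. The function name and the tuple layout of the hits are hypothetical, and treating the 99% and 90% cutoffs as inclusive is an assumption, since the slide does not say whether the boundaries themselves pass.

```python
from typing import Iterable, Optional, Tuple

AMBIGUOUS = "Ambiguous cluster ID"  # label taken from the slide


def assign_unigene_id(
    hits: Iterable[Tuple[str, float, float]],
    min_identity: float = 99.0,
    min_overlap: float = 90.0,
) -> Optional[str]:
    """Assign a UniGene cluster ID to one oligo.

    Each hit is (cluster_id, percent_identity, percent_overlap).
    Cutoffs are assumed to be inclusive.
    """
    # Keep the distinct clusters whose hits satisfy both cutoffs.
    clusters = {
        cluster_id
        for cluster_id, identity, overlap in hits
        if identity >= min_identity and overlap >= min_overlap
    }
    if not clusters:
        return None          # no hit passed the cutoffs
    if len(clusters) > 1:
        return AMBIGUOUS     # the oligo hits several clusters
    return next(iter(clusters))
```

Several passing hits to the same cluster still yield that single cluster ID, because only distinct clusters are counted.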

14 Annotation sources: NCBI. Entrez Gene.
- UniGene ID -> Gene ID:
  - All information retrieved from the 'Entrez Gene' project is based on the UniGene cluster ID and the corresponding Gene ID.
  - The 'Gene ID' to 'UniGene cluster ID' connection can be ambiguous.
  - A parsing filter was used to eliminate ambiguous Gene IDs.
- Gene ID -> GO ID:
  - For each Gene ID, the corresponding Gene Ontology IDs were retrieved from the Entrez Gene source file.
  - A Gene ID may have a few GO IDs or more than 10. All of them are collected.
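The two mapping steps on this slide can be sketched as follows. The slide does not describe how the parsing filter decides that a Gene ID is ambiguous, so the rule used here (keep only pairs where the cluster links to exactly one Gene ID and that Gene ID links to exactly one cluster) is an assumption for illustration. The function names and the in-memory pair lists are hypothetical stand-ins for the parsed source files.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def unambiguous_cluster_to_gene(
    pairs: Iterable[Tuple[int, str]],
) -> Dict[str, int]:
    """Map UniGene cluster ID -> Gene ID, dropping ambiguous links.

    Each pair is (gene_id, cluster_id). Assumed rule: a link is kept only
    when it is one-to-one in both directions.
    """
    genes_by_cluster = defaultdict(set)
    clusters_by_gene = defaultdict(set)
    for gene_id, cluster_id in pairs:
        genes_by_cluster[cluster_id].add(gene_id)
        clusters_by_gene[gene_id].add(cluster_id)

    mapping = {}
    for cluster_id, genes in genes_by_cluster.items():
        if len(genes) != 1:
            continue  # cluster linked to several Gene IDs
        gene_id = next(iter(genes))
        if len(clusters_by_gene[gene_id]) == 1:
            mapping[cluster_id] = gene_id
    return mapping


def collect_go_ids(
    gene_go_pairs: Iterable[Tuple[int, str]],
) -> Dict[int, List[str]]:
    """Collect every GO ID for each Gene ID, preserving first-seen order."""
    go_ids = defaultdict(list)
    for gene_id, go_id in gene_go_pairs:
        if go_id not in go_ids[gene_id]:
            go_ids[gene_id].append(go_id)
    return dict(go_ids)
```

No limit is placed on the number of GO IDs per gene, matching the note that all of them are collected.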

15 Annotation sources: NCBI. RefSeq & NT databases, Blast Software.
- Blast Software:
  - The Blast software package is installed on the microarray server.
  - This software can format databases and run batch homology searches for any combination of custom databases and query sequences.
- RefSeq & NT databases -> Annotation:
  - Loaded, formatted and periodically updated on the microarray server.
  - When the databases are updated, we run a blast search of the cDNA and Long Oligo sequences.
  - Blast results are parsed using our algorithm for annotation extraction.

16 Annotation sources: Gene Ontology.
Gene Ontology: Biological process, Cellular component, Molecular function
- Multiple GO IDs for each Gene ID are retrieved in the previous step from Entrez Gene (if available).
- Gene Ontology annotation for all GO IDs is kept in three separate information fields: biological process, molecular function and cellular component. For each field, all available annotation was prefiltered with a redundancy check and concatenated.
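A minimal Python sketch of the per-field concatenation described on this slide follows. The function name, the field keys, and the "; " separator are assumptions for illustration, and the redundancy check is assumed to mean dropping exact duplicate terms within a field.

```python
from typing import Dict, Iterable, Tuple

# Assumed keys for the three annotation fields.
GO_FIELDS = ("biological_process", "molecular_function", "cellular_component")


def concatenate_go_annotation(
    terms: Iterable[Tuple[str, str]],
    separator: str = "; ",
) -> Dict[str, str]:
    """Build one concatenated string per GO field for a single gene.

    Each item is (field, term_text). Exact duplicates within a field are
    dropped (the assumed redundancy check); first-seen order is kept.
    """
    collected = {field: [] for field in GO_FIELDS}
    for field, term in terms:
        if term not in collected[field]:
            collected[field].append(term)
    return {field: separator.join(values) for field, values in collected.items()}
```

A field with no annotation comes back as an empty string, so every gene gets the same three fields.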

17 Database Search.

18 Microarray Data Analysis Pipeline.

19 Pipeline. LOWESS Normalization.

20 cDNA Microarray Facility. Pipeline. Filtering.

21 Pipeline. Data set Comparison.

22 Summary

23 cDNA Microarray Facility. Services.

24 cDNA Microarray Facility. Arrays.

25 cDNA Microarray Facility. Publications.

26 Annotation Extraction Algorithm. [Flow diagram; labels: Sequences; Raw Data; Database of cDNA & Long Oligo sequences; Formatted Data; Homology search against RefSeq & NT; Alignment quality check of the blast hits; Blast Results]

