
1 Weighted graph: data points i=1,2,...,N are the vertices of a graph; neighbors i,j are connected by edges. J_ij is the weight associated with edge (i,j), e.g. J_5,8 for the edge between vertices 5 and 8. J_ij depends on the distance D_ij. [Figure: J_ij plotted against D_ij]
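
The weighted graph of slide 1 can be sketched in a few lines. This is only an illustration under stated assumptions: the slide says that the weight depends on distance but does not give the functional form, so the Gaussian decay, the k-nearest-neighbor rule for choosing edges, and the names `edge_weights`, `k` and `a` are all hypothetical choices made here.

```python
import math

def edge_weights(points, k=2, a=None):
    """Return {(i, j): J_ij} with i < j for edges joining each point to its k nearest neighbors."""
    n = len(points)
    lengths = {}
    for i in range(n):
        # The k points closest to point i (excluding i itself) become its neighbors.
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nearest:
            lengths[(min(i, j), max(i, j))] = math.dist(points[i], points[j])
    if a is None:
        # Assumed length scale: the mean edge length.
        a = sum(lengths.values()) / len(lengths)
    # Assumed form: the weight decays as a Gaussian of the distance D_ij.
    return {edge: math.exp(-d * d / (2 * a * a)) for edge, d in lengths.items()}

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
J = edge_weights(points, k=2)
for edge, weight in sorted(J.items()):
    print(edge, round(weight, 3))
```

Short edges inside a tight group get weights near 1, while the long edges that bridge the two groups get weights near 0.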

2 partitions

3 weights

4 correlated pairs

5 clusters C i,j > 0.5
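
One way to read slide 5 is that points i and j are placed in the same cluster whenever their pair correlation C_ij exceeds 0.5, and clusters are the connected groups that result. The sketch below assumes exactly that reading; the function name and the toy correlation values are made up for illustration, and the computation of the correlations themselves is not shown.

```python
def clusters_from_correlations(n, corr, threshold=0.5):
    """Group points 0..n-1 into connected components of the graph of pairs with C_ij > threshold."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), c in corr.items():
        if c > threshold:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Hypothetical pair correlations for five points.
corr = {(0, 1): 0.9, (1, 2): 0.7, (2, 3): 0.2, (3, 4): 0.8}
clusters = clusters_from_correlations(5, corr)
print(clusters)  # [[0, 1, 2], [3, 4]]
```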

6 Toy problem (SPC): how many clusters? 3 LARGE, MANY small.

7 TSS vs K

8 Irises: Iris setosa, Iris versicolor, Iris virginica. 50 specimens from each group; 4 numbers for each flower; 150 data points in 4-dimensional space.

9 Irises, d=4: 150 points in d=4; 3 large clusters.

10 comparison Iris

11 Three circles: N=4800 POINTS IN D=2

12 identifying stable clusters

13 Same data - Average Linkage No analog for 

14 Same data - Average Linkage Examining this cluster

15 Advantages of SPC: RELIES ON PROXIMITY; SCANS ALL RESOLUTIONS (T); ROBUST AGAINST NOISE AND INITIALIZATION - CALCULATES COLLECTIVE CORRELATIONS; IDENTIFIES STABLE CLUSTERS (ΔT); NO NEED TO PRE-SPECIFY NUMBER OF CLUSTERS.

16 Stability: larger ΔT - tighter, more stable cluster.

17 YEAST CELL-CYCLE EXPRESSION DATA. EXPRESSION DATA: SIMULTANEOUS MEASUREMENT OF MRNA CONCENTRATION OF THOUSANDS OF GENES. DATA: N=2467 GENES OF KNOWN FUNCTION MEASURED AT 18 TIME INTERVALS (18*7 MIN) DURING CELL CYCLE OF YEAST. CELLS SYNCHRONIZED BY ALPHA FACTOR ARREST AND RELEASE. SPELLMAN ET AL. (1998) MOL. BIOL. CELL

18 CELL CYCLE: G1 - gap; decide whether to proliferate, wait, or cross to the non-dividing stage G0. S - DNA synthesis. G2 - gap; allow DNA repair. M - mitosis, cell division.

19 Yeast data dendrogram

20 Choosing clusters to examine: WE APPLIED FILTERS TO SELECT CLUSTERS OF CELL-CYCLE RELATED GENES. THE MEAN EXPRESSION PROFILE OF A CLUSTER SHOULD HAVE (1) SMOOTH, LOW-FREQUENCY TEMPORAL VARIATION; (2) SIGNIFICANT DEVIATION FROM A CONSTANT VALUE.
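
The two filters on slide 20 can be sketched as follows. The slide does not say how smoothness or deviation from a constant were quantified, so everything specific here is an assumption: deviation is measured by the standard deviation of the mean profile, smoothness by the mean squared step between successive time points divided by the variance, and the thresholds `min_std` and `max_roughness` are arbitrary illustrative values.

```python
import math
from statistics import mean, pstdev

def mean_profile(profiles):
    """Average the expression profiles of a cluster, time point by time point."""
    return [mean(column) for column in zip(*profiles)]

def roughness(profile):
    """Mean squared successive difference divided by the variance (small means smooth)."""
    variance = pstdev(profile) ** 2
    if variance == 0:
        return float("inf")
    steps = [(b - a) ** 2 for a, b in zip(profile, profile[1:])]
    return mean(steps) / variance

def passes_filters(profiles, min_std=0.3, max_roughness=1.0):
    """Keep a cluster whose mean profile varies noticeably and varies smoothly."""
    m = mean_profile(profiles)
    return pstdev(m) >= min_std and roughness(m) <= max_roughness

# Toy clusters with 18 time points each (hypothetical data).
smooth = [[math.sin(2 * math.pi * t / 9) for t in range(18)]] * 3
jagged = [[(-1.0) ** t for t in range(18)]] * 3
flat = [[0.5] * 18] * 3
print(passes_filters(smooth), passes_filters(jagged), passes_filters(flat))
```

Only the slowly oscillating cluster passes: the alternating profile fails the smoothness test, and the flat one fails the deviation test.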

21 Chosen clusters

22 Cell-cycle clusters: Late G1: Cln1,2, Clb5,6, Swi4. G2/M: Clb1,2, Swi5, Ace2. S: histones.

23 Progression of the cell-cycle

24 Other stable clusters

25 Analyzing promoters of the genes. Mostly ribosomal proteins (artifact - freezer). General metabolism. A novel conserved DNA motif GCGATGAGNT in 90% of genes. Dip at the end. A novel conserved DNA motif RNNGCWGCNNC. G. Getz, E. Levine, E. Domany and M. Zhang, Physica A 279, 457 (2000)

26 oscillations – by eye

27 PRIMARY TARGETS OF P53. TEMPERATURE-SENSITIVE MUTANT P53, ACTIVATE - 32 C (t=0). MEASURE EXPRESSION AT t=0,2,6,12,24 h (use t=0 as control). REPEAT IN PRESENCE OF CYCLOHEXIMIDE (CHX), t=0,2,4,6,9,12 (CHX INHIBITS PROTEIN SYNTHESIS). IDENTIFY UPREGULATED GENES USING FILTER: AT LEAST 2.5-FOLD INCREASE AT 3 OR MORE TIME POINTS (SEPARATELY IN EACH OF THE TWO EXPTS, -CHX AND +CHX). 38 CANDIDATE PRIMARIES. EFFECT OF FILTERING? RELEASE FILTER FROM +CHX; CLUSTERING: 38 47 (31)
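
The fold-change filter on slide 27 is simple enough to write down directly. In this sketch the t=0 measurement is taken as the control, as the slide states; requiring a gene to pass in both experiments is an assumed reading of "separately in each of the two expts", and the gene names and expression values are invented.

```python
def is_upregulated(levels, min_fold=2.5, min_points=3):
    """levels: expression at successive time points; levels[0] is the t=0 control."""
    control = levels[0]
    if control <= 0:
        raise ValueError("control expression must be positive")
    # Count the later time points where expression is at least min_fold times the control.
    hits = sum(1 for x in levels[1:] if x / control >= min_fold)
    return hits >= min_points

def candidate_primaries(minus_chx, plus_chx):
    """Genes that pass the filter in the -CHX series and, separately, in the +CHX series."""
    return sorted(gene for gene in minus_chx
                  if gene in plus_chx
                  and is_upregulated(minus_chx[gene])
                  and is_upregulated(plus_chx[gene]))

# Hypothetical genes; -CHX sampled at t=0,2,6,12,24 h and +CHX at t=0,2,4,6,9,12.
minus_chx = {"geneA": [1.0, 3.0, 4.0, 5.0, 2.6], "geneB": [1.0, 3.0, 1.0, 1.0, 1.0]}
plus_chx = {"geneA": [1.0, 2.5, 3.0, 3.0, 1.0, 1.0], "geneB": [1.0, 4.0, 4.0, 4.0, 4.0, 4.0]}
print(candidate_primaries(minus_chx, plus_chx))  # ['geneA']
```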

28 REDUCE EFFECT OF FILTERING BY CLUSTERING. [Figure legend: X - 38 candidate primary targets; % candidate primary targets.] K. Kannan et al, Oncogene

29 COLON CANCER DATA

30 TWO-WAY CLUSTERING: S1(G1), G1(S1)

31 TWO-WAY CLUSTERING, ordered: S1(G1), G1(S1)

32 TWO-WAY CLUSTERING - TISSUES - S1(G1): 1. IDENTIFY TISSUE CLASSES (TUMOR/NORMAL). EACH TISSUE = POINT IN 2000-DIMENSIONAL SPACE.

33 TWO-WAY CLUSTERING - GENES - G1(S1): 2. FIND DIFFERENTIATING AND CORRELATED GENES. EACH GENE = POINT IN 62-DIMENSIONAL SPACE. Gene clusters labeled in the figure: ribosomal proteins, cytochrome C, HLA2, metabolism.

34 TWO-WAY CLUSTERING

35 football

36 COUPLED TWO-WAY CLUSTERING (C2WC) - MOTIVATION: ONLY A SMALL SUBSET OF GENES PLAYS A ROLE IN A PARTICULAR BIOLOGICAL PROCESS; THE OTHER GENES INTRODUCE NOISE, WHICH MAY MASK THE SIGNAL OF THE IMPORTANT PLAYERS. ONLY A SUBSET OF SAMPLES EXHIBITS THE EXPRESSION PATTERNS OF INTEREST. SHOULD USE A SUBSET OF GENES TO STUDY A SUBSET OF THE SAMPLES (AND VICE VERSA). PROBLEM: ENORMOUS NUMBER OF SUBMATRICES.

37 COUPLED TWO-WAY CLUSTERING (C2WC) - METHOD: PICK ONE STABLE GENE CLUSTER. REPRESENT TISSUES BY THE EXPRESSION LEVELS OF THESE GENES ONLY. ANALYZE ALL TISSUE CLUSTERS BY USING ALL GENE CLUSTERS, ONE AT A TIME. LOOK FOR INTERNAL STRUCTURE, SUB-CLUSTERS. USE ALL STABLE TISSUE CLUSTERS TO CLASSIFY GENES; IDENTIFY GENE CLUSTERS THAT GOVERN BIOLOGICAL PROCESSES. ITERATE THE PROCEDURE UNTIL NO NEW STABLE CLUSTERS EMERGE.
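
The iteration described on slide 37 can be sketched as a loop over pairs of gene and sample clusters. This is a rough outline under stated assumptions, not the authors' implementation: the clustering step is left as a caller-supplied function `find_stable_clusters(objects, features)` (SPC in the talk), and the toy clusterer and the tiny expression table below are stand-ins invented purely so that the sketch runs.

```python
def ctwc(genes, samples, find_stable_clusters, max_rounds=10):
    """Grow lists of stable gene and sample clusters until a round adds nothing new."""
    gene_clusters = [frozenset(genes)]
    sample_clusters = [frozenset(samples)]
    done = set()  # (gene cluster, sample cluster) pairs already analyzed
    for _ in range(max_rounds):
        found_new = False
        for g in list(gene_clusters):
            for s in list(sample_clusters):
                if (g, s) in done:
                    continue
                done.add((g, s))
                # Cluster the samples in s using only the genes in g.
                for c in map(frozenset, find_stable_clusters(s, g)):
                    if c not in sample_clusters:
                        sample_clusters.append(c)
                        found_new = True
                # Cluster the genes in g using only the samples in s.
                for c in map(frozenset, find_stable_clusters(g, s)):
                    if c not in gene_clusters:
                        gene_clusters.append(c)
                        found_new = True
        if not found_new:
            break
    return gene_clusters, sample_clusters

# Hypothetical expression table: expr[gene][sample].
expr = {
    "g1": {"s1": 0, "s2": 0, "s3": 5, "s4": 5},
    "g2": {"s1": 0, "s2": 0, "s3": 5, "s4": 5},
    "g3": {"s1": 1, "s2": 1, "s3": 1, "s4": 1},
}

def toy_clusters(objects, features):
    """Stand-in for SPC: split at the largest gap in mean level if that gap exceeds 2."""
    def level(o):
        values = [expr[o][f] if o in expr else expr[f][o] for f in features]
        return sum(values) / len(values)
    ranked = sorted(objects, key=level)
    gaps = [(level(b) - level(a), k)
            for k, (a, b) in enumerate(zip(ranked, ranked[1:]), start=1)]
    if not gaps:
        return []
    gap, k = max(gaps)
    return [set(ranked[:k]), set(ranked[k:])] if gap > 2 else []

gene_clusters, sample_clusters = ctwc(expr, ["s1", "s2", "s3", "s4"], toy_clusters)
print(sorted(map(sorted, gene_clusters)))
print(sorted(map(sorted, sample_clusters)))
```

In the toy data the gene split into {g1, g2} and {g3} is not visible when all four samples are used; it appears only once the genes are re-clustered over the sample cluster {s3, s4}, which is the kind of effect the coupled procedure is meant to expose.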

38 COUPLED TWO-WAY CLUSTERING OF COLON CANCER: TISSUES. G4, G12; S1(G4), S1(G12)

39 COUPLED TWO-WAY CLUSTERING OF COLON CANCER: TISSUES. S1(G4), S1(G12); S17

40 genes1 S17 G1(S17)

41 COUPLED TWO-WAY CLUSTERING OF COLON CANCER - GENES: USING ONLY THE TUMOR TISSUES TO CLUSTER GENES REVEALS CORRELATION BETWEEN TWO GENE CLUSTERS: CELL GROWTH AND EPITHELIAL. COLON CANCER - ASSOCIATED WITH EPITHELIAL CELLS. G1(S17), G1(S1)

42 COLON CANCER, carcinoma + adenoma: 18 PAIRED CARCINOMA/NORMAL, 4 PAIRED ADENOMA/NORMAL. Notterman et al, Cancer Res. (2001). Tumor/normal distance matrix.

43 COLON CANCER, carcinoma + adenoma: 18 PAIRED CARCINOMA/NORMAL, 4 PAIRED ADENOMA/NORMAL. Notterman et al, Cancer Res. (2001). Protocol A/protocol B distance matrix.

44 GLIOBLASTOMA: Coupled Two-Way Clustering (CTWC) of 358 Genes and 36 Samples (Fig. 2A). M. HEGI et al, CHUV; CLONTECH ARRAYS. Figure labels: A(II), ScGBM, PrGBM, CL; GENES; S2, S3, T; S1(G1); G12, G5.

45 S1(G5): Super-Paramagnetic Clustering of All Samples Using Stable Gene Cluster G5 (Fig. 2B). Figure labels: S10, S11, S12, S13, S14.

46 G5Ver validation

47 IGFBP experiment: Induction of IGFBPs under Hypoxic Conditions in Glioblastoma Cell Lines (Fig. 4)

48 BREAST CANCER DATA (PEROU ET AL, NATURE 2000): Predicting response to doxorubicin treatment. 20 patients before/after chemotherapy. Use G46, a cluster of 33 genes, to probe a group of 29 samples. Intermediate expression level of the G46 genes may serve as a marker for a relatively high success rate of the doxorubicin treatment. Figure labels: ER-, ER+.

49 BREAST CANCER DATA (BOTSTEIN/BROWN LAB; PEROU ET AL, NATURE 2000): Predicting response to doxorubicin treatment; successful for 3/20 patients. 20 patients before/after chemotherapy. 10 of the "before" samples are in cluster b; the samples of all 3 successful treatments are in this group. Intermediate expression level of the G46 genes may serve as a marker for a relatively high success rate of the doxorubicin treatment.

50 Survival, S1(G33): BREAST CANCER DATA (BOTSTEIN/BROWN LAB), Sorlie et al, PNAS (2001). Cluster (a): high expression levels of the genes of G33, low survival, mutant p53. Predictor of survival.

51 S1(G36): BREAST CANCER DATA (BOTSTEIN/BROWN LAB), Sorlie et al, PNAS (2001). Gene cluster G36 induces a clear partition into two classes with no known clinical interpretation.

52 Signature algorithm: J. Ihmels, G. Friedlander, S. Bergmann, O. Sarig, Y. Ziv, N. Barkai

53 Recurrence. Yeast genome: 6400 genes, 1000 "conditions" (chips). (a) N_core = 37, 73, 145 genes for ribosomal proteins; 132 genes for biosynthesis. Each, used as input G_I^ref, returns (nearly the same) gene signature S_ref. Add N_rand randomly picked genes: the input set G_I of N_core + N_rand genes returns gene signature S_I. Recurrence of S_ref is measured by Overlap = fraction of genes shared by S_ref and S_I. (b) Use as G_I^ref sets of genes with shared regulatory sequences. Only the truly coregulated ones are returned in S_ref; recurrent.
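
The recurrence test on slide 53 needs two small pieces: building a noisy input set from a core set plus randomly picked genes, and measuring the overlap between two signatures. The sketch below covers only those two pieces, not the signature algorithm itself. Normalizing the overlap by the size of the reference signature is an assumption (the slide says only "fraction of shared genes"), and the function names and gene labels are hypothetical.

```python
import random

def noisy_input(core, genome, n_rand, rng):
    """Input set G_I: the core genes plus n_rand genes drawn at random from the rest of the genome."""
    pool = sorted(set(genome) - set(core))
    return set(core) | set(rng.sample(pool, n_rand))

def overlap(s_ref, s_i):
    """Fraction of the genes in s_ref that also appear in s_i (assumed normalization)."""
    s_ref, s_i = set(s_ref), set(s_i)
    return len(s_ref & s_i) / len(s_ref) if s_ref else 0.0

# Hypothetical gene labels standing in for a genome and a core set.
genome = [f"gene{i}" for i in range(100)]
core = genome[:10]
g_i = noisy_input(core, genome, n_rand=5, rng=random.Random(0))
print(len(g_i))                                      # 15
print(overlap({"a", "b", "c", "d"}, {"a", "b", "x"}))  # 0.5
```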

54 Pathways: (a) Tricarboxylic acid (TCA) cycle: take the known genes in E. coli and find (34) homologues in yeast, used as G_I; this produces S_I, which excludes the wrong genes and misses only a few correct ones. (b, c) Identify two autonomous subparts of the cycle.
