Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Workflow Overview Genomics High- Throughput Facility Genome Analyzer IIx Institute for Genomics and Bioinformatics Computation Resources Storage Capacity.

Similar presentations


Presentation on theme: "Data Workflow Overview Genomics High- Throughput Facility Genome Analyzer IIx Institute for Genomics and Bioinformatics Computation Resources Storage Capacity."— Presentation transcript:

1 Data Workflow Overview Genomics High- Throughput Facility Genome Analyzer IIx Institute for Genomics and Bioinformatics Computation Resources Storage Capacity Public Web Servers ● ~ 800 processors ● Sun Grid Engine ● ~ 100TB (secured) ● Fast drives ● 30TB for HTS ● HTTP, FTP ● Dedicated hosts ● User accounts HTS: 700GB/day Bandwidth: 10Gb/s USER Sample Analysis Requests (via web interface) Analysis Results (FTP server)

2 Data Analysis Workflow IMAGES 2-4 TB INTENSITIES 100-200 GB Image Analysis Firecrest Base Calling Bustard BASE CALLS 50-100 GB SEQUENCES + SCORES 20/30 GB Synthesis Gerald GENOME ALIGNMENT >100 GB Alignment ELAND + Reference Genome READ COUNTS Read Counting Casava VDC Sample-Specific Analysis, Visualization… e.g. Genome alignment, RNAseq, CHIPseq analysis Downloadable files for HTS users FASTQ files

3 Sequences, Scores (FASTQ) @HWUSI-EAS1562_0001:8:1:1119:18138#0/1 ATATTCTTATATAAAAATATAATTATTTTAATATTTGGTCCTTTCGTACTAAAATAT +HWUSI-EAS1562_0001:8:1:1119:18138#0/1 aaY`_aaY^a``[[`a\\\\aaa_^[aaZZWaaaXXY[VYaW^aaaa[aaa]a[a` @HWUSI-EAS1562_0001:8:1:1119:13476#0/1 AGAAAGCTTTGAAAATTATGTATACGCCTCGTAAGCCCAGTCCAAAGTCAAGACCA +HWUSI-EAS1562_0001:8:1:1119:13476#0/1 a_^`a`_a[[NOONN__V__`Y^`^X]R[]]]]]Q```Y````__`^W`YVUPR]] Sequence identifierRaw Sequence Phred base calling quality scores (0 to 62 encoded using ASCII 64 to 126)

4 Genome Alignment (ELAND) HWUSI-EAS1562_0001:8:1:1119:18138#0/1 ATATTCTTATATAAAAATATAATTATTTT AATATTTGGTCCTTTCGTACTAAAATAT U1 0 147 255 chr1.fa 26532086 F 23G HWUSI-EAS1562_0001:8:1:1119:13476#0/1 AGAAAGCTTTGAAAATTATGTATACGCC TCGTAAGCCCAGTCCAAAGTCAAGACCA U0 1 0 0 chr12.fa 90535786 F Sequence identifier Raw Sequence Type of match Number of exact/1-error/2-error matches Chromosome/Position/Direction Substitution

5 Read Counts (Casava VDC) Matchs with Genes, Exons, Splice junctions ChromosomeGeneMatchs Files for visualization (GenomeStudio) Genome alignment, Gene expression, RNAseq and CHIPseq analysis


Download ppt "Data Workflow Overview Genomics High- Throughput Facility Genome Analyzer IIx Institute for Genomics and Bioinformatics Computation Resources Storage Capacity."

Similar presentations


Ads by Google