Bioinformatics
Outline
  What is bioinformatics?
  Who are bioinformaticians?
  Hardware
  Software
What is bioinformatics?
What is bioinformatics?
  Someone to analyze my data
  The boring stuff I do between experiments
  Someone to help me think about my data
  People sitting in a dark room analyzing data
  A person who writes complex algorithms
  perl, python, R, linux, java, C++, bash, ruby, HTML
  A person who knows what an HMM is
  That bloke who fixes my computer
  Someone who builds websites
Who are bioinformaticians?
  Scientists trying to get tenure, get grants, publish papers, train students
  Scientists trying to help others analyze their data
Who are bioinformaticians? YOU!
Hardware
Torrent Server
  Recommended Processors - Two six-core processors
  RAM - 48 GB
  HDD Capacity - Eight 2 TB hard drives in RAID 5 with 12 TB usable
  Network - Quad-port gigabit NIC
  GPU - NVIDIA graphics processing unit
  Chassis - Dell Precision T7500 tower; no rack mount available
  Monitor/Keyboard - not included; file access available via SSH or web service
  $12,500
Computers
  My cluster
  51 node cluster
  most nodes: 16 cpus, 8 cores each, 132 GB RAM, 1 TB local storage (/usr/data), infiniband interconnects
  (6,528 cores; 6,732 GB RAM; 50 TB scratch storage)
  192 TB lustre FS connected to most nodes via infiniband
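The totals in parentheses on the cluster slide follow from the per-node figures. The short check below is only a sketch: it assumes all 51 nodes match the "most nodes" configuration, which the slide does not state, and the variable names are illustrative.

```python
# Per-node figures taken from the slide.
nodes = 51
cpus_per_node = 16
cores_per_cpu = 8
ram_per_node_gb = 132

# Assumption: every node has the "most nodes" configuration.
total_cores = nodes * cpus_per_node * cores_per_cpu
total_ram_gb = nodes * ram_per_node_gb

print(f"{total_cores:,} cores; {total_ram_gb:,} GB RAM")
```

Under that assumption the result matches the 6,528 cores and 6,732 GB RAM quoted on the slide.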
Computers
  rambox
    24 processors with 6 cores each
    198 MB RAM
  edwards.sdsu.edu
    lab web server
    24 processors, 6 cores each
    50M RAM
    19 TB RAID 6 storage
    18 TB used
Computers
  file servers and backup servers
  4 secret servers!
  48 TB backups and archival storage
Software
Software
  Locally installed software
  Remote (web) software
Local Software
  bioperl, biopython, bowtie2, cdhit, crass, diamond, fastQC, FOCUS, FragGeneScan, genemark, groopm, idba_ud, jellyfish, last, masurca, mauve, metabat, metagenemark, mira, MUMmer, Muscle, PEAR, phylip, prinseq, qiime, qudaich, rapsearch, scaffold_builder, seed-servers, spades, tagcleaner, tRNAscan-SE, velvet
Metagenomics Processing
  Merge paired-end reads
  Preprocessing
  Functional assignments
  Taxonomic assignments
  Contamination removal
  Gene prediction
  Contig clustering
  Binning reads
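To make the preprocessing step concrete, here is a minimal sketch of a read quality filter in Python. It is not how Prinseq or any other tool named in these slides works internally; the function names, the Phred+33 offset, and both thresholds are assumptions chosen for illustration.

```python
import io

def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header, seq, qual

def mean_quality(qual, offset=33):
    """Mean Phred score, assuming Phred+33 encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)

def quality_filter(records, min_mean_q=20, min_length=4):
    """Keep reads that are long enough and have a high enough mean quality."""
    for header, seq, qual in records:
        if len(seq) >= min_length and mean_quality(qual) >= min_mean_q:
            yield header, seq, qual

# Three toy reads: one good, one with low quality, one too short.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGTACGT\n+\n!!!!!!!!\n"
    "@read3\nAC\n+\nII\n"
)
kept = [header for header, _, _ in quality_filter(read_fastq(example))]
print(kept)  # ['@read1']
```

Real preprocessing tools add further checks (duplicates, ambiguous bases, adapter and tag removal), so this only shows the general shape of the step.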
Metagenomics
  Quality control - Prinseq, Statistics, Deconseq
  Annotation - FOCUS, Real time metagenomics, mg-rast, Super FOCUS
  Statistics - STAMP
  Population genomes - crAss, metabat, ContigClustering
Metagenomics Processing
  Contig clustering - AbundanceBin, CompostBin, concoct, crAss, tetra
  Preprocessing - FASTQC, FastX Toolkit, fitGCP, NGS QC Toolkit, Non-pareil, Prinseq, QC-Chain, Streaming Trim
  Gene prediction - FragGeneScan, GlimmerMG, MetaGeneAnnotator, MetaGeneMark, MetaGun, Orphelia, Prodigal
  Taxonomic assignment - CARMA, myTaxa, FOCUS, PhylopythiaS, KRAKEN, phymmbl, LMAT, RAIphy, MEGAN, TACOA, MetaPhlAn, Taxy
  Functional assignment - CLAMS, Sequedex, DiScRIBinATE, SORT-ITEMS, genometa, SPANNER, GSMer, SPHINX, PPLACER, TaxSOM, RTMg, Treephyler