Published by Claud Lewis. Modified over 8 years ago.
1
Setting Up and Managing a Bioinformatics Project Andrii Rozumnyi and Gagandeep Singh Spring 2016
2
Content
  Why do we need to care about it?
  Basic Unix commands
  Project Directories and Directory Structures
  Project Documentation
  Automate File Processing Tasks
  Markdown for Project Notebooks
3
Why do we need to care about it?
  Saves time on maintenance
  Convenience and consistency
  Collaboration
  Saves time on reproducibility (by someone else, or by you later on)
  Makes file processing easier to automate
  In short, it makes our scientific life a bit easier
4
Basic Unix commands
  man, help or info
  cd
    absolute path: /home/BI/projects/zmays-snps/data/stats/qual.txt
    relative path: ../data/stats/qual.txt
  mkdir dir-name
  ls
  touch
  shell expansion
  wildcards
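As a minimal sketch of how these commands fit together, the lines below build a small directory tree and move around it with an absolute and then a relative path. The scratch directory from `mktemp -d` and the `analysis` folder are assumptions made for illustration; only the `zmays-snps/data/stats/qual.txt` layout comes from the slide.

```shell
# Scratch directory (hypothetical) standing in for a projects folder.
workdir=$(mktemp -d)

# Create the directories and an empty file to look at.
mkdir -p "$workdir/zmays-snps/data/stats" "$workdir/zmays-snps/analysis"
touch "$workdir/zmays-snps/data/stats/qual.txt"

# Absolute path: starts at the root, so it works from any location.
cd "$workdir/zmays-snps/analysis"

# Relative path: interpreted from the current directory (analysis).
cd ../data/stats

# List the contents of the directory we ended up in.
ls
```

Note that `cd` takes a directory, so the file name `qual.txt` is left off when changing directory.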
5
Project Directories and Directory Structures
  All files and directories of your project should live in a single project directory with a clear name.
  Inside the main directory there should be the following folders:
    data (contains all raw and intermediate data)
    scripts (project-wide scripts)
    analysis (high-level analysis)
  $ mkdir project-name
  $ cd project-name
  $ mkdir data scripts analysis
6
Project Directories and Directory Structures
  Names of files and folders should not contain spaces (use dashes or underscores instead)
  It is better to include extensions in filenames
  Always use relative paths rather than absolute paths
  Divide your project into subprojects
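One way to apply the no-spaces rule to files that already exist is sketched below. The file name and the scratch directory are hypothetical, and the loop relies on bash parameter expansion, so it assumes bash rather than a plain POSIX shell.

```shell
# Scratch project (hypothetical) so the sketch does not touch real files.
project=$(mktemp -d)
cd "$project"
mkdir -p data

# A file whose name contains spaces, which is awkward to handle in the shell.
touch "data/zmays sample A.fastq"

# Loop over every name in data/ that contains a space and
# replace each space with a dash (bash parameter expansion).
for path in data/*\ *; do
    new_path="${path// /-}"
    mv -- "$path" "$new_path"
done

ls data/
```

The loop uses the relative path `data/`, so it keeps working if the whole project directory is moved or copied to another machine.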
7
Project Documentation
  All of the following information is best stored in plain-text README files
    They can easily be read, searched, and edited directly from the command line
    Keep README files at least in each of your project's main directories
    Better to avoid formats like Microsoft Word (less portable)
    Commands are easy to copy and paste from the document to the command line
  Document the versions of the software that you ran
8
Project Documentation
  Document your methods and workflows
    Any command that produces results needs to be documented
    Include full command lines
    Describe every option the script can be run with
    Write default values explicitly (as they may change)
  Document the origin of all data in your project directory
    Where it was downloaded from or who gave it to you
  Document when you downloaded the data
  Record data version information
  Describe how you downloaded the data (which tools you used)
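The points above could be recorded with a few shell commands, as in this rough sketch. The URL is a placeholder, the download command is only written into the README (nothing is actually downloaded), and the scratch directory stands in for a real data directory; all of these are assumptions for illustration.

```shell
# Scratch directory (hypothetical) standing in for project-name/data.
data_dir=$(mktemp -d)
readme="$data_dir/README"

# Placeholder URL: substitute the real source of your data.
url="http://example.com/zmays.fasta.gz"

# Append provenance notes: origin, date, full command line, tool version.
{
    echo "Data origin: $url"
    echo "Downloaded on: $(date +%Y-%m-%d)"
    echo "Download command: curl -O $url"
    echo "Shell version: $(bash --version | head -n 1)"
} >> "$readme"

cat "$readme"
```

Appending with `>>` rather than overwriting with `>` keeps earlier entries, so the README grows into a history of where each file came from.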
9
Automate File Processing Tasks
  Automating file processing is an integral part of bioinformatics
  Using clear and consistent file naming schemes allows us to refer to files programmatically
  Shell expansion tips that save typing in the terminal:
    ~ (home directory), * (wildcard)
    brace expansion
  $ mkdir -p project-name/{data/seqs,scripts,analysis}
  This creates:
    project-name/data/seqs
    project-name/scripts
    project-name/analysis
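Because brace expansion is done by the shell before the command runs, `echo` can be used to preview it. The sketch below assumes bash and works in a scratch directory (an assumption, so that nothing is created inside a real project).

```shell
# Scratch location (hypothetical) so nothing is created in a real project.
workdir=$(mktemp -d)
cd "$workdir"

# Preview the expansion: echo prints the words the shell generates.
echo project-name/{data/seqs,scripts,analysis}
# prints: project-name/data/seqs project-name/scripts project-name/analysis

# Create the same layout in one command; -p also creates missing parents.
mkdir -p project-name/{data/seqs,scripts,analysis}

# List the directories that now exist.
find project-name -type d | sort
```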
10
Automate File Processing Tasks
  $ touch seqs/zmays{A,B,C}_R{1,2}.fastq
  $ ls seqs/
  zmaysA_R1.fastq zmaysB_R1.fastq zmaysC_R1.fastq
  zmaysA_R2.fastq zmaysB_R2.fastq zmaysC_R2.fastq
  Wildcards are expanded to all matching file or directory names
  $ ls seqs/zmaysB*
  seqs/zmaysB_R1.fastq seqs/zmaysB_R2.fastq
11
Automate File Processing Tasks
  Table 1. Common Unix filename wildcards

  Wildcard   What it matches
  *          Zero or more characters.
  ?          One character.
  [A-Z]      Any character in the supplied alphanumeric range (in this case, any character between A and Z); this works for any alphanumeric character range (e.g., [0-9] matches any character between 0 and 9).

  Note: a character range matches a single character, so for a numerical range such as 10 to 15 you should use curly braces ({10..15}).
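A short sketch of the wildcards in Table 1, using made-up file names in a scratch directory (both are assumptions for illustration; bash is assumed for the brace expansion).

```shell
# Scratch directory (hypothetical) with a few empty example files.
workdir=$(mktemp -d)
cd "$workdir"
touch zmaysA_R1.fastq zmaysB_R1.fastq zmaysC_R1.fastq zmaysA_R2.fastq

# ? matches exactly one character: any sample letter, read 1 only.
ls zmays?_R1.fastq

# [A-B] matches one character in the range A to B.
ls zmays[A-B]_R1.fastq

# [0-9] matches a single digit, so it cannot express 10 to 15;
# brace expansion generates the numbers instead.
echo sample-{10..15}.txt
```

Unlike wildcards, brace expansion does not look at the file system: `sample-{10..15}.txt` expands to six names whether or not those files exist.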
12
Automate File Processing Tasks
  While wildcards are powerful, they're useless if files are inconsistently named!
  NB! It is better to be as restrictive as possible with wildcards.
  If you want to process all zmaysB FASTQ files:
    1. zmaysB* (worse, as it also captures zmaysB-interesting-SNPs-found.xls etc.)
    2. zmaysB*fastq (better)
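The difference between the two patterns can be checked safely with `echo`, which only prints what the wildcard expands to. This is a minimal sketch in a scratch directory (an assumption), using the file names from the slide.

```shell
# Scratch directory (hypothetical) holding two FASTQ files and a spreadsheet.
workdir=$(mktemp -d)
cd "$workdir"
touch zmaysB_R1.fastq zmaysB_R2.fastq zmaysB-interesting-SNPs-found.xls

# Too broad: the spreadsheet is matched along with the FASTQ files.
echo zmaysB*

# More restrictive: only names that end in "fastq" are matched.
echo zmaysB*fastq
```

Previewing a pattern this way before passing it to a command such as `rm` or a processing script is a cheap safeguard.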
13
Automate File Processing Tasks
  Leading Zeros and Sorting
  Without leading zeros, names sort lexicographically, so genes-11.txt comes right after genes-1.txt:
  $ ls -l
  -rw-r--r-- 1 vinceb staff 0 Feb 21 21:24 genes-1.txt
  -rw-r--r-- 1 vinceb staff 0 Feb 21 21:24 genes-11.txt
  -rw-r--r-- 1 vinceb staff 0 Feb 21 21:24 genes-12.txt
  -rw-r--r-- 1 vinceb staff 0 Feb 21 21:24 genes-13.txt
  -rw-r--r-- 1 vinceb staff 0 Feb 21 21:24 genes-14.txt
  [...]
  With leading zeros, the sorted order matches the numeric order:
  $ ls -l
  -rw-r--r-- 1 vinceb staff 0 Feb 21 21:23 genes-001.txt
  -rw-r--r-- 1 vinceb staff 0 Feb 21 21:23 genes-002.txt
  [...]
  -rw-r--r-- 1 vinceb staff 0 Feb 21 21:23 genes-013.txt
  -rw-r--r-- 1 vinceb staff 0 Feb 21 21:23 genes-014.txt
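Zero-padded names like these can be generated with `printf`. The sketch below is one possible approach, run in a scratch directory; the particular numbers are arbitrary examples.

```shell
# Scratch directory (hypothetical) for the generated files.
workdir=$(mktemp -d)
cd "$workdir"

# %03d pads each number with zeros to a width of three digits.
for i in 1 2 11 14; do
    touch "$(printf 'genes-%03d.txt' "$i")"
done

# The listing is now in numeric order.
ls
```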
14
Markdown Files
15
TOOLS:
  Pandoc (http://pandoc.org/README.pdf)
  Remarkable (https://remarkableapp.github.io/linux/download.html)
  Markdownpad (Windows) (http://markdownpad.com/)
  dillinger.io (online)
16
Usage
  Used for keeping chronological information about your project. For example:
    Steps you have taken
    Information about why you have made decisions
    Relevant information to reproduce your work
  Modern-day equivalent of a lab notebook for keeping notes on the tasks we are performing.
  Simple plain-text format, which can easily be rendered to HTML or PDF
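A notebook of this kind can be kept from the command line, roughly as sketched here. The file name `notebook.md`, the scratch directory, and the entry text are all made up for illustration; the rendering step uses Pandoc (listed under TOOLS) and is skipped when it is not installed.

```shell
# Scratch directory (hypothetical) standing in for the project directory.
workdir=$(mktemp -d)
notebook="$workdir/notebook.md"

# Append a dated entry; the content is illustrative only.
cat >> "$notebook" <<EOF
## $(date +%Y-%m-%d)

- Created project directories for data, scripts and analysis.
- Decision: keep raw data read-only so it is never modified by accident.
EOF

# Render to HTML only if pandoc is available on this machine.
if command -v pandoc >/dev/null 2>&1; then
    pandoc "$notebook" -o "$workdir/notebook.html"
fi

cat "$notebook"
```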
17
Syntax
Special Symbols in Markdown:
  Symbol   Name
  \        backslash
  `        backtick
  *        asterisk
  _        underscore
  {}       curly braces
  []       square brackets
  ()       parentheses
  #        hash mark
  +        plus sign
  -        minus sign (hyphen)
  .        dot
  !        exclamation mark
18
Types of Elements
Block Level:
  * Block-level HTML elements (e.g. <div>, <table>, <pre>, <p>, etc.) must be separated from surrounding content by blank lines.
  Example:
  This is a regular paragraph.

  <table>
      <tr>
          <td>Foo</td>
      </tr>
  </table>

  This is another regular paragraph.
19
o HEADERS: Use up to six #'s to mark headers, or underline the text with = or - signs for headers up to level 2.
  Example:
  # This is an H1
  ## This is an H2
  ###### This is an H6
  or
  This is an H1
  =============
  This is an H2
  -------------
  Optionally, you may close the # style header(s) with trailing #'s, e.g. # This is an H1 #
o BLOCKQUOTES: Use the > symbol to mark a piece of text as a block quote.
20
o LISTS:
  Unordered lists use *, + or a single - as list markers.
  Ordered lists use numbers followed by periods.
  Example(s):
  Unordered list:
  - Red
  - Green
  - Blue
  Ordered list:
  1. Bird
  2. McHale
  3. Parish
21
Questions: What will be the output?
  Question 1:
  1. Red
  1. Green
  1. Yellow
  Question 2:
  1982. was the warmest year.
22
Span level
  Span-level HTML tags (e.g. <span>, <cite>, or <del>) can be used anywhere in a Markdown paragraph, list item, or header. Unlike block-level HTML tags, Markdown syntax is processed within span-level tags.
o LINKS:
  Inline:
  This is [an example](http://example.com/ "Title") inline link.
  [This link](http://example.net/) has no title attribute.
  Reference-style links use a second set of square brackets, inside which you place a label of your choosing to identify the link:
  This is [an example][id] reference-style link.
  Then, anywhere in the document, you define your link label like this, on a line by itself:
  [id]: http://example.com/ "Optional Title Here"
23
o EMPHASIS: Markdown treats asterisks (*) and underscores (_) as indicators of emphasis. Text wrapped with one * or _ will be wrapped with an HTML <em> tag; double *'s or _'s will be wrapped with an HTML <strong> tag.
  E.g., this input:
  *single asterisks*
  _single underscores_
  **double asterisks**
  __double underscores__
  The first two lines are rendered as emphasis (<em>), the last two as strong emphasis (<strong>).
  To produce a literal asterisk or underscore at a position where it would otherwise be used as an emphasis delimiter, you can backslash escape it:
  \*this text is surrounded by literal asterisks\*
24
o IMAGES: Markdown uses an image syntax that is intended to resemble the syntax for links, allowing for two styles: inline and reference.
  Inline:
  ![Alt text](/path/to/img.jpg)
  ![Alt text](/path/to/img.jpg "Optional title")
  Reference-style image syntax looks like this:
  ![Alt text][id]
  Where "id" is the name of a defined image reference. Image references are defined using syntax identical to link references:
  [id]: url/to/image "Optional title attribute"
25
Trimmed-down tutorial on Markdown files in PDF format (double-click the blue box to open the PDF file)