1
regular expressions - grep
Regular expressions describe sets of strings with patterns (not the same as globbing)
A normal character matches itself
. matches any single character
[<letters>] matches any one of the <letters>, which can also include a range such as a-z
[^<letters>] matches any one character that is not one of the <letters>
? after a pattern makes it optional
+ after a pattern matches one or more repetitions
* after a pattern matches any number of repetitions, including none
{<N>} after a pattern matches exactly <N> repetitions
^ means the start of the line
$ means the end of the line
( ) around a regular expression make it a single unit to which repetition and placement operators can be applied
grep finds lines in files that match limited (basic) regular expressions
grep '^>' file.txt
displays the lines in file.txt that start with a >
grep -c '^+$' *fastq
counts, for each FASTQ file, the lines that consist of a single +
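The two grep commands on this slide can be tried on a small sample file. This is only a sketch: the temporary directory and the contents of file.txt are made up for illustration and are not part of the original slides.

```shell
# Build a small sample file in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nTTGA\n+\n' > "$tmpdir/file.txt"

# Lines that start with ">".
headers=$(grep '^>' "$tmpdir/file.txt")
echo "$headers"

# Number of lines that consist of a single "+".
# Without -E, the "+" here is an ordinary character, not a repetition operator.
plus_count=$(grep -c '^+$' "$tmpdir/file.txt")
echo "lines made of a single +: $plus_count"

rm -r "$tmpdir"
```

Note that -c prints a count rather than the matching lines; when several files are given, grep prints one count per file.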
2
regular expressions - grep
grep -E finds lines in files that match an extended regular expression
grep -E '^[a-zA-Z]' file.txt
displays all lines in file.txt that start with an alphabetic character
grep -E "a*b+c{4}" *fastq
displays lines in all FASTQ files that contain any number of a's followed by at least one b and then 4 c's
grep -E '^(a*b+c{4})+$' file.txt
displays lines in file.txt that consist entirely of one or more repetitions of that abc pattern
grep has some useful options
-c to count the number of matching lines
-l to list the names of files that contain a match
-v to list lines that don't match
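A minimal sketch of how the patterns and options on this slide behave. The sample lines in file.txt are invented for illustration; only the grep patterns and options come from the slide.

```shell
# Hypothetical sample data, one test string per line.
tmpdir=$(mktemp -d)
printf 'abcccc\nbbcccc\naaccc\nabccccabcccc\nxabccccx\n4cccc\n' > "$tmpdir/file.txt"

# -c counts matching lines: lines that start with a letter.
alpha_count=$(grep -E -c '^[a-zA-Z]' "$tmpdir/file.txt")

# The pattern may match anywhere in the line.
any_count=$(grep -E -c 'a*b+c{4}' "$tmpdir/file.txt")

# Anchored with ^ and $, the whole line must be repetitions of the group.
exact_count=$(grep -E -c '^(a*b+c{4})+$' "$tmpdir/file.txt")

# -v prints the lines that do NOT match.
non_matching=$(grep -E -v 'a*b+c{4}' "$tmpdir/file.txt")

# -l prints only the names of files that contain a match.
files=$(grep -E -l 'a*b+c{4}' "$tmpdir"/*.txt)

echo "start with a letter: $alpha_count"
echo "contain the pattern: $any_count"
echo "consist only of the pattern: $exact_count"
echo "non-matching lines:"
echo "$non_matching"
echo "files with a match: $files"

rm -r "$tmpdir"
```

Comparing the unanchored and anchored counts shows the effect of ^ and $: a line such as xabccccx contains the pattern but does not consist only of it.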
3
regular expressions - grep
Which of the following lines are recognized by the regular expression ^a*b+c{2}?
bbbbcc
abbccc
aaabbbccc
aaabbbcccddd
bccbccbccbccbcc

1. University of Miami
2. Umbilical cord
3. U Miami
4. university of Miami
5. UM
6. Useless Men
7. university in Miami
What is the correct regular expression to extract all lines that contain 'University of Miami'?
grep -E '[Uu]*of' UM.txt
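One way to check an answer to the first exercise is to put the candidate lines in a file and let grep report which ones match. The file name lines.txt is an assumption made for this sketch; -n is a standard grep option that prefixes each matching line with its line number.

```shell
# Write the five candidate lines from the exercise to a scratch file.
tmpdir=$(mktemp -d)
printf 'bbbbcc\nabbccc\naaabbbccc\naaabbbcccddd\nbccbccbccbccbcc\n' > "$tmpdir/lines.txt"

# Show each matching line together with its line number.
matched=$(grep -E -n '^a*b+c{2}' "$tmpdir/lines.txt")
# Count how many lines match.
matched_count=$(grep -E -c '^a*b+c{2}' "$tmpdir/lines.txt")

echo "$matched"
echo "lines matched: $matched_count"

rm -r "$tmpdir"
```

The same approach works for the second exercise: save the seven phrases in UM.txt and compare what each candidate pattern prints.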
4
regular expressions - grep
int aDog;
int aDog ;
// int aCommentAboutADog;
double aBigDog;
int BadDog;
int dogWithNoTail
int aDog,aCat;
int aSpaceDog, aSpaceCat;
int aDog, aBadCat;
internationalDog;
int a#Dog;
int internetName; // fooo
What is the correct regular expression to extract all lines that contain a legal Java style integer definition?
grep -E '^([i ]+)(nt +[aiB][DaSn])' int.txt
5
cut, sort, wc
cut gets columns from a tab-delimited file
cut -f 1,2 file.txt
extracts the first two columns of file.txt
cut -f 1-3,5,6 file.txt > tmp.txt
extracts the first three, fifth and sixth columns of file.txt and writes them to tmp.txt
sort sorts lines from a file
sort file.txt
sorts the lines of file.txt
uniq -c file.txt
collapses adjacent repeated lines in file.txt and counts them (sort the file first so that all duplicates are adjacent)
wc counts lines, words and characters
wc file.txt
counts lines, words and characters in file.txt
wc -l file.txt
counts lines in file.txt
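The commands above can be sketched on a small tab-delimited file. The six-column sample data is invented for illustration; the file names file.txt and tmp.txt follow the slide.

```shell
# Hypothetical tab-delimited file with six columns and one repeated line.
tmpdir=$(mktemp -d)
printf 'b\t1\tx\tp\tq\tr\na\t2\ty\tp\tq\tr\nb\t1\tx\tp\tq\tr\n' > "$tmpdir/file.txt"

# First two columns.
first_two=$(cut -f 1,2 "$tmpdir/file.txt")

# Columns 1-3, 5 and 6, written to tmp.txt (no space inside the field list).
cut -f 1-3,5,6 "$tmpdir/file.txt" > "$tmpdir/tmp.txt"
tmp_first=$(head -n 1 "$tmpdir/tmp.txt")

# uniq only merges adjacent duplicates, so sort first.
counts=$(sort "$tmpdir/file.txt" | uniq -c)

# Reading from stdin makes wc print only the number; tr strips padding.
line_count=$(wc -l < "$tmpdir/file.txt" | tr -d ' ')

echo "$first_two"
echo "$tmp_first"
echo "$counts"
echo "lines: $line_count"

rm -r "$tmpdir"
```

Running uniq -c without sorting would report the two identical lines separately here, because another line sits between them.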
6
paste
paste concatenates files as columns
cut -f 1 file.txt > col1.txt
cut -f 2 file.txt > col2.txt
cut -f 3 file.txt > col3.txt
paste col1.txt col2.txt col3.txt
joins the files side by side, line by line, with a tab between columns
paste -d ',' col1.txt col2.txt col3.txt
joins the files side by side with , as delimiter
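A short sketch of the round trip described above: split a file into columns with cut, then rebuild it with paste. The contents of file.txt are made up for illustration.

```shell
# Hypothetical three-column, tab-delimited input.
tmpdir=$(mktemp -d)
printf 'a\t1\tx\nb\t2\ty\n' > "$tmpdir/file.txt"

# Split the file into one file per column.
cut -f 1 "$tmpdir/file.txt" > "$tmpdir/col1.txt"
cut -f 2 "$tmpdir/file.txt" > "$tmpdir/col2.txt"
cut -f 3 "$tmpdir/file.txt" > "$tmpdir/col3.txt"

# Join the columns again; the default delimiter is a tab.
tabbed=$(paste "$tmpdir/col1.txt" "$tmpdir/col2.txt" "$tmpdir/col3.txt")

# Same join, with a comma between columns.
comma=$(paste -d ',' "$tmpdir/col1.txt" "$tmpdir/col2.txt" "$tmpdir/col3.txt")

echo "$tabbed"
echo "$comma"

rm -r "$tmpdir"
```

With the default delimiter, the pasted result has the same content as the original file.txt.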
7
pipelines
A pipeline chains several commands together, using the output of each command as the input of the next one. Two commands are connected by placing the sign "|" between them.
cat *fasta | grep -c "^>"
counts the lines that start with > across all fasta files
cut -f 1 blast_sample.txt | sort -u | wc -l
counts the distinct values in the first column of blast_sample.txt
cut -f 1 blast_sample.txt | sort | uniq -c
counts how many times each value occurs in the first column of blast_sample.txt
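The three pipelines can be tried on tiny stand-in files. The sequences and the contents of blast_sample.txt below are invented for this sketch; only the commands come from the slide.

```shell
# Hypothetical FASTA files and a two-column, tab-delimited table.
tmpdir=$(mktemp -d)
printf '>s1\nACGT\n>s2\nGG\n' > "$tmpdir/a.fasta"
printf '>s3\nTT\n' > "$tmpdir/b.fasta"
printf 'q1\thitA\nq1\thitB\nq2\thitC\n' > "$tmpdir/blast_sample.txt"

# Count header lines across all FASTA files.
header_count=$(cat "$tmpdir"/*fasta | grep -c '^>')

# Number of distinct values in column 1 (tr strips any padding from wc).
distinct=$(cut -f 1 "$tmpdir/blast_sample.txt" | sort -u | wc -l | tr -d ' ')

# Occurrences of each value in column 1.
per_query=$(cut -f 1 "$tmpdir/blast_sample.txt" | sort | uniq -c)

echo "headers: $header_count"
echo "distinct queries: $distinct"
echo "$per_query"

rm -r "$tmpdir"
```

Piping through cat first makes grep -c report a single total instead of one count per file.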
8
Commands inside commands
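`` is used to run a command within a command: the output of the inner command is placed on the command line of the outer one
wc -l `grep -l int *`
takes the file names printed by grep and counts the number of lines in each of those files
grep -l int * | wc -l
but wouldn't that be equivalent? (It is not: the pipeline counts the file names that grep prints, not the lines inside those files.)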
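The difference between the two commands can be seen on a few scratch files. The file names and contents are assumptions made for this sketch.

```shell
# Three hypothetical files; only two of them contain "int".
tmpdir=$(mktemp -d)
printf 'int a;\nint b;\nfloat c;\n' > "$tmpdir/one.txt"
printf 'int x;\n' > "$tmpdir/two.txt"
printf 'double d;\n' > "$tmpdir/three.txt"

# Backticks: grep's output (file names) becomes wc's argument list,
# so wc counts the lines INSIDE each listed file and prints a total.
# The unquoted substitution relies on file names without spaces.
inside=$(wc -l `grep -l int "$tmpdir"/*`)

# Pipe: wc reads grep's output itself, so it counts the file NAMES.
names=$(grep -l int "$tmpdir"/* | wc -l | tr -d ' ')

echo "$inside"
echo "files containing int: $names"

rm -r "$tmpdir"
```

The form $(command) is an equivalent way to write `command` and is easier to nest.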
9
UNIX and the Internet
ping machine
checks if machine is reachable
talk user
allows you to chat with user
ssh
allows you to log in remotely to your account
scp machine1:file1 machine2:file2
allows you to copy file1 on machine1 to file2 on machine2