regular expressions - grep


Regular expressions describe sets of strings with patterns (not the same as globbing).

A normal character matches itself
. matches any single character
[<letters>] matches any one of the <letters>, which can also be given as a range
[^<letters>] matches any one character that is not one of the <letters>
? after a pattern makes it optional
+ after a pattern matches one or more repetitions
* after a pattern matches any number of repetitions, including none
{<N>} after a pattern matches exactly <N> repetitions
^ means the start of the line
$ means the end of the line
( ) around a regular expression make it one unit to which repetition and placement options can be applied

grep finds lines in files that match limited (basic) regular expressions, in which ?, +, { } and ( ) are taken as ordinary characters.

grep '^>' file.txt
    displays lines in file.txt that start with a >
grep -c '^+$' *fastq
    counts, for each fastq file, the lines that are composed of a single +
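As a rough illustration of the two grep commands above, the sketch below builds two small sample files in a temporary directory and runs the same patterns on them. The file contents and the name reads.fastq are made up for this example; only the patterns come from the slide.

```shell
# Hypothetical sample data in a temporary directory (not from the slides).
tmpdir=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$tmpdir/file.txt"
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > "$tmpdir/reads.fastq"

# Lines that start with > (the header lines of the sample file).
headers=$(grep '^>' "$tmpdir/file.txt")
echo "$headers"

# Number of lines consisting of a single +.
# Without -E, + is an ordinary character, so no escaping is needed.
plus_lines=$(grep -c '^+$' "$tmpdir/reads.fastq")
echo "$plus_lines"
```

With a single input file, grep -c prints only the number; with several files it prefixes each count with the file name.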

grep -E finds lines in files that match a full (extended) regular expression.

grep -E '^[a-zA-Z]' file.txt
    displays all lines in file.txt that start with an alphabetic character
grep -E "a*b+c{4}" *fastq
    displays lines in all fastq files that contain any number of a's followed by at least one b and 4 c's
grep -E '^(a*b+c{4})+$' file.txt
    looks for lines in file.txt that consist entirely of one or more repetitions of that a-b-c pattern

grep has some useful options
-c to count the number of matching lines
-l to list the names of files that contain a match
-v to list lines that don't match
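The difference between matching a pattern anywhere in a line and matching the whole line can be tried out with a sketch like the following. The five sample lines are invented for this example; the patterns and options are the ones listed above.

```shell
# Hypothetical sample lines (not from the slides).
tmpdir=$(mktemp -d)
printf 'abbcccc\nbcccc\nabc\nbccccabcccc\nxbccccx\n' > "$tmpdir/file.txt"

# Lines that contain a*b+c{4} somewhere.
contains=$(grep -E -c 'a*b+c{4}' "$tmpdir/file.txt")

# Lines made up of nothing but repetitions of a*b+c{4}.
whole=$(grep -E -c '^(a*b+c{4})+$' "$tmpdir/file.txt")

# -v inverts the match: lines that do not contain the pattern.
rest=$(grep -E -v 'a*b+c{4}' "$tmpdir/file.txt")

echo "contain: $contains, whole line: $whole, no match: $rest"
```

In this sample, xbccccx contains the pattern but is rejected once ^ and $ anchor it to the whole line.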

Which of the following lines are recognized by the regular expression ^a*b+c{2} ?

bbbbcc
abbccc
aaabbbccc
aaabbbcccddd
bccbccbccbccbcc

What is the correct regular expression to extract all lines that contain 'University of Miami'?

1. University of Miami
2. Umbilical cord
3. U Miami
4. university of Miami
5. UM
6. Useless Men
7. university in Miami

grep -E '[Uu]*of' UM.txt
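One way to check the two exercises is to put the lines into files and let grep answer. This is only a sketch: the file name abc.txt is hypothetical, and it assumes the numbers 1. to 7. are slide numbering rather than part of UM.txt.

```shell
# Exercise data copied from the slide into temporary files.
tmpdir=$(mktemp -d)
printf 'bbbbcc\nabbccc\naaabbbccc\naaabbbcccddd\nbccbccbccbccbcc\n' > "$tmpdir/abc.txt"
printf 'University of Miami\nUmbilical cord\nU Miami\nuniversity of Miami\nUM\nUseless Men\nuniversity in Miami\n' > "$tmpdir/UM.txt"

# The pattern is anchored only at the start, so anything may follow the two c's.
abc_matches=$(grep -E -c '^a*b+c{2}' "$tmpdir/abc.txt")
echo "$abc_matches"

# [Uu]* may match zero characters, so this selects every line containing "of".
of_lines=$(grep -E '[Uu]*of' "$tmpdir/UM.txt")
echo "$of_lines"
```

Because [Uu]* is allowed to match nothing, the second pattern works on this data only because no other line happens to contain "of".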

What is the correct regular expression to extract all lines that contain a legal Java style integer definition?

int aDog;
int aDog ;
// int aCommentAboutADog;
double aBigDog;
int BadDog;
int dogWithNoTail
int aDog,aCat;
int aSpaceDog, aSpaceCat;
int aDog, aBadCat;
internationalDog;
int a#Dog;
int internetName; // fooo

grep -E '^([i ]+)(nt +[aiB][DaSn])' int.txt
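The pattern on this slide can be tried against the listed lines as sketched below. The line breaks and single spaces in int.txt are an assumption, since the original slide layout is not preserved in the transcript.

```shell
# int.txt rebuilt from the slide; exact spacing and line breaks are assumed.
tmpdir=$(mktemp -d)
cat > "$tmpdir/int.txt" <<'EOF'
int aDog;
int aDog ;
// int aCommentAboutADog;
double aBigDog;
int BadDog;
int dogWithNoTail
int aDog,aCat;
int aSpaceDog, aSpaceCat;
int aDog, aBadCat;
internationalDog;
int a#Dog;
int internetName; // fooo
EOF

# Number of lines the slide's pattern accepts.
accepted=$(grep -E -c '^([i ]+)(nt +[aiB][DaSn])' "$tmpdir/int.txt")
echo "$accepted"

# -v shows the lines it rejects.
grep -E -v '^([i ]+)(nt +[aiB][DaSn])' "$tmpdir/int.txt"
```

The character classes [aiB][DaSn] are fitted to the names in this file, so the pattern is a puzzle solution for this data rather than a general test for Java identifiers.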

cut, sort, wc

cut gets columns from a tab-delimited file
cut -f 1,2 file.txt
    extracts the first two columns of file.txt
cut -f 1-3,5,6 file.txt > tmp.txt
    extracts the first three, fifth and sixth columns of file.txt and writes them to tmp.txt

sort sorts lines from a file
sort file.txt
    sorts the lines of file.txt
uniq -c file.txt
    collapses adjacent repeated lines in file.txt and prefixes each with its count; sort the file first so that all repeats are adjacent

wc counts lines, words and characters (bytes)
wc file.txt
    counts lines, words and characters in file.txt
wc -l file.txt
    counts lines in file.txt
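The commands on this slide can be tried on a small tab-delimited file, as in the sketch below. The file contents are invented for this example. It also shows why uniq is normally run on sorted input.

```shell
# Hypothetical tab-delimited sample with a repeated line that is not adjacent.
tmpdir=$(mktemp -d)
printf 'b\t1\tx\nb\t1\tx\na\t2\ty\nb\t1\tx\n' > "$tmpdir/file.txt"

# First two columns of every line.
first_two=$(cut -f 1,2 "$tmpdir/file.txt")
echo "$first_two"

# uniq only merges neighbouring duplicates, so the unsorted file yields 3 groups.
# tr strips the padding that some wc implementations put before the number.
unsorted_groups=$(uniq -c "$tmpdir/file.txt" | wc -l | tr -d ' ')

# After sorting, identical lines are adjacent and only 2 groups remain.
sorted_groups=$(sort "$tmpdir/file.txt" | uniq -c | wc -l | tr -d ' ')

# Total number of lines in the file.
line_count=$(wc -l < "$tmpdir/file.txt" | tr -d ' ')

echo "unsorted: $unsorted_groups, sorted: $sorted_groups, lines: $line_count"
```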

paste

paste concatenates files as columns

cut -f 1 file.txt > col1.txt
cut -f 2 file.txt > col2.txt
cut -f 3 file.txt > col3.txt

paste col1.txt col2.txt col3.txt
    joins the files line by line, side by side, separated by tabs
paste -d ',' col1.txt col2.txt col3.txt
    does the same with , as the delimiter
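A small round trip illustrates how cut and paste relate: splitting a file into columns and pasting them back should reproduce the original lines. The sample file contents are invented for this sketch.

```shell
# Hypothetical three-column, tab-delimited sample file.
tmpdir=$(mktemp -d)
printf 'a\t1\tx\nb\t2\ty\n' > "$tmpdir/file.txt"

# Split the file into one file per column.
cut -f 1 "$tmpdir/file.txt" > "$tmpdir/col1.txt"
cut -f 2 "$tmpdir/file.txt" > "$tmpdir/col2.txt"
cut -f 3 "$tmpdir/file.txt" > "$tmpdir/col3.txt"

# Join the columns again, tab-separated by default.
tabbed=$(paste "$tmpdir/col1.txt" "$tmpdir/col2.txt" "$tmpdir/col3.txt")

# Same join with a comma between the columns.
comma=$(paste -d ',' "$tmpdir/col1.txt" "$tmpdir/col2.txt" "$tmpdir/col3.txt")

echo "$tabbed"
echo "$comma"
```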

pipelines

A pipeline chains several commands, using the output of each command as the input of the next. Two commands are connected by placing the sign | between them.

cat *fasta | grep -c "^>"
    counts the lines that start with > across all fasta files
cut -f 1 blast_sample.txt | sort -u | wc -l
    counts the distinct values in the first column of blast_sample.txt
cut -f 1 blast_sample.txt | sort | uniq -c
    counts how often each value occurs in the first column of blast_sample.txt
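The three pipelines can be run on toy data as sketched below. The file names a.fasta and b.fasta and all file contents are hypothetical; blast_sample.txt is the name used on the slide, filled here with invented rows.

```shell
# Hypothetical sample files in a temporary directory.
tmpdir=$(mktemp -d)
printf '>s1\nACGT\n>s2\nGG\n' > "$tmpdir/a.fasta"
printf '>s3\nTT\n' > "$tmpdir/b.fasta"
printf 'q1\thitA\nq2\thitB\nq1\thitC\n' > "$tmpdir/blast_sample.txt"

# Header lines across all fasta files: cat merges them, grep -c counts once.
total_headers=$(cat "$tmpdir"/*fasta | grep -c '^>')

# Number of distinct values in column 1 (tr removes wc padding on some systems).
distinct=$(cut -f 1 "$tmpdir/blast_sample.txt" | sort -u | wc -l | tr -d ' ')

# Occurrences of each value in column 1.
per_query=$(cut -f 1 "$tmpdir/blast_sample.txt" | sort | uniq -c)

echo "headers: $total_headers, distinct queries: $distinct"
echo "$per_query"
```

Piping through cat gives one overall count; running grep -c directly on several files would print one count per file instead.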

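Commands inside commands

`` (backticks) run a command within a command: the output of the inner command is inserted into the outer command line.

wc -l `grep -l int *`
    takes the file names printed by grep -l and counts the lines in each of those files
grep -l int * | wc -l
    but wouldn't that be equivalent? No: here wc reads the list of file names itself, so it counts how many files contain int.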
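The difference between the two commands shows up on a toy directory, as in this sketch. The three .c files and their contents are made up for the example, and the file names contain no spaces, which the backtick form relies on.

```shell
# Hypothetical source files in a fresh temporary directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'int a;\nint b;\nchar c;\n' > one.c
printf 'int x;\n' > two.c
printf 'char y;\n' > three.c

# Pipe: wc counts the file names that grep -l prints, i.e. files containing int.
file_count=$(grep -l int * | wc -l | tr -d ' ')
echo "$file_count"

# Backticks: the file names become arguments of wc, which then counts
# the lines inside each of those files and adds a total line.
line_counts=$(wc -l `grep -l int *`)
echo "$line_counts"
```

The form $(grep -l int *) does the same as the backticks and is easier to nest.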

UNIX and the Internet

ping machine
    checks whether machine is reachable
talk user@machine
    lets you chat with user@machine
ssh user@machine
    lets you log in remotely to the account user on machine
scp machine1:file1 machine2:file2
    copies file1 on machine1 to file2 on machine2