LIS651 lecture 4 regular expressions Thomas Krichel 2006-04-22.

remember DOS?
DOS had the * character as a wildcard. If you said DIR *.EXE, it would list all the files ending with .EXE. Thus the * wildcard would mean all characters except the dot. Similarly, you could say DEL *.* to delete all your files.

regular expression
A regular expression is nothing but a fancy wildcard. There are various flavours of regular expressions.
– We will be using POSIX regular expressions here. They themselves come in two flavours, old-style and extended. We study the extended flavour here.
– Perl regular expressions are more powerful and more widely used.
POSIX regular expressions are accepted by both PHP and mySQL. Details are to follow.

pattern
The regular expression describes a pattern of characters. Patterns are common in other circumstances.
– Query: Krichel Thomas in Google
– Query: "Thomas Krichel" in Google
– Dates are of the form yyyy-mm-dd.

pattern matching
We say that a regular expression matches the string if an instance of the pattern described by the regular expression can be found in the string. Saying that it "matches in" the string may make this a little clearer. Sometimes people also say that the string matches the regular expression. I am confused.

metacharacters
Instead of just giving the star * special meaning, in a regular expression all the following have special meaning:
\ ^ $ . | ( ) * + { } ? [ ]
Collectively, these characters are known as metacharacters. They don't stand for themselves but they mean something else. For example DEL *.EXE does not mean: delete the file "*.EXE". It means delete anything ending with .EXE.

metacharacters
We are already somewhat familiar with metacharacters.
– In XML < means the start of an element. To use < literally, you have to write &lt;
– In PHP the "\n" does not mean backslash and then n. It means the newline character.

simple regular expressions
Characters that are not metacharacters simply mean themselves.
good     does not match in   Good Beer
d B      matches in          Good Beer
dB       does not match in   Good Beer
'Beer '  does not match in   Good Beer   (the pattern ends with a space)
If there are several matches, the pattern will match at the first occurrence.
o        matches in          Good Beer   (at the first o)

the backslash \ quote
If you want to match a metacharacter in the string, you have to quote it with the backslash.
a 6+ pack    does not match in   a 6+ pack
a 6\+ pack   does match in       a 6+ pack
\            does not match in   a \ against boozing
\\           does match in       a \ against boozing
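The literal and quoted patterns above can be tried at a bash prompt. This is only a sketch: it assumes grep -E, which reads POSIX extended regular expressions, as a stand-in for the PHP functions introduced later, and the helper name matches_in is made up for this illustration.

```shell
# matches_in REGEX STRING (hypothetical helper)
# prints "yes" if REGEX matches somewhere in STRING, "no" otherwise
matches_in() {
  if printf '%s\n' "$2" | LC_ALL=C grep -Eq -e "$1"; then
    echo yes
  else
    echo no
  fi
}

matches_in 'good' 'Good Beer'         # no: matching is case-sensitive
matches_in 'd B' 'Good Beer'          # yes
matches_in 'dB' 'Good Beer'           # no: the space is missing
matches_in 'a 6+ pack' 'a 6+ pack'    # no: + means "one or more 6"
matches_in 'a 6\+ pack' 'a 6+ pack'   # yes: the backslash quotes the +
```

The patterns are in single quotes so that the shell passes the backslash through to grep untouched.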

other characters to be quoted
Certain non-metacharacters also need to be quoted. These include some of the usual suspects:
– \n the newline
– \r the carriage return
– \t the tabulation character
But this quoting occurs by virtue of PHP, it is not part of the regular expression. Remember Sandfords law.

anchor metacharacters ^ and $
^ matches at the beginning of the string. $ matches at the end of the string.
keeper    matches in          beerkeeper
keeper$   matches in          beerkeeper
^keeper   does not match in   beerkeeper
^$        matches in          the empty string
Note that in a PHP double-quoted string an expression starting with $ will be replaced by the variable's string value (or nothing if the variable has not been set).
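The anchor examples can be checked the same way. Again a sketch, assuming grep -E as a stand-in for PHP's matching and a made-up helper called matches_in. Note that bash has the same $ problem as PHP: inside double quotes it would try to expand variables, so the patterns are kept in single quotes.

```shell
# matches_in REGEX STRING (hypothetical helper)
# prints "yes" if REGEX matches somewhere in STRING, "no" otherwise
matches_in() {
  if printf '%s\n' "$2" | LC_ALL=C grep -Eq -e "$1"; then
    echo yes
  else
    echo no
  fi
}

matches_in 'keeper' 'beerkeeper'    # yes: found anywhere in the string
matches_in 'keeper$' 'beerkeeper'   # yes: the string ends with keeper
matches_in '^keeper' 'beerkeeper'   # no: the string starts with beer
matches_in '^$' ''                  # yes: the empty string
```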

character classes
We can define a character class by grouping a list of characters between [ and ].
b[ie]er       matches in   beer
b[ie]er       matches in   bier
[Bb][ie]er    matches in   Bier
Within a class, metacharacters need not be escaped. In the class only -, ] and ^ are metacharacters.

- in the character class
Within a character class, the dash - becomes a metacharacter. You can use it to give a range, according to the sequence of characters in the character set you are using. It's usually alphabetic.
be[a-e]r   matches in          beer
be[a-e]r   matches in          becr
be[a-e]r   does not match in   befr
If the dash - is the last character in the class, it is treated like an ordinary character.

] in the character class
] gives you the end of the class. But if you put it first, it is treated like an ordinary character, because having it there otherwise would create an empty class, and that would make no sense.
be[],]r   matches in   be]r

^ in the character class
If the caret ^ appears as the first element in the class, it negates the characters mentioned.
be[^i]r      matches in          beer
b[^ie]er     does not match in   bier
be[^a-e]r    does match in       befr
be[e^]r      matches in          beer
beer[^6-9]   matches             beer0 to beer5
Otherwise, it is an ordinary character.
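Here is a sketch of the character class examples, run through grep -E at the bash prompt. The helper matches_in is made up for this illustration, and LC_ALL=C is my assumption to keep the range a-e plainly alphabetic.

```shell
# matches_in REGEX STRING (hypothetical helper)
# prints "yes" if REGEX matches somewhere in STRING, "no" otherwise
# LC_ALL=C makes ranges like [a-e] follow plain ASCII order
matches_in() {
  if printf '%s\n' "$2" | LC_ALL=C grep -Eq -e "$1"; then
    echo yes
  else
    echo no
  fi
}

matches_in 'b[ie]er' 'bier'      # yes
matches_in '[Bb][ie]er' 'Bier'   # yes
matches_in 'be[a-e]r' 'becr'     # yes: c lies in the range a-e
matches_in 'be[a-e]r' 'befr'     # no: f lies outside the range
matches_in 'be[],]r' 'be]r'      # yes: ] placed first is an ordinary character
matches_in 'b[^ie]er' 'bier'     # no: the class excludes i and e
matches_in 'be[e^]r' 'beer'      # yes: ^ is ordinary when it is not first
```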

standard character classes
The following predefined classes exist:
[:alnum:]   any alphanumeric characters
[:digit:]   any digits
[:punct:]   any punctuation characters
[:alpha:]   any alphabetic characters (letters)
[:graph:]   any graphic characters
[:space:]   any space character (blank and \n, \r)
[:blank:]   any blank character (space and tab)
[:lower:]   any lowercase character

standard character classes
[:upper:]    any uppercase character
[:cntrl:]    any control character
[:print:]    any printable character
[:xdigit:]   any character for a hex number
They are locale and operating system dependent. With this discussion we leave character classes.
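A predefined class is written inside a bracket expression, so in a pattern it appears with double brackets, as in [[:digit:]]. A small sketch with grep -E follows; the helper matches_in is made up for this illustration, and the C locale is my assumption so that only ASCII characters count.

```shell
# matches_in REGEX STRING (hypothetical helper)
# prints "yes" if REGEX matches somewhere in STRING, "no" otherwise
matches_in() {
  if printf '%s\n' "$2" | LC_ALL=C grep -Eq -e "$1"; then
    echo yes
  else
    echo no
  fi
}

matches_in '^[[:digit:]]+$' '2006'                  # yes: digits only
matches_in '^[[:digit:]]+$' '2006a'                 # no: a is not a digit
matches_in '[[:upper:]][[:lower:]]+' 'good Beer'    # yes: at Beer
matches_in '[[:blank:]]' 'Good Beer'                # yes: the space
matches_in '[[:punct:]]' 'Good Beer'                # no: no punctuation
```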

The period . metacharacter
The period matches any character bar the newline \n. The reason why the \n is not counted is historic. In olden days matching was done line by line, because computers did not have as much memory.
.     does not match in   the empty string
^.$   does not match in   "\n"
^.$   matches in          a

alternative operator |
This acts like an or.
beer|wine   matches in   beer
beer|wine   matches in   wine
Alternatives are performed last, i.e. each alternative takes in as much of the surrounding expression as it can.

grouping with ( )
You can use ( ) to group.
(beer|wine) (glass|)        matches in   beer glass
(beer|wine) (glass|)        matches in   wine glass
(beer|wine) (glass|)        matches in   beer followed by a space
(beer|wine) (glass|)        matches in   wine followed by a space
(beer|wine) (glass(es|)|)   matches in   beer glasses
Yes, groups can be nested.
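To see why grouping matters, compare an alternation with and without parentheses. This is a sketch using grep -E; the helper matches_in and the test string "beer mat" are made up for this illustration.

```shell
# matches_in REGEX STRING (hypothetical helper)
# prints "yes" if REGEX matches somewhere in STRING, "no" otherwise
matches_in() {
  if printf '%s\n' "$2" | LC_ALL=C grep -Eq -e "$1"; then
    echo yes
  else
    echo no
  fi
}

matches_in 'beer|wine' 'wine'                   # yes
matches_in '^(beer|wine) glass$' 'wine glass'   # yes
matches_in '^(beer|wine) glass$' 'beer mat'     # no: glass is required
# without the group the alternatives are "^beer" and "wine glass$"
matches_in '^beer|wine glass$' 'beer mat'       # yes: "^beer" matches
```

The last line shows the alternative operator being applied last: each side of | swallows everything up to the end of the expression unless a group stops it.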

repetition operators
* means zero or more times what precedes it.
+ means one or more times what precedes it.
? means zero or one time what precedes it.
The shortest preceding expression is used, i.e. either a single character or a group.
(beer )*   matches in          the empty string
(beer )?   matches in          the empty string
(beer )+   matches in          beer beer beer
be+r       matches in          beer
be+r       does not match in   bebe

enumeration
We can use {min,max} to give a minimum min and a maximum max. min and max are non-negative integers.
be{1,3}r   matches in          ber
be{1,3}r   matches in          beer
be{1,3}r   matches in          beeer
be{1,3}r   does not match in   beeeer
? is just a shorthand for {0,1}
+ is just a shorthand for {1,}
* is just a shorthand for {0,}
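The repetition operators and the {min,max} form can be tried with grep -E as well. A sketch only; the helper matches_in is made up for this illustration, and I anchor some patterns with ^ and $ so that the repetition has to account for the whole string.

```shell
# matches_in REGEX STRING (hypothetical helper)
# prints "yes" if REGEX matches somewhere in STRING, "no" otherwise
matches_in() {
  if printf '%s\n' "$2" | LC_ALL=C grep -Eq -e "$1"; then
    echo yes
  else
    echo no
  fi
}

matches_in 'be+r' 'beer'                  # yes: two e's
matches_in 'be+r' 'bebe'                  # no: no r after the e
matches_in 'be{1,3}r' 'beeer'             # yes: three e's
matches_in 'be{1,3}r' 'beeeer'            # no: four e's is too many
matches_in '^(beer )*$' ''                # yes: zero repetitions
matches_in '^(beer )+$' 'beer beer beer ' # yes: three repetitions
matches_in '^(beer|wine) glass(es)?$' 'beer glasses'   # yes: optional es
```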

examples
US zip code
^[0-9]{5}(-[0-9]{4})?$
something like a current date in ISO form
^(20[0-9]{2})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$
Something like a Palmer School course code
((DIS[89])|(LIS[5-9]))[0-9]{2}
Something like an XML tag
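A quick way to gain confidence in patterns like these is to run a few good and bad strings through them. The sketch below uses grep -E with a made-up helper matches_in. The sample strings are invented, the date pattern is written so that it also accepts the days 01 to 09, and the course code pattern is written with balanced parentheses.

```shell
# matches_in REGEX STRING (hypothetical helper)
# prints "yes" if REGEX matches somewhere in STRING, "no" otherwise
matches_in() {
  if printf '%s\n' "$2" | LC_ALL=C grep -Eq -e "$1"; then
    echo yes
  else
    echo no
  fi
}

zip='^[0-9]{5}(-[0-9]{4})?$'
iso_date='^(20[0-9]{2})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'
course='((DIS[89])|(LIS[5-9]))[0-9]{2}'

matches_in "$zip" '12345'            # yes
matches_in "$zip" '12345-6789'       # yes: with the optional extension
matches_in "$zip" '1234'             # no: only four digits
matches_in "$iso_date" '2006-04-22'  # yes
matches_in "$iso_date" '2006-13-01'  # no: there is no month 13
matches_in "$course" 'LIS651'        # yes
matches_in "$course" 'LIS451'        # no: 4 is outside 5-9
```

Note that the date pattern only checks the shape: it would still accept 2006-02-31.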

not using POSIX regular expressions
Do not use regular expressions when you want to accomplish a simple task for which there is a special PHP function already available. A special PHP function will usually do the specialized task more easily. Parsing and understanding the regular expression takes the machine time.

ereg()
ereg(regex, string) searches for the pattern described in regex within the string string. It returns false if no match was found. If you call the function as ereg(regex, string, matches), the matches will be stored in the array matches. Thus matches will be a numeric array of the parts of the string that matched the groups (something in ()) of the pattern. $matches[0] holds the whole match, and the first group match will be $matches[1].

ereg_replace
ereg_replace(regex, replacement, string) searches for the pattern described in regex within the string string and replaces occurrences with replacement. It returns the replaced string. If replacement contains expressions of the form \\number, where number is an integer between 1 and 9, the text matched by the number-th sub-expression is used.
$better_order=ereg_replace('glass of (Karlsberg|Bruch)', 'pitcher of \\1',$order);
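The same idea of a replacement that reuses a matched group can be tried at the bash prompt. This is a shell analogue, not the PHP function: it assumes sed -E, which reads POSIX extended regular expressions, and writes the back reference as \1. The g flag asks sed to replace every occurrence on a line. The sample order is invented.

```shell
# a made-up order to work on
order='one glass of Karlsberg and one glass of Bruch'

# \1 stands for whatever the first group (Karlsberg|Bruch) matched
better_order=$(printf '%s\n' "$order" |
  sed -E 's/glass of (Karlsberg|Bruch)/pitcher of \1/g')

echo "$better_order"
# prints: one pitcher of Karlsberg and one pitcher of Bruch
```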

split()
split(regex, string, [max]) splits the string string at the occurrences of the pattern described by the regular expression regex. It returns an array. The matched pattern is not included. If the optional argument max is given, it sets the maximum number of elements in the returned array. The last element then contains the unsplit rest of the string string. Use explode() if you are not splitting at a regular expression pattern. It is faster.

case-insensitive functions
eregi() does the same as ereg() but works case-insensitively.
eregi_replace() does the same as ereg_replace() but works case-insensitively.
spliti() does the same as split() but works case-insensitively.

regular expressions in mySQL
You can use POSIX regular expressions in mySQL in the SELECT command
SELECT … WHERE column REGEXP regex
where column is the column to test and regex is a regular expression.

communication with wotan
For file editing and manipulation, we use putty. For file transfer, we use winscp. Both are available on the web. The protocol is ssh, the secure shell, based on public-key cryptography.

installing putty
Go to your favorite search engine to search for putty. If you have administrator rights, install the installer version. Since you have already installed winscp, you should have no further problems.

putty options
In the window/translation options, always choose UTF-8. Find out what screen size works for the font that you are using, and save that in your session. For wotan, the port is 22, ssh. You can choose to disable the annoying bell.

issuing commands
While you are logged in, you talk to the computer by issuing commands. Your commands are read by a command line interpreter. The command line interpreter is called a shell. You are using the Bourne Again Shell, bash.

bash features
bash allows you to browse the command history with the up/down arrow keys.
bash allows you to edit commands with the left/right arrow keys.
exit is the command to leave the shell.

files, directories and links
Files are continuous chunks of data on disks that are required for software applications. Directories are files that contain other files. Microsoft calls them folders. In UNIX, the directory separator is /
The top directory is / on its own.

home directory
When you first log in to wotan you are placed in your home directory /home/username
cd on its own is the command that gets you back to the home directory.
The home directory is also abbreviated as ~
cd ~user gets you to the home of user user.
cd ~ does what?

~/public_html
This is your web directory. I created it with mkdir public_html in your home directory. The web server on wotan maps a request for a user's web address to the file ~user/public_html/index.html, and a request for a file below that address to the file ~user/public_html/file. The server will do this by virtue of a configuration option.

changing directory, listing files
cd directory changes into the directory directory.
The current directory is . (a single dot).
Its parent directory is .. (two dots).
ls lists files.

users and groups
root is the user name of the superuser. The superuser has all privileges. There are other physical users, i.e. persons using the machine. There are users that are virtual, usually created to run a daemon. For example, the web server is run by a user www-data. Arbitrary users can be put together in groups.

permission model
Permissions on files are given
– to the owner of the file
– to the group of the file
– and to the rest of the world
A group is a grouping of users. Unix allows you to define any number of groups and make users members of them. The rest of the world are all other users who have access to the system. That includes www-data!

listing files
ls lists files.
ls -l makes a long listing. It contains
– elementary type and permissions (see next slide)
– owner
– group
– size
– date
– name

first element in ls -l
Type indicator
– d means directory
– l means link
– - means ordinary file
3 letters for permission of owner
3 letters for permission of group
3 letters for permission of rest of the world
r means read, w means write, x means execute
Directories need to be executable to get in them…

change permission: chmod
usage: chmod permission file
file is a file.
permission is three numbers, for owner, group and rest of the world. Each number is a sum of elementary numbers
– 4 is read
– 2 is write
– 1 is execute
– 0 means no permission.
Example: chmod 764 file
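To see how the three numbers turn into the letters shown by ls -l, here is a small sketch you can run in bash. It works in a throwaway directory made with mktemp, which is my choice for the illustration, so nothing in your home directory is touched.

```shell
# work in a scratch directory so no real files are affected
tmpdir=$(mktemp -d)
touch "$tmpdir/file"

# owner 7 = 4+2+1 (rwx), group 6 = 4+2 (rw-), rest of the world 4 (r--)
chmod 764 "$tmpdir/file"

# the first ten characters of ls -l are the type indicator and the permissions
perms=$(ls -l "$tmpdir/file" | cut -c1-10)
echo "$perms"    # prints: -rwxrw-r--

rm -r "$tmpdir"
```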

general structure of commands
commandname -flag --option
where commandname is the name of a command.
A flag can be a letter. Several letters set several flags at the same time.
An option can also be expressed with -- and a word. This is more user-friendly than flags.

example command: ls
ls lists files.
ls -l makes a long listing.
ls -a lists all files, not only regular files but hidden files as well
– all files that start with a dot are hidden
ls -la lists all files in a long listing.
ls --all is the same as ls -a. --all is known as a long option.

copying and removing files
cp file copyfile copies file file to file copyfile. If copyfile is a directory, it copies into the directory.
mv file movedfile moves file file to file movedfile. If movedfile is a directory, it moves into the directory.
rm file removes file. There is no recycling bin!!

directories and files
mkdir directory makes a directory.
rmdir directory removes an empty directory.
rm -r directory removes a directory and all its files.
more file
– pages contents of file, no way back
less file
– pages contents of file, u to go back, q to quit

soft links
A link is a file that contains the address of another file. Microsoft calls it a shortcut. A soft link can be created with the command
ln -s file link_to_file
where file is a file that is already there and link_to_file is the link.
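A short sketch of what a soft link does, again in a scratch directory made with mktemp so that nothing real is touched. The file content hello is invented for the illustration.

```shell
# scratch directory for the experiment
tmpdir=$(mktemp -d)
echo 'hello' > "$tmpdir/file"

# the link stores the address "file", looked up relative to the link's directory
ln -s file "$tmpdir/link_to_file"

target=$(readlink "$tmpdir/link_to_file")   # the address stored in the link
content=$(cat "$tmpdir/link_to_file")       # reading the link reads the file
echo "$target"     # prints: file
echo "$content"    # prints: hello

rm -r "$tmpdir"
```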

file transfer
You can use winscp to upload and download files to wotan. If uploaded files in the web directory remain invisible, that is most likely a problem with permissions. Refer back to permissions.
chmod 644 * will put it right for the files.
chmod 755 . (yes, with a dot) will put it right for the current directory.
* is a wildcard for all files.
rm -r * is a command to avoid.
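The permission repair can be rehearsed safely before doing it for real. In this sketch a scratch directory stands in for ~/public_html, and the file names and the starting permission 600 are invented to simulate an upload that the web server cannot read.

```shell
# scratch directory standing in for ~/public_html
webdir=$(mktemp -d)
touch "$webdir/index.html" "$webdir/style.css"
chmod 600 "$webdir"/*        # simulate files only the owner can read

# the repair, run from inside the directory
( cd "$webdir" && chmod 644 * && chmod 755 . )

file_perms=$(ls -l "$webdir/index.html" | cut -c1-10)
dir_perms=$(ls -ld "$webdir" | cut -c1-10)
echo "$file_perms"    # prints: -rw-r--r--
echo "$dir_perms"     # prints: drwxr-xr-x

rm -r "$webdir"
```

After the repair the rest of the world, which includes www-data, can read the files and get into the directory.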

editing
There is a plethora of editors available. For the neophyte, nano works best.
nano file edits the file file.
nano -w switches off line wrapping.
nano shows the commands available at the bottom of the screen. Note that ^letter, where letter is a letter, means pressing CONTROL and the letter letter at the same time.

emacs
This is another editor that is incredibly featureful and complex. Written by Richard M. Stallman, of GNU and GPL fame. Get an emacs cheat sheet off the web before you start it. Or look at the next slide.

emacs commands (here ^ stands for the control character)
^x^s saves buffer
^x^c exits emacs
^g escapes out of a troublesome situation
control+space sets the mark
^w removes until the mark (cut)
^y pastes

common emacs/bash commands
^k kills until the end of the line or removes an empty line
^y yanks what has been killed (paste)
^a gets to the beginning of the line
^e gets to the end of the line

emacs modes
Just like people get into different moods, emacs gets into different modes. One mode that will split your pants is the PHP mode. Type emacs file.php to edit the file file.php in PHP mode. Then look how emacs checks for completion of parentheses, braces, brackets, and the ; and use the tab character to indent.

copy and paste
Putty allows you to copy and paste text between windows and wotan. On the windows machine, it uses the windows approach to copy and paste. On the wotan machine,
– you copy by highlighting with the left mouse button
– you paste using the middle button
– if you don't have a middle button, use left and right together

running mySQL
You can run mySQL in command line mode on wotan. Type
mysql -u user -p
You will then be prompted for your password. The username and password are your mySQL user name and mySQL password, not your wotan user name and wotan password. Don't forget the semicolon after each command!

Thank you for your attention! Please switch off machines b4 leaving!