
Digital Text and Data Processing: Tokenisation

Today's class
- Tokenisation and creation of frequency lists
- Keyword in context lists
- Moretti and distant reading
- Research projects and assignment 1

Revision
- Regular expressions
  - Simple sequences of characters
  - Character classes, e.g. \w, \d or . (any character)
  - Quantifiers, e.g. {2,4} or ?, +, *
  - Anchors, e.g. \b, ^, $
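The following minimal sketch uses an invented example sentence to exercise each element from the revision list. It combines a character class with a quantifier (\d{4}), the ^ and $ anchors, and \b word boundaries. The sentence and variable names are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical example sentence
my $text = "In 1922 Joyce published Ulysses; in 1939, Finnegans Wake.";

# Character class \d plus quantifier {4}: exactly four digits in a row
my @years = ( $text =~ /(\d{4})/g );
print "Years: @years\n";

# Anchor ^ ties the match to the start of the string
print "Starts with 'In'\n" if $text =~ /^In\b/;

# \b word boundaries stop "in" from matching inside "Finnegans";
# the "() =" idiom counts all matches found by /g
my $count = () = $text =~ /\bin\b/gi;
print "Standalone 'in': $count\n";

# + means "one or more"; $ anchors the match to the end of the string
my ($last) = $text =~ /(\w+)\.$/;
print "Last word: $last\n";
```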

Match variables
- Parentheses create substrings within a regular expression
- In Perl, the first such substring is stored in the variable $1
- Example:

      my $keyword = "quick-thinking" ;
      if ( $keyword =~ /(\w+)-\w+/ ) {
          print $1 ; # This will print "quick"
      }
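Each additional pair of parentheses gets the next numbered variable ($2, $3, ...). A match in list context also returns all captured substrings at once. The variable names $first and $second below are illustrative.

```perl
use strict;
use warnings;

my $keyword = "quick-thinking";

# Two capture groups: the first fills $1, the second fills $2
if ( $keyword =~ /(\w+)-(\w+)/ ) {
    print "First part: $1\n";
    print "Second part: $2\n";
}

# In list context the match returns the captured substrings directly
my ( $first, $second ) = $keyword =~ /(\w+)-(\w+)/;
print "$first / $second\n";
```

Test the match in an if before using $1. If a match fails, $1 keeps its value from the last successful match.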

Three types of variables
- Scalars: a single value; start with $
- Arrays: multiple values; start with @, e.g. @… = ("Ulysses", "Dubliners", "Finnegans Wake") ;
- Hashes: multiple values which can be referenced with 'keys'; start with %, e.g. %isbn ; $isbn{" "} = "Ulysses" ;
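Here is a short sketch declaring one variable of each type. The names $author, @titles and %year are hypothetical. Note that a single element of an array or hash is itself a scalar, so it is written with a $ sigil.

```perl
use strict;
use warnings;

my $author = "James Joyce";                                  # scalar: one value
my @titles = ( "Ulysses", "Dubliners", "Finnegans Wake" );   # array: ordered list
my %year   = (                                               # hash: key => value pairs
    "Dubliners"      => 1914,
    "Ulysses"        => 1922,
    "Finnegans Wake" => 1939,
);

# Single elements are scalars, hence $titles[0] and $year{...}
print "$author wrote $titles[0] ($year{$titles[0]})\n";
print "Titles listed: ", scalar(@titles), "\n";
```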

Basic tokenisation

    my $line = "If music be the food of love, play on" ;
    my @array = split( " ", $line ) ;
    # $array[0] contains "If"
    # $array[4] contains "food"
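The string " " as the first argument to split is a special case. It skips leading whitespace and splits on any run of whitespace. A pattern such as /\s+/ instead produces an empty first field when the line starts with spaces. The sketch below adds leading and doubled spaces to the slide's line to show the difference. It also shows that punctuation stays attached to tokens ("love,").

```perl
use strict;
use warnings;

# The slide's line, with extra spaces added for illustration
my $line = "  If music be the food of love,  play on";

my @awk   = split( " ", $line );     # special case: leading whitespace ignored
my @plain = split( /\s+/, $line );   # leading whitespace gives an empty first field

print scalar(@awk), " tokens\n";
print scalar(@plain), " fields, first is '$plain[0]'\n";
print "Token 6: $awk[6]\n";          # punctuation is still attached
```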

Looping through an array

    foreach my $w ( @array ) {
        print $w . "\n" ;
    }
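When the position of each token matters, you can loop over the indices instead. $#array is the index of the last element. The sketch below shows both styles; the @lengths array is a hypothetical addition.

```perl
use strict;
use warnings;

my @array = split( " ", "If music be the food of love, play on" );

# 0 .. $#array covers every index from first to last
for my $i ( 0 .. $#array ) {
    print "$i\t$array[$i]\n";
}

# foreach gives each element in turn; here we record token lengths
my @lengths;
foreach my $w (@array) {
    push @lengths, length($w);
}
print "@lengths\n";
```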

Creating a hash; assigning / updating a value

    my %freq ;
    $freq{"if"}++ ;
    $freq{"music"}++ ;
    print $freq{"if"} . "\n" ;
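Applying ++ to a key that does not exist yet treats the missing value as 0, so the first increment stores 1. The sketch below also uses exists to check for a key without creating it, and counts the distinct keys. The word "love" is an arbitrary unseen example.

```perl
use strict;
use warnings;

my %freq;
$freq{"if"}++;
$freq{"music"}++;
$freq{"music"}++;    # second occurrence: 1 becomes 2

# exists tests whether a key has been assigned
for my $word ( "music", "love" ) {
    if ( exists $freq{$word} ) {
        print "$word: $freq{$word}\n";
    }
    else {
        print "$word: not seen\n";
    }
}

# keys in scalar context gives the number of distinct keys
my $distinct = scalar( keys %freq );
print "Distinct words: $distinct\n";
```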

Calculation of frequencies

    my %freq ;
    foreach my $w ( @array ) {
        $freq{ $w }++ ;
    }
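Here is a self-contained version of the frequency count, run on a made-up sentence. One change is an assumption, not part of the slide: each token is lowercased with lc, so that "The" and "the" are counted as the same word.

```perl
use strict;
use warnings;

# Hypothetical sample text
my $text  = "The cat saw the dog and the dog saw the cat";
my @array = split( " ", $text );

my %freq;
foreach my $w (@array) {
    $freq{ lc $w }++;    # lc folds "The" and "the" into one entry
}

foreach my $f ( sort keys %freq ) {
    print $f . "\t" . $freq{$f} . "\n";
}
```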

Looping through a hash

    foreach my $f ( keys %freq ) {
        print $f . "\t" . $freq{$f} . "\n" ;
    }
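keys returns a hash's keys in no particular order, and that order can differ between runs. Sorting the keys gives reproducible output. The small %freq below is illustrative data.

```perl
use strict;
use warnings;

# Illustrative counts
my %freq = ( "on" => 1, "music" => 1, "if" => 2 );

# sort with no block compares strings, giving alphabetical order
my @alphabetical = sort keys %freq;
foreach my $f (@alphabetical) {
    print $f . "\t" . $freq{$f} . "\n";
}
```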

Sorting a hash (by frequency, highest first: putting $b before $a gives descending order)

    foreach my $f ( sort { $freq{$b} <=> $freq{$a} } keys %freq ) {
        print $f . "\t" . $freq{$f} . "\n" ;
    }
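Words with the same frequency come out in an unpredictable order. A common remedy, added here as a suggestion rather than taken from the slide, is a secondary comparison with cmp after ||. The counts are made up; @top2 shows how to take the N most frequent words with an array slice.

```perl
use strict;
use warnings;

# Illustrative counts with a tie between "cat" and "dog"
my %freq = ( "the" => 4, "cat" => 2, "dog" => 2, "and" => 1 );

# Primary key: frequency, descending ($b before $a).
# Secondary key: the word itself, ascending, applied only when frequencies tie.
my @ranked = sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq;

foreach my $f (@ranked) {
    print $f . "\t" . $freq{$f} . "\n";
}

# The first N entries of the sorted list are the N most frequent words
my @top2 = @ranked[ 0 .. 1 ];
```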

But she returned to the writing-table, observing, as she passed her son, "Still page 322?" Freddy snorted, and turned over two leaves. For a brief space they were silent. Close by, beyond the curtains, the gentle murmur of a long conversation had never ceased.
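Splitting this passage on whitespace shows why raw tokens are not always words. The sketch below uses only the opening of the passage. It lists every token that contains a non-word character (\W), such as a comma, hyphen, quotation mark or question mark. q{} quotes the text so that the inner double quotes need no escaping.

```perl
use strict;
use warnings;

# Opening of the sample passage
my $passage = q{But she returned to the writing-table, observing, as she passed her son, "Still page 322?"};

my @tokens = split( " ", $passage );

# Tokens that carry punctuation or other non-word characters
my @messy = grep { /\W/ } @tokens;
print "$_\n" for @messy;
```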

Is it actually a word?

    foreach my $w ( @array ) {
        # keep only the run of word characters, e.g. "son," is counted as "son"
        if ( $w =~ /(\w+)/ ) {
            $freq{ $1 }++ ;
        }
    }
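A different approach, swapped in here and not the method on the slide, skips split and pulls words straight from the text with a global match. In this sketch a word is assumed to be one or more ASCII letters, optionally joined by hyphens. As a result, "writing-table" stays whole and the number "322" is excluded; \w would keep the digits. The pattern misses accented letters and apostrophes, so treat it as a starting point only.

```perl
use strict;
use warnings;

my $passage = q{But she returned to the writing-table, observing, as she passed her son, "Still page 322?"};

my %freq;
# In scalar context, /g resumes each match where the previous one ended
while ( $passage =~ /([A-Za-z]+(?:-[A-Za-z]+)*)/g ) {
    $freq{ lc $1 }++;
}

foreach my $f ( sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq ) {
    print $f . "\t" . $freq{$f} . "\n";
}
```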