Text Operations. Liqi Gao, R&D Group (Development, People First, Communication, Creating Value).



Utilities: C/C++ library; Perl (ActivePerl); regular expressions; EditPlus / UltraEdit; Excel

C/C++ Language
Standard library: read a line; remove a CR or LF; split a line
C++ Boost library: case conversion; trimming; replace algorithm; finding algorithm; split

C/C++: Read a Line. Though it's simple, it's useful! Three methods:
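
The three methods themselves are not reproduced in this transcript. As a rough sketch only, three common ways to read one line in C and C++ are shown here; the function names are hypothetical and the choice of these three is an assumption, not necessarily what the slide listed.

```cpp
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Method 1 (C style): fgets into a fixed-size buffer.
// The trailing newline, if it fit, stays in the result.
bool read_line_fgets(std::FILE* fp, std::string& out) {
    char buf[1024];
    if (std::fgets(buf, sizeof buf, fp) == nullptr) return false;
    out = buf;
    return true;
}

// Method 2 (C++): std::getline into a std::string.
// The newline is consumed and dropped; there is no length limit.
bool read_line_getline(std::istream& in, std::string& out) {
    return static_cast<bool>(std::getline(in, out));
}

// Method 3 (C++): the istream::getline member with a char buffer.
// The newline is consumed and dropped; a line longer than the
// buffer puts the stream into a failed state.
bool read_line_member(std::istream& in, std::string& out) {
    char buf[1024];
    if (!in.getline(buf, sizeof buf)) return false;
    out = buf;
    return true;
}
```

Each function returns false at end of input, so a caller can loop with `while (read_line_getline(std::cin, line)) { ... }`.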

C/C++: Remove CR/LF. Getting a line on Windows and on Linux platforms

C/C++: Remove CR/LF (cont.). The noisy CR (carriage return)
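
A text file written with Windows line endings (CR LF) and read line by line on Linux keeps a stray CR at the end of each line. A minimal sketch of stripping it follows; the helper name `chomp` is hypothetical (borrowed from Perl's built-in of the same name) and the slide's own code is not known.

```cpp
#include <string>

// Remove every trailing '\r' and '\n' from the line, in place.
// Handles "\n", "\r\n", and a bare "\r" alike.
std::string& chomp(std::string& line) {
    while (!line.empty() && (line.back() == '\n' || line.back() == '\r')) {
        line.pop_back();
    }
    return line;
}
```

Calling `chomp(line)` right after reading makes the rest of the program independent of the platform that produced the file.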

C/C++: Split a Line. Split a line by a specific character. Example string on the slide: HELLOWORLD!

C/C++: Split a Line (cont.). Split a line
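
Splitting on a single character can be sketched with the standard library alone. The function name `split_line` is hypothetical, and keeping empty fields (as between two adjacent delimiters) is a design assumption, not something the slide states.

```cpp
#include <string>
#include <vector>

// Split `line` at every occurrence of `delim`.
// Empty fields are kept, so "a,,b" yields three parts.
std::vector<std::string> split_line(const std::string& line, char delim) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = line.find(delim, start);
        if (pos == std::string::npos) {
            parts.push_back(line.substr(start));  // last field
            break;
        }
        parts.push_back(line.substr(start, pos - start));
        start = pos + 1;  // continue after the delimiter
    }
    return parts;
}
```

For example, `split_line("HELLO WORLD!", ' ')` gives the two parts `HELLO` and `WORLD!`.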

C++ Boost: Case Conversion
to_upper: convert a string to upper case
to_lower: convert a string to lower case
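
Boost's to_upper and to_lower change the string in place. Where Boost is not available, a similar ASCII-only conversion can be sketched with the standard library. This is a substitute technique, not the Boost implementation, and the function names below are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// In-place upper-casing using std::toupper in the default "C" locale.
// The cast to unsigned char avoids undefined behaviour for negative chars.
void to_upper_ascii(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
}

// In-place lower-casing, same approach.
void to_lower_ascii(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
}
```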

C++ Boost: Trimming & Replace

C++ Boost: Split
split(): splits the input into parts

Regular Expression
Regular expressions are a powerful tool for string operations.

Operator   Meaning                                Example
*          0 or more times                        be*  matches b, be, bee, beee, ...
?          0 or 1 time                            be?  matches b, be
+          1 or more times                        be+  matches be, bee, beee, ...
[]         any one of the enclosed characters     [A-Z]
[^]        any character except those enclosed    [^a-z]
()         group                                  (abc)+
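
The operators in the table can be tried out with std::regex from the C++ standard library (default ECMAScript grammar). The wrapper `matches` is a hypothetical helper; editors such as EditPlus may use a slightly different regex dialect.

```cpp
#include <regex>
#include <string>

// True when the whole of `text` matches `pattern`.
bool matches(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```

With this helper, `matches("be*", "b")` and `matches("be+", "bee")` hold, while `matches("be+", "b")` and `matches("be?", "bee")` do not.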

An Example (search pattern -> replacement; the first pattern begins with a space)

 *\([0-9/ ]+\) *[0-9\.\?]+%    ->  (empty)
^( *)([0-9]+)( *)              ->  \2\t
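
The two search-and-replace steps can be sketched with std::regex_replace. Assumptions: the function name `clean_line` and the sample input are hypothetical, `[0-9\.\?]` is written as the equivalent `[0-9.?]`, and the editor's `\2` back-reference becomes `$2` in the std::regex replacement format.

```cpp
#include <regex>
#include <string>

// Step 1: delete a parenthesised count followed by a percentage,
//         e.g. " (3/10) 30.5%".
// Step 2: replace a leading, space-padded number with the bare
//         number followed by a tab.
std::string clean_line(const std::string& line) {
    static const std::regex count_and_percent(R"re( *\([0-9/ ]+\) *[0-9.?]+%)re");
    static const std::regex leading_number(R"re(^( *)([0-9]+)( *))re");
    std::string out = std::regex_replace(line, count_and_percent, "");
    return std::regex_replace(out, leading_number, "$2\t");
}
```

Under these assumptions, the input `"  12   word (3/10) 30.5%"` becomes `"12\tword"`.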

An Introduction to Perl
Perl (Practical Extraction and Report Language) excels at pattern search and text manipulation.
Open source / free software. Cheap!
  Free and available for all systems
  Can be used and installed without restriction
  Open source promotes portability
  Vastly expandable through freely available modules (add-on libraries at the CPAN repository)
  Fewer restrictions and lower cost for commercial use
  Fancy development tools can be bought if desired
  Centralized source and a linear development path avoid vendor vicissitudes and incompatibilities!

Perl is not compiled

C source, turned into a binary executable by the C compiler:

    #include <stdio.h>
    int main() {
        float x;
        x = 6e9;
        printf("Hello world!\n");
        printf("All %.0f of you!\n", x);  /* %d is not valid for a float */
    }

Perl source, run directly by the Perl interpreter:

    #!/usr/bin/perl
    $x = 6e9;
    print "Hello world!\n";
    printf "All %d of you!\n", $x;

Output:

    Hello world!
    All of you!

Source code: plain text (ASCII), human readable, human editable, platform independent.
C (compiled) binary executable: NOT human readable, NOT human editable, NOT platform independent!

A Taste of Perl: print a message
perltaste.pl: Greet the entire world.

    #!/usr/bin/perl -w
    $x = 6e9;
    print "Hello world!\n";
    printf "All %d of you!\n", $x;

  First line (#!/usr/bin/perl -w): command interpretation header
  Second line ($x = 6e9;): variable assignment statement
  print and printf lines: function calls (output statements)

Scalar Values
Numerical values:
  integer: 5, "3", 0, -307
  floating point: 6.2e9
  hexadecimal/octal: 0x0d4f, 0477
NOTE: numerical values are treated as floating-point numbers (usually "double" precision); internally Perl may also keep a value as a native integer or as a decimal string.

String Values
Double-quoted: interpolates (replaces a variable name or control character with its value)
Single-quoted: no interpolation done (as-is)
Quoting operators: qq//, qw//, etc.

    $day = "Monday";

    String                 Printed as
    "Happy Monday!\n"      Happy Monday! (followed by a newline)
    "Happy $day!\n"        Happy Monday! (followed by a newline)
    'Happy Monday!\n'      Happy Monday!\n
    'Happy $day!\n'        Happy $day!\n

String Manipulation: Concatenation

    $dna1 = "ACTGCGTAGC";
    $dna2 = "CTTGCTAT";

Juxtapose in a string assignment or print statement:

    $new_dna = "$dna1$dna2";

Use the concatenation operator '.':

    $new_dna = $dna1 . $dna2;

Add segments serially using incremental concatenation:

    $new_dna = $dna1;
    $new_dna .= $dna2;    # shorthand for: $new_dna = $new_dna . $dna2;

Substitution
DNA transcription: T -> U
Substitution operator s///:

    $dna = "GATTACATACACTGTTCA";
    $rna = $dna;
    $rna =~ s/T/U/g;    # "GAUUACAUACACUGUUCA" (/g replaces every T, not only the first)

Exercise: Start with $dna = "gattACataCACTgttca"; and do the same as above. Print out $rna to the screen.

transcribe.pl:

    $dna = "gattACataCACTgttca";
    $rna = $dna;
    $rna =~ s/T/U/g;
    print "DNA: $dna\n";
    print "RNA: $rna\n";

Does it do what you expect? If not, why not? Patterns in substitution are case-sensitive!
What can we do?
  Convert all letters to upper (or lower) case (preferred when possible)
  To retain mixed case, use the transliteration operator tr///:

    $rna =~ tr/tT/uU/;

Case conversion

    $string = "acCGtGcaTGc";

Upper case:

    $dna = uc($string);            # "ACCGTGCATGC"
    or $dna = uc $string;
    or $dna = "\U$string";

Lower case:

    $dna = lc($string);            # "accgtgcatgc"
    or $dna = "\L$string";

Sentence case (lower-case everything first, then capitalize the first letter):

    $dna = ucfirst(lc($string));   # "Accgtgcatgc"
    or $dna = "\u\L$string";

Perl in NLP: dictionary lookup; word frequency; Chinese word segmentation; POS; ... whatever you could need

Case study

Thanks for your attention