Strings.


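Strings
Our civilization has been built on them, but we are moving toward digital media.
Anyway, handling digital media is similar to ..
A string is a list of characters.
성균관대학교정보통신학부 (Korean: "Sungkyunkwan University School of Information and Communication")
Am I a string?
一片花飛減却春 (Chinese: "a single petal flying off diminishes spring")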

Character Code - ASCII
ASCII defines 2**7 = 128 characters. Why 7 bits?
A char in C is similar to an integer: 'D' + 'a' - 'A' evaluates to 'd'.
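Because a char is a small integer, the expression on this slide can be checked directly. The following is a minimal sketch, assuming an ASCII-compatible execution character set; the helper names to_lower_ascii and print_code are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: shifts an uppercase ASCII letter to lowercase
   using the slide's expression c + 'a' - 'A'. Any other character is
   returned unchanged. Assumes an ASCII-compatible character set. */
char to_lower_ascii(char c)
{
    if (c >= 'A' && c <= 'Z')
        return (char)(c + 'a' - 'A');
    return c;
}

/* Prints a character next to its integer code. */
void print_code(char c)
{
    printf("%c = %d\n", c, c);
}
```

On an ASCII system, print_code(to_lower_ascii('D')) prints "d = 100", since 'D' is 68 and 'a' - 'A' is 32.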

More about ASCII
What about other languages? Unicode.
  There are more than 100 scripts.
  Some scripts have thousands of characters.
Unicode has several encoding standards: UTF-16 and UTF-8.
  UTF-8 is variable length: a leading 0 bit means the byte is an ASCII character, and 1 to 4 bytes are used per character (the original design allowed up to 6).
Programming languages:
  C and C++ treat a char as one byte.
  Java treats a char as two bytes.
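The variable-length rule can be read off the first byte of each character. The sketch below assumes the current definition of UTF-8 (RFC 3629), in which a sequence is 1 to 4 bytes long; the function names utf8_seq_len and utf8_count are illustrative, not part of any standard library.

```c
#include <stddef.h>

/* Returns the number of bytes in the UTF-8 sequence that starts with
   lead byte b, or 0 if b is a continuation byte (10xxxxxx) or matches
   no lead-byte pattern. It does not reject overlong or out-of-range
   lead bytes. */
int utf8_seq_len(unsigned char b)
{
    if ((b & 0x80) == 0x00) return 1;  /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Counts characters (code points) in a null-terminated UTF-8 string by
   counting every byte that is not a continuation byte.
   Assumes the input is valid UTF-8. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

Because a C char is one byte, strlen reports bytes, not characters; a helper along the lines of utf8_count is needed to count characters in UTF-8 text.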

String Representations
Null-terminated: C, C++
Array with length: Java
Linked list of characters: used in some special areas
Your choice should consider:
  the space occupied
  constraints on the contents
  O(1) access to the i-th char (search, insert, delete)
  what if the length can be limited (file names)?
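Two of these representations can be compared side by side in C. This is only a sketch: struct lstring and its helper functions are hypothetical names used to illustrate the "array with length" idea, not how Java or any particular library implements it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "array with length" representation. */
struct lstring {
    size_t len;        /* number of characters stored */
    const char *data;  /* need not be null-terminated */
};

/* O(1): the length is stored next to the characters. */
size_t lstring_length(struct lstring s)
{
    return s.len;
}

/* O(n): a null-terminated string must be scanned to find its end. */
size_t cstring_length(const char *s)
{
    return strlen(s);
}

/* Builds an lstring view over an existing C string. */
struct lstring lstring_from(const char *s)
{
    struct lstring r;
    r.len = strlen(s);
    r.data = s;
    return r;
}
```

The "constraints on the contents" point shows up here: a null-terminated string cannot contain the '\0' character itself, while a string that stores its length can.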

Null-terminated
char str[6] = "Hello";
char arr1[] = "abc";
The termination is needed:
printf("format string", str); /* str can be any length */
str: H e l l o \0
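Why the terminator is needed becomes clear when the length is computed by hand. The sketch below repeats the slide's two declarations; my_strlen is an illustrative stand-in for the library's strlen.

```c
#include <stddef.h>

/* Hand-written equivalent of strlen: walks the array until it reaches
   the terminating '\0'. Without the terminator the loop would run past
   the end of the array. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

char str[6] = "Hello";  /* 5 letters plus '\0' fill all 6 elements */
char arr1[] = "abc";    /* the compiler sizes this array to 4 */
```

Here sizeof str is 6 but my_strlen(str) is 5: the array holds one more element than the string has characters. A function such as printf with %s relies on the same terminator to know where to stop.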

String Match
Exact matching
  a BIG problem for Google, Twitter, NHN, bioinformatics, ...
  naive method
  preprocessing methods: Boyer-Moore algorithm
  UTF-8 raises a limit
Inexact matching
  this problem by itself could fill a whole semester course

String Match (2)
Problem: find P in T.
Your solution: how many comparisons? Is there a better way?
T: x a b y z
P: a b x y z

The naïve string search
[Figure: trace of the naïve search over text positions 1 to 12. The pattern a b x y z is tried at successive alignments against the text, with ^ and * marking the individual character comparisons, until one alignment is labeled "Matched!".]

The naïve string search: worst case
Find "aaaaab" in "aaaaaaaaaaaaaaaab": O(nm), where n is the length of the text and m is the length of the pattern.
Bring a better one next week!
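The naive method and its comparison count can be sketched as follows. The function name naive_search and the comparisons out-parameter are illustrative choices, not taken from the slides.

```c
#include <stddef.h>
#include <string.h>

/* Naive search: try every alignment of pattern p in text t and compare
   left to right. Returns the index of the first match, or -1 if there
   is none. *comparisons receives the number of character comparisons. */
long naive_search(const char *t, const char *p, unsigned long *comparisons)
{
    size_t n = strlen(t), m = strlen(p);
    *comparisons = 0;
    if (m > n)
        return -1;
    for (size_t i = 0; i + m <= n; i++) {
        size_t j = 0;
        while (j < m) {
            (*comparisons)++;
            if (t[i + j] != p[j])
                break;          /* mismatch: slide the pattern by one */
            j++;
        }
        if (j == m)
            return (long)i;     /* all m characters matched */
    }
    return -1;
}
```

For a text of n - 1 letters 'a' followed by 'b' and a pattern of m - 1 letters 'a' followed by 'b', every alignment runs through all m pattern characters before it fails or succeeds, so the count is (n - m + 1) * m, which is the O(nm) worst case.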

Problem Example
There are millions of documents that contain company names: Anderson, Enron, Lehman, ...
Mergers, acquisitions, and bankruptcies force these names to change.
Your mission: replace every company name that has changed with its new name.

Input/Output Example
4
"Anderson Consulting" to "Accenture"
"Enron" to "Dynegy"
"DEC" to "Compaq"
"TWA" to "American"
5
Anderson Accounting begat Anderson Consulting, which offered advice to Enron before it DECLARED bankruptcy, which made Anderson Consulting quite happy it changed its name in the first place!

Your Plan
Read the M&A list into a DB.
  Define the DB structure.
  How to handle the double quote (") sign?
Main loop:
  Compare each line of a doc with each DB entry.
  If there is a match, replace it with the new name.
Now, define the functions and global variables.

Your naive solution
Read the data for changed company names to build a DB.
For each entry, search the whole document; if there is a match, change it.

old name              new name
Anderson Consulting   Accenture
Enron                 Dynegy
DEC                   Compaq
....                  .....

Solution in Detail
mergers[101][2]

index   [0] old name            [1] new name
0       Anderson Consulting\0   Accenture\0
1       Enron\0                 Dynegy\0
2       DEC\0                   Compaq\0
3       TWA\0                   American\0
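One way to fill such a table from lines of the form "old" to "new" is sketched below. The slide gives only mergers[101][2]; the third dimension MAXLEN, the counter nmergers, and the functions read_quoted and add_merger are assumptions introduced for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAXCHANGES 101  /* from the slide: mergers[101][2] */
#define MAXLEN     101  /* assumed maximum name length, including '\0' */

char mergers[MAXCHANGES][2][MAXLEN];  /* [i][0] old name, [i][1] new name */
int nmergers = 0;                     /* number of rows in use */

/* Copies the text between the next pair of double quotes in *line into
   dst (capacity cap) and advances *line past the closing quote.
   Returns 1 on success, 0 if a quote is missing or the name is too long. */
int read_quoted(const char **line, char *dst, size_t cap)
{
    const char *open = strchr(*line, '"');
    if (open == NULL)
        return 0;
    const char *close = strchr(open + 1, '"');
    if (close == NULL)
        return 0;
    size_t len = (size_t)(close - open - 1);
    if (len + 1 > cap)
        return 0;
    memcpy(dst, open + 1, len);
    dst[len] = '\0';
    *line = close + 1;
    return 1;
}

/* Parses one line of the form "old" to "new" into the next table row.
   Returns 1 on success, 0 if the table is full or the line is malformed. */
int add_merger(const char *line)
{
    if (nmergers >= MAXCHANGES)
        return 0;
    if (!read_quoted(&line, mergers[nmergers][0], MAXLEN))
        return 0;
    if (!read_quoted(&line, mergers[nmergers][1], MAXLEN))
        return 0;
    nmergers++;
    return 1;
}
```

Returning 0 instead of writing past the buffer is one possible answer to the questions of a missing quote sign and an over-long name.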

top down approach

String y: C o m p a q
String s: D E C L A R
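Replacing "DEC" inside s with the longer y means the rest of s has to be shifted to make room. A minimal sketch of that step follows, assuming the caller passes the capacity of the buffer; replace_at and replace_all are hypothetical names, and the standard strstr stands in for whatever matching routine the course solution uses.

```c
#include <stddef.h>
#include <string.h>

/* Replaces the xlen characters of s starting at pos with the string y,
   shifting the tail of s left or right as needed. cap is the total size
   of the buffer holding s. Returns 1 on success, or 0 (leaving s
   unchanged) if the span is out of range or the result would not fit. */
int replace_at(char *s, size_t cap, size_t pos, size_t xlen, const char *y)
{
    size_t slen = strlen(s), ylen = strlen(y);
    if (pos + xlen > slen)
        return 0;
    if (slen - xlen + ylen + 1 > cap)
        return 0;
    /* Move the tail, including '\0'; memmove handles the overlap. */
    memmove(s + pos + ylen, s + pos + xlen, slen - pos - xlen + 1);
    memcpy(s + pos, y, ylen);
    return 1;
}

/* Replaces every occurrence of x in s with y, scanning left to right
   and resuming after each inserted y so that replacements are not
   rescanned. Returns 0 as soon as one replacement does not fit. */
int replace_all(char *s, size_t cap, const char *x, const char *y)
{
    size_t xlen = strlen(x), ylen = strlen(y);
    char *hit = s;
    if (xlen == 0)
        return 1;
    while ((hit = strstr(hit, x)) != NULL) {
        if (!replace_at(s, cap, (size_t)(hit - s), xlen, y))
            return 0;
        hit += ylen;
    }
    return 1;
}
```

With a 32-byte buffer holding "it DECLARED bankruptcy", replace_all(s, sizeof s, "DEC", "Compaq") produces "it CompaqLARED bankruptcy", which matches the sample input where DECLARED contains DEC.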

Example Review
Is it safe?
  What if a quote sign is missing?
  What if there is no room for a new, longer company name?
  ....
Is it efficient?
  In space?
  In time?

String Library

String Library 2
Why size_t instead of int?
According to the 1999 ISO C standard (C99), size_t is an unsigned integer type at least 16 bits wide. It is the type that sizeof yields and strlen returns, so it can represent the size of any object, whereas int is signed and typically only 16 or 32 bits wide.
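In practice this means lengths and indices are best kept in size_t from end to end. A small sketch, with total_length as an illustrative name:

```c
#include <stddef.h>
#include <string.h>

/* Sums the lengths of count strings. The accumulator and the loop index
   are size_t, the type strlen returns, so no value is narrowed to int
   along the way. */
size_t total_length(const char *const *strings, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(strings[i]);
    return total;
}
```

Storing the result of strlen in an int compiles, but it converts an unsigned value to a signed and possibly narrower type, which is why the library functions use size_t throughout.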

Doublets
A dictionary of up to 25,143 words, none exceeding 16 letters.