uniq

The uniq command is useful when you need to find duplicate lines in a file. The basic format of the command is

uniq in_file out_file

In this format, uniq copies in_file to out_file, removing any duplicate lines in the process. uniq defines duplicate lines as consecutive lines that match exactly. If out_file is not specified, the results are written to standard output. If in_file is also not specified, uniq acts as a filter and reads its input from standard input.
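The three invocation forms described above can be tried side by side. This is a minimal sketch, not part of the original slides: the file name fruits, its contents, and the output file names are made up for illustration.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: the "apple" lines are adjacent, the "pear" lines are not.
printf '%s\n' apple apple pear banana pear > fruits

# Form 1: in_file and out_file are both given.
uniq fruits fruits.out

# Form 2: out_file is omitted, so the result goes to standard output.
uniq fruits > fruits.stdout

# Form 3: no file arguments, so uniq reads standard input and acts as a filter.
uniq < fruits > fruits.filter

# All three files hold the same lines: apple, pear, banana, pear.
cat fruits.out
```

Only the adjacent apple lines collapse. Both pear lines survive because another line sits between them.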

$ cat names
Charlie
Tony
Emanuel
Lucy
Ralph
Fred
Tony
$ uniq names                    Print unique lines
Charlie
Tony
Emanuel
Lucy
Ralph
Fred
Tony

Tony still appears twice in the preceding output because the two occurrences are not consecutive in the file, so they do not meet uniq's definition of duplicate. To remedy this, sort is often used to bring the duplicate lines next to each other. The result of the sort is then run through uniq.

$ sort names | uniq
Charlie
Emanuel
Fred
Lucy
Ralph
Tony

The sort moves the two Tony lines together, and then uniq filters out the duplicate line.
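The difference between running uniq alone and running it after sort can be reproduced with a short script. This is a sketch that recreates the names file from the example in a scratch directory. The final sort -u line is an addition not covered in the slides: it is a standard sort option that, for whole-line comparison like this, should give the same result as the pipeline.

```shell
# Recreate the names file from the example in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' Charlie Tony Emanuel Lucy Ralph Fred Tony > names

# Without sorting, the two Tony lines are not adjacent, so nothing is removed.
unsorted=$(uniq names)

# Sorting first makes the duplicates adjacent, so uniq can drop one of them.
sorted_unique=$(sort names | uniq)

# Assumed equivalent shortcut for this case: let sort discard the duplicates itself.
sort_u=$(sort -u names)

printf '%s\n' "$sorted_unique"
```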

The -d Option

Frequently, you'll be interested in finding the duplicate entries in a file. The -d option to uniq should be used for such purposes: it tells uniq to write only the duplicated lines to out_file (or standard output). Such lines are written just once, no matter how many consecutive occurrences there are.

$ sort names | uniq -d          List duplicate lines
Tony
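Two properties of -d are easy to check in a script: it only sees adjacent duplicates, and it reports each duplicated line once. The sketch below reuses the names file from the example; the three-Tony input at the end is made up for illustration.

```shell
# Recreate the names file from the example in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' Charlie Tony Emanuel Lucy Ralph Fred Tony > names

# Sorted input: the Tony lines are adjacent, so -d reports Tony.
dups_sorted=$(sort names | uniq -d)

# Unsorted input: no two equal lines are adjacent, so -d reports nothing.
dups_unsorted=$(uniq -d names)

# Hypothetical input with three consecutive Tony lines: still reported once.
dups_triple=$(printf '%s\n' Fred Tony Tony Tony | uniq -d)

printf 'sorted: %s\nunsorted: %s\ntriple: %s\n' \
    "$dups_sorted" "$dups_unsorted" "$dups_triple"
```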

The -c Option

The -c option to uniq behaves like uniq with no options (that is, duplicate lines are removed), except that each output line is preceded by a count of the number of times the line occurred in the input.

$ sort names | uniq -c          Count line occurrences
1 Charlie
1 Emanuel
1 Fred
1 Lucy
1 Ralph
2 Tony
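A common follow-up, not shown in the slides, is to rank lines by how often they occur by feeding the counts into a numeric reverse sort. The sketch below assumes the same names file. It uses awk to read the count and name fields because the amount of padding uniq -c puts before the count can differ between implementations.

```shell
# Recreate the names file from the example in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' Charlie Tony Emanuel Lucy Ralph Fred Tony > names

# Count occurrences, then sort by the count, largest first.
sort names | uniq -c | sort -rn

# Pull out the most frequent line, normalizing the spacing with awk.
top=$(sort names | uniq -c | sort -rn | head -n 1 | awk '{print $1, $2}')

# Look up the count for one specific name.
tony_count=$(sort names | uniq -c | awk '$2 == "Tony" {print $1}')

printf 'most frequent: %s\n' "$top"
```

The order of the lines that share a count of 1 is left to sort, so only the first line of the ranked output is relied on here.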