Browsing Directories Using walk


Hello, and welcome to this second episode of the Software Carpentry lectures on handling directories and files in Python. In it we’ll look at Python’s walk function, which explores a directory and reports everything inside it: all the sub-directories, files, sub-sub-directories and so on.
Copyright © Software Carpentry and The University of Edinburgh 2010-2011
This work is licensed under the Creative Commons Attribution License. See http://software-carpentry.org/license.html for more information.

>>> from os import walk
>>> tree = walk('.')
walk takes a directory and produces a sequence of tuples, one for each directory it visits. walk is easiest to understand as a recursive process, and recursion can be a tricky concept if you haven’t met it before, so we’ll step through how walk works. This should make its output easier to understand.

walk('.')
.
    A
        a1.txt
        a2.txt
    B
        b.txt
        P
            p.txt
        Q
            q1.txt
            q2.txt
    C
        c.txt
So, given this directory structure, walk would create a tuple with…
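To experiment with the examples that follow, the layout above can be recreated on disk. The sketch below is a minimal example, assuming Python 3 and a scratch directory from tempfile.mkdtemp; make_example_tree and EXAMPLE_FILES are hypothetical names introduced here, not part of the os module.

```python
import os
import tempfile

# Hypothetical list of every file in the example layout; directories are implied.
EXAMPLE_FILES = ["A/a1.txt", "A/a2.txt", "B/b.txt", "B/P/p.txt",
                 "B/Q/q1.txt", "B/Q/q2.txt", "C/c.txt"]


def make_example_tree(root):
    """Hypothetical helper: recreate the example tree under root."""
    for rel in EXAMPLE_FILES:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)  # create A, B/P, ...
        open(path, "w").close()                            # create an empty file
    return root


root = make_example_tree(tempfile.mkdtemp())
print(sorted(os.listdir(root)))  # ['A', 'B', 'C']
```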

walk('.')
.
The first element of the tuple is the path to the directory, in this case the dot, meaning the current directory.

walk('.')
. ['C', 'A', 'B']
The second element is a list of the directories in the current directory, in this case A, B and C. As with listdir, the directories are listed in no particular order.
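Since the order of that list is not guaranteed, code that needs a repeatable order can sort it. A minimal sketch, assuming Python 3: it rebuilds the example tree in a temporary directory and sorts dirnames in place inside the loop. Because a top-down walk descends into the directories named in that list, sorting it in place also fixes the order in which they are visited.

```python
import os
import tempfile

# Rebuild the example tree in a scratch directory (layout taken from the slides).
root = tempfile.mkdtemp()
for rel in ["A/a1.txt", "A/a2.txt", "B/b.txt", "B/P/p.txt",
            "B/Q/q1.txt", "B/Q/q2.txt", "C/c.txt"]:
    path = os.path.join(root, *rel.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

visited = []
for dirpath, dirnames, filenames in os.walk(root):
    dirnames.sort()   # in-place sort also fixes the order walk descends in
    filenames.sort()  # only tidies what we record; files are not descended into
    visited.append((os.path.relpath(dirpath, root), list(dirnames), filenames))

for entry in visited:
    print(entry)
```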

walk('.')
. ['C', 'A', 'B'] []
The third element is a list of the files in the current directory. There are none here, so the list is empty.

walk('.')
walk then recurses; that is, it calls itself on each directory in the current directory in turn. So it calls itself on the first directory, which is C.

walk('.') walk('./C')
./C
In this case, the path to the directory is ./C.

walk('.') walk('./C')
./C []
C has no sub-directories, so the directory list is empty.

walk('.') walk('./C')
./C [] ['c.txt']
And C has one file, c.txt.

walk('.') walk('./C')
As C has no sub-directories, the call to walk on C returns.

walk('.')
And we’re back in the original call to walk, which now moves on to the next directory in the list…

walk('.') walk('./A')
…which is A.

walk('.') walk('./A')
./A [] ['a1.txt', 'a2.txt']
A has no sub-directories and two files, a1.txt and a2.txt.

walk('.') walk('./A')
A has no sub-directories, so the call to walk on A returns.

walk('.')
And again we’re back in the original call to walk, which moves on to the next directory in the list…

walk('.') walk('./B')
…which is B.

walk('.') walk('./B')
./B ['P', 'Q'] ['b.txt']
B has one file, b.txt, and two sub-directories, P and Q.

walk('.') walk('./B')
The sub-directories of B are then “walked” in turn, starting with P…

walk('.') walk('./B') walk('./B/P')
./B/P [] ['p.txt']
P has one file, p.txt, and no sub-directories.

walk('.') walk('./B')
As P has no sub-directories, we return up a level and move on to B’s next sub-directory…

walk('.') walk('./B') walk('./B/Q')
./B/Q [] ['q1.txt', 'q2.txt']
…which is Q, which has no sub-directories and two files.

walk('.') walk('./B')
As Q has no sub-directories, we return up to B.

walk('.')
Having finished both P and Q, we’re done with B, so we return to the original directory.

walk('.')
And as we’ve now done A, B and C, we’re finished, with one tuple for each of the six directories:
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
./B ['P', 'Q'] ['b.txt']
./B/P [] ['p.txt']
./B/Q [] ['q1.txt', 'q2.txt']
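The walkthrough above can be summarized as a short recursive generator. The sketch below, simple_walk, is a hypothetical, simplified re-implementation for illustration only, not the code of os.walk itself: it uses os.scandir, treats symbolic links as files (os.walk lists links to directories among the sub-directories), and ignores errors and the optional arguments. It rebuilds the example tree in a temporary directory and checks that both produce the same tuples once the lists are sorted.

```python
import os
import tempfile


def simple_walk(top):
    """Simplified recursive sketch of a top-down walk; not CPython's code.

    Yields (path, subdirs, files) for top, then recurses into each subdir.
    Symlinks are treated as files; errors and options are ignored.
    """
    subdirs, files = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.name)
            else:
                files.append(entry.name)
    yield top, subdirs, files       # report this directory first...
    for name in subdirs:            # ...then walk each sub-directory in turn
        yield from simple_walk(os.path.join(top, name))


# Rebuild the example tree in a scratch directory.
root = tempfile.mkdtemp()
for rel in ["A/a1.txt", "A/a2.txt", "B/b.txt", "B/P/p.txt",
            "B/Q/q1.txt", "B/Q/q2.txt", "C/c.txt"]:
    path = os.path.join(root, *rel.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

# Sort everything so the comparison does not depend on listing order.
mine = sorted((p, sorted(d), sorted(f)) for p, d, f in simple_walk(root))
theirs = sorted((p, sorted(d), sorted(f)) for p, d, f in os.walk(root))
print(mine == theirs)  # True
```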

>>> from os import walk
>>> tree = walk('.')
So, here’s how we’d call walk in our code.

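walk returns a generator of tuples
We now know what walk produces: a sequence of tuples, one per directory. Strictly speaking, walk returns a generator, which yields the tuples one at a time as we loop over it rather than building a list up front. We save that generator in the variable tree.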
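Because walk hands back an iterator rather than a list, it can be consumed only once. A minimal sketch, assuming Python 3 and the example tree rebuilt in a temporary directory: calling list on the result gives an actual list of tuples, and a second pass over the same object yields nothing.

```python
import collections.abc
import os
import tempfile

# Rebuild the example tree in a scratch directory.
root = tempfile.mkdtemp()
for rel in ["A/a1.txt", "A/a2.txt", "B/b.txt", "B/P/p.txt",
            "B/Q/q1.txt", "B/Q/q2.txt", "C/c.txt"]:
    path = os.path.join(root, *rel.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

tree = os.walk(root)
print(isinstance(tree, collections.abc.Iterator))  # True: lazy, not a list

entries = list(tree)   # consume it to get an actual list of tuples
print(len(entries))    # 6, one tuple per directory
leftover = list(tree)  # a second pass yields nothing: the iterator is used up
print(leftover)        # []
```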

>>> from os import walk
>>> tree = walk('.')
>>> for dirpath, subdirs, files in tree:
...     print("%s %s %s" % (dirpath, subdirs, files))
...
We know that each tuple consists of a directory path, a list of the sub-directories in that directory, and a list of its files. So we can use a for loop to unpack and print each tuple in turn.

And here is the result:
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
./B ['P', 'Q'] ['b.txt']
./B/P [] ['p.txt']
./B/Q [] ['q1.txt', 'q2.txt']

Each tuple contains a directory
Remember, each tuple contains a directory…

Each tuple contains a directory, its sub-directories
…the list of sub-directories in that directory, which is an empty list if there are none…
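Because a top-down walk decides where to go next from the sub-directory list it has just handed back, removing a name from that list in place stops walk from descending into it. A minimal sketch, assuming Python 3 and the example tree rebuilt in a temporary directory, that skips B and therefore also P and Q. The list must be modified in place (for example with remove or slice assignment); binding the variable to a new list has no effect on the walk.

```python
import os
import tempfile

# Rebuild the example tree in a scratch directory.
root = tempfile.mkdtemp()
for rel in ["A/a1.txt", "A/a2.txt", "B/b.txt", "B/P/p.txt",
            "B/Q/q1.txt", "B/Q/q2.txt", "C/c.txt"]:
    path = os.path.join(root, *rel.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

visited = []
for dirpath, dirnames, filenames in os.walk(root):
    if "B" in dirnames:
        dirnames.remove("B")  # in-place change: walk will not descend into B
    visited.append(os.path.relpath(dirpath, root))

print(sorted(visited))  # ['.', 'A', 'C']
```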

Each tuple contains a directory, its sub-directories, and its files
…and the list of files in that directory, again an empty list if there are none.

walk’s input is used as a prefix for each directory name
For each directory, the path given to walk is used as a prefix, in this case the dot.
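Since each file name in a tuple is relative to that tuple’s directory path, joining the two gives a path that can be opened directly. A minimal sketch, assuming Python 3 and the example tree rebuilt in a temporary directory, using os.path.join:

```python
import os
import tempfile

# Rebuild the example tree in a scratch directory.
root = tempfile.mkdtemp()
for rel in ["A/a1.txt", "A/a2.txt", "B/b.txt", "B/P/p.txt",
            "B/Q/q1.txt", "B/Q/q2.txt", "C/c.txt"]:
    path = os.path.join(root, *rel.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

# dirpath already carries walk's input (root) as its prefix.
full_paths = sorted(os.path.join(dirpath, name)
                    for dirpath, dirnames, filenames in os.walk(root)
                    for name in filenames)

print(len(full_paths))                                     # 7
print(all(os.path.isfile(p) for p in full_paths))          # True
print(full_paths[0] == os.path.join(root, "A", "a1.txt"))  # True
```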

>>> from os import getcwd, walk
>>> tree = walk(getcwd())
So, if we use walk with getcwd, which returns the current working directory…

…and print the results:
>>> for dirpath, subdirs, files in tree:
...     print("%s %s %s" % (dirpath, subdirs, files))
...
/user/vlad ['C', 'A', 'B'] []
/user/vlad/C [] ['c.txt']
/user/vlad/A [] ['a1.txt', 'a2.txt']
/user/vlad/B ['P', 'Q'] ['b.txt']
/user/vlad/B/P [] ['p.txt']
/user/vlad/B/Q [] ['q1.txt', 'q2.txt']

We can see that the current working directory, /user/vlad, is now the prefix.
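If absolute paths are not wanted in the output, os.path.relpath can strip the prefix again. A minimal sketch, assuming Python 3: it rebuilds the example tree in a temporary directory, makes that the current working directory with os.chdir (a side effect worth noting), and walks from getcwd as on the slide.

```python
import os
import tempfile

# Rebuild the example tree in a scratch directory.
root = tempfile.mkdtemp()
for rel in ["A/a1.txt", "A/a2.txt", "B/b.txt", "B/P/p.txt",
            "B/Q/q1.txt", "B/Q/q2.txt", "C/c.txt"]:
    path = os.path.join(root, *rel.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

os.chdir(root)       # make the example tree the current working directory
start = os.getcwd()  # an absolute path, as in walk(getcwd())

absolute = [dirpath for dirpath, _, _ in os.walk(start)]
relative = sorted(os.path.relpath(p, start) for p in absolute)
print(all(os.path.isabs(p) for p in absolute))  # True
print(relative)  # ['.', 'A', 'B', 'B/P', 'B/Q', 'C'] on POSIX systems
```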

>>> tree = walk(getcwd(), topdown=False)
walk supports an optional topdown argument, which defaults to True. If we set it to False, then…

>>> for dirpath, subdirs, files in tree:
...     print("%s %s %s" % (dirpath, subdirs, files))
...
/user/vlad/C [] ['c.txt']
/user/vlad/A [] ['a1.txt', 'a2.txt']
/user/vlad/B/P [] ['p.txt']
/user/vlad/B/Q [] ['q1.txt', 'q2.txt']
/user/vlad/B ['P', 'Q'] ['b.txt']
/user/vlad ['C', 'A', 'B'] []
…the tuples for child directories appear before those of their parents…

P and Q are before B
P and Q’s tuples appear before that of their parent, B.

A, B and C are before the original directory
And A, B and C’s tuples appear before that of the original directory.
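Bottom-up order is useful whenever a directory’s result depends on its children’s results, because every child has already been seen by the time its parent’s tuple arrives. A minimal sketch, assuming Python 3 and the example tree rebuilt in a temporary directory, that counts the files beneath each directory; file_counts is a hypothetical name.

```python
import os
import tempfile

# Rebuild the example tree in a scratch directory.
root = tempfile.mkdtemp()
for rel in ["A/a1.txt", "A/a2.txt", "B/b.txt", "B/P/p.txt",
            "B/Q/q1.txt", "B/Q/q2.txt", "C/c.txt"]:
    path = os.path.join(root, *rel.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

file_counts = {}
for dirpath, dirnames, filenames in os.walk(root, topdown=False):
    # Children are yielded before their parent, so their totals already exist.
    total = len(filenames)
    for name in dirnames:
        total += file_counts[os.path.join(dirpath, name)]
    file_counts[dirpath] = total

print(file_counts[root])                     # 7
print(file_counts[os.path.join(root, "B")])  # 4: b.txt plus p.txt, q1.txt, q2.txt
```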

os: Miscellaneous operating system interfaces
walk: Recursively explore directory contents
To summarize, in this episode we saw how the walk function lets us recursively explore a directory’s contents and gather a complete list of all the directories and files beneath it.

Created by Mike Jackson and Greg Wilson, May 2011.
Thank you for listening.