Browsing Directories Using walk

Presentation on theme: "Browsing Directories Using walk"— Presentation transcript:

1 Browsing Directories Using walk
Hello. In this second episode of the Software Carpentry lectures on handling directories and files in Python, we’ll take a look at Python’s walk function, which explores a directory and reports all the sub-directories, files, sub-sub-directories, indeed everything, within that directory.
Copyright © Software Carpentry and The University of Edinburgh. This work is licensed under the Creative Commons Attribution License. See for more information.

2
>>> from os import walk
>>> tree = walk('.')
walk takes a directory and produces a sequence of tuples, one for each directory it visits. (Strictly, it returns a generator that yields the tuples as we loop over it, rather than a ready-made list.) As walk uses recursion, which can be quite a complex concept if you’ve not encountered it before, we’ll walk through how walk works, which may help us understand its output more easily.
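As a small sketch of what that call hands back in current Python versions: the result is not a list, nothing is stored until we iterate, and it can only be looped over once. The temporary directory and the file name note.txt are illustrative assumptions, not part of the lecture.

```python
import os
import tempfile

# Build a throwaway directory holding one file (hypothetical names).
root = tempfile.mkdtemp()
open(os.path.join(root, "note.txt"), "w").close()

tree = os.walk(root)
is_list = isinstance(tree, list)

# list() drains the result into a real list of tuples.
entries = list(tree)
# A second pass finds nothing, because the first pass used everything up.
leftover = list(tree)

print(is_list, len(entries), len(leftover))
```

If we need to go over the tuples more than once, wrapping the call in list() as above is one simple option.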

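3 walk('.')
[Diagram: the current directory contains three directories, A, B and C. A holds a1.txt and a2.txt. B holds b.txt and two sub-directories, P (holding p.txt) and Q (holding q1.txt and q2.txt). C holds c.txt.]
So, given this directory structure, walk would create a tuple with…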
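To follow along on our own machine, the directory structure on the slide can be recreated with a few lines. This is only a sketch: the helper name make_example_tree is hypothetical, the files are left empty, and the tree is built in a temporary directory rather than the current one.

```python
import os
import tempfile

def make_example_tree(root):
    """Create the A/B/C layout from the slide under root (empty files)."""
    layout = {
        "A": ["a1.txt", "a2.txt"],
        "B": ["b.txt"],
        os.path.join("B", "P"): ["p.txt"],
        os.path.join("B", "Q"): ["q1.txt", "q2.txt"],
        "C": ["c.txt"],
    }
    for subdir, names in layout.items():
        path = os.path.join(root, subdir)
        os.makedirs(path, exist_ok=True)
        for name in names:
            open(os.path.join(path, name), "w").close()
    return root

example_root = make_example_tree(tempfile.mkdtemp())
top_level = sorted(os.listdir(example_root))
print(top_level)
```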

4 walk('.')
.
The path to the current directory, for example, dot.

5 walk('.')
. ['C', 'A', 'B']
There would be a list of the directories in the current directory, in this case A, B and C. As with listdir, the list of directories is in no particular order.
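Since the order is not guaranteed, one way to make it predictable is to sort the list of sub-directories inside the loop. The sketch below assumes the default top-down mode, where the Python documentation allows the list to be modified in place to influence which directories are visited next; the temporary directory is an illustrative assumption.

```python
import os
import tempfile

root = tempfile.mkdtemp()
for name in ("C", "A", "B"):
    os.mkdir(os.path.join(root, name))

visited = []
for dirpath, subdirs, files in os.walk(root):
    subdirs.sort()  # in place, so walk itself sees the new order
    visited.append(os.path.basename(dirpath))

# The first entry is the temporary root, so skip it when printing.
print(visited[1:])
```

Note that the sort has to be in place: assigning a new list to the name subdirs would not affect the order in which walk continues.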

6 walk('.')
. ['C', 'A', 'B'] []
And there would be a list of the files in the current directory. In this case there are none, so the list is empty.

7 walk('.')
. ['C', 'A', 'B'] []
walk then recurses. That is to say, it calls itself, using each directory in the current directory in turn. So it calls itself on the first directory, which is C.
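The recursive step just described can be sketched in a few lines. This is a simplified illustration, not the standard library’s actual implementation: the name simple_walk is hypothetical, and it ignores options such as topdown, error handling and symbolic links.

```python
import os
import tempfile

def simple_walk(top):
    """Yield (path, subdirs, files) for top, then for each sub-directory."""
    subdirs, files = [], []
    for name in os.listdir(top):
        if os.path.isdir(os.path.join(top, name)):
            subdirs.append(name)
        else:
            files.append(name)
    yield top, subdirs, files
    # The recursive step: the function calls itself on each sub-directory.
    for name in subdirs:
        for entry in simple_walk(os.path.join(top, name)):
            yield entry

# Tiny illustrative tree: root/C/c.txt
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "C"))
open(os.path.join(root, "C", "c.txt"), "w").close()

result = list(simple_walk(root))
print(result)
```

Each call reports its own directory first and only then descends, which matches the order of the tuples built up on the slides.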

8 walk('.') walk('./C')
. ['C', 'A', 'B'] []

9 walk('.') walk('./C')
. ['C', 'A', 'B'] []
./C
In this case, the path to the directory is dot C.

10 walk('.') walk('./C')
. ['C', 'A', 'B'] []
./C []
C has no sub-directories, so the directory list is empty.

11 walk('.') walk('./C')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
And C has one file, c.txt.

12 walk('.') walk('./C')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
As C has no sub-directories, the call to walk on C exits.

13 walk('.')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
And we’re back in the original call to walk. This now moves on to the next directory in the list…

14 walk('.') walk('./A')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
…which is A.

15 walk('.') walk('./A')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
A has no directories and two files, a1.txt and a2.txt.

16 walk('.') walk('./A')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
A has no sub-directories, so the call to walk on A exits.

17 walk('.')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
And again we’re back in the original call to walk. This now moves on to the next directory in the list…

18 walk('.') walk('./B')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
…which is B.

19 walk('.') walk('./B')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
./B ['P', 'Q'] ['b.txt']
B has one file, b.txt, and two directories, P and Q.

20 walk('.') walk('./B')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
./B ['P', 'Q'] ['b.txt']
The sub-directories of B are then “walked” in turn. So, starting with P…

21 walk('.') walk('./B') walk('./B/P')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
./B ['P', 'Q'] ['b.txt']

22 walk('.') walk('./B') walk('./B/P')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
./B ['P', 'Q'] ['b.txt']
./B/P [] ['p.txt']
P has one file and no directories.

23 walk('.') walk('./B')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
./B ['P', 'Q'] ['b.txt']
./B/P [] ['p.txt']
As P has no directories, we return up a level and move on to the next of B’s directories…

24 walk('.') walk('./B') walk('./B/Q')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
./B ['P', 'Q'] ['b.txt']
./B/P [] ['p.txt']
./B/Q [] ['q1.txt', 'q2.txt']
…which is Q, which has no directories and two files.

25 walk('.') walk('./B')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
./B ['P', 'Q'] ['b.txt']
./B/P [] ['p.txt']
./B/Q [] ['q1.txt', 'q2.txt']
As Q has no directories, we return up to B.

26 walk('.')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
./B ['P', 'Q'] ['b.txt']
./B/P [] ['p.txt']
./B/Q [] ['q1.txt', 'q2.txt']
As we’re done with both P and Q, we’re finished with B, and so we return to our original directory.

27 walk('.')
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
./B ['P', 'Q'] ['b.txt']
./B/P [] ['p.txt']
./B/Q [] ['q1.txt', 'q2.txt']
And as we’ve now done A, B and C, we’re finished.
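One way to see the descent and return described in this walkthrough is to indent each directory by how deep it sits below the starting point. The sketch below is an illustration under assumptions: it builds only the B, P and Q directories in a temporary location, and it sorts the sub-directories so the trace is reproducible.

```python
import os
import tempfile

# Illustrative tree: root/B/P and root/B/Q (names taken from the slides).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "B", "P"))
os.makedirs(os.path.join(root, "B", "Q"))

lines = []
for dirpath, subdirs, files in os.walk(root):
    subdirs.sort()  # fix the order so the trace is reproducible
    rel = os.path.relpath(dirpath, root)
    # The root itself is depth 0; each path separator adds one level.
    depth = 0 if rel == "." else rel.count(os.sep) + 1
    lines.append("  " * depth + os.path.basename(dirpath))

# Skip the first line, which is the randomly named temporary root.
print("\n".join(lines[1:]))
```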

28
>>> from os import walk
>>> tree = walk('.')
So, here’s how we’d call walk in our code.

29 walk yields a sequence of tuples
We now know that walk gives us a sequence of tuples, so let’s save its result in a variable.

30
>>> from os import walk
>>> tree = walk('.')
>>> for dir, subdirs, files in tree:
...     print("%s %s %s" % (dir, subdirs, files))
...
We know that each tuple consists of a directory path, a list of sub-directories in that directory, and a list of files. So we can use a for-in loop to print each tuple in turn.

31
. ['C', 'A', 'B'] []
./C [] ['c.txt']
./A [] ['a1.txt', 'a2.txt']
./B ['P', 'Q'] ['b.txt']
./B/P [] ['p.txt']
./B/Q [] ['q1.txt', 'q2.txt']
And here is the result.

32 Each tuple contains a directory
Remember, each tuple contains a directory…

33 Each tuple contains a directory, its subdirectories
…the list of subdirectories in that directory. If there are none then this is an empty list.

34 Each tuple contains a directory, its subdirectories, and its files
And each tuple also contains the list of files in that directory, again an empty list if there are none.
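Because the file names in each tuple are bare names, a common next step is to join them onto the directory part of the tuple to get usable paths. The sketch below shows one way to do that with os.path.join; the temporary tree and the variable names are illustrative assumptions.

```python
import os
import tempfile

# Illustrative tree: root/B/Q holding q1.txt and q2.txt.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "B", "Q"))
for name in ("q1.txt", "q2.txt"):
    open(os.path.join(root, "B", "Q", name), "w").close()

all_files = []
for dirpath, subdirs, files in os.walk(root):
    for name in files:
        # Directory from the tuple + bare file name = full path.
        all_files.append(os.path.join(dirpath, name))

# Show the paths relative to the temporary root, in a fixed order.
relative = sorted(os.path.relpath(path, root) for path in all_files)
print(relative)
```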

35 walk’s input is used as a prefix for each directory name
For each directory, the directory name given to walk is used as a prefix, in this case the dot.

36
>>> from os import getcwd, walk
>>> tree = walk(getcwd())
So, if we use walk with getcwd to get the current working directory…

37
>>> for dir, subdirs, files in tree:
...     print("%s %s %s" % (dir, subdirs, files))
...
/user/vlad ['C', 'A', 'B'] []
/user/vlad/C [] ['c.txt']
/user/vlad/A [] ['a1.txt', 'a2.txt']
/user/vlad/B ['P', 'Q'] ['b.txt']
/user/vlad/B/P [] ['p.txt']
/user/vlad/B/Q [] ['q1.txt', 'q2.txt']
And print the results.

38
We can see that the current working directory is the prefix.
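The effect of the prefix can be checked side by side by walking the same directory twice, once as dot and once as the path from getcwd. This is a sketch under assumptions: it changes into a temporary directory for the comparison and restores the previous working directory afterwards.

```python
import os
import tempfile

root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "A"))

previous = os.getcwd()
os.chdir(root)
try:
    # Same directory, two different starting arguments.
    relative_paths = [dirpath for dirpath, _, _ in os.walk(".")]
    absolute_paths = [dirpath for dirpath, _, _ in os.walk(os.getcwd())]
finally:
    os.chdir(previous)  # always restore the original working directory

print(relative_paths)
```

The absolute paths depend on where the temporary directory lives, so they are not printed here, but each one starts with the working directory rather than with the dot.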

39
>>> tree = walk(getcwd(), topdown=False)
walk supports an optional topdown argument, which is True by default. If we set this to False then…

40
>>> for dir, subdirs, files in tree:
...     print("%s %s %s" % (dir, subdirs, files))
...
/user/vlad/C [] ['c.txt']
/user/vlad/A [] ['a1.txt', 'a2.txt']
/user/vlad/B/P [] ['p.txt']
/user/vlad/B/Q [] ['q1.txt', 'q2.txt']
/user/vlad/B ['P', 'Q'] ['b.txt']
/user/vlad ['C', 'A', 'B'] []
…tuples from child directories appear before those of their parents…

41 P and Q are before B
P and Q’s tuples appear before that of their parent, B.

42 A, B and C are before the original directory
And A, B and C’s tuples appear before that of the original directory.
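The children-before-parents ordering can be confirmed with a short check. The sketch below assumes a small temporary tree containing only B, P and Q, and records the position at which each directory is reported.

```python
import os
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "B", "P"))
os.makedirs(os.path.join(root, "B", "Q"))

# Paths relative to the root, in the order walk reports them bottom-up.
order = [os.path.relpath(dirpath, root)
         for dirpath, _, _ in os.walk(root, topdown=False)]

# Both of B's children should be reported before B itself.
position = {path: index for index, path in enumerate(order)}
children_first = all(position[os.path.join("B", child)] < position["B"]
                     for child in ("P", "Q"))

print(order[-1], children_first)
```

This ordering tends to be useful when a directory can only be dealt with after everything inside it has been handled.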

43 os: Miscellaneous operating system interfaces
walk: Recursively explore directory contents
To summarize, in this episode we saw how the walk function allows us to recursively explore a directory’s contents and gather a complete list of all the directories and files beneath it.

44 created by Mike Jackson and Greg Wilson
May 2011
Thank you for listening.

