Create a filesystem report with Python

In many cases, data accumulates over time, and much of it is never accessed again, sometimes for years. It would therefore be useful to have a report that lists each file along with its last access date, its size and its owner.

With a few lines of Python, you can efficiently traverse any mounted filesystem and save a CSV file with the following data for each file:

  • Filename (with absolute path)
  • Last Access Date & Time
  • File Size (in bytes)
  • UID or username of the owner

import os, sys, time
import pathlib  # pathlib is only needed if the file owner should be a username instead of a UID

# Loop through the filesystem or folder passed as an argument
for root, directories, filenames in os.walk(sys.argv[1]):
    for filename in filenames:
        fullpath = os.path.join(root, filename)  # Get full path of the file
        try:
            timelastaccessed = time.ctime(os.stat(fullpath).st_atime)  # Get last access time
            filesize = os.stat(fullpath).st_size  # Get file size in bytes
            fileowner = os.stat(fullpath).st_uid  # Get file owner as a UID
            #fileowner = pathlib.Path(fullpath).owner()  # Get file owner as a string, much slower
            print(fullpath, timelastaccessed, filesize, fileowner, sep=',')  # Output one CSV row
        except OSError:
            # The file was removed or is inaccessible; report on stderr to keep the CSV clean
            print("Path does not exist or is inaccessible", file=sys.stderr)
        except UnicodeEncodeError:
            # The file name could not be encoded for output
            print("Encoding Error", file=sys.stderr)
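
Because the script joins its fields with plain commas, a file name that itself contains a comma would produce a malformed row. Here is a minimal sketch of one way around that, using the standard csv module to quote such names. The function name write_report, the header row and the column names are my own assumptions for illustration and are not part of the script above. This variant also calls os.stat once per file instead of three times.

```python
import csv
import os
import sys
import time


def write_report(top, out):
    """Walk `top` and write one CSV row per file to the file object `out`."""
    writer = csv.writer(out)  # quotes fields that contain commas or quotes
    # Header row: an illustrative addition, drop it if your tooling does not expect one
    writer.writerow(["path", "last_access", "size_bytes", "owner_uid"])
    for root, _dirs, filenames in os.walk(top):
        for filename in filenames:
            fullpath = os.path.join(root, filename)
            try:
                info = os.stat(fullpath)  # a single stat call supplies every field
            except OSError as err:
                # Report unreadable files on stderr so the CSV stays clean
                print(f"Skipping {fullpath!r}: {err}", file=sys.stderr)
                continue
            writer.writerow(
                [fullpath, time.ctime(info.st_atime), info.st_size, info.st_uid]
            )


# Only runs when a folder is actually passed on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    write_report(sys.argv[1], sys.stdout)
```

It is invoked the same way as the original script, with the folder as the first argument and the output redirected to a file.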

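To execute, save the script to a file and run it with the filesystem or folder as its argument (your_script.py is a placeholder for whatever name you chose):

python3 your_script.py filesystem_or_folder >> output.csv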

The above outputs the UID. If you want the username instead, comment out the fileowner = os.stat(xxxxx) line and uncomment the next line, which uses pathlib.
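
If the pathlib lookup turns out to be too slow on a large tree, one likely reason is that the name is resolved again for every single file. Here is a rough sketch of an alternative, assuming a Unix-like system where the standard pwd module is available: look up each UID once and cache the result. The helper name username_for is hypothetical and not part of the original script.

```python
import functools
import os
import pwd  # Unix-only standard library module


@functools.lru_cache(maxsize=None)
def username_for(uid):
    """Return the username for `uid`, or the UID as a string if it has no passwd entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No matching account (for example, a deleted user): fall back to the number
        return str(uid)


# Example: the name of the user running this snippet
print(username_for(os.getuid()))
```

In the script, the owner line would then become fileowner = username_for(os.stat(fullpath).st_uid). Since a filesystem usually has far fewer owners than files, most lookups should be served from the cache.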

You can then import that CSV file into any analytics platform, such as Power BI, and do your magic!
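
If you would rather take a quick look in Python first, here is a small sketch that totals the size of files not accessed within a given number of days, grouped by owner. It assumes every row has exactly four fields in the order path, last access, size, owner, and that the date is in the layout produced by time.ctime(). The function name stale_bytes_by_owner and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timedelta

CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"  # layout produced by time.ctime()


def stale_bytes_by_owner(rows, now, max_age_days=365):
    """Sum the sizes of files not accessed within `max_age_days`, grouped by owner."""
    cutoff = now - timedelta(days=max_age_days)
    totals = defaultdict(int)
    for path, last_access, size, owner in rows:
        if datetime.strptime(last_access, CTIME_FORMAT) < cutoff:
            totals[owner] += int(size)
    return dict(totals)


# Made-up sample rows in the same column order as the report
sample = io.StringIO(
    "/data/old.iso,Mon Jan  6 10:00:00 2020,1000,1001\n"
    "/data/old.log,Tue Feb  4 09:30:00 2020,500,1001\n"
    "/data/new.txt,Mon Jun  3 12:00:00 2024,42,1002\n"
)
print(stale_bytes_by_owner(csv.reader(sample), now=datetime(2024, 7, 1)))
```

For a real report, pass csv.reader(open("output.csv", newline="")) and datetime.now() instead of the sample data.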

That’s it! Enjoy!
