In many cases, data accumulates over time, and much of it is never accessed again, sometimes for years. A report that lists each file with its last access date, its size, and its owner makes it much easier to spot that stale data.
With a few lines of Python in a script (I named mine traverse.py), you can efficiently traverse any mounted filesystem and save the following data for every file to a CSV file:
- Filename (with absolute path)
- Last Access Date & Time
- File Size (in bytes)
- UID or username of the owner
```python
import csv
import os
import sys
import time
import pathlib  # only needed if fileowner should be a username instead of a UID

if len(sys.argv) != 2:
    sys.exit("usage: python3 traverse.py filesystem_or_folder")

# csv.writer quotes fields that contain commas (e.g. in filenames); "\n" keeps plain Unix line endings
writer = csv.writer(sys.stdout, lineterminator="\n")

for root, directories, filenames in os.walk(sys.argv[1]):  # loop through the filesystem or folder passed as an argument
    for filename in filenames:
        fullpath = os.path.join(root, filename)  # get the full path of the file
        try:
            st = os.stat(fullpath)  # one stat call per file instead of three
            timelastaccessed = time.ctime(st.st_atime)  # last access time
            filesize = st.st_size  # file size in bytes
            fileowner = st.st_uid  # file owner as a UID
            #fileowner = pathlib.Path(fullpath).owner()  # file owner as a username, much slower
            writer.writerow([fullpath, timelastaccessed, filesize, fileowner])  # one CSV row per file
        # errors go to stderr so they don't end up inside output.csv
        except OSError:
            print("Path does not exist or is inaccessible:", repr(fullpath), file=sys.stderr)
        except UnicodeEncodeError:  # filenames that can't be encoded for output
            print("Encoding error:", repr(fullpath), file=sys.stderr)
```
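To execute it, pass the filesystem or folder to scan and redirect the output to a CSV file:

```shell
python3 traverse.py filesystem_or_folder >> output.csv
```

Note that `>>` appends to an existing output.csv; use `>` instead if you want to overwrite it.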
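The script above outputs the owner's UID. If you want the username instead, comment out the line that sets `fileowner` from `st_uid` and uncomment the `pathlib` line right after it. Be aware that the username lookup is much slower and only works on Unix-like systems.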
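If you need usernames but want to avoid most of that slowdown, one option is to look up each distinct UID only once and cache the result, since a filesystem usually has far fewer owners than files. The sketch below assumes a Unix-like system where the standard `pwd` module is available; `owner_name` is a hypothetical helper, not part of the original script. In the loop you would set `fileowner = owner_name(os.stat(fullpath).st_uid)`. Unlike `pathlib.Path.owner()`, which raises `KeyError` when a UID has no entry in the user database (for example, files copied from another machine), it falls back to the numeric UID.

```python
import functools
import os
import pwd  # Unix-only standard module; not available on Windows


@functools.lru_cache(maxsize=None)
def owner_name(uid: int) -> str:
    """Return the username for a UID, querying the user database once per UID.

    Falls back to the numeric UID (as a string) when the UID has no
    entry in the user database, instead of raising KeyError.
    """
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


if __name__ == "__main__":
    # Example: resolve the owner of the current directory.
    st = os.stat(".")
    print(owner_name(st.st_uid))
```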
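You can then import that CSV file into any analytics platform, such as Power BI, and do your magic! Just keep in mind that last access times are only as reliable as the filesystem's mount options: with `noatime` they are never updated, and with `relatime` (the Linux default) they are updated only when the previous access time is older than the file's last modification or status change, or more than 24 hours old.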
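For a quick answer before opening an analytics platform, a few more lines of standard-library Python can total up how much data each owner has that was not accessed since a given date. This is a minimal sketch under some assumptions: the CSV has no header row, the access time is in the `time.ctime()` format the script writes, and the locale is English/C so that `%a` and `%b` match the day and month names. `stale_bytes_by_owner` and the sample paths are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# time.ctime() produces strings such as "Thu Jan  1 00:00:00 2015"
CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def stale_bytes_by_owner(csv_file, cutoff):
    """Sum file sizes per owner for files last accessed before `cutoff`.

    Expects rows of (path, ctime-style access time, size, owner) as written
    by traverse.py; rows without exactly four fields are skipped.
    """
    totals = defaultdict(int)
    for row in csv.reader(csv_file):
        if len(row) != 4:
            continue  # e.g. an error message that was printed into the CSV
        path, accessed, size, owner = row
        if datetime.strptime(accessed, CTIME_FORMAT) < cutoff:
            totals[owner] += int(size)
    return dict(totals)


if __name__ == "__main__":
    sample = io.StringIO(
        "/data/a.log,Thu Jan  1 00:00:00 2015,100,1000\n"
        "/data/b.log,Mon Jun  3 12:00:00 2024,50,1000\n"
        "/data/c.bin,Fri Jan  1 08:30:00 2016,300,0\n"
    )
    print(stale_bytes_by_owner(sample, datetime(2020, 1, 1)))
    # → {'1000': 100, '0': 300}
```

To use it on real output, open the file with `open("output.csv", newline="")` and pass the file object in place of the sample.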
That’s it! Enjoy!