In many cases, data accumulates over time, and much of it is never accessed again, sometimes for years. Accordingly, it would be great if you could create a report that lists each file along with its last access date, its size, and its owner.
With a few lines of Python (I named the script traverse.py), you can traverse any mounted filesystem and write a CSV report with the following fields for each file:
- Filename (with absolute path)
- Last Access Date & Time
- File Size (in bytes)
- UID or username of the owner
import csv, os, sys, time
import pathlib  # pathlib is only needed if fileowner will be a username string, not a UID

writer = csv.writer(sys.stdout, lineterminator="\n")  # csv.writer quotes filenames that contain commas
for root, directories, filenames in os.walk(sys.argv[1]):  # Loop through the filesystem or folder passed as an argument
    for filename in filenames:
        fullpath = os.path.join(root, filename)  # Get full path of the file
        try:
            timelastaccessed = time.ctime(os.stat(fullpath).st_atime)  # Get last access time
            filesize = os.stat(fullpath).st_size  # Get file size in bytes
            fileowner = os.stat(fullpath).st_uid  # Get file owner as a UID
            #fileowner = pathlib.Path(fullpath).owner()  # Get file owner as a string, much slower
            writer.writerow([fullpath, timelastaccessed, filesize, fileowner])  # Output one CSV row
        except OSError:
            # Report on stderr so error messages do not end up in the CSV
            print("Path does not exist or is inaccessible:", fullpath, file=sys.stderr)
        except UnicodeEncodeError:
            # File names that cannot be encoded for output raise UnicodeEncodeError
            print("Encoding Error:", repr(fullpath), file=sys.stderr)
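
The script above calls os.stat three times per file. As a rough sketch of one alternative (not a replacement for the script), the hypothetical helpers below stat each file once and yield the same four fields, so the traversal can be reused or tested separately from the output step. The names iter_file_rows and write_report are introduced here for illustration only.

```python
import csv
import os
import sys
import time


def iter_file_rows(top):
    """Yield (path, last access time, size in bytes, owner UID) for each file under top."""
    for root, _directories, filenames in os.walk(top):
        for filename in filenames:
            fullpath = os.path.join(root, filename)
            try:
                info = os.stat(fullpath)  # one stat call supplies all three fields
            except OSError:
                # Report unreadable files on stderr so the CSV stays clean
                print("Skipping inaccessible path:", fullpath, file=sys.stderr)
                continue
            yield fullpath, time.ctime(info.st_atime), info.st_size, info.st_uid


def write_report(top, out):
    """Write one CSV row per file to the open text stream `out`."""
    writer = csv.writer(out)  # quotes file names that contain commas
    for row in iter_file_rows(top):
        writer.writerow(row)
```

Calling write_report(sys.argv[1], sys.stdout) would produce the same kind of report as traverse.py.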
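To execute, just run the following (note that >> appends to output.csv, so use > instead if you want to overwrite an earlier report):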
python3 traverse.py filesystem_or_folder >> output.csv
The above outputs the UID. If you want to output the username instead, comment out the line that assigns fileowner from st_uid and uncomment the next line, which uses pathlib.
You can then import that CSV file into any analytics platform, such as Power BI, and do your magic!
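
If the pathlib lookup turns out to be too slow on a large tree, one possible workaround is to keep reading the UID and translate it with a cached lookup, since a filesystem usually has far fewer distinct owners than files. This is a minimal sketch that assumes a Unix-like system, where the standard pwd module is available. The owner_name helper is a hypothetical name, not part of the original script.

```python
import functools
import pwd  # Unix-only standard library module


@functools.lru_cache(maxsize=None)
def owner_name(uid):
    """Return the username for uid, or the UID as a string if it has no passwd entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # UIDs left behind by deleted accounts have no entry to look up
        return str(uid)
```

In the script, fileowner = owner_name(os.stat(fullpath).st_uid) would then look up each distinct UID only once.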
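
For a quick look before reaching for an analytics platform, the report can also be summarized in Python. The sketch below totals, per owner, the bytes held in files not accessed within a given number of days. It assumes the four-column layout and the time.ctime() date format produced by the script, and it skips any line that does not fit that layout. The function name stale_bytes_by_owner is made up for this example. Keep in mind that access times are only as reliable as the filesystem's mount options (noatime, for instance, stops them from being updated).

```python
import csv
from collections import defaultdict
from datetime import datetime, timedelta

CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"  # layout produced by time.ctime()


def stale_bytes_by_owner(csv_lines, days, now=None):
    """Sum the sizes of files not accessed within `days` days, grouped by owner."""
    cutoff = (now or datetime.now()) - timedelta(days=days)
    totals = defaultdict(int)
    for row in csv.reader(csv_lines):
        if len(row) != 4:
            continue  # skip lines that are not report rows
        _path, accessed, size, owner = row
        try:
            last_access = datetime.strptime(accessed, CTIME_FORMAT)
            nbytes = int(size)
        except ValueError:
            continue  # skip rows whose date or size cannot be parsed
        if last_access < cutoff:
            totals[owner] += nbytes
    return dict(totals)
```

For example, opening output.csv with open("output.csv", newline="") and passing the file object with days=365 would return a dictionary mapping each owner to the bytes untouched for over a year.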
That’s it! Enjoy!