In many cases, data accumulates over time, and much of it is never accessed again, sometimes for years. It would therefore be great to have a report that shows each file, its last access date, its size and its owner.
With a few lines of Python (I named the script traverse.py), you can efficiently traverse any mounted filesystem and produce a CSV file containing the following data:
- Filename (with absolute path)
- Last Access Date & Time
- File Size (in bytes)
- UID or username of the owner
```python
import csv, os, sys, time
import pathlib  # only needed if fileowner will be a username rather than a UID

writer = csv.writer(sys.stdout, lineterminator="\n")  # quotes fields such as filenames containing commas
for root, directories, filenames in os.walk(sys.argv[1]):  # loop through the filesystem or folder passed as an argument
    for filename in filenames:
        fullpath = os.path.join(root, filename)  # full path of the file
        try:
            st = os.stat(fullpath)  # stat the file once instead of three times
            timelastaccessed = time.ctime(st.st_atime)  # last access time
            filesize = st.st_size  # file size in bytes
            fileowner = st.st_uid  # file owner as a UID
            #fileowner = pathlib.Path(fullpath).owner()  # file owner as a username, much slower
            writer.writerow([fullpath, timelastaccessed, filesize, fileowner])  # one CSV row per file
        except OSError:
            # The file vanished or is inaccessible: report on stderr so the CSV stays clean
            print("Path does not exist or is inaccessible:", repr(fullpath), file=sys.stderr)
        except UnicodeEncodeError:
            # The filename cannot be encoded for output (e.g. undecodable bytes in the name)
            print("Encoding Error:", repr(fullpath), file=sys.stderr)
```
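To run it, pass the filesystem or folder to scan and redirect the output to a CSV file:

```shell
python3 traverse.py filesystem_or_folder >> output.csv
```

Note that >> appends to output.csv if it already exists; use > instead to start a fresh file on each run.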
The script outputs the numeric UID. If you want the username instead, comment out the line that assigns the UID to fileowner and uncomment the line below it that uses pathlib.
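If the pathlib route is too slow, one alternative (a sketch, not the method used above) is to keep collecting UIDs and resolve each distinct UID to a name only once. Path.owner() looks the owner up again for every file, but a filesystem typically has only a handful of distinct owners, so caching the lookup per UID should avoid most of the repeated work. This assumes a Unix system where Python's pwd module is available; the helper name owner_name is hypothetical.

```python
import functools
import os
import pwd  # Unix-only standard library module


@functools.lru_cache(maxsize=None)
def owner_name(uid: int) -> str:
    """Map a UID to a username, caching the result so each UID is looked up only once."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # The UID has no passwd entry (e.g. a deleted account): fall back to the number
        return str(uid)


if __name__ == "__main__":
    # Example: print the username of the user running this script
    print(owner_name(os.getuid()))
```

In the traversal loop you would then pass the UID the script already collects to owner_name() instead of calling Path.owner(). For a UID with no passwd entry, this helper falls back to the numeric UID instead of failing.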
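You can then import the CSV file into an analytics platform such as Power BI and do your magic! The file has no header row, so you may want to name the columns on import. Keep in mind that last access times are only as reliable as the filesystem's mount options: with noatime they are never updated, and with relatime (the usual Linux default) they are updated at most about once a day unless the file has been modified since.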
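If you would rather not reach for a BI tool, the same kind of report can be sketched in plain Python. The following assumes the CSV written by traverse.py (four columns, no header row, last access time in time.ctime() format) and a hypothetical script name, stale_report.py; it totals, per owner, the bytes held in files not accessed for more than a given number of days.

```python
import csv
import sys
import time
from collections import defaultdict


def stale_bytes_by_owner(rows, max_age_days, now=None):
    """Total the bytes in files not accessed for more than max_age_days, per owner.

    Assumes each row is [path, last_access, size_in_bytes, owner] as written by
    traverse.py, with last_access in time.ctime() format and no header row.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    totals = defaultdict(int)
    for row in rows:
        if len(row) != 4:
            continue  # skip error messages or rows that do not have exactly four fields
        _path, accessed, size, owner = row
        try:
            # time.strptime's default format matches time.ctime() output;
            # mktime converts back to epoch seconds using local time, as ctime did
            atime = time.mktime(time.strptime(accessed))
            nbytes = int(size)
        except ValueError:
            continue  # skip rows whose fields do not parse
        if atime < cutoff:
            totals[owner] += nbytes
    return dict(totals)


def main(argv):
    # Hypothetical usage: python3 stale_report.py output.csv 365
    if len(argv) != 3:
        print("usage: stale_report.py output.csv max_age_days", file=sys.stderr)
        return
    with open(argv[1], newline="", errors="replace") as f:
        report = stale_bytes_by_owner(csv.reader(f), int(argv[2]))
    # Print owner,bytes pairs, largest total first
    for owner, total in sorted(report.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{owner},{total}")


if __name__ == "__main__":
    main(sys.argv)
```

Rows that do not have exactly four fields are skipped, so error messages or unquoted filenames containing commas in the CSV do not break the report.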
That’s it! Enjoy!