SharePoint Online - Detecting Possibly Corrupted Files ...with Thumbnails?
GitHub Link
Grab it at: https://github.com/RecursiveKea/SharePointScripts/tree/main/Potentially%20Corrupted%20File%20Detection
Background
While working with SharePoint you may notice that some documents show the Office viewer and others show auto-generated thumbnails in the information panel at the top right:
They also appear in search results:
You can also add an image column to your document library to include the thumbnail in the view - no extra setup required:
See this blog post on how to do this: sharepointdiary.com/2019/10/sharepoint-onli..
Thumbnail Support
It's important to mention that SharePoint won't create thumbnails for every file type, as not all of them are supported. Valid Office documents (.docx, .xlsx, .pptx, etc.), emails (.msg), images (.jpg, .png, etc.), and text (.txt) documents seem to be fine.
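As a rough illustration, the extension check could be sketched like this. The function name `Test-ThumbnailExtension` is made up for this example (it is not part of the script on GitHub), and the list only contains the extensions named above, so it is an assumption and certainly incomplete.

```powershell
# Extensions named in this post as producing thumbnails.
# Assumption: partial list only; SharePoint supports more types than these.
$ThumbnailExtensions = @('.docx', '.xlsx', '.pptx', '.msg', '.jpg', '.png', '.txt')

# Hypothetical helper: returns $true when the file name ends in a listed extension.
function Test-ThumbnailExtension {
    param(
        [Parameter(Mandatory)][string]$FileName,
        [string[]]$Supported = $ThumbnailExtensions
    )
    # GetExtension returns the extension including the leading dot, or '' if there is none.
    $ext = [System.IO.Path]::GetExtension($FileName).ToLowerInvariant()
    return ($ext -ne '') -and ($Supported -contains $ext)
}

Test-ThumbnailExtension -FileName 'Report.DOCX'   # True
Test-ThumbnailExtension -FileName 'archive.zip'   # False
```

Anything that fails this check can't be judged by its thumbnail at all, so it is better treated as "unknown" than as corrupted.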
Corrupt File Test
Knowing that files in SharePoint can become corrupted (partially uploaded, incorrect extension, etc.), I suspected SharePoint wouldn't be able to read such a file and therefore couldn't generate a thumbnail for it, even when the extension is supported. So I forced a few files to be corrupted by changing the extension of Word documents to other file types:
Sure enough, no thumbnails were generated (excluding emails - see below for more detail). But in the thumbnail image column the image source still had a URL:
But when you go to that URL you get a bad request message:
Theory Time
So my theory: since thumbnails are generated automatically for valid documents but not for corrupted ones, and requesting the thumbnail of a corrupted document throws an error, I could use thumbnails to scan a folder / library / site / tenant for corrupted files. It's certainly a strange idea, but from initial testing in my dev tenancy it seems to work well.
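The decision logic behind this idea can be sketched as a small function. This is not the code from the GitHub script; `Get-ThumbnailVerdict`, its parameters and the three verdict strings are invented for illustration. It assumes you have already requested the thumbnail URL with an authenticated call and have the HTTP status code in hand (how that request is made is left out here).

```powershell
# Hypothetical helper: turns a file name plus the HTTP status of its thumbnail
# request into a verdict. Emails (.msg) are deliberately not in the default list,
# because corrupted emails still seem to produce a thumbnail.
function Get-ThumbnailVerdict {
    param(
        [Parameter(Mandatory)][string]$FileName,
        [Parameter(Mandatory)][int]$ThumbnailStatusCode,
        # Assumption: partial list taken from the extensions named in this post.
        [string[]]$Supported = @('.docx', '.xlsx', '.pptx', '.jpg', '.png', '.txt')
    )
    $ext = [System.IO.Path]::GetExtension($FileName).ToLowerInvariant()

    # No thumbnail is expected for this type, so its absence tells us nothing.
    if ($Supported -notcontains $ext) { return 'Unknown' }

    # A 2xx response means SharePoint could read the file and render a thumbnail.
    if ($ThumbnailStatusCode -ge 200 -and $ThumbnailStatusCode -lt 300) { return 'Ok' }

    # Anything else (for example the 400 Bad Request seen above) is suspicious.
    return 'PotentiallyCorrupted'
}

Get-ThumbnailVerdict -FileName 'Budget.xlsx' -ThumbnailStatusCode 200   # Ok
Get-ThumbnailVerdict -FileName 'Budget.xlsx' -ThumbnailStatusCode 400   # PotentiallyCorrupted
Get-ThumbnailVerdict -FileName 'tool.exe'    -ThumbnailStatusCode 400   # Unknown
```

Keeping "Unknown" separate from "PotentiallyCorrupted" is one way to cut down on false positives for file types that never get a thumbnail.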
Important Notes
This is an in-dev script and will be improved with further use
The script will also flag files with extensions that SharePoint doesn't generate thumbnails for, even though those files are very likely not corrupted - if I run this through an environment with a lot of different documents I can build up a supported list and refine the script. The commented-out supported list in the script isn't complete, as other supported extensions are missing from it.
Corrupted emails appear to generate a consistent thumbnail. With this in mind the script will actually download the thumbnail and OCR it (using https://www.powershellgallery.com/packages/PsOcr/1.1.0/Content/root.psm1) and check if it matches the empty email text like the below
There are other ways to detect corrupted files, but they would need to download a local copy. This approach isn't perfect: it is version 0.1 and will raise false positives, but where it does work it could be quite useful
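For the email check mentioned above, the comparison step after OCR might look something like the sketch below. It only covers comparing text; the OCR itself is done by the PsOcr module and isn't shown. `Test-OcrTextMatch` is a made-up name, and the expected text is a placeholder because the real wording of the empty email thumbnail isn't reproduced here.

```powershell
# Hypothetical helper: checks whether OCR output contains the expected text,
# ignoring case and differences in whitespace or line breaks.
function Test-OcrTextMatch {
    param(
        [Parameter(Mandatory)][AllowEmptyString()][string]$OcrText,
        [Parameter(Mandatory)][string]$ExpectedText
    )
    # Collapse runs of whitespace to a single space, trim, and lower-case.
    $normalise = { param($s) ($s -replace '\s+', ' ').Trim().ToLowerInvariant() }

    $actual   = & $normalise $OcrText
    $expected = & $normalise $ExpectedText

    # "Contains" rather than an exact match, to tolerate stray OCR noise around the text.
    return $actual.Contains($expected)
}

# Placeholder only: substitute the text that actually appears on the empty email thumbnail.
$expected = 'placeholder empty email text'
Test-OcrTextMatch -OcrText "Placeholder  empty`r`nemail text" -ExpectedText $expected   # True
Test-OcrTextMatch -OcrText 'Hi team, minutes attached'        -ExpectedText $expected   # False
```

A contains-style match is a design choice here: it is more forgiving of OCR artefacts, at the cost of possibly matching a genuine email that happens to include the same words.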
How To Run
Grab the script from GitHub and adapt one of the examples in the Example Script, similar to:
$ModulePath = "{PATH}";
. ($ModulePath + "\Module-SPOnlineDetectCorruptedFiles.ps1") -ModulePath $ModulePath;
$SiteUrlToCheck = "{SiteUrl}";
$LibraryToCrawl = "{LibraryRootFolder}";
Connect-PnPOnline -Url $SiteUrlToCheck -Interactive -ErrorAction Stop;
Detect-SPOnlinePotentiallyCorruptedFiles -FolderPath $LibraryToCrawl -TempExportFolder "C:\TEMP";
$ModulePath: Path you have downloaded the scripts to
$SiteUrlToCheck: The Site you want to check
$LibraryToCrawl: The Library you want to crawl
TempExportFolder: The local folder used to temporarily store the downloaded email thumbnail while it is checked
The script returned results, what do I do?
First, check that the file is truly corrupted by opening it in its application
Contact the person that uploaded the document as they might have a local copy of the file
Check if there are prior versions of the file; you might need to restore a previous version