SharePoint Online - Detecting Possibly Corrupted Files ...with Thumbnails?


Grab it at: https://github.com/RecursiveKea/SharePointScripts/tree/main/Potentially%20Corrupted%20File%20Detection

Background

While working with SharePoint you may notice that some documents show the Office viewer and others show autogenerated thumbnails in the information panel at the top right:

They also appear in search results:

You can also add an image column to your document library to include the thumbnail in the view - no extra setup required:

See this blog post on how to do this: sharepointdiary.com/2019/10/sharepoint-onli..

Thumbnail Support

SharePoint won't create thumbnails for every file type, as not all of them are supported. Valid Office documents (.docx, .xlsx, .pptx, etc.), emails (.msg), images (.jpg, .png, etc.), and text documents (.txt) seem to be fine.
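As a rough illustration, the extensions mentioned above could be kept in a small lookup and checked before a file is flagged. This is only a sketch: the function name `Test-ThumbnailSupportedExtension` and the variable `$ThumbnailSupportedExtensions` are made up for this example, and the list contains only the extensions named in this post, so it is certainly incomplete.

```powershell
# Hypothetical helper: the list only holds the extensions named in this post.
# It is known to be incomplete; extend it as more supported types are confirmed.
$ThumbnailSupportedExtensions = @('.docx', '.xlsx', '.pptx', '.msg', '.jpg', '.png', '.txt')

function Test-ThumbnailSupportedExtension {
    param(
        [Parameter(Mandatory)]
        [string] $FileName
    )
    # GetExtension returns the extension including the leading dot,
    # or an empty string when the name has no extension.
    $extension = [System.IO.Path]::GetExtension($FileName).ToLowerInvariant()
    return $ThumbnailSupportedExtensions -contains $extension
}

# Example calls: the first extension is in the list, the second is not.
Test-ThumbnailSupportedExtension -FileName 'Report.DOCX'
Test-ThumbnailSupportedExtension -FileName 'archive.7z'
```

A check like this only says whether a missing thumbnail is worth investigating; it says nothing about whether the file itself is healthy.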

Corrupt File Test

Knowing that files in SharePoint can become corrupted (partially uploaded, incorrect extension, etc.), I suspected that SharePoint wouldn't be able to read such a file and therefore couldn't generate a thumbnail for it, even when the extension is supported. So I forced a few files to be corrupted by changing the extension from a Word doc to other file types:

Sure enough, no thumbnails were generated (excluding emails; see below for more detail). But in the thumbnail image column the image source still had a URL:

But when going to that URL you get a bad request message:

Theory Time

Since thumbnails are generated automatically for valid documents but not for corrupted ones, and requesting the missing thumbnail throws an error, I figured I could leverage this to scan a folder / library / site / tenant for corrupted files using thumbnails. Certainly a strange idea, but from initial testing in my dev tenancy it seems to work well.
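The idea can be sketched roughly as follows. This is not the code from the GitHub script: `Find-PotentiallyCorruptedFile` and its `-ThumbnailProbe` parameter are hypothetical names, and how the thumbnail URL is actually requested (authentication, URL format) is deliberately left to a caller-supplied script block. The example uses a stub probe so it runs without a SharePoint connection.

```powershell
function Find-PotentiallyCorruptedFile {
    param(
        [Parameter(Mandatory)]
        [string[]] $FileUrl,

        # Script block that takes a file URL and returns the HTTP status code
        # of that file's thumbnail request. Supplied by the caller (assumption).
        [Parameter(Mandatory)]
        [scriptblock] $ThumbnailProbe
    )

    foreach ($url in $FileUrl) {
        $statusCode = & $ThumbnailProbe $url
        # A generated thumbnail comes back as 200; the corrupted test files
        # described in this post returned a bad request (400) instead.
        if ($statusCode -ne 200) {
            [pscustomobject]@{ FileUrl = $url; StatusCode = $statusCode }
        }
    }
}

# Stub probe for illustration only: pretends that "broken" files fail.
$stubProbe = {
    param($url)
    if ($url -like '*broken*') { 400 } else { 200 }
}

$flagged = Find-PotentiallyCorruptedFile `
    -FileUrl @('/sites/dev/Docs/ok.docx', '/sites/dev/Docs/broken.docx') `
    -ThumbnailProbe $stubProbe
$flagged
```

Passing the probe in as a script block keeps the flagging logic separate from the request logic, which makes it easy to try out without touching a real tenant.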

Important Notes

  • This script is still in development and will be improved as it gets more use

  • The script will also flag files whose extensions SharePoint cannot generate thumbnails for, even though those files are very likely not corrupted. Once I run it against an environment with a wide variety of documents I can build a list of supported extensions and refine the script. The commented-out supported list in the script is incomplete; other supported extensions are missing from it.

  • Corrupted emails appear to generate a consistent thumbnail. With this in mind, the script downloads the thumbnail, runs OCR on it (using https://www.powershellgallery.com/packages/PsOcr/1.1.0/Content/root.psm1), and checks whether the recognized text matches the text of the empty email thumbnail

  • There are other ways to detect corrupted files, but they would require downloading a local copy. This approach isn't perfect: it is version 0.1 and will raise false positives, but where it does work it could be quite useful
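For comparison, one of those "other ways" on a local copy is to check the first bytes of the file against the well-known signature for its extension. The sketch below is not part of the GitHub script; `Test-FileSignature` and `$KnownSignatures` are hypothetical names, and only a handful of formats are covered (.docx, .xlsx, and .pptx are ZIP containers, so they share the ZIP signature).

```powershell
# Well-known leading bytes ("magic numbers") for a few formats.
$KnownSignatures = @{
    '.docx' = [byte[]](0x50, 0x4B, 0x03, 0x04)   # ZIP container
    '.xlsx' = [byte[]](0x50, 0x4B, 0x03, 0x04)   # ZIP container
    '.pptx' = [byte[]](0x50, 0x4B, 0x03, 0x04)   # ZIP container
    '.png'  = [byte[]](0x89, 0x50, 0x4E, 0x47)
    '.jpg'  = [byte[]](0xFF, 0xD8, 0xFF)
}

function Test-FileSignature {
    param(
        # Full path to a local copy of the file.
        [Parameter(Mandatory)]
        [string] $Path
    )
    $extension = [System.IO.Path]::GetExtension($Path).ToLowerInvariant()
    if (-not $KnownSignatures.ContainsKey($extension)) {
        return $null   # Unknown extension: no verdict either way.
    }
    $expected = $KnownSignatures[$extension]

    # Read only as many bytes as the signature needs.
    $buffer = New-Object byte[] $expected.Length
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        $read = $stream.Read($buffer, 0, $buffer.Length)
    }
    finally {
        $stream.Dispose()
    }
    if ($read -lt $expected.Length) { return $false }

    for ($i = 0; $i -lt $expected.Length; $i++) {
        if ($buffer[$i] -ne $expected[$i]) { return $false }
    }
    return $true
}
```

A matching signature only shows that the file starts like the format its extension claims; a file truncated later on would still pass, so this complements the thumbnail check rather than replacing it.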

How To Run

Grab the script from GitHub and use one of the examples in the Example Script similar to:

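# Folder the scripts were downloaded to
$ModulePath = "{PATH}";
# Dot-source the module so its functions are available in this session
. ($ModulePath + "\Module-SPOnlineDetectCorruptedFiles.ps1") -ModulePath $ModulePath;

$SiteUrlToCheck = "{SiteUrl}";
$LibraryToCrawl = "{LibraryRootFolder}";
# Sign in to the site and stop if the connection fails
Connect-PnPOnline -Url $SiteUrlToCheck -Interactive -ErrorAction Stop;
# Crawl the library and report potentially corrupted files
Detect-SPOnlinePotentiallyCorruptedFiles -FolderPath $LibraryToCrawl -TempExportFolder "C:\TEMP";

  • $ModulePath: Path you have downloaded the scripts to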

  • $SiteUrlToCheck: The Site you want to check

  • $LibraryToCrawl: The Library you want to crawl

  • TempExportFolder: This is used to temporarily store the email thumbnail

The script returned results, what do I do?

  • First, check that the file is truly corrupted by opening it in its application

  • Contact the person who uploaded the document, as they might have a local copy of the file

  • Check whether there are prior versions of the file; you might need to restore a previous version