PowerShell Script To Find and Extract Files From SharePoint That Have A URL Longer Than 260 Characters

If you’re here, it can only mean one thing: Your users have created a folder and filename path in SharePoint that is so long that they’re now getting errors, and they can’t edit the document in Office applications. Like a burrowing parasite, the Office document has gone deeper and deeper into a convoluted folder structure until the URL exceeds the SharePoint limit of 260 characters.

It’s called Longurlitis, and while painful, it is curable.

Take this (not a real world) example:

http://sharepoint.company.com/Documents/Dave%20Documents/My%20Administrator%20Told%20Me%20To%20Use/
Metadata%20but%20what%20does%20he%20know/I’ll%20show%20him/
You’ll%20take%20these%20folders%20away%20from%20my%20cold%20dead%20hands/
Mwahahahahahahahahahahahahahahahahahaha/
Dave’s%20Word%20Document%20With%20Meeting%20Minutes%20From%201992%20
About%20That%20Upcoming%20Millenium%20Bug%20No%20I%20Can’t%20Get%20Rid%20Of%20These/
Meeting%20Minutes%20June%20Fourteenth%201992%20FINAL%20VERSION%20DRAFT%20v482.doc

Ouch, that hurt my everything.

Now when the user tries to do anything with this file, you get this:

The URL for this file is too long for the application. A temporary copy of this file will be opened on your computer. You must save this copy as a new file.
Every time this error appears, a SharePoint administrator sheds a tear.

This issue has been discussed at fairly great length in a number of places, and by SharePoint heavyweights like Joel Oleson.

The problem is fairly straightforward when it’s only one or two files, but what do you do when this isn’t a one-off, but a full-on infestation of Longurlitis? I recently came across over 700 files with Longurlitis in an old site that’s being decommissioned. After running the SharePoint Site Extraction script, I found a number of files missing – the script’s BinaryWriter chokes on the insane URL length.

To that end, I’ve created the script below, which finds all files in a given site whose URL exceeds 260 characters. You can also download all of the files by uncommenting the designated line. And if you’d like to spit out an inventory CSV file with the URL and number of characters, just use the Out-File cmdlet like so:

FindLongPaths.ps1 | Out-File -filepath C:\wherever\LongFiles.csv

Note: I had to use a character other than a comma for the CSV output, because it’s possible and likely that the same users who are creating these files are also putting commas in the filename. Because the hash character (#) is forbidden in SharePoint URLs, it works well for delimiting the output.
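
Since SharePoint forbids the hash character in URLs, that #-delimited output can be read straight back into objects with Import-Csv and a custom delimiter – handy if you’d rather sort by length in PowerShell than in Excel. A quick sketch; the header names URL and Length are my own invention, since the script emits raw URL#length lines with no header row:

$longFiles = Import-Csv -Path C:\wherever\LongFiles.csv -Delimiter '#' -Header URL, Length

#Show the ten worst offenders, longest URLs first
$longFiles | Sort-Object { [int]$_.Length } -Descending | Select-Object -First 10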

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

#Where to download the files to.
$destination = "C:\Wherever"

#The site to extract from. Make sure there is no trailing slash.
$site = "http://sharepoint.company.com/site"

# Function: DownloadLongFiles
# Description: Downloads all documents with a URL > 260 characters
# Variables
# $folderUrl: The Document Library folder to scan
# Note: $web is set by DownloadSite and read here via PowerShell's dynamic scoping.
function DownloadLongFiles($folderUrl)
{
    $folder = $web.GetFolder($folderUrl)

    foreach ($file in $folder.Files)
    {
        $encodedURL = $file.Url -replace " ", "%20"
        $FullURL = $site + '/' + $encodedURL
        $URLWithLength = $FullURL + '#' + $FullURL.Length
        $Filename = $file.Name
        $Downloadpath = $destination + '\' + $Filename
        if ($FullURL.Length -ge 260)
        {
            #Uncomment the line below to download the files.
            #HTTPDownloadFile "$FullURL" "$Downloadpath"
            Write-Host $FullURL
            Write-Host $destination

            Write-Host $URLWithLength
            Write-Output $URLWithLength
        }
    }
}

# Function: DownloadSite
# Description: Calls DownloadLongFiles recursively to download all documents with long file names in a site.
# Variables
# $webUrl: The URL of the site to download all document libraries from
function DownloadSite($webUrl)
{
    $web = Get-SPWeb -Identity $webUrl

    foreach ($list in $web.Lists)
    {
        if ($list.BaseType -eq "DocumentLibrary")
        {
            DownloadLongFiles $list.RootFolder.Url
            #Download files in folders
            foreach ($folder in $list.Folders)
            {
                DownloadLongFiles $folder.Url
            }
        }
    }
}

# Function: HTTPDownloadFile
# Description: Downloads a file using WebClient
# Variables
# $ServerFileLocation: Where the source file is located on the web
# $DownloadPath: The destination to download to
function HTTPDownloadFile($ServerFileLocation, $DownloadPath)
{
    $webclient = New-Object System.Net.WebClient
    $webclient.UseDefaultCredentials = $true
    $webclient.DownloadFile($ServerFileLocation, $DownloadPath)
}

#Download Site Documents + Versions
DownloadSite "$site"
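
If you just want to sanity-check a single path before running the full script against a site, the same encode-and-measure logic can be tried on its own. A minimal sketch – the site and file URL below are made up:

$site = "http://sharepoint.company.com/site"
$fileUrl = "Shared Documents/Some Deep Folder/Meeting Minutes FINAL DRAFT v482.doc"

#Encode spaces the way the browser does, then measure the full URL
$encoded = $fileUrl -replace " ", "%20"
$full = $site + '/' + $encoded
Write-Host "$full is $($full.Length) characters"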

Close All Open Items In Outlook

A small annoyance I have with Outlook is that it leaves all the messages you’ve opened as separate windows, cluttering up the screen. Luckily, Outlook has a “Close All Items” option. I added it to my quick access toolbar in Outlook to save time; here’s how:

  1. In the upper left, click the “Customize the Quick Access Toolbar” dropdown, then click “More Commands…”
  2. Select the “Close All Items” command, click “Add >>”, then click “OK”.
  3. Huzzah! Instant screen de-clutterer button!

SharePoint Cleanup: Inventory of All Documents In Site Collection

The Problem

Most of our SharePoint sites are long-running projects or continuing programs that have no end date – meaning site decommissioning will never happen. As such, document libraries can get pretty cluttered over the years with junk, making it difficult for users to find what they need (especially if Metadata isn’t used!), and bloating up SharePoint storage requirements. A lot of people will say that you can pick up a 2 terabyte drive at Future Shop for $100, but:

  1. Isn’t it better to keep things neat and well organized?
  2. Most SharePoint farms run on storage that costs way more than $100/TB, and
  3. Don’t throw hardware at a people problem! Just clean up already – it will save time, money, and sanity!

Unfortunately cleaning up these sites can be a gargantuan job as thousands of files and hundreds of folders (shudder) pile up. The site owners are loath to tackle the big job of cleaning up, because it takes a fair bit of time, and most of the time they don’t even know where to start.

The Solution

Luckily, Gary LaPointe of STSADM (and Powershell) fame has a script to list all documents in the entire farm. This guy is a fountain of knowledge wrapped in a library inside a crunchy chipotle taco.

Gary’s script will spit out a CSV file of every document in the farm. From there it’s simple to pop it into Excel, do a bit of sorting, and conditional formatting to produce reports for your site collection administrators and site owners. With this report in hand, it’s easy to get them to clean things up, because all the problem documents are laid out for them.

We developed criteria for ‘the usual suspects’, i.e. red flags that indicate something may need to be archived or deleted:

  1. Item Creation Date > 1 year ago
    Maybe these documents are still relevant, maybe not. In many cases site owners and users had forgotten they existed.
  2. Item Modified Date > 6 months ago
    Like with #1, this is kind of a ‘yellow flag’ – maybe it’s worth keeping, maybe it’s junk.
  3. File Size > 40MB
    This, to me, is an indication that a file needs to be looked at. It’s fairly rare in our case for an Office document to get this large.
  4. Number of Versions > 5
    Our governance model limits the number of stored versions to 10. Anything more than 5 may need a look at. In some cases site owners had turned on versioning ‘just to see what it looks like’ and forgotten to turn it off – the actual functionality wasn’t used.
  5. Total file size (file size x # of versions) > 50MB
    This nabbed a lot of problem files. Some notable examples were BMP (uncompressed image) files that were 40MB and had 10 versions – so 400MB just for one file. By compressing the BMP to a GIF or JPG, we took the file size down to 10KB, making the total potential file size 100KB.
  6. Item contains the word ‘Archive’, ‘Temp’/‘Temporary’, ‘Old’, etc.
    Big red flag right here. Site members and owners will often ‘cleanup’ their libraries by taking all stuff they aren’t sure about and clicking and dragging it all into a folder called “Archive” (thanks, Explorer View!). A lot of times things were dragged here and completely forgotten about.
  7. ZIP, RAR, TAR and other compressed file types
    This might be a contentious issue, but to me it’s rare that compressed files should be stored in SharePoint. A ZIP file containing 100 Word documents has little value in SharePoint, simply because to change one document, users must first download the entire ZIP file, get the doc they need, re-zip it, re-upload it, and replace the old file. Sounds a lot like the pre-SharePoint document collaboration days.
    My other issue with compressed files is that they’re often used to circumvent the banned file types (i.e. .bat, .cab, .exe, etc.).
  8. Files with extensions > 4 characters
    Again, another tactic used to circumvent the banned file types. Some users would take a .bat file and rename it from MyScript.bat to MyScript.bat_rename. Users, please don’t do this.

The above are just ideas on where to start, and totally depend on your Governance model, policies, and your farm. Maybe you’re ok with having .exe files in ZIP files on your farm – that’s cool! The point is to work out what works for you and your farm.
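
As a rough sketch of how some of these checks could be automated against an inventory CSV like Gary’s, here’s one way to flag old, oversized, or suspiciously named items in PowerShell. The column names (Name, Size, Versions, Modified) are assumptions – adjust them to match whatever headers your report actually has:

$report = Import-Csv -Path C:\wherever\Inventory.csv

#Red-flag filters based on the criteria above (column names are assumed)
$flagged = $report | Where-Object {
    ([datetime]$_.Modified -lt (Get-Date).AddMonths(-6)) -or
    ([int64]$_.Size -gt 40MB) -or
    ([int]$_.Versions -gt 5) -or
    ($_.Name -match 'Archive|Temp|Old') -or
    ($_.Name -match '\.(zip|rar|tar)$') -or
    ($_.Name -match '\.[^.]{5,}$')   #extension longer than 4 characters
}

$flagged | Export-Csv -Path C:\wherever\Flagged.csv -NoTypeInformation

From there the flagged list can go through the same Excel sorting and conditional formatting treatment before handing it to site owners.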

The Result

We’ve already shaved off about 20GB of junk from our farm. Our site collection administrators, site owners, and users were actually really happy to go through and clean stuff up. In a few cases, I was flat out told that they wanted to do it for months but had no idea where to start. We now have plans to automate and make this reporting a regular part of our farm maintenance.