SharePoint PowerShell Script to Extract All Documents and Their Versions

Hey! Listen: This script doesn’t extract documents that suffer from Longurlitis (a URL longer than the SharePoint maximum of 260 characters). You may also want to run the PowerShell Script To Find and Extract Files From SharePoint That Have A URL Longer Than 260 Characters.

Recently a client asked to extract all content from a SharePoint site for archival. A CMP file was out of the question, because this had to be a SharePoint-independent solution. PowerShell to the rescue! The script below will extract all documents and their versions, and export all metadata and list data to CSV files.

The DownloadSite function will download all the documents and their versions into folders named after their respective document libraries. Versions will be named [filename]_v[version#].[extension].
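
The naming scheme above can be tried out on its own. This is only a minimal sketch: Get-VersionedFileName is a hypothetical helper that is not part of the script, and it assumes the version label is a plain string such as "1.0". It uses the .NET path helpers so that names with several dots, or with no extension at all, still come out sensibly.

```powershell
# Hypothetical helper (not part of the script below): builds a name in the
# format [filename]_v[version#].[extension].
function Get-VersionedFileName([string]$FileName, [string]$VersionLabel)
{
    # Everything before the last dot
    $base = [System.IO.Path]::GetFileNameWithoutExtension($FileName)
    # The extension including its leading dot, or an empty string if there is none
    $ext = [System.IO.Path]::GetExtension($FileName)
    return $base + "_v" + $VersionLabel + $ext
}

Get-VersionedFileName "Report.docx" "1.0"        # Report_v1.0.docx
Get-VersionedFileName "Budget.2013.xlsx" "2.0"   # Budget.2013_v2.0.xlsx
Get-VersionedFileName "README" "3.0"             # README_v3.0
```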

The DownloadMetadata function will export all document library metadata, as well as list data from the site, to CSV files (one per list). If you don’t need the metadata/lists, just comment out the DownloadMetadata call at the end of the script.
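
The script exports every property of every item, which makes for very wide CSV files. If you only need a few columns, something along these lines may help. It is a rough sketch that runs without SharePoint: the rows, the column names (ID, Name, Url) and the temp-folder output path are all made up for illustration, standing in for the SPListItem objects the script gets from each list.

```powershell
# Stand-in rows (made up for illustration); in the script these would be the
# SPListItem objects returned by $list.Items.
$items = @(
    (New-Object PSObject -Property @{ ID = 1; Name = "Plan.docx";   Url = "Shared Documents/Plan.docx" }),
    (New-Object PSObject -Property @{ ID = 2; Name = "Budget.xlsx"; Url = "Shared Documents/Budget.xlsx" })
)

# Hypothetical output location; the script itself writes under $destination.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) "Shared Documents.csv"

# Keep only the named columns; -NoTypeInformation omits the "#TYPE" header line.
$items | Select-Object ID, Name, Url | Export-Csv $csvPath -NoTypeInformation -Force

# Read the file back to confirm what was written.
$rows = @(Import-Csv $csvPath)
$rows | Format-Table -AutoSize
```

Swapping Select-Object ID, Name, Url for Select * gives the export-everything behaviour the script uses.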

There’s also ample commenting in case someone wants to modify/expand upon the script!

# This script will extract all of the documents and their versions from a site. It will also
# download all of the list data and document library metadata as a CSV file.
 
Add-PSSnapin Microsoft.SharePoint.PowerShell -erroraction SilentlyContinue
# Variables
# $destination: Where the files will be downloaded to
# $site: The URL of the site to extract documents and lists from
 
#Where to Download the files to. Sub-folders will be created for the documents and lists, respectively.
$destination = "C:\Export"
 
#The site to extract from. Make sure there is no trailing slash.
$site = "http://yoursitecollection/yoursite"
 
# Function: HTTPDownloadFile
# Description: Downloads a file using webclient
# Variables
# $ServerFileLocation: Where the source file is located on the web
# $DownloadPath: The destination to download to
 
function HTTPDownloadFile($ServerFileLocation, $DownloadPath)
{
	$webclient = New-Object System.Net.WebClient
	$webClient.UseDefaultCredentials = $true
	$webclient.DownloadFile($ServerFileLocation,$DownloadPath)
}
 
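# Function: DownloadMetadata
# Description: Exports every list's items (including document library metadata) to CSV files
# Variables
# $sourceweb: The URL of the site to export from
# $metadatadestination: The folder in which the "Lists and Metadata" sub-folder is created

function DownloadMetadata($sourceweb, $metadatadestination)
{
	Write-Host "Creating Lists and Metadata"
	$sourceSPweb = Get-SPWeb -Identity $sourceweb
	#Create a sub-folder for the CSV files under the destination that was passed in
	$metadataFolder = $metadatadestination+"\"+$sourceSPweb.Title+" Lists and Metadata"
	$createMetaDataFolder = New-Item $metadataFolder -type directory 
 
	foreach($list in $sourceSPweb.Lists)
	{
		Write-Host "Exporting List MetaData: " $list.Title
		$ListItems = $list.Items 
		$Listlocation = $metadataFolder+"\"+$list.Title+".csv"
		#-NoTypeInformation keeps the "#TYPE" header line out of the CSV
		$ListItems | Select * | Export-Csv $Listlocation -NoTypeInformation -Force
	}

	#Release the SPWeb object
	$sourceSPweb.Dispose()
}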
 
# Function: GetFileVersions
# Description: Downloads all historical versions of a single file
# Variables
# $file: The SPFile whose versions will be downloaded
# Note: Also reads $webUrl (set in DownloadSite) and $destinationfolder (set in DownloadDocLib)
 
function GetFileVersions($file)
{
	foreach($version in $file.Versions)
	{
		#Add version label to file in format: [Filename]_v[version#].[extension]
		#The Path helpers keep names with several dots (or no extension) intact
		$fullname = [System.IO.Path]::GetFileNameWithoutExtension($file.Name)
		$fileext = [System.IO.Path]::GetExtension($file.Name)
		$FullFileName = $fullname+"_v"+$version.VersionLabel+$fileext
 
		#Can't create an SPFile object from historical versions, but CAN download via HTTP
		#Create the full File URL using the Website URL and version's URL
		$fileURL = $webUrl+"/"+$version.Url
 
		#Full Download path including filename
		$DownloadPath = $destinationfolder+"\"+$FullFileName
 
		#Download the file from the version's URL, download to the $DownloadPath location
		HTTPDownloadFile "$fileURL" "$DownloadPath"
	}
}
 
# Function: DownloadDocLib
# Description: Downloads a document library's files; calls GetFileVersions to download versions.
# Credit 
# Used Varun Malhotra's script to download a document library
# as a starting point: http://blogs.msdn.com/b/varun_malhotra/archive/2012/02/13/10265370.aspx
# Variables
# $folderUrl: The URL (relative to the site) of the library folder to download
# Note: Also reads $web and $destination, which are set in DownloadSite
function DownloadDocLib($folderUrl)
{
    $folder = $web.GetFolder($folderUrl)
    foreach ($file in $folder.Files) 
	{
        #Ensure destination directory
		$destinationfolder = $destination + "\" + $folder.Url 
        if (!(Test-Path -path $destinationfolder))
        {
            $dest = New-Item $destinationfolder -type directory 
        }
 
        #Download file
        $binary = $file.OpenBinary()
        #Write the bytes in one call; the file is created (or overwritten) and closed for us
        [System.IO.File]::WriteAllBytes($destinationfolder + "\" + $file.Name, $binary)
 
		#Download file versions. If you don't need versions, comment the line below.
		GetFileVersions $file
	}
}
 
# Function: DownloadSite
# Description: Calls DownloadDocLib for the root folder and every sub-folder of each document library in a site.
# Variables
# $webUrl: The URL of the site to download all document libraries
function DownloadSite($webUrl)
{
	$web = Get-SPWeb -Identity $webUrl
 
	#Create a folder using the site's name
	$siteFolder = $destination + "\" +$web.Title+" Documents"
	$createSiteFolder = New-Item $siteFolder -type directory 
	$destination = $siteFolder
 
	foreach($list in $web.Lists)
	{
		if($list.BaseType -eq "DocumentLibrary")
		{
			Write-Host "Downloading Document Library: " $list.Title
			$listUrl = $web.Url +"/"+ $list.RootFolder.Url
			#Download root files
			DownloadDocLib $list.RootFolder.Url
			#Download files in folders
			foreach ($folder in $list.Folders) 
			{
    			DownloadDocLib $folder.Url
			}
		}
	}

	#Release the SPWeb object
	$web.Dispose()
}
 
#Download Site Documents + Versions
DownloadSite "$site"
 
#Download Site Lists and Document Library Metadata
DownloadMetadata $site $destination