Exporting Salesforce Files (aka ContentDocument)

Posted by Johan on Tuesday, July 3, 2018

Last week a client asked me to help out, we had been creating a system that creates PDF files in Salesforce using Drawloop (today known as Nintex Document Generation which is a boring name).

Anyways, we had about 2000 PDF created in the system and after looking into it there doesn’t seem to be a way to download them in bulk. Sure you can use the Dataloader and download them but you’ll get the content in a CSV column and that doesn’t really fly with most customers.

I tried dataloader.io, Realfire and search through every link on Google or at least the first 2 pages and I didn’t find a good way of doing it.

There seems to be an old AppExchange listing for FileExporter by Salesforce Labs and I think this is the actual software FileExporter and it stopped working with the TLS 1.0 deprecation.

Enough of small talk, I had to solve the problem so I went ahead and created a very simple Python script that lets you specify the query to find your ContentVersion objects and also filter the ContentDocuments if you need to ignore some ids.

My very specific use case was that I was to export all PDF files with a certain pattern in the filename but only those that were related to a custom object that had a certain status. Given that you can’t do certain queries like this one:

SELECT ContentDocumentId, Title, VersionData, CreatedDate FROM ContentVersion WHERE ContentDocumentId IN (
SELECT ContentDocumentId FROM ContentDocumentLink where LinkedEntityId IN (SELECT Id FROM Custom_Object__c))

It gives you a: Entity 'ContentDocumentLink' is not supported for semi join inner selects

I had to implement the option for the second query which gives a list of valid ContentDocumentIds to include in the download.

The code is at https://github.com/snorf/salesforce-files-download, feel free to try it out and let me know if it works or doesn’t work out for you.

One more thing, keep in mind that even if you’re an administration with View All you will not see ContentDocuments that doesn’t belong to you or are explicitly shared with you. You’ll need to either change the ownership of the affected files or share them with the user running the Python script.

Ohana!