Exporting Salesforce Files (aka ContentDocument)

Last week a client asked me to help out. We had built a system that creates PDF files in Salesforce using Drawloop (today known as Nintex Document Generation, which is a boring name).

Anyway, we had about 2,000 PDFs created in the system, and after looking into it there doesn’t seem to be a way to download them in bulk. Sure, you can use the Data Loader to export them, but you’ll get the content in a CSV column, and that doesn’t really fly with most customers.

I tried dataloader.io and Realfire, and searched through every link on Google (or at least the first two pages) without finding a good way of doing it.

There is an old AppExchange listing for FileExporter by Salesforce Labs, and I believe this is the actual software, but it stopped working with the TLS 1.0 deprecation.

Enough small talk. I had to solve the problem, so I went ahead and created a very simple Python script that lets you specify the query used to find your ContentVersion records, and optionally filter on ContentDocumentId if you need to ignore some documents.

My very specific use case was to export all PDF files with a certain pattern in the filename, but only those related to a custom object record with a certain status. The catch is that you can’t run a query like this one:

SELECT ContentDocumentId, Title, VersionData, CreatedDate
FROM ContentVersion
WHERE ContentDocumentId IN (
    SELECT ContentDocumentId FROM ContentDocumentLink
    WHERE LinkedEntityId IN (SELECT Id FROM Custom_Object__c)
)

It gives you a:

Entity 'ContentDocumentLink' is not supported for semi join inner selects

Instead, I had to implement an option for a second query, run separately, which returns the list of valid ContentDocumentIds to include in the download.
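To give an idea of how the two queries fit together, here is a minimal sketch of the approach using simple_salesforce and requests. This is not the actual script, and the Status__c filter, the file type and the naming are placeholder assumptions:

# Rough sketch only; Status__c and the filters are placeholders, not the real script.
from simple_salesforce import Salesforce
import requests

sf = Salesforce(username='user', password='password', security_token='token')

# Query 1: which ContentDocuments are linked to the records we care about?
links = sf.query_all(
    "SELECT ContentDocumentId FROM ContentDocumentLink "
    "WHERE LinkedEntityId IN (SELECT Id FROM Custom_Object__c WHERE Status__c = 'Done')")
valid_ids = {r['ContentDocumentId'] for r in links['records']}

# Query 2: the files themselves; VersionData comes back as a REST URL path
versions = sf.query_all(
    "SELECT ContentDocumentId, Title, FileExtension, VersionData "
    "FROM ContentVersion WHERE IsLatest = true AND FileType = 'PDF'")

for rec in versions['records']:
    if rec['ContentDocumentId'] not in valid_ids:
        continue
    # Download the binary content using the existing session
    url = 'https://{}{}'.format(sf.sf_instance, rec['VersionData'])
    resp = requests.get(url, headers={'Authorization': 'Bearer ' + sf.session_id})
    with open('{}.{}'.format(rec['Title'], rec['FileExtension']), 'wb') as f:
        f.write(resp.content)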

The actual code is at https://github.com/snorf/salesforce-files-download. Feel free to try it out and let me know whether it works for you.

One more thing: keep in mind that even if you’re an administrator with View All, you will not see ContentDocuments that don’t belong to you or aren’t explicitly shared with you. You’ll need to either change the ownership of the affected files or share them with the user running the Python script.

Ohana!

Uploading CSV data to Einstein Analytics with AWS Lambda (Python)


I have been playing around with Einstein Analytics (the thing they used to call Wave) and I wanted to automate the upload of data, since there’s no point in having dashboards and lenses if the data is stale.

After using Lambda functions against the Bulk API I wanted something similar for Einstein Analytics, and I found another nice project over at Heroku’s GitHub account called pyAnalyticsCloud.

I don’t have a Postgres database, so I ended up using only the uploader.py file and wrote this Lambda function around it:

from __future__ import print_function

import json
from base64 import b64decode
import boto3
import uuid
import os
import logging
import unicodecsv
from uploader import AnalyticsCloudUploader

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3_client = boto3.client('s3')
username = os.environ['SF_USERNAME']
encrypted_password = os.environ['SF_PASSWORD']
encrypted_security_token = os.environ['SF_SECURITYTOKEN']
password = boto3.client('kms').decrypt(CiphertextBlob=b64decode(encrypted_password))['Plaintext'].decode('ascii')
security_token = boto3.client('kms').decrypt(CiphertextBlob=b64decode(encrypted_security_token))['Plaintext'].decode('ascii')
file_bucket = os.environ['FILE_BUCKET']
wsdl_file_key = os.environ['WSDL_FILE_KEY']
metadata_file_key = os.environ['METADATA_FILE_KEY']

def bulk_upload(csv_path, wsdl_file_path, metadata_file_path):
    with open(csv_path, mode='r') as csv_file:
        logger.info('Initiating Wave Data upload.')
        logger.debug('Loading metadata')
        with open(metadata_file_path, 'r') as metadata_file:
            metadata = json.loads(metadata_file.read())

        logger.debug('Loading CSV data')
        data = unicodecsv.reader(csv_file)
        edgemart = metadata['objects'][0]['name']

        logger.debug('Creating uploader')
        uploader = AnalyticsCloudUploader(metadata, data)
        logger.debug('Logging in to Wave')
        uploader.login(wsdl_file_path, username, password, security_token)
        logger.debug('Uploading data')
        uploader.upload(edgemart)
        logger.info('Wave Data uploaded.')
        return 'OK'

def handler(event, context):
    for record in event['Records']:
        # Incoming CSV file
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        csv_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
        s3_client.download_file(bucket, key, csv_path)

        # WSDL file
        wsdl_file_path = '/tmp/{}{}'.format(uuid.uuid4(), wsdl_file_key)
        s3_client.download_file(file_bucket, wsdl_file_key, wsdl_file_path)

        # Metadata file
        metadata_file_path = '/tmp/{}{}'.format(uuid.uuid4(), metadata_file_key)
        s3_client.download_file(file_bucket, metadata_file_key, metadata_file_path)
        # Note: returns after the first record; S3 notifications normally contain a single record per event
        return bulk_upload(csv_path, wsdl_file_path, metadata_file_path)

Yes, the logging is a bit on the extensive side. Make sure to add these environment variables in AWS Lambda:

SF_USERNAME - your SF username
SF_PASSWORD - your SF password (encrypted with KMS, see the snippet below)
SF_SECURITYTOKEN - your SF security token (encrypted with KMS)
FILE_BUCKET - the bucket where the metadata and WSDL files are stored
METADATA_FILE_KEY - the path to the metadata file in that bucket (you get this from Einstein Analytics)
WSDL_FILE_KEY - the path to the Partner WSDL file in the bucket
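The password and the security token are expected to be KMS-encrypted and base64-encoded, to match the decrypt calls at the top of the function. A one-off snippet along these lines produces the values (the key alias is a placeholder):

# One-off helper to produce the encrypted environment variable values.
# 'alias/lambda-secrets' is a placeholder for whatever KMS key you use.
import boto3
from base64 import b64encode

kms = boto3.client('kms')
ciphertext = kms.encrypt(KeyId='alias/lambda-secrets',
                         Plaintext=b'my-sf-password')['CiphertextBlob']
print(b64encode(ciphertext).decode('ascii'))  # paste this into SF_PASSWORD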

I added an S3 trigger that runs this function as soon as a new file is uploaded. It has some issues (crashing when there are parentheses in the file name, for example), so please don’t use this for a production workload before making it enterprise grade.

Note: The code above only works in Python 2.7
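On the file name issue: S3 URL-encodes object keys in event notifications, so a likely fix (an untested sketch, still Python 2.7) is to decode the key before handing it to download_file:

# Possible fix (untested): decode the URL-encoded key from the S3 event record.
import urllib

def s3_key_from_record(record):
    raw_key = record['s3']['object']['key']
    return urllib.unquote_plus(raw_key.encode('utf8'))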

Cheers

Using AWS Lambda functions with the Salesforce Bulk API


One common task when integrating Salesforce with a customer’s systems is to import data, either as a one-time task or on a regular basis.

This can be done in several ways depending on the in-house technical expertise; the simplest might be to use the Import Wizard or the Data Loader. If you want to do it regularly in a batch fashion and are fortunate enough to have AWS infrastructure available, Lambda functions are an alternative.

Recently I did this as a prototype and I will share my findings here.

I will not go into details about AWS and Lambda. I used this tutorial to get started with Lambda functions, but most of it concerned AWS specifics like IAM rather than the Salesforce parts.

I found this Heroku project for using the Bulk API.

The full Python code looks like this:

from __future__ import print_function
from base64 import b64decode
import boto3
import uuid
import csv
import os
from salesforce_bulk import SalesforceBulk, CsvDictsAdapter
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3_client = boto3.client('s3')
username = os.environ['SF_USERNAME']
encrypted_password = os.environ['SF_PASSWORD']
encrypted_security_token = os.environ['SF_SECURITYTOKEN']
password = boto3.client('kms').decrypt(CiphertextBlob=b64decode(encrypted_password))['Plaintext'].decode('ascii')
security_token = boto3.client('kms').decrypt(CiphertextBlob=b64decode(encrypted_security_token))['Plaintext'].decode('ascii')
mapping_file_bucket = os.environ['MAPPING_FILE_BUCKET']
mapping_file_key = os.environ['MAPPING_FILE_KEY']

def bulk_upload(csv_path, mapping_file_path):
    with open(csv_path, mode='r') as infile:
        logger.info('Trying to login to SalesforceBulk')
        job = None
        try:
            bulk = SalesforceBulk(username=username, password=password, security_token=security_token)
            job = bulk.create_insert_job("Account", contentType='CSV')

            # Mapping file (post_mapping_file is the method I added in my clone, see below)
            with open(mapping_file_path, 'rb') as mapping_file:
                bulk.post_mapping_file(job, mapping_file.read())

            accounts = csv.DictReader(infile)
            csv_iter = CsvDictsAdapter(iter(accounts))
            batch = bulk.post_batch(job, csv_iter)
            bulk.wait_for_batch(job, batch)
            bulk.close_job(job)
            logger.info('Done. Accounts uploaded.')
        except Exception as e:
            if job:
                bulk.abort_job(job)
            raise e
        return 'OK'

def handler(event, context):
    for record in event['Records']:
        # Incoming CSV file
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
        s3_client.download_file(bucket, key, download_path)

        # Mapping file
        mapping_file_path = '/tmp/{}{}'.format(uuid.uuid4(), mapping_file_key)
        s3_client.download_file(mapping_file_bucket, mapping_file_key, mapping_file_path)

        return bulk_upload(download_path, mapping_file_path)

Make sure to add the following environment variables in Lambda before executing:

SF_USERNAME - your SF username
SF_PASSWORD - your SF password (encrypted)
SF_SECURITYTOKEN - your SF security token (encrypted)
MAPPING_FILE_BUCKET - the bucket where to find the mapping file
MAPPING_FILE_KEY - the path to the mapping file in that bucket

I also added a method (in my own clone of the project here) that lets you provide the mapping file as part of the payload. I’ll make sure to create a pull request for this later.

The nice thing about using the Bulk API is that you get monitoring directly in Salesforce: go to Setup → Bulk Data Load Jobs to see the status of your job(s).

I haven’t added the S3 trigger that listens for new files yet, but it’s the next part of the tutorial so it shouldn’t be a problem.
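In the meantime the handler can be exercised by hand with a minimal event shaped like what S3 will eventually send. Dropped at the bottom of the Lambda module (with the environment variables set), something like this works; the bucket and key names are made up:

# Fake S3 event for a manual test run; bucket and key are placeholders.
fake_event = {
    'Records': [{
        's3': {
            'bucket': {'name': 'my-import-bucket'},
            'object': {'key': 'accounts.csv'}
        }
    }]
}

if __name__ == '__main__':
    print(handler(fake_event, None))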

Cheers,
Johan

Polygon with holes in aggdraw

Drawing polygons with holes has haunted me for a while, and recently I had to do it in Python with PIL.

Googling the subject gives you a hint: put the outer contour and the inner contour (the hole) in the same path, winding the inner one in the opposite direction, and the fill rule leaves the hole open.

This code:

import os
import Image   # old-style PIL import; newer Pillow uses "from PIL import Image"
import aggdraw

draw = aggdraw.Draw('RGB', (100, 100), 'white')

path = aggdraw.Path()
# Outer contour
path.moveto(10, 10)
path.lineto(10, 60)
path.lineto(60, 60)
path.lineto(60, 10)
path.lineto(10, 10)
# Inner contour (the hole), wound in the opposite direction
path.moveto(20, 20)
path.lineto(40, 20)
path.lineto(40, 40)
path.lineto(20, 40)
path.lineto(20, 20)

brush = aggdraw.Brush("black")
draw.path((25, 25), path, brush, None)

img = Image.fromstring('RGB', (100, 100), draw.tostring())
p = os.path.join(os.path.dirname(__file__), 'box.png')
img.save(p)
This gives the following image:
Polygon with hole created with aggdraw

 

Compiling scipy in 32 bit in a 64 bit environment (el5)

During the last two days I’ve been trying to compile an old product in 32-bit mode on a 64-bit Red Hat Enterprise Linux 5 environment, which should not be that hard.

Python itself is no problem:

CC="gcc -m32" ./configure

(got information from here)

This approach works for almost every third-party package, except for SciPy.

SciPy contains a lot of Fortran code, and it wasn’t obvious how to get setup.py to understand that it should both compile and link with the -m32 flag.

After a lot of trial and error this is what I used:

F90FLAGS="-m32" F77FLAGS="-m32" \
LDFLAGS="-g -Wall -shared -m32 -fPIC" \
$PYTHON setup.py config_fc --fcompiler=gnu95 install

There are probably other flags that would work better, but these did the job for me. I think LDFLAGS is what did it, since those flags are used when g77 links the Fortran code. The same flags also worked fine for NumPy.
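A quick way to sanity-check the result (not from the original build notes, just a generic check) is to start the freshly built interpreter and import SciPy; if the Fortran extensions were linked as 64-bit, the import typically fails with a "wrong ELF class: ELFCLASS64" error:

# Generic sanity check: the interpreter should report 32-bit and SciPy should import.
import platform
print(platform.architecture())  # expect ('32bit', 'ELF') on Linux

import scipy
print(scipy.__version__)        # a 32/64-bit mix-up usually blows up on the import above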

I will try to write more often but I have had a lot to do recently.

Google App Engine ReferenceProperty and HTML5 local storage

The best thing about my job is that I work with the same things I can spend hours on in my free time. Too bad you don’t have 40 hours a week of free time.

It’s been a while, but I have finally made some progress.

I had some trouble with BigTable (the datastore that you use in Google App Engine). I put pretty large arrays of weather data in a db.BlobProperty, but when I read the entities back from the database GAE ran out of memory, even though I didn’t touch the blob. After reading up on this I found out that I had to use a db.ReferenceProperty.

As always, the manual is not that clear, so here is some example code:

class ForecastData(db.Model):
    values = db.BlobProperty()

class Forecast(db.Model):
    firstGridPoint = db.GeoPtProperty()
    lastGridPoint = db.GeoPtProperty()
    increment = db.FloatProperty()
    parameter = db.StringProperty()
    forecast_data = db.ReferenceProperty(ForecastData)
    reference_time = db.DateTimeProperty()
    forecast_time = db.DateTimeProperty()
    insert_time = db.DateTimeProperty(auto_now_add=True)

I put my blob in a separate model and referenced it with a db.ReferenceProperty(ModelName). Below is an example of putting data in the datastore:

    # Create the data object
    forecast_data = ForecastData()
    forecast_data.values = values

    # Put it in the database (put() returns the key)
    forecast_data = forecast_data.put()

    # Create the forecast object
    forecast = Forecast()
    # Reference the data (forecast_data is a key)
    forecast.forecast_data = forecast_data
    # And finally save the forecast itself
    forecast.put()

And getting the data is done like this:

query = db.GqlQuery("SELECT * FROM Forecast WHERE forecast_time = :1", forecast_time)
forecast = query.fetch(1)
if forecast:
    # Grab the key of the referenced ForecastData without dereferencing it
    forecast_data_key = Forecast.forecast_data.get_value_for_datastore(forecast[0])
    # Fetch the referenced entity explicitly and read the blob
    values = ForecastData.get(forecast_data_key).values

I get the Forecast entity from the database with a GQL query. The key of the referenced property can be fetched with the get_value_for_datastore method, which returns just the key without loading the referenced entity.
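For contrast, here is what the two access styles do (my reading of the db API, not from the original post):

# Dereferencing the property directly also works, but it silently fetches the
# whole ForecastData entity, big blob and all:
values = forecast[0].forecast_data.values

# get_value_for_datastore only returns the key, so nothing extra is loaded
# until you explicitly call get():
data_key = Forecast.forecast_data.get_value_for_datastore(forecast[0])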

After this the application is much faster.

To minimize the data transferred I’m using HTML5 local storage (a very good guide to HTML5 can be found here).

To put something in the local storage:

window.localStorage.setItem('key', value);

and to get it back (even if the browser has been closed):

window.localStorage.getItem('key');

This is a very simple key/value store. Other useful methods include clear(), which removes all saved values.

I’m hoping to launch the site for others to try out very soon but I want to get some more features in place.

Until then here is an up-to-date screenshot: