PDFToolkit API (DB Connector)

Requirement

As part of the PDFManager Module creation, a method was required that would allow the manipulation of fillable (form) PDFs.

The requirement was to:

  1. Complete a form PDF inserting data into the fields (rather than just stamping un-editable text on the form), and

  2. Add a barcode and text to an existing PDF form, and output the resulting PDF as a form (the v1 PDFManager, which used a PHP solution, always output a flat, non-form PDF, even when a form PDF was used as the input).

In 2022 this could not be achieved with an open-source PHP module, but a well-established and proven Linux CLI application (pdftk) could be used instead, and it also provided a couple of features beyond the requirement.

The main Drupal site (served by an Acquia web server) runs on Linux, but that server is not managed by the City of Boston and the pdftk libraries are not installed on it. Given the short timeline, pdftk was deployed in the same container as the DBConnector, reusing the existing endpoint services (Node.js/JavaScript/Express) plus some shell scripting.
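This page does not say which pdftk operations sit behind each endpoint. As a hedged illustration only, the sketch below builds pdftk argument lists for the operations the endpoints describe, assuming the container's shell scripts use pdftk's standard `fill_form`, `stamp`, `update_info`, `uncompress` and `generate_fdf` operations. The function name `pdftk_argv` and its operation labels are hypothetical.

```python
from typing import List


def pdftk_argv(operation: str, input_file: str = "", output_file: str = "",
               extra: str = "") -> List[str]:
    """Return a pdftk argument list for one of the operations on this page.

    Hypothetical mapping: the real shell scripts in the DBConnector container
    may use different pdftk operations or options.
    """
    if operation == "version":
        # What GET /v1/pdf/test could run to capture the CLI version.
        return ["pdftk", "--version"]
    if operation == "fill":
        # extra = FDF data file; omitting "flatten" keeps the output a form PDF.
        return ["pdftk", input_file, "fill_form", extra, "output", output_file]
    if operation == "overlay":
        # extra = overlay PDF; stamp puts its first page on top of every input page.
        return ["pdftk", input_file, "stamp", extra, "output", output_file]
    if operation == "metadata":
        # extra = InfoBegin/InfoKey/InfoValue file.
        return ["pdftk", input_file, "update_info", extra, "output", output_file]
    if operation == "decompress":
        return ["pdftk", input_file, "output", output_file, "uncompress"]
    if operation == "generate_fdf":
        # output_file receives a blank FDF listing the form's fields.
        return ["pdftk", input_file, "generate_fdf", "output", output_file]
    raise ValueError(f"unknown operation: {operation}")
```

A caller would run it as `subprocess.run(pdftk_argv("decompress", "in.pdf", "out.pdf"), check=True)`. Passing an argument list rather than a shell string avoids quoting problems with caller-supplied filenames.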

AWS Microservice

The dbconnector service was extended to provide the following endpoints:

Administration Functions

Pings the service to test that it is available.

GET /v1/pdf/heartbeat

{
    "health": "ok"
}

Runs an internal test to verify that pdftk is installed properly.

GET /v1/pdf/test

Internally calls pdftk and captures the version of the CLI.

{
    "test": "success"
}
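The two administration endpoints can be combined into a readiness check, for example before a batch of fill requests. This is a minimal sketch, assuming both endpoints return the JSON bodies shown above; the service's base URL is not given on this page, so `base_url` is a placeholder the caller supplies, and the helper names are hypothetical.

```python
import json
import urllib.request


def check_body(raw: bytes, key: str, expected: str) -> bool:
    """Return True if the JSON body is an object with body[key] == expected."""
    try:
        body = json.loads(raw)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return False
    return isinstance(body, dict) and body.get(key) == expected


def service_ready(base_url: str, timeout: float = 5.0) -> bool:
    """Call /v1/pdf/heartbeat, then /v1/pdf/test; True only if both pass."""
    checks = [
        ("/v1/pdf/heartbeat", "health", "ok"),
        ("/v1/pdf/test", "test", "success"),
    ]
    for path, key, expected in checks:
        try:
            with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=timeout) as resp:
                if not check_body(resp.read(), key, expected):
                    return False
        except OSError:  # URLError, HTTPError and timeouts all derive from OSError
            return False
    return True
```

Checking the heartbeat first separates "container unreachable" from "container up but pdftk missing", which is the distinction the two endpoints are designed to expose.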

PDF File Operations

Adds data to fields of a PDF form, and outputs a reference to the completed PDF form.

POST /v1/pdf/fill

A PDF and a data file must be provided. The PDF must be a fillable form PDF, and the data file must be in FDF format.

The /v1/pdf/generate_fdf endpoint can be used to generate a blank FDF data file.
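When the field values come from an application rather than a hand-edited template, the FDF file can also be built directly. FDF (Forms Data Format) is part of the PDF specification; a minimal file lists each field as `/T (name) /V (value)`. The sketch below is an illustration under stated assumptions, not the service's own format requirements: it assumes ASCII field names and values (non-ASCII text would need PDF text-string encoding such as UTF-16BE with a byte-order mark), and `build_fdf` is a hypothetical helper name.

```python
def _pdf_string(value: str) -> str:
    # Escape the characters that are special inside a PDF literal string.
    escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def build_fdf(fields: dict) -> bytes:
    """Build a minimal FDF document setting each field /T to value /V.

    Sketch only: encoding to ASCII raises UnicodeEncodeError for non-ASCII text.
    """
    entries = "\n".join(
        f"<< /T {_pdf_string(name)} /V {_pdf_string(str(value))} >>"
        for name, value in fields.items()
    )
    fdf = (
        "%FDF-1.2\n"
        "1 0 obj\n"
        f"<< /FDF << /Fields [\n{entries}\n] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )
    return fdf.encode("ascii")
```

The field names must match the form's own field names exactly; the blank FDF from generate_fdf (or pdftk's `dump_data_fields` operation) is a convenient way to list them. The resulting file still has to be hosted at a URL the service can reach before it is passed as datafile.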

Request Body

Name       Type    Description
formfile*  String  URL to a form PDF
datafile*  String  URL to a form data file in FDF format

{
    "output": "<file-reference>"
}
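A client call to the fill endpoint might look like the following. This page does not state how the request body is encoded, so the sketch assumes a JSON body (a common choice for an Express endpoint); `base_url` is a caller-supplied placeholder and the helper names are hypothetical.

```python
import json
import urllib.request


def build_fill_body(formfile: str, datafile: str) -> bytes:
    """JSON body for POST /v1/pdf/fill (assumes the endpoint accepts JSON)."""
    return json.dumps({"formfile": formfile, "datafile": datafile}).encode("utf-8")


def output_reference(raw: bytes) -> str:
    """Extract the file-reference from a response such as {"output": "..."}."""
    body = json.loads(raw)
    ref = body.get("output") if isinstance(body, dict) else None
    if not isinstance(ref, str) or not ref:
        raise ValueError(f"no file-reference in response: {body!r}")
    return ref


def fill_pdf(base_url: str, formfile: str, datafile: str, timeout: float = 30.0) -> str:
    """POST to /v1/pdf/fill and return the file-reference of the filled form."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/pdf/fill",
        data=build_fill_body(formfile, datafile),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return output_reference(resp.read())
```

The returned value is a reference, not the PDF itself; it is either passed on to another endpoint or retrieved with GET /v1/pdf/fetch.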

Stamps a PDF onto another PDF, and outputs a reference to the merged PDF.

POST /v1/pdf/overlay

Request Body

Name          Type    Description
basefile*     String  A PDF document: either a URL or a file-reference returned by another endpoint
overlayfile*  String  URL to a PDF document
overwrite     String  Defaults to "true"

{
    "output": "<file-reference>"
}
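Because basefile accepts a file-reference, the two requirements (fill a form, then add a barcode) can be chained without downloading the intermediate PDF. The sketch below shows that chaining under stated assumptions: the HTTP call is injected as a `post` callable that returns the decoded JSON response, so it works with any HTTP client; the function name and the example file names are hypothetical, and the exact effect of overwrite is not documented here.

```python
from typing import Callable, Dict

# A poster takes an endpoint path and a request body and returns the decoded
# JSON response. Injecting it keeps the sketch independent of any HTTP library.
Poster = Callable[[str, Dict[str, str]], Dict[str, str]]


def fill_and_overlay(post: Poster, formfile: str, datafile: str,
                     overlayfile: str, overwrite: bool = True) -> str:
    """Fill a form, stamp an overlay onto the result, return the final file-reference."""
    filled = post("/v1/pdf/fill", {"formfile": formfile, "datafile": datafile})["output"]
    # basefile may be a file-reference from another endpoint, so the fill
    # output is passed straight through instead of being downloaded first.
    stamped = post("/v1/pdf/overlay", {
        "basefile": filled,
        "overlayfile": overlayfile,
        # overwrite is documented as a String, so send "true"/"false".
        "overwrite": "true" if overwrite else "false",
    })
    return stamped["output"]
```

In a test, `post` can be a fake that records calls; in production it would wrap the real POST request to the DBConnector host.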

Updates the PDF document properties and outputs a reference to the updated PDF.

POST /v1/pdf/metadata

Request Body

Name        Type    Description
pdf_file*   String  A PDF document: either a URL or a file-reference returned by another endpoint
meta_data*  String  A file in the following format:

    InfoBegin
    InfoKey: <one of title, author, subject, creator, producer>
    InfoValue: <the value to set>
    InfoKey: ...
    InfoValue: ...
    ...

{
    "output": "<file-reference>"
}
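The meta_data file can be generated rather than written by hand. This is a minimal sketch with two labelled assumptions: it writes the keys capitalised (Title, Author, ...) because that is how pdftk's own `dump_data` output spells the standard Info keys (this page does not say whether the endpoint also accepts the lowercase spelling), and it emits an InfoBegin line before every key/value pair, again mirroring `dump_data` output. The helper name is hypothetical.

```python
ALLOWED_KEYS = ("title", "author", "subject", "creator", "producer")


def build_metadata_file(info: dict) -> str:
    """Build an InfoBegin/InfoKey/InfoValue file for POST /v1/pdf/metadata.

    Assumptions: keys are written capitalised (Title, Author, ...) and each
    key/value pair gets its own InfoBegin line, as in pdftk dump_data output.
    """
    lines = []
    for key, value in info.items():
        if key.lower() not in ALLOWED_KEYS:
            raise ValueError(f"unsupported metadata key: {key}")
        if "\n" in value or "\r" in value:
            # Each InfoValue occupies a single line of the file.
            raise ValueError("metadata values must be a single line")
        lines += ["InfoBegin", f"InfoKey: {key.lower().capitalize()}", f"InfoValue: {value}"]
    return "\n".join(lines) + "\n"
```

Rejecting unknown keys on the client side surfaces typos before the request is sent, rather than leaving them to be silently written into the PDF's Info dictionary.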

Removes compression from a PDF, and returns the decompressed file as an attachment.

GET /v1/pdf/decompress

This is a useful utility when the PDFManager cannot manipulate a PDF because its compression is later than PDF 1.5.

The endpoint first checks to see if it already has a file with the filename specified in the pdf_file query parameter. If it does, then it just returns that file. NOTE: restarting the dbconnector task(s) on AWS will empty this cache.

If the del parameter is "true", the file is deleted after it has been decompressed and downloaded. To reduce load on the endpoint, set del to "false" if the pdf_file does not change often and you expect to call the endpoint frequently.

Query Parameters

Name       Type    Description
pdf_file*  String  URL to a PDF document
del        String  Whether the file is deleted after it is downloaded. Defaults to "true".

Returns the decompressed document as an attachment.

The expected headers are:

Content-Disposition:attachment;filename=<pdf_file>
Content-Type:application/pdf
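Because pdf_file is itself a URL, it must be percent-encoded when placed in the query string. A minimal sketch of building the request URL, assuming a caller-supplied `base_url` (the host is not given on this page); the helper name is hypothetical.

```python
from urllib.parse import urlencode


def decompress_url(base_url: str, pdf_file: str, delete: bool = True) -> str:
    """Build the GET URL for /v1/pdf/decompress.

    del is sent as the string "true"/"false"; use delete=False for a PDF that
    rarely changes so the service can keep returning its cached copy.
    """
    query = urlencode({"pdf_file": pdf_file, "del": "true" if delete else "false"})
    return f"{base_url.rstrip('/')}/v1/pdf/decompress?{query}"
```

`urlencode` escapes the slashes, colons and spaces in the nested URL, so the service receives pdf_file intact rather than a truncated or split parameter.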

PDF Retrieval

Returns the requested PDF document from its reference.

GET /v1/pdf/fetch

Query Parameters

Name   Type    Description
file*  String  A file-reference returned by one of the endpoints
del    String  Whether to delete the file after downloading. Defaults to "false".
show   String  Download method: D (default) downloads the file as an attachment; I downloads it and displays it in the browser (if supported)

Returns the requested document. The show parameter controls whether it is sent as an attachment.

When show=D, the expected headers are:

Content-Disposition:attachment;filename=<pdf_file>
Content-Type:application/pdf

When show=I, the expected headers are:

Content-Disposition:filename=<pdf_file>
Content-Type:application/pdf
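A client retrieving a result typically needs two things: the fetch URL, and a way to tell from the response whether it is an attachment. The sketch below builds the URL and parses the Content-Disposition header in the two shapes listed above. It assumes a caller-supplied `base_url`; the file-reference format is not specified on this page, so `abc123.pdf` in the usage is only a placeholder, and all helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def fetch_url(base_url: str, file_ref: str, show: str = "D", delete: bool = False) -> str:
    """Build the GET URL for /v1/pdf/fetch (show: "D" attachment, "I" inline)."""
    if show not in ("D", "I"):
        raise ValueError('show must be "D" or "I"')
    query = urlencode({"file": file_ref, "del": "true" if delete else "false", "show": show})
    return f"{base_url.rstrip('/')}/v1/pdf/fetch?{query}"


def is_attachment(content_disposition: str) -> bool:
    """True for "attachment;filename=..." (show=D), False for "filename=..." (show=I)."""
    first = content_disposition.split(";", 1)[0].strip().lower()
    return first == "attachment"


def disposition_filename(content_disposition: str) -> Optional[str]:
    """Return the filename parameter from either header shape, if present."""
    for part in content_disposition.split(";"):
        part = part.strip()
        if part.lower().startswith("filename="):
            return part[len("filename="):].strip('"')
    return None
```

For example, `fetch_url("https://svc.example", "abc123.pdf", show="I")` produces a URL suitable for opening the PDF in a browser tab, while the default show=D suits server-side downloads.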
