Spring From the Trenches: Creating PDF Documents With Wkhtmltopdf

When we are writing a web application, we often face a requirement which states that our application must provide reporting to its users.

Typically the users of our application want to see these reports on the user interface and have the possibility to export them as Excel and/or PDF documents.

The problem is that creating PDF documents is not exactly a walk in the park. There are libraries that can create PDF documents from HTML markup, but I have never been completely happy with them. However, I had to tolerate their shortcomings because I didn’t have a choice. Then I heard about a command line tool called wkhtmltopdf and never looked back.

This blog post describes how we can create a microservice that transforms HTML documents into PDF documents by using Java 8, Spring Boot, and Wkhtmltopdf.

Before we implement our microservice, we will take a quick look at the PDF creation process. It has three steps:

  1. A client sends an HTTP request to our microservice, and specifies the URL of the HTML document and the file name of the created PDF file.
  2. Our microservice invokes the wkhtmltopdf command line tool which reads the HTML document and transforms it into a PDF document.
  3. Our microservice reads the created PDF document and writes it to the body of the HTTP response.

Let’s start by installing wkhtmltopdf.

This blog post is based on the ideas of Kyösti Herrala. He described this technique to me, and I have been using it ever since.

Installing Wkhtmltopdf

The first thing that we have to do is to install the wkhtmltopdf command line tool. We can simply download the installation packages from its website and install them.

If you are using OS X, I recommend that you install wkhtmltopdf by using Homebrew. You can do this by running the following command at the command prompt:

brew install Caskroom/cask/wkhtmltopdf
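
To confirm that the installation worked, we can ask the tool for its version. The following is only a minimal sketch and is not part of the example application: the WkhtmltopdfCheck class and its isInstalled() method are hypothetical names, and the sketch assumes that the binary is on the PATH of the user who runs the JVM.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

final class WkhtmltopdfCheck {

    private WkhtmltopdfCheck() {}

    /**
     * Returns true if running "<binary> --version" exits with status 0
     * within five seconds. Returns false if the binary cannot be started.
     */
    static boolean isInstalled(String binary) {
        // inheritIO() sends the output of the child process to our own console,
        // so the child can never block on a pipe that nobody reads.
        ProcessBuilder pb = new ProcessBuilder(binary, "--version").inheritIO();
        Process process = null;
        try {
            process = pb.start();
            return process.waitFor(5, TimeUnit.SECONDS) && process.exitValue() == 0;
        }
        catch (IOException ex) {
            // The binary was not found or could not be executed.
            return false;
        }
        catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        }
        finally {
            if (process != null) {
                process.destroy();
            }
        }
    }
}
```

For example, WkhtmltopdfCheck.isInstalled("wkhtmltopdf") could be called when the application starts, so that a missing installation is noticed before the first PDF is requested.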

After we have installed the wkhtmltopdf command line tool, we can implement our microservice. Let’s start by implementing the component that transforms HTML documents into PDF documents.

Creating PDF Documents From HTML Documents

Before we can implement the component that transforms HTML documents into PDF documents and writes the created PDF documents to the body of the HTTP response, we have to create a class that is used to pass the required configuration parameters to that component.

We can do that by creating a PdfFileRequest class that has two fields:

  • The fileName field contains the file name of the created PDF document.
  • The sourceHtmlUrl field contains the URL address of the converted HTML document.

The source code of the PdfFileRequest class looks as follows:

public class PdfFileRequest {

    private String fileName;
    private String sourceHtmlUrl;

    PdfFileRequest() {}

    public String getFileName() {
        return fileName;
    }

    public String getSourceHtmlUrl() {
        return sourceHtmlUrl;
    }

    public void setFileName(String fileName) {
        this.fileName = fileName;
    }

    public void setSourceHtmlUrl(String sourceHtmlUrl) {
        this.sourceHtmlUrl = sourceHtmlUrl;
    }
}

We can now create the component that creates PDF documents by following these steps:

  1. Create a PdfFileCreator class and annotate the created class with the @Service annotation.
  2. Add a static final Logger field to the created class. We will use this logger to write an error message to the log if the PDF document cannot be created.
  3. Add a writePdfToResponse() method to the created class. This method takes two method parameters:
    • The PdfFileRequest object that contains the configuration of the PDF creation process.
    • The HttpServletResponse object in which the created PDF document is written.
  4. Implement the writePdfToResponse() method by following these steps:
    1. Ensure that the file name of the created PDF document and the url of the HTML document are valid.
    2. Create the command that is used to invoke the wkhtmltopdf command line tool. This command has three parts:
      1. The name of the invoked program (wkhtmltopdf)
      2. The url of the HTML document.
      3. The output file. The string ‘-’ tells wkhtmltopdf that it must write the created PDF file to STDOUT.
    3. Start the wkhtmltopdf process.
    4. Read the created PDF document from STDOUT and write it to the body of the HTTP response.
    5. Wait for the wkhtmltopdf process to exit before continuing the current thread.
    6. Ensure that the PDF file was created successfully.
    7. Add the required metadata (content type and the file name of the created PDF file) to the HTTP response.
    8. If the PDF document could not be created, read the error message from STDERR and write it to the log.
    9. Destroy the wkhtmltopdf process.
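
The first implementation step only says that the file name and the url must be valid. If the url comes from an untrusted client, a stricter check than "not null and not empty" might be worth considering, because the value is passed to a command line tool. The following sketch shows one possible check. The SourceUrlValidator class and the requireHttpUrl() method are hypothetical names, and the rule itself (accept only absolute http and https urls) is an assumption, not something the example application does.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class SourceUrlValidator {

    private SourceUrlValidator() {}

    /**
     * Throws IllegalArgumentException unless the given value is an
     * absolute http or https url that has a host.
     */
    static void requireHttpUrl(String sourceHtmlUrl) {
        if (sourceHtmlUrl == null || sourceHtmlUrl.isEmpty()) {
            throw new IllegalArgumentException("Source HTML url must be set");
        }

        URI uri;
        try {
            uri = new URI(sourceHtmlUrl);
        }
        catch (URISyntaxException ex) {
            throw new IllegalArgumentException("Source HTML url is not a valid URI", ex);
        }

        String scheme = uri.getScheme();
        boolean isHttp = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        // Rejects values such as "file:///etc/passwd" or "--some-option".
        if (!isHttp || uri.getHost() == null) {
            throw new IllegalArgumentException("Source HTML url must be an absolute http or https url");
        }
    }
}
```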

The source code of the PdfFileCreator class looks as follows:

import org.apache.commons.io.IOUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

import javax.servlet.http.HttpServletResponse;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

@Service
class PdfFileCreator {

    private static final Logger LOGGER = LoggerFactory.getLogger(PdfFileCreator.class);

    void writePdfToResponse(PdfFileRequest fileRequest, HttpServletResponse response) {
        String pdfFileName = fileRequest.getFileName();
        requireNotNull(pdfFileName, "The file name of the created PDF must be set");
        requireNotEmpty(pdfFileName, "File name of the created PDF cannot be empty");

        String sourceHtmlUrl = fileRequest.getSourceHtmlUrl();
        requireNotNull(sourceHtmlUrl, "Source HTML url must be set");
        requireNotEmpty(sourceHtmlUrl, "Source HTML url cannot be empty");

        List<String> pdfCommand = Arrays.asList(
                "wkhtmltopdf",
                sourceHtmlUrl,
                "-"
        );

        ProcessBuilder pb = new ProcessBuilder(pdfCommand);
        Process pdfProcess;

        try {
            pdfProcess = pb.start();

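            try(InputStream in = pdfProcess.getInputStream()) {
                // Set the headers before the body is written: flushing the output stream
                // commits the response, and header changes made after that are ignored.
                setResponseHeaders(response, fileRequest);
                writeCreatedPdfFileToResponse(in, response);
                waitForProcessBeforeContinueCurrentThread(pdfProcess);
                requireSuccessfulExitStatus(pdfProcess);
            }
            catch (Exception ex) {
                writeErrorMessageToLog(ex, pdfProcess);
                // Keep the original exception as the cause so that the stack trace is not lost.
                throw new RuntimeException("PDF generation failed", ex);
            }
            finally {
                pdfProcess.destroy();
            }
        }
        catch (IOException ex) {
            throw new RuntimeException("PDF generation failed", ex);
        }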
    }

    private void requireNotNull(String value, String message) {
        if (value == null) {
            throw new IllegalArgumentException(message);
        }
    }

    private void requireNotEmpty(String value, String message) {
        if (value.isEmpty()) {
            throw new IllegalArgumentException(message);
        }
    }

    private void writeCreatedPdfFileToResponse(InputStream in, HttpServletResponse response) throws IOException {
        OutputStream out = response.getOutputStream();
        IOUtils.copy(in, out);
        out.flush();
    }

    private void waitForProcessBeforeContinueCurrentThread(Process process) {
        try {
            process.waitFor(5, TimeUnit.SECONDS);
        }
        catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    private void requireSuccessfulExitStatus(Process process) {
        if (process.exitValue() != 0) {
            throw new RuntimeException("PDF generation failed");
        }
    }

    private void setResponseHeaders(HttpServletResponse response, PdfFileRequest fileRequest) {
        response.setContentType("application/pdf");
        response.setHeader("Content-Disposition", "attachment; filename=\"" + fileRequest.getFileName() + "\"");
    }

    private void writeErrorMessageToLog(Exception ex, Process pdfProcess) throws IOException {
        LOGGER.error("Could not create PDF because an exception was thrown: ", ex);
        LOGGER.error("The exit value of PDF process is: {}", pdfProcess.exitValue());

        String errorMessage = getErrorMessageFromProcess(pdfProcess);
        LOGGER.error("PDF process ended with error message: {}", errorMessage);
    }

    private String getErrorMessageFromProcess(Process pdfProcess) {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(pdfProcess.getErrorStream()));
            StringWriter writer = new StringWriter();

            String line;
            while ((line = reader.readLine()) != null) {
                writer.append(line);
            }

            return writer.toString();
        }
        catch (IOException ex) {
            LOGGER.error("Could not extract error message from process because an exception was thrown", ex);
            return "";
        }
    }
}

If you are writing a real web application, you must not allow anonymous users to access the HTML reports. Instead, you should configure the user that is used by wkhtmltopdf when it creates the PDF document. You can do this by passing one of the following options to the wkhtmltopdf process: cookie, custom-header, and custom-header-propagation.
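
As a rough illustration of the custom-header option, the command could be built as follows. This is a sketch under assumptions: the AuthenticatedPdfCommand class and its build() method are hypothetical names, and whether the report application expects an Authorization header, a cookie, or something else depends entirely on how that application authenticates its users.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class AuthenticatedPdfCommand {

    private AuthenticatedPdfCommand() {}

    /**
     * Builds a wkhtmltopdf command that sends the given HTTP header
     * when it requests the HTML document.
     */
    static List<String> build(String sourceHtmlUrl, String headerName, String headerValue) {
        List<String> command = new ArrayList<>();
        command.add("wkhtmltopdf");
        // Adds the header to the request that loads the HTML document.
        command.add("--custom-header");
        command.add(headerName);
        command.add(headerValue);
        // Also adds the header to the requests that load resources such as images and style sheets.
        command.add("--custom-header-propagation");
        command.add(sourceHtmlUrl);
        // "-" tells wkhtmltopdf to write the created PDF document to STDOUT.
        command.add("-");
        return Collections.unmodifiableList(command);
    }
}
```

The returned list can be passed to the ProcessBuilder constructor in the same way as the pdfCommand list of the PdfFileCreator class. Because every argument is a separate list element, the header value is not interpreted by a shell.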

Our next step is to create the controller that provides the public REST API of our microservice.

Implementing the REST API

We can create the REST API of our microservice by following these steps:

  1. Create a PdfController class and annotate the created class with the @RestController annotation.
  2. Add a private PdfFileCreator field to the created class and inject its value by using constructor injection.
  3. Add a createPdf() method to the controller class. This method has two method parameters:
    1. The PdfFileRequest object is read from the request body and it configures the PDF creation process.
    2. The HttpServletResponse object is the HTTP response in which the created PDF document is written.
  4. Configure the createPdf() method to handle POST requests which are sent to the url: ‘/api/pdf’.
  5. Implement the createPdf() method by invoking the writePdfToResponse() method of the PdfFileCreator class.

The source code of the PdfController class looks as follows:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

import javax.servlet.http.HttpServletResponse;

@RestController
class PdfController {

    private final PdfFileCreator pdfFileCreator;

    @Autowired
    PdfController(PdfFileCreator pdfFileCreator) {
        this.pdfFileCreator = pdfFileCreator;
    }

    @RequestMapping(value = "/api/pdf", method = RequestMethod.POST)
    void createPdf(@RequestBody PdfFileRequest fileRequest, HttpServletResponse response) {
        pdfFileCreator.writePdfToResponse(fileRequest, response);
    }
}

We have now implemented our microservice that transforms HTML documents into PDF documents by using the wkhtmltopdf command line tool. Let’s find out how we can use our new microservice.

Using Our Microservice

We can use our microservice by following these steps:

  1. Send a POST request to the url: ‘/api/pdf’.
  2. Configure the PDF creation process by using JSON that is sent in the body of the request.

For example, if we want to transform the front page of google.com into a PDF document, we have to send a POST request to the url: ‘/api/pdf’ and write the following JSON document to the request body:

{
	"fileName": "google.pdf",
	"sourceHtmlUrl": "http://www.google.com"
}
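
If we want to try the API without writing a Spring controller, a plain Java client is enough. The following is a minimal sketch that uses only the JDK. The PdfClientSketch class and its method names are hypothetical, and the sketch assumes that the microservice is running at http://localhost:8080. The Content-Type header is set to application/json because the request body is read with @RequestBody.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class PdfClientSketch {

    private PdfClientSketch() {}

    /** Builds the JSON request body. Only backslashes and double quotes are escaped. */
    static String toJson(String fileName, String sourceHtmlUrl) {
        return "{\"fileName\":\"" + escape(fileName) + "\","
                + "\"sourceHtmlUrl\":\"" + escape(sourceHtmlUrl) + "\"}";
    }

    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Sends the POST request and stores the response body in the target file. */
    static void download(String serviceUrl, String fileName, String sourceHtmlUrl, Path target) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(serviceUrl).openConnection();
        try {
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);

            try (OutputStream out = connection.getOutputStream()) {
                out.write(toJson(fileName, sourceHtmlUrl).getBytes(StandardCharsets.UTF_8));
            }

            if (connection.getResponseCode() != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected status code: " + connection.getResponseCode());
            }

            try (InputStream in = connection.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        finally {
            connection.disconnect();
        }
    }
}
```

With the microservice running, a call such as PdfClientSketch.download("http://localhost:8080/api/pdf", "google.pdf", "http://www.google.com", Paths.get("google.pdf")) should store the created PDF document in the working directory.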

Let’s implement a simple Spring MVC controller that transforms the front page of google.com into a PDF document by using our microservice. We can do this by following these steps:

  1. Create a GooglePdfController class and annotate it with the @Controller annotation.
  2. Add a final RestTemplate field to the created class and inject its value by using constructor injection.
  3. Add a createPdfFromGoogle() method to the created class and configure it to handle GET requests sent to the url: ‘/pdf/google’. This method takes an HttpServletResponse object as a method parameter.
  4. Implement the createPdfFromGoogle() method by following these steps:
    1. Create a new PdfFileRequest object, and set the name of the created PDF file (google.pdf) and the url of the HTML document (http://www.google.com).
    2. Send a POST request to the url: ‘http://localhost:8080/api/pdf’ by invoking the postForObject() method of the RestTemplate class. Pass the following method parameters to this method:
      1. The URL (http://localhost:8080/api/pdf).
      2. The object that is written to the request body (The created PdfFileRequest object).
      3. The type of the return value (byte[].class).
    3. Write the received byte array, which contains the created PDF document, to the body of the HTTP response.
    4. Set the content type of the response to: ‘application/pdf’.
    5. Set the file name of the created PDF document to the HTTP response by using the Content-Disposition header.

The source code of the GooglePdfController class looks as follows:

import org.apache.commons.io.IOUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.client.RestTemplate;

import javax.servlet.http.HttpServletResponse;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

@Controller
class GooglePdfController {

    private final RestTemplate restTemplate;

    @Autowired
    GooglePdfController(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @RequestMapping(value = "/pdf/google", method = RequestMethod.GET)
    void createPdfFromGoogle(HttpServletResponse response) {
        PdfFileRequest fileRequest = new PdfFileRequest();
        fileRequest.setFileName("google.pdf");
        fileRequest.setSourceHtmlUrl("http://www.google.com");

        byte[] pdfFile = restTemplate.postForObject("http://localhost:8080/api/pdf",
                fileRequest,
                byte[].class
        );
        writePdfFileToResponse(pdfFile, "google.pdf", response);
    }

    private void writePdfFileToResponse(byte[] pdfFile,
                                        String fileName,
                                        HttpServletResponse response) {
        try (InputStream in = new ByteArrayInputStream(pdfFile)) {
            // Set the headers before the body is written: flushing the output stream
            // commits the response, and header changes made after that are ignored.
            response.setContentType("application/pdf");
            response.setHeader("Content-Disposition", "attachment; filename=\"" + fileName + "\"");

            OutputStream out = response.getOutputStream();
            IOUtils.copy(in, out);
            out.flush();
        }
        catch (IOException ex) {
            throw new RuntimeException("Error occurred when creating PDF file", ex);
        }
    }
}

We can now send a GET request to the url: ‘/pdf/google’ and we will receive the front page of google.com as a PDF document.

It looks pretty good, but if we use this technique in a real web application, we have to take a few things into account. These things are:

  • Wkhtmltopdf is not very fault tolerant. For example, if it cannot find an image (or other resource such as .js or .css file), it doesn’t create the PDF file. It simply fails and writes the error message to STDERR.
  • The error messages of Wkhtmltopdf can be quite long and a bit messy. In other words, it is not always “easy” to figure out what is wrong.
  • Even though Wkhtmltopdf is very good at transforming HTML documents into PDF documents, you might have to create separate reporting views that are used only for this purpose. Also, sometimes you have to render these reporting views on the server.
  • The performance of this solution depends on wkhtmltopdf. We can make it faster by following these rules:

Some of these drawbacks are quite irritating, but I still think that using Wkhtmltopdf is a good idea. Why? Well, it is the least bad option, and it has a lot of configuration parameters that the other options don’t have.
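
Because wkhtmltopdf reports its problems through STDERR and its exit status, it can help to capture both in a way that cannot block the process. The following sketch shows one way to do this with the JDK alone. It is not the approach used by the PdfFileCreator class: the ExternalCommandSketch class and its Result type are hypothetical, and the sketch trades streaming for simplicity by redirecting STDOUT and STDERR to temporary files.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class ExternalCommandSketch {

    /** The outcome of one run. A hypothetical helper type for this sketch. */
    static final class Result {
        final int exitValue;
        final byte[] stdout;
        final String stderr;

        Result(int exitValue, byte[] stdout, String stderr) {
            this.exitValue = exitValue;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    private ExternalCommandSketch() {}

    static Result run(List<String> command, long timeoutSeconds) throws IOException, InterruptedException {
        // Both streams are written to temporary files, so the child process
        // can never block because nobody is reading a full pipe.
        File out = File.createTempFile("command-out", ".bin");
        File err = File.createTempFile("command-err", ".log");
        Process process = null;
        try {
            process = new ProcessBuilder(command)
                    .redirectOutput(out)
                    .redirectError(err)
                    .start();

            // waitFor() returns false if the timeout elapsed before the process exited.
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                throw new IOException("Command did not finish in " + timeoutSeconds + " seconds");
            }

            return new Result(
                    process.exitValue(),
                    Files.readAllBytes(out.toPath()),
                    new String(Files.readAllBytes(err.toPath()), StandardCharsets.UTF_8)
            );
        }
        finally {
            // Has no effect if the process has already exited.
            if (process != null) {
                process.destroyForcibly();
            }
            out.delete();
            err.delete();
        }
    }
}
```

A caller could run Arrays.asList("wkhtmltopdf", sourceHtmlUrl, "-") with a timeout of its choice, write the stdout bytes to the HTTP response only when exitValue is 0, and log the stderr text otherwise. The cost is that the whole PDF document is held in memory before anything is sent to the client.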

Let’s summarize what we learned from this blog post.

Summary

This blog post has taught us four things:

  • We can invoke the wkhtmltopdf command line tool by using Java 8 and configure it to write the created PDF document to STDOUT.
  • We learned how we can read the created PDF document from STDOUT and write it to the HTTP response.
  • We learned how we can create a microservice that allows us to customize the PDF creation process.
  • Wkhtmltopdf is not a perfect tool. It has a few drawbacks, but it is still the least bad option.

P.S. You can get the example application of this blog post from GitHub.

About the Author

Petri Kainulainen is passionate about software development and continuous improvement. He specializes in software development with the Spring Framework and is the author of the Spring Data book.

14 comments
  • Is there any better solution than sleeping for 5 seconds?

    • The code in question doesn’t necessarily sleep for five seconds until it continues the current thread. The Javadoc of the waitFor() method describes that method as follows:

      Causes the current thread to wait, if necessary, until the subprocess represented by this Process object has terminated, or the specified waiting time elapses.

      If the subprocess has already terminated then this method returns immediately with the value true. If the process has not terminated and the timeout value is less than, or equal to, zero, then this method returns immediately with the value false.

      I used this method in one software project that has been in production for about six months now, and there hasn’t been a single complaint about the performance of the PDF export function (we use a 30 second timeout in that project).

      That being said, it would be interesting to hear if there is a better way to start external processes with Java 8.

  • Can you elaborate on performance of this solution?

    • Sure. I profiled the example application before I published this blog post, and I noticed that most of the time is spent writing the created PDF document to the body of the HTTP response. In other words, the speed of this solution depends on wkhtmltopdf.

      There are three things that you can do to make it faster:

      You might want to also check this StackOverflow question.

      As you can see, this solution will cause performance problems if you use the things I mentioned above. That is why I suggested that you might want to create separate reporting views for this purpose.

  • Hello,
    I was thinking about trying your solution. For the past 2 days I’ve been trying to implement Flying Saucer + Thymeleaf as a template for PDFs with very little success.

    Massive problems emerge when you have images and external libraries. Basically i need to create invoices using content from db and integrating it with thymeleaf template. I would have to add images to the invoice. Is it possible to do something like that with Wkhtmltopdf? It is a spring-boot application.

    • Hi Branko,

      Wkhtmltopdf has a relatively good support for images, and I have used it to create PDF documents from HTML documents which had 0-20 images per document. I think that it is possible to fulfill your requirements by using the approach described in this blog post.

      That being said, you could create a fast prototype by cloning the example application of this blog post and creating a “dummy” invoice which has images in it. This way you could test Wkhtmltopdf without investing a lot of your time (in case you find out that it isn’t the right tool for the job).

      • Hey Petri,

        Thanks a lot for your answer. I am gonna give it a try and let you know if I manage to make it work.

        • Hi Branko,

          You are welcome. If you run into any problems, let me know and I will try to help you.

  • Hi Petri,
    Awesome m8! works like a charm out of the box. Xvfb is a bit difficult to manage properly but once it’s setup properly life is sweet ;)

    • Thank you for your kind words. I really appreciate them. By the way, I agree that the initial configuration is a bit tricky, but the reward is worth it :)

  • Man, I tried using this tutorial, but I just got a ton of messages like this:
    [several lines of unreadable binary characters]

    What happened with this? Does it not work anymore?

    Thanks man, you are awesome

    • Hi,

      As far as I know, this approach should still be working. Could you provide a bit more details about your problem? For example:

      • Where do you see these messages?
      • Do you see any error messages in the log file?
      • Does this happen on every web page (like google.com) or can you identify the web page that causes this? If you can identify the web page, can you share its url?

      I remember one situation where the generated PDF file contained similar garbage. The reason for this was that an error occurred when Wkhtmltopdf was transforming the HTML document into a PDF file. If I remember correctly, the problem was that one CSS file could not be loaded.
