ryanrosello-og

Posted on Jul 10, 2022

Validate complicated graphics rich pdf documents using Playwright

#playwright #testautomation #typescript #automation

This article will take your pdf verification skills to the next level using Playwright.

The PDF document used in this example is the "Tesla Powerwall 2 Datasheet". The file is hosted at the following location: https://oedtrngbj.wpengine.com/wp-content/uploads/Powerwall_2_AC_Datasheet_EN_NA.pdf

Observe that this pdf document contains multiple pages, some illustrations and a table.

Caveats

Let me be upfront about the following issues I have encountered with this approach:

This solution only seems to work when the test is run using Chromium in headed mode.
The elements contained within the PDF viewer component are not accessible to Playwright. This means you will not be able to mask or hide dynamic elements in the PDF, for example a customer id or dynamic date/time stamps.
We are using Playwright's built-in visual comparison library. It is advisable to get familiar with the maintenance required to keep the baseline images up to date. See the Visual comparisons page in the Playwright documentation.
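Given the first caveat, it may help to pin these tests to a headed Chromium project. The following is a minimal sketch of a playwright.config.ts, assuming a recent Playwright version that exports defineConfig; the project name chromium-headed is made up for illustration.

```typescript
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      // Hypothetical project name; pick whatever suits your repo.
      name: 'chromium-headed',
      use: {
        ...devices['Desktop Chrome'],
        // Headed mode, because the approach only seemed to work that way.
        headless: false,
      },
    },
  ],
});
```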

If you are happy with these compromises, read on!

page.setContent() + toMatchSnapshot() = 🤩

Using page.setContent(), load the PDF into an iframe like so:

  const pdfResource =
    'https://oedtrngbj.wpengine.com/wp-content/uploads/Powerwall_2_AC_Datasheet_EN_NA.pdf';
  let iframe = `<iframe src="${pdfResource}#zoom=60" style="width: 100%;height:100%;border: none;"></iframe>`;
  await page.setContent(iframe);
  await page.waitForTimeout(5000);

NOTE: You may need to experiment with the zoom level, width, height attributes to suit your needs
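To make that experimentation easier, the iframe markup could be produced by a small helper. This is only a sketch: buildPdfIframe and PdfViewOptions are hypothetical names, not part of Playwright, and the defaults simply mirror the snippet above.

```typescript
// Hypothetical helper (not part of Playwright): builds the iframe markup
// that is passed to page.setContent().
interface PdfViewOptions {
  zoom?: number;   // viewer zoom in percent, e.g. 60
  page?: number;   // 1-based page number to open
  width?: string;  // CSS width of the iframe
  height?: string; // CSS height of the iframe
}

function buildPdfIframe(pdfUrl: string, options: PdfViewOptions = {}): string {
  const { zoom = 60, page, width = '100%', height = '100%' } = options;

  // Viewer settings travel in the URL fragment, joined with "&".
  const params = [`zoom=${zoom}`];
  if (page !== undefined) {
    params.push(`page=${page}`);
  }

  return `<iframe src="${pdfUrl}#${params.join('&')}" style="width: ${width};height:${height};border: none;"></iframe>`;
}
```

With this in place, the call becomes something like await page.setContent(buildPdfIframe(pdfResource, { zoom: 80 })), and only the options change while you tune the view.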

ANOTHER NOTE: The waitForTimeout function is used here to wait for the pdf contents to be loaded into the iframe

We will make use of Playwright's snapshot assertion, expect(screenshot).toMatchSnapshot(name[, options]) (https://playwright.dev/docs/test-assertions#screenshot-assertions-to-match-snapshot-1), to compare a screenshot of the element matching a locator against a stored baseline. In our case, we need to take a screenshot of the iframe above once the PDF file has fully loaded at a particular page.

Our solution will make use of this function:

  expect(await page.locator('iframe').screenshot()).toMatchSnapshot();

The completed test will look like this ...

import { test, expect } from '@playwright/test';

test('validate a complex pdf', async ({ page }) => {
  const pdfResource =
    'https://oedtrngbj.wpengine.com/wp-content/uploads/Powerwall_2_AC_Datasheet_EN_NA.pdf';
  let iframe = `<iframe src="${pdfResource}#zoom=60" style="width: 100%;height:100%;border: none;"></iframe>`;
  await page.setContent(iframe);
  await page.waitForTimeout(5000);
  expect(await page.locator('iframe').screenshot()).toMatchSnapshot();
});

Run the test. The first run should fail with an error reporting that no snapshot exists yet.

Playwright did not find a golden snapshot for the element, so on the very first test execution it automatically generates this file for you. You will need to commit these files to your repo.

Rerun the test. This time it should pass, since it now has a baseline image to compare against.

Ok, that's nice. We managed to validate the first page of the pdf.

But most PDFs you will encounter out in the wild will contain multiple pages. Let us amend our test to cater for multiple pages.

test('validate a complex pdf II, all pages', async ({ page }) => {
  const pdfResource =
    'https://oedtrngbj.wpengine.com/wp-content/uploads/Powerwall_2_AC_Datasheet_EN_NA.pdf';
  // Predefined number of pages expected in the document
  const numberOfPages = 2;
  for (let i = 1; i <= numberOfPages; i += 1) {
    // The page=N fragment opens the viewer at page i
    const iframe = `<iframe src="${pdfResource}#zoom=60&page=${i}" style="width: 100%;height:100%;border: none;"></iframe>`;
    await page.setContent(iframe);
    await page.waitForTimeout(5000);
    // One named baseline image per page
    expect(await page.locator('iframe').screenshot()).toMatchSnapshot({
      name: `pdf_validation_page_${i}.png`,
    });
  }
});

Job done.

Other improvements for you to consider

(all totally optional and entirely up to you to implement)

Dynamically determine the number of pages. The example above uses a predefined value for the expected number of pages.
Replace the hard-coded waitForTimeout with a better way of waiting for the contents to be loaded.
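As a starting point, both ideas could be sketched roughly as follows. countPdfPages and waitForStableCapture are hypothetical helpers, not Playwright APIs. The page count is a heuristic that assumes page objects are stored uncompressed in the file; a proper PDF parsing library is likely more robust. The stability check can return too early if the viewer is still blank on two consecutive captures, so treat it as an experiment rather than a guaranteed fix.

```typescript
// Hypothetical helpers; neither is part of Playwright.

// Rough page count: counts "/Type /Page" objects in the raw PDF bytes.
// Assumption: page objects are stored uncompressed. PDFs that pack them
// into compressed object streams yield 0, so treat 0 as "unknown".
function countPdfPages(pdfBytes: Uint8Array): number {
  // Map each byte to one character so binary data cannot break decoding.
  let text = '';
  for (let i = 0; i < pdfBytes.length; i += 1) {
    text += String.fromCharCode(pdfBytes[i]);
  }
  // The negative lookahead skips the "/Type /Pages" tree nodes.
  const matches = text.match(/\/Type\s*\/Page(?![A-Za-z])/g);
  return matches ? matches.length : 0;
}

// Byte-wise comparison of two captures.
function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i += 1) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Polls capture() until two consecutive results are byte-identical,
// meaning the rendering has stopped changing, or until the timeout elapses.
async function waitForStableCapture(
  capture: () => Promise<Uint8Array>,
  intervalMs = 500,
  timeoutMs = 15000,
): Promise<Uint8Array> {
  const deadline = Date.now() + timeoutMs;
  let previous = await capture();
  while (Date.now() < deadline) {
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    const current = await capture();
    if (sameBytes(previous, current)) return current;
    previous = current;
  }
  throw new Error(`Capture did not stabilise within ${timeoutMs} ms`);
}

// Possible usage inside a test:
//   const response = await page.request.get(pdfResource);
//   const numberOfPages = countPdfPages(await response.body());
//   ...
//   const shot = await waitForStableCapture(() => page.locator('iframe').screenshot());
//   expect(shot).toMatchSnapshot({ name: `pdf_validation_page_${i}.png` });
```

Passing the capture step in as a function keeps the polling logic independent of Playwright, so it can be tried out with fake captures before wiring it into a real test.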
