Best Practices for Rendering Structured Content into PDFs with @datocms/structured-text-to-html-string

maj.koren · July 31, 2025, 3:46pm

We’re using DatoCMS to manage rich course content (including StructuredText, images, and custom blocks), and we’re working on generating printable PDFs from that content in our frontend app.

Here’s what we’re doing:

We render StructuredText fields into static HTML using @datocms/structured-text-to-html-string.
That HTML is placed in a DOM node and passed to react-to-pdf, which uses html2canvas to snapshot the node and create a PDF.

What’s working:

Text and standard block-level content are rendering correctly.
We’re successfully handling block rendering via the renderBlock() function.

What’s not:

Images (ImageBlockRecord) are displayed in the DOM using responsiveImage.src, but do not appear in the PDF.

Additional context:

There are other custom blocks (e.g. AlertRecord, QuoteRecord) that we plan to support, but haven’t implemented yet — we’re currently focused on solving the image rendering first.

Framework and Version:

Framework: Next.js 15 (App Router)
PDF Tool: react-to-pdf (based on html2canvas)
DatoCMS Rendering: @datocms/structured-text-to-html-string with renderBlock

What we’d like to know:

Are there any alternative approaches you recommend for generating PDFs from structured DatoCMS content — particularly when rendering inline images?

We’re open to:

SSR or build-time tools,
PDF generators you know work well with DatoCMS,
Client-side rendering alternatives to react-to-pdf,
Or any general advice for reliably exporting structured content and media as PDFs.

Thanks so much for your time and support!

Best regards,
Maj

roger · July 31, 2025, 5:36pm

Hi @maj.koren,

Welcome to the DatoCMS forum!

Hmm… This is a very interesting question, but I don’t think it’s really about DatoCMS at that point. I’m not an expert in this area, but I’ll share what limited knowledge I have…

First, just to directly answer your question:

Your link to the PDF lib is broken, and unfortunately I don’t know why it’s not rendering the images. That might be an issue to raise with the lib maintainers, or you could try using CSS and media queries to hide/show a certain kind of image. It’s possible that the lib is having issues with srcSets or tags or such; it’s hard to say without knowing specifically how it parses the DOM.
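If srcSets or lazy loading turn out to be the culprit, one thing to try is emitting the simplest possible image markup for the PDF view. The helper below is only a sketch of that idea: the ImageBlockRecord shape and the plainImgHtml name are assumptions for illustration, so adapt them to your own GraphQL query and call the helper from your renderBlock however your setup expects. Separately, html2canvas documents a useCORS option for cross-origin images, so it may be worth checking whether your react-to-pdf version lets you pass that through.

```typescript
// Assumed record shape for illustration; match it to your own query.
interface ImageBlockRecord {
  responsiveImage: {
    src: string;
    alt?: string | null;
    width: number;
    height: number;
  };
}

// Escape a value so it is safe inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a plain <img> with a single src (no srcset, no <picture>) and
// eager loading, so a snapshot tool has as little as possible to resolve.
function plainImgHtml(record: ImageBlockRecord): string {
  const { src, alt, width, height } = record.responsiveImage;
  return (
    `<img src="${escapeAttr(src)}" alt="${escapeAttr(alt ?? "")}" ` +
    `width="${width}" height="${height}" loading="eager">`
  );
}
```

If the plain version shows up in the PDF and the responsive one does not, that at least narrows the problem down to how the lib handles responsive markup.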

We also have libs to help with a more direct conversion from Structured Text to DOM nodes: https://github.com/datocms/structured-text/tree/main/packages/to-dom-nodes

If that helps.

Thoughts on the overall approach…

But overall, my understanding is that by the time you have a ready DOM, the page no longer has much to do with our servers or even our frontend components. It’s all just HTML/CSS/images at that point.

Thus, regardless of your CMS, backend, or frontend, the rendered output is just an HTML page, and it’s up to your PDF output engine to handle the conversion from that to a PDF.

Unfortunately, both HTML and PDF are extremely complex formats, and there will inevitably be a loss in fidelity going between them, especially since they have completely different page size/layout and pagination models — an infinite canvas vs defined page sizes.

As mentioned, I’m not sure why it’s not rendering images, but I think going from React to PDF is always going to be a losing game, as any lib that has to understand and parse out individual components would have to keep up with the frequent pace of React updates and the huge ecosystem of third-party libs and approaches on top of it…

Using a headless browser instead

What if you bypassed that and used a headless browser (via Playwright or similar) to automate the PDF output that way… basically it’s a script that would spin up a browser, navigate to the page, and then “print to PDF” for you, using the browser or operating system’s own PDF output system to generate the file. See https://playwright.dev/docs/api/class-page#page-pdf and https://www.checklyhq.com/learn/playwright/generating-pdfs/#generating-a-pdf-file
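A rough sketch of what such a script could look like is below. It only relies on two Playwright Page methods, goto() and pdf(), described through a small structural type so the snippet stands on its own. The function names, the A4 format, the margins, and the example URL are all assumptions for illustration, not a recommended configuration.

```typescript
// Options passed to page.pdf(); a subset of what Playwright accepts.
interface PdfOptions {
  path: string;
  format: string;
  printBackground: boolean;
  margin: { top: string; right: string; bottom: string; left: string };
}

// Minimal structural type covering only the Page methods used here.
// A real Playwright Page satisfies it.
interface PdfCapablePage {
  goto(url: string, options?: { waitUntil?: "load" | "domcontentloaded" | "networkidle" }): Promise<unknown>;
  pdf(options: PdfOptions): Promise<unknown>;
}

// Assumed defaults: A4 paper, 15mm margins, backgrounds included.
function buildPdfOptions(path: string): PdfOptions {
  return {
    path,
    format: "A4",
    printBackground: true,
    margin: { top: "15mm", right: "15mm", bottom: "15mm", left: "15mm" },
  };
}

// Navigate, wait for network activity to settle so images have a chance
// to load, then let the browser write the PDF.
async function exportPageToPdf(page: PdfCapablePage, url: string, outputPath: string): Promise<void> {
  await page.goto(url, { waitUntil: "networkidle" });
  await page.pdf(buildPdfOptions(outputPath));
}

// Usage with Playwright (page.pdf() is Chromium-only; the URL is hypothetical):
//   import { chromium } from "playwright";
//   const browser = await chromium.launch();
//   const page = await browser.newPage();
//   await exportPageToPdf(page, "http://localhost:3000/courses/example", "course.pdf");
//   await browser.close();
```

In a Next.js app this would typically run in a script or on the server rather than in the browser, since it needs to launch Chromium.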

But even then, you would have to deal with the different responsiveness and pagination models. You’d have to make sure the viewport width is set to a size that looks good on whatever sized paper you want in the PDF. Some text and images will inevitably get chopped off at inconvenient places, between pages.
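To make the viewport and pagination points a bit more concrete, here is a small sketch. It converts a paper size in millimetres to CSS pixels (CSS defines 96px per inch) to pick a viewport, and defines a print stylesheet that asks the browser not to split images and similar blocks across pages. The helper names, the selectors, and the margin handling are assumptions, and browsers honor break-inside and break-after to varying degrees, so treat it as a starting point.

```typescript
const CSS_PX_PER_INCH = 96;
const MM_PER_INCH = 25.4;

// Convert millimetres to CSS pixels, rounded to a whole pixel.
function mmToCssPx(mm: number): number {
  return Math.round((mm / MM_PER_INCH) * CSS_PX_PER_INCH);
}

// Viewport matching the printable area of a page: paper size minus the
// same margin on every side.
function viewportForPaper(
  widthMm: number,
  heightMm: number,
  marginMm: number
): { width: number; height: number } {
  return {
    width: mmToCssPx(widthMm - 2 * marginMm),
    height: mmToCssPx(heightMm - 2 * marginMm),
  };
}

// Print-only rules that discourage page breaks inside media and quotes,
// and directly after headings. Selectors are illustrative.
const PRINT_CSS = `
@media print {
  figure, img, blockquote, table { break-inside: avoid; }
  h1, h2, h3 { break-after: avoid; }
}
`;

// A4 is 210mm x 297mm; with 15mm margins the printable area is 180mm x 267mm.
const a4Viewport = viewportForPaper(210, 297, 15);
console.log(a4Viewport, PRINT_CSS.trim().length > 0);
```

With Playwright, the viewport could be applied via page.setViewportSize() and the stylesheet via page.addStyleTag({ content: PRINT_CSS }) before calling page.pdf(). Some content will likely still break in awkward places, just less often.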

As far as I know, there is no simple, foolproof way to do this… you’d have the same issues going from any sort of pageless format to PDF (like a spreadsheet or ePub).

But I think a Playwright+browser-based approach would be easier to use (and far less likely to break) than anything that actually tries to parse the DOM. Rendering a DOM is hard enough on its own (which is why we only have the three browser renderers in the world), and having to then convert that to PDF is very much not trivial. Even when you manually “print to PDF” from a browser, variances and mistakes will occur depending on your particular PDF engine and the specifics of your HTML…

There is also Gotenberg, if you want something more containerized.
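For reference, Gotenberg runs as a container and exposes an HTTP API; its Chromium module has a route for converting a URL to PDF. The sketch below assumes a Gotenberg instance reachable at a base URL you supply (for example a local container) and a page URL that the container can reach. The function names are made up for illustration, and it needs a runtime with global fetch and FormData (Node 18+).

```typescript
// Build the multipart request for Gotenberg's Chromium "convert URL" route.
function buildGotenbergRequest(
  baseUrl: string,
  pageUrl: string
): { endpoint: string; form: FormData } {
  const form = new FormData();
  form.append("url", pageUrl); // page to render
  form.append("printBackground", "true"); // include CSS backgrounds
  // Strip trailing slashes so the joined path has exactly one separator.
  const endpoint = `${baseUrl.replace(/\/+$/, "")}/forms/chromium/convert/url`;
  return { endpoint, form };
}

// POST the form and return the PDF bytes, failing loudly on a non-2xx reply.
async function convertUrlToPdf(baseUrl: string, pageUrl: string): Promise<Uint8Array> {
  const { endpoint, form } = buildGotenbergRequest(baseUrl, pageUrl);
  const response = await fetch(endpoint, { method: "POST", body: form });
  if (!response.ok) {
    throw new Error(`Gotenberg returned HTTP ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

The same viewport and pagination caveats apply here, since it is still a headless Chromium doing the printing under the hood.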