Hi @maj.koren,
Welcome to the DatoCMS forum!
Hmm…
This is a very interesting question, but I don't think it's really about DatoCMS at that point. I'm not an expert in this area, but I'll share what limited knowledge I have…
First, just to directly answer your question:
Your link to the PDF lib is broken, and I unfortunately don't know why it's not rendering the images.
That might be an issue to raise with the lib maintainers, or you can try using CSS and media queries to hide/show a certain kind of image. It's possible that lib is having issues with srcSets or tags or such, hard to say without knowing specifically how they are parsing the DOM.
We also have libs to help with a more direct conversion from Structured Text to DOM nodes: https://github.com/datocms/structured-text/tree/main/packages/to-dom-nodes
If that helps.
Thoughts on the overall approach…
Overall, my understanding is that by the time you have a ready DOM, the page no longer has much to do with our servers or even our frontend components. It's all just HTML/CSS/images at that point.
Thus, regardless of your CMS, backend, or frontend, the rendered output is just an HTML page, and it's up to your PDF output engine to handle the conversion from that to a PDF.
Unfortunately, both HTML and PDF are extremely complex formats, and there will inevitably be a loss in fidelity going between them, especially since they have completely different page size/layout and pagination models: an infinite canvas vs. defined page sizes.
Beyond the image problem, I think going from React to PDF is always going to be a losing game, as any lib that has to understand and parse out individual components would have to keep up with the frequent pace of React updates and the huge ecosystem of third-party libs and approaches on top of it…
Using a headless browser instead
What if you bypassed that and used a headless browser (via Playwright or similar) to automate the PDF output that way… basically it's a script that would spin up a browser, navigate to the page, and then "print to PDF" for you, using the browser or operating system's own PDF output system to generate the file. See https://playwright.dev/docs/api/class-page#page-pdf and https://www.checklyhq.com/learn/playwright/generating-pdfs/#generating-a-pdf-file
But even then, you would have to deal with the different responsiveness and pagination models. You'd have to make sure the viewport width is set to a size that looks good on whatever sized paper you want in the PDF. Some text and images will inevitably get chopped off at inconvenient places, between pages.
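To make that a bit more concrete, here is a rough sketch of what such a script could look like. It is only an illustration, not something I have run against your site: `viewportForPaper` and `renderPageToPdf` are hypothetical names, the 96 CSS pixels per inch figure and the paper dimensions are the usual conventions, and the half-inch margin is an arbitrary choice. It assumes the `playwright` package is installed and uses its documented `chromium.launch()`, `browser.newPage()`, `page.goto()`, and `page.pdf()` calls.

```typescript
// Paper sizes in inches (the commonly documented values for A4 and US Letter).
const PAPER_INCHES = {
  A4: { width: 8.27, height: 11.69 },
  Letter: { width: 8.5, height: 11 },
} as const;

type PaperFormat = keyof typeof PAPER_INCHES;

// CSS defines 1in as 96px, so this converts paper inches to CSS pixels.
const CSS_PX_PER_INCH = 96;

// Hypothetical helper: size of the printable area (paper minus margins) in CSS pixels.
// Using it as the viewport means responsive layouts are computed at roughly paper width.
function viewportForPaper(
  format: PaperFormat,
  marginInches: number
): { width: number; height: number } {
  const paper = PAPER_INCHES[format];
  return {
    width: Math.round((paper.width - 2 * marginInches) * CSS_PX_PER_INCH),
    height: Math.round((paper.height - 2 * marginInches) * CSS_PX_PER_INCH),
  };
}

// Hypothetical wrapper: open the page in headless Chromium and "print" it to a PDF file.
async function renderPageToPdf(
  url: string,
  outputPath: string,
  format: PaperFormat = "A4",
  marginInches: number = 0.5
): Promise<void> {
  // Loaded lazily and by name so the sizing helper above can be used
  // (or tested) on a machine where Playwright is not installed.
  const moduleName = "playwright";
  const { chromium } = await import(moduleName);

  const browser = await chromium.launch(); // headless by default
  try {
    const page = await browser.newPage({
      viewport: viewportForPaper(format, marginInches),
    });
    // Wait for network activity to settle so lazy-loaded images have a chance to load.
    await page.goto(url, { waitUntil: "networkidle" });

    const margin = `${marginInches}in`;
    await page.pdf({
      path: outputPath,
      format,
      printBackground: true, // keep background colors and images
      margin: { top: margin, right: margin, bottom: margin, left: margin },
    });
  } finally {
    await browser.close();
  }
}
```

You would call it with something like `await renderPageToPdf("https://example.com/article", "article.pdf", "Letter")`. It won't fix awkward page breaks by itself; for those, print-specific CSS such as `break-inside: avoid` on images and figures is probably the place to start.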
As far as I know, there is no simple, foolproof way to do this… you'd have the same issues going from any sort of pageless format to PDF (like a spreadsheet or ePub).
But I think a Playwright+browser-based approach would be easier to use (and far less likely to break) than anything that actually tries to parse the DOM. Rendering a DOM is hard enough on its own (which is why we only have the three browser renderers in the world), and having to then convert that to PDF is very much not trivial. Even when you manually "print to PDF" from a browser, variances and mistakes will occur depending on your particular PDF engine and the specifics of your HTML…
There is also Gotenberg, if you want something more containerized.
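If you go the Gotenberg route, the conversion becomes an HTTP call to a container instead of a local browser launch. The sketch below is untested against your setup and makes a few assumptions: a Gotenberg instance is reachable at a base URL such as `http://localhost:3000` (the port used in its Docker examples), its Chromium URL route is `/forms/chromium/convert/url` with `url` and `printBackground` form fields, and the runtime has global `fetch` and `FormData` (Node 18+). `buildGotenbergRequest` and `convertUrlToPdf` are hypothetical names.

```typescript
// Hypothetical helper: build the endpoint and multipart form for Gotenberg's
// Chromium "convert a URL" route, without sending anything yet.
function buildGotenbergRequest(
  gotenbergBaseUrl: string,
  pageUrl: string
): { endpoint: string; form: FormData } {
  // Tolerate a trailing slash on the base URL.
  const base = gotenbergBaseUrl.replace(/\/+$/, "");

  const form = new FormData();
  form.append("url", pageUrl); // the page Gotenberg's Chromium should open
  form.append("printBackground", "true"); // keep background colors and images

  return { endpoint: `${base}/forms/chromium/convert/url`, form };
}

// Hypothetical wrapper: send the request and return the PDF bytes.
async function convertUrlToPdf(
  gotenbergBaseUrl: string,
  pageUrl: string
): Promise<Uint8Array> {
  const { endpoint, form } = buildGotenbergRequest(gotenbergBaseUrl, pageUrl);

  // fetch sets the multipart Content-Type header (with boundary) for FormData bodies.
  const response = await fetch(endpoint, { method: "POST", body: form });
  if (!response.ok) {
    throw new Error(`Gotenberg responded with HTTP ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

The returned bytes can be written to disk or streamed back to the user. The same caveats about viewport width and page breaks apply, since Gotenberg is driving a headless Chromium under the hood.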