Indeed, there are a few caveats:
- Don't simply convert an HTML file to PDF one-to-one. Otherwise, you can never control page breaks.
- Nonetheless, HTML rendering is browser-dependent. (Therefore, I'm not sure about Pandoc.)
- CSS is powerful, but are there exceptions?
Therefore, I suggest using a web driver plus a PDF library that can both read and modify PDFs.
The best web driver is currently either Puppeteer or the Chrome DevTools Protocol.
Additionally, it might be possible to distribute the PDF generator via Electron + Puppeteer-in-Electron.
I got this code from another Stack Overflow question:
import electron from "electron";
import puppeteer from "puppeteer-core";

// Resolve after `ms` milliseconds.
const delay = (ms: number) =>
  new Promise<void>(resolve => {
    setTimeout(() => {
      resolve();
    }, ms);
  });

(async () => {
  try {
    // Launch Electron itself as the browser that Puppeteer drives.
    const app = await puppeteer.launch({
      executablePath: electron,
      args: ["."],
      headless: false,
…
The PDF manager, which has to read and merge PDFs, is traditionally either PDFtk (binary) or PDFBox (Java), I think; but I have just recently found,
As for CSS: yes, CSS can also detect page margins.
body {
  position: fixed;
  width: 100vw;
  height: 100vh;
  display: flex;
  align-items: center;
  justify-content: center;
}
This is my attempt so far.
patarapolw / make-pdf
Beautifully make a pdf from couples of image files
So, the answer to the question is no: do not convert a single HTML or Markdown file to one PDF file, but do combine the files within a folder. Also,
- Running a web server might be better than using the file:// protocol and relative paths.
- The choice of web browser might affect the result.
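The advice above (combine the files in a folder, and load them over HTTP rather than file://) could look roughly like this when planning the render step. This is only a sketch: `planRenderJobs`, the `RenderJob` shape, the list of file extensions, and the `http://localhost:8080` base URL are all assumptions for illustration, and the actual rendering and merging are left to whichever web driver and PDF library you choose.

```typescript
// Hypothetical shape of one unit of work: a URL to render and the
// name of the intermediate PDF it should produce.
interface RenderJob {
  url: string;
  output: string;
}

// Assumed set of source types worth rendering; adjust as needed.
const RENDERABLE = /\.(html?|png|jpe?g)$/i;

function planRenderJobs(fileNames: string[], baseUrl: string): RenderJob[] {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return fileNames
    .filter((name) => RENDERABLE.test(name))
    // Natural sort, so that "page2" comes before "page10".
    .sort((a, b) =>
      a.localeCompare(b, "en", { numeric: true, sensitivity: "base" })
    )
    .map((name, i) => ({
      url: `${base}/${encodeURIComponent(name)}`,
      // Zero-padded so the intermediate PDFs also sort correctly on disk.
      output: ("00" + (i + 1)).slice(-3) + ".pdf",
    }));
}
```

Each job can then be rendered to its own PDF and the results merged in order, which keeps page breaks under your control: one source file never spills into the next.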
Also, consider alternatives to PDF that allow easy editing, perhaps ODT or DOCX?
Top comments (2)
Our team has been working on a document generation project, and we convert HTML to PDF using wkhtmltopdf. For example, we generate documents like this using only HTML and CSS. wkhtmltopdf has great CSS support. As for page breaks, we can control them using the page-break-before and page-break-after properties.
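As a rough illustration of the page-break properties, here is a small TypeScript helper that wraps each section of a document in its own container and forces a break after every section except the last. The function name `wrapSections` and the class name `pdf-page` are made up for this sketch; the only thing taken from the comment is the `page-break-after` property.

```typescript
// Hypothetical helper: joins HTML fragments into one printable document,
// forcing a page break after every section except the last one.
function wrapSections(sections: string[]): string {
  const css = [
    ".pdf-page { page-break-after: always; }",
    // Avoid a trailing blank page after the final section.
    ".pdf-page:last-child { page-break-after: auto; }",
  ].join("\n");

  const body = sections
    .map((html) => `<section class="pdf-page">${html}</section>`)
    .join("\n");

  return [
    "<!DOCTYPE html>",
    "<html>",
    `<head><meta charset="utf-8"><style>\n${css}\n</style></head>`,
    `<body>\n${body}\n</body>`,
    "</html>",
  ].join("\n");
}
```

The resulting string can be saved as an .html file and handed to whichever renderer you use; how faithfully the breaks are honored still depends on that renderer.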
As for alternatives, we recently started using docx templates: we process them with docxtemplater and convert them to PDF with headless LibreOffice.
Apparently, I find that pandoc alone can be powerful enough. A new page is as easy as \newpage. (I know, LaTeX syntax in Markdown.) Also, there is geometry: margin=1cm in the YAML frontmatter. Also, LaTeX can be used to host and join PDFs.
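To make the pandoc route concrete, here is a minimal sketch that assembles a Markdown file with the geometry: margin=1cm frontmatter and a \newpage between sections. `buildPandocMarkdown` is a made-up name, and the sketch assumes the PDF is produced through pandoc's LaTeX route, which is what makes \newpage and geometry take effect.

```typescript
// Hypothetical helper: builds one Markdown document for pandoc from
// several sections, with a LaTeX page break between consecutive sections.
function buildPandocMarkdown(sections: string[], margin = "1cm"): string {
  // YAML frontmatter; `geometry` is passed on to the LaTeX geometry package.
  const frontmatter = ["---", `geometry: margin=${margin}`, "---"].join("\n");

  // Raw LaTeX in Markdown: \newpage on its own paragraph.
  const body = sections.map((s) => s.trim()).join("\n\n\\newpage\n\n");

  return `${frontmatter}\n\n${body}\n`;
}
```

Saving the result as, say, book.md and running `pandoc book.md -o book.pdf` should then give one page break per section, provided a LaTeX engine is installed.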
But is there a single best tool that can easily do all of these?