DEV Community

neo


Parse PDF in Next.js

While building a Chat PDF AI website recently, I ran into some trouble parsing PDFs in Next.js with pdfjs-dist. Here is the code that worked for me.
Key code:

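import * as pdfjsLib from 'pdfjs-dist'

// Load the worker script from cdnjs, matching the installed pdfjs-dist version.
// Note that ".mjs" is required here: pdfjs-dist v4+ ships the worker as
// pdf.worker.min.mjs, so the old ".js" URL cannot be found.
pdfjsLib.GlobalWorkerOptions.workerSrc = `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjsLib.version}/pdf.worker.min.mjs`

  // Runs in the browser, so it belongs in a client component ('use client').
  // setPdfContent is assumed to be a React state setter from useState<string>('').
  const fetchPdfContent = async (pdfSrc: string) => {
    try {
      // Use the proxy API to fetch the PDF (typically to avoid CORS errors on third-party URLs)
      const proxyUrl = `/api/xxx?url=${encodeURIComponent(pdfSrc)}`
      const pdf = await pdfjsLib.getDocument(proxyUrl).promise
      let fullText = ''
      // PDF.js page numbers are 1-based
      for (let i = 1; i <= pdf.numPages; i++) {
        const page = await pdf.getPage(i)
        const textContent = await page.getTextContent()
        // Text items carry their text in `str`; marked-content items do not
        const pageText = textContent.items
          .map((item) => ('str' in item ? item.str : ''))
          .join(' ')
        fullText += pageText + '\n'
      }
      console.log(fullText)
      setPdfContent(fullText)
    } catch (error) {
      console.error('Failed to parse PDF:', error)
    }
  }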
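The post does not show the proxy route behind `/api/xxx`. Below is a minimal sketch of what such a route could look like, assuming the Next.js App Router; the file path `app/api/pdf-proxy/route.ts` and the helper `parseTarget` are hypothetical names chosen for illustration, not the author's code. It fetches the PDF on the server, where browser CORS rules do not apply, and streams the bytes back from the app's own origin so `pdfjsLib.getDocument` can load them.

```typescript
// Hypothetical file: app/api/pdf-proxy/route.ts (path chosen for illustration only)

// Accept only absolute http(s) URLs; anything else (missing, malformed,
// file:, ftp:, javascript:, ...) is rejected by returning null.
function parseTarget(raw: string | null): URL | null {
  if (!raw) return null
  try {
    const target = new URL(raw)
    return target.protocol === 'http:' || target.protocol === 'https:' ? target : null
  } catch {
    return null // not a valid absolute URL
  }
}

// Route handler for GET /api/pdf-proxy?url=<encoded PDF URL>
export async function GET(request: Request): Promise<Response> {
  const target = parseTarget(new URL(request.url).searchParams.get('url'))
  if (!target) {
    return new Response('Missing or invalid "url" parameter', { status: 400 })
  }

  // Fetch the PDF server-side, where the browser's CORS restrictions do not apply.
  const upstream = await fetch(target)
  if (!upstream.ok) {
    return new Response(`Upstream responded with ${upstream.status}`, { status: 502 })
  }

  // Stream the bytes back from our own origin so PDF.js can load them.
  return new Response(upstream.body, {
    headers: { 'Content-Type': upstream.headers.get('Content-Type') ?? 'application/pdf' },
  })
}
```

The http(s) check is a basic safeguard only: a proxy that fetches arbitrary URLs can still reach internal hosts, so in production you would likely also restrict which domains it accepts.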
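`textContent.items.map(...).join(' ')` flattens each page into a single line. PDF.js text items also carry a `hasEOL` flag marking the end of a line, which can be used to keep line breaks. A minimal sketch, assuming the `str` and `hasEOL` fields of pdfjs-dist's `TextItem` type; `pageItemsToText` and the local interfaces are hypothetical names, declared here so the example runs without pdfjs-dist installed:

```typescript
// Local shapes mirroring only the fields this helper reads from pdfjs-dist's
// TextItem and TextMarkedContent types (hypothetical names, for illustration).
interface TextItemLike {
  str: string
  hasEOL: boolean
}
interface MarkedContentLike {
  type: string
}
type ContentItem = TextItemLike | MarkedContentLike

// Join one page's text items: insert '\n' where PDF.js flagged an end of line
// (hasEOL), otherwise separate items with a space like the original join(' ').
function pageItemsToText(items: ContentItem[]): string {
  let text = ''
  for (const item of items) {
    if (!('str' in item)) continue // marked-content items carry no text
    text += item.str
    text += item.hasEOL ? '\n' : ' '
  }
  return text.trimEnd() // drop the trailing separator
}
```

Inside the loop, `const pageText = pageItemsToText(textContent.items)` would replace the `map`/`join` line. Whether keeping line breaks helps depends on the PDF; for multi-column layouts the item order may still not match reading order.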
