This article discusses how to download a PDF using Python's requests library.
Approach
- Import the requests library.
- Request the URL and get the response object.
- Get the PDF file using the response object, and return True.
- If the PDF cannot be downloaded, return False.
Implementation
The following program downloads a PDF file from the provided URL.
#!/usr/bin/env python3
import os

import requests


def download_pdf_file(url: str) -> bool:
    """Download a PDF from the given URL into the current working directory.

    :param url: The URL of the PDF file to be downloaded
    :return: True if the PDF file was successfully downloaded, otherwise False.
    """
    # Isolate the PDF filename from the URL
    pdf_file_name = os.path.basename(url)

    # Request the URL and get the response object.
    # The timeout keeps the call from waiting forever on an unresponsive server.
    response = requests.get(url, stream=True, timeout=30)

    if response.status_code == 200:
        # Save in the current working directory
        filepath = os.path.join(os.getcwd(), pdf_file_name)
        with open(filepath, 'wb') as pdf_object:
            # With stream=True, write the body in chunks instead of
            # loading the whole file into memory at once.
            for chunk in response.iter_content(chunk_size=8192):
                pdf_object.write(chunk)
        print(f'{pdf_file_name} was successfully saved!')
        return True
    else:
        print(f'Uh oh! Could not download {pdf_file_name},')
        print(f'HTTP response status code: {response.status_code}')
        return False


if __name__ == '__main__':
    # URL from which the PDF is to be downloaded
    URL = 'https://raw.githubusercontent.com/seraph776/DevCommunity/main/PDFDownloader/assests/the_raven.pdf'
    download_pdf_file(URL)
Output
the_raven.pdf was successfully saved!
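One thing to watch: taking os.path.basename of the whole URL works for the example above, but it would keep any query string or fragment in the filename. The sketch below shows one possible way to isolate the filename with the standard library instead. The helper name filename_from_url, the fallback name download.pdf, and the example.com URLs are all hypothetical and not part of the original program.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str, default: str = "download.pdf") -> str:
    """Return the last path segment of a URL, ignoring query and fragment.

    Hypothetical helper for illustration; falls back to `default` when the
    URL path does not end in a filename.
    """
    # urlsplit separates the path from '?query' and '#fragment'
    path = urlsplit(url).path
    # Decode percent-escapes (e.g. %20) and keep only the final segment
    name = posixpath.basename(unquote(path))
    return name or default


if __name__ == "__main__":
    print(filename_from_url("https://example.com/docs/the_raven.pdf?download=1"))
    print(filename_from_url("https://example.com/"))
```

Under these assumptions, the first call yields the_raven.pdf and the second falls back to download.pdf.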
Conclusion
After reading this article you should now be able to download a PDF using Python's requests library. Remember that some websites might be more difficult than others to get data from. If you are unable to download the PDF file, analyze the HTTP response status code to help determine what went wrong. Please leave a comment if you found this article helpful.
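As a rough illustration of that kind of troubleshooting, the sketch below turns a status code into a readable label and checks whether downloaded bytes actually start with the PDF signature. A server can answer 200 and still send an HTML page (a login or error page, for example), which typically shows up as a very small file that will not open. The helper names describe_status and looks_like_pdf are hypothetical, and the signature check is only a heuristic, not a full validation of the file.

```python
from http import HTTPStatus


def describe_status(status_code: int) -> str:
    """Return a readable label such as '404 Not Found' for a status code."""
    try:
        return f"{status_code} {HTTPStatus(status_code).phrase}"
    except ValueError:
        # HTTPStatus only knows the registered codes
        return f"{status_code} (unrecognized status code)"


def looks_like_pdf(first_bytes: bytes) -> bool:
    """Heuristic: PDF files begin with the signature '%PDF-'."""
    return first_bytes.startswith(b"%PDF-")


if __name__ == "__main__":
    print(describe_status(403))
    print(describe_status(404))
    print(looks_like_pdf(b"%PDF-1.7\n"))
    print(looks_like_pdf(b"<!DOCTYPE html><html>"))
```

With requests, these helpers could be fed response.status_code and the first few bytes of the downloaded body. Printing response.headers.get('Content-Type') alongside them can also hint at what the server really returned.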
Code available at GitHub
Top comments (1)
njconsumeraffairs.gov/Actions/2024... The file at this link downloads, but it is only 1 KB and cannot be opened. Please check it.