How often do you work with ZIP files in your day-to-day life?
If you have ever worked with ZIP files, you know that many files and directories can be compressed together into a single file with a .zip extension.
Normally, to read those files, we first need to extract them from the ZIP archive.
In this tutorial, we will look at some Pythonic ways to perform various operations on ZIP files without even having to extract them.
For that purpose, we'll use Python's zipfile module, which handles the process for us nicely and easily.
What is a ZIP file?
As mentioned above, a ZIP file contains one or more files or directories that have been compressed.
ZIP is an archive file format that supports lossless data compression.
Lossless compression means that the original data can be perfectly reconstructed from the compressed data without losing any information.
If you are wondering what an archive file is, it is simply a computer file composed of one or more files along with their metadata.
This format was originally created in 1989 and was first implemented in PKWARE, Inc.'s PKZIP utility, as a replacement for the previous ARC compression format by Thom Henderson. The ZIP format was then quickly supported by many software utilities other than PKZIP.
[Illustration: how the files are placed on the disk]
What is the need for a ZIP file?
ZIP files can be valuable for anyone who works with computers and handles large amounts of digital information, because they allow you to:
Reduce storage requirements by compressing files without losing any information.
Improve transfer speed over the network.
Gather all your related files into one archive for better management.
Add security by encrypting the files.
How to manipulate ZIP files using Python?
Python provides multiple tools for compressing and decompressing data. These include lower-level libraries such as lzma, bz2 and zlib, which implement specific compression algorithms, and tarfile, which handles TAR archives.
Apart from these, Python has a high-level module called zipfile that helps us read, write, create, extract, and list the contents of ZIP files.
Python's zipfile
The zipfile module provides convenient classes and functions for reading, writing, and extracting ZIP files.
However, it also has some limitations:
Decryption is slow because it is implemented in pure Python.
It cannot create encrypted ZIP files.
Multi-disk ZIP files are not currently supported.
Opening ZIP files for Reading & Writing
zipfile has a class, ZipFile, that allows us to open ZIP files in different modes, much like Python's built-in open() function.
There are four modes available:
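r: Open an existing file for reading (default).
w: Write to a new file. If a file with the same name already exists, it is truncated.
a: Append to an existing file.
x: Exclusively create and write a new file. A FileExistsError is raised if the file already exists.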
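The difference between w and x matters once the archive already exists. The following minimal sketch uses a temporary directory and a hypothetical demo.zip. It shows that w replaces the old contents, while x refuses to overwrite the file:

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.zip")  # hypothetical archive name

    # "w" creates the archive (or truncates it if it already exists)
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("first.txt", "first version")

    # Opening with "w" again discards the previous contents
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("second.txt", "second version")

    with zipfile.ZipFile(path) as archive:
        names_after_w = archive.namelist()

    # "x" raises FileExistsError instead of overwriting
    try:
        with zipfile.ZipFile(path, "x") as archive:
            archive.writestr("third.txt", "never written")
        x_refused = False
    except FileExistsError:
        x_refused = True

print(names_after_w)  # ['second.txt']
print(x_refused)      # True
```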
ZipFile is also a context manager and therefore supports the with statement.
import zipfile
with zipfile.ZipFile("sample.zip", mode="r") as arch:
    arch.printdir()
.........
File Name Modified Size
document.txt 2022-07-04 18:13:36 52
data.txt 2022-07-04 18:17:30 37538
hello.md 2022-07-04 18:33:02 7064
Here, we can see that all the files in the sample.zip archive have been listed.
The first argument to ZipFile is the path of the file, given here as a string. The second argument is the mode. Since reading mode r is the default, passing it explicitly is optional.
Then we called .printdir() on arch, which holds the ZipFile instance, to print the table of contents in a user-friendly format with three columns: File Name, Modified, and Size.
Error Handling by using Try & Except
Let's see how zipfile reports problems through the BadZipFile exception, which provides an easily readable error message.
import zipfile
# Provided valid zip file
try:
    with zipfile.ZipFile("sample.zip") as arch:
        arch.printdir()
except zipfile.BadZipFile as error:
    print(error)
.........
File Name Modified Size
document.txt 2022-07-04 18:13:36 52
data.txt 2022-07-04 18:17:30 37538
hello.md 2022-07-04 18:33:02 7064
# Provided bad zip file
try:
    with zipfile.ZipFile("not_valid_zip.zip") as arch:
        arch.printdir()
except zipfile.BadZipFile as error:
    print(error)
.........
File is not a zip file
The first code block ran successfully and printed the contents of sample.zip because we provided a valid ZIP file. In the second block, we provided a bad ZIP file, so BadZipFile was raised and its message was printed.
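Because the ZipFile constructor raises BadZipFile, the check can be wrapped in a small helper. The following is a minimal sketch; the helper name list_members is hypothetical. The helper also accepts in-memory file-like objects such as io.BytesIO, which keeps the example self-contained:

```python
import io
import zipfile

def list_members(source):
    """Return the member names of a ZIP archive, or None if it is not a valid ZIP file.

    `source` can be a path or a binary file-like object.
    """
    try:
        with zipfile.ZipFile(source) as archive:
            return archive.namelist()
    except zipfile.BadZipFile as error:
        print(f"Skipping invalid archive: {error}")
        return None

# Build a tiny valid archive in memory
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("note.txt", "hello")

print(list_members(buffer))  # ['note.txt']
# Prints the error message first, then None
print(list_members(io.BytesIO(b"not a zip file" * 10)))
```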
We can check whether a file is a valid ZIP file by using the is_zipfile() function.
# Example 1
valid = zipfile.is_zipfile("bad_sample.zip")
print(valid)
.........
False
# Example 2
valid = zipfile.is_zipfile("sample.zip")
print(valid)
........
True
is_zipfile() returns True if the file is a valid ZIP file; otherwise, it returns False.
# Print content if a file is valid
if zipfile.is_zipfile("sample.zip"):
    with zipfile.ZipFile("sample.zip") as arch:
        arch.printdir()
else:
    print("This is not a valid ZIP format.")
.........
File Name Modified Size
document.txt 2022-07-04 18:13:36 52
data.txt 2022-07-04 18:17:30 37538
hello.md 2022-07-04 18:33:02 7064
if zipfile.is_zipfile("bad_sample.zip"):
    # Open the same file that was checked
    with zipfile.ZipFile("bad_sample.zip") as arch:
        arch.printdir()
else:
    print("This is not a valid ZIP format file.")
.........
This is not a valid ZIP format file.
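is_zipfile() also accepts a binary file-like object. This is handy when the data comes from a network response or a database rather than from disk. Here is a small sketch that uses io.BytesIO as a stand-in for such data:

```python
import io
import zipfile

# Create a valid archive entirely in memory
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "in-memory archive")

valid = zipfile.is_zipfile(buffer)
invalid = zipfile.is_zipfile(io.BytesIO(b"plain text, not a ZIP" * 5))

print(valid)    # True
print(invalid)  # False
```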
Writing the ZIP file
To open a ZIP file for writing, use write mode w.
If the file you are writing to already exists, w truncates it and writes the new content you pass in.
import zipfile
# Adding a file
with zipfile.ZipFile('geekpython.zip', 'w') as myzip:
    myzip.write('geek.txt')
    myzip.printdir()
........
File Name Modified Size
geek.txt 2022-07-05 14:52:01 85
geek.txt is added to geekpython.zip, which is created when the code runs.
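By default, .write() stores a file under the path you pass in, so writing "docs/geek.txt" creates a docs/ folder inside the archive. The optional arcname parameter lets you choose a different name inside the archive. Here is a sketch using a temporary directory and hypothetical file names:

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical source file in a nested folder
    source = os.path.join(tmp, "docs", "geek.txt")
    os.makedirs(os.path.dirname(source))
    with open(source, "w") as f:
        f.write("Hello from GeekPython")

    zip_path = os.path.join(tmp, "renamed.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        # Store the file at the top level of the archive under a new name
        archive.write(source, arcname="notes.txt")

    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()

print(names)  # ['notes.txt']
```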
Adding multiple files
import zipfile
# Adding multiple files
with zipfile.ZipFile('geekpython.zip', 'w') as myzip:
    myzip.write('geek.txt')
    myzip.write('program.py')
    myzip.printdir()
........
File Name Modified Size
geek.txt 2022-07-05 14:52:01 85
program.py 2022-07-05 14:52:01 136
Note: The file you pass as an argument to .write() must exist.
Python raises a FileNotFoundError in two cases: if you pass an archive path inside a directory that does not exist, or if you pass a file to .write() that does not exist.
import zipfile
# Passing a non-existing directory
with zipfile.ZipFile('hello/geekpython.zip', 'w') as myzip:
    myzip.write('geek.txt')
.........
FileNotFoundError: [Errno 2] No such file or directory: 'hello/geekpython.zip'
import zipfile
# Passing a non-existing file
with zipfile.ZipFile('geekpython.zip', 'w') as myzip:
    myzip.write('hello.txt')
........
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'hello.txt'
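Sometimes the content you want to store does not exist as a file on disk. In that case, ZipFile.writestr() writes a member directly from a string or bytes, so this FileNotFoundError cannot occur. Here is a minimal sketch with hypothetical member names, writing to an in-memory buffer:

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # str data is encoded as UTF-8; bytes are stored as-is
    archive.writestr("hello.txt", "Hello, GeekPython!")
    archive.writestr("data/raw.bin", b"\x00\x01\x02")

with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    content = archive.read("hello.txt")

print(names)    # ['hello.txt', 'data/raw.bin']
print(content)  # b'Hello, GeekPython!'
```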
Appending files to the existing ZIP archive
To append files to an existing ZIP archive, use append mode a.
import zipfile
# Appending files to the existing zip file
with zipfile.ZipFile('geekpython.zip', 'a') as myzip:
    myzip.write("index.html")
    myzip.write("program.py")
    myzip.printdir()
.........
File Name Modified Size
geek.txt 2022-07-05 14:52:00 85
index.html 2022-07-05 15:32:35 176
program.py 2022-07-05 14:52:01 136
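Append mode has one catch. If the archive already contains a member with the same name, zipfile does not replace it. Instead, it adds a second entry with the same name and emits a UserWarning ("Duplicate name"). The following sketch demonstrates this with an in-memory archive and a hypothetical program.py member:

```python
import io
import warnings
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("program.py", "print('v1')")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Append mode: adds a new entry even though the name already exists
    with zipfile.ZipFile(buffer, "a") as archive:
        archive.writestr("program.py", "print('v2')")

with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    latest = archive.read("program.py")  # the last entry with that name wins

print(names)   # ['program.py', 'program.py']
print(latest)  # b"print('v2')"
print([str(w.message) for w in caught])  # ["Duplicate name: 'program.py'"]
```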
Reading Metadata
There are some methods that help us read the metadata of ZIP archives:
.getinfo(filename): Returns a ZipInfo object that holds information about the member file filename.
.infolist(): Returns a list containing a ZipInfo object for each member of the archive.
.namelist(): Returns a list of archive members by name.
There is also .printdir(), which we have already used.
import zipfile
with zipfile.ZipFile("geekpython.zip", mode="r") as arch:
    info = arch.getinfo("geek.txt")
    print(info.filename)       # geek.txt
    print(info.date_time)      # (2022, 7, 5, 14, 52, 0)
    print(info.file_size)      # 85
    print(info.compress_size)  # 85
Extracting information about the files in a specified archive using .infolist()
import zipfile
from datetime import datetime
with zipfile.ZipFile("geekpython.zip", "r") as archive:
    for info in archive.infolist():
        print(f"The file name is: {info.filename}")
        print(f"The file size is: {info.file_size} bytes")
        print(f"The compressed size is: {info.compress_size} bytes")
        # date_time is the member's last modification time, not its creation time
        print(f"Last modified: {datetime(*info.date_time)}")
        print("-" * 15)
.........
The file name is: geek.txt
The file size is: 85 bytes
The compressed size is: 85 bytes
Last modified: 2022-07-05 14:52:00
---------------
The file name is: index.html
The file size is: 176 bytes
The compressed size is: 176 bytes
Last modified: 2022-07-05 15:32:34
---------------
The file name is: program.py
The file size is: 136 bytes
The compressed size is: 136 bytes
Last modified: 2022-07-05 14:52:00
---------------
Let's look at some more ZipInfo attributes:
import zipfile
with zipfile.ZipFile("sample.zip", "r") as info:
    for arch in info.infolist():
        if arch.create_system == 0:
            system = "Windows"
        elif arch.create_system == 3:
            system = "UNIX"
        else:
            system = "Unknown"
        print(f"ZIP version: {arch.create_version}")
        print(f"Create System: {system}")
        print(f"External Attributes: {arch.external_attr}")
        print(f"Internal Attributes: {arch.internal_attr}")
        print(f"Comments: {arch.comment}")
        print("-" * 15)
.........
ZIP version: 20
Create System: Windows
External Attributes: 32
Internal Attributes: 1
Comments: b''
---------------
ZIP version: 20
Create System: Windows
External Attributes: 32
Internal Attributes: 1
Comments: b''
---------------
ZIP version: 20
Create System: Windows
External Attributes: 32
Internal Attributes: 1
Comments: b''
---------------
.create_system returns an integer that identifies the system the archive member was created on:
0 - MS-DOS/Windows (FAT-compatible file systems)
3 - Unix
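The external_attr value is easier to interpret once you split it into parts. For archives created on Unix, the upper 16 bits hold the file's mode bits. The low byte holds MS-DOS attribute flags; the 32 shown above is 0x20, the DOS "archive" flag. The following sketch decodes both parts using hand-built ZipInfo objects. The helper describe_attributes and the member script.sh are hypothetical:

```python
import stat
import zipfile

def describe_attributes(info: zipfile.ZipInfo) -> dict:
    """Split ZipInfo.external_attr into its Unix and MS-DOS parts."""
    unix_mode = info.external_attr >> 16   # upper 16 bits: Unix st_mode (0 if not set)
    dos_flags = info.external_attr & 0xFF  # low byte: MS-DOS attribute flags
    return {
        "unix_permissions": stat.filemode(unix_mode) if unix_mode else None,
        "dos_directory": bool(dos_flags & 0x10),
        "dos_archive": bool(dos_flags & 0x20),
    }

# Hypothetical member: an executable Unix script (regular file, mode 755)
script = zipfile.ZipInfo("script.sh")
script.external_attr = (stat.S_IFREG | 0o755) << 16

# A member like the ones listed above: external_attr == 32 (0x20)
document = zipfile.ZipInfo("document.txt")
document.external_attr = 32

print(describe_attributes(script))
# {'unix_permissions': '-rwxr-xr-x', 'dos_directory': False, 'dos_archive': False}
print(describe_attributes(document))
# {'unix_permissions': None, 'dos_directory': False, 'dos_archive': True}
```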
Here is an example showing the use of .namelist():
import zipfile
with zipfile.ZipFile("geekpython.zip", "r") as files:
    for files_list in files.namelist():
        print(files_list)
.........
geek.txt
index.html
program.py
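.getinfo() and .read() raise KeyError for a name that is not in the archive. .namelist() is a convenient way to check membership first. Here is a sketch with an in-memory archive and hypothetical member names:

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("geek.txt", "hello")

with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    for wanted in ["geek.txt", "missing.txt"]:
        if wanted in names:
            print(wanted, archive.getinfo(wanted).file_size, "bytes")  # geek.txt 5 bytes
        else:
            print(wanted, "is not in the archive")

    # Without the check, getinfo() raises KeyError
    try:
        archive.getinfo("missing.txt")
        missing_raises = False
    except KeyError:
        missing_raises = True
```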
Reading and Writing Member files
Member files are the files stored inside a ZIP archive.
To read the content of a member file without extracting it, we use .read(). It takes two arguments: name, the name of the file in the archive, and an optional pwd, the password (as bytes) used for encrypted files.
import zipfile
with zipfile.ZipFile("geekpython.zip", "r") as zif:
    for lines in zif.read("intro.txt").split(b"\r\n"):
        print(lines)
.........
b'Hey, Welcome to GeekPython!'
b''
b'Are you enjoying it?'
b''
b"Now it's time, see you later!"
b''
We've added .split() to break the stream of bytes into lines using the separator \r\n. The separator has the b prefix because we are working with a bytes object.
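.read() returns bytes, so splitting on b"\r\n" only works for files with Windows line endings. An alternative is to decode the bytes first and call str.splitlines(), which handles \n, \r\n and \r alike. This sketch assumes UTF-8; use the file's actual encoding. The member content is hypothetical:

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # Hypothetical member with Windows-style line endings
    archive.writestr("intro.txt", b"Hey, Welcome to GeekPython!\r\n\r\nAre you enjoying it?")

with zipfile.ZipFile(buffer) as archive:
    text = archive.read("intro.txt").decode("utf-8")

lines = text.splitlines()
for line in lines:
    print(line)
```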
Besides .read(), we can use .open(), which returns a file-like object for reading a member or writing a new one. Just like the built-in open() function, it implements the context manager protocol and therefore supports the with statement.
import zipfile
with zipfile.ZipFile("sample.zip", "r") as my_zip:
    with my_zip.open("document.txt", "r") as data:
        for text in data:
            print(text)
.........
b'Hey, I a document file inside the sample.zip folder.\r\n'
b'\r\n'
b'Are you enjoying it.'
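As the output shows, iterating over the object returned by .open() yields lines as bytes. To get str lines instead, you can wrap the member in io.TextIOWrapper. This sketch assumes the member is UTF-8 encoded; the content is hypothetical:

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("document.txt", "first line\nsecond line\n")

with zipfile.ZipFile(buffer) as archive:
    # TextIOWrapper decodes the binary member stream into text lines
    with io.TextIOWrapper(archive.open("document.txt"), encoding="utf-8") as text:
        lines = [line.rstrip("\n") for line in text]

print(lines)  # ['first line', 'second line']
```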
We can also use .open() with write mode w to create a new member file and write content to it. Opening the archive itself in append mode a adds that member to the existing archive.
import zipfile
with zipfile.ZipFile("sample.zip", "a") as my_zip:
    with my_zip.open("file.txt", "w") as data_file:
        data_file.write(b"Hi, I am a new file.")
with zipfile.ZipFile("sample.zip", mode="r") as archive:
    archive.printdir()
    print("-" * 20)
    for line in archive.read("file.txt").split(b"\n"):
        print(line)
.........
File Name Modified Size
data.txt 2022-07-04 18:17:30 37538
hello.md 2022-07-04 18:33:02 7064
document.txt 2022-07-06 17:08:36 76
file.txt 1980-01-01 00:00:00 20
--------------------
b'Hi, I am a new file.'
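Notice that file.txt shows a modification date of 1980-01-01 00:00:00. When .open() receives just a name in write mode, zipfile creates a ZipInfo with the default timestamp, which is the earliest date the ZIP format can represent. One way to record a real timestamp is to pass a ZipInfo you build yourself. Here is a sketch using the current local time and an in-memory archive:

```python
import io
import time
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # Build the member's metadata ourselves instead of passing a bare name
    info = zipfile.ZipInfo("file.txt", date_time=time.localtime()[:6])
    with archive.open(info, "w") as member:
        member.write(b"Hi, I am a new file.")

with zipfile.ZipFile(buffer) as archive:
    stored = archive.getinfo("file.txt").date_time

# The current local date and time (ZIP stores seconds with 2-second precision)
print(stored)
```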
Extracting the ZIP archive
There are two methods for extracting a ZIP archive:
- .extractall(): extracts all members of the archive into the current working directory. We can also pass the path of a directory of our choice.
import zipfile
with zipfile.ZipFile("geekpython.zip", "r") as file:
    file.extractall("files")
All the member files will be extracted into a folder named files in your current working directory. You can pass any other directory path instead.
- .extract(): extracts a single member from the archive to the current working directory. Keep in mind that you need to specify the member's full name or pass a ZipInfo object.
import zipfile
with zipfile.ZipFile("geekpython.zip", "r") as file:
    file.extract("hello.txt")
hello.txt will be extracted from the archive to the current working directory. To extract it somewhere else, pass path="output_directory/" as an argument to extract().
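Both methods accept a few more options.

* extractall() takes an optional members argument, a subset of the names from namelist(), so you can extract only some files.
* extract() returns the normalized path of the file it created.

Here is a sketch of both, using a temporary directory and hypothetical member names:

```python
import io
import os
import tempfile
import zipfile

# In-memory archive with hypothetical members
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "Hello!")
    archive.writestr("program.py", "print('hi')")
    archive.writestr("notes/intro.txt", "nested text file")

with tempfile.TemporaryDirectory() as tmp, zipfile.ZipFile(buffer) as archive:
    # extractall(): only the members listed in `members` are extracted
    texts_dir = os.path.join(tmp, "texts")
    txt_members = [name for name in archive.namelist() if name.endswith(".txt")]
    archive.extractall(path=texts_dir, members=txt_members)
    extracted = sorted(
        os.path.relpath(os.path.join(root, name), texts_dir).replace(os.sep, "/")
        for root, _, names in os.walk(texts_dir)
        for name in names
    )

    # extract(): returns the normalized path of the file it created
    created = archive.extract("hello.txt", path=os.path.join(tmp, "output_directory"))
    with open(created) as f:
        content = f.read()
    created_name = os.path.basename(created)

print(extracted)     # ['hello.txt', 'notes/intro.txt']
print(created_name)  # hello.txt
print(content)       # Hello!
```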
Creating ZIP files
Creating a ZIP file simply means writing existing files into a new archive opened in w mode.
import zipfile
# Creating archive using zipfile module
files = ["hello.txt", "geek.md", "python.txt"]
with zipfile.ZipFile("archive_created.zip", "w") as archive:
    for file in files:
        archive.write(file)
Or you can add the files by calling .write() with each file name directly:
import zipfile
with zipfile.ZipFile("another_archive.zip", "w") as archive:
    archive.write("hello.txt")
    archive.write("geek.md")
    archive.write("python.txt")
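To archive a whole folder with zipfile, you can walk the folder and call .write() for each file. Using arcname stores paths relative to the folder. Here is a sketch using pathlib.Path.rglob; the folder layout is hypothetical and is built in a temporary directory:

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical folder to archive
    source = Path(tmp) / "files"
    (source / "nested").mkdir(parents=True)
    (source / "hello.txt").write_text("hello")
    (source / "nested" / "geek.md").write_text("# geek")

    zip_path = Path(tmp) / "folder_archive.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store paths relative to the folder, e.g. "nested/geek.md"
                archive.write(path, arcname=path.relative_to(source).as_posix())

    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()

print(names)  # ['hello.txt', 'nested/geek.md']
```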
Creating ZIP files using shutil
We can also use shutil, Python's module for high-level file operations, which provides an easy way to make a ZIP archive.
import shutil
shutil.make_archive("archive", "zip", "files")
Here, archive is the base name of the archive to be created, and zip is the archive format, which also adds the .zip extension to the file name. files is the folder whose contents will be archived.
Unpacking the ZIP archive using shutil
import shutil
shutil.unpack_archive("archive.zip", "archive")
Here, archive.zip is the ZIP archive, and archive is the directory its contents will be extracted into.
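make_archive() returns the path of the archive it created. unpack_archive() creates the extraction directory if needed. Here is a round-trip sketch in a temporary directory, with hypothetical folder and file names:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical folder to archive
    source = os.path.join(tmp, "files")
    os.makedirs(source)
    with open(os.path.join(source, "hello.txt"), "w") as f:
        f.write("hello")

    # base_name has no extension; make_archive appends ".zip" for the "zip" format
    archive_path = shutil.make_archive(os.path.join(tmp, "archive"), "zip", source)

    # Unpack into a new directory
    target = os.path.join(tmp, "unpacked")
    shutil.unpack_archive(archive_path, target)
    restored = sorted(os.listdir(target))
    archive_name = os.path.basename(archive_path)

print(archive_name)  # archive.zip
print(restored)      # ['hello.txt']
```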
Compressing ZIP files
By default, a ZIP archive created with zipfile is actually uncompressed, because zipfile uses the ZIP_STORED method. That method simply stores the member files in the archive as-is.
To compress the data, we need to pass the compression argument to ZipFile.
There are three constants for compressing files:
zipfile.ZIP_DEFLATED - uses the deflate compression method; requires the zlib module.
zipfile.ZIP_BZIP2 - uses the BZIP2 compression method; requires the bz2 module.
zipfile.ZIP_LZMA - uses the LZMA compression method; requires the lzma module.
import zipfile
with zipfile.ZipFile("compressed.zip", "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.write("geek.md")
    archive.write("python.txt")
    archive.write("hello.txt")
import zipfile
with zipfile.ZipFile("bzip_compressed.zip", "w", compression=zipfile.ZIP_BZIP2) as archive:
    archive.write("geek.md")
    archive.write("python.txt")
    archive.write("hello.txt")
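We can also set a compression level with the compresslevel argument. With ZIP_DEFLATED, it accepts values from 0 to 9, where 9 gives maximum compression. With ZIP_BZIP2, the accepted range is 1 to 9.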
import zipfile
with zipfile.ZipFile(
    "max_compressed.zip",
    "w",
    compression=zipfile.ZIP_DEFLATED,
    compresslevel=9,
) as archive:
    archive.write("geek.md")
    archive.write("python.txt")
    archive.write("hello.txt")
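To see the effect of the compression setting, you can compare file_size and compress_size for the same data stored with ZIP_STORED and ZIP_DEFLATED. This sketch uses deliberately repetitive, hypothetical data, which compresses very well; real-world ratios depend on the data:

```python
import io
import zipfile

data = b"GeekPython " * 1000  # 11,000 bytes of highly repetitive data

def stored_sizes(compression, **kwargs):
    """Write `data` into an in-memory archive and return (file_size, compress_size)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression, **kwargs) as archive:
        archive.writestr("sample.txt", data)
    with zipfile.ZipFile(buffer) as archive:
        info = archive.getinfo("sample.txt")
        return info.file_size, info.compress_size

print(stored_sizes(zipfile.ZIP_STORED))                     # (11000, 11000)
print(stored_sizes(zipfile.ZIP_DEFLATED, compresslevel=9))  # compressed size is far smaller
```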
Did you know that zipfile can also be run from the command line?
Run zipfile from Command Line Interface
Here are some options that allow us to list, create, extract, and test ZIP archives from the command line.
-l or --list: List files in a zipfile.
python -m zipfile -l data.zip
.........
File Name Modified Size
Streamlit-Apps-master/ 2022-06-30 23:31:36 0
Streamlit-Apps-master/Covid-19.csv 2022-06-30 23:31:36 1924
Streamlit-Apps-master/Covid-Banner.png 2022-06-30 23:31:36 538140
Streamlit-Apps-master/Procfile 2022-06-30 23:31:36 40
Streamlit-Apps-master/Readme.md 2022-06-30 23:31:36 901
Streamlit-Apps-master/WebAppPreview.png 2022-06-30 23:31:36 145818
Streamlit-Apps-master/app.py 2022-06-30 23:31:36 3162
Streamlit-Apps-master/requirements.txt 2022-06-30 23:31:36 46
Streamlit-Apps-master/setup.sh 2022-06-30 23:31:36 220
It works just like .printdir().
-c or --create: Create a zipfile from source files.
python -m zipfile --create shell.zip python.txt hello.txt
It will create a ZIP archive named shell.zip and add the files listed after it.
Creating a ZIP file to archive the entire directory
python -m zipfile --create directory.zip source/
python -m zipfile -l directory.zip
.........
File Name Modified Size
source/ 2022-07-07 17:22:42 0
source/archive/ 2022-07-07 10:58:06 0
source/archive/hello.txt 2022-07-07 10:58:06 62
source/archive/index.html 2022-07-07 10:58:06 176
source/archive/intro.txt 2022-07-07 10:58:06 86
source/archive/program.py 2022-07-07 10:58:06 136
source/geek.md 2022-07-07 15:50:28 45
source/hello.txt 2022-07-07 12:21:22 61
-e or --extract: Extract a zipfile into the target directory.
python -m zipfile --extract directory.zip extracted/
directory.zip will be extracted into the extracted directory.
-t or --test: Test whether the zipfile is valid.
python -m zipfile --test bad_sample.zip
.........
Traceback (most recent call last):
...
BadZipFile: File is not a zip file
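From Python code, a similar integrity check is available through ZipFile.testzip(). It reads every member, checks CRCs and file headers, and returns the name of the first bad file, or None if all members are fine. A file that is not a ZIP archive at all still raises BadZipFile in the ZipFile constructor, so both cases need handling. Here is a sketch with a hypothetical helper named check_archive:

```python
import io
import zipfile

def check_archive(source):
    """Return 'ok', the name of the first corrupt member, or 'not a zip file'."""
    try:
        with zipfile.ZipFile(source) as archive:
            bad_member = archive.testzip()
    except zipfile.BadZipFile:
        return "not a zip file"
    return "ok" if bad_member is None else bad_member

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "Hello!")

print(check_archive(buffer))                              # ok
print(check_archive(io.BytesIO(b"not a zip file" * 10)))  # not a zip file
```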
Conclusion
Phew, that was a long module to cover, and this article still hasn't covered everything.
However, it is enough to get started with the zipfile module and manipulate ZIP archives without extracting them.
ZIP files have several benefits: they save disk space, speed up transfers over a network, and more.
We learned some useful operations that we can perform on ZIP archives with the zipfile module, such as:
Reading, writing, and extracting existing ZIP archives
Reading the metadata
Creating ZIP archives
Manipulating member files
Running zipfile from the command line
That's all for now.
Keep Coding!