3. Modules
Modules are reusable libraries of code in Python. Python comes with many standard library modules.
A module is imported using the import
statement.
>>> import time
>>> print(time.asctime())
'Fri Mar 30 12:59:21 2012'
In this example, we’ve imported the time module and called the asctime
function from that module, which returns current time as a string.
There is also another way to use the import statement.
>>> from time import asctime
>>> asctime()
'WED May 06 16:50:37 2020'
Here we just imported the asctime
function from the time module.
The pydoc
command provides help on any module or a function.
$ pydoc time
Help on module time:
NAME
time - This module provides various functions to manipulate time values.
...
$ pydoc time.asctime
Help on built-in function asctime in time:
time.asctime = asctime(...)
asctime([tuple]) -> string
...
On Windows, the pydoc
command is not available. The work-around is to use, the built-in help function.
>>> help('time')
Help on module time:
NAME
time - This module provides various functions to manipulate time values.
...
Writing our own modules is very simple.
For example, create a file called num.py with the following content.
def square(x):
return x * x
def cube(x):
return x * x * x
Now open Python interpreter:
>>> import num
>>> num.square(3)
9
>>> num.cube(3)
27
That's all we’ve written a python library.
Try pydoc num (pydoc.bat numbers on Windows) to see documentation for this numbers modules. It won’t have any documentation as we haven’t provided anything yet.
In Python, it is possible to associate documentation for each module, function using docstrings. Docstrings are strings written at the top of the module or at the beginning of a function.
Lets try to document our num
module by changing the contents of num.py
"""The num module provides utilties to work on numbers.
Current it provides square and cube.
"""
def square(x):
"""Computes square of a number."""
return x * x
def cube(x):
"""Computes cube of a number."""
return x * x
The pydoc command will now show us the doumentation nicely formatted.
Help on module num:
NAME
num - The num module provides utilties to work on numbers.
FILE
/Users/anand/num.py
DESCRIPTION
Current it provides square and cube.
FUNCTIONS
cube(x)
Computes cube of a number.
square(x)
Computes square of a number.
Under the hood, python stores the documentation as a special field called doc.
>>> import os
>>> print(os.getcwd.__doc__)
getcwd() -> path
Return a string representing the current working directory.
3.1. Standard Library
Python comes with many standard library modules. Lets look at some of the most commonly used ones.
3.1.1. os module
The os
and os.path
modules provides functionality to work with files, directories etc.
Problem 1: Write a program to list all files in the given directory.
Problem 2: Write a program extcount.py
to count number of files for each extension in the given directory. The program should take a directory name as argument and print count and extension for each available file extension.
$ python extcount.py src/
14 py
4 txt
1 csv
Problem 3: Write a program to list all the files in the given directory along with their length and last modification time. The output should contain one line for each file containing filename, length and modification date separated by tabs.
Hint: see help for os.stat
.
Problem 4: Write a program to print directory tree. The program should take path of a directory as argument and print all the files in it recursively as a tree.
$ python dirtree.py foo
foo
|-- a.txt
|-- b.txt
|-- code
| |-- a.py
| |-- b.py
| |-- docs
| | |-- a.txt
| | \-- b.txt
| \-- x.py
\-- z.txt
3.1.2. urllib module
The urllib module provides functionality to download webpages.
>>> from urllib.request import urlopen
>>> response = urlopen("http://python.org/")
>>> print(response.headers)
Date: Fri, 30 Mar 2012 09:24:55 GMT
Server: Apache/2.2.16 (Debian)
Last-Modified: Fri, 30 Mar 2012 08:42:25 GMT
ETag: "105800d-4b7b-4bc71d1db9e40"
Accept-Ranges: bytes
Content-Length: 19323
Connection: close
Content-Type: text/html
X-Pad: avoid browser bug
>>> response.header['Content-Type']
'text/html'
>>> content = request.read()
Problem 5: Write a program wget.py to download a given URL. The program should accept a URL as argument, download it and save it with the basename of the URL. If the URL ends with a /, consider the basename as index.html.
$ python wget.py http://docs.python.org/tutorial/interpreter.html
saving http://docs.python.org/tutorial/interpreter.html as interpreter.html.
$ python wget.py http://docs.python.org/tutorial/
saving http://docs.python.org/tutorial/ as index.html.
3.1.3. re module
Problem 6: Write a program antihtml.py that takes a URL as argument, downloads the html from web and print it after stripping html tags.
$ python antihtml.py index.html
...
The Python interpreter is usually installed as /usr/local/bin/python on
those machines where it is available; putting /usr/local/bin in your
...
Problem 7: Write a function make_slug that takes a name converts it into a slug. A slug
is a string where spaces and special characters are replaced by a hyphen, typically used to create blog post URL from post title. It should also make sure there are no more than one hyphen in any place and there are no hyphens at the beginning and end of the slug.
>>> make_slug("hello world")
'hello-world'
>>> make_slug("hello world!")
'hello-world'
>>> make_slug(" --hello- world--")
'hello-world'
Problem 8: Write a program links.py that takes URL of a webpage as argument and prints all the URLs linked from that webpage.
Problem 9: Write a regular expression to validate a phone number.
3.1.4. json module
Problem 10: Write a program myip.py to print the external IP address of the machine. Use the response from http://httpbin.org/
get and read the IP address from there. The program should print only the IP address and nothing else.
3.1.5. zipfile module
The zipfile module provides interface to read and write zip files.
Here are some examples to demonstate the power of zipfile module.
The following example prints names of all the files in a zip archive.
import zipfile
z = zipfile.ZipFile("a.zip")
for name in z.namelist():
print(name)
The following example prints each file in the zip archive.
import zipfile
z = zipfile.ZipFile("a.zip")
for name in z.namelist():
print()
print("FILE:", name)
print()
print(z.read(name))
Problem 11: Write a python program zip.py to create a zip file. The program should take name of zip file as first argument and files to add as rest of the arguments.
$ python zip.py foo.zip file1.txt file2.txt
Problem 12: Write a program mydoc.py to implement the functionality of pydoc. The program should take the module name as argument and print documentation for the module and each of the functions defined in that module.
$ python mydoc.py os
Help on module os:
DESCRIPTION
os - OS routines for Mac, NT, or Posix depending on what system we're on.
...
FUNCTIONS
getcwd()
...
Hints:
-The dir function to get all entries of a module
-The inspect.isfunction function can be used to test if given object is a function
-x.doc gives the docstring for x.
-The import function can be used to import a module by name
3.2. Installing third-party modules
PyPI, The Python Package Index maintains the list of Python packages available. The third-party module developers usually register at PyPI and uploads their packages there.
The standard way to installing a python module is using pip
or easy_install
. Pip is more modern and preferred.
Lets start with installing easy_install
.
Download the easy_install install script ez_setup.py.
Run it using Python.
That will install easy_install
, the script used to install third-party python packages.
Before installing new packages, lets understand how to manage virtual environments for installing python packages.
Earlier the only way of installing python packages was system wide. When used this way, packages installed for one project can conflict with other and create trouble. So people invented a way to create isolated Python environment to install packages. This tool is called virtualenv
.
To install virtualenv
:
$ easy_install virtualenv
Installing virtualenv also installs the pip command, a better replace for easy_install.
Once it is installed, create a new virtual env by running the virtualenv
command.
$ virtualenv testenv
Now to switch to that env.
On UNIX/Mac OS X:
$ source testenv/bin/activate
On Windows:
> testenv\Scripts\activate
Now the virtualenv testenv is activated.
Now all the packages installed will be limited to this virtualenv. Lets try to install a third-party package.
$ pip install tablib
This installs a third-party library called tablib
.
The tablib
library is a small little library to work with tabular data and write csv and Excel files.
Here is a simple example.
# create a dataset
data = tablib.Dataset()
# Add rows
data.append(["A", 1])
data.append(["B", 2])
data.append(["C", 3])
# save as csv
with open('test.csv', 'wb') as f:
f.write(data.csv)
# save as Excel
with open('test.xls', 'wb') as f:
f.write(data.xls)
# save as Excel 07+
with open('test.xlsx', 'wb') as f:
f.write(data.xlsx)
It is even possible to create multi-sheet excel files.
sheet1 = tablib.Dataset()
sheet1.append(["A1", 1])
sheet1.append(["A2", 2])
sheet2 = tablib.Dataset()
sheet2.append(["B1", 1])
sheet2.append(["B2", 2])
book = tablib.Databook([data1, data2])
with open('book.xlsx', 'wb') as f:
f.write(book.xlsx)
Problem 13: Write a program csv2xls.py
that reads a csv file and exports it as Excel file. The program should take two arguments. The name of the csv file to read as first argument and the name of the Excel file to write as the second argument.
Problem 14: Create a new virtualenv and install BeautifulSoup. BeautifulSoup is very good library for parsing HTML. Try using it to extract all HTML links from a webpage.
Read the BeautifulSoup documentation to get started.
Top comments (2)
Great stuff Py-lady
Thanks.. Just trying