DEV Community

Chris White


Network Connection Fundamentals With Python

When you access this site, there's quite a lot going on. In today's cloud-centric world much of this low-level communication is abstracted away. In this series we'll be looking at some of the foundations of network communication, starting with how basic connections work.

Client and Server

To start out we'll take a basic server that takes whatever is sent to it and returns it in upper case. It uses Python's built-in socketserver module to handle the details, and the example is adapted from that module's documentation:

import socketserver

class MyTCPHandler(socketserver.StreamRequestHandler):
    """
    The request handler class for our server.

    It is instantiated once per connection to the server, and must
    override the handle() method to implement communication to the
    client.
    """

    def handle(self):
        # self.request is the TCP socket connected to the client
        self.data = self.rfile.readline().strip()
        print("{} wrote:".format(self.client_address))
        print(self.data)
        # just send back the same data, but upper-cased
        self.request.sendall(self.data.upper())

if __name__ == "__main__":
    HOST, PORT = "localhost", 5555

    # Create the server, binding to localhost on port 5555
    with socketserver.TCPServer((HOST, PORT), MyTCPHandler) as server:
        # Activate the server; this will keep running until you
        # interrupt the program with Ctrl-C
        server.serve_forever()

The first important part here is the server binding to an address:

with socketserver.TCPServer((HOST, PORT), MyTCPHandler) as server:

This binds the server to a specific port (5555), so the operating system reserves that port for the program until it shuts down. It also registers a handler class, which is instantiated whenever a client connects and decides what is done with the request. In this case the handler is a StreamRequestHandler subclass, which exposes the client's connection as file-like objects (rfile for reading and wfile for writing):

self.rfile.readline().strip()

The handler here reads in a single line. Note that in the real world, where you don't know who is sending data, someone could simply never send a newline and the connection would be stuck open. With the plain TCPServer used here, which handles one connection at a time, a single such client is enough to block everyone else. Even on a server that handles connections concurrently, magnify this and soon too many connections are tied up, to the point of potential resource exhaustion: effectively a Denial of Service (DoS) attack.
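
One common mitigation is to put a deadline and a size limit on that read. Below is a minimal sketch under stated assumptions: the 2-second timeout and 1024-byte line limit are illustrative values, and TimeoutUpperHandler and start_server are hypothetical names. StreamRequestHandler applies its timeout class attribute to the client socket in setup(), and readline() accepts a maximum number of bytes to read. The sketch also swaps in ThreadingTCPServer, so a stalled client only ties up its own thread rather than the whole server.

```python
import socket
import socketserver
import threading


class TimeoutUpperHandler(socketserver.StreamRequestHandler):
    # StreamRequestHandler.setup() passes this to settimeout() on the client socket
    timeout = 2  # seconds; illustrative value
    MAX_LINE = 1024  # hypothetical cap on the bytes accepted for one line

    def handle(self):
        try:
            # readline() stops after MAX_LINE bytes even if no newline arrived
            line = self.rfile.readline(self.MAX_LINE)
        except socket.timeout:
            # The client went quiet; returning lets the server close the connection
            return
        self.wfile.write(line.strip().upper())


def start_server(host="localhost", port=0):
    """Run a threaded server in the background; port 0 lets the OS pick a free port."""
    server = socketserver.ThreadingTCPServer((host, port), TimeoutUpperHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A well-behaved client gets its upper-cased line back. A client that never sends anything has its connection closed once the timeout expires, and its recv() then returns b"".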

While readline is what's shown, on the back-end it's actually doing a series of recv calls. Each call returns up to a given number of bytes, and the calls continue until a newline arrives (or the client closes the connection). Similarly, sendall keeps calling send until all of the data has been sent. Now on the client side:

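import socket

# The server uses readline(), so the message must end with a newline;
# otherwise the server keeps waiting for more data and both sides hang
MSG = bytearray("Hello World\n", 'utf-8')

connection = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
connection.connect(("127.0.0.1", 5555))
# Our own (IP, port): the port is a dynamic one picked by the OS
print(connection.getsockname())
connection.send(MSG)
# recv() returns at most this many bytes
result = connection.recv(len(MSG.upper()))
print(result)
connection.close()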

One thing to note here is that networking deals with bytes at a low level. bytearray is a built-in Python type that holds a mutable sequence of bytes; constructing one from a string and a character encoding (here UTF-8) encodes the string's characters into those bytes.
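
To see what that encoding step produces, here is a short sketch comparing bytearray with the more common str.encode(), which returns an immutable bytes object instead. The accented string is an arbitrary example, chosen to show that UTF-8 can use more than one byte per character.

```python
# bytearray(str, encoding) and str.encode(encoding) produce the same byte values;
# bytearray is simply the mutable version
text = "Hello World"
as_bytearray = bytearray(text, "utf-8")
as_bytes = text.encode("utf-8")
print(as_bytearray == as_bytes)  # True
print(list(as_bytes[:5]))        # [72, 101, 108, 108, 111]

# ASCII characters take one byte each in UTF-8; others can take more
accented = "café"
encoded = accented.encode("utf-8")
print(len(accented), len(encoded))  # 4 5
print(encoded)                      # b'caf\xc3\xa9'
print(encoded.decode("utf-8"))      # café
```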

socket.socket(socket.AF_INET, socket.SOCK_STREAM)

This creates a socket; the connection itself is made by the connect() call that follows. AF_INET indicates we'll be dealing with IPv4 (Internet Protocol version 4) addresses, and SOCK_STREAM requests a stream socket, which for AF_INET means a TCP (Transmission Control Protocol) connection. Together this means we're connecting via TCP/IP.

connection.send(MSG)
result = connection.recv(len(MSG.upper()))

Here the message is sent, then we switch to receiving mode to get the data sent back from the server. recv takes the maximum number of bytes to read. Since we know what will come back (the message in upper case), we can use the length of the upper-case version to size the read. Now after running everything together:

> python .\server.py
('127.0.0.1', 57990) wrote:
b'Hello World'

> python client.py
('127.0.0.1', 57990)
b'HELLO WORLD'

The two way connection is complete.
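
As mentioned earlier, readline and sendall hide loops over recv and send. Below is a rough sketch of those loops using hypothetical helpers, recv_line and send_all. Unlike the real buffered rfile, this version discards any bytes received after the newline, so it only suits one line per connection.

```python
import socket


def recv_line(sock, chunk_size=1024, max_bytes=65536):
    """Call recv() until a newline or EOF arrives; return the line without the newline."""
    buffer = bytearray()
    while b"\n" not in buffer:
        chunk = sock.recv(chunk_size)  # returns at most chunk_size bytes
        if not chunk:  # b"" means the peer closed its side of the connection
            break
        buffer.extend(chunk)
        if len(buffer) > max_bytes:  # refuse unbounded lines (see the DoS note)
            raise ValueError("line too long")
    line, _, _rest = bytes(buffer).partition(b"\n")  # _rest is dropped here
    return line


def send_all(sock, data):
    """Call send() repeatedly, since one send() may transmit only part of the data."""
    view = memoryview(data)
    while view:
        sent = sock.send(view)
        view = view[sent:]


if __name__ == "__main__":
    # socketpair() returns two already-connected sockets, handy for a local demo
    left, right = socket.socketpair()
    with left, right:
        send_all(left, b"Hello World\n")
        print(recv_line(right))  # b'Hello World'
```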

Ports

Port numbers are specified by the Internet Engineering Task Force (IETF) in Request For Comments (RFC) 6335, which describes how port numbers and service names are managed. Service names are labels for specific ports, and their registry is managed by IANA (Internet Assigned Numbers Authority). The IANA website holds the current mapping of service names to ports. Operating systems often use this to provide user-friendly names for these well-known ports. Linux, for example, often stores this listing in /etc/services:

tcpmux          1/tcp                           # TCP port service multiplexer
echo            7/tcp
echo            7/udp
discard         9/tcp           sink null
discard         9/udp           sink null
systat          11/tcp          users
daytime         13/tcp
daytime         13/udp
netstat         15/tcp
qotd            17/tcp          quote
chargen         19/tcp          ttytst source
chargen         19/udp          ttytst source
ftp-data        20/tcp
ftp             21/tcp
fsp             21/udp          fspd
ssh             22/tcp                          # SSH Remote Login Protocol
telnet          23/tcp
smtp            25/tcp          mail
time            37/tcp          timserver
time            37/udp          timserver
whois           43/tcp          nicname
tacacs          49/tcp                          # Login Host Protocol (TACACS)
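
The format is simple enough to parse by hand. Each line has a service name, a port/protocol pair, optional aliases, and an optional # comment. Here is a small sketch that reads text in that format. The parse_services name and the sample text are illustrative; on Linux you could pass it the contents of /etc/services instead.

```python
def parse_services(text):
    """Map (port, protocol) -> [name, *aliases] from /etc/services-style text."""
    services = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:  # skip blank or comment-only lines
            continue
        name, port_proto, *aliases = line.split()
        port, protocol = port_proto.split("/")
        services[(int(port), protocol)] = [name, *aliases]
    return services


SAMPLE = """\
ssh             22/tcp                          # SSH Remote Login Protocol
smtp            25/tcp          mail
time            37/udp          timserver
"""

services = parse_services(SAMPLE)
print(services[(22, "tcp")])  # ['ssh']
print(services[(25, "tcp")])  # ['smtp', 'mail']
```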

Python's built-in socket module even has getservbyname and getservbyport functions to work with this information:

>>> import socket
>>> print(socket.getservbyname('http'))
80
>>> print(socket.getservbyport(80))
http

There are also designations for port ranges. Ports 0-1023 are system ports. On *NIX systems, running a server on one requires root/privileged access (on Linux, or a capability such as CAP_NET_BIND_SERVICE), and without it you'll get a permission denied error. Windows, by contrast, doesn't enforce this restriction for ordinary sockets. The restriction exists because you wouldn't want, say, a random user putting up their own SSH server.

Ports 1024-49151 are meant for non-admin users to allow them to run services. This is why binding to 5555 doesn't require administrative access. It's also the reason many web applications that are being installed in a local environment tend to use ports such as 8080 or 8888 so users don't have to worry about their admin permissions. Finally there are "dynamic ports" which can be seen in the output:

('127.0.0.1', 57990) wrote:

These ports are assigned by the operating system to the client side of connections. A TCP connection is identified by the address and port on both ends, so without one the server would have no way to send data back to the client. In essence, a dynamic port lets the client act as a "server" of sorts for the duration of the connection. While the RFC lists these ports as 49152-65535, the actual range is OS specific and in some cases can be configured. Later versions of Windows use the IANA recommendation, while my Ubuntu instance has:

# sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 32768    60999
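
The three ranges above (0-1023 system, 1024-49151 user, 49152-65535 dynamic) are easy to encode. Binding to port 0 is also a quick way to watch the operating system hand out a port. Below is a sketch using a hypothetical port_range helper. The client's port comes from the OS-specific range, so on Linux it may land in what IANA calls the user range.

```python
import socket


def port_range(port):
    """Classify a port using the IANA ranges: system, user, or dynamic."""
    if not 0 <= port <= 65535:
        raise ValueError(f"not a valid port: {port}")
    if port <= 1023:
        return "system"
    if port <= 49151:
        return "user"
    return "dynamic"


# Binding to port 0 asks the OS for any free port
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    server_port = listener.getsockname()[1]

    # connect() completes via the listen backlog, even before accept() is called
    with socket.create_connection(("127.0.0.1", server_port)) as client:
        # The client's side of the connection gets an OS-chosen ephemeral port
        client_port = client.getsockname()[1]
        print("server port:", server_port, port_range(server_port))
        print("client port:", client_port, port_range(client_port))
```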

IP Address

As mentioned previously, network communication works in bytes. So what about IP addresses? Are we simply turning the string "127.0.0.1" into a series of bytes? It turns out that an IPv4 address is a human-readable way of writing a 32-bit number, where each .-separated value is an 8-bit (1 byte) segment:

[8 bits].[8 bits].[8 bits].[8 bits]

Each 1-byte segment holds the binary version of the decimal number written for that segment. socket.inet_aton can be used to showcase this:

>>> import socket
>>> ip_binary = socket.inet_aton('127.0.0.1')
>>> import struct
>>> struct.unpack('BBBB', ip_binary)
(127, 0, 0, 1)
>>> ip_binary
b'\x7f\x00\x00\x01'

ip_binary is a sequence of 4 bytes. The format 'BBBB' tells struct.unpack to read 4 unsigned chars (what B represents), each 1 byte with a value from 0-255, matching the range of allowed values for each IPv4 segment. IPv6 is a bit more complicated; a full example looks something like this:

0123:4567:89ab:cdef:0123:4567:89ab:cdef

In this case segments are separated by :. Each segment has 4 hexadecimal (base 16) digits, each ranging from 0 (0000 in binary) to f (1111 in binary) and taking up 4 bits. That gives 16 bits per segment and, with 8 segments, a total of 128 bits, making IPv6 addresses 4 times the size of IPv4 addresses. Since socket.inet_aton only handles IPv4 addresses, socket.inet_pton is used instead, as it lets us specify the address family (AF_INET6 for IPv6):

>>> import socket
>>> socket.inet_pton(socket.AF_INET6, '0123:4567:89ab:cdef:0123:4567:89ab:cdef')
b'\x01#Eg\x89\xab\xcd\xef\x01#Eg\x89\xab\xcd\xef'

IP addresses in binary form can be passed to the constructor of one of the ipaddress module's classes to get an address object back:

>>> import ipaddress
>>> import socket
>>> ipv6_bytes = socket.inet_pton(socket.AF_INET6, '0123:4567:89ab:cdef:0123:4567:89ab:cdef')
>>> ipv4_bytes = socket.inet_aton('127.0.0.1')
>>> ipaddress.IPv4Address(ipv4_bytes)
IPv4Address('127.0.0.1')
>>> ipaddress.IPv6Address(ipv6_bytes)
IPv6Address('123:4567:89ab:cdef:123:4567:89ab:cdef')
>>> ipaddress.IPv6Address(ipv6_bytes).exploded
'0123:4567:89ab:cdef:0123:4567:89ab:cdef'

For IPv6 in particular, leading zeros in each segment are dropped from the output by default (0123 becomes 123), and a run of all-zero segments can be shortened to ::. The exploded property shows the full address with every 0 written out. There are also a few helper properties with useful information:

>>> ipaddress.IPv4Address(ipv4_bytes).is_loopback
True
>>> ipaddress.IPv4Address(ipv4_bytes).is_global
False
>>> ipaddress.IPv4Address(ipv4_bytes).is_private
True
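
Trying the same properties on a few other addresses helps show what they mean. The addresses below are arbitrary examples: 10.0.0.1 and 192.168.1.1 come from the private ranges, and 8.8.8.8 is on the public internet. ipaddress.ip_address picks IPv4Address or IPv6Address automatically. The compressed and exploded properties show the two ways of writing an IPv6 address.

```python
import ipaddress

for text in ["127.0.0.1", "10.0.0.1", "192.168.1.1", "8.8.8.8", "::1", "fe80::1"]:
    addr = ipaddress.ip_address(text)  # returns IPv4Address or IPv6Address
    print(f"{text:<12} v{addr.version} private={addr.is_private} "
          f"global={addr.is_global} loopback={addr.is_loopback}")

# IPv6: leading zeros in a segment are dropped, and one run of all-zero
# segments collapses to "::"
v6 = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:0000:0000:0001")
print(v6.compressed)     # 2001:db8::1
print(v6.exploded)       # 2001:0db8:0000:0000:0000:0000:0000:0001
print(v6.max_prefixlen)  # 128 bits, versus 32 for IPv4
```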

So what do is_private and is_global mean, and how can we tell? It turns out that IANA handles IP addresses as well. For IPv4, it mostly handles allocation by the first 8-bit value of an address. For example, if I look up one of the Google DNS IP addresses, 8.8.8.8, in IANA's IPv4 address space registry:

008/8   Administered by ARIN    1992-12     whois.arin.net  https://rdap.arin.net/registry
http://rdap.arin.net/registry   LEGACY

This shows it's administered by ARIN, the American Registry for Internet Numbers, which handles IP address allocation for most of North America. In other words, IANA acts as the allocation authority that decides which regional authority (Regional Internet Registry) a prefix goes to. The regional authorities are:

  • AFRINIC: Africa Region
  • APNIC: Asia/Pacific Region
  • ARIN: Canada, USA, and some Caribbean Islands
  • LACNIC: Latin America and some Caribbean Islands
  • RIPE NCC: Europe, the Middle East, and Central Asia
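
Because an IPv4 address is really a 32-bit number, the first 8-bit value that IANA allocates by is simply the top byte of that number. Below is a small sketch that computes 8.8.8.8 as an integer in three ways and checks which /8 block it falls in. The 8.0.0.0/8 block comes from the IANA entry above.

```python
import ipaddress
import socket
import struct

address = "8.8.8.8"

# 1. Shift each dotted segment into place by hand
a, b, c, d = (int(part) for part in address.split("."))
manual = (a << 24) | (b << 16) | (c << 8) | d

# 2. Unpack the 4 packed bytes as one big-endian ("!") unsigned 32-bit int ("I")
unpacked = struct.unpack("!I", socket.inet_aton(address))[0]

# 3. Let the ipaddress module do it
via_ipaddress = int(ipaddress.IPv4Address(address))

print(manual, unpacked, via_ipaddress)  # 134744072 134744072 134744072

# The /8 prefix IANA allocates by is the top 8 bits of that number
print(manual >> 24)  # 8
print(ipaddress.IPv4Address(address) in ipaddress.IPv4Network("8.0.0.0/8"))  # True
```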

Note that you generally get the best information if you search for an IP address using its regional whois service. For example, if I try to use ARIN whois to search for a Japanese IP address:

ARIN whois recommending that a Japanese IP address search be done using APNIC whois

It tells me I should use APNIC instead. Using the right regional service, I can get more information about IP address ownership. In Google DNS's case:

ARIN whois search showing 8.8.8.8 as Google owned

It shows ownership of the IP address by Google. You can also see which IP blocks an organization owns. For example, here is the list of Google-owned IP blocks. Unfortunately, this is also a useful tool for malicious actors: they find the IP blocks belonging to an organization and initiate mass scans on them. This is how EC2 instances on AWS end up continually being scanned en masse. Another rather awkward ownership situation is that some Early Registration Transfers moved certain IP blocks from one regional authority to another (RIPE to ARIN).

DNS

The Domain Name System (DNS) resolves a specific name to an IP address. It is powered by a global network of servers, along with a local override. The /etc/hosts file on *NIX systems and the C:\Windows\System32\drivers\etc\hosts file on Windows let you manually set a host name -> IP address mapping locally. As an example, on my Ubuntu instance:

127.0.0.1       localhost
127.0.0.1       gitserver

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback

This maps localhost to 127.0.0.1, and another entry does the same for gitserver so a local gitolite server is accessible with a more human-friendly name. DNS lookups happen very often, and network traffic that relies on a name can't proceed until that name has been resolved to an IP address. For performance, the results are therefore generally cached. As an example, here is one of the entries in my Windows DNS cache:

example.org
    ----------------------------------------
    Record Name . . . . . : example.org
    Record Type . . . . . : 1
    Time To Live  . . . . : 61854
    Data Length . . . . . : 4
    Section . . . . . . . : Answer
    A (Host) Record . . . : 93.184.216.34

This tells me example.org resolves to 93.184.216.34 (record type 1 is an A record). The time to live of 61854 seconds indicates this entry should be cached for around 17 hours. Note this value varies depending on what's backing a DNS entry. If a name isn't found in the local hosts file or cache, the lookup goes up a chain of servers to find out what the IP is. These can include:

  • Manually set servers, such as someone setting up Google DNS
  • The router, which generally forwards to your ISP
  • Your ISP
  • A server part of the global DNS network
  • A server specific to a domain name / organization
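
The caching described above boils down to storing each answer with an expiry time derived from its TTL. Below is a minimal sketch of that idea. The TTLCache class is hypothetical and far simpler than a real resolver cache. Its clock is injectable, so expiry can be demonstrated without actually waiting 17 hours.

```python
import time


class TTLCache:
    """Toy DNS-style cache: each answer is kept until its TTL (in seconds) runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (ip, expires_at)

    def put(self, name, ip, ttl):
        self._entries[name] = (ip, self._clock() + ttl)

    def get(self, name):
        """Return the cached IP, or None if missing or expired (time to ask upstream)."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]
            return None
        return ip


# A fake clock (a mutable list) lets us jump forward in time
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("example.org", "93.184.216.34", ttl=61854)
print(cache.get("example.org"))  # 93.184.216.34
print(round(61854 / 3600, 1))    # 17.2 (hours)
now[0] += 61854
print(cache.get("example.org"))  # None: expired, so look it up again
```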

It's worth noting that Python does provide a way to get IPs from hostnames:

>>> import socket
>>> dns_result = socket.getaddrinfo('google.com', 80)
>>> dns_result
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('142.250.191.238', 80)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('142.250.191.238', 80)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_RAW: 3>, 0, '', ('142.250.191.238', 80)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('2607:f8b0:4009:819::200e', 80, 0, 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('2607:f8b0:4009:819::200e', 80, 0, 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_RAW: 3>, 0, '', ('2607:f8b0:4009:819::200e', 80, 0, 0))]

But the returned values don't quite map to the way a user would usually expect to work with DNS. getaddrinfo goes through the operating system's resolver, returns only addresses rather than record types such as MX or TXT, and takes a port argument (although None can be passed). Thankfully, the dnspython package can query specific record types and present the results in a friendlier layout. Install it via pip with pip install dnspython:

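from dns.rdatatype import RdataType
from dns.resolver import resolve

# Try every record type dnspython knows about; most will have no answer for dev.to
for query_type in RdataType:
    try:
        answers = resolve('dev.to', query_type)
    except Exception:
        # No records of this type (NoAnswer), meta types that can't be queried
        # (NoMetaqueries), timeouts, etc. Unlike a bare except, this still lets
        # Ctrl-C interrupt the loop
        continue
    for rdata in answers:
        print(f"{RdataType.to_text(query_type)}:{rdata.to_text()}")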

Which will output:

> python .\dns_list.py
A:151.101.194.217
A:151.101.66.217
A:151.101.130.217
A:151.101.2.217
NS:josh.ns.cloudflare.com.
NS:jill.ns.cloudflare.com.
SOA:jill.ns.cloudflare.com. dns.cloudflare.com. 2309864129 10000 2400 604800 3600
MX:10 alt4.aspmx.l.google.com.
MX:5 alt1.aspmx.l.google.com.
MX:5 alt2.aspmx.l.google.com.
MX:10 alt3.aspmx.l.google.com.
MX:1 aspmx.l.google.com.
TXT:"v=spf1 a mx include:_spf.google.com include:sendgrid.net include:servers.mcsv.net include:shops.shopify.com ~all"
TXT:"facebook-domain-verification=1xzy1qk89qs7ngxdt5e4s0kvvqw701"
TXT:"_globalsign-domain-verification=VzRovTWhxjedMqXFfoiZ-UNRnlnuTXYHgjKemPNt33"
TXT:"google-site-verification=oTtYzW83zP_41DlUrb_VXtAjLTW1p71RBmWR2g5ctrk"

There are a number of interesting record types you'll find in many queries:

  • A records: Mapping of hosts -> IP addresses
  • AAAA records: Same but for IPv6
  • CNAME records: Used for aliases
  • TXT records: Simply text information, but commonly used for domain ownership verification purposes
  • NS records: Nameservers for the domain
  • MX records: Used to indicate email servers
  • SOA records: Required record that indicates the start of authority, with the primary nameserver, the administrative contact, and timing information for the zone
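
The SOA answer in the output above is a single line of space-separated fields. As defined in RFC 1035, those fields are, in order: the primary nameserver (MNAME), the administrator's mailbox (RNAME, where the first . stands in for @), a serial number, and the refresh, retry, expire, and minimum timers. Below is a sketch that labels them. parse_soa is a hypothetical helper; dnspython's SOA rdata objects already expose these values as attributes such as serial and refresh.

```python
def parse_soa(text):
    """Label the seven fields of an SOA record written in presentation format."""
    mname, rname, *numbers = text.split()
    serial, refresh, retry, expire, minimum = (int(n) for n in numbers)
    # The mailbox is written as a domain name: the first "." stands for "@"
    # (this simple version ignores escaped dots in the local part)
    local, _, domain = rname.rstrip(".").partition(".")
    return {
        "primary_ns": mname,
        "admin_email": f"{local}@{domain}",
        "serial": serial,
        "refresh": refresh,
        "retry": retry,
        "expire": expire,
        "minimum": minimum,
    }


soa = parse_soa("jill.ns.cloudflare.com. dns.cloudflare.com. 2309864129 10000 2400 604800 3600")
for field, value in soa.items():
    print(f"{field}: {value}")
```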

The record types listed above are the ones most commonly used or modified. That said, some services, such as Route 53 in AWS, dynamically return IP addresses for A and AAAA records based on certain conditions, anything from server load to geographic location. In the case of dev.to's DNS answers we can infer a few things:

  • They are using Cloudflare for DNS
  • Fastly is providing the CDN (Content Delivery Network)
  • Google's mail services are being used for email
  • SPF (Sender Policy Framework) allows SendGrid, Mailchimp, and Shopify to send email on behalf of dev.to
  • Domain ownership has been verified with Facebook, GlobalSign, and Google through TXT verification records
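
The SPF conclusion comes from reading the include: mechanisms in the v=spf1 TXT record. Below is a rough sketch that pulls them out of the record text in the output. spf_includes is a hypothetical helper that only looks at include: terms and the final all term. A real SPF evaluator, as specified in RFC 7208, does much more.

```python
def spf_includes(record):
    """Return (included domains, final 'all' term) from an SPF TXT record string."""
    terms = record.strip('"').split()
    if not terms or terms[0] != "v=spf1":
        raise ValueError("not an SPF record")
    includes = []
    for term in terms[1:]:
        mechanism = term.lstrip("+-~?")  # drop an optional qualifier
        if mechanism.startswith("include:"):
            includes.append(mechanism.split(":", 1)[1])
    # The "all" term covers every other sender:
    # "~all" = softfail, "-all" = fail, "+all" = pass, "?all" = neutral
    all_term = next((t for t in terms if t.lstrip("+-~?") == "all"), None)
    return includes, all_term


record = ('"v=spf1 a mx include:_spf.google.com include:sendgrid.net '
          'include:servers.mcsv.net include:shops.shopify.com ~all"')
includes, all_term = spf_includes(record)
print(includes)
print(all_term)  # ~all
```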

Given that AAAA records are not present, you wouldn't be able to connect directly using IPv6. The IP addresses in the A records are also interesting: they should technically fall under RIPE administration but look to be part of the Early Registration Transfer program (enough that a bug was filed with Red Hat about it). Also, if you try to visit one of the A record IP addresses directly:

Fastly error showing no setup

This is because a single IP address can host websites for multiple domains. The IP alone therefore isn't enough: the request also has to include the actual domain (in HTTP, via the Host header) so the server knows which site it's supposed to serve.
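
In HTTP/1.1, that domain is carried in the Host header of the request. Below is a sketch that builds such a request by hand, using a hypothetical helper named build_request. Over HTTPS the domain is also sent during the TLS handshake via SNI, which this plain-text example does not cover.

```python
def build_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request; Host tells a shared IP which site we want."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_request("dev.to"))
# b'GET / HTTP/1.1\r\nHost: dev.to\r\nConnection: close\r\n\r\n'
```

Sending these bytes with socket.create_connection((ip, 80)) and sendall() would reach the shared IP directly. Only the Host value tells the server which site is meant. This step isn't run here because it needs network access.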

Conclusion

This concludes a look into networking fundamentals in Python. While much of this information is abstracted away in modern-day cloud computing, it's still interesting to know what goes on behind the scenes. It might even help solve a challenging debug session one day!
