Memgraph for Memgraph

Posted on Jul 20, 2023 • Originally published at memgraph.com

How to Build a Graph Web Application With Python, Flask, Docker & Memgraph

The goal is straightforward (or at least it seems simple enough). Let's build a
web application in Python that can visualize a graph and run some cool graph
algorithms out of the box. Maybe it's not your flavor, but I prefer the
Flask web framework for such occasions, so bear with me through this
tutorial.

Now, I am going to show you an example of how to accomplish this. You can also
take a look at the finished app on GitHub if you want to see the complete code.

The general outline of the tutorial is:

Create a Flask server
Dockerize your application
Import the data into Memgraph
Query the database

Graph visualizations will be covered in part two of the tutorial so stay
tuned! Spoiler alert, we are going to use D3.js to draw our graph.

Prerequisites

For this tutorial, you will need to install:

Docker
Docker Compose (which is included with Docker on Windows and macOS)

With Docker, we don't need to worry about installing Python, Flask, Memgraph...
essentially anything. Everything will be installed automatically and run
smoothly inside Docker containers!

Disclaimer: Docker fanboy alert

1. Create a Flask server

I included comments in the code to make it more understandable, but if at any
point you feel like something is unclear, join our Discord
Server and share your thoughts. First, create the
file app.py with the following contents:

import json
import logging
import os
from argparse import ArgumentParser
from flask import Flask, Response, render_template
from gqlalchemy import Memgraph

log = logging.getLogger(__name__)

def init_log():
    logging.basicConfig(level=logging.DEBUG)
    log.info("Logging enabled")
    # Set the log level for werkzeug to WARNING because it will print out too much info otherwise
    logging.getLogger("werkzeug").setLevel(logging.WARNING)

Other than the imports, the first few lines focus on setting up the logging. No
web application is complete without logging, so we will add the bare minimum and
disable the pesky werkzeug logger, which sometimes prints too much info.
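
If you want to convince yourself that the werkzeug logger is really quieter after this setup, you can check its level directly. The following is only a small sketch of that check; the logger name "app" is made up for the example and is not part of the tutorial code.

```python
import logging

log = logging.getLogger("app")  # hypothetical logger name for this sketch


def init_log(level=logging.DEBUG):
    """Configure root logging and quiet the werkzeug request logger."""
    logging.basicConfig(level=level)
    # werkzeug logs every HTTP request at INFO; WARNING hides those lines
    logging.getLogger("werkzeug").setLevel(logging.WARNING)


init_log()
werkzeug_log = logging.getLogger("werkzeug")
log.info("Logging enabled")
werkzeug_log.info("GET / 200")  # dropped: INFO is below WARNING
print(werkzeug_log.isEnabledFor(logging.INFO))     # False
print(werkzeug_log.isEnabledFor(logging.WARNING))  # True
```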

Now, let's create an argument parser. This will enable you to easily change the
behavior of the app on startup using arguments.

# Parse the input arguments for the app
def parse_args():
    """
    Parse command line arguments.
    """
    parser = ArgumentParser(description=__doc__)
    parser.add_argument("--host", default="0.0.0.0", help="Host address.")
    parser.add_argument("--port", default=5000, type=int, help="App port.")
    parser.add_argument("--template-folder", default="public/template", help="Flask templates.")
    parser.add_argument("--static-folder", default="public", help="Flask static files.")
    parser.add_argument("--path-to-input-file", default="graph.cypherl", help="Graph input file.")
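    # Note: with default=True, store_true can never switch debug off, so the app always starts in debug mode
    parser.add_argument("--debug", default=True, action="store_true", help="Web server in debug mode.")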
    print(__doc__)
    return parser.parse_args()

args = parse_args()
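
One detail worth knowing: a store_true flag whose default is already True cannot be turned off from the command line. If you ever need that, here is a possible variation (not what the tutorial app uses) based on argparse.BooleanOptionalAction, which is available from Python 3.9. The build_parser() helper and the description string are assumptions made for this sketch.

```python
from argparse import ArgumentParser, BooleanOptionalAction


def build_parser():
    """Build a parser with a debug flag that can be switched off."""
    parser = ArgumentParser(description="Graph web application.")
    parser.add_argument("--port", default=5000, type=int, help="App port.")
    # BooleanOptionalAction (Python 3.9+) generates both --debug and --no-debug
    parser.add_argument("--debug", default=True, action=BooleanOptionalAction,
                        help="Web server in debug mode.")
    return parser


parser = build_parser()
defaults = parser.parse_args([])
custom = parser.parse_args(["--port", "8080", "--no-debug"])
print(defaults.port, defaults.debug)  # 5000 True
print(custom.port, custom.debug)      # 8080 False
```

Passing an explicit list to parse_args() like this is also a handy way to try out a parser without touching the real command line.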

It’s time to create your server instance:

# Create the Flask server instance
app = Flask(
    __name__,
    template_folder=args.template_folder,
    static_folder=args.static_folder,
    static_url_path="",
)

You can finally create the view functions that will be invoked from the browser
via HTTP requests. The home page, for example, is served by the following view:

# Retrieve the home page for the app
@app.route("/", methods=["GET"])
def index():
    return render_template("index.html")

The only thing that’s left is to implement and call the main() function:

# Entrypoint for the app that will be executed first
def main():
    # Code that should only be run once
    if os.environ.get("WERKZEUG_RUN_MAIN") == "true":
        init_log()
    app.run(host=args.host,
            port=args.port,
            debug=args.debug)

if __name__ == "__main__":
    main()

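The somewhat strange statement os.environ.get("WERKZEUG_RUN_MAIN") == "true"
makes sure that this code is executed in only one process per server start.
Confused? In debug mode, Flask's reloader runs your script in two processes: a
parent that watches the files for changes and a child that actually serves the
requests and is restarted on each code change. Werkzeug sets WERKZEUG_RUN_MAIN
to "true" only in the child, so without the check, parts of your code (for
example, the main() function) would run in both processes. Keep in mind that
the variable is not set when the reloader is off, so the guarded block is
skipped if you run the app without debug mode.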

So, if you need to execute something only once in Flask at the beginning like
loading data, this is the perfect place for it.
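
If you would like the one-time setup to run even when debug mode is off, the check can be wrapped in a small helper. This is a sketch under one assumption: the reloader is active exactly when debug is on (which is how app.run() behaves by default). The should_run_startup() name is hypothetical and not part of the tutorial app.

```python
import os


def should_run_startup(debug, environ=None):
    """Return True in the process that actually serves requests."""
    env = os.environ if environ is None else environ
    if not debug:
        # No reloader: there is a single process and the variable is never set
        return True
    # With the reloader on, only the child process has WERKZEUG_RUN_MAIN=true
    return env.get("WERKZEUG_RUN_MAIN") == "true"


print(should_run_startup(True, {"WERKZEUG_RUN_MAIN": "true"}))  # True (reloader child)
print(should_run_startup(True, {}))                             # False (reloader parent)
print(should_run_startup(False, {}))                            # True (no reloader)
```

Passing the environment in as a dictionary keeps the helper easy to test; in the app you would call it as should_run_startup(args.debug).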

The next step is to create the following files, which we will work on in the
next tutorial:

index.html in public/template
index.js in public/js
style.css in public/css
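
You can create these files by hand, but if you prefer a script, something like the following would do. It is only a sketch: the scaffold() helper is hypothetical, and the example runs it against a temporary directory so that nothing in your project is touched.

```python
import tempfile
from pathlib import Path

# Placeholder files from the list above, relative to the project root
PLACEHOLDERS = [
    "public/template/index.html",
    "public/js/index.js",
    "public/css/style.css",
]


def scaffold(root):
    """Create empty placeholder files, leaving existing files untouched."""
    created = []
    for relative in PLACEHOLDERS:
        path = Path(root) / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()
            created.append(relative)
    return created


with tempfile.TemporaryDirectory() as tmp:
    print(scaffold(tmp))  # all three paths on the first run
    print(scaffold(tmp))  # [] on the second run, nothing is overwritten
```

Calling scaffold(".") from the project root would create the files next to app.py.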

One more file is needed and this one will specify all the Python dependencies
that need to be installed. Create requirements.txt with the following
contents:

gqlalchemy==1.0.6
Flask==2.0.2

Your current project structure should look like this:

app
├── public
│  ├── css
│  │   └── style.css
│  ├── js
│  │   └── index.js
│  └── template
│      └── index.html
├── app.py
└── requirements.txt

2. Dockerize your application

This is much simpler than you might think. Most often, you will need a
Dockerfile in which you will specify how your Docker image should be created.
Let's take a look at our Dockerfile:

FROM python:3.9

# Install CMake
RUN apt-get update && \
  apt-get --yes install cmake && \
  rm -rf /var/lib/apt/lists/*

# Install Python packages
COPY requirements.txt ./
RUN pip3 install -r requirements.txt

# Copy the source code
COPY public /app/public
COPY app.py /app/app.py
WORKDIR /app

# Set the environment variables
ENV FLASK_ENV=development
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

# Start the web application
ENTRYPOINT ["python3", "app.py"]

The first line indicates that we are basing our image on a Linux image that has
Python 3.9 preinstalled. The next step is to install CMake (which is needed for
the Memgraph Python driver) with a RUN instruction that calls apt-get, the
standard package manager of the Debian-based Python image.

We copy the requirements.txt file and install the Python packages with
pip. The source code also needs to be copied to the image in order for us to
start the web application. The ENTRYPOINT command is responsible for starting
the desired process inside the container.

But we are not finished with Docker yet. We need to create a
docker-compose.yml file that will tell Docker which containers to start.

version: "3"
services:
  server:
    build: .
    volumes:
      - .:/app
    ports:
      - "5000:5000"
    environment:
      MEMGRAPH_HOST: memgraph
      MEMGRAPH_PORT: "7687"
    depends_on:
      - memgraph

  memgraph:
    image: "memgraph/memgraph"
    ports:
      - "7687:7687"

There are two services/containers in our app:

Server: Uses the Dockerfile to build a Docker image and run it.
Memgraph: This is our database. Docker will automatically download the image and start it.

Because we are supplying environment variables, let's load them in app.py
right after the imports:

MEMGRAPH_HOST = os.getenv("MEMGRAPH_HOST", "memgraph")
MEMGRAPH_PORT = int(os.getenv("MEMGRAPH_PORT", "7687"))
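
A typo in docker-compose.yml (say, a port that is not a number) would otherwise only surface as a bare ValueError at import time. If you want a friendlier message, the two lines can be wrapped in a helper. This is a sketch; read_memgraph_config() is a hypothetical name, and the defaults are the ones from the tutorial.

```python
import os


def read_memgraph_config(environ=None):
    """Return (host, port) for Memgraph, falling back to the tutorial defaults."""
    env = os.environ if environ is None else environ
    host = env.get("MEMGRAPH_HOST", "memgraph")
    raw_port = env.get("MEMGRAPH_PORT", "7687")
    try:
        port = int(raw_port)
    except ValueError:
        raise ValueError(f"MEMGRAPH_PORT must be an integer, got {raw_port!r}") from None
    if not 0 < port < 65536:
        raise ValueError(f"MEMGRAPH_PORT out of range: {port}")
    return host, port


print(read_memgraph_config({}))  # ('memgraph', 7687)
print(read_memgraph_config({"MEMGRAPH_HOST": "localhost", "MEMGRAPH_PORT": "7688"}))  # ('localhost', 7688)
```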

Your current project structure should look like this:

app
├── public
│  ├── css
│  │   └── style.css
│  ├── js
│  │   └── index.js
│  └── template
│      └── index.html
├── app.py
├── docker-compose.yml
├── Dockerfile
└── requirements.txt

Now, we can even start our app with the following commands:

docker-compose build
docker-compose up

3. Import the data into Memgraph

This task will be done inside the main() function because it only needs to be
executed once:

memgraph = None

def main():
    if os.environ.get("WERKZEUG_RUN_MAIN") == "true":
        init_log()
        global memgraph
        memgraph = Memgraph(MEMGRAPH_HOST,
                            MEMGRAPH_PORT)
        load_data(args.path_to_input_file)
    app.run(host=args.host,
            port=args.port,
            debug=args.debug)

How do we import the data into Memgraph? I prepared a file with the Cypher
queries that need to be executed in order to populate the database. You just
need to download the file (graph.cypherl) into your project's root directory
and add the following load_data() function:

def load_data(path_to_input_file):
    """Load data into the database."""
    try:
        # Start from an empty database
        memgraph.drop_database()
        with open(path_to_input_file, "r") as file:
            # Each non-empty line of the file is one Cypher query
            for line in file:
                query = line.strip()
                if query:
                    memgraph.execute(query)
    except Exception as e:
        log.error(f"Data loading error: {e}")

First, we clear everything in the database, and then we go over each line in the
file graph.cypherl and execute it. And that's it. Once we start the web
application, Memgraph will import the dataset.
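
One caveat: depends_on in docker-compose.yml only controls the order in which the containers are started. It does not wait until Memgraph is ready to accept connections, so the first import attempt may fail if the server starts faster than the database. A simple way around this is to wait for the database port before loading the data. The helper below is a sketch using only the standard library; wait_for_port() is a hypothetical name, and an open port is only a rough signal that the database is up.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In main(), you could call wait_for_port(MEMGRAPH_HOST, MEMGRAPH_PORT) right before creating the Memgraph object and skip the import (with a log message) if it returns False.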

4. Query the database

We will create a function that executes a Cypher query and returns the
results. The query matches every pair of connected nodes in the graph, but we
will limit the result to 100 rows (node pairs):

def get_graph():
    results = memgraph.execute_and_fetch(
        """MATCH (n)-[]-(m)
                RETURN n AS from, m AS to
                LIMIT 100;"""
    )
    return list(results)
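
The limit is hard-coded here, which is fine for a demo. If you want to experiment with different sizes, the query string can be built by a small helper that validates the limit first. This is only a sketch; build_graph_query() is a hypothetical function, not part of the finished app.

```python
def build_graph_query(limit=100):
    """Return the tutorial's pair query with a validated row limit."""
    # bool is a subclass of int, so reject it explicitly
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError(f"limit must be a positive integer, got {limit!r}")
    return (
        "MATCH (n)-[]-(m) "
        "RETURN n AS from, m AS to "
        f"LIMIT {limit};"
    )


print(build_graph_query(25))  # MATCH (n)-[]-(m) RETURN n AS from, m AS to LIMIT 25;
```

Because only a checked integer ever reaches the f-string, nothing else can end up in the query text.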

The view function get_data() fetches all the nodes and relationships from the
database, filters out the most important information, and returns it in JSON
format for visualization. To keep the network load to a minimum, you will
send a list with every node id (no other information about the nodes) and a list
that specifies how they are connected to each other.

@app.route("/get-graph", methods=["GET"])
def get_data():
    """Load everything from the database."""
    try:
        results = get_graph()

        # Sets for quickly checking if we have already added the node or edge
        # We don't want to send duplicates to the frontend
        nodes_set = set()
        links_set = set()
        for result in results:
            source_id = result["from"].properties['name']
            target_id = result["to"].properties['name']

            nodes_set.add(source_id)
            nodes_set.add(target_id)

            if ((source_id, target_id) not in links_set and
                    (target_id, source_id,) not in links_set):
                links_set.add((source_id, target_id))

        nodes = [
            {"id": node_id}
            for node_id in nodes_set
        ]
        links = [{"source": n_id, "target": m_id} for (n_id, m_id) in links_set]

        response = {"nodes": nodes, "links": links}
        return Response(json.dumps(response), status=200, mimetype="application/json")
    except Exception as e:
        log.error(f"Data fetching error: {e}")
        return ("", 500)
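
The deduplication logic is easier to try out when it is separated from Flask and the database. Here is a sketch of the same idea as a pure function that takes plain (source, target) name pairs. The build_graph_payload() name and the sample names are made up for the example; it uses a frozenset key in place of the two membership checks above.

```python
def build_graph_payload(pairs):
    """Turn (source, target) name pairs into a nodes/links dictionary."""
    nodes = set()
    links = {}
    for source, target in pairs:
        nodes.update((source, target))
        # frozenset treats (a, b) and (b, a) as one key; keep the first orientation seen
        links.setdefault(frozenset((source, target)), (source, target))
    return {
        "nodes": [{"id": name} for name in sorted(nodes)],
        "links": [{"source": s, "target": t} for s, t in links.values()],
    }


payload = build_graph_payload([("Ann", "Bob"), ("Bob", "Ann"), ("Bob", "Cid")])
print(payload["nodes"])  # [{'id': 'Ann'}, {'id': 'Bob'}, {'id': 'Cid'}]
print(payload["links"])  # [{'source': 'Ann', 'target': 'Bob'}, {'source': 'Bob', 'target': 'Cid'}]
```

The reversed pair shows up in the sample on purpose: the undirected pattern (n)-[]-(m) matches each relationship once in each direction, which is why the view has to filter duplicates in the first place.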

What’s next?

As you can see, it’s very easy to connect to Memgraph and query a graph,
even from a web application. While this part of the tutorial focused on the
backend, in the next one, we will talk about graph visualizations and the D3.js
framework.