If, for some reason, you run your own graph-node instead of using a hosted service such as The Graph, you also need to plan for scaling out graph-node. Without a configuration that can scale out, users will often get a 504 (Gateway Timeout) response from their Web3 client. This article introduces an easy way to scale out graph-node.
TL;DR
Rewrite the official docker-compose.yml as follows (explanation follows).
version: '3'
services:
  graph-node-index:
    image: graphprotocol/graph-node
    ports:
      - '8020:8020'
    depends_on:
      - ipfs
      - postgres
    extra_hosts:
      - host.docker.internal:host-gateway
    environment:
      postgres_host: postgres
      postgres_user: graph-node
      postgres_pass: let-me-in
      postgres_db: graph-node
      ipfs: 'ipfs:5001'
      ethereum: 'mainnet:http://host.docker.internal:8545'
      GRAPH_LOG: info
      node_role: index-node
      node_id: index-node
      BLOCK_INGESTOR: index-node
  graph-node-query:
    image: graphprotocol/graph-node
    ports:
      - '8000:8000'
      - '8001:8001'
    depends_on:
      - ipfs
      - postgres
    extra_hosts:
      - host.docker.internal:host-gateway
    environment:
      postgres_host: postgres
      postgres_user: graph-node
      postgres_pass: let-me-in
      postgres_db: graph-node
      ipfs: 'ipfs:5001'
      ethereum: 'mainnet:http://host.docker.internal:8545'
      GRAPH_LOG: info
      node_role: query-node
  ipfs:
    image: ipfs/go-ipfs:v0.4.23
    ports:
      - '5001:5001'
    volumes:
      - ./data/ipfs:/data/ipfs
  postgres:
    image: postgres
    ports:
      - '5432:5432'
    command:
      [
        "postgres",
        "-cshared_preload_libraries=pg_stat_statements",
        "-cmax_connections=100"
      ]
    environment:
      POSTGRES_USER: graph-node
      POSTGRES_PASSWORD: let-me-in
      POSTGRES_DB: graph-node
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
Then run the node as follows:
docker-compose up -d --scale graph-node-query=5
The explanation follows.
Scale out graph nodes
I am modifying the docker-compose configuration so that graph nodes can be scaled out. There are two main things I am doing:
- Split graph nodes into two roles: index node and query node
- Increase the number of concurrent PostgreSQL connections
Increase the number of concurrent PostgreSQL connections
Let's start with the easy one. Add a startup option to increase the number of concurrent PostgreSQL connections.
@@ -34,7 +53,8 @@
     command:
       [
         "postgres",
-        "-cshared_preload_libraries=pg_stat_statements"
+        "-cshared_preload_libraries=pg_stat_statements",
+        "-cmax_connections=100"
       ]
     environment:
       POSTGRES_USER: graph-node
Since we are increasing the number of graph nodes for scale-out, we also need to raise the number of concurrent connections that PostgreSQL accepts. Here I set it to 100.
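As a rough sanity check on that number, the arithmetic can be sketched in shell. This assumes each graph-node process keeps its own pool of PostgreSQL connections. The pool size of 10 used here is an illustrative assumption, not a value taken from this article, so substitute whatever your graph-node version actually uses. `required_connections` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate how many PostgreSQL connections a deployment
# may open, assuming every graph-node process keeps its own connection pool.
required_connections() {
  local index_nodes=$1   # number of index nodes (1 in this article)
  local query_nodes=$2   # number of query nodes (the --scale value)
  local pool_size=$3     # assumed connections per node; check your version
  echo $(( (index_nodes + query_nodes) * pool_size ))
}

max_connections=100      # value passed via -cmax_connections
needed=$(required_connections 1 5 10)

if (( needed <= max_connections )); then
  echo "ok: ${needed} of ${max_connections} connections"
else
  echo "raise max_connections: need ${needed}, have ${max_connections}"
fi
```

With one index node, five query nodes and the assumed pool size of 10, the estimate is 60 connections, which fits within 100. If you scale further, rerun the estimate before raising the replica count.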
Split graph-node into two roles: index node and query node
Next, graph nodes are divided into two roles: index-only nodes and query-only nodes. We scale out by increasing the number of query-only nodes. If you scale out without this role separation, you end up with multiple index nodes that compete for the indexing work. Another approach is to give every node a unique node_id, but I did not take that approach this time (see the postscript for the reasons).
Index and query nodes are switched by the DISABLE_BLOCK_INGESTOR environment variable at graph node startup. If DISABLE_BLOCK_INGESTOR=true, the node does not ingest blocks and acts as a query node; otherwise it ingests blocks and acts as an index node. This is a bit complicated, but you do not set the variable yourself: the start script in the official Docker image sets it according to node_role.
  graph-node-index:
    environment:
      node_role: index-node
      node_id: index-node
      BLOCK_INGESTOR: index-node
  graph-node-query:
    environment:
      node_role: query-node
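The switch can be pictured roughly as follows. This is a simplified sketch of the logic, not the start script from the official image. `block_ingestor_disabled` is a hypothetical name, a result of `true` means the node does not ingest blocks, and the rule that an index node ingests blocks only when its node_id matches BLOCK_INGESTOR is an assumption about that script, so verify it against the version you run.

```shell
#!/usr/bin/env bash
# Simplified sketch (not the official start script): decide whether a node
# should skip block ingestion, based on the compose environment variables.
block_ingestor_disabled() {
  local node_role=${1:-combined-node}
  local node_id=${2:-default}
  local block_ingestor=${3:-}

  case "$node_role" in
    query-node)
      # Query nodes only serve queries and never ingest blocks.
      echo true
      ;;
    index-node)
      # Assumption: only the index node named in BLOCK_INGESTOR ingests.
      if [[ "$node_id" == "$block_ingestor" ]]; then
        echo false
      else
        echo true
      fi
      ;;
    *)
      # A combined node does both jobs.
      echo false
      ;;
  esac
}

block_ingestor_disabled index-node index-node index-node   # prints: false
block_ingestor_disabled query-node                          # prints: true
```

This is why the index service sets node_id and BLOCK_INGESTOR to the same value, while the query service only needs node_role.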
With only the role split in place, the two services would conflict over host-side ports. To prevent that, the index node (graph-node-index) publishes only the API port (8020), and the query node (graph-node-query) publishes only the HTTP (8000) and WebSocket (8001) ports.
  graph-node-index:
    image: graphprotocol/graph-node
    ports:
      #- '8000:8000'
      #- '8001:8001'
      - '8020:8020'
      #- '8030:8030'
      #- '8040:8040'
  graph-node-query:
    image: graphprotocol/graph-node
    ports:
      - '8000:8000'
      - '8001:8001'
      #- '8020:8020'
      #- '8030:8030'
      #- '8040:8040'
With this in place, scale out the query node and you're good to go.
docker-compose up -d --scale graph-node-query=5
Yay, good job!!!
(Postscript) Why didn't I avoid the conflict by making the node_id unique?
My first approach was to make node_id unique. However, I changed my approach for the following two reasons:
Hiding API node endpoints
I wanted to completely isolate the API node (the index node, which exposes the API port 8020) from the query nodes accessed by the general public, in order to reduce the chance of strangers accessing the API without permission.
Using the official Dockerfile as is
It would be easiest to set node_id to $HOSTNAME to make it unique, but this would require modifying the start script, which is simply annoying.
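For reference, the rejected alternative might look roughly like this. It is a sketch only: `unique_node_id` is a hypothetical helper, the hostname in the example is made up, and replacing hyphens with underscores is an assumption about which characters a node ID may contain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a per-container node ID from the hostname.
# Docker gives each replica its own hostname, so the IDs come out distinct.
unique_node_id() {
  local host=${1:-$HOSTNAME}
  # Assumption: node IDs should not contain hyphens, so replace them.
  echo "${host//-/_}"
}

# A patched start script could then export the result before launching.
node_id=$(unique_node_id "graph-node-query-3")
echo "$node_id"   # prints: graph_node_query_3
```

Doing this means maintaining a custom start script or image, which is exactly the overhead the role split avoids.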