
Running SubQuery Locally

SubQuery Team · About 6 min read


This guide works through how to run a local SubQuery node on your infrastructure, which includes both the indexer and query service. Don't want to worry about running your own SubQuery infrastructure? SubQuery provides a Managed Service to the community for free. Follow our publishing guide to see how you can upload your project to the SubQuery Managed Service.

There are two ways to run a project locally: using Docker, or running the individual components using NodeJS (the indexer node service and the query service).

Using Docker

The simplest solution is to run a Docker container, defined by the docker-compose.yml file. For a project that has just been initialised, you won't need to change anything here.

Under the project directory run the following command:

docker-compose pull && docker-compose up

Note

It may take some time to download the required packages (@subql/node, @subql/query, and Postgres) for the first time, but soon you'll see a running SubQuery node.
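If you'd rather keep the containers running in the background, the same compose file can be started detached and the indexer's logs followed separately. The service name subquery-node below matches the default scaffolded docker-compose.yml, but is an assumption — check your own file if it was changed:

```shell
# Start the stack in the background, then follow the indexer's logs.
# "subquery-node" is the default service name in a scaffolded
# docker-compose.yml; adjust if yours differs.
docker-compose up -d
docker-compose logs -f subquery-node
```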

Running an Indexer (subql/node)

Requirements:

A SubQuery node is an implementation that extracts Substrate/Polkadot-based blockchain data as defined by your SubQuery project and saves it into a Postgres database.

If you are running your project locally using subql-node or subql-node-<network>, make sure you enable the Postgres extension btree_gist. You can do so by running the following SQL query:

CREATE EXTENSION IF NOT EXISTS btree_gist;
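If you prefer to enable the extension from the command line, the same query can be run through psql. The connection values below are assumptions — adjust them to match your own database setup:

```shell
# Assumed connection settings; adjust -h, -U and the database name to your setup
psql -h localhost -U postgres -d postgres \
  -c "CREATE EXTENSION IF NOT EXISTS btree_gist;"
```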

Installation

# NPM
npm install -g @subql/node

Warning

Please note that we DO NOT encourage the use of yarn global due to its poor dependency management, which may lead to errors down the line.

Once installed, you can start a node with the following command:

subql-node <command>

Key Commands

The following commands will help you complete the configuration of a SubQuery node and begin indexing. To find out more, you can always run --help.

Point to local project path

subql-node -f your-project-path

Connect to database

export DB_USER=postgres
export DB_PASS=postgres
export DB_DATABASE=postgres
export DB_HOST=localhost
export DB_PORT=5432
subql-node -f your-project-path
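Since these are plain environment variables, one option is to keep them in a small env file and source it before starting the node. The file name subql.env is purely illustrative:

```shell
# Write the connection settings to a file (values mirror the example above)
cat > subql.env <<'EOF'
export DB_USER=postgres
export DB_PASS=postgres
export DB_DATABASE=postgres
export DB_HOST=localhost
export DB_PORT=5432
EOF

# Load them into the current shell; the node would then pick them up
. ./subql.env
echo "connecting to $DB_HOST:$DB_PORT"
# subql-node -f your-project-path
```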

Depending on the configuration of your Postgres database (e.g. a different database password), please ensure also that both the indexer (subql/node) and the query service (subql/query) can establish a connection to it.

If your database is using SSL, you can use the following command to add the server certificate to it:

subql-node -f your-project-path --pg-ca /path/to/ca.pem

If your database is using SSL and requires a client certificate, you can use the following command to connect to it:

subql-node -f your-project-path --pg-ca /path/to/ca.pem --pg-cert /path/to/client-cert.pem --pg-key /path/to/client-key.key

Specify a configuration file

subql-node -c your-project-config.yml

This points the node to a manifest file, which can be in YAML or JSON format.

Change the block fetching batch size

subql-node -f your-project-path --batch-size 200

Result:
[IndexerManager] fetch block [203, 402]
[IndexerManager] fetch block [403, 602]

When the indexer first indexes the chain, fetching blocks one at a time will significantly reduce performance. Increasing the batch size adjusts the number of blocks fetched per request and decreases the overall processing time. The current default batch size is 100.
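The log lines above follow directly from the batch arithmetic: each fetch covers one batch of consecutive blocks. A quick sketch, with the start height 203 taken from the sample log:

```shell
# Reproduce the sample log ranges: each batch covers 200 consecutive blocks
start=203
batch=200
for i in 0 1; do
  from=$((start + i * batch))
  to=$((from + batch - 1))
  echo "fetch block [$from, $to]"
done
# prints:
# fetch block [203, 402]
# fetch block [403, 602]
```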

Check your node health

There are two endpoints that you can use to check and monitor the health of a running SubQuery node.

  • Health check endpoint that returns a simple 200 response.
  • Metadata endpoint that includes additional analytics of your running SubQuery node.

Append these paths to the base URL of your SubQuery node. For example, http://localhost:3000/meta will return:

{
    "currentProcessingHeight": 1000699,
    "currentProcessingTimestamp": 1631517883547,
    "targetHeight": 6807295,
    "bestHeight": 6807298,
    "indexerNodeVersion": "0.19.1",
    "lastProcessedHeight": 1000699,
    "lastProcessedTimestamp": 1631517883555,
    "uptime": 41.151789063,
    "polkadotSdkVersion": "5.4.1",
    "apiConnected": true,
    "injectedApiConnected": true,
    "usingDictionary": false,
    "chain": "Polkadot",
    "specName": "polkadot",
    "genesisHash": "0x91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3",
    "blockTime": 6000
}
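As a rough illustration, the lastProcessedHeight and targetHeight fields in this response can be combined into a progress estimate. The numbers below are copied from the sample output above:

```shell
# Estimate indexing progress from the sample /meta values above
lastProcessed=1000699
target=6807295
progress=$(awk -v a="$lastProcessed" -v b="$target" 'BEGIN { printf "%.1f", a / b * 100 }')
echo "indexed ${progress}% of the target height"
# prints: indexed 14.7% of the target height
```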

http://localhost:3000/health will return HTTP 200 if successful.

A 500 error will be returned if the indexer is not healthy. This can often be seen when the node is booting up.

{
    "status": 500,
    "error": "Indexer is not healthy"
}

If an incorrect URL is used, a 404 Not Found error will be returned.

{
    "statusCode": 404,
    "message": "Cannot GET /healthy",
    "error": "Not Found"
}
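A simple liveness probe can be built on top of the health endpoint; curl's -f flag makes the command fail on non-2xx responses. This is a sketch assuming the default port 3000:

```shell
# Sketch of a liveness probe against the health endpoint (assumes port 3000).
# curl -f exits non-zero on a 4xx/5xx response; -s silences progress output.
if curl -sf http://localhost:3000/health > /dev/null; then
  status="healthy"
else
  status="unhealthy or still booting"
fi
echo "node is $status"
```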

Debug your project

Use the node inspector to run the following command.

node --inspect-brk <path to subql-node> -f <path to subQuery project>

For example:

node --inspect-brk /usr/local/bin/subql-node -f ~/Code/subQuery/projects/subql-helloworld/
Debugger listening on ws://127.0.0.1:9229/56156753-c07d-4bbe-af2d-2c7ff4bcc5ad
For help, see: https://nodejs.org/en/docs/inspector
Debugger attached.

Then open up the Chrome dev tools, go to Source > Filesystem and add your project to the workspace and start debugging. For more information, check out How to debug a SubQuery project.

Running a Query Service (subql/query)

Installation

# NPM
npm install -g @subql/query

Warning

Please note that we DO NOT encourage the use of yarn global due to its poor dependency management, which may lead to errors down the line.

Running the Query service

export DB_HOST=localhost
subql-query --name <project_name> --playground

Make sure the project name is the same as the name you used when initialising the project. Also, check that the environment variables are correct.

After running the subql-query service successfully, open your browser and head to http://localhost:3000. You should see a GraphQL playground in the Explorer, along with a schema that is ready to query.
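Besides the playground, the endpoint also answers plain HTTP POSTs. For example, the query service exposes a built-in _metadata entity that can be fetched with curl once the service is up (assuming the default port 3000):

```shell
# Query the service's built-in _metadata entity over plain HTTP
# (assumes the query service is running on the default port 3000)
curl -s http://localhost:3000/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query": "{ _metadata { lastProcessedHeight indexerHealthy } }"}'
```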

Running High Performance SubQuery Infrastructure

SubQuery is designed to provide reliable and performant indexing to production applications. We use the services that we build to run SubQuery in our own managed service, which serves millions of requests each day to hundreds of customers. As such, we've added some commands that you will find useful for getting the most performance out of your project and mitigating against any DDoS attacks.

Improve Indexing with Node Workers and Cache Size

Use node worker threads to move block fetching and block processing into their own worker threads. This can speed up indexing by up to 4 times (depending on the particular project). You can easily enable it using the --workers=<number> flag. Note that the number of available CPU cores strictly limits the usage of worker threads. Read more here.
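Since worker usage is capped by the number of CPU cores, a reasonable starting point is to detect the core count and pass that to the flag; nproc and sysctl cover Linux and macOS respectively:

```shell
# Detect available CPU cores (nproc on Linux, sysctl on macOS)
# and cap the worker count there
cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu)
echo "detected $cores cores"
# subql-node -f your-project-path --workers "$cores"
```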

You should also experiment with the various arguments that control how SubQuery uses a store to improve indexing performance by performing expensive database operations in bulk. In particular, you can review --store-cache-threshold, --store-get-cache-size, --store-cache-async, and --store-flush-interval - read more about these settings in our references.
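An illustrative invocation combining these flags might look as follows. The values are placeholders to tune for your workload, not recommended defaults:

```shell
# Example values only; tune per project and per instance size
subql-node -f your-project-path \
  --store-cache-threshold 1000 \
  --store-get-cache-size 1000 \
  --store-cache-async \
  --store-flush-interval 10
```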

DDOS Mitigation

SubQuery runs well behind an API gateway or a DDOS mitigation service. For any public project that is run in a production configuration, setting up a gateway, web application firewall, or some other protected endpoint is recommended.

Request Caching

Although @subql/node does not natively provide any request-level caching, one of the easiest ways to increase performance as the number of users hitting your SubQuery project grows is to add a cache in front of the GraphQL endpoint with a basic TTL of a few seconds (depending on how stale you are willing to let your data become). Most cloud providers offer simple-to-set-up caching solutions (e.g. Redis) that work well with the GraphQL API endpoints that we provide. If you're worried about stale data affecting your users' experience, you can leverage GraphQL subscriptions to ensure that the most recent data is never affected by the cache, while older, slower data is mostly served from it. Additionally, consider a different TTL for each entity.

Database Configuration

In our own managed service, we've been able to run a number of SubQuery projects in the same Postgres database - you do not need to run each project in a different database for sufficient performance. When the I/O on the database becomes a problem, the simplest solution is to first consider if any more indexes can be added to your project.

The next step our team usually carries out is to split the database into a read-write replica architecture. One database instance is the writer (that the @subql/node service connects to), while the other is the reader (that the @subql/query service connects to). We do this before splitting projects into different databases, as it generally makes a huge improvement to database I/O.

Run Multiple Query Services

SubQuery is designed so that you can run multiple query services behind a load balancer for redundancy and performance. Just note that unless you have multiple read replicas of the database, your performance will quickly become database-constrained.

Restrict Query Complexity

GraphQL is extremely powerful, but one of the downsides is that it allows users to make somewhat unrestricted calls that result in complex SQL that can really affect performance for other users. We provide several controls for this:

  • --query-complexity is a flag that controls the level of query complexity that this service will accept expressed as a positive integer, read more here.
  • --query-timeout is a flag that will restrict the time each query will be allowed to run for, read more here.
  • --max-connection is a flag that will restrict the number of simultaneous connections to the query endpoint, read more here.
  • --query-limit is a flag that allows you to limit the number of results returned by any query and enforce pagination, read more here.
  • --unsafe is a flag that enables some advanced features like GraphQL aggregations; these may have performance impacts, read more here.
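Putting the above together, a hardened query service invocation might look like this; the project name and all limit values are illustrative only:

```shell
# Illustrative values only; pick limits that suit your schema and traffic
subql-query --name subql-helloworld --playground \
  --query-complexity 50 \
  --query-timeout 10000 \
  --max-connection 100 \
  --query-limit 100
```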