Automated Historical State Tracking
Automated Historical State Tracking
Background
SubQuery allows you to index any data that you want from Substrate, Avalanche, and other networks. Currently, SubQuery operates as a mutable data store, where you can append, update, delete, or otherwise change existing saved entities in the dataset that is indexed by SubQuery. As SubQuery indexes each block, the state of each entity may be updated or deleted based on your project's logic.
A basic SubQuery project that indexes account balances might have an entity that looks like the following.
type Account @entity {
id: ID! # Alice's account address
balance: BigInt
transfers: [Transfer]
}
In the above example, Alice's DOT balance constantly changes, and as we index the data, the balance
property on the Account
entity will change. A basic SubQuery project that indexes account balances will lose this historical data and will only store the state of the current indexing block height. For example, if we currently index to block 100, the data in the database can only represent the state of Alice's account at block 100.
Then we are faced with a problem. Assuming the data has changed when indexing to block 200, how can we query the state of the data at block 100?
Automated Historical State Tracking
SubQuery now automates the historical state tracking of entities for all new projects. You can automatically query the state of your SubQuery project at any block height/timestamp. This means that you can build applications that allow users to go back in time, or show how the state of your data changes over time.
In short, when you create, update, or delete any SubQuery entity, we store the previous state with the block range/timestamp that it was valid for. You can then query data from a specific block height/timestamp using the same GraphQL endpoints and API.
Enabling This
This feature is enabled by default and will track indexing by block height. If your project does not currently have this enabled you will need to reindex your project.
@subql/node@<5.3.0
If you want to disable this feature for any reason, you can set the --disable-historical=true
parameter on subql-node
.
On startup, the current status of this feature is printed to the console (Historical state is enabled
).
@subql/node@>=5.3.0
Since this release we support historical indexing by timestamp as well as block height. You can choose one or the other.
The timestamp field is stored as a unix timestamp in milliseconds.
To configure this feature you can set the --historical=height
or --historical=timestamp
parameter on subql-node
. The --disable-historical
flag still exists but is deprecated. If you wish to disable historical indexing you can do so with --historical=false
.
On startup, the current status of this feature is printed to the console (Historical state is height
or Historical state is timestamp
).
If you are running your project locally using subql-node
or subql-node-<network>
, make sure you enable the pg_extension btree_gist
You can run the following SQL query:
CREATE EXTENSION IF NOT EXISTS btree_gist;
Querying Historical State
There is a special (optional) property on the GraphQL entity filter called blockHeight
or timestamp
depending on the indexing mode. If you omit this property, SubQuery will query the entity state at the current block height/time.
Please see one of our example projects: RMRK NFT.
To query the owners of RMRK NFTs at block height 5,000,000, add the blockHeight parameter as shown below:
query {
nFTEntities(first: 5, blockHeight: "5000000") {
nodes {
id
name
currentOwner
}
}
}
query {
nFTEntities(first: 5, timestamp: "1575017844000") {
nodes {
id
name
currentOwner
}
}
}
To query the owners of those RMRK NFTs collections at the latest block height/timestamp, omit the blockHeight
/timestamp
parameter as shown below.
query {
nFTEntities(first: 5) {
nodes {
id
name
currentOwner
}
}
}
Reindexing with Historical Data
When you enable Automated Historical State Tracking, you can benefit from on-demand partial reindexing from certain block heights. For example:
- You can subscribe to new events, transactions, or assets in your manifest file, then backtrack to when they were deployed and start reindexing from that block
- You could update your mapping files to add new logic to deal with a runtime change, and then backtrack to the block where the runtime change was deployed.
- Coming Soon: You can update your schema and reindex from a certain block height to reflect those changes
You should see the new -- reindex command in Command Line Flags to learn more about how to use this new feature.
DB Schema
When the Automated Historical State Tracking is enabled, we make some key underlying changes to the DB tables to manage this for you automatically.
The below example shows the table of the Account
entity provided before
type Account @entity {
id: ID! # Alice's account address
balance: BigInt
transfers: [Transfer]
}
id | _id | _block_range | balance |
---|---|---|---|
alice | 0e6a444d-cc33-415b-9bfc-44b5ee64d3f4 | [0,1000) | 5 |
alice | 943c3191-ea96-452c-926e-db31ab5b95c7 | [1000,2000) | 15 |
alice | b43ef216-967f-4192-975c-b14a0c5cef4b | [2000,) | 25 |
bob | 4876a354-bd75-4370-9621-24ce1a5b9606 | [0,) | 15 |
charlie | 6e319240-ef14-4fd9-86e9-c788ff5de152 | [1000,) | 100 |
... | ... | ... | ... |
When the historical feature is enabled, the id
field is no longer used as primary key for the database table, instead we automatically generate an unique GUID key _id
for this row within the DB table.
The _block_range
indicates the start to end block for this record using Postgres' range type. For example, between block 0 to 999, alice
's balance
is 5. Then from block 1000 to 1999, alice
's balance
is 15.
_id
and _block_range
are not visible to end users via the query service (GraphQL API), they are internal datatypes automatically generated and handled by the query service.
Info
When historical indexing uses the timestamp
mode. The DB schema is the same but the _block_range
column will be timestamps rather than block numbers.