
Implemented eth event indexing

Thomas Binétruy-Pic requested to merge feat/eth-event-based-indexing into paper

Unlike on Tezos, indexing cross-contract calls on Ethereum is essentially not possible without implementing a custom EVM that tracks contract-call opcodes. This is problematic because, in the general case, contracts can be called from other contracts. For a token contract, for example, normalizing user balances requires indexing cross-contract calls: when a crowdsale, DEX, or any other contract makes a token transfer via a cross-contract call, the true balance cannot be indexed without indexing that internal call. For this reason, eth indexers are event based. That is, once a contract C calls a contract D, that call to contract D cannot be retrieved from block data, but events emitted from contract D can.

Moreover, since public nodes are usually not stable enough to be queried continuously (personal experience, though no empirical measurements have been made), it is customary to go through providers such as Alchemy or Moralis, which have very efficient nodes with very generous free plans. Taking the example of Alchemy, one gets 300_000_000 "compute units" (CU) per month for free [1], where each RPC endpoint call costs some CU [2]. Thus, it is important to keep these costs in mind when implementing the EVM indexer.

To compare how block data is retrieved between blockchains: on Tezos, querying block information retrieves all cross-contract calls, their events, and the transaction status (did it throw and roll back?). On EVM chains, however, querying block data is more complex: indexing a block requires an extra call to eth_getTransactionReceipt for each transaction, on top of the eth_getBlockByNumber call for the block itself. For a chain such as Polygon, which has a block time of 1s, this gets really expensive in terms of CU very quickly. alchemy_getTransactionReceipts could be used, but this would induce a dependency on Alchemy, which is a no-go for DjWebDapp.
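To make the cost argument concrete, here is a rough back-of-the-envelope sketch. The function names and the transactions-per-block figure are hypothetical, and the per-call CU prices are deliberately left as parameters to be read from the provider's pricing page [2] rather than hard-coded.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, good enough for an estimate


def monthly_rpc_calls(block_time_s: float, txs_per_block: int) -> int:
    """Number of RPC calls needed to index every block for one month.

    Each block costs one eth_getBlockByNumber call plus one
    eth_getTransactionReceipt call per transaction it contains.
    """
    blocks = int(SECONDS_PER_MONTH / block_time_s)
    return blocks * (1 + txs_per_block)


def monthly_compute_units(
    block_time_s: float, txs_per_block: int, block_cu: int, receipt_cu: int
) -> int:
    """Same estimate expressed in CU, given per-call prices from the provider."""
    blocks = int(SECONDS_PER_MONTH / block_time_s)
    return blocks * (block_cu + txs_per_block * receipt_cu)


if __name__ == "__main__":
    # Hypothetical workload: 1s block time, 50 transactions per block.
    print(monthly_rpc_calls(block_time_s=1.0, txs_per_block=50))
```

With these assumed inputs the naive approach needs 132_192_000 calls per month, so any per-call price above roughly 2 CU would already exceed the 300_000_000 CU free tier.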

However, it is possible on EVM chains to query the node with eth_getFilterLogs for events emitted across multiple blocks for a subset of contracts to index. Thus, the strategy goes as follows:
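As an illustration of such a query, the sketch below only builds the JSON-RPC request bodies, without sending them. It assumes the standard flow where a filter is first created with eth_newFilter and its ID is then passed to eth_getFilterLogs; the helper names are hypothetical and not taken from this merge request.

```python
from itertools import count

_request_ids = count(1)  # monotonically increasing JSON-RPC request ids


def rpc_payload(method: str, params: list) -> dict:
    """Wrap a method and its params in a JSON-RPC 2.0 envelope."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    }


def new_filter_payload(addresses: list, from_block: int, to_block: int) -> dict:
    """Request a log filter covering several contracts over a block range."""
    return rpc_payload(
        "eth_newFilter",
        [
            {
                "fromBlock": hex(from_block),  # quantities are hex-encoded
                "toBlock": hex(to_block),
                "address": list(addresses),  # only logs from these contracts
            }
        ],
    )


def get_filter_logs_payload(filter_id: str) -> dict:
    """Request every log matching a filter created with eth_newFilter."""
    return rpc_payload("eth_getFilterLogs", [filter_id])
```

A single pair of requests then replaces one eth_getBlockByNumber call plus one eth_getTransactionReceipt call per transaction for every block in the range.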

  1. at index_init, we query for all events for all contracts to index between the last indexed block and the chain's head and store them in a provider attribute.
  2. at index_level, we filter the events queried at index_init for the indexed level. For each event emitted at that block level, we index it along with the original transaction and the contract that emitted it. Indeed, if a contract C (not indexed) calls a contract D (indexed) in a transaction T, and D emits an event E, then E must be indexed with a relation to T (which must itself be indexed although contract C is not) and to D (since normalization of contract D can only occur via events).
  3. we still need to index spooled transactions (contract origination and calls) to ensure that they were included on chain. To save an eth_getBlockByNumber call at each block, we only query it if either spooled transactions are awaiting confirmation or an event was emitted from an indexed contract at the level currently being indexed.
  • Integration test
  • Unit test
  • Documentation
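The three-step strategy described above can be sketched as follows. This is a minimal illustration, not the code of this merge request: the class name, the injected fetch_logs and fetch_block callables, and the choice to start one block after the last indexed one are all assumptions. Only index_init and index_level are names taken from the description.

```python
from collections import defaultdict


class EventIndexer:
    """Hypothetical event-based indexer following the three steps above."""

    def __init__(self, fetch_logs, fetch_block):
        self.fetch_logs = fetch_logs  # (addresses, from_block, to_block) -> logs
        self.fetch_block = fetch_block  # (level) -> block data
        self.events_by_level = defaultdict(list)

    def index_init(self, addresses, last_indexed, head):
        """Step 1: fetch all events up to the head once, grouped by block."""
        self.events_by_level.clear()
        for log in self.fetch_logs(addresses, last_indexed + 1, head):
            # JSON-RPC logs carry the block number as a hex string.
            self.events_by_level[int(log["blockNumber"], 16)].append(log)

    def index_level(self, level, spooled_pending):
        """Steps 2 and 3: return the level's events, fetching the block only
        if an event was emitted or spooled transactions await confirmation."""
        events = self.events_by_level.get(level, [])
        block = None
        if events or spooled_pending:
            block = self.fetch_block(level)
        return events, block


if __name__ == "__main__":
    # Fake fetchers standing in for real RPC calls.
    def logs(addresses, start, end):
        return [{"blockNumber": hex(12), "address": addresses[0]}]

    indexer = EventIndexer(logs, lambda level: {"number": hex(level)})
    indexer.index_init(["0xD"], last_indexed=10, head=15)
    print(indexer.index_level(11, spooled_pending=False))  # no block fetched
    print(indexer.index_level(12, spooled_pending=False))  # block fetched
```

Injecting the two fetchers keeps the sketch independent of any provider and makes it easy to count how many block queries a given range actually triggers.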

[1] https://www.alchemy.com/pricing
[2] https://docs.alchemy.com/reference/compute-unit-costs#standard-evm-json-rpc-methods-ethereum-polygon-pos-polygon-zkevm-optimism-arbitrum-astar

Edited by Thomas Binétruy-Pic
