Skip to content

introduced txgraph model

Thomas Binétruy-Pic requested to merge txgraph_model into master

In order to remedy to our previous' implementation lack of performance - it produced a query for each node it crawled through until it found a suitable tx to deploy - this implementation represents the graph in DB in a maner that's easy to query for all the graph at once (actually, the subgraph of transactions to deploy).

This allows for reconstructing the graph using NetworkX, a powerful graph analysis package for Python, and solving for the next transaction to deploy in the dependency graph without making any recursive db queries.

Note that NetworkX is implemented in pure python and performance gains should be possible by switching to a package implemented in a compiled language such as https://graph-tool.skewed.de/. Indeed, the only feature we're using is a basic topological sort, why is implemented in all graph analysis software.

Moreover, we never expect the undeployed transaction sub graphs to ever get so large, and thus a solution such as Apache AGE may be overkill and force djwebdapp users to install that Postgres extension (which may not be possible on managed Postgres such as AWS RDS, altough further research should be done with respect to that).

There is also https://pypi.org/project/django-postgresql-dag/ which could replace the implementation proposed in this commit. It uses CTEs to solve for topological sorts, which adds loads to the database. Whereas the proposed implementation in this commit does not, leaving it to the process querying the database. This may prove to be more performant when parallelizing the spooler service as it delocates the graph solving from the Postgres process to one of the spooler processes.

For performance reasons, we also prevent networks from having too many deploy transactions in the m2m field preventing the topological sort from taking too much computing power. Indeed, we need to keep in mind that the spooler parallelizes transactions and so that a topological sort needs to be done for each of the parallelized transactions.

Setting all transactions in the ascendancy of an aborted transactions should also be possible by doing some pathfinding using NetworkX or some other package.

Edited by Thomas Binétruy-Pic

Merge request reports