Skip to content

Normalizer and dependencies

jpic ∞ requested to merge jpic into master

In order to remedy to our previous' implementation lack of performance - it produced a query for each node it crawled through until it found a suitable tx to deploy - this implementation represents the graph in DB in a maner that's easy to query for all the graph at once (actually, the subgraph of transactions to deploy).

This allows for reconstructing the graph using NetworkX, a powerful graph analysis package for Python, and solving for the next transaction to deploy in the dependency graph without making any db queries.

Note that NetworkX is implemented in pure python and performance gains should be possible by switching to a package implemented in a compiled language such as https://graph-tool.skewed.de/. Indeed, the only feature we're using is a basic topological sort, why is implemented in all graph analysis software.

Moreover, we never expect the undeployed transaction sub graphs to ever get so large, and thus a solution such as Apache AGE may be overkill and force djwebdapp users to install that Postgres extension (which may not be possible on managed Postgres such as AWS RDS, altough further research should be done with respect to that).

There is also https://pypi.org/project/django-postgresql-dag/ which could replace the implementation proposed in this commit. It uses CTEs to solve for topological sorts, adds loads to the database. Whereas the proposed implementation in this commit does not, leaving it the the process querying the database. This may prove to be more performant when parallelizing the spooler service as it delocates the graph solving from the Postgres process to one of the spooler processes.

Merge request reports