RUNNING LONG AND COMPLEX PROCESSES WITH POSTGIS
Spatial analysis or geometric processes can be long and heavy, since volumes of data are often huge and algorithms may be very complex.
This presentation shows a way to implement such long processes using PostGIS. The specific problem is updating a country-wide road network reference data set on which corporate data have been built. The process uses a topological graph pairing approach. A full run on a production server currently lasts three whole weeks.
After a quick explanation of the problem itself, we show how this kind of heavy process can be implemented with PostGIS and Python, and explain the issues we had to tackle.
Dealing with huge volumes of data is what PostgreSQL and PostGIS are made for. The GEOS backend of PostGIS provides advanced functions for processing geometries. Python is used as a process management tool, for input and output automation, and to provide a command-line user interface.
This particular use of PostGIS follows the ELT principle (Extract, Load, Transform), as opposed to ETL (Extract, Transform, Load). Hence all geometric processing takes place inside the database.
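As an illustration of the ELT principle described above, here is a minimal sketch in which Python only sequences the work and every geometric operation is plain SQL executed by PostGIS. The table names (roads_raw, roads_clean), the column names and the choice of cleaning step are hypothetical and not taken from the actual project; `conn` is assumed to be any Python DB-API connection to a PostGIS-enabled database.

```python
"""ELT sketch: Python orchestrates, PostGIS does the geometry work in SQL."""

# Hypothetical table and column names, chosen for illustration only.
# The "Load" step is assumed to be done: raw data already sits in roads_raw.
TRANSFORM_STEPS = [
    """
    CREATE TABLE roads_clean AS
    SELECT id, ST_MakeValid(geom) AS geom
    FROM roads_raw
    WHERE NOT ST_IsEmpty(geom)
    """,
    "CREATE INDEX roads_clean_geom_idx ON roads_clean USING GIST (geom)",
    "ANALYZE roads_clean",
]


def run_transform(conn, steps=TRANSFORM_STEPS):
    """Run each SQL step on a DB-API connection, committing after each one.

    No geometry ever crosses into Python: only SQL text goes to the server.
    Returns the number of steps executed.
    """
    done = 0
    for sql in steps:
        cur = conn.cursor()
        cur.execute(sql)
        cur.close()
        conn.commit()
        done += 1
    return done
```

Committing after each step keeps transactions short, which is one possible choice for a run that lasts days; a different transaction granularity may suit other workloads better.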
Data volumes are large, as almost the entire French road network is processed. This calls for suitable hardware and a fine-tuned PostgreSQL configuration. Careful index management also becomes one of the main performance factors.
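One common form of index management around a bulk operation can be sketched as follows: drop the spatial index, run the heavy statement, then rebuild the index with more maintenance memory and refresh the planner statistics. This is a general PostgreSQL practice and not necessarily what was done in this project; the helper name, the index naming scheme and the 1GB value are assumptions, and whether dropping the index actually helps depends on the workload.

```python
def bulk_update_plan(table, geom_column, update_sql, maintenance_work_mem="1GB"):
    """Return the SQL statements for a bulk update with index rebuild.

    Hypothetical helper: identifiers are interpolated as-is, so they must
    come from trusted configuration, never from user input.
    All statements are meant to run in the same session, because SET only
    affects the current session.
    """
    index = f"{table}_{geom_column}_idx"  # assumed naming convention
    return [
        # Avoid maintaining the GiST index row by row during the update.
        f"DROP INDEX IF EXISTS {index}",
        update_sql,
        # Give the index build more memory for this session only.
        f"SET maintenance_work_mem = '{maintenance_work_mem}'",
        f"CREATE INDEX {index} ON {table} USING GIST ({geom_column})",
        # Refresh planner statistics after the data changed.
        f"ANALYZE {table}",
    ]
```

For example, `bulk_update_plan("roads", "geom", "UPDATE roads SET geom = ST_SnapToGrid(geom, 0.01)")` returns five statements that can be sent in order over a single connection.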
Specific problems arose from the duration of the process, the volume of data and the complexity of the processing, and solutions or workarounds were found for them. Memory management in PostGIS and GEOS led to unexpected behaviours. The amount of data triggered many corner cases and robustness issues in various algorithms. Recovery points, monitoring and control therefore became crucial, particularly because testing and debugging are difficult: they have to be done under production conditions, which is time-consuming.
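The idea of recovery points can be sketched as a small step runner that records each completed step and skips it on the next launch, so a failure in week two does not mean restarting from scratch. The abstract does not say how recovery points were actually implemented; storing the state in a JSON file, and all function names below, are assumptions made for illustration (a state table inside the database would work just as well).

```python
import json
import os
import tempfile


def load_done(path):
    """Return the list of step names already completed, or [] on first run."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)


def save_done(path, done):
    """Write the state to a temporary file, then swap it in with os.replace,
    so a crash during the write cannot leave a truncated state file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(done, f)
    os.replace(tmp, path)


def run_with_checkpoints(steps, path, log=print):
    """Run (name, callable) steps in order, skipping those already recorded.

    If a step raises, the exception propagates and the state file keeps
    every step completed so far; rerunning resumes at the failed step.
    """
    done = load_done(path)
    for name, action in steps:
        if name in done:
            log(f"skip {name}")
            continue
        log(f"start {name}")
        action()
        done.append(name)
        save_done(path, done)
        log(f"done {name}")
    return done
```

The `log` callback is the hook for monitoring: passing a function that writes timestamps to a file or a table gives a trace of how far a multi-week run has progressed. Each step should be safe to rerun, since a crash between the end of a step and the state write causes that step to run again.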
Vincent Picavet - Oslandia