So you start building an app and you need to store some data. Obviously you would use a database for storing that data. As the application becomes more successful you notice the DB load becomes a concern, in some cases the number of connections opened by the software can be way too much. When this happen the first thing we think in introducing is a connection pool, and if you are working with Postgres likely that connection pool would be pgbouncer.
Pgbouncer has basically three pool modes:
- Session
- Transactional
- Statement
This defines when the connection is returned back to the pool. Obviously the decision you make has implications, so for instance if you use statement the connection would be returned after the statement is completed, while if you choose transactional it would be returned after each transaction.
It’s a common practice in web oriented applications to tie the transaction to the request lifecycle, this is the transaction is started when the request is started and committed (or rolled back) when it finish.
So we have a connection pool (pgbouncer) with some number of connections allowed configured using transaction pool mode and you have the transaction tie to the request lifecycle.
In the requests maybe you are doing many things, not just dealing with the Postgres DBs, perhaps you have other things like Redis, ElasticSearch, etc. that you need to deal with too.
Meanwhile your transaction is still opened and therefore not release back to the pool.
Let’s imagine for a sec that those other systems become slow, perhaps you are trying to acquire a lock from Redis or indexing some entries in ES that is undergoing heavy load, and your transaction is still opened and therefore not returned to the pool.
Under the right circumstances this can create contention at the pgbouncer level, this is all the available connections are being used so the new requests are waiting for the connections to be freed and released to the pool so they can do their stuff. Meanwhile the clock is ticking, the user is waiting for the response, the responses become slow at the API level and basically nothing works. Your user is already trying to find an alternative to your service because your service clearly sucks and a some kids could have done a better job using Scratch.
Pretty bad eh!
The good news is that it’s easy to fix the problem, just commit!. Try to isolated the parts of the code that deal with the DB from the parts that work in other systems and ensure you commit before jumping from one to the other. It’s literally that simple. Of course depending on the domain and the nature of the business it might not be so simple.