Skip Navigation
Expand
Maximum Database Connections
Answer ID 10277   |   Last Review Date 04/28/2022

How do I investigate for the root cause of database (db) connection errors (max_user_connections/drizzle state read:lost connection to server/unable to configure connection) logged to a site/interface error log?

Environment:

Oracle B2C Service

Issue:

The presence of database (db) connection errors (max_user_connections/drizzle state read:lost connection to server/unable to configure connection) logged to a site/interface error log is an indication that the configured limit of concurrent db connections used by an Oracle B2C Service site/instance has been reached. With no remaining connections from the available pool to fulfill requests, this condition can cause site-wide errors due to all aspects of the site being unable to connect to the database. Investigating for what is the root cause of all of the trouble can be a complex task, and understanding the common causes and resolution of such problems, as well as how to work with the CX Service Support team while investigating, can be very helpful.

Resolution:

Database connection errors will occur when product-level safeguards protecting against overuse of the db resource are insufficient to appropriately limit the number of concurrent db connections in use.

What safeguards are designed into the product to avoid performance degradation and db connection problems?

These safeguards include the following (including the max_user_connections configuration itself):

  • The Oracle B2C Service service is a shared resource, and sites are initially provisioned to handle expected capacity. Capacity constraints can be adjusted by our GNC team according to changing capacity needs.
  • Sites are protected by the hidden configuration (max_user_connections) used to indicate the maximum number of concurrent db connections allowable by a single site/instance. This number is 150 for all sites without exception.
  • Maximum join configurations limit queries made on the site/instance db (limits queries to scan a configured number of db rows under specific and identified circumstances)
  • ROQL blocklisting functionality is used in conjunction with maximum join configurations to limit queries made on the site/instance db (blocks/blocklists or diverts ROQL queries to replication)
  • Product feature-level timeouts limit the amount of time a db connection can be held by a particular service request (SOAP API for example)

When are these safeguards insufficient?

The safeguards put in place are sometimes not enough to protect a site from hitting the max_user_connections configured limit. It can be helpful to consider that Oracle B2C Service instances/sites can process a massive amount of data and transactions in a given time. Further, it happens to be that often the cause of db connection problems are less a problem with overall site/instance capacity utilization but rather the result of a single aspect of the site configuration/customization that is able to hold all 150 of the available connections alone under certain circumstances.

How should I go about investigating db connection errors?

The first thing to do when investigating db connection issues is to confirm the error using the error log of the site/interface, or otherwise identifying such an error in a server-side trace file (by Oracle resources with access). Be aware that if one interface is showing the error, then all interfaces receiving activity at that same time are likely also receiving the same type of error showing up across all uses of the product db (for the site/instance). For this reason it is helpful to be aware that most, if not all, of the errors logged (indicating db connection problems) will be specific to the symptom of the problem rather than the cause. For this reason it is helpful to understand that some product areas that more likely to be associated with such a problem.

To investigate max_user_connections errors we want to evaluate the following:

  1. Is there anything specific configured/developed on the site that will wait on another (external) system while holding a db connection – where concurrent db connections can stack up when there is increased latency experienced in system calls? For example, all server-side Customer Portal processing hold a database connection for the duration of the server-side process. Thus, if a process (or multiple as would likely be the case) are waiting on a response from an integration call made from PHP, this is going to be very likely to bring a site down repeatedly, until the implementation is re-done. Moving the integration call to the client side, by using JavaScript to make the call rather than PHP, is one way to do so.
 
  1. Is customization/configuration of the site following prescribed best practices for public API usage? For further details see

Answer ID 8264: Optimal efficiency when creating or updating objects using CWS API

Answer ID 8887: Optimal efficiency when destroying objects using CWS API

Answer ID 9725: Avoiding issues with cascading delete

Answer ID 7972: Better to do more work in each request rather than sending more requests

Answer ID 7970: Server-side timeouts associated with the Connect Web Services for SOAP (CWS) API

 
  1. Are there analytics reports being run by too many users at one time – such as with complex reports (complex enough to require relatively significant run time but not scanning too many rows to encounter max join limitations) used frequently enough in specific periods of time? This can involve reports run in CP that become a problem when CP is under relatively high load. This can also happen with reports run in the agent console, including agent dashboard reports if all agents login about the same time, for example. Another scenario where reports can cause max join errors is when reports are set to refresh frequently (such as every 5 seconds).
 
  1. Is the database server experiencing performance degradation? This is a concern because slower response times mean that db connections are held for a longer period of time to do the same amount of work – and thus can tip the scales in terms of concurrently used db connections. If performance degradation is observed and there are no known site changes and/or volume increase on the site itself there is a possibility that another customer on the same db server is causing the degradation. In such cases it is advised to create a Service Request with the CX Service Support team with a request that the cause of the performance degradation be determined. For further details see
 
   

What other considerations should be made when investigating max_user_connections errors?

Appropriate optimization of the db will generally cause queries to perform better, thus reducing the time that processes hold onto a db connection for a specific action involving the running of said queries. For further details on db optimization see

Answer ID 8702: Working with Technical Support to analyze and implement database indexes

Answer ID 4494: When to index a custom field

Answer ID 10440: Data management is crucial to maximize Oracle B2C Service success

Finally, there are rare cases where the investigation of db connection errors are found to involve a potential product defect or enhancement. Investigations of this nature can lead to improvements to the performance and/or scalability of the product., and can include such things as enhanced caching to fulfill requests of a certain type etc.. This possibility is an additional reason to engage CX Service Support through a Service Request if database connection issues are causing problems for a site.