Connection Pools and Connection Parameters
This document describes how to configure connection pools and connection parameters when you use a driver or ORM framework to connect to TiDB.
If you are interested in more tips about Java application development, see Best Practices for Developing Java Applications with TiDB
If you are interested in more tips about Java application development, see Best Practices for Developing Java Applications with TiDB
Connection pool
Building TiDB (MySQL) connections is relatively expensive (for OLTP scenarios at least). Because in addition to building a TCP connection, connection authentication is also required. Therefore, the client usually saves the TiDB (MySQL) connections to the connection pool for reuse.
Java has many connection pool implementations such as HikariCP, tomcat-jdbc, druid, c3p0, and dbcp. TiDB does not limit which connection pool you use, so you can choose whichever you like for your application.
Configure the number of connections
It is a common practice that the connection pool size is well adjusted according to the application's own needs. Take HikariCP as an example:
- maximumPoolSize: The maximum number of connections in the connection pool. If this value is too large, TiDB consumes resources to maintain useless connections. If this value is too small, the application gets slow connections. Therefore, you need to configure this value according to your application characteristics. For details, see About Pool Sizing.
- minimumIdle: The minimum number of idle connections in the connection pool. It is mainly used to reserve some connections to respond to sudden requests when the application is idle. You also need to configure it according to your application characteristics.
The application needs to return the connection after finishing using it. It is recommended that the application uses the corresponding connection pool monitoring (such as metricRegistry) to locate connection pool issues in time.
Probe configuration
The connection pool maintains persistent connections to TiDB. TiDB does not proactively close client connections by default (unless an error is reported), but generally, there are also network proxies such as LVS or HAProxy between the client and TiDB. Usually, these proxies proactively clean up connections that are idle for a certain period. In addition to paying attention to the idle configuration of the proxies, the connection pool also needs to keep alive or probe connections.
If you often see the following error in your Java application:
The last packet sent successfully to the server was 3600000 milliseconds ago. The driver has not received any packets from the server. com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
If n
in n milliseconds ago
is 0
or a very small value, it is usually because the executed SQL operation causes TiDB to exit abnormally. To find the cause, it is recommended to check the TiDB stderr log.
If n
is a very large value (such as 3600000
in the above example), it is likely that this connection was idle for a long time and then closed by the proxy. The usual solution is to increase the value of the proxy's idle configuration and allow the connection pool to:
- Check whether the connection is available before using the connection every time.
- Regularly check whether the connection is available using a separate thread.
- Send a test query regularly to keep alive connections.
Different connection pool implementations might support one or more of the above methods. You can check your connection pool documentation to find the corresponding configuration.
Formulas based on experience
According to the About Pool Sizing article of HikariCP, if you have no idea about how to set a proper size for the database connection pool, you can get started with a formula based on experience. Then, based on the performance result of the pool size calculated from the formula, you can further adjust the size to achieve the best performance.
The formula based on experience is as follows:
connections = ((core_count * 2) + effective_spindle_count)
The description of each parameter in the formula is as follows:
- connections: the size of connections obtained.
- core_count: the number of CPU cores.
- effective_spindle_count: the number of hard drives (not SSD). Because each spinning hard disk can be called a spindle. For example, if you are using a server with a RAID of 16 disks, the effective_spindle_count should be 16. Because HDD usually can handle only one request at a time, the formula here is actually measuring how many concurrent I/O requests your server can manage.
In particular, pay attention to the following note below the formula.
A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((core_count * 2) + effective_spindle_count). Core count should not include HT threads, even if hyperthreading is enabled. Effective spindle count is zero if the active data set is fully cached, and approaches the actual number of spindles as the cache hit rate falls. ... There hasn't been any analysis so far regarding how well the formula works with SSDs.
This note indicates that:
- core_count is the number of physical cores, regardless of whether you enable Hyper-Threading or not.
- When data is fully cached, you need to set effective_spindle_count to
0
. As the hit rate of cache decreases, the count is closer to the actual number ofHDD
. - Whether the formula works for SSD is not tested and unknown.
When using SSDs, it is recommended that you use the following formula based on experience instead:
connections = (number of cores * 4)
Therefore, you can set the maximum connection size of the initial connection pool to cores * 4
in the case of SSDs and further adjust the size to tune the performance.
Tuning direction
As you can see, a size calculated from the formulas based on experience is just a recommended base value. To get the optimal size on a specific machine, you need to try other values around the base value and test the performance.
Here are some basic rules to help you get the optimal size:
- If your network or storage latency is high, increase your maximum number of connections to reduce the latency waiting time. Once a thread is blocked by latency, other threads can take over and continue processing.
- If you have multiple services deployed on your server and each service has a separate connection pool, consider the sum of the maximum number of connections to all connection pools.
Connection parameters
Java applications can be encapsulated with various frameworks. In most of the frameworks, JDBC API is called on the bottommost level to interact with the database server. For JDBC, it is recommended that you focus on the following things:
- JDBC API usage choice
- API Implementer's parameter configuration
JDBC API
For JDBC API usage, see JDBC official tutorial. This section covers the usage of several important APIs.
Use Prepare API
For OLTP (Online Transactional Processing) scenarios, the SQL statements sent by the program to the database are several types that can be exhausted after removing parameter changes. Therefore, it is recommended to use Prepared Statements instead of regular execution from a text file and reuse Prepared Statements to execute directly. This avoids the overhead of repeatedly parsing and generating SQL execution plans in TiDB.
At present, most upper-level frameworks call the Prepare API for SQL execution. If you use the JDBC API directly for development, pay attention to choosing the Prepare API.
In addition, with the default implementation of MySQL Connector/J, only client-side statements are preprocessed, and the statements are sent to the server in a text file after ?
is replaced on the client. Therefore, in addition to using the Prepare API, you also need to configure useServerPrepStmts = true
in JDBC connection parameters before you perform statement preprocessing on the TiDB server. For detailed parameter configuration, see MySQL JDBC parameters.
Use Batch API
For batch inserts, you can use the addBatch
/executeBatch
API. The addBatch()
method is used to cache multiple SQL statements first on the client, and then send them to the database server together when calling the executeBatch
method.
Use StreamingResult
to get the execution result
In most scenarios, to improve execution efficiency, JDBC obtains query results in advance and saves them in client memory by default. But when the query returns a super large result set, the client often wants the database server to reduce the number of records returned at a time and waits until the client's memory is ready and it requests for the next batch.
Usually, there are two kinds of processing methods in JDBC:
Set FetchSize to
Integer.MIN_VALUE
to ensure that the client does not cache. The client will read the execution result from the network connection throughStreamingResult
.When the client uses the streaming read method, it needs to finish reading or close
resultset
before continuing to use the statement to make a query. Otherwise, the errorNo statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
is returned.To avoid such an error in queries before the client finishes reading or closes
resultset
, you can add theclobberStreamingResults=true
parameter in the URL. Then,resultset
is automatically closed but the result set to be read in the previous streaming query is lost.To use Cursor Fetch, first set
FetchSize
as a positive integer and configureuseCursorFetch=true
in the JDBC URL.
TiDB supports both methods, but it is preferred that you use the first method, because it is a simpler implementation and has a better execution efficiency.
MySQL JDBC parameters
JDBC usually provides implementation-related configurations in the form of JDBC URL parameters. This section introduces MySQL Connector/J's parameter configurations (If you use MariaDB, see MariaDB's parameter configurations). Because this document cannot cover all configuration items, it mainly focuses on several parameters that might affect performance.
Prepare-related parameters
This section introduces parameters related to Prepare
.
useServerPrepStmts
useServerPrepStmts is set to
false
by default, that is, even if you use the Prepare API, the "prepare" operation will be done only on the client. To avoid the parsing overhead of the server, if the same SQL statement uses the Prepare API multiple times, it is recommended to set this configuration totrue
.To verify that this setting already takes effect, you can do:
- Go to TiDB monitoring dashboard and view the request command type through Query Summary > CPS By Instance.
- If
COM_QUERY
is replaced byCOM_STMT_EXECUTE
orCOM_STMT_PREPARE
in the request, it means this setting already takes effect.
cachePrepStmts
Although
useServerPrepStmts=true
allows the server to execute Prepared Statements, by default, the client closes the Prepared Statements after each execution and does not reuse them. This means that the "prepare" operation is not even as efficient as text file execution. To solve this, it is recommended that after settinguseServerPrepStmts=true
, you should also configurecachePrepStmts=true
. This allows the client to cache Prepared Statements.To verify that this setting already takes effect, you can do:
Go to TiDB monitoring dashboard and view the request command type through Query Summary > CPS By Instance.
If the number of
COM_STMT_EXECUTE
in the request is far more than the number ofCOM_STMT_PREPARE
, it means this setting already takes effect.In addition, configuring
useConfigs=maxPerformance
will configure multiple parameters at the same time, includingcachePrepStmts=true
.
prepStmtCacheSqlLimit
After configuring
cachePrepStmts
, also pay attention to theprepStmtCacheSqlLimit
configuration (the default value is256
). This configuration controls the maximum length of the Prepared Statements cached on the client.The Prepared Statements that exceed this maximum length will not be cached, so they cannot be reused. In this case, you may consider increasing the value of this configuration depending on the actual SQL length of the application.
You need to check whether this setting is too small if you:
- Go to TiDB monitoring dashboard and view the request command type through Query Summary > CPS By Instance.
- And find that
cachePrepStmts=true
has been configured, butCOM_STMT_PREPARE
is still mostly equal toCOM_STMT_EXECUTE
andCOM_STMT_CLOSE
exists.
prepStmtCacheSize
prepStmtCacheSize controls the number of cached Prepared Statements (the default value is
25
). If your application requires "preparing" many types of SQL statements and wants to reuse Prepared Statements, you can increase this value.To verify that this setting already takes effect, you can do:
- Go to TiDB monitoring dashboard and view the request command type through Query Summary > CPS By Instance.
- If the number of
COM_STMT_EXECUTE
in the request is far more than the number ofCOM_STMT_PREPARE
, it means this setting already takes effect.
Batch-related parameters
While processing batch writes, it is recommended to configure rewriteBatchedStatements=true
. After using addBatch()
or executeBatch()
, JDBC still sends SQL one by one by default, for example:
pstmt = prepare("INSERT INTO `t` (a) values(?)");
pstmt.setInt(1, 10);
pstmt.addBatch();
pstmt.setInt(1, 11);
pstmt.addBatch();
pstmt.setInt(1, 12);
pstmt.executeBatch();
Although Batch
methods are used, the SQL statements sent to TiDB are still individual INSERT
statements:
INSERT INTO `t` (`a`) VALUES(10);
INSERT INTO `t` (`a`) VALUES(11);
INSERT INTO `t` (`a`) VALUES(12);
But if you set rewriteBatchedStatements=true
, the SQL statements sent to TiDB will be a single INSERT
statement:
INSERT INTO `t` (`a`) values(10),(11),(12);
Note that the rewrite of the INSERT
statements is to concatenate the values after multiple "values" keywords into a whole SQL statement. If the INSERT
statements have other differences, they cannot be rewritten, for example:
INSERT INTO `t` (`a`) VALUES (10) ON DUPLICATE KEY UPDATE `a` = 10;
INSERT INTO `t` (`a`) VALUES (11) ON DUPLICATE KEY UPDATE `a` = 11;
INSERT INTO `t` (`a`) VALUES (12) ON DUPLICATE KEY UPDATE `a` = 12;
The above INSERT
statements cannot be rewritten into one statement. But if you change the three statements into the following ones:
INSERT INTO `t` (`a`) VALUES (10) ON DUPLICATE KEY UPDATE `a` = VALUES(`a`);
INSERT INTO `t` (`a`) VALUES (11) ON DUPLICATE KEY UPDATE `a` = VALUES(`a`);
INSERT INTO `t` (`a`) VALUES (12) ON DUPLICATE KEY UPDATE `a` = VALUES(`a`);
Then they meet the rewrite requirement. The above INSERT
statements will be rewritten into the following one statement:
INSERT INTO `t` (`a`) VALUES (10), (11), (12) ON DUPLICATE KEY UPDATE a = VALUES(`a`);
If there are three or more updates during the batch update, the SQL statements will be rewritten and sent as multiple queries. This effectively reduces the client-to-server request overhead, but the side effect is that a larger SQL statement is generated. For example:
UPDATE `t` SET `a` = 10 WHERE `id` = 1; UPDATE `t` SET `a` = 11 WHERE `id` = 2; UPDATE `t` SET `a` = 12 WHERE `id` = 3;
In addition, because of a client bug, if you want to configure rewriteBatchedStatements=true
and useServerPrepStmts=true
during batch update, it is recommended that you also configure the allowMultiQueries=true
parameter to avoid this bug.
Integrate parameters
Through monitoring, you might notice that although the application only performs INSERT
operations to the TiDB cluster, there are a lot of redundant SELECT
statements. Usually this happens because JDBC sends some SQL statements to query the settings, for example, select @@session.transaction_read_only
. These SQL statements are useless for TiDB, so it is recommended that you configure useConfigs=maxPerformance
to avoid extra overhead.
useConfigs=maxPerformance
includes a group of configurations. To get the detailed configurations in MySQL Connector/J 8.0 and those in MySQL Connector/J 5.1, see mysql-connector-j 8.0 and mysql-connector-j 5.1 respectively.
After it is configured, you can check the monitoring to see a decreased number of SELECT
statements.
Timeout-related parameters
TiDB provides two MySQL-compatible parameters to control the timeout: wait_timeout
and max_execution_time
. These two parameters respectively control the connection idle timeout with the Java application and the timeout of the SQL execution in the connection; that is to say, these parameters control the longest idle time and the longest busy time for the connection between TiDB and the Java application. Since TiDB v5.4, the default value of wait_timeout
is 28800
seconds, which is 8 hours. For TiDB versions earlier than v5.4, the default value is 0
, which means the timeout is unlimited. The default value of max_execution_time
is 0
, which means the maximum execution time of a SQL statement is unlimited.
However, in an actual production environment, idle connections and SQL statements with excessively long execution time negatively affect databases and applications. To avoid idle connections and SQL statements that are executed for too long, you can configure these two parameters in your application's connection string. For example, set sessionVariables=wait_timeout=3600
(1 hour) and sessionVariables=max_execution_time=300000
(5 minutes).