Migration Task Precheck

Before using DM to migrate data from upstream to downstream, a precheck helps detect errors in the upstream database configurations and ensures that the migration goes smoothly. This document introduces the DM precheck feature, including its usage scenario, check items, and arguments.

Usage scenario

To run a data migration task smoothly, DM triggers a precheck automatically at the start of the task and returns the check results. DM starts the migration only after the precheck is passed.

To trigger a precheck manually, run the check-task command.

For example:

tiup dmctl check-task ./task.yaml

Descriptions of check items

After a precheck is triggered for a task, DM checks the corresponding items according to your migration mode configuration.

This section lists all the precheck items.

  • If a mandatory check item does not pass, DM returns an error after the check and does not proceed with the migration task. In this case, modify the configurations according to the error message and retry the task after meeting the precheck requirements.

  • If a non-mandatory check item does not pass, DM returns a warning after the check. DM automatically starts a migration task if the check result contains only warnings but no errors.
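The pass/fail rule described above can be sketched in a few lines. This is only an illustration of the rule, not DM's implementation: the CheckResult type and the severity labels "success", "warning", and "error" are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """One precheck item outcome. The field names are hypothetical, not DM's API."""
    name: str
    severity: str  # "success", "warning", or "error" (labels assumed here)


def can_start_migration(results):
    """Warnings alone do not block the task; any error does."""
    return not any(r.severity == "error" for r in results)


results = [
    CheckResult("table_schema", "warning"),  # for example, a foreign key was found
    CheckResult("version", "success"),
]
print(can_start_migration(results))
```

With only a warning in the list, the function reports that the task can start. Adding a single result with severity "error" flips the outcome.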

Common check items

Regardless of the migration mode you choose, the precheck always includes the following common check items:

  • Database version

    • MySQL version > 5.5

    • MariaDB version >= 10.1.2

  • Compatibility of the upstream MySQL table schema

    • Check whether the upstream tables have foreign keys, which are not supported by TiDB. A warning is returned if a foreign key is found in the precheck.

    • Check whether the upstream tables use character sets that are incompatible with TiDB. For more information, see TiDB Supported Character Sets.

    • Check whether the upstream tables have primary key constraints or unique key constraints (introduced in v1.0.7).
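As a rough illustration of the database version thresholds listed above, the following sketch compares a version string against them. It is not DM's code. It assumes that the string starts with the real version number (for example, "5.7.26-log" or "10.1.2-MariaDB"), that MariaDB can be recognized by the word "MariaDB" in the string, and that "MySQL version > 5.5" means 5.6 or later. The function names are hypothetical.

```python
import re


def parse_version(text):
    """Extract the leading numeric components, e.g. '10.1.2-MariaDB' -> (10, 1, 2)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def version_supported(version_string):
    """Apply the thresholds from the list above (interpretation assumed)."""
    version = parse_version(version_string)
    if "mariadb" in version_string.lower():
        return version >= (10, 1, 2)
    # Assumption: "> 5.5" is compared on major.minor, so any 5.5.x is rejected.
    return version[:2] > (5, 5)


for candidate in ("5.7.26-log", "5.5.62", "10.1.2-MariaDB", "10.0.38-MariaDB"):
    print(candidate, version_supported(candidate))
```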

Check items for full data migration

For the full data migration mode (task-mode: full), in addition to the common check items, the precheck also includes the following check items:

  • (Mandatory) dump permission of the upstream database

    • SELECT permission on INFORMATION_SCHEMA and dump tables
    • RELOAD permission if consistency=flush
    • LOCK TABLES permission on the dump tables if consistency=flush or consistency=lock
  • (Mandatory) Consistency of upstream MySQL multi-instance sharded tables

    • In the pessimistic mode, check whether the table schemas of all sharded tables are consistent in the following items:

      • Number of columns
      • Column name
      • Column order
      • Column type
      • Primary key
      • Unique index
    • In the optimistic mode, check whether the schemas of all sharded tables meet the compatibility requirements of the optimistic mode.

    • If a migration task was started successfully by the start-task command, the precheck of this task skips the consistency check.

  • Auto-increment primary key in sharded tables

Check items for physical import

If you set import-mode: "physical" in the task configuration, the following check items are added to ensure that physical import runs normally. If you cannot meet the requirements of these check items after following the prompts, you can use the logical import mode to import data instead.

  • Empty Regions in the downstream database

    • If the number of empty Regions is greater than max(1000, 3 * the number of tables), that is, the larger of 1000 and three times the number of tables, the precheck returns a warning. You can adjust related PD parameters to speed up the merging of empty Regions and wait for the number of empty Regions to decrease. See PD Scheduling Best Practices - Slow Region Merge.
  • Region distribution in the downstream database

    • Checks the number of Regions on each TiKV node. Let a be the Region count of the TiKV node with the fewest Regions and b the Region count of the TiKV node with the most Regions. If a / b is less than 0.75, the precheck returns a warning. You can adjust related PD parameters to speed up the scheduling of Regions and wait for the number of Regions to change. See PD Scheduling Best Practices - Leader/Region distribution is not balanced.
  • The versions of TiDB, PD, and TiKV in the downstream database

    • Physical import must call the interfaces of TiDB, PD, and TiKV. If the versions are not compatible, the precheck returns an error.
  • The free space of the downstream database

    • Estimates the total size of all tables in the allow list in the upstream database (source_size). If the free space of the downstream database is less than source_size, the precheck returns an error. If the free space of the downstream database is less than the number of TiKV replicas * source_size * 2, the precheck returns a warning.
  • Whether the downstream database is running tasks that are incompatible with physical import

    • Currently, physical import is incompatible with TiCDC and PITR tasks. If these tasks are running in the downstream database, the precheck returns an error.
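The numeric thresholds in the physical import checks above can be worked through with a small sketch. It only restates the arithmetic from this section and is not how DM collects or evaluates these values. The function names, the result labels, and the use of a single unit (for example, bytes) for all sizes are assumptions made for illustration.

```python
def empty_region_warning(empty_regions, table_count):
    """Warn when empty Regions exceed max(1000, 3 * number of tables)."""
    return empty_regions > max(1000, 3 * table_count)


def region_distribution_warning(region_counts):
    """Warn when lowest / highest Region count across TiKV nodes is below 0.75."""
    lowest, highest = min(region_counts), max(region_counts)
    # Guard against division by zero when no node holds any Region.
    return highest > 0 and lowest / highest < 0.75


def free_space_result(free_space, source_size, replicas):
    """Return "error", "warning", or "success" (labels assumed for this sketch)."""
    if free_space < source_size:
        return "error"
    if free_space < replicas * source_size * 2:
        return "warning"
    return "success"


# 1200 empty Regions with 100 tables: the threshold is max(1000, 300) = 1000.
print(empty_region_warning(1200, 100))
# Nodes holding 70, 90, and 100 Regions: 70 / 100 = 0.7, which is below 0.75.
print(region_distribution_warning([70, 90, 100]))
# 500 units free, 100 units of source data, 3 replicas: 100 <= 500 < 600.
print(free_space_result(500, 100, 3))
```

In the last call, the free space covers source_size but not the number of replicas * source_size * 2, so the sketch reports a warning rather than an error.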

Check items for incremental data migration

For the incremental data migration mode (task-mode: incremental), in addition to the common check items, the precheck also includes the following check items:

  • (Mandatory) Upstream database REPLICATION permission

    • REPLICATION CLIENT permission
    • REPLICATION SLAVE permission
  • Database primary-secondary configuration

    • To avoid primary-secondary replication failures, it is recommended that you specify server_id (the database ID) for the upstream database. For non-AWS Aurora environments, GTID is recommended.
  • (Mandatory) MySQL binlog configuration

    • Check whether binlog is enabled (required by DM).
    • Check whether binlog_format=ROW is configured (DM only supports the migration of binlog in the ROW format).
    • Check whether binlog_row_image=FULL is configured (DM only supports binlog_row_image=FULL).
    • If binlog_do_db or binlog_ignore_db is configured, check whether the database tables to be migrated meet the conditions of binlog_do_db and binlog_ignore_db.
  • (Mandatory) Check whether the upstream database is in an online-DDL process (in which the ghost table is created but the rename phase is not executed yet). If the upstream is in an online-DDL process, the precheck returns an error. In this case, wait until the DDL completes and then retry.
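Before you run the precheck, you can compare the upstream binlog settings with the requirements above yourself. The following sketch assumes that you have already collected the relevant MySQL system variables (for example, from SHOW VARIABLES) into a dictionary. Using the log_bin variable to represent "binlog is enabled" is an assumption of this sketch, and check_binlog_settings is a hypothetical helper, not part of DM.

```python
def check_binlog_settings(variables):
    """Return a list of problems found in a mapping of MySQL variable names to values."""
    expected = {
        "log_bin": "ON",             # assumed indicator that binlog is enabled
        "binlog_format": "ROW",
        "binlog_row_image": "FULL",
    }
    problems = []
    for name, want in expected.items():
        got = str(variables.get(name, "")).upper()
        if got != want:
            problems.append(f"{name} is {got or 'unset'}, expected {want}")
    return problems


upstream = {"log_bin": "ON", "binlog_format": "MIXED", "binlog_row_image": "FULL"}
for problem in check_binlog_settings(upstream):
    print(problem)
```

The sketch does not cover binlog_do_db, binlog_ignore_db, or replication privileges, which the precheck also examines.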

Check items for full and incremental data migration

For the full and incremental data migration mode (task-mode: all), in addition to the common check items, the precheck also includes the full data migration check items and the incremental data migration check items.

Ignorable check items

Prechecks can find potential risks in your environments. It is not recommended to ignore check items. If your data migration task has special needs, you can use the ignore-checking-items configuration item to skip some check items.

| Check item | Description |
| --- | --- |
| dump_privilege | Checks the dump privilege of the user in the upstream MySQL instance. |
| replication_privilege | Checks the replication privilege of the user in the upstream MySQL instance. |
| version | Checks the version of the upstream database. |
| server_id | Checks whether server_id is configured in the upstream database. |
| binlog_enable | Checks whether binlog is enabled in the upstream database. |
| table_schema | Checks the compatibility of the table schemas in the upstream MySQL tables. |
| schema_of_shard_tables | Checks the consistency of the table schemas in the upstream MySQL multi-instance shards. |
| auto_increment_ID | Checks whether the auto-increment primary key conflicts in the upstream MySQL multi-instance shards. |
| online_ddl | Checks whether the upstream is in the process of online-DDL. |
| empty_region | Checks the number of empty Regions in the downstream database for physical import. |
| region_distribution | Checks the distribution of Regions in the downstream database for physical import. |
| downstream_version | Checks the versions of TiDB, PD, and TiKV in the downstream database. |
| free_space | Checks the free space of the downstream database. |
| downstream_mutex_features | Checks whether the downstream database is running tasks that are incompatible with physical import. |
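Because the item names are plain strings, a typo in ignore-checking-items is easy to make. The following sketch checks a list of names against the items listed in this section. The helper name unknown_check_items is hypothetical, the comparison is case-sensitive by assumption, and DM might accept values that are not listed here, so treat a reported name as something to double-check rather than as a definite error.

```python
# Item names copied from the table in this section.
KNOWN_CHECK_ITEMS = {
    "dump_privilege",
    "replication_privilege",
    "version",
    "server_id",
    "binlog_enable",
    "table_schema",
    "schema_of_shard_tables",
    "auto_increment_ID",
    "online_ddl",
    "empty_region",
    "region_distribution",
    "downstream_version",
    "free_space",
    "downstream_mutex_features",
}


def unknown_check_items(items):
    """Return the names that do not appear in the table, preserving input order."""
    return [item for item in items if item not in KNOWN_CHECK_ITEMS]


# "auto_increment_id" differs from the listed "auto_increment_ID" only by case.
print(unknown_check_items(["version", "auto_increment_id", "free_space"]))
```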

Configure precheck arguments

The migration task precheck supports parallel processing. Even when the number of rows in sharded tables reaches the millions, the precheck can be completed in minutes.

To specify the number of threads for the precheck, you can configure the threads argument of the mydumpers field in the migration task configuration file.

mydumpers:                           # Configuration arguments of the dump processing unit
  global:                            # Configuration name
    # The number of threads that access the upstream when the dump processing unit performs the precheck and exports data from the upstream database (4 by default)
    threads: 4
    # The size of the files generated by the dump processing unit (64 MB by default)
    chunk-filesize: 64
    # Other arguments of the dump processing unit. You do not need to manually configure table-list in `extra-args`, because it is automatically generated by DM.
    extra-args: "--consistency none"
