We’ve already talked about Debezium in a previous article, so you can check that one if you would like to set up a connector or learn more about it. I am assuming that if you are reading this, you are well aware of Debezium, and you already have your feet deep in it, looking at how to make it work! 🥲

Common errors

Here are common errors encountered while using the Debezium MySQL connector; some resolutions and error messages may also apply to other connector types.

1. Debezium configuration

When using a MySQL connector, you will need a user with quite a few privileges on the MySQL database for your Debezium connector. Depending on the MySQL database type and where it’s hosted, those privileges may differ slightly, but enough to push a working connector into failure.

GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'debezium'@'%';

That’s the recommendation from Debezium, but with Amazon Aurora on AWS, you will also need:

GRANT LOCK TABLES ON database.* TO 'debezium'@'%';

You can check the grants at the MySQL prompt:

mysql> show grants for debezium;
+--------------------------------------------------------------------------+
| Grants for debezium@%                                                    |
+--------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'debezium'@'%'                                     |
| GRANT LOCK TABLES ON database.* TO 'debezium'@'%'                        |
| GRANT SELECT, INSERT, UPDATE, DELETE ON database.* TO 'debezium'@'%'     |
+--------------------------------------------------------------------------+

And with that, the debezium user should be set up and ready to be used in the connector’s configuration, which should look like this:

database.user="debezium"
database.password="********"
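
For context, here is a minimal sketch of a connector registration payload using that user. The hostname, server id, and topic names are placeholders, and the property names follow Debezium 2.x:

{
  "name": "my-mysql-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.example.com",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "********",
    "database.server.id": "184054",
    "topic.prefix": "example",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-history.example"
  }
}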

2. SQL Database configuration issues

This one should be pretty self-explanatory, since the error message gives you the solution:

[ERROR] io.debezium.DebeziumException: The MySQL server is not configured to use a ROW binlog_format, which is required for this connector to work properly. 
Change the MySQL configuration to use a binlog_format=ROW and restart the connector.

This is not a connector issue, just some SQL configuration to set. You can check it using this command:

mysql> show global variables like 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+

You can use the SET keyword to update its value if it doesn’t match ROW in your case.
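
On a self-managed MySQL server, the change is a single statement (on managed services like Amazon RDS or Aurora, you would set it through the parameter group instead):

mysql> SET GLOBAL binlog_format = 'ROW';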

3. Connector Configuration issues

Debezium’s configuration has evolved considerably from the first releases to the more current ones. Some help online might be using an older version’s property names, which end up being ignored by the connector.

Among the database settings for MySQL, it’s the timezone property that changed:

# Old notation
database.serverTimezone="UTC"
# Newer notation
database.connectionTimeZone="UTC"

For these, if you use the wrong one, you should see an error message, which makes it easier to troubleshoot. The history topic’s configuration, however, may just get ignored: the connector won’t work or connect, without giving you more information, which opens the lid on many possible root causes:

# Old notation from 1.9
database.history.kafka.topic
# New notation from v2
schema.history.internal.kafka.topic

When in doubt, check the documentation matching your Debezium version.

4. Kafka Configuration issues

For the history topic, you will need a specific Kafka configuration for your connector to run smoothly.

If it’s not configured well, you may encounter one of these error messages upon triggering an ad-hoc snapshot:

  • The db history topic is missing
    [WARN] Database schema history was not found but was expected
    [ERROR] Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted
    [ERROR] The db history topic is missing. You may attempt to recover it by reconfiguring the connector to SCHEMA_ONLY_RECOVERY
    
  • The db history topic or its content is fully or partially missing
    [ERROR] The db history topic or its content is fully or partially missing. Please check database schema history topic configuration and re-execute the snapshot.
    

Those are generated by the MySqlConnectorTask and are related to the history topic. Sometimes setting the snapshot.mode to schema_only_recovery may help restore the schema if lost, but that’s not a long-term solution.
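
If you go that route, it’s a one-line addition to the connector’s configuration (property value as of Debezium 2.x; recent versions renamed it):

{
  "config": {
    "snapshot.mode": "schema_only_recovery"
  }
}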

The recommendation for the configuration of the Debezium history topic is to have infinite retention:

  • Infinite (or very long) retention, and no compaction!
    • With retention.ms: -1 and retention.bytes: -1
  • Replication factor of at least 3 for production
    • With replication-factor: 3 (1 is fine for development)
  • A single partition
    • With partitions: 1
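
As a sketch, creating such a topic with the stock Kafka CLI could look like this (topic name and bootstrap server are placeholders):

kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic schema-history.example \
  --partitions 1 \
  --replication-factor 3 \
  --config retention.ms=-1 \
  --config retention.bytes=-1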

Infinite seems scary, but the connector doesn’t send that many messages to the history topic. The history topic is there to guarantee that the connector has a lifetime record of all the schema changes applicable to the tables of the source database it’s capturing changes from.

Depending on what you use to manage Kafka, updating the topic may be done differently, but it should show this in the end:

       Name       | Value | Read-Only
------------------+-------+------------
  retention.bytes |    -1 | false
  retention.ms    |    -1 | false
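
With the stock CLI, the update could be done like this (again, topic name and broker address are placeholders):

kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name schema-history.example \
  --alter --add-config retention.ms=-1,retention.bytes=-1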

A database’s schema can change over time, so the connector needs to be able to identify what the table’s schema was when we get a change event. To process old events, the connector uses the state of the schema stored in the history topic to correctly process those changes. When the connector restarts, it begins processing changes from where it left off, and uses the history topic to rebuild the table structure in memory.
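
If you’re curious, you can peek at what the connector stored by reading the topic from the beginning (topic name is a placeholder):

kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic schema-history.example --from-beginning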

5. Inactive connector idling

If the Debezium MySQL connector stops for too long, the MySQL server purges older binlog files and the connector’s last position may be lost. When the connector is restarted, the MySQL server no longer has the starting point and the connector performs another initial snapshot. If the snapshot is disabled, the connector fails with an error:

[ERROR] Connector requires binlog file 'mysql-bin.001134', but MySQL only has mysql-bin.001255, mysql-bin.001256
[ERROR] The connector is trying to read binlog starting at binlog file 'mysql-bin.001134', pos=1562348, 
skipping 2550 events plus 95 rows, but this is no longer available on the server. Reconfigure the connector to use a snapshot when needed.

Here it’s trying to read binlog file 001134, but the oldest one still available on the server is 001255, so it can’t proceed. Follow the error message and use the snapshot mode "snapshot.mode": "when_needed". This should help the connector restart by having it create a new snapshot of the database to start the process again.

If it doesn’t work, or loops into another error message, you can follow the fix in this article, or create a new connector with the same configuration but a new name, like myConnectorV2.

6. Binlog processing error

Sometimes the connector stops working and shows as UNASSIGNED, or gets stuck not producing any records while still marked as RUNNING. If the connector wasn’t idling like in the previous example, the error might be due to a change of database or host:

[ERROR] Error during binlog processing. Last offset stored = null, binlog reader near position = mysql-bin.000356/15296051
[ERROR] Error during binlog processing. Last offset stored = {transaction_id=null, file=mysql-bin-changelog.000343, pos=0, server_id=1234, event=1}, binlog reader near position = mysql-bin-changelog.000356/15296051

In this case, you can update the server id database.server.id with the server id of the MySQL server, or a new number different from the previous one. Use the snapshot mode "snapshot.mode": "when_needed" to create a new snapshot.
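
Put together, it’s a small configuration update (the server id value here is just an example):

{
  "config": {
    "database.server.id": "223345",
    "snapshot.mode": "when_needed"
  }
}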

If the host change happened while Debezium was reading the binlog, you may find errors like the one below. Applying the same fix as before should also work.

binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin-changelog.001675' at 6478, 
the last event read from '/rdsdbdata/log/binlog/mysql-bin-changelog.001675' at 125, the last byte read from '/rdsdbdata/log/binlog/mysql-bin-changelog.001675' at 6497. 
Error code: 1236; SQLSTATE: HY000.

If you have direct access to Kafka, you could try to manually update the offset following the Debezium documentation, using the offset.storage.topic, which has the binlog offset in it (this topic is usually created by default). You can also try snapshot.select.statement.overrides to create a smaller snapshot:

{
  "config": {
    "snapshot.select.statement.overrides": "example.table",
    "snapshot.select.statement.overrides.example.table": "SELECT * FROM example.table WHERE created_at > '2023-10-31'"
  }
}

The snapshot override lets you control the size of the snapshot. In the above example, during the snapshot we’ll be creating Debezium events only for the rows whose creation date is after Halloween 🎃, instead of all rows.

Debezium doesn’t seem to be built for blue/green deployments, and I didn’t see a viable solution to avoid the problem automatically via configuration (like an auto-skip for the binlog offset?). So if none of the above works, you can create a new connector with the same config but a different name, and delete the previous one.

7. Broken Schema

Sometimes, the received schema will be broken, or won’t be deserialized or processed correctly by the connector. The default behaviour is for the connector to fail, but you can also ignore the problematic events. In my case, I had set it to log a warning and proceed, which creates this type of error message:

[WARN] The connector is trying to read binlog starting at 'SourceInfo [currentGtid=null, currentBinlogFilename=mysql-bin-changelog.277838, currentBinlogPosition=594067, currentRowNumber=0, serverId=0, sourceTime=null, threadId=-1, currentQuery=null, tableIds=[], databaseName=null]', but this is no longer available on the server. 
Forcing the snapshot execution as it is allowed by the configuration.

It could be due to some DDL (Data Definition Language, the subset of SQL used to define or modify the structure of the database, like DROP, CREATE TABLE, TRUNCATE, …). Those statements may or may not be related to the table the connector is capturing data from.

Here is the configuration I have:

{
  "config": {
    "errors.log.enable": "true",
    "errors.log.include.messages": "true",
    "event.processing.failure.handling.mode": "warn",
    "inconsistent.schema.handling.mode": "warn",
    "schema.history.internal.skip.unparseable.ddl": "true",
    "schema.history.internal.store.only.captured.tables.ddl": "true",
  }
}

This enables logging with warnings for any failure, while skipping any problematic DDL. We only keep the DDL related to the captured tables, to reduce noise from the rest of the database. Since DDL is used to manage the schema of a database, the bigger the database, the more likely those problems will arise.

8. Broken connector on Kafka

The connector may stop working for no apparent reason, throwing an error showing:

[ERROR] 2023-05-09 17:51:24,934 [task-thread-<connector-name>-0] org.apache.kafka.connect.runtime.WorkerTask doRun -
WorkerSourceTask{id=<connector-name>-0} Task threw an uncaught and unrecoverable exception.
Task is being killed and will not recover until manually restarted

The error is thrown by a Kafka Connect library, which means the issue is probably within the framework itself. In those cases, you may need to investigate more than the Debezium logs, and look for any suspicious errors happening at the same time.

It could be that the configuration used to connect to Kafka to produce the Debezium events is wrong, and that the connector can’t reach it. Make sure that the configuration is correct for both producer and consumer, including for the history topic:

{
  "config": {
    "schema.history.internal.producer.security.protocol": "SASL_SSL",
    "schema.history.internal.producer.sasl.mechanism": "PLAIN",
    "schema.history.internal.producer.sasl.jaas.config": "${sasl_jaas_config}",
    "schema.history.internal.producer.ssl.endpoint.identification.algorithm": "https",
    "schema.history.internal.consumer.security.protocol": "SASL_SSL",
    "schema.history.internal.consumer.sasl.mechanism": "PLAIN",
    "schema.history.internal.consumer.sasl.jaas.config": "${sasl_jaas_config}",
    "schema.history.internal.consumer.ssl.endpoint.identification.algorithm": "https"
  }
}

This is an example using JAAS (Java Authentication and Authorization Service); with the plain text mechanism, the config should look like:

org.apache.kafka.common.security.plain.PlainLoginModule required \
   username="*******" \
   password="*******";
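
To rule out a connectivity problem, one quick sanity check is to produce to a test topic with the same credentials from the machine running Kafka Connect (client.properties is a hypothetical file holding the SASL settings shown above):

kafka-console-producer.sh --bootstrap-server broker.example.com:9092 \
   --producer.config client.properties --topic connectivity-test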

If the configuration is correct and the error persists, you may need to experiment with different settings.

Last solution: Create a new connector with the same config but a different name

And that concludes it for now; hopefully I didn’t miss the one you are encountering right now. If I did, feel free to leave some logs in the comments, or reach out to the good folks at Debezium for some knowledge. They have a Google group, and that’s where you’ll find the best insights from people using it in production.