<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Kafka Connect on SafetyWing Runbooks</title><link>https://runbooks.safetywing.dev/runbooks/kafka-connect/</link><description>Recent content in Kafka Connect on SafetyWing Runbooks</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://runbooks.safetywing.dev/runbooks/kafka-connect/index.xml" rel="self" type="application/rss+xml"/><item><title>KafkaConnectFailedTasks</title><link>https://runbooks.safetywing.dev/runbooks/kafka-connect/kafkaconnectfailedtasks/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://runbooks.safetywing.dev/runbooks/kafka-connect/kafkaconnectfailedtasks/</guid><description>&lt;h2 id="meaning"&gt;Meaning&lt;a class="anchor" href="#meaning"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One or more Debezium CDC connector tasks have entered the &lt;code&gt;FAILED&lt;/code&gt; state, so change capture for the affected connector is degraded or fully stopped. Fires when:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-promql" data-lang="promql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;max&lt;/span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;kafka_connect_worker_metrics_connector_failed_task_count{namespace&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&amp;#34;&lt;span style="color:#e6db74"&gt;safetywing-&amp;lt;env&amp;gt;-infra&lt;/span&gt;&amp;#34;}&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;for: 10m&lt;/code&gt;, severity &lt;code&gt;page&lt;/code&gt;, tier &lt;code&gt;component&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id="impact"&gt;Impact&lt;a class="anchor" href="#impact"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;CDC from MOCO MySQL into Kafka is stalled for the failed connector. Downstream consumers stop receiving database changes: search indices fall behind, derived/mirror tables go stale, and any event-driven flow fed by these topics no longer reflects new writes. Lag grows until the task is recovered.&lt;/p&gt;</description></item><item><title>KafkaConnectNoConnectors</title><link>https://runbooks.safetywing.dev/runbooks/kafka-connect/kafkaconnectnoconnectors/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://runbooks.safetywing.dev/runbooks/kafka-connect/kafkaconnectnoconnectors/</guid><description>&lt;h2 id="meaning"&gt;Meaning&lt;a class="anchor" href="#meaning"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The Kafka Connect cluster is running but reports zero connectors, meaning no Debezium CDC source connector is deployed. Either CDC was never configured for the environment or all connectors have been removed. Fires when:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-promql" data-lang="promql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;max&lt;/span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;kafka_connect_worker_metrics_connector_count{namespace&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&amp;#34;&lt;span style="color:#e6db74"&gt;safetywing-&amp;lt;env&amp;gt;-infra&lt;/span&gt;&amp;#34;}&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;for: 15m&lt;/code&gt;, severity &lt;code&gt;ticket&lt;/code&gt;, tier &lt;code&gt;component&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id="impact"&gt;Impact&lt;a class="anchor" href="#impact"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;No change capture is happening in this environment: nothing flows from MOCO MySQL into Kafka. Downstream consumers (search indices, mirror/derived tables, event-driven flows) receive nothing new. For a freshly provisioned environment this may be expected during bring-up; for an established environment it means CDC is silently broken.&lt;/p&gt;</description></item><item><title>KafkaConnectWorkersDown</title><link>https://runbooks.safetywing.dev/runbooks/kafka-connect/kafkaconnectworkersdown/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://runbooks.safetywing.dev/runbooks/kafka-connect/kafkaconnectworkersdown/</guid><description>&lt;h2 id="meaning"&gt;Meaning&lt;a class="anchor" href="#meaning"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Fewer Kafka Connect workers are reporting metrics than the number of replicas the &lt;code&gt;kafka-cdc&lt;/code&gt; chart expects, indicating that one or more Connect pods are down, crash-looping, or not being scraped. Fires when:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-promql" data-lang="promql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;kafka_connect_worker_metrics_connector_count{namespace&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&amp;#34;&lt;span style="color:#e6db74"&gt;safetywing-&amp;lt;env&amp;gt;-infra&lt;/span&gt;&amp;#34;}&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;connect&lt;span style="color:#960050;background-color:#1e0010"&gt;.&lt;/span&gt;replicas&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;for: 5m&lt;/code&gt;, severity &lt;code&gt;page&lt;/code&gt;, tier &lt;code&gt;component&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id="impact"&gt;Impact&lt;a class="anchor" href="#impact"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Reduced Connect capacity and resilience for the CDC pipeline. Tasks owned by the missing worker are rebalanced onto the surviving workers, which adds load and can reduce throughput and increase lag. If the cluster runs only one replica, CDC from MOCO MySQL into Kafka stops entirely and downstream consumers (search indices, mirror tables, event flows) stop receiving DB changes.&lt;/p&gt;</description></item></channel></rss>