Alert Catalog#
Every alert evaluated across SafetyWing clusters: 29 custom alerts (component, environment, and platform tiers, owned by us) and 133 stock alerts (kube-prometheus-stack defaults). Custom alerts link to the runbook on this site; stock alerts link to the upstream prometheus-operator runbooks.
Generated from the live hetzner rule set and the infra-charts/cluster-monitors sources. Stock alerts are identical across clusters; custom alerts are deployed only in the environments and clusters where the chart is enabled.
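As a rough illustration of how a catalog like this could be produced from a live rule set, the sketch below groups alerting rules by rule group and renders one table per group. It assumes the input is already-parsed JSON in the shape returned by the Prometheus `/api/v1/rules` endpoint; `render_catalog` and the sample data are hypothetical and are not the generator actually used for this page.

```python
from collections import defaultdict


def render_catalog(rules_response: dict) -> str:
    """Render alerting rules, grouped by rule group, as Markdown tables."""
    groups = defaultdict(list)
    for group in rules_response["data"]["groups"]:
        for rule in group["rules"]:
            if rule.get("type") != "alerting":
                continue  # recording rules are not part of the catalog
            severity = rule.get("labels", {}).get("severity", "")
            runbook_url = rule.get("annotations", {}).get("runbook_url")
            # Alerts without a runbook annotation get a placeholder cell.
            link = f"[runbook]({runbook_url})" if runbook_url else "-"
            groups[group["name"]].append((rule["name"], severity, link))

    lines = []
    for group_name in sorted(groups):
        lines.append(f"## {group_name}")
        lines.append("")
        lines.append("| Alert | Severity | Runbook |")
        lines.append("|---|---|---|")
        for name, severity, link in sorted(groups[group_name]):
            lines.append(f"| {name} | {severity} | {link} |")
        lines.append("")
    return "\n".join(lines)


# Hypothetical sample input: one alerting rule and one recording rule.
sample = {
    "status": "success",
    "data": {
        "groups": [
            {
                "name": "etcd",
                "rules": [
                    {
                        "type": "alerting",
                        "name": "etcdNoLeader",
                        "labels": {"severity": "critical"},
                        "annotations": {},
                    },
                    {"type": "recording", "name": "example:recording:rule"},
                ],
            }
        ]
    },
}

if __name__ == "__main__":
    print(render_catalog(sample))
```

Sorting by group name and then by alert name keeps the output stable between runs, so regenerating the page only produces a diff when the rule set itself changes.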
SafetyWing custom alerts#
Kafka (component tier)#
| Alert | Severity | Runbook |
|---|---|---|
| KafkaOfflinePartitions | page | runbook |
| KafkaNoActiveController | page | runbook |
| KafkaUnderReplicatedPartitions | ticket | runbook |
| KafkaConsumerGroupLagHigh | ticket | runbook |
Kafka Connect (component tier)#
| Alert | Severity | Runbook |
|---|---|---|
| KafkaConnectFailedTasks | page | runbook |
| KafkaConnectWorkersDown | page | runbook |
| KafkaConnectNoConnectors | ticket | runbook |
MySQL (component tier)#
| Alert | Severity | Runbook |
|---|---|---|
| MysqlInstanceDown | page | runbook |
| MysqlConnectionsSaturated | ticket | runbook |
| MysqlReplicationLagHigh | ticket | runbook |
| MysqlDiskFillingUp | ticket | runbook |
RabbitMQ (component tier)#
| Alert | Severity | Runbook |
|---|---|---|
| RabbitmqNodeDown | page | runbook |
| RabbitmqMemoryAlarm | page | runbook |
| RabbitmqDiskAlarm | page | runbook |
| RabbitmqQueueBacklog | ticket | runbook |
| RabbitmqQueueNoConsumers | ticket | runbook |
Elasticsearch#
| Alert | Severity | Runbook |
|---|---|---|
| ElasticsearchClusterRed | page | runbook |
| ElasticsearchClusterYellow | ticket | runbook |
| ElasticsearchHeapHigh | ticket | runbook |
| ElasticsearchDiskWatermark | ticket | runbook |
Node#
| Alert | Severity | Runbook |
|---|---|---|
| NodeFilesystemAlmostFull | ticket | runbook |
Traefik#
| Alert | Severity | Runbook |
|---|---|---|
| TraefikDown | page | runbook |
| TraefikHigh5xxRate | ticket | runbook |
Environment (environment tier)#
| Alert | Severity | Runbook |
|---|---|---|
| EnvironmentHigh5xxRate | ticket | runbook |
Stock alerts (kube-prometheus-stack)#
Shipped by the kube-prometheus-stack defaultRules and documented upstream; the runbook links point there.
alertmanager.rules#
| Alert | Severity | Runbook |
|---|---|---|
| AlertmanagerClusterCrashlooping | critical | runbook |
| AlertmanagerClusterDown | critical | runbook |
| AlertmanagerClusterFailedToSendAlerts | critical | runbook |
| AlertmanagerConfigInconsistent | critical | runbook |
| AlertmanagerFailedReload | critical | runbook |
| AlertmanagerFailedToSendAlerts | warning | runbook |
| AlertmanagerMembersInconsistent | critical | runbook |
config-reloaders#
| Alert | Severity | Runbook |
|---|---|---|
| ConfigReloaderSidecarErrors | warning | runbook |
etcd#
| Alert | Severity | Runbook |
|---|---|---|
| etcdDatabaseHighFragmentationRatio | warning | runbook |
| etcdDatabaseQuotaLowSpace | critical | — |
| etcdExcessiveDatabaseGrowth | warning | — |
| etcdGRPCRequestsSlow | critical | — |
| etcdHighCommitDurations | warning | — |
| etcdHighFsyncDurations | warning | — |
| etcdHighNumberOfFailedGRPCRequests | warning | — |
| etcdHighNumberOfFailedProposals | warning | — |
| etcdHighNumberOfLeaderChanges | warning | — |
| etcdInsufficientMembers | critical | — |
| etcdMemberCommunicationSlow | warning | — |
| etcdMembersDown | warning | — |
| etcdNoLeader | critical | — |
general.rules#
kube-apiserver-slos#
| Alert | Severity | Runbook |
|---|---|---|
| KubeAPIErrorBudgetBurn | critical | runbook |
kube-state-metrics#
| Alert | Severity | Runbook |
|---|---|---|
| KubeStateMetricsListErrors | critical | runbook |
| KubeStateMetricsShardingMismatch | critical | runbook |
| KubeStateMetricsShardsMissing | critical | runbook |
| KubeStateMetricsWatchErrors | critical | runbook |
kubernetes-apps#
| Alert | Severity | Runbook |
|---|---|---|
| KubeContainerWaiting | warning | runbook |
| KubeDaemonSetMisScheduled | warning | runbook |
| KubeDaemonSetNotScheduled | warning | runbook |
| KubeDaemonSetRolloutStuck | warning | runbook |
| KubeDeploymentGenerationMismatch | warning | runbook |
| KubeDeploymentReplicasMismatch | warning | runbook |
| KubeDeploymentRolloutStuck | warning | runbook |
| KubeHpaMaxedOut | warning | runbook |
| KubeHpaReplicasMismatch | warning | runbook |
| KubeJobFailed | warning | runbook |
| KubeJobNotCompleted | warning | runbook |
| KubePdbNotEnoughHealthyPods | warning | runbook |
| KubePodCrashLooping | warning | runbook |
| KubePodNotReady | warning | runbook |
| KubeStatefulSetGenerationMismatch | warning | runbook |
| KubeStatefulSetReplicasMismatch | warning | runbook |
| KubeStatefulSetUpdateNotRolledOut | warning | runbook |
kubernetes-resources#
kubernetes-storage#
| Alert | Severity | Runbook |
|---|---|---|
| KubePersistentVolumeErrors | critical | runbook |
| KubePersistentVolumeFillingUp | critical | runbook |
| KubePersistentVolumeInodesFillingUp | critical | runbook |
kubernetes-system#
| Alert | Severity | Runbook |
|---|---|---|
| KubeClientErrors | warning | runbook |
| KubeVersionMismatch | warning | runbook |
kubernetes-system-apiserver#
| Alert | Severity | Runbook |
|---|---|---|
| KubeAPIDown | critical | runbook |
| KubeAPITerminatedRequests | warning | runbook |
| KubeAggregatedAPIDown | warning | runbook |
| KubeAggregatedAPIErrors | warning | runbook |
| KubeClientCertificateExpiration | warning | runbook |
kubernetes-system-controller-manager#
| Alert | Severity | Runbook |
|---|---|---|
| KubeControllerManagerDown | critical | runbook |
kubernetes-system-kube-proxy#
| Alert | Severity | Runbook |
|---|---|---|
| KubeProxyDown | critical | runbook |
kubernetes-system-kubelet#
| Alert | Severity | Runbook |
|---|---|---|
| KubeNodeEviction | info | runbook |
| KubeNodeNotReady | warning | runbook |
| KubeNodePressure | info | runbook |
| KubeNodeReadinessFlapping | warning | runbook |
| KubeNodeUnreachable | warning | runbook |
| KubeletClientCertificateExpiration | warning | runbook |
| KubeletClientCertificateRenewalErrors | warning | runbook |
| KubeletDown | critical | runbook |
| KubeletPlegDurationHigh | warning | runbook |
| KubeletPodStartUpLatencyHigh | warning | runbook |
| KubeletServerCertificateExpiration | warning | runbook |
| KubeletServerCertificateRenewalErrors | warning | runbook |
| KubeletTooManyPods | info | runbook |
kubernetes-system-scheduler#
| Alert | Severity | Runbook |
|---|---|---|
| KubeSchedulerDown | critical | runbook |
node-exporter#
| Alert | Severity | Runbook |
|---|---|---|
| NodeBondingDegraded | warning | runbook |
| NodeCPUHighUsage | info | runbook |
| NodeClockNotSynchronising | warning | runbook |
| NodeClockSkewDetected | warning | runbook |
| NodeDiskIOSaturation | warning | runbook |
| NodeFileDescriptorLimit | warning | runbook |
| NodeFilesystemAlmostOutOfFiles | warning | runbook |
| NodeFilesystemAlmostOutOfSpace | warning | runbook |
| NodeFilesystemFilesFillingUp | warning | runbook |
| NodeFilesystemSpaceFillingUp | warning | runbook |
| NodeHighNumberConntrackEntriesUsed | warning | runbook |
| NodeMemoryHighUtilization | warning | runbook |
| NodeMemoryMajorPagesFaults | warning | runbook |
| NodeNetworkReceiveErrs | warning | runbook |
| NodeNetworkTransmitErrs | warning | runbook |
| NodeRAIDDegraded | critical | runbook |
| NodeRAIDDiskFailure | warning | runbook |
| NodeSystemSaturation | warning | runbook |
| NodeSystemdServiceCrashlooping | warning | runbook |
| NodeSystemdServiceFailed | warning | runbook |
| NodeTextFileCollectorScrapeError | warning | runbook |
node-network#
| Alert | Severity | Runbook |
|---|---|---|
| NodeNetworkInterfaceFlapping | warning | runbook |
prometheus#
| Alert | Severity | Runbook |
|---|---|---|
| PrometheusBadConfig | critical | runbook |
| PrometheusDuplicateTimestamps | warning | runbook |
| PrometheusErrorSendingAlertsToAnyAlertmanager | critical | runbook |
| PrometheusErrorSendingAlertsToSomeAlertmanagers | warning | runbook |
| PrometheusHighQueryLoad | warning | runbook |
| PrometheusKubernetesListWatchFailures | warning | runbook |
| PrometheusLabelLimitHit | warning | runbook |
| PrometheusMissingRuleEvaluations | warning | runbook |
| PrometheusNotConnectedToAlertmanagers | warning | runbook |
| PrometheusNotIngestingSamples | warning | runbook |
| PrometheusNotificationQueueRunningFull | warning | runbook |
| PrometheusOutOfOrderTimestamps | warning | runbook |
| PrometheusRemoteStorageFailures | critical | runbook |
| PrometheusRemoteWriteBehind | critical | runbook |
| PrometheusRemoteWriteDesiredShards | warning | runbook |
| PrometheusRuleFailures | critical | runbook |
| PrometheusSDRefreshFailure | warning | runbook |
| PrometheusScrapeBodySizeLimitHit | warning | runbook |
| PrometheusScrapeSampleLimitHit | warning | runbook |
| PrometheusTSDBCompactionsFailing | warning | runbook |
| PrometheusTSDBReloadsFailing | warning | runbook |
| PrometheusTargetLimitHit | warning | runbook |
| PrometheusTargetSyncFailure | critical | runbook |
prometheus-operator#
| Alert | Severity | Runbook |
|---|---|---|
| PrometheusOperatorListErrors | warning | runbook |
| PrometheusOperatorNodeLookupErrors | warning | runbook |
| PrometheusOperatorNotReady | warning | runbook |
| PrometheusOperatorReconcileErrors | warning | runbook |
| PrometheusOperatorRejectedResources | warning | runbook |
| PrometheusOperatorStatusUpdateErrors | warning | runbook |
| PrometheusOperatorSyncFailed | warning | runbook |
| PrometheusOperatorWatchErrors | warning | runbook |