Alert Catalog

Every alert evaluated across SafetyWing clusters: 29 custom alerts (component, environment, and platform tiers, owned by us) and 133 stock alerts (kube-prometheus-stack defaults). Custom alerts link to the runbook on this site; stock alerts link to the upstream prometheus-operator runbooks.

Generated from the live hetzner rule set and the infra-charts/cluster-monitors sources. Stock alerts are identical across clusters; custom alerts are deployed per environment or cluster, wherever the chart is enabled.
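The generator itself is not shown on this page. As a rough sketch of how a live rule set could be turned into catalog rows, the snippet below groups alerting rules from a payload shaped like the response of Prometheus' `GET /api/v1/rules` endpoint. The sample payload, the `catalog_rows` helper, and the example runbook URLs are illustrative assumptions; the `severity` label and `runbook_url` annotation names follow common kube-prometheus-stack conventions and are not confirmed by this page.

```python
import json
from collections import defaultdict

# Hypothetical, trimmed-down payload in the shape returned by
# Prometheus' GET /api/v1/rules endpoint (sample data, not real output).
SAMPLE = json.loads("""
{
  "status": "success",
  "data": {"groups": [
    {"name": "general.rules", "rules": [
      {"type": "alerting", "name": "Watchdog",
       "labels": {"severity": "none"},
       "annotations": {"runbook_url": "https://example.invalid/runbooks/watchdog"}},
      {"type": "recording", "name": "count:up1"}
    ]},
    {"name": "etcd", "rules": [
      {"type": "alerting", "name": "etcdNoLeader",
       "labels": {"severity": "critical"}, "annotations": {}}
    ]}
  ]}
}
""")


def catalog_rows(payload):
    """Return {group name: [(alert, severity, runbook_url or ""), ...]}."""
    rows = defaultdict(list)
    for group in payload["data"]["groups"]:
        for rule in group["rules"]:
            if rule.get("type") != "alerting":
                continue  # recording rules do not belong in the catalog
            rows[group["name"]].append((
                rule["name"],
                rule.get("labels", {}).get("severity", ""),
                rule.get("annotations", {}).get("runbook_url", ""),
            ))
    # Sort groups by name, and alerts by name within each group.
    return {name: sorted(alerts) for name, alerts in sorted(rows.items())}


if __name__ == "__main__":
    for group, alerts in catalog_rows(SAMPLE).items():
        print(group)
        for alert, severity, runbook in alerts:
            print(f"  {alert}\t{severity}\t{runbook or '(no runbook)'}")
```

Keying the result by rule group mirrors the per-group tables in this catalog; a real generator would also need to merge in the chart sources, which this sketch does not attempt.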

SafetyWing custom alerts

Kafka (component tier)

| Alert | Severity | Runbook |
|---|---|---|
| KafkaOfflinePartitions | page | runbook |
| KafkaNoActiveController | page | runbook |
| KafkaUnderReplicatedPartitions | ticket | runbook |
| KafkaConsumerGroupLagHigh | ticket | runbook |

Kafka Connect (component tier)

| Alert | Severity | Runbook |
|---|---|---|
| KafkaConnectFailedTasks | page | runbook |
| KafkaConnectWorkersDown | page | runbook |
| KafkaConnectNoConnectors | ticket | runbook |

MySQL (component tier)

| Alert | Severity | Runbook |
|---|---|---|
| MysqlInstanceDown | page | runbook |
| MysqlConnectionsSaturated | ticket | runbook |
| MysqlReplicationLagHigh | ticket | runbook |
| MysqlDiskFillingUp | ticket | runbook |

RabbitMQ (component tier)

| Alert | Severity | Runbook |
|---|---|---|
| RabbitmqNodeDown | page | runbook |
| RabbitmqMemoryAlarm | page | runbook |
| RabbitmqDiskAlarm | page | runbook |
| RabbitmqQueueBacklog | ticket | runbook |
| RabbitmqQueueNoConsumers | ticket | runbook |

Ceph (platform tier)

| Alert | Severity | Runbook |
|---|---|---|
| CephHealthError | page | runbook |
| CephMonOutOfQuorum | page | runbook |
| CephHealthWarning | ticket | runbook |
| CephOSDDown | ticket | runbook |
| CephClusterNearFull | ticket | runbook |

Elasticsearch (platform tier)

| Alert | Severity | Runbook |
|---|---|---|
| ElasticsearchClusterRed | page | runbook |
| ElasticsearchClusterYellow | ticket | runbook |
| ElasticsearchHeapHigh | ticket | runbook |
| ElasticsearchDiskWatermark | ticket | runbook |

Node (platform tier)

| Alert | Severity | Runbook |
|---|---|---|
| NodeFilesystemAlmostFull | ticket | runbook |

Traefik (platform tier)

| Alert | Severity | Runbook |
|---|---|---|
| TraefikDown | page | runbook |
| TraefikHigh5xxRate | ticket | runbook |

Environment (environment tier)

| Alert | Severity | Runbook |
|---|---|---|
| EnvironmentHigh5xxRate | ticket | runbook |

Stock alerts (kube-prometheus-stack)

Shipped by the kube-prometheus-stack defaultRules and documented upstream; the runbook links point there.
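Not every stock rule carries a runbook link: in the etcd table of this catalog, most rows have none. A minimal sketch of how such gaps could be listed from catalog rows follows. The row layout, the `missing_runbooks` helper, and the sample URLs are hypothetical and only serve the illustration.

```python
# Rows are (group, alert, severity, runbook_url).
# Sample values are illustrative; the URLs are placeholders.
ROWS = [
    ("etcd", "etcdDatabaseHighFragmentationRatio", "warning",
     "https://example.invalid/runbooks/etcd-fragmentation"),
    ("etcd", "etcdNoLeader", "critical", ""),
    ("general.rules", "Watchdog", "none",
     "https://example.invalid/runbooks/watchdog"),
]


def missing_runbooks(rows):
    """Map each group to the sorted alert names that lack a runbook link."""
    missing = {}
    for group, alert, _severity, runbook_url in rows:
        if not runbook_url.strip():
            missing.setdefault(group, []).append(alert)
    return {group: sorted(alerts) for group, alerts in missing.items()}


if __name__ == "__main__":
    for group, alerts in missing_runbooks(ROWS).items():
        print(f"{group}: {', '.join(alerts)}")
```

Groups in which every alert has a link are omitted from the result, so an empty dictionary means full runbook coverage.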

alertmanager.rules

| Alert | Severity | Runbook |
|---|---|---|
| AlertmanagerClusterCrashlooping | critical | runbook |
| AlertmanagerClusterDown | critical | runbook |
| AlertmanagerClusterFailedToSendAlerts | critical | runbook |
| AlertmanagerConfigInconsistent | critical | runbook |
| AlertmanagerFailedReload | critical | runbook |
| AlertmanagerFailedToSendAlerts | warning | runbook |
| AlertmanagerMembersInconsistent | critical | runbook |

config-reloaders

| Alert | Severity | Runbook |
|---|---|---|
| ConfigReloaderSidecarErrors | warning | runbook |

etcd

| Alert | Severity | Runbook |
|---|---|---|
| etcdDatabaseHighFragmentationRatio | warning | runbook |
| etcdDatabaseQuotaLowSpace | critical | |
| etcdExcessiveDatabaseGrowth | warning | |
| etcdGRPCRequestsSlow | critical | |
| etcdHighCommitDurations | warning | |
| etcdHighFsyncDurations | warning | |
| etcdHighNumberOfFailedGRPCRequests | warning | |
| etcdHighNumberOfFailedProposals | warning | |
| etcdHighNumberOfLeaderChanges | warning | |
| etcdInsufficientMembers | critical | |
| etcdMemberCommunicationSlow | warning | |
| etcdMembersDown | warning | |
| etcdNoLeader | critical | |

general.rules

| Alert | Severity | Runbook |
|---|---|---|
| InfoInhibitor | none | runbook |
| TargetDown | warning | runbook |
| Watchdog | none | runbook |

kube-apiserver-slos

| Alert | Severity | Runbook |
|---|---|---|
| KubeAPIErrorBudgetBurn | critical | runbook |

kube-state-metrics

| Alert | Severity | Runbook |
|---|---|---|
| KubeStateMetricsListErrors | critical | runbook |
| KubeStateMetricsShardingMismatch | critical | runbook |
| KubeStateMetricsShardsMissing | critical | runbook |
| KubeStateMetricsWatchErrors | critical | runbook |

kubernetes-apps

| Alert | Severity | Runbook |
|---|---|---|
| KubeContainerWaiting | warning | runbook |
| KubeDaemonSetMisScheduled | warning | runbook |
| KubeDaemonSetNotScheduled | warning | runbook |
| KubeDaemonSetRolloutStuck | warning | runbook |
| KubeDeploymentGenerationMismatch | warning | runbook |
| KubeDeploymentReplicasMismatch | warning | runbook |
| KubeDeploymentRolloutStuck | warning | runbook |
| KubeHpaMaxedOut | warning | runbook |
| KubeHpaReplicasMismatch | warning | runbook |
| KubeJobFailed | warning | runbook |
| KubeJobNotCompleted | warning | runbook |
| KubePdbNotEnoughHealthyPods | warning | runbook |
| KubePodCrashLooping | warning | runbook |
| KubePodNotReady | warning | runbook |
| KubeStatefulSetGenerationMismatch | warning | runbook |
| KubeStatefulSetReplicasMismatch | warning | runbook |
| KubeStatefulSetUpdateNotRolledOut | warning | runbook |

kubernetes-resources

| Alert | Severity | Runbook |
|---|---|---|
| CPUThrottlingHigh | info | runbook |
| KubeCPUOvercommit | warning | runbook |
| KubeCPUQuotaOvercommit | warning | runbook |
| KubeMemoryOvercommit | warning | runbook |
| KubeMemoryQuotaOvercommit | warning | runbook |
| KubeQuotaAlmostFull | info | runbook |
| KubeQuotaExceeded | warning | runbook |
| KubeQuotaFullyUsed | info | runbook |

kubernetes-storage

| Alert | Severity | Runbook |
|---|---|---|
| KubePersistentVolumeErrors | critical | runbook |
| KubePersistentVolumeFillingUp | critical | runbook |
| KubePersistentVolumeInodesFillingUp | critical | runbook |

kubernetes-system

| Alert | Severity | Runbook |
|---|---|---|
| KubeClientErrors | warning | runbook |
| KubeVersionMismatch | warning | runbook |

kubernetes-system-apiserver

| Alert | Severity | Runbook |
|---|---|---|
| KubeAPIDown | critical | runbook |
| KubeAPITerminatedRequests | warning | runbook |
| KubeAggregatedAPIDown | warning | runbook |
| KubeAggregatedAPIErrors | warning | runbook |
| KubeClientCertificateExpiration | warning | runbook |

kubernetes-system-controller-manager

| Alert | Severity | Runbook |
|---|---|---|
| KubeControllerManagerDown | critical | runbook |

kubernetes-system-kube-proxy

| Alert | Severity | Runbook |
|---|---|---|
| KubeProxyDown | critical | runbook |

kubernetes-system-kubelet

| Alert | Severity | Runbook |
|---|---|---|
| KubeNodeEviction | info | runbook |
| KubeNodeNotReady | warning | runbook |
| KubeNodePressure | info | runbook |
| KubeNodeReadinessFlapping | warning | runbook |
| KubeNodeUnreachable | warning | runbook |
| KubeletClientCertificateExpiration | warning | runbook |
| KubeletClientCertificateRenewalErrors | warning | runbook |
| KubeletDown | critical | runbook |
| KubeletPlegDurationHigh | warning | runbook |
| KubeletPodStartUpLatencyHigh | warning | runbook |
| KubeletServerCertificateExpiration | warning | runbook |
| KubeletServerCertificateRenewalErrors | warning | runbook |
| KubeletTooManyPods | info | runbook |

kubernetes-system-scheduler

| Alert | Severity | Runbook |
|---|---|---|
| KubeSchedulerDown | critical | runbook |

node-exporter

| Alert | Severity | Runbook |
|---|---|---|
| NodeBondingDegraded | warning | runbook |
| NodeCPUHighUsage | info | runbook |
| NodeClockNotSynchronising | warning | runbook |
| NodeClockSkewDetected | warning | runbook |
| NodeDiskIOSaturation | warning | runbook |
| NodeFileDescriptorLimit | warning | runbook |
| NodeFilesystemAlmostOutOfFiles | warning | runbook |
| NodeFilesystemAlmostOutOfSpace | warning | runbook |
| NodeFilesystemFilesFillingUp | warning | runbook |
| NodeFilesystemSpaceFillingUp | warning | runbook |
| NodeHighNumberConntrackEntriesUsed | warning | runbook |
| NodeMemoryHighUtilization | warning | runbook |
| NodeMemoryMajorPagesFaults | warning | runbook |
| NodeNetworkReceiveErrs | warning | runbook |
| NodeNetworkTransmitErrs | warning | runbook |
| NodeRAIDDegraded | critical | runbook |
| NodeRAIDDiskFailure | warning | runbook |
| NodeSystemSaturation | warning | runbook |
| NodeSystemdServiceCrashlooping | warning | runbook |
| NodeSystemdServiceFailed | warning | runbook |
| NodeTextFileCollectorScrapeError | warning | runbook |

node-network

| Alert | Severity | Runbook |
|---|---|---|
| NodeNetworkInterfaceFlapping | warning | runbook |

prometheus

| Alert | Severity | Runbook |
|---|---|---|
| PrometheusBadConfig | critical | runbook |
| PrometheusDuplicateTimestamps | warning | runbook |
| PrometheusErrorSendingAlertsToAnyAlertmanager | critical | runbook |
| PrometheusErrorSendingAlertsToSomeAlertmanagers | warning | runbook |
| PrometheusHighQueryLoad | warning | runbook |
| PrometheusKubernetesListWatchFailures | warning | runbook |
| PrometheusLabelLimitHit | warning | runbook |
| PrometheusMissingRuleEvaluations | warning | runbook |
| PrometheusNotConnectedToAlertmanagers | warning | runbook |
| PrometheusNotIngestingSamples | warning | runbook |
| PrometheusNotificationQueueRunningFull | warning | runbook |
| PrometheusOutOfOrderTimestamps | warning | runbook |
| PrometheusRemoteStorageFailures | critical | runbook |
| PrometheusRemoteWriteBehind | critical | runbook |
| PrometheusRemoteWriteDesiredShards | warning | runbook |
| PrometheusRuleFailures | critical | runbook |
| PrometheusSDRefreshFailure | warning | runbook |
| PrometheusScrapeBodySizeLimitHit | warning | runbook |
| PrometheusScrapeSampleLimitHit | warning | runbook |
| PrometheusTSDBCompactionsFailing | warning | runbook |
| PrometheusTSDBReloadsFailing | warning | runbook |
| PrometheusTargetLimitHit | warning | runbook |
| PrometheusTargetSyncFailure | critical | runbook |

prometheus-operator

| Alert | Severity | Runbook |
|---|---|---|
| PrometheusOperatorListErrors | warning | runbook |
| PrometheusOperatorNodeLookupErrors | warning | runbook |
| PrometheusOperatorNotReady | warning | runbook |
| PrometheusOperatorReconcileErrors | warning | runbook |
| PrometheusOperatorRejectedResources | warning | runbook |
| PrometheusOperatorStatusUpdateErrors | warning | runbook |
| PrometheusOperatorSyncFailed | warning | runbook |
| PrometheusOperatorWatchErrors | warning | runbook |