<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Node on SafetyWing Runbooks</title><link>https://runbooks.safetywing.dev/runbooks/node/</link><description>Recent content in Node on SafetyWing Runbooks</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://runbooks.safetywing.dev/runbooks/node/index.xml" rel="self" type="application/rss+xml"/><item><title>NodeFilesystemAlmostFull</title><link>https://runbooks.safetywing.dev/runbooks/node/nodefilesystemalmostfull/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://runbooks.safetywing.dev/runbooks/node/nodefilesystemalmostfull/</guid><description>&lt;h2 id="meaning"&gt;Meaning&lt;a class="anchor" href="#meaning"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A node filesystem is running low on free space. This rule is a platform-tier supplement to the alerts shipped with the kube-prometheus-stack node-exporter mixin.&lt;/p&gt;
&lt;p&gt;Fires when the ratio of available to total bytes on a non-ephemeral filesystem (any &lt;code&gt;fstype&lt;/code&gt; other than tmpfs, overlay, or squashfs) drops below the configured ratio.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-promql" data-lang="promql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;node_filesystem_avail_bytes{fstype&lt;span style="color:#f92672"&gt;!~&lt;/span&gt;&amp;#34;&lt;span style="color:#e6db74"&gt;tmpfs|overlay|squashfs&lt;/span&gt;&amp;#34;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;/&lt;/span&gt; node_filesystem_size_bytes{fstype&lt;span style="color:#f92672"&gt;!~&lt;/span&gt;&amp;#34;&lt;span style="color:#e6db74"&gt;tmpfs|overlay|squashfs&lt;/span&gt;&amp;#34;}&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;ratio&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;for: 15m&lt;/code&gt;, severity &lt;code&gt;ticket&lt;/code&gt;, tier &lt;code&gt;platform&lt;/code&gt; (cluster-wide, no &lt;code&gt;environment&lt;/code&gt; label). The offending filesystem is identified by the &lt;code&gt;instance&lt;/code&gt; and &lt;code&gt;mountpoint&lt;/code&gt; labels.&lt;/p&gt;
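&lt;p&gt;To see which filesystems are closest to the threshold, a query along these lines can be run ad hoc in Prometheus. It is a sketch that reuses the alert's selectors and is not part of the rule itself; &lt;code&gt;sort&lt;/code&gt; lists the lowest free-space ratios first.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-promql" data-lang="promql"&gt;sort(
  node_filesystem_avail_bytes{fstype!~&amp;#34;tmpfs|overlay|squashfs&amp;#34;}
  / node_filesystem_size_bytes{fstype!~&amp;#34;tmpfs|overlay|squashfs&amp;#34;}
)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;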
&lt;h2 id="impact"&gt;Impact&lt;a class="anchor" href="#impact"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A full filesystem on a node can wedge the kubelet, cause image pulls to fail, block container log writes, trigger pod evictions under ephemeral-storage pressure, and, on Talos, disrupt the system partition. If the node hosts stateful workloads (Ceph OSDs, MySQL/MOCO, RabbitMQ), data writes can stall or fail.&lt;/p&gt;</description></item></channel></rss>