Ongoing Code Execution Downtime

Incident Report for CoderPad

Postmortem

We’re very sorry for the downtime.

It turns out collectd ate all our disk. The problem was exacerbated by a failed rollover last night, and at about noon today our worker nodes went down because the boot drive ran out of space. We're probably going to nix collectd for metrics-gathering entirely. Thank you for your patience.
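
For readers who want to guard against the same failure mode, here is a minimal sketch of a free-space check on a node's boot drive. It is not what we run in production; the function names, the "/" mount point, and the 10% threshold are all assumptions chosen for illustration.

```python
import shutil


def is_low_on_space(total_bytes: int, free_bytes: int, min_free_fraction: float = 0.10) -> bool:
    """Return True when the free fraction of a filesystem is below the threshold.

    The 10% default is an illustrative assumption, not a recommended value.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return (free_bytes / total_bytes) < min_free_fraction


def path_is_low_on_space(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """Check the filesystem that contains `path` (hypothetical helper)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return is_low_on_space(usage.total, usage.free, min_free_fraction)


if __name__ == "__main__":
    # "/" is assumed to be the boot drive; adjust for your own layout.
    if path_is_low_on_space("/"):
        print("WARNING: boot drive is low on space")
    else:
        print("boot drive has enough free space")
```

Run periodically (for example from cron) and wired to an alert, a check like this would flag a filling disk before the node falls over.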

Posted Sep 28, 2018 - 12:31 PDT

Resolved

It looks like our nodes managed to run out of disk space. We're not sure why, but the service should be okay for now.

Posted Sep 28, 2018 - 12:21 PDT

Identified

We've identified unhealthy hosts in our code execution backend and are redeploying now.

Posted Sep 28, 2018 - 12:13 PDT

This incident affected: Execution Tier.