HostUp Status

All Systems Operational

99.99% uptime over 90 days

Active Incidents

In Progress · April 24 – May 2026

Oracle Cloud — Compute infrastructure maintenance

Oracle Cloud, our cloud partner for parts of our web hosting, is performing maintenance on the infrastructure supporting affected compute instances. Per their notification, each instance is expected to be powered off for approximately 20 minutes in most cases, though it may take longer. Brief network outages can therefore occur during the maintenance window.

Resolved · 2026-04-23 05:53 – 12:00

Pi & Omega — MySQL automatically upgraded from 8.4 to 9.7

This was caused by a cPanel bug and hit most hosting providers running cPanel worldwide — for us it impacted 2 web hosting servers (Pi and Omega). To be clear about where the fault lies: cPanel ships the MySQL community repo enabled by default, runs the nightly update mechanism, and pushed a major-version jump (8.4 → 9.7) to production databases with no staged rollout, no canary, and no versionlock — despite the breaking bug being filed upstream (CPANEL-52811) before the update went out. A one-line versionlock on cPanel's side would have prevented this for every one of their customers. They didn't ship it. We have now applied that versionlock ourselves and disabled cPanel's auto-updates entirely, which is something that arguably should never have been opt-out in the first place.

TL;DR

At 05:53 CEST, a nightly dnf upgrade bumped mysql-community-* from 8.4 → 9.7 via the community repo (known upstream bug — cPanel case CPANEL-52811). The MySQL 9.7 mysqld crashed during the Data Dictionary upgrade because the faveo database has Laravel Pulse tables with virtual columns using md5() (removed in MySQL 9). The affected servers were down until 08:17 CEST, when a clean MySQL 8.4.9 was installed and versionlocked. All 398 cPanel accounts' DB data was then recovered to its pre-crash state (up to 05:53) by:

  1. Rebuilding MySQL 8.4 clean.
  2. Seeding mysql.user from the broken 9.7 datadir (preserving original password hashes).
  3. Running JetBackup restore to fill in file/config data.
  4. Re-extracting 520 DBs directly from the broken 9.7 datadir via a sidecar mysqld to recover activity between JetBackup (April 19) and crash (April 23 05:53).
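The sidecar approach in step 4 can be sketched as follows. This is an illustrative reconstruction, not the exact procedure: the datadir path, socket path, and binary location are hypothetical, and it assumes a 9.7 mysqld binary that can open the old datadir read-only without hitting the same startup crash.

```shell
# Hypothetical paths: broken 9.7 datadir kept at /var/lib/mysql-broken,
# matching 9.7 binary at /opt/mysql-9.7. Adjust to the real layout.

# Start a throwaway "sidecar" mysqld against the old datadir on its own
# socket so it cannot collide with the live, clean 8.4 server.
/opt/mysql-9.7/bin/mysqld \
  --datadir=/var/lib/mysql-broken \
  --socket=/tmp/mysql-sidecar.sock \
  --skip-grant-tables &

# Dump each user database out of the sidecar and load it into the
# clean 8.4 server, skipping MySQL's own system schemas.
for db in $(mysql -S /tmp/mysql-sidecar.sock -N -e 'SHOW DATABASES' \
            | grep -Ev '^(mysql|sys|information_schema|performance_schema)$'); do
  mysqldump -S /tmp/mysql-sidecar.sock --single-transaction "$db" \
    | mysql -S /var/lib/mysql/mysql.sock "$db"
done
```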
12:00 — Resolved

All services fully restored on both Pi and Omega. We ran a health check comparing table counts before vs. after to verify everything was imported correctly through the 9.7 → 8.4 downgrade. Going forward we are doing cPanel's job for them: auto-updates are disabled, MySQL 8.4 is versionlocked on every cPanel server we operate, and we no longer trust the cPanel/community-repo combination to ship production-safe updates unattended. If you still experience issues, please contact us at [email protected].
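The two mitigations above look roughly like this on an AlmaLinux/RHEL-style cPanel server. A sketch, not a copy-paste recipe: package and config names are the usual ones, but verify them against your own system.

```shell
# Pin every installed mysql-community package at its current version so a
# nightly dnf run can never jump a major version again.
dnf install -y python3-dnf-plugin-versionlock
dnf versionlock add 'mysql-community-*'

# Turn off cPanel's automatic update runs entirely
# (UPDATES accepts daily, manual, or never).
sed -i 's/^UPDATES=.*/UPDATES=never/' /etc/cpupdate.conf
```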

10:46 — Update

All services restored, but performance may still be degraded. Omega: restoring data up until the point of crash, ETA ~20 minutes. Pi: we are still working on resolving this.

08:17 — MySQL 8.4.9 reinstalled

A clean MySQL 8.4.9 was installed and versionlocked. Data recovery began — JetBackup restore plus re-extraction of DBs from the broken 9.7 datadir via a sidecar mysqld to recover activity up to the point of crash.

05:53 — Incident detected

Nightly dnf upgrade bumped MySQL 8.4 → 9.7 via the community repo. mysqld crashed during Data Dictionary upgrade due to virtual columns using md5() (removed in MySQL 9).
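For context, Laravel Pulse's migrations create virtual generated columns whose expressions hash a key with md5(). The DDL below is illustrative only (not the exact Pulse schema); it shows the kind of column definition that MySQL 9 can no longer re-parse during the Data Dictionary upgrade once the function is removed.

```shell
# Illustrative example of a Pulse-style virtual column. On startup, a new
# major version re-validates every generated-column expression; one that
# calls a removed function made mysqld abort instead of failing gracefully.
mysql faveo <<'SQL'
CREATE TABLE pulse_entries_example (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  `key` MEDIUMTEXT NOT NULL,
  key_hash BINARY(16) GENERATED ALWAYS AS (UNHEX(MD5(`key`))) VIRTUAL
);
SQL
```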

Latest Post-Mortem

26 March 2026 · I/O Degradation · 26m

VPS - RL1 Node 2 — I/O Storm Caused by krbd Sparse-Read Bug (CVE-2026-23136)

A brief network disruption triggered a known kernel bug in the Ceph storage client (krbd), causing an unrecoverable I/O retry loop that stalled all ~65 VMs on this node. No data was lost.

Timeline
00:36

First occurrence overnight. We recovered the node but didn't identify the root cause

17:40

It happened again. Server load hit 680+, kernel logs flooded with CRC checksum errors across all OSD connections simultaneously. Storage I/O completely stalled

17:45

Throttled the link to 2 Gbit with tc to break the retry loop. CRC errors stopped immediately

17:50

Identified the bond hash was set to layer2 instead of layer3+4, funneling all inbound traffic through one 10G NIC

17:55

Started migrating VMs to other nodes

18:06

All VMs restored. VMs that went read-only were rebooted to clear filesystem state

Impact

About 65 VMs on Node 2 had disk I/O stall completely. Some guest filesystems went read-only as a protective measure. All VMs were restored with no data loss.

Root Cause

The incident had two contributing factors:

Network bonding imbalance: This node's bond was configured with layer2 hashing, which selects the outgoing NIC based on MAC address. With only two endpoints (server and switch), inbound traffic landed almost entirely on one 10G NIC. During a Ceph deep scrub, the increased read traffic was enough to cause packet drops and CRC failures on the saturated link.

Kernel bug (CVE-2026-23136): When the CRC errors caused libceph to drop and reconnect OSD connections, a bug in the kernel's sparse-read state machine prevented recovery. On reconnect, the client misinterpreted new OSD replies as continuations of previous failed operations, causing every retry to fail immediately and trigger another reconnect. This created a self-sustaining loop that could not resolve on its own.

Because krbd handles all VM storage through a single kernel process, this loop affected every VM on the node simultaneously. With librbd (QEMU's userspace Ceph client), each VM maintains independent connections — the same bug does not exist in the userspace client, and even a connection failure would only affect the individual VM.

Resolution
  1. Throttled the link to 2 Gbit to break the retry loop
  2. Fixed the bond hash policy from layer2 to layer3+4
  3. Migrated VMs to other nodes and rebooted those in read-only state
  4. Rebooted the node to clear stale kernel Ceph state
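Steps 1 and 2 above can be sketched with standard Linux tooling (interface name and tbf parameters are illustrative):

```shell
# 1. Temporarily cap the bond to 2 Gbit/s with a token-bucket filter to
#    break the CRC-error / reconnect retry loop.
tc qdisc add dev bond0 root tbf rate 2gbit burst 256k latency 50ms

# 2. Switch the bond's transmit hash from layer2 (MAC-based) to layer3+4
#    (IP + port), so Ceph's many OSD connections spread across both NICs
#    instead of funneling through one.
echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy

# Remove the throttle once traffic is stable again.
tc qdisc del dev bond0 root
```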
Preventive Measures
  • All nodes confirmed on layer3+4 bond hashing — this was the only node still on layer2
  • Migrated all nodes from krbd to librbd (QEMU) on March 28. With librbd, connection faults are isolated per VM and the kernel sparse-read bug is not in the code path. Done via live migration with no downtime
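On a Proxmox VE-style cluster (an assumption — the hypervisor platform is not named in this report), the krbd-vs-librbd choice is a per-storage flag: with krbd disabled, each QEMU process opens its own userspace Ceph connections, so a connection fault is contained to that one VM.

```shell
# Hypothetical storage ID 'ceph-vmstore'; --krbd 0 makes new/migrated VM
# disks attach via QEMU's userspace librbd instead of the kernel client.
pvesm set ceph-vmstore --krbd 0
```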

Scheduled Maintenance

Completed

VPS - RL1 — Migration from krbd to QEMU librbd

All VMs were live-migrated with no downtime. All nodes now run librbd.

Website & Portal

agent

Operational

API

Operational

cloud.hostup.se

Customer portal

Operational

hostup.se

Operational

webmail

Operational

Web Hosting - cPanel

delta

Test site

Operational

lambda

Test site

Operational

mu

Test site

Operational

omega

Test site

Operational

pi

Test site

Operational

srv11

High-frequency cPanel

Operational

Web Hosting - ApisCP (Legacy)

epsilon

Test site

Maintenance

eta

Test site

Operational

orion

Test site

Operational

theta

Test site

Operational

zeta

Test site

Operational

VPS - RL1

Stockholm Älvsjö datacenter

IPv4 Gateway

IPv4 routing

Operational

IPv6 Gateway

IPv6 routing

Operational

Node 0

Legacy node

Operational

Node 12

HA cluster node

Operational

Node 13

HA cluster node

Operational

Node 16

HA cluster node

Operational

Node 23

HA cluster node

Operational

Node 24

HA cluster node

Operational

Node 25

HA cluster node

Operational

Node 26

HA cluster node

Operational

Node 3

Snapshot storage

Maintenance

Node 4

HA cluster node

Operational

Node 5

HA cluster node

Operational

Node 6

HA cluster node

Operational

Node 7

HA cluster node

Operational

Node 8

HA cluster node

Operational

Node 9

HA cluster node

Operational

VPS - RL2

Stockholm Älvsjö datacenter

IPv4 Gateway

IPv4 routing

Operational

IPv6 Gateway

IPv6 routing

Operational

Node 1

Hypervisor

Operational

Node 10

Hypervisor

Operational

Node 2

Hypervisor

Operational

Node 3

Hypervisor

Operational

Node 4

Hypervisor

Operational

Node 5

Hypervisor

Operational

Node 6

Hypervisor

Operational

Node 7

Hypervisor

Operational

Node 8

Hypervisor

Operational

Node 9

Hypervisor

Operational

DNS

Cloudflare whitelabel anycast nameservers

primary.ns.hostup.se

Cloudflare anycast

Operational

secondary.ns.hostup.se

Cloudflare anycast

Operational

Past Incidents

April 2026

Resolved · Duration: 11 min

theta

Status 522

25 Apr 2026, 08:59
Resolved · Duration: 15 min

orion

Status 522

25 Apr 2026, 08:55
Resolved · Duration: 16 min

agent

Status 502

25 Apr 2026, 08:53
Resolved · Duration: 11 min

pi

Status 522

25 Apr 2026, 04:50
Resolved · Duration: 7 min

hostup.se

Timeout (no headers received)

24 Apr 2026, 14:44
Resolved · Duration: 10 min

API

Timeout (no headers received)

24 Apr 2026, 14:26
Resolved · Duration: 17 min

hostup.se

Timeout (no headers received)

24 Apr 2026, 14:24
Resolved · Duration: 3h 20m

omega

Status 500

23 Apr 2026, 05:59
Resolved · Duration: 3h 51m

pi

Status 500

23 Apr 2026, 05:13
Resolved · Duration: 0 min

agent

Status 502

11 Apr 2026, 12:59

Issue not listed here?

Try our AI troubleshooting agent — it can check your website, verify DNS records, test if ports are open (SSH, RDP), and help determine if the issue is on your end or ours.

Automated health checks running every 30 seconds. Web hosting monitors use test WordPress sites — brief unavailability (1-2 min) may occur during auto-updates.