HostUp

HostUp Status

All Systems Operational

99.998% uptime over 90 days

Latest Post-Mortem

9 March 2026 · Network Flapping · Duration: ~30 min

VPS - RL1 Node 6 — Intermittent Storage Latency Due to DAC Cable Fault

Node 6 experienced intermittent high storage latency caused by a faulty DAC cable on one port of the Mellanox ConnectX NIC. The resulting port flapping on the LACP bond disrupted Ceph OSD connectivity, causing brief I/O stalls for VMs on this node.

Timeline
  • 16:29 · mlx5_core lag map begins flapping between port 1 and port 2 every few seconds
  • 16:29–16:31 · Ceph monitor sessions lost; osd4 and osd6 marked down; MDS caps go stale; rbd watch errors (-107) on VM disks
  • 16:35–16:50 · Repeated cycles of port flapping, OSD bounces (osd4/osd6 down→up), monitor session hunting, and rbd watch errors roughly every 5–10 minutes
  • 16:50 · bond0 slave enp65s0f0np0 marked "link status definitely down"; bond fails over to the remaining port
  • 16:50–16:58 · Continued intermittent OSD flapping as the faulty port attempts to recover
  • ~17:00 · Faulty port disabled with ifdown enp65s0f0np0; all traffic stable on the remaining port. Issue resolved

Impact

VMs on Node 6 experienced brief periods of elevated storage latency and intermittent I/O stalls as Ceph OSDs (osd4, osd6) bounced up and down. No complete outage — the LACP bond with dual switches kept the node online throughout, but performance was degraded.

Root Cause

A faulty DAC (Direct Attach Copper) cable on port enp65s0f0np0 of the dual-port Mellanox ConnectX NIC caused continuous link flapping. The mlx5_core driver repeatedly rebalanced the LACP lag map between the two ports every 3-4 seconds, disrupting established Ceph connections.

Each port flap caused the Ceph client on Node 6 to lose its monitor sessions and mark osd4 and osd6 as down. While the OSDs recovered within seconds each time, the repeated cycling caused rbd watch errors (-107 ENOTCONN) and MDS capability timeouts — resulting in elevated I/O latency for VMs whose disks were served by these OSDs.

The LACP bond across two separate switches prevented a full outage — traffic continued flowing through the healthy port — but the constant rebalancing between a good and bad port created the intermittent disruption pattern seen in the logs.
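The flapping pattern described above is visible in the kernel log as repeated LAG remap messages. A minimal shell sketch of how such a log can be grepped for flap frequency follows; the sample lines (timestamps, PCI address) are illustrative stand-ins, not captures from Node 6, and on a live node the input would come from `journalctl -k` instead of a variable.

```shell
# Count mlx5 LAG remap events in a saved kernel log excerpt.
# Sample lines are illustrative, not Node 6's actual log.
log='Mar  9 16:29:03 node6 kernel: mlx5_core 0000:41:00.0: lag map port 1:2 port 2:2
Mar  9 16:29:07 node6 kernel: mlx5_core 0000:41:00.0: lag map port 1:1 port 2:2
Mar  9 16:50:12 node6 kernel: bond0: (slave enp65s0f0np0): link status definitely down, disabling slave'

# Each "lag map" line is one rebalance of the LACP lag map.
printf '%s\n' "$log" | grep -c 'lag map'
# prints 2
```

A healthy bond logs a remap only on genuine topology changes; seeing one every few seconds is the signature of a bad link.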

Resolution

Disabled the faulty port with ifdown enp65s0f0np0, leaving the bond running on the remaining healthy port. All Ceph connections stabilized immediately. The DAC cable will be replaced on the next datacenter visit.
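As a verification sketch: on Linux, /proc/net/bonding/bond0 lists each slave's MII status, so after disabling the faulty port the bond file should show that slave down while the other stays up. The text below is an abridged, illustrative version of that file's format, not a capture from Node 6, and the second interface name (enp65s0f1np1) is an assumption about the healthy port.

```shell
# Illustrative, abridged /proc/net/bonding/bond0 contents after the
# faulty slave was taken down (not a capture from Node 6).
bondinfo='Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up
Slave Interface: enp65s0f0np0
MII Status: down
Slave Interface: enp65s0f1np1
MII Status: up'

# The healthy slave should still report up while the bond itself stays up:
printf '%s\n' "$bondinfo" | grep -A1 'Slave Interface: enp65s0f1np1' | grep 'MII Status'
# prints: MII Status: up
```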

Preventive Measures
  • Replace the faulty DAC cable on Node 6 at next datacenter visit
  • Benefit of dual-switch LACP confirmed — node stayed online throughout despite a complete port failure
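One way to catch a recurrence earlier is a simple flap-rate alert: count bond link-state changes in a recent log window and alert past a threshold. This is a hedged sketch, not HostUp's monitoring; the journalctl line is commented out so it runs anywhere against sample input, and the threshold value is an arbitrary assumption.

```shell
# Alert if bond link-state changes exceed THRESHOLD in the sampled window.
THRESHOLD=3
# On a live node: log=$(journalctl -k --since '10 min ago')
log='kernel: bond0: (slave enp65s0f0np0): link status definitely down, disabling slave
kernel: bond0: (slave enp65s0f0np0): link status definitely up
kernel: bond0: (slave enp65s0f0np0): link status definitely down, disabling slave'

flaps=$(printf '%s\n' "$log" | grep -c 'link status definitely')
if [ "$flaps" -gt "$THRESHOLD" ]; then
  echo "ALERT: $flaps bond link changes"
else
  echo "ok: $flaps bond link changes"
fi
# prints: ok: 3 bond link changes
```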

Scheduled Maintenance

No maintenance currently scheduled.

Website & Portal

agent

Operational

API

Operational

cloud.hostup.se

Customer portal

Operational

hostup.se

Operational

webmail

Operational

Web Hosting - cPanel

delta

Test site

Operational

lambda

Test site

Operational

mu

Test site

Operational

omega

Test site

Operational

pi

Test site

Operational

srv11

High-frequency cPanel

Operational

Web Hosting - ApisCP (Legacy)

epsilon

Test site

Maintenance

eta

Test site

Operational

orion

Test site

Operational

theta

Test site

Operational

zeta

Test site

Operational

VPS - RL1

Stockholm Älvsjö datacenter

High Frequency Ryzen 9950x

High-performance compute

Operational

IPv4 Gateway

IPv4 routing

Operational

IPv6 Gateway

IPv6 routing

Operational

Node 0

Legacy node

Operational

Node 12

HA cluster node

Operational

Node 13

HA cluster node

Operational

Node 16

HA cluster node

Operational

Node 2

Hypervisor

Operational

Node 23

HA cluster node

Operational

Node 24

HA cluster node

Operational

Node 25

HA cluster node

Operational

Node 26

HA cluster node

Operational

Node 3

Snapshot storage

Operational

Node 4

HA cluster node

Operational

Node 5

HA cluster node

Operational

Node 6

HA cluster node

Operational

Node 7

HA cluster node

Operational

Node 8

HA cluster node

Operational

Node 9

HA cluster node

Operational

VPS - RL2

Stockholm Älvsjö datacenter

IPv4 Gateway

IPv4 routing

Operational

IPv6 Gateway

IPv6 routing

Operational

Node 1

Hypervisor

Operational

Node 2

Hypervisor

Operational

Node 3

Hypervisor

Operational

Node 4

Hypervisor

Operational

Node 5

Hypervisor

Operational

Node 6

Hypervisor

Operational

Node 7

Hypervisor

Operational

Node 8

Hypervisor

Operational

DNS

Cloudflare whitelabel anycast nameservers

primary.ns.hostup.se

Cloudflare anycast

Operational

secondary.ns.hostup.se

Cloudflare anycast

Operational

Past Incidents

March 2026

Resolved · Duration: 0 min

Node 3 (RL2)

No packets returned by host

4 Mar 2026, 10:08

February 2026

Resolved · Duration: 0 min

eta

Status 521

22 Feb 2026, 03:56
Resolved · Duration: 0 min

mu

Timeout (no headers received)

16 Feb 2026, 18:37
Resolved · Duration: 0 min

lambda

Status 502

16 Feb 2026, 06:18
Resolved · Duration: 1 min

mu

Couldn't connect to server

16 Feb 2026, 05:40
Resolved · Duration: 1 min

Node 1 (RL2)

No packets returned by host

9 Feb 2026, 12:10

January 2026

Resolved · Duration: 5 min

Node 3 (RL2)

No packets returned by host

31 Jan 2026, 17:34
Resolved · Duration: 0 min

Node 4 (RL2)

No packets returned by host

31 Jan 2026, 16:07
Resolved · Duration: 0 min

Node 6 (RL2)

No packets returned by host

31 Jan 2026, 16:02

Issue not listed here?

Try our AI troubleshooting agent — it can check your website, verify DNS records, test if ports are open (SSH, RDP), and help determine if the issue is on your end or ours.

Automated health checks running every 30 seconds. Web hosting monitors use test WordPress sites — brief unavailability (1-2 min) may occur during auto-updates.