Outage

Start: 09:00
End: 09:15

Fault: Optical fiber outage disrupting internet traffic.
Status: Partially remedied (failover path in use).
Work is under way to locate and fix the fault.

Data storage is not affected by the outage (multipathing).

Update:
At 10:40 both the primary and backup paths were recovered. Full redundancy has been restored.

The failed optical fiber will be investigated further.

Work is under way to improve and speed up fiber fail-over for inter-DC connectivity.

Details:
At 09:00 a switch failure in the core infrastructure burnt out optical fiber transceivers.

Remediation: By 09:15 the fault had been found and all network paths had been migrated to secondary fiber connections.

Planned steps:
The faulty switch has been replaced and redundancy restored. We plan to carry out maintenance in a future scheduled window to implement new protocols for automated fail-over of fiber paths.

BGP Upstream flap Västberga DC

At 18:15 we were notified that the BGP sessions towards one of our upstream providers had gone down.
This caused brief network dips on one of our core routers while routes were rebuilt.

At 18:21 the BGP sessions towards the upstream provider were restored.
We’ve asked the upstream provider to investigate the unannounced loss of the sessions.

19:00 Update: The upstream provider has replied that high CPU load on their router caused the BGP sessions to "flap" (repeatedly disconnect and reconnect). CPU usage and the BGP sessions are now stable.

Important security notification: Linux remote crash vulnerability

We’ve been notified of a new remote crash vulnerability affecting many Linux systems.

The CVE has not yet been published; so far it has only been reserved: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-11477
The Register has published more details: https://www.theregister.co.uk/2019/06/17/linux_tcp_sack_kernel_crash/
Other cloud hosting vendors have published steps to mitigate this flaw, and Adminor AB is doing the same with this notice.

Linux kernel patches will be issued by the various vendors. In the meantime, these attacks can be mitigated by disabling tcp_sack (TCP selective acknowledgement).
It’s possible that systems rebooted recently already include a mitigation for this exploit, but our upstream vendor has not confirmed this, as the exploit has not yet been fully disclosed.

We recommend disabling tcp_sack as a precaution.
TCP SACK speeds up TCP transfers, especially on lossy connections, by letting the receiver tell the sender exactly which segments have arrived, so that only the missing segments need to be retransmitted. Disabling it should have minimal impact on normal operations, but we still recommend monitoring for any negative performance impact.

The following command takes effect immediately and should not require a reboot (run it as root):
echo 0 > /proc/sys/net/ipv4/tcp_sack
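If you want to confirm that the kernel actually accepted the change, the command above can be wrapped so it reads the value back afterwards. This is a minimal sketch, not an Adminor-provided tool: the function names `tcp_sack_value` and `disable_tcp_sack` are hypothetical, and the `SACK_PATH` override exists only so the logic can be tried against a scratch file. Against the real procfs entry it must be run as root.

```shell
#!/bin/sh
# Hypothetical helpers: disable TCP SACK at runtime and confirm the change.
# SACK_PATH defaults to the procfs entry from the notice; overriding it
# (e.g. with a scratch file) is only meant for testing. Writing the real
# entry requires root.
SACK_PATH="${SACK_PATH:-/proc/sys/net/ipv4/tcp_sack}"

# Print the current setting: 1 = SACK enabled, 0 = disabled.
tcp_sack_value() {
    cat "$SACK_PATH"
}

# Write 0, then read the value back so a failed write is not silently missed.
disable_tcp_sack() {
    echo 0 > "$SACK_PATH" || return 1
    [ "$(tcp_sack_value)" = "0" ]
}
```

Usage, as root: `disable_tcp_sack && echo "TCP SACK disabled"`. The standard `sysctl -w net.ipv4.tcp_sack=0` command sets the same value.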


To make the change persistent across reboots, a command such as the following can be run (as root):
echo 'net.ipv4.tcp_sack = 0' >> /etc/sysctl.conf
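Running the append command twice adds the setting twice, and if the file does not end with a newline the setting is glued onto the previous line. A minimal sketch that guards against both cases follows. The function name `persist_tcp_sack_off` is hypothetical, and it only recognises the exact line `net.ipv4.tcp_sack = 0` (a variant such as `net.ipv4.tcp_sack=0` would not be detected).

```shell
#!/bin/sh
# Hypothetical helper: persist tcp_sack = 0 without adding duplicate lines
# when run more than once. The file argument defaults to /etc/sysctl.conf
# as in the notice; writing it requires root.
persist_tcp_sack_off() {
    conf="${1:-/etc/sysctl.conf}"
    line='net.ipv4.tcp_sack = 0'
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        # Start on a fresh line if the file does not end with a newline.
        if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
            printf '\n' >> "$conf"
        fi
        printf '%s\n' "$line" >> "$conf"
    fi
}
```

Usage, as root: `persist_tcp_sack_off && sysctl -p`. Here `sysctl -p` reloads `/etc/sysctl.conf`, so the persisted value also takes effect immediately.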

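We recommend re-enabling tcp_sack once a kernel patch has been installed and the system rebooted. If you made the change persistent, also remove the net.ipv4.tcp_sack line from /etc/sysctl.conf.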
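Once a patched kernel is running, both the runtime and the persistent change can be undone together. The following is a minimal sketch with a hypothetical function name, `reenable_tcp_sack`. It assumes the mitigation was persisted as the exact line `net.ipv4.tcp_sack = 0` and leaves every other line of the file untouched. Both path arguments default to the locations used in this notice, and both writes require root.

```shell
#!/bin/sh
# Hypothetical helper: undo the mitigation once a patched kernel is running.
# Defaults match the paths used in the notice; both writes require root.
reenable_tcp_sack() {
    conf="${1:-/etc/sysctl.conf}"
    sack_path="${2:-/proc/sys/net/ipv4/tcp_sack}"
    line='net.ipv4.tcp_sack = 0'
    if [ -f "$conf" ]; then
        filtered=$(mktemp) || return 1
        # Keep every line except the exact one the mitigation added.
        grep -vxF "$line" "$conf" > "$filtered"
        status=$?
        # grep exits 1 when no lines remain (fine), 2 or more on a real error.
        if [ "$status" -gt 1 ]; then
            rm -f "$filtered"
            return 1
        fi
        # Overwrite in place so the file keeps its owner and permissions.
        cat "$filtered" > "$conf"
        rm -f "$filtered"
    fi
    # Turn SACK back on for the running kernel.
    echo 1 > "$sack_path"
}
```

The filtered copy is written back with `cat` rather than moved into place. This keeps the original file's ownership and mode, which a `mv` of a fresh temporary file would not.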
Please let us know if you need assistance.

Maintenance Cogent

Start time: 00:00 CET (GMT+1) 18-NOV-2018
End time: 06:00 CET (GMT+1) 18-NOV-2018
Expected Outage/Downtime: 45-60 minutes (for Cogent transit, not Adminor services)
Location: Odessa-Copenhagen-Stockholm


Cogent is performing network maintenance that will cause brief network dips. Adminor’s BGP routing will move traffic over to other transit providers to minimize the outage.