We’ve applied important security updates to some of our switches in Västberga.
Each switch requires less than 5 minutes to reboot.
Impact: 08:50 – 08:55, 09:50 – 09:55
Problem: Upstream/ISP router rebooted. Suspected DDoS attack. The operator is restarting services and restoring connectivity.
Future resolution: Replacing upstream router.
We’ve now removed the ISP with connectivity issues from our routing and re-routed traffic to a different ISP.
Fault: Optical fiber outage causing network disruption of internet traffic.
Status: Partially remedied (failover path in use).
Work is being done to find the fault and remedy it.
Data storage is not affected by the outage (multipathing).
As of 10:40 both the backup and primary paths have recovered. Full redundancy has been restored.
Further investigation into the failed optical fiber will be made.
Work is being undertaken to improve and speed up the fail-over of fiber redundancy for inter-DC connectivity.
At 09:00 a switch failure occurred in core infrastructure, which burnt out optical fiber transceivers.
Remediation: By 09:15 the fault had been found and all network paths had been migrated to secondary fiber connections.
The faulty switch has been replaced and redundancy restored. We’re planning to perform maintenance work (future scheduled window) to implement new protocols for automated fail-over of fiber paths.
An upstream provider had problems with its network. This affected some parts of Adminor's network.
The cause was a DDoS attack against the provider.
At 18:15 we were notified that BGP sessions towards one of our upstream providers went down.
This caused some network dips for one of our core routers while routes were being rebuilt.
At 18:21 the BGP sessions towards upstream were restored.
We’ve asked upstream to investigate the unannounced loss of session.
19:00 Update: Upstream has replied that high CPU usage on their router caused the BGP sessions to "flap" (disconnect and reconnect). CPU usage and the BGP sessions are now stable.
We have been notified of a new remote crash vulnerability affecting many Linux systems.
The CVE has yet to be publicly released; so far it has only been reserved: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-11477
The Register has published more details: https://www.theregister.co.uk/2019/06/17/linux_tcp_sack_kernel_crash/
Other cloud hosting vendors have published steps on how to mitigate this flaw,
and so has Adminor AB.
A patch to the Linux kernel will be issued by different vendors. Meanwhile, a
mitigation for these attacks is to disable tcp_sack (TCP selective
acknowledgment).
It’s possible that the recently rebooted systems already have a mitigation for
this exploit, but we have not been notified of any by the upstream vendor, as the
exploit has not yet been fully disclosed.
We recommend that you disable tcp_sack as a precaution.
TCP SACK speeds up TCP transfers by letting the receiver tell the sender
which segments have already arrived, so that only the missing data is
retransmitted. Disabling it should have minimal impact on
normal operations but we still recommend monitoring for any negative
effects. To disable tcp_sack, the following command can be
run, which should not require a system reboot:
echo 0 > /proc/sys/net/ipv4/tcp_sack
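To confirm that the setting took effect, the value can be read back from the same file. The following is a minimal sketch; the function name tcp_sack_status and its optional path argument are our own illustration, not part of any vendor tooling.

```shell
#!/bin/sh
# Report whether TCP SACK is enabled by reading the kernel setting.
# The optional path argument is hypothetical and only makes the function
# easy to try against a test file; it defaults to the real /proc entry.
tcp_sack_status() {
    sack_file="${1:-/proc/sys/net/ipv4/tcp_sack}"
    if [ ! -r "$sack_file" ]; then
        echo "unknown"
        return 1
    fi
    value=$(cat "$sack_file")
    case "$value" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        *) echo "unknown" ;;
    esac
}
```

Calling tcp_sack_status with no arguments reads /proc/sys/net/ipv4/tcp_sack and prints "disabled" when the value is 0.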
To make the change persistent across reboots a command such as the following
can be run:
echo 'net.ipv4.tcp_sack = 0' >> /etc/sysctl.conf
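Appending with >> adds a new line every time it is run, so repeated runs leave duplicate entries in the file. The sketch below is one possible way to keep a single entry per key; the function name persist_sysctl and its optional file argument are hypothetical and shown only for illustration.

```shell
#!/bin/sh
# Persist a sysctl setting without duplicating it on repeated runs.
# Usage: persist_sysctl KEY VALUE [FILE]   (FILE defaults to /etc/sysctl.conf)
# persist_sysctl is an illustrative helper, not an existing system command.
persist_sysctl() {
    key="$1"
    value="$2"
    conf="${3:-/etc/sysctl.conf}"
    touch "$conf" || return 1
    # Escape the dots in the key so they match literally in the regex.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    tmp=$(mktemp) || return 1
    # Keep every line except existing assignments of this key
    # (grep exits non-zero when nothing is left, which is fine here).
    grep -v -E "^[[:space:]]*${pattern}[[:space:]]*=" "$conf" > "$tmp" || true
    # Append the desired assignment exactly once.
    echo "${key} = ${value}" >> "$tmp"
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

Run as root, persist_sysctl net.ipv4.tcp_sack 0 records the mitigation, and the same call with 1 reverses it later. The file can be reloaded without a reboot using sysctl -p.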
We recommend re-enabling tcp_sack once a kernel patch has been issued and applied to the system.
Please let us know if you need assistance.
Start time: 00:00 CET (GMT+1) 18-NOV-2018
End time: 06:00 CET (GMT+1) 18-NOV-2018
Expected Outage/Downtime: 45-60 minutes (for Cogent transit, not Adminor services)
Cogent is performing network maintenance which will cause some brief network dips. Adminor's network routing / BGP will move traffic over to other transits to minimize network outage.
At 20:26 a switch had an unplanned reboot which caused a short network dip.
We’re monitoring to find the cause.
For our customers who still use Debian Wheezy:
if you haven’t had time to upgrade to Jessie or Stretch yet,
we recommend that you use the Freexian apt repository for extended LTS support.
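If you are unsure which release a server runs, the version in /etc/debian_version can be mapped to a codename. This is a minimal sketch covering only the three releases named above; the function name debian_codename is our own and purely illustrative.

```shell
#!/bin/sh
# Map a Debian version string to its release codename.
# With no argument the version is read from /etc/debian_version.
# Only Wheezy (7), Jessie (8) and Stretch (9) are covered here.
debian_codename() {
    version="${1:-$(cat /etc/debian_version 2>/dev/null)}"
    case "$version" in
        7|7.*) echo "wheezy" ;;
        8|8.*) echo "jessie" ;;
        9|9.*) echo "stretch" ;;
        *) echo "unknown" ;;
    esac
}
```

A result of "wheezy" means the server is one of those this notice applies to.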
As of 01:50 the maintenance job had been completed successfully.