Friday, March 26, 2010
Browsing / Access delays
---
11:04am The sites giving problems are those we would normally access through Telehouse North (THN) in London. We have therefore closed down transit and peering at THN, which is forcing traffic to take alternative routes through different transit partners. This has improved things for some sites, though others remain problematic. The issues causing these delays are located outside of our network and thus outside of our control. We are waiting on updates as to when these problems will be corrected, at which point we will re-enable peering and transit at THN. Apologies to those customers affected by this issue.
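To illustrate the reroute, here is a minimal Python sketch of how withdrawing all routes learned at one exit point shifts traffic onto alternative transit. Route selection is simplified to "shortest AS path among routes whose exit is still enabled", and the prefix, provider names, and path lengths are all invented for illustration; they do not reflect the actual routing policy.

```python
# Minimal sketch (invented names/values) of why withdrawing routes at one
# exit point forces traffic onto alternative transit: route selection here
# is simplified to "shortest AS path among routes whose exit is enabled".

routes = {
    "203.0.113.0/24": [
        {"exit": "THN",  "provider": "peer-A",    "as_path_len": 2},
        {"exit": "RBHX", "provider": "transit-B", "as_path_len": 4},
    ],
}

enabled_exits = {"THN", "RBHX"}

def best_route(prefix):
    """Prefer the shortest AS path among routes via still-enabled exits."""
    candidates = [r for r in routes[prefix] if r["exit"] in enabled_exits]
    return min(candidates, key=lambda r: r["as_path_len"]) if candidates else None

print(best_route("203.0.113.0/24")["exit"])  # -> THN (normal best path)
enabled_exits.discard("THN")                 # transit/peering closed at THN
print(best_route("203.0.113.0/24")["exit"])  # -> RBHX (alternative transit)
```

Real BGP selection involves local preference, MED, and more, but the fallback behaviour is the same shape: remove the preferred exit and the next-best surviving path carries the traffic.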
Monday, March 15, 2010
C2 21CN ADSL
8:37pm - Problems appear to be resolved; most of the disconnected sessions have come back online. The incident should be considered closed, though we will continue to monitor for a while.
Friday, January 22, 2010
C2 DSL Network
Monday, November 30, 2009
Network upgrades
While the installations and relocations take place there will be points in time when some circuits and systems will be deemed at risk; however, traffic at these points will be manually set to traverse alternate pathways and routers.
----
Dec 3rd update.
IFL2 Complete.
----
Dec 8th update.
Due to delays in getting some fibre links provisioned at TCW this site will be delayed, new circuits should be in by Dec 18th.
Monday, October 19, 2009
Core Network
Update 10:09; we've been told that the problems are down to a major incident in London which is affecting multiple parties, unfortunately it is of a scale which covers both our London based datacentres.
Update 10:18; THN now appears to be stable; all our connections to RBHX are now down (rather than flapping).
Update 10:47; we've now seen RBHX come back online, though no direct confirmation yet from our provider.
Update 13:45; we've just seen a blip on all our connections at RBHX, approx 1 minute.
For the near future services should still be considered at risk.
Monday, September 28, 2009
Manchester Transit HSRP Problems
During normal operation both routers carry traffic, with some customers having a higher priority on router 1 while others have a higher priority on router 2. A number of scenarios are tested to ensure failover does occur, so this failure is a bit unusual.
An engineer is currently en route to verify the status of the router experiencing problems.
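For readers unfamiliar with HSRP, the priority scheme described above can be sketched roughly in Python: each virtual gateway is shared by both routers, and the live router with the higher configured priority answers for it. The router names, gateway names, and priority values below are assumptions for illustration, not the actual configuration.

```python
# Rough illustration of HSRP-style failover: for each virtual gateway, the
# highest-priority router that is still alive wins the election. All
# names and priority values here are invented.

priorities = {
    # virtual gateway -> {router: priority}
    "vlan10-gw": {"rtr1.man": 110, "rtr2.man": 90},   # rtr1 preferred
    "vlan20-gw": {"rtr1.man": 90,  "rtr2.man": 110},  # rtr2 preferred
}

def active_router(gateway, alive):
    """Highest-priority live router wins the election for this gateway."""
    live = {r: p for r, p in priorities[gateway].items() if r in alive}
    return max(live, key=live.get) if live else None

both = {"rtr1.man", "rtr2.man"}
assert active_router("vlan10-gw", both) == "rtr1.man"
assert active_router("vlan20-gw", both) == "rtr2.man"
# If rtr1 fails, rtr2 should take over vlan10's gateway as well:
assert active_router("vlan10-gw", {"rtr2.man"}) == "rtr2.man"
```

Real HSRP also involves hello timers, preemption, and interface tracking; this sketch only models the priority election, which is the part that failed to behave as expected here.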
Unexpected Reboot of rtr1.thn
The router appears to be stable after the reboot, though the reasons why it rebooted are still being investigated. We will continue to monitor the router closely for the next few hours.
Saturday, September 19, 2009
Transit router
Customers directly connected to this router would have seen an outage of 32 minutes. Others may have experienced an outage of approx 30 seconds while their sessions were redirected to another router.
C2 apologise for the inconvenience this outage may have caused.
Thursday, June 18, 2009
Scheduled maintenance 21 June, between 04:00 and 06:00.
Following on from the Network incident report for 9th June, we are scheduling a network maintenance window for Sunday 21 June that will be open between 04:00 and 06:00. The purpose of this window will be to investigate and test the stability of the Manchester ring.
We are not expecting the network to experience any disruption in service, but the Manchester ring will obviously be more at risk of interruption while testing takes place.
Kind Regards
Stuart McKindley
Tuesday, June 12, 2007
Analysis of incident 11:25 to 11:45 11 June 2007
We have now completed our investigation of what happened on 11 June and why it caused such a problem.
At 11:25 we experienced a loss of network reachability on our
A layer 2 protocol problem had occurred affecting the switches in the ring.
Service on the
This Level3 problem was identified at 11:40 and rectified. Service returned for affected Level3 transit customers at 11:45.
Root cause analysis
It seems that the problem was caused by three separate factors interacting.
- Mis-configuration of a customer switch at IFL2
- Engineering works being carried out in Telecity
- Mis-configuration of an old port by Level3
These probably interacted as follows:
A layer 2 loop control protocol problem occurred between two different isolated sides of the
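The kind of layer 2 loop involved here can be illustrated with a toy flooding model in Python. The three-switch topology and the "rogue" extra link below are invented for illustration and are not the actual IFL2/Telecity topology; the point is simply that, with a redundant link and no loop-control protocol blocking a port, flooded frames circulate indefinitely instead of dying out.

```python
# Toy model (invented three-switch topology) of a layer 2 flooding loop:
# each switch re-floods a frame out of every port except the one it
# arrived on, so a redundant link with no spanning tree blocking a port
# keeps frames circulating forever.

from collections import deque

links_loop = {                      # sw1-sw3 is the misconfigured extra link
    "sw1": ["sw2", "sw3"],
    "sw2": ["sw1", "sw3"],
    "sw3": ["sw2", "sw1"],
}
links_tree = {                      # same switches, loop-free
    "sw1": ["sw2"],
    "sw2": ["sw1", "sw3"],
    "sw3": ["sw2"],
}

def flood(links, start, rounds):
    """Return (total frame copies sent, frames still in flight) after
    `rounds` rounds of flood-and-forward starting from `start`."""
    frames = deque([(start, None)])          # (current switch, arrived from)
    copies = 0
    for _ in range(rounds):
        nxt = deque()
        while frames:
            sw, came_from = frames.popleft()
            for nbr in links[sw]:
                if nbr != came_from:         # don't send back out the in-port
                    nxt.append((nbr, sw))
                    copies += 1
        frames = nxt
    return copies, len(frames)

print(flood(links_loop, "sw1", 5))   # frames keep circulating
print(flood(links_tree, "sw1", 5))   # flooding terminates
```

This is why spanning tree (or an equivalent loop-control protocol) must block one port on any redundant path, and why a misconfigured port that quietly bridges two segments can take out both.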
Lessons to take to Heart
Preventative
Better control of legacy cables and ports is required, both with suppliers and with customers. Customers' ports also need strict layer 2 protocol controls at all times, with no exceptions by omission or special case.
Restorative
We estimate that we were delayed by about 10 minutes in the problem analysis and fix. When catastrophic problems occur our priorities are:
- Diagnose and restore service as quickly as possible
- Triage customers and deal with fact-finding and result feedback
- Triage customers and ensure higher support band customers have service restored first / follow ups.
- Don’t deal with individual customer issues that cannot be quickly resolved / are out of the norm.
- Don’t deal with un-related issues
- Don’t give customers misleading information
- Work to ensure all customers are back fully enabled
- Analyse what went wrong
- Determine lessons to be learned
- Write Reason for Outage report.
When catastrophic events happen customers naturally want to know what the problem is. Our reception staff receive a high volume of calls in a short space of time.
The problem solving team need to focus on solving the problem and are isolated to avoid distractions.
The diagnostic team is guided by the problem solving team on what information to gather from which customers in order to build up a picture of what is happening over the ground. This process needs to happen quickly and be very focused and directed. Anything that slows this process down is bad for all concerned.
To optimise the above we will be implementing the following changes with immediate effect:
1. In the first instance call answering staff will take ‘focused yet detailed’ messages and email these to engineers. This ensures that engineers pro-actively manage fault resolution and are not distracted by inbound calls.
2. Diagnostic / information gatherers will seek to dispatch information requests quickly and retrieve the feedback. If you need to consult a third party, please call back once you have gathered the information.
3. A bulletin will be published to the NOC website giving full information about the incident in due course once the problem has been solved and analysed – this is standard procedure.
4. If you wish to make a complaint, please do so once the incident is solved using the complaints mechanism on our website. Your complaint will receive a response within 24 hours of the incident resolution.
5. Abusive callers will not be tolerated.
We do value you as customers and want to serve you as individuals while collectively fixing network problems as quickly as possible. To that end we need your co-operation and participation in the problem-solving process, and we thank you for it.
Wednesday, September 27, 2006
C2 NOC Blog Site
This site is intended for the posting of scheduled network maintenance tickets and updates, and also acts as an advisory site providing updates on outages and on emergency work being carried out to reinstate services.
This site is deliberately hosted outside of the C2 network so that it should remain available even during any disruptive periods.
Kind Regards
Ben