2017.08.21 11:30 UTC Service incident - Postmortem

A customer reported a problem accessing the Cxense Video console.
R&D identified the cause: our systems failed to establish a connection to our CDN because of an invalid certificate.

Incident timeline
2017.08.21 @ 11:41 UTC First customer reports problems accessing the video console
2017.08.21 @ 11:47 UTC Support escalates to R&D's video team for investigation after API errors are also encountered
2017.08.21 @ 12:15 UTC R&D confirms that they are actively troubleshooting the issue
2017.08.22 @ 00:42 UTC Service incident resolved.

Root cause
A self-signed certificate expired on our Akamai origin, which is maintained by Cxense. Because the original certificate had been managed manually, creating a replacement was delayed. The delay in issuing the new certificate was caused by an inefficient division of tasks between two R&D teams and by inefficient communication with our vendor and with the consultant involved in configuring the new certificate.

Preventive and corrective measures
The main factor behind the delayed resolution in this case was failing monitoring. Cxense has improved the monitoring and will now be able to detect such connectivity problems earlier.
The Cxense Infrastructure team has also added this particular certificate to the existing certificate renewal process. This ensures that the certificate is renewed before it expires, instead of renewal starting only after connectivity failures occur.
Starting the renewal process before the actual expiry is also expected to improve the process and the accuracy of the work delivered by everyone involved, and to remove the risk of outages like this one.
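
As an illustration of the kind of check that catches an expiring certificate ahead of time, here is a minimal sketch in Python. It is not Cxense's actual monitoring. The function names, the 30-day threshold, and the idea of trusting the self-signed certificate through a local PEM file (`cafile`) are all assumptions made for this example.

```python
import socket
import ssl
from datetime import datetime, timezone
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[datetime] = None) -> float:
    """Return the number of days between `now` and a certificate's notAfter date.

    `not_after` uses the format returned by ssl.SSLSocket.getpeercert(),
    for example "Aug 21 11:30:00 2017 GMT". A negative result means the
    certificate has already expired.
    """
    now = now or datetime.now(timezone.utc)
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: Optional[datetime] = None,
                  threshold_days: float = 30.0) -> bool:
    """True when the certificate expires within `threshold_days` (assumed value)."""
    return days_until_expiry(not_after, now) <= threshold_days


def fetch_not_after(host: str, port: int, cafile: str, timeout: float = 10.0) -> str:
    """Connect to host:port and return the peer certificate's notAfter string.

    Assumption: the self-signed certificate is available locally as a PEM
    file (`cafile`) and is valid for `host`, so the handshake can verify it.
    If the certificate has already expired, the handshake raises
    ssl.SSLCertVerificationError, which a monitor should treat as an alert.
    """
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could call `fetch_not_after` for each origin and raise an alert whenever `needs_renewal` returns true, so that renewal starts while the existing certificate is still valid.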

 
