A sudden increase in traffic caused a temporary issue in our Azure Kubernetes Service (AKS) environment that affected the availability of key services.
Root Cause
A sudden and significant spike in access to the EU Xink service increased demand on our infrastructure. As a result, some components in our Azure Kubernetes Service (AKS) cluster could not start because insufficient resources were available at the time. This delayed the deployment of services.
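To illustrate the failure mode described above, here is a minimal sketch in Python of why a component can remain unscheduled when no node has enough free capacity. It is a simplified first-fit model, not the actual Kubernetes scheduler, and every name and number in it (Node, Pod, schedule, the CPU and memory figures) is hypothetical rather than taken from our cluster.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_free_m: int    # free CPU in millicores (hypothetical value)
    mem_free_mib: int  # free memory in MiB (hypothetical value)


@dataclass
class Pod:
    name: str
    cpu_request_m: int    # requested CPU in millicores
    mem_request_mib: int  # requested memory in MiB


def schedule(pods, nodes):
    """Place each pod on the first node with enough free CPU and memory.

    Returns (placements, pending): a dict of pod name -> node name, and a
    list of pod names that fit on no node. Simplified first-fit model only.
    """
    placements = {}
    pending = []
    for pod in pods:
        for node in nodes:
            fits = (node.cpu_free_m >= pod.cpu_request_m
                    and node.mem_free_mib >= pod.mem_request_mib)
            if fits:
                # Reserve the requested resources on this node.
                node.cpu_free_m -= pod.cpu_request_m
                node.mem_free_mib -= pod.mem_request_mib
                placements[pod.name] = node.name
                break
        else:
            # No node had room: the pod waits until capacity is added.
            pending.append(pod.name)
    return placements, pending
```

In this model, one node with 1000m of free CPU can host only one of two pods that each request 600m, so the second pod stays pending. Adding a second node of the same size lets both pods be placed, which mirrors the effect of adding nodes to a cluster.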
Impact
Xink Portal inaccessible during incident
Signatures inaccessible during incident
Resolution
Immediate Action: Increased the number of nodes in the AKS cluster to provide additional compute resources, allowing pending components to be scheduled and services to recover.
Permanent Fix: Adjusted auto-scaling thresholds and resource allocation policies to better handle sudden spikes in traffic without resource exhaustion.
Deployment: Applied configuration changes and resource scaling updates during the incident and validated them in production to ensure stability.
Monitoring: Post-resolution monitoring confirmed that all services are operating normally and resource levels remain healthy under load.
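As an illustration of how an auto-scaling threshold influences the response to a traffic spike, the sketch below applies the replica formula documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). This report does not state which autoscaler or which values were adjusted, so the function name, the utilization figures, and the replica limit are all assumptions chosen for the example. The autoscaler's tolerance band and stabilization window are omitted.

```python
import math


def desired_replicas(current_replicas, current_utilization,
                     target_utilization, max_replicas):
    """Return the replica count suggested by the HPA-style ratio rule.

    All inputs are hypothetical example values, not production settings.
    """
    desired = math.ceil(
        current_replicas * current_utilization / target_utilization
    )
    # Clamp to at least one replica and at most the configured ceiling.
    return max(1, min(desired, max_replicas))
```

With 4 replicas running at 90% utilization and a limit of 10, a target of 80% yields 5 replicas, while a lower target of 60% yields 6. A lower target therefore adds capacity earlier during a spike, at the cost of running more replicas under normal load.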
Posted Jun 11, 2025 - 13:47 CEST
Resolved
We have applied fixes. Please allow up to 30 minutes for the service to return to normal operation.
Posted Jun 11, 2025 - 10:50 CEST
Investigating
We are currently investigating this issue.
Posted Jun 11, 2025 - 10:18 CEST
This incident affected: EU Data Centre (Microsoft Azure Cloud) (Admin Portal).