Resilimap

Multi-cloud reliability monitoring

Site generated: 2026-03-07 00:04:06 UTC

MICROSCORE-4348 Micros API outage to upgrade RDSs from db.t4g.large to db.m8g.large

Resolved Scheduled Maintenance
Started
Wed, Apr 30, 2025, 06:43 AM UTC
Last Updated
Wed, Apr 30, 2025, 06:43 AM UTC
Resolved
Wed, Apr 30, 2025, 07:00 AM UTC
This maintenance window has been completed (Total duration: 16m)

AI-Powered Analysis

Impact Severity
minor
Categories
database api
Affected Users
all-users
Root Cause Analysis
Upgrading RDS instances from db.t4g.large to db.m8g.large
Analysis performed by anthropic.claude-3-haiku-20240307-v1:0 on Fri, Mar 6, 2026, 09:02 AM UTC

Incident History

Apr 30, 06:43 UTC
Completed - okie dokie, RDSs have been upgraded as quickly as expected, nothing related seems to have exploded in the time since, so I think we're in the clear, yay

Apr 30, 06:31 UTC
Update - from the AWS console for commercial production:
- April 30, 2025, 16:09 (UTC+10:00) Multi-AZ instance failover completed
- April 30, 2025, 16:09 (UTC+10:00) The RDS instance was modified by customer.
- April 30, 2025, 16:08 (UTC+10:00) DB instance restarted
- April 30, 2025, 16:08 (UTC+10:00) The parameter max_wal_senders was set to a value incompatible with replication. It has been adjusted from 20 to 65.
- April 30, 2025, 16:08 (UTC+10:00) Multi-AZ instance failover started.
- April 30, 2025, 16:01 (UTC+10:00) Applying modification to database instance class
so it looks like the failover and restart took about 2 minutes from the RDS perspective (events logged at 16:08 and 16:09 UTC+10:00, i.e. 06:08 and 06:09 UTC), and we did see some 5XX responses from the Micros API during this time
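
For anyone lining the console events up against the UTC timestamps on this page, here is a minimal sketch of the conversion. It assumes the "(UTC+10:00)" shown in the console output is a fixed offset and uses only three of the events quoted above; the names CONSOLE_TZ, EVENTS and to_utc are made up for illustration and are not part of any AWS tooling.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset taken from the "(UTC+10:00)" label in the console output (assumption).
CONSOLE_TZ = timezone(timedelta(hours=10))

# Events copied from the update above (commercial production), oldest first.
EVENTS = [
    ("April 30, 2025, 16:01", "Applying modification to database instance class"),
    ("April 30, 2025, 16:08", "Multi-AZ instance failover started"),
    ("April 30, 2025, 16:09", "Multi-AZ instance failover completed"),
]


def to_utc(stamp: str) -> datetime:
    """Parse a console timestamp and convert it to UTC."""
    local = datetime.strptime(stamp, "%B %d, %Y, %H:%M").replace(tzinfo=CONSOLE_TZ)
    return local.astimezone(timezone.utc)


utc_events = [(to_utc(stamp), label) for stamp, label in EVENTS]
for when, label in utc_events:
    print(f"{when:%H:%M} UTC  {label}")

# Time from "applying modification" to "failover completed".
total = utc_events[-1][0] - utc_events[0][0]
# Time between the failover start and completion entries.
failover = utc_events[-1][0] - utc_events[1][0]
print(f"modification to failover complete: {total}")
print(f"failover start to complete: {failover}")
```

The console entries only have minute granularity, so the difference between the failover entries is a rough figure; it is consistent with the "about 2 minutes" estimate above but does not pin it down exactly.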

Apr 30, 06:04 UTC
Update - we've kicked off the deployments:
- commercial production: https://deployment-bamboo.internal.atlassian.com/deploy/viewDeploymentResult.action?deploymentResultId=3456381285
- FedRAMP-moderate production: https://deployment-bamboo.internal.atlassian.com/deploy/viewDeploymentResult.action?deploymentResultId=3456381287
- it's hard to predict exactly when, but there should be separate 2-3 minute outages, and those should start 10-15 minutes from now

Apr 30, 05:32 UTC
Scheduled - 2x separate periods of outage, each expected to be 2-3 minutes
- Micros API for commercial production from db.t4g.large to db.m8g.large
- Micros API for FedRAMP-moderate from db.t4g.large to db.m8g.large
- https://hello.atlassian.net/wiki/spaces/MCORE/pages/5218781089/LDR+UA-13147+AWS+deadline+versus+micros-server+RDSs
- https://hello.jira.atlassian.cloud/browse/MICROSCORE-4348

External Resources