MICROSCORE-4348 Micros API outage to upgrade RDSs from db.t4g.large to db.m8g.large
Incident History
Apr 30, 06:43 UTC
Completed - Okie dokie, the RDSs have been upgraded as quickly as expected, and nothing related seems to have exploded in the time since, so I think we're in the clear. Yay!
Apr 30, 06:31 UTC
Update - From the AWS console for commercial production (times are UTC+10:00, newest first):
- April 30, 2025, 16:09 (UTC+10:00) Multi-AZ instance failover completed
- April 30, 2025, 16:09 (UTC+10:00) The RDS instance was modified by customer.
- April 30, 2025, 16:08 (UTC+10:00) DB instance restarted
- April 30, 2025, 16:08 (UTC+10:00) The parameter max_wal_senders was set to a value incompatible with replication. It has been adjusted from 20 to 65.
- April 30, 2025, 16:08 (UTC+10:00) Multi-AZ instance failover started.
- April 30, 2025, 16:01 (UTC+10:00) Applying modification to database instance class
So it looks like the restart and failover took 2 minutes from the RDS perspective, and we did see some 5XX responses from the Micros API during that time.
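The console times above are in UTC+10:00, while the update headings in this history are in UTC. As a rough sketch (not part of the original update), the conversion and the failover window can be worked out as below. The event list is copied from the console output; the names `CONSOLE_TZ`, `EVENTS` and `to_utc` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset taken from the console lines above (UTC+10:00).
CONSOLE_TZ = timezone(timedelta(hours=10))

# (console time, event) pairs copied from the log, oldest first.
EVENTS = [
    ("2025-04-30 16:01", "Applying modification to database instance class"),
    ("2025-04-30 16:08", "Multi-AZ instance failover started"),
    ("2025-04-30 16:09", "Multi-AZ instance failover completed"),
]


def to_utc(local_str: str) -> datetime:
    """Parse a console timestamp (UTC+10:00) and convert it to UTC."""
    local = datetime.strptime(local_str, "%Y-%m-%d %H:%M").replace(tzinfo=CONSOLE_TZ)
    return local.astimezone(timezone.utc)


utc_events = [(to_utc(ts), name) for ts, name in EVENTS]
for when, name in utc_events:
    print(f"{when:%H:%M} UTC  {name}")

# The console only reports whole minutes, so this difference is coarse:
# the real failover time could be anywhere from just over 0 to just under 2 minutes.
failover_window = utc_events[2][0] - utc_events[1][0]
print(f"failover window (minute resolution): {failover_window}")
```

So the failover at 16:08-16:09 console time corresponds to 06:08-06:09 UTC, which sits between the 06:04 and 06:31 UTC updates in this history.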
Apr 30, 06:04 UTC
Update - We've kicked off the deployments:
- commercial production: https://deployment-bamboo.internal.atlassian.com/deploy/viewDeploymentResult.action?deploymentResultId=3456381285
- FedRAMP-moderate production: https://deployment-bamboo.internal.atlassian.com/deploy/viewDeploymentResult.action?deploymentResultId=3456381287
- It's hard to predict exactly when, but there should be two separate 2-3 minute outages, and those should start 10-15 minutes from now.
Apr 30, 05:32 UTC
Scheduled - 2x separate periods of outage, each expected to be 2-3 minutes
- Micros API for commercial production from db.t4g.large to db.m8g.large
- Micros API for FedRAMP-moderate from db.t4g.large to db.m8g.large
- https://hello.atlassian.net/wiki/spaces/MCORE/pages/5218781089/LDR+UA-13147+AWS+deadline+versus+micros-server+RDSs
- https://hello.jira.atlassian.cloud/browse/MICROSCORE-4348