Resilimap

Multi-cloud reliability monitoring

Site generated: 2026-06-05 20:04:21 UTC

Datadog

Last updated: 5/16/2026, 5:02:55 AM

✓ All Systems Operational
Active Incidents: 0
Resolved: 25
Scheduled Maintenance: 0
Total Incidents: 25
Total Maintenance: 0
Critical: 0

Recently Resolved Incidents

Azure Metrics Reporting

Resolved Unknown

May 15, 23:43 EDT
Resolved - This incident has been resolved and Azure metrics are reporting as expected.

May 15, 23:27 EDT
Monitoring - A fix has been implemented and we are monitoring the results.

May 15, 23:09 EDT
Update - We are continuing to work on a fix for this issue.

May 15, 22:40 EDT
Identified - The issue has been identified and a fix is being implemented.

May 15, 22:17 EDT
Investigating - We are investigating an issue submitting Azure metrics.

AI Analysis
Impact: major
Categories: azure, monitoring
Users: all-users
Root Cause: An issue submitting Azure metrics to Datadog's service
Started: 5/16/2026, 2:17 AM UTC Resolved: 5/16/2026, 3:43 AM UTC Duration: 1h 26m
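Each incident's start, resolution, and duration can be cross-checked against its own timeline. The sketch below assumes every post is stamped in EDT (a fixed UTC-4 offset for these dates), that an incident "starts" at its first post and "ends" at its Resolved post, and that posts carry no year; `incident_window` and `format_duration` are hypothetical helper names, not part of any Resilimap or Datadog API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timeline posts are in EDT, i.e. a fixed UTC-4 offset for these dates.
EDT = timezone(timedelta(hours=-4), "EDT")


def incident_window(first_post: str, resolved_post: str, year: int = 2026):
    """Parse 'Mon DD, HH:MM' EDT post stamps; return (start_utc, end_utc, duration)."""
    fmt = "%Y %b %d, %H:%M"
    start = datetime.strptime(f"{year} {first_post}", fmt).replace(tzinfo=EDT)
    end = datetime.strptime(f"{year} {resolved_post}", fmt).replace(tzinfo=EDT)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc), end - start


def format_duration(delta: timedelta) -> str:
    """Render a timedelta as 'Xh Ym', or just 'Ym' when under an hour."""
    minutes = int(delta.total_seconds()) // 60
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins}m" if hours else f"{mins}m"


# Azure Metrics Reporting: first post May 15, 22:17 EDT; resolved May 15, 23:43 EDT.
start, end, duration = incident_window("May 15, 22:17", "May 15, 23:43")
print(start.isoformat(), end.isoformat(), format_duration(duration))
# 2026-05-16T02:17:00+00:00 2026-05-16T03:43:00+00:00 1h 26m
```

Measuring from the first post understates incidents whose onset predates it; the May 7 update, for example, states that delays began at 23:39 UTC, 27 minutes before the first post.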

Delayed Metric Loading

Resolved Unknown

May 14, 15:48 EDT
Resolved - This incident has been resolved.

May 14, 14:48 EDT
Monitoring - A fix has been implemented and we are monitoring the results.

May 14, 14:24 EDT
Investigating - We are investigating increased latency querying Metrics. As a result of this issue, some users may experience delays loading graphs in dashboards and notebooks, and delayed monitor evaluations.

Started: 5/14/2026, 6:24 PM UTC Resolved: 5/14/2026, 7:48 PM UTC Duration: 1h 24m

May 13, 11:53 EDT
Resolved - This incident has been resolved.

May 13, 11:36 EDT
Monitoring - We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

May 13, 11:16 EDT
Investigating - We are investigating degraded performance in the Database Monitoring pages and APM service pages.

During this time, some customers may be experiencing:
- The DBM database list page failing to load or displaying zero databases
- 503 errors when attempting to access database instance views
- Inability to view or interact with Database Monitoring data in our US1 datacenter
- The APM service page failing to load

Started: 5/13/2026, 3:16 PM UTC Resolved: 5/13/2026, 3:53 PM UTC Duration: 37m

May 8, 10:07 EDT
Resolved - This incident has been resolved.

May 8, 02:21 EDT
Monitoring - We have identified upstream provider issues. Metrics, Logs, APM, RUM and CI visibility data are being processed normally for all customers. Alerting is also functional.
We are monitoring the situation.
https://health.aws.amazon.com/health/status

May 8, 01:44 EDT
Update - We have identified upstream provider issues. Metrics, Logs, APM and RUM data are being processed normally for all customers. Alerting is also functional. There are still processing delays for CI visibility.
We are monitoring the situation.
https://health.aws.amazon.com/health/status

May 8, 01:30 EDT
Update - We have identified upstream provider issues and are continuing to experience delays in processing data across multiple products. We are continuing to work on a fix.
https://health.aws.amazon.com/health/status

For metrics, distribution metrics and point metrics are being processed normally for all customers.

May 8, 00:48 EDT
Update - We have identified upstream provider issues and are continuing to experience delays in processing data across multiple products. We are continuing to work on a fix.
https://health.aws.amazon.com/health/status

For metrics, distribution metrics and point metrics are being processed normally for all customers.

May 8, 00:13 EDT
Update - We have identified upstream provider issues and are continuing to experience delays in processing data across multiple products. We are continuing to work on a fix.
https://health.aws.amazon.com/health/status

For metrics, distribution metrics and point metrics are being processed normally for all customers.

May 7, 22:59 EDT
Update - We have identified that, due to upstream provider issues, we are continuing to see unavailability of telemetry data coming from AWS into Datadog. We are continuing to work on a fix.
https://health.aws.amazon.com/health/status

For metrics, we are still seeing delays in distribution metrics. Count, rate, and gauge metrics are being processed normally for most customers.

May 7, 22:16 EDT
Update - We have identified that, due to upstream provider issues, we are continuing to see unavailability of telemetry data coming from AWS into Datadog. We are continuing to work on a fix.

https://health.aws.amazon.com/health/status

May 7, 22:02 EDT
Update - We have identified upstream provider issues and are continuing to experience delays in processing data across multiple products. We are working on a fix.

https://health.aws.amazon.com/health/status

May 7, 21:20 EDT
Identified - We have identified upstream provider issues and are continuing to experience delays in processing data across multiple products. We are working on a fix.

https://health.aws.amazon.com/health/status

May 7, 20:47 EDT
Update - Due to upstream provider issues, we are also continuing to see unavailability of telemetry data coming from AWS into Datadog.

https://health.aws.amazon.com/health/status

May 7, 20:24 EDT
Update - We are investigating increased latency across multiple products which began at 23:39 UTC. As a result of this issue, some users may see delays in data across the platform.

May 7, 20:06 EDT
Investigating - We are investigating delays in Monitor notifications, which began at 23:39 UTC.

Started: 5/8/2026, 12:06 AM UTC Resolved: 5/8/2026, 2:07 PM UTC Duration: 14h 1m

May 5, 15:52 EDT
Resolved - This incident has been resolved.

May 5, 14:44 EDT
Monitoring - A fix has been implemented and we are monitoring the results.

May 5, 14:29 EDT
Identified - The issue has been identified and a fix is being implemented.

May 5, 14:14 EDT
Investigating - We’re currently investigating an issue causing APM monitor delays in our US1 datacenter. As a result, some users may experience delayed alert triggering or temporary inconsistencies in monitors.

Started: 5/5/2026, 6:14 PM UTC Resolved: 5/5/2026, 7:52 PM UTC Duration: 1h 38m

Apr 30, 21:19 EDT
Resolved - This incident has been resolved.

Apr 30, 21:09 EDT
Monitoring - We have deployed a fix and we are monitoring the results.
We will provide another update once the service is fully operational.

Apr 30, 20:45 EDT
Investigating - We are actively investigating elevated error rates for Metrics Queries.
As a result of this issue, some users may see errors with metrics graphs on the web application or API.

Started: 5/1/2026, 12:45 AM UTC Resolved: 5/1/2026, 1:19 AM UTC Duration: 34m

Apr 24, 12:50 EDT
Resolved - This incident has been resolved.

Apr 24, 12:44 EDT
Monitoring - A fix has been implemented and we are monitoring the results.

Apr 24, 12:43 EDT
Identified - The issue has been identified and a fix is being implemented.

Apr 24, 12:27 EDT
Investigating - We are currently investigating this issue.

Started: 4/24/2026, 4:27 PM UTC Resolved: 4/24/2026, 4:50 PM UTC Duration: 23m

Delayed Metrics

Resolved Unknown

Apr 15, 20:37 EDT
Resolved - This incident has been resolved.

Apr 15, 20:32 EDT
Monitoring - We have resolved the lag in distribution metrics and we are monitoring the results.
We will provide another update once the issue is fully resolved.

Apr 15, 19:54 EDT
Update - We are still investigating increased latency processing Distribution Metrics. Delays have decreased but have not fully recovered.

Apr 15, 19:16 EDT
Update - We are investigating increased latency processing Distribution Metrics.
As a result of this issue, some users may see delays or gaps for distribution metrics on graphs.

To prevent false monitor alerts due to delayed data, monitors affected by the delay will not notify and will automatically resume once current data is available. All other monitors will operate normally.

Apr 15, 18:38 EDT
Investigating - We are investigating increased latency processing Metrics.
As a result of this issue, some users may see delays or gaps for metrics on graphs.

To prevent false monitor alerts due to delayed data, monitors affected by the delay will not notify and will automatically resume once current data is available. All other monitors will operate normally.
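The suppression rule in these updates (affected monitors stay silent while their data is delayed, then resume once current data arrives) can be sketched as a freshness check that runs before the threshold check. This is a minimal illustration, not Datadog's implementation: the `Monitor` class, the `max_data_lag` tolerance, and the `evaluate` function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Monitor:
    name: str
    threshold: float
    # Hypothetical tolerance: how old the newest data point may be before the
    # monitor stops notifying. Not a documented Datadog setting.
    max_data_lag: timedelta


def evaluate(monitor: Monitor, latest_value: float,
             latest_point_at: datetime, now: datetime) -> str:
    """Return 'skipped', 'alert', or 'ok' for one evaluation cycle."""
    if now - latest_point_at > monitor.max_data_lag:
        # Data is delayed: stay silent instead of alerting (or recovering) on
        # incomplete data. The next cycle with fresh data evaluates normally.
        return "skipped"
    return "alert" if latest_value > monitor.threshold else "ok"


now = datetime(2026, 4, 15, 23, 0, tzinfo=timezone.utc)
latency = Monitor("p99 latency (distribution)", threshold=500.0,
                  max_data_lag=timedelta(minutes=10))
print(evaluate(latency, 800.0, now - timedelta(minutes=25), now))  # skipped
print(evaluate(latency, 800.0, now - timedelta(minutes=1), now))   # alert
```

Checking freshness first means a gap in ingestion can never be mistaken for a metric dropping to zero or recovering, which is the false-alert case the updates describe.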

Started: 4/15/2026, 10:38 PM UTC Resolved: 4/16/2026, 12:37 AM UTC Duration: 1h 59m

Apr 10, 16:59 EDT
Resolved - Between approximately 9:42 AM and 10:38 AM ET on April 10, we observed delivery failures for customers in the US1, AP1, and AP2 regions. During this window, notifications to the following integrations were delayed or temporarily undelivered:

• PagerDuty
• Slack
• Microsoft Teams
• Webhooks
• Jira
• ServiceNow
• OpsGenie
• VictorOps
• BigPanda
• Zendesk
• Sumologic

All delayed notifications have been replayed with the following exceptions: PagerDuty pages and Slack notifications queued during the outage window were not replayed, as they were no longer actionable at the time of recovery. Webhook notifications will be reprocessed and delivered at a later date.

No monitoring data was lost during this incident. Monitor evaluations continued normally — only the delivery of triggered notifications was affected.
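The replay policy stated in this resolution note can be expressed as a per-integration rule applied to the backlog of queued notifications: PagerDuty pages and Slack messages from the outage window are dropped as no longer actionable, webhooks are deferred for later reprocessing, and everything else is replayed. A minimal sketch; `QueuedNotification`, `Action`, `POLICY`, and `plan_replay` are hypothetical names, not Datadog internals.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REPLAY = "replay now"
    DROP = "drop (no longer actionable)"
    DEFER = "reprocess later"


# Rules taken from the April 10 resolution note; any integration not listed
# (Jira, ServiceNow, OpsGenie, ...) falls through to REPLAY.
POLICY = {
    "PagerDuty": Action.DROP,
    "Slack": Action.DROP,
    "Webhooks": Action.DEFER,
}


@dataclass
class QueuedNotification:
    integration: str
    monitor: str


def plan_replay(queue):
    """Bucket queued notifications by the action the policy assigns them."""
    plan = {action: [] for action in Action}
    for notification in queue:
        plan[POLICY.get(notification.integration, Action.REPLAY)].append(notification)
    return plan


queue = [
    QueuedNotification("PagerDuty", "api-errors"),
    QueuedNotification("Slack", "disk-usage"),
    QueuedNotification("Webhooks", "deploy-hook"),
    QueuedNotification("Jira", "sla-breach"),
]
for action, items in plan_replay(queue).items():
    print(action.value, [n.integration for n in items])
```

Defaulting unknown integrations to replay matches the note's framing that all delayed notifications were replayed "with the following exceptions".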

Thank you for your patience.

Apr 10, 16:13 EDT
Update - Webhooks and Slack notifications are still catching up.

Apr 10, 15:44 EDT
Update - Webhooks and Slack notifications are still catching up.

Apr 10, 15:02 EDT
Update - Webhooks and Slack notifications are still catching up.

Apr 10, 14:33 EDT
Update - Notifications for ServiceNow and Microsoft Teams are now caught up. Webhooks and Slack notifications are still catching up.

Apr 10, 13:37 EDT
Update - Notifications for JIRA are now caught up.

Apr 10, 13:12 EDT
Update - Notifications for Opsgenie, VictorOps, BigPanda, Zendesk, and Sumologic are caught up. Webhooks, Microsoft Teams, Slack, JIRA, and ServiceNow are still catching up.

Apr 10, 12:15 EDT
Update - We are still catching up on past notifications that may have been delayed.

Apr 10, 11:59 EDT
Update - We are still catching up on past notifications that may have been delayed.

Apr 10, 11:42 EDT
Monitoring - New notifications are now being delivered successfully, and we are catching up on past notifications that may have been delayed.

Apr 10, 11:42 EDT
Update - New notifications are now being delivered successfully, and we are catching up on past notifications that may have been delayed.

Apr 10, 11:23 EDT
Update - We are continuing to work on a fix for this issue.

Apr 10, 11:00 EDT
Identified - The issue has been identified and a fix is being implemented.

Apr 10, 10:54 EDT
Investigating - We are investigating an issue sending notifications to PagerDuty, VictorOps, Slack, Webhooks, Microsoft Teams, and OpsGenie.

Started: 4/10/2026, 2:54 PM UTC Resolved: 4/10/2026, 8:59 PM UTC Duration: 6h 5m

Mar 27, 15:30 EDT
Resolved - This incident has been resolved.

Mar 27, 15:21 EDT
Monitoring - A fix has been implemented and we are monitoring the results.

Mar 27, 15:17 EDT
Identified - The issue has been identified and a fix is being implemented.

Mar 27, 14:57 EDT
Investigating - We are investigating increased latency in processing and storing Traces in APM.
As a result of this issue, some users may see missing or delayed traces in APM Trace Search since 5 PM UTC. They may also experience delays in APM trace-based monitors.

Started: 3/27/2026, 6:57 PM UTC Resolved: 3/27/2026, 7:30 PM UTC Duration: 33m

Showing 10 of 25 resolved incidents