Resilimap

Multi-cloud reliability monitoring

Site generated: 2026-06-05 20:04:21 UTC

GitHub

Last updated: 6/5/2026, 8:03:01 PM

✓ All Systems Operational
Active Incidents: 0
Resolved: 25
Scheduled Maint.: 0
Total Incidents: 25
Total Maint.: 0
Critical: 0

Recently Resolved Incidents

Jun 5, 18:43 UTC
Update - We are still exploring options to restore the deleted subscriptions, and we will provide another update soon. In the meantime, customers can manually re-subscribe their Slack and Teams channels to repositories.

Jun 5, 18:05 UTC
Monitoring - The degradation has been mitigated. We are monitoring to ensure stability.

Jun 5, 18:04 UTC
Update - Between 14:49 UTC and 16:45 UTC, customers may have experienced authorization failures for legitimate requests. This was caused by a recently enabled feature flag, which has now been turned off as a mitigation. Customers should now see normal authorization behavior. The same flag caused the chat integration issue, and we are exploring options to restore the deleted subscriptions. In the meantime, customers can manually re-subscribe their Slack and Teams channels to repositories.

Jun 5, 17:25 UTC
Update - Customers may see unexpected repo unsubscription events in their Slack or Teams channels.

Jun 5, 17:20 UTC
Investigating - We are investigating reports of impacted performance for some GitHub services.

AI Analysis
Impact: major
Categories: authentication, api
Users: all-users
Root Cause: A recently enabled feature flag caused authorization failures, leading to the accidental deletion of subscriptions for chat integrations (Slack and MS Teams).
Started: 6/5/2026, 6:43:49 PM Resolved: N/A Duration: 1h 21m

Live updates degraded

Resolved Unknown

Jun 4, 20:32 UTC
Resolved - Everything is operating normally.

Jun 4, 20:20 UTC
Investigating - We are investigating reports of impacted performance for some GitHub services.

Started: 6/4/2026, 8:32:07 PM Resolved: N/A Duration: 23h 33m

Jun 4, 19:59 UTC
Resolved - This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.

Jun 4, 19:59 UTC
Update - This issue is now fully resolved and Copilot Code Review is working as expected.

Jun 4, 19:41 UTC
Update - The mitigation for Copilot Code Review is now fully deployed, and new reviews are working as expected. We are continuing to monitor for full resolution.

Customers may need to re-request Copilot Code Review. Copilot Code Review Actions runs that have been running for longer than 20 minutes can be safely cancelled.

Jun 4, 19:07 UTC
Monitoring - The degradation has been mitigated. We are monitoring to ensure stability.

Jun 4, 19:07 UTC
Update - The mitigation for Copilot Code Review is now fully deployed, and new reviews are working as expected.

Customers may need to re-request Copilot Code Review. Copilot Code Review Actions runs that have been running for longer than 20 minutes can be safely cancelled.

Jun 4, 18:52 UTC
Update - The mitigation for Copilot Code Review is rolling out and we are seeing early signs of recovery.

Jun 4, 18:22 UTC
Update - We have identified that Copilot Code Review users may see "Copilot ran into an error" on Pull Requests that requested Copilot Code Review.

A mitigation is in progress; we expect it to take effect in approximately 30 minutes.

GitHub Enterprise Cloud with Data Residency is not impacted.

Jun 4, 18:03 UTC
Update - We have identified that Copilot Code Review users may see "Copilot ran into an error" on Pull Requests that requested Copilot Code Review.

A mitigation is in progress.

Jun 4, 18:02 UTC
Investigating - We are investigating reports of impacted performance for some GitHub services.

Started: 6/4/2026, 7:59:27 PM Resolved: N/A Duration: 24h 5m

Jun 4, 04:11 UTC
Resolved - Between June 1, 2026, 23:00 UTC and June 4, 2026 04:11 UTC, customers experienced delays in Dependabot scheduled version updates.

Pull request creation for version updates was delayed, with delays increasing over time and reaching up to two days. Approximately 1.5 million repositories with active Dependabot version update configurations were affected. Dependabot security updates were not affected. The primary cause was changes to an internal platform service that routes requests for Dependabot and other services.

We mitigated the incident by deploying a fix that enables batch enqueuing of update jobs, which significantly increased processing throughput. Once the backlog was drained, Dependabot returned to normal processing times.

To reduce the risk of recurrence, we are working on tuning batch size and concurrency limits for Dependabot update job processing. We are also adding monitoring for job processing lag to enable earlier detection and faster mitigation of similar issues.
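
The batch enqueuing mentioned in the mitigation can be illustrated with a minimal Python sketch. This is not GitHub's implementation: `InMemoryQueue`, `enqueue_many`, and `enqueue_in_batches` are hypothetical names invented here, and the batch size is an arbitrary assumption. The point is only that grouping jobs reduces the number of enqueue calls.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(jobs: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size jobs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(jobs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


class InMemoryQueue:
    """Hypothetical stand-in for a job queue; it counts enqueue calls."""

    def __init__(self) -> None:
        self.items: List[str] = []
        self.calls = 0

    def enqueue_many(self, batch: List[str]) -> None:
        # One call carries a whole batch instead of a single job.
        self.calls += 1
        self.items.extend(batch)


def enqueue_in_batches(queue: InMemoryQueue, jobs: Iterable[str], batch_size: int) -> int:
    """Enqueue jobs in batches and return the number of enqueue calls made."""
    before = queue.calls
    for batch in batched(jobs, batch_size):
        queue.enqueue_many(batch)
    return queue.calls - before
```

With ten jobs and a batch size of four, this sketch makes three enqueue calls rather than ten. Tuning the batch size trades per-call overhead against the size of each unit of work, which is the kind of tuning the follow-up work above describes.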

Jun 4, 04:11 UTC
Update - Job lag has recovered to within normal operating thresholds. We are declaring this incident closed and will follow up with a summary soon.

Jun 4, 02:37 UTC
Update - Job lag has recovered from a peak of 1.71 days to 9h 9m at 19:29 UTC and continues to decrease. Backlog is draining at a healthy rate with no signs of reversal. New jobs are processing on schedule. Remaining lag will continue to drain over the next few hours as queued work completes; this is expected post-incident catch-up, not active impact. We will continue monitoring and re-engage if the lag trend reverses.

Jun 4, 02:35 UTC
Monitoring - The degradation has been mitigated. We are monitoring to ensure stability.

Jun 3, 23:38 UTC
Update - We have applied mitigations and are continuing to see improvements in the Dependabot scheduled version updates.

Next update in 12 hours.

Jun 3, 21:10 UTC
Update - We are preparing a mitigation for the delayed Dependabot scheduled version updates.

Next update in 2 hours.

Jun 3, 20:18 UTC
Update - Customers may see delays of up to two days in Dependabot version updates.

Dependabot Security updates are not delayed.

The team is investigating mitigations for the backlog.

Next update in 1 hour.

Jun 3, 19:43 UTC
Update - We're seeing delays in Dependabot scheduled version update runs. Our team is actively working on a fix and will share updates as the situation develops.

Jun 3, 19:42 UTC
Investigating - We are investigating reports of impacted performance for some GitHub services.

Started: 6/4/2026, 4:11:59 AM Resolved: N/A Duration: 39h 53m

Jun 3, 06:46 UTC
Resolved - Between June 2, 2026, 21:54 UTC and June 3, 2026, 06:45 UTC, the Spark service was degraded and users were unable to store or retrieve data for their Spark apps in one of our hosting regions. Users could still make changes to their app configuration during this time. The error rate peaked at 25% of affected requests to the service. Impact was limited to users whose requests were served through a single affected region; 43 users experienced errors during this window.

The root cause was a configuration that referenced a service component by a fixed address rather than a dynamic service endpoint. When the component was replaced, requests could no longer reach the fixed address and began to fail. We resolved the incident by updating the configuration to use our standard service endpoints, which are resilient to component replacement. Recovery time was extended because replacing the component required overrides to a temporary deployment safeguard.

We are working to add validation that prevents fixed infrastructure addresses from being used in application configuration outside of test environments and to improve our monitoring to reduce our time to detect.
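
A configuration check of the kind described above could look roughly like the following Python sketch. It is an illustration under assumptions, not GitHub's validation: the function names, the flat string-to-string config shape, and the `"test"` environment label are all hypothetical. It flags values that are IP literals, or URLs whose host is an IP literal, and exempts test environments.

```python
import ipaddress
from typing import Dict, List
from urllib.parse import urlsplit


def is_fixed_address(value: str) -> bool:
    """Return True if value is an IP literal or has an IP literal as its host."""
    candidate = value.strip()
    hosts = [candidate]
    try:
        # Prefix "//" so that bare "host:port" values are parsed as a netloc.
        parsed = urlsplit(candidate if "://" in candidate else "//" + candidate)
        if parsed.hostname:
            hosts.append(parsed.hostname)
    except ValueError:
        pass  # Malformed URL; fall back to checking the raw value only.
    for host in hosts:
        try:
            ipaddress.ip_address(host)
            return True
        except ValueError:
            continue
    return False


def find_fixed_addresses(config: Dict[str, str], environment: str) -> List[str]:
    """Return the config keys that hold fixed addresses; test environments are exempt."""
    if environment == "test":
        return []
    return sorted(key for key, value in config.items() if is_fixed_address(value))
```

A deployment pipeline could reject any configuration for which `find_fixed_addresses` returns a non-empty list. Host names that resolve through service discovery pass the check, since they are not IP literals.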

Jun 3, 06:45 UTC
Monitoring - The degradation has been mitigated. We are monitoring to ensure stability.

Jun 3, 06:02 UTC
Update - We are investigating reports of issues with service(s): Spark. We will continue to keep users updated on progress towards mitigation.

Jun 3, 03:45 UTC
Update - We are investigating reports of impacted performance for some GitHub services.

---
Relevant stamps: dotcom

Jun 3, 03:13 UTC
Investigating - We are investigating reports of impacted performance for some GitHub services.

Started: 6/3/2026, 6:46:59 AM Resolved: N/A Duration: 61h 18m

Jun 2, 00:17 UTC
Resolved - This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.

Jun 1, 21:59 UTC
Update - We have identified the root cause and applied mitigations to address delays in billing updates, and we are continuing to see improvement in the processing rate. We will continue to monitor progress and will provide an update in a few hours.

GitHub Enterprise Cloud with Data Residency is not impacted.

Jun 1, 19:27 UTC
Update - We are continuing to investigate delayed billing updates on GitHub.com. We have applied additional mitigations, are continuing to see signs of improvement, and are continuing to work to improve the processing rate. We will continue to keep users updated on progress towards mitigation.

GitHub Enterprise Cloud with Data Residency is not impacted.

Next update in 2 hours.

Jun 1, 18:48 UTC
Update - We are continuing to investigate delayed billing updates on GitHub.com. We have applied additional mitigations, are seeing further signs of improvement, and are continuing to work to improve the processing rate. We will continue to keep users updated on progress towards mitigation.

GitHub Enterprise Cloud with Data Residency is not impacted.

Jun 1, 17:28 UTC
Update - We are continuing to investigate delayed billing updates on GitHub.com. We have applied multiple mitigations, are seeing some signs of improvement, and are continuing to work to improve the processing rate. We will continue to keep users updated on progress towards mitigation.

GitHub Enterprise Cloud with Data Residency is not impacted.

Jun 1, 16:42 UTC
Update - We are investigating reports of delayed billing updates on GitHub.com. We are continuing to investigate delays in our job processing architecture and are attempting to mitigate at the infrastructure level. Code scanning runs and notifications have recovered. We will continue to keep users updated on progress towards mitigation.

GitHub Enterprise Cloud with Data Residency is not impacted.

Jun 1, 15:43 UTC
Update - We are investigating reports of delayed code scanning runs, billing updates, email and mobile push notifications. We are investigating delays in our job processing architecture. We will continue to keep users updated on progress towards mitigation.

Jun 1, 15:17 UTC
Update - We are investigating reports of delayed code scanning runs and delayed billing updates.  We will continue to keep users updated on progress towards mitigation.

Jun 1, 15:17 UTC
Investigating - We are investigating reports of impacted performance for some GitHub services.

Started: 6/2/2026, 12:17:58 AM Resolved: N/A Duration: 91h 47m

May 28, 20:41 UTC
Resolved - On May 28th, 2026, between approximately 18:27 and 20:41 UTC, the GitHub Copilot service was degraded due to an issue with the Responses API of an upstream provider affecting the GPT-5.2, GPT-5.3-Codex, GPT-5.4, and GPT-5.5 models. Requests routed to these models via the Responses API returned elevated error rates, which also affected Copilot coding agent and Copilot code review. No other models were impacted.

We mitigated the incident by shifting traffic away from the affected models while the upstream provider deployed a fix.

GitHub is working to improve automated failover for the affected models and strengthen monitoring to prevent similar incidents in the future.

May 28, 20:06 UTC
Update - OpenAI models are currently unavailable. We are shifting requests to other models to reduce impact.

May 28, 19:40 UTC
Update - We are investigating errors with Copilot requests using OpenAI models.

May 28, 19:20 UTC
Update - Copilot is experiencing degraded performance. We are continuing to investigate.

May 28, 19:01 UTC
Investigating - We are investigating reports of impacted performance for some GitHub services.

Started: 5/28/2026, 8:41:58 PM Resolved: N/A Duration: 191h 23m

May 28, 19:07 UTC
Resolved - On May 28, 2026, between 19:07 UTC and 19:16 UTC, multiple GitHub services experienced elevated error rates. This was due to a change that was partially deployed to an authentication service, causing errors for dependent services including the web experience, REST API, Git operations, and GitHub Actions. At peak impact, 10% of GitHub Actions runs failed to queue or encountered errors while downloading actions. We mitigated the incident by rolling back the change.

We are expanding test coverage and improving our deployment validation process to prevent recurrence of this issue in the future.

Started: 5/28/2026, 7:07:00 PM Resolved: N/A Duration: 192h 58m

May 28, 01:32 UTC
Resolved - On May 28, 2026, between 00:54 UTC and 01:19 UTC, some users experienced errors when interacting with the Webhooks API, including webhook delivery history and configuration endpoints. On average, the error rate was 0.28% and peaked at 0.45%. This was due to a bug that caused a single Kubernetes pod to enter a CrashLoopBackOff after receiving a 500 with an empty response body from Cosmos DB.

We mitigated the incident by restarting the service. To prevent future incidents, we are pushing a change to handle this response scenario from Cosmos DB appropriately.
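
Handling an error response with an empty body, as the fix above describes, can be sketched in Python as follows. This is a generic illustration, not the Webhooks service's code or the Cosmos DB SDK: `StoreResult` and `parse_store_response` are hypothetical names, and the assumption that response bodies are JSON is made here for the example only. The idea is to return an error value for an unexpected response instead of raising an unhandled exception that could crash the process.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StoreResult:
    ok: bool
    status: int
    data: Optional[Any] = None
    error: Optional[str] = None


def parse_store_response(status: int, body: bytes) -> StoreResult:
    """Turn a raw status/body pair into a result without raising."""
    success = 200 <= status < 300
    if not body or not body.strip():
        # An empty body is acceptable on success but is reported on failure.
        if success:
            return StoreResult(ok=True, status=status)
        return StoreResult(ok=False, status=status, error="empty response body")
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers both invalid JSON and undecodable bytes.
        return StoreResult(ok=False, status=status, error="unparseable response body")
    if success:
        return StoreResult(ok=True, status=status, data=payload)
    message = payload.get("message") if isinstance(payload, dict) else None
    return StoreResult(ok=False, status=status, error=message or "request failed")
```

A caller can then log or retry on `ok=False` rather than letting a parsing error propagate to the top of the process.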

May 28, 01:27 UTC
Monitoring - The degradation affecting Webhooks has been mitigated. We are monitoring to ensure stability.

May 28, 01:13 UTC
Investigating - We are investigating reports of degraded performance for Webhooks.

Started: 5/28/2026, 1:32:27 AM Resolved: N/A Duration: 210h 32m

May 27, 13:16 UTC
Resolved - On May 27, 2026, between 12:07 UTC and 13:16 UTC, users experienced degraded performance for Git operations, Pull Requests, Issues, GraphQL API, and related services on github.com. During this time, operations that depended on Git file servers experienced elevated error rates (3.5% of pushes via HTTPS and 0.2% of pushes via SSH failed; no fetches/clones failed). An internal analytics component generated unexpectedly high load, which caused CPU saturation on the underlying infrastructure. This led to cascading slowdowns and errors across services that depend on Git operations. The issue was mitigated by stopping the offending component. Services began recovering shortly after mitigation and were fully restored by 13:16 UTC. We are taking steps to add resource limits and kill switches for internal analytics components to prevent similar issues in the future.

May 27, 12:54 UTC
Update - We're continuing to investigate degraded performance of Git operations, Issues and Pull requests.

May 27, 12:10 UTC
Investigating - We are investigating reports of degraded performance for API Requests, Git Operations, Issues, and Pull Requests.

Started: 5/27/2026, 1:16:54 PM Resolved: N/A Duration: 222h 48m

Showing 10 of 25 resolved incidents