How to handle database outages?






Why does a database go down?

An unexpected heavy load on your database can lead to a process crash or a massive slowdown.

Before jumping to short-term and long-term solutions, ensure the database is well monitored: CPU, memory, disk, and connections should all be closely tracked.
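As a rough illustration of watching these four signals, here is a minimal Python sketch that compares a metrics snapshot against alert thresholds. The metric names and threshold values are hypothetical placeholders, not recommendations; in practice the snapshot would come from whatever monitoring system you already run.

```python
# Hypothetical thresholds (percent of capacity); tune them for your own workload.
THRESHOLDS = {
    "cpu_percent": 80.0,
    "memory_percent": 85.0,
    "disk_percent": 90.0,
    "connections_percent": 75.0,
}


def find_breaches(metrics, thresholds=THRESHOLDS):
    """Return the names of metrics at or above their threshold, sorted by name."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if metrics.get(name, 0.0) >= limit
    )


# Made-up snapshot for illustration only.
sample = {
    "cpu_percent": 93.5,
    "memory_percent": 60.0,
    "disk_percent": 91.0,
    "connections_percent": 40.0,
}
print(find_breaches(sample))  # ['cpu_percent', 'disk_percent']
```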

Short-term solutions

  • Kill the queries that have been running for a long time
  • Quickly scale up your database if you have been seeing consistently heavy usage
  • Check if a recent deployment is the culprit; if so, revert ASAP
  • Rebooting the database will calm the storm and buy you some time
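The first step above, killing long-running queries, can be sketched as follows. This is a minimal example that assumes PostgreSQL, where `pg_terminate_backend(pid)` ends a backend process; on MySQL the equivalent would be `KILL <id>`. The `RunningQuery` type, the 300-second cutoff, and the sample snapshot are hypothetical; in a real system the snapshot would be read from a view such as `pg_stat_activity`.

```python
from dataclasses import dataclass


@dataclass
class RunningQuery:
    pid: int                # backend process id
    seconds_running: float  # how long the query has been executing
    sql: str                # query text, useful for logging before the kill


def pick_queries_to_kill(queries, max_seconds=300.0):
    """Return queries running longer than max_seconds, longest first."""
    offenders = [q for q in queries if q.seconds_running > max_seconds]
    return sorted(offenders, key=lambda q: q.seconds_running, reverse=True)


def kill_statements(queries):
    """Build one PostgreSQL terminate statement per offending query."""
    # Assumption: PostgreSQL. pid is coerced to int so nothing else is interpolated.
    return [f"SELECT pg_terminate_backend({int(q.pid)});" for q in queries]


# Made-up snapshot for illustration only.
snapshot = [
    RunningQuery(101, 12.0, "SELECT 1"),
    RunningQuery(102, 950.0, "SELECT * FROM orders"),
    RunningQuery(103, 400.0, "UPDATE users SET active = false"),
]
for statement in kill_statements(pick_queries_to_kill(snapshot)):
    print(statement)
```

Reviewing the generated statements before running them is a deliberate choice here: during an outage it is easy to terminate a migration or backup that only looks like a runaway query.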

Long-term solutions

  • Ensure the right set of indexes is in place
  • Tune your database default parameters to gain optimal performance
  • Check for the notorious N+1 queries
  • Upgrade the database version to get the best the database can offer
  • Evaluate the need for horizontal scaling using replicas and sharding
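To make the N+1 problem from the list above concrete, here is a small sketch using Python's built-in `sqlite3` module. The `authors` and `posts` tables, their rows, and the `CountingConnection` wrapper are all hypothetical and exist only to count how many statements each approach sends to the database.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many statements are sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def setup():
    """Create an in-memory database with a few made-up authors and posts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO authors VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, "cy")]
    )
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(1, 1, "a1"), (2, 1, "a2"), (3, 2, "b1")],
    )
    return conn


def titles_n_plus_one(db):
    """One query for the authors, then one more query per author."""
    result = {}
    for author_id, name in db.query("SELECT id, name FROM authors ORDER BY id"):
        rows = db.query(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query(db):
    """The same data fetched with a single JOIN."""
    rows = db.query(
        "SELECT a.name, p.title FROM authors a "
        "LEFT JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    )
    result = {}
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without posts yield a NULL title
            result[name].append(title)
    return result


conn = setup()
slow = CountingConnection(conn)
fast = CountingConnection(conn)
slow_result = titles_n_plus_one(slow)
fast_result = titles_single_query(fast)
print(slow.count, fast.count)  # 4 1
```

With three authors the loop issues four statements while the JOIN issues one; the gap grows linearly with the number of parent rows, which is why this pattern tends to surface only under production-sized data.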

Arpit Bhayani
