Monitoring vs Observability
Observability became mainstream just a few years ago, but it's been everywhere since then, so one would think it's clear by now how it differs from monitoring. Turns out it's not, at least not to everyone. As a consultant, I still get asked that question to this day. So, here's a short brain dump from my side on the topic :)
Monitoring and observability are indeed related, and both are essential concepts in the DevOps/Cloud world. But while they are often used together or even interchangeably, they do have distinct meanings.
TLDR: while monitoring informs you when something is wrong, observability helps you understand why it's wrong.
In a modern system, especially one with a microservices architecture, observability becomes more important because these systems can be complex and dynamic. With all the cloud-native tools and services involved, it's not always feasible to predict every problem that can arise. Therefore, having the capability to dive deep and ask arbitrary questions about system behaviour is much more valuable and useful than typical "passive" monitoring.
Monitoring:
- Definition: Monitoring is the practice of gathering and checking metrics and logs to ensure that systems and applications are running as expected. It's a very passive process, meaning you gather data that can later be analysed or viewed on a dashboard.
- Focus: It's often associated with known issues or known performance indicators. For instance, you might monitor CPU utilization, memory usage, or disk I/O to ensure they remain within acceptable thresholds. But with typical monitoring you most likely won't be able to easily tell why something is happening.
- Reactive: Monitoring is often about setting up alerts based on predefined thresholds. When something goes wrong, for example, a server running out of memory, you get an alert.
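To make the threshold idea concrete, here is a minimal sketch of what a monitoring check boils down to. The metric names, the limits, and the `check_thresholds` helper are all hypothetical and picked only for illustration; real monitoring tools express the same thing in their own rule formats.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str   # name of the metric to watch (hypothetical names below)
    limit: float  # alert when the sampled value goes above this


def check_thresholds(sample: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Compare one metrics sample against predefined limits and return alert messages."""
    alerts = []
    for t in thresholds:
        value = sample.get(t.metric)
        if value is not None and value > t.limit:
            alerts.append(f"ALERT: {t.metric}={value} exceeds {t.limit}")
    return alerts


# Limits are decided ahead of time: we only catch what we thought to ask about.
THRESHOLDS = [Threshold("cpu_percent", 90.0), Threshold("memory_percent", 85.0)]

sample = {"cpu_percent": 97.5, "memory_percent": 40.0}
alerts = check_thresholds(sample, THRESHOLDS)
for line in alerts:
    print(line)
```

The check tells you that CPU is above the limit, and nothing else. It has no idea which request, customer, or deployment is responsible.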
Observability:
- Definition: Observability is more of an active process of "understanding" the data. By measuring all the "outputs" (logs, metrics, etc.) of the system, you try to determine what exactly is happening inside it. It's about understanding the "why" behind the system's behaviour.
- Focus: Observability focuses on active debugging and allows engineers to ask arbitrary questions about their system without having to know ahead of time what they would want to ask. With monitoring you may see a spike in CPU usage; with observability you'll be able to easily tell what is causing the spike.
- Proactive: It's about exploring data, understanding system behaviour, and diagnosing issues without necessarily having predefined queries or thresholds.
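As a rough illustration of "asking arbitrary questions", the sketch below assumes the system emits one structured event per request with a few context fields attached. The event shape, the field names (`endpoint`, `customer`, `version`, `cpu_ms`), and the `total_by` helper are all made up for this example; a real observability backend would do the same kind of slicing over far more data.

```python
from collections import defaultdict

# Hypothetical per-request events; in practice these would come from
# instrumented services, not a hard-coded list.
EVENTS = [
    {"endpoint": "/checkout", "customer": "acme", "version": "1.4.2", "cpu_ms": 900},
    {"endpoint": "/checkout", "customer": "acme", "version": "1.4.2", "cpu_ms": 850},
    {"endpoint": "/search", "customer": "globex", "version": "1.4.1", "cpu_ms": 40},
    {"endpoint": "/checkout", "customer": "globex", "version": "1.4.1", "cpu_ms": 60},
]


def total_by(events, field, measure="cpu_ms"):
    """Sum `measure` grouped by any field, largest first.

    The field is chosen at query time, not when the data was collected.
    """
    totals = defaultdict(int)
    for event in events:
        totals[event[field]] += event[measure]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Questions nobody had to predefine: where is the CPU time going?
by_endpoint = total_by(EVENTS, "endpoint")
by_customer = total_by(EVENTS, "customer")
by_version = total_by(EVENTS, "version")
print(by_endpoint)
print(by_customer)
print(by_version)
```

Slicing the same events three ways narrows the spike down: `/checkout` looks guilty at first, but the customer and version breakdowns show that nearly all of the CPU time comes from one customer on one release. None of these groupings had to be configured in advance.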