AWS Cloud Governance Part 4: Monitoring and observability
Will Pittman
How are my services performing? Are we experiencing an outage? Does the expected state of my environment align with reality? These questions are asked in every technology environment, and they are no different when operating in the cloud.
However, with the increased functionality and usability the cloud offers, implementing observable environments backed by a complete monitoring strategy becomes imperative. AWS provides a wide breadth of native services that help accelerate monitoring and observability functions when operating on its platform. When used correctly, these services help teams identify issues quickly, respond accordingly, and resolve them before they evolve into a larger problem.
Monitoring vs. observability
The practice of monitoring is what makes something observable. Although the two go hand in hand, it is important to understand the difference between them. Monitoring provides the data points, while observability provides understanding and awareness from the metrics being collected.
Monitoring delivers information on what is currently happening within a given system or application through the active collection of logs, performance metrics, and traces. This information is based on known, predefined data points that can be used to generate alerts when abnormalities arise. While these data points can identify when something might be wrong, they don’t explain why.
AWS provides a set of powerful tools that help monitor all aspects of your cloud environment, allowing for the collection of actionable metrics from the moment you start using the platform. For account monitoring, AWS CloudTrail logs user and API activity across your AWS infrastructure. Enabling CloudTrail for each of your AWS accounts provides an audit trail of actions across your organisation to help identify unusual or unwanted actions.
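As a rough illustration of how an audit trail can surface unusual or unwanted actions, the sketch below scans a handful of CloudTrail-style records. The field names (`eventName`, `errorCode`, `userIdentity`) follow the CloudTrail record format, but the watch list, the denied-error set, the helper names, and the sample records are all assumptions made for this example, not a recommended detection policy.

```python
from collections import Counter

# Hypothetical watch list: event names this sketch treats as worth a second look.
SENSITIVE_EVENTS = {"StopLogging", "DeleteTrail", "CreateAccessKey"}
# Hypothetical set of error codes treated as "the call was denied".
DENIED_ERROR_CODES = {"AccessDenied"}


def flag_unusual_events(records):
    """Return records that match the watch list or were denied."""
    flagged = []
    for record in records:
        is_sensitive = record.get("eventName") in SENSITIVE_EVENTS
        was_denied = record.get("errorCode") in DENIED_ERROR_CODES
        if is_sensitive or was_denied:
            flagged.append(record)
    return flagged


def count_by_principal(records):
    """Count records per calling identity (ARN), most active first."""
    arns = (r.get("userIdentity", {}).get("arn", "unknown") for r in records)
    return Counter(arns).most_common()


# Made-up sample records, trimmed to the fields the sketch reads.
sample_records = [
    {"eventName": "DescribeInstances", "eventSource": "ec2.amazonaws.com",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
    {"eventName": "StopLogging", "eventSource": "cloudtrail.amazonaws.com",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"}},
    {"eventName": "GetObject", "eventSource": "s3.amazonaws.com",
     "errorCode": "AccessDenied",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"}},
]

flagged = flag_unusual_events(sample_records)
for arn, count in count_by_principal(flagged):
    print(f"{arn}: {count} flagged event(s)")
```

In practice the records would come from the trail's delivered log files or a log query rather than an inline list, and the watch list would reflect your own organisation's policies.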
Amazon CloudWatch, AWS’s premier monitoring platform, natively integrates with over 70 AWS services to provide out-of-the-box metrics, log collection, and alert generation all within a single service. In addition to CloudWatch’s built-in functions, you can also stream in custom metrics and logs from an application to extend what it can observe. For end-to-end tracing on distributed applications, AWS X-Ray allows teams to analyse and debug in real time.
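To make the custom-metrics idea concrete, here is a minimal sketch of how an application might shape and send a data point. `put_metric_data`, with its `Namespace` and `MetricData` parameters, is the boto3 CloudWatch client call; the helper names, the `ExampleApp/Orders` namespace, the metric name, and the stub client are assumptions for illustration only. The stub stands in for a real client so the example runs without AWS credentials.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit="Count", dimensions=None, timestamp=None):
    """Build one entry for the MetricData list of a put_metric_data call."""
    datum = {
        "MetricName": name,
        "Value": float(value),
        "Unit": unit,
        "Timestamp": timestamp or datetime.now(timezone.utc),
    }
    if dimensions:
        # Dimensions are sent as a list of Name/Value pairs; sort for stable output.
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ]
    return datum


def publish_metrics(client, namespace, data):
    """Send metric data through any client exposing put_metric_data."""
    return client.put_metric_data(Namespace=namespace, MetricData=list(data))


class RecordingClient:
    """Stand-in for a CloudWatch client: records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def put_metric_data(self, **kwargs):
        self.calls.append(kwargs)
        return {}


stub = RecordingClient()
datum = build_metric_datum(
    "CheckoutLatency",  # hypothetical metric name
    182.5,
    unit="Milliseconds",
    dimensions={"Service": "checkout", "Stage": "prod"},
)
publish_metrics(stub, "ExampleApp/Orders", [datum])
print(stub.calls[0]["Namespace"], stub.calls[0]["MetricData"][0]["MetricName"])
```

With boto3 installed and credentials configured, passing `boto3.client("cloudwatch")` in place of the stub would send the same payload to CloudWatch.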
Observability is the function that allows for analysis and understanding of a system’s performance based on the metrics it is generating. The focus of observability is the process by which you bring together multiple data points to identify and explore undefined patterns across multiple layers of your environment.
When monitoring can’t explain why an issue exists, observability steps in. This can be difficult to achieve in a cloud setting, however, due to the sheer number of managed and unmanaged services utilised by a given application. Sufficient monitoring must be enabled across the management control plane, the application and network layers, and any additional platform-as-a-service offerings incorporated into your application stack.
Fortunately, AWS offers several services to help achieve greater observability for workloads running on the cloud. Beyond its monitoring capabilities, CloudWatch offers additional features such as visualisation dashboards to view multiple metrics across multiple systems at once, CloudWatch Insights to analyse time-series data in real time, and CloudWatch Synthetics, which can perform the same actions an end user would, providing performance awareness for your application. AWS also offers Amazon Managed Grafana and Amazon Managed Service for Prometheus, giving teams well-known platforms for querying, analysing, and visualising log and metric data.
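One way to picture bringing data points from several layers together is to join them on a shared identifier and read them as a single timeline. The sketch below assumes every layer tags its events with a common `request_id` and a comparable `ts` value; those field names, the layer labels, and the sample events are hypothetical, and real systems would typically rely on a tracing tool for this correlation.

```python
from collections import defaultdict


def group_by_request(events):
    """Group events from any layer by request ID, ordered by timestamp."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["request_id"]].append(event)
    for request_events in grouped.values():
        request_events.sort(key=lambda e: e["ts"])
    return dict(grouped)


def first_failure(request_events):
    """Return the earliest event marked as an error, or None if all succeeded."""
    return next((e for e in request_events if e["status"] == "error"), None)


# Made-up events collected from three layers, deliberately out of order.
events = [
    {"request_id": "r-1", "layer": "application", "ts": 3, "status": "error"},
    {"request_id": "r-1", "layer": "network", "ts": 1, "status": "ok"},
    {"request_id": "r-1", "layer": "database", "ts": 2, "status": "error"},
    {"request_id": "r-2", "layer": "application", "ts": 1, "status": "ok"},
]

timeline = group_by_request(events)
origin = first_failure(timeline["r-1"])
print(f"r-1 first failed in the {origin['layer']} layer")
```

An application-level alert alone would only show that request `r-1` failed; the combined timeline suggests the failure began one layer down.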
The 5 Ws of monitoring and observability
Monitoring and observability depend on each other: monitoring alone limits your ability to understand the root of an issue, and you can’t make meaningful observations without complete data sets. However, monitoring and observability processes established without thoroughly thinking through the desired outcome can be just as bad as having neither in place at all. To help avoid misapplying these services in AWS, it is important to think through the 5 Ws of monitoring and observability.
[Figure: The Ws of monitoring and observability]
Once you have established a monitoring and observability strategy within your AWS environment, it is important to check regularly whether that strategy is performing well for your organisation and whether it still aligns with the needs of your business. Monitoring and observability are not a one-time setup. As your cloud presence transforms over time, your monitoring and observability capabilities will need to transform with it.
Some important data points that can help determine success are:
- Mean time to recovery: How fast was your system able to detect an issue and your team able to resolve it?
- False positives/negatives: How many events either warranted no real action or went undetected by your monitoring and observation systems?
- Unactionable alerts: How many alerts could not be acted upon because the alert itself, or what generated it, was unclear?
- Alert quantity and acknowledgement: How many alerts were generated during a given time period and how many were acknowledged by team members?
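The data points above can be tracked with very little tooling. The following is a minimal sketch, assuming incidents and alerts have been exported as simple records; the field names (`started_at`, `resolved_at`, `real_issue`, `actionable`, `acknowledged`) and the sample values are hypothetical. False negatives are left out because, by definition, they never appear in an alert export and have to be counted from incident reviews instead.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average time from the start of an incident to its resolution."""
    if not incidents:
        return None
    durations = [i["resolved_at"] - i["started_at"] for i in incidents]
    return sum(durations, timedelta()) / len(durations)


def alert_quality(alerts):
    """Share of alerts that were false positives, unactionable, or acknowledged."""
    total = len(alerts)
    if total == 0:
        return {"false_positive_rate": 0.0, "unactionable_rate": 0.0,
                "acknowledged_rate": 0.0}
    return {
        "false_positive_rate": sum(not a["real_issue"] for a in alerts) / total,
        "unactionable_rate": sum(not a["actionable"] for a in alerts) / total,
        "acknowledged_rate": sum(a["acknowledged"] for a in alerts) / total,
    }


# Made-up sample data for one review period.
incidents = [
    {"started_at": datetime(2024, 1, 10, 9, 0), "resolved_at": datetime(2024, 1, 10, 9, 30)},
    {"started_at": datetime(2024, 1, 12, 14, 0), "resolved_at": datetime(2024, 1, 12, 15, 30)},
]
alerts = [
    {"real_issue": True, "actionable": True, "acknowledged": True},
    {"real_issue": True, "actionable": False, "acknowledged": True},
    {"real_issue": False, "actionable": False, "acknowledged": False},
    {"real_issue": True, "actionable": True, "acknowledged": True},
]

mttr = mean_time_to_recovery(incidents)
quality = alert_quality(alerts)
print(f"MTTR: {mttr}, alert quality: {quality}")
```

Comparing these figures from one review period to the next gives a simple signal of whether the strategy is improving or drifting.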
Businesses are moving their applications to AWS at an increasing pace, placing the need for a cloud-tailored monitoring and observability strategy near the top of the list to ensure success. The accessibility and functionality that AWS promises across its features extend to the catalogue of monitoring and observability tools it offers. But although those services are available and each may seem simple to configure on its own, bringing all those tools together to generate meaningful outcomes isn’t so simple.
How we can help
Credera is passionate about helping organisations foster cloud enablement that drives successful cloud adoption and valuable business outcomes. Our unique expertise in corporate strategy, innovation, and application development enables us to bring a holistic approach to your cloud adoption journey. To learn more, please get in touch with a member of our team.
AWS Cloud Governance Part 1: Three keys to starting your AWS governance journey
AWS Cloud Governance Part 2: Centralised account management and organisation controls
AWS Cloud Governance Part 3: Compliance, security, & cost management