Logging, Monitoring, and Observability in Google Cloud (LMOGC) – Outline

Detailed Course Outline

Module 1 - Introduction to Google Cloud Monitoring Tools

  • Understand the purpose and capabilities of Google Cloud operations-focused components: Logging, Monitoring, Error Reporting, and Service Monitoring
  • Understand the purpose and capabilities of Google Cloud components focused on application performance management: Debugger, Trace, and Profiler

Module 2 - Avoiding Customer Pain

  • Construct a monitoring base built on the four golden signals: latency, traffic, errors, and saturation
  • Measure customer pain with SLIs
  • Define critical performance measures
  • Create and use SLOs and SLAs
  • Achieve harmony between developers and operations with error budgets
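
The relationship between an SLI, an SLO, and an error budget in this module can be sketched with a little arithmetic. The function names, the example SLO of 99.9%, and the request counts below are illustrative assumptions, not values taken from the course.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Return the fraction of events that were good (e.g. non-5xx responses)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Return the unspent fraction of the error budget.

    1.0 means nothing has been spent, 0.0 means the budget is exhausted,
    and a negative value means the SLO has been missed.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_bad = (1.0 - slo_target) * total_events  # failures the SLO tolerates
    actual_bad = total_events - good_events          # failures observed
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 500 of them failed, SLO of 99.9%.
    sli = availability_sli(999_500, 1_000_000)
    remaining = error_budget_remaining(0.999, 999_500, 1_000_000)
    print(f"SLI: {sli:.4%}, error budget remaining: {remaining:.1%}")
```

With these assumed numbers the SLO tolerates about 1,000 failed requests, so 500 failures leave roughly half of the budget for further releases or experiments.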

Module 3 - Alerting Policies

  • Develop alerting strategies
  • Define alerting policies
  • Add notification channels
  • Identify types of alerts and common uses for each
  • Construct and alert on resource groups
  • Manage alerting policies programmatically
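
As a rough illustration of managing alerting policies programmatically, the sketch below builds a policy document in the JSON shape used by the Cloud Monitoring API's AlertPolicy resource and writes it out as text. The helper name, the 80% threshold, the five-minute duration, and the display names are assumptions chosen for the example.

```python
import json


def cpu_alert_policy(threshold: float = 0.8, duration_s: int = 300, channels=None) -> dict:
    """Build an AlertPolicy-shaped dict for sustained high VM CPU utilization."""
    return {
        "displayName": "High CPU utilization (example)",
        "combiner": "OR",  # fire when any condition is met
        "conditions": [
            {
                "displayName": f"VM CPU above {threshold:.0%}",
                "conditionThreshold": {
                    "filter": (
                        'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                        'AND resource.type="gce_instance"'
                    ),
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    "duration": f"{duration_s}s",  # how long the breach must last
                    "aggregations": [
                        {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
                    ],
                },
            }
        ],
        # Full resource names of notification channels; empty in this sketch.
        "notificationChannels": list(channels or []),
    }


if __name__ == "__main__":
    print(json.dumps(cpu_alert_policy(), indent=2))
```

Keeping the policy as a plain document means it can live in version control and be submitted through whichever client is preferred (the Monitoring API, a client library, or the gcloud CLI) without changing the definition itself.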

Module 4 - Monitoring Critical Systems

  • Choose best-practice monitoring project architectures
  • Differentiate Cloud IAM roles for monitoring
  • Use the default dashboards appropriately
  • Build custom dashboards to show resource consumption and application load
  • Define uptime checks to track aliveness and latency

Module 5 - Configuring Google Cloud Services for Observability

  • Integrate logging and monitoring agents into Compute Engine VMs and images
  • Enable and utilize Kubernetes Monitoring
  • Extend and clarify Kubernetes monitoring with Prometheus
  • Expose custom metrics from application code, either directly or with the help of OpenCensus

Module 6 - Advanced Logging and Analysis

  • Identify and choose among resource tagging approaches
  • Define log sinks (inclusion filters) and exclusion filters
  • Create metrics based on logs
  • Define custom metrics
  • Use Error Reporting to link application errors to Logging
  • Export logs to BigQuery
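
Sink inclusion filters, exclusion filters, and log-based metrics are all driven by expressions in the Logging query language. The sketch below assembles such an expression as a string; the helper name and the sample values are assumptions for illustration only.

```python
from typing import Optional

# Severity names recognized by Cloud Logging, lowest to highest.
SEVERITIES = (
    "DEFAULT", "DEBUG", "INFO", "NOTICE", "WARNING",
    "ERROR", "CRITICAL", "ALERT", "EMERGENCY",
)


def build_log_filter(resource_type: str, min_severity: str = "ERROR",
                     text: Optional[str] = None) -> str:
    """Return a Logging query language filter as a single string."""
    severity = min_severity.upper()
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {min_severity}")
    clauses = [f'resource.type="{resource_type}"', f"severity>={severity}"]
    if text is not None:
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'textPayload:"{escaped}"')  # ':' is the "has" operator
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_log_filter("gce_instance", "error", "timeout"))
```

The same expression could serve as the inclusion filter of a sink that exports to BigQuery or as the filter of a counter-type log-based metric.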

Module 7 - Monitoring Network Security and Audit Logs

  • Collect and analyze VPC Flow logs and Firewall Rules logs
  • Enable and monitor Packet Mirroring
  • Explain the capabilities of Network Intelligence Center
  • Use Admin Activity audit logs to track changes to the configuration or metadata of resources
  • Use Data Access audit logs to track accesses or changes to user-provided resource data
  • Use System Event audit logs to track Google Cloud administrative actions
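
The three audit log types above are stored under distinct log names, which makes them easy to select in a query. The following sketch builds a `logName` filter for each type; the helper name and the project ID `my-project` are placeholders assumed for the example.

```python
from urllib.parse import quote

# Log IDs for the audit log types covered in this module.
AUDIT_LOG_IDS = {
    "admin_activity": "cloudaudit.googleapis.com/activity",
    "data_access": "cloudaudit.googleapis.com/data_access",
    "system_event": "cloudaudit.googleapis.com/system_event",
}


def audit_log_filter(project_id: str, kind: str) -> str:
    """Return a logName filter selecting one audit log type in a project."""
    if kind not in AUDIT_LOG_IDS:
        raise ValueError(f"unknown audit log kind: {kind}")
    # The log ID must be URL-encoded inside the log name ('/' becomes '%2F').
    log_id = quote(AUDIT_LOG_IDS[kind], safe="")
    return f'logName="projects/{project_id}/logs/{log_id}"'


if __name__ == "__main__":
    for kind in AUDIT_LOG_IDS:
        print(audit_log_filter("my-project", kind))
```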

Module 8 - Managing Incidents

  • Define incident management roles and communication channels
  • Mitigate incident impact
  • Troubleshoot root causes
  • Resolve incidents
  • Document incidents in a post-mortem process

Module 10 - Optimizing Stackdriver Costs

  • Understand Stackdriver billing
  • Analyze Stackdriver resource utilization
  • Implement best practices for Stackdriver cost control