Observability

The term describes “how well you can understand the internals of your system by examining its output”.

Workflow on errors

  • Go to Tempo and search for traces that contain an error. Add more filters if needed to narrow the results
  • Select one of the traces
  • Find the span with the error
  • Get the log line with the error from that span
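
The steps above can be sketched in plain Python on in-memory data. This is only an illustration of the lookup logic, not how Tempo or Loki are queried in practice: the `Span`, `Trace` and `LogLine` classes, the helper functions and the sample data are all hypothetical names made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified data model (not the Tempo or Loki schema).
@dataclass
class Span:
    span_id: str
    name: str
    status: str  # "ok" or "error"


@dataclass
class Trace:
    trace_id: str
    service: str
    spans: List[Span] = field(default_factory=list)


@dataclass
class LogLine:
    trace_id: str
    span_id: str
    level: str
    message: str


def traces_with_errors(traces, service=None):
    """Step 1: keep traces with at least one error span, optionally filtered by service."""
    return [
        t for t in traces
        if any(s.status == "error" for s in t.spans)
        and (service is None or t.service == service)
    ]


def error_spans(trace):
    """Step 3: find the spans of one trace that carry the error."""
    return [s for s in trace.spans if s.status == "error"]


def error_logs_for_span(logs, trace_id, span_id):
    """Step 4: fetch the error logs that share the trace id and span id."""
    return [
        line for line in logs
        if line.trace_id == trace_id
        and line.span_id == span_id
        and line.level == "error"
    ]


# Made-up sample data for the walkthrough.
traces = [
    Trace("t1", "checkout", [Span("s1", "GET /cart", "ok")]),
    Trace("t2", "checkout", [Span("s2", "POST /pay", "ok"),
                             Span("s3", "charge card", "error")]),
]
logs = [
    LogLine("t2", "s2", "info", "payment started"),
    LogLine("t2", "s3", "error", "card declined"),
]

failing = traces_with_errors(traces, service="checkout")      # step 1
trace = failing[0]                                            # step 2
span = error_spans(trace)[0]                                  # step 3
found = error_logs_for_span(logs, trace.trace_id, span.span_id)  # step 4
print(found[0].message)
```

The point of the sketch is that every step narrows the search by an id (trace id, then span id), so the final log lookup does not require scanning unrelated log lines.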

Correlations with Grafana, Loki and Tempo

  • Metrics are correlated with traces
  • Traces are correlated with logs

Benefits

  • Logs that belong to the same trace are grouped together
  • When you search for an error, you can search by trace, which is much less cluttered
  • For each trace and each span, we get the duration, so if something takes very long, we can see it from the trace
  • If the frontend receives errors with trace IDs (and these are logged in web analytics, or we show them to the user), then we can quickly find the corresponding error log
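
Grouping logs by trace only works if every log line carries the trace ID. A minimal sketch of one way to do that with Python's standard `logging` and `contextvars` modules follows. The `current_trace_id` variable, the logger name `trace_demo`, the log format and the sample trace IDs are assumptions made for this illustration; in a real setup the tracing library would normally supply the ID.

```python
import contextvars
import io
import logging
from collections import defaultdict

# Hypothetical context variable holding the id of the trace being handled.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the current trace id onto every log record."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


def build_logger(stream):
    """Create a logger whose lines start with trace_id=<id>."""
    logger = logging.getLogger("trace_demo")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(logging.Formatter(
        "trace_id=%(trace_id)s level=%(levelname)s msg=%(message)s"))
    logger.addHandler(handler)
    return logger


def group_by_trace(lines):
    """Group formatted log lines by the trace id at the start of each line."""
    groups = defaultdict(list)
    for line in lines:
        trace_id = line.split(" ", 1)[0][len("trace_id="):]
        groups[trace_id].append(line)
    return dict(groups)


stream = io.StringIO()
log = build_logger(stream)

# Two made-up requests, each handled under its own trace id.
requests = [
    ("abc123", [(logging.INFO, "request received"),
                (logging.ERROR, "payment failed")]),
    ("def456", [(logging.INFO, "request received")]),
]
for trace_id, entries in requests:
    token = current_trace_id.set(trace_id)
    try:
        for level, message in entries:
            log.log(level, message)
    finally:
        current_trace_id.reset(token)

grouped = group_by_trace(stream.getvalue().splitlines())
for line in grouped["abc123"]:
    print(line)
```

With the ID in every line, a trace ID reported by the frontend (here the made-up `abc123`) is enough to pull out exactly the log lines of that one request.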

Examples