The OpenTelemetry Bootcamp: Sampling and dealing with high volumes

My notes & takeaways (14)

Kind of worried, as I heard “three pillars” being mentioned… but still, it’s one of the common models used.
Traces are mostly generated automatically (via auto-instrumentation).

Traces are expensive: there's a cost in CPU/memory used, a cost in data transfer to the cloud provider, and a cost in storage used to save them.

The cost of the tooling itself (Jaeger) is not as relevant compared to the costs above.

Trace sample percentage can / should be chosen based on use case. Cost analysis can help define percentage ranges. Different percentages can be used within the same use case (example: 100% of errors but 10% of the rest).

Head sampling: decide whether to keep the trace when the root span starts. The SDK normally makes this decision.
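The core of head sampling is a deterministic decision derived from the trace ID. Here's a stdlib-only sketch of the trace-ID-ratio idea (illustrative, not the SDK's actual code; real SDKs ship this as samplers like `TraceIdRatioBased`):

```python
# Sketch of trace-ID ratio head sampling (illustrative).
def should_sample(trace_id: int, ratio: float) -> bool:
    # Use the low 64 bits of the trace ID; the same trace ID yields
    # the same decision everywhere, so sampling stays consistent
    # across services without coordination.
    bound = round(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound

print(should_sample(0x1234, 1.0))  # True: ratio 1.0 keeps everything
```

Because the decision is a pure function of the trace ID, every service that sees the same trace reaches the same verdict.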

Tail sampling: decide whether to keep the trace once it completes. The collector can make this choice.

Tail sampling ships as the `tail_sampling` processor in opentelemetry-collector-contrib, added to the collector pipeline.

When using tail sampling in the collector, spans are buffered for a period of time (`decision_wait`) before the decision is made. That means it is subject to trace loss if the collector fails (out of memory, crash, etc.).

Interesting policy types to decide on:

  • latency
  • numeric attribute
  • probability
  • status code
  • string attributes
  • rate limiting
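A minimal collector config sketch combining a few of these policies (thresholds and policy names are illustrative assumptions):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s        # buffer spans this long before deciding
    num_traces: 50000         # max in-flight traces held in memory
    policies:
      - name: keep-all-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow-traces
        type: latency
        latency: {threshold_ms: 500}
      - name: sample-10-percent
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
```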

A load balancer helps shard load based on trace routing rules (think sticky sessions: the same traceId always goes to the same collector).

Then 2 layers of collectors:

  • first layer load balances by trace ID;
  • second layer makes the sampling decisions and exports to the final destination

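A sketch of the first-layer collector config using the `loadbalancing` exporter from opentelemetry-collector-contrib (hostnames are illustrative assumptions):

```yaml
exporters:
  loadbalancing:
    routing_key: traceID       # all spans of a trace go to one backend
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - tail-collector-1:4317
          - tail-collector-2:4317
```

The second-layer collectors (`tail-collector-*` here) would then run the `tail_sampling` processor, since each of them sees complete traces.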
Parent-based sampler: Service B respects Service A's decision to sample or not.
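The parent-based rule can be sketched in a few lines (illustrative pseudologic, not the SDK implementation; the real SDK sampler is typically called `ParentBased`):

```python
# Sketch of a parent-based sampling decision (illustrative).
def parent_based(parent_sampled, root_sampler):
    # If a parent span exists, inherit its sampled flag so the whole
    # trace is kept or dropped consistently across services.
    if parent_sampled is not None:
        return parent_sampled
    # Only at the root of a trace do we consult the configured sampler.
    return root_sampler()

print(parent_based(True, lambda: False))  # True: Service B follows Service A
```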



Tail-based sampling requires more memory, since spans must be buffered until the late decision.
You may need an extra collector layer if your target can't handle the load.
Three cost drivers to calculate: CPU/memory, data transfer, and storage.

Cost Calculation

730 ≈ hours in a month (24 × 365 ÷ 12)

Storage is probably the most expensive item on the list.
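A back-of-the-envelope sketch of the arithmetic (every rate and price below is a made-up assumption, only the 730 hours figure comes from the notes):

```python
HOURS_PER_MONTH = 730  # approx. hours in a month

# Illustrative assumptions, not real prices:
spans_per_second = 1000
avg_span_bytes = 500
sample_rate = 0.10          # keep 10% of traces

gb_per_month = (spans_per_second * avg_span_bytes * 3600
                * HOURS_PER_MONTH * sample_rate) / 1e9
transfer_cost = gb_per_month * 0.09  # assumed $/GB egress
storage_cost = gb_per_month * 0.10   # assumed $/GB-month retained

print(round(gb_per_month, 1))                    # GB shipped per month
print(round(transfer_cost + storage_cost, 2))    # $ per month
```

Even at a 10% sample rate the volume adds up, which is why the sample percentage is the main cost lever.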

Prod tips