Analysis Templates Reference
An AnalysisTemplate is a resource that defines how to perform verification
testing, including:
- Container images and commands to run
- Queries to external monitoring tools
- How to interpret results from metric providers
- Success or failure criteria
- Frequency and duration of measurements
AnalysisTemplate resources (and the AnalysisRun resources that are spawned
from them) are CRDs re-used from the
Argo Rollouts project. They were
intentionally built to be useful in contexts other than Argo Rollouts. Re-using
this resource type to define verification processes means those processes
benefit from this rich and battle-tested feature of Argo Rollouts.
This reference guide is intended to give a brief introduction to
AnalysisTemplates for some common use cases. Please consult the
relevant sections
of the Argo Rollouts documentation for comprehensive coverage of the full
range of AnalysisTemplate capabilities.
AnalysisTemplates integrate natively with many popular open-source and
commercial monitoring tools, such as Prometheus.
In addition to monitoring tools, analysis can integrate with internal systems by:
- Running containerized processes as Kubernetes Jobs
- Making HTTP requests and interpreting JSON responses
Arguments
AnalysisTemplates may declare a set of arguments that can be "passed" in by
the Stage. The arguments are resolved at the time the AnalysisRun is
created and can then be referenced in metrics configuration. Arguments are
dereferenced using the syntax: {{ args.<name> }}.
Unlike Kargo promotion processes, which require expressions to be enclosed
within ${{ }}, Argo Rollouts AnalysisTemplates require expressions to be
enclosed within {{ }} (i.e. without $).
The following example shows an AnalysisTemplate with three arguments. An
argument's value can come from a default, be supplied by the Stage, or be
obtained from a Secret if the value is sensitive (e.g. a bearer token for an
HTTP request):
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: args-example
spec:
  args:
  # An argument can specify a value to be used as its default.
  # This will be overridden by a value supplied by the Stage.
  - name: api-url
    value: http://example/measure
  # If an argument specifies no value, it is considered a required
  # argument and must be supplied by the Stage.
  - name: service-name
  # Arguments can be obtained from a Secret in the Project Namespace
  - name: api-token
    valueFrom:
      secretKeyRef:
        name: token-secret
        key: apiToken
  metrics:
  - name: webmetric
    successCondition: result == 'true'
    provider:
      web:
        # placeholders are resolved when an AnalysisRun is created
        url: "{{ args.api-url }}?service={{ args.service-name }}"
        headers:
        - key: Authorization
          value: "Bearer {{ args.api-token }}"
        jsonPath: "{$.results.ok}"
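On the Stage side, supplying the required argument might look like the following. This is a minimal sketch, assuming the Stage's verification configuration accepts a list of analysisTemplates and a list of args; the Stage name, namespace, and argument value are hypothetical, and the Stage reference documentation remains the authority on the exact fields.

```yaml
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  # Hypothetical Stage and Project names, for illustration only
  name: test
  namespace: guestbook
spec:
  # Other Stage fields are omitted for brevity
  verification:
    analysisTemplates:
    - name: args-example
    args:
    # Supplies the required argument. api-url keeps its default and
    # api-token is still read from the Secret.
    - name: service-name
      value: guestbook
```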
Success Condition
To interpret the result of a query, an
Expression Language expression can be used to evaluate
the response. The response payload is available in the variable result. The
following example interprets the response of a Prometheus query and requires
that the first element of the returned vector be greater than or equal to 0.95:
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
  - name: success-rate
    # Expr expression that evaluates to true or false.
    # NOTE: Prometheus queries return results in the form of a vector,
    # so it is common to read index 0 of the returned array to obtain the value.
    successCondition: result[0] >= 0.95
    provider:
      prometheus:
        address: "http://prometheus.example.com:9090"
        query: |
          sum(irate(
            istio_requests_total{reporter="source",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source"}[5m]
          ))
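A query can return an empty vector, for example when no requests were recorded in the window, in which case result[0] has nothing to index. The variant below is a sketch that guards against this by checking the length first, so an empty result does not satisfy the condition. The template name is hypothetical, and whether an empty result should count against the metric is a judgment call for each use case.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  # Hypothetical name, for illustration only
  name: success-rate-guarded
spec:
  metrics:
  - name: success-rate
    # Check that the vector is non-empty before reading index 0
    successCondition: len(result) > 0 && result[0] >= 0.95
    provider:
      prometheus:
        address: "http://prometheus.example.com:9090"
        query: |
          sum(irate(
            istio_requests_total{reporter="source",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source"}[5m]
          ))
```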
Failure Conditions and Limits
As an alternative to successCondition, a failureCondition can be used to
describe when a measurement is considered failed. In addition, failureLimit
specifies the maximum number of failed measurements that are allowed before
the entire AnalysisRun is considered Failed.
The following example continually polls a Prometheus server to get the total
number of errors (i.e., HTTP response code >= 500) every five minutes, causing
the measurement to fail if ten or more errors are encountered. The entire
AnalysisRun is considered to have Failed after three failed measurements.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: failure-condition-example
spec:
  metrics:
  - name: total-errors
    interval: 5m
    failureCondition: result[0] >= 10
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",response_code=~"5.*"}[5m]
          ))
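Because the example above sets an interval but no upper bound, it keeps taking measurements. If a bounded run is preferred, one option is the count field from Argo Rollouts, which caps the number of measurements. The following is a sketch under that assumption; the template name and the value of count are illustrative only.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  # Hypothetical name, for illustration only
  name: bounded-failure-condition-example
spec:
  metrics:
  - name: total-errors
    interval: 5m
    # Take at most six measurements, five minutes apart
    count: 6
    failureCondition: result[0] >= 10
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",response_code=~"5.*"}[5m]
          ))
```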
Delaying Measurements
In some scenarios, it may be necessary to delay the start of a metric
measurement. For example, some time may need to pass after an update in order
for new data to populate in the monitoring services. The initialDelay option
can be used to delay the start of measurements. Each metric can be configured
to have a different delay.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: initial-delay-example
spec:
  metrics:
  - name: success-rate
    # Duration before measurement collection. Default is no delay
    initialDelay: 5m
    successCondition: result[0] >= 0.90
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: ...
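Since the delay is configured per metric, a single template can stagger its measurements. The sketch below combines two metrics from the earlier examples with different delays; the template name and the delay values are assumptions chosen for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  # Hypothetical name, for illustration only
  name: staggered-delay-example
spec:
  metrics:
  # Waits five minutes so the rate window fills with fresh data
  - name: success-rate
    initialDelay: 5m
    successCondition: result[0] >= 0.90
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source"}[5m]
          ))
  # Starts earlier, after a one-minute delay
  - name: total-errors
    initialDelay: 1m
    failureCondition: result[0] >= 10
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",response_code=~"5.*"}[5m]
          ))
```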
Example Metric Types
Web
An HTTP request can be performed against some external service to obtain the measurements.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: web-metric-example
spec:
  args:
  - name: api-token
    valueFrom:
      secretKeyRef:
        name: token-secret
        key: apiToken
  metrics:
  - name: webmetric
    successCondition: result == true
    provider:
      web:
        url: "http://example.com/api/v1/measurement"
        # HTTP method. Valid values are GET, POST, and PUT. Defaults to GET
        method: POST
        # Timeout for the request. Defaults to 10 seconds
        timeoutSeconds: 20
        headers:
        - key: Authorization
          value: "Bearer {{ args.api-token }}"
        # If the body is JSON, it is recommended to set the Content-Type
        - key: Content-Type
          value: "application/json"
        # Request body to send.
        body: |
          {"foo": "bar"}
        # Optional JSON path to set the value of `result` in successCondition/failureCondition
        jsonPath: "{$.data.ok}"
Job
A Kubernetes Job can be used to perform analysis. When a Job is used, the
metric is considered successful if the Job completes with an exit code of
zero and is otherwise considered to have failed.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: integration-test
  namespace: guestbook
spec:
  metrics:
  - name: integration-test
    provider:
      job:
        spec:
          template:
            spec:
              containers:
              - name: sleep
                image: alpine:latest
                command: [sleep, "10"]
              restartPolicy: Never
          backoffLimit: 1
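A Job metric can also be parameterized. The sketch below assumes that argument placeholders are resolved inside the Job spec in the same way as in other metric configuration; the template name, the service-name argument, and the /healthz path are hypothetical. The Job succeeds only if the request succeeds, because wget exits with a non-zero code on failure.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  # Hypothetical name, for illustration only
  name: integration-test-with-args
  namespace: guestbook
spec:
  args:
  # Required argument, to be supplied by the Stage
  - name: service-name
  metrics:
  - name: integration-test
    provider:
      job:
        spec:
          backoffLimit: 1
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: smoke-test
                image: alpine:latest
                command: [sh, -c]
                args:
                # Hypothetical health endpoint on the service under test
                - wget -q -O /dev/null "http://{{ args.service-name }}/healthz"
```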