1. Introduction

Ansible is simple, flexible, and powerful. Like any powerful tool, there are many ways to use it, some better than others.

This document gathers good practices from Ansible practitioners in the field: people at Red Hat, consultants, developers, and others. It thus strives to give any Red Hat employee, partner, or customer (or any Ansible user) guidelines for starting their automation journey on a sound footing.

These are opinionated guidelines based on the experience of many people. They are not meant to be followed blindly if they don’t fit the reader’s specific use case, organization, or needs; there is a reason why they are called good practices and not best practices.

The reader of this document is expected to have a working knowledge of Ansible. Readers who are new to Ansible should start with the Getting started section of the official Ansible documentation.

This document is split into six main sections. Each section covers a different aspect of automation using Ansible (and, more broadly, the whole Red Hat Ansible Automation Platform, including Ansible Tower):

  1. structures: before we can delve into the details, we need to know what to use for which purpose; this section explains this.

  2. roles: as we recommend using roles to host most of the actual Ansible code, this is also where we cover the lower-level aspects of code (tasks, variables, etc.).

  3. collections

  4. playbooks

  5. inventories

  6. plugins

Each section is made of guidelines: one sentence, hopefully easy to remember, followed by a description, a rationale, and examples. The HTML version of this document makes the content collapsible, so that all guidelines can be seen at a glance and readers can expand the guidelines they are interested in.

A rationale is expected for each good practice, with a reference if applicable. It is really helpful to know not only how to do certain things, but why to do them in this way. It will also help with further revisions of the standards as some items may become obsolete or no longer applicable. If the reason is not included, there is a risk of keeping items that are no longer applicable, or alternatively blindly removing items that should be kept. It also has great educational value for understanding how things actually work (or how they don’t).

1.1. Where to get and maintain this document

This document is published to https://redhat-cop.github.io/automation-good-practices/, it is open source and its source code is maintained at https://github.com/redhat-cop/automation-good-practices/.

2. Automation structures

Before we start to describe roles, playbooks, etc., we need to decide which one to use for what. This section covers topics that span multiple structures and don’t fit nicely within one.

2.1. Define which structure to use for which purpose

Explanations

Define for which use cases to use roles, playbooks, and potentially workflows (in Ansible Tower/AWX), and how to split the code you write.

Rationale

Especially when writing automation in a team, it is important to have a certain level of consistency and to make sure everybody shares the same understanding. Without it, your automation becomes unreadable and difficult to grasp for new members, and even for existing ones.

This structure also helps you keep a consistent level of modeling, so that re-usability becomes easier. If one team member uses roles where another uses playbooks, they will both struggle to reuse each other’s code. Metaphorically speaking, stones can only be used properly to build a house if they have been cut to roughly the same size.

Examples

The following is only one example of how to structure your content, but it has proven robust enough on multiple occasions.

Figure 1. Structure of Automation: a hierarchy of landscape, type, function, and component
  • a landscape is anything you want to deploy at once: the underlay of your cloud, a three-tier application, a complete application cluster… This level is best represented by a Tower/AWX workflow, or possibly by a "playbook of playbooks", i.e. a playbook made of imported type playbooks, as introduced next.

  • a type must be defined such that each managed host has one and only one type, applicable using a unique playbook.

  • each type is then made of multiple functions, represented by roles, so that a function used by several types is written only once and re-used.

  • and finally, components are used to split a function into maintainable bits. By default, a component is a task file within the function-role; if the role becomes too big, there is a case for splitting the function-role into multiple component roles.

    If functions are defined mostly for re-usability purposes, components are used more for maintainability/readability purposes. A re-usable component might be a candidate for promotion to a function.

    Let’s look at a more concrete example:

  • as already written, a landscape could be a three-tier application with a web front-end, middleware, and a database. We would probably create a workflow to deploy this landscape at once.

  • our types would be relatively obvious here as we would have "web-front-end server", "middleware server" and "database server". Each type can be fully deployed by one and only one playbook (avoid having numbered playbooks and instructions on how to call them one after the other).

  • each server type is then made up of one or more functions, each implemented as a role. For example, the middleware server type could be made of a "virtual machine" (to create the virtual machine hosting the middleware server), a "base Linux OS" and a "JBoss application server" function.

  • and then the base OS role could be made of multiple components (DNS, NTP, SSH, etc), each represented by a separate tasks/{component}.yml file, included or imported from the tasks/main.yml file of the function-role. If a component becomes too big to fit within one task file, it might make sense that it gets its own component-role, included from the function-role.

    In terms of re-usability, see how you could simply create a new "integrated three-tier server" type (e.g. for test purposes) by re-combining the "virtual machine", "base Linux OS", "JBoss application server", "PostgreSQL database" and "Apache web-server" function-roles into one new playbook.

Beware that these rules, once defined, shouldn’t be applied too strictly. There can always be reasons for breaking the rules, and sometimes the discussion in the team about what is what is the most important part. For example, are a "hardened Linux OS" and a "normal Linux OS" two different functions, or the same function with different parameters? You could also consider SSH to be a function on its own rather than a component of the base OS. Also, external re-usable roles and collections, which obviously don’t follow your rules, might force you to bend them. What matters is to break the rules not out of ignorance but for good, practical reasons. Respecting the rules means knowing and acknowledging them, not following them blindly even when they don’t make sense. As long as exceptions are discussed openly in the team, they won’t hurt.

3. Roles Good Practices for Ansible

This section has been rewritten using the OASIS metastandards repository as a starting point. If you have anything to add or review, please comment.

3.1. Role design considerations

3.1.1. Basic design

Explanations

Design roles focused on the functionality provided, not the software implementation.

Rationale

Try to design roles focused on the functionality, not on the software implementation behind it. This helps abstract the differences between providers, and helps users focus on the functionality rather than on technical details.

Examples

For example, configuring NTP on a server would be one role. Internally, the role would have the logic to decide whether to use ntpd or chronyd, and to apply the NTP site configuration. However, when the underlying implementations become too divergent, for example implementing an email server with postfix or sendmail, separate roles are encouraged.
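As an illustration, a playbook consuming such a role only expresses the desired functionality. This is a minimal sketch: the role name ntp and its variables ntp_servers and ntp_provider are hypothetical, not the interface of an existing role.

```yaml
# Hypothetical functionality-focused role "ntp": the user states what is
# wanted (time synchronization against given servers), not how to do it.
- name: Configure time synchronization
  hosts: all
  roles:
    - role: ntp
      vars:
        ntp_servers:
          - 192.0.2.1   # documentation addresses (RFC 5737)
          - 192.0.2.2
        # ntp_provider is deliberately left unset: the role itself chooses
        # between ntpd and chronyd for each managed host.
```

Switching a host from ntpd to chronyd then becomes an implementation detail of the role rather than a change to every playbook.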

3.1.2. Role Structure

Explanations

New roles should be initialized in line with the skeleton directory, which provides standard boilerplate code for a Galaxy-compatible Ansible role and some enforcement of these standards.

Rationale

A consistent file tree structure will help drive consistency and reusability across the entire environment.

3.1.3. Role Distribution

Explanations

Use semantic versioning for Git release tags. Use 0.y.z before the role is declared stable (interface-wise).

Rationale

There are some restrictions for Ansible Galaxy and Automation Hub. The versioning must be in strict X.Y.Z[ab][W] format, where X, Y, and Z are integers.
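On the consuming side, such release tags can then be pinned in a requirements.yml file. A minimal sketch, assuming a role named foo hosted in a Git repository whose URL is only a placeholder:

```yaml
# requirements.yml, installed with: ansible-galaxy role install -r requirements.yml
roles:
  - name: foo
    src: https://github.com/example/ansible-role-foo.git
    scm: git
    # A Git release tag following semantic versioning; 0.y.z while the
    # role interface is not yet declared stable
    version: 0.3.1
```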

3.1.4. Naming parameters

  • All defaults and all arguments to a role should have a name that begins with the role name to help avoid collision with other names. Avoid names like packages in favor of a name like foo_packages.

    Rationale

    Ansible has no namespaces; prefixing reduces the potential for conflicts and makes clear which role a given variable belongs to.

  • Same argument applies for modules provided in the roles, they also need a $ROLENAME_ prefix: foo_module. While they are usually implementation details and not intended for direct use in playbooks, the unfortunate fact is that importing a role makes them available to the rest of the playbook and therefore creates opportunities for name collisions.

  • Moreover, internal variables (those that are not expected to be set by users) are to be prefixed by two underscores: __foo_variable.

    Rationale

    Role variables, registered variables, and custom facts are usually intended to be local to the role, but in reality they are not (no such concept exists), and they pollute the global namespace. Using the name of the role reduces the potential for name conflicts, and using the underscores clearly marks the variables as internals, not part of the common interface. The two underscores convention has prior art in some popular roles like geerlingguy.ansible-role-apache. This includes variables set by set_fact and register, because they persist in the namespace after the role has finished!

  • Prefix all tags within a role with the role name or, alternatively, a "unique enough" but descriptive prefix.

  • Do not use dashes in role names. This will cause issues with collections.
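The naming rules above can be sketched as follows, assuming a hypothetical role named foo; foo_port, __foo_config_path, __foo_config_result and the foo_config tag are illustrative names only. Each --- separates the content of a different role file.

```yaml
---
# defaults/main.yml: public inputs, prefixed with the role name
foo_port: 8080
---
# vars/main.yml: internal values, prefixed with two underscores
__foo_config_path: /etc/foo/foo.conf
---
# tasks/main.yml
- name: Deploy the foo configuration
  template:
    src: foo.conf.j2
    dest: "{{ __foo_config_path }}"
  # Registered variables outlive the role, so they use the internal prefix too
  register: __foo_config_result
  # Tags are prefixed with the role name as well
  tags:
    - foo_config
```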

3.1.5. Providers


When there are multiple implementations of the same functionality, we call them “providers”. A role supporting multiple providers should have an input variable called $ROLENAME_provider. If this variable is not defined, the role should detect the currently running provider on the system, and respect it.

Rationale

Users can be surprised if the role changes the provider they are already running. If no provider is currently running, the role should select one according to the OS version.

Example

On RHEL 7, chrony should be selected as the provider of time synchronization, unless ntpd is already running on the system or the user specifically requests it. Chrony should be chosen on RHEL 8 as well, because it is the only provider available.

The role should set a variable or custom fact called $ROLENAME_provider_os_default to the appropriate default value for the given OS version.

Rationale

Users may want to set all their managed systems to a consistent state, regardless of the provider used previously. Setting $ROLENAME_provider would achieve that, but is suboptimal: it requires the user to select the appropriate value, and if the user manages multiple system versions with a single playbook, a common value supported by all of them may not even exist. Moreover, after a major upgrade of their systems, users may be forced to change the $ROLENAME_provider setting in their playbooks if the previous value is no longer supported. Exporting $ROLENAME_provider_os_default allows users to set $ROLENAME_provider: "{{ $ROLENAME_provider_os_default }}" (thanks to lazy variable evaluation in Ansible) and thus get a consistent setting for all systems of a given OS version without having to decide what the actual value is; the decision is delegated to the role.
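A possible implementation of this provider selection, sketched for a hypothetical time synchronization role named foo. It assumes systemd hosts (service_facts reports units with a .service suffix, e.g. ntpd.service) and a per-distribution vars file exporting foo_provider_os_default; neither the names nor the file layout come from an existing role.

```yaml
---
# vars/RedHat_7.yml (vars/RedHat_8.yml would contain the same value)
foo_provider_os_default: chrony
---
# tasks/main.yml
- name: Gather service facts to detect a running provider
  service_facts:
  when: foo_provider is not defined

- name: Keep ntpd if it is already running, else use the OS default
  set_fact:
    foo_provider: >-
      {{ 'ntpd'
         if 'ntpd.service' in ansible_facts.services
            and ansible_facts.services['ntpd.service'].state == 'running'
         else foo_provider_os_default }}
  when: foo_provider is not defined
```

A user who wants a uniform state can still set foo_provider: "{{ foo_provider_os_default }}" in the playbook, in which case both tasks are skipped.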

3.1.6. Distributions and Versions

Explanations

Avoid testing for distribution and version in tasks. Instead, add a variable file to "vars/" for each supported distribution and version, containing the variables that need to change according to the distribution and version.

Rationale

This way, it is easy to add support for a new distribution by simply dropping a new file into "vars/"; see Supporting multiple distributions and versions below. See also Vars vs Defaults, which mandates "Avoid embedding large lists or 'magic values' directly into the playbook." Since distribution-specific values are a kind of "magic values", the rule applies to them. The same logic applies to providers: a role can load a provider-specific variable file, include a provider-specific task file, or both, as needed. Consider making paths to templates internal variables if you need different templates for different distributions.
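For instance, instead of a task guarded by when: ansible_facts['distribution_major_version'] == '7', the difference can be moved into vars files, with the template path as an internal variable. A minimal sketch; the role name foo, the variable __foo_config_template and the template file names are hypothetical.

```yaml
---
# vars/RedHat_7.yml
__foo_config_template: foo.conf.el7.j2
---
# vars/RedHat_8.yml
__foo_config_template: foo.conf.el8.j2
---
# tasks/main.yml: a single task, with no distribution test
- name: Deploy the foo configuration
  template:
    # The file name comes from a variable, so anchor it to role_path
    src: "{{ role_path }}/templates/{{ __foo_config_template }}"
    dest: /etc/foo.conf
```

Supporting a new distribution version then means adding one vars file and, if needed, one template, without touching the tasks.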

3.1.7. Check Mode

  • Roles should work in check mode: first of all, they should not fail in check mode, and they should not report changes when there are no changes to be made. If this is not possible, state the fact and provide a justification in the documentation. This applies to the first run of the role.

  • Reporting changes properly is related to another requirement, idempotency, covered in Idempotency below. In particular, command: always reports changes, so override its result using changed_when.

  • Concerning check mode, a common obstacle to supporting it is registered variables. If a task that registers a variable does not get executed (e.g. because it is a command: or another task which is not properly idempotent), the variable will not get registered and further accesses to it will fail (or worse, use the previous value, if the role has been applied before in the play, because variables are global and there is no way to unregister them). To fix this, either use a properly idempotent module to obtain the information (e.g. instead of using command: cat to read a file into a registered variable, use slurp and apply .content|b64decode to the result), or apply proper check_mode: and changed_when: attributes to the task (more info).

  • Another problem is commands that you need to execute to make changes. In check mode, you need to test for changes without actually applying them. If the command has some kind of "--dry-run" flag to execute without making actual changes, use it in check mode (use the variable ansible_check_mode to determine whether we are in check mode). You then need to set changed_when: according to the command status or output to indicate changes. See (https://github.com/linux-system-roles/selinux/pull/38/files#diff-2444ad0870f91f17ca6c2a5e96b26823L101) for an example.

  • Another problem is using commands that get installed during the install phase, which is skipped in check mode. This will make check mode fail if the role has not been executed before (and the packages are not there), but does the right thing if check mode is executed after normal mode.

  • For the reasoning on why supporting check mode on the first execution may not be worthwhile, see here. If it is to be supported, see hhaniel’s proposal, which seems to guard properly even against such cases.
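The techniques above could look like the following sketch. It assumes a hypothetical foo CLI with status and apply subcommands, a --dry-run flag, and a "nothing to do" message when no change is needed; only the modules, filters and task keywords (slurp, b64decode, check_mode, changed_when, ansible_check_mode) are standard Ansible.

```yaml
# Read a file without command: slurp only reads, so it also runs in check mode
- name: Read the current foo configuration
  slurp:
    src: /etc/foo.conf
  register: __foo_conf_raw

- name: Decode the file content
  set_fact:
    __foo_conf_content: "{{ __foo_conf_raw.content | b64decode }}"

# A read-only command: run it even in check mode and never report a change
- name: Query the current foo state
  command: foo status
  register: __foo_status
  check_mode: false
  changed_when: false

# A command with a dry-run flag: run it in check mode with --dry-run,
# and derive "changed" from its output
- name: Apply the foo settings
  command: "foo apply {{ '--dry-run' if ansible_check_mode else '' }}"
  register: __foo_apply
  check_mode: false
  changed_when: "'nothing to do' not in __foo_apply.stdout"
```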

3.1.8. Idempotency

Explanations

Reporting changes properly is related to the other requirement: idempotency. Roles should not perform changes when applied a second time to the same system with the same parameters, and they should not report that changes have been made if they have not. For this reason, using command: is problematic, as it always reports changes; override the result using changed_when.

Rationale

Additional automation and other integrations, such as external ticketing systems, rely on the idempotence of the Ansible role to report changes accurately.
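Two common ways to keep command: tasks idempotent and honest about changes are sketched below; the foo-init and foo-admin commands, the marker file and the "already present" message are hypothetical.

```yaml
- name: Initialize the foo data directory
  command: foo-init --datadir /var/lib/foo
  args:
    # Skip the command (reported as ok) once this file exists,
    # assuming foo-init creates it on success
    creates: /var/lib/foo/.initialized

- name: Add the foo admin user
  command: foo-admin add-user alice
  register: __foo_add_user
  # Report a change only when the hypothetical CLI did not print
  # "already present"
  changed_when: "'already present' not in __foo_add_user.stdout"
```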

3.1.9. Supporting multiple distributions and versions

Use Cases
  • The role developer needs to be able to set role variables to different values depending on the OS platform and version, for example if the name of a service or the location of a config file differs between EL8 and EL9.

  • The role developer needs to handle the case where the user specifies gather_facts: false in the playbook.

  • The role developer needs to access the platform specific vars in role integration tests without making a copy.

The recommended solution below requires at least some ansible_facts to be defined, and so relies on gathering some facts. If you just want to ensure the user always uses gather_facts: true, and do not want to handle this in the role, then the role documentation should state that gather_facts: true or setup: is required in order to use the role, and the role should use fail: with a descriptive error message if the necessary facts are not defined.
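If the role relies on the user gathering facts, the check could look like the following sketch. It assumes the role defines a hypothetical __rolename_required_facts list of fact names (e.g. in vars/main.yml).

```yaml
- name: Fail if facts required by the role are missing
  fail:
    msg: >-
      This role requires the following facts, which are not defined:
      {{ __rolename_required_facts | difference(ansible_facts.keys() | list) | join(', ') }}.
      Use gather_facts: true or run the setup module before this role.
  when: __rolename_required_facts | difference(ansible_facts.keys() | list) | length > 0
```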

If it is desirable to use roles that require facts, but fact gathering is expensive, consider using a cache plugin (see List of Cache Plugins), and also consider running a periodic job on the controller to refresh the cache.

3.1.9.1. Platform specific variables
Explanations

You normally use vars/main.yml (automatically included) to set variables used by your role. If some variables need to be parameterized according to distribution and version (names of packages, configuration file paths, names of services), load platform-specific variable files at the beginning of your tasks/main.yml, as follows.

Examples

  • Create tasks/set_vars.yml with the following content:
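# tasks/set_vars.yml
# __rolename_required_facts is expected to be defined in vars/main.yml
# as the list of fact names the role needs (e.g. distribution, os_family).
- name: Ensure ansible_facts used by role
  setup:
    gather_subset: min
  # Gather facts only if at least one required fact is missing; unlike a
  # list equality test, this does not depend on the order of the keys.
  when: __rolename_required_facts |
    difference(ansible_facts.keys() | list) | length > 0

- name: Set platform/version specific variables
  include_vars: "{{ __rolename_vars_file }}"
  # Files are listed from least to most specific, so later files
  # override values set by earlier ones.
  loop:
    - "{{ ansible_facts['os_family'] }}.yml"
    - "{{ ansible_facts['distribution'] }}.yml"
    - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_major_version'] }}.yml"
    - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_version'] }}.yml"
  vars:
    __rolename_vars_file: "{{ role_path }}/vars/{{ item }}"
  # Skip files that do not exist, so only the needed ones must be provided
  when: __rolename_vars_file is file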
  • Add this as the first task in tasks/main.yml:

    - name: Set platform/version specific variables
      include_tasks: tasks/set_vars.yml
  • Add files to vars/ for the required OS platforms and versions.

The files in the loop are in order from least specific to most specific:

  • os_family covers a group of closely related platforms (e.g. RedHat covers RHEL, CentOS, Fedora)

  • distribution (e.g. Fedora) is more specific than os_family

  • distribution_distribution_major_version (e.g. RedHat_8) is more specific than distribution

  • distribution_distribution_version (e.g. RedHat_8.3) is the most specific

See Commonly Used Facts for an explanation of the facts and their common values.

Each file in the loop list allows you to add or override variables to specialize the values for a platform and/or version. The when: __rolename_vars_file is file test means that you do not have to provide all of the vars/ files, only the ones you need. For example, if every platform except Fedora uses srv_name for the service name, you can define myrole_service: srv_name in vars/main.yml, then define myrole_service: srv2_name in vars/Fedora.yml. Where this would lead to duplicate vars files for similar distributions (e.g. CentOS 7 and RHEL 7), use symlinks to avoid the duplication.
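The service name example above could be laid out as follows; myrole_service, srv_name and srv2_name are the placeholder names from the text, and the task is only a sketch.

```yaml
---
# vars/main.yml: value used on every platform by default
myrole_service: srv_name
---
# vars/Fedora.yml: loaded by include_vars on Fedora hosts only; include_vars
# has a higher precedence than role vars, so this value wins there
myrole_service: srv2_name
---
# tasks/main.yml: the task itself contains no distribution test
- name: Ensure the service is running
  service:
    name: "{{ myrole_service }}"
    state: started
    enabled: true
```

For the CentOS 7 and RHEL 7 case, vars/CentOS_7.yml could simply be a symlink to vars/RedHat_7.yml.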

With this setup, files can be loaded twice. For example, on Fedora, the distribution_major_version is the same as distribution_version so the file vars/Fedora_31.yml will be loaded twice if you are managing a Fedora 31 host. If distribution is RedHat then os_family will also be RedHat, and vars/RedHat.yml will be loaded twice. This is usually not a problem - you will be replacing the variable with the same value, and the performance hit is negligible. If this is a problem, construct the file list as a list variable, and filter the variable passed to loop using the unique filter (which preserves the order):
- name: Set vars file list
  set_fact:
    __rolename_vars_file_list:
      - "{{ ansible_facts['os_family'] }}.yml"
      - "{{ ansible_facts['distribution'] }}.yml"
      - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_major_version'] }}.yml"
      - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_version'] }}.yml"

- name: Set platform/version specific variables
  include_vars: "{{ __rolename_vars_file }}"
  loop: "{{ __rolename_vars_file_list | unique | list }}"
  vars:
    __rolename_vars_file: "{{ role_path }}/vars/{{ item }}"
  when: __rolename_vars_file is file

Or define your __rolename_vars_file_list in your vars/main.yml.

The task Ensure ansible_facts used by role handles the case where the user specifies gather_facts: false in the playbook. It gathers only the facts required by the role. The role developer may need to add additional facts to the list, and use a different gather_subset. See Setup Module for more information. Gathering facts can be expensive, so gather only the facts required by the role.
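For reference, the list of required facts could be defined in vars/main.yml as in the following sketch. The variable name follows the __rolename_ placeholder convention used above, and the exact list depends on the role.

```yaml
# vars/main.yml
# Facts used by the role's platform/version logic; all of them belong to
# the "min" subset gathered by the "Ensure ansible_facts used by role" task.
__rolename_required_facts:
  - distribution
  - distribution_major_version
  - distribution_version
  - os_family
# A fact outside the "min" subset (e.g. default_ipv4, from the "network"
# subset) would also require widening gather_subset in that task.
```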

Using a separate task file for tasks/set_vars.yml allows role integration tests to access the internal variables. For example, if the role developer wants to pre-populate a VM with the packages used by the role, the following tasks can be used:

- hosts: all
  tasks:
    - name: Set platform/version specific variables
      include_role:
        name: my.fqcn.rolename
        tasks_from: set_vars.yml
        public: true

    - name: Install test packages
      package:
        name: "{{ __rolename_packages }}"
        state: present

In this way, the role developer does not have to copy and maintain a separate list of role packages.

3.1.10. Platform specific tasks


Platform specific tasks, however, are different. You probably want to perform platform specific tasks once, for the most specific match. In that case, use lookup('first_found') with the file list in order of most specific to least specific, including a "default":

- name: Perform platform/version specific tasks
  include_tasks: "{{ lookup('first_found', __rolename_ff_params) }}"
  vars:
    __rolename_ff_params:
      files:
        - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_version'] }}.yml"
        - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_major_version'] }}.yml"
        - "{{ ansible_facts['distribution'] }}.yml"
        - "{{ ansible_facts['os_family'] }}.yml"
        - "default.yml"
      paths:
        - "{{ role_path }}/tasks/setup"

Then you would provide tasks/setup/default.yml to do the generic setup, and e.g. tasks/setup/Fedora.yml to do the Fedora specific setup. The tasks/setup/default.yml is required in order to use lookup('first_found'), which will give an error if no file is found.

If you want to have the "use first file found" semantics, but do not want to have to provide a default file, add skip: true:

- name: Perform platform/version specific tasks
  include_tasks: "{{ lookup('first_found', __rolename_ff_params) }}"
  vars:
    __rolename_ff_params:
      files:
        - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_version'] }}.yml"
        - "{{ ansible_facts['os_family'] }}.yml"
      paths:
        - "{{ role_path }}/tasks/setup"
      skip: true

NOTE:

  • Use include_tasks or include_vars with lookup('first_found') instead of with_first_found. loop is not needed - the include forms take a string or a list directly.

  • Always specify the explicit, absolute path to the files to be included, using {{ role_path }}/vars or {{ role_path }}/tasks, when using these idioms. See below "Ansible Best Practices" for more information.

  • Use the ansible_facts['name'] bracket notation rather than the ansible_facts.name or ansible_name form. For example, use ansible_facts['distribution'] instead of ansible_distribution or ansible_facts.distribution. The ansible_name form relies on fact injection, which can break if there is already a variable of that name. Also, the bracket notation is what is used in Ansible documentation such as Commonly Used Facts and Operating System and Distribution Variance.

3.1.11. Supporting multiple providers


Use a task file per provider and include it from the main task file, like this example from the storage role:

- name: include the appropriate provider tasks
  include_tasks: "main_{{ storage_provider }}.yml"

The same process should be used for variables (not defaults, as defaults cannot be loaded according to a variable). Make sure that a file exists for each supported provider, or use an explicit, absolute path using role_path. See "Ansible Best Practices" below for more information.
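Loading provider-specific variables and tasks could be sketched as follows, assuming a hypothetical role foo with a foo_provider input, an internal __foo_supported_providers list, and one vars/ and one tasks/ file per supported provider.

```yaml
- name: Check that the requested provider is supported
  assert:
    that: foo_provider in __foo_supported_providers
    fail_msg: "Unsupported foo_provider: {{ foo_provider }}"

- name: Set provider specific variables
  # Absolute path: never pick up a same-named file from an including role
  include_vars: "{{ role_path }}/vars/provider_{{ foo_provider }}.yml"

- name: Include the appropriate provider tasks
  include_tasks: "{{ role_path }}/tasks/main_{{ foo_provider }}.yml"
```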

3.1.12. Generating files from templates

  • Add {{ ansible_managed | comment }} at the top of the template file to indicate that the file is managed by Ansible roles, while making sure that multi-line values are properly commented. For more information, see Adding comments to files.

  • When commenting, don’t include anything like "Last modified: {{ date }}". This would change the file at every application of the role, even if it doesn’t need to be changed for other reasons, and thus break proper change reporting.

  • Use the standard module parameters for backups, and keep them on unconditionally (backup: true) until there is a user request to make them configurable.

  • Make it prominently clear in the HOWTO (at the top) which settings/configuration files are replaced by the role instead of just modified.

  • Use {{ role_path }}/subdir/ as the filename prefix when including files if the name has a variable in it.

    Rationale

    Your role may be included by another role, and if you specify a relative path, the file could be found in the including role. For example, if you have something like include_vars: "{{ ansible_facts['distribution'] }}.yml" and you do not provide every possible vars/{{ ansible_facts['distribution'] }}.yml in your role, Ansible will look in the including role for this file. Instead, to ensure that only your role will be referenced, use include_vars: "{{ role_path }}/vars/{{ ansible_facts['distribution'] }}.yml". The same applies to other file-based includes such as include_tasks. See Ansible Developer Guide » Ansible architecture » The Ansible Search Path for more information.
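Put together, these rules might look as follows for a hypothetical role foo; the template name and destination path are placeholders.

```yaml
# templates/foo.conf.j2 is expected to start with the line
#   {{ ansible_managed | comment }}
# and to contain no timestamp, so that rendering it twice gives the same file.
- name: Generate the foo configuration file
  template:
    src: foo.conf.j2
    dest: /etc/foo/foo.conf
    backup: true  # always keep a copy of the replaced file

- name: Include distribution specific variables
  # The file name contains a variable: prefix it with role_path
  include_vars: "{{ role_path }}/vars/{{ ansible_facts['distribution'] }}.yml"
```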

3.1.13. Vars vs Defaults

  • Avoid embedding large lists or "magic values" directly into the playbook. Such static lists should be placed in the vars/main.yml file and named appropriately.

  • Every argument accepted from outside of the role should be given a default value in defaults/main.yml. This gives users a single place to look to see what inputs are expected. Document these variables thoroughly in the role’s README.md file.

  • Use the defaults/main.yml file in order to avoid use of the default Jinja2 filter within a playbook. Using the default filter is fine for optional keys on a dictionary, but the variable itself should be defined in defaults/main.yml so that it can have documentation written about it there and so that all arguments can easily be located and identified.

  • Don’t define defaults in defaults/main.yml if there is no meaningful default. It is better to have the role fail if the variable isn’t defined than have it do something dangerously wrong. Still do add the variable to defaults/main.yml but commented out, so that there is one single source of input variables.

  • Avoid giving default values in vars/main.yml as such values are very high in the precedence order and are difficult for users and consumers of a role to override.

  • As an example, if a role requires a large number of packages to install, but could also accept a list of additional packages, then the required packages should be placed in vars/main.yml with a name such as foo_packages, and the extra packages should be passed in a variable named foo_extra_packages, which should default to an empty list in defaults/main.yml and be documented as such.
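The package example and a commented-out mandatory input could be laid out like this. The package names and foo_admin_password are hypothetical, and assert is used here as one way of making the role fail early.

```yaml
---
# vars/main.yml: packages the role always needs (not meant to be overridden)
foo_packages:
  - foo-server
  - foo-tools
---
# defaults/main.yml: every input, documented in README.md
# Additional packages to install on top of foo_packages.
foo_extra_packages: []
# Mandatory, no meaningful default: listed but commented out on purpose.
# foo_admin_password:
---
# tasks/main.yml
- name: Fail early if a mandatory input is missing
  assert:
    that: foo_admin_password is defined
    fail_msg: foo_admin_password must be set, see README.md

- name: Install the required and extra packages
  package:
    name: "{{ foo_packages + foo_extra_packages }}"
    state: present
```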

3.1.14. Documentation conventions

  • Use fully qualified role names in examples, like: linux-system-roles.$ROLENAME (with the Galaxy prefix).

  • Use RFC 5737 (IPv4), RFC 7042 (MAC) and RFC 3849 (IPv6) documentation addresses in examples.

  • Modules should have complete metadata, documentation, example and return blocks as described in the Ansible docs.
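A README example following these conventions might look like this. The role linux-system-roles.foo and its variables are hypothetical; the addresses come from the documentation ranges reserved by RFC 5737 (IPv4), RFC 3849 (IPv6) and RFC 7042 (MAC).

```yaml
# README.md example for the hypothetical role linux-system-roles.foo
- name: Configure foo
  hosts: all
  roles:
    - role: linux-system-roles.foo
      vars:
        foo_ipv4_address: 192.0.2.10           # RFC 5737 documentation range
        foo_ipv6_address: "2001:db8::10"       # RFC 3849 documentation prefix
        foo_mac_address: "00:00:5e:00:53:01"   # RFC 7042 documentation range
```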

3.1.15. Don’t use host group names or at least make them a parameter

Explanations

It is relatively common to use (inventory) group names in roles:

  • either to loop through the hosts in the group, generally in a cluster context

  • or to validate that a host is in a specific group

    Instead, store the host name(s) in a (list) variable, or at least make the group name a parameter of your role. You can always set the variable at group level to avoid repetitions.

Rationale

Groups are a feature of the data in your inventory, meaning that you mingle data with code when you use those groups in your code. Rely on the inventory-parsing process to provide your code with the variables it needs instead of enforcing a specific structure of the inventory. Not all inventory sources are flexible enough to provide exactly the expected group name. Even more importantly, in a cluster context for example, if the group name is fixed, you can’t describe (and hence automate) more than one cluster in each inventory. You can’t possibly have multiple groups with the same name in the same inventory. On the other hand, variables can have any kind of value for each host, so that you can have as many clusters as you want.

Examples

Assuming we have the following inventory (which does not follow the recommended practices, for the sake of simplicity):

Listing 1. An inventory with two clusters
[cluster_group_A]
host1 ansible_host=localhost
host2 ansible_host=localhost
host3 ansible_host=localhost

[cluster_group_B]
host4 ansible_host=localhost
host5 ansible_host=localhost
host6 ansible_host=localhost

[cluster_group_A:vars]
cluster_group_name=cluster_group_A

[cluster_group_B:vars]
cluster_group_name=cluster_group_B

We can then use one of the following three approaches in our role (shown here as a playbook, again for the sake of simplicity):

Listing 2. A playbook showing how to loop through a group
---
- name: show how to loop through a set of groups
  hosts: cluster_group_?
  gather_facts: false
  become: false

  tasks:
    - name: the loop happens for each host, might be too much
      debug:
        msg: do something with {{ item }}
      loop: "{{ groups[cluster_group_name] }}"
    - name: the loop happens only for the first host in each group
      debug:
        msg: do something with {{ item }}
      loop: "{{ groups[cluster_group_name] }}"
      when: inventory_hostname == groups[cluster_group_name][0]
    - name: make the first host of each group fail to simulate non-availability
      assert:
        that: inventory_hostname != groups[cluster_group_name][0]
    - name: the loop happens only for the first _available_ host in each group
      debug:
        msg: do something with {{ item }}
      loop: "{{ groups[cluster_group_name] }}"
      when: >-
        inventory_hostname == (groups[cluster_group_name]
        | intersect(ansible_play_hosts))[0]

The first approach is probably best suited to creating a cluster configuration file that lists all of the cluster’s hosts. The other approaches are good for making sure each action is performed only once, but this comes at the price of many skipped tasks. The second one fails if the first host isn’t reachable (which might be what you want anyway), and the last one has the best chance of being executed once and only once, even if some hosts aren’t available.
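As a sketch of the first approach, the following task could be placed in a play targeting the same cluster groups (without the simulated failure). It relies on the cluster_group_name inventory variable and writes one file listing all members of the host's cluster. The destination path and the member= line format are assumptions for illustration only. The host name is part of the file name solely because all hosts in this example inventory point to localhost and would otherwise overwrite each other's file.

```yaml
# Sketch: writes e.g. /tmp/cluster_group_A_host1.conf with one
# "member=<host>" line per host of the cluster (format is hypothetical)
- name: write the list of cluster members into a configuration file
  copy:
    dest: "/tmp/{{ cluster_group_name }}_{{ inventory_hostname }}.conf"
    content: |
      {% for member in groups[cluster_group_name] %}
      member={{ member }}
      {% endfor %}
```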

The variable cluster_group_name could have a default group name value in your role, properly documented of course, for simple use cases.

Overall, it is best to avoid these kinds of constructs if the use case permits, as they are clumsy.

3.1.16. Prefix task names in sub-tasks files of roles

Explanation

It is common practice to have a tasks/main.yml file that includes other tasks files, which we’ll call sub-tasks files. Make sure that the names of the tasks in these sub-tasks files are prefixed with a short identifier recalling the sub-tasks file’s name.

Rationale

Especially in a complex role with multiple (sub-)tasks files, it becomes difficult to understand which task belongs to which file. Adding a prefix, in combination with the role’s name automatically added by Ansible, makes it a lot easier to follow and troubleshoot the execution of a role.

Examples

In a role with one tasks/main.yml task file that includes tasks/sub.yml, the tasks in the latter file would be named as follows:

Listing 3. A prefixed task in a sub-tasks file
- name: sub | some task description
  debug:
    msg: some task executed from tasks/sub.yml

The log output will then look something like TASK [myrole : sub | some task description] **, which makes it very clear where the task is coming from.

With a verbosity of 2 or more, ansible-playbook shows the full path to the task file, but this generally means re-running the play at a higher verbosity to get information you could have had readily available.
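Putting the pieces together, a minimal sketch of such a role could look as follows. The role name myrole, the directory path and the task contents are hypothetical; import_tasks could be used instead of include_tasks, and the prefix convention stays the same.

```yaml
# roles/myrole/tasks/main.yml
- name: include the sub-tasks file
  include_tasks: sub.yml

# roles/myrole/tasks/sub.yml
# every task name starts with "sub | " to recall the file it lives in
- name: sub | ensure the configuration directory exists
  file:
    path: /tmp/myrole_config   # hypothetical path
    state: directory
    mode: "0755"

- name: sub | show where this task comes from
  debug:
    msg: This task is defined in tasks/sub.yml
```

When run, the log then shows lines such as TASK [myrole : sub | ensure the configuration directory exists].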


4. Collections good practices

Note: Unreviewed work. Please contribute to the discussion in the Automation Red Hat COP.

4.1. Collection structure should be at the type or landscape level

Explanations

Collections should consist of roles gathered at either the type or the landscape level. See The Structures Definition.

Rationale

Gathering and publishing collections, rather than individual roles, allows for easier distribution, which becomes particularly important when we discuss Execution Environments.

4.2. Create implicit collection variables and reference them in your roles' defaults variables

Explanations

Some variables naturally belong at the collection level, but defining them there can hamper the reusability of the individual roles. Defining collection-wide variables and referencing them in the roles' defaults variables makes this explicit while keeping the roles reusable. Collection variables are not explicitly defined anywhere and must be documented in the collection’s documentation.

Rationale

Variables that are shared across collections can cause collisions when roles are reused outside of their original collection. Role variables should continue to be named according to our recommendations for naming variables. It remains possible to override the value of a collection variable for a specific role, because each role has its own set of defaults for the variable.

Examples

For a collection "mycollection", two roles exist: "alpha" and "beta". In this example, mycollection_controller_username has no default and must be defined in the inventory. The mycollection_no_log variable does have defaults, so it only needs to be defined if the default is to be overridden.

Listing 4. Alpha defaults/main.yml
# specific role variables
alpha_job_name: 'some text'
# collection wide variables
alpha_controller_username: "{{ mycollection_controller_username }}"
alpha_no_log: "{{ mycollection_no_log | default(true) }}"
Listing 5. Beta defaults/main.yml
# specific role variables
beta_job_name: 'some other text'
# collection wide variables
beta_controller_username: "{{ mycollection_controller_username }}"
beta_no_log: "{{ mycollection_no_log | default(false) }}"
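On the inventory side, the collection-wide variables are then set once, while a single role can still be tuned through its own prefixed variable. A minimal sketch, assuming the same mycollection, alpha and beta names, a hypothetical group_vars/all/mycollection.yml file and a hypothetical admin user name:

```yaml
# inventory/group_vars/all/mycollection.yml
# Collection-wide value, picked up by both alpha and beta through their defaults
mycollection_controller_username: admin

# Collection-wide override of the no_log defaults of both roles
mycollection_no_log: false

# Role-specific override: inventory variables take precedence over role
# defaults, so only alpha hides its output again, beta follows mycollection_no_log
alpha_no_log: true
```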

4.3. Include a README file in each collection

Explanation

Include a README file in the root of the collection that contains:

  • Information about the purpose of the collection

  • A link to the collection license file

  • General usage information such as which versions of ansible-core are supported and any libraries or SDKs which are required by the collection

Generating the README’s plugin documentation from the plugin code helps eliminate documentation errors. Supplemental documentation such as user guides may be written in reStructuredText (rst) and located in the docs/docsite/rst/ directory of the collection.

Examples

Use https://github.com/ansible-network/collection_prep to generate the documentation for the collection.

4.4. Include a license file in a collection root directory

Explanation

Include a license file in the root directory of the collection. Name the license file either LICENSE or COPYING. The contents may be either the text of the applicable license, or a link to the canonical reference for the license on the Internet (such as https://opensource.org/licenses/BSD-2-Clause). If any file in the collection is licensed differently from the larger collection it is a part of (such as module utilities), note the applicable license in the header of the file.

5. Playbooks good practices

5.1. Keep your playbooks as simple as possible

Explanations

Don’t put too much logic in your playbooks; put it in your roles (or even in custom modules), and try to limit your playbooks to a list of roles.

Rationale

Roles are meant to be re-used, and their structure helps you make your code re-usable. The more code you put in roles, the higher the chances that you, or others, can reuse it. Also, if you follow the type-function pattern, you can very easily create new (type) playbooks just by re-shuffling the roles. This way you can create a playbook for each purpose without having to duplicate a lot of code. This, in turn, also helps with maintainability, as there is only a single place where necessary changes need to be implemented, and that is in the role.

Examples
Listing 6. An example of playbook containing only roles
- name: a playbook can solely be a list of roles
  hosts: all
  gather_facts: false
  become: false

  roles:
    - role1
    - role2
    - role3
We’ll explain later why there might be a case for using include_role/import_role tasks instead of the roles section.

5.2. Use either the tasks or roles section in playbooks, not both

Explanations

A playbook can contain pre_tasks, roles, tasks and post_tasks sections. Avoid using both roles and tasks sections, the latter possibly containing import_role or include_role tasks.

Rationale

The order of execution between roles and tasks isn’t obvious, and hence mixing them should be avoided.

Examples

Either you need only static importing of roles and you can use the roles section, or you need dynamic inclusion and you should use only the tasks section. Of course, for very simple cases, you can just use tasks without roles.
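A minimal sketch of the "tasks only" variant, assuming hypothetical roles role1 and role2 and a hypothetical variable role2_enabled. Here the order of execution is simply the order of the tasks, whereas a roles section always runs before the tasks section, wherever it is written in the play.

```yaml
- name: a playbook using only the tasks section
  hosts: all
  gather_facts: false
  become: false

  tasks:
    # static import, resolved when the playbook is parsed
    - name: import role1
      import_role:
        name: role1

    # dynamic inclusion, evaluated at runtime for each host
    - name: include role2 only where it is enabled
      include_role:
        name: role2
      when: role2_enabled | default(false) | bool
```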

5.3. Use tags cautiously either for roles or for complete purposes

Explanations

Limit your use of tags to two purposes:

  1. either tags named after the roles, to switch individual roles on or off,

  2. or specific tags that serve a meaningful purpose on their own.

Don’t set tags that can’t be used on their own, or that can be destructive when used on their own.

Also document tags and their purpose(s).

Rationale

There is nothing worse than tags that can’t be used alone: they risk destroying something when called standalone. An acceptable exception is the pattern of using the role name as tag name, which can be useful while developing the playbook to test, or exclude, individual roles.

What matters is that your users don’t need to learn the right sequence of tags to get a meaningful result; one tag should be enough.

Examples
Listing 7. An example of playbook importing roles with tags
- name: a playbook can be a list of roles imported with tags
  hosts: all
  gather_facts: false
  become: false

  tasks:
    - name: import role1
      import_role:
        name: role1
      tags:
        - role1
        - deploy
    - name: import role2
      import_role:
        name: role2
      tags:
        - role2
        - deploy
        - configure
    - name: import role3
      import_role:
        name: role3
      tags:
        - role3
        - configure

As you can see, each role can be run or skipped individually, and the tags deploy and configure can be used to do something we’ll assume to be meaningful, without having to explain at length what they do.

The same approach is also possible with include_role, but it additionally requires applying the same tags to the role’s tasks (using apply), which doesn’t make the code easier to read:

Listing 8. An example of playbook including roles with tags
- name: a playbook can be a list of roles included with tags applied
  hosts: all
  gather_facts: false
  become: false

  tasks:
    - name: include role1
      include_role:
        name: role1
        apply:
          tags:
            - role1
            - deploy
      tags:
        - role1
        - deploy
    - name: include role2
      include_role:
        name: role2
        apply:
          tags:
            - role2
            - deploy
            - configure
      tags:
        - role2
        - deploy
        - configure
    - name: include role3
      include_role:
        name: role3
        apply:
          tags:
            - role3
            - configure
      tags:
        - role3
        - configure

5.4. Use the verbosity parameter with debug statements

Explanations

Debug messages should have a verbosity defined as appropriate for the message.

Rationale

Debug messages are useful during testing and development, and can be worth keeping when playbooks go into production, for future troubleshooting. However, debug messages displayed unconditionally clutter your output and can confuse users with irrelevant information.

Examples
Listing 9. Adding verbosity to debug messages
- name: don't make messages always display
  debug:
    msg: "This message will clutter your log in production"

- name: this message will only appear when verbosity is 2 or more
  debug:
    msg: "Some more debug information if needed"
    verbosity: 2

6. Inventories and Variables Good Practices for Ansible

6.1. Identify your Single Source(s) of Truth and use it/them in your inventory

Explanations

A Single Source of Truth (SSOT) is the place where the "ultimate" truth about a certain piece of data is generated, stored and maintained. There can be more than one SSOT, each for a different piece of information, but they shouldn’t overlap, let alone conflict. As you create your inventory, you identify these SSOTs and combine them into one inventory using dynamic inventory sources (we’ll see how later on). Only the aspects which are not already provided by other sources are kept statically in your inventory. By doing this, your inventory becomes another source of truth, but only for the data it holds statically, because there is no other place to keep it.

Rationale

You limit your effort to maintain your inventory to its absolute minimum and you avoid generating potentially conflicting information with the rest of your IT.

Examples

You can typically identify three kinds of candidates as SSOTs:

  • technical ones, where your managed devices live anyway, like a cloud or virtual manager (OpenStack, RHV, Public Cloud API, …) or management systems (Satellite, monitoring systems, …). Those sources provide you with technical information like IP addresses, OS type, etc.

  • managed ones, like a Configuration Management Database (CMDB), where your IT anyway manages a lot of information of use in an inventory. A CMDB provides you with more organizational information, like owner or location, but also with "to-be" technical information.

  • the inventory itself, only for the data which doesn’t exist anywhere else.

Ansible provides a lot of inventory plugins to pull data from those sources, and they can be combined into one big inventory. This gives you a complete model of the environment to be automated, with limited effort to maintain it, and no confusion about where to modify it to get the result you need.
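As an illustration, a technical SSOT can be plugged into an inventory directory through the configuration file of an inventory plugin, while only the data that exists nowhere else stays static. The sketch below assumes AWS as the technical source (amazon.aws collection installed, credentials configured), a Role tag on the instances, and a hypothetical ntp_servers variable. The file name matters: the aws_ec2 plugin only accepts configuration files whose names end in aws_ec2.yml or aws_ec2.yaml.

```yaml
# inventory/aws_ec2.yml: hosts and technical data come from the AWS API
plugin: amazon.aws.aws_ec2
regions:
  - eu-central-1          # assumption: adapt to your own regions
keyed_groups:
  # builds groups such as "role_webserver" from each instance's "Role" tag
  - key: tags.Role
    prefix: role

# inventory/group_vars/role_webserver/ntp.yml: static data, kept in the
# inventory only because no other source provides it
ntp_servers:
  - ntp1.example.com
  - ntp2.example.com
```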

6.2. Differentiate clearly between "As-Is" and "To-Be" information

Explanations

As you combine multiple sources, some will represent:

  • discovered information grabbed from the existing environment, this is the "As-Is" information.

  • managed information entered in a tool, expressing the state to be reached, hence the "To-Be" information.

In general, the focus of an inventory is on the managed information, because it represents the desired state you want to reach with your automation. That said, some discovered information is required for the automation to work.

Rationale

Mixing up these two kinds of information can lead your automation to take the wrong course of action, assuming that the current situation is aligned with the desired state. That can make your automation go awry and confuse your automation engineers. There is a reason why Ansible distinguishes between "facts" (As-Is) and "variables" (To-Be), and so should you. In the end, automation is about making sure that the As-Is situation complies with the To-Be description.

Many CMDBs have failed because they don’t respect this principle. This, together with the lack of automation, leads to a mix of unmaintained As-Is and To-Be information with no clear guideline on how to keep them up-to-date, and no real motivation to do so.
Examples

The technical tools typically contain a lot of discovered information, like the IP address or the RAM size of a VM. In a typical cloud environment, the IP address isn’t part of the desired state: it is assigned on the fly by the cloud management layer, so you can only get it dynamically from the cloud API and you won’t manage it. In a more traditional environment, however, the IP address will be static and managed more or less manually, so it becomes part of your desired state. In this case, you shouldn’t use the discovered information, or you might not realize that there is a discrepancy between As-Is and To-Be.

The RAM size of a VM will always be present in two flavours, i.e. As-Is coming from the technical source and To-Be coming from the CMDB or your static inventory, and you shouldn’t confuse them. Otherwise, your automation might fail to correct the size of the VM where it should have aligned the As-Is with the To-Be.
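A minimal sketch keeping both flavours apart: the To-Be value lives in the inventory under a hypothetical name vm_memory_mb, while the As-Is value is the gathered fact ansible_memtotal_mb. Because the operating system usually reports slightly less memory than the VM is configured with, the comparison uses a tolerance; the 10% value is an arbitrary assumption.

```yaml
# To-Be, e.g. in host_vars/vm1.example.com/vm.yml (hypothetical file):
#   vm_memory_mb: 4096

- name: report discrepancies between As-Is and To-Be memory
  hosts: all
  gather_facts: true   # needed for the As-Is fact ansible_memtotal_mb
  become: false

  tasks:
    - name: warn when the discovered RAM is clearly below the desired RAM
      debug:
        msg: >-
          As-Is {{ ansible_memtotal_mb }} MB differs from
          To-Be {{ vm_memory_mb }} MB
      # assumed 10% tolerance for memory reserved by firmware and kernel
      when: ansible_memtotal_mb | int < (vm_memory_mb | int) * 0.9
```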

6.3. Define your inventory as structured directory instead of single file

Explanations

Everybody has started with a single-file inventory in ini-format (the courageous ones among us in YAML format), combining lists of hosts, groups and variables. An inventory can, however, also be a directory containing:

  • list(s) of hosts

  • list(s) of groups, with sub-groups and hosts belonging to those groups

  • dynamic inventory plug-ins configuration files

  • dynamic inventory scripts (deprecated but still simple to use)

  • structured host_vars directories

  • structured group_vars directories

The recommendation is to start with such a structure and extend it step by step.

Rationale

It is the only way to simply combine multiple sources into one inventory without the hassle of calling ansible with multiple -i {inventory_file} parameters, and it keeps the door open for extending it with dynamic elements.

It is also simpler to maintain in a Git repository with multiple maintainers, because spreading the information across multiple files reduces the chance of conflicts. You can also drop a role’s defaults/main.yml file into the structure and adapt it to your needs very quickly.

And finally, it gives you a better overview of what is in your inventory without having to dig deeply into it, because the structure alone (as revealed by tree or find) already gives you a first idea of where to look for what. This makes on-boarding of new maintainers a lot easier.

Examples

The following is a complete inventory as described before. You don’t need to start at this level of complexity, but experience shows that once you get used to it, it is actually a lot easier to understand and maintain than a single file.

Listing 10. Tree of a structured inventory directory
inventory_example/  (1)
├── dynamic_inventory_plugin.yml  (2)
├── dynamic_inventory_script.py  (3)
├── groups_and_hosts  (4)
├── group_vars/  (5)
│   ├── alephs/
│   │   └── capital_letter.yml
│   ├── all/
│   │   └── ansible.yml
│   ├── alphas/
│   │   ├── capital_letter.yml
│   │   └── small_caps_letter.yml
│   ├── betas/
│   │   └── capital_letter.yml
│   ├── greek_letters/
│   │   └── small_caps_letter.yml
│   └── hebrew_letters/
│       └── small_caps_letter.yml
└── host_vars/  (6)
    ├── host1.example.com/
    │   └── ansible.yml
    ├── host2.example.com/
    │   └── ansible.yml
    └── host3.example.com/
        ├── ansible.yml
        └── capital_letter.yml
1 this is your inventory directory
2 a configuration file for a dynamic inventory plug-in
3 a dynamic inventory script, old style and deprecated but still used (and supported)
4 a file containing a static list of hosts and groups, the name isn’t important (often called hosts but some might confuse it with /etc/hosts and it also contains groups). See below for an example.
5 the group_vars directory to define group variables. Notice how each group is represented by a directory of its name containing one or more variable files.
6 the host_vars directory to define host variables. Notice how each host is represented by a directory of its name containing one or more variable files.

The groups_and_hosts file could look as follows; it is important not to put any variable definitions in this file.

Listing 11. Content of the groups_and_hosts file
[all]
host1.example.com
host2.example.com
host3.example.com

[alphas]
host1.example.com

[betas]
host2.example.com

[greek_letters:children]
alphas
betas

[alephs]
host3.example.com

[hebrew_letters:children]
alephs

Listing the hosts under [all] isn’t strictly required, but it makes sure that no host is forgotten, should it not belong to any other group. The ini-format isn’t mandatory either, but it seems easier to read than YAML as long as no variables are involved, and it is easier to maintain in an automated manner using lineinfile (without having to care about indentation).

Regarding the group and host variables, the names of the variable files are actually irrelevant, as you can verify by calling ansible-inventory -i inventory_example --list: the names capital_letter and small_caps_letter appear nowhere (you might see ansible, but for other reasons…). We nevertheless follow the convention of naming our variable files after the role they steer (so we assume the roles capital_letter and small_caps_letter). If correctly written, the defaults/main.yml file from those roles can simply be "dropped" into our inventory structure and adapted to our needs. We reserve the name ansible.yml for the Ansible-related variables (user, connection, become, etc.).
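For instance, the ansible.yml files of the example could hold connection and privilege escalation variables such as the following. The values, and the remote user names in particular, are assumptions to adapt to your environment.

```yaml
# inventory_example/group_vars/all/ansible.yml
# Ansible-related variables only: connection and privilege escalation
ansible_connection: ssh
ansible_user: automation        # hypothetical remote user
ansible_become: true
ansible_become_method: sudo

# inventory_example/host_vars/host3.example.com/ansible.yml
# host variables take precedence over group variables for this host only
ansible_user: admin3            # hypothetical remote user
```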

You can even create a sub-directory in a host’s or group’s variable directory and put the variable files there. This is useful if you have many variables related to the same topic that you want to group together but maintain in separate files. For example, Satellite requires many variables to be fully configured, so you can have a structure as follows (again, the names of the sub-directory satellite and of the files don’t matter):
Listing 12. Example of a complex tree of variables with sub-directory
inventory_satellite/
├── groups_and_hosts
└── host_vars/
    └── sat6.example.com/
        ├── ansible.yml
        └── satellite/
            ├── content_views.yml
            ├── hostgroups.yml
            └── locations.yml

6.4. Rely on your inventory to loop over hosts, don’t create lists of hosts

Explanations

To perform the same task on multiple hosts, don’t create a variable with a list of hosts and loop over it. Instead, use the capabilities of your inventory as much as possible, as it already is a kind of list of hosts.

The anti-pattern is especially obvious in the example of provisioning hosts on some kind of manager. Commonly seen automation tasks of this kind are spinning up a list of VMs via a hypervisor manager like oVirt/RHV or vCenter, or calling a management tool like Foreman/Satellite or even our beloved AWX/Tower/controller.

Rationale

There are 4 main reasons for following this advice:

  1. a list of hosts is more difficult to maintain than an inventory structure, and it quickly becomes difficult to oversee. This is especially true as you generally need to maintain your hosts in your inventory as well. This brings us to the second advantage:

  2. you avoid duplicating information, as you often need the same kind of information in your inventory that you also need in order to provision your VMs. In your inventory, you can also use groups to define group variables, automatically inherited by hosts. You can try to implement a similar inheritance pattern with your list of hosts, but it quickly becomes difficult and hand-crafted.

  3. as you loop through the hosts of an inventory, Ansible helps you with parallelization, throttling, etc., none of which you can do easily with your own list (technically, you can combine async and loop to achieve something similar, but it is a lot more complex to handle than letting Ansible do the heavy lifting for you).

  4. you can very simply limit the play to certain hosts, using for example the --limit parameter of ansible-playbook (or the 'limit' field in Tower/controller), even using groups and patterns. You can’t really do this with your own list of hosts.

Examples

Our first idea could be to define managers and hosts first in an inventory:

Listing 13. Content of the "bad" groups_and_hosts file
[managers]
manager_a
manager_b

[managed_hosts]
host1
host2
host3

Each manager has a list of hosts, which can look like this:

Listing 14. List of hosts in inventory_bad/host_vars/manager_a/provision.yml
provision_list_of_hosts:
  - name: host1
    provision_value: uno
  - name: host2
    provision_value: due

So that we can loop over the list in this way:

Listing 15. The "bad" way to loop over hosts
- name: provision hosts in a bad way
  hosts: managers
  gather_facts: false
  become: false

  tasks:
    - name: create some file to simulate an API call to provision a host
      copy:
        content: "{{ item.provision_value }}\n"
        dest: "/tmp/bad_{{ inventory_hostname }}_{{ item.name }}.txt"
        force: true
      loop: "{{ provision_list_of_hosts }}"
Check the resulting files using e.g. head -n-0 /tmp/bad_*.

As mentioned, there is no way to limit the hosts provisioned, and no parallelism. Compare this with the recommended approach, which uses a slightly different structure:

Listing 16. Content of the "good" groups_and_hosts file
[managers]
manager_a
manager_b

[managed_hosts_a]
host1
host2

[managed_hosts_b]
host3

[managed_hosts:children]
managed_hosts_a
managed_hosts_b

Now the hosts and their groups carry the relevant information; it is no longer parked in one single list (and it can be used for other purposes):

Listing 17. The "good" variable structure
$ cat inventory_good/host_vars/host1/provision.yml
provision_value: uno
$ cat inventory_good/group_vars/managed_hosts_a/provision.yml
manager_hostname: manager_a

And the provisioning playbook now runs in parallel and can be limited to specific hosts:

Listing 18. The "good" way to loop over hosts
- name: provision hosts in a good way
  hosts: managed_hosts
  gather_facts: false
  become: false

  tasks:
    - name: create some file to simulate an API call to provision a host
      copy:
        content: "{{ provision_value }}\n"
        dest: "/tmp/good_{{ manager_hostname }}_{{ inventory_hostname }}.txt"
        force: true

The difference isn’t overwhelming in this simple setup, but you would certainly appreciate it more if the provisioning took half an hour instead of a fraction of a second:

Listing 19. Comparison of the execution times between the "good" and the "bad" implementation
$ ANSIBLE_STDOUT_CALLBACK=profile_tasks \
	ansible-playbook -i inventory_bad playbook_bad.yml
Saturday 23 October 2021  13:11:45 +0200 (0:00:00.040)       0:00:00.040 ******
Saturday 23 October 2021  13:11:45 +0200 (0:00:00.858)       0:00:00.899 ******
===============================================================================
create some file to simulate an API call to provision a host ------------ 0.86s
$ ANSIBLE_STDOUT_CALLBACK=profile_tasks \
	ansible-playbook -i inventory_good playbook_good.yml
Saturday 23 October 2021  13:11:55 +0200 (0:00:00.040)       0:00:00.040 ******
Saturday 23 October 2021  13:11:56 +0200 (0:00:00.569)       0:00:00.610 ******
===============================================================================
create some file to simulate an API call to provision a host ------------ 0.57s
If, for some reason, you can’t follow the recommendation, you can at least avoid duplicating too much information by indirectly referencing the hosts' variables, as in "{{ hostvars[item.name]['provision_value'] }}". Not so bad…
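In a real setup, the hosts to be provisioned usually don’t exist yet, so the API call is made from the controller rather than from the target host. Below is a sketch of the "good" playbook adapted accordingly: delegate_to runs the task on localhost, and throttle caps the number of concurrent calls. The value 2 is an arbitrary assumption, for instance to respect an API rate limit. Combined with --limit managed_hosts_a, only the hosts managed by manager_a are provisioned.

```yaml
- name: provision hosts in a good way, from the controller
  hosts: managed_hosts
  gather_facts: false   # the hosts may not exist yet, so no facts
  become: false

  tasks:
    - name: call the (simulated) provisioning API of the host's manager
      copy:
        content: "{{ provision_value }}\n"
        dest: "/tmp/good_{{ manager_hostname }}_{{ inventory_hostname }}.txt"
        force: true
      delegate_to: localhost   # run on the controller, not on the target
      throttle: 2              # at most 2 concurrent calls (assumed limit)
```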

6.5. Restrict your usage of variable types

Explanations
  • Avoid playbook and play variables, as well as include_vars. Opt for inventory variables instead.

  • Avoid using scoped variables unless required for runtime reasons, e.g. for loops and for temporary variables based on runtime variables. Another valid exception is when nested variables are too complicated to be defined at once.

Rationale

There are 22 levels of variable precedence. This is almost impossible to keep in mind for a "normal" human and can lead to all kinds of weird behaviours if not kept under control. In addition, the use of play(book) variables is not recommended, as it blurs the separation between code and data. The same applies to all constructs including specific variable files as part of the play (i.e. include_vars). By reducing the number of variable types, you end up with a simpler and more manageable list of variables. Here it is, together with some explanations of why each type has its specific precedence, so that they become easier to remember and use wisely:

  1. role defaults (defined in defaults/main.yml), they are…​ defaults and can be overwritten by anything.

  2. inventory vars, they truly represent your desired state. They have their own internal precedence (group before host) but that’s easy to remember.

  3. host facts don’t represent a desired state but the current state, and no other variable should have the same name (see Differentiate clearly between "As-Is" and "To-Be" information), so the precedence doesn’t really matter.

  4. role vars (defined in vars/main.yml) represent constants used by the role to separate code from data; they shouldn’t collide with the inventory variables either, but can be overwritten by extra vars if you know what you’re doing.

  5. scoped vars, at the block or task level, are local to their scope and hence internal to the role, and can’t collide with other variable types.

  6. runtime vars, defined by register or set_fact, take precedence over almost everything defined previously, which makes sense as they represent the current state of the automation.

  7. scoped params, at the role or include level this time, are admittedly a bit out of order and should be avoided to limit surprises.

  8. and lastly, extra_vars overwrite everything else (even runtime vars, which can be quite surprising).

We didn’t explicitly consider Workflow and Job Template variables, but for the purpose of this discussion they are all extra vars.

The following picture summarizes this list in a simplified, easier-to-remember way, highlighting which variables are meant to overwrite others:

Figure 2. Flow of variable precedences
Even if we write that variables shouldn’t overwrite each other, they still all share the same namespace and can potentially overwrite each other. It is your responsibility as automation author to make sure they don’t.
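Role defaults, inventory variables and extra vars can be observed in action with a minimal sketch, assuming a hypothetical role myrole and an inventory directory named inventory. Running ansible-playbook -i inventory playbook.yml should print the inventory value rather than the role default, and adding -e myrole_message=from_extra_vars should print the extra var instead.

```yaml
# roles/myrole/defaults/main.yml: lowest precedence, overridable by anything
myrole_message: from the role defaults

# roles/myrole/tasks/main.yml
- name: show which variable type wins
  debug:
    msg: "{{ myrole_message }}"

# inventory/group_vars/all/myrole.yml: desired state, overrides the default
myrole_message: from the inventory

# playbook.yml
- name: demonstrate variable precedence
  hosts: all
  gather_facts: false
  become: false

  roles:
    - myrole
```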

6.6. Prefer inventory variables over extra vars to describe the desired state

Explanations

Don’t use extra vars to define your desired state. Make sure your inventory completely describes what your environment is supposed to look like. Use extra vars only for troubleshooting, debugging or validation purposes.

Rationale

Inventory variables are typically kept in some kind of persistent, tracked storage (be it a database or Git), and should be your sole source representing your desired state, so that you can refer to it unambiguously. Extra vars, on the other hand, are bound to a specific job or ansible call and disappear together with its history.

Examples

Don’t use extra vars for the RAM size of a VM to be created, because this is part of the desired state of your environment, and nobody would know one year down the line whether the VM was really created with the proper RAM size according to the state of the inventory. You may use an extra variable to protect a critical part of a destructive playbook, something like are_you_really_really_sure: true/false, which is validated before, e.g., a VM is destroyed and recreated to change parameters which can’t be changed on the fly. You can also use extra vars to enforce fact values which can’t be reproduced easily, like overwriting ansible_memtotal_mb to simulate a RAM size fact of terabytes, to validate that your code can cope with it.
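Such a safeguard could be sketched as follows. The play name and the destructive tasks that would follow are hypothetical; only are_you_really_really_sure comes from the text above. The play stops unless it is called with -e are_you_really_really_sure=true, and the bool filter turns the string passed on the command line into a boolean.

```yaml
- name: recreate VMs whose parameters can't be changed on the fly
  hosts: all
  gather_facts: false
  become: false

  tasks:
    - name: refuse to continue unless explicitly confirmed via extra vars
      assert:
        that:
          - are_you_really_really_sure | default(false) | bool
        fail_msg: >-
          This play destroys and recreates VMs; re-run it with
          -e are_you_really_really_sure=true if you really mean it.

    # ... the destructive tasks would follow here (hypothetical)
```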

Another example could be the usage of no_log: "{{ no_log_in_case_of_trouble | default(true) }}" to exceptionally "uncover" the output of failing tasks even though they are security-relevant.
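Applied to a task, this could look as follows. The user module call, the admin account and the admin_password_hash variable are hypothetical. Calling the playbook with -e no_log_in_case_of_trouble=false reveals the task’s output while troubleshooting; otherwise it stays hidden.

```yaml
- name: set the password of the admin account
  user:
    name: admin                                # hypothetical account
    password: "{{ admin_password_hash }}"      # hypothetical variable
  # hidden by default, uncovered only on explicit request via extra vars
  no_log: "{{ no_log_in_case_of_trouble | default(true) }}"
```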

7. Plugins good practices

Work in Progress…

7.1. Python Guidelines

  • Review Ansible guidelines for modules and development.

  • Follow PEP 8.

  • File headers and functions should have comments for their intent.

8. Coding Style Good Practices for Ansible

It has proven useful to agree on certain guiding principles as early as possible in any automation project. Doing so makes it much easier to onboard new Ansible developers. Project guidelines can also be shared with other departments working on automation which in turn improves the re-usability of playbooks, roles, modules, and documentation.

Another major benefit is that it makes the code review process less time-consuming and more reliable, making both the developer and the reviewer more likely to engage in a constructive review conversation.

This section contains suggestions for such coding-style guidelines. The list is neither complete nor are all of the guidelines necessary in every automation project. Experience shows that it makes sense to start with a minimum set of guidelines, because the longer the list, the lower the chance of people actually reading through it. Additional guidelines can always be added later should the situation warrant it.

8.1. Naming things

  • Use valid Python identifiers following the standard snake_case naming convention (like snake_case_naming_schemes) for all YAML or Python files, variables, arguments, repositories, and other such names (like dictionary keys).

  • Do not use special characters other than underscore in variable names, even if YAML/JSON allow them.

    Explanation

    Using such variables in Jinja2 or Python would then be very confusing and probably not functional.

    Rationale

    Even if Ansible currently allows names that are not valid identifiers, it may stop allowing them in the future, as has already happened in the past. Making all names valid identifiers will avoid encountering problems in the future. Dictionary keys that are not valid identifiers are also less intuitive to use in Jinja2 (a dot in a dictionary key would be particularly confusing).

  • Use mnemonic, descriptive, human-readable names, and do not shorten them more than necessary. The pattern object[_feature]_action has proven useful because it keeps related roles and playbooks sorted together in the file system. Systems support long identifier names, so use them!

  • Avoid numbering roles and playbooks; you never know how they will be used in the future.

  • Name all tasks, plays, and task blocks to improve readability.

  • Write task names in the imperative (e.g. "Ensure service is running"); this communicates the action of the task.

  • Avoid abbreviations in names, or use capital letters for abbreviations where they cannot be avoided.
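    A minimal sketch of the naming guidelines above, assuming a hypothetical playbook file webserver_tls_configure.yml, a hypothetical host group webservers, and a hypothetical variable webserver_tls_certificate_dir. It shows snake_case names, the object[_feature]_action pattern, a named play, block, and task, imperative task names, and a capitalized abbreviation (TLS) where one cannot be avoided.

    ```yaml
    # Hypothetical file name: webserver_tls_configure.yml
    # (object = webserver, feature = tls, action = configure; no numbering prefix)
    - name: Configure TLS on the web servers
      hosts: webservers
      vars:
        webserver_tls_certificate_dir: /etc/pki/tls/certs
      tasks:
        - name: Prepare the TLS configuration
          block:
            - name: Ensure the TLS certificate directory exists
              ansible.builtin.file:
                path: "{{ webserver_tls_certificate_dir }}"
                state: directory
                mode: "0755"
    ```
    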

8.2. YAML and Jinja2 Syntax

  • Indent at two spaces

  • Indent list contents beyond the list definition

    Listing 20. Do this:
    example_list:
      - example_element_1
      - example_element_2
      - example_element_3
      - example_element_4
    Listing 21. Don’t do this:
    example_list:
    - example_element_1
    - example_element_2
    - example_element_3
    - example_element_4
  • Split long expressions into multiple lines.

    Rationale

    Long lines are difficult to read; many teams even enforce a line length limit of around 120-150 characters.

    Examples

    There are multiple ways to avoid long lines, but the most generic one is to use the YAML folding sign (>):

    Listing 22. Usage of the YAML folding sign
    - name: Call a very long command line
      ansible.builtin.command: >
        echo Lorem ipsum dolor sit amet, consectetur adipiscing elit.
        Maecenas mollis, ante in cursus congue, mauris orci tincidunt nulla,
        non gravida tortor mi non nunc.
    - name: Set a very long variable
      ansible.builtin.set_fact:
        meaningless_variable: >-
          Ut ac neque sit amet turpis ullamcorper auctor.
          Cras placerat dolor non ipsum posuere malesuada at ac ipsum.
          Duis a neque fermentum nulla imperdiet blandit.
    Use the sign >- if the trailing newline must not become part of the string (e.g. when defining a string variable).
  • If a when: condition results in a line that is too long and is an and expression, break it into a list of conditions.

    Rationale

    Ansible combines the list elements with a logical and (Ansible User Guide » Conditionals): multiple conditions that all need to be true can also be specified as a list, but beware of bare variables in when: conditions.

    Examples
    Listing 23. Do this:
    when:
      - myvar is defined
      - myvar | bool
    Listing 24. Instead of this:
    when: myvar is defined and myvar | bool
  • All roles need to, at a minimum, pass a basic syntax check (ansible-playbook --syntax-check run against a playbook that uses the role)

  • Spell out all task arguments in YAML style and do not use key=value type of arguments

    Listing 25. Do this:
    tasks:
      - name: Print a message
        ansible.builtin.debug:
          msg: This is how it's done.
    Listing 26. Don’t do this:
    tasks:
      - name: Print a message
        ansible.builtin.debug: msg="This is the exact opposite of how it's done."
  • Use true and false for boolean values in playbooks.

    Explanation

    Do not use yes and no as boolean values in YAML: they are booleans only in YAML 1.1, not in YAML 1.2. Also avoid the Python-style True and False for boolean values in playbooks.

    Rationale

    YAML 1.1 accepts all of these variants, whereas YAML 1.2 no longer treats yes/no (or on/off) as booleans. Using true/false consistently prepares playbooks for when YAML 1.2 becomes the default and avoids a massive migration effort.
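    A short illustration of consistent boolean values at play and task level, assuming a hypothetical host group webservers and the httpd service; the comments list the spellings to avoid.

    ```yaml
    - name: Configure the web servers
      hosts: webservers
      gather_facts: false  # not: no, False
      become: true         # not: yes, True
      tasks:
        - name: Ensure the web server is enabled at boot
          ansible.builtin.service:
            name: httpd
            enabled: true  # not: yes, True, or on
    ```
    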

  • Avoid comments in playbooks when possible. Instead, ensure that the task name value is descriptive enough to tell what a task does. Variables are commented in the defaults and vars directories and, therefore, do not need explanation in the playbooks themselves.

  • Use a single space to separate the template markers from the variable name inside all Jinja2 template points. For instance, always write {{ variable_name_here }}. The same applies if the value is an expression: {{ variable_name | default('hiya, doc') }}.

  • When naming files, use the .yml extension and not .yaml. .yml is what ansible-galaxy init uses when creating a new role skeleton.

  • Use double quotes for YAML strings, with the exception of strings inside Jinja2 expressions, which use single quotes.

  • Do not use quotes unless you have to, especially for short module-keyword-like strings such as present and absent. But do use quotes for user-facing strings such as descriptions, names, and messages.

  • Even though JSON is valid YAML and Ansible understands it, use JSON syntax only if it makes sense (e.g. an automatically generated variable file) or improves readability. When in doubt, stick to YAML; nobody expects JSON.
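    A minimal sketch of the spacing and quoting conventions above, using a hypothetical variable app_user_name: single spaces inside {{ }}, double quotes around the YAML string with single quotes inside the Jinja2 expression, a quoted user-facing description, and an unquoted keyword-like value.

    ```yaml
    - name: Create the application service account
      ansible.builtin.user:
        name: "{{ app_user_name | default('appsvc') }}"  # spaces inside {{ }}; single quotes inside Jinja2
        comment: "Application service account"          # user-facing string: quoted
        state: present                                  # keyword-like value: unquoted
    ```
    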

8.3. Ansible Guidelines

  • Ensure that all tasks are idempotent.

  • Keep in mind that Ansible variables use lazy evaluation: they are templated when they are used, not when they are defined.

  • Prefer the command module over the shell module unless you explicitly need shell functionality such as piping. Even better, use a dedicated module if one exists. If not, see the section about idempotency and check mode and make sure that your task is idempotent and supports check mode properly; your task will likely need options such as changed_when: and maybe check_mode:.

  • Whenever the command or shell module is used, add a comment in the code with the justification to help with future maintenance.
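    A sketch of a justified command task, assuming a hypothetical myapp CLI for which no dedicated module exists. Because the command only reads state, changed_when: false keeps the task idempotent in its reporting, and check_mode: false lets it run (harmlessly) in check mode too.

    ```yaml
    # Justification: no module exists for the hypothetical "myapp" CLI,
    # so ansible.builtin.command is used; the command only reads state.
    - name: Query the current myapp status
      ansible.builtin.command:
        cmd: myapp status
      register: myapp_status
      changed_when: false  # reading status never changes the system
      check_mode: false    # safe to run in check mode because it changes nothing
    ```
    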

  • Use the | bool filter when using bare variables (expressions consisting of just one variable reference without any operator) in when.
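    A minimal example of the | bool filter on a bare variable, assuming a hypothetical variable webserver_restart_requested that might be supplied as a string (for example through --extra-vars).

    ```yaml
    - name: Restart the web server when requested
      ansible.builtin.service:
        name: httpd
        state: restarted
      # Extra vars from the command line arrive as strings; | bool converts
      # values such as "false" or "no" into a real boolean.
      when: webserver_restart_requested | bool
    ```
    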

  • Break complex task files down into discrete parts.

    Rationale

    Task files that are very long and/or contain highly nested blocks are difficult to maintain. Breaking a large or complex task file into multiple discrete files makes it easier to read and understand what is being done in each part.
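    One way to split a large task file, sketched for a hypothetical webserver role whose stages live in hypothetical files install.yml, configure.yml, and service.yml next to main.yml.

    ```yaml
    # roles/webserver/tasks/main.yml (hypothetical role)
    # Each stage lives in its own task file, which keeps main.yml short.
    - name: Install the web server packages
      ansible.builtin.import_tasks: install.yml

    - name: Configure the web server
      ansible.builtin.import_tasks: configure.yml

    - name: Manage the web server service
      ansible.builtin.import_tasks: service.yml
    ```
    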

  • Use bracket notation instead of dot notation for value retrieval (e.g. item['key'] vs. item.key)

    Rationale

    Dot notation fails in some cases (such as when a key name includes a hyphen), and it’s better to stay consistent than to switch between the two options within a role or playbook. Additionally, some key names collide with attributes and methods of Python dictionaries, such as count, copy, title, and others (refer to the Ansible User Guide for an extended list).

    Example

    This post provides an excellent demonstration of how using dot notation syntax can impact your playbooks.
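    A small sketch of both failure modes, using a hypothetical dictionary lock_info such as data returned by an external API that you do not control (hence the hyphenated key). With bracket notation, Jinja2 looks up the dictionary key first, so both values are retrieved as expected.

    ```yaml
    - name: Read values whose keys need bracket notation
      vars:
        lock_info:
          keys: 3            # collides with the Python dict method keys()
          lock-type: shared  # hyphen: not a valid identifier
      ansible.builtin.debug:
        msg: "{{ lock_info['keys'] }} keys, type {{ lock_info['lock-type'] }}"
        # lock_info.keys would return the dict method instead of 3, and
        # lock_info.lock-type would be parsed as lock_info.lock minus type.
    ```
    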

  • Do not use meta: end_play.

    Rationale

    It ends the whole play for all hosts instead of only for the current host (which matters when the play targets multiple hosts). If absolutely necessary, consider using meta: end_host instead.
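    A minimal example of ending the play for one host only, assuming facts have been gathered; hosts that do not match the condition continue with the remaining tasks.

    ```yaml
    - name: Stop processing hosts that are not running RHEL
      ansible.builtin.meta: end_host
      when: ansible_facts['distribution'] != 'RedHat'
    ```
    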

  • Task names can be made dynamic by using variables (wrapped in Jinja2 templates); this makes the logs easier to read.

  • Do not use variables (wrapped in Jinja2 templates) for play names; variables don’t get expanded properly there. The same applies to loop variables (by default item) in task names within a loop. They, too, don’t get properly expanded and hence are not to be used there.

  • Do not override role defaults or vars or input parameters using set_fact. Use a different name instead.

    Rationale

    A fact set using set_fact cannot be unset, and it overrides the role default or role variable in all subsequent invocations of the role in the same playbook. Facts have their own precedence, which is not the highest, so in some cases overriding a given parameter will not work because the parameter has a higher precedence (Ansible User Guide » Using Variables).

  • Use the smallest scope for variables. Facts are global for the whole playbook run, so it is preferable to use other types of variables; therefore, limit (preferably avoid) the use of set_fact. Role variables are exposed to the whole play when the role is applied using roles: or import_role:. A more restricted scope, such as task or block variables, is preferred.
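    To illustrate this and the previous guideline, here is a sketch that derives a value from a hypothetical role default webapp_port (assumed to be 8080 in defaults/main.yml) without overriding it: the derived value gets its own name and lives only in block scope instead of being set with set_fact.

    ```yaml
    # Hypothetical role default (defaults/main.yml): webapp_port: 8080
    - name: Configure the application listener
      vars:
        # Block-scoped and differently named: webapp_port itself is never overridden.
        webapp_listen_address: "0.0.0.0:{{ webapp_port }}"
      block:
        - name: Show the computed listen address
          ansible.builtin.debug:
            msg: "Listening on {{ webapp_listen_address }}"
    ```
    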

  • Beware of ignore_errors: true, especially in tests. If you set it on a block, it will ignore all the asserts in the block, ultimately making them pointless.

  • Do not use the eq, equalto, or == Jinja tests introduced in Jinja 2.10; use the Ansible built-in match, search, or regex tests instead.

    Explanation

    The issue only affects Jinja versions older than 2.10. RPM distributions of Ansible generally use the underlying OS platform Python library for Jinja, e.g. python-jinja2. This is especially problematic on EL7: the only supported Ansible RPM on that platform is 2.9, which uses the EL7 platform python-jinja2 library, version 2.7 (which will likely never be upgraded). As of mid-2022, many users still run EL7 on the control node, which likely means that AAP 1.x users are affected as well. Users not affected:

    • AAP 2.x users: there should be an option to use EL8 runners, or otherwise to build the EEs in such a way that they use Jinja 2.11 or later

    • Users running Ansible from a pip install

    • Users running Ansible installed via RPM on EL8 or later

    Rationale

    These tests are not present in versions of Jinja older than 2.10, which are used on older controller platforms such as EL7. If you want to ensure that your code works on older platforms, use the built-in Ansible tests such as match, search, or regex instead.

    Example

    You have a list of dictionaries and want to filter out the elements whose type key has the value bad_type.

    Listing 27. Do this:
    tasks:
      - name: Do something
        some.module:
          param: "{{ list_of_dict | rejectattr('type', 'search', '^bad_type$') | list }}"
    Listing 28. Don’t do this:
    tasks:
      - name: Do something
        some.module:
          param: "{{ list_of_dict | rejectattr('type', 'eq', 'bad_type') | list }}"

    When using match, search, or regex for an exact match, you must anchor the regex as ^STRING$; otherwise, you will also match partial strings.

  • Avoid the use of when: foo_result is changed whenever possible. Use handlers, and, if necessary, handler chains to achieve this same result.
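    A minimal handler-based sketch, assuming a hypothetical template httpd.conf.j2 and the httpd service: the restart runs only when the configuration task reports a change, without any when: result is changed task.

    ```yaml
    tasks:
      - name: Deploy the web server configuration
        ansible.builtin.template:
          src: httpd.conf.j2
          dest: /etc/httpd/conf/httpd.conf
          mode: "0644"
        notify: Restart the web server  # replaces a later task with when: result is changed

    handlers:
      - name: Restart the web server
        ansible.builtin.service:
          name: httpd
          state: restarted
    ```
    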

  • Use the various include/import statements in Ansible.

    Explanation

    Doing so can simplify the code and reduce repetition. This is the closest that Ansible comes to callable subroutines, so use the same judgment as for callable routines to decide when to include a sub-playbook. Some good times to do so are:

    • When a set of tasks shares a single when condition

    • When a set of tasks is looped together over a list of items

    • When a single large role is doing many complicated tasks and cannot easily be broken into multiple roles, but the process proceeds in multiple related stages
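    A sketch of the first two cases, assuming hypothetical task files database_configure.yml and app_user_create.yml, a hypothetical host group databases, and a hypothetical list variable app_users. include_tasks is used because looping over an include requires the dynamic form.

    ```yaml
    - name: Configure the database stage only on database hosts
      ansible.builtin.include_tasks: database_configure.yml
      when: "'databases' in group_names"

    - name: Create each application user with the same set of tasks
      ansible.builtin.include_tasks: app_user_create.yml
      loop: "{{ app_users }}"
      loop_control:
        loop_var: app_user  # avoids clashing with item inside the included file
    ```
    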

  • Avoid calling the package module iteratively with the {{ item }} argument, as this is much slower than passing the whole list at once with name: "{{ foo_packages }}". The same applies to many other modules that accept an entire list of items at once.
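    A minimal example of passing the whole list in one module call, assuming a hypothetical list variable webserver_packages (for example [httpd, mod_ssl]).

    ```yaml
    - name: Install the web server packages in one call
      ansible.builtin.package:
        name: "{{ webserver_packages }}"  # the whole list, no loop over item
        state: present
    ```
    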

  • Use generic (OS-agnostic) modules when possible.

    Rationale

    This will allow our playbooks to run on the widest selection of operating systems possible without having to modify any more tasks than is necessary.

    Examples
    • Instead of using init-system-specific modules such as systemd or sysvinit, use the service module when at all possible.

    • Similarly for package management, use package instead of yum or dnf or similar.
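    For example, this task uses the generic service module, which picks the appropriate service manager on the target host; the httpd service name is an assumption for illustration.

    ```yaml
    - name: Ensure the web server is running and enabled
      ansible.builtin.service:
        name: httpd
        state: started
        enabled: true
    ```
    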

  • Avoid the use of lineinfile wherever feasible.

    Rationale

    Slight miscalculations in how it is used can lead to a loss of idempotence. Modifying config files with it can make the Ansible code arcane and difficult to read, especially for someone not familiar with the file in question. Try editing files with other modules (e.g. ini_file, blockinfile, xml), or by reading and parsing them. If you are modifying more than a tiny number of lines, or in a manner more than trivially complex, try leveraging the template module instead. This allows later users and maintainers to see the entire structure of the file. Any use of lineinfile should include a comment with a justification. Alternatively, many configuration files have their own modules, such as community.general.ssh_config or community.general.nmcli. Using these makes the code cleaner to read and helps ensure idempotence.
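    As a sketch of the alternative, the task below sets a single INI option instead of matching lines with lineinfile. It assumes the community.general collection is installed and uses a hypothetical file /etc/myapp/myapp.ini with a logging section.

    ```yaml
    - name: Set the log level in the application configuration
      community.general.ini_file:
        path: /etc/myapp/myapp.ini
        section: logging
        option: level
        value: info
        mode: "0644"
    ```
    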

  • Limit use of the copy module to copying remote files and to uploading binary blobs. For all other file pushes, use the template module. Even if there is nothing in the file that is being templated at the moment, having the file handled by the template module now makes adding that functionality much simpler than if the file is initially handled by the copy module and then needs to be moved before it can be edited.

  • When using the template module, append .j2 to the template file name.

    Example

    If you want to use the ansible.builtin.template module to create a file called example.conf somewhere on the managed host, name the template for this file templates/example.conf.j2.

    Rationale

    When you are at the stage of writing a template file, you usually already know how the file should end up looking on the file system, so at that point it is convenient to use Jinja2 syntax highlighting to make sure your templating syntax checks out. Should you need syntax highlighting for whatever language the target file is in, it is very easy to configure your editor to use, e.g., HTML syntax highlighting for all files ending in .html.j2. It is much less straightforward to automatically enable Jinja2 syntax highlighting for some files ending in .html.
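    A minimal task following this convention, assuming a hypothetical role that ships templates/example.conf.j2 and a hypothetical destination directory /etc/example: the template keeps the destination file name and only adds .j2.

    ```yaml
    # The role ships templates/example.conf.j2; the rendered file keeps the same base name.
    - name: Deploy the example configuration file
      ansible.builtin.template:
        src: example.conf.j2
        dest: /etc/example/example.conf
        owner: root
        group: root
        mode: "0644"
    ```
    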

  • Keep filenames and templates as close to the name on the destination system as possible.

    Rationale

    This will help with both editor highlighting as well as identifying source and destination versions of the file at a glance. Avoid duplicating the remote full path in the role directory, however, as that creates unnecessary depth in the file tree for the role. Grouping sets of similar files into a subdirectory of templates is allowable, but avoid unnecessary depth to the hierarchy.

  • Using agnostic modules like package only makes sense if the required features are very limited. In many cases, if the platform is different, the package name is also different, so using package doesn’t help much. If you need to differentiate anyway, prefer the more specific yum, dnf, or apt module.
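    A sketch of the case where differentiation is needed anyway: the Apache package is named httpd on RHEL-like systems and apache2 on Debian-like systems, so the tasks use the specific dnf and apt modules. It assumes facts have been gathered.

    ```yaml
    - name: Install the Apache web server on RHEL-like systems
      ansible.builtin.dnf:
        name: httpd
        state: present
      when: ansible_facts['os_family'] == 'RedHat'

    - name: Install the Apache web server on Debian-like systems
      ansible.builtin.apt:
        name: apache2
        state: present
      when: ansible_facts['os_family'] == 'Debian'
    ```
    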