
Installing and Configuring cloud-init in Linux

Hostman Team
Technical writer
Linux
26.09.2024
Reading time: 10 min

cloud-init is a free and open-source package designed for configuring Linux-based virtual machines during their startup.

In a traditional (home) environment, we would install a system from a CD or USB drive and configure it manually through a standard installer. In a cloud environment, however, systems need frequent reconfiguration, and instances are regularly created, deleted, and restarted, so configuring each one by hand quickly becomes impractical.

cloud-init automates the configuration process and standardizes the setup of virtual machines.

What Is cloud-init

The main task of cloud-init is to process input data (such as user data and metadata) and configure the virtual machine during its early boot. This allows us to pre-configure servers, install software, prepare working directories, and create users with specific permissions.

Cloud-init and Hostman Cloud Servers

Hostman cloud servers support working with cloud-init scripts through the control panel. Hostman’s documentation includes a brief guide on using cloud-init scripts directly on their cloud servers. Essentially, Hostman offers a text editor for cloud-init scripts accessible via a web browser, allowing users to pass configuration data directly to the utility before the system starts.

Installing Cloud-init

There are several ways to get a Linux OS with cloud-init:

  • Use a specialized Linux OS image with pre-installed cloud-init (we’ll mention some key examples below).

  • Use pre-built distributions from cloud providers (most cloud platforms support cloud-init, though the setup processes may vary).

  • Build a custom OS image using HashiCorp Packer.

  • Manually install the cloud-init package.

Cloud-init Images

  • Ubuntu: The best-known cloud-init images are the Ubuntu Cloud Images (for example, for Ubuntu 22.04), built by Canonical for public cloud use. These images are optimized and tailored for cloud tasks.

  • Debian: Similarly, Debian Cloud offers specialized cloud images for Debian users.

  • AlmaLinux: Another distribution with images designed for cloud deployment is AlmaLinux, through its AlmaLinux Cloud images.

  • VMware: VMware’s Photon image, built for cloud environments, also comes with pre-installed cloud-init.

Alternatively, you can install cloud-init manually.

Installation via APT

In most Linux distributions, cloud-init is installed like any other package and includes three systemd services located in the /lib/systemd/system/ directory:

  • cloud-init.service

  • cloud-config.service

  • cloud-final.service

Additionally, there are two more auxiliary systemd services:

  • cloud-init-local.service

  • cloud-init-hotplugd.service

Before installing, it's best to refresh the local package index:

sudo apt update

Then, install the cloud-init package via APT:

sudo apt install cloud-init

In some Linux images, cloud-init may already be installed by default. If so, the system will notify you after running the install command.

cloud-init also supports additional modules that expand configuration capabilities. The full list of modules is available in the official documentation.

Running cloud-init

cloud-init runs as a set of systemd services, so it starts early in the virtual machine's boot process, and its first stage runs before the system connects to the network. This allows for pre-configuring network settings, gateways, DNS addresses, etc.
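Network settings are not part of the #cloud-config user data used in the rest of this article: cloud-init reads them as a separate network configuration, which the cloud platform's datasource usually provides. As a minimal sketch, assuming you bake the settings into an image yourself, a drop-in file under /etc/cloud/cloud.cfg.d/ can carry a version 2 network configuration. The file name, the interface name eth0, and every address below are hypothetical.

```yaml
# /etc/cloud/cloud.cfg.d/50-static-network.cfg (hypothetical file name)
# Version 2 network configuration; eth0 and all addresses are placeholders.
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: false
      addresses:
        - 192.168.1.10/24
      routes:
        - to: 0.0.0.0/0        # default route
          via: 192.168.1.1     # gateway
      nameservers:
        addresses:             # DNS servers
          - 1.1.1.1
          - 8.8.8.8
```

On most cloud platforms you would not write this file by hand, since the datasource supplies equivalent settings.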

Cloud-init Workflow

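There are three main stages in cloud-init’s workflow, during which the system is configured. Each stage is handled by specific cloud-init services:

  1. Init: initial setup. cloud-init-local.service runs before the network is up: it finds local data sources and applies the network configuration. cloud-init.service then runs once networking is available: it processes user data and handles early tasks such as disk preparation, setting the hostname, and creating users and groups.

    • cloud-init-local.service

    • cloud-init.service

  2. Config: modules that don't affect the rest of the boot process run here, such as locale, time zone, NTP, and APT configuration.

    • cloud-config.service

  3. Final: the last steps run here, such as package updates and installation, user scripts (including the commands from runcmd), and the final status message.

    • cloud-final.service

The auxiliary cloud-init-hotplugd.service is not tied to a boot stage: it handles hotplug events, such as network interfaces added while the system is running.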

Cloud-init Modules

cloud-init offers additional modules that enhance system configuration. These modules run in sequence at various stages. Depending on the specific use case, they can be triggered during any of the three stages. Module execution is managed through three lists in the configuration file:

  • cloud_init_modules: Modules run during the init stage, by cloud-init.service once the network is up.

  • cloud_config_modules: Modules run during the configuration (config) stage, after the network is up.

  • cloud_final_modules: Modules run during the final stage.

In more detail, cloud-init’s stages can be broken down into five steps:

  1. systemd checks if cloud-init needs to run during system boot.

  2. cloud-init-local.service starts, locates local data sources, and applies the network configuration before the network comes up.

  3. Once the network is up, cloud-init.service processes user data and runs the modules listed under cloud_init_modules in the configuration file.

  4. During the config stage, cloud-init runs the modules listed under cloud_config_modules.

  5. In the final stage, cloud-init runs the modules from cloud_final_modules, for example installing the specified packages and running user scripts.
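To see where directives land, here is a small cloud-config sketch with one directive per stage. The stage noted for each directive assumes the stock Ubuntu module lists in /etc/cloud/cloud.cfg (other images may arrange modules differently), and the log path /var/log/stage-order.log is hypothetical.

```yaml
#cloud-config
# init stage (cloud_init_modules): bootcmd runs early, on every boot
bootcmd:
  - echo "bootcmd ran (init stage)" >> /var/log/stage-order.log

# config stage (cloud_config_modules): timezone
timezone: Etc/UTC

# the runcmd module saves these commands in the config stage,
# and the scripts_user module executes them in the final stage
runcmd:
  - echo "runcmd ran (final stage)" >> /var/log/stage-order.log

# final stage (cloud_final_modules): printed once cloud-init has finished
final_message: "cloud-init finished after $UPTIME seconds"
```

After the first boot, /var/log/stage-order.log should list the bootcmd line before the runcmd line, and the final message should also appear in /var/log/cloud-init-output.log.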

You can find more details on the cloud-init workflow in the official documentation.

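Each module also has a frequency that specifies how often it runs (in configuration files these values are written as once-per-instance, once, and always):

  • per instance: The module runs once for each new instance, identified by its instance ID. A new instance created from a clone or snapshot gets a new ID, so the module runs again there, while a plain reboot of the same instance does not rerun it.

  • per once: The module runs only once, on the first boot, even if the instance ID changes later.

  • per always: The module runs at every system startup.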
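One way to see these frequencies from user data is through cloud-init's script directories: executable files in /var/lib/cloud/scripts/per-boot/ run on every boot, and those in /var/lib/cloud/scripts/per-instance/ run once per instance (the scripts_per_boot and scripts_per_instance modules execute them in the final stage). A minimal sketch; the script names and the log path are hypothetical.

```yaml
#cloud-config
write_files:
  # "always": run by scripts_per_boot on every boot
  - path: /var/lib/cloud/scripts/per-boot/10-log-boot.sh
    permissions: '0755'
    content: |
      #!/bin/sh
      echo "boot at $(date)" >> /var/log/boot-history.log
  # "once per instance": run by scripts_per_instance once for each new instance ID
  - path: /var/lib/cloud/scripts/per-instance/10-first-boot.sh
    permissions: '0755'
    content: |
      #!/bin/sh
      echo "first boot of this instance at $(date)" >> /var/log/boot-history.log
```

Because write_files runs in the init stage, both scripts are already in place when the final stage runs, so each of them also runs on the very first boot.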

Cloud-init Configuration

In public (AWS, GCP, Azure, Hostman) or private clouds (OpenStack, CloudStack), the platform usually supplies the virtual machine with several kinds of data, which cloud-init reads through a platform-specific datasource:

  • User data (user-data): Configuration supplied by the user when the instance is created, for example through the control panel. It can contain cloud-config directives, shell scripts, files to write, and packages to install, and it typically configures a specific virtual machine instance.

  • Metadata (meta-data): Information about the environment provided by the platform, such as the instance ID and hostname.

  • Vendor data (vendor-data): Default configuration from the cloud provider. It uses the same formats as user data, and user data takes precedence when the two conflict.

On platforms with an EC2-compatible metadata service, metadata is typically served at http://169.254.169.254/latest/meta-data/ and user data at http://169.254.169.254/latest/user-data.
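For local testing without a metadata service, the NoCloud datasource reads the same kinds of data from plain files named meta-data and user-data, for example in /var/lib/cloud/seed/nocloud/ or on a small disk labeled CIDATA. A minimal meta-data file could look like this (the values are examples); the user-data file can hold any of the cloud-config scripts in this article.

```yaml
# meta-data for the NoCloud datasource (example values)
# A new instance-id makes cloud-init treat the machine as a new instance,
# so per-instance modules run again.
instance-id: iid-local01
local-hostname: cloudimg
```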

Cloud-init Scripts

When the system boots, cloud-init reads its YAML configuration files and the supplied user data, then executes the instructions they contain. YAML is a human-readable data serialization format; despite looking like markup, it is not a markup language (its name stands for "YAML Ain't Markup Language").

The primary YAML configuration file for cloud-init is located at /etc/cloud/cloud.cfg. This file serves as the main configuration script, with directives and parameters for specific cloud-init modules.

You can write cloud-init scripts as YAML (with #cloud-config on the first line) or as shell scripts (with a shebang such as #!/bin/sh on the first line).

Here’s a simple example of a cloud-init script setting a hostname:

#cloud-config
hostname: my-host
fqdn: my-address.com
manage_etc_hosts: true

In this example:

  • #cloud-config: indicates that the instructions are for cloud-init in YAML format.

  • hostname: sets the short hostname.

  • fqdn: sets the fully qualified domain name.

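  • manage_etc_hosts: allows cloud-init to manage the /etc/hosts file.

With true, cloud-init regenerates /etc/hosts from a template on every boot, so manual changes to the file are lost. If this option is set to false (the default), cloud-init leaves /etc/hosts alone and manual changes survive a reboot.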

Cloud-init Script Examples

Cloud-init configuration using YAML must start with the line #cloud-config; without it, cloud-init does not treat the data as cloud-config.

Users and Groups

When a virtual machine starts, you can predefine users with the users directive:

#cloud-config
users:
  - name: userOne
    gecos: This is the first user
    groups: sudo
    shell: /bin/sh
    system: true

  - name: userTwo
    gecos: This is the second user
    groups: sudo
    shell: /bin/bash
    system: false
    expiredate: '2030-01-02'

As shown, each new user entry begins with a dash, and parameters are specified in a "key: value" format.

These parameters mean:

  • name: User account name

  • gecos: Brief info about the user

  • groups: Groups the user belongs to

  • shell: Login shell for the user, given as an absolute path; the first user gets the simplest shell, /bin/sh.

  • system: If true, the account will be a system account without a home directory.

  • expiredate: The user's expiration date in the "YYYY-MM-DD" format.
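In practice, a user often needs SSH access and sudo rights. The sketch below assumes a hypothetical user named deploy and uses a placeholder public key. Note the default entry: when users is set without it, cloud-init does not create the image's default user (for example, ubuntu).

```yaml
#cloud-config
users:
  - default                          # keep the image's default user
  - name: deploy                     # hypothetical user name
    gecos: Deployment user
    groups: sudo
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL     # sudoers rule for this user
    lock_passwd: true                # disable password login
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... deploy@example   # placeholder, replace with a real public key
```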

Changing User Passwords

Another simple directive is chpasswd, used to reset an existing user's password. Example configuration:

#cloud-config
chpasswd:
  list: |
    userOne:passOne
    userTwo:passTwo
    userThree:passThree
  expire: false

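This sets new passwords for the listed users. The | symbol starts a multi-line string with one user:password pair per line. The expire parameter controls whether these passwords are marked as expired: with true, each user must change the password at the next login; with false, the password can be used as is.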
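Recent cloud-init releases deprecate the list form in favor of a users list with an explicit password type. A sketch of the same idea in that form; check the documentation for the cloud-init version on your image, since older releases only understand list.

```yaml
#cloud-config
chpasswd:
  expire: true              # users must change these passwords at the next login
  users:
    - name: userOne
      password: passOne
      type: text            # plain-text password
    - name: userTwo
      type: RANDOM          # cloud-init generates a password and prints it to the console
```

Plain-text passwords in user data are readable by anyone with access to it, so a hashed password (type: hash) or SSH keys are usually the safer choice.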

Updating the Repository List

cloud-config has a directive for updating the available package list: package_update. It's the declarative equivalent of running:

sudo apt update

It defaults to false, but cloud-init refreshes the package list anyway whenever the packages directive lists something to install or package_upgrade is set to true. To request the update explicitly:

#cloud-config
package_update: true
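Two related directives are often combined with it: package_upgrade upgrades the installed packages, and package_reboot_if_required reboots the instance if an upgrade requires it. A minimal sketch:

```yaml
#cloud-config
package_update: true              # refresh the package list
package_upgrade: true             # upgrade installed packages
package_reboot_if_required: true  # reboot if an upgraded package requires it
```

A full upgrade can noticeably lengthen the first boot, so it is a trade-off between boot time and starting from up-to-date packages.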

Installing Specific Packages

For updating or installing specific packages, use the packages directive:

#cloud-config
packages:
  - nginx
  - nodejs

Running Commands

The runcmd directive allows you to execute console commands through cloud-config. Simply pass a list of commands that cloud-init will run in sequence:

#cloud-config
runcmd:
  - echo 'This is a string command!' >> /somefile.txt
  - [ sh, -c, "echo 'This is a list command!' >> /somefile.txt" ]

Here, two types of commands are used:

  1. As a simple string.

  2. As a YAML list specifying the executable and its arguments.
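Commands often depend on packages installed by the packages directive. On stock images this works because both happen in the final stage, and package installation comes before the runcmd commands are executed. A sketch; the log path is hypothetical.

```yaml
#cloud-config
packages:
  - nginx
runcmd:
  # executed as root after the packages above are installed
  - systemctl enable --now nginx
  - [ sh, -c, "nginx -v >> /var/log/nginx-version.log 2>&1" ]
```

Output from runcmd commands that is not redirected ends up in /var/log/cloud-init-output.log, which is the first place to look when a command fails.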

Another similar directive is bootcmd. While runcmd runs commands only on the system's first boot, bootcmd runs commands on every boot:

#cloud-config
bootcmd:
  - echo 'Command that runs at every system boot!'

Creating and Running a Script

You can combine runcmd with the write_files directive to create and run a script:

#cloud-config
write_files:
  - path: /run/scripts/somescript.sh
    content: |
      #!/bin/bash
      echo 'This script just executed!'
    permissions: '0755'
runcmd:
  - [ bash, "/run/scripts/somescript.sh" ]  # run with bash to match the script's shebang

The permissions parameter (set to 0755) means the script is readable and executable by all, but only writable by the owner.
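When a file should belong to a user that cloud-init itself creates, ownership can be a problem: on stock images, write_files runs before users are created. The defer option postpones writing the file until the final stage. A sketch with a hypothetical user appuser and file path:

```yaml
#cloud-config
users:
  - default
  - name: appuser              # hypothetical application user
write_files:
  - path: /home/appuser/app.env
    owner: appuser:appuser     # this user must exist when the file is written
    permissions: '0600'        # read/write for the owner only
    defer: true                # write in the final stage, after users are created
    content: |
      APP_ENV=production
```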

Overriding Module Execution

You can override the list of modules to be executed at specific configuration stages. For example, the default cloud_config_modules list might look like this:

#cloud-config
cloud_config_modules:
  - emit_upstart
  - snap
  - ssh-import-id
  - locale
  - set-passwords
  - grub-dpkg
  - apt-pipelining
  - apt-configure
  - ubuntu-advantage
  - ntp
  - timezone
  - disable-ec2-metadata
  - runcmd
  - byobu

Remember, there is one module list per stage:

  • cloud_init_modules

  • cloud_config_modules

  • cloud_final_modules

If you remove runcmd, for example, the commands within it won’t execute.

Updating Repositories and Installing Packages via Shell Script

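cloud-init configurations can also consist purely of shell scripts. In this case, the script starts with #!/bin/sh instead of #cloud-config. cloud-init runs such a script as root during the final stage, so sudo is not needed:

#!/bin/sh
# Prevent debconf from waiting for interactive answers
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get -y install nodejs nginx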

The -y flag automatically answers "yes" to any prompts during installation.

Conclusion

In this guide, we covered the theoretical and practical aspects of using cloud-init:

  • How cloud-init works.

  • How to interact with cloud-init for system configuration.

  • Writing scripts in YAML or shell format.

  • Example configurations.

cloud-init runs early in the boot process and brings the instance to the desired configuration (network, directories, packages, updates). cloud-init uses modules for specific configuration tasks, and the system configuration is done in phases:

  • init (early setup, including network configuration)

  • config (after networking)

  • final (last stage)

More detailed information is available in the official documentation maintained by Canonical, the primary developer of Ubuntu.
