OpenStack:Ironic

From Define Wiki
Revision as of 10:14, 9 April 2020 by Mariusz (talk | contribs) (→‎Recommended post-deployment configuration: Add a link to CentOS 7 cloud images)

Reference diagram

Taken from docs for Genomics England:

[Image: reference diagram (file missing)]

Recommended kolla-ansible overrides

NOTE: Stein or newer is recommended. Rocky also works if multitenancy is not going to be enabled.

Overrides in /etc/kolla/globals.yml enabling Ironic and its Neutron agent (OpenStack Stein):

enable_ironic: "yes"
enable_ironic_neutron_agent: "yes"

Overrides in /etc/kolla/config/ironic/ironic-conductor.conf:

[DEFAULT]
enabled_network_interfaces = noop,flat
default_network_interface = flat

[pxe]
tftp_server = <ip-of-a-dedicated-interface>
pxe_append_params = nofb nomodeset vga=normal console=ttyS1,115200 console=tty0 sshkey="ssh-rsa AAAA..." ipa-debug=1 coreos.autologin

[deploy]
default_boot_option = local

[agent]
deploy_logs_collect = on_failure

[conductor]
clean_callback_timeout = 300

The IP for "tftp_server" can be the same as the IP of the interface on which the internal OpenStack APIs run on the host that runs Ironic (only a setup with a single controller hosting all Ironic components has been tested). Also set "sshkey" to the public key of your deploy node or headnode to enable SSH access to the node that is being deployed.

Add these overrides to /etc/kolla/config/neutron/ml2_conf.ini to make sure that Neutron's ML2 agent can create VLAN networks on Ironic's bridge:

[ml2_type_vlan]
network_vlan_ranges = physnet1:300:400

Replace physnet1 with the provider (physical network) name that's set up on the interface leading to the baremetal nodes, and replace the numbers with the VLAN ID range that you want your VLAN networks to be in (or simply remove the numbers to allow all VLAN IDs to be used).
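
To sanity-check a network_vlan_ranges value before redeploying, a small parser along these lines can help. This is only a sketch: the function name parse_vlan_range is hypothetical, and it handles a single entry in the two forms described above (physnet alone, or physnet:min:max), not every syntax Neutron may accept.

```python
def parse_vlan_range(entry):
    """Split one network_vlan_ranges entry into (physnet, first_id, last_id).

    A bare physical network name (no numbers) is returned with None for both
    IDs, matching the "allow all VLAN IDs" form.
    """
    parts = entry.strip().split(":")
    if len(parts) == 1 and parts[0]:
        return parts[0], None, None
    if len(parts) != 3 or not parts[0]:
        raise ValueError(f"expected 'physnet' or 'physnet:min:max', got {entry!r}")
    first, last = int(parts[1]), int(parts[2])
    # Valid 802.1Q VLAN IDs are 1-4094.
    if not (1 <= first <= last <= 4094):
        raise ValueError(f"invalid VLAN ID range {first}:{last}")
    return parts[0], first, last


if __name__ == "__main__":
    print(parse_vlan_range("physnet1:300:400"))
```

A reversed or out-of-bounds range raises ValueError instead of being silently accepted.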

Multitenancy

Deployments with multitenancy require a few more overrides.

First, enable the "neutron" network interface by replacing

[DEFAULT]
enabled_network_interfaces = noop,flat

with

[DEFAULT]
enabled_network_interfaces = noop,flat,neutron

in /etc/kolla/config/ironic/ironic-conductor.conf. You can also change default_network_interface to neutron if you want the multitenant driver to be the default for all nodes. The network driver can be set on each node individually, so you might as well leave the default to be flat and only enable the driver on selected (or all) nodes.

The "neutron" Ironic driver requires specifying provisioning and cleaning networks upfront in the config file, so add these lines to /etc/kolla/config/ironic/ironic-conductor.conf:

[neutron]
cleaning_network = <provisioning-network-UUID>
provisioning_network = <provisioning-network-UUID>

and fill in the UUID of your provisioning network. The provisioning network should be a network (preferably flat, but it can also be VLAN tagged) with its external router set to the gateway configured for this network on the physical switch.

There are also extra overrides for the ML2 driver which should be added to /etc/kolla/config/neutron/ml2_conf.ini. Here is an example for a Supermicro switch (confirmed working on SSE-G24-TG4 and MBM-GEM-004):

[ml2]
mechanism_drivers = openvswitch,baremetal,l2population,genericswitch

[genericswitch:supermicro_1]
#device_type = netmiko_supermicro
device_type = netmiko_cisco_s300
ip = <IP-for-ssh-access-to-baremetal-switch>
username = <admin-user-on-switch>
password = <admin-password-on-switch>

[ml2_type_vlan]
network_vlan_ranges = physnet1:100:200

Here mechanism_drivers adds genericswitch to the list of enabled drivers, and the driver is then configured in the [genericswitch:supermicro_1] section. The existing netmiko driver for Cisco IOS is used, and the driver expects SSH access to be configured on the switch. Note that the password specified in this config is stored in plain text (not encrypted). Lastly, a range of VLANs is specified on the provider interface that connects to the baremetal nodes, so only a subset of VLAN IDs is used for tenant networks.

Recommended post-deployment configuration

NOTE: Steps described here are for localboot nodes and Ironic without multitenancy only.

First, build CoreOS-based Ironic Python Agent (IPA) deploy images. Here are the commands to set up your IPA image building environment on Ubuntu Xenial:

$ sudo apt-get update
$ sudo apt-get install docker.io gzip uuid-runtime cpio findutils grep gnupg cgroup-lite git build-essential python-pip python-dev -y
$ sudo service docker start
$ git clone https://git.openstack.org/openstack/ironic-python-agent
$ cd ironic-python-agent/imagebuild/coreos
$ git checkout cf30024f96e798f5607c9664b0b6236db3232119
$ sudo pip install -r ~/ironic-python-agent/requirements.txt
$ sudo make

The resulting kernel image and ramdisk will be available in the UPLOAD subdirectory.

NOTE: Ready to use deploy images are available here: http://185.93.31.43/ironic-images/

Transfer/mount deploy images and add them to Glance:

$ openstack image create --public --container-format aki --disk-format aki --file ~/coreos_production_pxe.vmlinuz ironic-deploy_kernel
$ openstack image create --public --container-format ari --disk-format ari --file ~/coreos_production_pxe_image-oem.cpio.gz ironic-deploy_ramdisk

Also add regular operating system images - no special options are required here and regular cloud images can be used. Here is an example for a CentOS image downloaded from https://cloud.centos.org/centos/7/images/:

$ openstack image create --public --container-format bare --disk-format qcow2 --file ~/CentOS-7-x86_64-GenericCloud-1907.qcow2 centos7-1907

Create a flavour for your baremetal nodes:

$ openstack flavor create --ram 1024 --vcpus 2 --disk 100 baremetal.small
$ openstack flavor set --property resources:CUSTOM_BAREMETAL=1 baremetal.small
$ openstack flavor set --property resources:VCPU=0 baremetal.small
$ openstack flavor set --property resources:MEMORY_MB=0 baremetal.small
$ openstack flavor set --property resources:DISK_GB=0 baremetal.small

Create a provisioning network and a subnet. These can be just regular flat provider networks, or can be using VLANs. Here is an example for an interface on a VLAN:

$ openstack network create public304 --provider-physical-network physnet1 --provider-network-type vlan --provider-segment 304
$ openstack subnet create --dhcp --allocation-pool start=10.6.44.101,end=10.6.47.254 --network public304 --subnet-range 10.6.44.0/22 --gateway 10.6.44.1 public304-subnet
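
Before running openstack subnet create, it can be worth checking that the allocation pool and gateway actually fit the subnet range. The sketch below uses only Python's standard ipaddress module. The helper name check_allocation_pool is hypothetical, and rejecting a gateway that falls inside the pool is an assumption about how Neutron validates subnets.

```python
import ipaddress


def check_allocation_pool(subnet_range, start, end, gateway):
    """Return the number of addresses in the pool, or raise ValueError."""
    network = ipaddress.ip_network(subnet_range)
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    gw = ipaddress.ip_address(gateway)
    # Every address must belong to the subnet.
    for label, addr in (("start", first), ("end", last), ("gateway", gw)):
        if addr not in network:
            raise ValueError(f"{label} address {addr} is outside {network}")
    if first > last:
        raise ValueError(f"pool start {first} is after pool end {last}")
    # Assumption: the gateway must not be handed out by DHCP.
    if first <= gw <= last:
        raise ValueError(f"gateway {gw} falls inside the allocation pool")
    return int(last) - int(first) + 1


if __name__ == "__main__":
    # Values taken from the example subnet above.
    size = check_allocation_pool(
        "10.6.44.0/22", "10.6.44.101", "10.6.47.254", "10.6.44.1"
    )
    print(size)
```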

Finally, add your nodes to Ironic's database:

$ openstack baremetal node create --name <node-name> \
--driver ipmi --driver-info ipmi_username=<ipmi-username> --driver-info ipmi_password=<ipmi-password> --driver-info ipmi_address=<ipmi-address> \
--driver-info cleaning_network=<uuid-of-the-provisioning-network> --driver-info provisioning_network=<uuid-of-the-provisioning-network> \
--driver-info deploy_kernel=<uuid-of-the-ironic-deploy_kernel-image> --driver-info deploy_ramdisk=<uuid-of-the-ironic-deploy_ramdisk-image> \
--resource-class baremetal --network-interface flat
$ openstack baremetal port create <mac-address-of-the-node's-provisioning-interface> --node <node-uuid>
$ openstack baremetal node manage <node-name>
$ openstack baremetal node provide <node-name>

If everything went well, you can now deploy nodes by launching instances, like so:

$ openstack server create --image <image-to-provision-node-with> --flavor baremetal.small --security-group ping-and-ssh --key-name mykey --network <name-of-the-provisioning-network> <instance-name>

Multitenancy

When adding a node to Ironic, use --network-interface neutron instead of --network-interface flat.

Next up, when adding a port, you'll need to specify details about the switch port this baremetal node is connected to:

$ openstack baremetal port set --local-link-connection switch_info="supermicro_1" --local-link-connection switch_id="<MAC-address-of-the-switchport>" --local-link-connection port_id="gi 0/<port-number>" <port-UUID>

Here switch_info is the name given to the genericswitch config section in ml2_conf.ini (supermicro_1 in the example above), and port_id is the name of the switch port. On Supermicro switches this can be gi 0/1, gi 0/2, etc. (it is the name that you pass to the interface <intname> command on the switch). The switch_id is not that important, as it is not part of the switch port configuration, but it has to be a MAC address, so it makes sense to use the MAC of the switch port the node is connected to.
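
When many ports have to be registered, the --local-link-connection arguments can be generated rather than typed by hand. The following is a minimal sketch. The helper name local_link_args is hypothetical, and the MAC check only reflects the requirement stated above that switch_id must be a MAC address.

```python
import re

# Six colon-separated pairs of hex digits, e.g. b8:59:9f:e2:37:0e.
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def local_link_args(switch_info, switch_id, port_id):
    """Build the --local-link-connection arguments for one baremetal port."""
    mac = switch_id.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"switch_id must be a MAC address, got {switch_id!r}")
    fields = (
        ("switch_info", switch_info),
        ("switch_id", mac),
        ("port_id", port_id),
    )
    args = []
    for key, value in fields:
        args += ["--local-link-connection", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # The list can be appended to ["openstack", "baremetal", "port", "set"]
    # and passed to subprocess.run, which avoids shell quoting of "gi 0/1".
    print(local_link_args("supermicro_1", "b8:59:9f:e2:37:0e", "gi 0/1"))
```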

With these in place, you can now create a tenant network on the same provider as the provisioning network and with VLAN as provider type. You don't need to specify VLAN ID, as this one will be selected by Neutron (from the range of VLANs in the ML2 config) and created on the baremetal switch. Also, add to this network a subnet with whatever IP range and DHCP enabled.

Then when launching a baremetal instance, in the --network parameter you specify the tenant network (instead of the provisioning network). Ironic will first provision the node in the provisioning network and when this is done, Neutron will add switchport of the node to the VLAN of the tenant network, so after a reboot the instance will be available in the tenant network.

NOTE: In Stein there is a bug where the genericswitch driver (for Cisco IOS) skips one of the switch commands. As a workaround, make sure that all baremetal switchports are set to "access" mode.

References

  1. https://docs.openstack.org/kolla-ansible/rocky/reference/ironic-guide.html#post-deployment-configuration
  2. https://docs.openstack.org/ironic/rocky/install/configure-glance-images.html
  3. https://github.com/openstack/ironic-python-agent/tree/cf30024f96e798f5607c9664b0b6236db3232119/imagebuild/coreos
  4. https://docs.openstack.org/ironic/stein/admin/multitenancy.html

Software RAID

Support for software RAID has been added in the Train release, so if you're on an older release, you'll have to add the following parameters to your globals.yml:

ironic_tag: "train"
ironic_install_type: "source"

and then run a kolla-ansible deploy with the "ironic" tag. Make sure that support for software RAID is present in the revision of the Ironic Python Agent that you're using for building deploy images (or use images available on http://185.93.31.43/ironic-images/).

Reconfigure your Ironic node(s) for software RAID:

# openstack baremetal node set <node-name> --raid-interface agent
# cat ~/raid_1_basic.json
{
 "logical_disks": [
                   {
                    "size_gb": "MAX",
                    "raid_level": "1",
                    "controller": "software"
                   }
                  ]
}

# openstack baremetal node set <node-name> --target-raid-config ~/raid_1_basic.json

Feel free to edit the JSON file (the above is just a basic one for setting up a RAID-1 on 2 disks), but bear in mind limitations of the software RAID driver (check links in References). Here is a more advanced config - with 2 RAID-1 arrays on 2 disks - the first array (the one of size 50GB) is where the operating system will go:

{
 "logical_disks": [
                   {
                    "size_gb": "50",
                    "raid_level": "1",
                    "controller": "software",
                    "is_root_volume": true
                   },
                   {
                    "size_gb": "MAX",
                    "raid_level": "1",
                    "controller": "software"
                   }
                  ]
}
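
A hand-edited target RAID config is easy to break, so it may be worth checking the file before passing it to --target-raid-config. The sketch below is not Ironic's own validation. The function name check_raid_config is hypothetical, and the checks cover only what the examples on this page use (the "software" controller, a size_gb that is a number or "MAX", and at most one root volume). It catches syntax slips such as a missing comma between entries.

```python
import json


def check_raid_config(text):
    """Parse a target RAID config and return the number of logical disks."""
    config = json.loads(text)  # raises json.JSONDecodeError on bad syntax
    disks = config["logical_disks"]
    if not disks:
        raise ValueError("logical_disks is empty")
    roots = 0
    for disk in disks:
        if disk.get("controller") != "software":
            raise ValueError("every logical disk must use the software controller")
        size = disk.get("size_gb")
        if size != "MAX" and not str(size).isdigit():
            raise ValueError(f"size_gb must be a number or 'MAX', got {size!r}")
        if disk.get("is_root_volume"):
            roots += 1
    if roots > 1:
        raise ValueError("only one logical disk can be the root volume")
    return len(disks)


if __name__ == "__main__":
    example = """
    {"logical_disks": [
        {"size_gb": "50", "raid_level": "1", "controller": "software",
         "is_root_volume": true},
        {"size_gb": "MAX", "raid_level": "1", "controller": "software"}
    ]}
    """
    print(check_raid_config(example))
```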

Now the node is ready for RAID creation. In Ironic this is done during the "cleaning" stage.

# openstack baremetal node manage <node-name>
# cat ~/raid_cleaning_steps.json
[{
  "interface": "raid",
  "step": "delete_configuration"
},
{
  "interface": "deploy",
  "step": "erase_devices_metadata"
},
{
  "interface": "raid",
  "step": "create_configuration"
}]
# openstack baremetal node clean <node-name> --clean-steps ~/raid_cleaning_steps.json

There is no need to modify this "cleaning steps" JSON, it should work for any RAID configuration.

This should put the node in the "cleaning" state. Wait for the node to get back to the "manageable" state (feel free to fire up the virtual console of this node to keep an eye on the process). Check ironic-conductor logs for errors if the node ends up in "clean failed" instead.

When the node is "manageable" again, add this root_device hint specifying your array as the root device and give the node back to Nova:

# openstack baremetal node set <node-name> --property 'root_device={"name": "/dev/md127"}'
# openstack baremetal node provide <node-name>

To provision baremetal instances capable of booting from a software RAID, you'll need your target image (the one that you're flashing the system with) to include mdadm. This is already done in Ubuntu cloud images (at least in Xenial and Bionic), but requires baking a new image when using CentOS 7 cloud images (at least for build 1907 and older). Here are all the mdadm-related commands to use when baking a custom image for CentOS 7:

# yum install mdadm -y
# vi /etc/dracut.conf
# grep hostonly /etc/dracut.conf
hostonly="no"
# dracut -f --add mdraid --add-drivers xfs -v

When baking this image you can also add in your custom cloud-init networking config and set root password if you want.

Instead of baking your own images, you can use qcow2 images available here: http://185.93.31.43/ironic-images/

Finally, launch an instance from this new image using the baremetal flavour.

References

  1. https://docs.openstack.org/ironic/train/admin/raid.html
  2. https://techblog.web.cern.ch/techblog/post/ironic_software_raid/

Troubleshooting

First, make sure all Ironic containers are up and running. Here is a list from a working Stein-based environment:

# docker ps | grep ironic
...        registry.vscaler.com:5000/kolla/centos-source-ironic-pxe:stein                  "dumb-init --single-…"   ...   ironic_pxe
...        registry.vscaler.com:5000/kolla/centos-source-ironic-api:stein                  "dumb-init --single-…"   ...   ironic_api
...        registry.vscaler.com:5000/kolla/centos-source-ironic-conductor:stein            "dumb-init --single-…"   ...   ironic_conductor
...        registry.vscaler.com:5000/kolla/centos-binary-ironic-neutron-agent:stein        "dumb-init --single-…"   ...   ironic_neutron_agent
...        registry.vscaler.com:5000/kolla/centos-binary-nova-compute-ironic:stein         "dumb-init --single-…"   ...   nova_compute_ironic

Re-run kolla-ansible deploy with tags "ironic" if ironic_pxe, ironic_api or ironic_conductor are missing, "neutron" if ironic_neutron_agent is missing or "nova" if nova_compute_ironic is missing.

Compare the list of baremetal nodes outputted by openstack baremetal node list with the list of hypervisors from openstack hypervisor list and make sure each node's UUID is on the list of hypervisors. Check ironic-conductor logs if they're not. Fix the problem, remove the problematic nodes and add them back in.
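
The comparison can also be scripted. The sketch below assumes the two lists were saved with the CLI's JSON formatter (openstack baremetal node list -f json and openstack hypervisor list -f json) and that the JSON keys match the table column names "UUID", "Name" and "Hypervisor Hostname". Verify those keys against your own output, as they may differ between releases. The function name nodes_missing_from_nova is hypothetical.

```python
import json


def nodes_missing_from_nova(nodes_json, hypervisors_json):
    """Return (uuid, name) for every Ironic node that is not a Nova hypervisor."""
    nodes = json.loads(nodes_json)
    hypervisors = json.loads(hypervisors_json)
    # Assumption: for Ironic, the hypervisor hostname is the node UUID.
    known = {h["Hypervisor Hostname"] for h in hypervisors}
    return [(n["UUID"], n["Name"]) for n in nodes if n["UUID"] not in known]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    nodes = json.dumps([
        {"UUID": "aaaa", "Name": "node001"},
        {"UUID": "bbbb", "Name": "node002"},
    ])
    hypervisors = json.dumps([{"Hypervisor Hostname": "aaaa"}])
    for uuid, name in nodes_missing_from_nova(nodes, hypervisors):
        print(f"{name} ({uuid}) is not registered as a hypervisor")
```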

Keep an eye on the iKVM virtual console when building an instance and deal with any DHCP, TFTP or PXE errors that show up. {TODO: Add examples of such problems and solutions to them} If the machine PXE boots successfully, it will boot the deploy image, which gives you access to the coreos user. Make sure that the machine can reach the internal OpenStack API.

If your instance shows up as ACTIVE, but you can't ping it, you should delete the instance and create it again using an image that has root password enabled so you can log into the machine through its OOB console. To add a root password to regular cloud images, you can use the virt-customize tool. Here is a command adding a root password to a CentOS 7 image:

virt-customize -a CentOS-7-x86_64-GenericCloud-1907.qcow2 --root-password password:vScaler2019?!

Cloud-init only brings up the first connected interface, so in most cases this means an onboard interface. If this is not the interface you PXE boot from, you may want to bake a custom config for cloud-init, like this one:

network:
  version: 2
  ethernets:
    enp59s0f0:
      dhcp4: true

or this more advanced one - with a bond:

network:
  version: 2
  ethernets:
    eno1:
      dhcp4: true
  bonds:
    bond0:
      interfaces:
        - enp175s0f0
        - enp175s0f1
      dhcp4: true
      parameters:
        mode: 802.3ad
        mii-monitor-interval: 100

Put your config in /etc/cloud/cloud.cfg.d/custom-networking.cfg

Baremetal instances have their ports marked as DOWN in Neutron, like this:

# openstack server list
+--------------------------------------+------------+--------+-----------------------------+--------------+-----------------+
| ID                                   | Name       | Status | Networks                    | Image        | Flavor          |
+--------------------------------------+------------+--------+-----------------------------+--------------+-----------------+
| 8314b478-310b-4d2a-a316-180c380ab5cf | compute001 | ACTIVE | ironic-prov=192.168.202.104 | centos7-1907 | baremetal.small |
+--------------------------------------+------------+--------+-----------------------------+--------------+-----------------+
# openstack port list | grep 192.168.202.104
| 2d43f9c2-e2a1-420c-8bb3-63a2d01e73c0 |      | b8:59:9f:e2:37:0e | ip_address='192.168.202.104', subnet_id='c54051ef-6344-4026-80da-529f57df6213' | DOWN   |
# openstack baremetal port list | grep b8:59:9f:e2:37:0e
| 957ed1c1-416e-468e-a87b-f14a7eaf5255 | b8:59:9f:e2:37:0e |
# openstack baremetal port show 957ed1c1-416e-468e-a87b-f14a7eaf5255 | grep node
| node_uuid             | 1907746d-4663-48d5-9b93-bd63165db382                             |
# openstack baremetal node list | grep 1907746d-4663-48d5-9b93-bd63165db382
| 1907746d-4663-48d5-9b93-bd63165db382 | node011 | 8314b478-310b-4d2a-a316-180c380ab5cf | power on    | active             | False       |

This is normal and nothing to worry about. {TODO: Why is this port marked as DOWN?}

If you know your node will end in ERROR state and you want to peek inside the deploy image during provisioning, here is how to do this:

  1. Launch a baremetal instance on the node.
  2. As soon as the instance gets scheduled to the node (check that the instance UUID shows up in the output of openstack baremetal node list), put your node in maintenance mode by running openstack baremetal node maintenance set <node-name>.
  3. Get into the console of the node (for example using SOL) and stop the IPA service: systemctl stop ironic-python-agent.
  4. Check journald logs, make changes, etc. In CoreOS-based deploy images all IPA-related files are in /opt/ironic-python-agent.
  5. When you're done, remove your node from maintenance (openstack baremetal node maintenance unset <node-name>) and start the IPA service (systemctl start ironic-python-agent).

Common error messages

Dec 13 13:30:06 host-192-168-202-51 chroot[1699]: 2019-12-13 13:30:06.652 1699 ERROR root [-] Command execution error: Command execution failed: Installing GRUB2 boot loader to device /dev/md127 failed with Unexpected error while running command.
Dec 13 13:30:06 host-192-168-202-51 chroot[1699]: Command: chroot /tmp/tmpbMmsW1 /bin/sh -c "grub2-install /dev/sda"
Dec 13 13:30:06 host-192-168-202-51 chroot[1699]: Exit code: 1
Dec 13 13:30:06 host-192-168-202-51 chroot[1699]: Stdout: u''
Dec 13 13:30:06 host-192-168-202-51 chroot[1699]: Stderr: u"Installing for i386-pc platform.\ngrub2-install: error: disk `md127,1' not found.\n".: CommandExecutionError: Command execution failed: Installing GRUB2 boot loader to device /dev/md127 failed with Unexpected error while running command.
Dec 13 13:30:06 host-192-168-202-51 chroot[1699]: Command: chroot /tmp/tmpbMmsW1 /bin/sh -c "grub2-install /dev/sda"
Dec 13 13:30:06 host-192-168-202-51 chroot[1699]: Exit code: 1
Dec 13 13:30:06 host-192-168-202-51 chroot[1699]: Stdout: u''
Dec 13 13:30:06 host-192-168-202-51 chroot[1699]: Stderr: u"Installing for i386-pc platform.\ngrub2-install: error: disk `md127,1' not found.\n".

Source: ironic-conductor logs

Your target image doesn't support software RAID. Use/bake an image with the mdadm package installed and mdadm support in the initrd enabled.


2019-12-13 13:04:53.424 6 ERROR ironic.drivers.modules.agent_base_vendor [req-e55e913d-1216-4072-b1cc-94cdca5b72b1 - - - - -] Asynchronous exception: Node failed to deploy. Exception: Deploy failed for instance 66956849-b57c-49b7-ac78-9a0b4f2117b7. Error: Unexpected error while running command.
Command: sudo ironic-rootwrap /etc/ironic/rootwrap.conf dd if=/var/lib/ironic/images/baf22641-4698-450f-a26a-ecca012b4a24/disk of=/dev/disk/by-path/ip-192.168.202.51:3260-iscsi-iqn.2008-10.org.openstack:baf22641-4698-450f-a26a-ecca012b4a24-lun-1 bs=1M oflag=direct
Exit code: 1
Stdout: u''
Stderr: u"/bin/dd: error writing '/dev/disk/by-path/ip-192.168.202.51:3260-iscsi-iqn.2008-10.org.openstack:baf22641-4698-450f-a26a-ecca012b4a24-lun-1': No space left on device\n8192+0 records in\n8191+0 records out\n8588886016 bytes (8.6 GB) copied, 17.2107 s, 499 MB/s\n" for node baf22641-4698-450f-a26a-ecca012b4a24: InstanceDeployFailure: Deploy failed for instance 66956849-b57c-49b7-ac78-9a0b4f2117b7. Error: Unexpected error while running command.

Source: ironic-conductor logs

Ironic Python Agent tries to dd your image to a partition instead of a block device. Either clean the node (openstack baremetal node clean...) to get rid of the partition, or set the root_device hint [2] on the node to your block device (for example /dev/md127 for software RAID).


2019-12-16 09:57:00.598 6 ERROR ironic.drivers.modules.agent_base_vendor [-] Agent returned error for clean step {u'interface': u'raid', u'priority': 0, u'step': u'create_configuration', u'abortable': False} on node c3ebcf94-a4db-49bf-927e-1bd186b01a71 : {u'message': u"Clean step failed: Error performing clean_step create_configuration: Software RAID caused unknown error: Failed to create partitions on /dev/sda: Unexpected error while running command.\nCommand: parted /dev/sda -s -a optimal -- mkpart primary 2048s -1\nExit code: 1\nStdout: u''\nStderr: u'Error: Partition(s) 1 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.\\n'", u'code': 500, u'type': u'CleaningError', u'details': u"Error performing clean_step create_configuration: Software RAID caused unknown error: Failed to create partitions on /dev/sda: Unexpected error while running command.\nCommand: parted /dev/sda -s -a optimal -- mkpart primary 2048s -1\nExit code: 1\nStdout: u''\nStderr: u'Error: Partition(s) 1 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.\\n'"}.

Source: ironic-conductor logs

Cleanup has failed. Run the openstack baremetal node clean... command again.


2019-10-22 12:32:41.312 6 ERROR ironic.conductor.utils [req-37fbce85-8a88-4a61-9a4a-22d6f37dc398 - - - - -] Asynchronous exception: Node failed to deploy. Exception: Connection to agent failed: Failed to connect to the agent running on node d4b3434e-047c-4715-8e89-dfd849477cdc for invoking command image.install_bootloader. Error: HTTPConnectionPool(host='172.16.100.97', port=9999): Read timed out. (read timeout=60) for node: AgentConnectionFailed: Connection to agent failed: Failed to connect to the agent running on node d4b3434e-047c-4715-8e89-dfd849477cdc for invoking command image.install_bootloader. Error: HTTPConnectionPool(host='172.16.100.97', port=9999): Read timed out. (read timeout=60)

Source: ironic-conductor logs

Root cause of this error is unknown, but the problem disappears after a short while.


# openstack server show test_ironic | grep fault
| fault                               | {u'message': u"Host 'controller003-ironic' is not mapped to any cell", u'code': 400, u'created': u'2019-12-11T12:10:15Z'} |

Source: the openstack CLI

This can happen if the nova-compute-ironic containers are not up when you add baremetal nodes to Ironic. Make sure these containers are fully functional and then run nova-manage cell_v2 map_cell_and_hosts from one of the nova-api containers to fix the problem.

References

  1. https://cloudinit.readthedocs.io/en/18.4/topics/network-config-format-v2.html
  2. https://docs.openstack.org/ironic/train/install/advanced.html#specifying-the-disk-for-deployment-root-device-hints

Ironic POC basic test environment

[Image: diagram of the basic test environment (file missing)]

Revision of kolla-ansible used (branch stable/rocky):

commit 668da3c332fcd58fa2b023e8bb74ca8225e222bc
Author: Jeffrey Zhang <zhang.lei.fly@gmail.com>
Date:   Tue Dec 11 16:01:03 2018 +0800

    Add cache configuration for ceilometer project

    when using ceilometer+gnocchi, for every notification sample, ceilometer
    will update the resource even if is not updated.

    We should add [cache] section to make ceilometer cache the resource, and
    stop send the useless update request.

    Closes-Bug: #1807841
    Change-Id: Ic33b4cd5ba8165c20878cab068f38a3948c9d31d
    (cherry picked from commit 55bf29ec6c459dc46cefdee69acb8e427763e409)

Standard all-in-one inventory has been used.

kolla-ansible config (/etc/kolla/globals.yml):

---
config_strategy: "COPY_ALWAYS"
kolla_base_distro: "centos"
kolla_install_type: "binary"
openstack_release: "7.0.2"
kolla_internal_vip_address: "192.168.10.254"
kolla_external_vip_address: "172.28.128.254"
docker_registry: "registry.vscaler.com:5000"
network_interface: "enp131s0f1.10"
kolla_external_vip_interface: "eno1"
neutron_external_interface: "enp131s0f0"
neutron_bridge_name: "br-ironic"
neutron_plugin_agent: "openvswitch"
enable_cinder_backup: "no"
enable_haproxy: "yes"
enable_heat: "yes"
enable_horizon: "yes"
enable_horizon_ironic: "{{ enable_ironic | bool }}"
enable_ironic: "yes"
enable_ironic_neutron_agent: "yes"
enable_swift: "no"
tempest_image_id:
tempest_flavor_ref_id:
tempest_public_network_id:
tempest_floating_network_name:
neutron_tenant_network_types: "vlan,flat"
enable_neutron_provider_networks: yes

Config overrides for Ironic (/etc/kolla/config/ironic/ironic-conductor.conf):

[DEFAULT]
my_ip=192.168.10.10
enabled_network_interfaces=noop,flat,neutron
default_network_interface=flat

[deploy]
default_boot_option = netboot

Here, eno1 is the interface providing access to the Ironic host from inside the Labs:

# cat /etc/sysconfig/network-scripts/ifcfg-eno1
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="none"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="eno1"
UUID="7e04ebd9-2e2a-4297-b417-b9cb5e498259"
DEVICE="eno1"
ONBOOT="yes"
IPADDR="172.28.128.1"
PREFIX="16"
GATEWAY="172.28.0.2"
DNS1="172.28.0.2"
DNS2="8.8.8.8"
IPV6_PRIVACY="no"

enp131s0f0 is an interface that is up, but has no IP set (this will be used by Neutron to put external bridge on) and enp131s0f1.10 is a tagged secondary interface used for hosting internal OpenStack APIs and Ironic's TFTP server.

# cat /etc/sysconfig/network-scripts/ifcfg-enp131s0f1.10
DEVICE=enp131s0f1.10
NAME=enp131s0f1.10
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.10.10
PREFIX=24
NETWORK=192.168.10.0
VLAN=yes

IPMI to the baremetal host is available through another tagged interface, enp131s0f1.201:

# cat /etc/sysconfig/network-scripts/ifcfg-enp131s0f1.201
DEVICE=enp131s0f1.201
NAME=enp131s0f1.201
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.201.254
PREFIX=24
NETWORK=192.168.201.0
VLAN=yes

Note that because OpenStack here is deployed on a single machine, HAProxy is not strictly required.

If your playbook creates the ironic_dnsmasq container, stop/remove it, so you don't run into potential problems with 2 DHCPs on the same network.

Post deployment setup

Follow https://docs.openstack.org/kolla-ansible/rocky/reference/ironic-guide.html#post-deployment-configuration

For building baremetal images, follow https://docs.openstack.org/ironic/rocky/install/configure-glance-images.html. Here is an example of an Ubuntu image with a heat agent that allows running script-based software deployments:

$ disk-image-create baremetal ubuntu dhcp-all-interfaces os-collect-config os-refresh-config os-apply-config heat-config heat-config-cfn-init heat-config-script -o ubuntu-software-config-ironic.qcow2

Add images to Glance:

$ openstack image create --container-format aki --disk-format aki --file ~/ubuntu-software-config-ironic.vmlinuz ubuntu-software-config-ironic_kernel
$ openstack image create --container-format ari --disk-format ari --file ~/ubuntu-software-config-ironic.initrd ubuntu-software-config-ironic_initramfs
$ openstack image create --public --container-format bare --disk-format qcow2 --file ~/ubuntu-software-config-dgx.qcow2 \
--property kernel_id=a8743614-38dc-43ed-8d3b-f4e1b4240eb0 --property ramdisk_id=1d370924-762b-49a0-8619-7f13aa8dafe1 ubuntu-software-config-dgx

In the last command, kernel_id and ramdisk_id point to the UUIDs that Glance assigned to the kernel and ramdisk images.

Note on localboot: The above setup is susceptible to this bug: https://storyboard.openstack.org/#!/story/2002929. To avoid the problem you can set default_boot_option = local in Ironic overrides, so that your baremetal servers will be able to boot from their local disk after they are done provisioning. More importantly, with local boot you can use regular cloud images - without having to extract kernel and ramdisk out of them first (you'll still need the kernel and ramdisk for initial deploy). This approach will be used in the next section.

Ironic POC with multi-tenancy

TODO: Add diagram


kolla-ansible config (/etc/kolla/globals.yml):

---
config_strategy: "COPY_ALWAYS"
kolla_base_distro: "centos"
kolla_install_type: "binary"
openstack_release: "7.0.2"
kolla_internal_vip_address: "192.168.10.254"
kolla_external_vip_address: "172.28.128.254"
docker_registry: "registry.vscaler.com:5000"
network_interface: "enp131s0f1.10"
kolla_external_vip_interface: "eno1"
neutron_external_interface: "enp131s0f0"
neutron_bridge_name: "br-ironic"
neutron_plugin_agent: "openvswitch"
enable_cinder_backup: "no"
enable_haproxy: "yes"
enable_heat: "yes"
enable_horizon: "yes"
enable_horizon_ironic: "{{ enable_ironic | bool }}"
enable_ironic: "yes"
enable_ironic_neutron_agent: "yes"
enable_swift: "no"
neutron_tenant_network_types: "vlan,flat"
neutron_server_image: "registry.vscaler.com:5000/kolla/centos-source-neutron-server-with-genericswitch"
neutron_server_tag: "7.1.0"
tempest_image_id:
tempest_flavor_ref_id:
tempest_public_network_id:
tempest_floating_network_name:
enable_neutron_provider_networks: yes

Ironic-specific overrides (/etc/kolla/config/ironic/ironic-conductor.conf):

[DEFAULT]
my_ip=192.168.10.10
enabled_network_interfaces=noop,flat,neutron
default_network_interface=neutron

[deploy]
default_boot_option = local

Network interface config is exactly the same as in the previous iteration of the deployment (without multi-tenancy).