OpenStack:Ironic

From Define Wiki
Revision as of 12:08, 13 December 2019 by Mariusz (Add docs on software RAID)

Recommended kolla-ansible overrides

NOTE: Stein or newer is recommended, but if multitenancy will not be enabled, Rocky will also work.

Overrides in /etc/kolla/globals.yml enabling Ironic and its Neutron agent (OpenStack Stein):

enable_ironic: "yes"
enable_ironic_neutron_agent: "yes"

Overrides in /etc/kolla/config/ironic/ironic-conductor.conf:

[DEFAULT]
enabled_network_interfaces = noop,flat
default_network_interface = flat

[pxe]
tftp_server = <ip-of-a-dedicated-interface>
pxe_append_params = nofb nomodeset vga=normal console=ttyS1,115200 console=tty0 sshkey="ssh-rsa AAAA..." ipa-debug=1 coreos.autologin

[deploy]
default_boot_option = local

[agent]
deploy_logs_collect = on_failure

[conductor]
clean_callback_timeout = 300

The IP for "tftp_server" can be the same as the IP of the interface on which the internal OpenStack APIs are running on the host that runs Ironic (only a setup with a single controller hosting all Ironic components has been tested). Also set "sshkey" to the public key of your deploy node or headnode to enable SSH access to the node being deployed.
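To fill in the two placeholders above, something like the following could be run on the Ironic host. It is a minimal sketch: the helper names are hypothetical, the interface name (for example enp131s0f1.10, the internal API interface in the POC environment further down) and the key path are assumptions, and it only prints the [pxe] section for you to review rather than editing ironic-conductor.conf.

```shell
# Hypothetical helpers for filling in the [pxe] section of ironic-conductor.conf.

# Print the first IPv4 address (without the prefix length) configured on an interface.
iface_ipv4() {
    # "ip -4 -o addr show dev IFACE" prints one line per address; field 4 is ADDR/PREFIX.
    ip -4 -o addr show dev "$1" | awk 'NR == 1 { split($4, a, "/"); print a[1] }'
}

# Print a [pxe] section using the given TFTP IP and the contents of a public key file.
render_pxe_section() {
    local tftp_ip="$1" pubkey_file="$2" pubkey
    # Strip the trailing newline so the key fits inside sshkey="...".
    pubkey="$(tr -d '\n' < "$pubkey_file")"
    printf '[pxe]\n'
    printf 'tftp_server = %s\n' "$tftp_ip"
    printf 'pxe_append_params = nofb nomodeset vga=normal console=ttyS1,115200 console=tty0 sshkey="%s" ipa-debug=1 coreos.autologin\n' "$pubkey"
}
```

For example, render_pxe_section "$(iface_ipv4 enp131s0f1.10)" ~/.ssh/id_rsa.pub prints the section, which can then be pasted in place of the [pxe] block above.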

Multitenancy

Deployments with multitenancy require a few more overrides.

First, enable the "neutron" network interface by replacing

[DEFAULT]
enabled_network_interfaces = noop,flat

with

[DEFAULT]
enabled_network_interfaces = noop,flat,neutron

in /etc/kolla/config/ironic/ironic-conductor.conf. You can also change default_network_interface to neutron if you want the multitenant network interface to be the default for all nodes. Since the network interface can be set on each node individually, you might as well leave the default as flat and enable neutron only on selected (or all) nodes.

The "neutron" network interface requires the provisioning and cleaning networks to be specified upfront in the config file, so add these lines to /etc/kolla/config/ironic/ironic-conductor.conf:

[neutron]
cleaning_network = <provisioning-network-UUID>
provisioning_network = <provisioning-network-UUID>

and fill in the UUID of your provisioning network (the same network is used for cleaning here). The provisioning network should preferably be flat, but it can also be VLAN tagged, and its gateway (external router) should be configured on the physical switch.
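The provisioning network UUID can be looked up by name with openstack network show. The sketch below uses hypothetical helper names and assumes the network already exists and its name is unique; it prints a [neutron] section with the same network used for provisioning and cleaning, as in the example above.

```shell
# Hypothetical helper: print the UUID of a Neutron network given its name.
network_uuid() {
    openstack network show "$1" -f value -c id
}

# Print the [neutron] overrides, using one network for both provisioning and cleaning.
render_neutron_section() {
    local uuid="$1"
    printf '[neutron]\ncleaning_network = %s\nprovisioning_network = %s\n' "$uuid" "$uuid"
}
```

For example, render_neutron_section "$(network_uuid public304)", where public304 is the example provisioning network created in the post-deployment steps below. Review the output before adding it to ironic-conductor.conf.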

There are also extra overrides for the ML2 driver which should be added to /etc/kolla/config/neutron/ml2_conf.ini. Here is an example for a Supermicro switch (confirmed working on SSE-G24-TG4 and MBM-GEM-004):

[ml2]
mechanism_drivers = openvswitch,baremetal,l2population,genericswitch

[genericswitch:supermicro_1]
#device_type = netmiko_supermicro
device_type = netmiko_cisco_s300
ip = <IP-for-ssh-access-to-baremetal-switch>
username = <admin-user-on-switch>
password = <admin-password-on-switch>

[ml2_type_vlan]
network_vlan_ranges = physnet1:100:200

Here mechanism_drivers adds genericswitch to the list of enabled drivers, and the switch itself is configured in the [genericswitch:supermicro_1] section. The existing Cisco netmiko driver (device_type netmiko_cisco_s300) is used, and the driver expects SSH access to be configured on the switch. Note that the password in this config is stored in plain text (not encrypted). Lastly, network_vlan_ranges limits physnet1, the provider network that connects to the baremetal nodes, to VLANs 100 to 200, so only this subset of VLAN IDs is used for tenant networks.


Recommended post-deployment configuration

NOTE: The steps described here apply only to local-boot nodes and Ironic without multitenancy.

First, build the CoreOS-based Ironic Python Agent (IPA) deploy images. Here are the commands to set up an IPA image building environment on Ubuntu Xenial:

$ sudo apt-get update
$ sudo apt-get install docker.io gzip uuid-runtime cpio findutils grep gnupg cgroup-lite git build-essential python-pip python-dev -y
$ sudo service docker start
$ git clone https://git.openstack.org/openstack/ironic-python-agent
$ cd ironic-python-agent/imagebuild/coreos
$ git checkout cf30024f96e798f5607c9664b0b6236db3232119
$ sudo pip install -r ../../requirements.txt
$ sudo make

The resulting kernel image and ramdisk will be available in the UPLOAD subdirectory.

NOTE: Ready to use deploy images are available here: http://185.93.31.43/ironic-images/

Transfer or mount the deploy images on the host and add them to Glance:

$ openstack image create --public --container-format aki --disk-format aki --file ~/coreos_production_pxe.vmlinuz ironic-deploy_kernel
$ openstack image create --public --container-format ari --disk-format ari --file ~/coreos_production_pxe_image-oem.cpio.gz ironic-deploy_ramdisk

Also add regular operating system images - no special options are required here and regular cloud images can be used. Here is an example for a CentOS image:

$ openstack image create --public --container-format bare --disk-format qcow2 --file ~/CentOS-7-x86_64-GenericCloud-1907.qcow2 centos7-1907

Create a flavour for your baremetal nodes:

$ openstack flavor create --ram 1024 --vcpus 2 --disk 100 baremetal.small
$ openstack flavor set --property resources:CUSTOM_BAREMETAL=1 baremetal.small
$ openstack flavor set --property resources:VCPU=0 baremetal.small
$ openstack flavor set --property resources:MEMORY_MB=0 baremetal.small
$ openstack flavor set --property resources:DISK_GB=0 baremetal.small
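The resources:CUSTOM_BAREMETAL=1 property matches nodes whose Ironic resource class is "baremetal" (set with --resource-class when the nodes are enrolled). The placement name is derived from the node's resource class by replacing punctuation with underscores, upper-casing it and adding the CUSTOM_ prefix. The hypothetical helper below is a sketch of that naming rule, not Nova's own code, and can be used to work out which property a flavor needs.

```shell
# Sketch of the resource class naming rule: runs of non-alphanumeric
# characters become "_", the result is upper-cased and prefixed with CUSTOM_.
custom_resource_class() {
    local norm
    norm="$(printf '%s' "$1" | sed -E 's/[^0-9A-Za-z]+/_/g' | tr '[:lower:]' '[:upper:]')"
    printf 'CUSTOM_%s\n' "$norm"
}
```

For example, openstack flavor set --property "resources:$(custom_resource_class baremetal)=1" baremetal.small sets the same property as the second command above.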

Create a provisioning network and a subnet. These can be regular flat provider networks, or they can use VLANs. Here is an example for a network on VLAN 304:

$ openstack network create public304 --provider-physical-network physnet1 --provider-network-type vlan --provider-segment 304
$ openstack subnet create --dhcp --allocation-pool start=10.6.44.101,end=10.6.47.254 --network public304 --subnet-range 10.6.44.0/22 --gateway 10.6.44.1 public304-subnet

Finally, add your nodes to Ironic's database:

$ openstack baremetal node create --name <node-name> \
--driver ipmi --driver-info ipmi_username=<ipmi-username> --driver-info ipmi_password=<ipmi-password> --driver-info ipmi_address=<ipmi-address> \
--driver-info cleaning_network=<uuid-of-the-provisioning-network> --driver-info provisioning_network=<uuid-of-the-provisioning-network> \
--driver-info deploy_kernel=<uuid-of-the-ironic-deploy_kernel-image> --driver-info deploy_ramdisk=<uuid-of-the-ironic-deploy_ramdisk-image> \
--resource-class baremetal --network-interface flat
$ openstack baremetal port create <mac-address-of-the-node's-provisioning-interface> --node <node-uuid>
$ openstack baremetal node manage <node-name>
$ openstack baremetal node provide <node-name>
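With many nodes, the enrollment commands above can be generated from a list. This is a dry-run sketch under stated assumptions: the input format (one "name,ipmi_address,mac" line per node) and the function name are hypothetical, all nodes are assumed to share IPMI credentials, and the function only prints commands so they can be reviewed before being run. Since port create takes the node UUID, the generated command looks it up by name.

```shell
# Hypothetical dry-run generator for enrolling several nodes at once.
# stdin: one node per line as "name,ipmi_address,mac"; blank lines and
# lines starting with # are skipped.
# Arguments: ipmi_user ipmi_password provisioning_net_uuid deploy_kernel_uuid deploy_ramdisk_uuid
# Values are not shell-quoted, so they must not contain spaces or shell metacharacters.
gen_enroll_cmds() {
    local user="$1" pass="$2" net="$3" kernel="$4" ramdisk="$5" name addr mac
    # "|| [ -n "$name" ]" also handles a last line without a trailing newline.
    while IFS=, read -r name addr mac || [ -n "$name" ]; do
        case "$name" in ''|\#*) continue ;; esac
        printf 'openstack baremetal node create --name %s --driver ipmi' "$name"
        printf ' --driver-info ipmi_username=%s --driver-info ipmi_password=%s --driver-info ipmi_address=%s' "$user" "$pass" "$addr"
        printf ' --driver-info cleaning_network=%s --driver-info provisioning_network=%s' "$net" "$net"
        printf ' --driver-info deploy_kernel=%s --driver-info deploy_ramdisk=%s' "$kernel" "$ramdisk"
        printf ' --resource-class baremetal --network-interface flat\n'
        # port create needs the node UUID, so look it up by name when the command runs.
        printf 'openstack baremetal port create %s --node "$(openstack baremetal node show %s -f value -c uuid)"\n' "$mac" "$name"
        printf 'openstack baremetal node manage %s\n' "$name"
        printf 'openstack baremetal node provide %s\n' "$name"
    done
}
```

After checking the printed commands, they can be piped to sh.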

If everything went well, deploy the nodes by launching instances, like so:

$ openstack server create --image <image-to-provision-node-with> --flavor baremetal.small --security-group ping-and-ssh --key-name mykey --network <name-of-the-provisioning-network> <instance-name>

Multitenancy

When adding a node to Ironic, use --network-interface neutron instead of --network-interface flat.

Next, when adding a port, you'll need to specify details about the switch port this baremetal node is connected to:

openstack baremetal port set --local-link-connection switch_info="supermicro_1" --local-link-connection switch_id="<MAC-address-of-the-switchport>" --local-link-connection port_id="gi 0/<port-number>" <port-UUID>

Here switch_info is the name given to the genericswitch config section in ml2_conf.ini (supermicro_1 in [genericswitch:supermicro_1] above), and port_id is the name of the switch port; on Supermicro switches this is gi 0/1, gi 0/2, etc. (basically the name that you pass to the interface <intname> command on the switch). The switch_id is not really important, as it is not used to configure the switch port, but it has to be a MAC address, so it makes sense to use the MAC of the switch port the node is connected to.
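Because switch_id must be a MAC address and port_id follows the "gi 0/<port-number>" pattern on these switches, the port set command can be generated with a check on the MAC. A minimal sketch with a hypothetical helper name; it assumes the Supermicro port naming described above and only prints the command.

```shell
# Hypothetical helper: print the "port set" command for a node cabled to
# port gi 0/<n> of the switch configured as [genericswitch:<switch_info>].
# Arguments: switch_info switch_mac port_number port_uuid
gen_llc_cmd() {
    local switch_info="$1" switch_mac="$2" port_num="$3" port_uuid="$4"
    # switch_id has to be a MAC address, so reject anything else.
    if ! printf '%s' "$switch_mac" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'; then
        echo "switch_id must be a MAC address, got: $switch_mac" >&2
        return 1
    fi
    printf 'openstack baremetal port set --local-link-connection switch_info=%s --local-link-connection switch_id=%s --local-link-connection port_id="gi 0/%s" %s\n' \
        "$switch_info" "$switch_mac" "$port_num" "$port_uuid"
}
```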

With these in place, you can now create a tenant network on the same physical network as the provisioning network, with vlan as the network type. You don't need to specify a VLAN ID: Neutron selects one from the range of VLANs in the ML2 config, and it is created on the baremetal switch. Also add a subnet with any IP range and DHCP enabled to this network.

Then, when launching a baremetal instance, pass the tenant network (instead of the provisioning network) to the --network parameter. Ironic first provisions the node on the provisioning network; when this is done, Neutron moves the node's switch port to the VLAN of the tenant network, so after a reboot the instance is available on the tenant network.

NOTE: In Stein there is a bug where the genericswitch driver (for Cisco IOS) skips one of the switch commands. As a workaround, make sure that all baremetal switch ports are set to "access" mode.

References

  1. https://docs.openstack.org/kolla-ansible/rocky/reference/ironic-guide.html#post-deployment-configuration
  2. https://docs.openstack.org/ironic/rocky/install/configure-glance-images.html
  3. https://github.com/openstack/ironic-python-agent/tree/cf30024f96e798f5607c9664b0b6236db3232119/imagebuild/coreos
  4. https://docs.openstack.org/ironic/stein/admin/multitenancy.html

Software RAID

Support for software RAID was added in the Train release, so if you're on an older release, you'll have to add the following parameters to your globals.yml:

ironic_tag: "train"
ironic_install_type: "source"

and then run a kolla-ansible deploy with the "ironic" tag. Make sure that support for software RAID is present in the revision of the Ironic Python Agent that you're using for building deploy images (or use images available on http://185.93.31.43/ironic-images/).

Reconfigure your Ironic node(s) for software RAID:

# openstack baremetal node set <node-name> --raid-interface agent
# cat ~/raid_1_basic.json
{
 "logical_disks": [
                   {
                    "size_gb": "MAX",
                    "raid_level": "1",
                    "controller": "software"
                   }
                  ]
}

# openstack baremetal node set <node-name> --target-raid-config ~/raid_1_basic.json

Feel free to edit the JSON file (the above is a basic one that sets up RAID-1 on 2 disks), but bear in mind the limitations of software RAID support in Ironic (see the links in References).
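If you maintain several RAID layouts, the target RAID config can be generated instead of edited by hand. The sketch below uses a hypothetical helper and an assumed SIZE:LEVEL argument format; "MAX:1" reproduces raid_1_basic.json, while multi-volume layouts such as "100:1 MAX:0" are only illustrations and must respect the software RAID limitations in the References.

```shell
# Hypothetical helper: print a target RAID config for software RAID.
# Each argument is SIZE:LEVEL, where SIZE is a size in GiB or MAX and LEVEL
# is the RAID level (digits and "+" only, e.g. 1, 0, 1+0).
make_raid_config() {
    local spec size level first=1
    [ "$#" -gt 0 ] || { echo "usage: make_raid_config SIZE:LEVEL..." >&2; return 1; }
    # Validate every volume spec before printing anything.
    for spec in "$@"; do
        case "$spec" in *:?*) ;; *) echo "bad volume spec: $spec" >&2; return 1 ;; esac
        size="${spec%%:*}"
        level="${spec#*:}"
        case "$size" in MAX) ;; ''|*[!0-9]*) echo "bad size in: $spec" >&2; return 1 ;; esac
        case "$level" in ''|*[!0-9+]*) echo "bad level in: $spec" >&2; return 1 ;; esac
    done
    printf '{\n  "logical_disks": [\n'
    for spec in "$@"; do
        size="${spec%%:*}"
        level="${spec#*:}"
        # "MAX" is a JSON string, numeric sizes are plain numbers.
        if [ "$size" = "MAX" ]; then size='"MAX"'; fi
        if [ "$first" -eq 1 ]; then first=0; else printf ',\n'; fi
        printf '    {"size_gb": %s, "raid_level": "%s", "controller": "software"}' "$size" "$level"
    done
    printf '\n  ]\n}\n'
}
```

For example, make_raid_config MAX:1 > ~/raid_1_basic.json writes an equivalent of the file above.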

Now the node is ready for RAID creation. In Ironic this is done during the "cleaning" stage.

# openstack baremetal node manage <node-name>
# cat ~/raid_cleaning_steps.json
[{
  "interface": "raid",
  "step": "delete_configuration"
},
{
  "interface": "deploy",
  "step": "erase_devices_metadata"
},
{
  "interface": "raid",
  "step": "create_configuration"
}]
# openstack baremetal node clean <node-name> --clean-steps ~/raid_cleaning_steps.json

There is no need to modify this "cleaning steps" JSON; it should work for any RAID configuration.

This should put the node in the "cleaning" state. Wait for the node to get back to the "manageable" state (feel free to open the virtual console of this node to keep an eye on the process). If the node ends up in "clean failed" instead, check the ironic-conductor logs for errors.
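Instead of refreshing openstack baremetal node show by hand, the wait can be scripted. A minimal polling sketch with a hypothetical function name and assumed default timeout and interval; it stops when the node reaches "manageable" or "clean failed" and prints the node's last_error in the latter case.

```shell
# Hypothetical helper: poll a node's provision state until cleaning finishes.
# Returns 0 when the node is "manageable", 1 on "clean failed" or timeout.
# Usage: wait_for_clean <node-name> [timeout_seconds] [interval_seconds]
wait_for_clean() {
    local node="$1" timeout="${2:-3600}" interval="${3:-30}" waited=0 state
    while [ "$waited" -le "$timeout" ]; do
        state="$(openstack baremetal node show "$node" -f value -c provision_state)"
        case "$state" in
            manageable)
                echo "$node is manageable"
                return 0 ;;
            "clean failed")
                # last_error usually explains why cleaning failed.
                openstack baremetal node show "$node" -f value -c last_error >&2
                return 1 ;;
        esac
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "timed out waiting for $node (last state: $state)" >&2
    return 1
}
```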

When the node is "manageable" again, add a root_device hint specifying your array as the root device and give the node back to Nova:

# openstack baremetal node set <node-name> --property 'root_device={"name": "/dev/md127"}'
# openstack baremetal node provide <node-name>

Then provision baremetal instances as usual.

References

  1. https://docs.openstack.org/ironic/train/admin/raid.html
  2. https://techblog.web.cern.ch/techblog/post/ironic_software_raid/


Troubleshooting

First, make sure all Ironic containers are up and running. Here is a list from a working Stein-based environment:

# docker ps | grep ironic
...        registry.vscaler.com:5000/kolla/centos-source-ironic-pxe:stein                  "dumb-init --single-…"   ...   ironic_pxe
...        registry.vscaler.com:5000/kolla/centos-source-ironic-api:stein                  "dumb-init --single-…"   ...   ironic_api
...        registry.vscaler.com:5000/kolla/centos-source-ironic-conductor:stein            "dumb-init --single-…"   ...   ironic_conductor
...        registry.vscaler.com:5000/kolla/centos-binary-ironic-neutron-agent:stein        "dumb-init --single-…"   ...   ironic_neutron_agent
...        registry.vscaler.com:5000/kolla/centos-binary-nova-compute-ironic:stein         "dumb-init --single-…"   ...   nova_compute_ironic

Re-run kolla-ansible deploy with the "ironic" tag if ironic_pxe, ironic_api or ironic_conductor is missing, the "neutron" tag if ironic_neutron_agent is missing, or the "nova" tag if nova_compute_ironic is missing.
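The container check and the tag mapping above can be combined into one quick check. A sketch with a hypothetical function name; it assumes the container names from the Stein listing above and that the containers run under Docker on the local host.

```shell
# Hypothetical helper: report missing Ironic-related containers and the
# kolla-ansible tag to re-deploy for each. Returns 1 if anything is missing.
check_ironic_containers() {
    local running entry name tag missing=0
    running="$(docker ps --format '{{.Names}}')"
    # Each entry is container_name:kolla_ansible_tag.
    for entry in ironic_pxe:ironic ironic_api:ironic ironic_conductor:ironic \
                 ironic_neutron_agent:neutron nova_compute_ironic:nova; do
        name="${entry%%:*}"
        tag="${entry#*:}"
        if ! printf '%s\n' "$running" | grep -Fqx "$name"; then
            echo "missing: $name (re-run: kolla-ansible -i <inventory> deploy --tags $tag)"
            missing=1
        fi
    done
    return "$missing"
}
```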

Compare the list of baremetal nodes printed by openstack baremetal node list with the list of hypervisors from openstack hypervisor list, and make sure each node's UUID appears among the hypervisors. If it does not, check the ironic-conductor logs, fix the problem, then remove the problematic nodes and add them back in.
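The comparison can be automated with the two list commands. A sketch with a hypothetical function name; it relies on the hypervisor hostname of an Ironic-backed hypervisor being the node UUID, as described above. Newly provided nodes may take a minute or two to show up, so re-run it before concluding something is wrong.

```shell
# Hypothetical helper: print the UUIDs of Ironic nodes that do not appear in
# the hypervisor list (for Ironic, the hypervisor hostname is the node UUID).
nodes_without_hypervisor() {
    local hypervisors uuid
    hypervisors="$(openstack hypervisor list -f value -c "Hypervisor Hostname")"
    openstack baremetal node list -f value -c UUID | while read -r uuid; do
        printf '%s\n' "$hypervisors" | grep -Fqx "$uuid" || echo "$uuid"
    done
}
```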

Keep an eye on the iKVM virtual console when building an instance and deal with any DHCP, TFTP or PXE errors that show up. {TODO: Add examples of such problems and solutions to them} If the machine PXE boots successfully, it boots the deploy image, which gives you access as the coreos user. Make sure that the machine can reach the internal OpenStack APIs.

If your instance shows up as ACTIVE but you can't ping it, delete the instance and create it again using an image with a root password set, so that you can log in to the machine through its OOB console. To add a root password to regular cloud images, you can use the virt-customize tool. Here is a command that adds a root password to a CentOS 7 image:

virt-customize -a CentOS-7-x86_64-GenericCloud-1907.qcow2 --root-password 'password:vScaler2019?!'

Cloud-init only brings up the first connected interface, which in most cases is an onboard interface. If this is not the interface you PXE boot from, you may want to bake a custom network config for cloud-init into the image, like this one:

network:
  version: 2
  ethernets:
    enp59s0f0:
      dhcp4: true

or this more advanced one - with a bond:

network:
  version: 2
  ethernets:
    eno1:
      dhcp4: true
  bonds:
    bond0:
      interfaces:
        - enp175s0f0
        - enp175s0f1
      dhcp4: true
      parameters:
        mode: 802.3ad
        mii-monitor-interval: 100

Put your config in /etc/cloud/cloud.cfg.d/custom-networking.cfg inside the image.
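For the single-interface case, the file can be generated for whichever interface you PXE boot from. A minimal sketch with a hypothetical function name; it writes the same format as the first example above, and the interface name is whatever your hardware uses (enp59s0f0 is only the example from above).

```shell
# Hypothetical helper: write a cloud-init network config (version 2) that
# enables DHCP on a single named interface.
# Usage: write_netcfg <interface> <output_file>
write_netcfg() {
    local iface="$1" out="$2"
    cat > "$out" <<EOF
network:
  version: 2
  ethernets:
    $iface:
      dhcp4: true
EOF
}
```

The resulting file can then be copied into the image with virt-customize's --upload option, for example: write_netcfg enp59s0f0 custom-networking.cfg followed by virt-customize -a CentOS-7-x86_64-GenericCloud-1907.qcow2 --upload custom-networking.cfg:/etc/cloud/cloud.cfg.d/custom-networking.cfg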

Baremetal instances have their ports marked as DOWN in Neutron, like this:

# openstack server list
+--------------------------------------+------------+--------+-----------------------------+--------------+-----------------+
| ID                                   | Name       | Status | Networks                    | Image        | Flavor          |
+--------------------------------------+------------+--------+-----------------------------+--------------+-----------------+
| 8314b478-310b-4d2a-a316-180c380ab5cf | compute001 | ACTIVE | ironic-prov=192.168.202.104 | centos7-1907 | baremetal.small |
+--------------------------------------+------------+--------+-----------------------------+--------------+-----------------+
# openstack port list | grep 192.168.202.104
| 2d43f9c2-e2a1-420c-8bb3-63a2d01e73c0 |      | b8:59:9f:e2:37:0e | ip_address='192.168.202.104', subnet_id='c54051ef-6344-4026-80da-529f57df6213' | DOWN   |
# openstack baremetal port list | grep b8:59:9f:e2:37:0e
| 957ed1c1-416e-468e-a87b-f14a7eaf5255 | b8:59:9f:e2:37:0e |
# openstack baremetal port show 957ed1c1-416e-468e-a87b-f14a7eaf5255 | grep node
| node_uuid             | 1907746d-4663-48d5-9b93-bd63165db382                             |
# openstack baremetal node list | grep 1907746d-4663-48d5-9b93-bd63165db382
| 1907746d-4663-48d5-9b93-bd63165db382 | node011 | 8314b478-310b-4d2a-a316-180c380ab5cf | power on    | active             | False       |

This is normal and nothing to worry about. {TODO: Why is this port marked as DOWN?}
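The lookup chain shown above (instance IP, then Neutron port MAC, then Ironic port, then node) can be scripted when you need to find the node behind a given instance IP. A sketch with a hypothetical function name; it assumes the IP belongs to exactly one Neutron port and that MAC to exactly one Ironic port.

```shell
# Hypothetical helper automating the lookup chain:
# fixed IP -> Neutron port MAC -> Ironic port -> node.
trace_ip_to_node() {
    local ip="$1" mac port node
    mac="$(openstack port list --fixed-ip ip-address="$ip" -f value -c "MAC Address")"
    [ -n "$mac" ] || { echo "no Neutron port with IP $ip" >&2; return 1; }
    port="$(openstack baremetal port list --address "$mac" -f value -c UUID)"
    [ -n "$port" ] || { echo "no Ironic port with MAC $mac" >&2; return 1; }
    node="$(openstack baremetal port show "$port" -f value -c node_uuid)"
    echo "ip=$ip mac=$mac port=$port node=$node"
    # Show the node's name and states.
    openstack baremetal node show "$node" -f value -c name -c provision_state -c power_state
}
```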

References

  1. https://cloudinit.readthedocs.io/en/18.4/topics/network-config-format-v2.html

Ironic POC basic test environment


Revision of kolla-ansible used (branch stable/rocky):

commit 668da3c332fcd58fa2b023e8bb74ca8225e222bc
Author: Jeffrey Zhang <zhang.lei.fly@gmail.com>
Date:   Tue Dec 11 16:01:03 2018 +0800

    Add cache configuration for ceilometer project

    when using ceilometer+gnocchi, for every notification sample, ceilometer
    will update the resource even if is not updated.

    We should add [cache] section to make ceilometer cache the resource, and
    stop send the useless update request.

    Closes-Bug: #1807841
    Change-Id: Ic33b4cd5ba8165c20878cab068f38a3948c9d31d
    (cherry picked from commit 55bf29ec6c459dc46cefdee69acb8e427763e409)

Standard all-in-one inventory has been used.

kolla-ansible config (/etc/kolla/globals.yml):

---
config_strategy: "COPY_ALWAYS"
kolla_base_distro: "centos"
kolla_install_type: "binary"
openstack_release: "7.0.2"
kolla_internal_vip_address: "192.168.10.254"
kolla_external_vip_address: "172.28.128.254"
docker_registry: "registry.vscaler.com:5000"
network_interface: "enp131s0f1.10"
kolla_external_vip_interface: "eno1"
neutron_external_interface: "enp131s0f0"
neutron_bridge_name: "br-ironic"
neutron_plugin_agent: "openvswitch"
enable_cinder_backup: "no"
enable_haproxy: "yes"
enable_heat: "yes"
enable_horizon: "yes"
enable_horizon_ironic: "{{ enable_ironic | bool }}"
enable_ironic: "yes"
enable_ironic_neutron_agent: "yes"
enable_swift: "no"
tempest_image_id:
tempest_flavor_ref_id:
tempest_public_network_id:
tempest_floating_network_name:
neutron_tenant_network_types: "vlan,flat"
enable_neutron_provider_networks: yes

Config overrides for Ironic (/etc/kolla/config/ironic/ironic-conductor.conf):

[DEFAULT]
my_ip=192.168.10.10
enabled_network_interfaces=noop,flat,neutron
default_network_interface=flat

[deploy]
default_boot_option = netboot

Here, eno1 is the interface providing access to the Ironic host from inside the Labs:

# cat /etc/sysconfig/network-scripts/ifcfg-eno1
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="none"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="eno1"
UUID="7e04ebd9-2e2a-4297-b417-b9cb5e498259"
DEVICE="eno1"
ONBOOT="yes"
IPADDR="172.28.128.1"
PREFIX="16"
GATEWAY="172.28.0.2"
DNS1="172.28.0.2"
DNS2="8.8.8.8"
IPV6_PRIVACY="no"

enp131s0f0 is up but has no IP address set (Neutron puts its external bridge on this interface), and enp131s0f1.10 is a tagged secondary interface used for hosting the internal OpenStack APIs and Ironic's TFTP server.

# cat /etc/sysconfig/network-scripts/ifcfg-enp131s0f1.10
DEVICE=enp131s0f1.10
NAME=enp131s0f1.10
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.10.10
PREFIX=24
NETWORK=192.168.10.0
VLAN=yes

IPMI to the baremetal host is available through another tagged interface, enp131s0f1.201:

# cat /etc/sysconfig/network-scripts/ifcfg-enp131s0f1.201
DEVICE=enp131s0f1.201
NAME=enp131s0f1.201
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.201.254
PREFIX=24
NETWORK=192.168.201.0
VLAN=yes

Note that because OpenStack is deployed on a single machine here, HAProxy is not strictly required.

If your playbook creates the ironic_dnsmasq container, stop and remove it, so you don't run into problems with two DHCP servers on the same network.
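A small sketch of that cleanup, using a hypothetical function name; it only touches the container if it exists. Note that a later kolla-ansible deploy may create the container again, so re-check after re-deploying.

```shell
# Hypothetical helper: stop and remove ironic_dnsmasq if present, so that
# only one DHCP server answers on the provisioning network.
remove_ironic_dnsmasq() {
    if docker ps -a --format '{{.Names}}' | grep -Fqx ironic_dnsmasq; then
        docker stop ironic_dnsmasq && docker rm ironic_dnsmasq
    else
        echo "ironic_dnsmasq not present"
    fi
}
```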

Post deployment setup

Follow https://docs.openstack.org/kolla-ansible/rocky/reference/ironic-guide.html#post-deployment-configuration

For building baremetal images, follow https://docs.openstack.org/ironic/rocky/install/configure-glance-images.html

Here is an example of an Ubuntu image with the Heat agent, which allows running script-based software deployments:

$ disk-image-create baremetal ubuntu dhcp-all-interfaces os-collect-config os-refresh-config os-apply-config heat-config heat-config-cfn-init heat-config-script -o ubuntu-software-config-ironic.qcow2

Add images to Glance:

$ openstack image create --container-format aki --disk-format aki --file ~/ubuntu-software-config-ironic.vmlinuz ubuntu-software-config-ironic_kernel
$ openstack image create --container-format ari --disk-format ari --file ~/ubuntu-software-config-ironic.initrd ubuntu-software-config-ironic_initramfs
$ openstack image create --public --container-format bare --disk-format qcow2 --file ~/ubuntu-software-config-dgx.qcow2 \
--property kernel_id=a8743614-38dc-43ed-8d3b-f4e1b4240eb0 --property ramdisk_id=1d370924-762b-49a0-8619-7f13aa8dafe1 ubuntu-software-config-dgx

In the last command, kernel_id and ramdisk_id are the UUIDs that Glance assigned to the kernel and ramdisk images created by the first two commands.
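Rather than copying the UUIDs by hand, they can be looked up by image name. A sketch with hypothetical function names; it assumes the image names are unique in Glance (openstack image show fails on ambiguous names) and mirrors the options of the last command above.

```shell
# Hypothetical helper: print the Glance UUID of an image given its name.
image_uuid() {
    openstack image show "$1" -f value -c id
}

# Create a partition image that references kernel and ramdisk images by name.
# Usage: create_partition_image <name> <qcow2-file> <kernel-image-name> <ramdisk-image-name>
create_partition_image() {
    local name="$1" file="$2" kernel_name="$3" ramdisk_name="$4" kid rid
    kid="$(image_uuid "$kernel_name")" || return 1
    rid="$(image_uuid "$ramdisk_name")" || return 1
    openstack image create --public --container-format bare --disk-format qcow2 --file "$file" \
        --property kernel_id="$kid" --property ramdisk_id="$rid" "$name"
}
```

For example: create_partition_image ubuntu-software-config-dgx ~/ubuntu-software-config-dgx.qcow2 ubuntu-software-config-ironic_kernel ubuntu-software-config-ironic_initramfs. The same image_uuid lookup also works for the deploy_kernel and deploy_ramdisk values used when enrolling nodes.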

Note on local boot: the above setup is susceptible to this bug: https://storyboard.openstack.org/#!/story/2002929. To avoid the problem, you can set default_boot_option = local in the Ironic overrides, so that your baremetal servers boot from their local disk once provisioning is done. More importantly, with local boot you can use regular cloud images without having to extract the kernel and ramdisk from them first (you'll still need the deploy kernel and ramdisk for the initial deployment). This approach is used in the next section.

Ironic POC with multi-tenancy

TODO: Add diagram


kolla-ansible config (/etc/kolla/globals.yml):

---
config_strategy: "COPY_ALWAYS"
kolla_base_distro: "centos"
kolla_install_type: "binary"
openstack_release: "7.0.2"
kolla_internal_vip_address: "192.168.10.254"
kolla_external_vip_address: "172.28.128.254"
docker_registry: "registry.vscaler.com:5000"
network_interface: "enp131s0f1.10"
kolla_external_vip_interface: "eno1"
neutron_external_interface: "enp131s0f0"
neutron_bridge_name: "br-ironic"
neutron_plugin_agent: "openvswitch"
enable_cinder_backup: "no"
enable_haproxy: "yes"
enable_heat: "yes"
enable_horizon: "yes"
enable_horizon_ironic: "{{ enable_ironic | bool }}"
enable_ironic: "yes"
enable_ironic_neutron_agent: "yes"
enable_swift: "no"
neutron_tenant_network_types: "vlan,flat"
neutron_server_image: "registry.vscaler.com:5000/kolla/centos-source-neutron-server-with-genericswitch"
neutron_server_tag: "7.1.0"
tempest_image_id:
tempest_flavor_ref_id:
tempest_public_network_id:
tempest_floating_network_name:
enable_neutron_provider_networks: yes

Ironic-specific overrides (/etc/kolla/config/ironic/ironic-conductor.conf):

[DEFAULT]
my_ip=192.168.10.10
enabled_network_interfaces=noop,flat,neutron
default_network_interface=neutron

[deploy]
default_boot_option = local

Network interface config is exactly the same as in the previous iteration of the deployment (without multi-tenancy).