Difference between revisions of "VScaler: Thoubleshooting Kolla issues"

From Define Wiki
Revision as of 12:03, 20 November 2017

Log location

The logs are on the nodes under: /var/lib/docker/volumes/kolla_logs/_data/

When a service fails, the kolla logs of that service's container usually contain useful information. For example, to check the logs of the nova-conductor service:

[root@head01 ~]# ssh controller01
[root@controller01-enp2s0 ~]# tail /var/lib/docker/volumes/kolla_logs/_data/nova/nova-conductor.log
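The same lookup can be scripted when several services need checking. The following is a minimal sketch, assuming the logs follow the <project>/<service>.log layout seen in the nova-conductor example above; the helper names kolla_log_path and tail are made up for illustration and are not part of kolla.

```python
from collections import deque
from pathlib import Path

# Log root quoted in the text above.
KOLLA_LOG_ROOT = Path("/var/lib/docker/volumes/kolla_logs/_data")


def kolla_log_path(project, service, root=KOLLA_LOG_ROOT):
    """Build <root>/<project>/<service>.log (layout assumed from the nova example)."""
    return Path(root) / project / f"{service}.log"


def tail(path, lines=10):
    """Return the last `lines` lines of a text file, similar to tail(1)."""
    with open(path, "r", errors="replace") as handle:
        # deque with maxlen keeps only the final `lines` entries while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]
```

On a controller node, tail(kolla_log_path("nova", "nova-conductor")) would return the same lines as the tail command shown above.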

Interface ansible_<if> does not exist

If you see a message like this in the kolla-ansible output, it most likely refers to a node whose interface name differs from the one specified in the "network_interface" variable in the /etc/kolla/globals.yml file.

Solution

To get past this issue, add api_interface=ens5 next to the node's name in the inventory file. On one occasion I needed to add tunnel_interface=ens5 as well. The error message tells you whether the tunnel or the API interface is the problem. The inventory file should look like this:

...
gpu01-ens5 tunnel_interface=ens5 api_interface=ens5
...
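To spot hosts that still lack an override, the inventory lines can be checked with a small script. This is only a sketch: it assumes the INI-style "host key=value ..." layout shown above, and the function names parse_inventory_host and missing_interface_overrides are hypothetical helpers, not part of kolla-ansible or Ansible.

```python
import shlex


def parse_inventory_host(line):
    """Split an INI-style inventory host line into (hostname, host_vars).

    Returns None for blank lines, comments and [group] headers.
    """
    tokens = shlex.split(line, comments=True)
    if not tokens or tokens[0].startswith("["):
        return None
    host_vars = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep:  # keep only key=value pairs
            host_vars[key] = value
    return tokens[0], host_vars


def missing_interface_overrides(line, required=("api_interface", "tunnel_interface")):
    """List the interface variables from `required` that the host line does not set."""
    parsed = parse_inventory_host(line)
    if parsed is None:
        return []
    _, host_vars = parsed
    return [name for name in required if name not in host_vars]
```

For the example line above, missing_interface_overrides returns an empty list; a line with only api_interface set would report tunnel_interface as missing.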

Connection refused errors in nova-conductor logs

When I was doing a deploy, the nova service wouldn't come up properly. After checking the nova-conductor logs on the controller node that reported the error, I saw many errors like this:

ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.106:5672 is unreachable: [Errno 111] ECONNREFUSED

Solution

Disable SELinux on the controller nodes and reboot them.
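Before and after the reboot it can help to confirm whether the AMQP port from the error message accepts connections at all. The sketch below uses only the Python standard library; the function name amqp_port_reachable is made up for illustration, and a successful connection only shows that something is listening on the port, not that RabbitMQ is healthy or that SELinux was the cause.

```python
import socket


def amqp_port_reachable(host, port=5672, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ECONNREFUSED, timeouts and unreachable hosts.
        return False
```

With the address from the log line above, amqp_port_reachable("192.168.0.106") would be expected to return False while the ECONNREFUSED errors persist.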

Debugging containers that don't start

When a container fails to start, you can reproduce the error: find the container's name with docker ps -a, then pass -a to docker start to attach to its output:

[root@controller01 ~]# docker ps -a
CONTAINER ID        IMAGE                                                     COMMAND             CREATED             STATUS                    PORTS               NAMES
4de70ccf5a4d        10.10.10.1:4000/kolla/centos-binary-glance-api:4.0.3      "kolla_start"       10 hours ago        Exited (1) 10 hours ago                       bootstrap_glance
10f4038a7d77        10.10.10.1:4000/kolla/centos-binary-keystone:4.0.3        "kolla_start"       10 hours ago        Up 10 hours                                   keystone
e96bf1cb3258        10.10.10.1:4000/kolla/centos-binary-rabbitmq:4.0.3        "kolla_start"       10 hours ago        Up 10 hours                                   rabbitmq
b0094b42cb75        10.10.10.1:4000/kolla/centos-binary-mariadb:4.0.3         "kolla_start"       10 hours ago        Up 10 hours                                   mariadb
d49e0b00bf84        10.10.10.1:4000/kolla/centos-binary-memcached:4.0.3       "kolla_start"       10 hours ago        Up 10 hours                                   memcached
1a1599296c59        10.10.10.1:4000/kolla/centos-binary-keepalived:4.0.3      "kolla_start"       10 hours ago        Up 10 hours                                   keepalived
accc84f93171        10.10.10.1:4000/kolla/centos-binary-haproxy:4.0.3         "kolla_start"       10 hours ago        Up 10 hours                                   haproxy
f25d30f403d2        10.10.10.1:4000/kolla/centos-binary-cron:4.0.3            "kolla_start"       10 hours ago        Up 10 hours                                   cron
0be143a36b6d        10.10.10.1:4000/kolla/centos-binary-kolla-toolbox:4.0.3   "kolla_start"       10 hours ago        Up 10 hours                                   kolla_toolbox
2f667f97a160        10.10.10.1:4000/kolla/centos-binary-fluentd:4.0.3         "kolla_start"       10 hours ago        Up 10 hours                                   fluentd
[root@controller01 ~]# docker start -a  bootstrap_glance
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting file /etc/glance/glance-api.conf
INFO:__main__:Coping file from /var/lib/kolla/config_files/glance-api.conf to /etc/glance/glance-api.conf
INFO:__main__:Setting file /etc/glance/glance-api.conf owner to glance:glance
INFO:__main__:Setting file /etc/glance/glance-api.conf permission to 0600
ERROR:__main__:MissingRequiredSource: /var/lib/kolla/config_files/ceph.* file is not found
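Finding the exited containers by eye gets tedious on nodes with many services. The sketch below asks docker ps for tab-separated name and status columns through its --format option and keeps the names whose status starts with "Exited"; the function names are hypothetical, and list_exited_containers assumes the docker CLI is on the PATH.

```python
import subprocess

# Go template understood by `docker ps --format`: name, a tab, then status.
PS_FORMAT = "{{.Names}}\t{{.Status}}"


def exited_containers(ps_output):
    """Return container names whose status column starts with 'Exited'."""
    names = []
    for line in ps_output.splitlines():
        name, sep, status = line.partition("\t")
        if sep and status.startswith("Exited"):
            names.append(name)
    return names


def list_exited_containers():
    """Run `docker ps -a` on this node and return the exited container names."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", PS_FORMAT],
        check=True, capture_output=True, text=True,
    )
    return exited_containers(result.stdout)
```

Each name returned can then be passed to docker start -a as in the bootstrap_glance example above.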

"AttributeError: 'module' object has no attribute 'APIClient'" during kolla-build

If the following error appears during kolla build:

[root@node01-head tools]# kolla-build bifrost-deploy --type source --base centos
INFO:kolla.image.build:Found the docker image folder at /usr/share/kolla/docker
Traceback (most recent call last):
  File "/usr/bin/kolla-build", line 11, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/kolla/cmd/build.py", line 30, in main
    statuses = build.run_build()
  File "/usr/lib/python2.7/site-packages/kolla/image/build.py", line 1110, in run_build
    kolla = KollaWorker(conf)
  File "/usr/lib/python2.7/site-packages/kolla/image/build.py", line 586, in __init__
    self.dc = docker.APIClient(version='auto', **docker_kwargs)
AttributeError: 'module' object has no attribute 'APIClient'

Work around by removing docker-py and installing docker==2.4.

pip uninstall docker-py
pip install docker==2.4

If a later version of docker is installed the following error will be produced instead:

[root@node01-head tools]# kolla-build bifrost-deploy --type source --base centos
Traceback (most recent call last):
  File "/usr/bin/kolla-build", line 7, in <module>
    from kolla.cmd.build import main
  File "/usr/lib/python2.7/site-packages/kolla/cmd/build.py", line 26, in <module>
    from kolla.image import build
  File "/usr/lib/python2.7/site-packages/kolla/image/build.py", line 32, in <module>
    import docker
ImportError: No module named docker

Downgrade docker to work around:

pip install docker==2.4
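Both tracebacks above come down to what the docker Python module looks like in the interpreter that runs kolla-build. The following sketch distinguishes the two cases before a build is started; the function name docker_sdk_status and its return strings are made up for illustration.

```python
import importlib


def docker_sdk_status(module_name="docker"):
    """Classify the importable docker module.

    "missing"       the import fails (the ImportError case above)
    "no-apiclient"  the module imports but lacks APIClient (the AttributeError case)
    "ok"            the module imports and exposes APIClient
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "missing"
    return "ok" if hasattr(module, "APIClient") else "no-apiclient"
```

Run it with the same Python interpreter that kolla-build uses (Python 2.7 in the tracebacks above); anything other than "ok" suggests repeating the pip steps shown above.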

bifrost-base source image build failure

The current version of kolla in the pip repositories is affected by the following bug:

https://bugs.launchpad.net/kolla/+bug/1667308

Symptoms:

ERROR:kolla.image.build.bifrost-base:The command '/bin/sh -c bash -c './scripts/env-setup.sh && source ./env-vars && ansible-playbook -vvvv -i /bifrost/playbooks/inventory/target /bifrost/playbooks/install.yaml -e @/tmp/build_arg.yml && yum clean all'' returned a non-zero code: 1

Workaround - Apply the following patch to kolla/docker/bifrost/bifrost-base/Dockerfile.j2:

https://review.openstack.org/#/c/437974/2/docker/bifrost/bifrost-base/Dockerfile.j2

- RUN bash -c './scripts/env-setup.sh && source ./env-vars && \
+ RUN bash -c 'sed -e "s/\-\-force\-reinstall //g" -i /bifrost/playbooks/roles/bifrost-{ironic,keystone}-install/tasks/install.yml' \
    && bash -c './scripts/env-setup.sh && source ./env-vars && \
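The sed expression in the patch deletes every occurrence of "--force-reinstall " (including the trailing space) from the two install.yml task files that the brace expansion names. The sketch below reproduces that substitution in Python so its effect can be tried on a sample string; the function name strip_force_reinstall is made up for illustration, and the sample line in the usage note is invented, not taken from the bifrost playbooks.

```python
import re

# Same pattern as the sed expression: the flag followed by one space.
FORCE_REINSTALL = re.compile(r"--force-reinstall ")


def strip_force_reinstall(text):
    """Remove every '--force-reinstall ' occurrence, like sed's s/...//g."""
    return FORCE_REINSTALL.sub("", text)
```

For example, strip_force_reinstall("extra_args: --force-reinstall -c constraints.txt") yields "extra_args: -c constraints.txt". A flag with no trailing space is left untouched, exactly as with the sed pattern.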