VScaler: Troubleshooting Kolla issues
Latest revision as of 10:36, 14 March 2018
Log location
The logs are on the nodes under: /var/lib/docker/volumes/kolla_logs/_data/
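As a rough first pass before opening individual files, the sketch below counts ERROR lines per log file under that directory. The function name kolla_error_counts is made up for this page, and the pattern assumes the log line layout seen in the excerpts further down (timestamp, PID, then the level).

```shell
# Print "file:count" for every *.log file that contains ERROR lines.
# The log root defaults to the Kolla log volume; pass another path as $1.
kolla_error_counts() {
    local log_root="${1:-/var/lib/docker/volumes/kolla_logs/_data}"
    # grep -c prints a count per file; drop files with zero matches.
    grep -rc --include='*.log' ' ERROR ' "$log_root" 2>/dev/null | grep -v ':0$'
}
```

Running kolla_error_counts on a node with no argument scans the default location; the files with the highest counts are usually the ones worth reading first.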
When a service fails, you will find useful information in the Kolla logs for that service's container. For example, to check the logs of the nova-conductor service:
[root@head01 ~]# ssh controller01
[root@controller01-enp2s0 ~]# tail /var/lib/docker/volumes/kolla_logs/_data/nova/nova-conductor.log
Interface ansible_<if> does not exist
If you see a message of this sort in the kolla-ansible output, it most likely refers to a node whose interface name differs from the one set in the "network_interface" variable in /etc/kolla/globals.yml.
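To confirm the mismatch, compare the configured name with what the node actually has. The helpers below are a minimal sketch: configured_interface and node_has_interface are hypothetical names, the globals file is assumed to hold a plain `network_interface: "name"` line, and /sys/class/net is where Linux lists a host's interfaces.

```shell
# Print the value of network_interface from a Kolla globals file.
# Commented-out lines are ignored because they do not start with the key.
configured_interface() {
    local globals="${1:-/etc/kolla/globals.yml}"
    sed -n 's/^network_interface:[[:space:]]*"\{0,1\}\([^"#[:space:]]*\).*/\1/p' "$globals"
}

# Succeed if the named interface exists on this host.
# The second argument only exists so the check can be pointed at a test directory.
node_has_interface() {
    local iface="$1" sysfs_net="${2:-/sys/class/net}"
    [ -e "$sysfs_net/$iface" ]
}
```

Run configured_interface on the deployment host, then on the affected node run node_has_interface with that name; if it fails, ls /sys/class/net shows the names the node really uses.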
Solution
To get past this issue, add api_interface=ens5 (substituting the node's actual interface name) next to the node's name in the inventory file. In one case I needed to add tunnel_interface=ens5 as well. The error message tells you whether the tunnel interface or the API interface is the problem. The inventory file should look like this:
...
gpu01-ens5 tunnel_interface=ens5 api_interface=ens5
...
Connection refused errors in nova-conductor logs
During a deploy, the nova service wouldn't come up properly. The nova-conductor logs on the controller node that reported the error contained many errors like this:
ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.106:5672 is unreachable: [Errno 111] ECONNREFUSED
Solution
Disable SELinux on the controller nodes and reboot them.
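One way to make the change persistent is sketched below, assuming the standard /etc/selinux/config file found on CentOS. The function name disable_selinux_persistently is hypothetical; the optional argument exists only so the edit can be tried on a copy first.

```shell
# Set SELINUX=disabled in the SELinux config file, keeping a .bak copy.
# The pattern does not touch the SELINUXTYPE= line.
disable_selinux_persistently() {
    local cfg="${1:-/etc/selinux/config}"
    sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```

On each controller, getenforce shows the current mode. Run disable_selinux_persistently as root, then reboot; the new setting only takes effect after the reboot.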
Debugging containers that don't start
When a container fails to start, you can reproduce the error by finding the container's name with docker ps -a and then starting it with docker start -a, which attaches to its output:
[root@controller01 ~]# docker ps -a
CONTAINER ID   IMAGE                                                     COMMAND         CREATED        STATUS                    PORTS   NAMES
4de70ccf5a4d   10.10.10.1:4000/kolla/centos-binary-glance-api:4.0.3      "kolla_start"   10 hours ago   Exited (1) 10 hours ago           bootstrap_glance
10f4038a7d77   10.10.10.1:4000/kolla/centos-binary-keystone:4.0.3        "kolla_start"   10 hours ago   Up 10 hours                       keystone
e96bf1cb3258   10.10.10.1:4000/kolla/centos-binary-rabbitmq:4.0.3        "kolla_start"   10 hours ago   Up 10 hours                       rabbitmq
b0094b42cb75   10.10.10.1:4000/kolla/centos-binary-mariadb:4.0.3         "kolla_start"   10 hours ago   Up 10 hours                       mariadb
d49e0b00bf84   10.10.10.1:4000/kolla/centos-binary-memcached:4.0.3       "kolla_start"   10 hours ago   Up 10 hours                       memcached
1a1599296c59   10.10.10.1:4000/kolla/centos-binary-keepalived:4.0.3      "kolla_start"   10 hours ago   Up 10 hours                       keepalived
accc84f93171   10.10.10.1:4000/kolla/centos-binary-haproxy:4.0.3         "kolla_start"   10 hours ago   Up 10 hours                       haproxy
f25d30f403d2   10.10.10.1:4000/kolla/centos-binary-cron:4.0.3            "kolla_start"   10 hours ago   Up 10 hours                       cron
0be143a36b6d   10.10.10.1:4000/kolla/centos-binary-kolla-toolbox:4.0.3   "kolla_start"   10 hours ago   Up 10 hours                       kolla_toolbox
2f667f97a160   10.10.10.1:4000/kolla/centos-binary-fluentd:4.0.3         "kolla_start"   10 hours ago   Up 10 hours                       fluentd
[root@controller01 ~]# docker start -a bootstrap_glance
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting file /etc/glance/glance-api.conf
INFO:__main__:Coping file from /var/lib/kolla/config_files/glance-api.conf to /etc/glance/glance-api.conf
INFO:__main__:Setting file /etc/glance/glance-api.conf owner to glance:glance
INFO:__main__:Setting file /etc/glance/glance-api.conf permission to 0600
ERROR:__main__:MissingRequiredSource: /var/lib/kolla/config_files/ceph.* file is not found
"AttributeError: 'module' object has no attribute 'APIClient'" during kolla-build
If the following error appears during kolla-build:
[root@node01-head tools]# kolla-build bifrost-deploy --type source --base centos
INFO:kolla.image.build:Found the docker image folder at /usr/share/kolla/docker
Traceback (most recent call last):
  File "/usr/bin/kolla-build", line 11, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/kolla/cmd/build.py", line 30, in main
    statuses = build.run_build()
  File "/usr/lib/python2.7/site-packages/kolla/image/build.py", line 1110, in run_build
    kolla = KollaWorker(conf)
  File "/usr/lib/python2.7/site-packages/kolla/image/build.py", line 586, in __init__
    self.dc = docker.APIClient(version='auto', **docker_kwargs)
AttributeError: 'module' object has no attribute 'APIClient'
Work around by removing docker-py and installing docker==2.4.
pip uninstall docker-py
pip install docker==2.4
If a later version of the docker package is installed, the following error is produced instead:
[root@node01-head tools]# kolla-build bifrost-deploy --type source --base centos
Traceback (most recent call last):
  File "/usr/bin/kolla-build", line 7, in <module>
    from kolla.cmd.build import main
  File "/usr/lib/python2.7/site-packages/kolla/cmd/build.py", line 26, in <module>
    from kolla.image import build
  File "/usr/lib/python2.7/site-packages/kolla/image/build.py", line 32, in <module>
    import docker
ImportError: No module named docker
Downgrade docker to work around:
pip install docker==2.4
Waiting for nova-compute service up timing out due to rabbitmq breaking
Symptoms:
TASK [nova : Waiting for nova-compute service up] ****************************************************************************************************
FAILED - RETRYING: Waiting for nova-compute service up (20 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (19 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (18 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (17 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (16 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (15 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (14 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (13 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (12 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (11 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (10 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (9 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (8 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (7 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (6 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (5 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (4 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (3 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (2 retries left).
FAILED - RETRYING: Waiting for nova-compute service up (1 retries left).
fatal: [node01 -> node01]: FAILED! => {"attempts": 20, "changed": false, "cmd": ["docker", "exec", "kolla_toolbox", "openstack", "--os-interface", "internal", "--os-auth-url", "http://10.10.11.254:35357", "--os-identity-api-version", "3", "--os-project-domain-name", "default", "--os-tenant-name", "admin", "--os-username", "admin", "--os-password", "uBz6L2iZDlkKIv8kCaOzoFVjGSOHRf6x9gsgGNDp", "--os-user-domain-name", "default", "compute", "service", "list", "-f", "json", "--service", "nova-compute"], "delta": "0:00:02.176113", "end": "2018-03-13 15:59:06.604696", "rc": 0, "start": "2018-03-13 15:59:04.428583", "stderr": "", "stderr_lines": [], "stdout": "[]", "stdout_lines": ["[]"]}
/var/lib/docker/volumes/kolla_logs/_data/nova/nova-compute.log
2018-03-13 16:58:38.788 7 ERROR oslo.messaging._drivers.impl_rabbit [req-803bcc07-60c9-4129-ab06-40809c99e46d - - - - -] [89b4b687-89f7-42fe-92a1-9085e6fe35a7] AMQP server on 10.10.11.3:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None: error: [Errno 111] ECONNREFUSED
Solution:
RabbitMQ server names must match the node hostnames, and the connection has to go through the correct interface.
For example:
root@headnode:~# cat /etc/hosts
10.10.10.2 node01
10.10.11.2 node01-eth1
Assuming Kolla is configured to use the eth1 interface, the hostname used in the inventory file will be "node01-eth1" in this case. The node's hostname must match the one in the inventory.
root@node01:~# hostnamectl set-hostname node01-eth1
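A quick consistency check along these lines can catch the mismatch before a deploy. This is a sketch only: hosts_ip_for and hostname_matches are hypothetical helper names, and the optional trailing arguments exist so the functions can be tried against a sample file rather than the live system.

```shell
# Print the IP that a hosts file maps to the given name (nothing if absent).
hosts_ip_for() {
    local name="$1" hosts_file="${2:-/etc/hosts}"
    awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }' "$hosts_file"
}

# Succeed when the node's hostname equals the name used in the inventory.
# The second argument overrides the detected hostname.
hostname_matches() {
    local inventory_name="$1" actual="${2:-$(hostname)}"
    [ "$actual" = "$inventory_name" ]
}
```

With the example above, hosts_ip_for node01-eth1 on the head node should print 10.10.11.2, the address on the interface Kolla uses, and hostname_matches node01-eth1 on node01 should succeed once the hostname has been set.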