Difference between revisions of "FreeIPA Command and Cheatsheet"
Latest revision as of 12:19, 10 January 2024
common problems
hpc slurm-appliance force reintegrate node with freeipa
You might need to do this when moving from OLD installs which didn't use DNS for cluster LDAP server lookups.
A common symptom is that not all users are resolved correctly by SSSD, and it is different users on different nodes.
Unresolved users show up as bare numeric UIDs/GIDs, like this:
[root@small-001 ~]# ls -l /home
total 16
drwx------. 3 admin admins 119 May 5 2022 admin
drwx------. 26 antony antony 4096 Dec 20 22:19 antony
drwx------. 6 bdigby bdigby 172 Apr 20 2022 bdigby
drwx------. 17 2046 2046 4096 Dec 14 19:20 chiggins
drwx------. 12 2036 2036 322 Jan 9 15:52 dpower
drwx------. 8 2044 2044 212 Nov 8 15:39 gcarancini
drwxr-xr-x. 20 hanthony hanthony 4096 Dec 19 07:14 hanthony
drwxr-xr-x. 6 root root 93 Dec 20 17:54 headnode-original
drwx------. 4 2008 2008 167 Dec 20 22:20 jack
drwx------. 7 jvergaj jvergaj 261 Jun 20 2022 jvergaj
drwx------. 3 2018 2018 119 May 26 2022 kguerrero
drwx------. 3 2011 2011 119 May 26 2022 kguerrerovazquez
drwx------. 19 kryan kryan 4096 Dec 20 22:20 kryan
drwx------. 3 2040 2040 165 Nov 8 18:02 lryan
drwx------. 5 2042 2042 158 Nov 8 15:37 mali
drwx------. 4 2014 2014 151 Dec 20 22:21 modalaigh
drwx------. 8 2038 2038 244 Aug 29 13:35 mvillacampa
drwx------. 7 2035 2035 193 Jan 8 12:14 mwalsh
drwx------. 3 2043 2043 119 Nov 8 15:31 mzainula
drwx------. 3 2050 2050 119 Dec 7 13:47 ouso
drwx------. 7 2048 2048 234 Nov 8 17:43 oza
drwx------. 3 2013 2013 94 May 26 2022 pasat
drwx------. 4 2024 2024 139 Nov 8 16:18 pmurchan
drwx------. 3 rsupport rsupport 119 Apr 14 2022 rsupport
drwx------. 6 2041 2041 177 Nov 8 16:06 sgordon
drwxrwx---. 4 root testshared 83 Dec 20 18:21 shared
drwx------. 9 smatthews smatthews 320 Jan 6 00:03 smatthews
drwx------. 3 2017 2017 139 May 26 2022 ssmith
drwx------. 6 test test 192 Oct 18 22:07 test
drwx------. 4 2052 2052 139 Nov 8 16:29 training2
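With many home directories it is easy to miss a numeric owner by eye. The sketch below is not part of the original procedure; it is a small hypothetical helper (the name unresolved_owners is made up here) that filters ls -l output and prints only the entries whose owner column is a bare number. It assumes entry names contain no spaces.

```shell
# Hypothetical helper, not from the slurm appliance: read `ls -l` output on
# stdin and print the names of entries whose owner column is a bare number,
# i.e. a UID the node could not map back to a user name.
# Assumes entry names contain no spaces (the last field is taken as the name).
unresolved_owners() {
    awk 'NF >= 9 && $3 ~ /^[0-9]+$/ { print $NF }'
}

# Usage on a node:
#   ls -l /home | unresolved_owners
```

Running this on each compute node should show which nodes resolve which users, which is the "different users on different nodes" symptom described above.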
overview
The process is to:
- drain the node
- ssh into the compute node, uninstall the ipa client and let it reboot
- ssh into the controller node and remove or move the backup of the old keytab
- rerun the playbook and let it reinstall the ipa client using a one-time password
In this example we are going to do it on small-002.
drain node
[antony@login-1 ~]$ sudo scontrol update nodename=small-002 stat=drain reason="update sssd config"
[sudo] password for antony:
[antony@login-1 ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq* up 60-00:00:0 1 drain small-002
defq* up 60-00:00:0 3 idle small-[001,003-004]
uninstall client on compute node
[rocky@login-1 ~]$ ssh small-002
[rocky@small-002 ~]$ sudo ipa-client-install --uninstall
Unenrolling client from IPA server
Unenrolling host failed: Error obtaining initial credentials: Cannot contact any KDC for requested realm.

Removing Kerberos service principals from /etc/krb5.keytab
Disabling client Kerberos and LDAP configurations
Redundant SSSD configuration file /etc/sssd/sssd.conf was moved to /etc/sssd/sssd.conf.deleted
Restoring client configuration files
Unconfiguring the NIS domain.
nscd daemon is not installed, skip configuration
nslcd daemon is not installed, skip configuration
Systemwide CA database updated.
Client uninstall complete.
The original nsswitch.conf configuration has been restored.
You may need to restart services or reboot the machine.
Do you want to reboot the machine? [no]: yes
client_loop: send disconnect: Broken pipe
[rocky@login-1 ~]$
remove or move the backup of the old keytab
[rocky@login-1 ~]$ ssh controller
Last login: Tue Jan 9 22:49:43 2024 from 10.0.3.240
[rocky@controller ~]$ cd /var/lib/state/freeipa/
-bash: cd: /var/lib/state/freeipa/: Permission denied
[rocky@controller ~]$ sudo su -
Last login: Wed Jan 10 10:24:29 GMT 2024 on pts/0
[root@controller ~]# cd /var/lib/state/freeipa/
[root@controller freeipa]# rm -rf small-002
[root@controller freeipa]# logout
[rocky@controller ~]$ logout
Connection to controller closed.
[rocky@login-1 ~]$
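The session above removes the backup outright. If you would rather move it, as the heading allows, something like the following sketch could be used as root on the controller. The function name and the default archive directory /root/freeipa-keytab-archive are assumptions for illustration only; the state directory is the one shown above. The backup is moved out of the state directory on the assumption that the playbook decides how to enrol by looking for a per-host entry there.

```shell
# Hypothetical helper: move a node's backed-up keytab directory out of the
# state directory instead of deleting it.
#   $1  node name, e.g. small-002
#   $2  state directory (default taken from the session above)
#   $3  archive directory (assumed path, pick your own)
archive_keytab_backup() {
    node=$1
    state_dir=${2:-/var/lib/state/freeipa}
    archive_dir=${3:-/root/freeipa-keytab-archive}
    if [ -z "$node" ] || [ ! -e "$state_dir/$node" ]; then
        echo "no keytab backup found for '$node' in $state_dir" >&2
        return 1
    fi
    mkdir -p "$archive_dir" || return 1
    # A timestamp suffix keeps repeated archives of the same node apart.
    mv "$state_dir/$node" "$archive_dir/$node.$(date +%Y%m%d%H%M%S)"
}

# Usage (as root on the controller):
#   archive_keytab_backup small-002
```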
rerun the playbook
NOTE: run this from the deploy node!!!
[antony@cix-slurm-deploy ~]$ cd ansible-slurm-appliance/
[antony@cix-slurm-deploy ansible-slurm-appliance]$ source source_me.sh
Setting APPLIANCES_ENVIRONMENT_ROOT to /home/antony/ansible-slurm-appliance/environments/production
Setting APPLIANCES_REPO_ROOT to /home/antony/ansible-slurm-appliance
Setting ANSIBLE_CONFIG to /home/antony/ansible-slurm-appliance/environments/production/ansible.cfg
production/ (slurm-appliance) [antony@cix-slurm-deploy ansible-slurm-appliance]$
production/ (slurm-appliance) [antony@cix-slurm-deploy ansible-slurm-appliance]$ ansible-playbook -e @environments/production/cluster_extra_vars.yml ansible/site.yml -t freeipa
[DEPRECATION WARNING]: "include" is deprecated, use include_tasks/import_tasks instead. This feature will be removed in version 2.16. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.

PLAY [localhost] ****************************************************************************************************************************************************************************************************************************
[WARNING]: While constructing a mapping from /home/antony/ansible-slurm-appliance/environments/production/inventory/group_vars/all/openondemend.yml, line 2, column 1, found a duplicate dict key (openondemand_ldap_search_username). Using
last defined value only.
[WARNING]: While constructing a mapping from /home/antony/ansible-slurm-appliance/environments/production/inventory/group_vars/all/openondemend.yml, line 2, column 1, found a duplicate dict key (openondemand_auth). Using last defined
value only.

PLAY [Ensure control node is in inventory] **************************************************************************************************************************************************************************************************

TASK [Gathering Facts] **********************************************************************************************************************************************************************************************************************
ok: [login-1]
ok: [controller]
ok: [small-002]
ok: [small-001]
ok: [small-004]
ok: [small-003]

<snip>

TASK [freeipa : Add host to IPA] ************************************************************************************************************************************************************************************************************
skipping: [login-1]
skipping: [controller]
skipping: [small-001]
skipping: [small-003]
skipping: [small-004]
changed: [small-002]

TASK [freeipa : Set fact for ipa host password] *********************************************************************************************************************************************************************************************
skipping: [login-1]
skipping: [controller]
skipping: [small-001]
ok: [small-002]
skipping: [small-003]
skipping: [small-004]

TASK [freeipa : Renrole with FreeIPA using backed-up keytab] ********************************************************************************************************************************************************************************
skipping: [small-002]
ok: [login-1]
ok: [small-004]
ok: [small-001]
ok: [small-003]
ok: [controller]

TASK [freeipa : Enrole with FreeIPA using random password] **********************************************************************************************************************************************************************************
skipping: [login-1]
skipping: [controller]
skipping: [small-001]
skipping: [small-003]
skipping: [small-004]
changed: [small-002]

<snip>
PLAY RECAP **********************************************************************************************************************************************************************************************************************************
controller : ok=12 changed=0 unreachable=0 failed=0 skipped=17 rescued=0 ignored=0
login-1 : ok=12 changed=0 unreachable=0 failed=0 skipped=18 rescued=0 ignored=0
small-001 : ok=12 changed=1 unreachable=0 failed=0 skipped=17 rescued=0 ignored=0
small-002 : ok=13 changed=4 unreachable=0 failed=0 skipped=16 rescued=0 ignored=0
small-003 : ok=12 changed=1 unreachable=0 failed=0 skipped=17 rescued=0 ignored=0
small-004 : ok=12 changed=0 unreachable=0 failed=0 skipped=17 rescued=0 ignored=0
done
confirm fixed by ls -l /home on compute
[rocky@login-1 ~]$ ssh small-002 ls -l /home
total 16
drwx------. 3 admin admins 119 May 5 2022 admin
drwx------. 26 antony antony 4096 Jan 10 11:36 antony
drwx------. 6 bdigby bdigby 172 Apr 20 2022 bdigby
drwx------. 17 chiggins chiggins 4096 Dec 14 19:20 chiggins
drwx------. 12 dpower dpower 322 Jan 9 15:52 dpower
drwx------. 8 gcarancini gcarancini 212 Nov 8 15:39 gcarancini
drwxr-xr-x. 20 hanthony hanthony 4096 Dec 19 07:14 hanthony
drwxr-xr-x. 6 root root 93 Dec 20 17:54 headnode-original
drwx------. 4 jack jack 167 Dec 20 22:20 jack
drwx------. 7 jvergaj jvergaj 261 Jun 20 2022 jvergaj
drwx------. 3 kguerrero kguerrero 119 May 26 2022 kguerrero
drwx------. 3 2011 2011 119 May 26 2022 kguerrerovazquez
drwx------. 19 kryan kryan 4096 Dec 20 22:20 kryan
drwx------. 3 lryan lryan 165 Nov 8 18:02 lryan
drwx------. 5 mali mali 158 Nov 8 15:37 mali
drwx------. 4 modalaigh modalaigh 151 Dec 20 22:21 modalaigh
drwx------. 8 mvillacampa mvillacampa 244 Aug 29 13:35 mvillacampa
drwx------. 7 mwalsh mwalsh 193 Jan 8 12:14 mwalsh
drwx------. 3 mzainula mzainula 119 Nov 8 15:31 mzainula
drwx------. 3 ouso ouso 119 Dec 7 13:47 ouso
drwx------. 7 oza oza 234 Nov 8 17:43 oza
drwx------. 3 pasat pasat 94 May 26 2022 pasat
drwx------. 4 pmurchan pmurchan 139 Nov 8 16:18 pmurchan
drwx------. 3 rsupport rsupport 119 Apr 14 2022 rsupport
drwx------. 6 sgordon sgordon 177 Nov 8 16:06 sgordon
drwxrwx---. 4 root testshared 83 Dec 20 18:21 shared
drwx------. 9 smatthews smatthews 320 Jan 6 00:03 smatthews
drwx------. 3 ssmith ssmith 139 May 26 2022 ssmith
drwx------. 6 test test 192 Oct 18 22:07 test
drwx------. 4 training2 training2 139 Nov 8 16:29 training2
fixed
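The node was drained at the start and the steps above do not show it being put back into service. A minimal sketch of that last step, assuming the node is still in the drain state and that you have the same sudo rights used for the drain; the wrapper name resume_node is made up here, while scontrol update with state=resume is the standard Slurm way to undrain a node.

```shell
# Hypothetical wrapper: return a drained node to service once /home looks right.
#   $1  node name, e.g. small-002
resume_node() {
    if [ -z "$1" ]; then
        echo "usage: resume_node <nodename>" >&2
        return 1
    fi
    sudo scontrol update nodename="$1" state=resume
}

# Usage (from the login node), then check with sinfo:
#   resume_node small-002
```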
All ipa commands return ipa: ERROR: did not receive Kerberos credentials when I'm root it's broken!
This is normal: you need to authenticate as a user with permission to run these commands (e.g. an admin member). To fix it, run kinit <username> and enter the password. If you have configured 2FA on a user you WILL need to do that too.
the following example is in the freeipa podman container in the HPC stack
[root@freeipa-1 ~]# podman exec -it freeipa bash
[root@freeipa-1 /]# ipa user-find
ipa: ERROR: did not receive Kerberos credentials
[root@freeipa-1 /]# kinit antony
Password for antony@CLUSTER.INTERNAL:
[root@freeipa-1 /]# ipa user-find antony
--------------
1 user matched
--------------
User login: antony
First name: Antony
Last name: Cleave
Home directory: /define/home/antony
Login shell: /bin/bash
Principal name: antony@CLUSTER.INTERNAL
Principal alias: antony@CLUSTER.INTERNAL
Email address: antony@cluster.internal
UID: 2006
GID: 2006
SSH public key fingerprint: SHA256:/M1Hr8rxG+Im8OOiPeDAqJNEDlyvvpTpAa8hnPJvEaI (ssh-ed25519)
Account disabled: False
----------------------------
Number of entries returned 1
----------------------------
"But I don't want to type it." That's a bit of a tough cookie, as this is how Kerberos works. If you have the password in a file, however, you can cat it into kinit like so:
[root@freeipa-1 /]# kdestroy -A
[root@freeipa-1 /]# klist
klist: No credentials cache found (filename: /tmp/krb5cc_0)
[root@freeipa-1 /]# ipa user-find antony
ipa: ERROR: did not receive Kerberos credentials
Password for antony@CLUSTER.INTERNAL:
[root@freeipa-1 /]# ipa user-find antony
--------------
1 user matched
--------------
User login: antony
First name: Antony
Last name: Cleave
Home directory: /define/home/antony
Login shell: /bin/bash
Principal name: antony@CLUSTER.INTERNAL
Principal alias: antony@CLUSTER.INTERNAL
Email address: antony@cluster.internal
UID: 2006
GID: 2006
SSH public key fingerprint: SHA256:/M1Hr8rxG+Im8OOiPeDAqJNEDlyvvpTpAa8hnPJvEaI (ssh-ed25519)
Account disabled: False
----------------------------
Number of entries returned 1
----------------------------
[root@freeipa-1 /]#
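The exact command that fed the password to kinit is not visible in the captured session. As a rough sketch of the idea, assuming the MIT kinit shipped in the container (which reads the password from standard input when stdin is not a terminal) and a hypothetical password file path, it could look like this. The function name is made up for illustration, and this will not help for accounts with 2FA.

```shell
# Hypothetical helper: get a Kerberos ticket using a password kept in a file.
#   $1  principal / user name, e.g. antony
#   $2  path to a file whose first line is the password (assumed location)
# Keep the file readable by root only (chmod 600).
kinit_from_file() {
    user=$1
    pwfile=$2
    if [ -z "$user" ] || [ ! -r "$pwfile" ]; then
        echo "usage: kinit_from_file <user> <readable-password-file>" >&2
        return 1
    fi
    # Same effect as: cat "$pwfile" | kinit "$user"
    kinit "$user" < "$pwfile"
}

# Usage (the path is an example, not from the original page):
#   kinit_from_file antony /root/.antony-password
```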
I can't login to the new node I added
FreeIPA allows you to configure login auth to a whole cluster from one place. You are probably hitting a host based access control restriction. Check /var/log/secure or system journal for errors like this:
Nov 07 09:52:25 login-1.cluster.internal sshd[88168]: Authorized to antony, krb5 principal antony@CLUSTER.INTERNAL (ssh_gssapi_krb5_cmdok)
Nov 07 09:52:25 login-1.cluster.internal sshd[88168]: pam_sss(sshd:account): Access denied for user antony: 6 (Permission denied)
Nov 07 09:52:25 login-1.cluster.internal sshd[88168]: fatal: Access denied for user antony by PAM account configuration [preauth]
see that permission denied. . . that's the killer here
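To pull just these denials out of a busy log, a small filter along the following lines may help. It is only a sketch: the function name is hypothetical, and it matches the pam_sss(sshd:account) wording exactly as it appears in the log lines above.

```shell
# Hypothetical helper: read sshd log lines on stdin and print, once each,
# the users that pam_sss refused at the account stage.
denied_users() {
    sed -n 's/.*pam_sss(sshd:account): Access denied for user \([^:]*\):.*/\1/p' | sort -u
}

# Usage on the host you cannot log in to:
#   denied_users < /var/log/secure
#   journalctl -u sshd --since today | denied_users
```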
How to fix this is system dependent, but on the NUIG cluster I have modified the allow_all rule to allow access only to hosts in the cluster.internal hostgroup. This is NOT the same as the DNS zone. We can check the HBAC rules here:
[root@login-1 ~]# ipa hbacrule-find
--------------------
4 HBAC rules matched
--------------------
Rule name: allow_admins
Service category: all
Enabled: True

Rule name: allow_all
User category: all
Service category: all
Description: Allow all users to access any host from any host in cluster.internal hostgroup keep it up to date!
Enabled: True

Rule name: allow_jack
Service category: all
Enabled: True

Rule name: allow_systemd-user
User category: all
Host category: all
Description: Allow pam_systemd to run user@.service to create a system user session
Enabled: True
----------------------------
Number of entries returned 4
----------------------------
and you can see the settings for each of those like so:
[root@login-1 ~]# ipa hbacrule-show allow_all
Rule name: allow_all
User category: all
Service category: all
Description: Allow all users to access any host from any host in cluster.internal hostgroup keep it up to date!
Enabled: True
Host Groups: cluster.internal
See that host group! The rule only covers hosts in that group, and that's why antony can't log in to login-1. Check its members:
[root@login-1 ~]# ipa hostgroup-show cluster.internal
Host-group: cluster.internal
Member hosts: head.cluster.internal, small-001.cluster.internal, small-003.cluster.internal, small-002.cluster.internal, small-004.cluster.internal, test-001.cluster.internal
login-1 is missing, so let's fix that by adding it to the hostgroup:
[root@login-1 ~]# ipa hostgroup-add-member cluster.internal --hosts=login-1.cluster.internal
Host-group: cluster.internal
Member hosts: head.cluster.internal, small-001.cluster.internal, small-003.cluster.internal, small-002.cluster.internal, small-004.cluster.internal, test-001.cluster.internal, login-1.cluster.internal
Member of HBAC rule: allow_all, allow_admins
-------------------------
Number of members added 1
-------------------------
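Before asking the user to try again, you can ask FreeIPA itself whether the rules now let them in. The sketch below wraps ipa hbactest, which simulates an HBAC decision for a user, host and service. The wrapper name is made up here; it assumes you have already run kinit as an admin, and that sshd is the service being tested, as in the log above.

```shell
# Hypothetical wrapper around `ipa hbactest`: would this user be allowed to
# ssh to this host under the current HBAC rules?
#   $1  user, e.g. antony
#   $2  fully qualified host name, e.g. login-1.cluster.internal
check_ssh_access() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: check_ssh_access <user> <host-fqdn>" >&2
        return 1
    fi
    ipa hbactest --user="$1" --host="$2" --service=sshd
}

# Usage:
#   check_ssh_access antony login-1.cluster.internal
```

If the test reports access granted but the login still fails, the client may simply be holding cached rules for a while.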
I can't get my replica container to start, it gets partway and silently fails
Sometimes this happens; however, I spent hours bashing my head against a wall so you don't have to.
I found that no matter what options were passed to the container it would just get stuck. So I used the following commands to wipe EVERYTHING, start again and jump into the container.
DO NOT RUN THIS ON THE LAST FUNCTIONING IPA SERVER WITH CA
podman rm -f freeipa; rm -rf /var/lib/state ; mkdir /var/lib/state ; podman run --network=host --detach --name freeipa --replace --user root --volume /var/lib/state:/data:Z -it docker.io/freeipa/freeipa-server:rocky-8 ipa-replica-install --domain cluster.internal --server head.cluster.internal --admin-password=REDACTED
podman exec -it freeipa bash
Once inside I could see that the ipa client was functional, so I used kinit to become an admin user, manually added the server to the ipaservers host group and completed the ipa replica install as per this guide:
https://www.freeipa.org/page/V4/Replica_Setup
this is approx what I did (I lost the output)
kinit antony
ipa hostgroup-add-member ipaservers --hosts freeipa-1.cluster.internal
ipa-replica-install --setup-ca --realm CLUSTER.INTERNAL --domain cluster.internal --setup-dns --auto-reverse --auto-forwarders --no-ui-redirect
At this point there is a stupid prompt saying that it can't use itself to resolve DNS. . . yep, it ain't set up yet, so say yes to continue. It should then eventually complete and you have a functioning container.
OK, so maybe you don't want that admin password in the command line of the container for the world + dog to read. Let's fix that. To fix it we:
- delete the container but not the state folder
- create a new container without the password (there is a keytab in the state folder that takes its place)
- create a systemd unit file to start this on boot, as it gives us more control over failures than an always-restart podman container (i.e. only retry 10 times and give up)
- start the service
podman stop freeipa
podman create --network=host --name freeipa --user root --volume /var/lib/state:/data:Z docker.io/freeipa/freeipa-server:rocky-8 ipa-replica-install --domain cluster.internal --server head.cluster.internal --setup-ca --principal=admin --setup-dns --auto-reverse --auto-forwarders --no-ui-redirect
[root@freeipa-1 ~]# cat >/usr/lib/systemd/system/ipa-podman.service <<EOF
# ipa-podman.service
[Unit]
Description=Podman container-freeipa-replica.service
Documentation=man:podman-generate-systemd(1)
Wants=network.target
After=network-online.target
[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=on-failure
RestartSec=10s
ExecStart=/usr/bin/podman start "freeipa"
ExecStop=/usr/bin/podman stop --ignore "freeipa" -t 30
KillMode=no
User=root
Group=root
TimeoutStartSec=180
Type=forking
[Install]
WantedBy=multi-user.target default.target
EOF
systemctl start ipa-podman
systemctl enable ipa-podman
and it's been running for an hour while I write this wiki page
[root@freeipa-1 ~]# systemctl status ipa-podman
● ipa-podman.service - Podman container-freeipa-replica.service
Loaded: loaded (/usr/lib/systemd/system/ipa-podman.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2023-11-01 22:07:11 GMT; 1h 3min ago
Docs: man:podman-generate-systemd(1)
Main PID: 31664 (conmon)
Tasks: 1 (limit: 49024)
Memory: 1.2M
CGroup: /system.slice/ipa-podman.service
└─31664 /usr/bin/conmon --api-version 1 -c 9e30b6ca044debd861c5f18ee8ca5f04b1b2c54eeafca39e4515048e4a89f7b4 -u 9e30b6ca044debd861c5f18ee8ca5f04b1b2c54eeafca39e4515048e4a89f7b4 -r /usr/bin/runc -b /var/lib/containers/storage/o>
Nov 01 22:07:10 freeipa-1.cluster.internal systemd[1]: Starting Podman container-freeipa-replica.service...
Nov 01 22:07:11 freeipa-1.cluster.internal podman[31624]: freeipa
Nov 01 22:07:11 freeipa-1.cluster.internal systemd[1]: Started Podman container-freeipa-replica.service.
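The unit file above mentions giving up after 10 retries but does not set an explicit limit itself. One way to get that behaviour, sketched here as an assumption rather than what was actually deployed, is a systemd drop-in that sets StartLimitBurst and StartLimitIntervalSec in the [Unit] section. The function name, the drop-in file name and the 300 second window are illustrative choices; the window just needs to be long enough to fit 10 restarts at RestartSec=10s.

```shell
# Hypothetical helper: write a drop-in that makes systemd stop restarting
# ipa-podman after 10 start attempts within 5 minutes.
#   $1  drop-in directory (default is the standard location for this unit)
write_restart_limit() {
    dir=${1:-/etc/systemd/system/ipa-podman.service.d}
    mkdir -p "$dir" || return 1
    cat > "$dir/restart-limit.conf" <<'EOF'
[Unit]
StartLimitIntervalSec=300
StartLimitBurst=10
EOF
}

# Usage (as root), then reload systemd so it picks up the drop-in:
#   write_restart_limit
#   systemctl daemon-reload
```

A drop-in is used instead of editing the unit so the original file stays as written above.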