FreeIPA Command and Cheatsheet


common problems

hpc slurm-appliance force reintegrate node with freeipa

You might need to do this when moving from old installs which didn't use DNS for cluster LDAP server lookups.

A symptom of this problem is that not all users are resolved correctly by SSSD, and it is different users on different nodes.

like this:

[root@small-001 ~]# ls -l /home
total 16
drwx------.  3 admin     admins      119 May  5  2022 admin
drwx------. 26 antony    antony     4096 Dec 20 22:19 antony
drwx------.  6 bdigby    bdigby      172 Apr 20  2022 bdigby
drwx------. 17      2046       2046 4096 Dec 14 19:20 chiggins
drwx------. 12      2036       2036  322 Jan  9 15:52 dpower
drwx------.  8      2044       2044  212 Nov  8 15:39 gcarancini
drwxr-xr-x. 20 hanthony  hanthony   4096 Dec 19 07:14 hanthony
drwxr-xr-x.  6 root      root         93 Dec 20 17:54 headnode-original
drwx------.  4      2008       2008  167 Dec 20 22:20 jack
drwx------.  7 jvergaj   jvergaj     261 Jun 20  2022 jvergaj
drwx------.  3      2018       2018  119 May 26  2022 kguerrero
drwx------.  3      2011       2011  119 May 26  2022 kguerrerovazquez
drwx------. 19 kryan     kryan      4096 Dec 20 22:20 kryan
drwx------.  3      2040       2040  165 Nov  8 18:02 lryan
drwx------.  5      2042       2042  158 Nov  8 15:37 mali
drwx------.  4      2014       2014  151 Dec 20 22:21 modalaigh
drwx------.  8      2038       2038  244 Aug 29 13:35 mvillacampa
drwx------.  7      2035       2035  193 Jan  8 12:14 mwalsh
drwx------.  3      2043       2043  119 Nov  8 15:31 mzainula
drwx------.  3      2050       2050  119 Dec  7 13:47 ouso
drwx------.  7      2048       2048  234 Nov  8 17:43 oza
drwx------.  3      2013       2013   94 May 26  2022 pasat
drwx------.  4      2024       2024  139 Nov  8 16:18 pmurchan
drwx------.  3 rsupport  rsupport    119 Apr 14  2022 rsupport
drwx------.  6      2041       2041  177 Nov  8 16:06 sgordon
drwxrwx---.  4 root      testshared   83 Dec 20 18:21 shared
drwx------.  9 smatthews smatthews   320 Jan  6 00:03 smatthews
drwx------.  3      2017       2017  139 May 26  2022 ssmith
drwx------.  6 test      test        192 Oct 18 22:07 test
drwx------.  4      2052       2052  139 Nov  8 16:29 training2
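A quick way to pick the broken entries out of output like the above is to filter for owners that are raw numeric UIDs. This is just a sketch on sample lines; on a real node you would pipe `ls -l /home` into the same awk filter, and `getent passwd <uid>` then tells you whether SSSD can resolve that UID at all.

```shell
# Sample lines standing in for real `ls -l /home` output
sample='drwx------. 17 2046 2046 4096 Dec 14 19:20 chiggins
drwx------.  3 admin admins 119 May  5 2022 admin'

# An owner column ($3) that is all digits means SSSD failed to map the UID to a name
printf '%s\n' "$sample" | awk '$3 ~ /^[0-9]+$/ {print $3, $NF}'
# → 2046 chiggins
```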

overview

The process is to

  1. drain the node
  2. SSH into the compute node, uninstall the IPA client and let it reboot
  3. SSH into the controller node and remove (or move) the backup of the old keytab
  4. rerun the playbook and let it reinstall the IPA client using a one-time password

in this example we are going to do it on small-002
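Condensed, the whole procedure looks roughly like this (hostnames and paths match this example cluster; the detailed transcripts for each step follow below, so don't paste this blindly):

```shell
# 1. drain the node (from a login node)
sudo scontrol update nodename=small-002 state=drain reason="reintegrate with freeipa"

# 2. uninstall the ipa client on the compute node (it will offer to reboot)
ssh small-002 sudo ipa-client-install --uninstall

# 3. remove the backup of the old keytab on the controller
ssh controller sudo rm -rf /var/lib/state/freeipa/small-002

# 4. rerun the playbook from the deploy node
ansible-playbook -e @environments/production/cluster_extra_vars.yml ansible/site.yml -t freeipa
```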

drain node

[antony@login-1 ~]$ sudo scontrol update nodename=small-002 state=drain reason="update sssd config"
[sudo] password for antony:
[antony@login-1 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
defq*        up 60-00:00:0      1  drain small-002
defq*        up 60-00:00:0      3   idle small-[001,003-004]

uninstall client on compute node

[rocky@login-1 ~]$ ssh small-002
[rocky@small-002 ~]$ sudo ipa-client-install --uninstall
Unenrolling client from IPA server
Unenrolling host failed: Error obtaining initial credentials: Cannot contact any KDC for requested realm.

Removing Kerberos service principals from /etc/krb5.keytab
Disabling client Kerberos and LDAP configurations
Redundant SSSD configuration file /etc/sssd/sssd.conf was moved to /etc/sssd/sssd.conf.deleted
Restoring client configuration files
Unconfiguring the NIS domain.
nscd daemon is not installed, skip configuration
nslcd daemon is not installed, skip configuration
Systemwide CA database updated.
Client uninstall complete.
The original nsswitch.conf configuration has been restored.
You may need to restart services or reboot the machine.
Do you want to reboot the machine? [no]: yes
client_loop: send disconnect: Broken pipe
[rocky@login-1 ~]$

remove or move the backup of the old keytab

[rocky@login-1 ~]$ ssh controller
Last login: Tue Jan  9 22:49:43 2024 from 10.0.3.240
[rocky@controller ~]$ cd /var/lib/state/freeipa/
-bash: cd: /var/lib/state/freeipa/: Permission denied
[rocky@controller ~]$ sudo su -
Last login: Wed Jan 10 10:24:29 GMT 2024 on pts/0
[root@controller ~]# cd /var/lib/state/freeipa/
[root@controller freeipa]# rm -rf small-002
[root@controller freeipa]# logout
[rocky@controller ~]$ logout
Connection to controller closed.
[rocky@login-1 ~]$

rerun the playbook

NOTE: run this from the deploy node!

[antony@cix-slurm-deploy ~]$ cd ansible-slurm-appliance/
[antony@cix-slurm-deploy ansible-slurm-appliance]$ source source_me.sh
Setting APPLIANCES_ENVIRONMENT_ROOT to /home/antony/ansible-slurm-appliance/environments/production
Setting APPLIANCES_REPO_ROOT to /home/antony/ansible-slurm-appliance
Setting ANSIBLE_CONFIG to /home/antony/ansible-slurm-appliance/environments/production/ansible.cfg
production/ (slurm-appliance) [antony@cix-slurm-deploy ansible-slurm-appliance]$
production/ (slurm-appliance) [antony@cix-slurm-deploy ansible-slurm-appliance]$ ansible-playbook -e @environments/production/cluster_extra_vars.yml ansible/site.yml -t freeipa
[DEPRECATION WARNING]: "include" is deprecated, use include_tasks/import_tasks instead. This feature will be removed in version 2.16. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.

PLAY [localhost] ****************************************************************************************************************************************************************************************************************************
[WARNING]: While constructing a mapping from /home/antony/ansible-slurm-appliance/environments/production/inventory/group_vars/all/openondemend.yml, line 2, column 1, found a duplicate dict key (openondemand_ldap_search_username). Using
last defined value only.
[WARNING]: While constructing a mapping from /home/antony/ansible-slurm-appliance/environments/production/inventory/group_vars/all/openondemend.yml, line 2, column 1, found a duplicate dict key (openondemand_auth). Using last defined
value only.

PLAY [Ensure control node is in inventory] **************************************************************************************************************************************************************************************************

TASK [Gathering Facts] **********************************************************************************************************************************************************************************************************************
ok: [login-1]
ok: [controller]
ok: [small-002]
ok: [small-001]
ok: [small-004]
ok: [small-003]

<snip>

TASK [freeipa : Add host to IPA] ************************************************************************************************************************************************************************************************************
skipping: [login-1]
skipping: [controller]
skipping: [small-001]
skipping: [small-003]
skipping: [small-004]
changed: [small-002]

TASK [freeipa : Set fact for ipa host password] *********************************************************************************************************************************************************************************************
skipping: [login-1]
skipping: [controller]
skipping: [small-001]
ok: [small-002]
skipping: [small-003]
skipping: [small-004]

TASK [freeipa : Renrole with FreeIPA using backed-up keytab] ********************************************************************************************************************************************************************************
skipping: [small-002]
ok: [login-1]
ok: [small-004]
ok: [small-001]
ok: [small-003]
ok: [controller]

TASK [freeipa : Enrole with FreeIPA using random password] **********************************************************************************************************************************************************************************
skipping: [login-1]
skipping: [controller]
skipping: [small-001]
skipping: [small-003]
skipping: [small-004]
changed: [small-002]

<snip>
PLAY RECAP **********************************************************************************************************************************************************************************************************************************
controller                 : ok=12   changed=0    unreachable=0    failed=0    skipped=17   rescued=0    ignored=0
login-1                    : ok=12   changed=0    unreachable=0    failed=0    skipped=18   rescued=0    ignored=0
small-001                  : ok=12   changed=1    unreachable=0    failed=0    skipped=17   rescued=0    ignored=0
small-002                  : ok=13   changed=4    unreachable=0    failed=0    skipped=16   rescued=0    ignored=0
small-003                  : ok=12   changed=1    unreachable=0    failed=0    skipped=17   rescued=0    ignored=0
small-004                  : ok=12   changed=0    unreachable=0    failed=0    skipped=17   rescued=0    ignored=0

done

confirm fixed by running ls -l /home on the compute node

[rocky@login-1 ~]$ ssh small-002 ls -l /home
total 16
drwx------.  3 admin       admins       119 May  5  2022 admin
drwx------. 26 antony      antony      4096 Jan 10 11:36 antony
drwx------.  6 bdigby      bdigby       172 Apr 20  2022 bdigby
drwx------. 17 chiggins    chiggins    4096 Dec 14 19:20 chiggins
drwx------. 12 dpower      dpower       322 Jan  9 15:52 dpower
drwx------.  8 gcarancini  gcarancini   212 Nov  8 15:39 gcarancini
drwxr-xr-x. 20 hanthony    hanthony    4096 Dec 19 07:14 hanthony
drwxr-xr-x.  6 root        root          93 Dec 20 17:54 headnode-original
drwx------.  4 jack        jack         167 Dec 20 22:20 jack
drwx------.  7 jvergaj     jvergaj      261 Jun 20  2022 jvergaj
drwx------.  3 kguerrero   kguerrero    119 May 26  2022 kguerrero
drwx------.  3        2011        2011  119 May 26  2022 kguerrerovazquez
drwx------. 19 kryan       kryan       4096 Dec 20 22:20 kryan
drwx------.  3 lryan       lryan        165 Nov  8 18:02 lryan
drwx------.  5 mali        mali         158 Nov  8 15:37 mali
drwx------.  4 modalaigh   modalaigh    151 Dec 20 22:21 modalaigh
drwx------.  8 mvillacampa mvillacampa  244 Aug 29 13:35 mvillacampa
drwx------.  7 mwalsh      mwalsh       193 Jan  8 12:14 mwalsh
drwx------.  3 mzainula    mzainula     119 Nov  8 15:31 mzainula
drwx------.  3 ouso        ouso         119 Dec  7 13:47 ouso
drwx------.  7 oza         oza          234 Nov  8 17:43 oza
drwx------.  3 pasat       pasat         94 May 26  2022 pasat
drwx------.  4 pmurchan    pmurchan     139 Nov  8 16:18 pmurchan
drwx------.  3 rsupport    rsupport     119 Apr 14  2022 rsupport
drwx------.  6 sgordon     sgordon      177 Nov  8 16:06 sgordon
drwxrwx---.  4 root        testshared    83 Dec 20 18:21 shared
drwx------.  9 smatthews   smatthews    320 Jan  6 00:03 smatthews
drwx------.  3 ssmith      ssmith       139 May 26  2022 ssmith
drwx------.  6 test        test         192 Oct 18 22:07 test
drwx------.  4 training2   training2    139 Nov  8 16:29 training2

fixed
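Don't forget to put the node back into service once it's confirmed fixed, something like:

```shell
# from a login node: undrain the node and check it goes back to idle
sudo scontrol update nodename=small-002 state=resume
sinfo
```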

All ipa commands return "ipa: ERROR: did not receive Kerberos credentials" even when I'm root, it's broken!

This is normal: you need to authenticate as a user with permission to run these commands (e.g. a member of admins). To fix it, run kinit <username> and enter the password. If you have configured 2FA on the user you WILL need to do that too.

The following example is run inside the FreeIPA podman container in the HPC stack.

[root@freeipa-1 ~]# podman exec -it freeipa bash
[root@freeipa-1 /]# ipa user-find
ipa: ERROR: did not receive Kerberos credentials
[root@freeipa-1 /]# kinit antony
Password for antony@CLUSTER.INTERNAL:
[root@freeipa-1 /]# ipa user-find antony
--------------
1 user matched
--------------
  User login: antony
  First name: Antony
  Last name: Cleave
  Home directory: /define/home/antony
  Login shell: /bin/bash
  Principal name: antony@CLUSTER.INTERNAL
  Principal alias: antony@CLUSTER.INTERNAL
  Email address: antony@cluster.internal
  UID: 2006
  GID: 2006
  SSH public key fingerprint: SHA256:/M1Hr8rxG+Im8OOiPeDAqJNEDlyvvpTpAa8hnPJvEaI (ssh-ed25519)
  Account disabled: False
----------------------------
Number of entries returned 1
----------------------------

But I don't want to type it. That's a bit of a tough cookie: this is how Kerberos works. If you have the password in a file, however, you can feed it in like so:

[root@freeipa-1 /]# kdestroy -A
[root@freeipa-1 /]# klist
klist: No credentials cache found (filename: /tmp/krb5cc_0)
[root@freeipa-1 /]# ipa user-find antony
ipa: ERROR: did not receive Kerberos credentials
Password for antony@CLUSTER.INTERNAL:
[root@freeipa-1 /]# ipa user-find antony
--------------
1 user matched
--------------
  User login: antony
  First name: Antony
  Last name: Cleave
  Home directory: /define/home/antony
  Login shell: /bin/bash
  Principal name: antony@CLUSTER.INTERNAL
  Principal alias: antony@CLUSTER.INTERNAL
  Email address: antony@cluster.internal
  UID: 2006
  GID: 2006
  SSH public key fingerprint: SHA256:/M1Hr8rxG+Im8OOiPeDAqJNEDlyvvpTpAa8hnPJvEaI (ssh-ed25519)
  Account disabled: False
----------------------------
Number of entries returned 1
----------------------------
[root@freeipa-1 /]#
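If you really need non-interactive authentication, the usual Kerberos answers are a keytab or feeding the password on stdin. Both of these are sketches: the keytab path is the client's default host keytab (note the host principal won't have IPA admin rights), and /root/antony.pw is a hypothetical root-only file holding the password.

```shell
# authenticate as the host principal from the local keytab (no password at all)
kinit -kt /etc/krb5.keytab "host/$(hostname -f)"

# or pipe a password stored in a root-only file into kinit (MIT kinit reads stdin)
kinit antony < /root/antony.pw
```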

I can't login to the new node I added

FreeIPA allows you to configure login auth for a whole cluster from one place. You are probably hitting a host-based access control (HBAC) restriction. Check /var/log/secure or the system journal for errors like this:

Nov 07 09:52:25 login-1.cluster.internal sshd[88168]: Authorized to antony, krb5 principal antony@CLUSTER.INTERNAL (ssh_gssapi_krb5_cmdok)
Nov 07 09:52:25 login-1.cluster.internal sshd[88168]: pam_sss(sshd:account): Access denied for user antony: 6 (Permission denied)
Nov 07 09:52:25 login-1.cluster.internal sshd[88168]: fatal: Access denied for user antony by PAM account configuration [preauth]

See that Permission denied? That's the killer here.

How to fix this is system dependent, but on the NUIG cluster I have modified allow_all to allow access from the cluster.internal hostgroup. This is NOT the same as the DNS zone. We can check the HBAC rules here:

[root@login-1 ~]# ipa hbacrule-find
--------------------
4 HBAC rules matched
--------------------
  Rule name: allow_admins
  Service category: all
  Enabled: True

  Rule name: allow_all
  User category: all
  Service category: all
  Description: Allow all users to access any host from any host in cluster.internal hostgroup keep it up to date!
  Enabled: True

  Rule name: allow_jack
  Service category: all
  Enabled: True

  Rule name: allow_systemd-user
  User category: all
  Host category: all
  Description: Allow pam_systemd to run user@.service to create a system user session
  Enabled: True
----------------------------
Number of entries returned 4
----------------------------

and you can see the settings for each of those like so:

[root@login-1 ~]# ipa hbacrule-show allow_all
  Rule name: allow_all
  User category: all
  Service category: all
  Description: Allow all users to access any host from any host in cluster.internal hostgroup keep it up to date!
  Enabled: True
  Host Groups: cluster.internal

See that host group? login-1 isn't in it, and that's why antony can't log in to login-1:

[root@login-1 ~]# ipa hostgroup-show cluster.internal
  Host-group: cluster.internal
  Member hosts: head.cluster.internal, small-001.cluster.internal, small-003.cluster.internal, small-002.cluster.internal, small-004.cluster.internal, test-001.cluster.internal

Let's fix that by adding login-1 to the hostgroup:

[root@login-1 ~]# ipa hostgroup-add-member cluster.internal --hosts=login-1.cluster.internal
  Host-group: cluster.internal
  Member hosts: head.cluster.internal, small-001.cluster.internal, small-003.cluster.internal, small-002.cluster.internal, small-004.cluster.internal, test-001.cluster.internal, login-1.cluster.internal
  Member of HBAC rule: allow_all, allow_admins
-------------------------
Number of members added 1
-------------------------
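You can verify the fix without actually trying to SSH in by using FreeIPA's built-in HBAC simulator (run as an authenticated admin, as above):

```shell
# simulate an sshd login by antony on login-1 against the current HBAC rules
ipa hbactest --user=antony --host=login-1.cluster.internal --service=sshd
```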

I can't get my replica container to start; it gets partway and silently fails

Sometimes this happens. However, I spent hours bashing my head against a wall so you don't have to.

I found that no matter what options were passed to the container it would just get stuck. So I used the following commands to wipe EVERYTHING, start again and jump into the container.

DO NOT RUN THIS ON THE LAST FUNCTIONING IPA SERVER WITH CA

podman rm -f freeipa; rm -rf /var/lib/state ; mkdir /var/lib/state ; podman run --network=host --detach --name freeipa --replace  --user root --volume /var/lib/state:/data:Z -it  docker.io/freeipa/freeipa-server:rocky-8 ipa-replica-install --domain cluster.internal --server head.cluster.internal --admin-password=REDACTED

podman exec -it freeipa bash

Once inside I could see that the IPA client was functional, so I used kinit to become an admin user, manually added the server to the ipaservers host group and completed the IPA replica install as per this guide:

https://www.freeipa.org/page/V4/Replica_Setup

This is approximately what I did (I lost the output):

kinit antony
ipa hostgroup-add-member ipaservers --hosts freeipa-1.cluster.internal
ipa-replica-install --setup-ca --realm CLUSTER.INTERNAL --domain cluster.internal  --setup-dns --auto-reverse --auto-forwarders --no-ui-redirect

At this point there is an annoying prompt saying it can't use itself to resolve DNS. Yep, it isn't set up yet, so say yes to continue. It should then eventually complete and you have a functioning container.

OK, so maybe you don't want that admin password on the command line of the container for the world + dog to read. Let's fix that. To fix it we:

  1. delete the container but not the state folder
  2. create a new container without the password (there is a keytab in the state folder that takes its place)
  3. create a systemd unit file to start this on boot, as it gives more failure control than podman's always-restart policy (e.g. retry a few times and then give up)
  4. start the service

podman stop freeipa
podman create --network=host --name freeipa  --user root --volume /var/lib/state:/data:Z  docker.io/freeipa/freeipa-server:rocky-8 ipa-replica-install --domain cluster.internal --server head.cluster.internal --setup-ca --principal=admin --setup-dns --auto-reverse --auto-forwarders --no-ui-redirect
[root@freeipa-1 ~]# cat >/usr/lib/systemd/system/ipa-podman.service <<EOF
# ipa-podman.service

[Unit]
Description=Podman container-freeipa-replica.service
Documentation=man:podman-generate-systemd(1)
Wants=network.target
After=network-online.target


[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=on-failure
RestartSec=10s
ExecStart=/usr/bin/podman start  "freeipa"
ExecStop=/usr/bin/podman stop --ignore "freeipa" -t 30
KillMode=no
User=root
Group=root
TimeoutStartSec=180
Type=forking

[Install]
WantedBy=multi-user.target default.target
EOF
systemctl start ipa-podman
systemctl enable ipa-podman

And it's been running for an hour while I write this wiki page:

[root@freeipa-1 ~]# systemctl status ipa-podman
● ipa-podman.service - Podman container-freeipa-replica.service
   Loaded: loaded (/usr/lib/systemd/system/ipa-podman.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-11-01 22:07:11 GMT; 1h 3min ago
     Docs: man:podman-generate-systemd(1)
 Main PID: 31664 (conmon)
    Tasks: 1 (limit: 49024)
   Memory: 1.2M
   CGroup: /system.slice/ipa-podman.service
           └─31664 /usr/bin/conmon --api-version 1 -c 9e30b6ca044debd861c5f18ee8ca5f04b1b2c54eeafca39e4515048e4a89f7b4 -u 9e30b6ca044debd861c5f18ee8ca5f04b1b2c54eeafca39e4515048e4a89f7b4 -r /usr/bin/runc -b /var/lib/containers/storage/o>

Nov 01 22:07:10 freeipa-1.cluster.internal systemd[1]: Starting Podman container-freeipa-replica.service...
Nov 01 22:07:11 freeipa-1.cluster.internal podman[31624]: freeipa
Nov 01 22:07:11 freeipa-1.cluster.internal systemd[1]: Started Podman container-freeipa-replica.service.
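To check that the replica is actually healthy (and not just that the container process is up), you can poke IPA inside the container; both of these are standard FreeIPA tools shipped in the freeipa-server image:

```shell
# list the services ipactl manages and their state inside the container
podman exec freeipa ipactl status

# confirm replication agreements from the replica's point of view
podman exec freeipa ipa-replica-manage list
```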