
[pmon] psud and thermalctld fail due to lack of ipmitool #21322

Open
linyutsung opened this issue Jan 3, 2025 · 0 comments


Description

The psud and thermalctld daemons in the pmon container fail to update the database. psud logs "Failed to update PSU data" warnings, and thermalctld logs a Python traceback.

Steps to reproduce the issue:

  1. Install the SONiC image on the DUT.
  2. After the pmon container is ready, check /var/log/syslog.
  3. Check the SONiC CLI commands (show platform fan, show platform psu, show platform temperature).

Describe the results you received:

root@sonic:~# grep pmon /var/log/syslog | grep psud
2025 Jan 3 02:46:31.829915 sonic INFO pmon#supervisord 2025-01-03 02:46:31,829 INFO spawned: 'psud' with pid 31
2025 Jan 3 02:46:35.813213 sonic INFO pmon#supervisord: psud /bin/sh: 1: ipmitool: not found
2025 Jan 3 02:46:35.813681 sonic WARNING pmon#psud[31]: Failed to update PSU data - Command 'ipmitool sdr dump /usr/local/sdr_dump' returned non-zero exit status 127.
2025 Jan 3 02:46:37.841044 sonic INFO pmon#supervisord: psud /bin/sh: 1: ipmitool: not found
2025 Jan 3 02:46:38.819092 sonic INFO pmon#supervisord: psud /bin/sh: 1: ipmitool: not found
2025 Jan 3 02:46:41.830326 sonic INFO pmon#supervisord 2025-01-03 02:46:41,829 INFO success: psud entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)

root@sonic:~# grep pmon /var/log/syslog | grep thermalctld
2025 Jan 3 02:46:31.846485 sonic INFO pmon#supervisord 2025-01-03 02:46:31,845 INFO spawned: 'thermalctld' with pid 35
2025 Jan 3 02:46:32.834891 sonic WARNING pmon#thermalctld[35]: Thermal manager is not supported on this platform
2025 Jan 3 02:46:32.836044 sonic INFO pmon#thermalctld: Start thermal monitoring loop
2025 Jan 3 02:46:37.841044 sonic INFO pmon#supervisord: thermalctld /bin/sh: 1: ipmitool: not found
2025 Jan 3 02:46:37.841551 sonic INFO pmon#supervisord: thermalctld Process Process-1:
2025 Jan 3 02:46:37.847725 sonic INFO pmon#supervisord: thermalctld Traceback (most recent call last):
2025 Jan 3 02:46:37.849707 sonic INFO pmon#supervisord: thermalctld File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
2025 Jan 3 02:46:37.849764 sonic INFO pmon#supervisord: thermalctld self.run()
2025 Jan 3 02:46:37.849764 sonic INFO pmon#supervisord: thermalctld File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
2025 Jan 3 02:46:37.849809 sonic INFO pmon#supervisord: thermalctld self._target(*self._args, **self._kwargs)
2025 Jan 3 02:46:37.850110 sonic INFO pmon#supervisord: thermalctld File "/usr/local/bin/thermalctld", line 789, in task_worker
2025 Jan 3 02:46:37.850159 sonic INFO pmon#supervisord: thermalctld self.main()
2025 Jan 3 02:46:37.850159 sonic INFO pmon#supervisord: thermalctld File "/usr/local/bin/thermalctld", line 768, in main
2025 Jan 3 02:46:37.850233 sonic INFO pmon#supervisord: thermalctld self.fan_updater.update()
2025 Jan 3 02:46:37.850233 sonic INFO pmon#supervisord: thermalctld File "/usr/local/bin/thermalctld", line 232, in update
2025 Jan 3 02:46:37.850233 sonic INFO pmon#supervisord: thermalctld self._refresh_fan_drawer_status(drawer, drawer_index)
2025 Jan 3 02:46:37.850339 sonic INFO pmon#supervisord: thermalctld File "/usr/local/bin/thermalctld", line 279, in _refresh_fan_drawer_status
2025 Jan 3 02:46:37.850339 sonic INFO pmon#supervisord: thermalctld [('presence', str(try_get(fan_drawer.get_presence, False))),
2025 Jan 3 02:46:37.850339 sonic INFO pmon#supervisord: thermalctld ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025 Jan 3 02:46:37.850363 sonic INFO pmon#supervisord: thermalctld File "/usr/local/bin/thermalctld", line 49, in try_get
2025 Jan 3 02:46:37.850481 sonic INFO pmon#supervisord: thermalctld ret = callback()
2025 Jan 3 02:46:37.850481 sonic INFO pmon#supervisord: thermalctld ^^^^^^^^^^
2025 Jan 3 02:46:37.850481 sonic INFO pmon#supervisord: thermalctld File "/usr/local/lib/python3.11/dist-packages/sonic_platform_pddf_base/pddf_fan_drawer.py", line 50, in get_presence
2025 Jan 3 02:46:37.850505 sonic INFO pmon#supervisord: thermalctld status = self._fan_list[0].get_presence()
2025 Jan 3 02:46:37.850505 sonic INFO pmon#supervisord: thermalctld ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025 Jan 3 02:46:37.850524 sonic INFO pmon#supervisord: thermalctld File "/usr/local/lib/python3.11/dist-packages/sonic_platform/fan.py", line 131, in get_presence
2025 Jan 3 02:46:37.850674 sonic INFO pmon#supervisord: thermalctld output = self.pddf_obj.get_attr_name_output(device, attr)
2025 Jan 3 02:46:37.850674 sonic INFO pmon#supervisord: thermalctld ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025 Jan 3 02:46:37.850674 sonic INFO pmon#supervisord: thermalctld File "/usr/local/lib/python3.11/dist-packages/sonic_platform_pddf_base/pddfapi.py", line 989, in get_attr_name_output
2025 Jan 3 02:46:37.850698 sonic INFO pmon#supervisord: thermalctld output['status']=self.bmc_get_cmd(bmc_attr)
2025 Jan 3 02:46:37.850698 sonic INFO pmon#supervisord: thermalctld ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025 Jan 3 02:46:37.850718 sonic INFO pmon#supervisord: thermalctld File "/usr/local/lib/python3.11/dist-packages/sonic_platform_pddf_base/pddfapi.py", line 950, in bmc_get_cmd
2025 Jan 3 02:46:37.850894 sonic INFO pmon#supervisord: thermalctld value = self.non_raw_ipmi_get_request(bmc_attr)
2025 Jan 3 02:46:37.850894 sonic INFO pmon#supervisord: thermalctld ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025 Jan 3 02:46:37.850894 sonic INFO pmon#supervisord: thermalctld File "/usr/local/lib/python3.11/dist-packages/sonic_platform_pddf_base/pddfapi.py", line 884, in non_raw_ipmi_get_request
2025 Jan 3 02:46:37.850919 sonic INFO pmon#supervisord: thermalctld self.populate_bmc_cache_db(bmc_attr)
2025 Jan 3 02:46:37.850919 sonic INFO pmon#supervisord: thermalctld File "/usr/local/lib/python3.11/dist-packages/sonic_platform_pddf_base/pddfapi.py", line 862, in populate_bmc_cache_db
2025 Jan 3 02:46:37.850919 sonic INFO pmon#supervisord: thermalctld subprocess.check_output(sdr_dump_cmd, shell=True, universal_newlines=True)
2025 Jan 3 02:46:37.850951 sonic INFO pmon#supervisord: thermalctld File "/usr/lib/python3.11/subprocess.py", line 466, in check_output
2025 Jan 3 02:46:37.850951 sonic INFO pmon#supervisord: thermalctld return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
2025 Jan 3 02:46:37.850970 sonic INFO pmon#supervisord: thermalctld ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025 Jan 3 02:46:37.851062 sonic INFO pmon#supervisord: thermalctld File "/usr/lib/python3.11/subprocess.py", line 571, in run
2025 Jan 3 02:46:37.851062 sonic INFO pmon#supervisord: thermalctld raise CalledProcessError(retcode, process.args,
2025 Jan 3 02:46:37.851062 sonic INFO pmon#supervisord: thermalctld subprocess.CalledProcessError: Command 'ipmitool sdr dump /usr/local/sdr_dump' returned non-zero exit status 127.
2025 Jan 3 02:46:41.845069 sonic INFO pmon#supervisord 2025-01-03 02:46:41,844 INFO success: thermalctld entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
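Exit status 127 is what a POSIX shell returns when it cannot find the command it was asked to run, which matches the "/bin/sh: 1: ipmitool: not found" lines above. Because pddfapi.py calls subprocess.check_output(sdr_dump_cmd, shell=True, ...), the shell's 127 surfaces as a CalledProcessError. The minimal sketch below reproduces this on any POSIX system; the command name ipmitool_missing_example is a hypothetical placeholder assumed not to exist on PATH, standing in for ipmitool in a pmon container that lacks it.

```python
import subprocess

# Hypothetical command assumed NOT to exist on PATH; it plays the role of
# ipmitool inside a pmon container where ipmitool is not installed.
MISSING_CMD = "ipmitool_missing_example sdr dump /usr/local/sdr_dump"


def run_like_pddf(cmd: str) -> int:
    """Run cmd through /bin/sh the way the traceback shows
    (check_output with shell=True) and return the exit status."""
    try:
        subprocess.check_output(cmd, shell=True, universal_newlines=True,
                                stderr=subprocess.DEVNULL)
    except subprocess.CalledProcessError as err:
        # The shell prints "not found" on stderr (suppressed here)
        # and exits with status 127, which check_output turns into an error.
        return err.returncode
    return 0


if __name__ == "__main__":
    print(run_like_pddf(MISSING_CMD))  # 127 on a POSIX shell
```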

root@sonic:~# show platform fan
Fan Not detected

root@sonic:~# show platform temperature
Thermal Not detected

root@sonic:~# show platform psu
PSU    Model  Serial  HW Rev  Voltage (V)  Current (A)  Power (W)  Status       LED
PSU 1  N/A    N/A     N/A     N/A          N/A          N/A        NOT PRESENT
PSU 2  N/A    N/A     N/A     N/A          N/A          N/A        NOT PRESENT

Describe the results you expected:

The show platform fan, psu, and temperature commands should report the fans, PSUs, and temperature sensors present on the platform.

Output of show version:

root@sonic:~# show version

SONiC Software Version: SONiC.master.736172-8e246d26b
SONiC OS Version: 12
Distribution: Debian 12.6
Kernel: 6.1.0-22-2-amd64
Build commit: 8e246d2
Build date: Thu Jan 2 13:11:59 UTC 2025
Built by: azureuser@7e6fca80c000002

Platform: x86_64-ufispace_s9300_32d-r0
HwSKU: UFISPACE-S9300-32D
ASIC: broadcom
ASIC Count: 1
Serial Number: WJD1B77200002B1
Model Number: S9300-32D-9R6
Hardware Revision: N/A
Uptime: 06:17:16 up 3:33, 1 user, load average: 1.39, 1.28, 1.11
Date: Fri 03 Jan 2025 06:17:16

Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-syncd-brcm latest 673483369c01 762MB
docker-syncd-brcm master.736172-8e246d26b 673483369c01 762MB
docker-gbsyncd-broncos latest bfa47ad0a09b 352MB
docker-gbsyncd-broncos master.736172-8e246d26b bfa47ad0a09b 352MB
docker-gbsyncd-credo latest 7b882ec06a1f 325MB
docker-gbsyncd-credo master.736172-8e246d26b 7b882ec06a1f 325MB
docker-orchagent latest 4138146fa539 356MB
docker-orchagent master.736172-8e246d26b 4138146fa539 356MB
docker-nat latest eb92d166d56f 345MB
docker-nat master.736172-8e246d26b eb92d166d56f 345MB
docker-fpm-frr latest 3ef6529d12ac 377MB
docker-fpm-frr master.736172-8e246d26b 3ef6529d12ac 377MB
docker-macsec latest b01821486e66 345MB
docker-teamd latest 2b567764df9e 342MB
docker-teamd master.736172-8e246d26b 2b567764df9e 342MB
docker-sflow latest 44ea2eae3125 343MB
docker-sflow master.736172-8e246d26b 44ea2eae3125 343MB
docker-dhcp-relay latest 2f50645becd0 323MB
docker-sonic-bmp latest 88a8e038d2e5 314MB
docker-sonic-bmp master.736172-8e246d26b 88a8e038d2e5 314MB
docker-platform-monitor latest 654840a777ac 433MB
docker-platform-monitor master.736172-8e246d26b 654840a777ac 433MB
docker-eventd latest 668daf9d268d 313MB
docker-eventd master.736172-8e246d26b 668daf9d268d 313MB
docker-snmp latest adfb96f44acd 353MB
docker-snmp master.736172-8e246d26b adfb96f44acd 353MB
docker-router-advertiser latest 3c73737cf88e 313MB
docker-router-advertiser master.736172-8e246d26b 3c73737cf88e 313MB
docker-mux latest a3725e51b65d 365MB
docker-mux master.736172-8e246d26b a3725e51b65d 365MB
docker-lldp latest 94621c6c2734 359MB
docker-lldp master.736172-8e246d26b 94621c6c2734 359MB
docker-sonic-gnmi latest 76ab5aac59e8 403MB
docker-sonic-gnmi master.736172-8e246d26b 76ab5aac59e8 403MB
docker-database latest cb6f50a3fa5c 322MB
docker-database master.736172-8e246d26b cb6f50a3fa5c 322MB
docker-sonic-mgmt-framework latest 55c7f9f0923b 400MB
docker-sonic-mgmt-framework master.736172-8e246d26b 55c7f9f0923b 400MB

Output of show techsupport:

sonic_dump_sonic_20250103_061349.tar.gz

Additional information you deem important (e.g. issue happens only occasionally):

We noticed that ipmitool was removed from the pmon Docker image build. As a workaround, we currently add the ipmitool installation back into the pmon Dockerfile.j2.
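Separately from the Dockerfile.j2 workaround, a daemon could avoid crashing by checking for ipmitool before shelling out. The sketch below assumes callers can tolerate a missing reading; the helper name sdr_dump_if_available is hypothetical, is not part of sonic_platform_pddf_base, and does not reflect how pddfapi.py currently behaves. It would only turn the traceback into a skipped refresh: without ipmitool the fan, PSU, and thermal data still cannot be read.

```python
import shutil
import subprocess
from typing import Optional

# Dump path taken from the failing command in the logs above.
SDR_DUMP_PATH = "/usr/local/sdr_dump"


def sdr_dump_if_available(tool: str = "ipmitool",
                          dump_path: str = SDR_DUMP_PATH) -> Optional[str]:
    """Hypothetical helper: run '<tool> sdr dump <dump_path>' only if the
    tool is on PATH. Returns the command output, or None when the tool is
    missing or the command fails, instead of raising CalledProcessError."""
    if shutil.which(tool) is None:
        # Tool not installed in this container (e.g. pmon without ipmitool).
        return None
    try:
        # Argument list instead of shell=True, so no /bin/sh is involved.
        return subprocess.check_output([tool, "sdr", "dump", dump_path],
                                       universal_newlines=True)
    except (subprocess.CalledProcessError, OSError):
        # Non-zero exit, or the binary vanished between which() and exec.
        return None
```

A caller would treat None as "BMC data unavailable" and skip that update cycle rather than letting the exception propagate out of the worker process.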
