This plugin provides a new UFM Telemetry Prometheus endpoint with more human-readable labels (e.g. device name, port number, etc.) instead of the GUID-based labels used by the original UFM Telemetry endpoint. A Prometheus server can consume the new endpoint to collect the metrics with their labels, and the collected metrics can then be monitored using Grafana.
The NVIDIA UFM Telemetry platform provides network validation tools to monitor network performance and conditions, capturing and streaming rich real-time network telemetry information and application workload usage to an on-premise or cloud-based database for further analysis. As a fabric manager, UFM Telemetry holds real-time telemetry information about the network topology. Because this information changes over time, it should be tracked in a monitoring system such as Grafana. The UFM Telemetry Grafana Plugin is provided for this purpose.
-
Log in as admin
-
Run
enable
config terminal
-
Make sure that UFM is running
show ufm status
-
If UFM is down, start it
ufm start
-
Make sure docker is running
no docker shutdown
-
Load the latest plugin container
- In case of HA, load the plugin on the standby node as well.
- If your appliance is connected to the internet, you can simply run:
docker pull mellanox/ufm-plugin-grafana-dashboard
- If your appliance is not connected to the internet, you need to load the image offline:
- Use a machine that is connected to the internet to save the docker image
docker save mellanox/ufm-plugin-grafana-dashboard:latest | gzip > ufm-plugin-grafana-dashboard.tar.gz
- Move the file to a shared location that the appliance can reach over scp
- Fetch the image to the appliance
image fetch scp://user@hostname/path-to-file/ufm-plugin-grafana-dashboard.tar.gz
- Load the image
docker load ufm-plugin-grafana-dashboard.tar.gz
-
Enable & start the plugin
ufm plugin grafana-dashboard add
-
Check that the plugin is up and running:
show ufm plugin
-
Load the latest plugin container
- In case of HA, load the plugin on the standby node as well.
- If your machine is connected to the internet, you can simply run:
docker pull mellanox/ufm-plugin-grafana-dashboard
- If your machine is not connected to the internet, you need to load the image offline:
- Use a machine that is connected to the internet to save the docker image
docker save mellanox/ufm-plugin-grafana-dashboard:latest | gzip > ufm-plugin-grafana-dashboard.tar.gz
- Move the file to some shared location that is accessible to the UFM machine
- Load the image to UFM machine
docker load < /[some-shared-location]/ufm-plugin-grafana-dashboard.tar.gz
- In case of UFM-SDN Appliance Gen 3, you need to make sure that the port of endpoint is opened:
ufw show
- Port 8982 should be listed. If it is not, open it:
ufw allow 8982
-
Enable & start the plugin
docker exec ufm /opt/ufm/scripts/manage_ufm_plugins.sh add -p grafana-dashboard
-
Check that the plugin is up and running:
docker exec ufm /opt/ufm/scripts/manage_ufm_plugins.sh show
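The offline load step above fails late if the archive was truncated while being copied. A minimal sketch of a pre-load check follows; the function name `check_image_archive` is hypothetical and not part of UFM or the plugin, and it only relies on the standard `gzip -t` integrity test.

```shell
# Hypothetical helper (not shipped with UFM): verify the saved image archive
# before handing it to "docker load".
# Returns 0 when the file exists and passes the gzip integrity test.
check_image_archive() {
    local archive="$1"
    if [ ! -f "$archive" ]; then
        echo "missing archive: $archive" >&2
        return 1
    fi
    if ! gzip -t "$archive" 2>/dev/null; then
        echo "corrupt archive: $archive" >&2
        return 1
    fi
    return 0
}

# Usage on the UFM machine (the location is the placeholder used in this guide):
#   check_image_archive /[some-shared-location]/ufm-plugin-grafana-dashboard.tar.gz \
#       && docker load < /[some-shared-location]/ufm-plugin-grafana-dashboard.tar.gz
```

The check only confirms that the gzip stream is intact; it does not validate the image contents.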
-
Install the latest version of UFM.
-
Load the latest plugin container
- In case of HA, load the plugin on the standby node as well.
- If your machine is connected to the internet, you can simply run:
docker pull mellanox/ufm-plugin-grafana-dashboard
- If your machine is not connected to the internet, you need to load the image offline:
- Use a machine that is connected to the internet to save the docker image
docker save mellanox/ufm-plugin-grafana-dashboard:latest | gzip > ufm-plugin-grafana-dashboard.tar.gz
- Move the file to some shared location that is accessible to the UFM machine
- Load the image to UFM machine
docker load < /[some-shared-location]/ufm-plugin-grafana-dashboard.tar.gz
-
Enable & start the plugin
/opt/ufm/scripts/manage_ufm_plugins.sh add -p grafana-dashboard
-
Check that the plugin is up and running:
docker ps
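Scanning the full `docker ps` listing by eye can be replaced with a filtered query. The sketch below assumes the plugin container was started from the `mellanox/ufm-plugin-grafana-dashboard` image with its default tag; the function names are hypothetical. It uses the standard `--filter ancestor=` and `--format` options of `docker ps`.

```shell
# Image name taken from the "docker pull" command in this guide.
PLUGIN_IMAGE="mellanox/ufm-plugin-grafana-dashboard"

# Print the names of running containers created from the plugin image.
plugin_containers() {
    docker ps --filter "ancestor=${PLUGIN_IMAGE}" --format '{{.Names}}'
}

# Succeed only if at least one such container is running.
plugin_is_running() {
    [ -n "$(plugin_containers)" ]
}

# Usage:
#   plugin_is_running && echo "plugin is up" || echo "plugin is not running"
```

If the image was loaded under a different tag, the ancestor filter may not match, so adjust `PLUGIN_IMAGE` accordingly.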
Log file grafana-dashboard-plugin.log is located in /opt/ufm/files/log on the host.
This endpoint provides the metrics and could be consumed by any Prometheus server.
http://<UFM_HOST>:8982/labels/enterprise
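A quick way to confirm the endpoint is serving data is to fetch it and count the samples it returns. The following is a sketch under assumptions: the helper names are hypothetical, and the counting relies only on the Prometheus text exposition format, where lines starting with `#` are HELP/TYPE comments.

```shell
# Build the plugin endpoint URL for a UFM host.
# Port 8982 and the /labels/enterprise path come from this guide.
labels_endpoint_url() {
    local host="$1" port="${2:-8982}"
    printf 'http://%s:%s/labels/enterprise\n' "$host" "$port"
}

# Count metric samples in Prometheus text-format input read from stdin.
# Blank lines and lines starting with '#' are skipped.
count_samples() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }'
}

# Usage (requires network access to the UFM host):
#   curl -s "$(labels_endpoint_url <UFM_HOST>)" | count_samples
```

A count of zero suggests that the plugin is running but is not receiving data from the original UFM Telemetry endpoint.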
METHOD: PUT
URL: https://[HOST-IP]/ufmRest/plugin/grafana-dashboard/conf
Payload Example:
{
  "ufm": {
    "port": 8000
  },
  "ufm-telemetry-endpoint": {
    "host": "127.0.0.1",
    "port": 9001,
    "url": "enterprise"
  },
  "logs-config": {
    "log_file_backup_count": 5,
    "log_file_max_size": 10485760,
    "logs_file_name": "/log/grafana-dashboard-plugin.log",
    "logs_level": "INFO"
  }
}
cURL Example:
curl -XPUT 'https://10.209.36.68/ufmRest/plugin/grafana-dashboard/conf/' \
-k \
-u admin:123456 \
-H 'Content-Type: application/json' \
-d '{"ufm-telemetry-endpoint":{"host": "127.0.0.1","port": 9002,"url": "enterprise"}}'
Parameter | Required | Description |
---|---|---|
ufm-telemetry-endpoint.host | True | Hostname, IPv4 address, or IPv6 address of the original UFM Telemetry Endpoint, which is normally the localhost [Default is 127.0.0.1] |
ufm-telemetry-endpoint.port | True | Port of the original UFM Telemetry Endpoint [Default is 9001] |
ufm-telemetry-endpoint.url | True | URL of the original UFM Telemetry Endpoint [Default is 'enterprise'] |
logs-config.logs_file_name | True | Log file name [Default = '/log/grafana-dashboard-plugin.log'] |
logs-config.logs_level | True | Log level [Default is 'INFO'] |
logs-config.log_file_max_size | True | Maximum log file size in Bytes [Default is 10485760, i.e. 10 MB] |
logs-config.log_file_backup_count | True | Maximum number of backup log files [Default is 5] |
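When the configuration request is scripted, building the payload in one place helps avoid quoting mistakes. The sketch below assembles only the `ufm-telemetry-endpoint` section, using the defaults from the table above; the function name is hypothetical, and the values are inserted without JSON escaping, so it is only suitable for plain hostnames, addresses, and path names.

```shell
# Build the JSON body for the ufm-telemetry-endpoint section.
# Defaults mirror the values listed in the parameter table.
telemetry_endpoint_payload() {
    local host="${1:-127.0.0.1}" port="${2:-9001}" url="${3:-enterprise}"
    # Reject empty or non-numeric ports before building the payload.
    case "$port" in
        ''|*[!0-9]*)
            echo "port must be a number: $port" >&2
            return 1
            ;;
    esac
    printf '{"ufm-telemetry-endpoint":{"host":"%s","port":%s,"url":"%s"}}\n' \
        "$host" "$port" "$url"
}

# Usage (host and credentials are placeholders):
#   curl -k -u <user>:<password> -X PUT \
#        -H 'Content-Type: application/json' \
#        -d "$(telemetry_endpoint_payload 127.0.0.1 9002)" \
#        'https://[HOST-IP]/ufmRest/plugin/grafana-dashboard/conf'
```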
1. You can install Prometheus & Grafana externally and configure them to consume the metrics from the plugin endpoint (http://<UFM_HOST>:8982/labels/enterprise).
To install Grafana on your machine, please follow the Grafana installation guide.
To install the Prometheus server on your machine, please follow the Prometheus installation guide.
a) After installing the Prometheus server, edit prometheus.yml so that the scrape job contains:
metrics_path: 'labels/enterprise'
static_configs:
  - targets: ["{UFM enterprise IP}:{Prometheus endpoint port, usually 8982}"]
b) Run Prometheus server.
c) Run Grafana server.
d) Add Prometheus server as data source for Grafana and name it prometheus (case-sensitive).
e) Import Infiniband_Telemetry.json into Grafana as a dashboard.
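The prometheus.yml edit in step a) can be sketched as a small generator that prints a complete scrape job for the plugin endpoint. The job name `ufm-telemetry` and the function name are illustrative assumptions; the metrics path and default port come from this guide.

```shell
# Print a scrape_configs entry for the plugin endpoint.
# The job name is an illustrative choice, not something the plugin requires.
ufm_scrape_config() {
    local host="$1" port="${2:-8982}" job="${3:-ufm-telemetry}"
    cat <<EOF
scrape_configs:
  - job_name: '${job}'
    metrics_path: 'labels/enterprise'
    static_configs:
      - targets: ['${host}:${port}']
EOF
}

# Usage:
#   ufm_scrape_config <UFM_HOST>
```

If prometheus.yml already has a `scrape_configs` section, merge the printed job entry into it rather than appending the whole output, since a duplicated top-level key makes the file invalid.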