Buffers Configuration Update design
The switch Buffers Configuration feature distributes buffer memory among the ports in order to guarantee lossless traffic flow.
The current buffers configuration is predefined for the worst-case scenario and does not take into account the settings and conditions of a particular port.
The suggested update to the Buffers Configuration selects a profile dynamically, based on the current port parameters, to use buffer memory as efficiently as possible and maximize switch throughput.
Port speed supported: 10G, 25G, 40G, 50G, 100G
Cable length supported: 5m, 40m, 300m
Port speed is explicitly specified in the minigraph configuration file.
The cable length for each port can be specified in the minigraph or derived from the role of the device and the role of the neighbor device attached to the port. The role is also stored in the minigraph, under PngDec/Devices/Device, in the 'type' attribute:
<Device i:type="ToRRouter">
...
</Device>
The table below maps the device and neighbor roles to the cable length:
Device and Neighbor roles | Cable length |
---|---|
ToRRouter(T0) - Server | 5m |
ToRRouter(T0) - LeafRouter(T1) | 40m |
LeafRouter(T1) - SpineRouter(T2) | 300m |
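The role-to-cable mapping in the table can be sketched in Python as follows. This is only an illustration, not part of the design: the function name `cable_length`, the `explicit` parameter, and the fallback to the longest supported cable for unknown role pairs are assumptions.

```python
# Role pairs from the table above. frozenset makes the lookup
# independent of which side is the device and which is the neighbor.
ROLE_PAIR_TO_CABLE = {
    frozenset(("ToRRouter", "Server")): "5m",
    frozenset(("ToRRouter", "LeafRouter")): "40m",
    frozenset(("LeafRouter", "SpineRouter")): "300m",
}

# Assumption: unknown role pairs fall back to the longest supported cable.
MAX_CABLE = "300m"


def cable_length(device_role, neighbor_role, explicit=None):
    """Return the cable length for a port.

    A length given explicitly for the port wins; otherwise the
    role pair is looked up, with MAX_CABLE as the fallback.
    """
    if explicit is not None:
        return explicit
    pair = frozenset((device_role, neighbor_role))
    return ROLE_PAIR_TO_CABLE.get(pair, MAX_CABLE)
```

Falling back to the longest cable is the conservative choice, since a profile sized for a longer cable reserves more headroom rather than less.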
Buffers Configuration profiles should be defined for all combinations of port speed and cable length. The profile for each port is selected based on the port speed and the attached cable length. If no exactly matching mapping is found, the closest option with greater parameters is used.
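One possible reading of the "closest option with greater parameters" rule is sketched below in Python. The helper names `parse_profile_key` and `select_profile` are hypothetical, and the tie-breaking order (smallest sufficient speed first, then smallest sufficient cable length) is an assumption; the design does not specify it.

```python
import re

# Keys follow the '<speed in Mbps>_<cable length>m' convention.
PORTCONFIG2PROFILE = {
    "40000_5m": "pg_lossless_40G_5m_profile",
    "40000_40m": "pg_lossless_40G_40m_profile",
    "40000_300m": "pg_lossless_40G_300m_profile",
    "50000_5m": "pg_lossless_50G_5m_profile",
    "50000_40m": "pg_lossless_50G_40m_profile",
    "50000_300m": "pg_lossless_50G_300m_profile",
    "100000_5m": "pg_lossless_100G_5m_profile",
    "100000_40m": "pg_lossless_100G_40m_profile",
    "100000_300m": "pg_lossless_100G_300m_profile",
}


def parse_profile_key(key):
    """Split a key such as '40000_5m' into (speed_mbps, cable_m)."""
    match = re.fullmatch(r"(\d+)_(\d+)m", key)
    if match is None:
        raise ValueError(f"unexpected key format: {key!r}")
    return int(match.group(1)), int(match.group(2))


def select_profile(speed, cable_m, table=PORTCONFIG2PROFILE):
    """Return the exact profile, or the closest one with greater parameters."""
    exact = f"{speed}_{cable_m}m"
    if exact in table:
        return table[exact]
    # Keep only profiles that are at least as large in both dimensions.
    candidates = []
    for key, name in table.items():
        key_speed, key_cable = parse_profile_key(key)
        if key_speed >= speed and key_cable >= cable_m:
            candidates.append((key_speed, key_cable, name))
    if not candidates:
        raise LookupError(f"no profile covers {speed} Mbps over {cable_m}m")
    # Tuples compare element by element: lowest speed, then shortest cable.
    return min(candidates)[2]
```

With this table, a 25G port on a 5m cable would resolve to the 40G 5m profile, because no 25G entry is listed here.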
The Buffers Configuration update is implemented in two parts. Part 1 changes the data model and the generation of the buffers configuration json, so that the initial buffers configuration is performed on switch start only. Part 2 updates the port buffer profile at run time; the update is triggered by a port speed change.
Changes in Part 1 implement port buffers configuration on switch initialization.
Files *buffers.json (e.g. msn2700.32ports.buffers.json) located at sonic-buildimage/src/sonic-swss/swssconfig/sample/ contain the buffers-related switch configuration (pools, profiles, binding of profiles to ports, etc.).
The number of profiles and their parameters are hardware specific and should be calculated separately for every switch model. The table below lists the profiles for the Mellanox MSN2700:
Profile Name | Size | Threshold | Pool Name | Xon | Xoff |
---|---|---|---|---|---|
pg_lossless_40G_5m_profile | 34K | 1 | ingress_lossless | 18K | 16K |
pg_lossless_50G_5m_profile | 34K | 1 | ingress_lossless | 18K | 16K |
pg_lossless_100G_5m_profile | 36K | 1 | ingress_lossless | 18K | 18K |
pg_lossless_40G_40m_profile | 41K | 1 | ingress_lossless | 18K | 23K |
pg_lossless_50G_40m_profile | 41K | 1 | ingress_lossless | 18K | 23K |
pg_lossless_100G_40m_profile | 53K | 1 | ingress_lossless | 18K | 35K |
pg_lossless_40G_300m_profile | 92K | 1 | ingress_lossless | 18K | 74K |
pg_lossless_50G_300m_profile | 92K | 1 | ingress_lossless | 18K | 74K |
pg_lossless_100G_300m_profile | 180K | 1 | ingress_lossless | 18K | 162K |
... | ... | ... | ... | ... | ... |
- PG profiles for all combinations of supported speed and cable length should be declared.
- PG profiles are declared as a map in the j2 template like this:
{% set pg_profiles = {
'pg_lossless_10G_5m_profile': { 'xon': 18432, 'xoff': 16384, 'size': 34816, 'dynamic_th': 1 },
'pg_lossless_25G_5m_profile': { 'xon': 18432, 'xoff': 16384, 'size': 34816, 'dynamic_th': 1 },
'pg_lossless_40G_5m_profile': { 'xon': 18432, 'xoff': 16384, 'size': 34816, 'dynamic_th': 1 },
'pg_lossless_50G_5m_profile': { 'xon': 18432, 'xoff': 16384, 'size': 34816, 'dynamic_th': 1 },
...
} %}
- A "SET" block will be generated only for those profiles which are actually used by at least one port.
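The rule that only used profiles get a "SET" block can be illustrated with a short Python sketch. The function name `used_profile_entries` is hypothetical, and the exact field layout of a BUFFER_PROFILE_TABLE entry (a `pool` reference plus stringified parameters) is an assumption modeled on the other JSON fragments in this document.

```python
import json

# Subset of the pg_profiles map declared in the template above.
PG_PROFILES = {
    "pg_lossless_10G_5m_profile": {"xon": 18432, "xoff": 16384, "size": 34816, "dynamic_th": 1},
    "pg_lossless_40G_5m_profile": {"xon": 18432, "xoff": 16384, "size": 34816, "dynamic_th": 1},
    "pg_lossless_50G_5m_profile": {"xon": 18432, "xoff": 16384, "size": 34816, "dynamic_th": 1},
}


def used_profile_entries(pg_profiles, port_profiles, pool="ingress_lossless_pg_pool"):
    """Build SET entries only for profiles referenced by at least one port.

    port_profiles maps a port name to the profile name selected for it.
    The entry layout is an assumption, not taken from the design.
    """
    entries = []
    for name in sorted(set(port_profiles.values())):
        fields = {"pool": f"[BUFFER_POOL_TABLE:{pool}]"}
        fields.update({key: str(value) for key, value in pg_profiles[name].items()})
        entries.append({f"BUFFER_PROFILE_TABLE:{name}": fields, "OP": "SET"})
    return entries


if __name__ == "__main__":
    ports = {
        "Ethernet0": "pg_lossless_40G_5m_profile",
        "Ethernet4": "pg_lossless_40G_5m_profile",
    }
    print(json.dumps(used_profile_entries(PG_PROFILES, ports), indent=4))
```

In this usage example both ports share one profile, so a single entry is produced and the unused 10G and 50G profiles are skipped.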
The existing PG configuration in the json file:
{
"BUFFER_PG_TABLE:Ethernet0,Ethernet4,Ethernet8,...:3-4": {
"profile" : "[BUFFER_PROFILE_TABLE:pg_lossless_profile]"
},
"OP": "SET"
}
is replaced with the jinja2 template described below.
Port parameters to profile look-up table:
{# name + size? #}
{%- set portconfig2profile = {
'40000_5m' : 'pg_lossless_40G_5m_profile',
'40000_40m' : 'pg_lossless_40G_40m_profile',
'40000_300m' : 'pg_lossless_40G_300m_profile',
'50000_5m' : 'pg_lossless_50G_5m_profile',
'50000_40m' : 'pg_lossless_50G_40m_profile',
'50000_300m' : 'pg_lossless_50G_300m_profile',
'100000_5m' : 'pg_lossless_100G_5m_profile',
'100000_40m' : 'pg_lossless_100G_40m_profile',
'100000_300m': 'pg_lossless_100G_300m_profile'
...
}
-%}
Port and neighbor role to cable length look-up table:
{% set ports2cable = {
'ToRRouter_Server' : '5m',
'LeafRouter_ToRRouter' : '40m',
'SpineRouter_LeafRouter' : '300m'
}
%}
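Macro (function) to determine the cable length:
{% set switch_role = minigraph_devices[minigraph_hostname]['type'] %}
{#- Assumptions: a port entry carries an optional 'cable' key; max_length holds the longest supported cable length. -#}
{% macro cable_length(interface_name) -%}
{%- set port = ethernet_interfaces | selectattr('name', 'equalto', interface_name) | first -%}
{%- if port['cable'] is defined -%}
{{ port['cable'] }}
{%- else -%}
{%- set nei = '"' + minigraph_neighbors[interface_name]['name'] + '"' -%}
{%- set nei_role = minigraph_devices[nei]['type'] -%}
{#- Try both orderings of the role pair, then fall back to the longest cable. -#}
{{ ports2cable.get(switch_role + '_' + nei_role) or ports2cable.get(nei_role + '_' + switch_role) or max_length }}
{%- endif -%}
{%- endmacro %}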
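Loop to generate Ethernet port-to-profile mapping tables:
{% set pg_range = '3-4' %}
{#- namespace() (Jinja2 2.10+) lets the running sum survive the loop scope. -#}
{% set ns = namespace(ingress_lossless_pg_pool_size=0) %}
{% for interface in ethernet_interfaces %}
{% set speed = interface['speed'] | string %}
{% set cable = cable_length(interface['name']) | trim %}
{% set port_config = speed + '_' + cable %}
{#- find_closest_greater_profile is still pseudocode: it stands for the fallback to the closest profile with greater parameters. -#}
{% if port_config not in portconfig2profile %}
{% set port_config = find_closest_greater_profile(speed, cable) %}
{% endif %}
{% set profile = portconfig2profile[port_config] %}
{% set ns.ingress_lossless_pg_pool_size = ns.ingress_lossless_pg_pool_size + pg_profiles[profile]['size'] %}
{
"BUFFER_PG_TABLE:{{ interface['name'] }}:{{ pg_range }}": {
"profile" : "[BUFFER_PROFILE_TABLE:{{ profile }}]"
},
"OP": "SET"
}{% if not loop.last %},{% endif %}
{% endfor %}
{% set ingress_lossless_pg_pool_size = ns.ingress_lossless_pg_pool_size %}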
The buffer pool ingress_lossless_pool, previously used for all ingress lossless profiles, is now split into two parts:
- a static part, which keeps the name ingress_lossless_pool; its size is reduced but remains sufficient to serve all lossless profiles except the PG profiles
- a PG part, whose size is calculated as the sum of the sizes of the profiles used by all ports.
Example:
{
"BUFFER_POOL_TABLE:ingress_lossless_pg_pool": {
"size": "{{ ingress_lossless_pg_pool_size }}",
"type": "ingress",
"mode": "dynamic"
},
"OP": "SET"
}
The calculation of ingress_lossless_pg_pool_size is covered in chapter "BUFFER_PG_TABLE update".
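As a rough illustration of the pool sizing, the Python sketch below sums the profile size once per port, which mirrors the per-interface accumulation in the generation loop. The function names `pg_pool_size` and `pg_pool_entry` are hypothetical, and whether the size should instead be counted once per priority group in the PG range is not settled by this document.

```python
def pg_pool_size(port_profiles, pg_profiles):
    """Sum the size of the profile selected for each port.

    port_profiles maps a port name to a profile name; pg_profiles maps a
    profile name to its parameters. The size is added once per port.
    """
    return sum(pg_profiles[name]["size"] for name in port_profiles.values())


def pg_pool_entry(size):
    """Build the pool declaration with the computed size as a string."""
    return {
        "BUFFER_POOL_TABLE:ingress_lossless_pg_pool": {
            "size": str(size),
            "type": "ingress",
            "mode": "dynamic",
        },
        "OP": "SET",
    }


if __name__ == "__main__":
    profiles = {
        "pg_lossless_40G_5m_profile": {"size": 34816},
        "pg_lossless_50G_5m_profile": {"size": 34816},
    }
    ports = {
        "Ethernet0": "pg_lossless_40G_5m_profile",
        "Ethernet4": "pg_lossless_40G_5m_profile",
        "Ethernet8": "pg_lossless_50G_5m_profile",
    }
    print(pg_pool_entry(pg_pool_size(ports, profiles)))
```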
In the declarations of the following tables, the interface list part could be replaced with a template. This will make the config file more generic:
- BUFFER_PORT_INGRESS_PROFILE_LIST
- BUFFER_PORT_EGRESS_PROFILE_LIST
- BUFFER_QUEUE_TABLE
E.g.
{
"BUFFER_QUEUE_TABLE:{{ all_ethernet_interfaces }}:0-1": {
"profile" : "[BUFFER_PROFILE_TABLE:q_lossy_profile]"
},
"OP": "SET"
}
The value of the all_ethernet_interfaces variable can be assigned in the loop used for PG profiles (see chapter "BUFFER_PG_TABLE update").
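A minimal Python sketch of how such a key could be assembled is shown below. It assumes that all_ethernet_interfaces is the comma-separated list of port names, as in the existing BUFFER_PG_TABLE key; the function name `queue_table_entry` and its default arguments are hypothetical.

```python
def queue_table_entry(interface_names, queues="0-1", profile="q_lossy_profile"):
    """Build one BUFFER_QUEUE_TABLE entry covering all given ports."""
    # Port names are joined with commas to form the interface part of the key.
    all_ethernet_interfaces = ",".join(interface_names)
    key = f"BUFFER_QUEUE_TABLE:{all_ethernet_interfaces}:{queues}"
    return {key: {"profile": f"[BUFFER_PROFILE_TABLE:{profile}]"}, "OP": "SET"}
```

Building the key from the interface list means the same template works for any port count, which is the point of making the config file more generic.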
Changes in Part 2 implement port buffers configuration at run time.
- All profiles will be declared in the initial json file.
- A new field, status, will be added to the profile table to indicate the profile status. Its value is either "active" or "inactive": "active" means the profile has been created in SAI and is used by one or more ports; "inactive" means the profile is not used and has not been created in SAI.
; field = value
status = "active/inactive" ; Profile status.
Initially all profiles should be declared with status "inactive".
The BufferOrch component should be updated to create only the profiles which are actually used. The status field should be updated accordingly.
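The intended status transitions can be modeled with a small Python sketch. This is not BufferOrch code; the class name `ProfileTracker` and its methods are hypothetical, and SAI object creation and removal are represented only by the "active" and "inactive" states.

```python
class ProfileTracker:
    """Tracks which ports use each declared buffer profile."""

    def __init__(self, profile_names):
        # Every declared profile starts unused, i.e. "inactive".
        self._ports = {name: set() for name in profile_names}

    def status(self, profile):
        """Return "active" if at least one port uses the profile."""
        return "active" if self._ports[profile] else "inactive"

    def unbind(self, port):
        """Detach a port from whatever profile it currently uses."""
        for ports in self._ports.values():
            ports.discard(port)

    def bind(self, port, profile):
        """Move a port to a profile, e.g. after a port speed change."""
        if profile not in self._ports:
            raise KeyError(profile)
        self.unbind(port)
        self._ports[profile].add(port)


if __name__ == "__main__":
    tracker = ProfileTracker(
        ["pg_lossless_40G_5m_profile", "pg_lossless_100G_5m_profile"]
    )
    tracker.bind("Ethernet0", "pg_lossless_40G_5m_profile")
    # A speed change moves the port to another profile.
    tracker.bind("Ethernet0", "pg_lossless_100G_5m_profile")
    print(tracker.status("pg_lossless_40G_5m_profile"))
    print(tracker.status("pg_lossless_100G_5m_profile"))
```

In this model a profile returns to "inactive" as soon as its last port moves away, which is the point at which a real implementation could remove the corresponding SAI object.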
- Where to get cable length and port_config-2-buffer profile mapping in run-time (Part 2)?