
Enable ECN on lossless queues


Overview

IP-based RDMA protocols (e.g., RoCEv2) rely on PFC to provide a drop-free network, and ECN is the key congestion-control mechanism supporting this model. However, in the current configuration ECN is enabled only on lossy queues. New configurations therefore also need ECN enabled on lossless queues.

SAI-specific changes

Requirement

Enabling ECN on lossless queues is considered necessary, and it should be enabled in the SONiC configuration. A non-ECN-capable packet in a lossless queue should be dropped once the threshold is exceeded. Test cases should be designed to ensure ECN works on lossless queues.

ECN marking shall always occur before PFC frames are generated on the same link; in practice this means the ECN marking threshold must sit below the PFC XOFF threshold.

ECN thresholds will be calculated and pushed down to SAI; it is not SAI's responsibility to auto-calculate ECN thresholds.

The SAI client should be able to change ECN settings on the fly, without restarting any service.

Example change

QoS configuration for TD2

Change queues 0-1 at the end of the line to 3-4, so that the WRED/ECN profile is attached to the lossless queues rather than the lossy ones.
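
For illustration, here is roughly what the rendered entries could look like once the profile is bound to the lossless queues, expressed as a Python dict. This is a hedged sketch: the table and field names follow the SONiC WRED_PROFILE/QUEUE schema but may differ by version, and the threshold values are placeholders.

```python
# Illustrative sketch only: rendered QoS entries as a Python dict.
# Table/field names follow the SONiC WRED_PROFILE/QUEUE schema (may vary
# by version); threshold values are placeholders, not recommendations.
qos_config = {
    "WRED_PROFILE": {
        "AZURE_LOSSLESS": {
            "ecn": "ecn_all",                 # mark ECN-capable packets; drop the rest
            "wred_green_enable": "true",
            "green_min_threshold": "250000",  # bytes; must sit below the PFC XOFF point
            "green_max_threshold": "1048576",
        }
    },
    "QUEUE": {
        # Previously bound to the lossy queues 0-1; now bound to the
        # lossless queues 3-4.
        "Ethernet0|3-4": {
            "wred_profile": "AZURE_LOSSLESS",
        }
    },
}
```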

SONiC-specific changes

Goal

  1. Enable dynamic ECN configuration changes while the switch is up and running.
  2. Move the QoS configuration to conf_DB so that dynamic configuration changes are persisted.
  3. Move away from APP_DB gradually.
  4. Apply a similar concept to MMU buffer management.

Requirements

  1. Dynamic ECN configuration

    1.1. Design a CLI (console) command that supports showing and setting the ECN configuration (a sketch follows this item).

    1.2. Any dynamically set value should go to conf_DB directly.
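
A minimal sketch of such a CLI, assuming Python's click library and the swsssdk ConfigDBConnector. The command name, options, and WRED_PROFILE field names here are illustrative, not an existing utility:

```python
import click
from swsssdk import ConfigDBConnector

@click.group()
def ecn():
    """Show or set the ECN (WRED) configuration."""

@ecn.command()
def show():
    """Dump all WRED profiles currently in conf_DB."""
    db = ConfigDBConnector()
    db.connect()
    for name, fields in db.get_table("WRED_PROFILE").items():
        click.echo("%s: %s" % (name, fields))

@ecn.command("set")
@click.argument("profile")
@click.option("--green-min", help="Green min threshold, in bytes.")
@click.option("--green-max", help="Green max threshold, in bytes.")
def set_cmd(profile, green_min, green_max):
    """Write changed thresholds straight to conf_DB (requirement 1.2)."""
    fields = {}
    if green_min:
        fields["green_min_threshold"] = green_min
    if green_max:
        fields["green_max_threshold"] = green_max
    db = ConfigDBConnector()
    db.connect()
    db.mod_entry("WRED_PROFILE", profile, fields)  # merges into the existing entry

if __name__ == "__main__":
    ecn()
```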

  2. Move the QoS configuration to conf_DB

    2.1. Currently the configuration resides inside the swss container, under /etc/swss/config.d/.

    2.2. On a system running the target code, this file should no longer exist at the above location.

    2.3. swssconfig.sh should be updated not to load the configuration from the above location.

    2.4. The configuration j2 file should be mounted or symbolically linked to a common place in the base OS (i.e., the j2 file should be moved to the template folder on the base OS).

    2.5. A mechanism should run once, at first boot, to load the configuration into conf_DB (a sketch follows this item):

    2.5.1. At first boot there is a config_DB initialization procedure; one of its steps is loading the minigraph.

    2.5.2. The conversion of qos.j2 to qos.json should happen after the minigraph is loaded.

    2.5.3. A mechanism already exists to execute this logic only once.

    2.6. Subsequent boots should take the configuration directly from conf_DB.

    2.7. The reload-minigraph script should also invoke the QoS configuration conversion and application procedure.
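
A minimal sketch of the first-boot conversion in 2.5, assuming jinja2 for rendering and the swsssdk ConfigDBConnector for writing; the template directory and rendering context are placeholders:

```python
import json
import jinja2
from swsssdk import ConfigDBConnector

# Hypothetical template location in the base OS; adjust to the real path.
TEMPLATE_DIR = "/usr/share/sonic/templates"

def load_qos_config(template_name="qos.json.j2", context=None):
    """Render the QoS j2 template and write the result into conf_DB."""
    env = jinja2.Environment(loader=jinja2.FileSystemLoader(TEMPLATE_DIR))
    rendered = env.get_template(template_name).render(**(context or {}))
    qos = json.loads(rendered)

    db = ConfigDBConnector()
    db.connect()
    for table, entries in qos.items():       # e.g. WRED_PROFILE, QUEUE
        for key, fields in entries.items():
            db.set_entry(table, key, fields)

if __name__ == "__main__":
    # Intended to run once at first boot, after the minigraph is loaded,
    # and again from the reload-minigraph script (2.7).
    load_qos_config()
```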

  3. Move away from APP_DB

    3.1. Orchagent should subscribe to both conf_DB and APP_DB for the time being (see the sketch after this item).

    3.2. The design in (2) should be strictly followed, ensuring that only one copy of the data is fed to orchagent.

    3.3. Orchagent will apply the latest change to SAI.

    3.4. [Information only, out of scope of this design] The Broadcom platform will continue using the existing configuration management until the relevant j2 files are produced, then move to the common infrastructure described in this design.
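
Orchagent itself is C++; purely to illustrate the dual-subscription idea, here is a Python sketch using the swsscommon bindings. The binding signatures are approximate, the table names are illustrative, and apply_to_sai is a hypothetical helper:

```python
from swsscommon import swsscommon

def apply_to_sai(key, op, fields):
    """Hypothetical helper standing in for orchagent's SAI programming."""
    print("apply to SAI:", key, op, fields)

# Subscribe to the same logical table in both databases for the transition.
appl_db = swsscommon.DBConnector("APPL_DB", 0)
config_db = swsscommon.DBConnector("CONFIG_DB", 0)

consumers = [
    swsscommon.SubscriberStateTable(config_db, "WRED_PROFILE"),
    swsscommon.SubscriberStateTable(appl_db, "WRED_PROFILE_TABLE"),
]

sel = swsscommon.Select()
for c in consumers:
    sel.addSelectable(c)

while True:
    state, selectable = sel.select()
    if state != swsscommon.Select.OBJECT:
        continue
    # Whichever database delivered last wins; that change goes to SAI (3.3).
    key, op, fields = selectable.pop()
    apply_to_sai(key, op, dict(fields))
```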

  4. Apply similar design to MMU.

    4.1. The same design could be applied to MMU buffer management, particularly the handling of link speed changes.

    4.2. The intelligence for automatic configuration changes in reaction to port attribute changes should remain with an independent entity: either an administrator or a process subscribed to state_DB.

    4.3. When a relevant change happens in state_DB (e.g., a link speed change), the subscribing process calculates a new buffer configuration and pushes it to conf_DB, preferably via the CLI command or a common service shared with it, so that logic is not duplicated between this process and the CLI command (see the sketch after this item).

    4.4. Requirements for state_DB are outside the scope of this document.
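
A purely illustrative sketch of the subscribing process in 4.3, assuming the swsssdk ConfigDBConnector. The speed-to-headroom table, profile naming, and BUFFER_PROFILE/BUFFER_PG field names are placeholders, and the state_DB subscription itself is elided:

```python
from swsssdk import ConfigDBConnector

# Placeholder mapping from port speed (Mb/s) to lossless headroom (bytes).
# Real values depend on cable length, MTU, and ASIC; see Section 4 of the
# referenced DCQCN paper for the underlying headroom calculation.
HEADROOM_BY_SPEED = {
    "40000": "56368",
    "100000": "121608",
}

def on_speed_change(port, new_speed):
    """React to a state_DB link speed change by pushing a new buffer
    configuration to conf_DB (requirement 4.3)."""
    headroom = HEADROOM_BY_SPEED.get(new_speed)
    if headroom is None:
        return
    db = ConfigDBConnector()
    db.connect()
    # Profile and field names are illustrative.
    db.mod_entry("BUFFER_PROFILE",
                 "pg_lossless_%s_profile" % new_speed,
                 {"xoff": headroom, "size": headroom})
    # Re-bind the lossless priority groups of this port to the new profile.
    db.mod_entry("BUFFER_PG", "%s|3-4" % port,
                 {"profile": "pg_lossless_%s_profile" % new_speed})
```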

Reference

Please see Section 4 of Congestion Control for Large-Scale RDMA Deployments (the DCQCN paper, SIGCOMM 2015).
