VMA Basic Sockperf Test Examples
VMA performance can be validated using Sockperf in a number of working modes. The main use cases are Ping-Pong and Underload.
Ping-Pong test: a packet (Ping) with a specified message size is sent to the server side and sent back (Pong) to the client. The result of this test is the time from sending the packet until getting it back, divided by two.
Underload test: measures the latency of single packets under a load of millions of packets per second. The client does not wait for the reply to a packet before sending the next one on time.
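As a small worked example of the Ping-Pong arithmetic described above, the sketch below halves a round-trip time to get the one-way latency. The function name `half_rtt` and the input value are illustrative only; Sockperf already reports the halved value itself.

```shell
# Hypothetical helper: convert a round-trip time (usec) into the one-way
# latency a ping-pong test reports (round trip divided by two).
half_rtt() {
    awk -v rtt="$1" 'BEGIN { printf "%.3f\n", rtt / 2 }'
}

# A 2.718 usec round trip corresponds to a 1.359 usec latency.
half_rtt 2.718   # prints 1.359
```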
More on Sockperf can be found in the Sockperf Wiki.
Best results are achieved by running the latest VMA version on a tuned machine with the right VMA_SPEC. More on that can be found in the VMA Performance Tuning Guide.
Notes:
- The NUMA node used is the one closest to the NIC
- The cores used are the ones best suited for this test on this machine
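One way to find the values behind these notes is to read them from sysfs. This is a minimal sketch, assuming a Linux host that exposes the standard `/sys/class/net/<iface>/device/numa_node` and `/sys/devices/system/node/node<N>/cpulist` files. The function names, the optional second argument (an alternative sysfs root, added here only to make the helpers testable) and the interface name in the usage comment are all illustrative.

```shell
# Print the NUMA node a network interface is attached to.
# $1: interface name; $2: sysfs root (optional, defaults to /sys).
nic_numa_node() {
    cat "${2:-/sys}/class/net/$1/device/numa_node"
}

# Print the CPU list that belongs to a NUMA node.
# $1: NUMA node number; $2: sysfs root (optional, defaults to /sys).
node_cpus() {
    cat "${2:-/sys}/devices/system/node/node$1/cpulist"
}

# Usage (ens1f0 is a placeholder interface name):
#   node=$(nic_numa_node ens1f0)
#   node_cpus "$node"
```

The node number can then be passed to `numactl --cpunodebind`, and cores from the printed list to `taskset -c`. On machines that report no NUMA locality, `numa_node` may read `-1`, in which case there is no preferred node to bind to.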
Ping-Pong test:
- First machine (Server side):
$ VMA_SPEC=latency LD_PRELOAD=$VMA_LOAD numactl --cpunodebind=1 taskset -c 19,13 sockperf sr --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp
- Second machine (Client side):
$ VMA_SPEC=latency LD_PRELOAD=$VMA_LOAD numactl --cpunodebind=1 taskset -c 19,13 sockperf pp --time 4 --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp
Underload test:
- First machine (Server side):
$ VMA_SPEC=latency LD_PRELOAD=$VMA_LOAD numactl --cpunodebind=1 taskset -c 17,21,19 sockperf sr --msg-size 64 --ip 5.5.1.1 --port 19142 --tcp
- Second machine (Client side):
$ VMA_SPEC=latency LD_PRELOAD=$VMA_LOAD numactl --cpunodebind=1 taskset -c 17,21,19 sockperf ul --time 4 --msg-size 64 --ip 5.5.1.1 --port 19142 --tcp
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 8.2.10-0 Release built on Mar 28 2017 03:35:42
VMA INFO: Cmd Line: taskset -c 19,13 sockperf pp --time 4 --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp
VMA INFO: OFED Version: MLNX_OFED_LINUX-4.0-2.0.0.1:
VMA INFO: Spec Latency [VMA_SPEC]
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: Log Level INFO [VMA_TRACELEVEL]
VMA INFO: Tx QP WRE 256 [VMA_TX_WRE]
VMA INFO: Tx QP WRE Batching 4 [VMA_TX_WRE_BATCHING]
VMA INFO: Rx QP WRE 256 [VMA_RX_WRE]
VMA INFO: Rx QP WRE Batching 4 [VMA_RX_WRE_BATCHING]
VMA INFO: Rx Poll Loops -1 [VMA_RX_POLL]
VMA INFO: Rx Prefetch Bytes Before Poll 256 [VMA_RX_PREFETCH_BYTES_BEFORE_POLL]
VMA INFO: GRO max streams 0 [VMA_GRO_STREAMS_MAX]
VMA INFO: Select Poll (usec) -1 [VMA_SELECT_POLL]
VMA INFO: Select Poll OS Force Enabled [VMA_SELECT_POLL_OS_FORCE]
VMA INFO: Select Poll OS Ratio 1 [VMA_SELECT_POLL_OS_RATIO]
VMA INFO: Select Skip OS 1 [VMA_SELECT_SKIP_OS]
VMA INFO: CQ Drain Interval (msec) 100 [VMA_PROGRESS_ENGINE_INTERVAL]
VMA INFO: CQ Interrupts Moderation Disabled [VMA_CQ_MODERATION_ENABLE]
VMA INFO: CQ AIM Max Count 128 [VMA_CQ_AIM_MAX_COUNT]
VMA INFO: CQ Adaptive Moderation Disabled [VMA_CQ_AIM_INTERVAL_MSEC]
VMA INFO: CQ Keeps QP Full Disabled [VMA_CQ_KEEP_QP_FULL]
VMA INFO: Avoid sys-calls on tcp fd Enabled [VMA_AVOID_SYS_CALLS_ON_TCP_FD]
VMA INFO: Internal Thread Affinity 0 [VMA_INTERNAL_THREAD_AFFINITY]
VMA INFO: Thread mode Single [VMA_THREAD_MODE]
VMA INFO: Mem Allocate type 2 (Huge Pages) [VMA_MEM_ALLOC_TYPE]
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 8.2.10-0 Release built on Mar 28 2017 03:35:42
VMA INFO: Cmd Line: taskset -c 19,13 sockperf sr --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp
VMA INFO: OFED Version: MLNX_OFED_LINUX-4.0-2.0.0.1:
VMA INFO: Spec Latency [VMA_SPEC]
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: Log Level INFO [VMA_TRACELEVEL]
VMA INFO: Tx QP WRE 256 [VMA_TX_WRE]
VMA INFO: Tx QP WRE Batching 4 [VMA_TX_WRE_BATCHING]
VMA INFO: Rx QP WRE 256 [VMA_RX_WRE]
VMA INFO: Rx QP WRE Batching 4 [VMA_RX_WRE_BATCHING]
VMA INFO: Rx Poll Loops -1 [VMA_RX_POLL]
VMA INFO: Rx Prefetch Bytes Before Poll 256 [VMA_RX_PREFETCH_BYTES_BEFORE_POLL]
VMA INFO: GRO max streams 0 [VMA_GRO_STREAMS_MAX]
VMA INFO: Select Poll (usec) -1 [VMA_SELECT_POLL]
VMA INFO: Select Poll OS Force Enabled [VMA_SELECT_POLL_OS_FORCE]
VMA INFO: Select Poll OS Ratio 1 [VMA_SELECT_POLL_OS_RATIO]
VMA INFO: Select Skip OS 1 [VMA_SELECT_SKIP_OS]
VMA INFO: CQ Drain Interval (msec) 100 [VMA_PROGRESS_ENGINE_INTERVAL]
VMA INFO: CQ Interrupts Moderation Disabled [VMA_CQ_MODERATION_ENABLE]
VMA INFO: CQ AIM Max Count 128 [VMA_CQ_AIM_MAX_COUNT]
VMA INFO: CQ Adaptive Moderation Disabled [VMA_CQ_AIM_INTERVAL_MSEC]
VMA INFO: CQ Keeps QP Full Disabled [VMA_CQ_KEEP_QP_FULL]
VMA INFO: Avoid sys-calls on tcp fd Enabled [VMA_AVOID_SYS_CALLS_ON_TCP_FD]
VMA INFO: Internal Thread Affinity 0 [VMA_INTERNAL_THREAD_AFFINITY]
VMA INFO: Thread mode Single [VMA_THREAD_MODE]
VMA INFO: Mem Allocate type 2 (Huge Pages) [VMA_MEM_ALLOC_TYPE]
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 8.2.10-0 Release built on Mar 28 2017 03:35:42
VMA INFO: Cmd Line: sockperf sr --msg-size 14 --ip 11.4.3.3 --port 19140 --tcp
VMA INFO: OFED Version: MLNX_OFED_LINUX-4.0-2.0.0.1:
VMA INFO: Spec Latency [VMA_SPEC]
sockperf: == version #2.8-0.git3dd5971d7d7a ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 11.4.3.3 PORT = 19140 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=4.100 sec; SentMessages=1492229; ReceivedMessages=1492228
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=4.000 sec; SentMessages=1455879; ReceivedMessages=1455879
sockperf: ====> avg-lat= 1.359 (std-dev=0.031)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 1.359 usec
sockperf: Total 1455879 observations; each percentile contains 14558.79 observations
sockperf: ---> <MAX> observation = 6.271
sockperf: ---> percentile 99.999 = 2.085
sockperf: ---> percentile 99.990 = 1.569
sockperf: ---> percentile 99.900 = 1.463
sockperf: ---> percentile 99.000 = 1.428
sockperf: ---> percentile 90.000 = 1.396
sockperf: ---> percentile 75.000 = 1.378
sockperf: ---> percentile 50.000 = 1.359
sockperf: ---> percentile 25.000 = 1.338
sockperf: ---> <MIN> observation = 1.253
- VMA and OFED version: confirm that you are using the correct versions and that VMA is active
- VMA_SPEC: for latency measurements it should usually be set (shown above as Spec Latency)
- Average latency: Summary: Latency is 1.359 usec
- Maximum latency: <MAX> observation = 6.271
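The latency figures listed above can also be pulled out of a saved run automatically. The sketch below assumes the Sockperf client output was saved to a file and that its lines have the same layout as the sample output on this page. The function name `sockperf_summary` and the log file name in the usage comment are hypothetical.

```shell
# Extract the headline latency numbers (usec) from a saved sockperf client log.
# $1: path to a file holding the sockperf output.
sockperf_summary() {
    awk '
        /Summary: Latency is/    { avg  = $(NF-1) }  # "... Latency is 1.359 usec"
        /<MAX> observation =/    { max  = $NF }      # "... <MAX> observation = 6.271"
        /percentile 99\.900 =/   { p999 = $NF }      # "... percentile 99.900 = 1.463"
        END { printf "avg=%s max=%s p99.9=%s\n", avg, max, p999 }
    ' "$1"
}

# Usage (client.log is a placeholder file name):
#   sockperf_summary client.log
```

For the Ping-Pong run shown on this page, this would report the 1.359 average, the 6.271 maximum and the 1.463 value at percentile 99.900. Comparing the maximum and the high percentiles against the average gives a quick view of jitter between runs.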