Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: configurable max_retries #307

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions README-zh.md
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,7 @@ remote_addr = "example.com:2333" # Necessary. The address of the server
default_token = "default_token_if_not_specify" # Optional. The default token of services, if they don't define their own ones
heartbeat_timeout = 40 # Optional. Set to 0 to disable the application-layer heartbeat test. The value must be greater than `server.heartbeat_interval`. Default: 40 seconds
retry_interval = 1 # Optional. The interval between retry to connect to the server. Default: 1 second
max_retries = 0 # Optional. The maximum number of retries to connect to the server. Set to 0 to retry forever. Default: 0

[client.transport] # The whole block is optional. Specify which transport to use
type = "tcp" # Optional. Possible values: ["tcp", "tls", "noise"]. Default: "tcp"
Expand Down Expand Up @@ -135,6 +136,7 @@ token = "whatever" # Necessary if `client.default_token` not set
local_addr = "127.0.0.1:1081" # Necessary. The address of the service that needs to be forwarded
nodelay = true # Optional. Determine whether to enable TCP_NODELAY for data transmission, if applicable, to improve the latency but decrease the bandwidth. Default: true
retry_interval = 1 # Optional. The interval between retry to connect to the server. Default: inherits the global config
max_retries = 0 # Optional. The maximum number of retries to connect to the server. Set to 0 to retry forever. Default: inherits the global config

[client.services.service2] # Multiple services can be defined
local_addr = "127.0.0.1:1082"
Expand Down
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,7 @@ remote_addr = "example.com:2333" # Necessary. The address of the server
default_token = "default_token_if_not_specify" # Optional. The default token of services, if they don't define their own ones
heartbeat_timeout = 40 # Optional. Set to 0 to disable the application-layer heartbeat test. The value must be greater than `server.heartbeat_interval`. Default: 40 seconds
retry_interval = 1 # Optional. The interval between retry to connect to the server. Default: 1 second
max_retries = 0 # Optional. The maximum number of retries to connect to the server. Set to 0 to retry forever. Default: 0

[client.transport] # The whole block is optional. Specify which transport to use
type = "tcp" # Optional. Possible values: ["tcp", "tls", "noise"]. Default: "tcp"
Expand Down Expand Up @@ -137,6 +138,7 @@ token = "whatever" # Necessary if `client.default_token` not set
local_addr = "127.0.0.1:1081" # Necessary. The address of the service that needs to be forwarded
nodelay = true # Optional. Override the `client.transport.nodelay` per service
retry_interval = 1 # Optional. The interval between retry to connect to the server. Default: inherits the global config
max_retries = 0 # Optional. The maximum number of retries to connect to the server. Set to 0 to retry forever. Default: inherits the global config

[client.services.service2] # Multiple services can be defined
local_addr = "127.0.0.1:1082"
Expand Down
10 changes: 10 additions & 0 deletions src/client.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ use backoff::{backoff::Backoff, future::retry_notify};
use bytes::{Bytes, BytesMut};
use std::collections::HashMap;
use std::net::SocketAddr;
use std::process::exit;
use std::sync::Arc;
use tokio::io::{self, copy_bidirectional, AsyncReadExt, AsyncWriteExt};
use tokio::net::{TcpStream, UdpSocket};
Expand Down Expand Up @@ -507,6 +508,7 @@ impl ControlChannelHandle {
let (shutdown_tx, shutdown_rx) = oneshot::channel();

let mut retry_backoff = run_control_chan_backoff(service.retry_interval.unwrap());
let max_retries = service.max_retries.unwrap_or(0);

let mut s = ControlChannel {
digest,
Expand All @@ -520,6 +522,7 @@ impl ControlChannelHandle {
tokio::spawn(
async move {
let mut start = Instant::now();
let mut retries = 0;

while let Err(err) = s
.run()
Expand All @@ -543,6 +546,13 @@ impl ControlChannelHandle {
panic!("{:#}. Break", err);
}

retries += 1;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we actually want to only abort the program when failures are in line. For example, three retry failures in line and abort the program. But your change seems to limit the total retry failures. It's common for long-running instances to have more retry failures because of network outrage. Say it runs for an hour, then fails, and runs for another hour, etc. In that case we don't want to limit the total retry failures, or we can only run for three hours. That said, you want to limit the retry failures in a similar style with the retry_backoff


if max_retries > 0 && retries >= max_retries {
error!("Max retries reached. Exiting...");
exit(1)
}

start = Instant::now();
}
}
Expand Down
6 changes: 6 additions & 0 deletions src/config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,7 @@ pub struct ClientServiceConfig {
pub token: Option<MaskedString>,
pub nodelay: Option<bool>,
pub retry_interval: Option<u64>,
pub max_retries: Option<u64>,
}

impl ClientServiceConfig {
Expand Down Expand Up @@ -208,6 +209,8 @@ pub struct ClientConfig {
pub heartbeat_timeout: u64,
#[serde(default = "default_client_retry_interval")]
pub retry_interval: u64,
#[serde(default)]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the correct way to set default value? See also retry_interval

pub max_retries: Option<u64>,
}

fn default_heartbeat_interval() -> u64 {
Expand Down Expand Up @@ -282,6 +285,9 @@ impl Config {
if s.retry_interval.is_none() {
s.retry_interval = Some(client.retry_interval);
}
if s.max_retries.is_none() {
s.max_retries = client.max_retries;
}
}

Config::validate_transport_config(&client.transport, false)?;
Expand Down
Loading