Solution to Excessive Clients in TIME_WAIT Status

Last updated: 2024-01-04 14:39:00

    Background

    When you stress test CLB, connections may fail because the client accumulates too many sockets in TIME_WAIT state and all of its ports are occupied within a short time. The causes and solutions are described below.
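
    To confirm the symptom on a Linux client, one option is to count TIME_WAIT entries in /proc/net/tcp. The sketch below is illustrative only: the helper names parse_tcp_state and count_time_wait are hypothetical, and it relies on the fact that TIME_WAIT appears as hexadecimal 06 in the "st" column of that file.

```c
#include <stdio.h>

/* TIME_WAIT is state 6 in the kernel's TCP state enumeration;
 * /proc/net/tcp prints it as hexadecimal "06" in the "st" column. */
#define STATE_TIME_WAIT 0x06

/* Parse one line of /proc/net/tcp and return its state, or -1 if the
 * line is not a socket entry (for example the column header). */
int parse_tcp_state(const char *line)
{
    unsigned int slot, lport, rport, state;
    char laddr[33], raddr[33];

    if (sscanf(line, " %u: %32[0-9A-Fa-f]:%x %32[0-9A-Fa-f]:%x %x",
               &slot, laddr, &lport, raddr, &rport, &state) != 6)
        return -1;
    return (int)state;
}

/* Count TIME_WAIT entries in a stream formatted like /proc/net/tcp. */
long count_time_wait(FILE *fp)
{
    char line[512];
    long count = 0;

    while (fgets(line, sizeof line, fp) != NULL)
        if (parse_tcp_state(line) == STATE_TIME_WAIT)
            count++;
    return count;
}
```

    A caller would pass the result of fopen("/proc/net/tcp", "r") to count_time_wait. That file lists IPv4 sockets only; IPv6 sockets are listed in /proc/net/tcp6.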

    Linux parameter description

    tcp_timestamps: Whether to enable TCP timestamps. Timestamps are negotiated during the TCP three-way handshake. If either party does not support them, they are not used on that connection.
    tcp_tw_recycle: Whether to enable fast recycling of sockets in TIME_WAIT state.
    tcp_tw_reuse: When enabled, a socket that has been in TIME_WAIT state for more than 1 second can be reused directly for a new connection.
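
    To check how these parameters are currently set on a client, their values can be read from /proc/sys/net/ipv4. The following is a minimal sketch, assuming a Linux host; the helper name read_ipv4_sysctl is hypothetical and not part of any library.

```c
#include <stdio.h>

/* Read an integer parameter such as "tcp_tw_reuse" from /proc/sys/net/ipv4.
 * Returns 0 and stores the value in *out on success, or -1 if the entry
 * is missing or unreadable. */
int read_ipv4_sysctl(const char *name, int *out)
{
    char path[256];
    FILE *fp;
    int ok;

    if (snprintf(path, sizeof path, "/proc/sys/net/ipv4/%s", name)
            >= (int)sizeof path)
        return -1;                 /* name too long for the buffer */

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;                 /* parameter not exposed by this kernel */

    ok = (fscanf(fp, "%d", out) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

    Calling read_ipv4_sysctl("tcp_timestamps", &value) stores the current setting in value. A return value of -1 should be treated as "not available", since a given kernel may not expose every parameter listed above.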

    Cause Analysis

    The client accumulates connections in TIME_WAIT state because it is the side that actively closes them. When the client closes a connection, the connection enters TIME_WAIT state and can be reused only after 60 seconds by default. Normally, you can enable the tcp_tw_recycle and tcp_tw_reuse parameters on the client so that connections in TIME_WAIT state are reused sooner. However, when tcp_timestamps is disabled on CLB (as it currently is), tcp_tw_recycle and tcp_tw_reuse enabled on the client do not take effect, and connections in TIME_WAIT state cannot be reused quickly. The following describes the relevant Linux parameters and the reasons why tcp_timestamps cannot be enabled on CLB:
    1. tcp_tw_recycle and tcp_tw_reuse take effect only when tcp_timestamps is enabled.
    2. In a FULLNAT scenario, tcp_timestamps and tcp_tw_recycle cannot both be enabled, because public network clients may then fail to access the server through a NAT gateway. The reason is as follows: when both tcp_tw_recycle and tcp_timestamps are enabled, the timestamps in connection requests (SYN packets) from the same source IP (treated as the same host) must be increasing within 60 seconds. Taking the 2.6.32 kernel as an example, the details are as follows:
    /* Per-source-IP timestamp check applied to an incoming SYN */
    if (tmp_opt.saw_tstamp &&
        tcp_death_row.sysctl_tw_recycle &&
        (dst = inet_csk_route_req(sk, req)) != NULL &&
        (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
        peer->v4daddr == saddr) {
            if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
                (s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW) {
                    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
                    goto drop_and_release;
            }
    }
    Note:
    tmp_opt.saw_tstamp: The incoming SYN carries a TCP timestamp option.
    sysctl_tw_recycle: tcp_tw_recycle is enabled on this server.
    TCP_PAWS_MSL: 60 seconds. The condition holds when the last TCP communication from this source IP occurred within the past 60 seconds.
    TCP_PAWS_WINDOW: 1. The condition holds when the timestamp of the last TCP communication from this source IP is greater than the timestamp of the current one.
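
    The effect of this check on clients behind a NAT gateway can be reproduced with a small user-space model. This is a sketch only, not kernel code: struct peer_record and recycle_drops_syn are hypothetical names, and the two constants mirror TCP_PAWS_MSL and TCP_PAWS_WINDOW from the excerpt above.

```c
#include <stdint.h>

#define PAWS_MSL_SECONDS 60   /* mirrors TCP_PAWS_MSL */
#define PAWS_WINDOW      1    /* mirrors TCP_PAWS_WINDOW */

/* What the server remembers about one source IP (hypothetical struct). */
struct peer_record {
    uint32_t last_tsval;   /* timestamp value last seen from this IP */
    long     last_seen;    /* time in seconds at which it was seen */
};

/* Return 1 if a SYN carrying syn_tsval would be dropped, 0 if accepted.
 * The subtraction is done in unsigned 32-bit arithmetic and then read as
 * signed, as the kernel does with its (s32) cast. */
int recycle_drops_syn(const struct peer_record *peer, long now,
                      uint32_t syn_tsval)
{
    return now < peer->last_seen + PAWS_MSL_SECONDS &&
           (int32_t)(peer->last_tsval - syn_tsval) > PAWS_WINDOW;
}
```

    For example, suppose client A behind the gateway last sent timestamp 5000000 at second 100. If client B, sharing the same public IP but with a much shorter uptime, sends a SYN with timestamp 1200 at second 110, the function returns 1 and the SYN is dropped. The same SYN at second 161 is accepted because the 60-second window has passed.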
    3. On CLB (Layer-7), tcp_timestamps is disabled because public network clients may otherwise fail to access the server through a NAT gateway, as shown in the example below:
        a. A quintuple is still in TIME_WAIT state. Under the port allocation policy of the NAT gateway, the same quintuple is reused within twice the maximum segment lifetime (2MSL), and a SYN packet is sent.
        b. When tcp_timestamps is enabled and both of the following conditions are met, the SYN packet is dropped (because the timestamp option is enabled, the packet is considered old):
            i. The timestamp of the previous packet is greater than the timestamp of this packet.
            ii. The packet is received within 24 days of the previous one (the timestamp field is 32 bits wide and Linux updates the timestamp once per millisecond by default, so the timestamp wraps around after about 24 days).
    Note:
    This problem is more pronounced on mobile devices, because many clients share a limited number of public IP addresses behind the ISP's NAT gateway and a quintuple can be reused within 2MSL. Timestamps sent by different clients are not guaranteed to be increasing.
    Taking 2.6.32 kernel as an example, the details are as follows:
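    static inline int tcp_paws_check(const struct tcp_options_received *rx_opt, int paws_win)
    {
            /* Accept if the timestamp did not go backwards by more than paws_win */
            if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win)
                    return 1;
            /* Accept if the stored timestamp is at least 24 days old */
            if (unlikely(get_seconds() >= rx_opt->ts_recent_stamp + TCP_PAWS_24DAYS))
                    return 1;
            return 0;
    }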
    Note:
    rx_opt->ts_recent: Timestamp of the previous packet.
    rx_opt->rcv_tsval: Timestamp received in the current packet.
    get_seconds(): Current time.
    rx_opt->ts_recent_stamp: Time when the previous packet was received.
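
    The same logic can be exercised outside the kernel to see when a packet is treated as old and where the 24-day figure comes from. The following is a user-space sketch under stated assumptions: paws_accepts and half_wrap_days are hypothetical names, the fields of rx_opt are passed as plain arguments, and the constant mirrors the kernel's TCP_PAWS_24DAYS.

```c
#include <stdint.h>

#define PAWS_24DAYS_SECONDS (60L * 60 * 24 * 24)  /* mirrors TCP_PAWS_24DAYS */

/* Return 1 if the packet passes the check, 0 if it is treated as old. */
int paws_accepts(uint32_t ts_recent, long ts_recent_stamp,
                 uint32_t rcv_tsval, long now, int paws_win)
{
    /* Unsigned difference read as signed, so a timestamp that has wrapped
     * past zero still counts as newer than the stored one. */
    if ((int32_t)(ts_recent - rcv_tsval) <= paws_win)
        return 1;
    if (now >= ts_recent_stamp + PAWS_24DAYS_SECONDS)
        return 1;                  /* stored timestamp is too old to trust */
    return 0;
}

/* Days until a timestamp advances by 2^31 ticks, which is the point where
 * the signed comparison above changes sign. */
double half_wrap_days(double ticks_per_second)
{
    return 2147483648.0 / ticks_per_second / 86400.0;
}
```

    With one tick per millisecond, half_wrap_days(1000.0) evaluates to roughly 24.86, which matches the 24-day limit described above. A packet with timestamp 1200 arriving 10 seconds after a packet with timestamp 5000000 on the same quintuple is rejected, which is the NAT reuse case in item 3.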

    Solution

    If the client has too many connections in TIME_WAIT state, use one of the following solutions:
    Use non-persistent HTTP connections (Connection: close). In this case, CLB actively closes the connection, so the TIME_WAIT state is not generated on the client.
    If the scenario requires persistent connections, enable the SO_LINGER option on the socket with a linger time of 0 so that the connection is closed with RST. This avoids the TIME_WAIT state and allows ports to be reused quickly.
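
    The SO_LINGER approach can be sketched as follows for a POSIX client. This is an illustration rather than a complete client: enable_abortive_close and close_with_rst are hypothetical helper names, while struct linger, setsockopt and SO_LINGER are the standard socket API.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Arm the socket so that close() sends RST instead of FIN.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_abortive_close(int fd)
{
    struct linger lg;

    lg.l_onoff = 1;    /* enable SO_LINGER */
    lg.l_linger = 0;   /* zero timeout: reset the connection on close */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}

/* Close a TCP socket without leaving it in TIME_WAIT state.
 * If the option cannot be set, the socket is left open and -1 is returned
 * so that the caller can decide how to close it. */
int close_with_rst(int fd)
{
    if (enable_abortive_close(fd) != 0)
        return -1;
    return close(fd);
}
```

    Because an RST discards any data still queued for sending, call close_with_rst only after the full response has been read and no further data needs to reach the peer.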