Failed to validate connection org.postgresql.jdbc.PgConnection@3a84e4a4 (This connection has been closed.) (OD-893)
Zhou You opened 3 years ago

The connection pool configuration ends up occupying all of the database's connections, affecting other applications.

select datname,pid,application_name,state from pg_stat_activity;

(screenshots: CleanShot 2022-08-27 at 00.38.24@2x.png, CleanShot 2022-08-27 at 00.37.25@2x.png)
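
To see how many of those connections are actually doing work, a grouped variant of the query above can help (just a diagnostic sketch, nothing OneDev-specific):

-- count server-side connections per database and state
select datname, state, count(*)
from pg_stat_activity
group by datname, state
order by count(*) desc;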

  • Robin Shen commented 3 years ago

    Most of them are in the idle state. You can edit hibernate.properties and set maximumPoolSize to a smaller value.
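
    For example (the value is purely illustrative, pick one that fits your workload):

    hibernate.hikari.maximumPoolSize=10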

  • Robin Shen commented 3 years ago

    Also, is this error easy to reproduce?

  • Zhou You commented 3 years ago

    At first I tried specifying some of these settings via environment variables, but that had no effect. Looking at the code I found that only a few hard-coded variables take effect, so I edited hibernate.properties instead:

    hibernate.hikari.minimumIdle=5
    hibernate.hikari.maximumPoolSize=25
    #hibernate.hikari.keepaliveTime=60000
    hibernate.hikari.idleTimeout=120000
    hibernate.hikari.leakDetectionThreshold=150000
    #hibernate.hikari.maxLifeTime=180000
    hibernate.hikari.connectionTimeout=3000
    hibernate.hikari.validationTimeout=2500
    

    The database is a container started from the supabase/postgres:14.1.0.61 image. After changing the configuration, it looks like some connections are never released, or are still considered in use on the PG side?

    2022-08-27 07:30:10,946 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - Before cleanup stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:30:10,946 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - After cleanup  stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:30:40,946 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - Before cleanup stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:30:40,947 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - After cleanup  stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:31:10,947 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - Before cleanup stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:31:10,947 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - After cleanup  stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:31:40,947 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - Before cleanup stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:31:40,948 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - After cleanup  stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:32:10,948 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - Before cleanup stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:32:10,948 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - After cleanup  stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:32:40,948 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - Before cleanup stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:32:40,948 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - After cleanup  stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:33:10,948 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - Before cleanup stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:33:10,949 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - After cleanup  stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:33:40,949 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - Before cleanup stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:33:40,949 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - After cleanup  stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:34:10,949 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - Before cleanup stats (total=5, active=0, idle=5, waiting=0)
    2022-08-27 07:34:10,949 DEBUG [HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool HikariPool-1 - After cleanup  stats (total=5, active=0, idle=5, waiting=0)
    

    (screenshot: CleanShot 2022-08-27 at 07.35.04@2x.png) The number of occupied connections starts at 5 and grows slowly from there.

  • Zhou You commented 3 years ago

    It may be that the IPVS default idle timeout (900s) in Docker (Swarm / K8s) silently drops idle connections. When Hikari then closes such a connection, Postgres never notices, so from Postgres's point of view the connection still exists. That causes the connection leak. I'll change maxLifeTime and watch it for a day.
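
    To check which IPVS timeouts are actually in effect, something like this should work on a swarm node (assuming ipvsadm is installed there; the values shown are the usual Linux defaults):

    $ ipvsadm -l --timeout
    Timeout (tcp tcpfin udp): 900 120 300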

  • Robin Shen commented 3 years ago

    I'm not familiar with IPVS. If you find the root cause, please let me know so I can add it to the documentation. Thanks!

  • Zhou You commented 3 years ago

    Confirmed that it is indeed an IPVS problem. After changing maxLifetime to 10 minutes, there are no more connection leaks.

    #hibernate.hikari.minimumIdle=10
    #hibernate.hikari.maximumPoolSize=10
    #hibernate.hikari.keepaliveTime=60000
    #hibernate.hikari.idleTimeout=120000
    hibernate.hikari.leakDetectionThreshold=150000
    hibernate.hikari.maxLifetime=600000
    hibernate.hikari.connectionTimeout=3000
    hibernate.hikari.validationTimeout=2500
    

    (screenshot: CleanShot 2022-08-27 at 20.22.07@2x.png) That said, I'd still suggest making all of the connection pool settings adjustable via environment variables.

  • Robin Shen changed state from 'Open' to 'Closed' 3 years ago
  • Robin Shen commented 3 years ago

    Thank you. These properties will be added to environment variables.

  • jbauer commented 3 years ago

    As I also run OneDev in Docker, I encountered this problem.

    Instead of changing the maxLifetime setting of the HikariCP connection pool, you can also configure Linux to send TCP keep-alive probes earlier. This is generally a good idea for any swarm service that uses IPVS and has long idle times for TCP connections.

    In my docker compose file I have added the following to all services that have such idle behavior:

    sysctls:
      - net.ipv4.tcp_keepalive_time=300
      - net.ipv4.tcp_keepalive_intvl=30
      - net.ipv4.tcp_keepalive_probes=10
    

    The above tells Docker to set those sysctl properties for the onedev service/container. Linux will then send the first TCP keep-alive probe after 300 seconds of inactivity and continue sending probes every 30 seconds, up to 10 probes if none are answered; after 10 failed probes the connection is considered dead. This keeps IPVS updated, so the connection will not be silently killed by IPVS for being idle. Because IPVS does not tell anybody that it has killed a connection (it sends no reset packets), both sides of the connection may leak it, and in the case of Postgres the configured max connections on the Postgres side can be reached easily.
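
    For context, a minimal compose service showing where that key goes might look like this (service name and image are illustrative):

    services:
      onedev:
        image: 1dev/server
        sysctls:
          - net.ipv4.tcp_keepalive_time=300
          - net.ipv4.tcp_keepalive_intvl=30
          - net.ipv4.tcp_keepalive_probes=10

    With these numbers the kernel gives up on a dead peer after at most 300 + 10 × 30 = 600 seconds, comfortably below the 900 second IPVS idle timeout.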

  • Zhou You commented 3 years ago

    For reference, the default TCP keepalive settings:

    $ sysctl -a | grep tcp_keep
    net.ipv4.tcp_keepalive_intvl = 75
    net.ipv4.tcp_keepalive_probes = 9
    net.ipv4.tcp_keepalive_time = 7200
    

    However, as https://cloud.google.com/compute/docs/troubleshooting/general-tips mentions:

    Note that idle connections are tracked for a maximum of 10 minutes, after which their traffic is subject to firewall rules, including the implied deny ingress rule. If your instance initiates or accepts long-lived connections with an external host, you should adjust TCP keep-alive settings on your Compute Engine instances to less than 600 seconds to ensure that connections are refreshed before the timeout occurs.

    They recommend the following settings:

    sudo /sbin/sysctl -w net.ipv4.tcp_keepalive_time=60 net.ipv4.tcp_keepalive_intvl=60 net.ipv4.tcp_keepalive_probes=5
    

    In k8s we can do this via the initContainer route:

    initContainers:
      - command:
        - sysctl
        - -w
        - net.ipv4.tcp_keepalive_time=60
        - net.ipv4.tcp_keepalive_intvl=60
        - net.ipv4.tcp_keepalive_probes=5
        image: busybox:stable
        name: prepare-sysctl
        securityContext:
          privileged: true
    

    The Kubernetes issue tracker mentions this: https://github.com/kubernetes/kubernetes/issues/32457#issuecomment-680325785. Since these settings seem to be reasonable defaults, I propose we make them our chart's defaults as well.

    However, it would still be simple and convenient to be able to configure the HikariCP connection pool through environment variables.

  • jbauer commented 3 years ago

    "However, it would still be simple and convenient to be able to configure the HikariCP connection pool through environment variables."

    Sure, it is an easy solution, but it comes at the cost of Postgres having to tear down and spin up a backend process every x minutes. That is something you can avoid by adjusting keep-alive settings.
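
    You can observe that churn directly (a diagnostic sketch; backend_start is when each Postgres backend process was started, so with a 10 minute maxLifetime no row should ever be much older than that):

    select pid, state, backend_start
    from pg_stat_activity
    order by backend_start;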

    Keep in mind that maxLifetime also kills connections that have recently been used, because it is a hard limit for each connection regardless of usage. If your minimumIdle is smaller than maximumPoolSize, you can use idleTimeout to tear down idle connections earlier, or you can use keepaliveTime so that HikariCP pings each connection regularly, which should refresh IPVS as well. If minimumIdle and maximumPoolSize are the same, you have to use keepaliveTime, because the pool has a fixed size; see the sketch below.
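
    As a sketch, a keepaliveTime-based variant of the earlier hibernate.properties could look like this (values illustrative; keepaliveTime just needs to stay well below the 900s IPVS idle timeout):

    hibernate.hikari.minimumIdle=5
    hibernate.hikari.maximumPoolSize=25
    # ping each idle connection every 60s so IPVS keeps seeing traffic
    hibernate.hikari.keepaliveTime=60000
    # retire connections that sit idle for 2 minutes (only applies above minimumIdle)
    hibernate.hikari.idleTimeout=120000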

    However, I still think configuring TCP keep-alive settings for a given k8s/swarm service is better, because I feel the application should not assume or research anything about the underlying default system configuration in a cloud environment. Instead, at deployment time the Linux namespace for that container/service should be configured to better support the application's characteristics.

    But in the end, all solutions work.

Type: Question
Priority: Normal
Reference: OD-893