Saturday, July 11, 2015

Checkpoint Standby Cluster Member Interface Not Reachable

Out of curiosity, I tried to ping the other interfaces on the active and standby members of a Checkpoint 4200 cluster. The result was interesting. I was able to ping both the active (10.9.30.43) and standby (10.9.30.44) interfaces that are in the same zone as the test PC (10.9.30.14), but not all of the other interfaces on the two cluster members. Only the active member's interfaces (such as 172.17.30.43) were reachable; the standby member's interface (172.17.30.44) was not reachable at all. This was not limited to ICMP: other traffic such as HTTPS, SSH and sync traffic did not work on the standby member's interface either.

I searched the Checkpoint Support Site and found Checkpoint's explanation: "This is expected behavior. Connections to the Standby cluster members are not supported in HA clusters, by default."

To find out more about this firewall behavior, I did some basic troubleshooting to see the packet flow.

1. Pinging the standby member 172.17.30.44 from PC 10.9.30.14 returned "Request timed out", while 172.17.30.43 replied.

2. Checking the dropped packets on the active member (172.17.30.43) shows that the active firewall dropped the packets instead of passing the traffic to the standby member:

[Expert@CP1:0]# fw ctl zdebug drop | grep 10.9.30.14
;[cpu_0];[fw4_1];fw_log_drop_ex: Packet proto=1 10.9.30.14:2048 -> 172.17.30.43:19538 dropped by fwchain_reject_mtu Reason: rejected;
;[cpu_0];[fw4_1];fw_log_drop_ex: Packet proto=1 10.9.30.14:2048 -> 172.17.30.43:19537 dropped by fwchain_reject_mtu Reason: rejected;
;[cpu_0];[fw4_1];fw_log_drop_ex: Packet proto=1 10.9.30.14:2048 -> 172.17.30.43:19536 dropped by fwchain_reject_mtu Reason: rejected;
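When a capture runs longer than a few seconds, it helps to collapse these debug lines into one row per source, destination and dropping chain. The following is a minimal POSIX shell sketch; summarize_drops is a hypothetical helper name, and it assumes the fw_log_drop_ex line layout shown above, which may differ on other versions.

```sh
# Hypothetical helper: condense "fw ctl zdebug drop" lines (read on stdin)
# into "proto=<n> <src> -> <dst> <chain> <reason>" and count duplicates.
# Assumes the fw_log_drop_ex line layout shown above; other lines are ignored.
summarize_drops() {
  sed -nE 's/.*Packet proto=([0-9]+) ([0-9.]+):[0-9]+ -> ([0-9.]+):[0-9]+ dropped by ([^ ]+) Reason: ([^;]+);.*/proto=\1 \2 -> \3 \4 \5/p' |
    sort | uniq -c
}
```

Because zdebug keeps running until you press Ctrl+C, it is easier to save the filtered output first, for example with fw ctl zdebug drop | grep --line-buffered 10.9.30.14 > /var/tmp/drops.txt (the file path is only an example), stop the capture, and then run summarize_drops < /var/tmp/drops.txt. For the three lines above this yields a single row with a count of 3, which makes it obvious that every echo request was rejected on the active member.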

Note: during my research, I also found that SK97587 mentions that "in some cases when the traffic originates from the standby member, return traffic is forwarded from the VIP to the active member, which drops that traffic."

My earlier post "Check Point Cluster Member Gateway Drops Ping Packets Without Log in Smartview Tracker" describes similar symptoms, but the cause there is different. Its solution is to enable the simultaneous ping parameter in the kernel with this command: fw ctl set int fw_allow_simultaneous_ping 1

Resolution:

The solution is pretty simple. There is a magic parameter in the firewall kernel for this kind of situation:

[Expert@CP1:0]# fw ctl get int fwha_forw_packet_to_not_active
fwha_forw_packet_to_not_active = 0

Basically, when this parameter is set to "0", packets are NOT forwarded to a non-active member. Instead, a reset packet is sent to the client.

Run the following command on both cluster members:

# fw ctl set int fwha_forw_packet_to_not_active 1

You can verify the setting with the following command:
# fw ctl get int fwha_forw_packet_to_not_active
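Since the check has to be repeated on both members, it can be handy to turn the output of fw ctl get int into a pass/fail result. This is a minimal sketch that assumes the "name = value" output format shown earlier; param_is is a hypothetical helper, not a Check Point command.

```sh
# Hypothetical helper: read the "name = value" line printed by
# "fw ctl get int <name>" on stdin and succeed only if the value
# equals the expected one given as the first argument.
param_is() {
  local expected="$1" name="" value=""
  # Split on "="; also accept a final line without a trailing newline.
  IFS='=' read -r name value || [ -n "$name" ] || return 1
  # Strip the spaces and tabs around "=".
  name=$(printf '%s' "$name" | tr -d ' \t')
  value=$(printf '%s' "$value" | tr -d ' \t')
  if [ "$value" = "$expected" ]; then
    echo "OK: $name = $value"
  else
    echo "MISMATCH: $name = $value (expected $expected)" >&2
    return 1
  fi
}
```

For example, on each member: fw ctl get int fwha_forw_packet_to_not_active | param_is 1. The exit status makes it easy to use inside a larger script.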


To make the setting permanent so that it survives a reboot, add this line to the file $FWDIR/boot/modules/fwkern.conf:
fwha_forw_packet_to_not_active=1

Then reboot. Perform this on both cluster members.
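Editing fwkern.conf by hand works, but repeating the change on several members (or re-running it) can leave duplicate or conflicting lines. The sketch below is a hypothetical helper, persist_kernel_param, that updates an existing entry in place or appends it if missing. It assumes GNU sed for the -i option; back up the file before trying it.

```sh
# Hypothetical helper: make "name=value" persistent in a kernel config
# file such as $FWDIR/boot/modules/fwkern.conf without duplicating lines.
# Assumes GNU sed (for -i).
# Usage: persist_kernel_param <conf_file> <name> <value>
persist_kernel_param() {
  local conf="$1" name="$2" value="$3"
  touch "$conf" || return 1
  if grep -q "^${name}=" "$conf"; then
    # Entry already exists: rewrite its value in place.
    sed -i "s/^${name}=.*/${name}=${value}/" "$conf"
  else
    # Make sure the last existing line ends with a newline, then append.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
      echo >> "$conf"
    fi
    printf '%s=%s\n' "$name" "$value" >> "$conf"
  fi
}
```

On each member this would be run as persist_kernel_param "$FWDIR/boot/modules/fwkern.conf" fwha_forw_packet_to_not_active 1, followed by the reboot described above.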



Reference:

a. Troubleshooting "Clear text packet should be encrypted" error in ClusterXL




4 comments:

  1. Great explanation.

  2. Another solution exists. Instead of changing a kernel parameter, you can set static routes so that traffic to the cluster members is routed directly to each member rather than to the cluster VIP address. The routes have to be set on every interface where you expect incoming traffic to the cluster members.

    Example: I want to connect to the cluster members over SSH. I need to use the IP addresses of the nearest FW interface so that the traffic does not go through the virtual cluster IPs, or I can set static routes.

    Mgmt IPs: 10.1.1.252, 10.1.1.253, virtual 10.1.1.254
    Connection network IPs: 192.168.1.252, 192.168.1.253, virtual 192.168.1.254
    (the connection network between the FW and the administrators' segments)

    Solution 1: use the nearest interface IPs
    Use 192.168.1.252 instead of 10.1.1.252, and 192.168.1.253 instead of 10.1.1.253.

    Solution 2: static routes, keeping the Mgmt IPs
    Static routes on the routers/L3 switches behind the FW:
    ip route 10.1.1.252 255.255.255.255 192.168.1.252 name Route_to_FW1_through_FW1
    ip route 10.1.1.253 255.255.255.255 192.168.1.253 name Route_to_FW2_through_FW2

  3. Solution #2 worked fine in my case. Thanks.

  4. Worked for me. I added the on-the-fly command and the standby member started responding. Nice post. Thanks.

