Wednesday, October 21, 2015

Check Point Gateway SSH Connection Intermittently Slow Issue - CONFD CPU High

When Gaia was released with R75.40 in 2012, we upgraded our Check Point firewalls to it right away. Since then we have upgraded to R77.10 and R77.20, and we are now planning a move to R77.30. Our experience with the new versions was quite good, but recently the Gaia CLI and Portal have been getting slower and slower.

For example, an SSH login can take a couple of minutes to show the prompt. The WebUI repeatedly reports a lost database connection when saving changes, and you have to log in to the WebUI again. SNMP monitoring shows the device as up and reachable by ping, but it cannot poll any SNMP information. After a couple of minutes, sometimes 10 minutes or longer, everything goes back to normal. It does not happen all the time, just a couple of times per day. Most of the time, logins and SNMP access are fine.

Sometimes the save config command also fails with a database timeout:

FW-CP2> save config
NMSCFD0026  Timeout waiting for response from database server.

Check Point has a couple of SKs relating to this issue, such as sk104761. According to sk104761, each change made in Gaia Clish or in the Gaia Portal is saved under a revision in the Gaia database, the /config/db/initial_db file. Once this file becomes large, the confd process consumes more CPU to read from this file or to save new data to it.
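A quick way to see whether confd is the process burning CPU is to sum its usage from ps output. This is only a minimal sketch: the function name confd_cpu is my own, and it assumes the process shows up under the command name confd. Keep in mind that ps reports CPU averaged over the lifetime of the process, so during a short spike top may be more telling.

```shell
#!/bin/bash
# Sketch: sum the CPU percentage of all processes named "confd".
# Reads "pcpu command" pairs on stdin so it can be fed from ps.
# confd_cpu is an illustrative name, not a Check Point tool.

confd_cpu() {
    # Add up column 1 for every line whose command name is exactly "confd";
    # "+ 0" makes the result print as 0.0 when nothing matched.
    awk '$2 == "confd" { total += $1 } END { printf "%.1f\n", total + 0 }'
}

# Usage in Expert mode (procps ps syntax, headers suppressed):
#   ps -e -o pcpu= -o comm= | confd_cpu
```

Running it a few times while the SSH login is hanging gives a rough idea of whether confd is busy at that moment.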

[[email protected]:0]# cd /config/db
[[email protected]:0]# ls -l
total 218836
-rw-r--r-- 1 admin root    133250 Sep 20 13:28 initial
-rw-r--r-- 1 admin root 223720448 Sep 20 13:28 initial_db

The initial_db file has grown to roughly 220 MB. So what is initial_db, and can we delete it? The answer, of course, is no.
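If you want to keep an eye on the file size across several gateways, something like the following could be dropped into a monitoring script. It is a sketch under assumptions: the function names are made up for illustration, and the default 100 MiB warning threshold is an arbitrary value of mine, not a number from any Check Point SK.

```shell
#!/bin/bash
# Sketch: report the size of a Gaia configuration database file.
# The default 100 MiB threshold is a hypothetical value, not a Check Point figure.

# Print the size of the given file in whole MiB (1 MiB = 1048576 bytes).
db_size_mb() {
    local bytes
    bytes=$(wc -c < "$1") || return 1
    echo $(( bytes / 1048576 ))
}

# Print a one-line verdict for the file against a threshold in MiB.
check_db_size() {
    local file="$1"
    local limit_mb="${2:-100}"
    local size_mb
    size_mb=$(db_size_mb "$file") || return 1
    if [ "$size_mb" -ge "$limit_mb" ]; then
        echo "WARNING: $file is ${size_mb} MiB (limit ${limit_mb} MiB)"
    else
        echo "OK: $file is ${size_mb} MiB"
    fi
}

# Usage:
#   check_db_size /config/db/initial_db
```

For the 223720448-byte file in the listing above, db_size_mb works out to 213, since it counts in units of 1048576 bytes.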

From sk101273, "The /config/db/initial file must be present and valid (in other words, not corrupted) at boot time for IP Series Appliance to get configured. Otherwise, the IP Series Appliance will go into first-time boot mode and attempt to configure itself using DHCP, or wait for the user to configure it through the serial console port."

Let's take a look at what is inside the initial file:
[[email protected]:0]# cat initial
# Generated by /bin/confd on Sun Sep 20 13:28:32 2015
configurationChange t
centrallyManaged t
inactto:default 720
file was
by /bin/confd
on Tue
6 16\:16\:12
ntp:server: t
ntp:server: 1
ntp:server: t
ntp:server: t
ntp:server: t
ntp:server: 1
ntp:server: t
dhcp:dhcpc:interface:eth3 t
dhcp:dhcpc:interface:eth3:timeout 60
dhcp:dhcpc:interface:eth3:retry 300
dhcp:dhcpc:interface:eth3:reboot 10
machine:hostname FW-GRU1-CP1
update_upgrade_info:set_counter f
5 17\:23\:44
installer:available_install_packages_number 4
installer:available_download_packages_number 7
installer:category_is_aligned:3 1
installer:category_is_aligned:5 1
installer:category_is_aligned:1 1
installer:category_is_aligned:4 0
installer:ftw_random_res 1
installer:d_weekday Saturday
installer:d_hours 17
installer:d_minutes 30

To verify that this is indeed the issue:
  1. Log in to Expert mode.
  2. Back up the current Gaia configuration database:
    [[email protected]]# cp  /config/db/initial_db  /config/db/initial_db_backup
  3. Connect to the Gaia configuration database:
    [[email protected]]# sqlite3 /config/db/initial_db
  4. Query the database with SQLite to identify the issue:
    sqlite> select * from revisions where time like "%1969%";
    If any entries are returned, the system is likely experiencing this issue.
  5. Exit from SQLite:
    sqlite> .exit
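The steps above can also be run without an interactive sqlite3 session. The sketch below assumes the sqlite3 command is on the PATH in Expert mode and that the revisions table has a time column, as in the query from the SK. The function names are my own, and unlike the SK steps it runs the query against the backup copy, so the live file is only read by cp.

```shell
#!/bin/bash
# Sketch: non-interactive version of the revision check.
# Assumes the sqlite3 CLI is available and a "revisions" table with a "time" column.
# count_epoch_revisions and check_gaia_db are illustrative names only.

# Print the number of revision rows whose timestamp contains 1969.
count_epoch_revisions() {
    sqlite3 "$1" "select count(*) from revisions where time like '%1969%';"
}

# Back up the database, then query the backup copy and print a verdict.
# Note: an existing <db>_backup file is overwritten.
check_gaia_db() {
    local db="${1:-/config/db/initial_db}"
    local backup="${db}_backup"
    local rows
    cp -p "$db" "$backup" || return 1
    rows=$(count_epoch_revisions "$backup") || return 1
    if [ "$rows" -gt 0 ]; then
        echo "LIKELY AFFECTED: $rows revision rows dated 1969"
    else
        echo "NOT AFFECTED: no revision rows dated 1969"
    fi
}

# Usage:
#   check_gaia_db /config/db/initial_db
```

A non-zero row count only suggests the issue, same as in the manual steps; the fix itself still has to come from Check Point.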

[[email protected]:0]# sqlite3 /config/db/initial_db 
SQLite version 3.6.20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> sqlite> select * from revisions where time like "%1969%";
Error: near "sqlite": syntax error
sqlite> select * from revisions where time like "%1969%";
cluster:shared_feature_lock:admin|0|||||1969-12-31 19:00:00|1
cluster:shared_feature_lock:cadmin|0|||||1969-12-31 19:00:00|1
cdm:per_exec|0|||||1969-12-31 19:00:00|1
cdm:total|0|||||1969-12-31 19:00:00|1
cdm:enable|0|||||1969-12-31 19:00:00|1
lcd:screensaver:mode|0|||||1969-12-31 19:00:00|1
lcd:screensaver:timeout|0|||||1969-12-31 19:00:00|1
lcd:backlight:support|0|||||1969-12-31 19:00:00|1
zoneinfo:Atlantic:Faroe|0|||||1969-12-31 19:00:00|1
zoneinfo:Atlantic:Stanley|0|||||1969-12-31 19:00:00|1
zoneinfo:Atlantic:Canary|0|||||1969-12-31 19:00:00|1
zoneinfo:Atlantic:St_Helena|0|||||1969-12-31 19:00:00|1
zoneinfo:Atlantic:South_Georgia|0|||||1969-12-31 19:00:00|1

Once the cause is confirmed, contact Check Point Support to get a hotfix and apply it.

Some other SKs, such as sk95238 and sk102988, describe similar solutions to this issue. Basically, installing a Jumbo Hotfix that includes the fix should resolve it.

