The culprit turned out to be the monitord service. monitord is used by the device sensors to monitor hardware, and it saves its data in a DB file stored locally. Before R76, it kept one year of data in the DB. Since R76, it keeps only 3 months of history, to save device resources when processing the data. In my case, the DB file was more than 350 MB, which caused monitord to consume a lot of memory while processing it. Although we are running R77.10, it seems that upgrading to R77.10, as opposed to a fresh installation, won't reset the DB file structure.
A workaround is provided in SK93587. Here are all the steps I recorded to fix this.
1. Before applying the workaround, monitord is using 42.5% of memory (%MEM).
top - 10:56:37 up 10 days,  1:08,  1 user,  load average: 0.00, 0.06, 0.43
Tasks:  83 total,   3 running,  80 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.2%us,  1.1%sy,  0.0%ni, 97.3%id,  0.2%wa,  0.1%hi,  0.1%si,  0.0%st
Mem:    957272k total,   947392k used,     9880k free,     2772k buffers
Swap:  2096472k total,    43292k used,  2053180k free,   209280k cached
%MEM   PID USER      PR  NI  VIRT  RES  SHR S %CPU     TIME+ COMMAND
 5.0  4226 admin     15   0  263m  47m  11m S  0.4  59:12.98 cpd
 0.1  2782 admin     15   0  2172 1084  836 R  0.2   0:00.05 top
 0.8  3988 admin     15   0 24344 7956 5780 S  0.2  22:38.83 snmpd
 1.4  3947 admin     16   0 33796  13m 7964 S  0.1   2947:10 confd
42.5  3952 admin     15   0  400m 397m 2332 S  0.1 119:05.53 monitord
 0.1  3545 admin     18   0  1708  688  584 S  0.1   2:38.13 syslogd
 0.1     1 admin     15   0  2040  580  548 S  0.0   0:01.47 init
 0.0     2 admin     RT  -5     0    0    0 S  0.0   0:00.00 migration/0
 0.0     3 admin     15   0     0    0    0 S  0.0   0:00.67 ksoftirqd/0
 0.0     4 admin     RT  -5     0    0    0 S  0.0   0:00.00 watchdog/0
 0.0     5 admin     10  -5     0    0    0 S  0.0   0:01.56 events/0
Next is the top output sorted by %MEM:
top - 10:58:15 up 10 days,  1:10,  1 user,  load average: 0.00, 0.04, 0.38
Tasks:  83 total,   3 running,  80 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3%us,  0.3%sy,  0.0%ni, 99.0%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    957272k total,   947972k used,     9300k free,     3036k buffers
Swap:  2096472k total,    43292k used,  2053180k free,   209708k cached
%MEM   PID USER      PR  NI  VIRT  RES  SHR S %CPU     TIME+ COMMAND
42.5  3952 admin     15   0  400m 397m 2332 S  0.3 119:05.63 monitord
 6.9  6938 admin     19   0  122m  64m 3836 S  0.0  19:09.09 DAService
 5.0  4226 admin     15   0  263m  47m  11m S  0.0  59:13.25 cpd
 2.0  4386 admin     15   0  284m  18m  10m S  0.0   1:23.18 fw_full
 1.5  3948 admin     15   0 38032  13m 1704 S  0.0  70:42.63 searchd
 1.4  3947 admin     15   0 33796  13m 7964 S  0.0   2947:10 confd
 1.4  6779 admin     15   0  163m  13m 7252 S  0.0   0:03.49 rtmd
 0.8  3988 admin     15   0 24344 7956 5780 S  0.0  22:39.07 snmpd
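As a sanity check on that 42.5% figure: top's %MEM column is a process's resident size (RES) as a share of physical memory. Here is a small Python sketch of the arithmetic, assuming the rounded 397m RES value is close enough to the real one. The helper name mem_percent is mine, purely for illustration.

```python
def mem_percent(res_kib, total_kib):
    """Return resident memory as a percentage of total RAM, to one decimal."""
    return round(100.0 * res_kib / total_kib, 1)


TOTAL_KIB = 957272              # "Mem: 957272k total" from the top header
MONITORD_RES_KIB = 397 * 1024   # "397m" in the RES column, converted to KiB

monitord_pct = mem_percent(MONITORD_RES_KIB, TOTAL_KIB)
print(f"monitord: {monitord_pct}% of RAM")
```

This lines up with the %MEM value top reports for monitord, so the process really is holding roughly 400 MB of a box with under 1 GB of RAM.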
2. Rebuild monitord DB
[Expert@CP-DMZ-1:0]# tellpm process:monitord
[Expert@CP-DMZ-1:0]#
Message from syslogd@ at Wed Aug 26 10:59:39 2015 ...
CP-DMZ-1 monitord[3952]: monitord got killed
[Expert@CP-DMZ-1:0]# top  (Sorted result by %MEM)
top - 11:00:09 up 10 days,  1:12,  1 user,  load average: 0.00, 0.02, 0.33
Tasks:  82 total,   2 running,  80 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.3%us,  1.7%sy,  0.0%ni, 95.7%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    957272k total,   542928k used,   414344k free,     3620k buffers
Swap:  2096472k total,    42700k used,  2053772k free,   208824k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 6938 admin     19   0  122m  64m 3836 S  0.0  6.9  19:09.09 DAService
 4226 admin     15   0  263m  47m  11m S  1.0  5.0  59:13.62 cpd
 4386 admin     15   0  284m  18m  10m S  0.0  2.0   1:23.18 fw_full
 3948 admin     15   0 38032  13m 1704 S  0.0  1.5  70:42.63 searchd
 3947 admin     15   0 33796  13m 7968 S  0.0  1.4   2947:10 confd
 6779 admin     15   0  163m  13m 7252 S  0.0  1.4   0:03.49 rtmd
 3930 admin     15   0 25300 7996 6340 S  0.0  0.8   0:00.41 pm
 3988 admin     15   0 24344 7956 5780 S  0.3  0.8  22:39.35 snmpd
 4339 admin     15   0  149m 7352 5748 S  0.0  0.8   0:00.51 cphamcset
 4367 admin     15   0 32944 7224 6472 S  0.0  0.8   1:09.32 routed
 4374 admin     16   0 33044 7168 6976 S  0.0  0.7   0:13.16 routed
 3951 admin     18   0 99768 7024 6620 S  0.0  0.7   0:06.79 rconfd
 3983 admin     17   0 25272 6816 6136 S  0.0  0.7   0:00.34 cloningd
 2228 admin     15   0 21000 5972 3324 S  0.0  0.6   0:00.52 clish
 4240 admin     15   0  150m 5732 5592 S  0.0  0.6   0:00.75 mpdaemon
[Expert@CP-DMZ-1:0]# cd /var/log
[Expert@CP-DMZ-1:0]# ls -l db
-rw-r--r-- 1 admin root 356237312 Aug 26 10:45 db
[Expert@CP-DMZ-1:0]# cp /var/log/db  /var/log/db_ORIGINAL
[Expert@CP-DMZ-1:0]# sqlite3 /var/log/db
SQLite version 3.6.20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> VACUUM;
sqlite> .exit
[Expert@CP-DMZ-1:0]# tellpm process:monitord t
[Expert@CP-DMZ-1:0]#
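To see why VACUUM helps, here is a minimal Python sketch that reproduces the effect on a throwaway SQLite file, not on /var/log/db. It assumes SQLite's default settings (auto_vacuum off). The table name samples and the helper vacuum_demo are made up for illustration and say nothing about monitord's real schema. Presumably the same thing happened on the gateway: old history was deleted, but the freed pages stayed inside the file until the rebuild.

```python
import os
import sqlite3
import tempfile


def vacuum_demo(rows=20000):
    """Fill a scratch SQLite file, delete most rows, then VACUUM it.

    Returns (size_before_vacuum, size_after_vacuum) in bytes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scratch.db")  # throwaway file for the demo
        conn = sqlite3.connect(path)
        try:
            # Hypothetical table standing in for accumulated sensor history.
            conn.execute(
                "CREATE TABLE samples (id INTEGER PRIMARY KEY, payload TEXT)"
            )
            conn.executemany(
                "INSERT INTO samples (payload) VALUES (?)",
                (("x" * 200,) for _ in range(rows)),
            )
            conn.commit()

            # Deleting rows marks pages as free but leaves the file size alone.
            conn.execute("DELETE FROM samples WHERE id % 10 != 0")
            conn.commit()
            size_before = os.path.getsize(path)

            # VACUUM rebuilds the database into a compact file.
            conn.execute("VACUUM")
            size_after = os.path.getsize(path)
        finally:
            conn.close()
    return size_before, size_after


before, after = vacuum_demo()
print(f"before VACUUM: {before} bytes, after VACUUM: {after} bytes")
```

Note that the transcript above stops monitord before touching the file, takes a copy first, and restarts the service afterwards. The sketch skips all of that because it only ever works on its own scratch file.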
3. Check memory usage after applying the workaround
Memory usage for monitord has dropped to only 4.9%, down from the 42.5% we found in Step 1:
top - 11:15:24 up 10 days,  1:27,  1 user,  load average: 0.00, 0.05, 0.18
Tasks:  83 total,   2 running,  81 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us,  0.3%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.3%hi,  0.3%si,  0.0%st
Mem:    957272k total,   446428k used,   510844k free,     4808k buffers
Swap:  2096472k total,    42696k used,  2053776k free,    67228k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 6938 admin     17   0  122m  64m 3836 S  0.0  6.9  19:09.09 DAService
 4226 admin     15   0  263m  47m  11m S  0.0  5.0  59:16.10 cpd
 3088 admin     15   0 49684  45m 2320 S  0.0  4.9   0:01.55 monitord
 4386 admin     15   0  284m  18m  10m S  0.0  2.0   1:23.23 fw_full
 3948 admin     15   0 38032  13m 1704 S  0.0  1.5  70:42.63 searchd
 3947 admin     15   0 33796  13m 7968 S  0.0  1.4   2947:10 confd
 6779 admin     15   0  163m  13m 7252 S  0.0  1.4   0:03.49 rtmd
 3930 admin     16   0 25300 8012 6340 S  0.0  0.8   0:00.41 pm
 3988 admin     15   0 24344 7956 5780 S  0.0  0.8  22:41.56 snmpd
 4339 admin     15   0  149m 7352 5748 S  0.0  0.8   0:00.51 cphamcset
 4367 admin     15   0 32944 7224 6472 S  0.0  0.8   1:09.33 routed
 4374 admin     15   0 33044 7168 6976 S  0.0  0.7   0:13.19 routed
 3951 admin     18   0 99768 7024 6620 S  0.0  0.7   0:06.79 rconfd
 3983 admin     17   0 25272 6816 6136 S  0.0  0.7   0:00.34 cloningd
 2228 admin     15   0 21000 5972 3324 S  0.0  0.6   0:00.52 clish
 4240 admin     15   0  150m 5732 5592 S  0.0  0.6   0:00.75 mpdaemon
 4787 admin     18   0 20936 5512 5508 S  0.0  0.6   0:00.28 cpviewd
 4347 nobody    17   0 18748 5108 5104 S  0.0  0.5   0:00.21 ci_http_server
The DB file also shrank, from more than 350 MB to less than 40 MB:
[Expert@CP-DMZ-1:0]# ls -l db
-rw-r--r-- 1 admin root 37168128 Aug 26 11:32 db
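To put numbers on that, here is a quick Python sketch using the two byte counts from the ls -l output (356237312 before, 37168128 after). The helper name shrink_summary is my own. Sizes are reported in MiB (1024 * 1024 bytes), which is why the "before" figure comes out a bit under the 350 MB quoted in decimal units.

```python
def shrink_summary(before_bytes, after_bytes):
    """Return (before_mib, after_mib, percent_reduction), one decimal each."""
    mib = 1024 * 1024
    reduction = 100.0 * (before_bytes - after_bytes) / before_bytes
    return (
        round(before_bytes / mib, 1),
        round(after_bytes / mib, 1),
        round(reduction, 1),
    )


before_mib, after_mib, reduction_pct = shrink_summary(356237312, 37168128)
print(f"{before_mib} MiB -> {after_mib} MiB ({reduction_pct}% smaller)")
# prints "339.7 MiB -> 35.4 MiB (89.6% smaller)"
```

In other words, roughly nine tenths of the original file was reclaimed by the rebuild.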