vmsyslog filling up vCenter database and how to stop it

Posted December 11th, 2014 in Vmware by Dieter

In our datacenter we had a few customers where their vCenter database filled up at an alarming rate. After checking in MSSQL what table was the culprit, we found out the VPX_EVENT and VPX_EVENT_ARG tables kept growing. Sure enough, in vCenter we saw a constant flow of these alerts:

vmsyslog1.jpg

Also in the vmkernel.log file we saw similar messages:

2014-11-18T22:52:03.354Z cpu40:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 1 log messages
 
2014-11-18T22:52:03.355Z cpu40:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 1 log messages
 
2014-11-18T22:55:03.383Z cpu42:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 756 log messages
 
2014-11-18T22:55:03.384Z cpu42:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 1 log messages
 
2014-11-18T22:55:03.386Z cpu42:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 1 log messages
 
2014-11-18T22:55:03.395Z cpu37:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 852 log messages
 
2014-11-18T22:58:03.419Z cpu44:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 991 log messages
 
2014-11-18T22:58:03.422Z cpu37:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 992 log messages
 
2014-11-18T22:58:03.424Z cpu37:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 2 log messages
 
2014-11-18T23:01:03.454Z cpu44:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 1090 log messages
 
2014-11-18T23:01:03.467Z cpu36:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 1102 log messages
 
2014-11-18T23:01:03.468Z cpu36:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 3 log messages
 
2014-11-18T23:04:03.482Z cpu44:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 1073 log messages
 
2014-11-18T23:04:03.494Z cpu38:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 1153 log messages
 
2014-11-18T23:07:03.521Z cpu38:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 525 log messages
 
2014-11-18T23:07:03.532Z cpu38:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 893 log messages
 
2014-11-18T23:10:03.554Z cpu37:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 718 log messages
 
2014-11-18T23:10:03.566Z cpu37:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 796 log messages
 
2014-11-18T23:10:03.568Z cpu37:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 1 log messages
 
2014-11-18T23:10:03.568Z cpu37:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 1 log messages
 
2014-11-18T23:13:03.589Z cpu44:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 939 log messages
 
2014-11-18T23:13:03.595Z cpu40:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 987 log messages
 
2014-11-18T23:16:03.628Z cpu46:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 811 log messages
 
2014-11-18T23:16:03.629Z cpu44:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 926 log messages
 
2014-11-18T23:19:03.659Z cpu38:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 844 log messages
 
2014-11-18T23:19:03.670Z cpu46:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 846 log messages
 
2014-11-18T23:19:03.671Z cpu46:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 1 log messages
 
2014-11-18T23:22:03.690Z cpu42:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 868 log messages
 
2014-11-18T23:22:03.691Z cpu42:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 1 log messages
 
2014-11-18T23:22:03.704Z cpu36:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 948 log messages
 
2014-11-18T23:25:03.723Z cpu42:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 609 log messages
 
2014-11-18T23:25:03.741Z cpu46:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 585 log messages
 
2014-11-18T23:28:03.760Z cpu42:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 756 log messages
 
2014-11-18T23:28:03.774Z cpu40:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 790 log messages
 
2014-11-18T23:31:03.794Z cpu42:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 882 log messages
 
2014-11-18T23:31:03.806Z cpu40:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 963 log messages
 
2014-11-18T23:31:03.808Z cpu40:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 1 log messages
 
2014-11-18T23:34:03.828Z cpu42:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 719 log messages
 
2014-11-18T23:37:03.851Z cpu39:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 1616 log messages
 
2014-11-18T23:37:03.862Z cpu36:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 761 log messages
 
<em>...</em>

An issue with vmsyslog so it seems. Our syslog configuration however was perfectly fine:

(/etc/vmsyslog.conf)

[DEFAULT]
 
loghost = <none>
 
default_timeout = 180
 
logdir_unique = false
 
rotate = 8
 
logdir = <none>
 
size = 1024
 
[vmsyslog]
 
loghost = tcp://10.122.126.71:514, tcp://10.122.126.5:514
 
rotate = 8
 
size = 1024

Digging a bit deeper into the syslog error log revealed this:

(/var/log/.vmsyslogd.err)

2014-11-18T23:37:03.861Z vmsyslog.msgQueue        : ERROR ] 10.122.126.5:514 - lost 761 log messages
 
2014-11-18T23:37:03.866Z vmsyslog.loggers.network : ERROR   ] 10.122.126.5:514 - socket error : [Errno 32] Broken pipe
 
2014-11-18T23:40:03.895Z vmsyslog.msgQueue        : ERROR ] 10.122.126.5:514 - lost 1045 log messages
 
2014-11-18T23:40:03.899Z vmsyslog.loggers.network : ERROR   ] 10.122.126.5:514 - socket error : [Errno 32] Broken pipe
 
2014-11-18T23:43:03.933Z vmsyslog.msgQueue        : ERROR ] 10.122.126.5:514 - lost 661 log messages
 
2014-11-18T23:43:03.937Z vmsyslog.loggers.network : ERROR   ] 10.122.126.5:514 - socket error : [Errno 32] Broken pipe
 
2014-11-18T23:46:03.969Z vmsyslog.msgQueue        : ERROR ] 10.122.126.5:514 - lost 761 log messages
 
2014-11-18T23:46:03.973Z vmsyslog.loggers.network : ERROR   ] 10.122.126.5:514 - socket error : [Errno 32] Broken pipe
 
2014-11-18T23:49:04.002Z vmsyslog.msgQueue        : ERROR ] 10.122.126.5:514 - lost 573 log messages
 
2014-11-18T23:49:04.006Z vmsyslog.loggers.network : ERROR   ] 10.122.126.5:514 - socket error : [Errno 32] Broken pipe
 
2014-11-18T23:52:04.036Z vmsyslog.msgQueue        : ERROR ] 10.122.126.5:514 - lost 665 log messages
 
2014-11-18T23:52:04.040Z vmsyslog.loggers.network : ERROR   ] 10.122.126.5:514 - socket error : [Errno 32] Broken pipe

At this stage, I was starting to wonder if the other side was actually working fine. We use VMware Syslog Collector to capture these logs and although the service was running fine, we noticed this in the debug log:


(C:\ProgramData\Vmware\Vmware Syslog Collector\Logs\debug.log)

2014-11-19 06:26:27,345 - sc.twisted - Log observer &lt;bound method DefaultObserver._emit of &lt;twisted.python.log.DefaultObserver instance at 0x01095F80&gt;&gt; failed.
 
Traceback (most recent call last):
 
File "twisted\python\context.pyo", line 59, in callWithContext
 
File "twisted\python\context.pyo", line 37, in callWithContext
 
File "twisted\internet\selectreactor.pyo", line 154, in _doReadOrWrite
 
File "twisted\python\log.pyo", line 221, in err
 
--- <exception caught here> ---
 
File "twisted\python\log.pyo", line 292, in msg
 
File "twisted\python\log.pyo", line 623, in _emit
 
exceptions.IOError: [Errno 9] Bad file descriptor
 
2014-11-19 06:26:27,348 - sc.core - TCP connection lost from 192.168.243.200:49420
 
2014-11-19 06:26:27,348 - sc.log - Error cleaning up state: '192.168.243.200'
 
2014-11-19 06:26:47,546 - sc.core - New TCP connection from IPv4Address(TCP, '192.168.243.1', 58438)
 
2014-11-19 06:26:47,549 - sc.twisted - Unhandled Error
 
Traceback (most recent call last):
 
File "twisted\python\log.pyo", line 84, in callWithLogger
 
File "twisted\python\log.pyo", line 69, in callWithContext
 
File "twisted\python\context.pyo", line 59, in callWithContext
 
File "twisted\python\context.pyo", line 37, in callWithContext
 
--- <exception caught here> ---
 
File "twisted\internet\selectreactor.pyo", line 146, in _doReadOrWrite
 
File "twisted\internet\tcp.pyo", line 455, in doRead

After some digging around so see how this service actually works (it basically is a Python script), I’ve noticed the Twisted framework being used is actually quite old (v8.2). I’ve then replaced SyslogCollectorLibrary.zip (the whole Python 2.6 environment compressed) with a version in which I’ve included the latest version on Twisted for Python 2.6 (v12.0). I stopped the service, replaced the zip file and restarted the service and almost immediately the flow of events stopped.

The modified SyslogCollectorLibrary.zip is posted on the VMWare Communities forum (https://communities.vmware.com/docs/DOC-28466) , let me know if this patch works for you as well!