Segfault after migration from 3.9 to 4.0 or 4.1

Started by hoeth, May 10, 2022, 03:07:29 PM


hoeth

Hi,

We tried upgrading from NetXMS 3.9.176 to NetXMS 4 on Debian (buster and bullseye, with PostgreSQL 11 and 13, respectively). nxdbmgr did its job of upgrading the database without complaining, but netxmsd segfaults after about one minute. We tried NetXMS 4.0 and 4.1.283, with the same effect.

Setting up a new NetXMS installation with the same scripts, templates, events, nodes, etc. works fine, but we would like to keep the data history. Has anybody else observed this behavior? Is there anything special we need to do?

Victor Kirhenshtein

Hi,

Do you have a coredump from the crash? If not, could you try running netxmsd under gdb? To run it under the debugger, follow these instructions:

1. Stop the netxmsd service
2. Install the netxms-server-dbg package
3. Run

gdb netxmsd

4. At the (gdb) prompt, enter

run -D2

5. When the server crashes, you'll get the (gdb) prompt again. Enter the command

bt

and provide the output.
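
For reference, the session should look roughly like this (output abbreviated; the crash message and stack frames are placeholders):

   $ gdb netxmsd
   (gdb) run -D2
   ...
   Thread N "netxmsd" received signal SIGSEGV, Segmentation fault.
   (gdb) bt
   #0  ...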

Best regards,
Victor

fldiet

Hi,

I'm a coworker of hoeth.

Here is the backtrace you requested:
(gdb) bt
#0  CalculateIPChecksum (data=data@entry=0x7fff6b574430, len=len@entry=18446744073709551597) at tools.cpp:470
#1  0x00007ffff795ec0a in PingRequestProcessor::sendRequestV4 (this=0x7ffff79cd560 <s_processorV4>, request=0x7fff6b5764d0) at icmp.cpp:494
#2  0x00007ffff795f345 in PingRequestProcessor::sendRequest (request=0x7fff6b5764d0, this=0x7ffff79cd560 <s_processorV4>) at icmp.cpp:222
#3  PingRequestProcessor::ping (this=this@entry=0x7ffff79cd560 <s_processorV4>, addr=..., timeout=timeout@entry=1500, rtt=rtt@entry=0x7fff6b576604, packetSize=packetSize@entry=1, dontFragment=dontFragment@entry=false) at icmp.cpp:686
#4  0x00007ffff795f45c in PingLoop (dontFragment=false, packetSize=1, rtt=0x7fff6b576604, timeout=1500, numRetries=0, addr=..., p=0x7ffff79cd560 <s_processorV4>) at icmp.cpp:776
#5  IcmpPing (addr=..., numRetries=numRetries@entry=1, timeout=1500, rtt=rtt@entry=0x7fff6b576604, packetSize=1, dontFragment=dontFragment@entry=false) at icmp.cpp:796
#6  0x00007ffff7cc31bf in Node::icmpPollAddress (this=this@entry=0x7fffc2cbf810, conn=conn@entry=0x0, target=0x7fff6e4bd000 L"PRI", addr=...) at node.cpp:11322
#7  0x00007ffff7cccf62 in Node::icmpPoll (this=0x7fffc2cbf810, poller=<optimized out>) at node.cpp:11280
#8  0x00007ffff7d41358 in Pollable::doIcmpPoll (this=0x7fffc2cc0300, poller=0x7fff8dfb9700) at pollable.cpp:237
#9  0x00007ffff7c0acd1 in __ThreadPoolExecute_Wrapper_1<Pollable, PollerInfo*> (arg=0x7fff8df9b620) at ../../../include/nms_threads.h:1101
#10 0x00007ffff79959ce in ProcessSerializedRequests (data=0x7fff8dfa1880) at tp.cpp:472
#11 0x00007ffff7995736 in WorkerThread (threadInfo=0x7fff85fa0820) at tp.cpp:211
#12 0x00007ffff79970fa in ThreadCreate_Wrapper_1<WorkerThreadInfo*> (context=0x7fff85fa0830) at ../../include/nms_threads.h:542
#13 0x00007ffff77eefa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#14 0x00007ffff6f11eff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95



Kind regards,

  Florian

Victor Kirhenshtein

Hi,

Looks like you have the server configuration parameter ICMP.PingSize set to 1 (or some other small value). It has to be set to at least 46. There is a bug in the server: it does not check this value for validity, and an incorrect value causes a crash later on.
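
For illustration, here is a sketch of the kind of validity check that is missing (hypothetical code, not the actual NetXMS source; the function name is made up, and the exact lower bound is refined later in this thread):

   #include <cstdint>

   // Hypothetical clamp for the configured ICMP.PingSize value.
   // The lower bound of 28 = IP header (20) + ICMP header (8); see below.
   static uint32_t sanitizePingSize(uint32_t configured)
   {
      const uint32_t minPingSize = 28;
      return (configured < minPingSize) ? minPingSize : configured;
   }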

Best regards,
Victor

fldiet

Hi,

Setting ICMP.PingSize back to 46 from its lowered value fixed the segmentation faults after upgrading.
Interestingly, the low value did not seem to cause any issues at runtime prior to the upgrade from NetXMS 3.9.176.

Thank you very much!


Best regards,

  Florian

hoeth

Yes, we set the ping size to 1 in order to reduce traffic: most of our nodes are connected through mobile internet, so we pay for the traffic. What's the reason for the 46-byte minimum? And how did NetXMS 3.9 handle this? Did it simply fall back to larger packets?

Victor Kirhenshtein

Actually, as I started thinking about it, the minimum size is 28, not 46. Because this value includes both the IP header (20 bytes) and the ICMP header (8 bytes), it cannot be less than that. 46 is the minimum payload size of an Ethernet frame. If you are using only Ethernet for communications, setting the ping size to any value below 46 will not reduce traffic, as the payload will be padded to the minimum length anyway. However, if you are using communication channels capable of sending shorter frames, then reducing the ping size further can make sense.
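
To make the arithmetic concrete, a small standalone sketch (constants taken from the sizes above; not NetXMS code):

   #include <cstdio>

   int main()
   {
      const int ipHeader      = 20;                    // IPv4 header
      const int icmpHeader    = 8;                     // ICMP echo header
      const int minPacket     = ipHeader + icmpHeader; // 28: smallest valid ICMP.PingSize
      const int ethMinPayload = 46;                    // Ethernet pads shorter payloads

      printf("minimum ICMP.PingSize: %d bytes\n", minPacket);
      printf("below %d bytes, Ethernet padding cancels any savings\n", ethMinPayload);
      return 0;
   }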

hoeth

Ah, I thought ICMP.PingSize referred to the payload, not to the whole packet.

I'm now looking at https://wiki.netxms.org/wiki/Server_Configuration_Variables, which says "Size of ICMP packets (in bytes, excluding IP header size) used for status polls." So I guess the minimum would be 8, correct? Or is that documentation wrong and the IP header is included in ICMP.PingSize?

Victor Kirhenshtein

The documentation is wrong, I checked the source code :) Will fix that.


fldiet

I have done some more testing of our servers' behaviour with different ICMP.PingSize values.
This time the value was changed in a working NetXMS 4.1.283 instance (not going through an upgrade from 3.9).
After each restart of NetXMS to apply the new value, I waited 5 minutes before deciding whether the error occurred; when it did occur, it always happened well within 2 minutes.

Here is the outcome:
28 - no problems, as expected
20 - no problems, unexpected
19 - segmentation fault as discussed

Could it be that the documentation is partly correct, and the value excludes not the IP header (20 bytes) but the ICMP header (8 bytes)?


Victor Kirhenshtein

The actual code that caused the crash looks like this:

   int bytes = request->packetSize - sizeof(IPHDR);   // checksum length = total packet size minus IP header
   packet.m_icmpHdr.m_wChecksum = 0;                  // checksum field must be zero while it is computed
   packet.m_icmpHdr.m_wChecksum = CalculateIPChecksum(&packet, bytes);


If the total packet size is less than 20, bytes will be negative, which causes the crash inside CalculateIPChecksum: its length parameter is an unsigned size_t, so the negative value is converted to a huge length and the function reads far past the packet buffer. With a packet size between 20 and 27, the result will be positive and CalculateIPChecksum will calculate a checksum for the requested number of bytes, but an invalid ICMP packet will be sent (containing only part of the ICMP header).
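
For illustration, a minimal standalone sketch of that conversion (not NetXMS code: the IP header is stood in by a 20-byte struct, and the checksum function by a stub that only prints the length it receives). With packetSize = 1, as in the original report, it reproduces the len value from frame #0 of the backtrace on a 64-bit platform:

   #include <cstdio>
   #include <cstddef>
   #include <cstdint>

   // Stand-in for the 20-byte IPv4 header (mirrors sizeof(IPHDR) == 20).
   struct FakeIpHeader { uint8_t raw[20]; };

   // Stub with the same kind of length parameter as CalculateIPChecksum:
   // size_t is unsigned, so a negative int wraps around to a huge value.
   static void checksumStub(const void *data, size_t len)
   {
      (void)data;
      printf("len as seen by the checksum function: %zu\n", len);
   }

   int main()
   {
      int packetSize = 1;                                  // ICMP.PingSize = 1
      int bytes = packetSize - (int)sizeof(FakeIpHeader);  // 1 - 20 = -19
      printf("bytes = %d\n", bytes);
      // Implicit int -> size_t conversion: -19 becomes 2^64 - 19 =
      // 18446744073709551597, exactly the len value in stack frame #0.
      checksumStub(nullptr, bytes);
      return 0;
   }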

Best regards,
Victor

fldiet

Ah, that clarifies the situation.
Thank you very much for your insight!

Best regards,

  Florian