Menu

Show posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.

Show posts Menu

Messages - fldiet

#1
Ah, that clarifies the situation.
Thank you very much for your insight!

Best regards,

  Florian
#2
I have done some more testing of our servers' behaviour with changing ICMP.PingSize.
This time, the value was changed in a working NetXMS 4.1.283 instance ( not measuring through an upgrade from 3.9 ).
Each time the delay of deciding whether the error occurred after restarting NetXMS to apply changes was 5mins. Error -if occurring- always was well under 2mins.

Here is the outcome:
28 - no problems, as expected
20 - no problems, unexpected
19 - segmentation fault as discussed

Could it be possible the documentation is partly correct, but it's not excluding IP(20B), but ICMP(8B) header?

Edit: Suppressing an emoji  :)
#3
Hi,

setting ICMP.PingSize to 46 from its lowered value fixed the segmentation faults after upgrading.
Interestingly it did not seem to cause any issues at runtime prior to upgrading from NetXMS 3.9.176 though.

Thank you very much!


Best regards,

  Florian
#4
Hi,

I'm a coworker of hoeth.

Here is the backtrace you requested:
(gdb) bt
#0  CalculateIPChecksum (data=data@entry=0x7fff6b574430, len=len@entry=18446744073709551597) at tools.cpp:470
#1  0x00007ffff795ec0a in PingRequestProcessor::sendRequestV4 (this=0x7ffff79cd560 <s_processorV4>, request=0x7fff6b5764d0) at icmp.cpp:494
#2  0x00007ffff795f345 in PingRequestProcessor::sendRequest (request=0x7fff6b5764d0, this=0x7ffff79cd560 <s_processorV4>) at icmp.cpp:222
#3  PingRequestProcessor::ping (this=this@entry=0x7ffff79cd560 <s_processorV4>, addr=..., timeout=timeout@entry=1500, rtt=rtt@entry=0x7fff6b576604, packetSize=packetSize@entry=1, dontFragment=dontFragment@entry=false) at icmp.cpp:686
#4  0x00007ffff795f45c in PingLoop (dontFragment=false, packetSize=1, rtt=0x7fff6b576604, timeout=1500, numRetries=0, addr=..., p=0x7ffff79cd560 <s_processorV4>) at icmp.cpp:776
#5  IcmpPing (addr=..., numRetries=numRetries@entry=1, timeout=1500, rtt=rtt@entry=0x7fff6b576604, packetSize=1, dontFragment=dontFragment@entry=false) at icmp.cpp:796
#6  0x00007ffff7cc31bf in Node::icmpPollAddress (this=this@entry=0x7fffc2cbf810, conn=conn@entry=0x0, target=0x7fff6e4bd000 L"PRI", addr=...) at node.cpp:11322
#7  0x00007ffff7cccf62 in Node::icmpPoll (this=0x7fffc2cbf810, poller=<optimized out>) at node.cpp:11280
#8  0x00007ffff7d41358 in Pollable::doIcmpPoll (this=0x7fffc2cc0300, poller=0x7fff8dfb9700) at pollable.cpp:237
#9  0x00007ffff7c0acd1 in __ThreadPoolExecute_Wrapper_1<Pollable, PollerInfo*> (arg=0x7fff8df9b620) at ../../../include/nms_threads.h:1101
#10 0x00007ffff79959ce in ProcessSerializedRequests (data=0x7fff8dfa1880) at tp.cpp:472
#11 0x00007ffff7995736 in WorkerThread (threadInfo=0x7fff85fa0820) at tp.cpp:211
#12 0x00007ffff79970fa in ThreadCreate_Wrapper_1<WorkerThreadInfo*> (context=0x7fff85fa0830) at ../../include/nms_threads.h:542
#13 0x00007ffff77eefa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#14 0x00007ffff6f11eff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95



Kind regards,

  Florian
#5
Hi,

Yes, there are some transformation scripts.  But those are limited in
complexity as well as number ( ~700 DCIs ) and should therefore be manageable.
  • 88% of DCIs do not have a transformation script
  • 9% of DCIs have a lower bound or static normalization ( if;else / division )
  • 3% of DCIs show dynamic normalization using up to two
    'GetDCIValue($node, FindDCIByName($node, "..."));' invokes
    ( all referenced quantities are push-DCIs )

There are no more than 30 *DCI*() invokes for said client in total.
Thresholds boil down to very basic diff(), last() and average() calls.
These DCIs and their threshold events do not trigger any further scripts.

---

Multiple other nodes under very similar circumstances ( they inherit from the
same DCI templates or more, and are being pushed to at the same frequency ) on
the very same server instance are handled perfectly fine.  We are talking ~5
afflicted and 50 unaffected nodes.

Here is a quick plot I did for navigating/cross-referencing netxmsd's logs for
hangs in my last post.
I added the data of an unaffected node ( taken briefly after ) comparing the
time nxpush took for a set of attempts to an affected nodes behaviour.
https://i.imgur.com/CUrEvin.png

Am I off track for assuming there should be an external variable connecting the affected nodes?


Kind regards,

  Florian
#6
I configured the debug level as requested, restarted netxmsd, set up a series of identical nxpush requests sequentially without artificial delay and observed the result of nxpush next to netxmsd's log.

Here are some basic observances:

  • Typically the nxpush requests take 0.172-0.300s.
  • Isolated hangs take around 5.05s ( low variance ) and do not influence following pushes time.
  • Clusters of hangs back to back ( around 5-10 ) raise the following successful attempts time:

         
    • Hang cluster

             
      • 46.2s ( first following successful )
      •   2.2s ( second following successful )
      •   0.2s ( third  following successful )
    • Hang cluster

             
      • 25.3s
      •   2.9s
      •   0.2s
    • Hang cluster

             
      • 12.9s
      •   0.2s
    • Hang cluster

             
      • 17.4s
      •   0.2s
Here is a snippet of the logs in the neighbourhood of an isolated hang ( answering with CMD_KEEPALIVE to CMD_PUSH_DCI_DATA ), including the surrounding successful ones:

2021.06.16 15:30:01.735 *D* [client.session     ] Client session with ID 84 registered
2021.06.16 15:30:01.735 *D* [client.session.84  ] Received message CMD_GET_SERVER_INFO
2021.06.16 15:30:01.735 *D* [client.session.84  ] Server time zone: UTC+00UTC
2021.06.16 15:30:01.735 *D* [client.session.84  ] Sending message CMD_REQUEST_COMPLETED (616 bytes)
2021.06.16 15:30:01.736 *D* [client.session.84  ] Received message CMD_LOGIN
2021.06.16 15:30:01.736 *D* [client.session.84  ] authentication type 0
2021.06.16 15:30:01.736 *D* [client.session.84  ] Protocol level compression is supported by client
2021.06.16 15:30:01.736 *D* [client.session.84  ] User <ourUserForNxpush>@127.0.0.1 authenticated (language=en clientInfo="nxpush/3.8.382 (Linux 4.19.0-16-amd64; libnxcl 3.8.382)")
2021.06.16 15:30:01.736 *D* [client.session.84  ] Sending compressed message CMD_LOGIN_RESP (312 bytes)
2021.06.16 15:30:01.777 *D* [client.session.84  ] Received message CMD_PUSH_DCI_DATA
2021.06.16 15:30:02.545 *D* [client.session.84  ] Sending message CMD_REQUEST_COMPLETED (32 bytes)
2021.06.16 15:30:02.545 *D* [client.session.84  ] readSocket: connection closed
2021.06.16 15:30:02.545 *D* [client.session.84  ] Session closed
2021.06.16 15:30:02.545 *D* [client.session     ] Client session with ID 84 unregistered
2021.06.16 15:30:02.545 *D* [client.session.84  ] Socket closed
2021.06.16 15:30:02.545 *D* [client.session.84  ] Session object destroyed


2021.06.16 15:30:02.560 *D* [client.session     ] Client session with ID 84 registered
2021.06.16 15:30:02.560 *D* [client.session.84  ] Received message CMD_GET_SERVER_INFO
2021.06.16 15:30:02.560 *D* [client.session.84  ] Server time zone: UTC+00UTC
2021.06.16 15:30:02.560 *D* [client.session.84  ] Sending message CMD_REQUEST_COMPLETED (616 bytes)
2021.06.16 15:30:02.561 *D* [client.session.84  ] Received message CMD_LOGIN
2021.06.16 15:30:02.561 *D* [client.session.84  ] authentication type 0
2021.06.16 15:30:02.561 *D* [client.session.84  ] Protocol level compression is supported by client
2021.06.16 15:30:02.561 *D* [client.session.84  ] User <ourUserForNxpush>@127.0.0.1 authenticated (language=en clientInfo="nxpush/3.8.382 (Linux 4.19.0-16-amd64; libnxcl 3.8.382)")
2021.06.16 15:30:02.561 *D* [client.session.84  ] Sending compressed message CMD_LOGIN_RESP (312 bytes)
2021.06.16 15:30:02.602 *D* [client.session.84  ] Received message CMD_PUSH_DCI_DATA
2021.06.16 15:30:05.416 *D* [client.session.84  ] Sending message CMD_KEEPALIVE (32 bytes)
2021.06.16 15:30:05.423 *D* [client.session.84  ] Sending message CMD_NOTIFY (48 bytes)
2021.06.16 15:30:05.495 *D* [client.session.84  ] Sending message CMD_NOTIFY (48 bytes)
2021.06.16 15:30:07.601 *D* [client.session.84  ] readSocket: connection closed
2021.06.16 15:30:07.602 *D* [client.session.84  ] Waiting for pending requests...


2021.06.16 15:30:07.627 *D* [client.session     ] Client session with ID 71 registered
2021.06.16 15:30:07.627 *D* [client.session.71  ] Received message CMD_GET_SERVER_INFO
2021.06.16 15:30:07.627 *D* [client.session.71  ] Server time zone: UTC+00UTC
2021.06.16 15:30:07.627 *D* [client.session.71  ] Sending message CMD_REQUEST_COMPLETED (616 bytes)
2021.06.16 15:30:07.628 *D* [client.session.71  ] Received message CMD_LOGIN
2021.06.16 15:30:07.628 *D* [client.session.71  ] authentication type 0
2021.06.16 15:30:07.628 *D* [client.session.71  ] Protocol level compression is supported by client
2021.06.16 15:30:07.628 *D* [client.session.71  ] User <ourUserForNxpush>@127.0.0.1 authenticated (language=en clientInfo="nxpush/3.8.382 (Linux 4.19.0-16-amd64; libnxcl 3.8.382)")
2021.06.16 15:30:07.628 *D* [client.session.71  ] Sending compressed message CMD_LOGIN_RESP (312 bytes)
2021.06.16 15:30:07.668 *D* [client.session.71  ] Received message CMD_PUSH_DCI_DATA
2021.06.16 15:30:07.989 *D* [client.session.71  ] Sending message CMD_REQUEST_COMPLETED (32 bytes)
2021.06.16 15:30:07.990 *D* [client.session.71  ] readSocket: connection closed
2021.06.16 15:30:07.990 *D* [client.session.71  ] Session closed
2021.06.16 15:30:07.990 *D* [client.session     ] Client session with ID 71 unregistered
2021.06.16 15:30:07.990 *D* [client.session.71  ] Socket closed
2021.06.16 15:30:07.990 *D* [client.session.71  ] Session object destroyed

#7
Hello,

Prerequisite information:
We are pushing data to NetXMS from the machine hosting NetXMS itself with a
series of nxpush calls - one for each target node.

Observation:
We are experiencing a node-specific issue, concerning a set of local nxpush
sessions timing out statistically for specific nodes only.  Those nodes are all
experiencing connectivity issues. There seems to be a correlation between
occurrences of timeouts and network issues.

Question:
At the event of a new nxpush session, are there any internal checks taking
place requiring connectivity of the node? Could a recurring process like a poll
get stuck at the connectivity issue and interfere?


Kind regards,

  Florian
#8
General Support / Re: nxapush not working?
April 22, 2021, 06:28:08 PM
Ah! Makes a lot of sense put in context of capability memory.

Thanks for clarifying!
#9
General Support / Re: nxapush not working?
April 22, 2021, 04:57:47 PM
Thanks! nxapush works perfectly now.

Out of curiosity - would you expect your solution to work without the full configuration poll, given a netxmsd and nxagent restart? Why?
We feel like having tried something very similar without success.

---

We had some misconceptions regarding the respective feature sets of nxapush vs nxpush, resulting in us not considering the latter.
After some more fiddling with it, triggered by your suggestion, we could clear those up and find some significant advantages of using nxpush over nxapush in our use case.
Namely the more flexible reference methods for nodes.
Thank you again for that.


Kind regards,

  Florian
#10
General Support / Re: nxapush not working?
April 08, 2021, 01:44:57 PM
These are the versions we run at the moment:

$ netxmsd --version
NetXMS Server Version 3.8.250 Build 3.8-250-g21bbde29bb (UNICODE)
NXCP: 5.53.1.30 (AES-256, Blowfish-256, 3DES, AES-128, Blowfish-128)
Built with: g++ (Debian 8.3.0-6) 8.3.0


$ nxagentd -v
NetXMS Core Agent Version 3.8.250 Build 3.8-250-g21bbde29bb (UNICODE)


We have attempted to set up a tunnel connection following:
https://www.netxms.org/documentation/adminguide/server-management.html#self-signed-certificate-sample
placing certificates and keys in "/etc/netxmsd.conf.d/self/"

Server Config:

# Log file name
LogFile=/var/log/netxmsd

# Increase logging verbosity, 0 (only errors) to 9 (verbose debug)
DebugLevel=6

## PostgreSQL
DBDriver=pgsql.ddr
DBServer=127.0.0.1
DBName=netxms
DBLogin=netxms
DBPassword=our_db_password


TrustedCertificate = /etc/netxmsd.conf.d/self/rootCA.crt
ServerCertificate = /etc/netxmsd.conf.d/self/server.crt
ServerCertificateKey = /etc/netxmsd.conf.d/self/server.key


Agent Config:

# Log File
LogFile=/var/log/nxagentd

DebugLevel=8

# IP white list, can contain multiple records separated by comma.
# CIDR notation supported for subnets.
MasterServers=SERVER_IP

ServerConnection=SERVER_FQHN


######
The following logs are `tail -F /var/log/netxmsd /var/log/nxagentd | grep -i -e "crypto" -e "tunnel" -e "^==>"`'s output.
I have taken the liberty to:
  1) prefix the filenames to each line ( awk )
  2) substitute the first occurrence of "[" and "]" to "(" and ")" respectively in lines starting with "/var/log/netxmsd"
  3) abstract some detailed information
######

The first connection then succeeds,

/var/log/nxagentd 2021.04.08 08:20:43.023 *D* [tunnel           ] Tunnel manager started
/var/log/netxmsd  2021.04.08 08:20:43.026 *D* (                 ) SocketListener/AgentTunnels: Incoming connection from CLIENT_IP
/var/log/netxmsd  2021.04.08 08:20:43.026 *D* (                 ) SocketListener/AgentTunnels: Connection from CLIENT_IP accepted
/var/log/nxagentd 2021.04.08 08:20:43.026 *D* [tunnel           ] SERVER_FQHN: Loading certificate from file
/var/log/nxagentd 2021.04.08 08:20:43.026 *D* [tunnel           ] SERVER_FQHN: Cannot open file "/var/lib/netxms/certificates/53CBA2C8E6D236B46D68AB5D430E9E1BB742786B.crt" (No such file or directory)
/var/log/nxagentd 2021.04.08 08:20:43.026 *D* [tunnel           ] SERVER_FQHN: Cannot open file "/var/lib/netxms/certificates/120AFD110002000400020000000000000001.crt" (No such file or directory)
/var/log/netxmsd  2021.04.08 08:20:43.028 *D* (agent.tunnel     ) SetupTunnel(CLIENT_IP): minimal TLS version set to 1.2
/var/log/netxmsd  2021.04.08 08:20:43.028 *D* (crypto.cert      ) CreateTrustedCertificatesStore: trusted certificate "/etc/netxmsd.conf.d/self/rootCA.crt" added
/var/log/netxmsd  2021.04.08 08:20:43.028 *D* (crypto.cert      ) CreateTrustedCertificatesStore: added system certificate store at "/etc/ssl/certs"
/var/log/netxmsd  2021.04.08 08:20:43.029 *D* (crypto.cert      ) Certificate "/C=CC/ST=ST/L=LLLLLL/O=OOOOOO/CN=SERVER_FQHN" verification error 18 (self signed certificate) at depth 0
/var/log/nxagentd 2021.04.08 08:20:43.034 *D* [tunnel           ] SERVER_FQHN: Server certificate subject is /C=CC/ST=ST/L=LLLLLL/O=OOOOOO/CN=SERVER_FQHN
/var/log/nxagentd 2021.04.08 08:20:43.034 *D* [tunnel           ] SERVER_FQHN: Server certificate issuer is /C=CC/ST=ST/L=LLLLLL/O=OOOOOO/CN=SERVER_FQHN
/var/log/nxagentd 2021.04.08 08:20:43.034 *D* [tunnel           ] SERVER_FQHN: Server certificate verification is disabled
/var/log/nxagentd 2021.04.08 08:20:43.034 *D* [tunnel           ] SERVER_FQHN: Sending message CMD_SETUP_AGENT_TUNNEL
/var/log/netxmsd  2021.04.08 08:20:43.034 *D* (agent.tunnel     ) SetupTunnel(CLIENT_IP): Agent certificate not provided
/var/log/netxmsd  2021.04.08 08:20:43.035 *D* (agent.tunnel     ) SetupTunnel(CLIENT_IP): assigned to poller #0
/var/log/netxmsd  2021.04.08 08:20:43.035 *D* (agent.tunnel.1   ) Tunnel started
/var/log/netxmsd  2021.04.08 08:20:43.035 *D* (agent.tunnel.1   ) Received message CMD_SETUP_AGENT_TUNNEL
/var/log/netxmsd  2021.04.08 08:20:43.035 *D* (agent.tunnel.1   ) Unbound tunnel initialized
/var/log/netxmsd  2021.04.08 08:20:43.035 *D* (agent.tunnel.1   ) System name..............: SERVER_FQHN
/var/log/netxmsd  2021.04.08 08:20:43.035 *D* (agent.tunnel.1   ) Hostname.................: SERVER_FQHN
/var/log/netxmsd  2021.04.08 08:20:43.035 *D* (agent.tunnel.1   ) System information.......: Linux SERVER_FQHN 4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64
/var/log/netxmsd  2021.04.08 08:20:43.036 *D* (agent.tunnel.1   ) Platform name............: Linux-x86_64
/var/log/netxmsd  2021.04.08 08:20:43.036 *D* (agent.tunnel.1   ) Hardware ID..............: 384518613C4CA702A5CFB61248FA459DC0E33B96
/var/log/netxmsd  2021.04.08 08:20:43.036 *D* (agent.tunnel.1   ) Agent ID.................: 2529de7e-dc33-4595-b2b7-b3420ac838fe
/var/log/netxmsd  2021.04.08 08:20:43.036 *D* (agent.tunnel.1   ) Agent version............: 3.8.250
/var/log/netxmsd  2021.04.08 08:20:43.036 *D* (agent.tunnel.1   ) Zone UIN.................: 0
/var/log/netxmsd  2021.04.08 08:20:43.036 *D* (agent.tunnel.1   ) Agent proxy..............: NO
/var/log/netxmsd  2021.04.08 08:20:43.036 *D* (agent.tunnel.1   ) SNMP proxy...............: NO
/var/log/netxmsd  2021.04.08 08:20:43.036 *D* (agent.tunnel.1   ) SNMP trap proxy..........: NO
/var/log/netxmsd  2021.04.08 08:20:43.036 *D* (agent.tunnel.1   ) Syslog proxy.............: NO
/var/log/netxmsd  2021.04.08 08:20:43.036 *D* (agent.tunnel.1   ) User agent...............: NO
/var/log/netxmsd  2021.04.08 08:20:43.036 *D* (agent.tunnel.1   ) Hook script not found
/var/log/netxmsd  2021.04.08 08:20:43.036 *D* (agent.tunnel.1   ) Sending message CMD_REQUEST_COMPLETED
/var/log/nxagentd 2021.04.08 08:20:43.077 *D* [tunnel           ] SERVER_FQHN: Received message CMD_REQUEST_COMPLETED
/var/log/nxagentd 2021.04.08 08:20:43.077 *I* [                 ] Tunnel with SERVER_FQHN established
/var/log/nxagentd 2021.04.08 08:20:43.077 *D* [tunnel           ] SERVER_FQHN: Tunnel is active


resulting in the tunnel showing up as unbound in:

# nxadm -c "show tunnels"

BOUND TUNNELS
ID  | Node ID | EP  | Chan. | Peer IP Address          | System Name              | Hostname                 | Platform Name    | Agent Version | Agent Build Tag
-----+---------+-----+-------+--------------------------+--------------------------+--------------------------+------------------+---------------+--------------------------

UNBOUND TUNNELS
ID  | EP  | Peer IP Address          | System Name              | Hostname                 | Platform Name    | Agent Version | Agent Build Tag
-----+-----+--------------------------+--------------------------+--------------------------+------------------+---------------+------------------------------------
1   | NO  | CLIENT_IP                | SERVER_FQHN              | SERVER_FQHN              | Linux-x86_64     | 3.8.250       | 3.8-250-g21bbde29bb


Issuing an agent tunnel bind command to the server's node has netxmsd generate a key/cert pair in /var/lib/netxms/certificates/ and send it to the agent, which acknowledges the files and saves them:

/var/log/netxmsd  2021.04.08 08:24:25.841 *D* (client.session.0 ) Received message CMD_BIND_AGENT_TUNNEL
/var/log/netxmsd  2021.04.08 08:24:25.841 *D* (agent.tunnel     ) BindAgentTunnel: processing bind request 1 -> 1727 by user admin
/var/log/netxmsd  2021.04.08 08:24:25.841 *D* (agent.tunnel.1   ) Node agent ID (00000000-0000-0000-0000-000000000000) do not match tunnel agent ID (2529de7e-dc33-4595-b2b7-b3420ac838fe) on bind
/var/log/netxmsd  2021.04.08 08:24:25.841 *D* (agent.tunnel.1   ) Sending message CMD_BIND_AGENT_TUNNEL
/var/log/netxmsd  2021.04.08 08:24:25.841 *D* (event.corr       ) CorrelateEvent: event SYS_TUNNEL_AGENT_ID_MISMATCH id 332805 source SERVER_FQHN [1727]
/var/log/netxmsd  2021.04.08 08:24:25.841 *D* (event.proc       ) EVENT SYS_TUNNEL_AGENT_ID_MISMATCH [96] at {0} (ID:332805 F:0x0001 S:1 TAGS:"") FROM SERVER_FQHN: Agent ID 2529de7e-dc33-4595-b2b7-b3420ac838fe on node do not match agent ID 00000000-0000-0000-0000-000000000000 on tunnel from SERVER_FQHN (CLIENT_IP) at bind
/var/log/nxagentd 2021.04.08 08:24:25.842 *D* [tunnel           ] SERVER_FQHN: Received message CMD_BIND_AGENT_TUNNEL
/var/log/nxagentd 2021.04.08 08:24:25.893 *D* [tunnel           ] SERVER_FQHN: Sending message CMD_REQUEST_CERTIFICATE
/var/log/netxmsd  2021.04.08 08:24:25.893 *D* (agent.tunnel.1   ) Received message CMD_REQUEST_CERTIFICATE
/var/log/netxmsd  2021.04.08 08:24:25.893 *D* (crypto.cert      ) IssueCertificate: new certificate request (CN override: 63c86ba9-08e4-4b4a-8e4a-6a09e831ddac, OU override: 84bff5a5-d832-45fe-a18b-e2ddee7519a3)
/var/log/netxmsd  2021.04.08 08:24:25.895 *D* (crypto.cert      ) IssueCertificate: new certificate with subject "/C=CC/O=OOOOOO/OU=84bff5a5-d832-45fe-a18b-e2ddee7519a3/CN=63c86ba9-08e4-4b4a-8e4a-6a09e831ddac" issued successfully
/var/log/netxmsd  2021.04.08 08:24:25.896 *D* (agent.tunnel.1   ) New certificate issued
/var/log/netxmsd  2021.04.08 08:24:25.896 *D* (node.agent       ) Tunnel ID for node SERVER_FQHN [1727] set to 63c86ba9-08e4-4b4a-8e4a-6a09e831ddac
/var/log/netxmsd  2021.04.08 08:24:25.896 *D* (agent.tunnel.1   ) Sending message CMD_NEW_CERTIFICATE
/var/log/nxagentd 2021.04.08 08:24:25.897 *D* [tunnel           ] SERVER_FQHN: Received message CMD_NEW_CERTIFICATE
/var/log/nxagentd 2021.04.08 08:24:25.898 *D* [tunnel           ] SERVER_FQHN: Certificate and private key saved
/var/log/nxagentd 2021.04.08 08:24:25.898 *D* [tunnel           ] SERVER_FQHN: Sending message CMD_REQUEST_COMPLETED
/var/log/netxmsd  2021.04.08 08:24:25.898 *D* (agent.tunnel.1   ) Received message CMD_REQUEST_COMPLETED
/var/log/netxmsd  2021.04.08 08:24:25.898 *D* (agent.tunnel.1   ) Certificate successfully issued and transferred to agent
/var/log/netxmsd  2021.04.08 08:24:25.898 *D* (agent.tunnel.1   ) Bind successful, resetting tunnel
/var/log/netxmsd  2021.04.08 08:24:25.898 *D* (agent.tunnel.1   ) Sending message CMD_RESET_TUNNEL
/var/log/nxagentd 2021.04.08 08:24:25.898 *D* [tunnel           ] SERVER_FQHN: Received message CMD_RESET_TUNNEL
/var/log/nxagentd 2021.04.08 08:24:25.898 *D* [tunnel           ] SERVER_FQHN: Receiver thread stopped (tunnel reset)
/var/log/nxagentd 2021.04.08 08:24:25.898 *W* [                 ] Tunnel with SERVER_FQHN closed
/var/log/netxmsd  2021.04.08 08:24:25.964 *D* (client.session.0 ) Received message CMD_GET_AGENT_TUNNELS
/var/log/nxagentd 2021.04.08 08:24:43.084 *D* [tunnel           ] SERVER_FQHN: Resetting tunnel
/var/log/netxmsd  2021.04.08 08:24:43.084 *D* (agent.tunnel.1   ) Tunnel closed by peer
/var/log/netxmsd  2021.04.08 08:24:43.085 *D* (agent.tunnel.1   ) Tunnel unregistered
/var/log/netxmsd  2021.04.08 08:24:43.085 *D* (agent.tunnel.1   ) Tunnel closure completed
/var/log/netxmsd  2021.04.08 08:24:43.085 *D* (agent.tunnel.1   ) Tunnel shutdown
/var/log/netxmsd  2021.04.08 08:24:43.085 *D* (agent.tunnel.1   ) Tunnel destroyed


The tunnel is reset automatically and all following connection attempts show up in the logs as follows:

/var/log/netxmsd  2021.04.08 08:24:43.085 *D* (                 ) SocketListener/AgentTunnels: Incoming connection from CLIENT_IP
/var/log/netxmsd  2021.04.08 08:24:43.085 *D* (                 ) SocketListener/AgentTunnels: Connection from CLIENT_IP accepted
/var/log/netxmsd  2021.04.08 08:24:43.085 *D* (agent.tunnel     ) SetupTunnel(CLIENT_IP): minimal TLS version set to 1.2
/var/log/netxmsd  2021.04.08 08:24:43.086 *D* (crypto.cert      ) CreateTrustedCertificatesStore: trusted certificate "/etc/netxmsd.conf.d/self/rootCA.crt" added
/var/log/netxmsd  2021.04.08 08:24:43.086 *D* (crypto.cert      ) CreateTrustedCertificatesStore: added system certificate store at "/etc/ssl/certs"
/var/log/nxagentd 2021.04.08 08:24:43.085 *D* [tunnel           ] SERVER_FQHN: Loading certificate from file
/var/log/nxagentd 2021.04.08 08:24:43.089 *D* [tunnel           ] SERVER_FQHN: Certificate and private key loaded
/var/log/netxmsd  2021.04.08 08:24:43.089 *D* (crypto.cert      ) Certificate "/C=CC/ST=ST/L=LLLLLL/O=OOOOOO/CN=SERVER_FQHN" verification error 18 (self signed certificate) at depth 0
/var/log/nxagentd 2021.04.08 08:24:43.092 *D* [tunnel           ] SERVER_FQHN: Server certificate subject is /C=CC/ST=ST/L=LLLLLL/O=OOOOOO/CN=SERVER_FQHN
/var/log/nxagentd 2021.04.08 08:24:43.092 *D* [tunnel           ] SERVER_FQHN: Server certificate issuer is /C=CC/ST=ST/L=LLLLLL/O=OOOOOO/CN=SERVER_FQHN
/var/log/nxagentd 2021.04.08 08:24:43.092 *D* [tunnel           ] SERVER_FQHN: Server certificate verification is disabled
/var/log/netxmsd  2021.04.08 08:24:43.093 *D* (crypto.cert      ) Certificate "/C=CC/O=OOOOOO/OU=84bff5a5-d832-45fe-a18b-e2ddee7519a3/CN=63c86ba9-08e4-4b4a-8e4a-6a09e831ddac" verification error 7 (certificate signature failure) at depth 0
/var/log/netxmsd  2021.04.08 08:24:43.093 *D* (agent.tunnel     ) SetupTunnel(CLIENT_IP): TLS handshake failed (error:00000001:lib(0):func(0):reason(1))
/var/log/nxagentd 2021.04.08 08:24:43.093 *D* [tunnel           ] SERVER_FQHN: Sending message CMD_SETUP_AGENT_TUNNEL
/var/log/nxagentd 2021.04.08 08:24:43.093 *D* [crypto           ] OpenSSL error stack:
/var/log/nxagentd 2021.04.08 08:24:43.093 *D* [crypto           ] error:1409441B:SSL routines:ssl3_read_bytes:tlsv1 alert decrypt error
/var/log/nxagentd 2021.04.08 08:24:43.093 *D* [tunnel           ] SERVER_FQHN: SSL_write error (bytes=-1 ssl_err=6 socket_err=0)
/var/log/nxagentd 2021.04.08 08:24:43.093 *D* [tunnel           ] SERVER_FQHN: Receiver thread stopped (MSGRECV_CLOSED)
/var/log/nxagentd 2021.04.08 08:24:43.093 *W* [                 ] Tunnel with SERVER_FQHN closed
/var/log/nxagentd 2021.04.08 08:24:53.093 *D* [tunnel           ] SERVER_FQHN: Cannot configure tunnel (request timeout)
/var/log/nxagentd 2021.04.08 08:25:23.094 *D* [tunnel           ] SERVER_FQHN: Resetting tunnel


How come the server is building a new key/cert pair for the client, based on data that later cannot be found for verification?


Best regards,

  Florian
#11
I ran the test with a sample-size of 300 and spaced the measurements 0.1 seconds apart from each other.
The result was a mean / std-dev of 63.59 / 10.88 in ms.
Only one outlier at 101ms.

I started running a longer measurement ( same frequency ) while typing these lines - this time 60.27 / 6.36, no outliers.

So JVM looks fine on that end.


Best regards,

  Florian
#12
Hi!

I'm a colleague of hoeth, who can't reply personally at the moment.

The setup you described led to the following upon execution:

$ java -Dnetxms.server=127.0.0.1 -Dnetxms.login=admin -Dnetxms.password=our_password -cp logback-classic-1.2.3.jar:logback-core-1.2.3.jar:.:nxshell-3.8.250.jar org.netxms.Shell
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.python.core.io.StreamIO (file:/home/ats/debug/nxshell-3.8.250.jar) to field java.io.FilterInputStream.in
WARNING: Please consider reporting this to the maintainers of org.python.core.io.StreamIO
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ats/debug/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ats/debug/nxshell-3.8.250.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
08:48:44,747 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
08:48:44,747 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
08:48:44,747 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [file:/home/ats/debug/logback.xml]
08:48:44,791 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
08:48:44,792 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
08:48:44,795 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [Console]
08:48:44,811 |-WARN in ch.qos.logback.core.ConsoleAppender[Console] - This appender no longer admits a layout as a sub-component, set an encoder instead.
08:48:44,811 |-WARN in ch.qos.logback.core.ConsoleAppender[Console] - To ensure compatibility, wrapping your layout in LayoutWrappingEncoder.
08:48:44,811 |-WARN in ch.qos.logback.core.ConsoleAppender[Console] - See also http://logback.qos.ch/codes.html#layoutInsteadOfEncoder for details
08:48:44,812 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to DEBUG
08:48:44,812 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [Console] to Logger[ROOT]
08:48:44,812 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
08:48:44,812 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@78f9ed3e - Registering current configuration as safe fallback point

SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
08:48:44.818 [main] INFO  org.netxms.client.NXCSession - Connecting to 127.0.0.1:4701
08:48:44.829 [main] DEBUG org.netxms.client.NXCSession - Connection established, retrieving server info
08:49:30.835 [Network Message Receiver] DEBUG org.netxms.client.NXCSession - Cipher selected: AES-256
08:49:30.837 [Network Message Receiver] DEBUG org.netxms.client.NXCSession - Server key fingerprint: 31:5f:5f:55:0e:71:03:58:17:1f:67:36:23:03:40:0f


Meanwhile, listening to /var/log/netxmsd:

2021.04.01 08:48:44.824 *D* [client.session     ] Client session with ID 3 registered
2021.04.01 08:48:44.831 *D* [client.session.3   ] Received message CMD_GET_SERVER_INFO
2021.04.01 08:48:44.831 *D* [client.session.3   ] Server time zone: UTC+00UTC
2021.04.01 08:48:44.831 *D* [client.session.3   ] Sending message CMD_REQUEST_COMPLETED (616 bytes)
2021.04.01 08:48:44.833 *D* [client.session.3   ] Received message CMD_REQUEST_ENCRYPTION
2021.04.01 08:48:44.833 *D* [client.session.3   ] Sending message CMD_REQUEST_SESSION_KEY (344 bytes)
2021.04.01 08:49:14.833 *D* [client.session.3   ] Sending message CMD_REQUEST_COMPLETED (32 bytes)
2021.04.01 08:49:14.834 *D* [client.session.3   ] readSocket: connection closed
2021.04.01 08:49:14.834 *D* [                   ] *Locks* All locks for session 3 removed
2021.04.01 08:49:14.834 *D* [                   ] Closing all logs for session 3
2021.04.01 08:49:14.834 *D* [client.session.3   ] Session closed
2021.04.01 08:49:14.834 *D* [client.session     ] Client session with ID 3 unregistered
2021.04.01 08:49:14.834 *D* [client.session.3   ] Socket closed
2021.04.01 08:49:14.834 *D* [client.session.3   ] Session object destroyed


Note:
Nxshell's last output before being abandoned by server is at 08:48:44.829, yet significantly after the session object being destroyed ( 08:49:14.834 ), it comes back with a selected cipher and key-fingerprint ( 08:49:30.837 ).


Thank you very much,

  Florian