nxshell hangs at "Connecting to <server>"

Started by hoeth, March 30, 2021, 01:17:10 PM

Previous topic - Next topic

hoeth

Hi,

I'm trying to use nxshell to setup nodes, get some DCI values, etc. Mostly it's working fine, but sometimes nxshell hangs at the "INFO org.netxms.client.NXCSession - Connecting to localhost:4701" stage. If it does, I have to abort that command and wait a bit before trying again, as it seems that waiting only a few seconds is not enough and the next connection attempt is hanging again.

There is only minimal load on the server, because this is a test setup with only two nodes. I'm running nxshell directly on the server.

Can I define a connection timeout so that I don't have to abort the command manually? Do you have any idea why subsequent connection attempts also fail and how I can avoid this?

Thanks a lot,

   Hendrik

Filipp Sudanov

Not sure how timeout could be set from nxshell side.

From server side:
- try checking number of connected sessions - show sessions in server console or using nxadm. Are there stuck sessions from previous nxshell connections?

- try checking with netstat if there are hanging sockets to port 4701


What is the version of netxms?

hoeth

#2
We are running netxmsd-3.8.250 with nxshell-3.8.250 on Debian Buster with java from Debian (openjdk 11.0.9.1 2020-11-04).

There are no stuck sessions from previous connection attempts.

With DebugLevel=6 in netxmsd we can see that there seems to be a problem with the session key exchange. I've got three logfile excerpts for you:

1) Successful login, everything looking good:

2021.03.31 13:08:03.134 *D* [client.session     ] Client session with ID 1 registered
2021.03.31 13:08:03.147 *D* [client.session.1   ] Received message CMD_GET_SERVER_INFO
2021.03.31 13:08:03.147 *D* [client.session.1   ] Server time zone: UTC+00UTC
2021.03.31 13:08:03.147 *D* [client.session.1   ] Sending message CMD_REQUEST_COMPLETED (616 bytes)
2021.03.31 13:08:03.152 *D* [client.session.1   ] Received message CMD_REQUEST_ENCRYPTION
2021.03.31 13:08:03.152 *D* [client.session.1   ] Sending message CMD_REQUEST_SESSION_KEY (344 bytes)
2021.03.31 13:08:03.224 *D* [client.session.1   ] Received message CMD_SESSION_KEY
2021.03.31 13:08:03.226 *D* [client.session.1   ] Sending message CMD_REQUEST_COMPLETED (32 bytes)
2021.03.31 13:08:03.230 *D* [client.session.1   ] Received message CMD_LOGIN
2021.03.31 13:08:03.230 *D* [client.session.1   ] authentication type 0
2021.03.31 13:08:03.230 *D* [client.session.1   ] Protocol level compression is supported by client
2021.03.31 13:08:03.230 *D* [client.session.1   ] User [email protected] authenticated (language=en clientInfo="nxjclient/3.8.250 (Linux 4.19.0-14-amd64; libnxcl 3.8.250)")
2021.03.31 13:08:03.230 *D* [client.session.1   ] Sending compressed message CMD_LOGIN_RESP (304 bytes)
2021.03.31 13:08:03.235 *D* [client.session.1   ] Received message CMD_GET_OBJECT_CATEGORIES
2021.03.31 13:08:03.235 *D* [client.session.1   ] Sending message CMD_REQUEST_COMPLETED (48 bytes)
2021.03.31 13:08:03.235 *D* [client.session.1   ] Received message CMD_GET_OBJECTS
2021.03.31 13:08:03.235 *D* [client.session.1   ] Sending message CMD_REQUEST_COMPLETED (32 bytes)
2021.03.31 13:08:03.237 *D* [client.session.1   ] Sending compressed message CMD_OBJECT (720 bytes)
2021.03.31 13:08:03.238 *D* [client.session.1   ] Sending compressed message CMD_OBJECT (352 bytes)
2021.03.31 13:08:03.238 *D* [client.session.1   ] Sending compressed message CMD_OBJECT (336 bytes)
2021.03.31 13:08:03.238 *D* [client.session.1   ] Sending compressed message CMD_OBJECT (296 bytes)
[...]


2) Successful login, but with quite some delay. Note there is a 7 second delay between "Sending message CMD_REQUEST_SESSION_KEY" and "Received message CMD_SESSION_KEY":

2021.03.31 13:09:28.943 *D* [                   ] SocketListener/Clients: Incoming connection from 127.0.0.1
2021.03.31 13:09:28.943 *D* [                   ] SocketListener/Clients: Connection from 127.0.0.1 accepted
2021.03.31 13:09:28.943 *D* [client.session     ] Client session with ID 1 registered
2021.03.31 13:09:28.951 *D* [client.session.1   ] Received message CMD_GET_SERVER_INFO
2021.03.31 13:09:28.951 *D* [client.session.1   ] Server time zone: UTC+00UTC
2021.03.31 13:09:28.951 *D* [client.session.1   ] Sending message CMD_REQUEST_COMPLETED (616 bytes)
2021.03.31 13:09:28.954 *D* [client.session.1   ] Received message CMD_REQUEST_ENCRYPTION
2021.03.31 13:09:28.954 *D* [client.session.1   ] Sending message CMD_REQUEST_SESSION_KEY (344 bytes)
2021.03.31 13:09:36.078 *D* [client.session.1   ] Received message CMD_SESSION_KEY
2021.03.31 13:09:36.080 *D* [client.session.1   ] Sending message CMD_REQUEST_COMPLETED (32 bytes)
2021.03.31 13:09:36.083 *D* [client.session.1   ] Received message CMD_LOGIN
2021.03.31 13:09:36.083 *D* [client.session.1   ] authentication type 0
2021.03.31 13:09:36.083 *D* [client.session.1   ] Protocol level compression is supported by client
2021.03.31 13:09:36.083 *D* [client.session.1   ] User [email protected] authenticated (language=en clientInfo="nxjclient/3.8.250 (Linux 4.19.0-14-amd64; libnxcl 3.8.250)")
2021.03.31 13:09:36.083 *D* [client.session.1   ] Sending compressed message CMD_LOGIN_RESP (304 bytes)
2021.03.31 13:09:36.091 *D* [client.session.1   ] Received message CMD_GET_OBJECT_CATEGORIES
2021.03.31 13:09:36.091 *D* [client.session.1   ] Sending message CMD_REQUEST_COMPLETED (48 bytes)
2021.03.31 13:09:36.091 *D* [client.session.1   ] Received message CMD_GET_OBJECTS
2021.03.31 13:09:36.091 *D* [client.session.1   ] Sending message CMD_REQUEST_COMPLETED (32 bytes)
2021.03.31 13:09:36.095 *D* [client.session.1   ] Sending compressed message CMD_OBJECT (720 bytes)
2021.03.31 13:09:36.095 *D* [client.session.1   ] Sending compressed message CMD_OBJECT (352 bytes)
2021.03.31 13:09:36.095 *D* [client.session.1   ] Sending compressed message CMD_OBJECT (336 bytes)
2021.03.31 13:09:36.095 *D* [client.session.1   ] Sending compressed message CMD_OBJECT (296 bytes)
2021.03.31 13:09:36.096 *D* [client.session.1   ] Sending compressed message CMD_OBJECT (288 bytes)
[...]


3) Unsuccessful connection attempt. 30 seconds after "Sending message CMD_REQUEST_SESSION_KEY" the server is giving up, but nxshell hangs forever:

2021.03.31 13:10:48.138 *D* [                   ] SocketListener/Clients: Incoming connection from 127.0.0.1
2021.03.31 13:10:48.138 *D* [                   ] SocketListener/Clients: Connection from 127.0.0.1 accepted
2021.03.31 13:10:48.138 *D* [client.session     ] Client session with ID 1 registered
2021.03.31 13:10:48.146 *D* [client.session.1   ] Received message CMD_GET_SERVER_INFO
2021.03.31 13:10:48.146 *D* [client.session.1   ] Server time zone: UTC+00UTC
2021.03.31 13:10:48.146 *D* [client.session.1   ] Sending message CMD_REQUEST_COMPLETED (616 bytes)
2021.03.31 13:10:48.148 *D* [client.session.1   ] Received message CMD_REQUEST_ENCRYPTION
2021.03.31 13:10:48.148 *D* [client.session.1   ] Sending message CMD_REQUEST_SESSION_KEY (344 bytes)
2021.03.31 13:11:18.148 *D* [client.session.1   ] Sending message CMD_REQUEST_COMPLETED (32 bytes)
2021.03.31 13:11:18.148 *D* [client.session.1   ] readSocket: connection closed
2021.03.31 13:11:18.148 *D* [                   ] *Locks* All locks for session 1 removed
2021.03.31 13:11:18.148 *D* [                   ] Closing all logs for session 1
2021.03.31 13:11:18.148 *D* [client.session.1   ] Session closed
2021.03.31 13:11:18.149 *D* [client.session     ] Client session with ID 1 unregistered
2021.03.31 13:11:18.149 *D* [client.session.1   ] Socket closed
2021.03.31 13:11:18.149 *D* [client.session.1   ] Session object destroyed


We have a small number of sockets in TIME_WAIT in netstat (typically one socket, sometimes two), seemingly caused by unsuccessful connection attempts.

Victor Kirhenshtein

Hi!

Looks like Java side issue. Could you please try to do the following:

1. Download nxshell-3.8.250.jar and the following files:
https://repo1.maven.org/maven2/ch/qos/logback/logback-core/1.2.3/logback-core-1.2.3.jar
https://repo1.maven.org/maven2/ch/qos/logback/logback-classic/1.2.3/logback-classic-1.2.3.jar
into same directory.
2. Create file logback.xml in the same directory with the following content:

<configuration>
   <appender name="Console" class="ch.qos.logback.core.ConsoleAppender">
      <layout class="ch.qos.logback.classic.PatternLayout">
         <Pattern>%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n</Pattern>
      </layout>
   </appender>
   <root level="debug">
      <appender-ref ref="Console"/>
   </root>
</configuration>

3. Run nxshell as following:

java -Dnetxms.server=127.0.0.1 -Dnetxms.login=admin -Dnetxms.password=your_password -cp logback-classic-1.2.3.jar:logback-core-1.2.3.jar:.:nxshell-3.8.250.jar org.netxms.Shell

until it hangs and send me console output.

Best regards,
Victor

fldiet

Hi!

I'm a colleague of hoeth, who can't reply personally at the moment.

The setup you described led to the following upon execution:

$ java -Dnetxms.server=127.0.0.1 -Dnetxms.login=admin -Dnetxms.password=our_password -cp logback-classic-1.2.3.jar:logback-core-1.2.3.jar:.:nxshell-3.8.250.jar org.netxms.Shell
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.python.core.io.StreamIO (file:/home/ats/debug/nxshell-3.8.250.jar) to field java.io.FilterInputStream.in
WARNING: Please consider reporting this to the maintainers of org.python.core.io.StreamIO
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ats/debug/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ats/debug/nxshell-3.8.250.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
08:48:44,747 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
08:48:44,747 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
08:48:44,747 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [file:/home/ats/debug/logback.xml]
08:48:44,791 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
08:48:44,792 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
08:48:44,795 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [Console]
08:48:44,811 |-WARN in ch.qos.logback.core.ConsoleAppender[Console] - This appender no longer admits a layout as a sub-component, set an encoder instead.
08:48:44,811 |-WARN in ch.qos.logback.core.ConsoleAppender[Console] - To ensure compatibility, wrapping your layout in LayoutWrappingEncoder.
08:48:44,811 |-WARN in ch.qos.logback.core.ConsoleAppender[Console] - See also http://logback.qos.ch/codes.html#layoutInsteadOfEncoder for details
08:48:44,812 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to DEBUG
08:48:44,812 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [Console] to Logger[ROOT]
08:48:44,812 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
08:48:44,812 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@78f9ed3e - Registering current configuration as safe fallback point

SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
08:48:44.818 [main] INFO  org.netxms.client.NXCSession - Connecting to 127.0.0.1:4701
08:48:44.829 [main] DEBUG org.netxms.client.NXCSession - Connection established, retrieving server info
08:49:30.835 [Network Message Receiver] DEBUG org.netxms.client.NXCSession - Cipher selected: AES-256
08:49:30.837 [Network Message Receiver] DEBUG org.netxms.client.NXCSession - Server key fingerprint: 31:5f:5f:55:0e:71:03:58:17:1f:67:36:23:03:40:0f


Meanwhile, listening to /var/log/netxmsd:

2021.04.01 08:48:44.824 *D* [client.session     ] Client session with ID 3 registered
2021.04.01 08:48:44.831 *D* [client.session.3   ] Received message CMD_GET_SERVER_INFO
2021.04.01 08:48:44.831 *D* [client.session.3   ] Server time zone: UTC+00UTC
2021.04.01 08:48:44.831 *D* [client.session.3   ] Sending message CMD_REQUEST_COMPLETED (616 bytes)
2021.04.01 08:48:44.833 *D* [client.session.3   ] Received message CMD_REQUEST_ENCRYPTION
2021.04.01 08:48:44.833 *D* [client.session.3   ] Sending message CMD_REQUEST_SESSION_KEY (344 bytes)
2021.04.01 08:49:14.833 *D* [client.session.3   ] Sending message CMD_REQUEST_COMPLETED (32 bytes)
2021.04.01 08:49:14.834 *D* [client.session.3   ] readSocket: connection closed
2021.04.01 08:49:14.834 *D* [                   ] *Locks* All locks for session 3 removed
2021.04.01 08:49:14.834 *D* [                   ] Closing all logs for session 3
2021.04.01 08:49:14.834 *D* [client.session.3   ] Session closed
2021.04.01 08:49:14.834 *D* [client.session     ] Client session with ID 3 unregistered
2021.04.01 08:49:14.834 *D* [client.session.3   ] Socket closed
2021.04.01 08:49:14.834 *D* [client.session.3   ] Session object destroyed


Note:
Nxshell's last output before being abandoned by server is at 08:48:44.829, yet significantly after the session object being destroyed ( 08:49:14.834 ), it comes back with a selected cipher and key-fingerprint ( 08:49:30.837 ).


Thank you very much,

  Florian

Victor Kirhenshtein

Hi,

looks like call to Cipher.getInstance("AES/CBC/PKCS5Padding") sometimes is very slow. To test it, please create the following Java file:


import javax.crypto.Cipher;

class CipherTest
{
   public static final void main(String[] args)
   {
      long start = System.currentTimeMillis();
      try
      {
         Cipher.getInstance("AES/CBC/PKCS5Padding");
      }
      catch(Exception e)
      {
         e.printStackTrace();
      }
      long elapsed = System.currentTimeMillis() - start;
      System.out.println("Elapsed time " + elapsed + " milliseconds");
   }
}


call it CipherTest.java, then compile as

javac CipherTest.java


and run as

java CipherTest


It should print elapsed time for cipher initialization. On my machine it is around 60 milliseconds. If it will take many seconds during some run then this is problem with local JVM (likely with entropy collection, maybe with /dev/random or /dev/urandom).

Best regards,
Victor

fldiet

I ran the test with a sample-size of 300 and spaced the measurements 0.1 seconds apart from each other.
The result was a mean / std-dev of 63.59 / 10.88 in ms.
Only one outlier at 101ms.

I started running a longer measurement ( same frequency ) while typing these lines - this time 60.27 / 6.36, no outliers.

So JVM looks fine on that end.


Best regards,

  Florian

Mishal

Hi,

I'm trying to use nxshell for a similiar scenario as @Hoeth.
Am facing the same issue of nxshell hanging at "INFO org.netxms.client.NXCSession - Connecting to localhost:4701".

I've tried this on 2 versions of nxshell and netxms server and it occurs on both.

NetXMS Server versions - 3.6.3 & 3.8.5
nxshell versions - 3.6.3 & 3.8.5

Regards,
Mishal

Mishal

Hi,

After further troubleshooting.
This issue occurs only when running nxshell on a linux OS.
I've tried running it on Windows 10 and Windows Server 2019 and it doesn't hang.
Java version "8" is used on Windows.

hoeth

#9
With java-8 on Linux (openjdk-8-jre package in Ubuntu 18.04 LTS) I see the same problem as with java-11: The session key sometimes arrives delayed.

I haven't tried the CipherTest with java-8.