Metrics from logwatch events

Started by Dawid Kellerman, April 12, 2022, 09:46:33 PM


Dawid Kellerman

Hi
I am hoping someone could point me in the right direction. I am still new and would like some insight on the best approach to what I want: graphs/dials on the performance tab, but driven by events from log files monitored with logwatch, e.g. SSH login failures for the last day, or from cfengine logs how many promises were kept, how many repaired and how many failed.

What would be the best?
Should I use the agent with external commands / custom parsing scripts for the SSH login failures (or whatever log)?
Or is there a way to get metric/DCI values from events produced by event processing, with the captured parameter values?

It feels like there should be a way to turn events from logs into values that can be used like DCIs, with last values and a timeline.

Thank You
Dawid

Filipp Sudanov

A log parser policy is probably the nicer option, as there is no need to maintain any scripts on the agents. With it there are two approaches:

1) You can use the repeat interval functionality of the log parser. E.g. you can set Repeat interval to 24 hours. Repeat count should be set to 1 (if it is 0, this functionality is disabled), and Reset repeat count should be unchecked.
So now we have a 24-hour rolling window, and on each new match the generated event will carry the number of matches that occurred within this window.
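For reference, the corresponding rule in the log parser policy XML would look roughly like the sketch below. The pattern and event code are placeholders, and the attribute names are quoted from memory, so treat them as assumptions and check the NetXMS log parser documentation (the same settings are also exposed in the policy editor GUI):

<parser>
  <rules>
    <rule>
      <!-- placeholder pattern; count matches within a 24-hour (86400 s) rolling window -->
      <match repeatCount="1" repeatInterval="86400">Failed password</match>
      <event>100000</event>
    </rule>
  </rules>
</parser>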

Now the question is how to save the value from the event processing policy rule. We can create a DCI with origin "Push". Values can be sent to this DCI from an NXSL script, and also with the command line utilities nxpush and nxapush (the latter comes with the agent).
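For example, assuming the push DCI is named "log_stats", a value could be pushed from the monitored node itself with nxapush, roughly like this (exact options may differ between versions, see nxapush -h):

nxapush log_stats=5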

So imagine that on the node where log monitoring is set up we have a push DCI with the parameter "log_stats". In the EPP rule that reacts to our log monitoring event we can use the following filter script:

// push the event's first parameter into the "log_stats" DCI
dci_id = FindDCIByName($node, "log_stats");
if (dci_id > 0) PushDCIData($node, dci_id, $1);


It's called a "filter script" because the original idea is that the EPP rule is processed or not depending on what the script returns. But we can also use this script to perform whatever operations we need. Here we simply send the value of the event's first parameter into the push DCI, so we now accumulate historic data in it.

There is a flaw in this approach: the value is only updated when the log file has new matches. If there are no new matches, the DCI will stay stuck at the last value. We can mitigate this by scheduling an action that sends 0 into the push DCI after 24 hours, but even that won't be exactly correct: e.g. if we had one match 12 hours ago and another just now, then after 12 hours the value should drop from 2 to 1, not to 0.
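One minimal sketch of such a reset: a scheduled task of type "Execute NXSL script", bound to the node (so that $node is set), that pushes a zero into the same DCI:

// reset the "log_stats" push DCI to 0 when no matches arrived
dci_id = FindDCIByName($node, "log_stats");
if (dci_id > 0) PushDCIData($node, dci_id, 0);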

2) The other approach is to do on the server the same job the agent does when counting matches within a specified time window. For this we need to store a list of Unix timestamps as new events come in (again from a script in the EPP rule). Then we periodically (a script DCI will do) count the timestamps that fall within the 24-hour window and delete the older ones from the list. The list could be stored in a mapping table (a global place to store things) or simply in a custom attribute on the node, as a comma-separated string of Unix timestamps.

I can give more detailed instructions if you decide to use this approach.
Feel free to ask if you need more detailed explanations on the above.


Dawid Kellerman

Hi Filipp,

Thank you for the detailed explanation. My initial thought, without knowing the exact way to do it, was along the lines of your #2: saving data in some table and working from there.
But I have less experienced colleagues, and for them and for ease of implementation/maintenance #1 is the more elegant one. We will try it first, and if it does not work for us I will come back to #2.

Regards Dawid

Dawid Kellerman

Hi Filipp,

When you have a moment, would you please add the #2 part example?

I could not find much in the docs about mapping tables / persistent storage or how to use them.

Regards Dawid

Filipp Sudanov

Hi!

Actually, there is no write access to mapping tables from NXSL. We can use either persistent storage or custom attributes. Persistent storage values are limited to 2000 characters, while custom attributes are unlimited, so let's use custom attributes.
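For completeness, persistent storage is accessed from NXSL via ReadPersistentStorage() and WritePersistentStorage(); a minimal sketch, with an example key name:

// store and read back a server-wide value (limited to 2000 chars)
WritePersistentStorage("log_stats.timestamps", "1649793993");
s = ReadPersistentStorage("log_stats.timestamps");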

There are two scripts. The first one should be set as the EPP filter script, so it will be executed when the event comes:

// load the stored list of timestamps from the node's custom attribute
s = GetCustomAttribute($node, "EventTimestamps");
if (s != NULL and s != "")
{
  a = s->split(",");
}
else
{
  a = %();  // no attribute yet - start with an empty array
}
// cap the list at 2000 entries so the string cannot grow endlessly
if (a->size < 2000) a->append(time());
SetCustomAttribute($node, "EventTimestamps", ArrayToString(a, ","));
return true;  // let the EPP rule proceed

Every time an event comes, this script appends the current Unix time to the custom attribute on the node. There is a limit of 2000 records so that the string does not grow endlessly.

The second script should be executed periodically. We can use a script DCI with the needed interval, e.g. 1 hour or 24 hours. You can use a plain interval, or set a cron schedule (e.g. "0 0 * * *" for midnight) so that the DCI gets collected at an exact moment:

period = 3600 * 24;  // time window in seconds (24 hours)

s = GetCustomAttribute($node, "EventTimestamps");
a2 = %();  // timestamps that are still within the window
now = time();

if (s != NULL and s != "")
{
  a = s->split(",");
  for (i = a->minIndex; i <= a->maxIndex; i++)
  {
    // keep only timestamps newer than (now - period)
    if (a[i] > now - period) a2->append(a[i]);
  }
}

// store the pruned list back and report the number of matches in the window
SetCustomAttribute($node, "EventTimestamps", ArrayToString(a2, ","));
return a2->size;


The script goes through all recorded timestamps, throws away those older than the specified period, saves the updated custom attribute, and returns the number of remaining records.
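To wire this up on the performance tab: save the second script in the Script Library under some name (say "CountLogEvents" - the name is just an example), then create a DCI on the node with origin "Script" and that script name in the parameter field. The returned count becomes the DCI value and can be charted like any other metric.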