
How Are Test Events De-Duplicated On Event Manager



rajib
July 21 2009, 06:49 PM
The latest/current test status is always shown as a threshold violation event (http://community.zyrion.com/showthread.php?t=66) on the Event Manager, assuming the filter is configured appropriately. As new events are recorded for the same device/test, they are subject to the advanced de-duplication logic built into Traverse. For threshold violation events, the de-duplication window is 15 x the polling interval. Any subsequent similar event observed within that period is collapsed into the same row on the Event Manager, with the event count increased by one. Additionally, the de-duplication timer resets to start from the time of the most recent event.
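
To make the window rule concrete, here is a minimal Python sketch of the check described above. It is not Traverse code: the function names are made up for illustration, and treating an event that lands exactly on the window boundary as a duplicate is an assumption, since the post does not say which way that case goes.

```python
from datetime import datetime, timedelta

# Multiplier from the post (the duplicateEventCycle setting, 15 by default).
DUPLICATE_EVENT_CYCLE = 15


def dedup_window(polling_interval: timedelta,
                 cycles: int = DUPLICATE_EVENT_CYCLE) -> timedelta:
    """De-duplication window = cycles x polling interval."""
    return polling_interval * cycles


def is_duplicate(last_event: datetime, new_event: datetime,
                 polling_interval: timedelta) -> bool:
    """True if new_event falls inside the window measured from the most
    recent event (hypothetical helper; inclusive boundary is an assumption)."""
    return new_event - last_event <= dedup_window(polling_interval)


if __name__ == "__main__":
    poll = timedelta(minutes=3)
    print(dedup_window(poll))
    print(is_duplicate(datetime(2009, 7, 21, 10, 0),
                       datetime(2009, 7, 21, 10, 35), poll))
```

Because the timer resets on every collapsed event, the comparison is always against the most recent event in the row, not the first one.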

The Event Manager is also designed to provide a "rolling" history of events that have transpired. When a new threshold violation event is accepted for the same device/test, it becomes the current event and the previous event(s) are set to auto-expire after 30 minutes.



As an example, assume the following sequence of events for a Packet Loss test currently in OK state since 10:00am and configured to run every 3 minutes:

10:35am - OK -> WARNING
10:40am - WARNING -> CRITICAL
11:30am - CRITICAL -> OK

Since the de-duplication window is 3 x 15 = 45 minutes, the event at 10:35am is a candidate for de-duplication, and the Event Manager will show the test in WARNING state with event count = 2. The de-duplication window resets and will now expire at 11:20am. At the same time, the earlier OK event is set to expire automatically at 11:05am. The event at 10:40am will also be de-duplicated, with test status now CRITICAL, event count = 3 and the de-duplication window extended until 11:25am.

At 11:05am, the initial OK event will expire (cleared from view) and the event count will decrease to 2. At 11:10am, the WARNING event will expire, leaving test status still at CRITICAL but event count = 1.

Since the event at 11:30am is outside of the de-duplication window (which expired at 11:25am), it will appear on the Event Console on a new row with test status OK and event count = 1. The earlier CRITICAL event from 10:40am is no longer the current event as of 11:30am and will automatically expire at 12:00pm.
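
The timeline above can be replayed with a short Python simulation. This is only a sketch of the behaviour as described in this post, not how Traverse implements it: `simulate`, `event_count` and the row/entry structure are hypothetical, and the boundary handling (an event exactly 45 minutes later still collapses; an event is gone at its exact expiry time) is assumed.

```python
from datetime import datetime, timedelta

POLL = timedelta(minutes=3)
WINDOW = POLL * 15                      # 3 x 15 = 45 minutes
EXPIRATION = timedelta(seconds=1800)    # 30 minutes


def at(hhmm: str) -> datetime:
    """Parse a 24-hour HH:MM time (the date part is irrelevant here)."""
    return datetime.strptime(hhmm, "%H:%M")


def simulate(events):
    """Group (time, state) events into Event Manager rows.

    Each row is a list of entries; the last entry is the current event.
    When a newer event arrives, the previous current event is given an
    expiry time of arrival + EXPIRATION.
    """
    rows = []
    for when, state in events:
        entry = {"time": when, "state": state, "expires": None}
        if rows:
            previous = rows[-1][-1]
            previous["expires"] = when + EXPIRATION
            if when - previous["time"] <= WINDOW:
                rows[-1].append(entry)      # collapsed into the same row
                continue
        rows.append([entry])                # outside the window: new row
    return rows


def event_count(row, now):
    """Entries in a row that have occurred and not yet expired at `now`."""
    return sum(1 for e in row
               if e["time"] <= now
               and (e["expires"] is None or e["expires"] > now))


EVENTS = [(at("10:00"), "OK"), (at("10:35"), "WARNING"),
          (at("10:40"), "CRITICAL"), (at("11:30"), "OK")]

rows = simulate(EVENTS)
for t in ("10:36", "10:41", "11:05", "11:10", "11:30"):
    print(t, [event_count(row, at(t)) for row in rows])
```

For the first row, the counts at 10:36, 10:41, 11:05 and 11:10 come out as 2, 3, 2 and 1, matching the walkthrough, and the 11:30am event starts a second row.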

If you would prefer to see only the current threshold violation events without the history (as described above), you will need to edit etc/dge.xml and locate the following section:



<message-handler>
<duplicateEventCycle>15</duplicateEventCycle>
<eventExpiration>1800</eventExpiration>
</message-handler>

Change the eventExpiration value to 0, which means previous events expire immediately. The DGE component will need to be restarted (etc/monitor.init restart) for the setting to take effect. Also note that the change will need to be re-applied to this file if you install a newer version/build of Traverse in the future.
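
If it helps to sanity-check what the two values mean before restarting, here is a small Python sketch that parses the fragment above and reports the resulting timings. It assumes eventExpiration is in seconds (1800 matches the 30 minutes mentioned earlier) and works on the fragment as a standalone string; in the real etc/dge.xml the section is presumably nested inside other elements, and `effective_settings` is a made-up helper, not part of Traverse.

```python
import xml.etree.ElementTree as ET

# The section as quoted in this post, parsed standalone for illustration.
FRAGMENT = """
<message-handler>
<duplicateEventCycle>15</duplicateEventCycle>
<eventExpiration>1800</eventExpiration>
</message-handler>
"""


def effective_settings(xml_text: str, polling_minutes: float) -> dict:
    """Derive the timings implied by a <message-handler> fragment."""
    root = ET.fromstring(xml_text.strip())
    cycle = int(root.findtext("duplicateEventCycle"))
    expiration_s = int(root.findtext("eventExpiration"))  # assumed seconds
    return {
        "dedup_window_minutes": cycle * polling_minutes,
        "expiration_minutes": expiration_s / 60,
        # 0 means previous events expire immediately, so no history is shown
        "keeps_history": expiration_s > 0,
    }


if __name__ == "__main__":
    print(effective_settings(FRAGMENT, polling_minutes=3))
    print(effective_settings(FRAGMENT.replace("1800", "0"), polling_minutes=3))
```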

samrog
July 27 2009, 02:07 PM
This post indicates that duplicate messages will not be displayed (or will at least be condensed into the existing row) if they occur within 15 x the polling interval. Does this not apply to syslog messages? I am seeing many duplicates. Since there is no 'polling interval' for syslog messages, I am hoping there is a way to apply this to these messages. This would greatly reduce the amount of data we have to view.

rajib
July 27 2009, 02:24 PM
Sam, the de-duplication capability applies to Syslog and other Message Events in a similar manner to Threshold Violation Events, as outlined above. As described in http://community.zyrion.com/showthread.php?p=133, it is possible to define targeted Event Processing Rules to de-duplicate specific types of events. If you could post (in the above thread) an example of the events you would like to see de-duplicated, we'd be happy to help you create a suitable rule.