Slave playback
In the event of a slave failure, the check_opsview_slaves plugin should alert that the slave is not contactable - this highlights the problem.
After 30 minutes, the services that belong on the slave will start to be set into an unknown state - this best represents what is actually known about the service.
Although the slave may continue to check its hosts, the results to be sent up to the master are dropped if the connection is not available.
However, playback of old results may not be possible because:
- Nagios has limits based on the time stamp of the passive check result and will reject old data
- How do you merge results from a playback versus a stale result?
- There are probably many places in the Nagios code where it uses the current time, rather than the time of the result. For instance, would notifications use the time the result was received or the time of the result?
Also, send_nsca doesn't include the time of a result in the data sent back to the master - the nsca daemon uses the current time as the time of the passive result. This suggests that Nagios does not currently account for even a 5 second difference between the time of a result and the time it is processed, never mind a potential 1 hour difference.
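To make the timestamp issue concrete, here is a minimal sketch of how a passive result line for the Nagios external command file can be built. The PROCESS_SERVICE_CHECK_RESULT command format is standard Nagios; the helper name format_passive_result and its received_at parameter are hypothetical and only for illustration. Note that the only timestamp on the line is the one chosen by whoever writes it, which by default here is the time of receipt and not the time the check ran on the slave.

```python
import time


def format_passive_result(host, service, return_code, output, received_at=None):
    """Build a Nagios external command line for a passive service result.

    Hypothetical helper for illustration. The bracketed timestamp is the
    time the command is submitted; if the caller does not supply one, the
    current time is used, so the original check time is lost.
    """
    if received_at is None:
        received_at = int(time.time())
    return (
        f"[{received_at}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}"
    )


if __name__ == "__main__":
    # The check may have run an hour ago on the slave, but the line
    # written on the master carries the time of receipt.
    print(format_passive_result("web1", "HTTP", 0, "HTTP OK"))
```

Any playback scheme would therefore need an extra field, or a different transport, to carry the original check time.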
One possibility is to run played-back results through only a subset of Nagios' result processing: for example, save the check result in the database and update the performance RRDs, but skip the notification and state history steps.
There are two limits that need to be considered:
- A lower limit (say 1 minute), below which the data will be uploaded and processed normally (caters for temporary network blips)
- An upper limit (say 1 day), beyond which data on the slave is discarded
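The two limits split results into three bands by age. The following is a minimal sketch, assuming the example thresholds of 1 minute and 1 day; the names classify_result and Action are hypothetical, and the handling of the middle band is deliberately left undecided.

```python
from enum import Enum

# Assumed example thresholds taken from the "say 1 minute" / "say 1 day" figures.
LOWER_LIMIT_SECONDS = 60
UPPER_LIMIT_SECONDS = 24 * 60 * 60


class Action(Enum):
    PROCESS_NORMALLY = "process_normally"  # recent enough: upload and process as usual
    UNDECIDED = "undecided"                # between the limits: handling still open
    DISCARD = "discard"                    # too old: drop on the slave


def classify_result(result_time, now,
                    lower=LOWER_LIMIT_SECONDS, upper=UPPER_LIMIT_SECONDS):
    """Return the action for a check result based on its age in seconds.

    result_time and now are Unix timestamps. A negative age (slave clock
    ahead of the master) is treated as recent.
    """
    age = now - result_time
    if age <= lower:
        return Action.PROCESS_NORMALLY
    if age >= upper:
        return Action.DISCARD
    return Action.UNDECIDED


if __name__ == "__main__":
    now = 1_700_000_000
    for age in (30, 3600, 2 * 24 * 60 * 60):
        print(age, classify_result(now - age, now).value)
```

Whether results exactly at a limit fall into the lower or upper band is an arbitrary choice in this sketch.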
What happens with data in the middle? Can this be stored in Runtime/ODW without Nagios processing? How does this affect ODW importing? Will the hourly import mean that data already in ODW needs to be updated?
Should freshness checking really update the state? Maybe this should just be a flag for the host/service to denote visual differences in UI.
Need some performance information about how old data typically is by the time it reaches Nagios.
