Opsview Wish List
This is a wish list for future development of Opsview. The page is updated regularly.
The immediate roadmap is available at http://trac.opsview.org/roadmap. Items listed under these milestones will be scheduled as either bugs, support work, or sponsored development. This page lists items that have been requested but not yet scheduled.
As Opsera is an open source company, we look for sponsors for development work. If you have a feature you are interested in adding to Opsview, please contact us at opsview@opsera.com.
Some other items for wishlist consideration are listed in the Wishlist category of tickets.
Wish list
- Custom notification methods via the web UI - see Nagios3Notifications
- Include ability to change method and output
- Allow host checks to be disabled either via the host config page or via host template page
API enhancements
- Extend to other data types
- Extend to edit in addition to creation/cloning/deletion
- Extend inputs to include perl hashrefs
- Convert backend to DBIx::Class
Enhancements to performance graphing
The aim is a more flexible graphing tool, with the ability to drill in and select the datapoints you care about. The resulting URLs are bookmarkable: they can be saved with the servicecheck and then exposed as a menu option for that service.
- Daemonise the RRD insert routines, using a temporary file store similar to ndologs
- RRDs stored in subdirectories, by host then service name
- Create a separate RRD for each metric returned. Store datapoints for warning and critical levels in a separate RRD, {metric}_thresholds (where appropriate, i.e. single values, not ranges)
- Update RRDs based on datapoint name, not on position
- Create a mapping file where, based on a servicecheck name, you can set GAUGE or COUNTER for metrics coming in. Example contents:
- DNS failures:*:COUNTER
- DNS failures:metric:GAUGE
- Update servicechecks page to expose these options. Create this mapping file at reload time.
- The initial graph shows daily, weekly, monthly and yearly data for all available datapoints, with options to choose:
- datapoints
- timerange (start and end - see http://stephencelis.com/projects/timeframe) or duration (now-duration till now)
- Drag and drop to select timerange (ala cacti)
- All graphing through Catalyst, rather than an external CGI script: use RRD::Simple? Use Catalyst::View::RRDGraph?
- Use same nagiosgraph style map file, but remove as much as possible
- Migration tools to move existing RRD data into new style formats
- Allow saving graphing URLs on servicechecks page. These are exposed via menu button in status pages
- Allow multiple host selections, so graphs can be compared across hosts
- ddraw integration, to create custom graphs? Or Cairo, to fetch the data and render it separately
- Tighten permissions on who can see graphs
Alternatively, integrate NagiosGrapher or PNP?
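The proposed GAUGE/COUNTER mapping file could be read roughly as follows. This is a minimal sketch with several assumptions: one rule per line in `servicecheck:metric:TYPE` form, `*` as a wildcard for every metric of a servicecheck, an exact metric rule taking precedence over the wildcard, and GAUGE as the fallback. The function names are hypothetical.

```python
"""Hypothetical parser for the proposed servicecheck -> RRD data source type map."""

VALID_TYPES = {"GAUGE", "COUNTER"}


def parse_ds_map(text):
    """Return {(servicecheck, metric): type} from 'servicecheck:metric:TYPE' lines."""
    rules = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # rsplit so the last two fields are always the metric and the type
        parts = line.rsplit(":", 2)
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected servicecheck:metric:TYPE")
        servicecheck, metric, ds_type = (p.strip() for p in parts)
        ds_type = ds_type.upper()
        if ds_type not in VALID_TYPES:
            raise ValueError(f"line {lineno}: unknown type {ds_type!r}")
        rules[(servicecheck, metric)] = ds_type
    return rules


def ds_type_for(rules, servicecheck, metric, default="GAUGE"):
    """An exact metric rule wins over the servicecheck's '*' rule; else the default."""
    exact = rules.get((servicecheck, metric))
    if exact is not None:
        return exact
    return rules.get((servicecheck, "*"), default)


rules = parse_ds_map("DNS failures:*:COUNTER\nDNS failures:metric:GAUGE\n")
print(ds_type_for(rules, "DNS failures", "metric"))   # GAUGE (exact rule)
print(ds_type_for(rules, "DNS failures", "queries"))  # COUNTER (wildcard rule)
print(ds_type_for(rules, "HTTP", "time"))             # GAUGE (default)
```

Because the file is generated at reload time, a malformed line raises an error at that point instead of silently creating an RRD of the wrong type.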
Acknowledgements
- Update the Opsview API to allow acknowledgements
- Ensure ODW is updated with acknowledgement information
Mass recheck
Similar to mass acknowledgements, provide a mass recheck:
- Send requests to the appropriate slaves
- The next page reorders as results come in
Nagios 3
Nagios 3 would bring extra plugin output, custom macros, performance enhancements, better scalability and no loss of results from slaves. Store custom macros in Opsview and allow multiple servicechecks based on macros.
Status Map replacement
The current status map has limitations in the way it arranges items. Would like another mechanism to create the status map that is topologically correct, but still zoomable and resizable.
Events view
A view of all recent events, with the most recent at the top. It can be filtered by hostgroup, host, service or keyword.
Incident queue
Nagios holds only one state per service, which does not suit passive results. Imagine a security log analyser that sends passive results to Nagios: the last result overwrites all the others, so if the results arrive in the order CRIT, CRIT, WARN, OK, the service is displayed in an OK state.
You can use the "alert every failure", but this is not very intelligent.
The proposal is an "incident queue": each Nagios service has a list of incidents, keyed on the first word of the result output, each with its own state and each needing to be acknowledged.
The service's Nagios state is the worst state among its incidents.
This can be applied to security logging, snmptrap results and interface states.
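The incident queue idea can be sketched in a few lines. This is a hypothetical illustration, not Opsview code. It assumes that the key is the first whitespace-separated word of the output, that a newer result with the same key replaces the older incident (so an OK clears it), and that the severity order is OK < WARNING < CRITICAL (UNKNOWN is left out for brevity).

```python
"""Hypothetical sketch of a per-service incident queue for passive results."""
from dataclasses import dataclass, field

# Assumed severity order; UNKNOWN is omitted to keep the sketch small.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


@dataclass
class Incident:
    state: str
    output: str
    acknowledged: bool = False


@dataclass
class IncidentQueue:
    incidents: dict = field(default_factory=dict)  # key -> Incident

    def add_result(self, state, output):
        """Record a passive result, keyed on the first word of its output."""
        if state not in SEVERITY:
            raise ValueError(f"unsupported state {state!r}")
        key = output.split(maxsplit=1)[0] if output.strip() else ""
        self.incidents[key] = Incident(state, output)  # newer result for a key wins

    def acknowledge(self, key):
        self.incidents[key].acknowledged = True

    def service_state(self):
        """The service's state is the worst state across all incidents."""
        if not self.incidents:
            return "OK"
        return max((i.state for i in self.incidents.values()), key=SEVERITY.__getitem__)


q = IncidentQueue()
q.add_result("CRITICAL", "sshd: repeated login failures")
q.add_result("CRITICAL", "httpd: suspicious request")
q.add_result("WARNING", "cron: unexpected job")
q.add_result("OK", "sshd: failures cleared")
print(q.service_state())  # CRITICAL, although the last result received was OK
```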
Enterprise dashboard
Collect various status information together in a single view.
Maybe use Graphics::Primitive to draw this dashboard page. Output is PNG, though it can easily be changed to PDF. This would require hand-coding the page.
http://www.catalystframework.org/calendar/2008/9
SNMP trap redesign
Make SNMP trap processing easier to understand:
- Have a single queue, rather than multiple queues based on servicechecks
- Instead of substituting macros in place, expose trap values as variables, so that Perl scalars "just work" in rules (as in the logdaemon code)
- Animate the process of a trap moving through the queue, showing what actions are taken
- Update SNMP::Trapinfo to ignore double quotes at the beginning and end of a value
- Research embedded Perl for snmptrapd: would it be better or quicker?
- Daemonise the snmptrap rule engine
- Use a file for each snmptrap
- Save every trap received. This must be on the local Opsview server (no MySQL on slaves). Housekeep regularly (rotate SQLite files?). No need for exceptions: if particular traps are wanted, the history can always be searched (the same design principle as SEC)
- In the unique filter view, list number of similar traps, with a link to show them all
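Saving every trap and housekeeping the history could look like the sketch below. It uses SQLite as the list suggests, but deletes rows by age instead of rotating database files; that choice, the table layout and all function names are assumptions for illustration only.

```python
"""Hypothetical sketch: keep every received trap in SQLite and housekeep by age."""
import sqlite3
import time


def open_store(path):
    """Open (or create) the trap history database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS traps ("
        "received REAL NOT NULL, host TEXT NOT NULL, "
        "trapname TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS traps_received ON traps (received)")
    return conn


def save_trap(conn, host, trapname, payload, received=None):
    """Store every trap as it arrives; no exception list decides what to keep."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO traps (received, host, trapname, payload) VALUES (?, ?, ?, ?)",
            (time.time() if received is None else received, host, trapname, payload),
        )


def housekeep(conn, max_age_seconds, now=None):
    """Delete traps older than max_age_seconds and return how many were removed."""
    cutoff = (time.time() if now is None else now) - max_age_seconds
    with conn:
        cur = conn.execute("DELETE FROM traps WHERE received < ?", (cutoff,))
    return cur.rowcount


def similar_counts(conn):
    """Count stored traps per (host, trapname), e.g. for the unique filter view."""
    return conn.execute(
        "SELECT host, trapname, COUNT(*) FROM traps "
        "GROUP BY host, trapname ORDER BY host, trapname"
    ).fetchall()
```

The `similar_counts` query gives the "number of similar traps" figure for the unique filter view. A "show them all" link would then select rows with the same host and trapname.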
SNMP traps with looser restrictions re: fully translated
Traps from some hardware manufacturers' MIBs do not appear to arrive with the trapname "fully translated". The Opsview SNMP trap engine should have an exception table, so that if the fully_translated test fails, a lookup can be performed. Matching entries are then considered okay.
Have two types of lookup:
- By MIB name
- By full trap name
Columns of exceptions table:
- Type (1 = MIB name, 2 = full trap name)
- Value (as appropriate)
This should probably be a tab-separated file.
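Here is a minimal sketch of loading the two-column exception table and running the lookup after the fully_translated test fails. It assumes that trap names take the Net-SNMP `MIB-NAME::trapName` form, so the MIB name is the part before `::`. The function names are hypothetical.

```python
"""Hypothetical sketch of the fully-translated exception table (tab-separated)."""
import csv
import io

MIB_NAME = 1        # type 1: accept any trap from this MIB
FULL_TRAP_NAME = 2  # type 2: accept this exact trap name


def load_exceptions(tsv_text):
    """Parse 'type<TAB>value' rows into (mib_names, trap_names) sets."""
    mibs, traps = set(), set()
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # blank line or comment
        if len(row) < 2:
            raise ValueError(f"expected type and value, got {row!r}")
        kind, value = int(row[0]), row[1].strip()
        if kind == MIB_NAME:
            mibs.add(value)
        elif kind == FULL_TRAP_NAME:
            traps.add(value)
        else:
            raise ValueError(f"unknown exception type {kind}")
    return mibs, traps


def accept_untranslated(trapname, exceptions):
    """Consulted only after the fully_translated test has failed."""
    mibs, traps = exceptions
    if trapname in traps:
        return True
    mib, sep, _ = trapname.partition("::")  # assumes 'MIB-NAME::trapName' form
    return bool(sep) and mib in mibs
```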
Timezone support
Better timezone support, so that contacts in different timezones see Nagios status displays in their own timezone.
- Opsview Master and slaves would be in UTC
- Contacts would have a timezone set for them
- Status pages and notifications would have time adjusted based on timezone for the contact
- Timeperiods would need adjusting if checks occurred in differing timezones
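The core of the per-contact display is converting UTC times into each contact's zone. The sketch below uses Python's `zoneinfo` (3.9+, backed by the system tz database) as an illustration. The function names are hypothetical, and the per-contact timezone setting is assumed to hold an IANA zone name.

```python
"""Hypothetical sketch: keep times in UTC, render them in each contact's timezone."""
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; uses the system timezone database


def to_contact_time(utc_epoch, contact_tz):
    """Convert an epoch timestamp (UTC-based by definition) to the contact's local time."""
    return datetime.fromtimestamp(utc_epoch, tz=timezone.utc).astimezone(ZoneInfo(contact_tz))


def format_for_contact(utc_epoch, contact_tz):
    """Render a status-page or notification time in the contact's timezone."""
    return to_contact_time(utc_epoch, contact_tz).strftime("%Y-%m-%d %H:%M %Z")


event = datetime(2009, 1, 15, 12, 0, tzinfo=timezone.utc).timestamp()
print(format_for_contact(event, "Europe/London"))     # 2009-01-15 12:00 GMT
print(format_for_contact(event, "America/New_York"))  # 2009-01-15 07:00 EST
```

Using named zones instead of fixed offsets means daylight saving changes are handled for each contact automatically.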
Template Toolkit for Nagios config generation
Generate Nagios configuration files using Template Toolkit (TT) templates instead of lots of print statements.
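To illustrate the idea of separating layout from data, here is a template-driven rendering of Nagios host objects. Opsview itself would use Perl's Template Toolkit. Python's `string.Template` is only a stand-in here, and the template text and host fields are assumptions.

```python
"""Illustration only: template-driven Nagios object generation.

Opsview would use Perl's Template Toolkit; Python's string.Template stands in here.
"""
from string import Template

HOST_TEMPLATE = Template(
    "define host {\n"
    "    host_name    $name\n"
    "    address      $address\n"
    "    use          $use\n"
    "}\n"
)


def render_hosts(hosts):
    """Render one 'define host' block per host dict; substitute() raises KeyError on gaps."""
    return "\n".join(HOST_TEMPLATE.substitute(h) for h in hosts)


print(render_hosts([
    {"name": "web1", "address": "10.0.0.1", "use": "generic-host"},
    {"name": "db1", "address": "10.0.0.2", "use": "generic-host"},
]))
```

The layout lives in one template, so changing the output format no longer means editing print statements scattered through the generator.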
Adding custom menu sections
API functionality is required to add an entire menu section to the sidenav (compared to the existing functionality whereby an item can be added to an existing section).
Reinserting a trap after promotion of MIB
After a MIB is promoted, it should be possible to reinsert the trap so that it is translated. This is tricky because the original trap was received in textual form, not OID form.
Rotating view of pages
Configure a list of URLs and a time interval via the web UI. Clicking a left-nav link displays the first URL, then after the interval the page refreshes to the next URL in the list. After the last URL, it returns to the first and repeats.
The purpose is to display monitoring pages in a rotating view.
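The rotation can be computed statelessly from the start time and the interval. This is a minimal sketch: the function name and the example URLs are hypothetical.

```python
"""Hypothetical sketch of the rotating-view schedule."""


def current_url(urls, interval_seconds, started_at, now):
    """Pick the URL to display at `now`, cycling back to the first after the last."""
    if not urls:
        raise ValueError("no URLs configured")
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    elapsed = max(0, now - started_at)
    return urls[int(elapsed // interval_seconds) % len(urls)]


rotation = ["/status/hostgroup", "/status/service", "/viewport/1"]  # example URLs only
for t in (0, 30, 60, 90):
    print(t, current_url(rotation, 30, started_at=0, now=t))
```

No server-side state is needed. In the browser, each page could carry a `<meta http-equiv="refresh">` tag pointing at the next URL in the list.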
Revamp paging in list pages
On list pages, page numbers do not scale well. Instead, group by alphabet, by host/service groups, or by items that have changed.
Retain paging where it makes sense, such as on snmptrapexceptions.
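Alphabetical grouping is the simplest of these alternatives. The sketch below groups names by initial letter and puts digits and symbols under `#`. That bucket, the case-insensitive ordering and the function name are all assumptions.

```python
"""Hypothetical sketch: group list-page items by initial letter instead of page number."""
from collections import defaultdict


def group_by_initial(names):
    """Map an initial letter (or '#') to the case-insensitively sorted names."""
    groups = defaultdict(list)
    for name in names:
        first = name[:1].upper()
        key = first if first.isalpha() else "#"  # digits, symbols and empty names
        groups[key].append(name)
    return {k: sorted(v, key=str.lower) for k, v in sorted(groups.items())}


print(group_by_initial(["web2", "db1", "Web1", "10.0.0.5"]))
# {'#': ['10.0.0.5'], 'D': ['db1'], 'W': ['Web1', 'web2']}
```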
Geospatial information
Allow saving of geospatial data, which is becoming more important (e.g. GeoRSS). Possible integration with Google Maps.
Correlation of trap information
If a network device has a problem, lots of ports could send errors. Have some correlation so that only a single alert is raised.
This could be done at the SNMP trap level or at the Nagios level. Use SEC?
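One simple form of correlation alerts on the first trap from a device and suppresses further traps from that device for a fixed window. SEC's SingleWithSuppress rule type works on a similar principle. This Python sketch is hypothetical: the class name and per-device keying are assumptions, and a real design might key on device plus trap type.

```python
"""Hypothetical sketch: raise one alert per device per suppression window."""


class TrapCorrelator:
    """Alert on the first trap from a device, then suppress that device for a window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.open_until = {}  # device -> time its current suppression window ends

    def should_alert(self, device, timestamp):
        until = self.open_until.get(device)
        if until is not None and timestamp < until:
            return False  # part of an incident that has already been alerted on
        self.open_until[device] = timestamp + self.window
        return True


c = TrapCorrelator(window_seconds=60)
# A switch reports linkDown on several ports within a few seconds:
print([c.should_alert("switch1", t) for t in (0, 2, 3, 5)])  # [True, False, False, False]
```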
Enhancements to performance info page
What do these values mean? Can they be graphed, or alerted on? Should active host checks be listed there, given that host checks are on demand?
This needs more thought: should this information be presented at all, and if so, how?
Additional options to host menu popup
Such as opening the host over HTTP, or starting an SSH session
Only show hard states in HH views
So that only notified errors are shown. This may require a database change to NDO to store the current hard state.
List all users currently logged in
Admin function to see who is currently using Opsview Web.
Provide ability to manage host icons via Opsview UI
Ability to add, modify and delete host icons via Admin UI
Ability to apply a host template to large numbers of hosts
Ability to assign a template to multiple hosts in a single action.
Multiple contact profiles
Want to be able to do things like:
- Ability to receive warnings via email, but criticals via SMS
- Different notification methods for different hostgroups
- Different profiles covering time of day / weekend / etc
- Receive emails for criticals on one set of hostgroups/servicegroups, but for warnings on a different set
- Have an on-call primary and secondary contactgroup, which resolve to specific people
This could be handled by assignment via contactgroups. Contactgroups could be associated with hosts (all services), servicechecks (all hosts), keywords (specific services) or a hostgroup/servicegroup intersection.
So contactgroups define the "what", whereas contacts define the "who".
Also allow contacts to be authorised by selecting a higher-level HH group (which would then include all its subgroups) rather than just a hostgroup.
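Below is a minimal sketch of per-contact notification profiles, choosing methods by state, hostgroup and day of week. It does not model the contactgroup assignment described above. The `Profile` structure, its fields and the function name are all hypothetical.

```python
"""Hypothetical sketch: per-contact notification profiles."""
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Profile:
    """One notification profile for a contact (hypothetical structure)."""
    method: str                          # e.g. "email" or "sms"
    states: frozenset                    # states this profile notifies on
    hostgroups: frozenset = frozenset()  # empty means any hostgroup
    weekdays_only: bool = False          # skip Saturday and Sunday


def methods_for(profiles, state, hostgroup, when):
    """Return the distinct notification methods whose profile matches the event."""
    chosen = []
    for p in profiles:
        if state not in p.states:
            continue
        if p.hostgroups and hostgroup not in p.hostgroups:
            continue
        if p.weekdays_only and when.weekday() >= 5:  # Monday=0 ... Sunday=6
            continue
        if p.method not in chosen:
            chosen.append(p.method)
    return chosen


# Warnings by email, criticals by SMS (one of the cases listed above).
profiles = [
    Profile("email", frozenset({"WARNING"})),
    Profile("sms", frozenset({"CRITICAL"})),
]
print(methods_for(profiles, "CRITICAL", "core-network", datetime(2009, 1, 5, 9, 0)))  # ['sms']
```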
NMIS front end
Bypass the NMIS CGIs and provide the front end via Opsview Web.
Show all contacts for a host/service
List all the contacts for a host/service. Maybe create a spreadsheet with all host/services and the list of contacts.
Recurring downtime
Schedule a recurring downtime so that, for example, a weekly reboot does not alert, or a power failure test on the 6th of every month does not alert.
The UI can handle the repetitions (this needs calendar functions, hopefully from existing Perl modules), but it is not clear how to pass this information into Nagios. Should only the next occurrence be scheduled, a day before it is due? Or should it be scheduled when the previous one has occurred?
Investigate how other Nagios add-ons make the decision on when to schedule.
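One option is to compute only the next occurrence and submit it to Nagios, then repeat once it has passed. The sketch below does this for a monthly schedule. It builds a Nagios `SCHEDULE_HOST_DOWNTIME` external command, whose fields are host, start, end, fixed, trigger id, duration, author and comment. Such a command would be written to the Nagios command file. The helper names and the skip-short-months rule are assumptions.

```python
"""Hypothetical sketch: next occurrence of a monthly downtime, plus the Nagios
external command that would schedule just that one occurrence."""
from datetime import datetime, timedelta


def next_monthly(day, hour, minute, after):
    """Next datetime on `day` of a month at hour:minute strictly after `after`.

    Months without that day (e.g. the 31st in April) are skipped.
    """
    year, month = after.year, after.month
    for _ in range(24):  # two years is ample for any day 1-31
        try:
            candidate = datetime(year, month, day, hour, minute)
        except ValueError:
            candidate = None  # this month has no such day
        if candidate is not None and candidate > after:
            return candidate
        month += 1
        if month == 13:
            year, month = year + 1, 1
    raise ValueError(f"no valid date for day {day}")


def schedule_host_downtime_cmd(host, start, duration, author, comment, now):
    """Fixed host downtime as a Nagios external command (epoch seconds).

    Naive datetimes are interpreted as local time by timestamp().
    """
    end = start + duration
    return (
        f"[{int(now.timestamp())}] SCHEDULE_HOST_DOWNTIME;{host};"
        f"{int(start.timestamp())};{int(end.timestamp())};1;0;"
        f"{int(duration.total_seconds())};{author};{comment}"
    )


start = next_monthly(6, 2, 0, after=datetime(2009, 1, 7, 0, 0))
print(start)  # 2009-02-06 02:00:00
print(schedule_host_downtime_cmd("web1", start, timedelta(hours=1),
                                 "opsview", "monthly power test", start))
```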
Call out rotas
Make a rota available for metausers. Export the data, or allow the app to be queried for who is currently on the rota. This seems like a separate webapp.
Email notifications
Test email
There is a test SMS button; add a test email one too. It would make sense to have a test for every notification type.
Email configuration
Opsview currently assumes the mail command is /bin/mail.
Ideally a config option could be added to opsview.conf similar to:
# Sendmail compatible mail command
mailcmd="/usr/bin/mail"
...so that this can be configured at installation. This avoids symlinking /bin/mail to the real mail location, which isn't a particularly elegant solution and may cause problems with other Debian packages.
Email templating
Use Template Toolkit (TT) to create email templates. Store them as files so that Nagios does not need Catalyst, but use Catalyst to send a test email.
http://www.catalystframework.org/calendar/2008/6
ODW data cleanup
Run cleanup_import back to a specific point in time and delete data from there onwards. To support this, service_saved_state must be retained for longer than just the last hour. Check that servicecheck results for that point still exist in the runtime database (to avoid deleting data that cannot be regenerated). Also remove service_saved_state up to this point. Note: cleaning up downtimes also needs consideration; can those be deleted and recalculated?
ODW Reports
Provide a set of scheduled reports that include performance data from service checks for given hosts.
More status links
The status summary totals could be linked to other pages. For instance, the total criticals (handled + unhandled), could be linked to display those criticals.
Grouping service checks on Monitors tab
Instead of listing all servicechecks, group them by category, with an expand triangle (or plus sign) that makes an AJAX call to list the servicechecks within that category.
Any category containing a ticked servicecheck should already be expanded.
This should reduce the noise on those pages.
Menu options on configuration list pages should be displayed on the edit pages
The handy popup menu should be duplicated in the top right of the edit pages, or exposed in some other way, rather than forcing the user to go to the list page for information that is normally in the menu.
Resolve servicechecks
The resolve button should be a menu. There should be a "check now" option next to each active check, to run it immediately and see its effect. The arguments should be alterable.
Mass scheduled downtime
Similar to mass ack, but for downtimes
Recurring downtimes
There are existing projects that do this. Integrate one into Opsview?
Morning healthcheck report
Run a report first thing in the morning that lists outstanding problems at that point in time.
Embedded HTTP server on slave
Run a Perl HTTP server on the slave, so that no Apache configuration is needed. Not sure whether this is a great idea. Proposed: #258
Reviewing alterations before committing
Have a review process before committing: mark hosts/servicechecks as "to be reviewed" and block reloads until all are approved. Proposed: #290.
Rename RRD files when host/servicename changes
When a hostname or servicename changes, rename the appropriate RRD files to retain history. Proposed: #49
SNPP notifications
Notification via SNPP. Proposed: #140
FastCGI options for RH
Update the documentation with FastCGI instructions for RH.
The plan is to switch to a preforked Catalyst engine, possibly around 3.2, but FastCGI can be an option if required earlier.
Agents streamlined
Agent pages. Clean up the various packages:
- opsview-agent should be independent
- Have locally overridable directories
- An nrpe.d directory, with one check_command per file
- Only install basic agents
- Web configuration to list only agent plugins
Keyword configuration
Extend to hosttemplates, hostgroups and servicegroups
Slave sending results to master
See SlavePlayback
Viewports overall page
Status of each viewport on a summary page, perhaps as a list or as an HTML tag cloud.
Integration with network weather map
Investigate Network Weather Map (http://netmon.grnet.gr/weathermap/)
Tokens for sessions
See: http://search.cpan.org/~hide/Catalyst-Controller-RequestToken-0.03/
Used to validate data submissions.
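Request tokens roughly work like this: issue a one-time token with the form and accept a submission only if that token is still outstanding. The sketch below does not reproduce the Catalyst::Controller::RequestToken API. It only illustrates the concept, with a plain dict standing in for the session. All names are hypothetical.

```python
"""Hypothetical sketch of one-time request tokens, with a dict as the session."""
import hmac
import secrets


def issue_token(session):
    """Create a one-time token, remember it in the session, return it for the form."""
    token = secrets.token_urlsafe(16)
    session.setdefault("request_tokens", set()).add(token)
    return token


def consume_token(session, submitted):
    """Accept a submitted token once; replayed or unknown tokens are rejected."""
    tokens = session.get("request_tokens", set())
    match = next(
        (t for t in tokens if hmac.compare_digest(t.encode(), submitted.encode())),
        None,
    )
    if match is None:
        return False
    tokens.discard(match)
    return True


session = {}
token = issue_token(session)          # embedded as a hidden form field
print(consume_token(session, token))  # True: first submission accepted
print(consume_token(session, token))  # False: a double submit is rejected
```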
iPhone web support
See http://www.joehewitt.com/blog/introducing_iui.php for framework for webapp.
Reorganise filesystems
Currently, most things go into /usr/local/nagios. Move to standards-based file locations:
- /opt/opsview - for code
- /etc/opt/opsview - for host specifics
- /var/opt/opsview - for data (RRDs, running configurations)
HA setups need consideration, where a directory is shared between two nodes. So /var/opt/opsview should be the shared disk area. The running Nagios configuration should be in /var/opt/opsview rather than /etc/opt/opsview, because then it can be shared. (Compare with the current system, where /var/log/opsview is not shared but /usr/local/nagios is: where should these particular log files go?)
Sparklines
Use sparklines for small visual information: http://www.catalystframework.org/calendar/2008/3
Maybe add them to certain viewport pages. Use ODW performance data, or RRD data?
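The core of a sparkline is scaling a series onto a small number of levels. The sketch below shows that step with Unicode block characters. This is a swapped-in technique for illustration: the linked approach may render images instead, and the input list stands in for values fetched from ODW or RRD data.

```python
"""Illustration only: a text sparkline using Unicode block characters."""
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Scale values linearly between their min and max onto eight bar heights."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return BARS[0] * len(values)  # flat series
    steps = len(BARS) - 1
    return "".join(BARS[int((v - lo) / span * steps)] for v in values)


print(sparkline([1, 2, 3]))  # ▁▄█
```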
