dtquery Demo


This page shows a sample of every graph that dtquery produces, explains what each graph depicts, and offers some ideas on how to interpret it.

The graphs were produced by querying a mon server which watches, among other things, the ping and NNTP performance of @Home's news server news1.frmt1.sfba.home.com.

The graphs are the following, in no particular order of importance:

  1. Downtime by Hour of Day - Also known as "The Bar Code Graph", this graph shows a binary representation of the state of the service. Red means failure, white means OK. Unless your timeframe is very short (1-2 days or less) or you have very few failures, it may be hard to get much out of this graph. It is, however, very good at showing "clumps" of failures, that is, date ranges worth digging into more deeply.

    Notice that with such a long timeframe this graph doesn't tell you a lot, although you can see distinct clumps of failures around Jan 20, and seemingly trouble-free operation from about 17-Dec-2000 to early January. Perhaps the hardcore news fiends responsible for pounding @Home's news server were taking off for Christmas?

  2. Cumulative Downtime by Time of Day - This graph answers the question "what time of day is this host/group/service spending the most time in the failure state?"

    Notice the peaks in this graph at about noon and 2AM, and the trough around 7PM. The 2AM peak makes sense, since that is likely a time of low usage (and hence a natural window for network maintenance), but the noon peak is puzzling. Perhaps the server tends to get overloaded around noon?

  3. Cumulative Downtime by Day of Week - This graph answers the question "what day of the week is this host/group/service spending the most time in the failure state?"

    Notice the big peak here on Thursdays, with smaller peaks on Wednesdays and Fridays. Perhaps @Home does most of its news server maintenance on Thursdays?

  4. Failure Time Distribution - This graph shows you the exact distribution of your failure times, in minutes, on a logarithmic scale. It answers the questions "Are most of my failures short? Long? Is there a discernible pattern?"

    Notice that most of the failures here are 5 minutes long, which makes sense, since we test this hostgroup every 5 minutes: a failure seen by one test and cleared by the next is recorded as lasting 5 minutes. There are some longer failures out in the 1-hour range and beyond, though.

  5. Cumulative Downtime by Service - This graph answers the question "For a given group or groups, how is my downtime distributed among various services?" For example, how much time has your HTTP service failed relative to the ping service?

    Notice that about three times as much time was spent in NNTP failure as in ping failure, which makes sense, judging from my own experience with most ISPs' news servers...

  6. Cumulative Downtime by Group - This graph answers the question "For a given service or services, how is my downtime distributed among different hostgroups?" For example, which are the groups with the most minutes spent in the failure state?

    Notice that this graph is really boring, since we only searched on one hostgroup to create these graphs. But if you search on a single service (e.g. 'ping') across multiple hostgroups, this can yield interesting results.
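
For readers who want to see how the cumulative graphs above could be derived, here is a minimal Python sketch. It is not dtquery's own code: the Failure record layout, the function names, and the sample data are all made up for illustration, and it simply assumes each failure is known as a (group, service, start, end) record. Each failure is split at hour boundaries so that its minutes are credited to the right hour of day and day of week, and a simple count of whole-minute durations stands in for the failure time distribution.

```python
from collections import Counter, defaultdict, namedtuple
from datetime import datetime, timedelta

# Hypothetical record layout; dtquery's real data format may differ.
Failure = namedtuple("Failure", "group service start end")

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def cumulative_downtime(failures, key):
    """Sum failure minutes per bucket, where key(failure, time) names the bucket.

    Each failure is split at hour boundaries, so an outage running from
    11:50 to 12:05 adds 10 minutes to hour 11 and 5 minutes to hour 12.
    """
    totals = defaultdict(float)
    for f in failures:
        cursor = f.start
        while cursor < f.end:
            top_of_hour = cursor.replace(minute=0, second=0, microsecond=0)
            chunk_end = min(top_of_hour + timedelta(hours=1), f.end)
            totals[key(f, cursor)] += (chunk_end - cursor).total_seconds() / 60
            cursor = chunk_end
    return dict(totals)


def duration_counts(failures):
    """Count failures by their length in whole minutes."""
    return Counter(
        round((f.end - f.start).total_seconds() / 60) for f in failures
    )


# Made-up sample data, not taken from the graphs on this page.
SAMPLE = [
    Failure("news", "nntp", datetime(2001, 1, 18, 11, 50), datetime(2001, 1, 18, 12, 5)),
    Failure("news", "ping", datetime(2001, 1, 20, 2, 0), datetime(2001, 1, 20, 2, 5)),
    Failure("news", "nntp", datetime(2001, 1, 20, 2, 10), datetime(2001, 1, 20, 2, 15)),
]

if __name__ == "__main__":
    print(cumulative_downtime(SAMPLE, lambda f, t: t.hour))                  # by time of day
    print(cumulative_downtime(SAMPLE, lambda f, t: WEEKDAYS[t.weekday()]))   # by day of week
    print(cumulative_downtime(SAMPLE, lambda f, t: f.service))               # by service
    print(cumulative_downtime(SAMPLE, lambda f, t: f.group))                 # by group
    print(duration_counts(SAMPLE))                                           # distribution
```

Passing the bucket as a key function keeps the four cumulative views down to one loop; only the choice of key changes between "by time of day", "by day of week", "by service", and "by group".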


Andrew Ryan <andrewr@nam-shub.com>
$Id: index.html,v 1.1 2001/02/04 00:50:44 andrewr Exp $
Last modified: Fri Feb 2 14:41:30 PST 2001