AtWatch is a hosted monitoring service that (according to their web site) was acquired in 2001 by the company InternetSeer. In fact, I believe that @watch (as their logo is written) provides all of the higher-end services for InternetSeer. The service seems to have at least four monitoring locations, including one in Germany that they announced in July, and some dedicated network lines. A trial account only lets you add one URL to crawl, but this should give some sense of what they can do.
To begin, I signed up for their 14-Day Free Trial account. This asked for the (one) URL to monitor, the time zone in which I reside, which geographical area I want to crawl from (the choices were California, Pennsylvania, New York, or Germany), and the usual name, email, and password info I would need to log in.
That's cool. One caveat for later: it wants you to log in using your assigned account number as the username, rather than something more traditional like your email address. If you forget the number, their 'forgot it' page lets you enter your email address and have the account number emailed back to you.
When it brings you to the Account Administration page, you should see the URL you chose. There won't be any reports or pretty graphs to look at just yet; you will have to wait at least a day. The Trial Account only checks once every 20 minutes, so don't expect any interesting data until the next day, unless your test site is already crashing.
They do have a 'snapshot' feature that generates a checksum of the front page of your test URL; in a minute we will see how to use it to enable the 'Hacker Alert' (i.e., an alert that the page has changed). Go ahead and click on the camera icon next to your URL, and we can play with it a little.
Clicking the 'Edit' link directly next to the URL on the Administration Page should allow you to begin playing around with the advanced features of @watch. The Service Level and Availability Interval are not configurable in trial mode, but several other things are.
You can choose not to receive daily, weekly, or monthly email reports with simple summaries of what went on during each period. But does that simply mean that you will not be emailed links to these statistical reports, or that the reports will not be available to you even when you log in? I am unsure, and will investigate.
You can modify basic information about the URL (including changing it to a different URL if you want), and add the name and URL of your ISP.
Alert Options are always the most fun and variable parts of these systems, and @watch is simple without being too basic. You are allowed a Primary and a Secondary contact, whose information you enter in the separate 'Contact Info' section of the site. Back in the Alert Options section, I set my primary alert contact to get an email and a page when something goes wrong, and again when the problem resolves itself. If a problem is detected more than a certain number of times in a row, it is 'Escalated', and I can set my secondary alert contact to receive an email and a page for escalated items.
According to their documentation, they normally check twice before alerting, although I am uncertain whether this means two passes of the Availability Interval or a second check initiated immediately. My guess is the former, since there is a specific 'Immediate Alerts' checkbox that starts alerting right after the first error is detected.
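To make the 'times in a row' idea concrete, here is a minimal Python sketch of how a monitor might decide between alerting, escalating, and reporting recovery. It follows my guess above (failures counted across consecutive scheduled checks) and is not @watch's actual logic; the class name, the default of two failures before alerting, and the escalation threshold of four are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureTracker:
    """Hypothetical alert state for one monitored URL (not @watch's real logic)."""
    confirm_after: int = 2    # consecutive failed checks before the first alert
    escalate_after: int = 4   # consecutive failed checks before escalation (assumed)
    immediate: bool = False   # mimics the 'Immediate Alerts' checkbox
    consecutive: int = 0      # failed checks in a row so far

    def threshold(self) -> int:
        # With immediate alerts on, the very first failure triggers an alert.
        return 1 if self.immediate else self.confirm_after

    def record(self, ok: bool) -> Optional[str]:
        """Feed one check result; return 'alert', 'escalate', 'resolved', or None."""
        if ok:
            had_alerted = self.consecutive >= self.threshold()
            self.consecutive = 0
            # Only announce recovery if an alert actually went out.
            return "resolved" if had_alerted else None
        self.consecutive += 1
        if self.consecutive == self.threshold():
            return "alert"      # would go to the primary contact
        if self.consecutive == self.escalate_after:
            return "escalate"   # would go to the secondary contact
        return None
```

With the defaults, the results fail, fail, fail, fail, ok produce None, 'alert', None, 'escalate', 'resolved'; a single blip followed by a good check produces nothing at all, which is the point of checking twice.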
Then there are 'Watch Options'. You can ask it to verify that a particular string of characters (or two of them) is present in the first 1k of the HTML. This could be a way to confirm that a dynamic job whose output appears near the top of your page is returning the proper text. There is also a URL Image Check, which makes sure each image referenced on the test URL is not broken. The 'Hacker Check' compares a checksum of the URL's HTML to a previously taken 'Snapshot', and alerts you if they do not match. This feature is only useful for static pages, since a dynamic page whose content changes over time would trigger an alert every time it changed. The 'URL Image Check' and 'Hacker Check' are only available with their highest level of service.
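Both of these checks are easy to picture in code. The Python sketch below assumes '1k' means the first 1024 bytes of the response and uses SHA-256 as the checksum; @watch does not say which checksum it uses, and the function names are hypothetical.

```python
import hashlib
from typing import Iterable

FIRST_KB = 1024  # assumption: the string check only looks at the first 1024 bytes


def content_check(html: bytes, required: Iterable[bytes]) -> bool:
    """True if every required string appears in the first 1k of the page."""
    head = html[:FIRST_KB]
    return all(s in head for s in required)


def snapshot(html: bytes) -> str:
    """Checksum of the page, taken while it is known to be good (SHA-256 assumed)."""
    return hashlib.sha256(html).hexdigest()


def hacker_check(html: bytes, saved: str) -> bool:
    """True if the page still matches its saved snapshot."""
    return snapshot(html) == saved
```

Appending even one byte, such as a timestamp, changes the checksum, which is exactly why the Hacker Check only suits static pages. Likewise, a string that sits below the first 1k will never be found, so the text you check for has to be near the top.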
Their final Watch Option is the 'Site Content Check', available with their highest and mid-level service plans. This tool crawls through a limited section of your site and reports broken links, up to a maximum depth of 4 levels or 3000 links, whichever comes first. It does check that links to outside pages respond, but @watch does not crawl those outside pages itself. Since you can't run it on demand, and it wouldn't finish crawling a larger site, this is really just a quality-control measure to make sure your top-level pages aren't embarrassing you with formerly working links.
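The depth and link limits amount to a bounded breadth-first crawl. Here is a Python sketch under a few assumptions of mine: the home page counts as depth 0, anything with a status of 400 or above counts as broken, and fetching is passed in as a function (a hypothetical `fetch`) so the crawl logic can be shown without any real network code.

```python
from collections import deque
from typing import Callable, List, Tuple
from urllib.parse import urlparse

# Hypothetical fetcher: url -> (HTTP status, links found on that page).
Fetch = Callable[[str], Tuple[int, List[str]]]


def find_broken_links(start: str, fetch: Fetch,
                      max_depth: int = 4, max_links: int = 3000) -> List[str]:
    """Breadth-first crawl of one site; return URLs that answered with an error."""
    site = urlparse(start).netloc
    seen = {start}              # never grows past max_links, which caps the work
    queue = deque([(start, 0)])
    broken = []
    while queue:
        url, depth = queue.popleft()
        status, links = fetch(url)
        if status >= 400:
            broken.append(url)
            continue
        if urlparse(url).netloc != site:
            continue            # outside pages are checked but never traversed
        if depth >= max_depth:
            continue            # deep enough: check this page, don't follow its links
        for link in links:
            if link not in seen and len(seen) < max_links:
                seen.add(link)
                queue.append((link, depth + 1))
    return broken
```

Breadth-first order means the pages closest to the front door get checked first, so when a limit cuts the crawl short it is the deep, rarely visited pages that go unchecked.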
The periodic reports themselves (daily and weekly were the only ones I saw during my two-week trial) were simple enough. They show a graph of response times for DNS lookups and for retrieving the first 1k of the page, along with the low, high, and average times for each. If you had any alerts during that period, you would see those too: the date and time, the alert condition and how many times it occurred, a short text explanation of what it meant, and how many alerts (and escalated alerts) were sent.
The alerts themselves generally go out by email. Each alert email includes the time of the alert and its severity, specific descriptions of what is not working, how long the system has been in this condition, and a network traceroute, which can give your network folks an idea of whether it is a routing problem rather than a software problem.
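The report arithmetic is simple: at the trial's 20-minute interval, a daily report summarizes 24 * 60 / 20 = 72 checks. Here is a small Python sketch of the low/high/average summary for one timing series; the function name and the use of milliseconds are my assumptions.

```python
from statistics import mean
from typing import Dict, List

CHECKS_PER_DAY = 24 * 60 // 20  # 72 checks at the trial's 20-minute interval


def summarize(times_ms: List[float]) -> Dict[str, float]:
    """Low, high, and average of one timing series (e.g. DNS lookups, in ms)."""
    return {"low": min(times_ms), "high": max(times_ms), "average": mean(times_ms)}
```

Run once for the DNS timings and once for the first-1k timings, this gives the two sets of numbers shown alongside the report graph.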
@watch has some features I did not test, namely having alerts faxed to me, or having them sent through an email-to-SMS gateway. They also seem to be able to log in to password-protected sites, track the cookies your site sends, etc. My test site did none of that, but I assume it works as well as the rest, which is to say, pretty well.
All in all, this system is focused specifically on website uptime, and seems to have the features a webmaster with a lot of pages to look after would want. The user interface was pretty easy to use, although I wish the page where you enter your Primary and Secondary Alert Contacts were integrated into the 'Alert Options' page, since I can see some confusion arising there. The prices seem a little steep to me, but then, I am cheap.
Features of an Alert System
Alerts, or notifications, are the attempts of a monitoring program to notify someone that something anomalous is happening to the monitored system. There is no point in having an automated test of any page or system if there is no way to remedy problems that occur. For the most part, our systems' best response is to try to bring the potential problem to the attention of a human being.
Two features are crucial to making a notification system effective. The first, obviously, is to be able to get through to someone who can address and fix the problem, as soon after it is detected as possible. The second is to make some judgment about how serious the problem is, so that the humans who are being notified are not flooded with alerts that do NOT represent problems, a situation in which real notices will be ignored.
Email is the most common way of attempting to fulfill the first requirement. Most, if not all, modern web monitoring systems let the webmaster choose the email addresses that should receive an automated message when the monitoring system detects anomalies of a particular type or severity. Obviously, this relies on email service working for that user, which can be tricky if email is handled by the same machine(s) that are currently exhibiting problems. It is frustrating to wade through the smoking wreckage of a web server or database, finally restore it to some working function, and only THEN receive a bunch of messages that would have kept the wreckage from smoking so furiously in the first place.
At a minimum, the system that handles email for anyone who is in charge of dealing with a notification should be hosted on a different system from whatever web services are being monitored. A better solution would be to have problems escalate through several different types of messaging solutions, in order to maximize the chance of the alert getting through. The more sophisticated (also more expensive) web monitoring solutions offer several means beyond email to get in touch with responsible parties ... Instant Messenger, ICQ, Skype, SMS text messaging, pager or a call on the traditional telephone network are all options that I've seen advertised by these services.
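An escalation chain like that is essentially 'try the next channel until one gets through'. The Python sketch below is an assumption about how such a chain might work, not any vendor's implementation: each hypothetical sender is a function that returns True on success, and a real system would probably also wait for a human acknowledgement before giving up on a channel.

```python
from typing import Callable, List, Optional, Tuple

# A channel is a name plus a hypothetical send function that returns True
# on success; real senders would wrap email, SMS, a pager, and so on.
Channel = Tuple[str, Callable[[str], bool]]


def escalate(message: str, channels: List[Channel]) -> Optional[str]:
    """Try each channel in order; return the name of the first that succeeds."""
    for name, send in channels:
        try:
            if send(message):
                return name
        except Exception:
            # A dead channel (say, mail hosted on the server that just crashed)
            # must not stop the chain; move on to the next one.
            continue
    return None  # nothing got through
```

Ordering the list from cheapest and least intrusive (email) to most intrusive (a phone call) keeps people from being woken up when an email would have done.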
The second requirement, intelligent filtering of notifications, is generally more complicated, and often requires some trial and error by the person configuring web monitoring before it works correctly. The first step is to differentiate between alerts of different severity levels, and to send each to someone qualified to handle that kind of problem. A useful next step is to differentiate messages by the particular sub-system being monitored, so that the database administrator hears when the database has a problem, the webmaster when the web server is down, and the networking people when the machine is completely unreachable.
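Here is a minimal Python sketch of routing by severity and by subsystem. The routing table, severity names, addresses, and the 'oncall-pager' recipient are all made up for illustration; an empty result means the item is only recorded in the periodic report.

```python
from typing import Dict, List

# Hypothetical routing table: who owns which subsystem.
ROUTES: Dict[str, List[str]] = {
    "database": ["dba@example.com"],
    "webserver": ["webmaster@example.com"],
    "network": ["netops@example.com"],
}
DEFAULT_ROUTE = ["webmaster@example.com"]  # for subsystems nobody claimed
SEVERITY = {"info": 0, "warning": 1, "critical": 2}


def recipients(subsystem: str, severity: str,
               min_level: str = "warning", page_level: str = "critical") -> List[str]:
    """Who should hear about this alert; an empty list means 'report only'."""
    level = SEVERITY[severity]
    if level < SEVERITY[min_level]:
        return []  # low-severity items wait for the periodic report
    who = list(ROUTES.get(subsystem, DEFAULT_ROUTE))
    if level >= SEVERITY[page_level]:
        who.append("oncall-pager")  # only the serious stuff pages a human
    return who
```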
In a perfect world, the system monitor could actually fix the problem itself and send you a happy 'dealt with it' message. Since this is unlikely to happen in the immediate future, we have to assume that various responsible humans will receive these messages. So an important feature of any web monitoring alert system is to intrude on the user's attention only when something truly urgent is happening. This avoids 'message fatigue', in which the user gets so many messages from the monitoring system that they automatically delete them, or assume they are more of the same ... what a waste to have a truly important event missed because it is buried in a list of hundreds of 'system is ok' messages!
A simple way to avoid fatiguing the user's attention span (and flooding their inbox) is to remember some history about the alerts and group them into a single event, like 'the web page changed significantly each of the last 20 times I looked at it', instead of sending 20 messages that say 'the web page changed'. Fewer messages mean that the ones the user does see are apt to receive more attention.
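Grouping can be as simple as collapsing runs of identical alerts into one line. This Python sketch does it over a list; a real monitor would more likely keep this history over a time window between notifications. The function name is hypothetical.

```python
from itertools import groupby
from typing import List


def group_alerts(alerts: List[str]) -> List[str]:
    """Collapse runs of identical consecutive alerts into one summary line each."""
    summary = []
    for text, run in groupby(alerts):
        n = sum(1 for _ in run)  # length of this run of identical alerts
        summary.append(text if n == 1 else f"{text} ({n} times in a row)")
    return summary
```

Twenty copies of 'the web page changed' come out as a single 'the web page changed (20 times in a row)'.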
There is potentially even more value in having the alerted parties give feedback to the web monitoring system, allowing it to 'learn' when and how to alert the user, and in what circumstances. A mechanism for accepting feedback and learning from it can be as simple as offering the notified party a way to respond to the message, with the option of saying 'yes this is important', 'please do not bug me with this, it is normal', or 'do not bug me about this unless it occurs a lot'. As in any system that relies on explicit user feedback to configure itself, however, it requires a great deal of buy-in from all of the humans involved. After all, if nobody ever responds, the thing can't learn, and the whole exercise is an expensive and useless failure. And it requires patience on their part too, as it takes time and a certain amount of work by all parties involved before the system is fully responding to their preferences.
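The three replies suggested above map naturally onto per-alert preferences. Here is a minimal Python sketch, with hypothetical class and reply names, where 'do not bug me about this unless it occurs a lot' becomes 'notify only on every Nth occurrence'; a system that genuinely learns would need to be considerably smarter than this.

```python
from collections import Counter
from typing import Dict

REPLIES = {"important", "normal", "frequent_only"}  # hypothetical reply codes


class FeedbackFilter:
    """Hypothetical per-alert-type preferences set by recipient replies."""

    def __init__(self, every_nth: int = 5) -> None:
        self.every_nth = every_nth
        self.prefs: Dict[str, str] = {}    # alert type -> last reply received
        self.pending: Counter = Counter()  # occurrences since the last notice

    def feedback(self, alert_type: str, reply: str) -> None:
        if reply not in REPLIES:
            raise ValueError(f"unknown reply: {reply}")
        self.prefs[alert_type] = reply

    def should_notify(self, alert_type: str) -> bool:
        pref = self.prefs.get(alert_type, "important")  # no feedback yet: notify
        if pref == "normal":
            return False  # 'please do not bug me with this, it is normal'
        if pref == "frequent_only":
            self.pending[alert_type] += 1
            if self.pending[alert_type] >= self.every_nth:
                self.pending[alert_type] = 0
                return True
            return False
        return True
```

Note that the default is to notify: if nobody ever replies, this behaves like a plain alerting system, which is the buy-in problem in a nutshell.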
December 22, 2004 in Commentary