This guide walks through the creation of alerts, which are used to notify your team of downtime and performance issues.
Click the plus symbol on the page to view the alert creation form. Select the type of action to perform from Slack, PagerDuty, webhook request, or email as shown here:
Slack and PagerDuty integrations redirect you to the corresponding site to complete the integration, while webhook and email actions require a value; in this example, the email address(es) you’d like to notify:
Choose whether the alert applies to every check or to a single check.
Select a metric to alert on. If your only concern is that the website, API, or application is up, select “Downtime”. You may also wish to alert on slow or unresponsive endpoints using “Total Response Time”.
Enter the number of minutes of downtime within the time window (15 minutes here) that is acceptable before an alert triggers. The right value depends greatly on the nature of your service.
For less obtrusive actions such as Slack or email, it may be fine to trigger an alert when downtime is above a minute or two. Triggering a PagerDuty incident for an intermittent problem is less desirable.
Click save and you’re good to go!
Apex Ping supports a variety of metrics to alert on. You’ll likely want a “Downtime” and “Total Response Time” alert to ensure availability and responsiveness. Some alerts such as “Total Response Time” allow you to alert against stats such as min, max, average or percentiles for additional control.
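To see why the choice of stat matters, here is a minimal Python sketch that summarizes a window of response times. It is not Apex Ping's implementation: the `summarize` and `percentile` helpers, the sample values, and the nearest-rank percentile method are all assumptions made for illustration.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(response_times_ms):
    """Summarize one alert window of response times (hypothetical helper)."""
    return {
        "min": min(response_times_ms),
        "max": max(response_times_ms),
        "avg": statistics.fmean(response_times_ms),
        "p50": percentile(response_times_ms, 50),
        "p95": percentile(response_times_ms, 95),
    }


# Hypothetical samples in milliseconds: nine fast responses and one slow outlier.
samples = [120, 135, 110, 140, 125, 130, 115, 145, 128, 2400]
stats = summarize(samples)
```

With these samples the single slow request pulls the average to about 355 ms and sets both the max and the 95th percentile to 2400 ms, while the median stays at 128 ms. An alert on max fires on any single outlier, whereas an alert on a median or average is less sensitive to one slow request.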
If you’re receiving too many notifications, consider increasing the threshold or the alert window. Let’s look at a few examples.
The most sensitive alert possible would be “Downtime is above 0 minutes in the past 5 minutes”, which notifies you as soon as a single error is detected and remains triggered until the alerting system sees 5 minutes of uptime, after which it may trigger again.
To decrease the sensitivity you could try “Downtime is above 3 minutes in the past 5 minutes”, which triggers after 3 uptime confirmations have failed.
Alternatively, the alert window may be increased to lower the frequency of resolving. For example, the configuration “Downtime is above 0 minutes in the past 60 minutes” will trigger immediately and resolve after observing 60 minutes of uptime, resulting in fewer notifications.
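The effect of the threshold and window can be sketched with a small simulation. This is a simplified model, not Apex Ping's alerting code: the `simulate` function is hypothetical, it assumes one downtime sample per minute, and it assumes an alert resolves once downtime in the window falls back to the threshold or below.

```python
from collections import deque


def simulate(minutes_down, threshold, window):
    """Return (minute, event) pairs for a "Downtime is above `threshold`
    minutes in the past `window` minutes" alert.

    minutes_down holds one boolean per minute, True if that minute was down.
    """
    recent = deque(maxlen=window)  # sliding window of the last `window` minutes
    triggered = False
    events = []
    for minute, down in enumerate(minutes_down):
        recent.append(down)
        downtime = sum(recent)  # minutes of downtime inside the window
        if not triggered and downtime > threshold:
            triggered = True
            events.append((minute, "triggered"))
        elif triggered and downtime <= threshold:
            triggered = False
            events.append((minute, "resolved"))
    return events


# Hypothetical 20-minute timeline with isolated failures at minutes 2 and 10.
timeline = [minute in (2, 10) for minute in range(20)]

sensitive = simulate(timeline, threshold=0, window=5)
long_window = simulate(timeline, threshold=0, window=60)
higher_threshold = simulate(timeline, threshold=3, window=5)
```

Under these assumptions, the 5-minute window produces four notifications (triggered at minutes 2 and 10, resolved at minutes 7 and 15), the 60-minute window produces a single trigger that stays open for the rest of the timeline, and the higher threshold never triggers on two isolated failures.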
Apex Ping currently confirms downtime from three additional locations to ensure that the failure is not intermittent. This helps reduce false positives, but it’s important to note that a single “minute” of downtime is effectively equivalent to four failed HTTP requests: the original check plus three confirmations.
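The confirmation step can be pictured roughly as follows. This is only a sketch of the idea: the `minute_is_down` function, the `check` callable, and the location names are hypothetical and do not reflect how Apex Ping is actually implemented.

```python
from typing import Callable, Sequence


def minute_is_down(
    check: Callable[[str], bool],
    primary: str,
    confirmers: Sequence[str],
) -> bool:
    """Return True only if the primary check and every confirmation fail.

    check(location) returns True when the request from that location succeeds.
    """
    if check(primary):
        return False  # the primary request succeeded, so nothing to confirm
    # all() stops at the first confirmation that succeeds
    return all(not check(location) for location in confirmers)


# Hypothetical results: one confirming location can still reach the service.
results = {"primary": False, "confirm-1": False, "confirm-2": True, "confirm-3": False}
intermittent = minute_is_down(results.__getitem__, "primary", ["confirm-1", "confirm-2", "confirm-3"])

# All four requests fail, so the minute counts as downtime.
outage = minute_is_down(lambda location: False, "primary", ["confirm-1", "confirm-2", "confirm-3"])
```

In this model a failure seen from only some locations is treated as intermittent and does not count toward downtime.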
Requests that take longer than 10 seconds time out and are treated as errors, which contribute to downtime.
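If you want to reproduce this rule when testing your own endpoints, a rough equivalent using only the Python standard library might look like the sketch below. The `request_failed` helper is hypothetical, and treating connection failures and HTTP error statuses as errors is an assumption of this sketch rather than documented Apex Ping behavior.

```python
import http.client
import time
import urllib.request

TIMEOUT_SECONDS = 10  # from the text: requests longer than 10 seconds time out


def request_failed(url: str, timeout: float = TIMEOUT_SECONDS) -> bool:
    """Return True if the request errors or takes longer than `timeout` seconds."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
    except (OSError, http.client.HTTPException):
        # Covers socket timeouts, refused connections, URLError and HTTPError.
        return True
    # urlopen's timeout applies per socket operation, so also check the total time.
    return time.monotonic() - started > timeout
```

Calling `request_failed("https://example.com/")` returns `False` for a fast, successful response and `True` for a slow or failing one.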
Alert emails are delivered from the alerts@apex.sh address. Subjects are formatted as follows, which may be useful for integration with services such as OpsGenie: