How to manage the Database Availability Group?

The Database Availability Group (DAG) is the central point of high availability in Exchange. Because it provides a global mailbox availability service, it has to be treated as an entity in its own right and not only as a collection of servers.
Managing the Database Availability Group is critical: proactive alerts let you protect high availability before users are affected. Administrators can identify bottlenecks by checking the service delivered by the Mailbox role and the Mailbox Databases, and control usage to prevent capacity issues.

GSX Monitor & Analyzer main features

Prevent Availability Issue

Alert on Database Availability Group Down

Alert on Database Availability Group Failover

Replication services checks

Database Availability Group Real Time statistics

Identify bottlenecks within the Database Availability Group

Measure the High Availability of your system

Prevent Usage and Capacity Issue

Measure the performance of your mailbox databases at the database level

 

Prevent Availability Issue

To deliver true Microsoft Exchange high availability, you need to be alerted as soon as your system is at risk.
GSX Monitor and Analyzer monitor and report on Database Availability Group availability and usage through multiple automated PowerShell tests that check the Database Availability Group at different levels:

  • At the Mailbox Database level
  • At the Mailbox role level
  • At the Database Availability Group entity level

GSX Monitor and Analyzer check the health of every Mailbox Database inside the Database Availability Group. They then check the Mailbox role servers within the Database Availability Group to determine their availability and correlate it with the Mailbox Database checks. They also verify that all the replication services needed to replicate the databases across the Database Availability Group are up and running.
The final test runs at the Database Availability Group level and checks that each mounted copy of a Mailbox Database has a healthy copy elsewhere in the Database Availability Group.
You no longer have to wait for user complaints: GSX Monitor and Analyzer warn you and help you identify the root cause of the problem.

Alert on Database Availability Group Down

GSX Monitor & Analyzer provide different types of alerts depending on the level of the objects tested in the Database Availability Group.

The Database Availability Group is considered down, and an alert is sent, if the Mailbox role server that hosts one of the two copies of a mailbox database is down, or if replication fails on the Mailbox server that hosts a critical copy of a mailbox database (i.e. without this server there is no longer high availability for all the databases).

The Database Availability Group is also considered down if the quorum tests (File Share Quorum and Quorum Group) fail, regardless of the server on which the test was run.

Finally, the Database Availability Group is considered down if there isn't at least one healthy (passive) copy of each database in the Database Availability Group.

GSX Monitor and Analyzer run these tests continuously. If one fails, an alert reports that the Database Availability Group is down, with a description of the database in error or, if the failure is critical, the replication test that failed.

Warning on Database Availability Group

Alerts can be sent if a non-critical error is detected on the Database Availability Group.

For example:

  • One server in the Database Availability Group is down, but there is still at least one healthy copy of each active database in the Database Availability Group.
  • One of the replication tests returns a warning (for example, the Content Index test fails). In this case, replication is still active but in a degraded mode.
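
The Down and Warning rules described above can be sketched as a small classification function. This is only an illustration of the logic as stated on this page, not GSX's actual implementation: the `Copy` record, the `dag_status` function and the server and database names are all hypothetical, and the sketch assumes that a copy hosted on a server that is down cannot count as healthy.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    server: str    # Mailbox server hosting this copy (hypothetical name)
    mounted: bool  # True for the active (mounted) copy
    healthy: bool  # Health as reported by the database-level check


def dag_status(databases, quorum_ok, servers_up, replication_warnings=()):
    """Classify a DAG as "Down", "Warning" or "Up" and list the reasons.

    databases: {database name: [Copy, ...]}
    quorum_ok: False if a quorum test failed on any server
    servers_up: {server name: True/False}
    replication_warnings: names of replication tests that returned a warning
    """
    reasons = []
    if not quorum_ok:
        reasons.append("quorum test failed")
    for name, copies in databases.items():
        # Assumption: a copy only counts if its host server is reachable.
        healthy_passive = [
            c for c in copies
            if not c.mounted and c.healthy and servers_up.get(c.server, False)
        ]
        if not healthy_passive:
            reasons.append(f"{name}: no healthy passive copy")
    if reasons:
        return "Down", reasons

    # Non-critical findings only produce a warning.
    warnings = [f"server {s} is down" for s, up in sorted(servers_up.items()) if not up]
    warnings += [f"replication warning: {w}" for w in replication_warnings]
    return ("Warning", warnings) if warnings else ("Up", [])


# Example: one server is down, but DB01 still has a healthy passive copy.
databases = {
    "DB01": [Copy("MBX1", True, True), Copy("MBX2", False, True), Copy("MBX3", False, True)],
}
servers_up = {"MBX1": True, "MBX2": True, "MBX3": False}
status, details = dag_status(databases, quorum_ok=True, servers_up=servers_up)
```

In the example, the DAG is reported as a Warning rather than Down, because the loss of one server still leaves a healthy passive copy of the database.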


Alert on Database Availability Group Failover

Healthy database copies are vital to guarantee high availability for daily messaging operations. Monitoring them proactively is therefore crucial, and being notified when DAG failover is at risk helps you pinpoint issues and troubleshoot accordingly.

IT Administrators will get notified as soon as a failover occurs in the DAG.

Replication services checks

GSX continuously checks that all the services needed for replication are up and running at the server level on every Database Availability Group member. These include:

  • Active Manager
  • Cluster Network
  • Cluster Services
  • File Share Quorum
  • Quorum Group
  • Replay Service
  • Task RPC Listener
  • TCP Listener

For all of these tests, GSX remotely accesses each server, runs the tests in real time and displays all of the results in the real-time statistics view, alerting the administrator to any Down or Warning status.
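
As a rough illustration of how per-server test results could be rolled up into one status per replication test, here is a minimal sketch. It assumes each test returns "Up", "Warning" or "Down" for each server; the `summarize_replication` function, the input layout and the server names are hypothetical and do not describe GSX internals.

```python
# Higher number means a more severe status.
SEVERITY = {"Up": 0, "Warning": 1, "Down": 2}


def summarize_replication(results):
    """Roll per-server results up to one entry per replication test.

    results: {server: {test name: "Up" | "Warning" | "Down"}}
    returns: {test name: (worst status, [servers that were not Up])}
    """
    summary = {}
    for server, tests in results.items():
        for test, status in tests.items():
            worst, affected = summary.get(test, ("Up", []))
            if SEVERITY[status] > SEVERITY[worst]:
                worst = status
            if status != "Up":
                affected = affected + [server]
            summary[test] = (worst, affected)
    return summary


# Example with two DAG members and two of the tests listed above.
results = {
    "MBX1": {"ReplayService": "Up", "TcpListener": "Up"},
    "MBX2": {"ReplayService": "Down", "TcpListener": "Warning"},
}
summary = summarize_replication(results)
# summary["ReplayService"] is ("Down", ["MBX2"])
# summary["TcpListener"] is ("Warning", ["MBX2"])
```

Keeping the list of affected servers next to the worst status matches the idea of showing both the overall result and where it failed.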

Database Availability Group Real Time statistics

GSX displays the following information in its real time statistics view:

  • At the server level: the servers that are part of the Database Availability Group and their status.
  • At the mailbox database level:
    • Mailbox databases in the Database Availability Group
    • Status
    • Number of copies across the Database Availability Group
    • Server where the mounted database is located
    • List of servers that host healthy and error copies
    • Size of each Mailbox Database in MB
    • Number of mailboxes per Mailbox Database
    • Average mailbox size per Mailbox Database
    • Last log inspected date per database
    • Last full backup date per database
  • At the replication level: all of the results of the replication tests (Up, Down or Warning) with error details and the list of servers that failed or returned a warning.

These real-time statistics help you isolate a problem reported in the overview and make troubleshooting easier.

Identify bottlenecks within the Database Availability Group

Because GSX Monitor and Analyzer test the DAG at multiple levels, from the Mailbox Database to the Database Availability Group itself, they collect a large amount of information about your system that helps you when a problem occurs.

GSX Monitor and Analyzer test each Mailbox role server to measure its availability (see the Mailbox role server page for more information).

GSX Monitor and Analyzer report on the service that each server should deliver and let you quickly compare the performance, availability and usage of all your Database Availability Group members.

In a few seconds, you can see which server does not meet your SLAs or the level of performance and service it should provide. You can see which server hosts the most databases or the most mailboxes, which one is overloaded, and which one constantly has latency or availability problems.
You can forecast the availability and usage of your Database Availability Group in one click.

Adding a new Mailbox role server or extending storage becomes easy.

Measure the High Availability of your system 

GSX Analyzer provides an easy-to-use interface with built-in reports that automatically show how the service provided by the Database Availability Group evolves over a given period (days, weeks, months, years).
Here is the list of out-of-the-box Database Availability Group availability statistics:

  • DAG Up / Down in count, seconds, time, with or without maintenance, during Business and/or Off Hours
  • Longest Downtime
  • Outages, with or without maintenance, during Business and/or Off Hours.
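
To make these statistics concrete, the following sketch derives outage count, up and down time, longest downtime and percent up from a list of outage intervals. It is an illustration under stated assumptions, not the GSX Analyzer calculation: the `availability_stats` function and its input format are hypothetical, outages are assumed not to overlap, and the Business / Off Hours split is left out.

```python
from datetime import datetime


def availability_stats(outages, period_start, period_end, exclude_maintenance=False):
    """Compute availability figures for one reporting period.

    outages: list of (start, end, is_maintenance) tuples, assumed non-overlapping
    exclude_maintenance: if True, planned maintenance does not count as downtime
    """
    durations = []
    for start, end, is_maintenance in outages:
        if exclude_maintenance and is_maintenance:
            continue
        # Clip each outage to the reporting period.
        start, end = max(start, period_start), min(end, period_end)
        if end > start:
            durations.append((end - start).total_seconds())

    total = (period_end - period_start).total_seconds()
    down = sum(durations)
    return {
        "outages": len(durations),
        "down_seconds": down,
        "up_seconds": total - down,
        "longest_downtime_seconds": max(durations, default=0.0),
        "percent_up": 100.0 * (total - down) / total,
    }


# Example: one day with a 30-minute incident and a 1-hour maintenance window.
day_start = datetime(2024, 1, 1)
day_end = datetime(2024, 1, 2)
outages = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30), False),
    (datetime(2024, 1, 1, 22, 0), datetime(2024, 1, 1, 23, 0), True),
]
with_maintenance = availability_stats(outages, day_start, day_end)
without_maintenance = availability_stats(outages, day_start, day_end, exclude_maintenance=True)
# with_maintenance["percent_up"] is 93.75 (5400 s down out of 86400 s)
```

Reporting the same period with and without maintenance shows how much of the downtime was planned.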


Preventing Usage and Capacity Issue

Database statistics are more critical than ever, while per-server statistics are no longer as relevant. What is important is the availability of the information rather than where it sits.

To check SLAs and the general availability of services, what really matters is the DAG (Database Availability Group), which groups all the databases into a general pool of database stores. Here are the statistics provided by GSX Analyzer, with which an administrator can easily report, compare, trend and forecast to prevent capacity issues:

  • Number of healthy Mailbox Databases per DAG
  • Percent of healthy Mailbox Databases per DAG
  • Number of Mounted Mailbox Databases per DAG
  • Percent of Mounted Mailbox Databases per DAG
  • Number of mailboxes per DAG
  • Average mailbox size per DAG
  • Number of Mailbox Databases (per DAG)
  • Total Disk Space usage for all Mailbox Databases (per DAG)

Statistics on the number and average size of mailboxes let the administrator control sizing and plan capacity at the DAG level for the total user population. They have to be correlated with the number of databases and the storage used in order to decide whether the rules in place are still relevant at the DAG level.
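
The per-DAG figures listed above are simple aggregates over the databases in the group. The sketch below shows one way to compute them, assuming a hypothetical input of one record per Mailbox Database. The `dag_capacity` function and field names are illustrative only, and the average mailbox size is approximated here as total database size divided by mailbox count, which may differ from how GSX Analyzer measures it.

```python
def dag_capacity(databases):
    """Aggregate per-database figures into per-DAG capacity statistics.

    databases: list of dicts with keys
        "healthy" (bool), "mounted" (bool), "mailboxes" (int), "size_mb" (float)
    """
    count = len(databases)
    healthy = sum(1 for d in databases if d["healthy"])
    mounted = sum(1 for d in databases if d["mounted"])
    mailboxes = sum(d["mailboxes"] for d in databases)
    total_mb = sum(d["size_mb"] for d in databases)
    return {
        "databases": count,
        "healthy": healthy,
        "percent_healthy": 100.0 * healthy / count if count else 0.0,
        "mounted": mounted,
        "percent_mounted": 100.0 * mounted / count if count else 0.0,
        "mailboxes": mailboxes,
        "total_size_mb": total_mb,
        # Approximation: database size spread evenly over its mailboxes.
        "avg_mailbox_size_mb": total_mb / mailboxes if mailboxes else 0.0,
    }


# Example: a DAG with two databases, one of which is unhealthy and dismounted.
stats = dag_capacity([
    {"healthy": True, "mounted": True, "mailboxes": 100, "size_mb": 50000.0},
    {"healthy": False, "mounted": False, "mailboxes": 50, "size_mb": 25000.0},
])
# stats["percent_healthy"] is 50.0 and stats["avg_mailbox_size_mb"] is 500.0
```

Tracking these values over time is what makes trending and forecasting at the DAG level possible.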

Measure the performance of your mailbox databases at the database level

All of the following statistics have to be correlated with database statistics at the database level:

  • Database: Up 24h
  • Database: Up 24h without maintenance
  • Database: Up business hours without maintenance
  • Database: Down 24h
  • Database: Down 24h without maintenance
  • Database: Down business hours without maintenance
  • Database: % up 24h (Day / Week / Month)
  • Database: % up 24h without maintenance (Day/Week/Month)
  • Database: % up business hours without maintenance (Day/Week/Month)

 

All of the mailboxes are stored in databases, and most of the time they inherit the characteristics of their database (quotas, for example). A decrease in these figures over time can be due to recurring issues, network or SAN problems, etc. These figures should reflect the mailbox availability defined in your SLAs.
