In this RoboTech we will cover a very important topic for organizations using Cloud services, especially Office 365.
We will look at contracts, the guarantees they include, and how to optimize your relationship with Microsoft.
With Software as a service (SaaS) like Office 365, the role of the admin team has shifted dramatically from infrastructure support and maintenance to service management.
The purpose of IT Service Management is to manage end-user expectations, reduce the number of issues, reduce the mean time to repair, and ensure the best possible service delivery to your users.
So let’s take a look at Microsoft contracts.
Microsoft Office 365 Contract Management
Contracting with Microsoft for Office 365 gives you a best-in-class collaboration SaaS application.
It comes with several guarantees in terms of service availability from a Microsoft perspective.
An SLA is a kind of insurance against service disruption, so the first thing to do is to understand the limitations of this insurance.
Basically, they concern:
► Anything happening outside of reasonable control (force majeure)
► Anything happening outside of their datacenter
► Anything that has been caused by your company (disregarding Microsoft recommendations, bad configuration, inadequate bandwidth, unauthorized actions, etc.)
► Any downtime happening during scheduled downtime
To be clear, the service delivered to your end-users is NOT guaranteed by the Microsoft SLA. And that is completely normal, as Microsoft is not running your network, your ISP or anything inside your infrastructure.
Only the service delivered to the edge of their datacenter is guaranteed, provided that you did not contribute to the failure.
So now that we understand what is excluded, let’s understand what the famous 99.9% means for you.
Microsoft calculates a downtime ratio based on your total number of user minutes of use of the service.
The calculation is: [(User Minutes – Downtime Minutes) / User Minutes] *100
As downtime only counts for users that are impacted, you might surmise that you need a big incident to go under 99.9% of availability.
Let’s do a short calculation for a company with 10,000 users:
In order to breach the 99.9% SLA in a 31-day month (44,640 minutes per user, so 446,400,000 user minutes in total), you would need more than 446,400 user minutes of downtime.
That means an incident of almost 45 minutes for all 10,000 users, or almost 7.5 hours for 1,000 of them.
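The arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes that user minutes are the minutes in the month (less any scheduled downtime) multiplied by the number of users, and the function names are ours, not Microsoft's.

```python
MINUTES_PER_DAY = 24 * 60


def user_minutes(users: int, days_in_month: int = 31,
                 scheduled_downtime_minutes: int = 0) -> int:
    """Minutes in the month, less scheduled downtime, times the number of users."""
    return (days_in_month * MINUTES_PER_DAY - scheduled_downtime_minutes) * users


def availability_percent(total_user_minutes: int,
                         downtime_user_minutes: float) -> float:
    """[(User Minutes - Downtime Minutes) / User Minutes] * 100."""
    return (total_user_minutes - downtime_user_minutes) / total_user_minutes * 100


def downtime_budget(total_user_minutes: int, target_percent: float = 99.9) -> float:
    """Downtime user minutes at which availability sits exactly on the target."""
    return total_user_minutes * (100 - target_percent) / 100


# Example: 10,000 users over a 31-day month.
total = user_minutes(10_000)
budget = downtime_budget(total)

print(f"Total user minutes: {total:,}")
print(f"Downtime budget at 99.9%: {budget:,.0f} user minutes")
# A 45-minute incident hitting 1,000 users versus all 10,000 users.
print(f"45 min x 1,000 users:  {availability_percent(total, 45 * 1_000):.4f}%")
print(f"45 min x 10,000 users: {availability_percent(total, 45 * 10_000):.4f}%")
```

Because downtime only counts for impacted users, the same 45-minute incident weighs ten times less when it hits a tenth of the user base.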
We now understand the general limitations of the contract and what is generally insured. Now let’s look at services.
Microsoft Services Guaranteed
Microsoft Exchange Online
For your users, and therefore for your ITSM, Exchange Online encompasses a wide range of actions including accessing mailboxes from Outlook, sending email, creating meetings, checking free/busy statuses, searching for mail in the mailbox, etc.
But in the Microsoft SLA, the only service guaranteed for Exchange Online is the ability to send or receive email with Outlook Web Access.
Here we are speaking about availability only, not about performance.
If the service is slow, it is still considered up from an SLA perspective, even if your users might consider it down.
Any other Microsoft Exchange feature is excluded from the SLA.
Let’s continue with Microsoft Teams.
The calculation is the same, but it only covers the ability for a user to read or post to a chat conversation for which they have appropriate permissions.
Nothing regarding calls, video sharing, etc.
If we look at OneDrive, the only service guaranteed is the ability for a user to view or edit files that are stored on their personal OneDrive for Business Storage.
If we look at Microsoft SharePoint Online, the SLA is similar and consists of the ability for a user to read or write any portion of a SharePoint Online site collection for which they have appropriate permissions.
Now things are a bit different with Skype for Business Online.
It is one of the only services to also have a kind of performance SLA. There are three SLAs for Skype.
► The first is based on the ability for a user to see presence status, conduct instant messaging conversations, or initiate online meetings.
► The second is based on PSTN Calling and Conferencing, guaranteeing the ability of a user to initiate a PSTN call or conference.
► And finally, the last SLA on Skype is about Voice quality.
For voice quality, Microsoft basically calculates a Network MOS that predicts what would be the end-user call quality ranking.
They then check how long these poor-quality calls last and provide a ratio with the total number of user minutes in a month.
The network MOS is based on a constant measurement of the round-trip time, packet loss, jitter and concealment factors. The calls need to be placed on Skype for Business Certified IP desk phones on wired Ethernet.
Any latency found on your own network would prevent you from claiming any credit in case of a major issue.
If you want more details on Skype for Business Voice quality monitoring, you can:
At this point, to maximize your relationship with Microsoft we would recommend:
► Read your contract and make sure you differentiate between what Microsoft promises you and what you are promising to your business lines.
► Microsoft SLAs are a good starting point but cannot be the basis for your Service Delivery.
► 90% of the time, the root cause of a user’s performance issue will be found outside Microsoft’s range of responsibility. So, you need to implement Cloud Service Delivery best practices to deal with the end-to-end service delivery to your end-users.
But let’s say that you have identified issues and you want to talk with Microsoft. What is required for that?
How to let Microsoft help you
To help you, Microsoft needs to have a certain number of statistics and facts:
► A detailed description of the incident
► Information regarding time and duration of the downtime
► Number of locations and affected users
► Description of your attempts to resolve the incident
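The four items above can be kept together in one small record, so that nothing is missing when a ticket is opened. The sketch below is purely illustrative: the `IncidentReport` class, its field names, and the sample values are our own assumptions, not a format that Microsoft requires.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class IncidentReport:
    """Hypothetical container for the facts listed above."""
    description: str                            # detailed description of the incident
    started_at: datetime                        # time of the downtime
    duration: timedelta                         # duration of the downtime
    affected_users_by_location: dict[str, int]  # locations and affected users
    resolution_attempts: list[str] = field(default_factory=list)

    @property
    def ended_at(self) -> datetime:
        return self.started_at + self.duration

    @property
    def affected_users(self) -> int:
        return sum(self.affected_users_by_location.values())

    @property
    def downtime_user_minutes(self) -> float:
        """Affected users multiplied by the incident duration in minutes."""
        return self.affected_users * self.duration.total_seconds() / 60

    def summary(self) -> str:
        locations = ", ".join(
            f"{name} ({count} users)"
            for name, count in self.affected_users_by_location.items()
        )
        attempts = "; ".join(self.resolution_attempts) or "none recorded"
        return (
            f"Incident: {self.description}\n"
            f"Start: {self.started_at.isoformat()}, duration: {self.duration}\n"
            f"Locations: {locations}\n"
            f"Affected users: {self.affected_users}\n"
            f"Resolution attempts: {attempts}"
        )


# Made-up sample values, for illustration only.
report = IncidentReport(
    description="Users cannot send or receive email with Outlook Web Access",
    started_at=datetime(2024, 1, 14, 3, 0, tzinfo=timezone.utc),
    duration=timedelta(minutes=45),
    affected_users_by_location={"Paris": 600, "Boston": 400},
    resolution_attempts=["Checked local network and proxy", "Tested from a second site"],
)
print(report.summary())
```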
The question is then, how do you collect this information?
How do you know that the service was possibly down on a Sunday at 3 am if you are not constantly monitoring it?
These questions point to the necessity of monitoring, from a Microsoft service perspective and from an end-user perspective.
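To make the point concrete, here is a minimal sketch of how regular check results can be turned into downtime windows. It assumes that some probe (not shown here) records a timestamp and a pass/fail result at a fixed interval; the `Sample` type and the `downtime_windows` function are hypothetical names used for illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class Sample(NamedTuple):
    checked_at: datetime
    ok: bool


def downtime_windows(samples: Iterable[Sample]) -> list[tuple[datetime, datetime]]:
    """Group consecutive failed checks into (first_failure, next_success) windows.

    A window that is still open at the last sample is closed at that
    sample's timestamp, so its real duration may be longer.
    """
    windows: list[tuple[datetime, datetime]] = []
    start = None
    last = None
    for sample in sorted(samples, key=lambda s: s.checked_at):
        if not sample.ok and start is None:
            start = sample.checked_at                   # outage begins
        elif sample.ok and start is not None:
            windows.append((start, sample.checked_at))  # outage ends
            start = None
        last = sample.checked_at
    if start is not None and last is not None:
        windows.append((start, last))
    return windows


# Checks every 5 minutes from 02:50 on a Sunday morning (made-up results).
first_check = datetime(2024, 1, 14, 2, 50)
results = [True, True, False, False, False, True, True]
samples = [
    Sample(first_check + timedelta(minutes=5 * i), ok)
    for i, ok in enumerate(results)
]

windows = downtime_windows(samples)
for began, ended in windows:
    print(f"Down from {began} to {ended} ({ended - began})")
```

The precision of each window is limited by the check interval: with a check every 5 minutes, an outage can start or end up to 5 minutes away from the recorded time.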
We recommend that you always report your outages and performance issues.
But to do so, you will need statistics.
There is an easy way and a hard way to get those statistics.
► The hard way - Deploy and maintain complex scripts running from every location, alerting you when an issue arises and feeding databases that can be easily used to share the data with Microsoft.
► The easy way - Use a third-party tool, like GSX Gizmo Robot User for Office 365.
The GSX Robot Users are small Windows services that you can install anywhere you want. The Robot Users act exactly as a user would do on Office 365, performing complex end-user scenarios.
They alert you in case of any availability or performance issue and provide all the data you need through Power BI or any other BI solution.
To know more about the GSX Robot User, please read this article >>
Finally, before contacting Microsoft, go through incident analysis to make sure that you are not responsible for what is happening.
Here is an example of an Exchange Online Service Level dashboard that you can easily get in Power BI with the GSX Gizmo Robot User data.
What we can see here is that you can have the service delivered per location, but also per action that your users are performing.
With the convenience of the GSX Service level dashboard, you not only have the service availability information, but also vital information about the performance you deliver and reach on a daily or monthly basis.
It is also a perfect way to share your data with Microsoft to help them help you.
We built these dashboards using Microsoft and Gartner recommendations that we are now about to detail.
Now that we’ve seen the benefits and limitations of the contract with Microsoft, we understand that you need to go a step further if you want to ensure Service delivery to your end-users.
The next RoboTech will focus on what Microsoft recommends to help you manage the service delivered to your end-users.