GSX Solutions is the premier specialist in Office 365 service performance monitoring and optimization. Our Robot Users have worked in many enterprises to measure and analyze their end-user experience.
Large oil and gas company
- 50 locations, most of them remote sites in Africa and the Middle East.
- Migrated from Exchange on-premises to Exchange Online.
- Uses SharePoint on-premises.
- Plans to deploy Microsoft Teams for collaboration and Voice across all locations.
- Sensitive to the network-management challenges and infrastructure stress that Microsoft Teams might introduce.
- Routes to Office 365 by connecting the remote locations to headquarters via VPN.
- Tenant in EMEA.
Deploying Microsoft Teams can be risky, especially if you don’t prepare well. The primary challenge is to understand the network, its current usage and capabilities in order to predict the impact of the change. With dozens of remote locations scattered around Africa and the Middle East, the Company’s IT management team knew that an extensive preventive assessment was critical to:
- Identify locations that might experience problems.
- Pinpoint what can be done to improve the level of service.
- Calculate the ROI of identified infrastructure updates/improvements.
- Select which sites will benefit from Microsoft Teams.
- Determine which sites will have to stick to satellite communications or landlines.
The network usage of each site varies over the day, week, and month depending on the level of activity. IT management needed to continuously monitor performance statistics as recommended by Microsoft. However, Microsoft’s free assessment tools are not designed to conduct an evaluation over long periods of time or in so many locations at the same time. In urgent need of a solution, the company contacted GSX Solutions for assistance in early 2019.
The challenge was to continuously test, collect and analyze performance data from 27 remote locations across Africa and the Middle East over a period of two months. We sat down with Microsoft’s Teams engineering team and decided on a step-by-step plan.
Since GSX Robot Users can be seamlessly rolled out remotely from a central interface, the setup went very quickly. Within a week, all 27 locations were equipped with Robot Users, installed either on laptops or Windows sticks. All Robot Users were configured to monitor key metrics recommended by Microsoft.
The complete set of tests is performed every five minutes, stored in a SQL database, and displayed on a Power BI dashboard specially designed for Microsoft Teams readiness assessment. Data includes:
- Latency, both one way from the Robot User to the endpoint and round trip. This is key to understanding whether communication will be smooth, without a ‘satellite phone-like’ feeling.
- Packet loss and packet reorder, to see whether interruptions or distorted voice quality occur.
- Jitter (inter-arrival jitter), to make sure there is no delay, distortion, speed-up or slowdown during calls.
- Network MOS, to forecast the overall audio quality score.
- DNS resolution time, to make sure the DNS configuration won’t impact overall performance.
- Traceroute, to measure the consistency of the number of hops between locations and the Microsoft datacenter.
- Microsoft Teams login time from every location.
- A real Microsoft Teams Voice call that collects quality information for the duration of the call.
- Bandwidth estimation during the call.
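To illustrate two of the metrics above, here is a minimal sketch of how a monitoring agent might derive inter-arrival jitter and packet loss from a series of probe results. The function names and sample values are illustrative assumptions, not GSX’s actual implementation.

```python
def inter_arrival_jitter(rtts_ms):
    """Mean absolute difference between consecutive RTT samples, in ms.

    A simple approximation of inter-arrival jitter: large swings between
    consecutive samples indicate an unstable path.
    """
    if len(rtts_ms) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return sum(diffs) / len(diffs)


def packet_loss_pct(sent, received):
    """Packet loss as a percentage of probes sent."""
    return 100.0 * (sent - received) / sent if sent else 0.0


# Illustrative probe run: five RTT samples (ms), 100 probes sent, 99 answered.
samples = [48.2, 51.0, 47.5, 49.9, 55.3]
jitter = inter_arrival_jitter(samples)
loss = packet_loss_pct(100, 99)
```

In a real agent these values would be computed per 15-second window, matching the intervals used in Microsoft’s thresholds.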
As Microsoft recommends, the tests were performed both directly from the client Edge to the Microsoft datacenter and from the enterprise Edge in Paris to the Microsoft datacenter.
Once everything was set up, the Power BI dashboard gave us a clear view of the statistics, allowing us to identify the network usage patterns across the 27 locations over the two-month assessment.
We advised analyzing the network load to identify the peak times for each day, week and month.
To determine whether a site was suitable for Microsoft Teams, it was decided that the network tests should succeed during the daily and weekly peak times.
A failure that occurred only during a monthly peak was investigated to clarify whether the required infrastructure improvement would provide sufficient ROI.
Our Robot Users tests showed that the daily peak was experienced between 7 AM and 8 AM local time. The weekly high was on Thursdays (Friday is part of the weekend in most Middle Eastern countries) and the monthly workload peaked in the first week of the month.
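The peak-time analysis described above can be sketched as a simple bucketing exercise: group the load samples by hour of day or weekday and pick the busiest bucket. The function and sample data below are illustrative assumptions, not the actual GSX dataset.

```python
from collections import defaultdict
from datetime import datetime


def peak_bucket(samples, key_fn):
    """Average the load per bucket and return (bucket, average) for the
    bucket with the highest average load."""
    totals = defaultdict(lambda: [0.0, 0])  # bucket -> [sum, count]
    for ts, load in samples:
        bucket = totals[key_fn(ts)]
        bucket[0] += load
        bucket[1] += 1
    return max(((k, s / n) for k, (s, n) in totals.items()),
               key=lambda kv: kv[1])


# Illustrative samples: (timestamp, network load %).
data = [
    (datetime(2019, 3, 7, 7, 15), 92.0),   # Thursday, 07:15
    (datetime(2019, 3, 7, 13, 0), 41.0),   # Thursday, 13:00
    (datetime(2019, 3, 8, 7, 30), 20.0),   # Friday (weekend in the region)
]

peak_hour, _ = peak_bucket(data, lambda t: t.hour)            # busiest hour
peak_day, _ = peak_bucket(data, lambda t: t.strftime("%A"))   # busiest weekday
```

The same grouping, applied with a week-of-month key, yields the monthly peak.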
Out of 27 locations tested, 5 failed to meet Microsoft performance recommendations during the peak time of the month; 11 failed during the peak time of the week, and 5 failed the daily tests. Only 6 locations performed fully within the recommended guidelines during the entire test period.
What does ‘failed’ mean, exactly?
Following Microsoft’s recommendations, here are the success thresholds:

| Metric | Client to Microsoft Edge | Customer Edge to Microsoft Edge |
|---|---|---|
| Latency (one way) | < 50 ms | < 30 ms |
| Latency (RTT / round-trip time) | < 100 ms | < 60 ms |
| Burst packet loss | < 10% during any 200 ms interval | < 1% during any 200 ms interval |
| Packet loss | < 1% during any 15 s interval | < 0.1% during any 15 s interval |
| Jitter (inter-arrival) | < 30 ms during any 15 s interval | < 15 ms during any 15 s interval |
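Applying the client-to-Microsoft-Edge column of this table as a pass/fail check can be sketched as follows. The dataclass and field names are illustrative assumptions, not GSX’s actual schema.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    latency_one_way_ms: float
    latency_rtt_ms: float
    burst_loss_pct: float   # worst 200 ms interval
    loss_pct: float         # worst 15 s interval
    jitter_ms: float        # worst 15 s interval


def meets_client_edge_targets(r: ProbeResult) -> bool:
    """True if the probe satisfies Microsoft's client-to-Edge thresholds."""
    return (r.latency_one_way_ms < 50
            and r.latency_rtt_ms < 100
            and r.burst_loss_pct < 10
            and r.loss_pct < 1
            and r.jitter_ms < 30)


# A healthy site passes; a site with high latency and loss fails.
ok = meets_client_edge_targets(ProbeResult(35, 80, 4, 0.5, 12))
bad = meets_client_edge_targets(ProbeResult(70, 140, 12, 2, 35))
```

A site is then marked as failing a daily, weekly, or monthly test when this check fails during the corresponding peak window.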
We won’t provide the details of the entire dataset or metrics of each site here, but we will explain how we analyzed our results with the company’s IT management.
The five sites failing daily tests all had a MOS below 3.
With these results in mind, the company’s IT management was ready to look into a set of potential network tweaks to see if they could improve these sites without spending too much money.
First, we recommended that they work on their route to the cloud.
Management had to be convinced that the extra layer of network infrastructure that they had put in place was actually unnecessary. So, we asked a Microsoft expert to show how Microsoft Teams communication is encrypted out-of-the-box, and how using a proxy and a VPN only affects performance without adding any extra security.
To provide evidence supporting Microsoft’s claim, we compared the performance of one Robot configured with the VPN connection to headquarters against another Robot breaking out directly to the Internet to access the closest Microsoft access point.
Results were clear. The Robot directly connected to Microsoft provided calls with 30% better Voice quality than the one connected to the headquarters.
All 16 sites that failed the weekly or daily tests were then set up to break out to the Internet directly and connect to a Tenant in the Middle East. (Previously they had been configured to go through a VPN and a proxy to access a Tenant in Europe.) On top of that, network QoS was implemented wherever possible within the network. Our Robot Users monitored the effects of these changes, providing a new set of performance metrics. It turned out that merely adjusting the path to the Microsoft datacenter considerably improved the situation for 11 of the 16 sites. New global results showed 15 sites now passing monthly, weekly and daily tests; six failing the monthlies, three failing weeklies and only two failing the dailies.
A second round of network performance and configuration assessment was initiated to resolve the remaining site issues. With GSX Robot Users we were able to determine that two of the locations had firewall issues. These were spotted due to the abnormal values for the round-trip time. Three other sites, including the two that were failing daily tests, had bandwidth issues. Their high percentage of packet loss was clearly linked to network constriction problems. The remaining site checked out from a network standpoint.
So, the final results after the network improvements showed two sites failing daily tests, one failing weeklies, six failing monthlies and 15 sites succeeding. IT management went through an ROI analysis for the sites failing the weekly and daily tests to consider bandwidth improvements.
It was concluded that the two sites failing daily tests would not benefit from Microsoft Teams Voice, because the cost of the required network improvements was too high for the number of users at those sites; however, Teams collaboration features could still be deployed there. The site failing weekly tests would get additional bandwidth from a new ISP to obtain full Microsoft Teams services. All other sites were ready for deployment.
As you can see, the Microsoft Teams readiness assessment was necessary.
Simple changes clearly identified by GSX Robot Users improved the audio experience and overall user satisfaction. By systematically monitoring and analyzing crucial metrics, the company avoided a lot of trouble before, during and after the deployment, while controlling their expenses. Smart decisions were taken based on facts. This assessment led to broader adoption and ultimately an overall reduction of the TCO of Office 365 services.