While working with enterprise scale companies, we faced a lot of different network configurations. Some obviously had an impact on end-user experience.
To share our findings with you, we reproduced some cases and measured how they impact the performance of the Office 365 services.
As usual, we used our Robot Users deployed around the world to measure end-user experience metrics. For this test, we again ran 10 Robot Users from different locations (Azure, Bangalore, Boston, Nice, Philadelphia) with different configurations (with or without a proxy, connecting to headquarters in Europe or not, etc.).
For more information about how the Robot Users are providing you with end-user experience metrics, please read this article >>
Configuring the best entry & egress point
The first important point to address in your Office 365 deployment is your entry point. How you access your tenant, and where you access it from, is extremely important from a configuration standpoint.
You probably know that if you just ping Office 365 from whatever region you are in, you’ll see the latency results of your specific local region. It is one test, but it clearly doesn’t help much when it comes to understanding what goes on from an end-user perspective.
Let’s see how you can really detect issues within the configuration with real end-user experience metrics.
The first test results that we analyze here are the “Create Meeting” end-user experience.
The findings here are that if we just analyze an “open mailbox,” everything looks fine. However, we might be able to see different results if we look into more actions.
So let’s dig into these other actions to see whether or not “open mailbox” is a sufficient measure to test tenant level connections.
At the top we see a line chart that compares the performance of every Robot User over time.
On the bottom left, we have the list of all the Robot Users (here the active ones are one running in Microsoft Azure, one in Bangalore, two in Boston and one in Nice). The bottom middle shows a few statistics that are calculated to measure the severity of connection issues.
The square chart on the bottom right compares the average time taken by each Robot User to perform the selected action (in this case create meeting).
To learn more about how to interpret the statistics we display in Power BI dashboards, please read this article >>
The performance of the Bangalore Robot User (black) accessing the European tenant is not as good as that of the USA ones (red & yellow) accessing the same European tenant. The best performers are the Robot User located in Europe accessing a European tenant (the green one) and, of course, the one sitting in Microsoft Azure connecting to Office 365.
Clearly, performance varies depending on where your users are and where the entry point is.
Let’s focus the test on one location configured with 2 different routes in order to confirm our results.
From Boston, we performed 2 types of tests using dedicated mailboxes.
One Robot User is trying to access a European tenant and the second one is accessing the US-based tenant.
We measured 450 ms latency between Boston and the European tenant, compared to 390 ms when Boston reaches a US tenant.
So we are picking up roughly 15% more latency (60 ms) just because the Robot User came onto a US entry point and then traversed the Microsoft network from that US entry point over to Europe, where the mailbox is located.
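As a quick check on the arithmetic, the relative overhead can be computed from the two latency figures quoted above. This is only a minimal sketch: the function name `relative_overhead` is ours for illustration and is not part of any tool mentioned in this article.

```python
def relative_overhead(baseline_ms: float, observed_ms: float) -> float:
    """Return the extra latency of observed_ms as a fraction of baseline_ms."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (observed_ms - baseline_ms) / baseline_ms


# Figures from the Boston test: the US tenant is the baseline,
# the European tenant is the observed (longer) route.
overhead = relative_overhead(baseline_ms=390, observed_ms=450)
print(f"Extra latency: {450 - 390} ms ({overhead:.1%})")
```

Note that the result depends on which route you treat as the baseline, so it is worth stating that choice whenever you quote a percentage.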
You should definitely consider these parameters when organizing your tenant and entry point worldwide. Limiting the distance between users and their mailbox is always a good idea to increase end-user performance.
Let’s now observe a typical instance that happens all the time with enterprise grade companies.
Connection through Headquarters
Like most companies worldwide, we organized our tests to mirror an organization with offices in different locations. In our example, we performed 2 different tests from the same location: both Robot Users are located in Philadelphia and both connect to a tenant in Europe.
So you can think of a company with global headquarters in Europe and each of the branch offices connecting to that location and then breaking out to the internet from different parts of the world.
We saw this situation with a lot of the customers we assessed who experienced performance issues: internet traffic is held back to corporate and then sent out from corporate.
To show you the impact, we reproduced the situation with one Robot User connecting directly to the Internet in Philadelphia in order to reach the Microsoft network in the USA. Its traffic then traveled across the ocean directly on the Microsoft network.
We had the second Robot User connect to the headquarters and then break out to the Internet to access the European tenant.
You can see on the dashboard that we also consider the hops, following the popular notion that the fewer hops there are, the better the connection is.
Here, the Robot User in Philadelphia that connects to the Microsoft endpoint in the USA and travels on the Microsoft network actually has 26 hops end to end.
The one that connects to headquarters in Europe and then breaks out to the Internet has 15 hops. This could lead us to think that the latter is faster than the first Robot User (RU2).
But that was not necessarily the case.
Let’s take a look at the data.
In green we see Robot User 1, configured to route its internet connection back through headquarters in Europe.
In black is Robot User 2, configured to access the Internet as soon as it can in Philadelphia.
We can see the results in line graphs in the middle, action per action. In the square graph, each square represents the average performance of one action for one Robot User. This type of graph allows an easy visual comparison of Robot User performance.
At the bottom, we can see two bar graphs that show the number of times a certain action has reached the limit of acceptable performance.
We defined the acceptable limit as three seconds for the free / busy lookup and the “create meeting” features. This three-second threshold came from our observations of customers, linking the end-user performance data with opened tickets.
Working with customers and their environments, we found that if users repeatedly have to wait 3 seconds for a simple free / busy lookup to create a meeting, they lose patience and start to open tickets.
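To make the threshold idea concrete, here is a minimal sketch of how breaches of the three-second limit could be counted from a list of measured durations. The function name, the sample values and the choice to count a duration of exactly 3 seconds as a breach are our assumptions for illustration; this is not how the Robot Users or the dashboards are actually implemented.

```python
from statistics import mean

# Acceptable limit described in this article for free / busy lookup and create meeting.
ACCEPTABLE_LIMIT_S = 3.0


def summarize_action(durations_s, limit_s=ACCEPTABLE_LIMIT_S):
    """Return the average duration and the number of samples at or above the limit."""
    if not durations_s:
        raise ValueError("durations_s must not be empty")
    # Assumption: a sample that reaches the limit exactly counts as a breach.
    breaches = sum(1 for d in durations_s if d >= limit_s)
    return {"average_s": mean(durations_s), "breaches": breaches}


# Hypothetical samples in seconds, invented for illustration only.
samples = [0.8, 1.1, 3.4, 0.9, 3.0, 0.7]
print(summarize_action(samples))
```

Looking at both numbers together matters: an average can stay well under the limit while individual actions still cross it often enough to frustrate users.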
First, we looked at the open mailbox action. But it doesn’t matter as much, because the complaints and support tickets come mostly from the actions that users actually perform, such as looking at free / busy statuses or creating a meeting.
The free/busy lookup, again, really shows the end-user when there is a performance issue because they can actually see the waiting bars filling in each time they try to create a meeting; this is especially true with multiple attendees.
Going from Philadelphia to headquarters with the 15 hops gives an average of 1 second per free / busy lookup. Over two days, the free / busy lookup exceeded the end-user acceptable performance limit 16 times.
Going from Philadelphia directly to the internet, using a Microsoft entry point in the USA, gives an average of about 0.6 seconds per lookup, meaning about 40% faster! The number of times the performance was out of our acceptable range in this instance was around 8.
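The comparison between the two routes can be reproduced from the figures quoted above. The sketch below is illustrative only: the function and variable names are ours, and the figures are the rounded values from this article, not raw measurements.

```python
def percent_reduction(slower: float, faster: float) -> float:
    """Return how much smaller `faster` is than `slower`, as a percentage of `slower`."""
    if slower <= 0:
        raise ValueError("slower must be positive")
    return (slower - faster) / slower * 100


# Rounded figures quoted in this article for the free / busy lookup from Philadelphia.
via_headquarters = {"average_s": 1.0, "breaches": 16}
direct_breakout = {"average_s": 0.6, "breaches": 8}

time_saved = percent_reduction(via_headquarters["average_s"], direct_breakout["average_s"])
fewer_breaches = percent_reduction(via_headquarters["breaches"], direct_breakout["breaches"])
print(f"Average lookup time reduced by {time_saved:.0f}%")
print(f"Threshold breaches reduced by {fewer_breaches:.0f}%")
```

Here the slower route is the reference, which is why a drop from 1 second to 0.6 seconds reads as a 40% improvement.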
This impacted the "create meeting" function as well where we were able to see a difference of 50 to 60% in the end-user experience.
This data is really important when looking at what you can do to improve performance. This scenario shows how easy it can be to improve end-user performance with a simple correction of the network configuration.
Microsoft will always have the best network and choosing one network configuration over the other can make a tremendous difference in the end-user experience.
Verify our findings
In order to make sure that these results were not skewed by a bad local network at the headquarters, we added another Robot User that operates from headquarters in Europe and connects to the tenant.
You can see the new results in red:
It is clearly the fastest, which confirms that there is no issue with the headquarters network. But we also see that it is not that far ahead of the one in Philadelphia connecting to the Internet in the USA (the headquarters Robot User is only about 15% faster).
So the conclusion in this case is that going onto the Microsoft network as quickly as you can is the best thing you can possibly do.
Our main advice is to break out to the internet locally, toward a tenant located near you geographically, and hand your packets off to Microsoft as quickly as you possibly can.
We’ve seen that the route configuration between your locations and the Office 365 datacenter definitely has an impact on end-user performance.
Working on that is a good first step to improve the overall performance of the Microsoft Cloud services.
If you want to know more about improving your Exchange Online performance, please take a look at our RoboTech articles here >>