If you have ever googled SharePoint Online performance, you will have seen that you are not alone in wondering how to ensure end-user satisfaction on both SharePoint and OneDrive workloads.
Microsoft support, TechNet and other community sites are full of questions on these topics.
As you may have experienced, it is sometimes complicated when using SharePoint to understand where the latency is coming from and who is responsible for it.
This GSX Robotech article provides some tips and tricks to understand and improve the SharePoint Online and OneDrive service delivery for your users.
Detecting and acknowledging local issues
The first requirement to manage the end-user experience is to measure it. This establishes a baseline, location by location, based on what is good for each of them.
Not every location or user has the same definition of performance. What is acceptable in one location may not be acceptable in another.
A baseline provides historical statistics and predicts performance.
For that, you need to measure the end-user experience constantly, at very short intervals, over days, weeks and sometimes months, to identify every performance issue and every recurring peak or change.
This baseline, by location, is extremely helpful when it comes to understanding end-user performance issues.
At the same time, you need to collect network statistics that correlate with the end-user experience.
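As an illustration only, a per-location baseline can be sketched in a few lines of Python. This is not how the GSX Robot Users compute their baseline; the function names, the sample format and the choice of the 95th percentile as the "normal" ceiling are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median, quantiles


def build_baseline(samples):
    """Summarize (location, latency_ms) samples into one baseline per location."""
    by_location = defaultdict(list)
    for location, latency_ms in samples:
        by_location[location].append(latency_ms)

    baseline = {}
    for location, values in by_location.items():
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
        p95 = quantiles(values, n=20)[-1] if len(values) >= 2 else values[0]
        baseline[location] = {"median_ms": median(values), "p95_ms": p95}
    return baseline


def is_degraded(baseline, location, latency_ms):
    """Assumed rule: a sample is degraded when it exceeds its own location's p95."""
    return latency_ms > baseline[location]["p95_ms"]


# Hypothetical measurements: the same 300 ms is slow for one site, normal for another.
samples = [
    ("Nice", 100), ("Nice", 120), ("Nice", 110),
    ("Boston", 300), ("Boston", 340), ("Boston", 320),
]
baseline = build_baseline(samples)
```

With these made-up samples, a 300 ms action is flagged as degraded in Nice but not in Boston, which is why the baseline has to be kept location by location.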
GSX offers the best possible way to measure your end-user experience and to correlate network metrics thanks to our GSX Robot Users.
Read here to have a better understanding of how our GSX Robots work (link to Blog GSX Robot Users).
Testing the best route to the cloud for SharePoint online and OneDrive End-User experience
To understand the impact of the route to the cloud and your network configuration we have set up multiple Robot Users, performing the same scenarios but with various routes and network configurations.
Our Robot Users are Windows services that can be installed in any location you want. They use Office 365 exactly as a user does. They also collect critical network statistics to help you troubleshoot performance issues.
For the scenario and use cases outlined below:
Our Robot Users were set up to log in, upload and download documents on SharePoint Online and OneDrive.
To understand how the route affects performance, each of these Robots uses a different network configuration:
All of our Robots have run for more than a week, feeding our SQL database with statistics. We can now analyze the data thanks to PowerBI.
To learn more about how to read our PowerBI dashboard, please read this blog
Let’s first review the components of the GSX Gizmo Dashboard before going through the analysis.
This dashboard is one of many we created to provide a comprehensive view of the service delivered to multiple locations.
We can follow the quality of service delivered and the number of issues experienced at each location.
The GSX Gizmo Analytics Dashboard provides 3 key statistics at the top of the page.
The first result is the % of Availability for End Users: the % of time the Robot was able to perform the actions.
The second result displays the % of Performance Delivered to End Users. This counts the % of time the actions were completed within reasonable service performance, in order to measure end-user satisfaction.
Notice the service performance and availability are significantly different.
It is critical to measure service performance from a user’s perspective. To a user, something running slow and something not available are considered the same.
As Gartner says: with SaaS applications, slow is the new down.
The third result shows the % of Network Performance over time, displaying how often the network delivered good conditions during the timeframe considered.
This performance scenario shows that, on average across all the locations, pure availability was within our SLA but performance was below what we wanted to achieve.
While the service was available at the locations almost 99% of the time, the users were able to perform actions with decent performance only 84.57% of the time. One of the obvious reasons is that the network performed well only 74.4% of the time.
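To make the difference between availability and performance concrete, here is a minimal Python sketch of how the two percentages could be derived from recorded actions. The `ActionResult` record, the function names and the 2000 ms threshold are hypothetical; the dashboard's actual formulas are not described in this article.

```python
from dataclasses import dataclass


@dataclass
class ActionResult:
    location: str
    action: str          # e.g. "login", "upload", "download"
    succeeded: bool
    duration_ms: float


def availability_pct(results):
    """Share of attempted actions that completed at all."""
    if not results:
        raise ValueError("no results to evaluate")
    return 100.0 * sum(r.succeeded for r in results) / len(results)


def performance_pct(results, threshold_ms):
    """Share of attempted actions that completed within the latency threshold."""
    if not results:
        raise ValueError("no results to evaluate")
    good = sum(r.succeeded and r.duration_ms <= threshold_ms for r in results)
    return 100.0 * good / len(results)


# Hypothetical data: three actions succeed, but one of them is too slow.
results = [
    ActionResult("Nice", "login", True, 800),
    ActionResult("Nice", "upload", True, 5200),
    ActionResult("Nice", "download", True, 900),
    ActionResult("Nice", "upload", False, 0),
]
availability = availability_pct(results)                  # 75.0
performance = performance_pct(results, threshold_ms=2000)  # 50.0
```

A slow action still counts as available but not as performing, so the performance figure can never be higher than the availability figure.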
The results are entirely different when we look at a particular location.
The next dashboard result on the left displays Locations: by warning/downtime. This allows you to instantly know where to focus to improve your end-user experience.
Notice the Robot User with low bandwidth is the most problematic.
On the right, you can see Action Latency which details the time it took for every action.
This is important when we analyze what is going on location by location.
The next dashboard component displays the Network performance per day.
It displays a graph of the last ten days of network performance.
You can instantly spot the days where you experienced network issues and start to look deeper.
And finally, the bottom of the dashboard provides raw statistics, in milliseconds, for the most important network KPIs to help you troubleshoot the situation.
So now that we understand the dashboard components, let’s look at the use cases.
Case 1: Bandwidth
Within Locations: by warning/downtime, we can click the blue square to display only the results for the Robot User with the bandwidth issue.
This result shows how significantly the bandwidth issue impacts SharePoint performance.
Bandwidth here was down to 30% of the normal ISP offering, running at 100kb/s.
You see that the availability of SharePoint is still at about 95%, but the user experience was not satisfactory. SharePoint actions could be done with normal latency only 33% of the time!
There are two important things to consider.
When we saw that, we checked internally, and the cause was the asymmetric bandwidth that was provided.
The bandwidth is actually good for download but poor for upload. So, if you have this kind of issue, you should check the symmetry of your bandwidth in your regional locations.
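A quick way to reason about bandwidth symmetry is to compare measured download and upload throughput for the same location. The Python sketch below assumes you already have byte counts and durations from your own transfer tests. The function names and the 4:1 ratio used as a warning level are illustrative assumptions, not a recommendation from this article.

```python
def throughput_kbps(bytes_transferred, seconds):
    """Convert a timed transfer into kilobits per second."""
    return bytes_transferred * 8 / 1000 / seconds


def asymmetry_ratio(download_kbps, upload_kbps):
    """How many times faster download is than upload (1.0 means symmetric)."""
    return download_kbps / upload_kbps


def is_asymmetric(download_kbps, upload_kbps, max_ratio=4.0):
    """Flag a link whose download/upload ratio exceeds an assumed limit."""
    return asymmetry_ratio(download_kbps, upload_kbps) > max_ratio


# Hypothetical test: 1 MB downloaded in 8 s, 1 MB uploaded in 80 s.
down = throughput_kbps(1_000_000, 8)    # 1000.0 kb/s
up = throughput_kbps(1_000_000, 80)     # 100.0 kb/s
```

In this made-up case the link is ten times faster in download than in upload, which would hurt document uploads while leaving downloads looking healthy.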
Case 2: Proxy
Like bandwidth, the proxy really does affect end-user performance on SharePoint.
It produces a warning/downtime rate 10 times higher than under normal conditions.
Our Performance SLA is clearly impacted. The service is only good 93% of the time.
We also see that it mostly affects login and upload on SharePoint.
It is important to note that this is not something you would easily detect with traditional network tools. You must be able to track performance at the action level to understand the impact of your route to the cloud.
Therefore, you should stay away from proxies, especially if some locations have some connectivity or network issues.
Case 3: DNS
Now DNS is a bit different. In this case we configured our DNS resolution to occur on the other side of the world. We see that the DNS resolution time is pretty bad on the graph at the bottom of the screen.
Nevertheless, the service provided is still pretty good.
SharePoint is less affected by DNS resolution time issues than by bandwidth or the use of a proxy.
The performance SLA is met, and you can see that every action provides decent performance.
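If you want a rough feel for DNS resolution time from a given machine, a minimal Python sketch could time a single lookup with the standard library. This is an assumption-laden illustration rather than the way the Robot Users collect the metric: the function name is hypothetical, and the operating system may serve repeated lookups from its cache, which makes later calls look faster.

```python
import socket
import time


def time_dns_lookup(hostname, resolver=socket.getaddrinfo):
    """Return the wall-clock time, in milliseconds, of one name resolution.

    The resolver is injectable so the timing logic can be exercised offline.
    """
    start = time.perf_counter()
    resolver(hostname, 443)  # resolve the name for an HTTPS connection
    return (time.perf_counter() - start) * 1000.0
```

For example, `time_dns_lookup("contoso.sharepoint.com")` would time a lookup for a tenant hostname; `contoso` is a placeholder to be replaced with your own tenant name.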
Case 4: Distance to the cloud
Now for the last use case, we will compare the performance of our “Ideal” Robot, which sits in Nice and accesses a US tenant, with our EU-DC Robot, which sits in Nice and accesses a European tenant.
Results were predictable. We know that the shorter the distance between your user and the cloud, the better your performance will be.
Ideal Nice to USA
Ideal USA to USA
The path “Nice to USA” provides 99.3% service performance, while “USA to USA” provides 99.59%.
“Nice to US” is in warning 6% of the time, while “Nice to Europe” is in warning only 5% of the time.
So, you can see the difference, even if it is not that enormous in this example. The difference could be bigger if you compare other areas of the world, especially if the network is not good in these locations.
Distance definitely affects performance, and the impact depends on the network between your users and the tenant they are trying to access.
We have seen that SharePoint performance relies on multiple factors. The real scenarios shown here illustrate how bandwidth, proxy, DNS resolution and route to the cloud affect your end-user experience.
Make sure you test your end-user experience to understand the service you really deliver, so that you can improve your route to the cloud and your overall Office 365 user satisfaction.