GSX Blog

Troubleshooting End User Experience in a Distributed Environment

Posted by Sushen Birari on Tue, Feb 14, 2017

Applications provided by IT are made to be used by people, and those people are generally not sitting in the company datacenter. So how do you make sure that your infrastructure is providing the best end-user experience when you’re blind to the main causes of end-user latency?

At GSX, we’ve worked on this issue more and more as our customers migrate from on-premise to hybrid service delivery. With hundreds of customers using their messaging systems in distributed environments, we’ve faced latency problems many times.

When users complain about latency issues, 90 percent of the time it’s due to network performance or configuration issues. Here, we’ll explain how to quickly troubleshoot how your network impacts the end-user experience.

   1. Measuring network latency as a user would

Users may not have the skills to conduct multiple tests to determine what’s going on between their office computer and your datacenter, but they do have one advantage: they are the ones actually experiencing the issue. That means that only by checking the performance and the route from their location to your datacenter or the cloud will you be able to remove the blindfold that keeps you from understanding what a “slow” connection means to them.
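As a rough illustration of measuring from the user's side, the sketch below times TCP connections from the machine it runs on to an application endpoint. It is not how any particular product does it. The function names, the sample count and the host in the usage note are all assumptions made for the example.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, timeout=3.0):
    """Return the time in milliseconds to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return None


def sample_latency(host, port, count=5, pause_s=0.2):
    """Probe several times and summarise the successful samples."""
    samples = []
    for _ in range(count):
        latency = tcp_connect_latency_ms(host, port)
        if latency is not None:
            samples.append(latency)
        time.sleep(pause_s)
    if not samples:
        return None  # every probe failed
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
        "failed": count - len(samples),
    }
```

Run from a workstation in a branch office, a call such as `sample_latency("mail.example.com", 443)` (a placeholder host) gives a number that reflects that office's path to the application rather than the datacenter's view of it.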

   2. Baselining

Users are human and can adapt to existing situations, but they don’t always adapt well to change. This means your users will not necessarily complain about a connection that has always been slow, but they will complain when it becomes slower than they are used to. What matters is detecting any change in network performance before it reaches the end user and the complaints start coming in. To do that, baseline your “normal” network performance by constantly measuring it from your critical locations to the end-points that matter, whether the applications are on premise or in the cloud. Once that is in place, you can detect a latency issue as it happens, usually before the user does.
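One simple way to picture a baseline is a rolling window of recent measurements, with an alert when a new sample sits well above the usual range. The sketch below assumes a mean-plus-three-standard-deviations rule with a small minimum margin. The class name, window size and thresholds are illustrative choices, not values from any product.

```python
import statistics
from collections import deque


class LatencyBaseline:
    """Rolling baseline of latency samples with a simple deviation check."""

    def __init__(self, window=288, min_samples=30,
                 threshold_sigmas=3.0, min_margin_ms=5.0):
        self.samples = deque(maxlen=window)  # oldest samples drop out automatically
        self.min_samples = min_samples
        self.threshold_sigmas = threshold_sigmas
        self.min_margin_ms = min_margin_ms

    def is_abnormal(self, latency_ms):
        """Return True if latency_ms is well above the baseline built so far."""
        if len(self.samples) < self.min_samples:
            return False  # not enough history to judge yet
        mean = statistics.fmean(self.samples)
        stdev = statistics.pstdev(self.samples)
        # The minimum margin avoids alerts on tiny changes when history is very flat.
        margin = max(self.threshold_sigmas * stdev, self.min_margin_ms)
        return latency_ms > mean + margin

    def observe(self, latency_ms):
        """Check a new sample against the baseline, then record it if it is normal."""
        abnormal = self.is_abnormal(latency_ms)
        if not abnormal:
            self.samples.append(latency_ms)  # keep outliers out of the baseline
        return abnormal
```

Keeping abnormal samples out of the window stops a single incident from inflating the baseline. The trade-off is that a permanent change in the network would keep alerting until the baseline is reset.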

   3. Proactive troubleshooting

Once you’ve baselined your environment, checking the network configuration and latency to your main applications in real time makes it easy to troubleshoot. To do so, you need to be alerted as soon as your users experience network performance that is below normal. When you receive the alert, you’re one step ahead of any ticket, since your users need time to recognize performance issues. You then need all the information you can get about the latency to determine whether anything unusual is happening.

   4. Locating the latency

Traceroute

In order to know exactly where the latency happened, Traceroute is one of the best tools.

  • It provides you with the detailed route from the user to the application, with the latency at each hop
  • It can also alert you – with the right tool – in case of any changes in the number of hops between the user and the applications.

The route between a user and an application is always subject to change. There are many components that you are not responsible for and that can change that route. What is important to know is whether there is a notable increase in the number of hops, because that usually points to a network configuration error or a problem with the Internet provider. Here more than ever, it’s important to constantly monitor these parameters: if you only run the command when you have a ticket, you will almost certainly miss information that is important for troubleshooting.
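To make the hop-count idea concrete, here is a minimal sketch that parses the text output of a Unix-style `traceroute -n` run, counts the hops and finds the hop where latency rises the most. The output format differs between platforms (Windows `tracert` prints its columns differently), so the regular expressions are an assumption about one common layout. The sample output uses made-up documentation addresses, and the function names and tolerance are hypothetical.

```python
import re

# Matches a hop line such as " 2  192.0.2.1  8.2 ms  8.1 ms  8.4 ms".
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
RTT = re.compile(r"(\d+(?:\.\d+)?)\s*ms")

# Made-up sample output, for illustration only.
SAMPLE = """traceroute to 198.51.100.7 (198.51.100.7), 30 hops max, 60 byte packets
 1  192.0.2.1  1.123 ms  0.987 ms  1.045 ms
 2  192.0.2.254  8.2 ms  8.1 ms  8.4 ms
 3  * * *
 4  198.51.100.7  20.1 ms  19.8 ms  20.3 ms
"""


def parse_traceroute(output):
    """Return one dict per hop with its number, address and best round-trip time."""
    hops = []
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header or blank line
        number, rest = int(match.group(1)), match.group(2)
        rtts = [float(value) for value in RTT.findall(rest)]
        # A hop that never answered shows only "*" and has no address.
        address = next((tok for tok in rest.split() if tok != "*"), None)
        hops.append({
            "hop": number,
            "address": address,
            "best_ms": min(rtts) if rtts else None,
        })
    return hops


def hop_count_increased(hops, baseline_hops, tolerance=2):
    """True if the route is notably longer than the usual hop count."""
    return len(hops) - baseline_hops > tolerance


def largest_latency_jump(hops):
    """Return (hop_number, increase_ms) for the biggest rise between responding hops."""
    previous = 0.0
    worst = (None, 0.0)
    for hop in hops:
        if hop["best_ms"] is None:
            continue  # skip hops that did not answer
        increase = hop["best_ms"] - previous
        if increase > worst[1]:
            worst = (hop["hop"], increase)
        previous = hop["best_ms"]
    return worst
```

Storing the parsed result of every run, rather than only the latest one, is what lets you go back to the moment the latency started.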

Witness ping

Comparing the latency to an application with the latency to a few reference end points on the Internet helps you understand where the problem is. You can also measure the latency to the ingress point of your users (the last local network point before the Internet) to get a clear view of the impact of the local network on application latency.
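The reasoning behind the witness ping can be written down as a small decision rule. The sketch below is a simplified heuristic under stated assumptions: each measurement comes as a (current, baseline) pair in milliseconds, and "slow" means more than 1.5 times the baseline. The function name, the factor and the wording of the results are all hypothetical.

```python
def locate_slowdown(app, ingress, witnesses, slow_factor=1.5):
    """Guess where a slowdown sits.

    app and ingress are (current_ms, baseline_ms) pairs;
    witnesses is a list of such pairs, one per reference end point.
    """
    def is_slow(pair):
        current, baseline = pair
        return current > baseline * slow_factor

    if not is_slow(app):
        return "application latency is within its normal range"
    if is_slow(ingress):
        # The delay already appears before traffic leaves the local network.
        return "local network (slow before reaching the Internet)"
    slow_witnesses = sum(1 for pair in witnesses if is_slow(pair))
    if witnesses and slow_witnesses > len(witnesses) / 2:
        return "Internet access or provider (witness end points are slow too)"
    return "application side or the route to it (witness end points look normal)"
```

For example, `locate_slowdown((120, 40), (2, 2), [(30, 28), (45, 44)])` points at the application side, because the ingress point and both witnesses are close to their baselines while the application is not.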

   5. Is the network latency affecting your end users?

We’ve seen how to spot network latency before your users do. Now you have to know how the network impacts your end users.

For that, there are a few performance counters and aggregates that you have to constantly measure:

  • Packet Loss: which measures how many packets are lost during transmission. The packet loss rate is measured as a percentage.
  • MOS: which measures the network's impact on the listening quality of the VoIP (Voice over Internet Protocol) conversation. The network MOS rating ranges from 1 to 5, with 1 being the poorest quality and 5 being the highest quality.
  • Jitter: which measures the variation in arrival times of packets being received. Interarrival jitter is measured in milliseconds (ms).

All these performance counters are necessary to help you understand if the network really impacts the end-user experience. The MOS is particularly useful for any voice application, as it is a statistical calculation designed for that purpose.
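For readers who want to see how these counters are typically derived, here is a minimal sketch. Packet loss is a plain percentage. The jitter function follows the running interarrival estimator described in RFC 3550. The MOS function uses the R-factor-to-MOS conversion from the ITU-T G.107 E-model, which tops out at 4.5 in practice. How the R factor itself is obtained from network measurements is outside this sketch, and the function names are chosen for the example only.

```python
def packet_loss_percent(sent, received):
    """Share of packets that never arrived, as a percentage."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def interarrival_jitter_ms(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550: J += (|D| - J) / 16.

    D is the change in transit time between two consecutive packets.
    """
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        transit_prev = recv_times_ms[i - 1] - send_times_ms[i - 1]
        transit_curr = recv_times_ms[i] - send_times_ms[i]
        d = abs(transit_curr - transit_prev)
        jitter += (d - jitter) / 16.0
    return jitter


def r_factor_to_mos(r):
    """Convert an E-model R factor to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)
```

Tracked over time against a baseline, these three numbers show whether a latency change is actually degrading what the user hears or sees.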

Now that we know where the latency is and whether it impacts your users, let’s eliminate the main root causes of network configuration issues.

   6. Troubleshooting network configuration

A few continuous checks can help you quickly assess the situation and find the main root cause of end-user latency.

DNS Resolution & Resolution time

Among our customers, we’ve seen that DNS issues are often the main root cause of end-user latency in applications. Any change in DNS can affect an application’s performance or availability for your users. That is why it is really important to constantly monitor DNS resolution from where the users are. Keeping an eye on it will save you a lot of time.
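A basic DNS check can be sketched with the Python standard library: resolve the application's name, time the lookup and record the addresses returned so that a later run can be compared with it. This goes through the operating system's resolver, so cached answers may make the time look shorter than a fresh lookup. The function name and the `resolver` parameter (there to make the function easy to test) are assumptions for this example.

```python
import socket
import time


def dns_resolution(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname and report success, elapsed time and the addresses found."""
    start = time.perf_counter()
    try:
        results = resolver(hostname, None)
    except socket.gaierror as exc:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        return {"ok": False, "elapsed_ms": elapsed_ms,
                "addresses": [], "error": str(exc)}
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in results})
    return {"ok": True, "elapsed_ms": elapsed_ms,
            "addresses": addresses, "error": None}
```

Comparing `addresses` between runs reveals a changed record, and comparing `elapsed_ms` with its baseline reveals a slow resolver, which covers both the configuration and the performance side of the check.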

Port connectivity

DNS resolution time is both a performance and a configuration check. Port connectivity is the other main root cause of configuration errors that lead to end-user complaints. An example that we’ve seen multiple times is a new firewall rule that blocks a port and prevents users from accessing the application.
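A port check along these lines can be sketched as follows. It tries to open a TCP connection to each port and distinguishes a refused connection from a timeout, since a timeout is often what a firewall that silently drops packets looks like from the user's side. The function name and result labels are illustrative, and the interpretation of each outcome is a rule of thumb rather than a certainty.

```python
import socket


def check_ports(host, ports, timeout=3.0):
    """Try a TCP connection to each port and label the outcome."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = "open"
        except socket.timeout:
            # No answer at all: frequently a firewall dropping the packets.
            results[port] = "timeout"
        except ConnectionRefusedError:
            # The host answered, but nothing is listening or access is rejected.
            results[port] = "refused"
        except OSError as exc:
            results[port] = f"error: {exc}"
    return results
```

Running the same check on a schedule from each user location makes a newly blocked port visible as a change from "open" to "timeout" or "refused".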

Conclusion

We’ve seen why it is important to constantly measure the network latency from where the users are in order to detect, troubleshoot and fix issues before you get overwhelmed by user complaints.

You can do all of this yourself, by hand, whenever a ticket comes in. But in that case you cannot measure any baseline and you can only be reactive. Your users will complain, and sometimes for no reason.

At GSX, we have developed our Robot User that can do the job for you.

A Robot User is a Windows service that can run on any Windows machine, and that automatically does everything we’ve outlined in this blog:

  • It constantly measures the network latency between where your users are and the applications you want to monitor on premise or in the cloud
  • It provides you with a baseline of the network experience, allowing you to be alerted as soon as the performance is out of its normal range
  • It constantly checks the route to the applications, alerting in case of an anomalous number of hops
  • It constantly performs Traceroute to provide you with a clear analysis of where the latency was at the moment it started
  • It constantly gathers the key performance counters you need to determine if the network is impacting your application’s performance
  • It tests the main root causes of failure, such as DNS availability and resolution time and port connectivity to the application

You can deploy, manage and see all your Robot Users on a single dashboard that provides you with a clear picture of the network performance that is provided to your end users.

And finally, because we specialize in the end-user experience of collaboration applications, Robot Users can also run end-user scenarios to test the response time of your entire collaboration stack, on premise or in the cloud.

The GSX Robot User is the tool that allows you to understand and troubleshoot your entire hybrid infrastructure even before your users notice anything.

GSX provides out-of-the-box monitoring to ensure your applications are performing the way they should, whether they're on-premise, hybrid, or in the cloud.

If you haven't tried GSX's solutions yet, why not sign up for a free 14-day trial or request a personal demo?

 

Tags: GSX Robot User, QoS, Service Delivery, Office 365, Network Performance, Network latency, Network Diagnostic, office 365 health, office 365 performance