Tags:
The next generation of networks holds a vision to expand communications from the scale of billions in the world’s population to a virtually limitless scale of inter-connectivity between humans, machines, and things. In aiming to connect everyone and everything with real-time control, we face a paradigm of explosive growth in enhanced services and applications, network traffic, and consumers.
The requirements of real-time applications such as voice and multimedia communication differ from those of the traditional web-based applications that the Cloud supports. For example, a momentary increase in latency and jitter in a voice/video application has an immediate influence on the end-user Quality of Experience (QoE), whereas conventional web applications are far more tolerant of it. A momentary lag in real-time gameplay has a significant impact in competitive gaming. Bursts of latency in a decision-making system in an automated industrial environment can drastically impact the entire production pipeline. An intermittent delay in reporting the overload of one component in a smart power grid can result in a wider blackout. Similar outcomes can be expected in domains such as autonomous driving, smart health, and virtual reality.
With the use of advanced machine learning, we can control how the Cloud reacts to such latency-critical demands, mitigate these disruptions before they impact the end-user, and ensure that systems and services remain proactive over time. We propose a deep learning based solution that balances efficiency and reliability to support latency-critical applications with high availability requirements.
In this work, we propose a strategy to model frequent Service Level Agreement (SLA) violations at the application level as a multi-output target, enabling more complex decision-making in the management of virtualised communication networks. We utilize data from a real-world deployment and develop a deep neural network based multi-label classification methodology to identify and predict multiple categories of SLA breaches associated with the state of a latency-sensitive Network Function Virtualization (NFV) application. With the best-performing model, we achieve a subset accuracy of 99% and a multi-label accuracy of 99.1%, while mitigating the effect of extreme class imbalance in the data.
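The exact architecture and training details are given in the publication itself; purely as an illustration of the general technique, a minimal sketch of a multi-label deep neural network classifier in PyTorch might look as follows. The feature dimension, number of SLA breach categories, layer sizes, and positive-class weights are all hypothetical placeholders, not values from the study.

```python
# Minimal sketch (not the authors' exact model): a feed-forward network with one
# sigmoid output per SLA breach category, trained with a per-label weighted BCE
# loss as one common way to counter extreme class imbalance.
import torch
import torch.nn as nn

NUM_FEATURES = 32   # hypothetical: monitoring metrics describing the NFV application state
NUM_LABELS = 5      # hypothetical: number of SLA breach categories

class SLAViolationClassifier(nn.Module):
    def __init__(self, num_features: int, num_labels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, 128),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, num_labels),   # one logit per SLA breach category
        )

    def forward(self, x):
        return self.net(x)               # raw logits; apply a sigmoid at inference time

model = SLAViolationClassifier(NUM_FEATURES, NUM_LABELS)

# pos_weight > 1 up-weights the rare positive (violation) examples per label;
# the values below are illustrative only.
pos_weight = torch.tensor([10.0, 25.0, 5.0, 40.0, 15.0])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on random stand-in data.
x = torch.randn(64, NUM_FEATURES)
y = torch.randint(0, 2, (64, NUM_LABELS)).float()   # multi-hot violation labels
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

Treating each SLA category as an independent sigmoid output, rather than a single softmax class, is what lets one application state be flagged for several kinds of breach at the same time.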
If you’ve ever experienced a dropped call, a buffering stream, or an application slowing down or shutting down, you know the general idea of a degradation in the Quality of Service (QoS) and its negative impact on end-user Quality of Experience (QoE). With the use of advanced machine learning capabilities, we aim to further bridge the trade-off between efficiency and reliability, improving both QoS and end-user QoE. This will also help Cloud and Telecom service providers move towards proactive monitoring, reducing capital and operational expenditure.
5G’s usage scenario of ultra-reliable low-latency communications (URLLC) is further expected to extend in scope to high-throughput, ubiquitous global connectivity at scale, driving change across all major verticals. As a result of such a shift, the Cloud infrastructure no longer hosts just web-based application services, but is also being extended to meet the next generation of requirements that fuel these emerging application verticals. While Cloud service providers recognize the evolving mission-critical requirements in latency-sensitive verticals such as autonomous driving, multimedia, gaming, telecommunications, and virtual reality, there remains a wide gap in meeting the QoS constraints that the end-user experience demands. Most latency-critical services are over-provisioned on all fronts to offer reliability, which is inefficient and does not scale in the long run. Over time, we also need to be mindful of the significant carbon footprint of networks and algorithms, so there is more at stake here than the end-user experience alone.
This is the first approach in the area to apply a multi-label classification methodology towards more granular SLA violation prediction for a latency-sensitive NFV application in a virtualised network environment, and to work with extensive real-world data to compare the performance of both machine learning and deep learning methodologies towards this objective. Formulating the problem as a multi-output model, associating structured data with multiple pieces of semantic information at once, holds tremendous potential as we advance towards solving more complex decision-making problems.
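To make the multi-output formulation concrete, the following illustrative snippet shows how each observed application state can be associated with several breach categories at once and encoded as a multi-hot target vector. The category names are hypothetical placeholders, not the label set used in the study.

```python
# Illustrative only: encoding per-sample sets of SLA breach categories as
# multi-hot target vectors suitable for a multi-label classifier.
from sklearn.preprocessing import MultiLabelBinarizer

observed_breaches = [
    {"latency", "jitter"},                    # one sample violating two categories
    set(),                                    # a compliant sample violates none
    {"throughput"},
    {"latency", "jitter", "packet_loss"},
]

mlb = MultiLabelBinarizer(classes=["latency", "jitter", "packet_loss", "throughput"])
Y = mlb.fit_transform(observed_breaches)
print(mlb.classes_)   # ['latency' 'jitter' 'packet_loss' 'throughput']
print(Y)              # e.g. first row -> [1 1 0 0]
```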
In future work, we plan to integrate our approach with a traffic and workload forecasting methodology for finer-grained proactive violation prediction, and to combine this with dynamic policy enforcement for an end-to-end management control loop.
Our work is transferable towards supporting latency-critical applications with high availability requirements across varied verticals. We reason and demonstrate that our proposed methodology can be used to identify gaps in SLA policy enforcement, to further fine-tune scaling policies, and to identify and address the frequent vulnerabilities and bottlenecks that a latency-sensitive real-time application such as this may face.
The results suggest the suitability of such a deep learning methodology for the target objective, with thorough benchmarking on example-based, label-based, and ranking-based measures against multi-label compatible machine learning methods. We further propose a heuristic-based algorithm that addresses the challenges such a deployment may face in the real world, which enhances the potential for impact and brings the research proposition closer to market.
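As a rough indication of what example-based, label-based, and ranking-based measures look like in practice, the snippet below computes a few standard multi-label metrics with scikit-learn. The predictions and scores are made-up stand-ins and are not results from the paper.

```python
# Example-based, label-based, and ranking-based multi-label measures on toy data.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss, label_ranking_loss

y_true = np.array([[1, 0, 1], [0, 0, 0], [1, 1, 0], [0, 1, 1]])
y_score = np.array([[0.9, 0.2, 0.8], [0.1, 0.3, 0.2], [0.7, 0.6, 0.4], [0.2, 0.9, 0.7]])
y_pred = (y_score >= 0.5).astype(int)

print("Subset accuracy:", accuracy_score(y_true, y_pred))             # example-based: exact match of the full label set
print("Hamming loss:   ", hamming_loss(y_true, y_pred))               # example-based: fraction of wrong labels
print("Macro F1:       ", f1_score(y_true, y_pred, average="macro"))  # label-based: averaged per label
print("Ranking loss:   ", label_ranking_loss(y_true, y_score))        # ranking-based: mis-ordered label pairs
```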
Publication Title: A Deep Neural Network based Multi-Label Classifier for SLA Violation Prediction in a Latency Sensitive NFV Application
Authors: Nikita Jalodia, Dr Mohit Taneja, Dr Alan Davy
Publication Date: 28 October 2021
Journal: IEEE Open Journal of the Communications Society
Link to publication: