Research
Vision
The meaning of the term network service has changed a great deal with the evolution of communication networks and the Internet. It started as a basic connectivity service whose main goal was connecting people; today, data centers run millions of software components that realize all kinds of cloud services. Moreover, users do not care whether this software runs in a data center in Alaska or in the closest mobile base station. From the users' perspective, the focus is on the application (and its data) instead of the network. When we edit a document in Google Drive, we have no information on where the document is stored or where the program is executed; what matters is getting the same service with the same documents wherever on Earth we access them. On the one hand, emerging and future services and applications pose serious challenges to the underlying network. For example, the Tactile Internet will require extremely low latency (as small as 1 ms round-trip time!) in combination with ultra-high availability and reliability in order to enable remote driving, remote surgery or wirelessly controlled exoskeletons, to mention just a few applications that will reshape our society. As the speed of light is a given constraint, we need novel approaches, for example to dynamically move network "capabilities" close to the users. In order to fulfill the new requirements, we need to add more intelligence, automation, adaptability and novel capabilities to networks at different levels and operation planes (data, control and management planes alike). On the other hand, emerging cloud and networking technologies integrated into novel architectures enable services we have never seen or expected before. For example, 5G networks are envisioned to provide system-wide platform services for future applications. This natural synergy catalyzes the evolution of both networks and applications. On the network side, the key enabler of this process is the softwarization of the whole networking architecture. As a result, the Internet, together with clouds, will become a critical infrastructure for society in the near future. Therefore, research on related topics is of extreme importance.
Main objectives
Our main goal is to create the components of the envisioned architecture integrating networks and clouds in order to enable future services, to meet ultra-high capacity demands, and to fulfill extreme performance requirements. For this purpose, we leverage the key softwarization technologies (SDN, NFV) at different operation planes.
Objective 1: Elaborate novel use-cases and identify corresponding requirements
Objective 2: Enable the development of a new generation of network services / applications
Objective 3: Create the orchestration & control plane for next generation services / applications
Objective 4: Enable high data plane performance in/under/across the clouds
Elaborate novel use-cases and identify corresponding requirements
Although several future services have been envisioned, e.g., on top of 5G networks, well-defined use-cases are still missing. Therefore, we aim at fully elaborating novel use-cases from different fields, such as cloud edge computing, cloud robotics, and the remote coordination of cars or drones. This includes the detailed definition of the services and the analysis of the corresponding requirements on the underlying system. Based on the gained understanding, our goal is to identify common platform services required by the evaluated applications and to provide detailed specifications of novel platform APIs (Application Programming Interfaces) which expose high-level, system-wide capabilities. Due to the special multi-operator aspects, beyond the technological challenges, the current business models definitely have to be revisited and refined as well.
Results in 2021
Targeting new use cases and applications (O1), one of the main focuses was a special type of eXtended Reality (XR), namely Mixed Reality (MR) applications, in which interaction between real and virtual objects is allowed. One concrete application that we implemented for demonstration purposes is a multi-user game in which a physical toy car, driven freely with a physical remote controller, carries a virtual box on top, and various virtual objects can be thrown into the box while the car is moving. In this case, virtual objects attached to dynamically changing physical elements and environments must be displayed to users, the virtual and physical components must be constantly synchronized, and each user must see their own view on their own device. These services, implemented as distributed software with some components running on the MR device and other modules running in the cloud (or at the edge), impose very special demands on the underlying network and cloud infrastructure if we want to provide an acceptable user experience. Using this specific application (and its different variants), we analyzed in detail the effects of latency, jitter, uplink/downlink bandwidth, packet loss, and available computing capacity on the user experience, and we identified minimum requirements. The prototype was demonstrated at internal events of the industry partner.
In addition, we continued our work on our multi-player Augmented Reality (AR) application, in which the game engine and the SLAM (Simultaneous Localization And Mapping) module, responsible for the continuous positioning of the AR devices, can be deployed to a cloud or edge cloud environment, and the synchronization and coordination among the players also take place from there. The results are summarized in a submitted conference paper. In addition, a number of BSc theses were prepared studying different aspects of human-robot collaboration and AR/VR applications.
Another important application type addressed during the reporting period is “telco,” the realm of classic telecommunications services. These traditional telco services (e.g., voice) provided in next generation network environments (e.g., 5G and beyond systems) are still of great importance today, and their quality requirements are very specific. During the pandemic, we have seen how essential video conferencing has become and how fundamentally its quality affects its usability and the workflows built on it. In our work, we examined in detail the constraints and requirements on cloud and network platforms and on the applied computing paradigms that stem from the special features and quality requirements of these applications.
Summary of the first three years (2018-2020)
During the first three years, multiple use cases were investigated from the fields of cloud robotics (controlling, e.g., robots, drones or other unmanned vehicles from the cloud), Virtual Reality (VR), and Augmented Reality (AR). All these applications pose serious challenges to the underlying network and cloud infrastructures, including strict latency, ultra-high reliability, and availability requirements.
First, we addressed the distributed control of indoor and outdoor drones from an integrated system of networks and clouds, and we investigated how to run applications with strict delay requirements in a distributed cloud environment. We developed several solutions in which the control software is divided into several software components with different resource requirements and QoS (Quality of Service) constraints, which can run in different environments (cloud, edge) provided that proper communication between them is ensured. On the one hand, we implemented an SFC-based (Service Function Chain) solution where the application is constructed from connected Virtual Network Functions (VNFs). This solution builds on ETSI’s NFV architecture and follows a compatible approach. The application was integrated with our own orchestration framework and the results were demonstrated at the most important forum in the field, ACM SIGCOMM 2018 [2018-1]. On the other hand, we examined how similar applications, such as the controller of an unmanned rover [2020-1], can be implemented based on the relatively new Function as a Service (FaaS) concept, and whether the platform is capable of provisioning services with strict delay requirements. Furthermore, we elaborated a microservice-based object detection software component suitable for outdoor drones (or other unmanned vehicles) and explored different deployment options on public and private cloud platforms [2019-2].
Second, we designed and implemented the components needed to support human-robot collaboration (HRC) in future production environments, and explored how this software could be adapted to the latest cloud platforms and cloud services (e.g., serverless computing). The main goals were to investigate the impact of the extra latency and jitter introduced by the cloud platforms on the performance of the applications, and to identify the benefits provided by the clouds in software development and operation.
Third, we focused on the different types of extended reality (AR/VR/MR, Augmented/Virtual/Mixed Reality) applications. On the one hand, we analyzed the effects of increased latency and the benefits of automatic resource scaling (both vertical and horizontal). We established a dedicated virtual reality environment for future production systems, and the previous HRC application was implemented and evaluated in this space. An integrated demo including the HRC and VR components (human-robot collaboration in virtual space) was presented at the IEEE ICNP 2019 conference [2019-1]. On the other hand, we elaborated a multi-player AR gaming use case where the distributed software components, including the game server and the SLAM (Simultaneous Localization And Mapping) module, can be deployed to different platforms or runtime environments (e.g., edge cloud, central cloud, AR device). The main features and benefits of our approach were demonstrated at several internal events of our industry partner, and a conference paper is planned for the next year.
We also addressed the business aspects of the whole ecosystem. In this regard, we modeled and analyzed a federation of 5G infrastructure providers that aims at selecting the compute and network resource set which fulfills the technical requirements of the service deployment and is also preferable from an economic point of view. We modeled this resource market with the tool set of graph and game theory and studied its characteristics. We also derived the best pricing strategies that the providers should follow given their expectations about customers’ demand. Our main results are summarized in a journal paper [2020-2].
In another work [2020-3], we analyzed federations of telcos, cloud operators, and online application providers who join forces to deliver future services to customers around the world. To support user mobility, or simply the geographic reach of the offered application, the business agreements among the actors must scale across numerous domains, and a guaranteed quality of collaboration among the different stakeholders is essential. In this environment, business aspects significantly influence the technical capabilities of the system: the providers' business agreements inherently determine the availability and the end-user prices of certain services. In this work, we modeled the business relations of providers as a variant of network formation games. We derived conditions under which the current transit-peering structure of network providers remains intact, and we also described the specifics of an envisioned setup in which providers establish business links with each other starting from a clean slate.
[2018-1] J. Czentye, J. Dóka, Á. Nagy, L. Toka, B. Sonkoly, R. Szabó, “Controlling Drones from 5G Networks,” In Proc. of ACM SIGCOMM 2018 (Demo)
[2019-1] B. Gy. Nagy, J. Dóka, S. Rácz, G. Szabó, I. Pelle, J. Czentye, L. Toka, B. Sonkoly, “Towards Human-Robot Collaboration: An Industry 4.0 VR Platform with Clouds Under the Hood,” IEEE ICNP 2019 (Demo)
[2019-2] J. Dóka, “Elaboration of a latency critical future 5G application and adaptation to a selected FaaS platform”, MSc Thesis, BME VIK 2019 (in Hungarian)
[2020-1] I. Pelle, F. Paolucci, B. Sonkoly, F. Cugini, “Telemetry-Driven Optical 5G Serverless Architecture for Latency-Sensitive Edge Computing”, Optical Fiber Communication Conference (OFC) 2020
[2020-2] L. Toka, M. Zubor, A. Kőrösi, G. Darzanos, O. Rottenstreich, B. Sonkoly, “Pricing games of NFV infrastructure providers”, Telecommunication Systems, doi: 10.1007/s11235-020-00706-5, 2020 (IF: 1.73)
[2020-3] L. Toka, A. Recse, M. Cserép, R. Szabó, “On the Mediation Price War of 5G Providers”, MDPI Electronics, 9:11, 1901 pp. 1-19, 2020 (IF: 2.41)
Results in 2019
As novel use-cases, two new areas were investigated. On the one hand, we designed and implemented the components needed to support human-robot collaboration in future production environments, and explored how this software could be adapted to the latest cloud platforms and cloud services (e.g., serverless computing). The main goals were to investigate the impact of the extra latency and jitter introduced by the cloud platforms on the performance of the applications, and to identify the benefits provided by the clouds in software development and operation. On the other hand, we focused on the different types of augmented reality (AR/VR/MR, Augmented/Virtual/Mixed Reality) applications; more specifically, we analyzed the effects of increased latency and the benefits of automatic resource scaling (both vertical and horizontal). The two application examples were integrated in a common demo (human-robot collaboration in virtual space) and presented at the IEEE ICNP 2019 conference [2019-1]. In addition, we elaborated a microservice-based object detection software component suitable for outdoor drones (or other unmanned vehicles) and explored different deployment options on public and private cloud platforms [2019-2]. We also addressed the business aspects of the whole ecosystem. In this regard, we modeled and analyzed a federation of 5G infrastructure providers that aims at selecting the compute and network resource set which fulfills the technical requirements of the service deployment and is also preferable from an economic point of view. We modeled this resource market with the tool set of graph and game theory and studied its characteristics. We also derived the best pricing strategies that the providers should follow given their expectations about customers' demand.
[2019-1] B. Gy. Nagy, J. Dóka, S. Rácz, G. Szabó, I. Pelle, J. Czentye, L. Toka, B. Sonkoly, “Towards Human-Robot Collaboration: An Industry 4.0 VR Platform with Clouds Under the Hood,” IEEE ICNP 2019 (Demo)
[2019-2] J. Dóka, “Elaboration of a latency critical future 5G application and adaptation to a selected FaaS platform”, MSc Thesis, BME VIK 2019 (in Hungarian)
Results in 2018
As a new use-case (O1), we addressed the distributed control of indoor and outdoor drones from an integrated system of networks and clouds, and we investigated how to run applications with strict delay requirements in a distributed cloud environment. We developed several solutions in which the control software is divided into several software components with different resource requirements and QoS (Quality of Service) constraints, which can run in different environments (cloud, edge) provided that proper communication between them is ensured. On the one hand, we implemented an SFC-based (Service Function Chain) solution where the application is constructed from connected Virtual Network Functions (VNFs). This solution builds on ETSI's NFV architecture and follows a compatible approach. The application was integrated with our own orchestration framework and the results were demonstrated at the most important forum in the field, ACM SIGCOMM 2018 [2018-1]. On the other hand, we examined how similar applications can be implemented based on the relatively new Function as a Service (FaaS) concept, and whether services with strict delay requirements can be provisioned in this environment.
[2018-1] J. Czentye, J. Dóka, Á. Nagy, L. Toka, B. Sonkoly, R. Szabó, “Controlling Drones from 5G Networks,” In Proc. of ACM SIGCOMM 2018 (Demo)
Enable the development of a new generation of network services
We can distinguish two different types of network services. One is provisioned directly to the customer, while the other is provided as a platform service for building higher-level services. Both types of network services consist of a series of service functions, traditionally implemented by middleboxes, that have to be traversed in a given order by traffic flows. The service chain/graph is an abstraction to describe high-level services in a generic way and to assemble processing flows for given traffic. Due to hardware-based functions and topology-dependent mechanisms, configuring, deploying, and operating service chains today is a complex task and usually requires human interaction. In order to mitigate these issues and to enable automated, dynamic service creation, the service plane and its main elements have to be reconsidered. We need novel tools, mechanisms, and platform libraries with unified APIs to enable the development and instantiation of a new generation of network services and applications.
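To make the abstraction concrete, the short sketch below models a service chain as functions connected by links carrying bandwidth and latency budgets. The class and attribute names are illustrative only and do not correspond to any particular framework or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ServiceFunction:
    name: str          # e.g. "firewall", "transcoder"
    cpu_cores: float   # resource demand of the function

@dataclass
class ChainLink:
    src: str
    dst: str
    bandwidth_mbps: float   # required bandwidth between the two functions
    max_latency_ms: float   # latency budget on this hop

@dataclass
class ServiceChain:
    functions: Dict[str, ServiceFunction] = field(default_factory=dict)
    links: List[ChainLink] = field(default_factory=list)

    def add(self, fn: ServiceFunction) -> None:
        self.functions[fn.name] = fn

    def connect(self, src: str, dst: str, bw: float, lat: float) -> None:
        self.links.append(ChainLink(src, dst, bw, lat))

    def end_to_end_budget(self) -> float:
        # The sum of per-hop latency budgets gives the end-to-end bound.
        return sum(l.max_latency_ms for l in self.links)

# Example: a simple three-function chain with a 10 ms end-to-end budget.
chain = ServiceChain()
for name, cpu in [("classifier", 0.5), ("firewall", 1.0), ("gateway", 0.5)]:
    chain.add(ServiceFunction(name, cpu))
chain.connect("classifier", "firewall", bw=100.0, lat=4.0)
chain.connect("firewall", "gateway", bw=100.0, lat=6.0)
print(chain.end_to_end_budget())   # -> 10.0
```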
Results in 2021
During the reporting period, we continued the work on new software development and operating methods that support the operation of latency-sensitive, distributed software in a heterogeneous cloud/edge environment. On the one hand, we proposed an edge cloud-based, multi-user AR platform that supports coordination between players, has good scalability properties, and can also meet the expected quality (latency) requirements. A prototype of the entire system was implemented and tested with various mobile AR devices (Android and iOS phones) and open source SLAM solutions. We developed a new methodology for measuring the accuracy of the coordination system and performed extensive studies in different edge/cloud environments, based on which the performance of the platform can be analyzed from several aspects. Our results were summarized in a conference paper submitted to the top conference in the field of computer vision, the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
On the other hand, we developed a machine learning based solution that allows us to control the latency of the SLAM algorithm running in the cloud and to improve the user experience. Based on the incoming image and sensor information, the proposed method learns the response times exhibited by the given SLAM module using an LSTM (Long Short-Term Memory) based neural network. If it predicts an increase in the response time, it takes appropriate intervention steps to reduce it (e.g., dropping selected frames in case of congestion). Our solution was summarized in a paper published at the IEEE AIVR conference [2021-1].
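The following minimal sketch illustrates the idea with a Keras-style LSTM regressor and a simple frame-admission rule. The window size, feature set, training data, and latency budget are placeholders for the sake of the example, not the configuration used in [2021-1].

```python
import numpy as np
import tensorflow as tf

WINDOW = 20      # number of past frames used as input (illustrative)
N_FEATURES = 4   # e.g. frame size, feature-point count, queue length, last RTT

# LSTM regressor predicting the next SLAM response time from recent history.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(WINDOW, N_FEATURES)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# X: (samples, WINDOW, N_FEATURES) feature windows, y: observed response times (ms).
# In the real system these come from live measurements; here they are random stand-ins.
X = np.random.rand(1000, WINDOW, N_FEATURES).astype("float32")
y = (np.random.rand(1000, 1) * 100).astype("float32")
model.fit(X, y, epochs=3, batch_size=64, verbose=0)

LATENCY_BUDGET_MS = 50.0   # illustrative per-frame budget

def admit_frame(history_window: np.ndarray) -> bool:
    """Overload control: drop the frame if the predicted SLAM latency
    would exceed the budget, so that the processing queue does not build up."""
    predicted_ms = float(model.predict(history_window[None, ...], verbose=0)[0, 0])
    return predicted_ms <= LATENCY_BUDGET_MS
```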
We also developed a serverless-based dynamic software deployment system capable of running latency-sensitive applications in a cloud/edge environment based on telemetry data collected from packet-optical networks. This platform implements a tight integration of network and serverless technologies. Our results were summarized in an IEEE JSAC journal article [2021-2], which was the result of a collaboration with researchers from the Sant’Anna School of Advanced Studies (SSSA), Pisa.
[2021-1] J. Czentye, B.P. Gerő, B. Sonkoly, “Managing Localization Delay for Cloud-assisted AR Applications via LSTM-driven Overload Control”, in Proc. of IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR), 2021
[2021-2] I. Pelle, F. Paolucci, B. Sonkoly, F. Cugini, “Latency-sensitive Edge/Cloud Serverless Dynamic Deployment over Telemetry-based Packet-Optical Network”, IEEE Journal on Selected Areas in Communications (JSAC), 39:(9), pp. 2849-2863, 2021 (IF: 9.144)
Summary of the first three years (2018-2020)
We put significant effort into the design and development of new tools, methods, and automated solutions enabling the development and operation of distributed applications running over heterogeneous cloud platforms. The main task here is the adoption of the cloud native paradigm for delay-sensitive applications. We carried out a comprehensive analysis of one of the most important and versatile cloud platforms, namely Amazon’s AWS. The most important results were summarized in a paper at the IEEE CLOUD 2019 conference [2019-3], which presents the performance characteristics of various AWS services and examines whether the current toolkit is suitable and capable of running delay-sensitive applications. Based on these results, we proposed an optimization module on top of AWS that can calculate cost-optimized software layouts with predefined delay requirements. On the one hand, we provided a formal model for the problem; on the other hand, we proposed efficient algorithms for solving the optimization problem under different relaxation conditions. This was the subject of a conference paper presented at the IEEE GLOBECOM 2019 conference [2019-4].
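As a rough illustration of the layout optimization problem, the brute-force sketch below picks the cheapest edge/cloud placement of a small component pipeline that still meets an end-to-end delay budget. All component names, costs, and timings are invented for the example; the published model in [2019-4] is a formal optimization model and far more detailed.

```python
from itertools import product

# Candidate runtime environments with illustrative per-component cost and
# network latency toward the user (values are made up for the example).
SITES = {
    "edge":  {"cost": 5.0, "net_ms": 2.0},
    "cloud": {"cost": 1.0, "net_ms": 25.0},
}

# A small pipeline of components with per-site execution times in ms (illustrative).
COMPONENTS = [
    {"name": "decode",  "exec_ms": {"edge": 8.0,  "cloud": 5.0}},
    {"name": "detect",  "exec_ms": {"edge": 30.0, "cloud": 12.0}},
    {"name": "control", "exec_ms": {"edge": 3.0,  "cloud": 2.0}},
]

DELAY_BUDGET_MS = 80.0   # end-to-end requirement

def evaluate(layout):
    """Return (total_cost, end_to_end_delay_ms) for a placement vector."""
    cost = 0.0
    delay = SITES[layout[0]]["net_ms"]      # access latency from the user to the first site
    prev = layout[0]
    for comp, site in zip(COMPONENTS, layout):
        cost += SITES[site]["cost"]
        delay += comp["exec_ms"][site]
        if site != prev:
            delay += SITES[site]["net_ms"]  # crossing between sites adds latency
        prev = site
    return cost, delay

best = None
for layout in product(SITES, repeat=len(COMPONENTS)):
    cost, delay = evaluate(layout)
    if delay <= DELAY_BUDGET_MS and (best is None or cost < best[0]):
        best = (cost, delay, layout)

print(best)   # cheapest layout that still meets the delay budget
```

Exhaustive search is only viable for toy instances like this one; for realistic problem sizes the published work relies on formal models and efficient algorithms, as noted above.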
As an extension of the work on AWS layout optimization, we adapted the cloud native approach and the related operating techniques for latency-sensitive IoT applications operated on public serverless platforms. We argue that solely adding cloud resources to the edge is not enough, and that other mechanisms and operation layers are required to achieve the desired level of quality. First, we proposed a novel system on top of a public serverless edge cloud platform, which can dynamically optimize and deploy the microservice-based software layout based on live performance measurements. We added two control loops and the corresponding mechanisms responsible for online reoptimization at different timescales. The first one addresses steady-state operation, while the second one provides fast latency control by directly reconfiguring the serverless runtime environments. Second, we applied our general concepts to one of today’s most widely used and versatile public cloud platforms, namely Amazon’s AWS, and its edge extension for IoT applications, called Greengrass. Third, we characterized the main operation phases and evaluated the overall performance of the system. We also analyzed the performance characteristics of the two control loops and investigated different implementation options. The main results were summarized in an IEEE IoT journal paper [2020-4], and the live operation of the whole platform was demonstrated at ACM SIGCOMM 2020 [2020-5]. Furthermore, the platform was extended and integrated with a packet-optical network infrastructure. A dedicated framework was proposed and validated for the automated deployment and dynamic reconfiguration of serverless functions at either the edge or the cloud, relying on extensive telemetry data retrieved from both the computing and the packet-optical network infrastructure. A proof-of-concept prototype was implemented and operated on top of diverse Amazon Web Services technologies, including Greengrass on the edge. An experimental demonstration with a latency-sensitive serverless application was provided, showing fast dynamic reconfiguration capabilities, e.g., enabling even zero outage time under certain conditions. The first results were presented in an OFC conference paper [2020-1], while the full system was described in a journal paper submitted to a JSAC special issue (“Latest advances in optical networks for 5G communications and beyond”).
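The fast control loop mentioned above directly reconfigures the serverless runtime when the measured latency violates the target. A minimal, hedged sketch of such a loop follows: the measurement function is a placeholder, the function name, region, and thresholds are made up, and only the boto3 call that adjusts a Lambda function's memory size is a real AWS API; the actual mechanisms of [2020-4] are considerably more involved.

```python
import time
import boto3

# The region is illustrative; boto3 needs one to create the client.
lambda_client = boto3.client("lambda", region_name="eu-central-1")

FUNCTION_NAME = "image-processor"            # hypothetical function name
LATENCY_SLO_MS = 100.0                       # illustrative latency target
MEMORY_STEPS = [256, 512, 1024, 2048, 3008]  # MB; on Lambda more memory also means more CPU

def measure_p95_latency_ms() -> float:
    """Placeholder for the live measurement path (e.g., CloudWatch metrics or
    application-level probes); returns the recent p95 latency in milliseconds."""
    raise NotImplementedError

def fast_control_loop(period_s: float = 10.0) -> None:
    level, applied = 1, None
    while True:
        p95 = measure_p95_latency_ms()
        if p95 > LATENCY_SLO_MS and level < len(MEMORY_STEPS) - 1:
            level += 1    # scale the runtime up when the latency target is violated
        elif p95 < 0.5 * LATENCY_SLO_MS and level > 0:
            level -= 1    # scale back down when there is ample headroom
        if level != applied:
            # Reconfigure the serverless runtime directly (real boto3 API call).
            lambda_client.update_function_configuration(
                FunctionName=FUNCTION_NAME,
                MemorySize=MEMORY_STEPS[level],
            )
            applied = level
        time.sleep(period_s)
```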
In addition, we investigated different aspects of human-robot collaboration applications, focusing on cloud-based deployments and realizations, and we identified several challenges faced when adjusting the software components to cloud platforms. The first results were presented at the IEEE CloudNet 2019 conference [2019-5]. Regarding the object detection application proposed for unmanned vehicles, we investigated two different serverless deployment options. The Lambda platform of Amazon AWS was evaluated and compared to OpenWhisk, an open-source FaaS platform for private cloud environments. The main findings are summarized in an MSc Thesis [2019-2].
We also initiated research on an edge/cloud based AR gaming platform enabling the coordination of an extremely large number of players and devices while meeting strict latency bounds. The first results were presented and demonstrated at internal events of the industry partner, and a conference paper is scheduled for the next year.
[2019-2] J. Dóka, “Elaboration of a latency critical future 5G application and adaptation to a selected FaaS platform”, MSc Thesis, BME VIK 2019 (in Hungarian)
[2019-3] I. Pelle, J. Czentye, J. Dóka, B. Sonkoly, “Towards Latency Sensitive Cloud Native Applications: A Performance Study on AWS”, IEEE CLOUD 2019
[2019-4] J. Czentye, I. Pelle, A. Kern, B. P. Gerő, L. Toka, B. Sonkoly, “Optimizing Latency Sensitive Applications for Amazon’s Public Cloud Platform,” IEEE GLOBECOM 2019
[2019-5] B. Sonkoly, B. Gy. Nagy, J. Dóka, I. Pelle, G. Szabó, S. Rácz, J. Czentye, L. Toka, “Cloud-Powered Digital Twins: Is It Reality?”, IEEE CloudNet 2019
[2020-1] I. Pelle, F. Paolucci, B. Sonkoly, F. Cugini, “Telemetry-Driven Optical 5G Serverless Architecture for Latency-Sensitive Edge Computing”, Optical Fiber Communication Conference (OFC) 2020
[2020-4] I. Pelle, J. Czentye, J. Dóka, A. Kern, B. P. Gerő, B. Sonkoly, “Operating Latency Sensitive Applications on Public Serverless Edge Cloud Platforms”, IEEE Internet of Things Journal, doi: 10.1109/JIOT.2020.3042428, 2020 (IF: 9.94)
[2020-5] I. Pelle, J. Czentye, J. Dóka, B. Sonkoly, “Dynamic Latency Control of Serverless Applications Operated on AWS Lambda and Greengrass”, ACM SIGCOMM 2020 (Demo)
Results in 2019
An important part of the second year’s work was the design and development of new tools, methods, and automated solutions for enabling the development and operation of distributed applications running over heterogeneous cloud platforms. The main task here is the adoption of the cloud native paradigm for delay-sensitive applications. We carried out a comprehensive analysis of one of the most important and versatile cloud platforms, namely Amazon’s AWS. The most important results were summarized in a paper at the IEEE CLOUD 2019 conference [2019-3], which presents the performance characteristics of various AWS services and examines whether the current toolkit is suitable and capable of running delay-sensitive applications. Based on these results, we proposed an optimization module on top of AWS that can calculate cost-optimized software layouts with predefined delay requirements. On the one hand, we provided a formal model for the problem; on the other hand, we proposed efficient algorithms for solving the optimization problem under different relaxation conditions. This was the subject of a conference paper presented at the IEEE GLOBECOM 2019 conference at the end of the year [2019-4]. As an extension of that work, we adapted the cloud native approach and serverless operation for latency-sensitive IoT applications, building on Amazon’s AWS and its edge extension for IoT applications, called Greengrass. We proposed a novel system on top of AWS which can dynamically optimize and deploy the microservice-based IoT software layout based on live performance measurements. The first results and the revealed challenges are summarized in a paper submitted to IEEE/ACM CCGRID 2020. In addition, we investigated different aspects of human-robot collaboration applications, focusing on cloud-based deployments and realizations, and we identified several challenges faced when adjusting the software components to cloud platforms. The first results were presented at the IEEE CloudNet 2019 conference [2019-5]. Regarding the object detection application proposed for unmanned vehicles, we investigated two different serverless deployment options. The Lambda platform of Amazon AWS was evaluated and compared to OpenWhisk, an open-source FaaS platform for private cloud environments. The main findings are summarized in an MSc Thesis [2019-2].
[2019-2] J. Dóka, “Elaboration of a latency critical future 5G application and adaptation to a selected FaaS platform”, MSc Thesis, BME VIK 2019 (in Hungarian)
[2019-3] I. Pelle, J. Czentye, J. Dóka, B. Sonkoly, “Towards Latency Sensitive Cloud Native Applications: A Performance Study on AWS”, IEEE CLOUD 2019
[2019-4] J. Czentye, I. Pelle, A. Kern, B. P. Gerő, L. Toka, B. Sonkoly, “Optimizing Latency Sensitive Applications for Amazon’s Public Cloud Platform,” IEEE GLOBECOM 2019
[2019-5] B. Sonkoly, B. Gy. Nagy, J. Dóka, I. Pelle, G. Szabó, S. Rácz, J. Czentye, L. Toka, “Cloud-Powered Digital Twins: Is It Reality?”, IEEE CloudNet 2019
Results in 2018
We initiated the work on developing tools, methods, and automated mechanisms fostering the development and operation of future network applications and services which can be deployed and operated in distributed cloud environments (O2). It is crucial to connect these methods to FaaS platforms and to adapt applications automatically to these environments. As a first step towards this goal, we provided a comprehensive performance analysis of Amazon’s AWS platform, which is one of the most important and versatile FaaS platforms. This work was carried out in close collaboration with Ericsson Research, and we submitted a research paper on the first results to the IEEE CLOUD conference.
Create the orchestration & control plane for next generation services
Network services are realized on top of underlying resources. The main task of the orchestration & control plane is to operate the shared resources of the infrastructure optimally while provisioning the required high level services. We aim at proposing an orchestration & control plane architecture which supports i) flexible, dynamically configurable virtualization of different resources (such as compute, storage, and network resources), ii) optimized resource control based on configurable objectives (e.g., green operation, uniform utilization, policy enforcement), iii) inter-, and co-operation between operators, iv) co-operation between resource orchestration and platform services. We also address theoretical research tasks including the elaboration of advanced network embedding algorithms and traffic steering mechanisms.
Results in 2021
During the reporting period, several lines of work started earlier were completed and published in the relevant journals after the required revision cycles. One of our most important results is the survey article published in IEEE Communications Surveys & Tutorials (COMST) [2021-3], in which we reviewed and analyzed the literature on placement methods proposed for the edge cloud environment based on a newly developed taxonomy. In addition, we revised and successfully published an article summarizing our proposed and implemented VNF (Virtual Network Function) deployment algorithms for mobile and volatile 5G infrastructures [2021-4].
In the current period, our main goal was to develop new orchestration and control methods to support latency-sensitive applications in a cloud native environment where the software can take advantage of modern cloud technologies such as the serverless paradigm and the Function as a Service (FaaS) computing model. We proposed several theoretical solutions and working prototypes [2021-5], [2021-6], [2021-7]. Our long-term goal is to develop a platform that also supports real-time applications with strict operational guarantees. The first important result of this work is our paper published at the IEEE CLOUD conference (honored with a Best Paper Award), which proposes real-time task scheduling in a FaaS environment [2021-8].
In addition, we designed and implemented new automatic scaling solutions for telco applications that can be used in a cloud native environment. On the one hand, we proposed a horizontal scaling mechanism combining several machine learning algorithms for a Kubernetes-managed edge cloud environment and evaluated its performance in different operating ranges. The results were summarized in an IEEE Transactions on Network and Service Management (TNSM) journal article [2021-9]. Furthermore, we proposed a cloud resource optimization method that is able to operate (typically telco) application components cost-effectively, using a new, proactive horizontal scaling approach that learns user behavior. The proposed method was described in an article submitted to the IEEE ICC conference. We also designed a “data lake”-based platform for the reliable operation of cloud native telco applications, which can monitor all relevant parameters of the cloud infrastructure, extract and derive various types of information from them, and predict failures, which is essential in a production environment. The main results were presented in an article published at the IFIP/IEEE International Symposium on Integrated Network Management (IM) [2021-10].
[2021-3] B. Sonkoly, J. Czentye, M. Szalay, B. Németh, L. Toka, “Survey on Placement Methods in the Edge and Beyond”, IEEE Communications Surveys and Tutorials (COMST), 23:(4), pp. 2590-2629., 2021 (IF: 25.249)
[2021-4] B. Németh, N. Molner, J. Martin-Perez, C. J. Bernardos, A. de la Oliva, B. Sonkoly, “Delay and reliability-constrained VNF placement on mobile and volatile 5G infrastructure”, IEEE Transactions on Mobile Computing (TMC), 2021, DOI: 10.1109/TMC.2021.3055426 (IF: 5.577)
[2021-5] L. Toka, “Ultra-Reliable and Low-Latency Computing in the Edge with Kubernetes”, Springer Journal of Grid Computing, 19:(3), 2021 (IF: 3.986)
[2021-6] M. Szalay, P. Mátray, L. Toka, “State Management for Cloud-Native Applications”, MDPI Electronics, 10:(4), 2021 (IF: 2.397)
[2021-7] D. Haja, Z.R. Turanyi, L. Toka, “Location, Proximity, Affinity – The key factors in FaaS”, Infocommunications Journal, 12:(4), pp. 14-21, 2021
[2021-8] M. Szalay, P. Mátray, L. Toka, “Real-time task scheduling in a FaaS cloud”, in Proc. of IEEE International Conference on Cloud Computing (CLOUD), 2021
[2021-9] L. Toka, G. Dobreff, B. Fodor, B. Sonkoly, “Machine Learning-based Scaling Management for Kubernetes Edge Clusters”, IEEE Transactions on Network and Service Management (TNSM), 18:(1), pp. 958-972., 2021 (IF: 4.195)
[2021-10] L. Toka, G. Dobreff, D. Haja, M. Szalay, “Predicting cloud-native application failures based on monitoring data of cloud infrastructure”, in Proc. of IFIP/IEEE International Symposium on Integrated Network Management (IM), Piscataway (NJ), USA, 2021
Summary of the first three years (2018-2020)
One of the most significant parts of the work carried out during the first three years was the elaboration and implementation of an orchestration system responsible for the automatic deployment of new types of network applications/services, together with its major algorithms. The central component of this system is the engine which maps (embeds) the applications (service chains), defined as inter-connected software components, to the available resources while fulfilling the specified requirements (e.g., delay, bandwidth). This problem is a generalized, more complex version of VNE (Virtual Network Embedding), which is NP-hard. In order to obtain practical solutions, we developed various heuristic algorithms, evaluated them from different aspects, and compared them with MILP-based solutions. A special version of the algorithm was tailored to the softwarized data plane, where the low-level data plane resources are orchestrated efficiently according to given performance requirements. The results, together with a detailed complexity analysis, were published at the IEEE INFOCOM 2018 conference [2018-2]. That year at INFOCOM, a top-tier forum in our field, we also appeared with an accepted demonstration [2018-6] showcasing the self-healing feature of our orchestration system, and we published three workshop papers. First, we provided a hybrid algorithm that combines an online heuristic algorithm with an offline ILP-based solution [2018-3]. Basically, the system works on the basis of the online algorithm, but it runs the offline optimization periodically in the background and then reconfigures the system at certain intervals, moving the operation state toward the global optimum. In the second paper, we examined the business aspects of cooperating 5G operators [2018-5]. In the third paper [2018-4], we developed an extension to OpenStack, one of the most common VIM (Virtual Infrastructure Manager) platforms, in order to enable the consideration of network properties and capabilities during orchestration. As a result, OpenStack can be used for cloud edge resource management, where not only the central data center has computing resources but also the edge of the network (e.g., in base stations). The proof-of-concept prototype of our OpenStack orchestrator was presented at the IEEE INFOCOM 2019 conference [2019-6]. Our complete orchestration framework, which supports multi-operator operation and the automatic consideration of several constraints and requirements, such as end-to-end delay, bandwidth requirements, affinity, anti-affinity, and link anti-affinity, is summarized in a journal paper published in a JSAC special issue [2020-6]. The developed solutions and algorithms are expected to be incorporated into the products of our industry partner.
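For illustration, the following simplified greedy heuristic embeds a small chain onto a toy substrate, placing each function on a feasible node as close in latency as possible to the previous one while tracking the remaining delay budget. This is an illustrative sketch only, with an invented substrate, and not the algorithm published in [2018-2].

```python
import heapq

# Substrate: node CPU capacities and bidirectional links with latency (ms); all invented.
NODE_CPU = {"edge1": 4, "edge2": 4, "dc": 32}
LINKS = {("edge1", "dc"): 10.0, ("edge2", "dc"): 12.0, ("edge1", "edge2"): 3.0}

def neighbors(u):
    for (a, b), lat in LINKS.items():
        if a == u:
            yield b, lat
        elif b == u:
            yield a, lat

def dijkstra(src):
    """Latency of the shortest path from src to every reachable substrate node."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, lat in neighbors(u):
            nd = d + lat
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def embed_chain(vnfs, delay_budget_ms):
    """vnfs: list of (name, cpu_demand); returns a placement dict or None.
    Greedy rule: put each VNF on a feasible node as close (in latency) as
    possible to the previous VNF's host, tracking the remaining budget."""
    cpu = dict(NODE_CPU)
    placement, prev, used = {}, None, 0.0
    for name, demand in vnfs:
        dist = dijkstra(prev) if prev else {n: 0.0 for n in cpu}
        candidates = [(dist.get(n, float("inf")), n) for n in cpu
                      if cpu[n] >= demand
                      and used + dist.get(n, float("inf")) <= delay_budget_ms]
        if not candidates:
            return None              # reject: no feasible node within the budget
        hop, node = min(candidates)
        cpu[node] -= demand
        used += hop
        placement[name], prev = node, node
    return placement

print(embed_chain([("classifier", 2), ("firewall", 8), ("gateway", 1)], delay_budget_ms=25.0))
```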
In addition, in a dedicated journal paper [2020-7], we compared two architectural options, together with proof-of-concept prototypes and the corresponding embedding algorithms, which enable the provisioning of delay-sensitive IoT applications. On the one hand, we extended the VIM itself with network-awareness; on the other hand, we proposed a multi-layer orchestration system where an orchestrator is added on top of VIMs and network controllers to integrate different resource domains. We implemented fully-fledged solutions and conducted large-scale experiments to reveal the scalability characteristics of both approaches. We found that our VIM extension can be a valid option for single-provider setups encompassing as many as 100 edge domains and serving a few hundred customers. In contrast, our multi-layer orchestration system showed better scaling characteristics in a wider range of scenarios, at the cost of a more complex control plane including additional entities and novel APIs.
As a specific complement to our orchestration systems, we defined novel substrate models capturing the characteristics and dynamicity of the rapidly evolving communication infrastructure. These models were incorporated into our placement algorithms, thereby providing novel capabilities and management options to operators. The main achievements are summarized in an MDPI Sensors journal paper [2020-8].
Besides OpenStack, which operates virtual machines, we also considered container management systems. We extended today’s most widely used container orchestration system, Kubernetes, with the capability of taking network characteristics into consideration in its scheduler, which makes it suitable for edge resource management as well. The results were demonstrated at ACM SIGCOMM 2019 [2019-7], a flagship conference in our research field, and an extended version of the system was described in [2019-8]. We proposed another important extension to Kubernetes in order to enhance its scaling engine. More specifically, we designed and implemented a novel, proactive horizontal auto-scaling engine which is capable of handling the actual variability of incoming requests by making use of various AI-based forecast methods. In our implementation, these methods compete with each other via a short-term evaluation loop in order to always give the lead to the method that best suits the actual request dynamics. We found that our auto-scaling engine results in significantly fewer lost requests with slightly more provisioned resources compared to the default baseline. The first results were published at IEEE/ACM CCGRID 2020 [2020-9], a top-tier conference, while an extended version of the work was submitted to IEEE Transactions on Network and Service Management (TNSM).
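To give a flavor of the competing-forecaster idea, the toy sketch below ranks a few simple predictors by their recent error and derives a replica count from the winner's forecast. The forecasters, window sizes, and per-pod capacity are invented for the example and do not reproduce the engine published in [2020-9].

```python
import math
from collections import deque

def naive(history):                 # repeat the last observed value
    return history[-1]

def moving_average(history, k=5):   # average of the last k observations
    window = list(history)[-k:]
    return sum(window) / len(window)

def linear_trend(history):          # extrapolate the last two observations
    if len(history) < 2:
        return history[-1]
    return max(0.0, 2 * history[-1] - history[-2])

FORECASTERS = {"naive": naive, "ma": moving_average, "trend": linear_trend}
PER_POD_RPS = 100.0                 # illustrative capacity of a single replica
EVAL_WINDOW = 10                    # recent steps used to rank the forecasters

def recommend_replicas(history: deque) -> int:
    # Short-term evaluation loop: rank forecasters by their recent absolute error.
    errors = {name: 0.0 for name in FORECASTERS}
    hist = list(history)
    for t in range(max(1, len(hist) - EVAL_WINDOW), len(hist)):
        past, actual = deque(hist[:t]), hist[t]
        for name, f in FORECASTERS.items():
            errors[name] += abs(f(past) - actual)
    best = min(errors, key=errors.get)
    forecast = FORECASTERS[best](history)       # predict the next-step load
    return max(1, math.ceil(forecast / PER_POD_RPS))

# Example: a ramping request rate; the trend forecaster wins and scales ahead of demand.
load = deque([100, 120, 150, 190, 240, 300], maxlen=100)
print(recommend_replicas(load))
```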
A number of theoretical publications have also been published or submitted focusing on different aspects of the orchestration system. First, we investigated how the new orchestration methods can be applied to Big Data systems operating in geographically distributed environments [2019-9], [2019-10]. Second, we have been working in close cooperation with our industry partner on how to efficiently manage application-related data in a serverless environment and dynamically optimize its placement based on current needs and traffic conditions. The results were published at the IEEE CLOUD 2019 [2019-11] and IEEE CloudNet 2019 [2019-12] conferences. Moreover, a working prototype was also implemented [2020-10] and demonstrated at ACM SIGCOMM 2020 [2020-11]. Third, we addressed a special type of VNF (Virtual Network Function) placement. More precisely, we focused on the temporal aspects and conflicting traits of reliable, low-latency service deployment over a volatile network, where mobile compute nodes act as an extension of the cloud and edge computing infrastructure. This means that we can run VNFs on continuously moving mobile devices, such as robots or drones. The problem was formulated as a cost-minimizing VNF placement optimization, and an efficient heuristic was also proposed. The algorithms were extensively evaluated from various aspects by simulation on detailed real-world scenarios. The achievements were summarized in a journal paper submitted to IEEE Transactions on Mobile Computing (TMC). Finally, instead of central control, we proposed a wireless access sharing framework [2020-12] in which users have a say in optimizing their quality of service in the long term, and we tackled its analysis with the tool set of stochastic game theory. Our findings showed that greedy users become polite toward their counterparts when the load is relatively low, with the goal of preparing for situations with high load. The results were presented at the IEEE ICC 2020 conference.
Last but not least, as a comprehensive summary, we compiled and submitted a survey paper to IEEE Communications Surveys & Tutorials (COMST) providing a structured taxonomy of the vast body of research on the placement of tasks, containers, virtual machines, or any other unit of computation that can be moved across different types of computing infrastructure. Following the proposed taxonomy, the research papers were analyzed and categorized along several dimensions, such as the structure of the supported services, the problem formulation, the applied mathematical methods, and the objectives and constraints incorporated in the optimization problems. We also revealed some important research gaps in the current literature. (The paper has gone through the first round of reviews and is currently under revision.) Furthermore, another survey paper, focusing on Industrial IoT applications enabled by 5G, was also prepared during the reporting period [2020-13].
[2018-2] B. Sonkoly, M. Szabó, B. Németh, A. Majdán, G. Pongrácz, L. Toka, “FERO: Fast and Efficient Resource Orchestrator for a Data Plane Built on Docker and DPDK,” In Proc. of IEEE INFOCOM 2018
[2018-3] B. Németh, M. Szalay, J. Doka, M. Rost, S. Schmid, L. Toka, B. Sonkoly, “Fast and efficient network service embedding method with adaptive offloading to the edge,” In Proc. of IEEE INFOCOM (Workshops) 2018
[2018-4] D. Haja, M. Szabo, M. Szalay, A. Nagy, A. Kern, L. Toka, B. Sonkoly, “How to orchestrate a distributed OpenStack,” In Proc. of IEEE INFOCOM (Workshops) 2018
[2018-5] M. Cserep, A. Recse, R. Szabo, L. Toka, “Business network formation among 5G providers,” In Proc. of IEEE INFOCOM (Workshops) 2018
[2018-6] J. Harmatos, D. Jocha, R. Szabó, B. P. Gero, B. Németh, J. Czentye, B. Sonkoly, “Self-healing in multi-constrained multi-actor virtualization orchestration,” In Proc. of IEEE INFOCOM 2018 (Demo)
[2019-6] M. Szalay, D. Haja, J. Dóka, B. Sonkoly, L. Toka, “Turning OpenStack into a Fog Orchestrator”, IEEE INFOCOM 2019 (Demo)
[2019-7] D. Haja, M. Szalay, B. Sonkoly, G. Pongracz, L. Toka, “Sharpening Kubernetes for the Edge”, ACM SIGCOMM 2019 (Demo)
[2019-8] L. Toka, D. Haja, A. Korosi, B. Sonkoly, “Resource provisioning for highly reliable and ultra-responsive edge applications”, IEEE CloudNet 2019
[2019-9] D. Haja, B. Vass, L. Toka, “Improving Big Data Application Performance in Edge-Cloud Systems”, IEEE CLOUD 2019
[2019-10] D. Haja, B. Vass, L. Toka, “Towards making big data applications network-aware in edge-cloud systems”, IEEE CloudNet 2019
[2019-11] M. Szalay, M. Nagy, D. Gehberger, Z. Kiss, P. Matray, F. Nemeth, G. Pongracz, G. Retvari, L. Toka, “Industrial-Scale Stateless Network Functions”, IEEE CLOUD 2019
[2019-12] M. Szalay, P. Matray, L. Toka, “Minimizing state access delay for cloud-native network functions”, IEEE CloudNet 2019
[2020-6] B. Sonkoly, R. Szabó, B. Németh, J. Czentye, D. Haja, M. Szalay, J. Dóka, B. Gerő, D. Jocha, L. Toka, “5G Applications from Vision to Reality: Multi-Operator Orchestration”, IEEE Journal on Selected Areas in Communications (JSAC), 38:7 pp. 1401-1416, 2020 (IF: 11.42)
[2020-7] B. Sonkoly, D. Haja, B. Németh, M. Szalay, J. Czentye, R. Szabó, R. Ullah, K. Byung-Seo, L. Toka, “Scalable edge cloud platforms for IoT services”, Journal of Network and Computer Applications (JNCA), 170: Paper:102785, 2020 (IF: 5.57)
[2020-8] B. Németh, B. Sonkoly, “Advanced Computation Capacity Modeling for Delay-Constrained Placement of IoT Services”, MDPI Sensors, 20:14, 3830, 2020 (IF: 3.27)
[2020-9] L. Toka, G. Dobreff, B. Fodor, B. Sonkoly, “Adaptive AI-based auto-scaling for Kubernetes”, IEEE/ACM CCGRID 2020
[2020-10] M. Szalay, P. Mátray, L. Toka, “AnnaBellaDB: Key-Value Store Made Cloud Native”, 16th International Conference on Network and Service Management (CNSM), 2020
[2020-11] M. Szalay, P. Mátray, L. Toka, “AnnaBellaDB: a key value store for stateless network functions”, ACSM SIGCOMM 2020 (Demo)
[2020-12] L. Toka, M. Szalay, D. Haja, G. Szabó, S. Rácz, M. Telek, “To boost or not to boost: a stochastic game in wireless access networks”, IEEE International Conference on Communications (ICC) 2020
[2020-13] P. Varga, J. Pető, A. Franko, D. Balla, D. Haja, F. Janky, G. Sóos, D. Ficzere, M. Maliosz, L. Toka, “5G support for Industrial IoT Applications – Challenges, Solutions, and Research gaps”, MDPI Sensors, 20:3, 828, 2020 (IF: 3.27)
Results in 2019
An important part of this work phase was the development and implementation of the orchestration system responsible for the automatic deployment of the novel distributed applications. One of the central components of this system is the engine which maps the applications, defined as inter-connected software components, to the available resources while meeting the specified (e.g., delay) requirements. To solve this problem efficiently, we developed different algorithms and evaluated their performance. The proposed solutions have been implemented as part of real platforms, and two important demo papers have been published. First, we added our own orchestration component to the OpenStack system, which was presented at the IEEE INFOCOM 2019 conference [2019-6]. Second, we extended today’s most widely used container orchestration system, Kubernetes, with the capability of taking network characteristics into consideration in its scheduler. As a result, it is suitable for edge resource management. The results were demonstrated at ACM SIGCOMM 2019 [2019-7], a flagship conference in our research field, and an extended version of the system was described in [2019-8]. We proposed another important extension to Kubernetes in order to enhance its scaling engine. More specifically, we designed and implemented a novel, proactive horizontal auto-scaling engine which is capable of handling the actual variability of incoming requests by making use of various AI-based forecast methods. In our implementation, these methods compete with each other via a short-term evaluation loop in order to always give the lead to the method that best suits the actual request dynamics. We found that our auto-scaling engine results in significantly fewer lost requests with slightly more provisioned resources compared to the default baseline. A paper on the results was submitted to IEEE/ACM CCGRID 2020. A number of theoretical publications have also been published or submitted focusing on different aspects of the orchestration system. First, we investigated how the new orchestration methods can be applied to Big Data systems operating in geographically distributed environments [2019-9], [2019-10]. Second, we have been working in close cooperation with our industry partner on how to efficiently manage application-related data in a serverless environment and dynamically optimize its placement based on current needs and traffic conditions. The results were published at the IEEE CLOUD 2019 [2019-11] and IEEE CloudNet 2019 [2019-12] conferences. Third, we addressed a special type of VNF (Virtual Network Function) placement. More precisely, we focused on the temporal aspects and conflicting traits of reliable, low-latency service deployment over a volatile network, where mobile compute nodes act as an extension of the cloud and edge computing infrastructure. This means that we can run VNFs on continuously moving mobile devices, such as robots or drones. The problem was formulated as a cost-minimizing VNF placement optimization, and an efficient heuristic was also proposed. The algorithms were extensively evaluated from various aspects by simulation on detailed real-world scenarios. A paper summarizing this work was submitted to ACM Mobihoc 2020.
[2019-6] M. Szalay, D. Haja, J. Dóka, B. Sonkoly, L. Toka, “Turning OpenStack into a Fog Orchestrator”, IEEE INFOCOM 2019 (Demo)
[2019-7] D. Haja, M. Szalay, B. Sonkoly, G. Pongracz, L. Toka, “Sharpening Kubernetes for the Edge”, ACM SIGCOMM 2019 (Demo)
[2019-8] L. Toka, D. Haja, A. Korosi, B. Sonkoly, “Resource provisioning for highly reliable and ultra-responsive edge applications”, IEEE CloudNet 2019
[2019-9] D. Haja, B. Vass, L. Toka, “Improving Big Data Application Performance in Edge-Cloud Systems”, IEEE CLOUD 2019
[2019-10] D. Haja, B. Vass, L. Toka, “Towards making big data applications network-aware in edge-cloud systems”, IEEE CloudNet 2019
[2019-11] M. Szalay, M. Nagy, D. Gehberger, Z. Kiss, P. Matray, F. Nemeth, G. Pongracz, G. Retvari, L. Toka, “Industrial-Scale Stateless Network Functions”, IEEE CLOUD 2019
[2019-12] M. Szalay, P. Matray, L. Toka, “Minimizing state access delay for cloud-native network functions”, IEEE CloudNet 2019
Results in 2018
One of the most significant parts of the first year's work was the elaboration and implementation of an orchestration system responsible for the automatic deployment of new types of network applications/services, together with its major algorithms (O3). The central element of this system is the mapping (or embedding) of the service chains specified as inputs onto the available resources while fulfilling the specified requirements. This problem is a generalized, more complex version of VNE (Virtual Network Embedding), which is NP-hard. In order to obtain practical solutions, we developed various heuristic algorithms, evaluated them from different aspects, and compared them with MILP-based solutions. A special version of the algorithm was tailored to the softwarized data plane, where the low-level data plane resources are orchestrated efficiently according to given performance requirements. The results, together with a detailed complexity analysis, were published at the IEEE INFOCOM 2018 conference [2018-2]. At the INFOCOM conference, a top-tier forum in our field, we also published three workshop papers. First, we provided a hybrid algorithm that combines an online heuristic algorithm with an offline ILP-based solution [2018-3]. Basically, the system works on the basis of the online algorithm, but it runs the offline optimization periodically in the background and then reconfigures the system at certain intervals, moving the operation state toward the global optimum. In the next paper [2018-4], we developed an extension for OpenStack, one of the most common VIM (Virtual Infrastructure Manager) platforms, in order to enable the consideration of network properties and capabilities during orchestration. As a result, OpenStack can be used for cloud edge resource management, where not only the central data center has computing resources but also the edge of the network (e.g., in base stations). In the third paper, we examined the business aspects of cooperating 5G operators [2018-5]. Moreover, we appeared at this forum with an accepted demonstration as well [2018-6], showcasing the self-healing feature of our orchestration system. The complete orchestration framework, which supports multi-operator operation and the automatic consideration of several constraints and requirements, such as end-to-end delay, bandwidth requirements, affinity, anti-affinity, and link anti-affinity, is summarized in a submitted journal paper. The developed solutions and algorithms are expected to be incorporated into the products of our industrial partner.
[2018-2] B. Sonkoly, M. Szabó, B. Németh, A. Majdán, G. Pongrácz, L. Toka, “FERO: Fast and Efficient Resource Orchestrator for a Data Plane Built on Docker and DPDK,” In Proc. of IEEE INFOCOM 2018
[2018-3] B. Németh, M. Szalay, J. Doka, M. Rost, S. Schmid, L. Toka, B. Sonkoly, “Fast and efficient network service embedding method with adaptive offloading to the edge,” In Proc. of IEEE INFOCOM (Workshops) 2018
[2018-4] D. Haja, M. Szabo, M. Szalay, A. Nagy, A. Kern, L. Toka, B. Sonkoly, “How to orchestrate a distributed OpenStack,” In Proc. of IEEE INFOCOM (Workshops) 2018
[2018-5] M. Cserep, A. Recse, R. Szabo, L. Toka, “Business network formation among 5G providers,” In Proc. of IEEE INFOCOM (Workshops) 2018
[2018-6] J. Harmatos, D. Jocha, R. Szabó, B. P. Gero, B. Németh, J. Czentye, B. Sonkoly, “Self-healing in multi-constrained multi-actor virtualization orchestration,” In Proc. of IEEE INFOCOM 2018 (Demo)
Enable high data plane performance in/under/across the clouds
At the end of the day, the atomic components of the services run at the data plane as software on top of general-purpose commodity servers. It is crucial to understand the performance of this virtualized data plane. Therefore, we analyze the impact of different hardware and virtualization platforms, network function logic, and traffic traces on the performance. Based on the revealed shortcomings, we propose solutions that mitigate the problems and improve the performance. In addition, we define abstract models for the data plane and implement local orchestrator modules controlling low-level resources in order to further optimize data plane operation. Creating novel network functions is also addressed.
Results in 2021
An important result of the research related to the high-performance data plane, and of the competence built up in recent years, is the survey paper published in the journal ACM Computing Surveys [2021-11]. In addition, we continued the adaptation of the service mesh concept to the telco environment. The methods traditionally developed for HTTP traffic are not in themselves suitable for implementing telco services. Therefore, we are working on an open source framework [2021-12] that can support any transport- and application-layer protocol, such as the UDP-type traffic essential for telco applications, in addition to all the functions provided by the Kubernetes container management system.
In addition, we examined a new approach to the efficient handling of handovers in 5G and subsequent (6G) systems. Instead of complicated control plane mechanisms, similar functionality can be achieved by using multipath transport mechanisms integrated into the system. This was investigated with the (multipath) TCP and (multipath) QUIC protocols, and the advantages and disadvantages of the approach were revealed based on laboratory measurements and emulations. The results will be summarized in a journal article that we plan to submit in the first half of next year.
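For illustration, on recent Linux kernels (5.6 and later) multipath TCP can be requested per socket at creation time, and the kernel falls back to plain TCP when the path or peer does not support it. The minimal sketch below shows only this; the endpoint is hypothetical and the measurement setup described above is not reproduced here.

```python
import socket

# IPPROTO_MPTCP is 262 on Linux (kernel 5.6+); newer Python versions expose it
# as socket.IPPROTO_MPTCP, older ones do not, hence the fallback constant.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)

def open_mptcp_connection(host: str, port: int) -> socket.socket:
    """Open a connection that uses MPTCP if the kernel and path support it;
    otherwise the kernel transparently falls back to regular TCP."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    s.connect((host, port))
    return s

# Hypothetical usage: with multiple subflows (e.g., Wi-Fi and cellular), traffic
# can migrate between paths during a handover without breaking the connection.
# conn = open_mptcp_connection("198.51.100.10", 443)
```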
Last but not least, Tamás Lévai’s long-standing collaboration with Barath Raghavan and his group (University of Southern California, USC) resulted in an outstanding publication at the ACM SIGCOMM conference [2021-13]. In this work, natural language processing methods are used to analyze and validate protocol standards that are not always accurately formulated in RFC documents, to debug them, and finally to generate the implementation in a semi-automated way. The concept is supported by concrete, simple RFC examples.
[2021-11] O. Michel, R. Bifulco, G. Rétvári, S. Schmid, “The Programmable Data Plane: Abstractions, Architectures, Algorithms, and Applications”, ACM Computing Surveys, 54:(4), pp. 1–36, 2021 (IF: 10.282)
[2021-12] G. Rétvári, F. Németh, R. Váradi, K. Dávid, “l7mp: A L7 Multiprotocol Proxy and Service Mesh”, https://l7mp.io, 2021, source code: https://github.com/l7mp/l7mp
[2021-13] J. Yen, T. Lévai, Q. Ye, X. Ren, R. Govindan, B. Raghavan, “Semi-automated protocol disambiguation and code generation.” in Proc. of ACM SIGCOMM, 2021
Summary of the first three years (2018-2020)
As a result of the activities addressing the implementation of the high-performance data plane, several significant results were achieved and important papers were published. The previously mentioned INFOCOM 2018 paper [2018-2] combined a solution based on Docker, Open vSwitch and Intel’s DPDK library with the resource management component; as a result, it provides mechanisms to control the latency characteristics of the data plane and to manage the low-level resources efficiently. Besides, we defined various software data plane pipelines for telco functions and implemented them in several software switches. We developed a general framework and elaborated a novel methodology to evaluate different software switches based on given aspects (e.g., scaling characteristics). We carried out extensive measurements in high-speed environments and evaluated several implementations. The results were published in a JSAC journal paper [2018-7]. In addition, a survey paper was also prepared on the programmable data plane [2018-8].
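A skeleton of the kind of scaling measurement this methodology relies on is sketched below; the switch and traffic generator handles and their methods (clear, install_rules, measure_mbps) are hypothetical placeholders for the real measurement infrastructure used in [2018-7]:

    from statistics import mean

    def scaling_benchmark(switch, generator, flow_counts, repetitions=5):
        """Measure how forwarding throughput degrades as the number of
        installed flow rules grows; each point is the mean of several runs."""
        results = {}
        for n in flow_counts:
            switch.clear()                  # hypothetical: empty the flow table
            switch.install_rules(n)         # hypothetical: install n rules
            samples = [generator.measure_mbps(duration_s=10)
                       for _ in range(repetitions)]
            results[n] = mean(samples)
        return results

    # Example: scaling_benchmark(sw, gen, [100, 1_000, 10_000, 100_000])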
Furthermore, a new virtual switch concept, together with an implementation designed for cloud environments, was presented at one of the most prestigious technical forums, the USENIX ATC 2019 conference [2019-13]. In addition, two papers were presented at the ACM CoNEXT 2019 conference, which is an outstanding achievement. The first paper [2019-14] examines the vulnerability of the widely used OVS virtual switch and describes the attack methods in detail, together with potential mitigation techniques. In the second one [2019-15], a new formal description based on normal forms is proposed for match-action based packet processing devices. Moreover, an IEEE survey journal paper was also published on performance acceleration techniques for virtual network functions [2019-16].
We put significant effort into the design and implementation of a dedicated scheduler for data flow graphs at the data plane layer that takes service-level objectives into consideration. The main results were summarized in a conference paper [2020-14] presented at USENIX NSDI 2020, a top-tier venue in our research field. Our proposal, called Batchy, is a scheduler for run-to-completion packet processing engines, which uses controlled queuing to efficiently reconstruct fragmented batches in accordance with strict service-level objectives (SLOs). Batchy comprises a runtime profiler to quantify the batch-processing gain of different processing functions, an analytical model to fine-tune queue backlogs, a new queuing abstraction to realize this model in run-to-completion execution, and a one-step receding horizon controller that adjusts backlogs across the pipeline. Extensive experiments with five networking use cases taken from an official 5G benchmark suite validated the concept.
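A highly simplified sketch of the underlying control idea (not the published Batchy controller; the linear delay model and the parameter names are assumptions) greedily grows per-queue backlog thresholds so that batches stay as large as possible while the summed worst-case delay of the pipeline remains within the SLO:

    def adjust_backlogs(modules, delay_slo_us, max_batch=32):
        """One control step of a toy backlog controller.

        Each module is a dict with:
          'backlog'    - current batch-size threshold (packets)
          'per_batch'  - fixed per-batch processing cost (microseconds)
          'per_packet' - per-packet processing cost (microseconds)

        Larger backlogs amortize the per-batch cost over more packets but add
        queuing delay, so backlogs are grown only while the delay budget holds.
        """
        def total_delay():
            return sum(m['per_batch'] + m['backlog'] * m['per_packet']
                       for m in modules)

        improved = True
        while improved:
            improved = False
            for m in modules:
                m['backlog'] += 1
                if total_delay() <= delay_slo_us and m['backlog'] <= max_batch:
                    improved = True          # keep the larger batch
                else:
                    m['backlog'] -= 1        # revert: SLO or batch limit hit
        return modules

The real controller replaces this greedy loop with the model-based, one-step receding horizon computation described above, but the trade-off it navigates is the same: batch-processing gain versus SLO-bounded queuing delay.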
During the reporting period, we continued the research work on SDN (and beyond-SDN) topics. As widespread SDN adoption has not yet occurred, due to the lack of a viable incremental deployment path and the relatively immature state of SDN-capable devices on the market, we proposed HARMLESS, a new SDN switch design that seamlessly adds SDN capability to legacy network gear by emulating the OpenFlow switch OS in a separate software switch component. Our proposal was presented in an IEEE/ACM Transactions on Networking (ToN) journal paper [2020-15]. Another problem that we addressed was the compilation of P4 source code (P4 can be considered a next-generation SDN concept). More specifically, we focused on a critical step of P4 compilation: finding a feasible and efficient mapping of the high-level P4 source code constructs to the physical resources exposed by the underlying hardware, while meeting the data and control flow dependencies in the program. In an ACM EuroP4’20 workshop paper [2020-16], we took a new look at the algorithmic aspects of this problem, with the motivation to understand the fundamental theoretical limits, to obtain better P4 pipeline embeddings, and to speed up practical P4 compilation times. In another research paper, presented at ACM SOSR 2020 [2020-17], we challenged the common assumption that SDN networks shall be run only at the lowest layers of the stack, i.e., L2 and L3. Using data center networks providing virtualized services as a use case, we showed how state-of-the-art solutions already employ some application-level processing via a central controller. With this in mind, we questioned whether the lessons learned from a decade of SDN networking can also be extended to the upper layers. We made the case for a full-stack SDN framework that encompasses all protocol layers in the network stack, and called for further research in the area.
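The core of the mapping step mentioned above can be pictured as assigning match-action tables to physical pipeline stages so that every dependency points forward in the pipeline and no stage exceeds its capacity. The sketch below is a simple list-scheduling illustration of this view with an assumed per-stage table limit; it is not the algorithm of [2020-16]:

    from collections import defaultdict, deque

    def assign_stages(tables, deps, stage_capacity):
        """Assign each match-action table to the earliest pipeline stage that
        comes after all tables it depends on and that still has free slots.

        tables: iterable of table names
        deps:   dict mapping a table to the set of tables it depends on
        Returns a dict mapping each table to its stage index.
        """
        indeg = {t: len(deps.get(t, ())) for t in tables}
        children = defaultdict(list)
        for t, ds in deps.items():
            for d in ds:
                children[d].append(t)

        stage_of, load = {}, defaultdict(int)
        ready = deque(t for t in tables if indeg[t] == 0)
        while ready:
            t = ready.popleft()
            s = max((stage_of[d] + 1 for d in deps.get(t, ())), default=0)
            while load[s] >= stage_capacity:     # spill to the next stage if full
                s += 1
            stage_of[t], load[s] = s, load[s] + 1
            for c in children[t]:
                indeg[c] -= 1
                if indeg[c] == 0:
                    ready.append(c)
        return stage_of

    # Example: assign_stages(["acl", "route", "nat"],
    #                        {"route": {"acl"}, "nat": {"route"}}, stage_capacity=2)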
Finally, we focused on the adaptation of the service mesh concept to non-HTTP applications. We presented our results at ServiceMeshCon NA 2020 [2020-18], collocated with KubeCon + CloudNativeCon North America 2020, the Cloud Native Computing Foundation’s flagship conference and one of the most important industry events. We argued that, despite the increasing use of HTTP as a common application transport protocol, there are numerous legacy non-HTTP applications that would greatly benefit from the traffic management and monitoring capabilities provided by a service mesh. Taking a real telco media-plane use case as a demonstrator, we made the case for l7mp, a joint industry-academia effort to build a service mesh prototype with first-class support for legacy applications. L7mp aspires to serve as an incubator project to experiment with radically new service mesh designs and features.
[2018-2] B. Sonkoly, M. Szabó, B. Németh, A. Majdán, G. Pongrácz, L. Toka, “FERO: Fast and Efficient Resource Orchestrator for a Data Plane Built on Docker and DPDK,” In Proc. of IEEE INFOCOM 2018
[2018-7] T. Lévai, G. Pongrácz, P. Megyesi, P. Vörös, S. Laki, F. Németh, G. Rétvári, “The Price for Programmability in the Software Data Plane: The Vendor Perspective”, IEEE Journal on Selected Areas in Communications (JSAC), 36:12 pp. 2621-2630, 2018 (IF: 11.42)
[2018-8] R. Bifulco, G. Rétvári, “A Survey on the Programmable Data Plane: Abstractions, Architectures, and Open Problems”, in Proc. of IEEE HPSR, 2018
[2019-13] K. Thimmaraju, S. Hermak, G. Rétvári, S. Schmid, “MTS: bringing multi-tenancy to virtual networking”, USENIX ATC 2019
[2019-14] L. Csikor, D. M. Divakaran, M. S. Kang, A. Korosi, B. Sonkoly, D. Haja, D. Pezaros, S. Schmid, G. Rétvári, “Tuple Space Explosion: A Denial-of-Service Attack Against a Software Packet Classifier”, ACM CoNEXT 2019
[2019-15] F. Németh, M. Chiesa, G. Rétvári, “Normal Forms for Match-Action Programs”, ACM CoNEXT 2019
[2019-16] L. Linguaglossa, S. Lange, S. Pontarelli, G. Rétvári, D. Rossi, T. Zinner, R. Bifulco, M. Jarschel, G. Bianchi, “Survey of Performance Acceleration Techniques for Network Function Virtualization”, Proceedings of the IEEE, Volume 107, Issue 4, April 2019
[2020-14] T. Lévai, F. Németh, B. Raghavan, G. Rétvári, “Batchy: Batch-scheduling Data Flow Graphs with Service-level Objectives”, 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI), Santa Clara, CA, USA, 2020
[2020-15] L. Csikor, M. Szalay, G. Rétvári, G. Pongrácz, D. P. Pezaros, L. Toka, “Transition to SDN is HARMLESS: Hybrid Architecture for Migrating Legacy Ethernet Switches to SDN”, IEEE/ACM Transactions on Networking (ToN), 28:1, pp. 275-288, 2020 (IF: 3.32)
[2020-16] B. Vass, E. Bérczi-Kovács, C. Raiciu, G. Rétvári, “Compiling Packet Programs to Reconfigurable Switches: Theory and Algorithms”, ACM 3rd P4 Workshop in Europe (EuroP4’20), Barcelona, Spain, 2020
[2020-17] G. Antichi, G. Rétvári, “Full-stack SDN: The next big challenge?”, ACM Symposium on SDN Research (SOSR), San Jose, CA, USA, 2020
[2020-18] G. Rétvári, “L7mp: A Multiprotocol Service Mesh for Legacy Applications”, ServiceMeshCon NA 2020, collocated with KubeCon + CloudNativeCon North America, 2020
Results in 2019
As a result of the activities addressing the implementation of the high-performance data plane, several significant results were achieved and important papers were published. On the one hand, a new virtual switch concept, together with an implementation designed for cloud environments, was presented at one of the most prestigious technical forums, the USENIX ATC 2019 conference [2019-13]. On the other hand, two papers were accepted for publication at the ACM CoNEXT 2019 conference, which is an outstanding achievement. The first paper [2019-14] examines the vulnerability of the widely used OVS virtual switch and describes the attack methods in detail, together with potential mitigation techniques. In the second one [2019-15], a new formal description based on normal forms is proposed for match-action based packet processing devices. In addition, an IEEE survey journal paper was also published on performance acceleration techniques for virtual network functions [2019-16]. Finally, our paper on scheduling data flow graphs at the data plane layer with service-level objectives taken into consideration has recently been accepted for publication at USENIX NSDI 2020 [2019-17].
[2019-13] K. Thimmaraju, S. Hermak, G. Rétvári, S. Schmid, “MTS: bringing multi-tenancy to virtual networking”, USENIX ATC 2019
[2019-14] L. Csikor, D. M. Divakaran, M. S. Kang, A. Korosi, B. Sonkoly, D. Haja, D. Pezaros, S. Schmid, G. Rétvári, “Tuple Space Explosion: A Denial-of-Service Attack Against a Software Packet Classifier”, ACM CoNEXT 2019
[2019-15] F. Németh, M. Chiesa, G. Rétvári, “Normal Forms for Match-Action Programs”, ACM CoNEXT 2019
[2019-16] L. Linguaglossa, S. Lange, S. Pontarelli, G. Rétvári, D. Rossi, T. Zinner, R. Bifulco, M. Jarschel, G. Bianchi, “Survey of Performance Acceleration Techniques for Network Function Virtualization”, Proceedings of the IEEE, Volume 107, Issue 4, April 2019
[2019-17] T. Lévai, F. Németh, B. Raghavan, G. Rétvári, “Batchy: Batch-scheduling Data Flow Graphs with Service-level Objectives”, USENIX NSDI 2020 (accepted)
Results in 2018
Our high-performance data plane activities consisted of several parts (O4). On the one hand, the previously mentioned INFOCOM paper [2018-2] combined a solution based on Docker, Open vSwitch and Intel’s DPDK library with the resource management component. On the other hand, in close collaboration with Ericsson, we have defined various software data plane pipelines for telco functions and implemented them in various software switches. We have developed a general framework and elaborated a novel methodology to evaluate different software switches based on given aspects (e.g., scaling characteristics). We have carried out extensive measurements in high-speed environments and evaluated several implementations. The results have been published in a JSAC journal paper [2018-7]. In addition, a survey paper was also prepared on the programmable data plane [2018-8].
[2018-2] B. Sonkoly, M. Szabó, B. Németh, A. Majdán, G. Pongrácz, L. Toka, “FERO: Fast and Efficient Resource Orchestrator for a Data Plane Built on Docker and DPDK,” In Proc. of IEEE INFOCOM 2018
[2018-7] T. Lévai, G. Pongrácz, P. Megyesi, P. Vörös, S. Laki, F. Németh, G. Rétvári, “The Price for Programmability in the Software Data Plane: The Vendor Perspective”, IEEE Journal on Selected Areas in Communications (JSAC), 36:12 pp. 2621-2630, 2018 (IF: 11.42)
[2018-8] R. Bifulco, G. Rétvári, “A Survey on the Programmable Data Plane: Abstractions, Architectures, and Open Problems”, in Proc. of IEEE HPSR, 2018