Efficient End-to-End Failure Probing Matrix Construction in Data Center Networks

Zequn Jia, Qiang Liu, Ying He, Qianqian Wu, Ren Ping Liu, and Yantao Sun

10.23919/JCN.2023.000029

Abstract : Data centers play an essential role in the functioning of modern society. However, failures are unavoidable in data center networks (DCNs) and negatively affect all applications running on them. Researchers are therefore interested in the rapid detection and localization of failures in DCNs. In this paper, we present a theoretical model for analyzing end-to-end failure detection methods in DCNs. Our numerical results verify that the proposed theoretical model is accurate. In addition, we propose an algorithm that constructs probing matrices based on an enhanced probing path selection indicator. We also apply deep reinforcement learning (DRL) to the problem and propose a DRL-based probing matrix construction algorithm. Our experimental results show that both proposed algorithms achieve higher detection accuracy than existing methods. We also discuss the scenarios in which each algorithm is applicable and can improve detection accuracy or construction speed.

Index terms : Data center network, deep reinforcement learning, failure detection, probing matrix construction.