Placement of Parameter Server in Wide Area Network Topology for Geo-Distributed Machine Learning

Yongyao Li, Chenyu Fan, Xiaoning Zhang, and Yufeng Chen

10.23919/JCN.2023.000021

Abstract: Machine learning (ML) is extensively used in a wide range of real-world applications that require data from all around the world to pursue high accuracy of a global model. Unfortunately, it is impossible to transmit all the gathered raw data to a central data center for training due to data privacy, data sovereignty, and high communication cost. This motivates geo-distributed machine learning (Geo-DML), which trains a global ML model across multiple data centers, with the bottleneck of high communication cost over limited wide area network (WAN) bandwidth. In this paper, we study the problem of parameter server (PS) placement in the PS architecture for communication-efficient Geo-DML. Our optimization aims to select an appropriate data center as the PS for the global training algorithm based on communication cost. We prove that the PS placement problem is NP-hard. Further, we develop an approximation algorithm that solves the problem using the randomized rounding method. To validate the performance of the proposed algorithm, we conduct large-scale simulations; the results on two typical carrier network topologies show that our algorithm reduces communication cost by up to 61.78% on the B4 topology and 21.78% on the Internet2 topology.

Index terms: Geo-distributed machine learning, routing, wide area networks.
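To make the placement idea in the abstract concrete, the following is a minimal sketch: pick the data center whose total synchronization cost to all other data centers is smallest. The cost model here (model size divided by pairwise WAN bandwidth, counted once for the gradient push and once for the model pull) and all numeric values are illustrative assumptions, not the paper's actual formulation or algorithm.

```python
def place_ps(bandwidth, model_size):
    """Return (index, cost) of the data center minimizing total transfer time.

    bandwidth[i][j]: assumed WAN bandwidth (e.g., Gb/s) between data centers i, j.
    model_size: size of the model update exchanged each round (e.g., Gb).
    """
    n = len(bandwidth)
    best, best_cost = None, float("inf")
    for ps in range(n):
        # Each worker both pushes gradients to and pulls the model from the PS,
        # hence the factor of 2 per worker link.
        cost = sum(2 * model_size / bandwidth[ps][w]
                   for w in range(n) if w != ps)
        if cost < best_cost:
            best, best_cost = ps, cost
    return best, best_cost

# Toy 3-data-center example with symmetric, hypothetical bandwidths.
bw = [
    [0, 10, 2],
    [10, 0, 5],
    [2, 5, 0],
]
ps, cost = place_ps(bw, model_size=1.0)
```

This brute-force enumeration is only meant to illustrate the objective; the paper instead handles the (NP-hard) joint placement-and-routing problem with a randomized-rounding approximation.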