Our paper “The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism” has been released on arXiv.
This paper presents scalable hybrid-parallel algorithms for training two large-scale 3D convolutional neural networks: the CosmoFlow network and the 3D U-Net. For the CosmoFlow network, we scale training to 2k V100 GPUs with a 64x larger spatial input size by partitioning each data sample across multiple GPUs.
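To give a rough feel for what partitioning a single data sample involves, here is a minimal NumPy sketch. It is not the paper's implementation. It assumes a sample is split along the depth axis only, uses plain Python lists in place of GPUs, and uses a 3-tap filter along depth as a stand-in for a 3D convolution. The names `split_with_halo` and `depth_filter_valid` are hypothetical. Each shard carries a few extra "halo" slices copied from its neighbours, so the per-shard results, once concatenated, match the result computed on the whole sample.

```python
import numpy as np


def split_with_halo(volume, num_parts, halo):
    """Split a (D, H, W) volume along depth into num_parts shards.

    Each shard is extended by `halo` slices on both sides. Interior halos
    are copies of neighbouring slices; outer halos are zero padding.
    """
    depth = volume.shape[0]
    assert depth % num_parts == 0, "depth must divide evenly in this sketch"
    step = depth // num_parts
    padded = np.pad(volume, ((halo, halo), (0, 0), (0, 0)))
    return [padded[i * step:(i + 1) * step + 2 * halo] for i in range(num_parts)]


def depth_filter_valid(x, kernel):
    """Apply a 1D filter along the depth axis without padding ("valid")."""
    k = len(kernel)
    out_depth = x.shape[0] - k + 1
    out = np.zeros((out_depth,) + x.shape[1:], dtype=x.dtype)
    for j, w in enumerate(kernel):
        out += w * x[j:j + out_depth]
    return out


# Illustrative sizes only; they are not taken from the paper.
rng = np.random.default_rng(0)
volume = rng.random((8, 4, 4))
kernel = [0.25, 0.5, 0.25]
halo = len(kernel) // 2

# Reference: filter the whole zero-padded sample at once.
full = depth_filter_valid(np.pad(volume, ((halo, halo), (0, 0), (0, 0))), kernel)

# Partitioned: filter each shard independently, then stitch the outputs.
shards = split_with_halo(volume, num_parts=4, halo=halo)
stitched = np.concatenate([depth_filter_valid(s, kernel) for s in shards], axis=0)

print(np.allclose(full, stitched))
```

In a real multi-GPU setting the halo slices would have to be exchanged between devices before each convolution, which is a communication cost this single-process sketch does not model.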