Âé¶¹´«Ã½AV

Factored LT and Factored Raptor Codes for Large-Scale Distributed Matrix Multiplication

Submitted by admin on Mon, 10/28/2024 - 01:24

We propose two coding schemes for distributed matrix multiplication in the presence of stragglers. These coding schemes are adaptations of Luby Transform (LT) codes and Raptor codes to distributed matrix multiplication and are termed Factored LT (FLT) codes and Factored Raptor (FRT) codes. We show that all nodes in the Tanner graph of a randomly sampled code have a tree-like neighborhood with high probability. This ensures that the density evolution analysis gives a reasonable estimate of the average recovery threshold of FLT codes.

Compressing Gradients by Exploiting Temporal Correlation in Momentum-SGD

Submitted by admin on Mon, 10/28/2024 - 01:24

An increasing bottleneck in decentralized optimization is communication. Bigger models and growing datasets mean that decentralization of computation is important and that the amount of information exchanged is quickly growing. While compression techniques have been introduced to cope with the latter, none has considered leveraging the temporal correlations that exist in consecutive vector updates. An important example is distributed momentum-SGD where temporal correlation is enhanced by the low-pass-filtering effect of applying momentum.

Coded Computing via Binary Linear Codes: Designs and Performance Limits

Submitted by admin on Mon, 10/28/2024 - 01:24

We consider the problem of coded distributed computing where a large linear computational job, such as a matrix multiplication, is divided into $k$ smaller tasks, encoded using an $(n,k)$ linear code, and performed over $n$ distributed nodes. The goal is to reduce the average execution time of the computational job. We provide a connection between the problem of characterizing the average execution time of a coded distributed computing system and the problem of analyzing the error probability of codes of length $n$ used over erasure channels.