PODC ’19- Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing

Full Citation in the ACM Digital Library

SESSION: Awards

2019 Edsger W. Dijkstra Prize in Distributed Computing

Lorenzo Alvisi
Shlomi Dolev
Faith Ellen (chair)
Idit Keidar
Fabian Kuhn
Jukka Suomela

The committee decided to award the 2019 Edsger W. Dijkstra Prize in Distributed Computing to Alessandro Panconesi and Aravind Srinivasan for their paper Randomized Distributed Edge Coloring via an Extension of the Chernoff-Hoeffding Bounds, SIAM Journal on Computing, volume 26, number 2, 1997, pages 350-368. A preliminary version of this paper appeared as Fast Randomized Algorithms for Distributed Edge Coloring, Proceedings of the Eleventh Annual ACM Symposium Principles of Distributed Computing (PODC), 1992, pages 251-262.

The paper presents a simple synchronous algorithm in which processes at the nodes of an undirected network color its edges so that the edges adjacent to each node have different colors. It is randomized, using 1.6Δ + O(log^1+ζn) colors and O(log n) rounds with high probability for any constant ζ>0, where n is the number of nodes and is the maximum degree of the nodes. This was the first nontrivial distributed algorithm for the edge coloring problem and has influenced a great deal of follow-up work. Edge coloring has applications to many other problems in distributed computing such as routing, scheduling, contention resolution, and resource allocation.

In spite of its simplicity, the analysis of their edge coloring algorithm is highly nontrivial. Chernoff-Hoeffding bounds, which assume random variables to be independent, cannot be used. Instead, they develop upper bounds for sums of negatively correlated random variables, for example, which arise when sampling without replacement. More generally, they extend Chernoff-Hoeffding bounds to certain random variables they call λ-correlated. This has directly inspired more specialized concentration inequalities. The new techniques they introduced have also been applied to the analyses of important randomized algorithms in a variety of areas including optimization, machine learning, cryptography, streaming, quantum computing, and mechanism design.

2019 Principles of Distributed Computing Doctoral Dissertation Award

Prasad Jayanti
Nancy A. Lynch
Boaz Patt-Shamir
Ulrich Schmid (chair)

The winner of the 2019 Principles of Distributed Computing Doctoral Dissertation Award is Dr. Sepehr Assadi for his dissertation Combinatorial Optimization on Massive Datasets: Streaming, Distributed, and Massively Parallel Computation, written under the supervision of Prof. Sanjeev Khanna at the University of Pennsylvania.

The thesis resolves a number of long-standing problems in the exciting and still relatively new area of sublinear computation. The area of sublinear computation focuses on design of algorithms that use sublinear space, time, or communication to solve global optimization problems on very large datasets. In addition to addressing a wide range of different problems, comprising graph optimization problems (matching, vertex cover, and connectivity), submodular optimization (set cover and maximum coverage), and resource-constrained optimization (combinatorial auctions and learning), these problems are studied in three different models of computation, namely, streaming algorithms, multiparty communication, and massively parallel computation (MPC). The thesis also reveals interesting relations between these different models, including generic algorithmic and analysis techniques that can be applied in all of them.

For many fundamental optimization problems, the thesis gives asymptotically matching algorithmic and intractability results, completely resolving several long-standing problems. This is accomplished by using a broad spectrum of mathematical methods in very detailed and intricate proofs. In addition to a wide variety of classic techniques, ranging from graph theory, combinatorics, probability, linear algebra and calculus, it also makes heavy use of communication complexity and information theory, for example.

Sepehr’s dissertation work has been published in a remarkably large number of top-conference papers. It received multiple best paper awards and multiple special issue invitations, as well as two invitations to the Highlights of Algorithms (HALG) conference. Due to its contributions to the field of distributed computing and all the merits mentioned above, the award committee unanimously selected this thesis as the winner of the 2019 Principles of Distributed Computing Doctoral Dissertation Award.

SESSION: Keynote Lecture 1

Local Computation Algorithms

Ronitt Rubinfeld

Consider a setting in which inputs to and outputs from a computational problem are so large, that there is not time to read them in their entirety. However, if one is only interested in small parts of the output at any given time, is it really necessary to solve the entire computational problem? Is it even necessary to view the whole input? We survey recent work in the model of “local computation algorithms” which for a given input, supports queries by a user to values of specified bits of a legal output. The goal is to design local computation algorithms in such a way that very little of the input needs to be seen in order to determine the value of any single bit of the output. Though this model describes sequential computations, techniques from local distributed algorithms have been extremely important in designing efficient local computation algorithms.

In this talk, we describe results on a variety of problems for which sublinear time and space local computation algorithms have been developed — we will give special focus to finding maximal independent sets and sparse spanning graphs.

SESSION: Session 1

Symmetry Breaking in the Plane: Rendezvous by Robots with Unknown Attributes

Jurek Czyzowicz
Leszek Gasieniec
Ryan Killick
Evangelos Kranakis

We study a fundamental question related to the feasibility of deterministic symmetry breaking in the infinite Euclidean plane for two robots that have minimal or no knowledge of the respective capabilities and “measuring instruments” of themselves and each other. Assume that two anonymous mobile robots are placed at different locations at unknown distance d from each other on the infinite Euclidean plane. Each robot knows neither the location of itself nor of the other robot. The robots cannot communicate wirelessly, but have a certain nonzero visibility radius r (with range r unknown to the robots). By rendezvous we mean that they are brought at distance at most r of each other by executing symmetric (identical) mobility algorithms. The robots are moving with unknown and constant but not necessarily identical speeds, their clocks and pedometers may be asymmetric, and their chirality inconsistent.

We demonstrate that rendezvous for two robots is feasible under the studied model iff the robots have either: different speeds; or different clocks; or different orientations but equal chiralities. When the rendezvous is feasible, we provide a universal algorithm which always solves rendezvous despite the fact that the robots have no knowledge of which among their respective parameters may be different.

Composable Computation in Discrete Chemical Reaction Networks

Eric E. Severson
David Haley
David Doty

We study the composability of discrete chemical reaction networks (CRNs) that stably compute (i.e., with probability 0 of error) integer-valued functions ƒ:N^d→ N. We consider output-oblivious CRNs in which the output species is never a reactant (input) to any reaction. The class of output-oblivious CRNs is fundamental, appearing in earlier studies of CRN computation, because it is precisely the class of CRNs that can be composed by simply renaming the output of the upstream CRN to match the input of the downstream CRN.

Our main theorem precisely characterizes the functions f stably computable by output-oblivious CRNs with an initial leader. The key necessary condition is that for sufficiently large inputs, f is the minimum of a finite number of nondecreasing quilt-affine functions. (An affine function is linear with a constant offset; a quilt-affine function is linear with a periodic offset).

How to Spread a Rumor: Call Your Neighbors or Take a Walk?

George Giakkoupis
Frederik Mallmann-Trenn
Hayk Saribekyan

We study the problem of randomized information dissemination in networks. We compare the now standard PUSH-PULL protocol, with agent-based alternatives where information is disseminated by a collection of agents performing independent random walks. In the VISIT-EXCHANGE protocol, both nodes and agents store information, and each time an agent visits a node, the two exchange all the information they have. In the MEET-EXCHANGE protocol, only the agents store information, and exchange their information with each agent they meet.

We consider the broadcast time of a single piece of information in an n-node graph for the above three protocols, assuming a linear number of agents that start from the stationary distribution. We observe that there are graphs on which the agent-based protocols are significantly faster than PUSH-PULL, and graphs where the converse is true. We attribute the good performance of agent-based algorithms to their inherently fair bandwidth utilization, and conclude that, in certain settings, agent-based information dissemination, separately or in combination with PUSH-PULL, can significantly improve the broadcast time.

The graphs considered above are highly non-regular. Our main technical result is that on any regular graph of at least logarithmic degree, PUSH-PULL and VISIT-EXCHANGE have the same asymptotic broadcast time. The proof uses a novel coupling argument which relates the random choices of vertices in PUSH-PULL with the random walks in VISIT-EXCHANGE. Further, we show that the broadcast time of MEET-EXCHANGE is asymptotically at least as large as the other two’s on all regular graphs, and strictly larger on some regular graphs.

As far as we know, this is the first systematic and thorough comparison of the running times of these very natural information dissemination protocols.

Efficient Size Estimation and Impossibility of Termination in Uniform Dense Population Protocols

David Doty
Mahsa Eftekhari

We study uniform population protocols: networks of anonymous agents whose pairwise interactions are chosen at random, where each agent uses an identical transition algorithm that does not depend on the population size n. Many existing polylog(n) time protocols for leader election and majority computation are nonuniform: to operate correctly, they require all agents to be initialized with an approximate estimate of n (specifically, the value łfloorłog n\rfloor). Our first main result is a uniform protocol for calculating łog(n) \pm O(1) with high probability in O(łog^2 n) time and O(łog^4 n) states (O(łog łog n) bits of memory). The protocol is not terminating : it does not signal when the estimate is close to the true value of łog n. If it could be made terminating with high probability, this would allow composition with protocols requiring a size estimate initially. We do show how our main protocol can be indirectly composed with others in a simple and elegant way, based on leaderless phase clocks, demonstrating that those protocols can in fact be made uniform. However, our second main result implies that the protocol cannot be made terminating, a consequence of a much stronger result: a uniform protocol for any task requiring more than constant time cannot be terminating even with probability bounded above 0, if infinitely many initial configurations are dense : any state present initially occupies Ømega(n) agents. (In particular no leader is allowed.) Crucially, the result holds no matter the memory or time permitted.

On Counting the Population Size

Petra Berenbrink
Dominik Kaaser
Tomasz Radzik

We consider the problem of counting the population size in the population model. In this model, we are given a distributed system of n identical agents which interact in pairs with the goal to solve a common task. In each time step, the two interacting agents are selected uniformly at random. In this paper, we consider so-called uniform protocols, where the actions of two agents upon an interaction may not depend on the population size n. We present two population protocols to count the size of the population: protocol Approximate, which computes with high probability either [log n] or [log n], and protocol CountExact, which computes the exact population size in optimal O(log n) interactions, using Õ (n) states. Both protocols can also be converted to stable protocols that give a correct result with probability 1 by using an additional multiplicative factor of O(log n) states.

Self-Stabilizing Leader Election

Hsueh-Ping Chen
Ho-Lin Chen

In this paper, we study the self-stabilizing leader election (SSLE) problem in population protocols. We construct a non-deterministic population protocol that can solve SSLE on directed rings of all sizes. Our algorithm uses a constant number of states and can be converted to a deterministic population protocol on undirected rings using previous techniques [8]. Furthermore, we extend our algorithm to perform SSLE on directed and undirected tori of arbitrary sizes.

Logarithmic Expected-Time Leader Election in Population Protocol Model

Yuichi Sudo
Fukuhito Ooshita
Taisuke Izumi
Hirotsugu Kakugawa
Toshimitsu Masuzawa

In this paper, we present a leader election protocol in the population protocol model that stabilizes within O(log n) parallel time in expectation with O(log n) states per agent, where n is the number of agents. Given a rough knowledge m of the population size n such that m ≥ = log₂ n and m=O(log n), this protocol guarantees that exactly one leader is elected and the unique leader is kept forever thereafter.

On Site Fidelity and the Price of Ignorance in Swarm Robotic Central Place Foraging Algorithms

Abhinav Aggarwal
G. Matthew Fricke
Diksha Gupta
Melanie E. Moses

A key factor limiting the performance of central place foraging algorithms is the awareness of the agent(s) about the location of food items around the nest. We study the ratio of how much time an ignorant agent takes relative to an omniscient forager for complete collection of food items in the arena. This effectively quantifies the penalty each algorithm pays for not knowing (or choosing to ignore information gained about) where the resources are located. We model the effect of depletion of food items from the arena on the foraging efficiency over time and analytically verify that returning to the location of the last food item found strongly helps in counteracting this effect. To the best of our knowledge, these results have only been empirically argued so far.

SESSION: Session 2

Improved Distributed Expander Decomposition and Nearly Optimal Triangle Enumeration

Yi-Jun Chang
Thatchaphol Saranurak

An(ε,φ)-expander decomposition of a graph G=(V,E) is a clustering of the vertices V=V₁∪…∪ V_x such that (1) each cluster V_i induces subgraph with conductance at least φ, and (2) the number of inter-cluster edges is at most ε|E|. In this paper, we give an improved distributed expander decomposition, and obtain a nearly optimal distributed triangle enumeration algorithm in the CONGEST model.

Specifically, we construct an (ε,φ)-expander decomposition with φ=(ε/log n)^{2 O(k)} in O(n^2/k ⋅ poly (1/φ, log n))rounds for any ε ∈(0,1) and positive integer k. For example, a (1/n^o(1), 1/n^o(1))-expander decomposition only requires O(n^o(1)) rounds to compute, which is optimal up to subpolynomial factors, and a (0.01,1/poly log n)-expander decomposition can be computed in O(n^γ) rounds, for any arbitrarily small constant γ > 0. Previously, the algorithm by Chang, Pettie, and Zhang can construct a (1/6,1/poly log n)-expander decomposition using Õ (n^1-δ) rounds for any δ > 0, with a caveat that the algorithm is allowed to throw away a set of edges into an extra part which form a subgraph with arboricity at most n^δ. Our algorithm does not have this caveat.

By slightly modifying the distributed algorithm for routing on expanders by Ghaffari, Kuhn and Su [PODC’17], we obtain a triangle enumeration algorithm using Õ(n^1/3) rounds. This matches the lower bound by Izumi and LeGall [PODC’17] and Pandurangan, Robinson and Scquizzato [SPAA’18] of Ø(n^1/3) which holds even in the CONGESTED-CLIQUE model. To the best of our knowledge, this provides the first non-trivial example for a distributed problem that has essentially the same complexity (up to a polylogarithmic factor) in both CONGEST and CONGESTED-CLIQUE.

The key technique in our proof is the first distributed approximation algorithm for finding a low conductance cut that is as balanced as possible. Previous distributed sparse cut algorithms do not have this nearly most balanced guarantee.

Fast Approximate Shortest Paths in the Congested Clique

Keren Censor-Hillel
Michal Dory
Janne H. Korhonen
Dean Leitersdorf

We design fast deterministic algorithms for distance computation in the CONGESTED CLIQUE model. Our key contributions include:

A (2+ε)-approximation for all-pairs shortest paths problem in O(log²n / ε) rounds on unweighted undirected graphs. With a small additional additive factor, this also applies for weighted graphs. This is the first sub-polynomial constant-factor approximation for APSP in this model.
A (1+ε)-approximation for multi-source shortest paths problem from O(√n) sources in O(log² n / ε) rounds on weighted undirected graphs. This is the first sub-polynomial algorithm obtaining this approximation for a set of sources of polynomial size.

Our main techniques are new distance tools that are obtained via improved algorithms for sparse matrix multiplication, which we leverage to construct efficient hopsets and shortest paths. Furthermore, our techniques extend to additional distance problems for which we improve upon the state-of-the-art, including diameter approximation, and an exact single-source shortest paths algorithm for weighted undirected graphs in Õ (n_1/6) rounds.

Quantum Distributed Algorithm for the All-Pairs Shortest Path Problem in the CONGEST-CLIQUE Model

Taisuke Izumi
Fran{c c}ois Le Gall

The All-Pairs Shortest Path problem (APSP) is one of the most central problems in distributed computation. In the CONGEST-CLIQUE model, in which n nodes communicate with each other over a fully connected network by exchanging messages of O(łog n) bits in synchronous rounds, the best known general algorithm for APSP uses Õ(n_1/3) rounds. Breaking this barrier is a fundamental challenge in distributed graph algorithms. In this paper we investigate for the first time quantum distributed algorithms in the CONGEST-CLIQUE model, where nodes can exchange messages of O(log n) quantum bits, and show that this barrier can be broken: we construct a Õ(n_1/4)-round quantum distributed algorithm for the APSP over directed graphs with polynomial weights in the CONGEST-CLIQUE model. This speedup in the quantum setting contrasts with the case of the standard CONGEST model, for which Elkin et al. (PODC 2014) showed that quantum communication does not offer significant advantages over classical communication.

Our quantum algorithm is based on a relationship discovered by Vassilevska Williams and Williams (JACM 2018) between the APSP and the detection of negative triangles in a graph. The quantum part of our algorithm exploits the framework for quantum distributed search recently developed by Le Gall and Magniez (PODC 2018). Our main technical contribution is a method showing how to implement multiple quantum searches (one for each edge in the graph) in parallel without introducing congestions.

Deterministic Distributed Dominating Set Approximation in the CONGEST Model

Janosch Deurer
Fabian Kuhn
Yannic Maus

We develop deterministic approximation algorithms for the minimum dominating set problem in the CONGEST model with an almost optimal approximation guarantee. For ε 1/ poly log Δ we obtain two algorithms with approximation factor (1 + ε)(1 + ł n (Δ + 1)) and with runtimes 2^O(√ log n log log n) and O(Δ poly log Δ + poly log Δ log* n), respectively. Further we show how dominating set approximations can be deterministically transformed into a connected dominating set in the CONGEST model while only increasing the approximation guarantee by a constant factor. This results in a deterministic O(log Δ)-approximation algorithm for the minimum connected dominating set with time complexity 2O(√ log n log log n).

Optimal Distributed Covering Algorithms

Ran Ben Basat
Guy Even
Ken-ichi Kawarabayashi
Gregory Schwartzman

We present a time-optimal deterministic distributed algorithm for approximating a minimum weight vertex cover in hypergraphs of rank ƒ. This problem is equivalent to the Minimum Weight Set Cover problem in which the frequency of every element is bounded by ƒ. The approximation factor of our algorithm is (ƒ + ε). Let Δ denote the maximum degree in the hypergraph. Our algorithm runs in the CONGEST model and requires O(log Δ/log log Δ) rounds, for constants ε ∈ (0,1] and ƒ ∈ N⁺. This is the first distributed algorithm for this problem whose running time does not depend on the vertex weights nor the number of vertices. Thus adding another member to the exclusive family of emphprovably optimal distributed algorithms.

For constant values of ƒ and ε, our algorithm improves over the (&3402; + ε)-approximation algorithm of [16] whose running time is O(log Δ + log W), where W is the ratio between the largest and smallest vertex weights in the graph. Our algorithm also achieves an ƒ-approximation for the problem in O(ƒ log n) rounds, improving over the classical result of [13] that achieves a running time of O(ƒ log ² n). Finally, for weighted vertex cover (ƒ=2) our algorithm achieves a deterministic running time of O(log n), matching the randomized previously best result of [14].

We also show that integer covering-programs can be reduced to the Minimum Weight Set Cover problem in the distributed setting. This allows us to achieve an (ƒ + ε)-approximate integral solution in O(1 + ƒ /log n)⋅ log Δ over log log Δ+(ƒ ⋅ log M)^1.01⋅ log ^ε-1 ⋅(log Δ)^0.01)) rounds, where ƒ bounds the number of variables in a constraint, Δ bounds the number of constraints a variable appears in, and M=max{1,1/a min},, a_min, where a_min is the smallest normalized constraint coefficient. This significantly improves over the results of [16] for the integral case, which achieves the same guarantees in O(ε^-4 ⋅ ƒ⁴ ⋅ log ƒ ⋅ log(M ⋅ Δ)) rounds.

SESSION: Session 3

Secure Distributed Computing Made (Nearly) Optimal

Merav Parter
Eylon Yogev

In this paper, we study secure distributed algorithms that are nearly optimal, with respect to running time, for the given input graph G. Roughly speaking, an algorithm is secure if the nodes learn only their final output while gaining no information on the input (or output) of other nodes.

A graph theoretic framework for secure distributed computation was recently introduced by the authors (SODA 2019). This framework is quite general and it is based on a new combinatorial structure called private neighborhood trees : a collection of n trees T(u₁), …, T(u_n) such that each tree T(u_i) spans the neighbors of u_i without going through u_i. Intuitively, each tree T(u_i) allows all neighbors of u_i to exchange a secret that is hidden from u_i. The efficiency of the framework depends on two key parameters of these trees: their depth and the amount of overlap. In a (d,c)-private neighborhood trees each tree T(u_i) has depth O(d) and each edge e ∈ G appears in at most O(c) different trees. An existentially optimal construction of private neighborhood trees with d=O(Δ … D) and c=Õ (D) was presented therein. We make two key contributions:

Universally Optimal Private Trees: We show a combinatorial construction of nearly (universally) optimal (d,c)-private neighborhood trees with d + c=Õ (OPT(G)) for any input graph G. Perhaps surprisingly, we show that OPT(G) is equal to the best depth possible for these trees even without the congestion constraint. We also present efficient distributed constructions of these private trees.

Optimal Secure Computation: Using the optimal constructions above, we get a secure compiler for distributed algorithms where the overhead for each round is Õ (poly(Δ)… OPT(G)). As our second key contribution, we design an optimal compiler with an overhead of merely Õ (OPT(G)) per round for a class of “simple” algorithms. This class includes many standard distributed algorithms such as Luby-MIS, the standard logarithmic-round algorithms for matching and Δ + 1-coloring, as well as the computation of aggregate functions.

With Great Speed Come Small Buffers: Space-Bandwidth Tradeoffs for Routing

Avery Miller
Boaz Patt-Shamir
Will Rosenbaum

We consider the Adversarial Queuing Theory (AQT) model, where packet arrivals are subject to a maximum average rate 0 ≤ ρ ≤ 1 and burstiness σ ≤ 0. In this model, we analyze the size of buffers required to avoid overflows in the basic case of a path. Our main results characterize the space required by the average rate and the number of distinct destinations: we show that O(ℓ d^1/ℓ + σ) space suffice, where d is the number of distinct destinations and ℓ=⌋1/ρ⌊ and we show that Ω(1 over ℓ ^d1/ℓ + σ) space is necessary. For directed trees, we describe an algorithm whose buffer space requirement is at most 1 + d’ + σ where d’ is the maximum number of destinations on any root-leaf path.

Plain SINR is Enough!

Magnus M. Halldorsson
Tigran Tonoyan

We develop randomized distributed algorithms for many of the most fundamental communication problems in the wireless SINR model, including (multi-message) broadcast, local broadcast, coloring, MIS, and aggregation. The complexity of the algorithms is optimal up to polylogarithmic preprocessing time. It shows — contrary to expectation — that the plain vanilla SINR model is just as powerful and fast (modulo the preprocessing) as various extensions studied, including power control, carrier sense, collision detection, free acknowledgements, and GPS location. A key component of the algorithms is an efficient simulation of CONGEST algorithms on a constant-density SINR backbone.

Efficient Multiparty Interactive Coding for Insertions, Deletions, and Substitutions

Ran Gelles
Yael Tauman Kalai
Govind Ramnarayan

In the field of interactive coding, two or more parties wish to carry out a distributed computation over a communication network that may be noisy. The ultimate goal is to develop efficient coding schemes that can tolerate a high level of noise while increasing the communication by only a constant factor (i.e., constant rate).

In this work we consider synchronous communication networks over an arbitrary topology, in the powerful adversarial insertion-deletion noise model. Namely, the noisy channel may adversarially alter the content of any transmitted symbol, as well as completely remove a transmitted symbol or inject a new symbol into the channel.

We provide efficient, constant rate schemes that successfully conduct any computation with high probability as long as the adversary corrupts at most ε over m fraction of the total communication, where m is the number of links in the network and ε is a small constant. This scheme assumes an oblivious adversary which is independent of the parties’ inputs and randomness. We can remove this assumption and resist a worst-case adversary at the price of being resilient to ε over m log m errors.

While previous work considered the insertion-deletion noise model in the two-party setting, to the best of our knowledge, our scheme is the first multiparty scheme that is resilient to insertions and deletions. Furthermore, our scheme is the first computationally efficient scheme in the multiparty setting that is resilient to adversarial noise.

Multiparty Interactive Communication with Private Channels

Abhinav Aggarwal
Varsha Dani
Thomas P. Hayes
Jared Saia

A group of n players wants to run a distributed protocol ℘ over a network where communication occurs via private point-to-point channels. Can we efficiently simulate ℘ in the presence of an adversary who knows ℘ and is able to maliciously flip bits on the channels? We show that this is possible, even when L, the number of bits sent in ℘, the average message size α in ℘, and T, the number of bits flipped by the adversary are not known in advance. In particular, we show how to create a robust version of ℘, ℘ such that 1) ℘’ fails with probability at most δ, for any δ>0; and 2) ℘’ sends O( L (1 + (1/α) łog (n L/δ)) + T) bits. We note that if α is Ω (log (n L/δ), then ℘ sends only O(L+T) bits, and is therefore within a constant factor of optimal. Critically, our result requires that ℘ runs correctly in an asynchronous network and our protocol ℘ must run in a synchronous network.

Coded State Machine — Scaling State Machine Execution under Byzantine Faults

Songze Li
Saeid Sahraei
Mingchao Yu
Salman Avestimehr
Sreeram Kannan
Pramod Viswanath

We introduce Coded State Machine (CSM), an information-theoretic framework to securely and efficiently execute multiple state machines on Byzantine nodes. The standard method of solving this problem is using State Machine Replication, which achieves high security at the cost of low efficiency. CSM simultaneously achieves the optimal linear scaling in storage, throughput, and security with increasing network size. The storage is scaled via the design of Lagrange coded states and coded input commands that require the same storage size as their origins. The computational efficiency is scaled using a novel delegation algorithm, called INTERMIX, which is an information-theoretically verifiable matrix-vector multiplication algorithm of independent interest.

On Termination of a Flooding Process

Walter Hussak
Amitabh Trehan

Flooding is a fundamental distributed algorithms technique. Consider the following flooding process, for simplicity, in a synchronous message passing network: A distinguished node begins the flooding process by sending the (same) message to all its neighbours in the first round. In subsequent rounds, every node receiving the message relays a copy of the message further to all those, and only those, nodes it did not receive the message from in the previous round. However, the nodes do not remember if they’ve taken part in the flooding before and therefore will repeat the process every time they get a message. In other words, they execute an amnesiac flooding process with memory only of the present round. The flooding process terminates in a particular round when no edge in the network carries the message in that, and, hence, subsequent, rounds. We call this process Amnesiac Flooding (AF).

In this work, the main question we address is whether AF will terminate on an arbitrary network (graph) and in what time? We show that, indeed, AF terminates on any arbitrary graph. Further, AF terminates in at most D rounds in bipartite graphs and at most 2D + 1 rounds in non-bipartite graphs – in this brief announcement, we show this for the bipartite case only.

We also show that in a natural asynchronous variant of AF, an adversary can always ensure non-termination.

SESSION: Keynote Lecture 2

Towards a Theory of Randomized Shared Memory Algorithms

Philipp Woelfel

Randomization has become an invaluable tool to overcome some of the problems associated with asynchrony and faultiness. Allowing processors to use random bits helps to break symmetry, and to reduce the likelihood of undesirable schedules. As a consequence, randomized techniques can lead to simpler and more efficient algorithms, and sometimes to solutions of otherwise unsolvable computational problems. However, the design and the analysis of randomized shared memory algorithms remains challenging. This talk will give an overview of recent progress towards developing a theory of randomized shared memory algorithms.

For many years, linearizability [6] has been the gold standard of distributed correctness conditions, and the corner stone of modular programming. In deterministic algorithms, implemented linearizable methods can be assumed to be atomic. But when processes can make random choices, the situation is not the same: Probability distributions of outcomes of algorithms using linearizable methods may be very different from those using equivalent atomic operations [4]. In general, modular algorithm design is much more difficult for randomized algorithms than for deterministic ones. The first part of the talk will present a correctness condition [2, 5] that is suitable for randomized algorithms in certain settings, and will explain why in other settings no such correctness condition exists [3] and what we can do about that.

To this date, almost all randomized shared memory algorithms are Las Vegas, meaning they permit no error. Monte Carlo algorithms, which allow errors to occur with small probability, have been studied thoroughly for sequential systems. But in the shared memory world such algorithms have been neglected. The second part of this talk will discuss recent attempts to devise Monte Carlo algorithms for fundamental shared memory problems (e.g., [1]). It will also present some general techniques, that have proved useful in the design of concurrent randomized algorithms.

SESSION: Session 4

Optimal Memory-Anonymous Symmetric Deadlock-Free Mutual Exclusion

Zahra Aghazadeh
Damien Imbs
Michel Raynal
Gadi Taubenfeld
Philipp Woelfel

The notion of an anonymous shared memory, introduced by Taubenfeld in PODC 2017, considers that processes use different names for the same memory location. As an example, a location name A used by a process p and a location name B ≠ A used by another process q can correspond to the very same memory location X, and similarly for the names B used by p and A used by q which may (or may not) correspond to the same memory location Y ≠ X. In this context, the PODC paper presented a 2-process symmetric deadlock-free mutual exclusion (mutex) algorithm and a necessary condition on the size m of the anonymous memory for the existence of such an n-process algorithm. This condition states that m must be belongs to M(n) {1} where M(n)= {m: ∀ ℓ: (1) < ℓ ≤ n: gcd(ℓ,m)=1). Symmetric means here that,process identities define a specific data type which allows a process to check only if two identities are equal or not.

The present paper presents two optimal deadlock-free symmetric mutual exclusion algorithms for n-process systems where communication is through m registers. The first algorithm, which considers anonymous read/write registers, works for any m which is ≥ n and belongs to the set M(n). It follows that this condition on m is both necessary and sufficient for symmetric deadlock-free mutual exclusion in this anonymity context, and this algorithm is optimal with respect to m The second algorithm, which considers anonymous read/modify/write atomic registers, works for any m∈ M(n), which is shown to be necessary and sufficient for anonymous read/modify/write registers. It follows that, when m > 1, m ∈ M(n) is a tight characterization of the size of the anonymous shared memory needed to solve deadlock-free mutex, be the registers read/write or read/modify/write.

Constant Amortized RMR Abortable Mutex for CC and DSM

Prasad Jayanti
Siddhartha Jayanti

The Abortable mutual exclusion problem, proposed by Scott and Scherer in response to the needs in real time systems and databases, is a variant of mutual exclusion that allows processes to abort from their attempt to acquire the lock. Worst-case constant remote memory reference (RMR) algorithms for mutual exclusion using hardware instructions such as Fetch&Add or Fetch&Store have long existed for both Cache Coherent (CC) and Distributed Shared Memory (DSM) multiprocessors, but no such algorithms are known for abortable mutual exclusion. Even relaxing the worst-case requirement to amortized, algorithms are only known for the CC model.

In this paper, we improve this state-of-the-art by designing a deterministic algorithm that uses Fetch&Store (FAS) to achieve amortized O(1) RMR in both the CC and DSM models. Our algorithm supports Fast Abort (a process aborts within six steps of receiving the abort signal), and has the following additional desirable properties: it supports an arbitrary number of processes of arbitrary names, requires only O(1) space per process, and satisfies a novel fairness condition that we call “Airline FCFS”. Our algorithm is short and practical with fewer than a dozen lines of code.

A Recoverable Mutex Algorithm with Sub-logarithmic RMR on Both CC and DSM

Prasad Jayanti
Siddhartha Jayanti
Anup Joshi

In light of recent advances in non-volatile main memory technology, Golab and Ramaraju reformulated the traditional mutex problem into the novel Recoverable Mutual Exclusion (RME) problem. In the best known solution for RME, due to Golab and Hendler from PODC 2017, a process incurs at most O(√ log n log log n) remote memory references (RMRs) per passage on a system with n processes, where a passage is an interval from when a process enters the Try section to when it subsequently returns to Remainder. Their algorithm, however, guarantees this bound only for cache-coherent (CC) multiprocessors, leaving open the question of whether a similar bound is possible for distributed shared memory (DSM) multiprocessors.

We answer this question affirmatively by designing an algorithm for a system with n processes, such that, it satisfies the same complexity bound as Golab and Hendler’s for both CC and DSM multiprocessors. Our algorithm has some additional advantages over Golab and Hendler’s: (i) its Exit section is wait-free, (ii) it uses only the Fetch-and-Store instruction, and (iii) on a CC machine our algorithm needs each process to have a cache of only O(1) words, while their algorithm needs a cache of size that is a function of n.

Randomized Concurrent Set Union and Generalized Wake-Up

Siddhartha Jayanti
Robert E. Tarjan
Enric Boix-Adserà

We consider the disjoint set union problem in the asynchronous shared memory multiprocessor computation model. We design a randomized algorithm that performs at most O(log n) work per operation (with high probability), and performs at most O(m #8226; (α(n, m/(np)) + log(np/m + 1)) total work in expectation for a problem instance with m operations on n elements solved by p processes. Our algorithm is the first to have work bounds that grow sublinearly with p against an adversarial scheduler.

We use Jayanti’s Wake Up problem and our newly defined Generalized Wake Up problem to prove several lower bounds on concurrent set union. We show an Ω(log min {n,p}) expected work lower bound on the cost of any single operation on a set union algorithm. This shows that our single-operation upper bound is optimal across all algorithms when p = n^Ω(1). Furthermore, we identify a class of “symmetric algorithms” that captures the complexities of all the known algorithms for the disjoint set union problem, and prove an Ω(m•(α(n, m(np)) + log(np/m + 1))) expected total work lower bound on algorithms of this class, thereby showing that our algorithm has optimal total work complexity for this class. Finally, we prove that any randomized algorithm, symmetric or not, cannot breach an Ω(m •(α(n, m/n) + log log(np/m + 1))) expected total work lower bound.

Strongly Linearizable Implementations of Snapshots and Other Types

Sean Ovens
Philipp Woelfel

Linearizability is the gold standard of correctness conditions for shared memory algorithms, and historically has been considered the practical equivalent of atomicity. However, it has been shown that replacing atomic objects with linearizable implementations can affect the probability distribution of execution outcomes in randomized algorithms. Thus, linearizable objects are not always suitable replacements for atomic objects. A stricter correctness condition called strong linearizability has been developed and shown to be appropriate for randomized algorithms in a strong adaptive adversary model[16].

We devise several new lock-free strongly linearizable implementations from atomic registers. In particular, we give the first strongly linearizable lock-free snapshot implementation that uses bounded space. This improves on the unbounded space solution of Denysyuk and Woelfel[14]. As a building block, our algorithm uses a lock-free strongly linearizable ABA-detecting register. We obtain this object by modifying the wait-free linearizable ABA-detecting register of Aghazadeh and Woelfel [5], which, as we show, is not strongly linearizable.

Aspnes and Herlihy[8] identified a wide class of types that have wait-free linearizable implementations. These types require that any pair of operations either commute, or one overwrites the other. Aspnes and Herlihy gave a general wait-free linearizable implementation of such types, employing an atomic snapshot object. We show that this implementation is strongly linearizable, proving that all types in this class have a lock-free strongly linearizable implementation from atomic registers.

Fast Concurrent Data Sketches

Arik Rinberg
Alexander Spiegelman
Edward Bortnikov
Eshcar Hillel
Idit Keidar
Hadar Serviansky

Data sketches are approximate succinct summaries of long data streams. They are widely used for processing massive amounts of data and answering statistical queries about it. Existing libraries producing sketches are very fast, but do not allow parallelism for creating sketches using multiple threads or querying them while they are being built. We present a generic approach to parallelising data sketches efficiently and allowing them to be queried in real time, while bounding the error that such parallelism introduces. Utilising relaxed semantics and the notion of strong linearisability we prove our algorithm’s correctness and analyse the error it induces in two specific sketches. Our implementation achieves high scalability while keeping the error small. We have contributed one of our concurrent sketches to the open-source data sketches library.

Self-Stabilizing Snapshot Objects for Asynchronous Failure-Prone Networked Systems

Chryssis Georgiou
Oskar Lundström
Elad M. Schiller

A snapshot object simulates the behavior of an array of single-writer/multi-reader shared registers that can be read atomically. Delporte-Gallet et al. proposed two fault-tolerant algorithms for snapshot objects in asynchronous crash-prone message-passing systems. Their first algorithm is non-blocking; it allows snapshot operations to terminate once all write operations had ceased. It uses O(n) messages of O(n v) bits, where n is the number of nodes and v is the number of bits it takes to represent the object. Their second algorithm allows snapshot operations to always terminate independently of write operations. It incurs O(n^2) messages. The fault model of Delporte-Gallet et al. considers node failures (crashes). We aim at the design of even more robust snapshot objects. We do so through the lenses of self-stabilization—a very strong notion of fault-tolerance. In addition to Delporte-Gallet et al.’s fault model, a self-stabilizing algorithm can recover after the occurrence of transient faults; these faults represent arbitrary violations of the assumptions according to which the system was designed to operate (as long as the code stays intact). In particular, in this work, we propose self-stabilizing variations of Delporte-Gallet et al.’s non-blocking algorithm and always-terminating algorithm. Our algorithms have similar communication costs to the ones by Delporte-Gallet et al. and O(1) recovery time (in terms of asynchronous cycles) from transient faults. The main differences are that our proposal considers repeated gossiping of O(v) bits messages and deals with bounded space, which is a prerequisite for self-stabilization.

The Recoverable Consensus Hierarchy

Wojciech Golab

Herlihy’s consensus hierarchy ranks the power of various synchronization primitives for solving consensus in a model where asynchronous processes communicate through shared memory, and may fail by halting. This paper revisits the consensus hierarchy in a model with crash-recovery failures, where the specification of consensus, called recoverable consensus in this paper, is weakened by allowing non-terminating executions when a process fails infinitely often. Two variations of this model are considered: with independent process failures, and with simultaneous (i.e., system-wide) process failures. We prove two fundamental results: (i) Test-And-Set is at level 2 of the recoverable consensus hierarchy if failures are simultaneous, and similarly for any primitive at level 2 of the traditional consensus hierarchy; and (ii) Test-And-Set drops to level 1 of the hierarchy if failures are independent, unless the number of such failures is bounded. To our knowledge, this is the first separation between the simultaneous and independent crash-recovery failure models with respect to the computability of consensus.

How Fast Reads Affect Multi-Valued Register Simulations

Soma Chaudhuri
Reginald Frank
Jennifer L. Welch

We consider the problem of simulating a k-valued register in a wait-free manner using binary registers as building blocks, where k 2. We show that for any simulation using atomic binary base registers to simulate a safe k-valued register in which the read algorithm takes the optimal number of steps (log₂ k), the write algorithm must take at least log₂ k steps in the worst case. A fortiori, the same lower bound applies when the simulated register should be regular. Previously known algorithms show that both these lower bounds are tight. We also show that in order to simulate an atomic k-valued register for two readers, the optimal number of steps for the read algorithm must be strictly larger than log₂ k.

SESSION: Session 5

Topological Characterization of Consensus under General Message Adversaries

Thomas Nowak
Ulrich Schmid
Kyrill Winkler

In this paper, we provide a rigorous characterization of consensus solvability in synchronous directed dynamic networks controlled by an arbitrary message adversary using point-set topology: We extend the approach introduced by Alpern and Schneider in 1985 by introducing two novel topologies on the space of infinite executions: the process-view topology, induced by a distance function that relies on the local view of a given process in an execution, and the minimum topology, which is induced by a distance function that focuses on the local view of the process that is the last to distinguish two executions. We establish some simple but powerful topological results, which not only lead to a topological explanation of bivalence arguments, but also provide necessary and sufficient topological conditions on the admissible graph sequences of a message adversary for solving consensus. In particular, we characterize consensus solvability in terms of connectivity of the set of admissible graph sequences. For non-compact message adversaries, which are not limit-closed in the sense that there is a convergent sequence of graph sequences whose limit is not permitted, this requires the exclusion of all “fair” and “unfair” limit sequences that coincide with the forever bivalent runs constructed in bivalence proofs. For both compact and non-compact message adversaries, we also provide tailored characterizations of consensus solvability, i.e., tight conditions for impossibility and existence of algorithms, based on the broadcastability of the connected components of the set of admissible graph sequences.

Can Distributed Uniformity Testing Be Local?

Uri Meir
Dor Minzer
Rotem Oshman

In the distributed uniformity testing problem, k servers draw samples from some unknown distribution, and the goal is to determine whether the unknown distribution is uniform or whether it is ε-far from uniform, where ε is a proximity parameter. Each server decides whether to accept or reject, and these decisions are sent to a referee, who makes a final decision based on the servers’ local decisions. Uniformity testing is a particularly useful building-block, because it is complete for the problem of testing identity to any fixed distribution.

It was recently shown that distributing the task of uniformity testing allows each server to draw fewer samples than are needed in the centralized case, but so far the number of samples required for distributed uniformity testing has not been well understood. In this paper we settle this question, and also investigate the cost of using local decision rules, such as rejecting iff at least one server wants to reject (the usual decision rule used in local distributed decision). To answer these questions, we develop a new Fourier-based technique for proving lower bounds on the sample complexity of distribution testing, which lends itself particularly well to the distributed case.

Using our technique, we tightly characterize the number of samples required for uniformity testing when the referee can apply any decision function to the servers’ local decisions. We also show that if the network rejects whenever one server wants to reject, then the cost of uniformity testing is much higher, and in fact we do not gain compared to the centralized case unless the number of servers is exponential in Ω (1/ε). Finally, we apply our lower bound technique to the case where the referee applies a threshold decision rule, and also generalize a lower bound from[1] for learning an unknown input distribution.

Hardness of Distributed Optimization

Nir Bacrach
Keren Censor-Hillel
Michal Dory
Yuval Efron
Dean Leitersdorf
Ami Paz

This paper studies lower bounds for fundamental optimization problems in the CONGEST model. We show that solving problems exactly in this model can be a hard task, by providing tildeΩmega (n²) lower bounds for cornerstone problems, such as minimum dominating set (MDS), Hamiltonian path, Steiner tree and max-cut. These are almost tight, since all of these problems can be solved optimally in O(n²) rounds. Moreover, we show that even in bounded-degree graphs and even in simple graphs with maximum degree 5 and logarithmic diameter, it holds that various tasks, such as finding a maximum independent set (MaxIS) or a minimum vertex cover, are still difficult, requiring a near-tight number of tildeΩ (n) rounds.

Furthermore, we show that in some cases even approximations are difficult, by providing an tildeΩ (n²) lower bound for a (7/8+ε)-approximation for MaxIS, and a nearly-linear lower bound for an O(log n )-approximation for the k-MDS problem for any constant k geq≥ 2, as well as for several variants of the Steiner tree problem.

Our lower bounds are based on a rich variety of constructions that leverage novel observations, and reductions among problems that are specialized for the CONGEST model. However, for several additional approximation problems, as well as for exact computation of some central problems in P, such as maximum matching and max flow, we show that such constructions cannot be designed, by which we exemplify some limitations of this framework.

Broadcast Congested Clique: Planted Cliques and Pseudorandom Generators

Lijie Chen
Ofer Grossman

We develop techniques to prove lower bounds for the BCAST(log n) Broadcast Congested Clique model (a distributed message passing model where in each round, each processor can broadcast an O(log n)-sized message to all other processors). Our techniques are built to prove bounds for natural input distributions. So far, all lower bounds for problems in the model relied on constructing specifically tailored graph families for the specific problem at hand, resulting in lower bounds for artificially constructed inputs, instead of natural input distributions.

One of our results is a lower bound for the directed planted clique problem. In this problem, an input graph is either a random directed graph (each directed edge is included with probability 1/2), or a random graph with a planted clique of size k. That is, k randomly chosen vertices have all of the edges between them included, and all other edges in the graph appear with probability 1/2. The goal is to determine whether a clique exists. We show that when k = n(1/4 – ε), this problem requires a number of rounds polynomial in n.

Additionally, we construct a pseudo-random generator which fools the Broadcast Congested Clique. This allows us to show that every k round randomized algorithm in which each processor uses up to n random bits can be efficiently transformed into an O(k)-round randomized algorithm in which each processor uses only up to O(k log n) random bits, while maintaining a high success probability. The pseudo-random generator is simple to describe, computationally very cheap, and its seed size is optimal up to constant factors. However, the analysis is quite involved, and is based on the new technique for proving lower bounds in the model.

The technique also allows us to prove the first average case lower bound for the Broadcast Congested Clique, as well as an average-case time hierarchy. We hope our technique will lead to more lower bounds for problems such as triangle counting, APSP, MST, diameter, and more, for natural input distributions.

Connectivity Lower Bounds in Broadcast Congested Clique

Shreyas Pai
Sriram V. Pemmaraju

We prove three new lower bounds for graph connectivity in the 1-bit broadcast congested clique model, BCC(1). First, in the KT-0 version of BCC(1), in which nodes are aware of neighbors only through port numbers, we show an Ømega(log n) round lower bound for CONNECTIVITY even for constant-error randomized Monte Carlo algorithms. The deterministic version of this result can be obtained via the well-known “edge-crossing” argument, but, the randomized version of this result requires establishing new combinatorial results regarding the indistinguishability graph induced by inputs. In our second result, we show that the Ømega(log n) lower bound result extends to the KT-1 version of the BCC(1) model, in which nodes are aware of IDs of all neighbors, though our proof works only for deterministic algorithms. Since nodes know IDs of their neighbors in the KT-1 model, it is no longer possible to play “edge-crossing” tricks; instead we present a reduction from the 2-party communication complexity problem PARTITION in which Alice and Bob are give two set partitions on [n] and are required to determine if the join of these two set partitions equals the trivial one-part set partition. While our KT-1 CONNECTIVITY lower bound holds only for deterministic algorithms, in our third result we extend this Ømega(log n) KT-1 lower bound to constant-error Monte Carlo algorithms for the closely related CONNECTED COMPONENTS problem. We use information-theoretic techniques to obtain this result. All our results hold for the seemingly easy special case of CONNECTIVITY in which an algorithm has to distinguish an instance with one cycle from an instance with multiple cycles. Our results showcase three rather different lower bound techniques and lay the groundwork for further improvements in lower bounds for CONNECTIVITY in the BCC(1) model.

Does Preprocessing Help under Congestion?

Klaus-Tycho Foerster
Janne H. Korhonen
Joel Rybicki
Stefan Schmid

This paper investigates the power of preprocessing in the CONGEST model. Schmid and Suomela (ACM HotSDN 2013) introduced the SUPPORTED CONGEST model to study the application of distributed algorithms in Software-Defined Networks (SDNs). In this paper, we show that a large class of lower bounds in the CONGEST model still hold in the SUPPORTED model, highlighting the robustness of these bounds. This also raises the question how much does preprocessing help in the CONGEST model

SESSION: Session 6

The Distributed Complexity of Locally Checkable Problems on Paths is Decidable

Alkida Balliu
Sebastian Brandt
Yi-Jun Chang
Dennis Olivetti
Mikaël Rabie
Jukka Suomela

Consider a computer network that consists of a path with n nodes. The nodes are labeled with inputs from a constant-sized set, and the task is to find output labels from a constant-sized set subject to some local constraints—more formally, we have an LCL (locally checkable labeling) problem. How many communication rounds are needed (in the standard LOCAL model of computing) to solve this problem?

It is well known that the answer is always either O(1) rounds, or Θ(log_⋅ n) rounds, or Θ(n) rounds. In this work we show that this question is decidable (albeit PSPACE-hard): we present an algorithm that, given any LCL problem defined on a path, outputs the distributed computational complexity of this problem and the corresponding asymptotically optimal algorithm.

Hardness of Exact Distance Queries in Sparse Graphs Through Hub Labeling

Adrian Kosowski
Przemyslaw Uznanski
Laurent Viennot

A distance labeling scheme is an assignment of bit-labels to the vertices of an undirected, unweighted graph such that the distance between any pair of vertices can be decoded solely from their labels. An important class of distance labeling schemes is that of hub labelings, where a node ν ∈ G stores its distance to the so-called hubs S_ν ⊆ V, chosen so that for any u,ν ∈ V there is w ∈ S_u ∩ S_v belonging to some shortest uv path. Notice that for most existing graph classes, the best distance labelling constructions existing use at some point a hub labeling scheme at least as a key building block.

Our interest lies in hub labelings of sparse graphs, i.e., those with |E(G)| = O (n), for which we show a lowerbound of n 2^O (^{√log n)} for the average size of the hubsets. Additionally, we show a hub-labeling construction for sparse graphs of average size O(√n RS (n)^c) for some 0 < c < 1, where RS (n) is the so-called Ruzsa-Szemerédi function, linked to structure of induced matchings in dense graphs. This implies that further improving the lower bound on hub labeling size to n over 2(log n)o(1 would require a breakthrough in the study of lower bounds on RS (n), which have resisted substantial improvement in the last 70 years.

For general distance labeling of sparse graphs, we show a lowerbound of 1 over 2^Θ(√log n) SumIndex (n), where SumIndex (n) is the communication complexity of the SUM-I problem over Z_n. Our results suggest that the best achievable hub-label size and distance-label size in sparse graphs may be Θ(n over 2^{(log n)c} ) for some 0 < c < 1.,

On the Complexity of Distributed Splitting Problems

Philipp Bamberger
Mohsen Ghaffari
Fabian Kuhn
Yannic Maus
Jara Uitto

One of the fundamental open problems in the area of distributed graph algorithms is whether randomization is needed for efficient symmetry breaking. While there are poly log n-time randomized algorithms for all the classic symmetry breaking problems, for many of them, the best deterministic algorithms are almost exponentially slower. The following basic local splitting problem, which is known as weak splitting, takes a central role in this context: Each node of a graph G=(V,E) has to be colored red or blue such that each node of sufficiently large degree has at least one neighbor of each color. Ghaffari, Kuhn, and Maus [STOC ’17] showed that this seemingly simple problem is complete w.r.t. the above fundamental open question in the following sense: If there is an efficient poly log n-time determinstic distributed algorithm for weak splitting, then there is such an algorithm for all locally checkable graph problems for which an efficient randomized algorithm exists. We investigate the distributed complexity of weak splitting and some closely related problems and we in particular obtain the following results:

We obtain efficient algorithms for special cases of weak splitting in nearly regular graphs. We show that if δ=Ø(log n) and Δ are the minimum and maximum degrees of G, weak splitting can be solved deterministically in time O #916;(√ over δ • poly(log n)). Further, if δ = Ø(log log n) and Δ ≤ 2^{ε δ}, the time complexity is O(Δ over δ⋅poly(log log n)).
We prove that the following two related problems are also complete in the same sense: (I) Color the nodes of a graph with C ≤ poly log n colors such that each node with a sufficiently large polylogarithmic degree has at least 2 log n different colors among its neighbors, and (II) Color the nodes with a large constant number of colors so that for each node of a sufficiently large at least logarithmic degree d(v), the number of neighbors of each color is at most (1-εd(v) for some constant ε > 0.

On the Use of Randomness in Local Distributed Graph Algorithms

Mohsen Ghaffari
Fabian Kuhn

We attempt to better understand randomization in local distributed graph algorithms by exploring how randomness is used and what we can gain from it:

We first ask the question of how much randomness is needed to obtain efficient randomized algorithms. We show that for all locally checkable problems with poly log n-time randomized algorithms, there are such algorithms even if either (I) there is a only a single (private) independent random bit in each poly log n-neighborhood of the graph, (II) the (private) bits of randomness of different nodes are only poly log n-wise independent, or (III) there are only poly log n bits of global shared randomness (and no private randomness).
Second, we study how much we can improve the error probability of randomized algorithms. For all locally checkable problems with poly log n-time randomized algorithms, we show that there are such algorithms that succeed with probability 1-n^{-2 ε(log log n) 2} and more generally T-round algorithms, for T ≥ poly log n, with success probability 1-n^{-2 εlog 2T}. We also show that poly log n-time randomized algorithms with success probability 1-2^{-2 log ε n} for some ε > 0 can be derandomized to poly log n-time deterministic algorithms.

Both of the directions mentioned above, reducing the amount of randomness and improving the success probability, can be seen as partial derandomization of existing randomized algorithms. In all the above cases, we also show that any significant improvement of our results would lead to a major breakthrough, as it would imply significantly more efficient deterministic distributed algorithms for a wide class of problems.

Message Reduction in the LOCAL Model is a Free Lunch

Shimon Bitton
Yuval Emek
Taisuke Izumi
Shay Kutten

A new spanner construction algorithm is presented, working under the LOCAL model assuming unique edge IDs. Given an n-node communication graph, a spanner with a constant stretch and Õ(n^{1 + c}) edges (for any small constant c > 0) is constructed efficiently — i.e., in a constant number of rounds and a message complexity of Õ (n^{1 + 2c}) whp.

One of the many known applications of spanners is for reducing the number of messages of various algorithms. However, usually, one still needs to pay the cost of constructing the spanner. Due to the efficiency of the spanner construction here, we show that every t-round LOCAL algorithm can be transformed into a randomized one with the same asymptotic time complexity and Õ(t²n^{1 + O(1/log t)}) message complexity. All previous message-reduction schemes for LOCAL algorithms incur either an O(log n)-multiplicative or an O(polylog (n))-additive blow-up of the round complexity.

P-SLOCAL-Completeness of Maximum Independent Set Approximation

Yannic Maus

We prove that the maximum independent set approximation problem with polylogarithmic approximation factor is P-SLOCAL-complete.

SESSION: Keynote Lecture 3

Engineering Distributed Systems that We Can Trust (and Also Run)

Ilya Sergey

The interest in formal methods and verification of correctness-critical distributed systems is on the rise in the past few years. But what are the gains from proving statements about software in full mathematical rigour? Do they justify the high cost of verification? And how far can we extend our trust in formal methods when talking about realistic distributed systems and their client programs?

This talk is in three parts. First, I will provide an overview of the state of the art in machine-assisted reasoning about distributed consensus protocols, their implementations, and applications. Next, I will discuss the trade-offs that have to be made in order to enable mechanised proofs about runnable systems code, as well as implications of the assumptions made to describe the real-world execution environments. Lastly, I will focus on the ongoing work propelled by the programming languages community towards engineering modular proofs about distributed protocols-a way to build correct-by-construction composite systems from verified reusable components.

SESSION: Session 7

The Consensus Number of a Cryptocurrency

Rachid Guerraoui
Petr Kuznetsov
Matteo Monti
Matej Pavlovič
Dragos-Adrian Seredinschi

Many blockchain-based algorithms, such as Bitcoin, implement a decentralized asset transfer system, often referred to as a cryptocurrency. As stated in the original paper by Nakamoto, at the heart of these systems lies the problem of preventing double-spending ; this is usually solved by achieving consensus on the order of transfers among the participants. By treating the asset transfer problem as a concurrent object and determining its consensus number, we show that consensus is not necessary to prevent double-spending. We first consider the problem as defined by Nakamoto, where only a single process—the account owner—can withdraw from each account. Safety and liveness need to be ensured for correct account owners, whereas misbehaving account owners might be unable to perform transfers. We show that the consensus number of an asset transfer object is 1. We then consider a more general k-shared asset transfer object where up to k processes can atomically withdraw from the same account, and show that this object has consensus number k. We first establish these these results in the context of shared memory with benign faults, in order to properly understand the level of difficulty of the asset transfer problem. Then, we translate our result in the more practically relevant message passing setting with Byzantine players. We describe an asynchronous Byzantine fault-tolerant asset transfer implementation that is both simpler and more efficient than state-of-the-art consensus-based solutions. Our results are applicable to both the permissioned (private) and permissionless (public) setting, as normally their differentiation is hidden by the abstractions on top of which our algorithms are based.

Communication Complexity of Byzantine Agreement, Revisited

Ittai Abraham
T-H. Hubert Chan
Danny Dolev
Kartik Nayak
Rafael Pass
Ling Ren
Elaine Shi

As Byzantine Agreement (BA) protocols find application in large-scale decentralized cryptocurrencies, an increasingly important problem is to design BA protocols with improved communication complexity. A few existing works have shown how to achieve subquadratic BA under an adaptive adversary. Intriguingly, they all make a common relaxation about the adaptivity of the attacker, that is, if an honest node sends a message and then gets corrupted in some round, the adversary cannot erase the message that was already sent – henceforth we say that such an adversary cannot perform “after-the-fact removal”. By contrast, many (super-)quadratic BA protocols in the literature can tolerate after-the-fact removal. In this paper, we first prove that disallowing after-the-fact removal is necessary for achieving subquadratic-communication BA.

Next, we show a new subquadratic binary BA construction (of course, assuming no after-the-fact removal) that achieves near- optimal resilience and expected constant rounds under standard cryptographic assumptions and a public-key infrastructure (PKI). In comparison, all known subquadratic protocols make additional strong assumptions such as random oracles or the ability of honest nodes to erase secrets from memory, and even with these strong assumptions, no prior work can achieve the above properties. Lastly, we show that some setup assumption is necessary for achieving subquadratic multicast-based BA.

Exact Byzantine Consensus on Undirected Graphs under Local Broadcast Model

Muhammad Samir Khan
Syed Shalan Naqvi
Nitin H. Vaidya

This paper considers the Byzantine consensus problem for nodes with binary inputs. The nodes are interconnected by a network represented as an undirected graph, and the system is assumed to be synchronous. Under the classical point-to-point communication model, it is well-known that the following two conditions are both necessary and sufficient to achieve Byzantine consensus among n nodes in the presence of up to ƒ Byzantine faulty nodes: n & 3 #8805; 3 ≥ ƒ+ 1 and vertex connectivity at least 2 ƒ + 1. In the classical point-to-point communication model, it is possible for a faulty node to equivocate, i.e., transmit conflicting information to different neighbors. Such equivocation is possible because messages sent by a node to one of its neighbors are not overheard by other neighbors.

This paper considers the local broadcast model. In contrast to the point-to-point communication model, in the local broadcast model, messages sent by a node are received identically by all of its neighbors. Thus, under the local broadcast model, attempts by a node to send conflicting information can be detected by its neighbors. Under this model, we show that the following two conditions are both necessary and sufficient for Byzantine consensus: vertex connectivity at least ⌋ 3 fƒ / 2 ⌊ + 1 and minimum node degree at least 2 ƒ. Observe that the local broadcast model results in a lower requirement for connectivity and the number of nodes n, as compared to the point-to-point communication model.

We extend the above results to a hybrid model that allows some of the Byzantine faulty nodes to equivocate. The hybrid model bridges the gap between the point-to-point and local broadcast models, and helps to precisely characterize the trade-off between equivocation and network requirements.

Asymptotically Optimal Validated Asynchronous Byzantine Agreement

Ittai Abraham
Dahlia Malkhi
Alexander Spiegelman

We provide a new protocol for Validated Asynchronous Byzantine Agreement in the authenticated setting. Validated (multi-valued) Asynchronous Byzantine Agreement is a key building block in constructing Atomic Broadcast and fault-tolerant state machine replication in the asynchronous setting. Our protocol has optimal resilience of ƒ < n/3 Byzantine failures and asymptotically optimal expected O(1) running time to reach agreement. Honest parties in our protocol send only an expected O(n²) messages where each message contains a value and a constant number of signatures. Hence our total expected communication is O(n²) words. The best previous result of Cachin et al. from 2001 solves Validated Byzantine Agreement with optimal resilience and O(1) expected time but with O(n³) expected word communication. Our work addresses an open question of Cachin et al. from 2001 and improves the expected word communication from O(n³) to asymptotically optimal O(n²).

HotStuff: BFT Consensus with Linearity and Responsiveness

Maofan Yin
Dahlia Malkhi
Michael K. Reiter
Guy Golan Gueta
Ittai Abraham

We present HotStuff, a leader-based Byzantine fault-tolerant replication protocol for the partially synchronous model. Once network communication becomes synchronous, HotStuff enables a correct leader to drive the protocol to consensus at the pace of actual (vs. maximum) network delay–a property called responsiveness—and with communication complexity that is linear in the number of replicas. To our knowledge, HotStuff is the first partially synchronous BFT replication protocol exhibiting these combined properties. Its simplicity enables it to be further pipelined and simplified into a practical, concise protocol for building large-scale replication services.

Fault Tolerant Gradient Clock Synchronization

Johannes Bund
Christoph Lenzen
Will Rosenbaum

Synchronizing clocks in distributed systems is well-understood, both in terms of fault-tolerance in fully connected systems, and the optimal achievable local skew in general fault-free networks. However, so far nothing non-trivial is known about the local skew that can be achieved in non-fully-connected topologies even under a single Byzantine fault. In this work, we show that asymptotically optimal local skew can be achieved in the presence of Byzantine faults.

Our approach combines the Lynch-Welch algorithm [19] for synchronizing a clique of n nodes with up to ƒ < n/3 Byzantine faults, and the gradient clock synchronization (GCS) algorithm by Lenzen et al. [15] in order to render the latter resilient to faults. This is not possible on general graphs, so we augment an arbitrary input graph G by replacing each node with a fully connected cluster of 3 ƒ +1 copies, and execute an instance of the Lynch-Welch algorithm within each cluster. We interpret the clusters as supernodes executing the GCS algorithm on G, where each node in the cluster maintains an estimate of the logical clock of its supernode. By also fully connecting clusters corresponding to neighbors in l G, supernodes maintain estimates of neighboring clusters’ logical clocks. We achieve asymptotically optimal local skew, assuming that no cluster contains more than ƒ faulty nodes. This construction yields factors of O(ƒ) and O(ƒ²) overheads in terms of nodes and edges, respectively. Since tolerating ƒ faulty neighbors trivially requires degrees larger than ƒ, these overheads are asymptotically optimal.

Bootstrapping Public Blockchains Without a Trusted Setup

Abhinav Aggarwal
Mahnush Movahedi
Jared Saia
Mahdi Zamani

We propose a protocol that allows the participants of a permissionless decentralized system to agree on a set of identities in the presence of a computationally-bounded Byzantine adversary. Our protocol guarantees that the fraction of identities belonging to the adversary in the set of identities is at most equal to the total computational hash power of the adversary.

We significantly improve on the existing state-of-the-art in the following four ways. First, our algorithm runs in expected O(1) rounds, in contrast to previous results which require O(log n/log log n) rounds, where n is the number of initial nodes in the system. Second, we require each node to solve just one computational puzzle, whereas previous algorithms require O(log n/log log n) puzzles per node. Third, our algorithm sends only O(n) bits per node in expectation, whereas previous algorithms send O( (log² n/log log n)) bits in expectation. Finally, in contrast to past results, our algorithm handles dynamic joining and leaving of nodes.

SESSION: Session 8

Hardness of Minimal Symmetry Breaking in Distributed Computing

Alkida Balliu
Juho Hirvonen
Dennis Olivetti
Jukka Suomela

A graph is weakly 2-colored if the nodes are labeled with colors black and white such that each black node is adjacent to at least one white node and vice versa. In this work we study the distributed computational complexity of weak 2-coloring in the standard łocal model of distributed computing, and how it is related to the distributed computational complexity of other graph problems.

First, we show that weak 2-coloring is a minimal distributed symmetry-breaking problem for regular even-degree trees and high-girth graphs: if there is any non-trivial locally checkable labeling problem that is solvable in o(log^⋅ n) rounds with a distributed graph algorithm in the middle of a regular even-degree tree, then weak 2-coloring is also solvable in o(log^⋅ n) rounds there.

Second, we prove a tight lower bound of Ω(log ^⋅ n) for the distributed computational complexity of weak 2-coloring in regular trees; previously only a lower bound of Ω n(log log^⋅ n) was known. By minimality, the same lower bound holds for any non-trivial locally checkable problem inside regular even-degree trees.

An Automatic Speedup Theorem for Distributed Problems

Sebastian Brandt

Recently, Brandt et al.\ [STOC’16] proved a lower bound for the distributed Lovász Local Lemma, which has been conjectured to be tight for sufficiently relaxed LLL criteria by Chang and Pettie [FOCS’17]. At the heart of their result lies a speedup technique that, for graphs of girth at least 2t+2, transforms any t-round algorithm for one specific LLL problem into a (t-1)-round algorithm for the same problem. We substantially improve on this technique by showing that such a speedup exists for any locally checkable problem ¶i, with the difference that the problem ¶i_1 the inferred (t-1)-round algorithm solves is not (necessarily) the same problem as ¶i. Our speedup is automatic in the sense that there is a fixed procedure that transforms a description for ¶i into a description for ¶i_1 and reversible in the sense that any (t-1)-round algorithm for ¶i_1 can be transformed into a t-round algorithm for ¶i. In particular, for any locally checkable problem ¶i with exact deterministic time complexity T(n, Δ) łeq t on graphs with n nodes, maximum node degree Δ, and girth at least 2t+2, there is a sequence of problems ¶i_1, ¶i_2, \dots with time complexities T(n, Δ)-1, T(n, Δ)-2, \dots, that can be inferred from ¶i. As a first application of our generalized speedup, we solve a long-standing open problem of Naor and Stockmeyer [STOC’93]: we show that weak 2-coloring in odd-degree graphs cannot be solved in o(łog^* Δ) rounds, thereby providing a matching lower bound to their upper bound.

A Sharp Threshold Phenomenon for the Distributed Complexity of the Lovász Local Lemma

Sebastian Brandt
Yannic Maus
Jara Uitto

The Lová sz Local Lemma (LLL) says that, given a set of bad events that depend on the values of some random variables and where each event happens with probability at most p and depends on at most d other events, there is an assignment of the variables that avoids all bad events if the LLL criterion ep(d+1)<1 is satisfied. Nowadays, in the area of distributed graph algorithms it has also become a powerful framework for developing—mostly randomized—algorithms. A classic result by Moser and Tardos yields an O(łog^2 n) algorithm for the distributed Lová sz Local Lemma [JACM’10] if ep(d + 1) < 1 is satisfied. Given a stronger criterion, i.e., demanding a smaller error probability, it is conceivable that we can find better algorithms. Indeed, for example Chung, Pettie and Su [PODC’14] gave an O(łog_epd^2 n) algorithm under the epd^2 < 1 criterion. Going further, Ghaffari, Harris and Kuhn introduced an 2^O(\sqrtłog łog n ) time algorithm given d^8 p = O(1) [FOCS’18]. On the negative side, Brandt et al.\ and Chang et al.\ showed that we cannot go below Ømega(łog łog n) (randomized) [STOC’16] and Ømega(łog n) (deterministic) [FOCS’16], respectively, under the criterion płeq 2^-d . Furthermore, there is a lower bound of Ømega(łog^* n) that holds for any criterion. In this paper, we study the dependency of the distributed complexity of the LLL problem on the chosen LLL criterion. We show that for the fundamental case of each random variable of the considered LLL instance being associated with an edge of the input graph, that is, each random variable influences at most two events, a sharp threshold phenomenon occurs at p = 2^-d : we provide a simple deterministic (!) algorithm that matches the Ømega(łog^* n) lower bound in bounded degree graphs, if p < 2^-d , whereas for p \geq 2^-d , the Ømega(łog łog n) randomized and the Ømega(łog n) deterministic lower bounds hold. In many applications variables affect more than two events; our main contribution is to extend our algorithm to the case where random variables influence at most three different bad events. We show that, surprisingly, the sharp threshold occurs at the exact same spot, providing evidence for our conjecture that this phenomenon always occurs at p = 2^-d , independent of the number r of events that are affected by a variable. Almost all steps of the proof framework we provide for the case r=3 extend directly to the case of arbitrary r; consequently, our approach serves as a step towards characterizing the complexity of the LLL under different exponential criteria.

SESSION: Session 9

Reconfigurable Atomic Transaction Commit

Manuel Bravo
Alexey Gotsman

Modern data stores achieve scalability by partitioning data into shards and fault-tolerance by replicating each shard across several servers. A key component of such systems is a Transaction Certification Service (TCS), which atomically commits a transaction spanning multiple shards. Existing TCS protocols require 2f+1 crash-stop replicas per shard to tolerate f failures. In this paper we present atomic commit protocols that require only f+1 replicas and reconfigure the system upon failures using an external reconfiguration service. We furthermore rigorously prove that these protocols correctly implement a recently proposed TCS specification. We present protocols in two different models—the standard asynchronous message-passing model and a model with Remote Direct Memory Access (RDMA), which allows a machine to access the memory of another machine over the network without involving the latter’s CPU. Our protocols are inspired by a recent FARM system for RDMA-based transaction processing. Our work codifies the core ideas of FARM as distributed TCS protocols, rigorously proves them correct and highlights the trade-offs required by the use of RDMA.

The Impact of RDMA on Agreement

Marcos K. Aguilera
Naama Ben-David
Rachid Guerraoui
Virendra Marathe
Igor Zablotchi

Remote Direct Memory Access (RDMA) is becoming widely available in data centers. This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions. In this paper, we study the fundamental power of these capabilities. We consider the well-known problem of achieving consensus despite failures, and find that RDMA can improve the inherent trade-off in distributed computing between failure resilience and performance. Specifically, we show that RDMA allows algorithms that simultaneously achieve high resilience and high performance, while traditional algorithms had to choose one or another. With Byzantine failures, we give an algorithm that only requires n \geq 2f_P + 1 processes (where f_P is the maximum number of faulty processes) and decides in two (network) delays in common executions. With crash failures, we give an algorithm that only requires n \geq f_P + 1 processes and also decides in two delays. Both algorithms tolerate a minority of memory failures inherent to RDMA, and they provide safety in asynchronous systems and liveness with standard additional assumptions.

Hyaline: Fast and Transparent Lock-Free Memory Reclamation

Ruslan Nikolaev
Binoy Ravindran

We present a new lock-free safe memory reclamation algorithm, Hyaline, which is fast, scalable, and transparent to the underlying data structures. Hyaline easily handles virtually unbounded number of threads that can be created and deleted dynamically, while retaining O(1) reclamation cost. We also extend Hyaline to avoid situations where stalled threads prevent others from reclaiming newly allocated objects, a common problem with epoch-based reclamation. Our evaluation reveals that Hyaline’s throughput is high — it steadily outperformed other reclamation schemes by >10% in one test and yielded even higher gains in oversubscribed scenarios.

Layering Data Structures over Skip Graphs for Increased NUMA Locality

Samuel Thomas
Hammurabi Mendes

We present a lock-free, linearizable, and NUMA-aware data structure that implements sets, maps, and priority queue abstract data types (ADTs), based on using thread-local, sequential maps that are used to “jump” to suitable positions in a lock-free, linearizable variant of a skip graph. Our skip graph is suitably constrained in height and subjected to a data partition scheme that reduces contention and increases NUMA locality. We developed an additional skip graph variant, which we call sparse skip graph, that causes our thread-local maps as well as our shared structure to become more sparse. Compared to using regular skip graphs, sparse skip graphs show increased performance in workloads dominated by “insert” or “remove” operations, and comparable performance in workloads dominated by “contains” operations.

SESSION: Session 10

Partially Replicated Causally Consistent Shared Memory: Lower Bounds and An Algorithm

Zhuolun Xiang
Nitin H. Vaidya

The focus of this paper is on causal consistency in a partially replicated distributed shared memory (DSM) system that provides the abstraction of shared read/write registers. Maintaining causal consistency in distributed shared memory systems has received significant attention in the past, mostly on full replication wherein each replica stores a copy of all the registers in the shared memory. To ensure causal consistency, all causally preceding updates must be performed before an update is performed at any given replica. Therefore, some mechanism for tracking causal dependencies is required, such as vector timestamps with the number of vector elements being equal to the number of replicas in the context of full replication. In this paper, we investigate causal consistency in partially replicated systems, wherein each replica may store only a subset of the shared registers. Building on the past work, this paper makes three key contributions:

present a necessary condition on the metadata (which we refer as a timestamp) that must be maintained by each replica to be able to track causality accurately. The necessary condition identifies a set of directed edges in a share graph that a replica’s timestamp must keep track of.
We present an algorithm for achieving causal consistency using a timestamp that matches the above necessary condition, thus showing that the condition is necessary and sufficient.
We define a measurement of timestamp space size and present a lower bound (in bits) on the size of the timestamps. The lower bound matches our algorithm in several special cases.

Vorpal: Vector Clock Ordering For Large Persistent Memory Systems

Kunal Korgaonkar
Joseph Izraelevitz
Jishen Zhao
Steven Swanson

In systems with non-volatile main memories (NVMMs), programmers must carefully control the order in which writes become persistent. Otherwise, what will remain in persistence after a crash may be unusable upon recovery. Prior art has already explored semantic models for specifying this persist order, but most enforcement algorithms for the order are not scalable to large server machines because they assume that the machine contains only one or two memory controllers. In this paper, we describe a collection of provably correct algorithms for enforcing the persist-order across writes, generated at many different cores, and persisted across numerous different memory controllers. Relative to existing solutions, our algorithms improve performance by 48% by reducing both traffic and serialization overheads.

On the Parallels between Paxos and Raft, and how to Port Optimizations

Zhaoguo Wang
Changgeng Zhao
Shuai Mu
Haibo Chen
Jinyang Li

In recent years, Raft has surpassed Paxos to become the more popular consensus protocol in the industry. While many researchers have observed the similarities between the two protocols, no one has shown how Raft and Paxos are formally related to each other. In this paper, we present a formal mapping between Raft and Paxos, and use this knowledge to port a certain class of optimizations from Paxos to Raft. In particular, our porting method can automatically generate an optimized protocol specification with guaranteed correctness. As case studies, we port and evaluate two optimizations, Mencius and Paxos Quorum Lease to Raft.

Linearizable State Machine Replication of State-Based CRDTs without Logs

Jan Skrzypczak
Florian Schintke
Thorsten Schütt

General solutions of state machine replication have to ensure that all replicas apply the same commands in the same order, even in the presence of failures. Such strict ordering incurs high synchronization costs due to the use of distributed consensus or a leader. This paper presents a protocol for linearizable state machine replication of conflict-free replicated data types (CRDTs) that neither requires consensus nor a leader. By leveraging the properties of state-based CRDTs—in particular the monotonic growth of a join semilattice—synchronization overhead is greatly reduced. In addition, updates just need a single round trip and modify the state ‘in-place’ without the need for a log. Furthermore, the message size overhead for coordination consists of a single counter per message. While reads in the presence of concurrent updates are not wait-free without a coordinator, we show that more than 97\,% of reads can be handled in one or two round trips under highly concurrent accesses. Our protocol achieves high throughput without auxiliary processes such as command log management or leader election. It is well suited for all practical scenarios that need linearizable access on CRDT data on a fine-granular scale.

On Mixing Eventual and Strong Consistency: Bayou Revisited

Maciej Kokociński
Tadeusz Kobus
Paweł T. Wojciechowski

In this paper we study the properties of eventually consistent distributed systems that feature arbitrarily complex semantics and mix eventual and strong consistency. These systems execute requests in a highly-available, weakly-consistent fashion, but also enable stronger guarantees through additional inter-replica synchronization mechanisms that require the ability to solve distributed consensus. We use the seminal Bayou system as a case study, and then generalize our findings to a whole class of systems. We show dubious and unintuitive behaviour exhibited by those systems and provide a theoretical framework for reasoning about their correctness. We also state an impossibility result that formally proves the inherent limitation of such systems, namely temporary operation reordering, which admits interim disagreement between replicas on the relative order in which the client requests were executed.

SESSION: Session 11

Massively Parallel Algorithms for Finding Well-Connected Components in Sparse Graphs

Sepehr Assadi
Xiaorui Sun
Omri Weinstein

Massively parallel computation (MPC) algorithms for graph problems have witnessed a resurgence of interest in recent years. Despite major progress for numerous graph problems however, the complexity of the sparse graph connectivity problem in this model has remained elusive: While classical logarithmic-round PRAM algorithms for finding connected components in any n-vertex graph have been known for more than three decades (and imply the same bounds for MPC model), no o(log n)-round MPC algorithms are known for this task with truly sublinear in n memory per machine (which is the only interesting regime for sparse graphs with O(n) edges). It is conjectured that an o(log n)-round algorithm for connectivity on general sparse graphs with n^{1-Ω (1)} per-machine memory may not exist, a conjecture that also forms the basis for multiple conditional hardness results on the round complexity of other problems in the MPC model.

We take an opportunistic approach towards the sparse graph connectivity problem by designing an algorithm with improved performance in terms of the connectivity structure of the input graph. Formally, we design an MPC algorithm that finds all connected components with spectral gap at least λ in a graph in O(log log n + log(1/λ)) MPC rounds and n^δ memory per machine for any constant δ ∈ (0,1). While this algorithm still requires Θ(log n) rounds in the worst-case, it achieves an exponential round reduction on “well-connected” components with λ ≥ 1/polylog(n) using only n^δ memory per machine and ł(n) total memory, and still operates in o(log n)l rounds even when λ = 1/n^o(1).

En-route to our main result, we design a new distributed data structure for performing independent random walks from all vertices simultaneously, as well as a new leader-election algorithm with exponentially faster round complexity on random graphs.

The Complexity of (Δ+1) Coloring in Congested Clique, Massively Parallel Computation, and Centralized Local Computation

Yi-Jun Chang
Manuela Fischer
Mohsen Ghaffari
Jara Uitto
Yufan Zheng

In this paper, we present new randomized algorithms that improve the complexity of the classic (Δ+1)-coloring problem, and its generalization (Δ+1)-list-coloring, in three well-studied models of distributed, parallel, and centralized computation: Distributed Congested Clique: We present an O(1)-round randomized algorithm for (Δ + 1)-list-coloring in the congested clique model of distributed computing. This settles the asymptotic complexity of this problem. It moreover improves upon the O(log* Δ)-round randomized algorithms of Parter and Su [DISC’18] and O((log log Δ)⋅ log* Δ)-round randomized algorithm of Parter [ICALP’18].

Massively Parallel Computation: We present a randomized (Δ + 1)-list-coloring algorithm with round complexity O(√ log log n ) in the Massively Parallel Computation (MPC) model with strongly sublinear memory per machine. This algorithm uses a memory of O(n^α) per machine, for any desirable constant α > 0, and a total memory of Õ (m), where m is the number of edges in the graph. Notably, this is the first coloring algorithm with sublogarithmic round complexity, in the sublinear memory regime of MPC. For the quasilinear memory regime of MPC, an O(1)-round algorithm was given very recently by Assadi et al. [SODA’19].

Centralized Local Computation: We show that (Δ + 1)-list-coloring can be solved by a randomized algorithm with query complexity Δ O(1) … O(log n), in the centralized local computation model. The previous state of the art for (Δ+1)-list-coloring in the centralized local computation model are based on simulation of known LOCAL algorithms. The deterministic O(√ Δ poly log Δ + log* n)-round LOCAL algorithm of Fraigniaud et al. [FOCS’16] can be implemented in the centralized local computation model with query complexity Δ^O(√ Δ poly log Δ) … O(log* n); the randomized O(log* Δ) + 2^^O(√ log log n)-round LOCAL algorithm of Chang et al. [STOC’18] can be implemented in the centralized local computation model with query complexity Δ^O(log* Δ) … O(log n).

Massively Parallel Computation of Matching and MIS in Sparse Graphs

Soheil Behnezhad
Sebastian Brandt
Mahsa Derakhshan
Manuela Fischer
MohammadTaghi Hajiaghayi
Richard M. Karp
Jara Uitto

The Massively Parallel Computation (MPC) model serves as a common abstraction of many modern large-scale parallel computation frameworks and has recently gained a lot of importance, especially in the context of classic graph problems. In this work, we mainly consider maximal matching and maximal independent set problems in the MPC model.

These problems are known to admit efficient MPC algorithms if the space available per machine is near-linear in the number n of nodes. This is not only often significantly more than what we can afford, but also allows for easy if not trivial solutions for sparse graphs—which are common in real-world large-scale graphs. We are, therefore, interested in the low-memory MPC model, where the space per machine is restricted to be strongly sublinear, that is, n^δ for any constant 0 < δ < 1.

We parametrize our algorithms by the arboricity λ of the input graph. Our key ingredient is a degree reduction technique that reduces these problems in graphs with arboricity λ to the corresponding problems in graphs with maximum degree poly(λ, log n) in O(log² log n) rounds, giving rise to O(√ log λ ⋅ log log λ + log ² log n)-round algorithms.

Our result is particularly interesting for graphs with poly log n arboricity as for such graphs, we get O(log ² log n)-round algorithms. This covers most natural families of sparse graphs and almost exponentially improves over previous algorithms that all required log ^Ω(1) n rounds in this regime of MPC.

Finally, our maximal matching algorithm can be employed to obtain a (1+ε)-approximate maximum cardinality matching, a (2+ε)-approximate maximum weighted matching, as well as a 2-approximate minimum vertex cover in essentially the same number of rounds.

Weighted Matchings via Unweighted Augmentations

Buddhima Gamlath
Sagar Kale
Slobodan Mitrovic
Ola Svensson

We design a generic method to reduce the task of finding weighted matchings to that of finding short augmenting paths in unweighted graphs. This method enables us to provide efficient implementations for approximating weighted matchings in the massively parallel computation (MPC) model and in the streaming model.

For the MPC and the multi-pass streaming model, we show that any algorithm computing a (1-δ)-approximate unweighted matching in bipartite graphs can be translated into an algorithm that computes a (1-(ε(δ))-approximate maximum weighted matching. Furthermore, this translation incurs only a constant factor (that depends on ε > 0) overhead in the complexity. Instantiating this with the current best MPC algorithm for unweighted matching yields a (1 – ε)-approximation algorithm for maximum weighted matching that uses O_ε(log log n) rounds, O(m/n) machines per round, and O(npoly(logn)) memory per machine. This improves upon the previous best approximation guarantee of (1/2-ε) for weighted graphs. In the context of single-pass streaming with random edge arrivals, our techniques yield a (1/2+c)-approximation algorithm thus breaking the natural barrier of 1/2.

SESSION: Session 12

Implementing Mediators with Asynchronous Cheap Talk

Ittai Abraham
Danny Dolev
Ivan Geffner
Joseph Y. Halpern

A mediator can help non-cooperative agents obtain an equilibrium that may otherwise not be possible. We study the ability of players to obtain the same equilibrium without a mediator, using only cheap talk, that is, nonbinding pre-play communication. Previous work has considered this problem in a synchronous setting. Here we consider the effect of asynchrony on the problem, and provide upper bounds for implementing mediators. Considering asynchronous environments introduces new subtleties, including exactly what solution concept is most appropriate and determining what move is played if the cheap talk goes on forever. Different results are obtained depending on whether the move after such “infinite play” is under the control of the players or part of the description of the game.

Distributed Minimum Degree Spanning Trees

Michael Dinitz
Magnus M. Halldorsson
Taisuke Izumi
Calvin Newport

The minimum degree spanning tree (MDST) problem requires the construction of a spanning tree T for graph G, such that the maximum degree of T is the smallest among all spanning trees of G. Let d be this MDST degree for a given graph. In this paper, we present a randomized distributed approximation algorithm for the MDST problem that constructs a spanning tree with maximum degree in O(d+log n ). With high probability in n, the algorithm runs in O((D + √n) log₄ n) rounds, in the broadcast-CONGEST model, where D is the graph diameter and n is the graph size. We then show how to derandomize this algorithm, obtaining the same asymptotic guarantees on degree and time complexity, but now requiring the standard CONGEST model. Although efficient approximation algorithms for the MDST problem have been known in the sequential setting since the 1990’s (finding an exact solution is NP-hard), our algorithms are the first efficient distributed solutions. We conclude by proving a lower bound that establishes that any randomized MDST algorithm that guarantees a maximum degree in ∼Ω (d) requires &Ø#8764;Ω (n^1/3) rounds, and any deterministic solution requires ∼Ω (n^1/2) rounds. These bounds proves our deterministic algorithm to be asymptotically optimal, and eliminates the possibility of significantly more efficient randomized solutions.

Improved Distributed Approximations for Minimum-Weight Two-Edge-Connected Spanning Subgraph

Michal Dory
Mohsen Ghaffari

The minimum-weight 2-edge-connected spanning subgraph (2-ECSS) problem is a natural generalization of the well-studied minimum-weight spanning tree (MST) problem, and it has received considerable attention in the area of network design. The latter problem asks for a minimum-weight subgraph with an edge connectivity of 1 between each pair of vertices while the former strengthens this edge-connectivity requirement to 2. Despite this resemblance, the 2-ECSS problem is considerably more complex than MST. While MST admits a linear-time centralized exact algorithm, 2-ECSS is NP-hard and the best known centralized approximation algorithm for it (that runs in polynomial time) gives a 2-approximation.

In this paper, we give a deterministic distributed algorithm with round complexity of Õ (D + √n) that computes a (9 + ε)-approximation of 2-ECSS, for any constant ε > 0. Up to logarithmic factors, this complexity matches the Ø (D + √ n) lower bound that can be derived from the technique of Das Sarma et al. [STOC’11], as shown by Censor-Hillel and Dory [OPODIS’17]. Our result is the first distributed constant approximation for 2-ECSS in the nearly optimal time and it improves on a recent randomized algorithm of Dory [PODC’18], which achieved an O(łog n)-approximation in Õ (D+√ ) rounds.

We also present an alternative algorithm for O(log n)-approximation, whose round complexity is linear in the low-congestion shortcut parameter of the network—following a framework introduced by Ghaffari and Haeupler [SODA’16]. This algorithm has round complexity Ö (D+√n) in worst-case networks but it provably runs much faster in many well-behaved graph families of interest. For instance, it runs in Õ (D) time in planar networks and those with bounded genus, bounded path-width or bounded tree-width.

Near-Additive Spanners In Low Polynomial Deterministic CONGEST Time

Michael Elkin
Shaked Matar

Given a pair of parameters α ≥ 1,β ≥ 0, a subgraph G’=(V,H) of an n-vertex unweighted undirected graph G=(V,E) is called an (α,β)-spanner if for every pair u,ν ∈ V of vertices, we have d_G’ (u,ν)≤ α d_G (u,α)+β. If β=0 the spanner is called a multiplicative α-spanner, and if α = 1+ε, for an arbitrarily small ε>0, the spanner is said to be near-additive.

Graph spanners [5,36], are a fundamental and extremely well-studied combinatorial construct, with a multitude of applications in distributed computing and in other areas. Near-additive spanners, introduced in [27], preserve large distances much more faithfully than the more traditional multiplicative spanners. Also, recent lower bounds [1] ruled out the existence of arbitrarily sparse purely additive spanners (i.e., spanners with α=1), and therefore showed that essentially near-additive spanners provide the best approximation of distances that one can hope for.

Numerous distributed algorithms, for constructing sparse near-additive spanners, were devised in [17,20,25,28,40]. In particular, there are now known efficient randomized algorithms in the CONGEST model that construct such spanners [25]., and also there are efficient deterministic algorithms in the LOCAL model [17]. However, the only known deterministic CONGEST-model algorithm for the problem [20] requires super-linear time in n. In this paper, we remedy the situation and devise an efficient deterministic CONGEST-model algorithm for constructing arbitrarily sparse near-additive spanners.

The running time of our algorithm is low polynomial, i.e., roughly O(β ⋅ n^ρ), where ρ > 0 is an arbitrarily small positive constant that affects the additive term β. In general, the parameters of our new algorithm and of the resulting spanner are at the same ballpark as the respective parameters of the state-of-the-art randomized algorithm due to [25].

A Trivial Yet Optimal Solution to Vertex Fault Tolerant Spanners

Greg Bodwin
Shyamal Patel

We give a short and easy upper bound on the worst-case size of fault tolerant spanners, which improves on all prior work and is fully optimal at least in the setting of vertex faults.

SESSION: Tutorials

From Classical to Blockchain Consensus: What Are the Exact Algorithms?

Yanhong A. Liu
Scott D. Stoller

This tutorial describes well-known algorithms for distributed consensus problems, from classical consensus to blockchain consensus, and discusses exact algorithms that are high-level as in pseudocode and directly executable at the same time. The tutorial consists of five parts:

Byzantine Fault Tolerant State Machine Replication in Any Programming Language

Ethan Buchman

State machine replication is a fundamental primitive in fault tolerant distributed computing, but few production tools exist to support the replication of arbitrary state machines. The tools that do exist, like Apache Zookeeper, CoreOS’s etcd, and Hashicrop’s Consul, include an implementation of a consensus algorithm (eg. ZAB or Raft) for replication, and a service-discovery oriented key-value store as the state machine. While these tools can tolerate crash failures, they cannot tolerate malicious or adversarial (“Byzantine”) faults.

We present Tendermint, a production-grade Byzantine Fault Tolerant State Machine Replication engine written in Go. Tendermint supports replication for state machines written in any language by using a socket protocol to communicate between the state machine and the replication engine. Tendermint is being used on the public internet today to secure upwards of 1 Billion USD in value, with deployments supporting hundreds of consensus nodes. In this workshop, we provide an overview of the Tendermint system and demonstrate how to build Byzantine Fault Tolerant applications in Go and Javascript. We will also introduce the Cosmos Hub, an advanced Proof-of-Stake cryptocurrency system built on Tendermint.

Central Control over Distributed Asynchronous Systems: A Tutorial on Software-Defined Networks and Consistent Network Updates

Klaus-Tycho Foerster

This tutorial will give an introduction to a topic that lies at the intersection of distributed computing and networking, and combines asynchronous distributed systems with central control, namely consistent updates in Software-Defined Networks (SDNs). We will give an overview on current models and algorithms, but also selected related topics, in particular those of potential interest to the PODC community, showcasing avenues for further research.

In more detail, SDNs have been an intensively studied topic in networking over the last years, but much of its focus has been on (logical) central control, abstracting away most of its underlying foundation, namely that a network is still a distributed asynchronous system at its core. Summarized in a simplified way, SDNs come with the promise that the network state can be optimized and updated from a global point of view. However, such a simplification becomes especially problematic when consistency guarantees have to maintained. In asynchronous distributed systems, it is not possible to simultaneously change the state of all nodes, such a naive approach will lead to an inconsistent mix of old and new states, introducing e.g. forwarding loops. Notwithstanding, most approaches tackle these issues from the viewpoints of the networking/systems communities, and we believe could henceforth greatly benefit from connections to and ideas from the PODC community.

Tutorial: Specifying, Implementing, and Verifying Algorithms for Persistent Memory

Diego Cepeda
Sakib Chowdhury
Wojciech Golab

High-density byte-addressable non-volatile memory became a reality earlier this year when Intel launched the long-awaited Optane persistent memory module. This tutorial is intended for researchers interested in using persistent memory to construct fault-tolerant data structures that can maintain state consistently across power outages and system crashes without relying on conventional secondary storage. A number of practical and theoretical topics will be covered including hardware purchasing considerations, operating system and programming language support for persistent memory, definitions of correctness properties for fault-tolerant data structures, techniques for implementing fault-tolerant concurrency control and memory management, as well as verification of correctness.