2023 North American School of Information Theory

University of Pennsylvania, Philadelphia, PA.


NASIT 2023 Schedule


The talks and poster sessions will happen at Wu and Chen Auditorium, Levine Hall, 3330 Walnut Street, Philadelphia, PA 19104. 


Erdal Arıkan

Polar Coding: Theory and Practice [Slides][Handout Slides]

Abstract. Polar coding is a source/channel coding technique that is capable of achieving information-theoretic limits in a variety of scenarios. The main aim of this tutorial is to establish a solid understanding of polar coding, starting from first principles. A second aim is to inspire new research directions in channel coding by taking a look at the practice of channel coding from a historical perspective. This will include a look at channel coding in current wireless standards (4G/5G), and a discussion of 6G channel coding requirements.

Collaborative Machine Learning: From Theory to Practice [Slides]

Abstract. Collaborative and federated machine learning has drawn much attention as it enables the training of ML models on decentralized data that can’t be pooled together due to various reasons, such as privacy, regulations, bandwidth, or cloud costs. In the first part of this tutorial, I will discuss four research directions that have been the focus of the research community in recent years: (1) challenges of FL with non i.i.d. data distribution and convergence guarantees; (2) privacy and security guarantees for FL; (3) FL over resource-constrained edge nodes; and (4) trustworthiness of FL systems. In the second part, I will provide an overview of FedML (https://www.fedml.ai), which provides an open-source library and an MLOps platform for real-world deployment of collaborative/federated machine learning. Using this platform, we will launch a real-world collaborative training scenario with all participants. 

Privacy in Learning and Analytics [Slides: Part 1] [Slides: Part 2]

Abstract. Differential privacy has emerged as a powerful method to provide privacy guarantees on individuals’ personal data and has been recently deployed by major technology organizations for privacy-preserving federated learning and analytics as well data collection from peripheral devices. In the first part of this lecture, we will discuss how to optimally privatize and learn from data by investigating several canonical models for federated learning and analytics such as mean and histogram estimation. We will use tools from information theory to establish the optimal trade-offs between privacy-utility and communication bandwidth. In the second part, we will discuss the notion of information constrained optimal transport and show that it enables training generative models from privatized data.  

Approximate Message Passing Algorithms for High-dimensional Statistical Inference

Abstract. Approximate Message Passing (AMP) refers to a class of iterative algorithms that have been successfully applied to a number of high-dimensional statistical estimation problems like linear regression, generalized linear models, and low-rank matrix estimation, and are practical and useful in a variety of engineering and computer science applications such as imaging, communications, and deep learning. AMP algorithms have two features that make them particularly attractive: they can easily be tailored to take advantage of prior information on the structure of the signal, such as sparsity, and under suitable assumptions on a design matrix, AMP theory provides precise asymptotic guarantees for statistical procedures in the high-dimensional regime. In this talk, I will introduce and motivate the main ideas of AMP from a mathematical perspective, by viewing AMP as an abstract random matrix iteration, and then I will demonstrate how to use AMP theory to establish precise asymptotic guarantees for a variety of statistical procedures in the high-dimensional regime, under suitable assumptions on the data.

Distortion, realism, and learned compression [Slides]


Abstract. Future generations of video codecs will be able to produce much more realistic reconstructions at low bitrates thanks to advances in learned compression and our theoretical understanding of realism-distortion trade-offs. In this lecture we will take a look at approaches to achieving lossy compression with realism constraints. In particular, we will review Blau & Michaeli's rate-distortion-perception trade-off as well as some of the available results and tools for traversing the trade-off. This includes a high-level summary of VAEs, GANs, and diffusion generative models. Finally, we will see how channel simulation and shared randomness can help in our quest to achieving realistic reconstructions at low bitrates.

Information measures and their applications in statistical learning [Notes]

Abstract. Mutual information and Kullback-Leibler divergence classically play a central role in information theory. However, there are other (non-Shannon) information measures that arise in many modern problems of machine learning. In this tutorial we define and discuss basic properties of f-divergences, e.g., Hellinger and chi-square. After introducing ideas of comparing f-divergences, we show a new result in density estimation of Gaussian mixtures. Next, we demonstrate that noisy channels contract information measures (strong data processing) and show how these lead to sharp results in combinatorial statistics (stochastic block model, broadcasting on trees, random colorings). Overall, we demonstrate that selecting a suitable problem-dependent information measure leads to shorter proofs and often sharper results.

On the performance of ranking recovery algorithms with privacy consideration [Slides]

Abstract. Today, ranking algorithms are of fundamental importance and are used in a wide variety of applications, such as recommender systems, search engines, biomedical systems, and communication systems. Broadly speaking, the goal of a ranking algorithm is to sort a dataset – which, in the current Big Data era, is usually massive in size and complexity – so that users are provided with accurate and relevant results. For instance, a recommender system may suggest a new item to buy to a user based on their interests and previous purchases. Although modern ranking algorithms promise efficient means of performing large-scale data processing, there are numerous privacy considerations that must not be overlooked. For instance, the dataset might contain confidential data, such as clinical/genomic health and banking records, or perhaps a user would not like to disclose their previous purchases to a recommender system.

In this talk, we consider the private ranking recovery problem, which consists of recovering the ranking/permutation of an input data vector from a noisy version of it. We aim to establish fundamental trade-offs between the performance of the estimation task, measured in terms of probability of error, and the level of privacy that can be guaranteed when the noise mechanism consists of adding artificial noise. Towards this end, in the first part of the talk we show the optimality of a low-complexity decision rule (referred to as linear decoder) for the estimation task, under several noise distributions widely used in the privacy literature (e.g., Gaussian, Laplace, and generalized normal model). We derive the corresponding expression for the probability of error, and we analyze its rate of convergence in the low-noise and high-noise regimes as well as in the high-dimensional data regime. Then, in the second part of the talk we quantify the guaranteed level of privacy using the (α,ε)-Rényi differential privacy metric. Finally, we put together the results to characterize trade-offs between privacy and probability of error.

Gromov-Wasserstein Distances: Statistical and Computational Advancements via a Duality Theory [Slides]

Abstract. The Gromov-Wasserstein (GW) distance quantifies dissimilarity between metric measure spaces and provides a natural correspondence between them. As such, it serves as a figure of merit for applications involving alignment of heterogeneous datasets, including object matching, multi-modal analysis of biological data, and matching language models. While various computational methods for the GW distance have been developed, formal guarantees for such approaches as well as foundational questions concerning a duality theory and statistical estimation rates remained open. This work closes these gaps for the GW distance with quadratic cost over Euclidean spaces of different dimensions. We focus on the entropically regularized GW (EGW) distance, although similar results apply for the standard (unregularized) version. We derive a novel dual formulation of the EGW problem by representing it as an infimum of certain entropic optimal transportation problems. The dual form enables deriving, for the first time, empirical convergence rates (which are shown to be minimax optimal), sufficient conditions for convexity, and efficient computational methods with global convergence guarantees.

Subspace coding for scalable and resilient wireless networks 

Abstract. In today's information world, communication ecosystems have become increasingly distributed, heterogeneous, and complex, while the demands for extreme scaling of next-generation wireless systems, in terms of data rate, latency, and power efficiency, continue to grow at an unprecedented scale. This calls for a wide range of innovations in connectivity paradigms and, in particular, in channel coding, being one of the fundamental building blocks of any communication system. In this talk, we deviate from the conventional block coding framework and discuss an alternative paradigm, namely, subspace coding. In the first part of this talk, we consider subspace codes over finite fields that can be utilized to guarantee reliability for packet transmission in randomized network coding settings. We review Koetter-Kschischang code construction and show methods for list-decoding subspace codes and extended Koetter-Kschischang codes that lead to significant improvement in their error-correction performance. In the second part of this talk, we establish a framework for studying analog subspace coding (subspace coding over real/complex numbers) for non-coherent wireless networks, leading to solutions that are scalable with network size while guaranteeing resilience against node failures, noise, and rank deficiency of wireless networks.

Causal Inference in the Presence of Network Interference

Abstract: Randomized experiments are widely used to estimate causal effects of proposed “treatments” in domains spanning the physical and biological sciences, social sciences, engineering, medicine and health, as well as in public service domains and the technology industry. However, classical approaches to experimental design rely on critical independence assumptions that are violated when the outcome of an individual A may be affected by the treatment of another individual B, referred to as network interference. In this tutorial, we will survey the challenges of causal inference under network interference and the different approaches proposed in the literature to account for network interference. We will also present a new hierarchy of models and estimators that enable statistically efficient and computationally simple solutions with theoretical guarantees in settings with and without knowledge of the network.

Mohammad Vahid Jamali

Learning Error-Correction Codes: Product Autoencoder and Rate-Matched TurboAE

Abstract: While decades of theoretical research have led to the invention of several classes of error-correction codes, the design of such codes is an extremely challenging task, mostly driven by human ingenuity. In this talk, we demonstrate that such designs can be effectively automated and accelerated via tools from machine learning (ML), thus enabling ML-driven classes of error-correction codes with promising performance gains compared to classical designs. To this end, we first introduce the Product Autoencoder (ProductAE) -- a computationally-efficient family of deep learning driven (encoder, decoder) pairs -- aimed at enabling the training of relatively large codes. Next, we show how to design low-complexity channel-adaptive autoencoders accommodating multiple code rates and lengths with single encoding and decoding hardware.