Bandit-Based Monte Carlo Optimization for Nearest Neighbors

The celebrated Monte Carlo method estimates an expensive-to-compute quantity by random sampling. Bandit-based Monte Carlo optimization is a general technique for computing the minimum of many such expensive-to-compute quantities by adaptive random sampling. The technique converts an optimization problem into a statistical estimation problem which is then solved via multi-armed bandits.
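
As a toy illustration of the adaptive-sampling idea (a hedged sketch, not the paper's algorithm), the snippet below routes a fixed Monte Carlo budget toward the arm whose estimated mean is smallest, using UCB-style confidence bounds flipped for minimization; the names `bandit_mc_min`, `budget`, and `c` are invented for the example.

```python
import numpy as np

def bandit_mc_min(samplers, budget, c=2.0):
    """Adaptively estimate min_i E[X_i], where each X_i is expensive to
    sample. Uses lower confidence bounds (UCB flipped for minimization)
    to focus the sampling budget on the most promising arms."""
    k = len(samplers)
    counts = np.ones(k)
    means = np.array([f() for f in samplers])    # one draw per arm to start
    for t in range(k, budget):
        # Lower confidence bound: empirical mean minus exploration bonus.
        lcb = means - np.sqrt(c * np.log(t + 1) / counts)
        i = int(np.argmin(lcb))
        x = samplers[i]()                        # one more expensive sample
        counts[i] += 1
        means[i] += (x - means[i]) / counts[i]   # running-mean update
    best = int(np.argmin(means))
    return best, means[best]

# Example: three noisy quantities with true means 0.3, 0.1, 0.5.
rng = np.random.default_rng(0)
samplers = [lambda m=m: rng.normal(m, 1.0) for m in (0.3, 0.1, 0.5)]
print(bandit_mc_min(samplers, budget=2000))      # expect arm 1
```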

On No-Sensing Adversarial Multi-Player Multi-Armed Bandits With Collision Communications

We study the notoriously difficult no-sensing adversarial multi-player multi-armed bandits (MP-MAB) problem from a new perspective. Instead of focusing on the hardness of multiple players, we introduce a new dimension of hardness, called attackability. All adversaries can be categorized based on their attackability, and we introduce Adversary-Adaptive Collision-Communication (A2C2), a family of algorithms with forced-collision communication among players.
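
To see why forced collisions can serve as a communication channel even without collision sensing, here is a hypothetical toy protocol, far simpler than A2C2: a sender transmits a bit by either colliding with the receiver's arm for a block of rounds or staying away, and the receiver decodes from its own rewards alone.

```python
import numpy as np

rng = np.random.default_rng(1)

def transmit(bits, repeats=20, p_reward=0.7):
    """Toy forced-collision channel (illustrative only, not A2C2).
    The receiver sits on an arm with Bernoulli(p_reward) payoff. For each
    bit, the sender either collides with that arm every round (bit 1,
    zeroing the receiver's reward) or stays away (bit 0). With no
    collision sensing the receiver sees only its rewards; it decodes 1
    if a whole block pays nothing, which under bit 0 happens with
    probability (1 - p_reward) ** repeats, vanishing in the block length."""
    decoded = []
    for b in bits:
        rewards = []
        for _ in range(repeats):
            collided = (b == 1)                  # sender forces a collision
            rewards.append(0 if collided else rng.binomial(1, p_reward))
        decoded.append(1 if sum(rewards) == 0 else 0)
    return decoded

print(transmit([1, 0, 1, 1, 0]))                 # [1, 0, 1, 1, 0] w.h.p.
```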

Quickest Detection of Moving Anomalies in Sensor Networks

The problem of sequentially detecting a moving anomaly is studied, in which the anomaly affects different parts of a sensor network over time. Each network sensor is characterized by a pre- and post-change distribution. Initially, the observations of each sensor are generated according to the corresponding pre-change distribution. After some unknown but deterministic time instant, a moving anomaly emerges, affecting different sets of sensors as time progresses.
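
A generic way to handle the unknown, moving location of the anomaly is to feed a CUSUM recursion the log-likelihood ratio of a uniform mixture over which sensor is currently affected. The sketch below is that generic mixture test under Gaussian assumptions, not necessarily the paper's exact procedure; mu0, mu1, sigma, and the threshold are illustrative parameters.

```python
import numpy as np

def moving_anomaly_cusum(obs, mu0=0.0, mu1=1.0, sigma=1.0, threshold=5.0):
    """Mixture-CUSUM sketch for a moving anomaly. obs has shape (T, S);
    pre-change every sensor is N(mu0, sigma^2), and after the change one
    unknown (possibly moving) sensor shifts to mean mu1 at each time."""
    w = 0.0
    for t in range(obs.shape[0]):
        # Per-sensor Gaussian log-likelihood ratio, post- vs pre-change.
        llr = (mu1 - mu0) * (obs[t] - (mu0 + mu1) / 2.0) / sigma**2
        # Mixture LLR: the anomaly sits at each sensor with probability 1/S.
        w = max(0.0, w + np.log(np.mean(np.exp(llr))))
        if w >= threshold:
            return t                             # alarm time
    return None

rng = np.random.default_rng(2)
T, S, nu = 300, 5, 100                           # true change time nu
obs = rng.normal(0.0, 1.0, size=(T, S))
path = rng.integers(0, S, size=T - nu)           # anomaly wanders across sensors
obs[np.arange(nu, T), path] += 1.0
print(moving_anomaly_cusum(obs))                 # alarm some time after t = 100
```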

Universal Active Learning via Conditional Mutual Information Minimization

Modern machine learning systems require massive amounts of labeled training data to achieve high accuracy, and obtaining these labels is expensive in both time and cost. Active learning is an approach that uses feedback to label only the most informative data points, significantly reducing the labeling effort. Many heuristics for selecting data points have been developed in recent years, but they are usually tailored to a specific task, and a general unified framework is lacking.
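
One classical information-theoretic selection rule, a natural point of comparison for conditional-mutual-information criteria, is to query the point whose label carries the most mutual information about the unknown hypothesis. A toy Bayesian sketch over a finite hypothesis class (all names and numbers invented for illustration):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def select_query(posterior, likelihoods):
    """Return the index of the unlabeled point maximizing the mutual
    information between its (binary) label and the hypothesis, i.e.
    minimizing the expected posterior entropy.
    posterior: shape (H,), belief over H hypotheses.
    likelihoods: shape (X, H), P(label = 1 | hypothesis, point)."""
    gains = []
    for lik in likelihoods:                      # loop over candidate points
        p1 = float(posterior @ lik)              # predictive P(label = 1)
        post1 = posterior * lik / max(p1, 1e-12)
        post0 = posterior * (1 - lik) / max(1 - p1, 1e-12)
        gains.append(entropy(posterior)
                     - (p1 * entropy(post1) + (1 - p1) * entropy(post0)))
    return int(np.argmax(gains))

posterior = np.ones(3) / 3                       # three hypotheses
likelihoods = np.array([[0.9, 0.1, 0.5],
                        [0.5, 0.5, 0.5],         # uninformative point
                        [0.9, 0.9, 0.1],
                        [0.2, 0.8, 0.7]])
print(select_query(posterior, likelihoods))      # picks index 2
```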

Intelligence and Unambitiousness Using Algorithmic Information Theory

Algorithmic Information Theory has inspired intractable constructions of general intelligence (AGI), and tractable approximations, though as yet undiscovered, are likely feasible. Reinforcement Learning (RL), the dominant paradigm by which an agent might learn to solve arbitrary solvable problems, gives an agent a dangerous incentive: to gain arbitrary "power" in order to intervene in the provision of its own reward.

Evasive Active Hypothesis Testing

We consider a situation in which a decision maker takes sequential and adaptive sensing actions to collect measurements and estimate an unknown parameter taking finitely many values, in the presence of an adversary who also collects measurements whenever a sensing action is taken. This situation can be viewed as an abstraction in which to analyze the mitigation of information leakage inherent to control actions in systems with feedback, such as cyber-physical systems.
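
One hypothetical way to make this tradeoff concrete (a toy objective, not the paper's policy) is to score each sensing action by the decision maker's expected information gain minus a multiple of the adversary's, since every action also hands the adversary a measurement:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def info_gain(prior, lik):
    """Expected entropy reduction about the parameter from one binary
    observation with per-hypothesis success probabilities lik."""
    p1 = float(prior @ lik)
    post1 = prior * lik / max(p1, 1e-12)
    post0 = prior * (1 - lik) / max(1 - p1, 1e-12)
    return entropy(prior) - (p1 * entropy(post1) + (1 - p1) * entropy(post0))

def evasive_action(prior, lik_dm, lik_adv, lam=0.5):
    """Score each action by own information gain minus lam times the
    adversary's gain, and pick the best. lik_dm and lik_adv have shape
    (A, H): observation models for the decision maker's and adversary's
    measurements under each of the A sensing actions."""
    scores = [info_gain(prior, lik_dm[a]) - lam * info_gain(prior, lik_adv[a])
              for a in range(len(lik_dm))]
    return int(np.argmax(scores))

prior = np.ones(3) / 3                           # parameter takes 3 values
lik_dm = np.array([[0.9, 0.1, 0.5],
                   [0.6, 0.4, 0.5]])
lik_adv = np.array([[0.8, 0.2, 0.5],
                    [0.5, 0.5, 0.5]])
print(evasive_action(prior, lik_dm, lik_adv))    # action 0: own gain outweighs leakage
```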

Belief Propagation Decoding of Short Graph-Based Channel Codes via Reinforcement Learning

In this work, we consider the decoding of short sparse graph-based channel codes via reinforcement learning (RL). Specifically, we focus on low-density parity-check (LDPC) codes, which, for example, have been standardized in the context of 5G cellular communication systems due to their excellent error-correcting performance. LDPC codes are typically decoded by belief propagation on the code's bipartite (Tanner) graph using a flooding schedule, i.e., all check and variable nodes in the Tanner graph are updated at once.
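
For concreteness, here is a compact flooding decoder in the min-sum approximation of belief propagation, a standard textbook variant; a tiny (7,4) Hamming parity-check matrix stands in for a real LDPC code:

```python
import numpy as np

def minsum_flood_decode(H, llr_ch, iters=20):
    """Flooding min-sum decoding on the Tanner graph of parity-check
    matrix H. llr_ch holds channel log-likelihood ratios, positive
    meaning bit 0 is more likely. All check nodes are updated at once
    in every iteration (the flooding schedule)."""
    m, n = H.shape
    msgs = np.zeros((m, n))                      # check-to-variable messages
    rows = [np.flatnonzero(H[i]) for i in range(m)]
    for _ in range(iters):
        total = llr_ch + msgs.sum(axis=0)
        for i, vs in enumerate(rows):
            v2c = total[vs] - msgs[i, vs]        # extrinsic variable-to-check
            sgn = np.prod(np.sign(v2c)) * np.sign(v2c)   # sign of the others
            mag = np.abs(v2c)
            order = np.argsort(mag)              # min over the other edges:
            mins = np.where(np.arange(len(vs)) == order[0],
                            mag[order[1]], mag[order[0]])
            msgs[i, vs] = sgn * mins
        hard = (llr_ch + msgs.sum(axis=0) < 0).astype(int)
        if not ((H @ hard) % 2).any():           # all parity checks satisfied
            return hard
    return hard

H = np.array([[1, 1, 0, 1, 1, 0, 0],             # (7,4) Hamming code
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
llr = np.array([2.0, 2.0, -0.5, 2.0, 2.0, 2.0, 2.0])  # bit 2 weakly flipped
print(minsum_flood_decode(H, llr))               # recovers the all-zero word
```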

Empirical Policy Evaluation With Supergraphs

We devise algorithms for the policy evaluation problem in reinforcement learning, assuming access to a simulator and certain side information called the supergraph. Our algorithms explore backward from high-cost states to find high-value ones, in contrast to approaches that work forward from all states. While several papers have demonstrated the utility of backward exploration empirically, we conduct rigorous analyses which show that our algorithms can reduce average-case sample complexity from $O(S \log S)$ to as low as $O(\log S)$.
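
As a hypothetical sketch of the backward idea (not the paper's algorithms), one can first run a breadth-first search over reversed supergraph edges from every high-cost state, then spend simulator rollouts only on the states found; `simulate` below is assumed to step the simulator once under the fixed policy, and the thresholds are illustrative.

```python
import numpy as np
from collections import deque

def backward_candidates(supergraph, costs, thresh):
    """BFS over reversed supergraph edges from every high-cost state.
    supergraph[s] lists the possible successors of state s (a superset
    of the true transitions), so the returned set contains every state
    that could possibly reach a high-cost state."""
    S = len(supergraph)
    preds = [[] for _ in range(S)]
    for s, succs in enumerate(supergraph):
        for t in succs:
            preds[t].append(s)
    frontier = deque(s for s in range(S) if costs[s] >= thresh)
    seen = set(frontier)
    while frontier:
        t = frontier.popleft()
        for s in preds[t]:
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return seen

def evaluate_backward(simulate, costs, supergraph, thresh=0.5,
                      gamma=0.9, rollouts=30, horizon=40):
    """Monte Carlo policy evaluation focused on backward-reachable states.
    States that cannot reach any high-cost state are left at 0; their
    true value is at most thresh / (1 - gamma), since they only ever
    incur costs below thresh."""
    values = np.zeros(len(costs))
    for s in backward_candidates(supergraph, costs, thresh):
        for _ in range(rollouts):
            x, disc, total = s, 1.0, 0.0
            for _ in range(horizon):
                total += disc * costs[x]
                disc *= gamma
                x = simulate(x)                  # one step under the policy
            values[s] += total / rollouts
    return values
```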

One for All and All for One: Distributed Learning of Fair Allocations With Multi-Player Bandits

Consider N cooperative but non-communicating players where each plays one out of M arms for T turns. Players have different utilities for each arm, represented as an N×M matrix. These utilities are unknown to the players. In each turn, players select an arm and receive a noisy observation of their utility for it. However, if any other player selected the same arm in that turn, all colliding players receive zero utility due to the conflict. No communication between the players is possible.
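
The model itself is straightforward to simulate, which makes the collision constraint concrete; the snippet below plays a single turn (names invented for the example):

```python
import numpy as np

rng = np.random.default_rng(3)

def play_round(choices, U, noise=0.1):
    """One turn: each of N players picks an arm; a player earns a noisy
    draw of its own utility U[n, arm] unless another player picked the
    same arm, in which case every colliding player gets zero."""
    counts = np.bincount(choices, minlength=U.shape[1])
    rewards = np.zeros(len(choices))
    for n, a in enumerate(choices):
        if counts[a] == 1:                       # no collision on this arm
            rewards[n] = U[n, a] + noise * rng.standard_normal()
    return rewards

U = np.array([[0.9, 0.3, 0.5],                   # N = 2 players, M = 3 arms
              [0.4, 0.8, 0.6]])
print(play_round(np.array([0, 1]), U))           # orthogonal picks: both earn
print(play_round(np.array([2, 2]), U))           # collision: both get zero
```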

Sequential (Quickest) Change Detection: Classical Results and New Directions

Online detection of changes in stochastic systems, referred to as sequential change detection or quickest change detection, is an important research topic in statistics, signal processing, and information theory, and has a wide range of applications. This survey starts with the basics of sequential change detection, and then moves on to generalizations and extensions of sequential change detection theory and methods.
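
The canonical starting point of this theory is the CUSUM procedure: accumulate the log-likelihood ratio of post- versus pre-change observations, clamp at zero, and alarm at a threshold crossing. A minimal version of the classical recursion:

```python
import numpy as np

def cusum(xs, llr, threshold):
    """Classical CUSUM test: w_t = max(0, w_{t-1} + llr(x_t)); alarm at
    the first t with w_t >= threshold."""
    w = 0.0
    for t, x in enumerate(xs):
        w = max(0.0, w + llr(x))
        if w >= threshold:
            return t
    return None

# Gaussian mean shift from 0 to 1 (unit variance): llr(x) = x - 0.5.
rng = np.random.default_rng(4)
xs = np.concatenate([rng.normal(0, 1, 100), rng.normal(1, 1, 100)])
print(cusum(xs, llr=lambda x: x - 0.5, threshold=5.0))  # alarm soon after t = 100
```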