Visual Exploration

2020/02/28

Motivation

This paper introduces the concept of Embodied Visual Exploration: how might a robot equipped with a camera scope out a new environment? Three questions need to be answered about visual exploration:

  • What does it mean for an agent to explore its environment well?
  • Which methods work well, and under which assumptions and environmental settings?
  • Where do current approaches fall short, and where might future work seek to improve?

To set the stage, problems in computer vision can be divided into three levels. At the first level, models passively learn from data that humans have collected and annotated. At the second level, embodied active perception, agents learn task-specific controls. At the third level, embodied visual exploration, the goal is inherently more open-ended and task-agnostic: how does an agent learn to move around in an environment to gather information that will be useful for a variety of tasks that it may have to perform in the future?

Embodied visual exploration algorithms are difficult to compare head-to-head, because different works emphasize different goals and therefore choose different evaluation metrics: overcoming sparse rewards, pixelwise reconstruction of environments, area covered in the environment, object interactions, and information gathering for downstream tasks such as navigation, recognition, and pose estimation. This work therefore presents a unified view of exploration algorithms for visually rich 3D environments, and a common evaluation framework to understand their strengths and weaknesses.

Problem Setting

Embodied Visual Exploration is defined as follows: the agent runs an observe-action-update loop in the environment for some number of steps, choosing actions so as to maximize information gain. This is essentially a Partially Observable Markov Decision Process (POMDP).

POMDP

A discrete-time POMDP models the relationship between an agent and its environment. Formally, a POMDP is a 7-tuple $(S,A,T,R,\Omega ,O,\gamma )$:

  • $S$ is a set of states,
  • $A$ is a set of actions,
  • $T$ is a set of conditional transition probabilities between states,
  • $R:S\times A\to \mathbb {R} $ is the reward function,
  • $\Omega$ is a set of observations,
  • $O$ is a set of conditional observation probabilities, and
  • $\gamma \in [0,1]$ is the discount factor.

The only difference between a POMDP and an MDP is that in a POMDP the agent additionally receives observations. When the world is in state $s\in S$ and the agent takes action $a\in A$, the world transitions to a new state $s^\prime$, and the agent receives an observation drawn from the distribution $O(o\mid s^\prime ,a)$.
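
As a concrete illustration, the observe-action-update loop can be sketched as below. This is a minimal sketch with hypothetical `env` and `agent` interfaces (not from the paper); the key point is that the agent never sees the true state, only observations.

```python
def run_episode(env, agent, num_steps=100):
    """Observe-action-update loop underlying the POMDP formulation.

    `env` and `agent` are hypothetical interfaces: the environment hides
    the true state s and only emits observations o ~ O(o | s', a).
    """
    obs = env.reset()                 # initial observation
    agent.reset_belief()              # agent's internal belief over states
    total_reward = 0.0
    for t in range(num_steps):
        action = agent.act(obs)               # pick a in A from current belief
        obs, reward, done = env.step(action)  # world moves to s', emits o
        agent.update_belief(obs, action)      # incorporate the new evidence
        total_reward += reward
        if done:
            break
    return total_reward
```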

Exploration Paradigms

Curiosity

In the curiosity paradigm, the agent is encouraged to visit states where its predictive model of the environment is uncertain. The dynamics-based formulation of curiosity learns a forward-dynamics model $\mathcal F$ that predicts a representation of the next state, $\hat{\boldsymbol{s}}_{t+1}=\mathcal{F}\left(\boldsymbol{s}_{t}, a_{t}\right)$, and rewards the agent for reaching states that differ most from this prediction, i.e. the reward function $R$ is the prediction error:

$$R\left(s_{t}, a_{t}\right)=\left\|\mathcal{F}\left(s_{t}, a_{t}\right)-s_{t+1}\right\|_{2}^{2}$$

The forward-dynamics model itself is trained online, simply by minimizing the same prediction error $\left\|\mathcal{F}\left(s_{t}, a_{t}\right)-s_{t+1}\right\|_{2}^{2}$ between consecutive states.
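
A minimal numpy sketch of this reward, assuming states are already encoded as feature vectors and `forward_model` is a hypothetical learned predictor:

```python
import numpy as np

def curiosity_reward(forward_model, s_t, a_t, s_next):
    """Reward = squared prediction error of the forward-dynamics model.

    forward_model(s_t, a_t) -> predicted next-state features (hypothetical
    interface); s_t, s_next are feature vectors, a_t an action encoding.
    """
    s_pred = forward_model(s_t, a_t)
    return float(np.sum((s_pred - s_next) ** 2))  # ||F(s_t, a_t) - s_{t+1}||_2^2
```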

Novelty

The novelty reward directly counts how many times each state has been reached and makes the reward inversely related to that count. This requires discretizing the ground plane of the 3D environment into a grid of cells; the intuition is that the agent should not visit the same place many times.
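
A sketch of such a count-based novelty reward follows. The grid cell size and the inverse-square-root decay are assumptions here, the latter being a common choice in count-based exploration:

```python
from collections import defaultdict
import math

class NoveltyReward:
    """Count-based novelty: reward decays with visits to a grid cell."""

    def __init__(self, cell_size=0.5):  # grid resolution in meters (assumed)
        self.cell_size = cell_size
        self.visit_counts = defaultdict(int)

    def __call__(self, x, y):
        # Discretize the agent's (x, y) position on the ground plane.
        cell = (int(x // self.cell_size), int(y // self.cell_size))
        self.visit_counts[cell] += 1
        # Inverse-sqrt decay: frequently visited cells yield little reward.
        return 1.0 / math.sqrt(self.visit_counts[cell])
```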

Coverage

Coverage argues that novelty's criterion is too crude: in a 3D environment, different locations carry different amounts of information, depending on the structure around them. Whereas novelty encourages explicitly visiting all locations, coverage encourages observing all of the environment. In other words, visiting more places is not equivalent to observing more information.

The coverage reward consists of the increment in some observed quantity of interest:

$$R\left(s_{t}, a_{t}\right)=I_{t}-I_{t-1}$$

where $I_t$ is the amount of interesting things observed up to time step $t$. The “things” to be covered can be area, objects, landmarks, or random views. A random view is a viewpoint designated at random in the environment; if the agent observes that viewpoint, it receives the corresponding reward. This method is similar to the “goal agnostic” baseline.
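
A sketch of the coverage reward, using area as the quantity of interest (tracked as a set of seen grid cells, a hypothetical choice; objects or landmarks would work the same way):

```python
class CoverageReward:
    """Coverage reward: increment I_t - I_{t-1} of the observed quantity.

    Here the quantity of interest is area, represented as the set of grid
    cells that have appeared in the agent's field of view so far.
    """

    def __init__(self):
        self.seen_cells = set()

    def __call__(self, visible_cells):
        """visible_cells: grid cells visible from the current viewpoint."""
        prev = len(self.seen_cells)          # I_{t-1}
        self.seen_cells.update(visible_cells)
        return len(self.seen_cells) - prev   # R = I_t - I_{t-1}
```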

Reconstruction

Reconstruction-based methods use the objective of active observation completion to learn exploration policies. The reconstruction reward scores the quality of the predicted outputs:

$$R\left(s_{t}, a_{t}\right)=-\sum_{\mathcal{P}} d\left(\hat{V}_{t}(\mathcal{P}), V(\mathcal{P})\right)$$

where $V(\mathcal P)$ is the true query view of the camera at pose $\mathcal P$, $\hat V_t(\mathcal P)$ is the agent's reconstruction of that view at time step $t$, and $d$ is a distance function defined over views. Whereas curiosity rewards views that are individually surprising, reconstruction rewards views that bolster the agent's correct hallucination of all other views.
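
A sketch of this reward, where query views are assumed to be image arrays indexed by pose and $d$ is taken to be mean squared error, one common choice of view distance:

```python
import numpy as np

def reconstruction_reward(predicted_views, true_views):
    """Negative total distance between reconstructed and true query views.

    predicted_views / true_views: dicts mapping a query pose P to an image
    array (hypothetical encoding of the query set).
    """
    total = 0.0
    for pose, true_view in true_views.items():
        pred = predicted_views[pose]                      # \hat{V}_t(P)
        total += float(np.mean((pred - true_view) ** 2))  # d(\hat{V}, V)
    return -total  # higher reward = better reconstruction of all views
```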

Evaluation Framework

The first kind of evaluation metric is straightforward: count how many “interesting things” the model visits during exploration, where, as described above, interesting things can be area, objects, and landmarks.

The second kind of evaluation metric tests how well the exploration policy transfers to downstream tasks. The downstream tasks most widely used for this recently include the following three:

  • PointNav: how to quickly navigate from point A to point B?
  • View localization: where was this photo taken?
  • Reconstruction: what can I expect to see at point B?