N1H111SM's Miniverse

2019/06/12

# Introduction

The paper makes the following contributions:

• provide the Football Engine, a highly-optimized game engine that simulates the game of football,
• propose the Football Benchmarks, a versatile set of benchmark tasks of varying difficulties that can be used to compare different algorithms,
• propose the Football Academy, a set of progressively harder and diverse reinforcement learning scenarios,
• evaluate state-of-the-art algorithms on both the Football Benchmarks and the Football Academy, providing an extensive set of reference results for future comparison, and
• provide a simple API to completely customize and define new football reinforcement learning scenarios.

# Football Engine

## Built-in AI Opponents

Moreover, by default, our non-active players are also controlled by another rule-based bot. In this case, the behavior is simple and corresponds to reasonable football actions and strategies, such as running towards the ball when we are not in possession, or moving forward together with our active player. In particular, this type of behavior can be turned off for future research on cooperative multi-agent settings if desired.
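As a rough illustration of what such a rule-based teammate policy looks like, here is a toy sketch in pure Python. This is not the engine's actual bot; all names and the movement rule are my own simplification of "chase the ball when not in possession, otherwise push forward with the active player":

```python
# Toy sketch of a rule-based non-active-player policy (hypothetical,
# not the engine's real bot): move towards the ball when the team does
# not have possession, otherwise move along with the active player.

def teammate_move(player_pos, ball_pos, active_pos, we_have_ball):
    """Return a unit (dx, dy) step for a non-active player."""
    target = active_pos if we_have_ball else ball_pos
    dx = target[0] - player_pos[0]
    dy = target[1] - player_pos[1]
    norm = max((dx * dx + dy * dy) ** 0.5, 1e-8)
    return (dx / norm, dy / norm)
```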

## State & Observation

• Pixels: a 1280$\times$720 RGB image.
• Super Mini Map: the SMM consists of four 96$\times$72 matrices encoding the home team, the away team, the ball, and the active player. Each matrix is binary, essentially a bitmap indicating whether the corresponding entity occupies that position.
• Floats: a more compact representation, a 115-dimensional vector encoding all match information, including player coordinates, ball possession and direction, the active player, and the game mode.
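To make the SMM encoding concrete, here is a minimal sketch of how one such 96$\times$72 binary layer could be built from entity coordinates. The helper and the field-coordinate convention (x in [-1, 1], y in [-0.42, 0.42]) are my assumptions for illustration; the real extraction lives inside the gfootball package:

```python
# Sketch: one SMM layer is a 96x72 binary bitmap marking entity positions.
# Coordinate ranges assumed: x in [-1, 1], y in [-0.42, 0.42] (hypothetical).

WIDTH, HEIGHT = 96, 72

def smm_layer(positions):
    """Build a binary 96x72 bitmap (list of rows) from (x, y) coordinates."""
    grid = [[0] * WIDTH for _ in range(HEIGHT)]
    for x, y in positions:
        col = min(int((x + 1.0) / 2.0 * WIDTH), WIDTH - 1)
        row = min(int((y + 0.42) / 0.84 * HEIGHT), HEIGHT - 1)
        grid[row][col] = 1  # mark the cell occupied by this entity
    return grid
```

One such layer per entity type (home team, away team, ball, active player) stacked together gives the four-channel SMM observation.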

# Football Benchmarks

Similar to the Atari games in the Arcade Learning Environment, in these tasks, the agent has to interact with a fixed environment and maximize its episodic reward by sequentially choosing suitable actions based on observations of the environment.

## Algorithms

The goal of the Football Benchmarks is to win a full match against the opponent bot provided by the Engine. These benchmarks come in three difficulty levels: easy, medium, and hard. The paper adopts three commonly used algorithms to cover different research settings: PPO for single-machine, multi-process training; IMPALA in a cluster setting with 500 actors; and Ape-X DQN. I may dig into these algorithms when I have time in the next few days.

### Ape-X DQN

Ape-X DQN is a highly scalable variant of DQN. Like IMPALA, it decouples learning from acting, but it uses a distributed replay buffer and a Q-learning variant that combines the dueling network architecture with double Q-learning. Many hyperparameters are set to the same values as IMPALA for a fairer comparison.
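As a reminder of the double Q-learning component mentioned above, here is a minimal sketch of how its target is computed: the online network selects the greedy action, while the target network evaluates it, which reduces overestimation bias. Q-functions are plain dicts here for illustration; in Ape-X they would be neural networks:

```python
# Double Q-learning target (illustrative sketch, Q-functions as dicts):
#   y = r + gamma * Q_target(s', argmax_a Q_online(s', a))

def double_q_target(reward, gamma, q_online_next, q_target_next, done):
    """Compute the double Q-learning target for one transition."""
    if done:
        return reward
    # Online network picks the action, target network evaluates it.
    best_action = max(q_online_next, key=q_online_next.get)
    return reward + gamma * q_target_next[best_action]
```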

## Reward

The first time our player steps into a region with the ball, the reward from that region and all previously unvisited regions further back is collected. In total, the extra reward can be up to +1, the same as scoring a goal. To avoid penalizing an agent that scores without passing through all the checkpoints first, any uncollected checkpoint reward is added to the scoring reward. Checkpoint rewards are only given once per episode.
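The checkpoint logic described above can be sketched like this. It is a toy implementation under my own assumed names, assuming 10 checkpoint regions each worth +0.1 (so the extra reward caps at +1):

```python
# Sketch of the checkpoint reward: 10 regions towards the goal, each worth
# +0.1 the first time the player reaches it with the ball; scoring pays +1
# plus every checkpoint reward not yet collected. Names are hypothetical.

N_CHECKPOINTS, PER_CHECKPOINT = 10, 0.1

class CheckpointReward:
    def __init__(self):
        self.collected = 0  # checkpoints already collected this episode

    def on_reach(self, region_index):
        """Reward for first entering region `region_index` with the ball,
        also collecting any earlier regions that were skipped."""
        newly = max(0, (region_index + 1) - self.collected)
        self.collected += newly
        return newly * PER_CHECKPOINT

    def on_goal(self):
        """Scoring pays +1 plus all remaining checkpoint rewards."""
        remaining = N_CHECKPOINTS - self.collected
        self.collected = N_CHECKPOINTS
        return 1.0 + remaining * PER_CHECKPOINT
```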

# Football Academy

• Empty Goal Close. The player starts inside the box and must score into an empty goal.
• Empty Goal. The player starts at midfield and must score into an empty goal.
• Run to Score. The player starts at midfield with the ball, chased by five opponents, and must score into an empty goal.
• Run to Score with Keeper. Same as Run to Score, but with a goalkeeper added.
• Pass and Shoot with Keeper. The ball carrier is out wide and marked; a second, unmarked player in the center faces the keeper and must score.
• Run, Pass and Shoot with Keeper. Same as Pass and Shoot, but with the defender's position swapped (the player without the ball is the one marked).
• 3 versus 1 with Keeper. A three-on-one attack in the final third, with a goalkeeper.
• Corner. A standard corner-kick setup, except that the corner taker is allowed to dribble.
• Easy Counter-Attack. A four-on-one counter-attack; all other, uninvolved players run towards the ball.
• Hard Counter-Attack. A four-on-two counter-attack.
• 11 versus 11 with Lazy Opponents. A full match, but the opponents stand still and only intercept the ball when it comes close to them.