N1H111SM's Miniverse

2020/06/14 Share

Materials

# Definition

High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show the inherent structure of the data, equivalent high-dimensional plots are much less intuitive. To aid visualization of the structure of a dataset, the dimension must be reduced in some way.

Manifold Learning can be thought of as an attempt to generalize linear frameworks like PCA to be sensitive to non-linear structure in data. Though supervised variants exist, the typical manifold learning problem is unsupervised: it learns the high-dimensional structure of the data from the data itself, without the use of predetermined classifications.

Definition (Geodesic Distance). A geodesic line is the shortest path between two points on a curved surface, like Earth (referring to the following figure).

# Methods

## Isomap

One of the earliest approaches to manifold learning is the Isomap algorithm, short for Isometric Mapping. Isomap seeks a lower-dimensional embedding which maintains geodesic distances between every two points.

### Estimating Geodestic Distance

We could estimate the geodestic distance by constructing an adjacency graph on which the shortest distance between two nodes is the estimation of their geodestic distance. We can set the adjacency matrix following the rule:

Isomap使用 MDS 计算映射后的坐标$y$，使得映射坐标下的欧氏距离与原来的测地线距离尽量相等.

## Locally Linear Embedding

Locally linear embedding (LLE) seeks a lower-dimensional projection of the data which preserves distances within local neighborhoods. It can be thought of as a series of local Principal Component Analyses which are globally compared to find the best non-linear embedding.

“流形在局部可以近似等价于欧氏空间”是 LLE 分析方法的出发点。LLE认为一个中心数据点 $x_i$ 能够被处于其小邻域内的点$\{x_j\}_{j \sim i}$线性重构，重构的权重可能是我们想要在表征空间上保留的geometry attributes。