1. Motivation
We assume that there is a true data distribution $p_{\mathrm{data}}(x)$, which is only accessible through samples $x_1, \dots, x_N$ that are drawn from $p_{\mathrm{data}}(x)$. The goal of generative models is to find an approximation $p_\theta(x)$ of $p_{\mathrm{data}}(x)$:

$$
p_\theta(x) \approx p_{\mathrm{data}}(x)
$$
A generative model is composed of its architecture and its parameters $\theta$. The architecture reflects our assumptions about what $p_{\mathrm{data}}(x)$ looks like, and the parameters $\theta$ determine the remaining details (a toy sketch of this split is given after the list below). There are many applications of generative models, including:
- generation of new samples
- anomaly detection, outlier detection
- denoising, missing value completion
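
As a toy illustration of the architecture/parameter split above (my own sketch, not from the seminar material): suppose the architecture is fixed to a univariate Gaussian family, so $\theta = (\mu, \sigma)$ determines everything else.

```python
import numpy as np

class GaussianModel:
    """Toy p_theta(x): the architecture is a univariate Gaussian family,
    and theta = (mu, sigma) are its parameters."""

    def __init__(self, mu=0.0, sigma=1.0):
        self.mu = mu        # parameter theta, part 1
        self.sigma = sigma  # parameter theta, part 2

    def log_prob(self, x):
        # log p_theta(x) for the Gaussian density
        return (-0.5 * np.log(2 * np.pi * self.sigma ** 2)
                - (x - self.mu) ** 2 / (2 * self.sigma ** 2))

    def sample(self, n, seed=0):
        # generate new samples from p_theta
        rng = np.random.default_rng(seed)
        return rng.normal(self.mu, self.sigma, size=n)

model = GaussianModel(mu=2.0, sigma=0.5)
new_samples = model.sample(5)          # generation of new samples
scores = model.log_prob(new_samples)   # low log_prob can flag outliers
```

Here the choice of a Gaussian is purely illustrative; in deep generative models the architecture is a neural network, but the division of labor between architecture and $\theta$ is the same.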
2. Learning
As written in Section 1, the goal is to learn $p_\theta(x)$ so that it approximates $p_{\mathrm{data}}(x)$. There are two issues in achieving this goal.
- Issue 1: $p_{\mathrm{data}}(x)$ is unknown.
- Issue 2: It is unclear how to measure the “distance” between $p_{\mathrm{data}}(x)$ and $p_\theta(x)$.
The first issue can be solved by approximating $p_{\mathrm{data}}(x)$ with the empirical distribution

$$
\hat{p}_{\mathrm{data}}(x) = \frac{1}{N} \sum_{n=1}^{N} \delta(x - x_n).
$$

The second issue can be solved by introducing the KL divergence $D_{\mathrm{KL}}(\hat{p}_{\mathrm{data}} \,\|\, p_\theta)$. As a result, the learning objective is to find the following $\theta^*$:

$$
\begin{aligned}
\theta^* &= \arg\min_\theta D_{\mathrm{KL}}\!\left(\hat{p}_{\mathrm{data}}(x) \,\|\, p_\theta(x)\right) \\
&= \arg\min_\theta \mathbb{E}_{\hat{p}_{\mathrm{data}}(x)}\!\left[\log \hat{p}_{\mathrm{data}}(x) - \log p_\theta(x)\right] \\
&= \arg\max_\theta \mathbb{E}_{\hat{p}_{\mathrm{data}}(x)}\!\left[\log p_\theta(x)\right] \\
&= \arg\max_\theta \frac{1}{N} \sum_{n=1}^{N} \log p_\theta(x_n)
\end{aligned}
$$
From the above equations, you can see that minimizing the KL divergence between $\hat{p}_{\mathrm{data}}(x)$ and $p_\theta(x)$ is equivalent to maximum likelihood estimation. Thus, when performing ordinary maximum likelihood estimation, we should keep in mind that we are implicitly using the KL divergence as the distance metric; because of the asymmetry of the KL divergence, this may have undesirable effects on the learned result. In addition, we use $\hat{p}_{\mathrm{data}}(x)$ instead of $p_{\mathrm{data}}(x)$, so maximum likelihood estimation does not necessarily lead to generalization. For example, if the number of training samples $N$ is small, the model can over-fit.
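
To make the equivalence concrete, here is a minimal sketch (again assuming the illustrative univariate Gaussian family; not part of the original notes). Maximizing the average log-likelihood over the $N$ samples is the same as minimizing $D_{\mathrm{KL}}(\hat{p}_{\mathrm{data}} \,\|\, p_\theta)$, since the entropy of $\hat{p}_{\mathrm{data}}$ does not depend on $\theta$:

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for the unknown p_data: we only ever see these N samples
data = rng.normal(loc=2.0, scale=0.5, size=1000)

def avg_neg_log_likelihood(mu, sigma, x):
    # (1/N) * sum_n -log p_theta(x_n) for a Gaussian p_theta;
    # minimizing this over theta is maximum likelihood estimation, and it
    # equals D_KL(p_hat_data || p_theta) up to a theta-independent constant
    return np.mean(0.5 * np.log(2 * np.pi * sigma ** 2)
                   + (x - mu) ** 2 / (2 * sigma ** 2))

# the Gaussian MLE has a closed form: sample mean and (biased) sample std
mu_hat, sigma_hat = data.mean(), data.std()

print(avg_neg_log_likelihood(mu_hat, sigma_hat, data))  # minimal value on this data
print(avg_neg_log_likelihood(0.0, 1.0, data))           # any other theta scores higher
```

With only a handful of samples, $\hat{\mu}$ and $\hat{\sigma}$ can be far from the true parameters even though they are optimal on the training data, which is exactly the over-fitting concern mentioned above.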
3. Reference
- Summer seminar “Deep Generative Models” provided by Matsuo Lab