Review of “NeRF: Neural Radiance Fields”
Key takeaways
- They propose a model that synthesizes images of a single scene from arbitrary viewpoints, trained end-to-end with a rendering loss.
- They introduce positional encoding and hierarchical volume sampling to improve the model's rendering quality.
Model Structure

A single fully connected neural network (an MLP) predicts color (RGB) and volume density (σ) from a 5D input: a 3D point location and the 2D viewing direction of the camera ray. The model first predicts σ and a feature vector from the 3D coordinates alone; it then concatenates this feature vector with the viewing direction to produce the view-dependent RGB color, as sketched below.
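Here is a minimal PyTorch sketch of this two-stage architecture. The layer counts, hidden sizes, and the name `NeRFMLP` are illustrative rather than the paper's exact configuration (the paper uses a deeper trunk with a skip connection); the input dimensions assume the positional encodings described later (10 frequencies for positions, 4 for directions).

```python
import torch
import torch.nn as nn

class NeRFMLP(nn.Module):
    """Sketch of the NeRF network; depth and widths are illustrative."""

    def __init__(self, pos_dim=63, dir_dim=27, hidden=256):
        super().__init__()
        # Trunk: processes the positionally encoded 3D location.
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Volume density depends on position only.
        self.sigma_head = nn.Linear(hidden, 1)
        # Feature vector that gets concatenated with the viewing direction.
        self.feature = nn.Linear(hidden, hidden)
        # View-dependent color head.
        self.rgb_head = nn.Sequential(
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, pos_enc, dir_enc):
        h = self.trunk(pos_enc)
        sigma = torch.relu(self.sigma_head(h))  # density is non-negative
        rgb = self.rgb_head(torch.cat([self.feature(h), dir_enc], dim=-1))
        return rgb, sigma
```
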
Rendering loss
After obtaining the (RGB, σ) values at samples along a ray, they composite them into a single pixel color using the quadrature rule

$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - \exp(-\sigma_i \delta_i)\right) \mathbf{c}_i, \qquad T_i = \exp\!\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big),$$

where δ_i is the distance between adjacent samples.
The equation can be divided into three parts:
- The first factor, T_i, is the accumulated transmittance: the probability that the ray travels from the camera to sample i without being blocked by earlier samples.
- The factor (1 - exp(-σ_i δ_i)) is the opacity (alpha) of the current sample, which increases with its volume density.
- The factor c_i is the predicted color of the current point.
In simpler terms, the equation sums the colors along the ray, each weighted by the sample's density and its visibility from the camera. To measure the difference between the rendered and ground-truth images, the researchers use an L2 loss on pixel colors.
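A short PyTorch sketch of this compositing step, assuming per-ray tensors of sampled colors, densities, and inter-sample distances (the function name `render_ray` is illustrative):

```python
import torch

def render_ray(rgb, sigma, deltas):
    """Composite per-sample (rgb, sigma) along one ray into a pixel color.

    rgb:    (N, 3) predicted colors at the N samples
    sigma:  (N,)   predicted volume densities
    deltas: (N,)   distances between adjacent samples
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)        # opacity of each sample
    # T_i: transmittance, the probability the ray reaches sample i
    # unblocked; the product of (1 - alpha) over earlier samples.
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=0)
    trans = torch.cat([torch.ones(1), trans[:-1]])  # shift so T_1 = 1
    weights = trans * alpha                         # T_i * (1 - exp(-sigma_i * delta_i))
    return (weights[:, None] * rgb).sum(dim=0)      # weighted sum of colors
```
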
Positional encoding
Standard MLPs are biased toward learning low-frequency functions, so feeding raw coordinates directly to the network produces overly smooth renderings. Positional encoding maps each input value into a higher-dimensional space using sine and cosine functions at exponentially increasing frequencies, which lets the network represent high-frequency variation in geometry and appearance.
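A sketch of the encoding, following the paper's formulation of sine/cosine pairs at frequencies 2^k π (the paper uses 10 frequencies for positions and 4 for viewing directions); keeping the raw input alongside the encoded values is a common implementation choice:

```python
import math
import torch

def positional_encoding(x, num_freqs=10):
    """Map each coordinate p to sin(2^k * pi * p), cos(2^k * pi * p)
    for k = 0 .. num_freqs - 1."""
    out = [x]  # keeping the raw input is an implementation choice
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        out.append(torch.sin(freq * x))
        out.append(torch.cos(freq * x))
    return torch.cat(out, dim=-1)  # output dim: d + 2 * num_freqs * d
```

For a 3D position with 10 frequencies this yields 3 + 2 × 10 × 3 = 63 dimensions, matching the input size assumed in the architecture sketch above.
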
Hierarchical volume sampling
Hierarchical volume sampling decides where to evaluate points along each camera ray. It employs two networks, called "coarse" and "fine." First, N_c locations are sampled with stratified sampling, as in prior work, and evaluated by the coarse network. Then N_f additional locations are drawn by inverse transform sampling from the distribution defined by the coarse network's compositing weights, and the fine network is evaluated at the union of all N_c + N_f samples to produce the final rendering. The objective is to concentrate samples where they carry the most information: the fine network focuses its capacity on the regions the coarse network found important.
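The inverse transform sampling step can be sketched as follows, assuming the coarse pass produced compositing weights for each interval along the ray (the name `sample_fine` is illustrative):

```python
import torch

def sample_fine(bins, weights, n_fine):
    """Draw n_fine depths along a ray by inverse transform sampling
    from the piecewise-constant distribution given by the coarse
    network's compositing weights.

    bins:    (N+1,) edges of the coarse sample intervals
    weights: (N,)   coarse compositing weights (T_i * alpha_i)
    """
    pdf = weights / (weights.sum() + 1e-10)          # normalize to a PDF
    cdf = torch.cumsum(pdf, dim=0)
    cdf = torch.cat([torch.zeros(1), cdf])           # (N+1,), starts at 0
    u = torch.rand(n_fine)                           # uniform draws in [0, 1)
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, len(bins) - 1)
    lo, hi = cdf[idx - 1], cdf[idx]
    t = (u - lo) / (hi - lo + 1e-10)                 # position inside the bin
    return bins[idx - 1] + t * (bins[idx] - bins[idx - 1])
```

Because high-weight intervals occupy a larger share of the CDF, uniform draws land in them more often, which is exactly how the fine pass concentrates samples near visible surfaces.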