Head Rotation in Denoising Diffusion Models

Denoising Diffusion Models (DDM) are emerging as the cutting-edge technology in the realm of deep generative modeling, challenging the dominance of Generative Adversarial Networks.

GitHub Link

The GitHub link is https://github.com/asperti/head-rotation

Introduction

This repository, "head-rotation", accompanies the article "Head Rotation in Denoising Diffusion Models." The collaboratively authored article addresses the challenge of exploring and manipulating the latent space of Denoising Diffusion Models (DDM) for face rotation. Using an embedding technique for Denoising Diffusion Implicit Models (DDIM), the authors achieve significant manipulations of the face rotation angle, up to ±30°. The method computes trajectories in the latent space through linear regression; moving a latent code along such a trajectory rotates the corresponding face. As a byproduct, the CelebA dataset is labeled by illumination direction, improving the selection of images used to fit the trajectories. The study highlights the intricate relationship between illumination, pose, and rotation.

Content

This is a companion repository to the article "Head Rotation in Denoising Diffusion Models", joint work with Gabriele Colasuonno and Antonio Guerra. In this research, our focus is specifically on face rotation, recognized as one of the most complex editing operations. By utilizing a recent embedding technique for Denoising Diffusion Implicit Models (DDIM), we have achieved remarkable manipulations covering a wide rotation angle of up to $\pm 30^\circ$, while preserving the distinct characteristics of each individual.

Our methodology involves computing trajectories that approximate clusters of latent representations of dataset samples with various yaw rotations through linear regression. These trajectories are obtained by analyzing subsets of data that share significant attributes with the source image. One of these critical attributes is the light provenance: as a byproduct of our research, we have labeled the CelebA dataset, categorizing images into three major groups based on the illumination direction: left, center, and right.

For a fixed direction (left or right), the approach is schematically described in the following picture. We prefer to compute centroids instead of directly fitting over all clusters, for computational reasons.

In the picture below, we summarise the outcome of our labeling and the complex interplay between illumination and orientation by showing the mean faces corresponding to different light sources and poses.
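The centroid-and-regression idea can be sketched in a few lines of numpy. This is a minimal illustration, not the repository's actual code: the latent codes, dimensionality, and yaw labels below are synthetic stand-ins (real latents would come from DDIM embedding, and yaw labels from a pose estimator), but the three steps mirror the description above: bin samples by yaw, compute one centroid per bin, and fit a linear trajectory through the centroids.

```python
import numpy as np

# Synthetic stand-in data: 500 flattened 64-dim "latents", each with a
# known yaw angle in degrees. In the real pipeline these would be DDIM
# embeddings of CelebA images sharing attributes (e.g. light direction).
rng = np.random.default_rng(0)
yaws = rng.uniform(-30, 30, size=500)
direction_true = rng.normal(size=64)          # hidden ground-truth axis
latents = np.outer(yaws, direction_true) + rng.normal(scale=0.1, size=(500, 64))

# 1. Bin samples by yaw and compute one centroid per bin
#    (fitting on centroids rather than on all points, as described above).
bins = np.linspace(-30, 30, 7)
centroids, bin_yaws = [], []
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (yaws >= lo) & (yaws < hi)
    if mask.any():
        centroids.append(latents[mask].mean(axis=0))
        bin_yaws.append(yaws[mask].mean())
centroids = np.stack(centroids)
bin_yaws = np.asarray(bin_yaws)

# 2. Linear regression through the centroids:
#    latent ≈ intercept + yaw * direction
A = np.stack([np.ones_like(bin_yaws), bin_yaws], axis=1)
coef, *_ = np.linalg.lstsq(A, centroids, rcond=None)
direction = coef[1]   # per-degree displacement along the trajectory

# 3. Edit: move a source latent +20 degrees along the fitted trajectory;
#    decoding the shifted latent would yield the rotated face.
source = latents[0]
rotated = source + 20.0 * direction
```

Fitting on a handful of centroids instead of all samples keeps the regression cheap and robust to per-image noise, which is the computational motivation mentioned above.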
