CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation
Yifeng Xu, Zhenliang He, Shiguang Shan, Xilin Chen
Paper | Project
Recently, large-scale diffusion models have made impressive progress in text-to-image (T2I) generation. To further equip these T2I models with fine-grained spatial control, approaches like ControlNet introduce an extra network that learns to follow a condition image. However, for every single condition type, ControlNet requires independent training on millions of data pairs with hundreds of GPU hours, which is quite expensive and makes it challenging for ordinary users to explore and develop new types of conditions. To address this problem, we propose the CtrLoRA framework, which trains a Base ControlNet to learn the common knowledge of image-to-image generation from multiple base conditions, along with condition-specific LoRAs to capture distinct characteristics of each condition. Utilizing our pretrained Base ControlNet, users can easily adapt it to new conditions, requiring as few as 1,000 data pairs and less than one hour of single-GPU training to obtain satisfactory results in most scenarios. Moreover, our CtrLoRA reduces the learnable parameters by 90% compared to ControlNet, significantly lowering the threshold to distribute and deploy the model weights. Extensive experiments on various types of conditions demonstrate the efficiency and effectiveness of our method.
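Below is a minimal PyTorch sketch of the Base-ControlNet-plus-LoRA split described in the abstract; it is not the official CtrLoRA implementation, and the layer sizes, rank, and names are illustrative assumptions. The point is simply that the shared base stays frozen while only a small per-condition low-rank adapter is trained.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen shared projection plus a trainable low-rank update (W + B A)."""
    def __init__(self, dim_in, dim_out, rank=8):
        super().__init__()
        self.base = nn.Linear(dim_in, dim_out)
        self.base.weight.requires_grad_(False)      # shared Base ControlNet weights stay frozen
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Linear(dim_in, rank, bias=False)   # down-projection (trainable)
        self.lora_b = nn.Linear(rank, dim_out, bias=False)  # up-projection (trainable)
        nn.init.zeros_(self.lora_b.weight)          # start as a no-op relative to the base

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x))

# One such adapter would be kept per condition type (e.g., "lineart", "depth"), all sharing the base.
layer = LoRALinear(dim_in=320, dim_out=320, rank=8)
features = torch.randn(4, 320)                      # dummy condition features
print(layer(features).shape)                        # torch.Size([4, 320])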
@article{xu2024ctrlora,
  title={CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation},
  author={Xu, Yifeng and He, Zhenliang and Shan, Shiguang and Chen, Xilin},
  journal={arXiv preprint arXiv:2410.09400},
  year={2024}
}
EigenGAN: Layer-Wise Eigen-Learning for GANs
Zhenliang He, Meina Kan, Shiguang Shan
Paper | Proof | Video | TensorFlow (Official) | PyTorch
Recent studies on Generative Adversarial Networks (GANs) reveal that different layers of a generative CNN hold different semantics of the synthesized images. However, few GAN models have explicit dimensions to control the semantic attributes represented in a specific layer. This paper proposes EigenGAN, which is able to mine interpretable and controllable dimensions from different generator layers in an unsupervised manner. Specifically, EigenGAN embeds one linear subspace with an orthogonal basis into each generator layer. Via generative adversarial training to learn a target distribution, these layer-wise subspaces automatically discover a set of eigen-dimensions at each layer corresponding to a set of semantic attributes or interpretable variations. By traversing the coefficient of a specific eigen-dimension, the generator can produce samples with continuous changes corresponding to a specific semantic attribute. Taking the human face as an example, EigenGAN can discover controllable dimensions for high-level concepts such as pose and gender in the subspaces of deep layers, as well as low-level concepts such as hue and color in the subspaces of shallow layers. Moreover, in the linear case, we theoretically prove that our algorithm derives the principal components as PCA does.
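The following is a minimal sketch (illustrative, not the official EigenGAN code) of one layer-wise subspace: an orthogonally regularized basis U, per-dimension scales L, and an offset mu, where a coefficient vector z selects an offset that would be added to that layer's feature map. All dimensions are assumptions.

import torch
import torch.nn as nn

class LayerSubspace(nn.Module):
    def __init__(self, num_dims=6, channels=64):
        super().__init__()
        self.U = nn.Parameter(torch.randn(channels, num_dims) * 0.02)  # orthogonal basis (regularized below)
        self.L = nn.Parameter(torch.ones(num_dims))                    # importance of each eigen-dimension
        self.mu = nn.Parameter(torch.zeros(channels))                  # subspace origin

    def forward(self, z):
        # z: (batch, num_dims) coefficients; returns a per-channel offset for this layer.
        return z @ (self.U * self.L).t() + self.mu

    def orthogonal_penalty(self):
        # Encourages the columns of U to be orthonormal.
        eye = torch.eye(self.U.shape[1])
        return ((self.U.t() @ self.U - eye) ** 2).sum()

subspace = LayerSubspace()
z = torch.zeros(1, 6)
z[0, 2] = 2.0                         # traverse a single eigen-dimension
offset = subspace(z)                  # (1, 64) offset to add to that layer's features
print(offset.shape, subspace.orthogonal_penalty().item())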
@inproceedings{he2021eigengan,
  title={EigenGAN: Layer-Wise Eigen-Learning for GANs},
  author={He, Zhenliang and Kan, Meina and Shan, Shiguang},
  booktitle={International Conference on Computer Vision},
  year={2021}
}
AttGAN: Facial Attribute Editing by Only Changing What You Want
Zhenliang He, Wangmeng Zuo, Meina Kan, Shiguang Shan*, Xilin Chen
Paper | TensorFlow (Official) | PyTorch | PaddlePaddle
Facial attribute editing aims to manipulate single or multiple attributes on a given face image, i.e., to generate a new face image with desired attributes while preserving other details. Recently, generative adversarial networks (GANs) and the encoder-decoder architecture are usually incorporated to handle this task with promising results. Based on the encoder-decoder architecture, facial attribute editing is achieved by decoding the latent representation of a given face conditioned on the desired attributes. Some existing methods attempt to establish an attribute-independent latent representation for further attribute editing. However, such an attribute-independent constraint on the latent representation is excessive because it restricts the capacity of the latent representation and may result in information loss, leading to over-smooth or distorted generation. Instead of imposing constraints on the latent representation, in this work we propose to apply an attribute classification constraint to the generated image to just guarantee the correct change of desired attributes, i.e., to "change what you want". Meanwhile, reconstruction learning is introduced to preserve attribute-excluding details, in other words, to "only change what you want". Besides, adversarial learning is employed for visually realistic editing. These three components cooperate with each other to form an effective framework for high-quality facial attribute editing, referred to as AttGAN. Furthermore, the proposed method is extended for attribute style manipulation in an unsupervised manner. Experiments on two wild datasets, CelebA and LFW, show that the proposed method outperforms the state-of-the-art methods on realistic attribute editing with other facial details well preserved.
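A minimal PyTorch sketch of the three training signals mentioned in the abstract, using toy placeholder networks rather than the actual AttGAN encoder, decoder, and classifier; only the loss structure is illustrated, and all shapes are assumptions. The adversarial term is omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Conv2d(3, 8, 4, stride=2, padding=1)                         # toy encoder
dec = nn.ConvTranspose2d(8 + 5, 3, 4, stride=2, padding=1)            # toy decoder, 5 attributes
cls = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 5))  # toy attribute classifier

x = torch.randn(2, 3, 64, 64)                     # input faces
a_src = torch.randint(0, 2, (2, 5)).float()       # original attributes
a_tgt = torch.randint(0, 2, (2, 5)).float()       # desired attributes

def decode(latent, attrs):
    attrs_map = attrs[:, :, None, None].expand(-1, -1, latent.shape[2], latent.shape[3])
    return dec(torch.cat([latent, attrs_map], dim=1))

z = enc(x)
x_edit = decode(z, a_tgt)                         # edit toward the desired attributes
x_rec = decode(z, a_src)                          # reconstruct with the original attributes

loss_cls = F.binary_cross_entropy_with_logits(cls(x_edit), a_tgt)  # "change what you want"
loss_rec = F.l1_loss(x_rec, x)                                      # "only change what you want"
print(loss_cls.item(), loss_rec.item())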
@article{he2019attgan,
  title={AttGAN: Facial Attribute Editing by Only Changing What You Want},
  author={He, Zhenliang and Zuo, Wangmeng and Kan, Meina and Shan, Shiguang and Chen, Xilin},
  journal={IEEE Transactions on Image Processing},
  volume={28},
  number={11},
  pages={5464--5478},
  year={2019}
}
S2GAN: Share Aging Factors Across Ages and Share Aging Trends Among Individuals
Zhenliang He, Meina Kan, Shiguang Shan, Xilin Chen
Paper | Video
Generally, we humans follow roughly common aging trends, e.g., wrinkles only tend to become more numerous, longer, or deeper. However, the aging process of each individual is more dominated by his/her personalized factors, including invariant factors such as identity and moles, as well as personalized aging patterns, e.g., one may age by graying hair while another may age by a receding hairline. Following this biological principle, in this work, we propose an effective and efficient method to simulate natural aging. Specifically, a personalized aging basis is established for each individual to depict his/her own aging factors. Different ages then share this basis and are derived through age-specific transforms. The age-specific transforms represent the aging trends that are shared among all individuals. The proposed method can achieve continuous face aging with favorable aging accuracy, identity preservation, and fidelity. Furthermore, benefiting from this effective design, a single model is capable of handling all ages and the prediction time is significantly reduced.
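A minimal sketch of the sharing scheme described above, not the paper's exact architecture: a personalized basis predicted per face, combined by age-specific weights that are shared by all individuals, so a single model covers every age group. Dimensions, module names, and the number of age groups are assumptions.

import torch
import torch.nn as nn

class AgingBasisModel(nn.Module):
    def __init__(self, feat_dim=128, basis_size=10, num_age_groups=9):
        super().__init__()
        self.basis_net = nn.Linear(feat_dim, basis_size * feat_dim)        # personalized basis per face
        self.age_weights = nn.Parameter(torch.randn(num_age_groups, basis_size) * 0.1)  # shared age-specific transforms

    def forward(self, face_feat, age_group):
        b = face_feat.shape[0]
        basis = self.basis_net(face_feat).view(b, -1, face_feat.shape[1])  # (b, basis_size, feat_dim)
        w = self.age_weights[age_group]                                    # (b, basis_size)
        return torch.einsum('bk,bkd->bd', w, basis)                        # age-specific combination

model = AgingBasisModel()
feat = torch.randn(4, 128)                    # identity features from some encoder
ages = torch.tensor([0, 3, 5, 8])             # target age group for each face
aged_feat = model(feat, ages)                 # one model handles all ages
print(aged_feat.shape)                        # torch.Size([4, 128])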
@inproceedings{he2019s2gan,
  title={S2GAN: Share Aging Factors Across Ages and Share Aging Trends Among Individuals},
  author={He, Zhenliang and Kan, Meina and Shan, Shiguang and Chen, Xilin},
  booktitle={International Conference on Computer Vision},
  year={2019}
}
PA-GAN: Progressive Attention Generative Adversarial Network for Facial Attribute Editing
Zhenliang He, Meina Kan, Jichao Zhang, Shiguang Shan
Paper | TensorFlow (Official)
Facial attribute editing aims to manipulate attributes on the human face, e.g., adding a mustache or changing the hair color. Existing approaches suffer from a serious compromise between correct attribute generation and preservation of other information such as identity and background, because they edit the attributes in imprecise areas. To resolve this dilemma, we propose a progressive attention GAN (PA-GAN) for facial attribute editing. In our approach, the editing is progressively conducted from high to low feature levels while being constrained inside a proper attribute area by an attention mask at each level. This manner prevents undesired modifications to irrelevant regions from the beginning, and the network can then focus more on correctly generating the attributes within a proper boundary at each level. As a result, our approach achieves correct attribute editing with irrelevant details much better preserved compared with state-of-the-art methods.
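A minimal sketch of the progressive attention idea, not the official PA-GAN code: at each feature level an attention mask gates the edit, and the mask inherited from the deeper level is upsampled and refined before being applied. The mask predictor and feature shapes are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

channels = 16
mask_head = nn.Conv2d(2 * channels, 1, 3, padding=1)   # predicts a mask refinement at each level

def edit_level(feat, edit, prev_mask_logits):
    """Blend the edited feature into the original only inside the attention mask,
    refining the mask inherited from the previous (deeper) level."""
    prev = F.interpolate(prev_mask_logits, size=feat.shape[2:])
    mask_logits = mask_head(torch.cat([feat, edit], dim=1)) + prev
    mask = torch.sigmoid(mask_logits)
    return mask * edit + (1 - mask) * feat, mask_logits

# Features at a deep (8x8) and a shallow (16x16) level, edited progressively.
feat_deep, edit_deep = torch.randn(1, channels, 8, 8), torch.randn(1, channels, 8, 8)
feat_shallow, edit_shallow = torch.randn(1, channels, 16, 16), torch.randn(1, channels, 16, 16)

prev = torch.zeros(1, 1, 8, 8)                          # no constraint before the deepest level
out_deep, logits_deep = edit_level(feat_deep, edit_deep, prev)
out_shallow, logits_shallow = edit_level(feat_shallow, edit_shallow, logits_deep)
print(out_deep.shape, out_shallow.shape)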
@article{he2020pagan,
  title={PA-GAN: Progressive Attention Generative Adversarial Network for Facial Attribute Editing},
  author={He, Zhenliang and Kan, Meina and Zhang, Jichao and Shan, Shiguang},
  journal={arXiv preprint arXiv:2007.05892},
  year={2020}
}
Image Style Disentangling for Instance-Level Facial Attribute Transfer
Xuyang Guo, Meina Kan*, Zhenliang He, Xingguang Song, Shiguang Shan
Paper | PyTorch (Official)
Instance-level facial attribute transfer aims at transferring an attribute, including its style, from a source face to a target one. Existing studies have limitations in fidelity or correctness. To address this problem, we propose a weakly supervised style disentangling method embedded in a Generative Adversarial Network (GAN) for accurate instance-level attribute transfer, using only binary attribute annotations. In our method, the whole attribute transfer process is designed as two steps for easier transfer: it first removes the original attribute or transfers it to a neutral state, and then adds the attribute style disentangled from a source face. Moreover, a style disentangling module is proposed to extract the attribute style of an image, which is used in the adding step. Our method aims for accurate attribute style transfer; however, it is also capable of semantic attribute editing as a special case, which is not achievable with existing instance-level attribute transfer methods. Comprehensive experiments on the CelebA dataset show that our method can transfer the style more precisely than existing methods, with an improvement of 39% in the user study, 16.5% in accuracy, and about 3.3 in FID.
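A minimal sketch of the two-step transfer described above, with toy placeholder networks rather than the paper's architecture: step one neutralizes the attribute on the target face, and step two adds back the attribute style extracted from the source face by a style-disentangling module. All names and sizes are assumptions.

import torch
import torch.nn as nn

remove_net = nn.Conv2d(3 + 1, 3, 3, padding=1)       # step 1: remove / neutralize the attribute
style_enc = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1))  # style-disentangling module
add_net = nn.Conv2d(3 + 8, 3, 3, padding=1)          # step 2: add the disentangled style

target = torch.randn(1, 3, 64, 64)                   # face to be edited
source = torch.randn(1, 3, 64, 64)                   # face providing the attribute style
attr = torch.ones(1, 1, 64, 64)                      # binary label of the attribute being transferred

neutral = remove_net(torch.cat([target, attr], dim=1))   # attribute removed from the target
style = style_enc(source).expand(-1, -1, 64, 64)         # per-instance attribute style code
result = add_net(torch.cat([neutral, style], dim=1))     # style added back onto the target
print(result.shape)                                      # torch.Size([1, 3, 64, 64])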
@article{guo2021image,
  title={Image Style Disentangling for Instance-level Facial Attribute Transfer},
  author={Guo, Xuyang and Kan, Meina and He, Zhenliang and Song, Xingguang and Shan, Shiguang},
  journal={Computer Vision and Image Understanding},
  volume={207},
  pages={103205},
  year={2021}
}