tg-me.com/RIMLLab/211
Last Update:
💠 Compositional Learning Journal Club
Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.
🌟 This Week's Presentation:
📌 Title:
A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization
🎙️ Presenter: Amir Kasaei
🧠 Abstract:
This work presents an in-depth analysis of the causal structure in the text encoder of text-to-image (T2I) diffusion models, highlighting its role in introducing information bias and loss. While prior research has mainly addressed these issues during the denoising stage, this study focuses on the underexplored contribution of text embeddings—particularly in multi-object generation scenarios. The authors investigate how text embeddings influence the final image output and why models often favor the first-mentioned object, leading to imbalanced representations. To mitigate this, they propose a training-free text embedding balance optimization method that improves information balance in Stable Diffusion by 125.42%. Additionally, a new automatic evaluation metric is introduced, offering a more accurate assessment of information loss with an 81% concordance rate with human evaluations. This metric better captures object presence and accuracy compared to existing measures like CLIP-based text-image similarity scores.
📄 Paper:
A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization
Session Details:
- 📅 Date: Tuesday
- 🕒 Time: 5:00 - 6:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation! ✌️
BY RIML Lab

Share with your friend now:
tg-me.com/RIMLLab/211