RIML Lab

💠 Compositional Learning Journal Club

Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.

🌟 This Week's Presentation:

📌 Title:
A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization

🎙️ Presenter: Amir Kasaei

🧠 Abstract:
This work presents an in-depth analysis of the causal structure in the text encoder of text-to-image (T2I) diffusion models, highlighting its role in introducing information bias and loss. While prior research has mainly addressed these issues during the denoising stage, this study focuses on the underexplored contribution of text embeddings—particularly in multi-object generation scenarios. The authors investigate how text embeddings influence the final image output and why models often favor the first-mentioned object, leading to imbalanced representations. To mitigate this, they propose a training-free text embedding balance optimization method that improves information balance in Stable Diffusion by 125.42%. Additionally, a new automatic evaluation metric is introduced, offering a more accurate assessment of information loss with an 81% concordance rate with human evaluations. This metric better captures object presence and accuracy compared to existing measures like CLIP-based text-image similarity scores.
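As a rough illustration of why a causally masked text encoder can treat the first-mentioned object differently from later ones, here is a toy sketch. It is not the paper's method or code: the single-head attention with identity projections, the function name `causal_self_attention`, and the 2-d "embeddings" are all assumptions made up for this example. It only shows the structural asymmetry: an earlier token's output never depends on later tokens, while a later token's output absorbs information from earlier ones.

```python
import math

def causal_self_attention(tokens):
    """Single-head attention with identity projections and a causal mask.

    tokens: list of equal-length float vectors. Position i may only attend
    to positions 0..i, as in a causally masked text encoder.
    """
    dim = len(tokens[0])
    outputs = []
    for i, query in enumerate(tokens):
        visible = tokens[: i + 1]  # causal mask: no access to later tokens
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in visible]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, visible)) / total
            for d in range(dim)
        ])
    return outputs

# Toy 2-d "embeddings" (made-up numbers) standing in for object tokens.
cat, dog, bird = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]

base = causal_self_attention([cat, dog])
swap_second = causal_self_attention([cat, bird])  # change the second object
swap_first = causal_self_attention([bird, dog])   # change the first object

# The first token's output ignores whatever comes after it ...
first_unaffected = base[0] == swap_second[0]
# ... while the second token's output shifts when the first token changes.
second_affected = base[1] != swap_first[1]
```

In this toy setup both flags come out true, which is the one-directional flow of information that the abstract links to information bias and loss in multi-object prompts.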

📄 Paper:
A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization

Session Details:
- 📅 Date: Tuesday
- 🕒 Time: 5:00 - 6:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban

We look forward to your participation! ✌️

arXiv.org

www.tg-me.com/us/telegram/com.RIMLLab/211

1.2K views · Amir Kasaei, edited Jun 7 at 19:09

Create: 2025-06-07
Last Update: 2025-06-13 22:08:25
