💠 Compositional Learning Journal Club

Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.

This Week's Presentation:

🔹 Title: GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing

🔸 Presenter: Dr Rohban

🌀 Abstract:
GenArtist is a framework that targets two limitations of current image generation models: difficulty handling intricate text prompts, and the lack of verification and self-correction mechanisms needed for reliable output. Coordinated by a multimodal large language model (MLLM) agent, GenArtist integrates a diverse library of tools, enabling task decomposition, step-by-step execution, and systematic self-correction. With tree-structured planning and position-related inputs, GenArtist is reported to achieve state-of-the-art performance, outperforming models such as SDXL and DALL-E 3. This session will cover the system’s architecture and its potential for advancing image generation and editing tasks.


📄 Paper: GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing


Session Details:
- 📅 Date: Wednesday
- 🕒 Time: 3:30 - 4:30 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban

We look forward to your participation! ✌️



tg-me.com/RIMLLab/147
BY RIML Lab



