
TUTORIAL – HALF DAY SESSION
MIND MEETS VISION: HARNESSING LLMS AND VLMS FOR CROSS-MODAL OFFENSIVE MEME ANALYSIS

Abstract : Social media platforms, including Twitter, Facebook, and Instagram, have revolutionized many aspects of 21st-century society by serving as an effective and efficient means of information creation and dissemination. While offering numerous benefits, these platforms also have the capacity to shape public opinion and beliefs globally. A growing concern is the proliferation of offensive content on social media, which fosters hatred and spreads rumors among different communities, groups, or individuals. Through strong profanity, derogatory terms, and dark humor, this material often targets individuals based on factors such as caste, color, religion, sexual orientation, ethnicity, gender identity, immigration status, nationality, or disability, in order to shape opinions and spread offensive messages. Hence, the detection of harmful language has received significant attention in the NLP field in recent years.

Memes, a form of multi-modal media, have surged in popularity on social media platforms, exemplifying the concept of “Amplification by Simplification”: they condense complex real-world issues into powerful messages that engage a vast audience. Although memes were initially created for humour, they are now often used to spread offensiveness through dark humour. Some prior work exists in this field, but there is a noticeable gap in research on offensive content detection in memes in low-resource languages such as Hindi (one of the most widely spoken Indian languages). Detecting such content is critical for maintaining a safe and respectful online environment. Therefore, this tutorial will showcase a multi-modal framework for offensive meme identification, along with the fine-grained classification of offensive memes into implicit and explicit categories using AI and ML models.

Nowadays, generative AI models, including LLMs such as GPT-4, Gemini, and BART, are transforming NLP by analyzing complex language patterns and contextual nuances. Traditional single-modal approaches that focus on either text or images struggle to interpret the nuanced information in memes, where meaning often arises from the interplay of the two. Multi-modal vision and language models (MVLMs) such as LLaVA, Cobra, Mamba, CLIP, and BLIP address this gap by combining visual and linguistic features. Inspired by these models, this tutorial also highlights the fusion of large language models (LLMs), which emulate human-like understanding of text (the "mind"), with vision-language models (VLMs), which interpret visual data (the "vision"), to develop a multi-modal framework for offensive meme classification.
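To make the "mind meets vision" fusion idea concrete, here is a minimal sketch of feature-level fusion. It is not the tutorial's actual framework. A text embedding (assumed to come from a language model encoder) and an image embedding (assumed to come from a vision encoder such as CLIP) are concatenated and passed to a linear classifier over three hypothetical labels: non-offensive, implicit, and explicit. The names `fuse`, `classify`, `LABELS` and the toy weights are illustrative assumptions. In practice the embeddings would come from pretrained models and the weights would be learned from labelled memes.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label set (assumption): offensive memes split into implicit/explicit.
LABELS = ["non-offensive", "implicit", "explicit"]


def fuse(text_emb: Sequence[float], image_emb: Sequence[float]) -> List[float]:
    """Feature-level fusion: concatenate the text ("mind") and image ("vision") vectors."""
    return list(text_emb) + list(image_emb)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    text_emb: Sequence[float],
    image_emb: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Tuple[str, List[float]]:
    """Apply a linear layer (one weight row per label) to the fused vector."""
    x = fuse(text_emb, image_emb)
    if len(weights) != len(LABELS) or len(bias) != len(LABELS):
        raise ValueError("need one weight row and one bias per label")
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row must match the fused vector length")
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best], probs


if __name__ == "__main__":
    # Toy 2-d embeddings and hand-picked weights (illustrative only, not learned).
    text_emb = [1.0, 0.0]
    image_emb = [0.0, 1.0]
    weights = [
        [0.0, 0.0, 0.0, 0.0],  # non-offensive
        [1.0, 0.0, 0.0, 0.0],  # implicit: text cue only
        [1.0, 0.0, 0.0, 1.0],  # explicit: text and image cues together
    ]
    bias = [0.0, 0.0, 0.0]
    label, probs = classify(text_emb, image_emb, weights, bias)
    print(label, [round(p, 3) for p in probs])  # label is "explicit"
```

Concatenation is the simplest fusion choice. Attention-based or gated fusion are common alternatives, but the sketch keeps to concatenation so that the contribution of each modality stays easy to inspect.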

Outline :

  • Introduction to Social Media Analysis and Offensive Meme Classification
  • Fine-grained Classification of Offensive Memes
  • Motivation and Challenges
  • Introduction to Foundational Models
  • Large Language Models and Visual Language Models
  • Existing Datasets and Evaluation Metrics
  • Existing Text-based Approaches
  • Existing Visual-based Approaches
  • Multimodal Offensive Meme Identification
  • Application Areas and Future Scope
  • Hands-on Session on Developing Multimodal Approaches

Duration : Half Day

Topics to be Covered : Basic concepts of offensive meme identification, its need and challenges, generative AI, and LLMs, plus a hands-on session

Target Audience : The tutorial targets bachelor's students, master's students, and researchers. It will also benefit those who want to pursue research in harmful content identification. Attendees with basic machine learning and deep learning knowledge will be able to follow it.

Instructor :

Dr. Naveen Saini is currently with IIIT Allahabad. He received his PhD from IIT Patna. His research interests include Text Analytics, Social Media Analysis, Multimodal Information Processing, Artificial Intelligence, Machine Learning, Multi-objective Optimization, and Evolutionary Algorithms. He has also worked at IIIT Lucknow as an Assistant Professor and at Paul Sabatier University (France) as a postdoctoral fellow. In addition, he was associated with Endicott College of International Studies, Woosong University, South Korea, as an Assistant Professor, where he was involved in healthcare projects. He is collaborating with IBM Research India, Technical University of Create, Spain, and the University of Surrey, UK, on work on Indian languages; one outcome has already been published at ACL, a top-tier NLP conference. He has published in IEEE Transactions on Computational Social Systems, IEEE Transactions on Affective Computing, and ACM Transactions on Multimedia Computing, Communications, and Applications.

Email: nsaini@iiita.ac.in

More information is available at https://sites.google.com/view/nsain
