Publications
Conferences
“From Sights to Insights: Towards Summarization of Multimodal Clinical Documents”, Akash Ghosh, Mohit Tomar, Abhisek Tiwari, Sriparna Saha, Jatin Salve, Setu Sinha, 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)
[Abstract]
The advancement of Artificial Intelligence is pivotal in reshaping healthcare, enhancing diagnostic precision, and facilitating personalized treatment strategies. One major challenge for healthcare professionals is quickly navigating through long clinical documents to provide timely and effective solutions. Doctors often struggle to draw quick conclusions from these extensive documents. To address this issue and save time for healthcare professionals, an effective summarization model is essential. Most current models assume the data is only text-based. However, patients often include images of their medical conditions in clinical documents. To effectively summarize these multimodal documents, we introduce EDI-Summ, an innovative Image-Guided Encoder-Decoder Model. This model uses modality-aware contextual attention on the encoder and an image cross-attention mechanism on the decoder, enhancing the BART base model to create detailed visual-guided summaries. We have tested our model extensively on three multimodal clinical benchmarks involving multimodal question and dialogue summarization tasks. Our analysis demonstrates that EDI-Summ outperforms state-of-the-art large language and vision-aware models in these summarization tasks. Disclaimer: The work includes vivid medical illustrations, depicting the essential aspects of the subject matter.
[Full Paper] [Code]
“Yes, this is what I was looking for! Towards Multi-modal Medical Consultation Concern Summary Generation”, Abhisek Tiwari, Shreyangshu Behra, Sriparna Saha, Pushpak Bhattacharyya & Samrat Ghosh, 46th European Conference on Information Retrieval (ECIR 2024)
[Abstract]
Over the past few years, the use of the Internet for healthcare-related tasks has grown by leaps and bounds, posing a challenge in effectively managing and processing information to ensure its efficient utilization. During moments of emotional turmoil and psychological challenges, we frequently turn to the internet as our initial source of support, choosing this over discussing our feelings with others due to the associated social stigma. In this paper, we propose a new task of multi-modal medical concern summary (MMCS) generation, which provides a short and precise summary of patients' major concerns brought up during the consultation. Nonverbal cues, such as patients' gestures and facial expressions, aid in accurately identifying patients' concerns. Doctors also consider patients' personal information, such as age and gender, in order to describe the medical condition appropriately. Motivated by the potential efficacy of patients' personal context and visual gestures, we propose a transformer-based multi-task, multi-modal intent recognition and medical concern summary generation (IR-MMCSG) system. Furthermore, we propose a multitasking framework for intent recognition and medical concern summary generation for doctor-patient consultations. We construct the first multi-modal medical concern summary generation (MM-MediConSummation) corpus, which includes patient-doctor consultations annotated with medical concern summaries, intents, patient personal information, doctor's recommendations, and keywords. Our experiments and analysis demonstrate (a) the significant role of patients' expressions/gestures and their personal information in intent identification and medical concern summary generation, and (b) the strong correlation between intent recognition and patients' medical concern summary generation.
[Full Paper] [Code]
“Experience and Evidence are the eyes of an excellent summarizer! Towards Knowledge Infused Multi-modal Clinical Conversation Summarization”, Abhisek Tiwari, Anisha Saha, Sriparna Saha, Pushpak Bhattacharyya & Minakshi Dhar, 32nd ACM International Conference on Information and Knowledge Management (CIKM 2023) [Abstract]
With the advancement of telemedicine, both researchers and medical practitioners are working hand-in-hand to develop various techniques to automate various medical operations, such as diagnosis report generation. In this paper, we first present a multi-modal clinical conversation summary generation task that takes a clinician-patient interaction (both textual and visual information) and generates a succinct synopsis of the conversation. We propose a knowledge-infused, multi-modal, multi-tasking medical domain identification and clinical conversation summary generation (MM-CliConSummation) framework. It leverages an adapter to infuse knowledge and visual features and unify the fused feature vector using a gated mechanism. Furthermore, we developed a multi-modal, multi-intent clinical conversation summarization corpus annotated with intent, symptom, and summary. The extensive set of experiments, both quantitatively and qualitatively, led to the following findings: (a) critical significance of visuals, (b) more precise and medical entity preserving summary with additional knowledge infusion, and (c) a correlation between medical department identification and clinical synopsis generation. Furthermore, the dataset and source code are available at https://github.com/NLP-RL/MM-CliConSummation
[Full Paper] [Code]
“Your tone speaks louder than your face! Modality Order Infused Multi-modal Sarcasm Detection”, Mohit Tomar*, Abhisek Tiwari*, Tulika Saha & Sriparna Saha (* denotes joint first authors), 31st ACM International Conference on Multimedia (ACM Multimedia 2023) [Abstract]
Figurative language is an essential component of human communication, and detecting sarcasm in text has become a challenging yet highly popular task in natural language processing. As humans, we rely on a combination of visual and auditory cues, such as facial expressions and tone of voice, to comprehend a message. Our brains are implicitly trained to integrate information from multiple senses to form a complete understanding of the message being conveyed, a process known as multi-sensory integration. The combination of different modalities not only provides additional information but also amplifies the information conveyed by each modality in relation to the others. Thus, the infusion order of different modalities also plays a significant role in multimodal processing. In this paper, we investigate the impact of different modality infusion orders for identifying sarcasm in dialogues. We propose a modality order-driven module integrated into a transformer network, MO-Sarcation that fuses modalities in an ordered manner. Our model outperforms several state-of-the-art models by 1-3% across various metrics, demonstrating the crucial role of modality order in sarcasm detection. The obtained improvements and detailed analysis show that audio tone should be infused with textual content, followed by visual information to identify sarcasm efficiently. The code and dataset are available at https://github.com/mohit2b/MO-Sarcation.
[Full Paper] [Code]
“Local Context is not enough! Towards Query Semantic and Knowledge Guided Multi-Span Medical Question Answering”, Abhisek Tiwari, Aman Bhanshali, Sriparna Saha, Pushpak Bhattacharyya, Preeti Verma & Minakshi Dhar, 26th European Conference on Artificial Intelligence (ECAI 2023) [Abstract]
Medical Question Answering (MedQA) is one of the most popular and significant tasks in developing healthcare assistants. When humans extract an answer to a question from a document, they first (a) understand the question itself in detail and (b) utilize relevant knowledge/experiences to determine the answer segments. In multi-span question answering, it becomes increasingly important to comprehend the query accurately and possess relevant knowledge, as the interrelationship among different answer segments is essential for achieving completeness. Motivated by this, we first propose a transformer-based query semantic and knowledge (QueSemKnow) guided multi-span question-answering model. The proposed QueSemKnow works in a two-phased manner: in the first stage, a multi-task model is proposed to extract query semantics via (i) intent identification and (ii) question type prediction. In the second stage, QueSemKnow selects a relevant subset of the knowledge graph as the underlying context/document and extracts answers depending on the semantic information extracted in the first stage and the context. We build a multi-task query semantic extraction model for query intent and query type identification to investigate the correlation among these tasks. Furthermore, we created a semantically aware medical question-answering corpus named QueSeMSpan MedQA, wherein each question is annotated with its corresponding semantic information. The proposed model outperforms several baselines and existing state-of-the-art models by a large margin on multiple datasets, which firmly demonstrates the effectiveness of the human-inspired multi-span question-answering methodology.
[Full Paper] [Code]
“Dr. Can See: Towards a Multi-modal Disease Diagnosis Virtual Assistant”, Abhisek Tiwari, Simha, Sriparna Saha, Pushpak Bhattacharyya & Minakshi Dhar, 31st ACM International Conference on Information and Knowledge Management (CIKM 2022) [Abstract]
Artificial Intelligence-based clinical decision support is gaining ever-growing popularity and demand in both the research and industry communities. One such manifestation is automatic disease diagnosis, which aims to assist clinicians in conducting symptom investigations and disease diagnoses. When we consult with doctors, we often report and describe our health conditions with visual aids. Moreover, many people are unacquainted with several symptoms and medical terms, such as mouth ulcer and skin growth. Therefore, a visual form of symptom reporting is a necessity. Motivated by the efficacy of the visual form of symptom reporting, we propose and build a novel end-to-end Multi-modal Disease Diagnosis Virtual Assistant (MDD-VA) using a reinforcement learning technique. In conversation, users' responses are heavily influenced by the ongoing dialogue context, and multi-modal responses are no different. We also propose and incorporate a Context-aware Symptom Image Identification module that leverages discourse context in addition to the symptom image for identifying symptoms effectively. Furthermore, we curate the first multi-modal conversational medical dialogue corpus in English that is annotated with intent, symptoms, and visual information. The proposed MDD-VA outperforms multiple uni-modal baselines in both automatic and human evaluation, which firmly establishes the critical role of symptom information provided by visuals. The dataset and code are available at https://github.com/NLP-RL/DrCanSee
[Full Paper]
“Multi-Modal Dialogue Policy Learning for Dynamic and Co-operative Goal Setting”, Abhisek Tiwari, Sriparna Saha, Shubhashis Sengupta, Anutosh Maitra, Roshni Ramnani & Pushpak Bhattacharyya, International Joint Conference on Neural Networks (IJCNN 2021), Shenzhen, China [Abstract]
Developing an adequate and human-like virtual agent has been one of the primary applications of artificial intelligence. In the last few years, task-oriented dialogue systems have gained huge popularity because of their growing relevance and positive outcomes. In the real world, users may not always have a predefined and rigid task goal beforehand; they may upgrade/downgrade/change their goal components dynamically depending upon their utility value and the agent's serving capability. However, existing virtual agents fail to incorporate this dynamic behavior, leading to either unsuccessful task completion or an unsatisfactory user experience. The paper presents an end-to-end multimodal dialogue system for dynamic and co-operative goal setting, which incorporates i) a multi-modal semantic state representation in policy learning to deal with multi-modal inputs, ii) a goal manager module in a traditional dialogue manager for handling dynamic and goal unavailability scenarios effectively, and iii) an accumulative reward (task/persona/sentiment) for task success, personalized persuasion, and user-adaptive behavior, respectively. The obtained experimental results and the comparisons with baselines firmly establish the need for and efficacy of the proposed system.
[Full Paper]
Journals
“Towards Symptom Assessment Guided Symptom Investigation and Disease Diagnosis”, Abhisek Tiwari, Rishav Raj, Sriparna Saha, Pushpak Bhattacharyya, Sarbajeet Tiwari & Minakshi Dhar, IEEE Transactions on Artificial Intelligence (2023), Impact Factor: 7.25 [Abstract]
Automatic Disease Diagnosis (ADD) has gained immense popularity and demand over the past few years, and it is emerging as an effective diagnostic assistant to doctors. Diagnosis assistants assist clinicians in conducting a thorough symptom investigation and identifying possible diseases. Doctors correctly diagnose patients by observing only a few symptoms in most cases, even though the diagnosed disease has numerous symptoms. Also, some common symptoms, such as fever and headache, usually emerge due to other symptoms and do not play a major role in identifying the underlying disease. In this work, we investigate the role of symptom importance in disease diagnosis through several feature engineering techniques and propose a novel symptom assessment incorporated symptom investigation and disease diagnosis (SA-SIDD) assistant using hierarchical reinforcement learning. The proposed SA-SIDD assistant first collects an adequate set of symptom/sign information through conversing with users and then diagnoses a disease based on the extracted symptoms. We incorporated a symptom assessment module with the diagnosis framework that evaluates the relevance of the currently inspected symptom at each turn and reinforces the assistant to investigate distinctive and context-aligned symptoms using an assessment critic. The proposed methodology outperforms the state-of-the-art method, HRL, on two publicly available datasets, which firmly establishes the crucial role of symptom importance in disease diagnosis and the need for the proposed symptom assessment incorporated disease diagnosis framework. Furthermore, we have also conducted a human evaluation, revealing that the diagnosis method greatly enhances end-user satisfaction because of its relevant, context-aligned, and minimal symptom investigation.
[Full Paper] [Code]
“Towards Personalized Persuasive Dialogue Generation for Adversarial Task-oriented Dialogue Setting”, Abhisek Tiwari, Abhijit Khandwe, Sriparna Saha, Roshni Ramnani, Shubhashis Sengupta, Anutosh Maitra & Pushpak Bhattacharyya, Expert Systems with Applications, Impact Factor: 8.66, (2023) [Abstract]
In recent years, task-oriented virtual assistants have gained huge popularity and demand in both research and industry communities. The primary aim of a task-oriented dialogue agent is to assist end-users in accomplishing a task successfully and satisfactorily. Existing virtual agents have acquired proficiency in assisting users in solving simple tasks such as restaurant bookings. However, they operate under the deterministic presumption that end-users will have a servable task objective, which makes them inadequate under adversarial situations such as goal unavailability. On the other hand, human agents accomplish users’ tasks even in many goal unavailability scenarios by persuading them towards a goal similar to the user’s proposed task. Motivated by this limitation, the current work proposes and builds a novel transformer-based context-aware personalized persuasive virtual assistant (CoPersUasive VA), which also serves end-users in task unavailability situations. The proposed CoPersUasive VA recognizes goal conflicts through user sentiment and identifies an appropriate persuasion strategy using the ongoing dialogue context and user personality. Depending on users’ proposed goals, it finds a similar servable goal and persuades them with the identified persuasion strategy. The obtained experimental results and detailed post-analysis firmly establish that the proposed model effectively enhances the capability of task-oriented virtual assistants to deal with the task failures caused by goal unavailability. The obtained findings also suggest the crucial role of dialogue context in identifying an appropriate and appealing persuasion strategy. The proposed CoPersUasive model could easily be adapted to any other domain by fine-tuning the model on an underlying task.
[Full Paper]
“A Persona aware Persuasive Dialogue Policy for Dynamic and Co-operative Goal Setting”, Abhisek Tiwari, Sriparna Saha, Shubhashis Sengupta, Anutosh Maitra, Roshni Ramnani & Pushpak Bhattacharyya, Expert Systems with Applications, IF: 8.66, (2022)
[Full Paper] [Code]
“Symptoms are known by their companies: Towards Association guided Disease Diagnosis Assistant”, Abhisek Tiwari, Tulika Saha, Sriparna Saha, Pushpak Bhattacharyya, Shemim Begum, Minakshi Dhar & Sarbajeet Tiwari, BMC Bioinformatics (2022), IF: 3.00, H-Index: 242
[Abstract]
Humans are known by the company they keep; similarly, symptoms also exhibit the association property, i.e., one symptom may strongly suggest another symptom’s existence/non-existence, and their association provides crucial information about the underlying condition. This work investigates the role of symptom association in the symptom investigation and disease diagnosis process. We propose and build a virtual assistant called Association guided Symptom Investigation and Diagnosis Assistant (A-SIDA) using hierarchical reinforcement learning. The proposed A-SIDA converses with patients and extracts signs and symptoms as per patients’ chief complaints and the ongoing dialogue context. We infused association-based recommendations and a critic into the assistant, which reinforce it to conduct context-aware, symptom-association guided symptom investigation. Following the symptom investigation, the assistant diagnoses a disease based on the extracted signs and symptoms. In addition to diagnosis accuracy, the relevance of inspected symptoms is critical to the usefulness of a diagnosis framework. We also propose a novel evaluation metric called Investigation Relevance Score (IReS), which measures the relevance of symptoms inspected during symptom investigation. The obtained improvements (Diagnosis success rate: 5.36%, Dialogue length: 1.16, Match rate: 2.19%, Disease classifier: 6.36%, IReS: 0.3501, and Human score: 0.66) over state-of-the-art methods firmly establish the crucial role of symptom association, which gets uncovered by the virtual agent. Furthermore, we found that the association guided symptom investigation greatly increases human satisfaction, owing to its seamless topic (symptom) transition.
[Full Paper] [Code]
“A Knowledge Infused Context driven Dialogue Agent for Disease Diagnosis using Hierarchical Reinforcement Learning”, Abhisek Tiwari, Sriparna Saha & Pushpak Bhattacharyya, Knowledge Based Systems, IF: 8.13, (2022)
[Abstract]
Disease diagnosis is an essential and critical step in any disease treatment process. Automatic disease diagnosis has gained immense popularity in recent years, owing to its efficacy, easy accessibility, and reliability. The major challenges for a diagnosis agent are the inevitably large action space (symptoms) and the variety of diseases, which demand either rich domain knowledge or an intelligent learning framework. We propose a novel knowledge-infused context-driven (KI-CD) hierarchical reinforcement learning (HRL) based diagnosis dialogue system, which leverages a Bayesian learning-inspired symptom investigation module called the potential candidate module (PCM) for aiding context-aware, knowledge-grounded symptom investigation. The PCM module serves as a context and knowledge guiding companion for lower-level policies, leveraging the current context and disease-symptom knowledge to identify candidate diseases and potential symptoms, and reinforcing the agent to conduct an intelligent and context-guided symptom investigation with an information-enriched state and an additional critic known as the learner critic. The knowledge-guided symptom investigation extracts an adequate set of symptoms for disease identification, whereas the context-aware symptom investigation aspect substantially improves topic (symptom) transition and enhances user experience. Furthermore, we also propose and incorporate a hierarchical disease classifier (HDC) with the model for alleviating symptom state sparsity issues, which has led to a significant improvement in disease classification accuracy. The proposed framework outperforms the current state-of-the-art method on multiple benchmark datasets in all evaluation metrics other than dialogue length (improving diagnosis success rate, average match rate, symptom identification rate, and disease classification accuracy by 7.1%, 0.23%, 19.67%, and 8.04%, respectively), which firmly establishes the efficacy of the proposed Bayesian learning-inspired context-driven symptom investigation and disease diagnosis methodology.
Patents
“System and Method for a Knowledge-Infused Multi-Modal Symptom Investigation and Disease Diagnosis Assistant”, Abhisek Tiwari, Sriparna Saha, Pushpak Bhattacharyya, Sarbajeet Tiwari, Minakshi Dhar, Indian Patent, 2025 (Granted)
“Dynamic Goal-Oriented Dialogue with Virtual Agents”, Shubhashis Sengupta, Anutosh Maitra, Roshni Ramnani, Sriparna Saha, Abhisek Tiwari, Pushpak Bhattacharyya, US Patent, 2024 (Granted)
“System and Method for Automatic Disease Diagnosis”, Abhisek Tiwari, Sriparna Saha, Pushpak Bhattacharyya, Minakshi Dhar, Indian Patent (filed with IIT Patna, Jan 2023), Status: Filed and Published (Under Examination)