Yanyan Zhao

Harbin Institute of Technology

ORCID: 0000-0002-1166-1459

Publishes on Natural Language Processing Techniques, Topic Modeling, Sentiment Analysis and Opinion Mining. 9 papers and 38 citations.



Top publications by citations

Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors
Yang Wu, Yanyan Zhao, Hao Yang et al.|Findings of the Association for Computational Linguistics: ACL 2022|2022
Cited by 27|Open Access

Multimodal sentiment analysis has attracted increasing attention and many models have been proposed. However, the performance of state-of-the-art models decreases sharply when they are deployed in the real world. We find that the main reason is that real-world applications can only access the text output of automatic speech recognition (ASR) models, which may contain errors because of limited model capacity. Through further analysis of the ASR outputs, we find that in some cases the sentiment words, the key sentiment elements in the textual modality, are recognized as other words, which changes the sentiment of the text and directly hurts the performance of multimodal sentiment analysis models. To address this problem, we propose the sentiment word aware multimodal refinement model (SWRM), which can dynamically refine the erroneous sentiment words by leveraging multimodal sentiment clues. Specifically, we first use the sentiment word position detection module to obtain the most likely position of the sentiment word in the text and then use the multimodal sentiment word refinement module to dynamically refine the sentiment word embeddings. The refined embeddings are taken as the textual inputs of the multimodal feature fusion module to predict the sentiment labels. We conduct extensive experiments on real-world datasets including MOSI-Speechbrain, MOSI-IBM, and MOSI-iFlytek, and the results demonstrate the effectiveness of our model, which surpasses the current state-of-the-art models on all three datasets. Furthermore, our approach can easily be adapted to other multimodal feature fusion models.

Negation scope detection with recurrent neural networks models in review texts
Lydia Lazib, Yanyan Zhao, Bing Qin et al.|International Journal of High Performance Computing and Networking|2019
Cited by 11

Identifying negation scopes in a text is an important subtask of information extraction that can benefit other natural language processing tasks, such as relation extraction, question answering and sentiment analysis, and serves the task of social media text understanding. The task of negation scope detection can be regarded as a token-level sequence labelling problem. In this paper, we propose different models based on recurrent neural networks (RNNs) and word embeddings that can be successfully applied to such tasks without any task-specific feature engineering effort. Our experimental results show that RNNs, without using any hand-crafted features, outperform a feature-rich CRF-based model.
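
To illustrate what "token-level sequence labelling" means for negation scope, here is a minimal sketch of how a scope annotation could be turned into one label per token. The label set (CUE, I, O), the function name scope_to_labels, the example sentence, and its scope boundaries are all assumptions made for illustration; the abstract does not specify the paper's tagging scheme or data.

```python
from typing import List, Tuple


def scope_to_labels(tokens: List[str], cue_idx: int, scope: Tuple[int, int]) -> List[str]:
    """Assign one label per token.

    CUE = the negation cue, I = inside the negation scope, O = outside.
    `scope` is (start, end) with an inclusive start and exclusive end;
    this convention is an assumption for the sketch.
    """
    start, end = scope
    labels = []
    for i in range(len(tokens)):
        if i == cue_idx:
            labels.append("CUE")
        elif start <= i < end:
            labels.append("I")
        else:
            labels.append("O")
    return labels


# Hypothetical review sentence; the scope boundaries are illustrative only.
tokens = ["The", "room", "was", "not", "clean", "but", "the", "staff", "was", "friendly"]
labels = scope_to_labels(tokens, cue_idx=3, scope=(0, 5))

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

A sequence model such as an RNN would then be trained to predict one such label for every input token, which is what makes the task a sequence labelling problem rather than a sentence-level classification.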

Beyond Snapshots: A Multimodal User-Level Dataset for Depression Detection in Dynamic Social Media Streams
Bichen Wang, Yixin Sun, Yanyan Zhao et al.|Unknown|2025
Cited by 0

As an increasing number of users share their lives and mental states on social media, many studies attempt to detect depression risk from videos on individual social media platforms using non-verbal cues such as facial expressions, posture, gaze, and intonation, an approach that has proven effective. However, these studies have focused on single-video analysis to detect depression. They fail to capture the dynamic nature of social media streams and the complex, often gradual manifestation of depression. This limitation overlooks the comprehensive mental state of users, which can only be understood through their extended video histories. To address this, we introduce the Multimodal User-level Depression Detection Dataset (MUD3). MUD3 includes the long-term video histories of depressed users on social media platforms, containing user mental states across multiple videos and treating the video histories as a continuous social media stream. This allows us to model multiple videos at the user level and analyze users' long-term mental states. MUD3 and supplementary materials are available at https://github.com/Syx1030/MUD3.

Separate the Wheat from the Chaff: A Post-Hoc Approach to Safety Re-Alignment for Fine-Tuned Language Models
Di Wu, Xinzheng Lu, Yanyan Zhao et al.|Unknown|2025
Cited by 0|Open Access

Although large language models (LLMs) achieve effective safety alignment at the time of release, they still face various safety challenges. A key issue is that fine-tuning often compromises the safety alignment of LLMs. To address this issue, we propose a method named IRR (Identify, Remove, and Recalibrate for Safety Realignment) that performs safety realignment for LLMs. The core of IRR is to identify and remove unsafe delta parameters from the fine-tuned models, while recalibrating the retained parameters. We evaluate the effectiveness of IRR across various datasets, including both full fine-tuning and LoRA methods. Our results demonstrate that IRR significantly enhances the safety performance of fine-tuned models on safety benchmarks, such as harmful queries and jailbreak attacks, while maintaining their performance on downstream tasks. The source code is available at:
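
As a rough illustration of the term "delta parameters" (the difference between fine-tuned and base weights), the sketch below removes selected deltas and rescales the rest. This is not the IRR implementation: the abstract does not say how unsafe parameters are identified or how retained ones are recalibrated, so the unsafe mask and the scale factor here are placeholders, and the function name realign and all values are hypothetical.

```python
from typing import List


def realign(
    base: List[float],
    finetuned: List[float],
    unsafe: List[bool],
    scale: float = 1.0,
) -> List[float]:
    """Drop deltas flagged as unsafe and rescale the retained ones.

    `unsafe` and `scale` are placeholders: how to identify unsafe deltas
    and how to recalibrate the rest is not described in the abstract.
    """
    merged = []
    for b, f, is_unsafe in zip(base, finetuned, unsafe):
        delta = f - b  # delta parameter: fine-tuned weight minus base weight
        if is_unsafe:
            delta = 0.0  # remove: fall back to the base weight
        else:
            delta *= scale  # stand-in for a recalibration step
        merged.append(b + delta)
    return merged


# Toy weights chosen only to make the arithmetic easy to follow.
base = [1.0, 2.0, 3.0]
finetuned = [1.5, 1.0, 3.25]
unsafe = [False, True, False]

merged = realign(base, finetuned, unsafe)
print(merged)
```

With the default scale of 1.0, the second weight reverts to its base value while the other two keep their fine-tuned values.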

Self-Foveate: Enhancing Diversity and Difficulty of Synthesized Instructions from Unsupervised Text via Multi-Level Foveation
Cited by 0|Open Access

Synthesizing high-quality instruction data from unsupervised text is a promising paradigm for training large language models (LLMs), yet automated methods for this task still exhibit significant limitations in the diversity and difficulty of synthesized instructions. To address these challenges, we propose Self-Foveate, an LLM-driven method for instruction synthesis. Inspired by hierarchical human visual perception, Self-Foveate introduces a "Micro-Scatter-Macro" multi-level foveation methodology that guides the extraction of textual information at three complementary granularities, from fine-grained details through cross-region connections to holistic patterns, thereby enhancing both the diversity and difficulty of synthesized instructions. Furthermore, a re-synthesis module is incorporated to improve the fidelity of instructions to source text and their overall quality. Comprehensive experiments across multiple unsupervised corpora and diverse model architectures demonstrate that Self-Foveate consistently outperforms existing methods. We publicly release our code at https://github.com/Mubuky/Self-Foveate