2026
Paper summary: On the Role of Attention Heads in Large Language Model Safety
Paper summary: Spot Risks Before Speaking! Unraveling Safety Attention Heads in Large Vision-Language Models
Paper summary: FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models
Paper summary: SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
Paper summary: Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models
Paper summary: VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization
Paper summary: HoliSafe: Holistic Safety Benchmarking and Modeling for Vision-Language Model
Paper summary: SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization
Paper summary: Understanding and Rectifying Safety Perception Distortion in VLMs
Paper summary: Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback
Paper summary: Adversary-Aware DPO: Enhancing Safety Alignment in Vision Language Models via Adversarial Training
Paper summary: JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models
Paper summary: SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models
Paper summary: Improving Instruction-Following in Language Models through Activation Steering
Paper summary: Are Dilemmas and Conflicts in LLM Alignment Solvable? A View from Priority Graph
Paper summary: Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization
Paper summary: VLM-Guard: Safeguarding Vision-Language Models via Fulfilling Safety Alignment Gap
Paper summary: Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
Paper summary: MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models
Paper summary: Visual Adversarial Examples Jailbreak Aligned Large Language Models
Paper summary: Instruction-Following Evaluation for Large Language Models
Paper summary: AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
Paper summary: VLSBench: Unveiling Visual Leakage in Multimodal Safety
Paper summary: Automating Steering for Safe Multimodal Large Language Models
Paper summary: DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt
Paper summary: LLMs Encode Harmfulness and Refusal Separately
Paper summary: AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
Paper summary: Evolving Deception: When Agents Evolve, Deception Wins
Paper summary: From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents
Paper summary: Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
Paper summary: EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis
Paper summary: Simulating Environments with Reasoning Models for Agent Training
Paper summary: Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing
Paper summary: Scaling Agent Learning via Experience Synthesis
Paper summary: GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators
Paper summary: ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
Paper summary: Deep Semi-Supervised Learning for Medical Image Segmentation: A Review
Paper summary: ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases
Paper summary: Are Your Agents Upward Deceivers?
2025
Paper summary: SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images
Paper summary: SemiSAM+: Rethinking Semi-Supervised Medical Image Segmentation in the Era of Foundation Models
Paper summary: Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security