2026
Paper Summary: SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models
Paper Summary: Improving Instruction-Following in Language Models through Activation Steering
Paper Summary: Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization
VLSBench: Unveiling Visual Leakage in Multimodal Safety
Paper Summary: Automating Steering for Safe Multimodal Large Language Models
Paper Summary: DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt
Paper Summary: LLMs Encode Harmfulness and Refusal Separately
Paper Summary: AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender