2026
Paper summary: On the Role of Attention Heads in Large Language Model Safety
Paper summary: Spot Risks Before Speaking! Unraveling Safety Attention Heads in Large Vision-Language Models
Paper summary: FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models
Paper summary: SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
Paper summary: Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models
Paper summary: VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization
Paper summary: HoliSafe: Holistic Safety Benchmarking and Modeling for Vision-Language Model
Paper summary: SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization
Paper summary: Understanding and Rectifying Safety Perception Distortion in VLMs
Paper summary: Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback
Paper summary: Adversary-Aware DPO: Enhancing Safety Alignment in Vision Language Models via Adversarial Training
Paper summary: JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models
Paper summary: SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models
Paper summary: Improving Instruction-Following in Language Models through Activation Steering
Paper summary: Are Dilemmas and Conflicts in LLM Alignment Solvable? A View from Priority Graph
Paper summary: Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization
Paper summary: VLM-Guard: Safeguarding Vision-Language Models via Fulfilling Safety Alignment Gap
Paper summary: Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
Paper summary: MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models
Paper summary: Visual Adversarial Examples Jailbreak Aligned Large Language Models
Paper summary: Instruction-Following Evaluation for Large Language Models
Paper summary: AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
Paper summary: VLSBench: Unveiling Visual Leakage in Multimodal Safety
Paper summary: Automating Steering for Safe Multimodal Large Language Models
Paper summary: DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt
Paper summary: LLMs Encode Harmfulness and Refusal Separately
Paper summary: AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
Paper summary: Evolving Deception: When Agents Evolve, Deception Wins
Paper summary: From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents
Paper summary: Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
Paper summary: EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis
Paper summary: Simulating Environments with Reasoning Models for Agent Training
Paper summary: Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing
Paper summary: Scaling Agent Learning via Experience Synthesis
Paper summary: GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators
Paper summary: ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
Paper summary: Deep Semi-Supervised Learning for Medical Image Segmentation: A Review
Paper summary: ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases
Paper summary: Are Your Agents Upward Deceivers?
2025
Paper summary: SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images
Paper summary: SemiSAM+: Rethinking Semi-Supervised Medical Image Segmentation in the Era of Foundation Models
Paper summary: Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security