ArXiv Intelligence

When AI Says It Feels

Topic · Other

DiG-Plan: Mitigating Early Commitment for Tool-Graph Planning via Diffusion Guidance

Topic · Agent

Critic-Guided Heterogeneous Multi-Agent Reasoning for Reliable Mathematical Problem Solving

Topic · Agent

Seeing Time: Benchmarking Chronological Reasoning and Shortcut Biases in Vision-Language Models

Topic · Other

PerceptUI: LLM Agents as Human-Aligned Synthetic Users for UI/UX Evaluation

Topic · Agent

AdaMEM: Test-Time Adaptive Memory for Language Agents

Topic · Agent

Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillation

Topic · Other

Do More Agents Help? Controlled and Protocol-Aligned Evaluation of LLM Agent Workflows

Topic · Agent

Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments

Topic · Reinforcement Learning

Coding with "Enemy": Can Human Developers Detect AI Agent Sabotage?

Topic · Agent

FIDES: Faithful Inference via Deep Evidence Signals for Retrieval-Memory Conflict in RAG

Topic · Memory

Answer Presence Drives RAG Rewriting Gains

Topic · Other

Evaluation of LLMs for Mathematical Formalization in Lean

Topic · Other

Self-Commitment Latency: A Reward-Free Probe for Prompted Implicit Hacking

Topic · Other

Safety Paradox: How Enhanced Safety Awareness Leaves LLMs Vulnerable to Posterior Attack

Topic · Other

Multilingual Fine-Tuning via Localized Gradient Conflict Resolution

Topic · Other

Fix the Mind, Not the Move: Interpretable AI Assistance via Knowledge-Gap Localization

Topic · Other

GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection

Topic · Other

SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations

Topic · Other

Individual Gain, Collective Loss: Metacognitive Adaptation in AI-Assisted Creativity

Topic · Other

2026-06-05 · 280 papers
