ArXiv Intelligence

Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models

Topic · Other

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

Topic · Other

CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval

Topic · Other

Indexing the Unreadable: LLM-Native Recursive Construction and Search of Service Taxonomies

Topic · Other

When and How Human Curation Backfires: Preference Alignment under Multi-Model Self-Consuming Loop

Topic · LLM Post-Training

Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling

Topic · Agent

OpenClawBench: Benchmarking Process-side Anomalies in Real-world Agent Execution Trajectories

Topic · Agent

Provably Secure Agent Guardrail

Topic · Agent

DenseSteer: Steering Small Language Models towards Dense Math Reasoning

Topic · Other

Surfacing Isolated Learners with Outcome-Independent Mediation of Feedback between Teachers and Students Using AI

Topic · Other

Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth

Topic · Other

Tailoring the Curriculum: Student-Centered Reasoning Distillation via Dynamic Data-Model Compatibility

Topic · Other

BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents

Topic · Agent

GTA: Generating Long-Horizon Tasks for Web Agents at Scale

Topic · Agent

ReasonOps: Operator Segmentation for LLM Reasoning Traces

Topic · Other

Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents

Topic · Agent

Better Later Than Sooner: Neuro-Symbolic Knowledge Graph Construction via Ontology-grounded Post-extraction Correction

Topic · Other

Governing Technical Debt in Agentic AI Systems

Topic · Agent

The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models

Topic · Other

PRO-CUA: Process-Reward Optimization for Computer Use Agents

Topic · Agent

2026-05-29 · 354 papers
