ArXiv Intelligence

Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming

Topic · Agent

Differentiable Efficient Operator Search

Topic · Other

Where's the Structure? A Systematic Literature Review of Empirical Research on Human-AI Collaboration and Hybrid Intelligence for Learning

Topic · Other

Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway

Topic · Other

The Score Hamiltonian: Mapping Diffusion Models to Adiabatic Transport

Topic · Other

Ontology-constrained multi-LLM scoring of hypothesis support in the predictive processing literature

Topic · Other

Finite Element-Based Material Learning via Automatic Differentiation: Learning constitutive neural network models from full-field deformation data

Topic · Other

Temporal Preference Concepts and their Functions in a Large Language Model

Topic · Other

Assessing the Geographic Diversity of AI's Platial Representations in Image Generation

Topic · Other

Geographic Bias and Diversity in AI Evaluation

Topic · Other

The Granularity Gap: A Multi-Dimensional Longitudinal Audit of Sycophancy in Gemini Models

Topic · Other

Multi-Granularity Reasoning for Natural Language Inference

Topic · Other

From Scoring to Explanations: Evaluating SHAP and LLM Rationales for Rubric-based Teaching Quality Assessment

Topic · Other

The Virtual Roundtable: Multi-Agent Personas Simulating the Dynamics of Human Brainstorming

Topic · Agent

MCBench: A Multicontext Safety Assessment Benchmark for Omni Large Language Models

Topic · Other

PEFT of SLM for Telecommunications Customer Support: A Comparative Study of LoRA Configurations with Energy Consumption Analysis

Topic · Other

Improving Heart-Focused Medical Question Answering in LLMs via Variance-Aware Rubric Rewards with GRPO

Topic · Other

Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning

Topic · Other

Epidemiology of Model Collapse: Modeling Synthetic Data Contamination via Bilayer SIR Dynamics

Topic · Other

RAINO: Anchoring Agents in Reality, A Systematic Review and Conceptual Framework for Realism in Agent-Based Modelling

Topic · Agent

2026-06-05 · 280 papers
