Ornith-1.0 offers agentic coding capabilities in four sizes (9B, 31B, 35B MoE, and 397B MoE) and achieves state-of-the-art performance on coding benchmarks among models of comparable scale.
The practical quality of local open-source LLM setups depends less on the model itself than on the code quality, error handling, and API integration surrounding each model request.
InfoKV combines attention scores with uncertainty signals for KV-cache compression, outperforming purely attention-based methods on long reasoning tasks.
JetSpec overcomes scaling limits of speculative decoding through parallel tree drafting with causal conditioning, achieving up to 9.64x speedup in LLM inference.
OpenBioRQ reveals that agent-based AI models fail on approximately 40% of complex biomedical research questions and, paradoxically, stop using their tools on difficult tasks, which is exactly where those tools matter most.
ViQ quantizes visual inputs at arbitrary resolutions into discrete representations, achieving 20–70% training acceleration compared to continuous image encodings.
JSON schema constraints can compile tool-call tokens into unreachable regions of token space, causing models to suppress function calls even though both features (schema-constrained output and tool calling) work in isolation.
AI agents beat the baseline on only about 18% of genuine scientific tasks because they tend to reframe problems rather than solve them with genuinely new ideas.
Frontier LLMs solve fewer than one-third of 87 multi-GPU CUDA benchmark tasks, though some generated kernels still outperform public reference implementations.
Structured curriculum learning strategies that leverage task relationships in latent space achieve better downstream performance than pure difficulty prioritization.