key developments

anthropic confirms claude code users hitting usage limits far faster than expected. the register reports (with 129 points and 111 comments on hn) that anthropic is acknowledging claude code users are burning through their usage allocations much faster than the company anticipated. this matters because it signals real, sustained adoption of agentic coding workflows, not just tire-kicking. it also surfaces the fundamental tension in ai product economics: the most valuable use cases (long-running, multi-step agent sessions) are also the most expensive to serve. anthropic will need to figure out pricing and capacity planning for a usage pattern that looks fundamentally different from chat. https://www.theregister.com/2026/03/31/anthropic_claude_code_limits/

finetuning bypasses copyright safety alignment across gpt-4o, gemini-2.5-pro, and deepseek-v3.1, extracting up to 85-90% of held-out copyrighted books. this paper demonstrates that training models on a task as innocuous as “expand plot summaries into full text” causes frontier models to reproduce large spans of copyrighted books (460+ word verbatim spans) using only semantic descriptions as prompts. the cross-model correlation is r≥0.90, meaning all three providers memorized the same books in the same regions. finetuning on one author unlocks verbatim recall of 30+ unrelated authors, and the effect generalizes across random author pairs. this is legally explosive: it directly undermines the defense that safety alignment prevents reproduction of training data, which has been a key premise in recent fair use rulings. https://arxiv.org/abs/2603.20957
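the headline metric here is verbatim span length. a minimal sketch of one way to measure it, via word-level longest-common-substring dynamic programming (an illustration, not the paper's exact metric; `longest_verbatim_span` is a made-up name):

```python
def longest_verbatim_span(output: str, reference: str) -> int:
    """length, in words, of the longest word-for-word span shared by the
    two texts, found by dp over word sequences with o(len(b)) memory."""
    a, b = output.lower().split(), reference.lower().split()
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: longest common suffix ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

a span counter like this, run over model completions against held-out book text, is the shape of evidence behind claims like "460+ word verbatim spans".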

mollick on why interfaces, not model capability, are the real bottleneck to ai productivity. ethan mollick’s latest post synthesizes a new paper showing that, for financial professionals, cognitive overload from chatbot interfaces offset the productivity gains from gpt-4o1. the core argument: ai capability is already far ahead of how people can actually use it, and the chatbot paradigm is actively harmful for real work. he positions claude code’s computer use addition and specialized task interfaces as the right direction. this frames a useful mental model: the next wave of value comes from interface design, not model improvements. https://www.oneusefulthing.org/p/claude-dispatch-and-the-power-of

latent space covers claude code getting computer use, codex interop, and the emerging “composable harness” pattern. anthropic added computer use inside claude code (research preview for pro/max users), enabling closed-loop verification where the agent writes code, runs it, visually inspects the ui, and iterates. separately, openai shipped a codex plugin for claude code, allowing cross-agent composition. the latent space analysis frames this correctly: coding stacks are becoming composable harnesses rather than monolithic products. the broader piece also covers yoni rechtman’s framework for post-ai tech roles. https://www.latent.space/p/ainews-the-last-4-jobs-in-tech

attn-rot (“turboquant lite”) approaching merge into llama.cpp, delivering better kv cache quantization quality with no speed penalty. ggerganov’s attn-rot branch shows measurably improved kld scores at q4_0 quantization across qwen3.5-35b-a3b, qwen3.5-27b, and qwen3.5-122b-a10b, with essentially identical inference speed. for example, on qwen3.5-35b-a3b q4_0 kv cache, mean kld drops from 0.010338 to 0.007657 and same-top-p accuracy improves from 95.3% to 96.1%. this is the kind of infrastructure improvement that compounds: better quantization quality at the same memory footprint means local inference gets meaningfully better for everyone using llama.cpp. https://www.reddit.com/r/LocalLLaMA/comments/1s92x7z/
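kld here is the kl divergence between the full-precision model's next-token distribution and the quantized model's, averaged over positions. a self-contained numpy sketch of that metric from raw logits (illustrative only, not llama.cpp's implementation):

```python
import numpy as np

def mean_kld(ref_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """mean per-position KL(ref || quant) over rows of next-token logits."""
    def log_softmax(x):
        # subtract the row max first for numerical stability
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    lp = log_softmax(np.asarray(ref_logits, dtype=float))
    lq = log_softmax(np.asarray(quant_logits, dtype=float))
    p = np.exp(lp)
    # KL(p || q) = sum_v p(v) * (log p(v) - log q(v)), averaged over positions
    return float((p * (lp - lq)).sum(axis=-1).mean())
```

a drop from 0.010338 to 0.007657 on this scale means the quantized model's token distributions sit measurably closer to the full-precision reference.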

google releases veo 3.1 lite for cost-effective video generation via gemini api. available in paid preview through the gemini api and testable in google ai studio. positioned as the most cost-effective option in google’s video generation lineup. incremental but signals google is building out a tiered video generation product line. https://blog.google/innovation-and-ai/technology/ai/veo-3-1-lite/

notable

papers

“chinchilla approach 2 has systematic biases in isoflop parabola fits” shows the widely used scaling law methodology introduces biases corresponding to 6.5% of llama 3’s training budget (~$1.4m wasted compute). proposes variable projection as a drop-in replacement. practically important for anyone fitting scaling laws. https://arxiv.org/abs/2603.22339
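for context, approach 2 fits a parabola to loss as a function of log model size along each isoflop slice and reads the compute-optimal size off the vertex. a minimal sketch on synthetic data, using ordinary least squares (i.e. the standard method the paper critiques, not its variable-projection replacement; all numbers here are made up):

```python
import numpy as np

# synthetic isoflop slice: loss as a noiseless quadratic in log10(params),
# with the compute-optimal size placed at log10(N) = 9.2
log_n = np.log10(np.array([1e8, 3e8, 1e9, 3e9, 1e10]))
loss = 0.25 * (log_n - 9.2) ** 2 + 2.1

# fit loss = a*x^2 + b*x + c and take the parabola's vertex at -b / (2a)
a, b, c = np.polyfit(log_n, loss, deg=2)
n_opt = 10 ** (-b / (2 * a))  # estimated compute-optimal parameter count
```

on clean synthetic data the vertex is recovered exactly; the paper's point is that on real, noisy isoflop measurements this fitting step is where the systematic bias creeps in.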

“context parroting: a simple but tough-to-beat baseline for foundation models in scientific ml” demonstrates that time-series foundation models often just copy from context, and a naive parroting baseline outperforms leading models on chaos, turbulence, oscillators, and ecg prediction at a fraction of compute. ties the observation to induction heads. important corrective for the field. https://arxiv.org/abs/2505.11349
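a parroting baseline can be as simple as matching the end of the context against earlier history and replaying whatever followed the best match, much like an induction head. a sketch under that reading (`parrot_forecast` is illustrative, not the paper's code):

```python
import numpy as np

def parrot_forecast(context: np.ndarray, horizon: int, window: int = 8) -> np.ndarray:
    """forecast by finding the past segment most similar to the last
    `window` values and copying the `horizon` values that followed it."""
    query = context[-window:]
    best_i, best_d = 0, np.inf
    # scan every earlier window that leaves room to copy `horizon` values
    for i in range(len(context) - window - horizon):
        d = np.sum((context[i:i + window] - query) ** 2)
        if d < best_d:
            best_i, best_d = i, d
    start = best_i + window
    return context[start:start + horizon]
```

on periodic or quasi-periodic signals this trivially nails the continuation, which is exactly why it embarrasses foundation models on chaos and oscillator benchmarks.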

“on-policy self-distillation for reasoning compression (opsdc)” achieves 57-59% token reduction on math-500 while improving accuracy by 9-16 points on qwen3-8b/14b, using nothing more than “be concise” self-distillation via reverse kl. the finding that much reasoning output is actively harmful, not just redundant, is striking. https://arxiv.org/abs/2603.05433
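on the reverse kl choice: kl(student || teacher) is mode-seeking, so a student that commits to one concise continuation the teacher supports pays little penalty, while forward kl would punish it for dropping the teacher's other modes. a toy numpy illustration of that asymmetry (distributions made up for the example):

```python
import numpy as np

def kl(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) for two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

# teacher spreads mass over two modes; student commits to just one of them
teacher = np.array([0.49, 0.49, 0.02])
student = np.array([0.96, 0.02, 0.02])

forward = kl(teacher, student)  # punishes the student for missing mode two
reverse = kl(student, teacher)  # tolerates covering only one teacher mode
```

here the reverse direction assigns the single-mode student a much smaller loss than the forward direction would, which is the property that lets "be concise" distillation collapse verbose reasoning without fighting the objective.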

“hyperp: hypersphere parameterization for transferable scaling” introduces the first framework for transferring learning rates across width, depth, tokens, and moe granularity under frobenius-sphere constraints with the muon optimizer. achieves 1.58x compute efficiency over strong baselines at 6e21 flops. code released at github.com/microsoft/archscale. https://arxiv.org/abs/2603.28743

“heddle: distributed orchestration for agentic rl rollout” addresses the long-tail trajectory bottleneck in agentic rl training with trajectory-centric scheduling, achieving up to 2.5x higher end-to-end throughput. relevant infrastructure work as agentic rl scales up. https://arxiv.org/abs/2603.28101

“stop probing, start coding” shows sparse autoencoders fail at compositional generalization not due to amortization but due to dictionary learning itself pointing in wrong directions. oracle baselines prove the problem is solvable with good dictionaries, reframing the key open problem in mechanistic interpretability. https://arxiv.org/abs/2603.28744

“davinci-llm: towards the science of pretraining” releases a fully open 3b model trained on 8t tokens with 200+ controlled ablations, establishing that processing depth matters alongside volume scaling, and different domains exhibit distinct saturation dynamics. rare level of transparency for pretraining research. https://arxiv.org/abs/2603.27164