key developments
anthropic confirms claude code users hitting usage limits far faster than expected. the register reports (with 129 points and 111 comments on hn) that anthropic is acknowledging claude code users are burning through their usage allocations much faster than the company anticipated. this matters because it signals real, sustained adoption of agentic coding workflows, not just tire-kicking. it also surfaces the fundamental tension in ai product economics: the most valuable use cases (long-running, multi-step agent sessions) are also the most expensive to serve. anthropic will need to figure out pricing and capacity planning for a usage pattern that looks fundamentally different from chat. https://www.theregister.com/2026/03/31/anthropic_claude_code_limits/
finetuning bypasses copyright safety alignment across gpt-4o, gemini-2.5-pro, and deepseek-v3.1, extracting up to 85-90% of held-out copyrighted books. this paper demonstrates that training models on a task as innocuous as “expand plot summaries into full text” causes frontier models to reproduce large spans of copyrighted books (460+ word verbatim spans) using only semantic descriptions as prompts. the cross-model correlation is r≥0.90, meaning all three providers memorized the same books in the same regions. finetuning on one author unlocks verbatim recall of 30+ unrelated authors, and the effect generalizes across random author pairs. this is legally explosive: it directly undermines the defense that safety alignment prevents reproduction of training data, which has been a key premise in recent fair use rulings. https://arxiv.org/abs/2603.20957
mollick on why interfaces, not model capability, are the real bottleneck to ai productivity. ethan mollick’s latest post synthesizes a new paper in which financial professionals’ productivity gains from gpt-4o1 were offset by cognitive overload caused by the chatbot interface. the core argument: ai capability is already far ahead of how people can actually use it, and the chatbot paradigm is actively harmful for real work. he positions claude code’s computer use addition and specialized task interfaces as the right direction. this frames a useful mental model: the next wave of value comes from interface design, not model improvements. https://www.oneusefulthing.org/p/claude-dispatch-and-the-power-of
latent space covers claude code getting computer use, codex interop, and the emerging “composable harness” pattern. anthropic added computer use inside claude code (research preview for pro/max users), enabling closed-loop verification where the agent writes code, runs it, visually inspects the ui, and iterates. separately, openai shipped a codex plugin for claude code, allowing cross-agent composition. the latent space analysis frames this correctly: coding stacks are becoming composable harnesses rather than monolithic products. the broader piece also covers yoni rechtman’s framework for post-ai tech roles. https://www.latent.space/p/ainews-the-last-4-jobs-in-tech
attn-rot (“turboquant lite”) approaching merge into llama.cpp, delivering better kv cache quantization quality with no speed penalty. ggerganov’s attn-rot branch shows measurably improved kld scores at q4_0 quantization across qwen3.5-35b-a3b, qwen3.5-27b, and qwen3.5-122b-a10b, with essentially identical inference speed. for example, on qwen3.5-35b-a3b q4_0 kv cache, mean kld drops from 0.010338 to 0.007657 and same-top-p accuracy improves from 95.3% to 96.1%. this is the kind of infrastructure improvement that compounds: better quantization quality at the same memory footprint means local inference gets meaningfully better for everyone using llama.cpp. https://www.reddit.com/r/LocalLLaMA/comments/1s92x7z/
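the kld metric above compares the quantized model’s next-token distribution against the full-precision reference, token by token, then averages. a minimal sketch of that computation (function names and array shapes here are illustrative; llama.cpp reports this via its perplexity tool’s kl-divergence mode):

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the vocab axis
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mean_kld(ref_logits, quant_logits):
    """mean per-token KL(ref || quant).

    ref_logits, quant_logits: (n_tokens, vocab) arrays from the
    full-precision and quantized model run on the same text.
    """
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))
```

identical distributions give 0, so numbers like the 0.0077 above mean the quantized model’s per-token distributions stay very close to the reference on average.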
google releases veo 3.1 lite for cost-effective video generation via gemini api. available in paid preview through the gemini api and testable in google ai studio. positioned as the most cost-effective option in google’s video generation lineup. incremental but signals google is building out a tiered video generation product line. https://blog.google/innovation-and-ai/technology/ai/veo-3-1-lite/
notable
- axios npm supply chain attack via leaked token. versions 1.14.1 and 0.30.4 of the 101m weekly download package included freshly published malware stealing credentials and installing a rat. same pattern as the litellm attack last week: npm publish without github release. https://simonwillison.net/2026/Mar/31/supply-chain-attack-on-axios/
- trl v1.0 released by hugging face. the post-training library hits its 1.0 milestone, signaling stability for rl-based alignment workflows. https://huggingface.co/blog/trl-v1
- ibm releases granite 4.0 3b vision, a compact multimodal model targeting enterprise document understanding. https://huggingface.co/blog/ibm-granite/granite-4-vision
- simon willison releases llm 0.30 with a new register_models() hook allowing plugins to see previously registered models/aliases, enabling better plugin interop. https://simonwillison.net/2026/Mar/31/llm/
- openai closes record $122b funding round at $852b valuation as ipo anticipation grows. https://www.reddit.com/r/mlscaling/comments/1s94a3b/
- kat-coder-v2 from kuaishou hits 79.6% on swe-bench verified (vs claude opus 4.6 at 80.8%), using a “specialize then unify” paradigm with domain-specific rl before on-policy distillation into a single model. https://arxiv.org/abs/2603.27703
- qwen3.5-omni results published by alibaba, per localllama discussion. https://www.reddit.com/r/LocalLLaMA/comments/1s8apue/
- seed1.8 model card from bytedance describes a foundation model targeting “generalized real-world agency” with unified agentic interface for search, code execution, and gui interaction. https://arxiv.org/abs/2603.20633
- 190 security advisories against openclaw catalogued into a systematic taxonomy, revealing that three moderate/high-severity gateway vulnerabilities compose into complete unauthenticated rce from an llm tool call to host process. https://arxiv.org/abs/2603.27517
- apple releases protext, a benchmark for measuring gendering and misgendering in long-form text transformations by llms. https://machinelearning.apple.com/research/protext-gender-bias-benchmark
papers
“chinchilla approach 2 has systematic biases in isoflop parabola fits” shows the widely used scaling law methodology introduces biases corresponding to 6.5% of llama 3’s training budget (~$1.4m wasted compute). proposes variable projection as a drop-in replacement. practically important for anyone fitting scaling laws. https://arxiv.org/abs/2603.22339
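for context, approach 2 fits a parabola to final loss versus log model size at each fixed flop budget and reads off the minimizer. a minimal sketch of that readout step on synthetic data (names are illustrative; the paper’s critique targets how the fit is performed, not this readout):

```python
import numpy as np

def isoflop_optimum(log_n, loss):
    """fit loss ~ a*(log N)^2 + b*log N + c at one flop budget
    and return the log model size minimizing the fitted parabola."""
    a, b, c = np.polyfit(log_n, loss, 2)
    return -b / (2.0 * a)

# synthetic isoflop curve with its minimum placed at log N = 22.0
log_n = np.linspace(20.0, 24.0, 9)
loss = 0.05 * (log_n - 22.0) ** 2 + 2.1
opt = isoflop_optimum(log_n, loss)  # ~22.0 on noiseless data
```

the paper’s argument is that with noisy losses and a discrete grid of model sizes, these per-budget parabola fits are systematically biased, and variable projection (solving the linear coefficients in closed form inside the nonlinear fit) is the drop-in remedy.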
“context parroting: a simple but tough-to-beat baseline for foundation models in scientific ml” demonstrates that time-series foundation models often just copy from context, and a naive parroting baseline outperforms leading models on chaos, turbulence, oscillators, and ecg prediction at a fraction of compute. ties the observation to induction heads. important corrective for the field. https://arxiv.org/abs/2505.11349
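the baseline is easy to state: find the stretch of context most similar to the most recent window, then replay whatever followed it. a self-contained sketch (window size and the squared-error match are my assumptions, not necessarily the paper’s exact setup):

```python
import numpy as np

def parrot_forecast(context, horizon, k=16):
    """forecast by copying: locate the past window of length k closest
    (in squared error) to the last k points, then emit the `horizon`
    values that came right after that window."""
    query = context[-k:]
    best_start, best_dist = 0, np.inf
    for i in range(len(context) - horizon - k):
        dist = float(np.sum((context[i:i + k] - query) ** 2))
        if dist < best_dist:
            best_start, best_dist = i, dist
    return context[best_start + k : best_start + k + horizon]

# on anything near-periodic, copying is a strong forecaster for free
series = np.sin(np.linspace(0, 20 * np.pi, 2050))
context, truth = series[:2000], series[2000:]
pred = parrot_forecast(context, horizon=50)
mse = float(np.mean((pred - truth) ** 2))  # tiny on periodic data
```

this is essentially an induction-head lookup over the context, which is the paper’s point: if a copy rule this cheap wins, the foundation models being beaten are likely doing little more than the same thing.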
“on-policy self-distillation for reasoning compression (opsdc)” achieves 57-59% token reduction on math-500 while improving accuracy by 9-16 points on qwen3-8b/14b, using nothing more than “be concise” self-distillation via reverse kl. the finding that much reasoning output is actively harmful, not just redundant, is striking. https://arxiv.org/abs/2603.05433
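reverse kl, i.e. minimizing KL(student ‖ teacher), is plausibly what makes this compress rather than average: it is mode-seeking, so the student is penalized heavily for putting mass where the concise teacher puts almost none. a toy numeric illustration of that asymmetry (my framing, not the paper’s training code):

```python
import numpy as np

def kl(p, q):
    # KL(p || q) for discrete distributions
    return float(np.sum(p * (np.log(p) - np.log(q))))

# teacher concentrates on a "concise" continuation (token 0)
teacher = np.array([0.90, 0.05, 0.05])
# a student that hedges across verbose continuations
spread = np.array([0.40, 0.30, 0.30])
# a student that commits to the teacher's mode
committed = np.array([0.80, 0.10, 0.10])

# reverse kl (student in the first slot) punishes hedging far harder
loss_spread = kl(spread, teacher)        # large: mass on low-teacher tokens
loss_committed = kl(committed, teacher)  # small: mass follows the mode
```

under this objective the gradient pushes the student to drop low-teacher-probability (verbose) continuations entirely, consistent with the paper’s finding that much reasoning output can be cut without hurting accuracy.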
“hyperp: hypersphere parameterization for transferable scaling” introduces the first framework for transferring learning rates across width, depth, tokens, and moe granularity under frobenius-sphere constraints with the muon optimizer. achieves 1.58x compute efficiency over strong baselines at 6e21 flops. code released at github.com/microsoft/archscale. https://arxiv.org/abs/2603.28743
“heddle: distributed orchestration for agentic rl rollout” addresses the long-tail trajectory bottleneck in agentic rl training with trajectory-centric scheduling, achieving up to 2.5x higher end-to-end throughput. relevant infrastructure work as agentic rl scales up. https://arxiv.org/abs/2603.28101
“stop probing, start coding” shows sparse autoencoders fail at compositional generalization not due to amortization but due to dictionary learning itself pointing in wrong directions. oracle baselines prove the problem is solvable with good dictionaries, reframing the key open problem in mechanistic interpretability. https://arxiv.org/abs/2603.28744
“davinci-llm: towards the science of pretraining” releases a fully open 3b model trained on 8t tokens with 200+ controlled ablations, establishing that processing depth matters alongside volume scaling, and different domains exhibit distinct saturation dynamics. rare level of transparency for pretraining research. https://arxiv.org/abs/2603.27164