key developments

arxiv declares independence from cornell, becoming a standalone nonprofit. the preprint server that underpins essentially all ml research is restructuring to handle exploding submission volumes and what it explicitly calls “ai slop.” this matters because arxiv’s infrastructure decisions directly affect how research is disseminated and filtered. as an independent nonprofit, it gains fundraising flexibility but also takes on full financial responsibility. the “ai slop” framing is notable; arxiv is now publicly acknowledging that llm-generated paper submissions are a real operational burden, not just a theoretical concern. worth watching whether this leads to new submission filtering policies. (reddit)

deepseek core researcher daya guo reportedly resigns. guo was a primary author on deepseek-r1 (the paper that made nature’s cover in 2025) and was deeply involved in deepseek-v3 and deepseek-math. he joined deepseek only in july 2024 after completing his phd at sun yat-sen university. rumors point to either baidu or bytedance as his destination. this matters because deepseek’s technical output has been disproportionately impactful relative to its size, and losing a core contributor to a larger competitor signals the intensity of china’s ai talent war. deepseek’s efficiency-focused approach may be less compelling to researchers who want access to massive compute clusters. (reddit)

simon willison demonstrates llm-based user profiling from public hacker news comments. willison built a simple tool that pulls a user’s last 1,000 hn comments via the algolia api (open cors, zero authentication) and feeds them to claude opus 4.6 with the prompt “profile this user.” the results are, in his words, “startlingly effective,” reconstructing professional identity, technical opinions, geographic location, and behavioral patterns. this is a useful concrete demonstration of the privacy implications of long public comment histories combined with modern llms. the fact that it requires zero technical sophistication (a single api call plus a paste into any llm) makes the threat model accessible to anyone. (simonwillison.net)
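willison's own code isn't reproduced in the post summary above, but the fetch half is trivial to sketch against algolia's public hn search endpoint (the `hn.algolia.com` api, its `tags=comment,author_<user>` filter, and the `comment_text` field on each hit are real; the helper names here are mine). a minimal sketch:

```python
import json
import urllib.request

ALGOLIA = "https://hn.algolia.com/api/v1/search_by_date"

def comments_url(username: str, n: int = 1000) -> str:
    # algolia's hn search api is open cors and unauthenticated;
    # it caps a single page at 1000 hits
    return f"{ALGOLIA}?tags=comment,author_{username}&hitsPerPage={n}"

def fetch_comment_texts(username: str) -> list[str]:
    with urllib.request.urlopen(comments_url(username)) as resp:
        hits = json.load(resp)["hits"]
    # each hit carries the comment body (as html) in "comment_text"
    return [h["comment_text"] for h in hits if h.get("comment_text")]

# the returned texts, concatenated, become the context for a
# "profile this user" prompt pasted into any llm of your choice
```

which is the whole point: one unauthenticated api call plus a paste is the entire attack surface.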

multi-token prediction for qwen 3.5 coming to mlx-lm. a community pr adds mtp support to apple's mlx-lm framework for the qwen 3.5 model family. benchmarks on qwen3.5-27b 4-bit running on an m4 pro show throughput rising from 15.3 to 23.3 tok/s (roughly 1.5x) with an 80.6% draft acceptance rate. this is a meaningful local inference speedup for apple silicon users running one of the current top open model families, and it demonstrates mtp moving from a training technique to a practical inference optimization in mainstream tooling. (reddit, github pr)
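the reported numbers are roughly self-consistent under standard speculative-decoding accounting. a sketch of that accounting (assuming one mtp draft token per verifier pass and independent acceptances, both assumptions on my part):

```python
def expected_tokens_per_step(accept_rate: float, n_draft: int) -> float:
    # drafts are verified left to right and generation stops at the first
    # rejection; the verifier pass always contributes one token itself,
    # so the expectation is a truncated geometric series:
    #   1 + a + a^2 + ... + a^n = (1 - a^(n+1)) / (1 - a)
    if accept_rate >= 1.0:
        return n_draft + 1
    return (1 - accept_rate ** (n_draft + 1)) / (1 - accept_rate)
```

at an 80.6% acceptance rate with a single draft token this gives about 1.81 tokens per verifier pass, an upper bound that lands close to the observed ~1.5x once the extra draft-head compute is paid for.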

mistral ceo proposes revenue-based content levy for ai companies operating in europe. arthur mensch published an ft opinion piece arguing that europe’s opt-out copyright framework is “unworkable” and proposing a levy applied to all commercial ai providers placing models on the european market, including foreign companies. this is significant because it comes from europe’s leading ai lab, not from publishers or regulators. mensch is essentially conceding that mistral cannot compete under current eu rules while us and chinese labs train freely, and is proposing a regulatory mechanism that would at least level the playing field by taxing everyone equally. if this gains traction, it could reshape the economics of deploying ai services in europe. (ft via archive, reddit)

notable

  • nemotron cascade 2 30b-a3b scores 97.6% on humaneval at iq4_xs quantization, beating both qwen3.5 medium variants. uses nvidia’s own architecture, not qwen-derived. worth attention as a coding-focused local model. (reddit)

  • langchain releases “deep agents,” an mit-licensed framework built on langgraph with planning-first architecture, sub-agent spawning, filesystem access, and cross-session memory. positioned as an open alternative to claude code-style agents. (reddit)

  • willison publishes guide on git patterns for coding agents, covering how to leverage agents’ deep git fluency for branching strategies, history exploration, and reversible experimentation. practical reference for agentic engineering workflows. (simonwillison.net)

  • fastflowlm benchmarks on ryzen ai max+ 395 show deepseek-r1 8b at 10.7 tok/s generation across all context depths up to 70k, while lfm2 models hit 63.8 tok/s at short context. useful reference for amd npu inference performance. (reddit)

  • persona-level safety via system prompts achieves 100% refusal in abliterated models when combining behavioral rules with governance hierarchies, up from 22% baseline. interesting finding but tested on only 18 prompts with a single model family (qwen 3.5 9b); generalization unclear. (zenodo paper 1, zenodo paper 2)

papers

“ket-rag: graph rag with structured chain-of-thought closes the gap between 8b and 70b models on multi-hop qa.” shows that retrieval is largely solved (77-91% answer presence) but reasoning is the bottleneck (73-84% of errors). structured decomposition plus 60% context compression via graph traversal lets llama 3.1 8b match vanilla llama 3.3 70b on hotpotqa, musique, and 2wikimultihopqa at ~12x lower cost. (arxiv)
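the paper's pipeline isn't reproduced here, but the "structured decomposition" idea — turning a multi-hop question into a chain of sub-queries where later hops reference earlier answers — can be sketched generically (all names below are mine, not the paper's api):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SubQuery:
    text: str                     # may contain "{0}", "{1}", ... slots
    answer: Optional[str] = None  # filled in as earlier hops resolve

def resolve_chain(steps: list[SubQuery],
                  answer_fn: Callable[[str], str]) -> str:
    # run sub-queries in order, splicing earlier answers into later ones:
    # "who directed x?" -> "when was {0} born?" -> ...
    answers: list[str] = []
    for step in steps:
        q = step.text.format(*answers)
        step.answer = answer_fn(q)   # retrieval + llm call in practice
        answers.append(step.answer)
    return answers[-1]
```

the claimed result is that this kind of explicit hop-by-hop structure, not better retrieval, is what lets an 8b model keep pace with a 70b one.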