key developments
interconnects on “lossy self-improvement” challenges the rsi narrative. nathan lambert’s latest piece examines recursive self-improvement (rsi), the idea that ai models will accelerate their own development in a runaway loop. his core argument is that self-improvement is inherently lossy; models improving their own training pipelines face compounding errors, distribution shifts, and diminishing returns that dampen the exponential feedback loop rsi proponents assume. this matters because the rsi thesis underpins much of the “fast takeoff” safety discourse and significant investment narratives. lambert acknowledges the real acceleration happening (superhuman coding assistants making research easier, consolidation into 2-3 leading labs) but frames it as rapid linear progress, not exponential recursion. this is the most grounded technical analysis of rsi i’ve seen from someone embedded in the industry. https://www.interconnects.ai/p/lossy-self-improvement
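to make the shape of that claim concrete, here's a toy numeric sketch of dampened vs. runaway feedback. this is my illustration, not lambert's math, and the 50%/30% numbers are arbitrary:

```python
# toy illustration only, not lambert's model: naive rsi assumes each
# improvement cycle multiplies capability by a fixed factor, while a
# "lossy" loop has each cycle's gain eroded by compounding error.
gain, retention = 0.5, 0.7   # 50% gain per cycle, 30% of it lost per cycle

naive = lossy = 1.0
for cycle in range(10):
    naive *= 1 + gain                        # runaway exponential
    lossy *= 1 + gain * retention ** cycle   # per-cycle gain decays
    print(f"cycle {cycle}: naive={naive:6.1f}  lossy={lossy:5.2f}")
# naive ends near 58x; lossy flattens out around 4x
```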
sebastian raschka publishes a visual gallery of 45 llm architectures with an attention-variant deep dive. raschka released an interactive architecture gallery covering 45 distinct llm designs with visual model cards, plus a companion article walking through every major attention variant used in modern open-weight models (mha, gqa, mqa, sliding window, etc.). this is a genuine reference resource rather than a tutorial; it synthesizes years of architectural evolution into one navigable artifact. useful for anyone who needs to quickly compare design choices across the llama, mistral, deepseek, and qwen families. the gallery will be maintained as new architectures emerge. https://magazine.sebastianraschka.com/p/visual-attention-variants https://sebastianraschka.com/llm-architecture-gallery/
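as a taste of what the attention-variant article covers: grouped-query attention (gqa) lets several query heads share one key/value head, shrinking the kv cache relative to mha. a minimal sketch under those standard definitions (not raschka's code; causal mask omitted for brevity):

```python
import torch

def grouped_query_attention(x, wq, wk, wv, n_q_heads=8, n_kv_heads=2):
    """minimal gqa: n_q_heads query heads share n_kv_heads key/value
    heads. mha is n_kv_heads == n_q_heads; mqa is n_kv_heads == 1.
    shapes: x (b, t, d), wq (d, d), wk and wv (d, d // n_q_heads * n_kv_heads)."""
    b, t, d = x.shape
    hd = d // n_q_heads                                   # per-head dim
    q = (x @ wq).view(b, t, n_q_heads, hd).transpose(1, 2)
    k = (x @ wk).view(b, t, n_kv_heads, hd).transpose(1, 2)
    v = (x @ wv).view(b, t, n_kv_heads, hd).transpose(1, 2)
    # replicate each kv head across its group of query heads
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / hd ** 0.5        # (b, heads, t, t)
    out = scores.softmax(-1) @ v                          # (b, heads, t, hd)
    return out.transpose(1, 2).reshape(b, t, d)
```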
starlette 1.0 released; willison explores it with claude skills. starlette, the asgi framework underpinning fastapi, shipped its 1.0 after years of development. willison flags this as significant because starlette has enormous invisible usage (every fastapi app runs on it) but low brand recognition. the 1.0 brings breaking changes around startup/shutdown via a new lifespan context manager pattern. willison notes that starlette’s single-file app style makes it exceptionally llm-friendly, and he used claude’s agent skills to experiment with the new api. for anyone building python web services or llm-generated backends, this is worth knowing about. https://simonwillison.net/2026/Mar/22/starlette/#atom-everything
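for orientation, the lifespan pattern looks roughly like this; a minimal sketch against starlette's documented lifespan api (the exact 1.0 surface may differ in details):

```python
import contextlib

from starlette.applications import Starlette
from starlette.responses import PlainTextResponse
from starlette.routing import Route

@contextlib.asynccontextmanager
async def lifespan(app):
    # startup work runs once, before the first request is served
    app.state.greeting = "hello from lifespan"
    yield
    # shutdown work runs once, after the server stops accepting requests
    app.state.greeting = None

async def homepage(request):
    return PlainTextResponse(request.app.state.greeting)

app = Starlette(routes=[Route("/", homepage)], lifespan=lifespan)
```

run it with any asgi server, e.g. `uvicorn app:app`.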
minimax confirms m2.7 will be open weights, release in approximately 2 weeks. minimax’s m2.7 model, which has been generating buzz for strong benchmark performance, will be released as open weights. this adds another strong contender to the open model ecosystem alongside qwen and llama. details on architecture and parameter count are still sparse, but the localllama community is treating this as significant given minimax’s recent api performance. https://www.reddit.com/r/LocalLLaMA/comments/1s0mo33/m27_open_weights_coming_in_2_weeks/
notable
- alibaba reaffirmed its commitment to continuously open-sourcing qwen and wan models, signaling that the chinese open-weight pipeline remains active and strategic. https://www.reddit.com/r/LocalLLaMA/comments/1s0pfml/alibaba_confirms_they_are_committed_to/
- featherops achieves near-theoretical-max fp8 matmul performance on amd rdna3 gpus without native fp8 support, a meaningful result for anyone running inference on consumer amd hardware (a toy emulation sketch follows this list). https://github.com/woct0rdho/ComfyUI-FeatherOps
- willison used claude code to produce a comprehensive comparison of javascript sandboxing approaches (isolated-vm, vm2, quickjs-emscripten, shadowrealm, deno workers); a useful reference for anyone building agent tool execution. https://simonwillison.net/2026/Mar/22/javascript-sandboxing-research/#atom-everything
- mit released updated 2026 course materials on flow matching and diffusion models, including lecture videos, mathematically self-contained notes, and coding exercises; new topics include latent spaces, diffusion transformers, and discrete diffusion for language models (a minimal flow matching sketch also follows this list). https://diffusion.csail.mit.edu
- arc institute introduced bioreason-pro, targeting the vast majority of proteins that lack experimental annotations; potentially significant for computational biology, but details are thin. https://www.reddit.com/r/MachineLearning/comments/1s0uxom/arc_institute_introduces_bioreasonpro_targeting/
- a former google tpu / nvidia gpu engineer published a detailed document on designing ai chips (software and hardware), framed as the plan for a startup they never launched. niche, but rare transparency into accelerator design thinking. https://www.reddit.com/r/MachineLearning/comments/1s0y008/r_designing_ai_chip_software_and_hardware/
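on the featherops item: the real kernels are rdna3-specific, but the general trick of running fp8 semantics on hardware without native fp8 can be pictured as snapping inputs to the e4m3 grid and doing the arithmetic at higher precision. a toy numpy illustration of that idea (ignores subnormals and inf/nan handling; nothing here is featherops code):

```python
import numpy as np

def snap_to_e4m3(x):
    """round float32 values to the nearest fp8 e4m3-representable value
    (1 sign, 4 exponent, 3 mantissa bits; max normal magnitude 448).
    toy version: saturates at +-448 and ignores subnormals/inf/nan."""
    x = np.clip(np.asarray(x, dtype=np.float32), -448.0, 448.0)
    mant, exp = np.frexp(x)                # x = mant * 2**exp, |mant| in [0.5, 1)
    mant = np.round(mant * 16.0) / 16.0    # keep 1 implicit + 3 stored mantissa bits
    return np.ldexp(mant, exp).astype(np.float32)

def emulated_fp8_matmul(a, b):
    # fp8-in, float32-accumulate semantics: quantize the inputs to the
    # e4m3 grid, then run the matmul at full precision
    return snap_to_e4m3(a) @ snap_to_e4m3(b)
```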
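and on the mit course: the central training objective it covers, conditional flow matching, is compact enough to sketch. the standard loss with a straight-line noise-to-data path (my generic paraphrase of the textbook formulation, not the course's code):

```python
import torch

def conditional_flow_matching_loss(model, x1):
    """standard conditional flow matching loss with a linear
    (rectified-flow style) path x_t = (1 - t) * x0 + t * x1."""
    x0 = torch.randn_like(x1)                    # source noise sample
    # one random time per sample, broadcastable over the data dims
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)), device=x1.device)
    xt = (1 - t) * x0 + t * x1                   # point on the straight path
    target_v = x1 - x0                           # constant path velocity dx_t/dt
    pred_v = model(xt, t)                        # learned vector field v(x_t, t)
    return ((pred_v - target_v) ** 2).mean()
```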