The best Hacker News stories from Show from the past week
Latest posts:
Show HN: Freenet, a peer-to-peer platform for decentralized apps
For the past 5 years or so I've been working on a ground-up redesign of Freenet, my peer-to-peer project from the early 2000s (now renamed Hyphanet).<p>The new Freenet has been up and running since December along with some early applications like River[1], our decentralized group chat and Delta - a decentralized CMS. Users have already started to build their own apps on Freenet including games, and we have some interesting apps in development like Atlas, a search/recommendation engine.<p>Architecturally, this new Freenet is a global, decentralized key-value store where keys are webassembly contracts which define what values (aka "state") are valid for that key, how or when the values can be mutated, and how the state can be efficiently synchronized between peers.<p>We've developed a unique (AFAIK) solution to the consistency problem, every contract must define a "merge" operation for the contract's associated state. This operation must be commutative, meaning that you can merge multiple states in any order and you'll get the same end result.<p>This approach allows state updates to spread through the network like a virus[2], which typically achieves consistent global state in a few seconds or less.<p>Like the world wide web, Freenet applications can be downloaded from the network itself and run in a web browser - similar to single-page apps on the normal web. However, rather than connecting back to an API running in a datacenter, the webapp connects locally to the Freenet peer and interacts with Freenet contracts and delegates over a local websocket connection.<p>If you'd like to try Freenet we have convenient installers for the major desktop OSs but not yet mobile, and you can be chatting with other users on River within seconds[3]. Happy to answer any questions, you're also welcome to read our FAQ[4], or watch a talk I gave back in March[5].<p>[1] <a href="https://github.com/freenet/river" rel="nofollow">https://github.com/freenet/river</a><p>[2] <a href="https://freenet.org/about/news/summary-delta-sync/" rel="nofollow">https://freenet.org/about/news/summary-delta-sync/</a><p>[3] <a href="https://freenet.org/quickstart/" rel="nofollow">https://freenet.org/quickstart/</a><p>[4] <a href="https://freenet.org/faq/" rel="nofollow">https://freenet.org/faq/</a><p>[5] <a href="https://youtu.be/3SxNBz1VTE0" rel="nofollow">https://youtu.be/3SxNBz1VTE0</a>
Show HN: Freenet, a peer-to-peer platform for decentralized apps
For the past 5 years or so I've been working on a ground-up redesign of Freenet, my peer-to-peer project from the early 2000s (now renamed Hyphanet).<p>The new Freenet has been up and running since December along with some early applications like River[1], our decentralized group chat and Delta - a decentralized CMS. Users have already started to build their own apps on Freenet including games, and we have some interesting apps in development like Atlas, a search/recommendation engine.<p>Architecturally, this new Freenet is a global, decentralized key-value store where keys are webassembly contracts which define what values (aka "state") are valid for that key, how or when the values can be mutated, and how the state can be efficiently synchronized between peers.<p>We've developed a unique (AFAIK) solution to the consistency problem, every contract must define a "merge" operation for the contract's associated state. This operation must be commutative, meaning that you can merge multiple states in any order and you'll get the same end result.<p>This approach allows state updates to spread through the network like a virus[2], which typically achieves consistent global state in a few seconds or less.<p>Like the world wide web, Freenet applications can be downloaded from the network itself and run in a web browser - similar to single-page apps on the normal web. However, rather than connecting back to an API running in a datacenter, the webapp connects locally to the Freenet peer and interacts with Freenet contracts and delegates over a local websocket connection.<p>If you'd like to try Freenet we have convenient installers for the major desktop OSs but not yet mobile, and you can be chatting with other users on River within seconds[3]. Happy to answer any questions, you're also welcome to read our FAQ[4], or watch a talk I gave back in March[5].<p>[1] <a href="https://github.com/freenet/river" rel="nofollow">https://github.com/freenet/river</a><p>[2] <a href="https://freenet.org/about/news/summary-delta-sync/" rel="nofollow">https://freenet.org/about/news/summary-delta-sync/</a><p>[3] <a href="https://freenet.org/quickstart/" rel="nofollow">https://freenet.org/quickstart/</a><p>[4] <a href="https://freenet.org/faq/" rel="nofollow">https://freenet.org/faq/</a><p>[5] <a href="https://youtu.be/3SxNBz1VTE0" rel="nofollow">https://youtu.be/3SxNBz1VTE0</a>
Show HN: I reverse engineered Apple's video wallpapers
Ever since Apple introduced their video wallpapers I wanted to be able to put custom videos there. I decided to reverse engineer and see what I can do.<p>I built Phosphene to sell it, but the existing competitors were polished enough that the time it would have taken to catch up wasn't going to pay off. So I'm open-sourcing it.<p>WallpaperExtensionKit.framework is what powers macOS wallpapers. It controls what’s shows in the Settings app. It took a lot of trial and error to replicate the behavior, but the result is that your custom wallpapers appear alongside everything else. I wanted to have an “add” button there too, but I couldn’t find a way to do so, so there’s a companion app that will put your video where it needs to be.<p>Unlike Apple's Aerials, the video keeps playing on the desktop (not just the lock screen). The renderer drives AVSampleBufferDisplayLayer directly with PTS-offset gapless looping, and pauses or downshifts based on thermal state, battery level, brightness, and window occlusion.<p>It’s free and works well.
Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks
Hi HN, I'm Antoine Zambelli, AI Director at Texas Instruments.<p>I built Forge, an open-source reliability layer for self-hosted LLM tool-calling.<p>What it does:<p>- Adds domain-and-tool-agnostic guardrails (retry nudges, step enforcement, error recovery, VRAM-aware context management) to local models running on consumer hardware<p>- Takes an 8B model from ~53% to ~99% on multi-step agentic workflows without changing the model - just the system around it<p>- Ships with an eval harness and interactive dashboard so you can reproduce every number<p>I wanted to run a handful of always-on agentic systems for my portfolio, didn't want to pay cloud frontier costs, and immediately hit the compounding math problem on local models. 90% per-step accuracy sounds great, but with a 5-step workflow that's a 40% failure rate. No existing framework seemed to address this mechanical reliability issue - they all seemed tailor-made for cloud frontier.<p>Demo video: <a href="https://youtu.be/MzRgJoJAXGc" rel="nofollow">https://youtu.be/MzRgJoJAXGc</a> (side-by-side: same model, same task, with and without Forge guardrails)<p>The paper (accepted to ACM CAIS '26, presenting May 26-29 in San Jose) covers the peer-reviewed findings across 97 model/backend configurations, 18 scenarios, 50 runs each. Key numbers:<p>- Ministral 8B with Forge: 99.3%. Claude Sonnet with Forge: 100%. The gap between a free local 8B model on a $600 GPU and a frontier API is less than 1 point.<p>- The same 8B local model with Forge (99.3%) outperforms Claude Sonnet without guardrails (87.2%) - an 8B model with framework support beats the best result you can get through frontier API alone.<p>- Error recovery scores 0% for every model tested - local and frontier - without the retry mechanism. Not a capability gap, an architectural absence.<p>I'm currently using this for my home assistant running on Ministral 14B-Reasoning, and for my locally hosted agentic coding harness (8B managed to contribute to the codebase!).<p>The guardrail stack has five layers, each independently toggleable. The two that carry the most weight (per ablation study with McNemar's test): retry nudges (24-49 point drops when disabled) and error recovery (~10 point drops, significant for every model tested). Step enforcement is situational - only fires for models with weaker sequencing discipline. Rescue parsing and context compaction showed no significance in the eval but are retained for production workloads where they activate once in a while.<p>One thing I really didn't expect: the serving backend matters. Same Mistral-Nemo 12B weights produce 7% accuracy on llama-server with native function calling and 83% on Llamafile in prompt mode. A 75-point swing from infrastructure alone. I don't think anyone's published this because standard benchmarks don't control for serving backend.<p>Another surprise: there's no distinction in current LLM tool-calling between "the tool ran successfully and returned data" and "the tool ran successfully but found nothing." Both return a value, the orchestrator marks the step complete, and bad data cascades downstream. It's the equivalent of HTTP having 200 but no 404. Forge adds this as a new exception class (ToolResolutionError) - the model sees the error and can retry instead of silently passing garbage forward.<p>Biggest technical challenge was context compaction for memory-constrained hardware. Both Ollama and Llamafile silently fall back to CPU when the model exceeds VRAM - no warning, no error, just 10-100x slower inference. Forge queries nvidia-smi at startup and derives a token budget to prevent this.<p>How to try it:<p>- Clone the repo, run the eval harness on a model I haven't tested. If you get interesting results I'll add them to the dashboard.<p>- Try the proxy server mode - point any OpenAI-compatible client at Forge and it handles guardrails transparently. It's the newest model and I'd love more eyes on it.<p>- Dogfooding led me to optimize model parameters in v0.6.0. The harder eval suite (26 scenarios) is designed to raise the ceiling so no one sits at 100%. Several that did on the original suite can't sweep it - including Opus 4.6. Curious if anyone finds scenarios that expose gaps I haven't thought of. Paper numbers based on pre v0.6.0 code.<p>Background: prior ML publication in unsupervised learning (83 citations). This paper accepted to ACM CAIS '26 - presenting May 26-29.<p>Repo: <a href="https://github.com/antoinezambelli/forge" rel="nofollow">https://github.com/antoinezambelli/forge</a><p>Paper: <a href="https://www.caisconf.org/program/2026/demos/forge-agentic-reliability/" rel="nofollow">https://www.caisconf.org/program/2026/demos/forge-agentic-re...</a> <a href="https://github.com/antoinezambelli/forge/blob/main/docs/forge_ieee_preprint.pdf" rel="nofollow">https://github.com/antoinezambelli/forge/blob/main/docs/forg...</a><p>Dashboard: <a href="https://github.com/antoinezambelli/forge/docs/results/dashboard.html" rel="nofollow">https://github.com/antoinezambelli/forge/docs/results/dashbo...</a>
Show HN: Number Gacha, a gacha game distilled to its essence
Number Gacha is a half-parody, half-real gacha game where you roll, unwrap, and battle numbers. Play on Desktop for the best experience!
Show HN: Gaussian Splat of a Strawberry
The Setup:<p><a href="https://i.imgur.com/o0hgybh.jpeg" rel="nofollow">https://i.imgur.com/o0hgybh.jpeg</a><p><a href="https://i.imgur.com/mcNiomp.jpeg" rel="nofollow">https://i.imgur.com/mcNiomp.jpeg</a><p><a href="https://i.imgur.com/vIjw6pc.jpeg" rel="nofollow">https://i.imgur.com/vIjw6pc.jpeg</a><p><a href="https://i.imgur.com/nzOwmSC.jpeg" rel="nofollow">https://i.imgur.com/nzOwmSC.jpeg</a>
Show HN: Auto-identity-remove – Automated data broker opt-out runner for macOS
Show HN: Files.md – Open-source alternative to Obsidian
Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep
Hey HN! We (Stephan and Thomas) recently open-sourced Semble. We kept running into the same problem while using Claude Code on large codebases: when the agent can't find something directly, it falls back to grep, reading full files or launching subagents. This uses a lot of tokens, and often still misses the relevant code. There are existing tools for this, but they were either too slow to index on demand, needed API keys, or had poor retrieval quality.<p>Semble is our solution for this. It combines static Model2Vec embeddings (using our latest static model: potion-code-16M) with BM25, fused via RRF and reranked with code-aware signals. Everything runs on CPU since there's no transformers involved. On our benchmark of ~1250 query/document pairs across 63 repos and 19 languages, it uses 98% fewer tokens than grep+read and reaches 99% of the retrieval quality of a 137M-parameter code-trained transformer, while being ~200x faster.<p>Main features:<p>- Token-efficient: 98% fewer tokens than grep+read<p>- Fast: ~250ms to index a typical repo on our benchmark, ~1.5ms per query on CPU (very large repos may take longer)<p>- Accurate: 0.854 NDCG@10, 99% of the best transformer setup we tested<p>- MCP server: drop-in for Claude Code, Cursor, Codex, OpenCode<p>- Zero config: no API keys, no GPU, no external services<p>Install in Claude Code with:
claude mcp add semble -s user -- uvx --from "semble[mcp]" semble<p>Or check our README for other installation instructions, benchmarks, and methodology:<p>Semble: <a href="https://github.com/MinishLab/semble" rel="nofollow">https://github.com/MinishLab/semble</a><p>Benchmarks: <a href="https://github.com/MinishLab/semble/tree/main/benchmarks" rel="nofollow">https://github.com/MinishLab/semble/tree/main/benchmarks</a><p>Model: <a href="https://huggingface.co/minishlab/potion-code-16M" rel="nofollow">https://huggingface.co/minishlab/potion-code-16M</a><p>Let us know if you have any feedback or questions!
Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep
Hey HN! We (Stephan and Thomas) recently open-sourced Semble. We kept running into the same problem while using Claude Code on large codebases: when the agent can't find something directly, it falls back to grep, reading full files or launching subagents. This uses a lot of tokens, and often still misses the relevant code. There are existing tools for this, but they were either too slow to index on demand, needed API keys, or had poor retrieval quality.<p>Semble is our solution for this. It combines static Model2Vec embeddings (using our latest static model: potion-code-16M) with BM25, fused via RRF and reranked with code-aware signals. Everything runs on CPU since there's no transformers involved. On our benchmark of ~1250 query/document pairs across 63 repos and 19 languages, it uses 98% fewer tokens than grep+read and reaches 99% of the retrieval quality of a 137M-parameter code-trained transformer, while being ~200x faster.<p>Main features:<p>- Token-efficient: 98% fewer tokens than grep+read<p>- Fast: ~250ms to index a typical repo on our benchmark, ~1.5ms per query on CPU (very large repos may take longer)<p>- Accurate: 0.854 NDCG@10, 99% of the best transformer setup we tested<p>- MCP server: drop-in for Claude Code, Cursor, Codex, OpenCode<p>- Zero config: no API keys, no GPU, no external services<p>Install in Claude Code with:
claude mcp add semble -s user -- uvx --from "semble[mcp]" semble<p>Or check our README for other installation instructions, benchmarks, and methodology:<p>Semble: <a href="https://github.com/MinishLab/semble" rel="nofollow">https://github.com/MinishLab/semble</a><p>Benchmarks: <a href="https://github.com/MinishLab/semble/tree/main/benchmarks" rel="nofollow">https://github.com/MinishLab/semble/tree/main/benchmarks</a><p>Model: <a href="https://huggingface.co/minishlab/potion-code-16M" rel="nofollow">https://huggingface.co/minishlab/potion-code-16M</a><p>Let us know if you have any feedback or questions!
Show HN: Watch a neural net learn to play Snake
In browser PPO training demo, made possible by tinygrad: TinyJit -> WebGPU kernels.<p>Requires WebGPU.
Show HN: Find the best local LLM for your hardware, ranked by benchmarks
Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model
Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.<p>We were always frustrated by the little effort made towards building agentic models that run on budget phones, so we conducted investigations that led to an observation: agentic experiences are built upon tool calling, and massive models are overkill for it. Tool calling is fundamentally retrieval-and-assembly (match query to tool name, extract argument values, emit JSON), not reasoning. Cross-attention is the right primitive for this, and FFN parameters are wasted at this scale.<p>Simple Attention Networks: the entire model is just attention and gating, no MLPs anywhere. Needle is an experimental run for single-shot function calling for consumer devices (phones, watches, glasses...).<p>Training:
- Pretrained on 200B tokens across 16 TPU v6e (27 hours)
- Post-trained on 2B tokens of synthesized function-calling data (45 minutes)
- Dataset synthesized via Gemini with 15 tool categories (timers, messaging, navigation, smart home, etc.)<p>You can test it right now and finetune on your Mac/PC: <a href="https://github.com/cactus-compute/needle" rel="nofollow">https://github.com/cactus-compute/needle</a><p>The full writeup on the architecture is here: <a href="https://github.com/cactus-compute/needle/blob/main/docs/simple_attention_networks.md" rel="nofollow">https://github.com/cactus-compute/needle/blob/main/docs/simp...</a><p>We found that the "no FFN" finding generalizes beyond function calling to any task where the model has access to external structured knowledge (RAG, tool use, retrieval-augmented generation). The model doesn't need to memorize facts in FFN weights if the facts are provided in the input. Experimental results to published.<p>While it beats FunctionGemma-270M, Qwen-0.6B, Granite-350M, LFM2.5-350M on single-shot function calling, those models have more scope/capacity and excel in conversational settings. We encourage you to test on your own tools via the playground and finetune accordingly.<p>This is part of our broader work on Cactus (<a href="https://github.com/cactus-compute/cactus" rel="nofollow">https://github.com/cactus-compute/cactus</a>), an inference engine built from scratch for mobile, wearables and custom hardware. We wrote about Cactus here previously: <a href="https://news.ycombinator.com/item?id=44524544">https://news.ycombinator.com/item?id=44524544</a><p>Everything is MIT licensed. Weights: <a href="https://huggingface.co/Cactus-Compute/needle" rel="nofollow">https://huggingface.co/Cactus-Compute/needle</a>
GitHub: <a href="https://github.com/cactus-compute/needle" rel="nofollow">https://github.com/cactus-compute/needle</a>
Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model
Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.<p>We were always frustrated by the little effort made towards building agentic models that run on budget phones, so we conducted investigations that led to an observation: agentic experiences are built upon tool calling, and massive models are overkill for it. Tool calling is fundamentally retrieval-and-assembly (match query to tool name, extract argument values, emit JSON), not reasoning. Cross-attention is the right primitive for this, and FFN parameters are wasted at this scale.<p>Simple Attention Networks: the entire model is just attention and gating, no MLPs anywhere. Needle is an experimental run for single-shot function calling for consumer devices (phones, watches, glasses...).<p>Training:
- Pretrained on 200B tokens across 16 TPU v6e (27 hours)
- Post-trained on 2B tokens of synthesized function-calling data (45 minutes)
- Dataset synthesized via Gemini with 15 tool categories (timers, messaging, navigation, smart home, etc.)<p>You can test it right now and finetune on your Mac/PC: <a href="https://github.com/cactus-compute/needle" rel="nofollow">https://github.com/cactus-compute/needle</a><p>The full writeup on the architecture is here: <a href="https://github.com/cactus-compute/needle/blob/main/docs/simple_attention_networks.md" rel="nofollow">https://github.com/cactus-compute/needle/blob/main/docs/simp...</a><p>We found that the "no FFN" finding generalizes beyond function calling to any task where the model has access to external structured knowledge (RAG, tool use, retrieval-augmented generation). The model doesn't need to memorize facts in FFN weights if the facts are provided in the input. Experimental results to published.<p>While it beats FunctionGemma-270M, Qwen-0.6B, Granite-350M, LFM2.5-350M on single-shot function calling, those models have more scope/capacity and excel in conversational settings. We encourage you to test on your own tools via the playground and finetune accordingly.<p>This is part of our broader work on Cactus (<a href="https://github.com/cactus-compute/cactus" rel="nofollow">https://github.com/cactus-compute/cactus</a>), an inference engine built from scratch for mobile, wearables and custom hardware. We wrote about Cactus here previously: <a href="https://news.ycombinator.com/item?id=44524544">https://news.ycombinator.com/item?id=44524544</a><p>Everything is MIT licensed. Weights: <a href="https://huggingface.co/Cactus-Compute/needle" rel="nofollow">https://huggingface.co/Cactus-Compute/needle</a>
GitHub: <a href="https://github.com/cactus-compute/needle" rel="nofollow">https://github.com/cactus-compute/needle</a>
Show HN: TikTok but for scientific papers
Show HN: An index of indie web/blog indexes
I saw a comment here about how there are so many indexes of indie sites, blogs, etc but there wasn't an index of all the indexes. So I built it. It doesn't require a log in, just go browse! I've curated about 30 or so, but there is a submission form if there are ones I am missing.<p>Also happy to take UI improvements because I am not great in that area!
Show HN: Rust but Lisp
Show HN: Building a web server in assembly to give my life (a lack of) meaning
This is ymawky, a static file web server for MacOS written entirely in ARM64 assembly. It supports GET, PUT, DELETE, HEAD, and OPTIONS requests, and supports Range: bytes=X-Y headers (which allows scrubbing for video streaming). It decodes percent-encoded URLs, strictly enforces docroot, serves custom error pages for any HTTP error response, supports directory listing, and has (some) mitigations against slowloris-like attacks.<p>I’ve also written a more detailed writeup here: <a href="https://imtomt.github.io/ymawky/" rel="nofollow">https://imtomt.github.io/ymawky/</a>
Show HN: I made a Clojure-like language in Go, boots in 7ms
Let-go is a Clojure-like language (~90% compatible with JVM Clojure) written in pure Go. It ships as a ~10MB static binary and cold boots in ~7ms - that's about 50x faster than JVM and 3x faster than Babashka. It has decent throughput on algorithmic workloads - within ballpark of the GraalVM-backed sci.<p>I started this project in 2021 as an elaborate practical joke: I wanted to have an excuse for writing Clojure while pretending to write Go.<p>Jokes aside, it turned out to be pretty decent: it feels like real Clojure, it has an nREPL server (supported in Calva, CIDER, etc.), it's easily embeddable in your Go programs (funcs, structs and channels cross the boundary without fuss). It's good for writing CLIs, web servers, data processing scripts and even doing some systems programming - I used it to write a deamonless container runtime. Oh, and it runs on Plan9.<p>Under the hood there is a fairly simple compiler and a stack VM, both handcrafted specifically for running Clojure-like code. The compiler can work in AOT mode producing portable bytecode blobs and standalone binaries (runtime+bytecode).<p>This is not a drop-in replacement for Clojure in general - it does not load JARs, it does not have all Java APIs and it most probably won't run your exiting Clojure projects without modifications. At least not at the moment.<p>Take it for a spin, tell me what you think. Issues and PRs are welcome!
Show HN: TRUST – Coding Rust like it's 1989