The best Hacker News stories from Show from the past week

Latest posts:

Show HN: Watch a neural net learn to play Snake

In-browser PPO training demo, made possible by tinygrad: TinyJit -> WebGPU kernels.

Requires WebGPU.

Show HN: Find the best local LLM for your hardware, ranked by benchmarks

Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model

Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.

We were always frustrated by how little effort goes into building agentic models that run on budget phones, so we ran investigations that led to an observation: agentic experiences are built upon tool calling, and massive models are overkill for it. Tool calling is fundamentally retrieval-and-assembly (match query to tool name, extract argument values, emit JSON), not reasoning. Cross-attention is the right primitive for this, and FFN parameters are wasted at this scale.

Simple Attention Networks: the entire model is just attention and gating, no MLPs anywhere. Needle is an experimental run for single-shot function calling for consumer devices (phones, watches, glasses...).

Training:
- Pretrained on 200B tokens across 16 TPU v6e (27 hours)
- Post-trained on 2B tokens of synthesized function-calling data (45 minutes)
- Dataset synthesized via Gemini with 15 tool categories (timers, messaging, navigation, smart home, etc.)

You can test it right now and finetune on your Mac/PC: https://github.com/cactus-compute/needle

The full writeup on the architecture is here: https://github.com/cactus-compute/needle/blob/main/docs/simple_attention_networks.md

We found that the "no FFN" finding generalizes beyond function calling to any task where the model has access to external structured knowledge (RAG, tool use). The model doesn't need to memorize facts in FFN weights if the facts are provided in the input. Experimental results to be published.

While it beats FunctionGemma-270M, Qwen-0.6B, Granite-350M, and LFM2.5-350M on single-shot function calling, those models have more scope/capacity and excel in conversational settings. We encourage you to test on your own tools via the playground and finetune accordingly.

This is part of our broader work on Cactus (https://github.com/cactus-compute/cactus), an inference engine built from scratch for mobile, wearables and custom hardware. We wrote about Cactus here previously: https://news.ycombinator.com/item?id=44524544

Everything is MIT licensed.
Weights: https://huggingface.co/Cactus-Compute/needle
GitHub: https://github.com/cactus-compute/needle
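
As a toy illustration of the retrieval-and-assembly framing above (match query to tool name, extract argument values, emit JSON), here is a minimal Python sketch. It is not Needle's architecture or API: the tool registry, keyword sets, regex patterns, and the `call_tool` helper are all hypothetical, and simple keyword overlap stands in for what the model learns with attention.

```python
import json
import re

# Hypothetical tool registry; names, keywords, and patterns are illustrative only.
TOOLS = {
    "set_timer": {
        "keywords": {"timer", "countdown"},
        "args": {"minutes": r"(\d+)\s*min"},
    },
    "send_message": {
        "keywords": {"message", "text", "tell"},
        "args": {"recipient": r"(?:to|tell)\s+(\w+)"},
    },
}


def call_tool(query: str) -> str:
    """Return a JSON function call for a single-shot query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    # Retrieval: pick the tool whose keywords overlap the query the most.
    name = max(TOOLS, key=lambda t: len(TOOLS[t]["keywords"] & words))
    # Assembly: copy argument values straight out of the query text.
    arguments = {}
    for arg, pattern in TOOLS[name]["args"].items():
        match = re.search(pattern, query, flags=re.IGNORECASE)
        if match:
            value = match.group(1)
            arguments[arg] = int(value) if value.isdecimal() else value
    return json.dumps({"name": name, "arguments": arguments})


print(call_tool("Set a timer for 10 minutes"))
```

Nothing in this sketch requires stored world knowledge: both the tool name and the argument values come from the inputs, which is the property the post argues makes FFN parameters unnecessary for this task.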

Show HN: TikTok but for scientific papers

Show HN: An index of indie web/blog indexes

I saw a comment here about how there are so many indexes of indie sites, blogs, etc., but no index of all the indexes. So I built it. It doesn't require a login, just go browse! I've curated about 30 so far, and there is a submission form for any I am missing.

I'm also happy to take UI improvements because I am not great in that area!

Show HN: Rust but Lisp

Show HN: Building a web server in assembly to give my life (a lack of) meaning

This is ymawky, a static file web server for macOS written entirely in ARM64 assembly. It supports GET, PUT, DELETE, HEAD, and OPTIONS requests, as well as Range: bytes=X-Y headers (which allow scrubbing for video streaming). It decodes percent-encoded URLs, strictly enforces docroot, serves custom error pages for any HTTP error response, supports directory listing, and has (some) mitigations against slowloris-like attacks.

I’ve also written a more detailed writeup here: https://imtomt.github.io/ymawky/
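
For readers unfamiliar with the features listed above, here is a rough Python sketch of three of them: parsing a single `Range: bytes=X-Y` header, decoding a percent-encoded URL path, and refusing paths that escape the docroot. This is not ymawky's code (which is ARM64 assembly); the function names `parse_range` and `resolve` are hypothetical, and suffix ranges (`bytes=-N`) and multi-range requests are deliberately left out.

```python
import os
from urllib.parse import unquote


def parse_range(header: str, size: int):
    """Parse 'bytes=X-Y' or 'bytes=X-'; return inclusive (start, end) or None."""
    if not header.startswith("bytes="):
        return None
    first, sep, last = header[len("bytes="):].partition("-")
    if not sep or not first.isdecimal():
        return None  # suffix ranges are out of scope for this sketch
    start = int(first)
    if last == "":
        end = size - 1  # open-ended range: serve to the end of the file
    elif last.isdecimal():
        end = int(last)
    else:
        return None  # multi-range or malformed value
    if start > end or start >= size:
        return None  # unsatisfiable; a server would answer 416 here
    return start, min(end, size - 1)


def resolve(docroot: str, url_path: str):
    """Map a percent-encoded URL path to a file path inside docroot, or None."""
    root = os.path.realpath(docroot)
    decoded = unquote(url_path)  # e.g. '%20' becomes a space
    candidate = os.path.realpath(os.path.join(root, decoded.lstrip("/")))
    # Reject anything that resolves outside the docroot (e.g. via '..' or symlinks).
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate
```

Decoding before the docroot check matters: an encoded `%2e%2e` only becomes `..` after decoding, so checking the raw path first would miss the traversal.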

Show HN: I made a Clojure-like language in Go, boots in 7ms

Let-go is a Clojure-like language (~90% compatible with JVM Clojure) written in pure Go. It ships as a ~10MB static binary and cold boots in ~7ms, which is about 50x faster than the JVM and 3x faster than Babashka. It has decent throughput on algorithmic workloads, within the ballpark of the GraalVM-backed sci.

I started this project in 2021 as an elaborate practical joke: I wanted to have an excuse for writing Clojure while pretending to write Go.

Jokes aside, it turned out to be pretty decent: it feels like real Clojure, it has an nREPL server (supported in Calva, CIDER, etc.), and it's easily embeddable in your Go programs (funcs, structs and channels cross the boundary without fuss). It's good for writing CLIs, web servers, data processing scripts and even doing some systems programming: I used it to write a daemonless container runtime. Oh, and it runs on Plan9.

Under the hood there is a fairly simple compiler and a stack VM, both handcrafted specifically for running Clojure-like code. The compiler can work in AOT mode, producing portable bytecode blobs and standalone binaries (runtime+bytecode).

This is not a drop-in replacement for Clojure in general: it does not load JARs, it does not have all Java APIs, and it most probably won't run your existing Clojure projects without modifications. At least not at the moment.

Take it for a spin, tell me what you think. Issues and PRs are welcome!

Show HN: TRUST – Coding Rust like it's 1989

Show HN: State of the Art of Coding Models, According to Hacker News Commenters

Hello HN,

I was away from my computer for two weeks, and after coming back and reading the latest discussions on HN about coding assistants (models, harnesses), I felt very out of the loop. My normal process would have been to keep reading and figure out the latest and greatest from people's comments, but I wanted to try to automate this process.

Basically, the goal is to get a quick overview of which coding models are popular on HN. A next iteration could also scan for harnesses that people use, or info on self-hosting or hardware setups.

I wrote a short intro on the page about the pipeline that collects and analyzes the data, but feel free to ask for more details or check the Google Sheet for more info.

https://hnup.date/hn-sota

Show HN: GhostBox – Borrow a disposable little machine from the Global Free Tier

I built this because I was always creating machines on GitHub Actions to test builds on different OSes, and I wanted a tight CLI that could do it. I always saw Actions as a great resource, and ephemeral machines you could do dev work in were a natural way for me to work, so this grew out of that workflow.

I didn't expect it to blow up, so it wasn't 100% finished when I posted it. But it should stabilize pretty quickly.

Happy to hear what you think and talk about it.

Show HN: WhatCable, a tiny menu bar app for inspecting USB-C cables

USB-C cables can be a mess. One cable charges at 5W, another does 100W and Thunderbolt 4, and they look identical in the drawer.

WhatCable sits in your menu bar and reads the cable data your Mac already has access to. Plug in a cable and it tells you in plain English what it can actually do: charging wattage, data speed, display support, Thunderbolt, etc.

Built in Swift/SwiftUI. Open source, free, no tracking.

GitHub: https://github.com/darrylmorley/whatcable

Show HN: Rip.so – a graveyard for dead internet things

Show HN: Auto-Architecture: Karpathy's Loop, pointed at a CPU

Show HN: Drive any macOS app in the background without stealing the cursor

Hi HN, Francesco from Cua here. I hacked this project together last weekend, inspired by the Codex Computer-Use release and lessons learned from deploying GUI-operating agents for our customers.

The main problem: when a UI automation process controls a desktop app today, it usually takes over the human’s session. Your cursor moves, keyboard focus gets stolen, windows jump to the front, and you have to stop working until the agent is done. That is why we have historically avoided encouraging users to run these processes directly on their host machine, instead relying on VMs or GUI containers for concurrency and background execution.

But computer-use (the tools we give agents to operate computers like humans) does not scale cleanly that way. As models get smarter, agents need to share hosts safely, run in the background, and avoid collisions with the human or other agents using the same machine.

We realized macOS has no first-class API for "drive this app without touching the cursor". CGEventPost routes through the hardware input stream, so it moves your cursor. CGEvent.postToPid avoids the cursor warp, but Chromium treats those events as untrusted and silently drops clicks at the renderer boundary. Activating the target app first raises the window and pulls focus, defeating the point of background execution.

Cua Driver is our attempt at a real fix: a background computer-use driver for macOS that lets an agent click, type, scroll, and read native apps while your cursor, frontmost app, and Space stay where they are. The default interface is a CLI, so it is easy to script or call from any coding agent shell.

Try it on macOS 14+:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"

The first internal use case was delegated demo recording. We ask Claude Code to drive an app while 'cua-driver recording start' captures the trajectory, screenshots, actions, and click markers. The result is an agent-generated product demo, inspired by Screen Studio.

Other things we have used it for:

- Replacing Vercel’s agent-browser and other browser-use CLIs. With Claude Code and Cua Driver, you do not need Chrome DevTools Protocol at all.
- A dev-loop QA agent that reproduces a visual bug, edits code, rebuilds, and verifies the UI while my editor stays frontmost.
- Personal-assistant flows that use iMessage from Claude Code, Hermes, or other general-purpose agent CLIs.
- Pulling visual context from Chrome, Figma, Preview, or YouTube windows I am not looking at, without relying on their APIs.

What made this harder than expected:

- CGEventPost warps the cursor because it goes through the HID stream.
- CGEvent.postToPid does not warp the cursor, but Chromium drops it at the renderer IPC boundary.
- Activating the target first raises the window and can drag you across Spaces.
- Without a private remote-aware SPI, Electron apps stop keeping useful AX trees alive when their windows are occluded.

The unlock was SkyLight. SLEventPostToPid is a sibling of the public per-PID call, but it travels through a WindowServer channel Chromium accepts as trusted. Pair it with yabai’s focus-without-raise pattern, plus an off-screen primer click at (-1, -1), and the click lands without the window ever raising.

One thing we learned: the right addressing mode depends on the app. Native macOS apps usually have rich AX trees, Chromium-family apps often need a hybrid of AX and screenshots, and apps like Blender or CAD tools may expose almost no useful AX surface. The mistake is defaulting to pixels everywhere, or defaulting to AX everywhere.

Long technical writeup: https://github.com/trycua/cua/blob/main/blog/inside-macos-window-internals.md

I would like feedback from people building Mac automation, agent harnesses, or accessibility tooling. If it breaks on a macOS app you care about, that is useful data for us.

Show HN: Live Sun and Moon Dashboard with NASA Footage

Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview

Scored 65.2% vs Google's official 47.8%, and vs the existing top closed-source model, Junie CLI, at 64.3%.

Since there have been a lot of reports of deliberate cheating on TerminalBench 2.0 lately (https://debugml.github.io/cheating-agents/), I would also like to clarify a few things:

1. Absolutely no {agents/skills}.md files were inserted at any point. No cheating mechanisms whatsoever.
2. The CLI agent was run in a leaderboard-compliant way (no modification of resources or timeouts).
3. The full TerminalBench run was done using the fully open-source version of the agent; there is no difference between what is on GitHub and what was run.

I was originally going to wait for it to land on the leaderboard, but it has been 8 days and the maintainers have unfortunately not responded (there is a large backlog of pull requests on their HF), so I decided to post anyway.

HF PR: https://huggingface.co/datasets/harborframework/terminal-bench-2-leaderboard/discussions/145

It is astounding how much the harness matters, based on this and other experiments I have done.
