The best Hacker News stories from Show from the past week
Latest posts:
Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview
Scored 65.2% vs google's official 47.8%, and the existing top closed source model Junie CLI's 64.3%.<p>Since there are a lot of reports of deliberate cheating on TerminalBench 2.0 lately (<a href="https://debugml.github.io/cheating-agents/" rel="nofollow">https://debugml.github.io/cheating-agents/</a>), I would like to also clarify a few things<p>1. Absolutely no {agents/skills}.md files were inserted at any point. No cheating mechanisms whatsoever<p>2. The cli agent was run in leaderboard compliant way (no modification of resources or timeouts)<p>3. The full terminal bench run was done using the fully open source version of the agent, no difference between what is on github and what was run.<p>I was originally going to wait for it to land on the leaderboard, but it has been 8 days and the maintainers do not respond unfortunately (there is a large backlog of the pull requests on their HF) so I decided to post anyways.<p>HF PR: <a href="https://huggingface.co/datasets/harborframework/terminal-bench-2-leaderboard/discussions/145" rel="nofollow">https://huggingface.co/datasets/harborframework/terminal-ben...</a><p>It is astounding how much the harness matters, based on this and other experiments I have done.
Show HN: Free textbook on engineering thermodynamics
Author here. Feel free to send questions of any kind.
Show HN: Turning a Gaussian Splat into a videogame
Show HN: Turning a Gaussian Splat into a videogame
Show HN: A Karpathy-style LLM wiki your agents maintain (Markdown and Git)
I shipped a wiki layer for AI agents that uses markdown + git as the source of truth, with a bleve (BM25) + SQLite index on top. No vector or graph db yet.<p>It runs locally in ~/.wuphf/wiki/ and you can git clone it out if you want to take your knowledge with you.<p>The shape is the one Karpathy has been circling for a while: an LLM-native knowledge substrate that agents both read from and write into, so context compounds across sessions rather than getting re-pasted every morning. Most implementations of that idea land on Postgres, pgvector, Neo4j, Kafka, and a dashboard.<p>I wanted to go back to the basics and see how far markdown + git could go before I added anything heavier.<p>What it does:
-> Each agent gets a private notebook at agents/{slug}/notebook/.md, plus access to a shared team wiki at team/.<p>-> Draft-to-wiki promotion flow. Notebook entries are reviewed (agent or human) and promoted to the canonical wiki with a back-link. A small state machine drives expiry and auto-archive.<p>-> Per-entity fact log: append-only JSONL at team/entities/{kind}-{slug}.facts.jsonl. A synthesis worker rebuilds the entity brief every N facts. Commits land under a distinct "Pam the Archivist" git identity so provenance is visible in git log.<p>-> [[Wikilinks]] with broken-link detection rendered in red.<p>-> Daily lint cron for contradictions, stale entries, and broken wikilinks.<p>-> /lookup slash command plus an MCP tool for cited retrieval. A heuristic classifier routes short lookups to BM25 and narrative queries to a cited-answer loop.<p>Substrate choices:
Markdown for durability. The wiki outlives the runtime, and a user can walk away with every byte. Bleve for BM25. SQLite for structured metadata (facts, entities, edges, redirects, and supersedes). No vectors yet. The current benchmark (500 artifacts, 50 queries) clears 85% recall@20 on BM25 alone, which is the internal ship gate. sqlite-vec is the pre-committed fallback if a query class drops below that.<p>Canonical IDs are first-class. Fact IDs are deterministic and include sentence offset. Canonical slugs are assigned once, merged via redirect stubs, and never renamed. A rebuild is logically identical, not byte-identical.<p>Known limits:
-> Recall tuning is ongoing. 85% on the benchmark is not a universal guarantee.<p>-> Synthesis quality is bounded by agent observation quality. Garbage facts in, garbage briefs out. The lint pass helps. It is not a judgment engine.<p>-> Single-office scope today. No cross-office federation.<p>Demo. 5-minute terminal walkthrough that records five facts, fires synthesis, shells out to the user's LLM CLI, and commits the result under Pam's identity: <a href="https://asciinema.org/a/vUvjJsB5vtUQQ4Eb" rel="nofollow">https://asciinema.org/a/vUvjJsB5vtUQQ4Eb</a><p>Script lives at ./scripts/demo-entity-synthesis.sh.<p>Context. The wiki ships as part of WUPHF, an open source collaborative office for AI agents like Claude Code, Codex, OpenClaw, and local LLMs via OpenCode. MIT, self-hosted, bring-your-own keys. You do not have to use the full office to use the wiki layer. If you already have an agent setup, point WUPHF at it and the wiki attaches.<p>Source: <a href="https://github.com/nex-crm/wuphf" rel="nofollow">https://github.com/nex-crm/wuphf</a><p>Install: npx wuphf@latest<p>Happy to go deep on the substrate tradeoffs, the promotion-flow state machine, the BM25-first retrieval bet, or the canonical-ID stability rules. Also happy to take "why not an Obsidian vault with a plugin" as a fair question.
Show HN: A Karpathy-style LLM wiki your agents maintain (Markdown and Git)
I shipped a wiki layer for AI agents that uses markdown + git as the source of truth, with a bleve (BM25) + SQLite index on top. No vector or graph db yet.<p>It runs locally in ~/.wuphf/wiki/ and you can git clone it out if you want to take your knowledge with you.<p>The shape is the one Karpathy has been circling for a while: an LLM-native knowledge substrate that agents both read from and write into, so context compounds across sessions rather than getting re-pasted every morning. Most implementations of that idea land on Postgres, pgvector, Neo4j, Kafka, and a dashboard.<p>I wanted to go back to the basics and see how far markdown + git could go before I added anything heavier.<p>What it does:
-> Each agent gets a private notebook at agents/{slug}/notebook/.md, plus access to a shared team wiki at team/.<p>-> Draft-to-wiki promotion flow. Notebook entries are reviewed (agent or human) and promoted to the canonical wiki with a back-link. A small state machine drives expiry and auto-archive.<p>-> Per-entity fact log: append-only JSONL at team/entities/{kind}-{slug}.facts.jsonl. A synthesis worker rebuilds the entity brief every N facts. Commits land under a distinct "Pam the Archivist" git identity so provenance is visible in git log.<p>-> [[Wikilinks]] with broken-link detection rendered in red.<p>-> Daily lint cron for contradictions, stale entries, and broken wikilinks.<p>-> /lookup slash command plus an MCP tool for cited retrieval. A heuristic classifier routes short lookups to BM25 and narrative queries to a cited-answer loop.<p>Substrate choices:
Markdown for durability. The wiki outlives the runtime, and a user can walk away with every byte. Bleve for BM25. SQLite for structured metadata (facts, entities, edges, redirects, and supersedes). No vectors yet. The current benchmark (500 artifacts, 50 queries) clears 85% recall@20 on BM25 alone, which is the internal ship gate. sqlite-vec is the pre-committed fallback if a query class drops below that.<p>Canonical IDs are first-class. Fact IDs are deterministic and include sentence offset. Canonical slugs are assigned once, merged via redirect stubs, and never renamed. A rebuild is logically identical, not byte-identical.<p>Known limits:
-> Recall tuning is ongoing. 85% on the benchmark is not a universal guarantee.<p>-> Synthesis quality is bounded by agent observation quality. Garbage facts in, garbage briefs out. The lint pass helps. It is not a judgment engine.<p>-> Single-office scope today. No cross-office federation.<p>Demo. 5-minute terminal walkthrough that records five facts, fires synthesis, shells out to the user's LLM CLI, and commits the result under Pam's identity: <a href="https://asciinema.org/a/vUvjJsB5vtUQQ4Eb" rel="nofollow">https://asciinema.org/a/vUvjJsB5vtUQQ4Eb</a><p>Script lives at ./scripts/demo-entity-synthesis.sh.<p>Context. The wiki ships as part of WUPHF, an open source collaborative office for AI agents like Claude Code, Codex, OpenClaw, and local LLMs via OpenCode. MIT, self-hosted, bring-your-own keys. You do not have to use the full office to use the wiki layer. If you already have an agent setup, point WUPHF at it and the wiki attaches.<p>Source: <a href="https://github.com/nex-crm/wuphf" rel="nofollow">https://github.com/nex-crm/wuphf</a><p>Install: npx wuphf@latest<p>Happy to go deep on the substrate tradeoffs, the promotion-flow state machine, the BM25-first retrieval bet, or the canonical-ID stability rules. Also happy to take "why not an Obsidian vault with a plugin" as a fair question.
Show HN: I've built a nice home server OS
ohai!<p>I've released Lightwhale 3, which is possibly the easiest way to self-host Docker containers.<p>It's a free, immutable Linux system purpose-built to live-boot straight into a working Docker Engine, thereby shortcutting the need for installation, configuration, and maintenance. Its simple design makes it easy to learn, and its low memory footprint should make it especially attractive during these times of RAMageddon.<p>If this has piqued your interest, do check it out, along with its easy-to-follow Getting Started guide.<p>In any event,
have a nice day! =)
Show HN: How LLMs Work – Interactive visual guide based on Karpathy's lecture
All content is based on Andrej Karpathy's "Intro to Large Language Models" lecture (youtube.com/watch?v=7xTGNNLPyMI). I downloaded the transcript and used Claude Code to generate the entire interactive site from it — single HTML file. I find it useful to revisit this content time to time.
Show HN: How LLMs Work – Interactive visual guide based on Karpathy's lecture
All content is based on Andrej Karpathy's "Intro to Large Language Models" lecture (youtube.com/watch?v=7xTGNNLPyMI). I downloaded the transcript and used Claude Code to generate the entire interactive site from it — single HTML file. I find it useful to revisit this content time to time.
Show HN: Tolaria – Open-source macOS app to manage Markdown knowledge bases
Hey there! I am Luca, I write <a href="https://refactoring.fm/" rel="nofollow">https://refactoring.fm/</a> and I built Tolaria for myself to manage my own knowledge base (10K notes, 300+ articles written in over 6 years of newslettering) and work well with AI.<p>Tolaria is offline-first, file-based, has first-class support for git, and has strong opinions about how you should organize notes (types, relationships, etc).<p>Let me know your thoughts!
Show HN: Tolaria – Open-source macOS app to manage Markdown knowledge bases
Hey there! I am Luca, I write <a href="https://refactoring.fm/" rel="nofollow">https://refactoring.fm/</a> and I built Tolaria for myself to manage my own knowledge base (10K notes, 300+ articles written in over 6 years of newslettering) and work well with AI.<p>Tolaria is offline-first, file-based, has first-class support for git, and has strong opinions about how you should organize notes (types, relationships, etc).<p>Let me know your thoughts!
Show HN: Honker – Postgres NOTIFY/LISTEN Semantics for SQLite
Show HN: Honker – Postgres NOTIFY/LISTEN Semantics for SQLite
Scoring Show HN submissions for AI design patterns
Show HN: GoModel – an open-source AI gateway in Go
Hi, I’m Jakub, a solo founder based in Warsaw.<p>I’ve been building GoModel since December with a couple of contributors. It's an open-source AI gateway that sits between your app and model providers like OpenAI, Anthropic or others.<p>I built it for my startup to solve a few problems:<p><pre><code> - track AI usage and cost per client or team
- switch models without changing app code
- debug request flows more easily
- reduce AI spendings with exact and semantic caching
</code></pre>
How is it different?<p><pre><code> - ~17MB docker image
- LiteLLM's image is more than 44x bigger ("docker.litellm.ai/berriai/litellm:latest" ~ 746 MB on amd64)
- request workflow is visible and easy to inspect
- config is environment-variable-first by default
</code></pre>
I'm posting now partly because of the recent LiteLLM supply-chain attack. Their team handled it impressively well, but some people are looking at alternatives anyway, and GoModel is one.<p>Website: <a href="https://gomodel.enterpilot.io" rel="nofollow">https://gomodel.enterpilot.io</a><p>Any feedback is appreciated.
Show HN: GoModel – an open-source AI gateway in Go
Hi, I’m Jakub, a solo founder based in Warsaw.<p>I’ve been building GoModel since December with a couple of contributors. It's an open-source AI gateway that sits between your app and model providers like OpenAI, Anthropic or others.<p>I built it for my startup to solve a few problems:<p><pre><code> - track AI usage and cost per client or team
- switch models without changing app code
- debug request flows more easily
- reduce AI spendings with exact and semantic caching
</code></pre>
How is it different?<p><pre><code> - ~17MB docker image
- LiteLLM's image is more than 44x bigger ("docker.litellm.ai/berriai/litellm:latest" ~ 746 MB on amd64)
- request workflow is visible and easy to inspect
- config is environment-variable-first by default
</code></pre>
I'm posting now partly because of the recent LiteLLM supply-chain attack. Their team handled it impressively well, but some people are looking at alternatives anyway, and GoModel is one.<p>Website: <a href="https://gomodel.enterpilot.io" rel="nofollow">https://gomodel.enterpilot.io</a><p>Any feedback is appreciated.
Show HN: VidStudio, a browser based video editor that doesn't upload your files
Hi HN,
I built VidStudio, a privacy focused video editor that runs in the browser. I tried to keep it as frictionless as possible, so there are no accounts and no uploads. Everything is persisted on your
machine.<p>Some of the features: multi-track timeline, frame accurate seek, MP4 export, audio, video, image, and text tracks, and a WebGL backed canvas where available. It also works on mobile.<p>Under the hood, WebCodecs handles frame decode for timeline playback and scrubbing, which is what makes seeking responsive since decode runs on the hardware decoder when the browser supports it.
FFmpeg compiled to WebAssembly handles final encode, format conversion, and anything WebCodecs does not cover. Rendering goes through Pixi.js on a WebGL canvas, with a software fallback when WebGL is
not available. Projects live in IndexedDB and the heavy work runs in Web Workers so the UI stays responsive during exports.<p>Happy to answer technical questions about the tradeoffs involved in keeping the whole pipeline client-side. Any feedback welcome.<p>Link: <a href="https://vidstudio.app/video-editor" rel="nofollow">https://vidstudio.app/video-editor</a>
Sauna effect on heart rate
Show HN: I made a calculator that works over disjoint sets of intervals
I've been studying interval arithmetic for the past few weeks and it's a really interesting field because while there is a ton of super interesting research published over the past decades, it has never really gotten the recognition that it deserves, IMO.<p>One reason for this is that standard interval arithmetic has really poor handling of division by intervals containing zero. If you compute 1 / [-1, 2] in regular interval arithmetic, you get either [-∞, +∞], or you have to say that the operation is undefined. Both solutions are virtually useless. The real answer of course is [-∞, -1] U [0.5, +∞]: i.e. a union of two disjoint intervals.<p>This is useful because you can confidently exclude a non empty set of the real numbers ([-1, 0.5]) from the set of possible values that you can get by dividing 1 by a number between -1 and 2.<p>But this definition of interval division yields a value that is not an interval. This is a problem if you want to define a closed arithmetic system, where you can build and evaluate arbitrary expression over interval values.<p>(This behavior extends to any non continuous function like tan() for example, which is implemented in my project - not without difficulties!)<p>Well the obvious solution is to define your arithmetic over disjoint unions of intervals. This is the subject of a 2017 paper called "Interval Unions" by by Schichl, H., Domes, F., Montanher, T. and Kofler, K..<p>This open-source project I made implements interval union arithmetic in TypeScript in the form of a simple interactive calculator, so you can try it out for yourself! The underlying TypeScript library is dependency free and implements interval union arithmetic over IEEE 754 double precision floats (JS native number type) with outward rounding. This guarantees accuracy of interval results in the presence of rounding issue inherent to floating point.
Show HN: I made a calculator that works over disjoint sets of intervals
I've been studying interval arithmetic for the past few weeks and it's a really interesting field because while there is a ton of super interesting research published over the past decades, it has never really gotten the recognition that it deserves, IMO.<p>One reason for this is that standard interval arithmetic has really poor handling of division by intervals containing zero. If you compute 1 / [-1, 2] in regular interval arithmetic, you get either [-∞, +∞], or you have to say that the operation is undefined. Both solutions are virtually useless. The real answer of course is [-∞, -1] U [0.5, +∞]: i.e. a union of two disjoint intervals.<p>This is useful because you can confidently exclude a non empty set of the real numbers ([-1, 0.5]) from the set of possible values that you can get by dividing 1 by a number between -1 and 2.<p>But this definition of interval division yields a value that is not an interval. This is a problem if you want to define a closed arithmetic system, where you can build and evaluate arbitrary expression over interval values.<p>(This behavior extends to any non continuous function like tan() for example, which is implemented in my project - not without difficulties!)<p>Well the obvious solution is to define your arithmetic over disjoint unions of intervals. This is the subject of a 2017 paper called "Interval Unions" by by Schichl, H., Domes, F., Montanher, T. and Kofler, K..<p>This open-source project I made implements interval union arithmetic in TypeScript in the form of a simple interactive calculator, so you can try it out for yourself! The underlying TypeScript library is dependency free and implements interval union arithmetic over IEEE 754 double precision floats (JS native number type) with outward rounding. This guarantees accuracy of interval results in the presence of rounding issue inherent to floating point.