The best Hacker News stories from Show from the past day

Latest posts:

Show HN: I nerfed our coding agents on purpose

Tl;dr: I trained a classifier to route each request to the least expensive model and reasoning depth that can complete it. Coupling that with additional automated token efficiency techniques has yielded 3x usage for the same spend. For anyone interested in trying it themselves: <a href="https://nerfguard.com" rel="nofollow">https://nerfguard.com</a><p>Various teammates and I switched over to Codex from Claude Code recently. We still bounce between the tools, but Codex’s speed and steerability coupled with performance gains were hard to ignore. One of the downsides was that the per-token pricing kicked in way sooner. This is happening across the board, but we felt it more acutely in Codex. We’re a startup filled with people who work around the clock and are obsessed with building, so naturally our <i>daily</i> bill alone was striking.<p>Luckily we’re going after a big mission, and speed matters significantly more than marginal token spend on the edges. Still, it got us thinking about how ludicrous it was that, while our product has a side effect of decreasing token spend and speeding up agentic workflows by many orders of magnitude, we were using these top-tier models for all types of internal coding tasks without any of those optimizations. The waste felt pretty ridiculous. The most glaring culprit was that we were seemingly using the max intelligence model on max reasoning for every task, even when the task clearly didn’t require it. As a company that spends a lot of time on cached intelligence, it was also easy for us to see that there was plenty of other low-hanging fruit.<p>So, on a recent weekend, I quickly built a tool to optimize our usage. At its core is a <i>very fast</i> classifier that maps each request to the least intelligence required for the task, with some nice token optimizations on top. The result is roughly the same quality for several times lower token spend. 
Even more exciting for us, the properly bin-packed intelligence and reasoning levels meant our speed also went up considerably. This wasn’t negligible.<p>We’ve observed up to 3x savings, plus hours per person per day that we would otherwise have spent waiting on tool turns and coding agent responses.<p>For us, that means improved engineering velocity and significantly higher usage for the same spend. It also means more usage before getting throttled.<p>As I told friends about this, they also wanted to start using it to maximize the usage they could get out of their coding agent plans. There are now engineers across many of the most cutting-edge AI companies using this tool to optimize their token utilization in this way, not just to save money but to maximize output. It turns out that the best way to avoid getting nerfed by Claude is to intentionally nerf yourself selectively. We decided to release it for the rest of the builder community to use as well. You can now turn on Nerfguard for yourself and start getting more usage today.
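
The routing idea described above can be sketched in a few lines. This is only an illustration, not Nerfguard's implementation: the post describes a trained classifier, whereas the stand-in below is a keyword heuristic, and the tier names, relative costs, and keyword lists are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str           # hypothetical tier label, not a real model name
    relative_cost: int  # made-up cost units, for illustration only


# Ordered cheapest first; the values are assumptions.
TIERS = [Tier("small", 1), Tier("medium", 5), Tier("large", 25)]

HARD_HINTS = ("architecture", "race condition", "design", "migrate")
MEDIUM_HINTS = ("refactor", "implement", "debug")


def estimate_difficulty(request: str) -> int:
    """Stand-in for a trained classifier: returns 0 (easy), 1, or 2 (hard)."""
    text = request.lower()
    if any(hint in text for hint in HARD_HINTS):
        return 2
    if any(hint in text for hint in MEDIUM_HINTS) or len(text) > 400:
        return 1
    return 0


def route(request: str) -> Tier:
    """Pick the cheapest tier whose level matches the estimated difficulty."""
    return TIERS[estimate_difficulty(request)]


if __name__ == "__main__":
    for prompt in ("rename this variable",
                   "refactor the parser module",
                   "debug a race condition in the scheduler"):
        print(f"{prompt!r} -> {route(prompt).name}")
```

The point of the sketch is the shape of the decision: a cheap, fast estimate runs first, and only requests that look hard pay for the most expensive tier.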

Show HN: ABC Classic 100 Rankings visualised

This weekend is the ABC Classic FM countdown, which prompted me to dust off an old unpublished data visualisation of rankings from previous years.<p>I've considered adding a search function, but I also kind of like that it requires a bit of exploration in its current form.<p>Some of the code is a bit clunky and I wouldn't mind refactoring it. I'm also not sure about browser compatibility, as I've only got access to a couple of devices to test it on.

Show HN: Soft Body Jiggle Physics

A simple and fundamental standard for soft body dynamics.

Show HN: Infinite canvas notes in the non-Euclidean Poincaré disk

Hi!<p>This is an infinite canvas note-taking tool where notes are laid out in a non-Euclidean, hyperbolic geometric space. As you drag and navigate through the view, you’ll experience a unique fluid distortion that naturally leverages your brain's spatial memory.<p>I’ve been obsessed with the concept of space in HCI for years. Many modern UI patterns are essentially workarounds for the lack of screen real estate. While researching zoom-based UIs a while back, I stumbled upon old HCI papers that used the Poincaré disk model of the hyperbolic plane to organize data. It elegantly projects an infinite space into a finite disk, keeping everything contextually visible.<p>I wanted to build an experimental app around this concept years ago, but the non-Euclidean math was a significant roadblock. Recently, I decided to give it a shot with the help of LLMs. It turns out that LLMs can handle the mathematical heavy lifting quite well, specifically in designing the coordinate systems and optimization algorithms, provided that you guide them with a solid architectural design.<p>This is still an experimental demo, but I hope it leaves an impression. I’d love to know if you find this paradigm practical for organizing your thoughts.
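
For readers unfamiliar with the Poincaré disk, here is a minimal sketch of the standard formulas behind "infinite space in a finite disk". It is not taken from the app; the function names are made up for the example, and points are represented as Python complex numbers inside the open unit disk (curvature -1 assumed).

```python
import math


def mobius_translate(z: complex, a: complex) -> complex:
    """Disk isometry that moves the point a to the origin (used for panning)."""
    return (z - a) / (1 - a.conjugate() * z)


def hyperbolic_distance(z: complex, w: complex) -> float:
    """Hyperbolic distance between two points of the open unit disk."""
    return 2 * math.atanh(abs(mobius_translate(z, w)))


def from_polar(r_hyp: float, angle: float) -> complex:
    """Disk position of a point at hyperbolic distance r_hyp from the centre."""
    return math.tanh(r_hyp / 2) * complex(math.cos(angle), math.sin(angle))


if __name__ == "__main__":
    # Notes placed ever farther away crowd toward the rim but stay inside it.
    for d in (1, 3, 6, 10):
        print(d, abs(from_polar(d, 0.0)))
```

Dragging the view corresponds to applying `mobius_translate` to every note: hyperbolic distances between notes are unchanged, while whatever moves toward the centre grows on screen and whatever moves toward the rim shrinks.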

Show HN: Cost.dev (YC W21) – making agents cost-aware and cheaper to call

We launched Infracost on HN five years ago (<a href="https://news.ycombinator.com/item?id=26064588">https://news.ycombinator.com/item?id=26064588</a>) where our CLI generated cost estimates for infra-as-code, e.g. "this Terraform PR adds $400/mo". The idea was to shift cloud costs (FinOps) left, so engineers get visibility of costs before deployment and make better decisions.<p>Earlier this year we started seeing agent traffic in our logs and it looked like coding agents were calling our CLI. But that CLI wasn't designed with coding agents in mind. We went down a philosophical rabbit hole to see if a CLI is even needed anymore given that Claude, Copilot et al. already follow best practices. Ultimately we decided to create a new CLI from the ground up with coding agents in mind for two reasons:<p>1. We optimized the CLI for agent callers and cut Claude's output token usage by up to 79% and API cost by up to 67% versus a bare-Claude baseline. We wrote a blog post documenting our lessons on optimizing user token usage when designing a CLI, e.g. using predicate flags so the agent doesn't compose jq | python | wc pipelines, and an output format that strips JSON's redundant field names. The blog post is here: <a href="https://www.infracost.io/resources/blog/we-cut-claude-s-token-usage-79-by-redesigning-our-cli-for-agents">https://www.infracost.io/resources/blog/we-cut-claude-s-toke...</a><p>2. With cloud costs, precision matters. Telling a coding agent "make this Terraform cost-optimized" can be expensive and lossy. You burn tokens loading code and policy context into every conversation. Your agent could make up a price and you wouldn't know, because it's difficult to verify that across the ~10M price points that AWS, Azure and Google have. 
The CLI runs static analysis on the code, uses the latest prices from cloud vendors, and passes that context to the coding agent.<p>So that's what we're launching today - Cost.dev: <a href="https://cost.dev/" rel="nofollow">https://cost.dev/</a>.<p>- It runs locally. Your code never leaves your machine, you get a fast feedback loop, and you're not burning API calls per character when you want to fetch prices.<p>- The CLI does the deterministic work: fetching price points, scanning the code, validating fixes. The coding agent does the natural-language part. You don't have to trust the LLM to remember the rules, and can verify it called the right CLI command.<p>- It provides a consistent rule layer across every tool you use. Get cost estimates in your IDE and your coding agent with a single install. We support Claude Code, GitHub Copilot, Cursor, Windsurf, OpenAI Codex, Gemini CLI, as well as IDEs like VS Code and JetBrains.<p>Before we keep building more in that direction, I want to sanity-check with HN: is "agents writing IaC in prod" actually a thing yet, or am I betting on a future that's still a year out? I know software developers are using coding agents heavily, but are platform/infra folks doing that for prod too? Also, if you have any feedback on Cost.dev, I'd love to hear it!
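
To make the two token-saving ideas from point 1 concrete (filtering inside the tool instead of in an agent-composed pipeline, and not repeating JSON field names on every record), here is a small sketch. It is not Cost.dev's actual output format or flag set; the records, field names, and the `min_monthly_usd` parameter are invented for illustration.

```python
import json


def filter_rows(rows: list[dict], min_monthly_usd: float) -> list[dict]:
    """Predicate applied inside the tool, so the caller needs no jq/wc pipeline."""
    return [row for row in rows if row["monthly_usd"] >= min_monthly_usd]


def to_compact(rows: list[dict]) -> str:
    """Emit the field names once, then one tab-separated line per record."""
    if not rows:
        return ""
    fields = list(rows[0])
    lines = ["\t".join(fields)]
    for row in rows:
        lines.append("\t".join(str(row[field]) for field in fields))
    return "\n".join(lines)


# Hypothetical records; the names and prices are made up.
resources = [
    {"name": "web", "type": "instance", "monthly_usd": 70.0},
    {"name": "db", "type": "database", "monthly_usd": 210.5},
    {"name": "logs", "type": "bucket", "monthly_usd": 3.2},
]

as_json = json.dumps(resources)
as_compact = to_compact(resources)

if __name__ == "__main__":
    print(len(as_json), "characters as JSON")
    print(len(as_compact), "characters in the compact form")
    print(to_compact(filter_rows(resources, min_monthly_usd=50)))
```

The saving grows with the number of records, because each JSON object repeats every key while the compact form pays for the header only once.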

Show HN: Mercek – A Desktop IDE for AWS ECS

Hey HN, I've been using ECS for a while now and found it annoying having to log into the console every time.<p>I use Lens for Kubernetes but couldn't find an equivalent for ECS, so I built one!<p>The project is open source as well: <a href="https://github.com/utibeabasi6/mercek" rel="nofollow">https://github.com/utibeabasi6/mercek</a>

Show HN: Lowfat – pluggable CLI filter that saved 91.8% of my LLM tokens

Hi HN, not sure if anyone would be interested, but I just wanted to share that I've been maintaining a small tool called 'lowfat' that helps me filter some of my verbose CLI output. It's a single binary and works as an agent hook or a shell wrapper. It has a plugin system to customize filters per command.<p>The idea is pretty simple: agents don't need the full kubectl get -o yaml or any 10k-line dump to make decisions. So lowfat sits in between, strips the noise, and passes through what matters. Here's my real report after 2 months of personal use:<p><pre><code> lowfat history --all

 lowfat plugin candidates
 ─────────────────────────────────────────────────────────
  #  command           runs    avg  raw cost  savings  source    status
  1  kubectl get       101x  14.4K      1.5M    93.9%  plugin    good
  2  grep              103x  13.5K      1.4M    96.2%  plugin    good
  3  git diff           81x    995     80.6K    57.9%  built-in  good
  4  kubectl            90x    485     43.6K    33.6%  plugin    good
  5  docker            127x   5.5K    693.6K    96.1%  built-in  good
  6  ls                489x    117     57.3K    56.2%  built-in  good
  7  find               30x  16.5K    495.0K    95.5%  plugin    good
  8  git show           63x    490     30.9K    38.0%  built-in  good
  9  git               177x    368     65.2K    76.1%  built-in  good
 10  git log            86x    556     47.8K    78.5%  built-in  good
 11  kubectl logs        5x   3.6K     17.8K    43.0%  plugin    good
 12  git status         86x    152     13.1K    58.0%  built-in  good
 13  docker ps          20x    467      9.3K    52.8%  plugin    good
 14  kubectl describe    6x    656      3.9K     1.2%  plugin    weak
 15  docker images       9x    940      8.5K    61.8%  built-in  good
 16  k get               2x   2.1K      4.2K    35.9%  plugin    good
 17  terraform          10x    395      3.9K    32.1%  plugin    good
 18  git commit         32x     77      2.5K     0.0%  built-in  weak
 19  docker build        8x    487      3.9K    37.6%  built-in  good
 20  docker compose     22x    979     21.5K    89.4%  built-in  good

 total: 4.4M raw → 4.1M saved (91.8%)
</code></pre>My toolset above is kind of limited, but it works pretty well for my use case without any interruption. It helps me stay under my company's Bedrock token limit, and I keep optimizing the savings as I go.<p>But why not the alternatives (<a href="https://github.com/zdk/lowfat#alternatives" rel="nofollow">https://github.com/zdk/lowfat#alternatives</a>)? My answers:<p>- My goal is to keep the core lightweight but extensible via plugins, i.e. not bundling every command into the installed binary, so that people own their output filters.<p>- It is customizable per use case via plugins or filter pipelines, since I am using my own toolset.<p>- It is customizable for non-public CLI tools; for example, some enterprises have internal CLI tools that the public can't access.<p>- People should own their data, so the design is local-first: no telemetry, forever.<p>- I love UNIX-style composable pipes, so lowfat-filter follows that style.<p>- The aggressiveness of the filter is adjustable, so we can make sure we don't strip something the agent needs.<p>GitHub: <a href="https://github.com/zdk/lowfat" rel="nofollow">https://github.com/zdk/lowfat</a><p>Anyway, if anyone is interested, feedback and questions are welcome!<p>Thanks!
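
As a rough illustration of the "sit in between and strip the noise" idea with composable, per-command filters, here is a small sketch. It does not reflect lowfat's plugin API or its built-in filters; every function name below is hypothetical, and the `limit` argument merely stands in for an adjustable aggressiveness setting.

```python
from typing import Callable

# A filter takes the lines of a command's output and returns fewer lines.
Filter = Callable[[list[str]], list[str]]


def drop_blank(lines: list[str]) -> list[str]:
    """Remove empty and whitespace-only lines."""
    return [line for line in lines if line.strip()]


def collapse_repeats(lines: list[str]) -> list[str]:
    """Keep only the first of each run of identical consecutive lines."""
    out: list[str] = []
    for line in lines:
        if not out or out[-1] != line:
            out.append(line)
    return out


def head(limit: int) -> Filter:
    """Build a filter that keeps at most `limit` lines and notes the rest."""
    def apply(lines: list[str]) -> list[str]:
        if len(lines) <= limit:
            return lines
        omitted = len(lines) - limit
        return lines[:limit] + [f"... ({omitted} more lines omitted)"]
    return apply


def pipeline(*filters: Filter) -> Filter:
    """Compose filters left to right, like a UNIX pipe."""
    def apply(lines: list[str]) -> list[str]:
        for step in filters:
            lines = step(lines)
        return lines
    return apply


if __name__ == "__main__":
    raw = ["a", "", "a", "a", "b", "c", "d"]
    print(pipeline(drop_blank, collapse_repeats, head(2))(raw))
```

A smaller `limit` means a more aggressive filter; leaving a short "omitted" marker in the output lets the agent know it can ask for the full output if it needs it.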

Show HN: Formally verified polygon intersection – Opus 4.8 oneshots, prev failed

To my knowledge, this is the first formally verified implementation of an intersection algorithm for polygons.<p>The experience of working with AI agents on this project changed a lot with recent model releases, as I describe in the readme. Opus 4.8 is able to provide an algorithm implementation with a formal proof in one shot, whereas previous models required me to provide proof strategies in multiple steps.<p>Trust in the correctness comes entirely from the Lean checker and human review of a small specification, not from the LLM.<p>Also check out the web demo built around the verified core, linked in the readme: <a href="https://schildep.github.io/verified-polygon-intersection/" rel="nofollow">https://schildep.github.io/verified-polygon-intersection/</a>. It supports multipolygons including holes, self-intersections, and overlapping edges.

Show HN: Prela – Purely Algebraic Relation Combinators

Prela is an embedded query language based on Tarski's Algebra of Relations. Its queries are concise, clear, and fast. It is implemented by shallow embedding in a host programming language: Prela operators are regular functions in the host. The implementation follows continuation-passing style which compiles to efficient columnar execution.
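
To give a feel for what "operators are regular functions in the host" can mean, here is a toy sketch of a few operations from the algebra of binary relations, written as plain Python functions over sets of pairs. It is not Prela's API and does not attempt its continuation-passing or columnar execution; the function names and the example data are invented.

```python
# A binary relation is modelled as a frozenset of (left, right) pairs.
Relation = frozenset


def rel(*pairs: tuple) -> Relation:
    """Build a relation from literal pairs."""
    return frozenset(pairs)


def compose(r: Relation, s: Relation) -> Relation:
    """Relative product: (a, c) whenever (a, b) is in r and (b, c) is in s."""
    return frozenset((a, c) for (a, b) in r for (b2, c) in s if b == b2)


def converse(r: Relation) -> Relation:
    """Swap the two sides of every pair."""
    return frozenset((b, a) for (a, b) in r)


def union(r: Relation, s: Relation) -> Relation:
    return r | s


def intersect(r: Relation, s: Relation) -> Relation:
    return r & s


# Hypothetical data: parent(x, y) means x is a parent of y.
parent = rel(("ann", "bob"), ("bob", "cy"), ("ann", "dee"))
grandparent = compose(parent, parent)
child = converse(parent)

if __name__ == "__main__":
    print(sorted(grandparent))
    print(sorted(child))
```

Because each operator is an ordinary function, queries are built by nesting calls, with no variables ranging over rows: `compose(parent, parent)` reads as "parent of a parent".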

Show HN: FFmpeg WebCLI – Full FFmpeg in Browser, Offline PWA, No Uploads(WASM)

Built a browser-based FFmpeg editor that runs entirely client-side via WebAssembly. Your files never leave your device -- all processing happens in a Web Worker. Works offline as an installable PWA after first load.

Show HN: Boxes.dev: ditch localhost; run Claude Code and Codex in the cloud

Hi HN, we’re Nick and Drew, and we’re building boxes.dev – the first cloud-only agentic dev environment (ADE) that gives every Codex and Claude Code agent its own cloud computer.<p>We’re two engineers who previously built Gem (co-founder/CTO and first hire), and we spent the last year coding almost exclusively using Codex and Claude Code. It’s been a huge change to how we code, and it’s been exhilarating seeing the models keep getting better – but we eventually realized that developing on localhost was holding us back:<p>- Git worktrees are clunky to set up and use for parallelizing work.<p>- It’s 2026, but somehow everyone is still walking around with laptops cracked open or SSHing into Mac minis in their garage so their agents don’t stop working.<p>- Mobile is treated like an afterthought even though coding is just texting now.<p>- We started hitting resource constraints when multiple parallel agents tested their own work by running the full app locally.<p>- We tried different products, but couldn’t find any that solved all of our pain points – so we pivoted and decided to just build the ADE we wanted for ourselves.<p>Boxes.dev is a desktop and mobile app that lets you run Claude Code, Codex (using your subscription!), and the full dev environment for whatever you’re building, all on remote compute. It’s similar to Conductor or the Codex desktop app, except everything is in the cloud.<p>We use coding agents to scan your local dev setup and port it to the cloud. Then every Claude Code/Codex thread starts from a snapshot of the full setup, with its own filesystem and compute. 
No more git worktrees, no more cracked-open laptops, and your coding agents can actually test their work end-to-end because they can run your full app in isolation.<p>We’ve mirrored the Claude Code and Codex UX to feel natural to power users, and also have a fully-featured mobile app (no handoffs or remote control), plus scheduled automations and a Slack integration.<p>We’re obviously biased, but we’ve been building boxes.dev with boxes.dev for months and it’s honestly been a game changer. It’s hard to go back once you realize how much localhost has been limiting you; based on early feedback from beta testers, we’re increasingly sure that cloud is the future of agentic coding.<p>We’d love for you to experience it yourselves! We’d appreciate any feedback, and we’re happy to answer any questions on this thread.

Show HN: Uruky (EU-based Kagi alternative) now has Image Search and URL Rewrites

You can get a 2h free trial by solving a proof-of-work captcha when topping up your account for the first time.<p>If you'd like to learn more, an independent interview was posted a couple of weeks ago [1], and the FAQ [2] has a lot of information as well.<p>For the source code sharing, we've talked with lawyers and are inclined to no longer require the NDA/NCC, because of privacy concerns shared with us before (signing requires identification). Instead we would use a source-available permissive license that doesn't allow competition, like PolyForm Shield [3] (we still have about 6 months before finalising a decision).<p>This does come with a lot more risk for us (it's harder to track down if someone publishes the code or uses it against the license), but given we've already passed 100 monthly active accounts, we're feeling more confident it's an acceptable risk.<p>The plan is to give logged-in accounts (that are 12 months old or older) a way to download a ZIP of the current code base that's on the server.<p>Obviously there's no easy way to prove that's the case, but we're open to ideas/suggestions if someone here has them.<p>[1]: <a href="https://theprivacydad.com/interview-with-the-engineer-of-uruky-a-private-search-engine/" rel="nofollow">https://theprivacydad.com/interview-with-the-engineer-of-uru...</a><p>[2]: <a href="https://uruky.com/faq" rel="nofollow">https://uruky.com/faq</a><p>[3]: <a href="https://polyformproject.org/licenses/shield/1.0.0" rel="nofollow">https://polyformproject.org/licenses/shield/1.0.0</a>
