The best Hacker News stories from Show from the past day
Latest posts:
Show HN: A GitHub Action that quizzes you on a pull request
A little idea I got from playing with AI SWE agents. Can AI help make sure we understand the code that our AIs write?<p>PR Quiz uses AI to generate a quiz from a pull request and blocks you from merging until the quiz is passed. You can configure various options, such as the LLM model to use, the maximum number of attempts to pass the quiz, or the minimum diff size to generate a quiz for. In my limited testing, the reasoning models, while more expensive, generated better questions.<p>Privacy: This GitHub Action runs a local webserver and uses ngrok to serve the quiz through a temporary URL. Your code is only sent to the model provider (OpenAI).
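The gating described above (skip small diffs, block the merge until the quiz is passed, cap the number of attempts) could be sketched roughly as follows. This is not PR Quiz's actual code: `QuizConfig`, `merge_status`, the default values, and the choice to measure diff size in changed lines are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuizConfig:
    # Both defaults are hypothetical; the post does not state the real ones.
    min_diff_lines: int = 20
    max_attempts: int = 3


def merge_status(changed_lines: int, attempts_used: int, passed: bool,
                 config: QuizConfig = QuizConfig()) -> str:
    """Return 'skipped', 'passed', 'failed', or 'pending' for a pull request."""
    if changed_lines < config.min_diff_lines:
        return "skipped"   # diff too small to be worth a quiz
    if passed:
        return "passed"    # quiz passed, merge may proceed
    if attempts_used >= config.max_attempts:
        return "failed"    # out of attempts, merge stays blocked
    return "pending"       # quiz still open, merge stays blocked
```

Only `"skipped"` and `"passed"` would unblock the merge in this sketch; what the real action does once attempts run out is not stated in the post.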
Show HN: Terminal-Bench-RL: Training long-horizon terminal agents with RL
After training a calculator agent via RL, I really wanted to go bigger! So I built RL infrastructure for training long-horizon terminal/coding agents that scales from 2x A100s to 32x H100s (~$1M worth of compute!) Without any training, my 32B agent hit #19 on the Terminal-Bench leaderboard, beating Stanford's Terminus-Qwen3-235B-A22! With training... well, too expensive, but I bet the results would be good!<p>*What I did*:<p>- Created a Claude Code-inspired agent (system msg + tools)<p>- Built Docker-isolated GRPO training where each rollout gets its own container<p>- Developed a multi-agent synthetic data pipeline to generate & validate training data with Opus-4<p>- Implemented a hybrid reward signal of unit test verifiers & a behavioural LLM judge.<p>*Key results*:<p>- My untrained Qwen3-32B agent achieved 13.75% on Terminal-Bench (#19, beats Stanford's Qwen3-235B MoE)<p>- I verified that training runs stably on 32x H100s distributed across 4 bare metal nodes<p>- I created a mini-eval framework for LLM-judge performance. Sonnet-4 won.<p>- ~£30-50k needed for a full training run of 1000 epochs (I could only afford testing)<p>*Technical details*:<p>- The synthetic dataset ranges from easy to extremely hard tasks. An example hard task's prompt:<p>"I found this mystery program at `/app/program` and I'm completely stumped. It's a stripped binary, so I have no idea what it does or how to run it properly. The program seems to expect some specific input and then produces an output, but I can't figure out what kind of input it needs.
Could you help me figure out what this program requires?"<p>- Simple config presets allow training to run on multiple hardware setups with minimal effort.<p>- GRPO used with 16 rollouts per task, up to 32k tokens per rollout.<p>- Agent uses XML/YAML format to structure tool calls<p>*More details*:<p>My GitHub repos open-source it all (agent, data, code) and have way more technical details if you are interested:<p>- Terminal Agent RL repo<p>- Multi-agent synthetic data pipeline repo<p>I thought I would share this because I believe long-horizon RL is going to change everybody's lives, and so I feel it is important (and super fun!) for us all to share knowledge around this area, and also enjoy exploring what is possible.<p>Thanks for reading!<p>Dan<p>(Built using the rLLM RL framework, which was brilliant to work with, and evaluated on and inspired by the great Terminal Bench benchmark)
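To make the "hybrid reward" and "16 rollouts per task" points concrete, here is a minimal sketch of how a blended reward and GRPO-style group-relative advantages can be computed. It is not taken from the author's repos: the function names, the 0.7/0.3 weighting, and the use of the population standard deviation are assumptions for illustration.

```python
from statistics import mean, pstdev


def hybrid_reward(tests_passed: int, tests_total: int, judge_score: float,
                  test_weight: float = 0.7) -> float:
    """Blend a unit-test pass rate with an LLM-judge score in [0, 1].

    The default 0.7 / 0.3 split is an illustrative assumption.
    """
    pass_rate = tests_passed / tests_total if tests_total else 0.0
    return test_weight * pass_rate + (1.0 - test_weight) * judge_score


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-6) -> list[float]:
    """Normalise each rollout's reward against its own group.

    All rollouts for one task form a group; each advantage is the reward
    minus the group mean, divided by the group standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With 16 rollouts per task, `rewards` would hold 16 values from `hybrid_reward`, and rollouts that beat their group's average receive a positive advantage. If every rollout in a group scores the same, all advantages are zero and that task contributes no learning signal.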
Show HN: I built an AI that turns any book into a text adventure game
It's a web app that uses AI to turn any book into a playable text adventure. Your favorite book, but your choices, hence your story. You can even "remix" the genre like playing Dune as a noir detective story.<p>Note: Work in progress. Suggestions are welcome.
Show HN: Open-source physical rack-mounted GUI for home lab
I have realized that a lot of people nowadays self-host services and set up home labs with mini racks.<p>One major pain point I have come across personally is to quickly get health status from self-hosted services and machines, and have the ability to headlessly control my Raspberry Pi inside a mini rack.<p>So it got me thinking about building a built-in GUI that users can easily add to their Raspberry Pi nodes in their (mini or full) racks or elsewhere.<p>I have previously designed this GUI for an open source project I have been working on (called Ubo pod: github.com/ubopod) and decided to detach/decouple the GUI into its own standalone module for this use case.<p>The GUI allows headless control of your Raspberry Pi, monitoring of system resources, and application status.<p>I am designing a new PCB and enclosure as part of this re-design to allow for a new form factor that mounts on server racks.<p>I am recording my journey of re-designing this and I would love to get early feedback from users to better understand what they may need or require from such a solution, especially on the hardware side.<p>The software behind the GUI is quite mature (<a href="https://github.com/ubopod/ubo_app">https://github.com/ubopod/ubo_app</a>) and you can actually try it right now without the hardware inside the web browser as shown in the video:<p><a href="https://www.youtube.com/watch?v=9Ob_HDO66_8" rel="nofollow">https://www.youtube.com/watch?v=9Ob_HDO66_8</a><p>All PCB designs are available here:<p><a href="https://github.com/ubopod/ubo-pcb">https://github.com/ubopod/ubo-pcb</a>
Show HN: I built a free tool to find valuable expired domains using AI
Hi HN,<p>I’ve been collecting and analyzing expired domains for years — especially those about to drop. Every day, tens of thousands expire. Most are junk, but a few still have traffic, backlinks, SEO value, or just great names. Finding them used to take hours.<p>Last week I put my internal tools online:
<a href="https://pendingdelete.domains" rel="nofollow">https://pendingdelete.domains</a>
No login, no paywall
Updated daily
Combines domain history, traffic, SEO data and AI-driven insights to identify valuable expirations
The goal: help spot valuable domains quickly and skip the noise.<p>Still a work-in-progress — would love feedback:
Is this useful?
What signals or filters would you add?
Any UI or speed improvements?<p>Thanks!
Show HN: Allzonefiles.io – get lists of all registered domains in the Internet
This site provides lists of 305M domain names across 1570 domain zones covering the entire Internet. You can download these lists from the website or via the API. Domain lists for the majority of zones are updated daily.
Show HN: I made a tool to generate photomosaics with your pictures
Hi HN!<p>I wanted to make some photomosaics for an anniversary gift, but I ended up building this tool and turning it into a website that anyone can use.<p>For those who don’t know, a photomosaic is an image made up of many smaller tile images, arranged in a way that forms a larger, recognisable picture.<p>The best part? Everything runs directly in your browser. No files are uploaded, and there’s no sign-up required.
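The core idea of a photomosaic, matching each cell of the target picture to the tile image that looks most like it, can be sketched in a few lines. This is not the site's implementation (which runs in the browser); matching on average RGB colour with squared Euclidean distance is a common approach assumed here, and all function names are hypothetical.

```python
Color = tuple[float, float, float]


def average_color(pixels: list[tuple[int, int, int]]) -> Color:
    """Mean RGB value of a non-empty list of pixels."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n,
            sum(p[1] for p in pixels) / n,
            sum(p[2] for p in pixels) / n)


def color_distance(a: Color, b: Color) -> float:
    """Squared Euclidean distance between two RGB colours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def choose_tiles(cells: list[list[Color]],
                 tile_colors: list[Color]) -> list[list[int]]:
    """For each cell of the target grid, pick the index of the closest tile."""
    return [
        [min(range(len(tile_colors)),
             key=lambda i: color_distance(cell, tile_colors[i]))
         for cell in row]
        for row in cells
    ]
```

In use, `tile_colors` would hold `average_color` of each small photo, and `cells` the average colour of each grid cell in the large picture; the returned indices say which photo to draw in each cell.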
Show HN: Companies use AI to take your calls. I built AI to make them for you
We're living in this weird asymmetry where companies use AI to talk to us, but we're still manually dialing them. Companies everywhere are adopting AI voice agents lately. Big retail, family dentist clinics, local pharmacy. This year, I've been in a few calls where it's super natural sounding AI, which has been pretty cool to experience. But then it got me thinking - why are we, the consumers, still the ones making calls if they're using robots for theirs?<p>So I built Piper: basically AI that makes phone calls for you. You tell it what you need (book appointment, check on an order, dispute some charge, whatever), and it handles the entire conversation while you do actual work. Right now it's a web app; a Chrome extension is pending approval, but soon you'll be able to click any phone number anywhere and just let Piper handle it.<p>Technical stuff that was harder than expected:<p>Latency - every millisecond counts in conversation, had to optimize around the KV cache, got it down to ~1000ms to first word over PSTN for telephony, which feels pretty natural<p>Keeping the voice agents on track - built custom context engineering logic that constantly updates the agent's situational awareness, so it knows when it's been transferred, when it's on hold, etc<p>Done ~50 successful calls with early testers so far. Main failures are when they need complex verification or documents. Also had to take down our IVR navigation temporarily :/, found some edge cases that were causing unnecessary transfers but working on fixing that.<p>I really think we're heading toward this world where AI talks to AI for most routine things, and phone calls might be the first real example of this happening at scale!<p>You can check out a voice demo on our website. <a href="https://pipervoice.com" rel="nofollow">https://pipervoice.com</a>
Show HN: Use Their ID – Use your local UK MP’s ID for the Online Safety Act
Hi HN -
I made a site that takes a UK postcode, grabs the local MP's information and generates an AI mockup of what their ID might look like.<p>It's a small, silly protest at the stupidity of the Online Safety Act that just came into force.<p>edit - My OpenAI credits got hugged to death; please use a known postcode (like one from Keir Starmer's constituency, WC2B6NH) in the meantime.
Show HN: Competitor Finder – Paste your domain, get your top competitors
I built a simple tool to help founders figure out who their actual competitors are! You know... the ones your potential customers already know and compare you to.<p>Just paste your domain, and we generate a focused list of 10 competitors with names, sites, and a quick positioning note for each.<p>Why I built it: I run a competitor monitoring tool (<a href="https://champsignal.com" rel="nofollow">https://champsignal.com</a>), and I realized that before people can monitor competitors… they first need to <i>find</i> them. This is harder than you think for people that have not been around for years aha<p>(It's free and doesn't require signup)<p>Would love feedback, especially if you don't think it's giving you the right competitors. Happy to improve the product.
Show HN: Flyde 1.0 – Like n8n, but in your codebase
Hi HN!<p>I'm excited to share Flyde 1.0, a big update to the open-source visual programming tool I launched here in March of last year (<a href="https://news.ycombinator.com/item?id=39628285">https://news.ycombinator.com/item?id=39628285</a>).<p>Since Flyde’s launch, there's been a huge rise in demand for visual builders, especially for AI-heavy workflows. Visual programming shines with async and concurrency-heavy logic, which describes most LLM chains perfectly.<p>A few months ago, I tried to capitalize on this trend by launching a commercial version of Flyde called Flowcode (<a href="https://news.ycombinator.com/item?id=43830193">https://news.ycombinator.com/item?id=43830193</a>). It didn't go well. I learned the hard way that Flyde’s strength wasn't just about flexibility or performance compared to tools like n8n.
The real value was always how Flyde fits inside your <i>existing codebase</i>. The launch also helped me understand that there's still a big gap: no tool really covers the full lifecycle, from rapid prototyping to deep integration, evaluation, and iteration inside your own projects.<p>So, over the last few months, I worked hard to polish Flyde:
- Cleaned up and simplified the nodes API
- Made it possible to fork any node for maximum flexibility
- Launched a new online playground for quick experimenting and sharing (<a href="https://www.flyde.dev/playground" rel="nofollow">https://www.flyde.dev/playground</a>)
- Created a new CLI tool to speed up development and setup
- Fixed a ton of bugs
- Simplified the UI/UX to make it smoother and less confusing<p>There’s still a lot missing, such as better templates, docs, and nodes, but I think it’s finally stable and useful enough to give it another shot.<p>My plan is to first make sure that Flyde is usable and valuable as an open-source project, and then try to provide additional value via “Flyde Studio” - a SaaS that will help non-engineers iterate on Flyde flows from a web-app. Changes become a PR in the host repo.<p>I'd really love some honest feedback and to hear whether Flyde resonates with an existing pain/problem.<p>Check it out here:
Playground: <a href="https://www.flyde.dev/playground" rel="nofollow">https://www.flyde.dev/playground</a><p>GitHub: <a href="https://github.com/flydelabs/flyde">https://github.com/flydelabs/flyde</a><p>Looking forward to hearing your thoughts!
- Gabriel
Show HN: QuickTunes: Apple Music player for Mac with iPod vibes
The slow and bloated nature of the Mac Apple Music app inspired us to create QuickTunes. It is a simple, fast, and native Apple Music player inspired by the simplicity of the iPod. You can use keyboard shortcuts to navigate a simple multi-column layout, pick something, and press Play.