The best Show HN stories from Hacker News from the past week
Latest posts:
Show HN: ProofShot – Give AI coding agents eyes to verify the UI they build
I use AI agents to build UI features daily. The thing that kept annoying me: the agent writes code but never sees what it actually looks like in the browser. It can’t tell if the layout is broken or if the console is throwing errors.<p>So I built a CLI that lets the agent open a browser, interact with the page, record what happens, and collect any errors. Then it bundles everything — video, screenshots, logs — into a self-contained HTML file I can review in seconds.<p><pre><code> proofshot start --run "npm run dev" --port 3000
# agent navigates, clicks, takes screenshots
proofshot stop
</code></pre>
It works with whatever agent you use (Claude Code, Cursor, Codex, etc.) — it’s just shell commands. It’s packaged as a skill, so your AI coding agent knows exactly how it works. It’s built on agent-browser from Vercel Labs, which is far better and faster than Playwright MCP.<p>It’s not a testing framework. The agent doesn’t decide pass/fail. It just gives me the evidence so I don’t have to open the browser myself every time.<p>Open source and completely free.<p>Website: <a href="https://proofshot.argil.io/" rel="nofollow">https://proofshot.argil.io/</a>
Show HN: I took back Video.js after 16 years and we rewrote it to be 88% smaller
What do you do when private equity buys your old company and fires the maintainers of the popular open source project you started over a decade ago? You reboot it, and bring along some new friends to do it.<p>Video.js is used by billions of people every month, on sites like Amazon.com, LinkedIn, and Dropbox, and yet it wasn’t in great shape. A skeleton crew of maintainers were doing their best with a dated architecture, but it needed more. So Sam from Plyr, Rahim from Vidstack, and Wes and Christian from Media Chrome jumped in to help me rebuild it better, faster, and smaller.<p>It’s in beta now. Please give it a try and tell us what breaks.
Show HN: Email.md – Markdown to responsive, email-safe HTML
Show HN: Gemini can now natively embed video, so I built sub-second video search
Gemini Embedding 2 can project raw video directly into a 768-dimensional vector space alongside text. No transcription, no frame captioning, no intermediate text. A query like "green car cutting me off" is directly comparable to a 30-second video clip at the vector level.<p>I used this to build a CLI that indexes hours of footage into ChromaDB, then searches it with natural language and auto-trims the matching clip. Demo video on the GitHub README.
Indexing costs ~$2.50/hr of footage. Still-frame detection skips idle chunks, so security camera / sentry mode footage is much cheaper.
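The core idea, that a text query and a video clip are "directly comparable at the vector level," reduces to nearest-neighbor search over embeddings. As a minimal illustration (pure Python, toy 3-d vectors standing in for 768-d Gemini embeddings; the clip names and query vector are made up, and this is not the project's actual code, which uses ChromaDB):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, clip_vecs, top_k=1):
    # Rank clip embeddings by similarity to the query embedding and
    # return the ids of the best matches.
    scored = sorted(clip_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [clip_id for clip_id, _ in scored[:top_k]]

# Toy "embeddings"; in the real tool these come from the Gemini
# embedding model, one vector per indexed video chunk.
clips = {
    "clip_001.mp4": [0.9, 0.1, 0.0],
    "clip_002.mp4": [0.1, 0.8, 0.3],
}
query = [0.85, 0.15, 0.05]  # embedding of "green car cutting me off"
print(search(query, clips))  # ['clip_001.mp4']
```

A vector database like ChromaDB does the same ranking, just with approximate indexes so it stays sub-second over hours of footage.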
Show HN: Cq – Stack Overflow for AI coding agents
Hi all, I'm Peter, a Staff Engineer at Mozilla.ai, and I want to share our idea for a standard for shared agent learning; conceptually it fits my mental model as a Stack Overflow for agents.<p>The project explores whether we can get agents (any agent, any model) to propose 'knowledge units' (KUs) in a standard schema based on gotchas they run into during use, and to proactively query for existing KUs to get insights they can verify and confirm if they prove useful.<p>It's currently very much a PoC with a more lofty proposal in the repo; we're trying to iterate from local use up to team level, and ideally eventually to some kind of public commons.<p>At the team level (see our Docker Compose example), you configure your coding agent to point to the team's API address so KUs are sent there instead, where they can be reviewed by a human in the loop (HITL) via a UI in the browser before they're allowed to appear in queries by other agents on your team.<p>We're learning a lot even from using it locally on various repos internally, not just about the kinds of KUs it generates, but also from a UX perspective on making it easy to start using it and approving KUs in the browser dashboard. There are bigger, complex problems to solve in the future around data privacy, governance, etc.,
but for now we're super focused on building something people can get value from quickly in their day-to-day.<p>Tech stack:<p>* Skills - markdown<p>* Local Python MCP server (FastMCP) managing a local SQLite knowledge store<p>* Optional team API (FastAPI, Docker) for sharing knowledge across an org<p>* Installs as a Claude Code plugin or OpenCode MCP server<p>* Local-first by default; your knowledge stays on your machine unless you opt into team sync by setting the address in config<p>* OSS (Apache 2.0 licensed)<p>Here's an example of something that seemed straightforward: when asked to write a GitHub Action, Claude Code often used actions that were multiple major versions out of date because of its training data. In this case I told the agent what I saw when I reviewed the GitHub Action YAML file it created, and it proposed a knowledge unit to be persisted. The next time, in a completely different repo using OpenCode and an OpenAI model, the cq skill was used up front before the task started; it picked up the gotcha about major versions in training data, checked GitHub proactively, and used the correct, latest major versions. It then confirmed the KU, increasing its confidence score.<p>Some folks might say: well, there's a CLAUDE.md in your repo, or in ~/.claude/. But we're looking further than that: we want this to be available to all agents and all models, and perhaps more importantly we don't want to stuff AGENTS.md or CLAUDE.md with loads of rules that lead to unpredictable behaviour. This is targeted information for a particular task, which seems a lot more useful.<p>Right now it can be installed locally as a plugin for Claude Code and OpenCode:<p>claude plugin marketplace add mozilla-ai/cq
claude plugin install cq<p>This allows you to capture data in your local ~/.cq/local.db (the data doesn't get sent anywhere else).<p>We'd love feedback on this, the repo is open and public - so GitHub issues are welcome. We've posted on some of our social media platforms with a link to the blog post (below) so feel free to reply to us if you found it useful, or ran into friction, we want to make this something that's accessible to everyone.<p>Blog post with the full story: <a href="https://blog.mozilla.ai/cq-stack-overflow-for-agents/" rel="nofollow">https://blog.mozilla.ai/cq-stack-overflow-for-agents/</a>
GitHub repo: <a href="https://github.com/mozilla-ai/cq" rel="nofollow">https://github.com/mozilla-ai/cq</a><p>Thanks again for your time.
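The "KU proposed by one agent, queried and confirmed by another" loop described above can be sketched with the stdlib alone. Note the field names here are illustrative guesses, not cq's actual schema; the real project stores KUs in a local SQLite database (~/.cq/local.db) via an MCP server:

```python
import json
import sqlite3
from dataclasses import dataclass, asdict

# Hypothetical shape of a "knowledge unit"; field names are guesses
# for illustration, not cq's real schema.
@dataclass
class KnowledgeUnit:
    topic: str         # e.g. "github-actions"
    gotcha: str        # the pitfall the agent ran into
    resolution: str    # what actually worked
    confidence: float  # bumped when another agent confirms the KU

def save_ku(conn, ku):
    # Persist one KU as a JSON blob in a local SQLite table.
    conn.execute("CREATE TABLE IF NOT EXISTS kus (body TEXT)")
    conn.execute("INSERT INTO kus VALUES (?)", (json.dumps(asdict(ku)),))

def query_kus(conn, topic):
    # What a second agent would do before starting a task on this topic.
    rows = conn.execute("SELECT body FROM kus").fetchall()
    kus = [json.loads(r[0]) for r in rows]
    return [ku for ku in kus if ku["topic"] == topic]

conn = sqlite3.connect(":memory:")
save_ku(conn, KnowledgeUnit(
    topic="github-actions",
    gotcha="training data suggests actions pinned at old major versions",
    resolution="check GitHub for the latest major version before pinning",
    confidence=0.6,
))
print(query_kus(conn, "github-actions")[0]["resolution"])
```

Confirming a KU would then just be an UPDATE that raises `confidence`, with the team-mode API putting a human review step in front of the INSERT.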
Show HN: We built a terminal-only Bluesky / AT Proto client written in Fortran
Yes, that Fortran.
Show HN: Han – A Korean programming language written in Rust
A few weeks ago I saw a post about someone converting an entire C++ codebase to Rust using AI in under two weeks.<p>That inspired me — if AI can rewrite a whole language stack that fast, I wanted to try building a programming language from scratch with AI assistance.<p>I've also been noticing growing global interest in Korean language and culture, and I wondered: what would a programming language look like if every keyword was in Hangul (the Korean writing system)?<p>Han is the result. It's a statically-typed language written in Rust with a full compiler pipeline (lexer → parser → AST → interpreter + LLVM IR codegen).<p>It supports arrays, structs with impl blocks, closures, pattern matching, try/catch, file I/O, module imports, a REPL, and a basic LSP server.<p>This is a side project, not a "you should use this instead of Python" pitch.
Feedback on language design, compiler architecture, or the Korean keyword choices is very welcome.<p><a href="https://github.com/xodn348/han" rel="nofollow">https://github.com/xodn348/han</a>
Show HN: Channel Surfer – Watch YouTube like it’s cable TV
I know, it's a very first-world problem, but in my house we have a hard time deciding what to watch. Too many options!<p>So I made this to recreate cable TV for YouTube. It runs in the browser: import your subscriptions via a bookmarklet. No accounts, no sign-ins; your data stays local.
Show HN: Axe – A 12MB binary that replaces your AI framework
I built Axe because I got tired of every AI tool trying to be a chatbot.<p>Most frameworks want a long-lived session with a massive context window doing everything at once. That's expensive, slow, and fragile. Good software is small, focused, and composable... AI agents should be too.<p>Axe treats LLM agents like Unix programs. Each agent is a TOML config with a focused job: code reviewer, log analyzer, commit message writer. You can run them from the CLI, pipe data in, and get results out. You can use pipes to chain them together, or trigger them from cron, git hooks, or CI.<p>What Axe is:<p>- 12MB binary, two dependencies; no framework, no Python, no Docker (unless you want it)<p>- Stdin piping: `git diff | axe run reviewer` just works<p>- Sub-agent delegation: agents call other agents via tool use, depth-limited<p>- Persistent memory: if you want, agents can remember across runs without you managing state<p>- MCP support: Axe can connect any MCP server to your agents<p>- Built-in tools such as web_search and url_fetch out of the box<p>- Multi-provider: bring what you love to use, whether Anthropic, OpenAI, Ollama, or anything in models.dev format<p>- Path-sandboxed file ops keep agents locked to a working directory<p>Written in Go. No daemon, no GUI.<p>What would you automate first?
Show HN: s@: decentralized social networking over static sites
Show HN: I built a tool that watches webpages and exposes changes as RSS
I built Site Spy after missing a visa appointment slot because a government page changed and I didn’t notice for two weeks.<p>It watches webpages for changes and shows the result like a diff. The part I think HN might find interesting is that it can monitor a specific element on a page, not just the whole page, and it can expose changes as RSS feeds.<p>So instead of tracking an entire noisy page, you can watch just a price, a stock status, a headline, or a specific content block. When it changes, you can inspect the diff, browse the snapshot history, or follow the updates in an RSS reader.<p>It’s a Chrome/Firefox extension plus a web dashboard.<p>Main features:<p>- Element picker for tracking a specific part of a page<p>- Diff view plus full snapshot timeline<p>- RSS feeds per watch, per tag, or across all watches<p>- MCP server for Claude, Cursor, and other AI agents<p>- Browser push, Email, and Telegram notifications<p>Chrome: <a href="https://chromewebstore.google.com/detail/site-spy/jeapcpanagdgipcfnncmogeojgfofige" rel="nofollow">https://chromewebstore.google.com/detail/site-spy/jeapcpanag...</a><p>Firefox: <a href="https://addons.mozilla.org/en-GB/firefox/addon/site-spy/" rel="nofollow">https://addons.mozilla.org/en-GB/firefox/addon/site-spy/</a><p>Docs: <a href="https://docs.sitespy.app" rel="nofollow">https://docs.sitespy.app</a><p>I’d especially love feedback on two things:<p>- Is RSS actually a useful interface for this, or do most people just want direct alerts?<p>- Does element-level tracking feel meaningfully better than full-page monitoring?
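The element-level diffing at the heart of this is worth making concrete. A minimal sketch of the technique (not Site Spy's actual code; real monitoring would fetch the page, select the element, and compare against the last stored snapshot):

```python
import difflib

def element_diff(old_html, new_html):
    # Unified diff between two snapshots of a single tracked element,
    # the same kind of view a change-watcher shows for a price or headline.
    return "\n".join(difflib.unified_diff(
        old_html.splitlines(), new_html.splitlines(),
        fromfile="previous snapshot", tofile="current snapshot",
        lineterm=""))

# Two snapshots of one tracked element (a price block).
old = '<span class="price">$49.99</span>'
new = '<span class="price">$39.99</span>'
print(element_diff(old, new))
```

Each non-empty diff would become one RSS item, which is what makes element-level tracking less noisy than whole-page monitoring: unrelated page churn never reaches the feed.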
Show HN: Remotely use my guitar tuner
Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon
Hi HN, we're Sanchit and Shubham (YC W26). We built a fast inference engine for Apple Silicon. LLMs, speech-to-text, text-to-speech – MetalRT beats llama.cpp, Apple's MLX, Ollama, and sherpa-onnx on every modality we tested. Custom Metal shaders, no framework overhead.<p>Also, we've open-sourced RCLI, the fastest end-to-end voice AI pipeline on Apple Silicon. Mic to spoken response, entirely on-device. No cloud, no API keys.<p>To get started:<p><pre><code> brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
brew install rcli
rcli setup # downloads ~1 GB of models
rcli # interactive mode with push-to-talk
</code></pre>
Or:<p><pre><code> curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash
</code></pre>
The numbers (M4 Max, 64 GB, reproducible via `rcli bench`):<p>LLM decode – 1.67x faster than llama.cpp, 1.19x faster than Apple MLX (same model files):
- Qwen3-0.6B: 658 tok/s (vs mlx-lm 552, llama.cpp 295)
- Qwen3-4B: 186 tok/s (vs mlx-lm 170, llama.cpp 87)
- LFM2.5-1.2B: 570 tok/s (vs mlx-lm 509, llama.cpp 372)
- Time-to-first-token: 6.6 ms<p>STT – 70 seconds of audio transcribed in *101 ms*. That's 714x real-time. 4.6x faster than mlx-whisper.<p>TTS – 178 ms synthesis. 2.8x faster than mlx-audio and sherpa-onnx.<p>We built this because demoing on-device AI is easy but shipping it is brutal. Voice is the hardest test: you're chaining STT, LLM, and TTS sequentially, and if any stage is slow, the user feels it. Most teams fall back to cloud APIs not because local models are bad, but because local inference infrastructure is.<p>The thing that's hard to solve is latency compounding. In a voice pipeline, you're stacking three models in sequence. If each adds 200ms, you're at 600ms before the user hears a word, and that feels broken. You can't optimize one stage and call it done. Every stage needs to be fast, on one device, with no network round-trip to hide behind.<p>We went straight to Metal. Custom GPU compute shaders, all memory pre-allocated at init (zero allocations during inference), and one unified engine for all three modalities instead of stitching separate runtimes together.<p>MetalRT is the first engine to handle all three modalities natively on Apple Silicon. Full methodology:<p>LLM benchmarks: <a href="https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-engine-apple-silicon">https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...</a><p>Speech benchmarks: <a href="https://www.runanywhere.ai/blog/metalrt-speech-fastest-stt-tts-apple-silicon">https://www.runanywhere.ai/blog/metalrt-speech-fastest-stt-t...</a><p>How: Most inference engines add layers between you and the GPU: graph schedulers, runtime dispatchers, memory managers. MetalRT skips all of it. 
Custom Metal compute shaders for quantized matmul, attention, and activation - compiled ahead of time, dispatched directly.<p>Voice Pipeline optimizations details: <a href="https://www.runanywhere.ai/blog/fastvoice-on-device-voice-ai-pipeline-apple-silicon">https://www.runanywhere.ai/blog/fastvoice-on-device-voice-ai...</a>
RAG optimizations: <a href="https://www.runanywhere.ai/blog/fastvoice-rag-on-device-retrieval-augmented-voice-ai">https://www.runanywhere.ai/blog/fastvoice-rag-on-device-retr...</a><p>RCLI is the open-source voice pipeline (MIT) built on MetalRT: three concurrent threads with lock-free ring buffers, double-buffered TTS, 38 macOS actions by voice, local RAG (~4 ms over 5K+ chunks), 20 hot-swappable models, and a full-screen TUI with per-op latency readouts. Falls back to llama.cpp when MetalRT isn't installed.<p>Source: <a href="https://github.com/RunanywhereAI/RCLI" rel="nofollow">https://github.com/RunanywhereAI/RCLI</a> (MIT)<p>Demo: <a href="https://www.youtube.com/watch?v=eTYwkgNoaKg" rel="nofollow">https://www.youtube.com/watch?v=eTYwkgNoaKg</a><p>What would you build if on-device AI were genuinely as fast as cloud?
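The latency-compounding point above can be made concrete with back-of-envelope arithmetic (the 200 ms per stage is the post's illustrative figure, not a measured number):

```python
# Sequential voice pipeline: each stage's latency adds up before the
# user hears anything. Stage names and the 200 ms figure are
# illustrative, taken from the post's example.
stages_ms = {"stt": 200, "llm_first_token": 200, "tts_first_audio": 200}
total = sum(stages_ms.values())
print(f"time to first audio: {total} ms")  # 600 ms

# Speeding up only one stage barely helps; halving every stage is what
# moves the total, which is why all three engines have to be fast.
one_fast = {**stages_ms, "stt": 100}
all_fast = {k: v // 2 for k, v in stages_ms.items()}
print(f"one stage halved: {sum(one_fast.values())} ms")   # 500 ms
print(f"all stages halved: {sum(all_fast.values())} ms")  # 300 ms
```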
Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs
I found that duplicating a specific block of 7 middle layers in Qwen2-72B, without modifying any weights, improved performance across all Open LLM Leaderboard benchmarks and took #1. As of 2026, the top 4 models on that leaderboard are still descendants.<p>The weird finding: single-layer duplication does nothing. Too few layers, nothing. Too many, it gets worse. Only circuit-sized blocks of ~7 layers work. This suggests pretraining carves out discrete functional circuits in the layer stack that only work when preserved whole.<p>The whole thing was developed on 2x RTX 4090s in my basement. I'm now running current models (GLM-4.7, Qwen3.5, MiniMax M2.5) on a dual GH200 rig (see my other post). Code and new models coming soon.<p>Happy to answer questions.
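Mechanically, duplicating a block of layers without touching weights is just list surgery on the layer stack. A sketch of the idea (the block position `start=38` is a hypothetical choice for illustration; the post doesn't say which 7 middle layers were used):

```python
def duplicate_block(layers, start, length):
    # Repeat a contiguous block of layers in place, weights untouched:
    # result = [0..start+length) + block + [start+length..end).
    block = layers[start:start + length]
    return layers[:start + length] + block + layers[start + length:]

# An 80-layer stack (Qwen2-72B's depth); duplicate a 7-layer middle
# block. start=38 is an illustrative position, not the actual one.
layers = list(range(80))
upscaled = duplicate_block(layers, start=38, length=7)
print(len(upscaled))     # 87: the model got deeper for free
print(upscaled[43:47])   # [43, 44, 38, 39]: block repeats after itself
```

In a real model you would apply the same slicing to the module list (and layer count in the config) of the loaded checkpoint; the "circuit-sized" finding is that only block lengths around 7 survive this surgery intact.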
Show HN: DenchClaw – Local CRM on Top of OpenClaw
Hi everyone, I am Kumar, co-founder of Dench (<a href="https://denchclaw.com" rel="nofollow">https://denchclaw.com</a>). We were part of YC S24, an agentic workflow company that previously worked with sales floors automating niche enterprise tasks such as outbound calling, legal intake, etc.<p>Building consumer / power-user software always gave me more joy than FDEing into an enterprise. It did not give me joy to manually add AI tools to a cloud harness for every small new thing, at least not as much as completely local software that is open source and has all the powers of OpenClaw (I can now talk to my CRM on Telegram!).<p>A week ago, we launched Ironclaw, an Open Source OpenClaw CRM Framework (<a href="https://x.com/garrytan/status/2023518514120937672?s=20" rel="nofollow">https://x.com/garrytan/status/2023518514120937672?s=20</a>) but people confused us with NearAI’s Ironclaw, so we changed our name to DenchClaw (<a href="https://denchclaw.com" rel="nofollow">https://denchclaw.com</a>).<p>OpenClaw today feels like early React: the primitive is incredibly powerful, but the patterns are still forming, and everyone is piecing together their own way to actually use it. What made React explode was the emergence of frameworks like Gatsby and Next.js that turned raw capability into something opinionated, repeatable, and easy to adopt.<p>That is how we think about DenchClaw. We are trying to make it one of the clearest, most practical, and most complete ways to use OpenClaw in the real world.<p>Demo: <a href="https://www.youtube.com/watch?v=pfACTbc3Bh4#t=43" rel="nofollow">https://www.youtube.com/watch?v=pfACTbc3Bh4#t=43</a><p><pre><code> npx denchclaw
</code></pre>
I use DenchClaw daily for almost everything I do. It also works as a coding agent like Cursor - DenchClaw built DenchClaw. I am addicted now that I can ask it, “hey, in the companies table, only show me the ones who have more than 5 employees” and it updates the table live, rather than me having to add a filter manually.<p>In Dench, everything sits in a file system: the table filters, views, column toggles, calendar/gantt views, etc., so OpenClaw can work with it directly using Dench’s CRM skill.<p>The CRM is built on top of DuckDB, the smallest, most performant, and at the same time most feature-rich database we could find. Thank you, DuckDB team!<p>It creates a new OpenClaw profile called “dench” and opens a new OpenClaw Gateway. That means you can run all your usual OpenClaw commands by prefixing each with `openclaw --profile dench`. It starts your gateway in the port 19001 range, and you can access the DenchClaw frontend at localhost:3100. Once you open it in Safari, just add it to your Dock to use it as a PWA.<p>Think of it as Cursor for your Mac (it also works on Linux and Windows), based on OpenClaw. DenchClaw has a file tree view so you can use it as an elevated Finder to do anything on your Mac. I use it to create slides and do LinkedIn outreach using MY browser.<p>DenchClaw finds your Chrome profile and copies it fully into its own, so you won’t have to log in to all your websites again. DenchClaw sees what you see, does what you do. It’s an everything app that sits locally on your Mac.<p>Just ask it “hey, import my Notion” or “hey, import everything from my HubSpot,” and it will literally go into your browser, export all objects and documents, and put them in its own workspace for you to use.<p>We would love you all to break it, stress test its CRM capabilities and how it streams subagents for lead enrichment, and hook it into your Apollo, Gmail, Notion, and everything there is. Looking forward to comments/feedback!
Show HN: µJS, a 5KB alternative to Htmx and Turbo with zero dependencies
I built µJS because I wanted AJAX navigation without the verbosity of HTMX or the overhead of Turbo.<p>It intercepts links and form submissions, fetches pages via AJAX, and swaps fragments of the DOM. Single <script> tag, one call to `mu.init()`. No build step, no dependencies.<p>Key features: patch mode (update multiple fragments in one request), SSE support, DOM morphing via idiomorph, View Transitions, prefetch on hover, polling, and full HTTP verb support on any element.<p>At ~5KB gzipped, it's smaller than HTMX (16KB) and Turbo (25KB), and works with any backend: PHP, Python, Go, Ruby, whatever.<p>Playground: <a href="https://mujs.org/playground" rel="nofollow">https://mujs.org/playground</a><p>Comparison with HTMX and Turbo: <a href="https://mujs.org/comparison" rel="nofollow">https://mujs.org/comparison</a><p>About the project creation, why and when: <a href="https://mujs.org/about" rel="nofollow">https://mujs.org/about</a><p>GitHub: <a href="https://github.com/Digicreon/muJS" rel="nofollow">https://github.com/Digicreon/muJS</a><p>Happy to discuss the project.