The best Hacker News stories from Show HN from the past day
Latest posts:
Show HN: AgentSwift – Open-source iOS builder agent
I'm working on a coding agent for building iOS apps. It's built on openspec and xcodebuildmcp, and it's free and open source.
Show HN: Drive any macOS app in the background without stealing the cursor
Hi HN, Francesco from Cua here.
I hacked this project together last weekend, inspired by the Codex Computer-Use release and lessons learned from deploying GUI-operating agents for our customers.

The main problem: when a UI automation process controls a desktop app today, it usually takes over the human’s session. Your cursor moves, keyboard focus gets stolen, windows jump to the front, and you have to stop working until the agent is done. That is why we have historically avoided encouraging users to run these processes directly on their host machine, relying instead on VMs or GUI containers for concurrency and background execution.

But computer-use - the tools we give agents to operate computers like humans - does not scale cleanly that way. As models get smarter, agents need to share hosts safely, run in the background, and avoid collisions with the human or other agents using the same machine.

We realized macOS has no first-class API for "drive this app without touching the cursor". CGEventPost routes through the hardware input stream, so it moves your cursor. CGEvent.postToPid avoids the cursor warp, but Chromium treats those events as untrusted and silently drops clicks at the renderer boundary. Activating the target app first raises the window and pulls focus, defeating the point of background execution.

Cua Driver is our attempt at a real fix: a background computer-use driver for macOS that lets an agent click, type, scroll, and read native apps while your cursor, frontmost app, and Space stay where they are. The default interface is a CLI, so it is easy to script or call from any coding agent shell.

Try it on macOS 14+:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"

The first internal use case was delegated demo recording. We ask Claude Code to drive an app while 'cua-driver recording start' captures the trajectory, screenshots, actions, and click markers. The result is an agent-generated product demo, inspired by Screen Studio.

Other things we have used it for:

- Replacing Vercel’s agent-browser and other browser-use CLIs. With Claude Code and Cua Driver, you do not need the Chrome DevTools Protocol at all.

- A dev-loop QA agent that reproduces a visual bug, edits code, rebuilds, and verifies the UI while my editor stays frontmost.

- Personal-assistant flows that use iMessage from Claude Code, Hermes, or other general-purpose agent CLIs.

- Pulling visual context from Chrome, Figma, Preview, or YouTube windows I am not looking at, without relying on their APIs.

What made this harder than expected:

- CGEventPost warps the cursor because it goes through the HID stream.

- CGEvent.postToPid does not warp the cursor, but Chromium drops it at the renderer IPC boundary.

- Activating the target first raises the window and can drag you across Spaces.

- Electron apps stop keeping useful AX trees alive when windows are occluded, absent a private remote-aware SPI.

The unlock was SkyLight. SLEventPostToPid is a sibling of the public per-PID call, but it travels through a WindowServer channel Chromium accepts as trusted. Pair it with yabai’s focus-without-raise pattern, plus an off-screen primer click at (-1, -1), and the click lands without the window ever raising.

One thing we learned: the right addressing mode depends on the app. Native macOS apps usually have rich AX trees, Chromium-family apps often need a hybrid of AX and screenshots, and apps like Blender or CAD tools may expose almost no useful AX surface. The mistake is defaulting to pixels everywhere - or defaulting to AX everywhere.

Long technical writeup: https://github.com/trycua/cua/blob/main/blog/inside-macos-window-internals.md

I would like feedback from people building Mac automation, agent harnesses, or accessibility tooling. If it breaks on a macOS app you care about, that is useful data for us.
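For reference, the two public posting paths described above can be sketched in a few lines. This is illustrative only, assuming pyobjc's Quartz bindings (which expose CGEventPost and CGEventPostToPid); it is not Cua Driver's code, and the private SkyLight SLEventPostToPid path is deliberately not shown:

```python
# Illustrative sketch, not Cua Driver's implementation.
# Requires: pip install pyobjc-framework-Quartz
import Quartz

def click_at(x, y, pid=None):
    """Post a left click at (x, y) via one of the two public paths."""
    for event_type in (Quartz.kCGEventLeftMouseDown, Quartz.kCGEventLeftMouseUp):
        event = Quartz.CGEventCreateMouseEvent(
            None, event_type, (x, y), Quartz.kCGMouseButtonLeft)
        if pid is None:
            # HID stream: treated as real hardware input, so the visible
            # cursor warps to (x, y) - exactly the takeover described above.
            Quartz.CGEventPost(Quartz.kCGHIDEventTap, event)
        else:
            # Per-process delivery: no cursor warp, but Chromium-family apps
            # treat these events as untrusted and drop the click.
            Quartz.CGEventPostToPid(pid, event)
```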
Show HN: Live Sun and Moon Dashboard with NASA Footage
Show HN: Startup Equity Adventure Game
I put this together (with Claude) as a semi-gamified way for folks to learn about startup equity. Take a look, and share your scorecard :)
Show HN: A free ESG stock screener that publishes its losses and methodology
Hey HN, JSS (JumpstartSignal) is a free, ESG-filtered daily stock screener. I built it after some really badly-timed quantum computing stock buys, which made me want to learn more about systematic, longer-horizon approaches and the underlying technicals instead of chasing themes. Three things about it that might be of interest:

1. The methodology is fully documented at https://jumpstartsignal.com/how-it-works/: a 5-stage pipeline, 54 signals tested individually plus 1,836 combinations evaluated, and walk-forward validation across 25 hold periods. Nothing is hand-tuned to a single backtest window.

2. Many wins, misses, and losses are published as case studies. For example, https://jumpstartsignal.com/case-studies/nvda/ walks through the 32 times the system flagged NVDA starting at $5.44 in 2018, https://jumpstartsignal.com/case-studies/sedg/ shows a -49% loss, and https://jumpstartsignal.com/case-studies/tsla/ explains why the system never flagged Tesla (it passed Stages 1 and 2 on 207 days but only peaked at 20/100 in scoring vs the 70 needed for the OPPORTUNITY tier). https://jumpstartsignal.com/results/ also shows the 10 best entries alongside the 10 worst.

3. A genetic algorithm picked the signal weights, but constrained to maintain alpha across multiple market regimes (otherwise it overfits to a single bull market). The constraint dropped some "best in backtest" configurations that only worked 2018-2021.

Topline: a 2012-2025 backtest at the SPOTLIGHT + OPPORTUNITY tiers produced +163% alpha vs SPY (the results page has the per-trade breakdown).

The daily watchlist is emailed free; reports, results, and case studies are publicly browsable without signup.

Happy to take questions about methodology, what the system gets wrong, or why specific tickers landed where they did.
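As a rough illustration of the walk-forward idea above, here is a hypothetical sketch; it is not JSS's pipeline, and fit and score stand in for whatever weight-fitting and alpha-scoring the system actually uses:

```python
# Hypothetical walk-forward sketch, not JumpstartSignal's actual pipeline.
import numpy as np

def walk_forward(signals, returns, train_len, test_len, fit, score):
    """signals: (days, n_signals) array; returns: (days,) array.
    fit(X, y) -> weights; score(X, y, w) -> out-of-sample alpha."""
    alphas = []
    start = 0
    while start + train_len + test_len <= len(returns):
        tr = slice(start, start + train_len)
        te = slice(tr.stop, tr.stop + test_len)
        w = fit(signals[tr], returns[tr])                  # fit on past data only
        alphas.append(score(signals[te], returns[te], w))  # judge on unseen data
        start += test_len                                  # roll the window forward
    # Requiring positive alpha in every window is one way to encode the
    # "must hold across regimes" constraint mentioned above.
    return np.array(alphas)
```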
Show HN: The Unix Magic poster, annotated (updated)
This is a site that maps the references on Gary Overacre's 1980s UNIX Magic poster to short write-ups with sources. I posted an earlier version about a year ago [1]. Since then I rewrote some of the annotations, added deep-linking to individual markers and a frame/sidebar view, gave the site a terminal-style redesign, and fixed historical inaccuracies (daemon etymology, nroff origin, B language vs. Multics, etc.).

Contributions and comments welcome; each marker is a GitHub issue.

site: https://unixmagic.net

[1] https://news.ycombinator.com/item?id=43019136
Show HN: Tiao, a two-player turn-based board game
Hi HN,

I built this digital version of Tiao, a two-player turn-based strategy board game. Think Checkers meets Go. It's free, runs in the browser, and has multiplayer, AI, an over-the-board mode, and a lot of other neat things. The source is on GitHub (AGPL).

The game was originally designed by my friend Andreas Edmeier. He created the rules and has been playtesting and refining the game design for years. I built the website for it, the core in about two weeks, using TypeScript, Next.js, Express, WebSockets, and MongoDB. It's fully dockerized, deployed on a Hetzner VPS with Coolify, with authentication via better-auth, real-time gameplay, Elo matchmaking, OpenPanel analytics, and a fully functional achievements system.

Play it: https://playtiao.com
Source: https://github.com/trebeljahr/tiao

Happy to answer questions about the tech, the game design, or anything else.

My hope is that more people will play this game because I think it is genuinely fun, and it would be cool to one day see people play it on a Go board or on their phones/computers.

Have a good one.
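For readers curious what Elo matchmaking involves, a standard rating update looks like the sketch below (this is the textbook formula, not necessarily Tiao's exact implementation or K-factor):

```python
def elo_update(r_a, r_b, score_a, k=32):
    """Standard Elo update. score_a: 1.0 win, 0.5 draw, 0.0 loss for player A."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# A 1200-rated player beating a 1400-rated player gains about 24 points:
# elo_update(1200, 1400, 1.0) -> (~1224.3, ~1375.7)
```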
Show HN: Utilyze – an open source GPU monitoring tool more accurate than nvtop
The standard GPU utilization metric reported by nvidia-smi, nvtop, Weights & Biases, Amazon CloudWatch, Google Cloud Monitoring, and Azure Monitor is highly misleading. It reports the fraction of time that any kernel is running on the GPU, which means a GPU can report 100% utilization even if only a small portion of its compute capacity is actually being used. In practice, we've seen workloads with ~1–10% real compute throughput while dashboards show 100%.

This becomes a problem when teams rely on that metric for capacity planning or optimization decisions: it can make underutilized systems look saturated.

We're releasing an open-source (Apache 2.0) tool, Utilyze, to measure GPU utilization differently. It samples hardware performance counters and reports compute and memory throughput relative to the hardware's theoretical limits. It also estimates an attainable utilization ceiling for a given workload.

GitHub link: https://github.com/systalyze/utilyze

We'd love to hear your thoughts!
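The gap is easy to see with back-of-the-envelope numbers. The values below are invented for illustration (the A100 peak figure is the published FP16 tensor-core spec); they are not Utilyze output:

```python
# Illustrative numbers only - the point is the definition gap, not the values.

# Time-based "utilization" (nvidia-smi, nvtop, cloud monitors): fraction of
# the sample window in which at least one kernel was resident on the GPU.
kernel_busy_s, window_s = 0.99, 1.0
time_based_util = kernel_busy_s / window_s            # -> 0.99 (99%)

# Counter-based utilization: achieved throughput vs the hardware's
# theoretical peak (the same idea applies to memory bandwidth).
achieved_tflops = 6.0      # hypothetical value read from perf counters
peak_tflops = 312.0        # e.g. A100 FP16 tensor-core peak
throughput_util = achieved_tflops / peak_tflops       # -> ~0.019 (~2%)

# The same workload reports 99% "utilization" while doing ~2% of the work
# the GPU could do - the discrepancy Utilyze is built to expose.
```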
Show HN: A terminal spreadsheet editor with Vim keybindings
While speccing out this spreadsheet tool, I realized that I never had to think about the keybindings. It all just came naturally from Vim: normal/insert/visual modes, hjkl navigation, dd/yy/p, :w, :q. The usual muscle memory works.

It supports CSV/TSV import and export, and a native .cell format that preserves formulas. The formula engine handles SUM, AVERAGE, COUNT, MIN, MAX, and IF with range references.

The codebase is a Cargo workspace: a pure cell-sheet-core library (no TUI dependency) and a cell-sheet-tui crate on top of ratatui. Early days, but it's usable.

To try it out:

cargo install cell-sheet-tui

Feedback of any kind is greatly appreciated!
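To make "range references" concrete, here is a toy sketch of how a reference like A1:B2 can expand to cells for SUM and friends (Python for brevity; cell-sheet's actual engine is Rust, and this is not its code):

```python
# Toy sketch of range expansion, not cell-sheet-core's implementation.
import re
from itertools import product

def expand_range(ref):
    """'A1:B2' -> ['A1', 'A2', 'B1', 'B2'] (single-letter columns only)."""
    (c1, r1), (c2, r2) = re.findall(r"([A-Z]+)(\d+)", ref)
    cols = [chr(c) for c in range(ord(c1), ord(c2) + 1)]
    rows = range(int(r1), int(r2) + 1)
    return [f"{c}{r}" for c, r in product(cols, rows)]

def eval_sum(ref, sheet):
    """SUM over a range; sheet maps cell names to numbers, blanks are 0."""
    return sum(sheet.get(cell, 0) for cell in expand_range(ref))

# eval_sum("A1:B2", {"A1": 1, "A2": 2, "B1": 3}) -> 6
```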
Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview
Scored 65.2% vs Google's official 47.8%, and the top existing closed-source agent, Junie CLI, at 64.3%.

Since there have been a lot of reports of deliberate cheating on TerminalBench 2.0 lately (https://debugml.github.io/cheating-agents/), I would like to clarify a few things:

1. Absolutely no {agents/skills}.md files were inserted at any point. No cheating mechanisms whatsoever.

2. The CLI agent was run in a leaderboard-compliant way (no modification of resources or timeouts).

3. The full TerminalBench run was done using the fully open-source version of the agent; there is no difference between what is on GitHub and what was run.

I was originally going to wait for it to land on the leaderboard, but it has been 8 days and unfortunately the maintainers have not responded (there is a large backlog of pull requests on their HF), so I decided to post anyway.

HF PR: https://huggingface.co/datasets/harborframework/terminal-bench-2-leaderboard/discussions/145

It is astounding how much the harness matters, based on this and other experiments I have done.
Show HN: AI memory with biological decay (52% recall)
Most RAG setups fail because they treat memory like a static filing cabinet. When every transient bug fix or abandoned rule is stored forever, the context window eventually chokes on noise, spiking token costs and degrading the agent's reasoning.

This implementation experiments with a biological approach, using the Ebbinghaus forgetting curve to manage context as a living substrate. Memories are assigned a "strength" score where each recall reinforces the data and flattens its decay curve (spaced repetition), while unused data eventually hits a threshold and is pruned.

To solve the "logical neighbor" problem, where semantic search misses relevant but non-similar nodes, a graph layer sits on top of the vector store. Benchmarked against the LoCoMo dataset, this reached 52% Recall@5, nearly double the accuracy of stateless vector stores, while cutting token waste by roughly 84%.

Built as a local-first MCP server using DuckDB, the hypothesis is that for agents handling long-running projects, "what to forget" is just as critical as "what to remember." I'd be interested to hear if others are exploring non-linear decay or similar biological constraints for context management.

GitHub: https://github.com/sachitrafa/cognitive-ai-memory
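The decay mechanics described above reduce to a few lines. This is a minimal sketch of the general idea with illustrative constants, not the project's actual parameters or storage schema:

```python
# Minimal sketch of Ebbinghaus-style decay with recall reinforcement.
# Constants (stability growth, prune threshold) are illustrative only.
import math
import time

class Memory:
    def __init__(self, content, stability=1.0):
        self.content = content
        self.stability = stability          # S: larger = slower decay
        self.last_access = time.time()

    def retention(self, now=None):
        """Ebbinghaus curve R = exp(-t / S), t in days since last access."""
        t_days = ((now or time.time()) - self.last_access) / 86400
        return math.exp(-t_days / self.stability)

    def recall(self):
        """Accessing a memory reinforces it: reset t, grow S (spaced repetition)."""
        self.last_access = time.time()
        self.stability *= 1.5               # illustrative growth factor
        return self.content

def prune(memories, threshold=0.05):
    """Drop memories whose retention has decayed below the threshold."""
    return [m for m in memories if m.retention() >= threshold]
```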
Show HN: Free textbook on engineering thermodynamics
Author here. Feel free to send questions of any kind.