The best Hacker News stories from Show from the past day

Latest posts:

Show HN: Agentic interface for mainframes and COBOL

Hi HN, we’re Sai and Aayush, and we’re building Hypercubic (<a href="https://www.hypercubic.ai/">https://www.hypercubic.ai/</a>), bringing AI tools to the mainframe and COBOL world. (We did a Launch HN last year: <a href="https://news.ycombinator.com/item?id=45877517">https://news.ycombinator.com/item?id=45877517</a>.) Today we’re launching Hopper, an agentic development environment for mainframes.<p>You can download it here: <a href="https://www.hypercubic.ai/hopper">https://www.hypercubic.ai/hopper</a>, and you can also request access and immediately get a mainframe user account to play with.<p>There's also a video runthrough at <a href="https://www.youtube.com/watch?v=q81L5DcfBvE" rel="nofollow">https://www.youtube.com/watch?v=q81L5DcfBvE</a>.<p>Mainframes still run a surprising amount of critical infrastructure: banking, payments, insurance, airlines, government programs, logistics, and core operations at large institutions. Many of these systems are decades old, but they continue to process enormous transaction volumes because they are reliable, secure, and deeply embedded into business operations.<p>A lot of that software is written in COBOL and runs on IBM z/OS. The development environment looks very different from modern cloud or Unix-style development. Instead of GitHub, shell commands, package managers, and CI pipelines, developers often work through TN3270 terminal sessions, ISPF panels, partitioned datasets, JCL, JES queues, spool output, return codes, VSAM files, CICS transactions, and shop-specific conventions.<p>TN3270 is the terminal interface used to interact with many IBM mainframe systems. ISPF is the menu and panel system developers use inside that terminal to browse datasets, edit source, submit jobs, and inspect output. 
It is powerful and reliable, but it was designed for expert humans navigating screens, function keys, and fixed-width workflows, not AI agents.<p>A simple COBOL change might require finding the right source member, checking copybooks, locating compile JCL, submitting a job, reading JES/SYSPRINT output, interpreting condition codes, patching fixed-width source, and resubmitting.<p>Much of this work is so well-defined and repetitive that it's a good fit for agentic AI. To get that working, however, a chatbot next to a terminal is not enough. The agent needs to operate inside the mainframe environment.<p>Hopper combines three things: (1) A real TN3270 terminal, (2) Mainframe-aware panels for datasets, members, jobs, and spool output, and (3) An AI agent that can operate across those z/OS surfaces.<p>For example, here is a tiny version of the kind of thing Hopper can help debug:<p><pre><code>  COBOL:
  IDENTIFICATION DIVISION.
  PROGRAM-ID. PAYCALC.
  DATA DIVISION.
  WORKING-STORAGE SECTION.
  01 CUSTOMER-BALANCE PIC 9(7)V99.
  PROCEDURE DIVISION.
      ADD 100.00 TO CUSTOMER-BALNCE
      DISPLAY "UPDATED BALANCE: " CUSTOMER-BALANCE
      STOP RUN.

  JCL:
  //PAYCOMP JOB (ACCT),'COMPILE',CLASS=A,MSGCLASS=X
  //COBOL EXEC IGYWCL
  //COBOL.SYSIN DD DSN=USER1.APP.COBOL(PAYCALC),DISP=SHR
  //LKED.SYSLMOD DD DSN=USER1.APP.LOAD(PAYCALC),DISP=SHR
</code></pre> A human would submit this job, inspect JES output, open `SYSPRINT`, find the undefined `CUSTOMER-BALNCE`, map it back to the source, patch the member, and resubmit. Hopper is designed to let an agent operate through that same loop autonomously.<p>Hopper is not trying to hide the mainframe behind a generic abstraction, and it's not a chatbot. 
The design principle is simple: preserve the fidelity of the mainframe environment, but make it accessible to AI agents.<p>Sensitive operations require approval, and the terminal remains visible at all times.<p>Once agents can operate inside the mainframe environment, new workflows become possible: faster job debugging, automated documentation, safer code changes, test generation, migration planning, traffic replay, and modernization verification.<p>We’re curious to hear your thoughts, especially from anyone who has worked with mainframes or COBOL, or has done legacy enterprise modernization.
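The "interpreting condition codes" step in the loop described above can be sketched roughly as follows. This is not Hopper's implementation. It assumes the conventional Enterprise COBOL compiler return codes (0 informational, 4 warning, 8 error, 12 severe, 16 unrecoverable); the `next_action` policy and its action names are hypothetical.

```python
# Conventional severity levels for Enterprise COBOL compiler return codes.
SEVERITY = {
    0: "informational",
    4: "warning",
    8: "error",
    12: "severe",
    16: "unrecoverable",
}


def classify(rc: int) -> str:
    """Map a return code to the highest severity level it reaches."""
    level = max((code for code in SEVERITY if code <= rc), default=0)
    return SEVERITY[level]


def next_action(rc: int) -> str:
    """Hypothetical policy for what an agent does after a compile step."""
    if rc <= 4:
        return "proceed"        # clean compile, or warnings only
    if rc <= 12:
        return "read_sysprint"  # find the diagnostic, patch the member, resubmit
    return "escalate"           # unrecoverable: hand back to a human


if __name__ == "__main__":
    for rc in (0, 4, 12, 16):
        print(rc, classify(rc), next_action(rc))
```

In the `PAYCALC` example, a compile that fails on the undefined `CUSTOMER-BALNCE` would be expected to land in the `read_sysprint` branch, which is where the agent maps the diagnostic back to the source member.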

Show HN: Statewright – Visual state machines that make AI agents reliable

Agentic problem solving in its current state is very brittle. I fell in love with it, but it creates as many problems as it solves.<p>I'm Ben Cochran. I spent 20+ years in the trenches of full-stack engineering, DevOps, high performance computing & ML, with stints at NVIDIA, AMD and various other organizations, most recently as a Distinguished Engineer.<p>For agents to work reliably you need either massive parameter counts or massive context windows to keep the solution spaces workable. Most people are brute-forcing reliability with bigger models and longer prompts.<p>What if I made the problem smaller instead of making the model bigger?<p>I took a different approach: I used smaller models, in the 13-20B parameter range, and set them to solving real SWE-bench problems. I constrained the tool and solution spaces using formal state machines. Each state in the machine defines which tools the model can access, how many iterations it gets and what transitions are valid. A planning state gets read-only tools. An implementation state gets edit tools (scoped to prevent mega edits) and write-friendly bash tools. The testing state gets bash, but only for testing commands. The model cannot physically skip steps or use the wrong tool at the wrong time. It is enforced via protocol, not via prompts.<p>The results were more promising than I would have expected. The improvements were consistent across multiple model families, irrespective of age (qwen-coder, gpt-oss, gemma4), above the 13B parameter inflection point. Below that, models can navigate the state machine but can't retain enough context to produce accurate edits. More on the research bit: <a href="https://statewright.ai/research" rel="nofollow">https://statewright.ai/research</a><p>Surprisingly, this yielded improvements in frontier models as well. Haiku and Sonnet start to punch above their weight, and Opus solves more reliably with fewer tokens and fewer death spirals. 
Fine-tuning did not yield these kinds of functional improvements for me. The takeaway, it seems, is that context window utilization matters more than raw context size: a tightly scoped working context at each step outperforms a model given carte blanche over everything. Constraining LLMs, which are non-idempotent, with deterministic code is a pattern that nobody is currently talking about.<p>So, I built Statewright. Its core is a Rust engine that evaluates state machine definitions: states, transitions, guards and tool restrictions. The orchestration doesn't use an LLM; it just enforces the state machine. On top of that is a plugin layer that integrates with Claude Code (and soon Codex, Cursor and others) via MCP. When you activate a workflow, hooks enforce the guardrails per state automatically. The model sees 5 tools available instead of dozens, gets clear instructions for the current phase and transitions when conditions are met. Importantly, it tells the model when it's attempting something that is out of scope or incorrect, or when it needs to try something else after getting stuck.<p>You can use your agent via MCP to build a state machine for you to solve a problem in your current context. The visual editor at statewright.ai lets you tweak these workflows in a graph view. You can clearly see the failure paths, the retry loops and the approval gates. State machines aren't DAGs; they loop and retry, which is what agentic work actually needs.<p>Statewright is currently live with a free tier; try it out in Claude Code by running the following:<p>/plugin marketplace add statewright/statewright<p>/plugin install statewright<p>/reload-plugins<p>Then "start the bugfix workflow" or /statewright start bugfix. You'll need to paste your API key when prompted. 
The latest versions of Claude may complain -- paste the API key again and say you really mean it; Claude is just being cautious here.<p>Feedback is welcome on the workflow editor and the plugin experience, and I'd like to hear which workflows you'd want to build first. Agents are suggestions, states are laws.
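The per-state tool restriction described in this post can be illustrated with a minimal sketch. This is not Statewright's engine (which the post says is written in Rust) and not its definition format; the `State` and `Workflow` classes, the tool names, and the iteration budgets are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    tools: frozenset        # tools the model may call in this state
    max_iterations: int     # tool-call budget for this state
    transitions: frozenset  # states reachable from this one


class Workflow:
    """Deterministic enforcement layer: no LLM involved, just rule checks."""

    def __init__(self, states: dict, start: str) -> None:
        self.states = states
        self.current = start
        self.iterations = 0

    def call_tool(self, tool: str) -> None:
        state = self.states[self.current]
        if tool not in state.tools:
            raise PermissionError(f"{tool!r} is not allowed in state {self.current!r}")
        if self.iterations >= state.max_iterations:
            raise RuntimeError(f"iteration budget exhausted in state {self.current!r}")
        self.iterations += 1

    def transition(self, target: str) -> None:
        if target not in self.states[self.current].transitions:
            raise ValueError(f"no transition {self.current!r} -> {target!r}")
        self.current = target
        self.iterations = 0  # each state gets a fresh budget


# Hypothetical bugfix workflow: plan (read-only) -> implement -> test -> done,
# with a retry loop from test back to implement.
BUGFIX = {
    "plan": State(frozenset({"read_file", "search"}), 10, frozenset({"implement"})),
    "implement": State(frozenset({"read_file", "edit_file"}), 5, frozenset({"test"})),
    "test": State(frozenset({"run_tests"}), 3, frozenset({"implement", "done"})),
    "done": State(frozenset(), 0, frozenset()),
}

if __name__ == "__main__":
    wf = Workflow(BUGFIX, "plan")
    wf.call_tool("read_file")      # allowed while planning
    try:
        wf.call_tool("edit_file")  # rejected: planning is read-only
    except PermissionError as err:
        print(err)
```

Because the `test` state can transition back to `implement`, the graph contains a cycle, which is the retry behaviour the post contrasts with DAGs.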

Show HN: A modern Music Player Daemon based on Rockbox firmware

Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model

Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.<p>We were always frustrated by how little effort goes into building agentic models that run on budget phones, so we investigated, which led to an observation: agentic experiences are built upon tool calling, and massive models are overkill for it. Tool calling is fundamentally retrieval-and-assembly (match query to tool name, extract argument values, emit JSON), not reasoning. Cross-attention is the right primitive for this, and FFN parameters are wasted at this scale.<p>Simple Attention Networks: the entire model is just attention and gating, no MLPs anywhere. Needle is an experimental run for single-shot function calling for consumer devices (phones, watches, glasses...).<p>Training:<p>- Pretrained on 200B tokens across 16 TPU v6e (27 hours)<p>- Post-trained on 2B tokens of synthesized function-calling data (45 minutes)<p>- Dataset synthesized via Gemini with 15 tool categories (timers, messaging, navigation, smart home, etc.)<p>You can test it right now and finetune on your Mac/PC: <a href="https://github.com/cactus-compute/needle" rel="nofollow">https://github.com/cactus-compute/needle</a><p>The full writeup on the architecture is here: <a href="https://github.com/cactus-compute/needle/blob/main/docs/simple_attention_networks.md" rel="nofollow">https://github.com/cactus-compute/needle/blob/main/docs/simp...</a><p>We found that the "no FFN" finding generalizes beyond function calling to any task where the model has access to external structured knowledge (RAG, tool use). The model doesn't need to memorize facts in FFN weights if the facts are provided in the input. 
Experimental results to be published.<p>While it beats FunctionGemma-270M, Qwen-0.6B, Granite-350M, LFM2.5-350M on single-shot function calling, those models have more scope/capacity and excel in conversational settings. We encourage you to test on your own tools via the playground and finetune accordingly.<p>This is part of our broader work on Cactus (<a href="https://github.com/cactus-compute/cactus" rel="nofollow">https://github.com/cactus-compute/cactus</a>), an inference engine built from scratch for mobile, wearables and custom hardware. We wrote about Cactus here previously: <a href="https://news.ycombinator.com/item?id=44524544">https://news.ycombinator.com/item?id=44524544</a><p>Everything is MIT licensed. Weights: <a href="https://huggingface.co/Cactus-Compute/needle" rel="nofollow">https://huggingface.co/Cactus-Compute/needle</a> GitHub: <a href="https://github.com/cactus-compute/needle" rel="nofollow">https://github.com/cactus-compute/needle</a>
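To make the "emit JSON" end of single-shot function calling concrete, here is a small sketch of how a caller might check a model's output against the tools it offered. The `{"name": ..., "arguments": ...}` shape, the `TOOLS` registry, and `parse_tool_call` are assumptions for illustration only; they are not Needle's actual output format or API.

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
TOOLS = {
    "set_timer": {"minutes": int},
    "send_message": {"recipient": str, "body": str},
}


def parse_tool_call(raw: str) -> tuple:
    """Parse a model's JSON output and validate it against the registry."""
    call = json.loads(raw)
    name, args = call["name"], call["arguments"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    schema = TOOLS[name]
    if set(args) != set(schema):
        raise ValueError(f"expected arguments {sorted(schema)}, got {sorted(args)}")
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{key} should be {expected.__name__}")
    return name, args


if __name__ == "__main__":
    # Query: "set a timer for five minutes" -> the model is expected to emit:
    raw = '{"name": "set_timer", "arguments": {"minutes": 5}}'
    print(parse_tool_call(raw))
```

A check like this is one way to score single-shot calls on your own tools: the output either names a registered tool with well-typed arguments or it does not.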

Show HN: OpenGravity – A zero-install, BYOK vanilla JS clone of Antigravity

Hi. I’m a high school student studying for my GCSEs. I was using Google Antigravity heavily for my side projects, but I kept hitting the usage limits and getting random "agent terminated" errors. So I decided to try building my own version of the IDE. I love the UI, so I copied it as accurately as possible, and then hooked up some logic into it, including the INCREDIBLY finicky WebContainer API.<p>I tried to keep it super lightweight, with no build steps or dependencies, and now that it's open source, I'm hoping people can build things on top of it that aren't possible with closed source tools, like complex custom agent workflows.<p>Some screenshots: - <a href="https://github.com/ab-613/OpenGravity/blob/main/examples/screenshot.png?raw=true" rel="nofollow">https://github.com/ab-613/OpenGravity/blob/main/examples/scr...</a> - <a href="https://github.com/ab-613/OpenGravity/blob/main/examples/html site example.png?raw=true" rel="nofollow">https://github.com/ab-613/OpenGravity/blob/main/examples/htm...</a><p>What it's made from:<p>- Pure Vanilla JS: no React, Vue, or build step. Built entirely in plain HTML/CSS/JS to keep it super lightweight.<p>- WebContainer API and xterm.js: Instead of faking a terminal, I (after much pain) hooked up the WebContainer API so the AI agent has a real, in-browser Linux environment to run shell commands, install dependencies, and edit local files.<p>- BYOK (Bring Your Own Key): API key ALWAYS stays in localStorage.<p>What's currently happening:<p>- It works, but it's an alpha. 
The AI can proactively get projects started and edit files, but because I built this over a few days before my exams, a lot of the UI dropdowns and buttons are currently just hardcoded placeholders.<p>- I’m open sourcing it early because I think the foundation of a Vanilla JS + WebContainer IDE is really strong, and I'd love to see where the community takes it while I'm doing my exams.<p>- Live demo: <a href="https://opengravity.pages.dev" rel="nofollow">https://opengravity.pages.dev</a> (Zoom out to 80% if not full screen. It will prompt for a Gemini API key on load). Start by uploading a folder, then you can fiddle with the terminal and agent, and see how it goes!<p>Would love to hear feedback on the code, the WebContainer integration, or how to improve the agent loop!

Show HN: adamsreview – better multi-agent PR reviews for Claude Code

I built adamsreview, a Claude Code plugin that runs deeper, multi-stage PR reviews using parallel sub-agents, validation passes, persistent JSON state, and optional ensemble review via Codex CLI and PR bot comments.<p>On my own PRs, it has been catching dramatically more real bugs than Claude’s built-in /review, /ultrareview, CodeRabbit, Greptile, and Codex’s built-in review, while producing fewer false positives.<p>adamsreview is six Claude Code slash commands packaged as a plugin: review, codex-review, add, promote, walkthrough, and fix. I modeled it after the built-in /review command and extended it meaningfully.<p>You can clear context between review stages because state is stored in JSON artifacts on disk, with built-in scripts for keeping it updated.<p>The walkthrough command uses Claude’s AskUserQuestion feature to walk you through uncertain findings or items needing human review one by one. Then, the fix command dispatches per-fix-group agents and re-reviews the work with Opus, reverting any regressions before committing survivors.<p>It runs against your regular Claude Code subscription (Max plan recommended), unlike /ultrareview, which charges against your Extra Usage pool.<p>I would love feedback from Claude Code users, pro devs, and anyone with strong opinions about AI code reviews.<p>Repo: <a href="https://github.com/adamjgmiller/adamsreview" rel="nofollow">https://github.com/adamjgmiller/adamsreview</a><p>Install: /plugin marketplace add adamjgmiller/adamsreview, /plugin install adamsreview@adamsreview

Show HN: TikTok but for Scientific Papers

Show HN: Modafinil - Let agents continue running while MacBook lid is closed

Show HN: Countries where you can leave your MacBook at a random coffee shop

Hi HN,<p>I wanted to know in which countries you can simply leave your laptop at a Starbucks, and in which you can't.<p>Feel free to click and vote.

Show HN: An index of indie web/blog indexes

I saw a comment here about how there are so many indexes of indie sites, blogs, etc., but no index of all the indexes. So I built one. It doesn't require a login, just go browse! I've curated about 30 so far, and there is a submission form for any I am missing.<p>Also happy to take UI improvements because I am not great in that area!

Show HN: Rust but Lisp

Show HN: Building a web server in assembly to give my life (a lack of) meaning

This is ymawky, a static file web server for macOS written entirely in ARM64 assembly. It supports GET, PUT, DELETE, HEAD, and OPTIONS requests, as well as Range: bytes=X-Y headers (which allow scrubbing for video streaming). It decodes percent-encoded URLs, strictly enforces docroot, serves custom error pages for any HTTP error response, supports directory listing, and has (some) mitigations against slowloris-like attacks.<p>I’ve also written a more detailed writeup here: <a href="https://imtomt.github.io/ymawky/" rel="nofollow">https://imtomt.github.io/ymawky/</a>
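For readers unfamiliar with the features listed above, here is a rough Python sketch of two of them: parsing a single `Range: bytes=X-Y` header and enforcing a docroot after percent-decoding. It is not ymawky's code (which is ARM64 assembly) and may differ from its behaviour in edge cases; `parse_range` and `resolve` are hypothetical names, and multi-range requests are deliberately not handled.

```python
import os
from typing import Optional, Tuple
from urllib.parse import unquote


def parse_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return the inclusive (start, end) byte range, or None if malformed
    or unsatisfiable. Handles a single range only."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        return None
    first, sep, last = spec.strip().partition("-")
    if not sep:
        return None
    try:
        if first == "":  # suffix form: bytes=-N means the last N bytes
            n = int(last)
            if n <= 0:
                return None
            start, end = max(size - n, 0), size - 1
        else:
            start = int(first)
            end = int(last) if last else size - 1  # open-ended: bytes=X-
    except ValueError:
        return None
    end = min(end, size - 1)  # clamp to the end of the file
    if start < 0 or start > end:
        return None
    return start, end


def resolve(docroot: str, url_path: str) -> Optional[str]:
    """Percent-decode a URL path and map it under docroot, or return None
    if the result would escape the docroot."""
    root = os.path.realpath(docroot)
    decoded = unquote(url_path)
    if "\x00" in decoded:
        return None
    candidate = os.path.realpath(os.path.join(root, decoded.lstrip("/")))
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate


if __name__ == "__main__":
    print(parse_range("bytes=0-99", 1000))         # (0, 99)
    print(parse_range("bytes=-100", 1000))         # (900, 999)
    print(resolve("/srv/www", "/%2e%2e/secret"))   # None
```

The docroot check runs after decoding and path resolution, so an encoded `..` such as `%2e%2e` is caught the same way as a literal one.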
