The best Hacker News stories from Show from the past day

Go back

Latest posts:

Show HN: Open-source browser for AI agents

Hi HN, I forked chromium and built agent-browser-protocol (ABP) after noticing that most browser-agent failures aren’t really about the model misunderstanding the page. Instead, the problem is that the model is reasoning from a stale state.<p>ABP is designed to keep the acting agent synchronized with the browser at every step. After each action (click, type, etc), it freezes JavaScript execution and rendering, then captures the resulting state. It also compiles the notable events that occurred during that action loop, such as navigation, file pickers, permission prompts, alerts, and downloads, and sends that along with a screenshot of the frozen page state back to the agent.<p>The result is that browser interaction starts to feel more like a multimodal chat loop. The agent takes an action, gets back a fresh visual state and a structured summary of what happened, then decides what to do next from there. That fits much better with how LLMs already work.<p>A few common browser-use failures ABP helps eliminate: * A modal appears after the last Playwright screenshot and blocks the input the agent was about to use * Dynamic filters cause the page to reflow between steps * An autocomplete dropdown opens and covers the element the agent intended to click * alert() / confirm() interrupts the flow * Downloads are triggered, but the agent has no reliable way to know when they’ve completed<p>As proof, ABP with opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark. I think modern LLMs already understand websites, they just need a better tool to interact with them. Happy to answer questions about the architecture, forking chrome or anything else in the comments below.<p>Try it out: `claude mcp add browser -- npx -y agent-browser-protocol --mcp` (Codex/OpenCode instructions in the docs)<p>Demo video: <a href="https://www.loom.com/share/387f6349196f417d8b4b16a5452c3369" rel="nofollow">https://www.loom.com/share/387f6349196f417d8b4b16a5452c3369</a>

Show HN: Open-source browser for AI agents

Hi HN, I forked chromium and built agent-browser-protocol (ABP) after noticing that most browser-agent failures aren’t really about the model misunderstanding the page. Instead, the problem is that the model is reasoning from a stale state.<p>ABP is designed to keep the acting agent synchronized with the browser at every step. After each action (click, type, etc), it freezes JavaScript execution and rendering, then captures the resulting state. It also compiles the notable events that occurred during that action loop, such as navigation, file pickers, permission prompts, alerts, and downloads, and sends that along with a screenshot of the frozen page state back to the agent.<p>The result is that browser interaction starts to feel more like a multimodal chat loop. The agent takes an action, gets back a fresh visual state and a structured summary of what happened, then decides what to do next from there. That fits much better with how LLMs already work.<p>A few common browser-use failures ABP helps eliminate: * A modal appears after the last Playwright screenshot and blocks the input the agent was about to use * Dynamic filters cause the page to reflow between steps * An autocomplete dropdown opens and covers the element the agent intended to click * alert() / confirm() interrupts the flow * Downloads are triggered, but the agent has no reliable way to know when they’ve completed<p>As proof, ABP with opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark. I think modern LLMs already understand websites, they just need a better tool to interact with them. Happy to answer questions about the architecture, forking chrome or anything else in the comments below.<p>Try it out: `claude mcp add browser -- npx -y agent-browser-protocol --mcp` (Codex/OpenCode instructions in the docs)<p>Demo video: <a href="https://www.loom.com/share/387f6349196f417d8b4b16a5452c3369" rel="nofollow">https://www.loom.com/share/387f6349196f417d8b4b16a5452c3369</a>

Show HN: Klaus – OpenClaw on a VM, batteries included

We are Bailey and Robbie and we are working on Klaus (<a href="https://klausai.com/" rel="nofollow">https://klausai.com/</a>): hosted OpenClaw that is secure and powerful out of the box.<p>Running OpenClaw requires setting up a cloud VM or local container (a pain) or giving OpenClaw root access to your machine (insecure). Many basic integrations (eg Slack, Google Workspace) require you to create your own OAuth app.<p>We make running OpenClaw simple by giving each user their own EC2 instance, preconfigured with keys for OpenRouter, AgentMail, and Orthogonal. And we have OAuth apps to make it easy to integrate with Slack and Google Workspace.<p>We are both HN readers (Bailey has been on here for ~10 years) and we know OpenClaw has serious security concerns. We do a lot to make our users’ instances more secure: we run on a private subnet, automatically update the OpenClaw version our users run, and because you’re on our VM by default the only keys you leak if you get hacked belong to us. Connecting your email is still a risk. The best defense I know of is Opus 4.6 for resilience to prompt injection. If you have a better solution, we’d love to hear it!<p>We learned a lot about infrastructure management in the past month. Kimi K2.5 and Mimimax M2.5 are extremely good at hallucinating new ways to break openclaw.json and otherwise wreaking havoc on an EC2 instance. The week after our launch we spent 20+ hours fixing broken machines by hand.<p>We wrote a ton of best practices on using OpenClaw on AWS Linux into our users’ AGENTS.md, got really good at un-bricking EC2 machines over SSM, added a command-and-control server to every instance to facilitate hotfixes and migrations, and set up a Klaus instance to answer FAQs on discord.<p>In addition to all of this, we built ClawBert, our AI SRE for hotfixing OpenClaw instances automatically: <a href="https://www.youtube.com/watch?v=v65F6VBXqKY" rel="nofollow">https://www.youtube.com/watch?v=v65F6VBXqKY</a>. Clawbert is a Claude Code instance that runs whenever a health check fails or the user triggers it in the UI. It can read that user’s entries in our database and execute commands on the user’s instance. We expose a log of Clawbert’s runs to the user.<p>We know that setting up OpenClaw is easy for most HN readers, but I promise it is not for most people. Klaus has a long way to go, but it’s still very rewarding to see people who’ve never used Claude Code get their first taste of AI agents.<p>We charge $19/m for a t4g.small, $49/m for a t4g.medium, and $200/m for a t4g.xlarge and priority support. You get $15 in tokens and $20 in Orthogonal credits one-time.<p>We want to know what you are building on OpenClaw so we can make sure we support it. We are already working with companies like Orthogonal and Openrouter that are building things to make agents more useful, and we’re sure there are more tools out there we don’t know about. If you’ve built something agents want, please let us know. Comments welcome!

Show HN: Klaus – OpenClaw on a VM, batteries included

We are Bailey and Robbie and we are working on Klaus (<a href="https://klausai.com/" rel="nofollow">https://klausai.com/</a>): hosted OpenClaw that is secure and powerful out of the box.<p>Running OpenClaw requires setting up a cloud VM or local container (a pain) or giving OpenClaw root access to your machine (insecure). Many basic integrations (eg Slack, Google Workspace) require you to create your own OAuth app.<p>We make running OpenClaw simple by giving each user their own EC2 instance, preconfigured with keys for OpenRouter, AgentMail, and Orthogonal. And we have OAuth apps to make it easy to integrate with Slack and Google Workspace.<p>We are both HN readers (Bailey has been on here for ~10 years) and we know OpenClaw has serious security concerns. We do a lot to make our users’ instances more secure: we run on a private subnet, automatically update the OpenClaw version our users run, and because you’re on our VM by default the only keys you leak if you get hacked belong to us. Connecting your email is still a risk. The best defense I know of is Opus 4.6 for resilience to prompt injection. If you have a better solution, we’d love to hear it!<p>We learned a lot about infrastructure management in the past month. Kimi K2.5 and Mimimax M2.5 are extremely good at hallucinating new ways to break openclaw.json and otherwise wreaking havoc on an EC2 instance. The week after our launch we spent 20+ hours fixing broken machines by hand.<p>We wrote a ton of best practices on using OpenClaw on AWS Linux into our users’ AGENTS.md, got really good at un-bricking EC2 machines over SSM, added a command-and-control server to every instance to facilitate hotfixes and migrations, and set up a Klaus instance to answer FAQs on discord.<p>In addition to all of this, we built ClawBert, our AI SRE for hotfixing OpenClaw instances automatically: <a href="https://www.youtube.com/watch?v=v65F6VBXqKY" rel="nofollow">https://www.youtube.com/watch?v=v65F6VBXqKY</a>. Clawbert is a Claude Code instance that runs whenever a health check fails or the user triggers it in the UI. It can read that user’s entries in our database and execute commands on the user’s instance. We expose a log of Clawbert’s runs to the user.<p>We know that setting up OpenClaw is easy for most HN readers, but I promise it is not for most people. Klaus has a long way to go, but it’s still very rewarding to see people who’ve never used Claude Code get their first taste of AI agents.<p>We charge $19/m for a t4g.small, $49/m for a t4g.medium, and $200/m for a t4g.xlarge and priority support. You get $15 in tokens and $20 in Orthogonal credits one-time.<p>We want to know what you are building on OpenClaw so we can make sure we support it. We are already working with companies like Orthogonal and Openrouter that are building things to make agents more useful, and we’re sure there are more tools out there we don’t know about. If you’ve built something agents want, please let us know. Comments welcome!

Show HN: I built a tool that watches webpages and exposes changes as RSS

I built Site Spy after missing a visa appointment slot because a government page changed and I didn’t notice for two weeks.<p>It watches webpages for changes and shows the result like a diff. The part I think HN might find interesting is that it can monitor a specific element on a page, not just the whole page, and it can expose changes as RSS feeds.<p>So instead of tracking an entire noisy page, you can watch just a price, a stock status, a headline, or a specific content block. When it changes, you can inspect the diff, browse the snapshot history, or follow the updates in an RSS reader.<p>It’s a Chrome/Firefox extension plus a web dashboard.<p>Main features:<p>- Element picker for tracking a specific part of a page<p>- Diff view plus full snapshot timeline<p>- RSS feeds per watch, per tag, or across all watches<p>- MCP server for Claude, Cursor, and other AI agents<p>- Browser push, Email, and Telegram notifications<p>Chrome: <a href="https://chromewebstore.google.com/detail/site-spy/jeapcpanagdgipcfnncmogeojgfofige" rel="nofollow">https://chromewebstore.google.com/detail/site-spy/jeapcpanag...</a><p>Firefox: <a href="https://addons.mozilla.org/en-GB/firefox/addon/site-spy/" rel="nofollow">https://addons.mozilla.org/en-GB/firefox/addon/site-spy/</a><p>Docs: <a href="https://docs.sitespy.app" rel="nofollow">https://docs.sitespy.app</a><p>I’d especially love feedback on two things:<p>- Is RSS actually a useful interface for this, or do most people just want direct alerts?<p>- Does element-level tracking feel meaningfully better than full-page monitoring?

Show HN: I built a tool that watches webpages and exposes changes as RSS

I built Site Spy after missing a visa appointment slot because a government page changed and I didn’t notice for two weeks.<p>It watches webpages for changes and shows the result like a diff. The part I think HN might find interesting is that it can monitor a specific element on a page, not just the whole page, and it can expose changes as RSS feeds.<p>So instead of tracking an entire noisy page, you can watch just a price, a stock status, a headline, or a specific content block. When it changes, you can inspect the diff, browse the snapshot history, or follow the updates in an RSS reader.<p>It’s a Chrome/Firefox extension plus a web dashboard.<p>Main features:<p>- Element picker for tracking a specific part of a page<p>- Diff view plus full snapshot timeline<p>- RSS feeds per watch, per tag, or across all watches<p>- MCP server for Claude, Cursor, and other AI agents<p>- Browser push, Email, and Telegram notifications<p>Chrome: <a href="https://chromewebstore.google.com/detail/site-spy/jeapcpanagdgipcfnncmogeojgfofige" rel="nofollow">https://chromewebstore.google.com/detail/site-spy/jeapcpanag...</a><p>Firefox: <a href="https://addons.mozilla.org/en-GB/firefox/addon/site-spy/" rel="nofollow">https://addons.mozilla.org/en-GB/firefox/addon/site-spy/</a><p>Docs: <a href="https://docs.sitespy.app" rel="nofollow">https://docs.sitespy.app</a><p>I’d especially love feedback on two things:<p>- Is RSS actually a useful interface for this, or do most people just want direct alerts?<p>- Does element-level tracking feel meaningfully better than full-page monitoring?

Show HN: What's my JND? – a colour guessing game

<a href="https://www.keithcirkel.co.uk/too-much-color/" rel="nofollow">https://www.keithcirkel.co.uk/too-much-color/</a>

Show HN: What's my JND? – a colour guessing game

<a href="https://www.keithcirkel.co.uk/too-much-color/" rel="nofollow">https://www.keithcirkel.co.uk/too-much-color/</a>

Show HN: I Was Here – Draw on street view, others can find your drawings

Hey HN, I made a site where you can draw on street-level panoramas. Your drawings persist and other people can see them in real time.<p>Strokes get projected onto the 3D panorama so they wrap around buildings and follow the geometry, not just a flat overlay. Uses WebGL2 for rendering, Mapillary for the street imagery.<p>The idea is for it to become a global canvas, anyone can leave a mark anywhere and others stumble onto it.

Show HN: I Was Here – Draw on street view, others can find your drawings

Hey HN, I made a site where you can draw on street-level panoramas. Your drawings persist and other people can see them in real time.<p>Strokes get projected onto the 3D panorama so they wrap around buildings and follow the geometry, not just a flat overlay. Uses WebGL2 for rendering, Mapillary for the street imagery.<p>The idea is for it to become a global canvas, anyone can leave a mark anywhere and others stumble onto it.

Show HN: Remotely use my guitar tuner

Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon

Hi HN, we're Sanchit and Shubham (YC W26). We built a fast inference engine for Apple Silicon. LLMs, speech-to-text, text-to-speech – MetalRT beats llama.cpp, Apple's MLX, Ollama, and sherpa-onnx on every modality we tested. Custom Metal shaders, no framework overhead.<p>Also, we've open-sourced RCLI, the fastest end-to-end voice AI pipeline on Apple Silicon. Mic to spoken response, entirely on-device. No cloud, no API keys.<p>To get started:<p><pre><code> brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git brew install rcli rcli setup # downloads ~1 GB of models rcli # interactive mode with push-to-talk </code></pre> Or:<p><pre><code> curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash </code></pre> The numbers (M4 Max, 64 GB, reproducible via `rcli bench`):<p>LLM decode – 1.67x faster than llama.cpp, 1.19x faster than Apple MLX (same model files): - Qwen3-0.6B: 658 tok/s (vs mlx-lm 552, llama.cpp 295) - Qwen3-4B: 186 tok/s (vs mlx-lm 170, llama.cpp 87) - LFM2.5-1.2B: 570 tok/s (vs mlx-lm 509, llama.cpp 372) - Time-to-first-token: 6.6 ms<p>STT – 70 seconds of audio transcribed in *101 ms*. That's 714x real-time. 4.6x faster than mlx-whisper.<p>TTS – 178 ms synthesis. 2.8x faster than mlx-audio and sherpa-onnx.<p>We built this because demoing on-device AI is easy but shipping it is brutal. Voice is the hardest test: you're chaining STT, LLM, and TTS sequentially, and if any stage is slow, the user feels it. Most teams fall back to cloud APIs not because local models are bad, but because local inference infrastructure is.<p>The thing that's hard to solve is latency compounding. In a voice pipeline, you're stacking three models in sequence. If each adds 200ms, you're at 600ms before the user hears a word, and that feels broken. You can't optimize one stage and call it done. Every stage needs to be fast, on one device, with no network round-trip to hide behind.<p>We went straight to Metal. Custom GPU compute shaders, all memory pre-allocated at init (zero allocations during inference), and one unified engine for all three modalities instead of stitching separate runtimes together.<p>MetalRT is the first engine to handle all three modalities natively on Apple Silicon. Full methodology:<p>LLM benchmarks: <a href="https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-engine-apple-silicon">https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...</a><p>Speech benchmarks: <a href="https://www.runanywhere.ai/blog/metalrt-speech-fastest-stt-tts-apple-silicon">https://www.runanywhere.ai/blog/metalrt-speech-fastest-stt-t...</a><p>How: Most inference engines add layers between you and the GPU: graph schedulers, runtime dispatchers, memory managers. MetalRT skips all of it. Custom Metal compute shaders for quantized matmul, attention, and activation - compiled ahead of time, dispatched directly.<p>Voice Pipeline optimizations details: <a href="https://www.runanywhere.ai/blog/fastvoice-on-device-voice-ai-pipeline-apple-silicon">https://www.runanywhere.ai/blog/fastvoice-on-device-voice-ai...</a> RAG optimizations: <a href="https://www.runanywhere.ai/blog/fastvoice-rag-on-device-retrieval-augmented-voice-ai">https://www.runanywhere.ai/blog/fastvoice-rag-on-device-retr...</a><p>RCLI is the open-source voice pipeline (MIT) built on MetalRT: three concurrent threads with lock-free ring buffers, double-buffered TTS, 38 macOS actions by voice, local RAG (~4 ms over 5K+ chunks), 20 hot-swappable models, and a full-screen TUI with per-op latency readouts. Falls back to llama.cpp when MetalRT isn't installed.<p>Source: <a href="https://github.com/RunanywhereAI/RCLI" rel="nofollow">https://github.com/RunanywhereAI/RCLI</a> (MIT)<p>Demo: <a href="https://www.youtube.com/watch?v=eTYwkgNoaKg" rel="nofollow">https://www.youtube.com/watch?v=eTYwkgNoaKg</a><p>What would you build if on-device AI were genuinely as fast as cloud?

Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs

I found that duplicating a specific block of 7 middle layers in Qwen2-72B, without modifying any weights, improved performance across all Open LLM Leaderboard benchmarks and took #1. As of 2026, the top 4 models on that leaderboard are still descendants.<p>The weird finding: single-layer duplication does nothing. Too few layers, nothing. Too many, it gets worse. Only circuit-sized blocks of ~7 layers work. This suggests pretraining carves out discrete functional circuits in the layer stack that only work when preserved whole.<p>The whole thing was developed on 2x RTX 4090s in my basement. I'm now running current models (GLM-4.7, Qwen3.5, MiniMax M2.5) on a dual GH200 rig (see my other post). Code and new models coming soon.<p>Happy to answer questions.

Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs

I found that duplicating a specific block of 7 middle layers in Qwen2-72B, without modifying any weights, improved performance across all Open LLM Leaderboard benchmarks and took #1. As of 2026, the top 4 models on that leaderboard are still descendants.<p>The weird finding: single-layer duplication does nothing. Too few layers, nothing. Too many, it gets worse. Only circuit-sized blocks of ~7 layers work. This suggests pretraining carves out discrete functional circuits in the layer stack that only work when preserved whole.<p>The whole thing was developed on 2x RTX 4090s in my basement. I'm now running current models (GLM-4.7, Qwen3.5, MiniMax M2.5) on a dual GH200 rig (see my other post). Code and new models coming soon.<p>Happy to answer questions.

Show HN: I built a site where strangers leave kind voice notes for each other

Show HN: VS Code Agent Kanban: Task Management for the AI-Assisted Developer

Agent Kanban has 4 main features:<p>GitOps & team friendly kanban board integration inside VS Code Structured plan / todo / implement via @kanban commands Leverages your existing agent harness rather than trying to bundle a built in one .md task format provides a permanent (editable) source of truth including considerations, decisions and actions, that is resistant to context rot

Show HN: VS Code Agent Kanban: Task Management for the AI-Assisted Developer

Agent Kanban has 4 main features:<p>GitOps & team friendly kanban board integration inside VS Code Structured plan / todo / implement via @kanban commands Leverages your existing agent harness rather than trying to bundle a built in one .md task format provides a permanent (editable) source of truth including considerations, decisions and actions, that is resistant to context rot

Show HN: The Mog Programming Language

Hi, Ted here, creator of Mog.<p>- Mog is a statically typed, compiled, embedded language (think statically typed Lua) designed to be written by LLMs -- the full spec fits in 3,200 tokens. - An AI agent writes a Mog program, compiles it, and dynamically loads it as a plugin, script, or hook. - The host controls exactly which functions a Mog program can call (capability-based permissions), so permissions propagate from agent to agent-written code. - Compiled to native code for low-latency plugin execution -- no interpreter overhead, no JIT, no process startup cost. - The compiler is written in safe Rust so the entire toolchain can be audited for security. Even without a full security audit, Mog is already useful for agents extending themselves with their own code. - MIT licensed, contributions welcome.<p>Motivations for Mog:<p>1. Syntax Only an AI Could Love: Mog is written for AIs to write, so the spec fits easily in context (~3200 tokens), and it's intended to minimize foot-guns to lower the error rate when generating Mog code. This is why Mog has no operator precedence: non-associative operations have to use parentheses, e.g. (a + b) * c. It's also why there's no implicit type coercion, which I've found over the decades to be an annoying source of runtime bugs. There's also less support in Mog for generics, and there's absolutely no support for metaprogramming, macros, or syntactic abstraction.<p>When asking people to write code in a language, these restrictions could be onerous. But LLMs don't care, and the less expressivity you trust them with, the better.<p>2. Capabilities-Based Permissionsl: There's a paradox with existing security models for AI agents. If you give an agent like OpenClaw unfettered access to your data, that's insecure and you'll get pwned. But if you sandbox it, it can't do most of what you want. Worse, if you run scripts the agent wrote, those scripts don't inherit the permissions that constrain the agent's own bash tool calls, which leads to pwnage and other chaos. And that's not even assuming you run one of the many OpenClaw plugins with malware.<p>Mog tries to solve this by taking inspiration from embedded languages. It compiles all the way to machine code, ahead of time, but the compiler doesn't output any dangerous code (at least it shouldn't -- Mog is quite new, so that could still be buggy). This allows a host program, such as an AI agent, to generate Mog source code, compile it, and load it into itself using dlopen(), while maintaining security guarantees.<p>The main trick is that a Mog program on its own can't do much. It has no direct access to syscalls, libc, or memory. It can basically call functions, do heap allocations (but only within the arena the host gives it), and return something. If the host wants the Mog program to be able to do I/O, it has to supply the functions that the Mog program will call. A core invariant is that a Mog program should never be able to crash the host program, corrupt its state, or consume more resources than the host allows.<p>This allows the host to inspect the arguments to any potentially dangerous operation that the Mog program attempts, since it's code that runs in the host. For example, a host agent could give a Mog program a function to run a bash command, then enforce its own session-level permissions on that command, even though the command was dynamically generated by a plugin that was written without prior knowledge of those permission settings.<p>(There are a couple other tricks that PL people might find interesting. One is that the host can limit the execution time of the guest program. It does this using cooperative interrupt polling, i.e. the compiler inserts runtime checks that check if the host has asked the guest to stop. This causes a roughly 10% drop in performance on extremely tight loops, which are the worst case. It could almost certainly be optimized.)<p>3. Self Modification Without Restart: When I try to modify my OpenClaw from my phone, I have to restart the whole agent. Mog fixes this: an agent can compile and run new plugins without interrupting a session, which makes it dynamically responsive to user feedback (e.g., you tell it to always ask you before deleting a file and without any interruption it compiles and loads the code to... actually do that).<p>Async support is built into the language, by adapting LLVM's coroutine lowering to our Rust port of the QBE compiler, which is what Mog uses for compilation. The Mog host library can be slotted into an async event loop (tested with Bun), so Mog async calls get scheduled seamlessly by the agent's event loop. Another trick is that the Mog program uses a stack inside the memory arena that the host provides for it to run in, rather than the system stack. The system tracks a guard page between the stack and heap. This design prevents stack overflow without runtime overhead.<p>Lots of work still needs to be done to make Mog a "batteries-included" experience like Python. Most of that work involves fleshing out a standard library to include things like JSON, CSV, Sqlite, and HTTP. One high-impact addition would be an `llm` library that allows the guest to make LLM calls through the agent, which should support multiple models and token budgeting, so the host could prevent the plugin from burning too many tokens.<p>I suspect we'll also want to do more work to make the program lifecycle operations more ergonomic. And finally, there should be a more fully featured library for integrating a Mog host into an AI agent like OpenClaw or OpenAI's Codex CLI.

Show HN: The Mog Programming Language

Hi, Ted here, creator of Mog.<p>- Mog is a statically typed, compiled, embedded language (think statically typed Lua) designed to be written by LLMs -- the full spec fits in 3,200 tokens. - An AI agent writes a Mog program, compiles it, and dynamically loads it as a plugin, script, or hook. - The host controls exactly which functions a Mog program can call (capability-based permissions), so permissions propagate from agent to agent-written code. - Compiled to native code for low-latency plugin execution -- no interpreter overhead, no JIT, no process startup cost. - The compiler is written in safe Rust so the entire toolchain can be audited for security. Even without a full security audit, Mog is already useful for agents extending themselves with their own code. - MIT licensed, contributions welcome.<p>Motivations for Mog:<p>1. Syntax Only an AI Could Love: Mog is written for AIs to write, so the spec fits easily in context (~3200 tokens), and it's intended to minimize foot-guns to lower the error rate when generating Mog code. This is why Mog has no operator precedence: non-associative operations have to use parentheses, e.g. (a + b) * c. It's also why there's no implicit type coercion, which I've found over the decades to be an annoying source of runtime bugs. There's also less support in Mog for generics, and there's absolutely no support for metaprogramming, macros, or syntactic abstraction.<p>When asking people to write code in a language, these restrictions could be onerous. But LLMs don't care, and the less expressivity you trust them with, the better.<p>2. Capabilities-Based Permissionsl: There's a paradox with existing security models for AI agents. If you give an agent like OpenClaw unfettered access to your data, that's insecure and you'll get pwned. But if you sandbox it, it can't do most of what you want. Worse, if you run scripts the agent wrote, those scripts don't inherit the permissions that constrain the agent's own bash tool calls, which leads to pwnage and other chaos. And that's not even assuming you run one of the many OpenClaw plugins with malware.<p>Mog tries to solve this by taking inspiration from embedded languages. It compiles all the way to machine code, ahead of time, but the compiler doesn't output any dangerous code (at least it shouldn't -- Mog is quite new, so that could still be buggy). This allows a host program, such as an AI agent, to generate Mog source code, compile it, and load it into itself using dlopen(), while maintaining security guarantees.<p>The main trick is that a Mog program on its own can't do much. It has no direct access to syscalls, libc, or memory. It can basically call functions, do heap allocations (but only within the arena the host gives it), and return something. If the host wants the Mog program to be able to do I/O, it has to supply the functions that the Mog program will call. A core invariant is that a Mog program should never be able to crash the host program, corrupt its state, or consume more resources than the host allows.<p>This allows the host to inspect the arguments to any potentially dangerous operation that the Mog program attempts, since it's code that runs in the host. For example, a host agent could give a Mog program a function to run a bash command, then enforce its own session-level permissions on that command, even though the command was dynamically generated by a plugin that was written without prior knowledge of those permission settings.<p>(There are a couple other tricks that PL people might find interesting. One is that the host can limit the execution time of the guest program. It does this using cooperative interrupt polling, i.e. the compiler inserts runtime checks that check if the host has asked the guest to stop. This causes a roughly 10% drop in performance on extremely tight loops, which are the worst case. It could almost certainly be optimized.)<p>3. Self Modification Without Restart: When I try to modify my OpenClaw from my phone, I have to restart the whole agent. Mog fixes this: an agent can compile and run new plugins without interrupting a session, which makes it dynamically responsive to user feedback (e.g., you tell it to always ask you before deleting a file and without any interruption it compiles and loads the code to... actually do that).<p>Async support is built into the language, by adapting LLVM's coroutine lowering to our Rust port of the QBE compiler, which is what Mog uses for compilation. The Mog host library can be slotted into an async event loop (tested with Bun), so Mog async calls get scheduled seamlessly by the agent's event loop. Another trick is that the Mog program uses a stack inside the memory arena that the host provides for it to run in, rather than the system stack. The system tracks a guard page between the stack and heap. This design prevents stack overflow without runtime overhead.<p>Lots of work still needs to be done to make Mog a "batteries-included" experience like Python. Most of that work involves fleshing out a standard library to include things like JSON, CSV, Sqlite, and HTTP. One high-impact addition would be an `llm` library that allows the guest to make LLM calls through the agent, which should support multiple models and token budgeting, so the host could prevent the plugin from burning too many tokens.<p>I suspect we'll also want to do more work to make the program lifecycle operations more ergonomic. And finally, there should be a more fully featured library for integrating a Mog host into an AI agent like OpenClaw or OpenAI's Codex CLI.

Show HN: The Mog Programming Language

Hi, Ted here, creator of Mog.<p>- Mog is a statically typed, compiled, embedded language (think statically typed Lua) designed to be written by LLMs -- the full spec fits in 3,200 tokens. - An AI agent writes a Mog program, compiles it, and dynamically loads it as a plugin, script, or hook. - The host controls exactly which functions a Mog program can call (capability-based permissions), so permissions propagate from agent to agent-written code. - Compiled to native code for low-latency plugin execution -- no interpreter overhead, no JIT, no process startup cost. - The compiler is written in safe Rust so the entire toolchain can be audited for security. Even without a full security audit, Mog is already useful for agents extending themselves with their own code. - MIT licensed, contributions welcome.<p>Motivations for Mog:<p>1. Syntax Only an AI Could Love: Mog is written for AIs to write, so the spec fits easily in context (~3200 tokens), and it's intended to minimize foot-guns to lower the error rate when generating Mog code. This is why Mog has no operator precedence: non-associative operations have to use parentheses, e.g. (a + b) * c. It's also why there's no implicit type coercion, which I've found over the decades to be an annoying source of runtime bugs. There's also less support in Mog for generics, and there's absolutely no support for metaprogramming, macros, or syntactic abstraction.<p>When asking people to write code in a language, these restrictions could be onerous. But LLMs don't care, and the less expressivity you trust them with, the better.<p>2. Capabilities-Based Permissionsl: There's a paradox with existing security models for AI agents. If you give an agent like OpenClaw unfettered access to your data, that's insecure and you'll get pwned. But if you sandbox it, it can't do most of what you want. Worse, if you run scripts the agent wrote, those scripts don't inherit the permissions that constrain the agent's own bash tool calls, which leads to pwnage and other chaos. And that's not even assuming you run one of the many OpenClaw plugins with malware.<p>Mog tries to solve this by taking inspiration from embedded languages. It compiles all the way to machine code, ahead of time, but the compiler doesn't output any dangerous code (at least it shouldn't -- Mog is quite new, so that could still be buggy). This allows a host program, such as an AI agent, to generate Mog source code, compile it, and load it into itself using dlopen(), while maintaining security guarantees.<p>The main trick is that a Mog program on its own can't do much. It has no direct access to syscalls, libc, or memory. It can basically call functions, do heap allocations (but only within the arena the host gives it), and return something. If the host wants the Mog program to be able to do I/O, it has to supply the functions that the Mog program will call. A core invariant is that a Mog program should never be able to crash the host program, corrupt its state, or consume more resources than the host allows.<p>This allows the host to inspect the arguments to any potentially dangerous operation that the Mog program attempts, since it's code that runs in the host. For example, a host agent could give a Mog program a function to run a bash command, then enforce its own session-level permissions on that command, even though the command was dynamically generated by a plugin that was written without prior knowledge of those permission settings.<p>(There are a couple other tricks that PL people might find interesting. One is that the host can limit the execution time of the guest program. It does this using cooperative interrupt polling, i.e. the compiler inserts runtime checks that check if the host has asked the guest to stop. This causes a roughly 10% drop in performance on extremely tight loops, which are the worst case. It could almost certainly be optimized.)<p>3. Self Modification Without Restart: When I try to modify my OpenClaw from my phone, I have to restart the whole agent. Mog fixes this: an agent can compile and run new plugins without interrupting a session, which makes it dynamically responsive to user feedback (e.g., you tell it to always ask you before deleting a file and without any interruption it compiles and loads the code to... actually do that).<p>Async support is built into the language, by adapting LLVM's coroutine lowering to our Rust port of the QBE compiler, which is what Mog uses for compilation. The Mog host library can be slotted into an async event loop (tested with Bun), so Mog async calls get scheduled seamlessly by the agent's event loop. Another trick is that the Mog program uses a stack inside the memory arena that the host provides for it to run in, rather than the system stack. The system tracks a guard page between the stack and heap. This design prevents stack overflow without runtime overhead.<p>Lots of work still needs to be done to make Mog a "batteries-included" experience like Python. Most of that work involves fleshing out a standard library to include things like JSON, CSV, Sqlite, and HTTP. One high-impact addition would be an `llm` library that allows the guest to make LLM calls through the agent, which should support multiple models and token budgeting, so the host could prevent the plugin from burning too many tokens.<p>I suspect we'll also want to do more work to make the program lifecycle operations more ergonomic. And finally, there should be a more fully featured library for integrating a Mog host into an AI agent like OpenClaw or OpenAI's Codex CLI.

< 1 2 3 4 ... 951 952 953 >