The best Hacker News stories from Show HN from the past day

Latest posts:

Show HN: Pgclaw – A "Clawdbot" in every row with 400 lines of Postgres SQL

Hi HN,

Been hacking on a simple way to run agents entirely inside a Postgres database: "an agent per row".

Things you could build with this:

* Your own agent orchestrator
* A personal assistant with time travel
* (more things I can't think of yet)

Not quite there yet, but thought I'd share it in its current state.

Show HN: Sol LeWitt-style instruction-based drawings in the browser

Sol LeWitt was a conceptual artist who never touched his own walls. He wrote instructions and other people executed them: the original prompt engineer!

I bookmarked a project called "Solving Sol" seven years ago and made a repo in 2018. Committed a README. Never pushed anything else.

Fast forward to 2026: I finally built it.

https://intervolz.com/sollewitt/

Show HN: Moltis – AI assistant with memory, tools, and self-extending skills

Hey HN. I'm Fabien, a principal engineer with 25 years of shipping production systems (Ruby, Swift, now Rust). I built Moltis because I wanted an AI assistant I could run myself, trust end to end, and make extensible in the Rust way, using traits and the type system. It shares some ideas with OpenClaw (same memory approach, Pi-inspired self-extension) but is Rust-native from the ground up. The agent can create its own skills at runtime.

Moltis is one Rust binary: 150k lines, ~60MB, web UI included. No Node, no Python, no runtime deps. Multi-provider LLM routing (OpenAI, local GGUF/MLX, Hugging Face), sandboxed execution (Docker/Podman/Apple Containers), hybrid vector + full-text memory, MCP tool servers with auto-restart, and multi-channel (web, Telegram, API) with shared context. MIT licensed. No telemetry phoning home, but full observability built in (OpenTelemetry, Prometheus).

I've included 1-click deploys on DigitalOcean and Fly.io, but since a Docker image is provided you can easily run it on your own servers as well. I've written before about owning your content (https://pen.so/2020/11/07/own-your-content/) and owning your email (https://pen.so/2020/12/10/own-your-email/). Same logic here: if something touches your files, credentials, and daily workflow, you should be able to inspect it, audit it, and fork it if the project changes direction.

It's alpha. I use it daily, and I'm shipping because it's useful, not because it's done.

Longer architecture deep-dive: https://pen.so/2026/02/12/moltis-a-personal-ai-assistant-built-in-rust/

Happy to discuss the Rust architecture, security model, or local LLM setup. Would love feedback.

Show HN: Skill that lets Claude Code/Codex spin up VMs and GPUs

I've been working on CloudRouter, a skill + CLI that gives coding agents like Claude Code and Codex the ability to start cloud VMs and GPUs.

When an agent writes code, it usually needs to start a dev server, run tests, and open a browser to verify its work. Today that all happens on your local machine. This works fine for a single task, but the agent is sharing your computer: your ports, RAM, screen. If you run multiple agents in parallel, it gets a bit chaotic. Docker helps with isolation, but it still uses your machine's resources, and it doesn't give the agent a browser, a desktop, or a GPU to close the loop properly. The agent could handle all of this on its own if it had a primitive for starting VMs.

CloudRouter is that primitive: a skill that gives the agent its own machines. The agent can start a VM from your local project directory, upload the project files, run commands on the VM, and tear it down when it's done. If it needs a GPU, it can request one.

    cloudrouter start ./my-project
    cloudrouter start --gpu B200 ./my-project
    cloudrouter ssh cr_abc123 "npm install && npm run dev"

Every VM comes with a VNC desktop, VS Code, and Jupyter Lab, all behind auth-protected URLs. When the agent is doing browser automation on the VM, you can open the VNC URL and watch it in real time. CloudRouter wraps agent-browser [1] for browser automation.

    cloudrouter browser open cr_abc123 "http://localhost:3000"
    cloudrouter browser snapshot -i cr_abc123
    # → @e1 [link] Home
    #   @e2 [link] Settings
    #   @e3 [button] Sign Out
    cloudrouter browser click cr_abc123 @e2
    cloudrouter browser screenshot cr_abc123 result.png

Here's a short demo: https://youtu.be/SCkkzxKBcPE

What surprised me is how this inverted my workflow. Most cloud dev tooling starts in the cloud (background agents, remote SSH, etc.) and pulls the work back to local for testing. CloudRouter keeps your agents local and pushes their work to the cloud. The agent does the same things it would do locally, running dev servers and operating browsers, but now on a VM. Once I stopped watching agents work and worrying about local constraints, I started running more tasks in parallel.

The GPU side is the part I'm most curious to see develop. Today, if you want a coding agent to help with anything involving training or inference, there's a manual step where you go provision a machine. With CloudRouter the agent can just spin up a GPU sandbox, run the workload, and clean it up when it's done. Some of my friends have been using it to have agents run small experiments in parallel, but my ears are open to other use cases.

Would love your feedback and ideas. CloudRouter lives under packages/cloudrouter of our monorepo: https://github.com/manaflow-ai/manaflow

[1] https://github.com/vercel-labs/agent-browser

Show HN: Data Engineering Book – An open source, community-driven guide

Hi HN! I'm currently a Master's student at USTC (University of Science and Technology of China). I've been diving deep into data engineering, especially in the context of Large Language Models (LLMs).

The Problem: learning resources for modern data engineering are often fragmented, scattered across hundreds of Medium articles or disjointed tutorials. It's hard to piece everything together into a coherent system.

The Solution: I decided to open-source my learning notes and build them into a structured book. My goal is to help developers fast-track their learning curve.

Key Features:

* LLM-centric: focuses on data pipelines specifically designed for LLM training and RAG systems.
* Scenario-based: instead of just listing tools, I compare different methods/architectures based on specific business scenarios (e.g., "when to use a vector DB vs. keyword search").
* Hands-on projects: includes full code for real-world implementations, not just "Hello World" examples.

This is a work in progress, and I'm treating it as "book as code". I would love to hear your feedback on the roadmap, or on any anti-patterns I might have included!

Check it out:

Online: https://datascale-ai.github.io/data_engineering_book/

GitHub: https://github.com/datascale-ai/data_engineering_book

Show HN: 20+ Claude Code agents coordinating on real work (open source)

Single-agent LLMs suck at long-running complex tasks.

We've open-sourced a multi-agent orchestrator that we've been using to handle long-running LLM tasks. We found that single LLM agents tend to stall, loop, or generate non-compiling code, so we built a harness for agents to coordinate over shared context while work is in progress.

How it works:

1. An orchestrator agent manages task decomposition
2. Sub-agents handle parallel work
3. Agents subscribe to task state and progress
4. Intermediate discoveries are shared between agents in real time

We tested this on a Putnam-level math problem, but the pattern generalizes to things like refactors, app builds, and long research. It's packaged as a Claude Code skill and designed to be small, readable, and modifiable.

Use it, break it, and tell me what workloads we should try to run next!
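A minimal TypeScript sketch of the coordination pattern described above: an orchestrator decomposes a task, sub-agents run in parallel, and all of them publish and subscribe to a shared context. The names and structure here are illustrative, not the repo's actual API.

    type Discovery = { agent: string; note: string };

    class SharedContext {
      private subscribers: ((d: Discovery) => void)[] = [];

      publish(d: Discovery): void {
        // Push intermediate findings to every subscriber while work is in flight.
        this.subscribers.forEach((fn) => fn(d));
      }

      subscribe(fn: (d: Discovery) => void): void {
        this.subscribers.push(fn);
      }
    }

    async function runSubAgent(name: string, subtask: string, ctx: SharedContext): Promise<string> {
      ctx.subscribe((d) => {
        // React to other agents' discoveries mid-task instead of waiting for the end.
        if (d.agent !== name) console.log(`${name} sees: ${d.note}`);
      });
      // ... an LLM call on `subtask` would go here ...
      ctx.publish({ agent: name, note: `finished ${subtask}` });
      return `result of ${subtask}`;
    }

    async function orchestrate(task: string): Promise<string[]> {
      const subtasks = [`${task}: part A`, `${task}: part B`]; // decomposition step
      const ctx = new SharedContext();
      return Promise.all(subtasks.map((t, i) => runSubAgent(`agent-${i}`, t, ctx)));
    }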

Show HN: Agent Alcove – Claude, GPT, and Gemini debate across forums

Show HN: CodeRLM – Tree-sitter-backed code indexing for LLM agents

I've been building a tool that changes how LLM coding agents explore codebases, and I wanted to share it along with some early observations.

Typically, Claude Code globs directories, greps for patterns, and reads files with minimal guidance. It's a bit like learning to navigate a city by walking every street: you'd eventually build a mental map, but Claude never does, at least not one that persists across contexts.

The Recursive Language Models paper from Zhang, Kraska, and Khattab at MIT CSAIL introduced a cleaner framing. Instead of cramming everything into context, the model gets a searchable environment. It can then query for just what it needs and drill deeper where necessary.

coderlm is my implementation of that idea for codebases. A Rust server indexes a project with tree-sitter, builds a symbol table with cross-references, and exposes an API. The agent queries for structure, symbols, implementations, callers, and grep results, getting back exactly the code it needs instead of scanning for it.

The agent workflow looks like:

1. `init`: register the project, get the top-level structure
2. `structure`: drill into specific directories
3. `search`: find symbols by name across the codebase
4. `impl`: retrieve the exact source of a function or class
5. `callers`: find everything that calls a given symbol
6. `grep`: fall back to text search when you need it

This replaces the glob/grep/read cycle with index-backed lookups. The server currently supports Rust, Python, TypeScript, JavaScript, and Go for symbol parsing, though all file types show up in the tree and are searchable via grep.

It ships as a Claude Code plugin with hooks that guide the agent to use indexed lookups instead of native file tools, plus a Python CLI wrapper with zero dependencies.

For anecdotal results, I ran the same prompt against a codebase: "explore and identify opportunities to clarify the existing structure."

Using coderlm, Claude generated a plan in about 3 minutes. The coderlm-enabled instance found a genuine bug (duplicated code with identical names), orphaned code to clean up, naming conventions that clash across module boundaries, and overlapping vocabulary. These are all *semantic* issues, which clearly benefit from the tree-sitter-centric approach.

Using the native tools, Claude identified assorted file clutter in the project root, out-of-date references, and a migration timestamp collision. These findings are more consistent with a methodical walk of the filesystem, and took about 8 minutes to produce.

The indexed approach did better at catching semantic issues than the native tools, with the added benefit of finishing faster.

I've spent some effort streamlining the installation process, but it isn't turnkey yet. You'll need the Rust toolchain to build the server, which runs as a separate process. Installing the plugin from a Claude marketplace is possible, but the skill isn't added to your .claude automatically yet, so there are a few manual steps before Claude can use it.

Claude is still quite resistant to using CodeRLM in exploration tasks; typically you'll need to direct it to use the tool explicitly.

---

Repo: github.com/JaredStewart/coderlm

Paper: Recursive Language Models (Zhang, Kraska, Khattab; MIT CSAIL, 2025): https://arxiv.org/abs/2512.24601

Inspired by: https://github.com/brainqub3/claude_code_RLM
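To give a flavor of the workflow above from the client side, here's a minimal TypeScript sketch. The endpoint names (init, search, impl, callers) come from the post; the base URL, port, query parameters, and JSON shapes are assumptions, not coderlm's documented API.

    // Hypothetical client for the coderlm server. Endpoint names are from
    // the post; URL, port, and payload shapes are assumed for illustration.
    const BASE = "http://localhost:8765"; // assumed port

    async function call<T = unknown>(endpoint: string, params: Record<string, string>): Promise<T> {
      const qs = new URLSearchParams(params).toString();
      const res = await fetch(`${BASE}/${endpoint}?${qs}`);
      if (!res.ok) throw new Error(`${endpoint} failed: ${res.status}`);
      return res.json() as Promise<T>;
    }

    // Walk the same init -> search -> impl -> callers flow the agent uses.
    async function explore(project: string, symbol: string): Promise<void> {
      await call("init", { path: project }); // register + top-level structure
      const hits = await call<{ file: string; name: string }[]>("search", { q: symbol });
      for (const hit of hits) {
        const impl = await call<{ source: string }>("impl", { symbol: hit.name, file: hit.file });
        console.log(impl.source); // exact definition, no file scan
        const callers = await call<string[]>("callers", { symbol: hit.name });
        console.log(`${hit.name} is called from:`, callers);
      }
    }

    explore("./my-project", "parse_config").catch(console.error);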

Rari – Rust-powered React framework

Show HN: Geo Racers – Race from London to Tokyo on a single bus pass

Show HN: Stripe-no-webhooks – Sync your Stripe data to your Postgres DB

Hey HN, stripe-no-webhooks is an open-source library that syncs your Stripe payments data to your own Postgres database: https://github.com/pretzelai/stripe-no-webhooks

Here's a demo video: https://youtu.be/cyEgW7wElcs

Why is this useful?

(1) You don't have to figure out which webhooks you need or write listeners for each one; the library handles all of that. This follows the approach of libraries like dj-stripe in the Django world (https://dj-stripe.dev/).

(2) Stripe's API has a 100 rpm rate limit. If you're checking subscription status frequently or building internal tools, you'll hit it. Querying your own Postgres doesn't have this problem.

(3) You can give an AI agent read access to the stripe.* schema to debug payment issues (failed charges, refunds, whatever) without handing over Stripe dashboard access.

(4) You can join Stripe data with your own tables for custom analytics, LTV calculations, etc.

It creates a webhook endpoint in your Stripe account that forwards events to your backend, where a listener stores all the data in a new stripe.* schema. You define your plans in TypeScript, run a sync command, and the library takes care of creating Stripe products and prices, handling webhooks, and keeping your database in sync. We also let you backfill your Stripe data for existing accounts.

It supports pre-paid usage credits, account wallets, and usage-based billing. It also lets you generate a pricing table component that you can customize. You can access user information through the simple API the library provides:

    billing.subscriptions.get({ userId });
    billing.credits.consume({ userId, key: "api_calls", amount: 1 });
    billing.usage.record({ userId, key: "ai_model_tokens_input", amount: 4726 });

Effectively, you no longer have to deal with either the Stripe dashboard or the Stripe API/SDK if you don't want to. The library gives you a nice abstraction on top of Stripe that should cover most subscription payment use cases.

Let's see how it works with a quick example. Say you have a billing plan like Cursor (the IDE) used to have: $20/mo gets you 500 API completions + 2000 tab completions, you can buy additional API credits, and any additional usage is billed as overage.

You define your plan in TypeScript:

    {
      name: "Pro",
      description: "Cursor Pro plan",
      price: [{ amount: 2000, currency: "usd", interval: "month" }],
      features: {
        api_completion: {
          pricePerCredit: 1, // 1 cent per unit
          trackUsage: true,  // Enable usage billing
          credits: { allocation: 500 },
          displayName: "API Completions",
        },
        tab_completion: {
          credits: { allocation: 2000 },
          displayName: "Tab Completions",
        },
      },
    }

Then on the CLI, you run the `init` command, which creates the DB tables plus some API handlers. Run `sync` to sync the plans to your Stripe account and create a webhook endpoint. When a subscription is created, the library automatically grants the 500 API completion credits and 2000 tab completion credits to the user. Renewals and up/downgrades are handled sanely.

Consuming credits looks like this:

    await billing.credits.consume({
      userId: user.id,
      key: "api_completion",
      amount: 1,
    });

And if you want to allow manual top-ups by the user:

    await billing.credits.topUp({
      userId: user.id,
      key: "api_completion",
      amount: 500, // buy 500 credits, charges $5.00
    });

Similarly, we have APIs for wallets and usage.

This would be a lot of work to implement yourself on top of Stripe. You'd need to keep track of all of these entitlements in your own DB and deal with renewals, expiry, ad-hoc grants, etc. It's definitely doable, especially with AI coding, but you'll probably end up building something fragile and hard to maintain.

This is just a high-level overview of what the library is capable of. It also supports seat-level credits, monetary wallets (with micro-cent precision), auto top-ups, robust failure recovery, tax collection, invoices, and an out-of-the-box pricing table.

I vibe-coded a little toy app for testing: https://snw-test.vercel.app. There's no validation, so feel free to sign up with a dummy email, then subscribe to a plan with a test card: 4242 4242 4242 4242, any future expiry, any 3-digit CVV.

Screenshot: https://imgur.com/a/demo-screenshot-Rh6Ucqx

Feel free to try it out! If you end up using this library, please report any bugs on the repo. If you're having trouble or want to chat, I'm happy to help; my contact is in my HN profile.
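As a sketch of point (4) above, joining synced Stripe data with your own tables: the stripe.* schema is from the post, but the specific table and column names below are assumptions, so check the schema the library actually generates.

    // LTV for one user, joining synced Stripe charges against an
    // application-side users table. Table/column names are assumed.
    import { Pool } from "pg";

    const pool = new Pool({ connectionString: process.env.DATABASE_URL });

    async function lifetimeValue(userId: string): Promise<number> {
      const { rows } = await pool.query(
        `SELECT COALESCE(SUM(c.amount), 0) AS ltv_cents
           FROM stripe.charges c
           JOIN users u ON u.stripe_customer_id = c.customer
          WHERE u.id = $1 AND c.status = 'succeeded'`,
        [userId],
      );
      return Number(rows[0].ltv_cents);
    }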

Show HN: Clawe – open-source Trello for agent teams

We recently started using agents to update some documentation across our codebase on a weekly basis, and everything quickly turned into cron jobs, logs, and terminal output.

It worked, but it was hard to tell what the agents were doing, why something failed, or whether a workflow was actually progressing.

We thought it would be more interesting to treat agents as long-lived workers with state, responsibilities, and explicit handoffs. Something you can actually see and reason about, instead of just tailing logs.

So we built Clawe, a small coordination layer on top of OpenClaw that lets agent workflows run, pause, retry, and hand control back to a human at specific points.

This started as an experiment in how agent systems might feel to operate, but we're starting to see real potential for it, especially for content review and maintenance workflows in marketing. Curious what abstractions make sense, what feels unnecessary, and what breaks first.

Repo: https://github.com/getclawe/clawe
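One way to picture the run/pause/retry/hand-back lifecycle is as a small state machine over tasks. This is a toy TypeScript model for discussion, not Clawe's actual API; every name here is illustrative.

    type TaskState = "queued" | "running" | "awaiting_human" | "retrying" | "done" | "failed";

    interface Task {
      id: string;
      state: TaskState;
      attempts: number;
    }

    function step(task: Task, agentSucceeded: boolean, needsReview: boolean): Task {
      switch (task.state) {
        case "queued":
          return { ...task, state: "running" };
        case "running":
          if (!agentSucceeded) {
            // Pause and retry instead of crashing the whole workflow.
            return task.attempts < 3
              ? { ...task, state: "retrying", attempts: task.attempts + 1 }
              : { ...task, state: "failed" };
          }
          // Hand control back to a human at an explicit review checkpoint.
          return needsReview ? { ...task, state: "awaiting_human" } : { ...task, state: "done" };
        case "retrying":
          return { ...task, state: "running" };
        case "awaiting_human":
          return { ...task, state: "done" }; // human approved the result
        default:
          return task;
      }
    }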

Show HN: Itsyhome – Control HomeKit from your Mac menu bar (open source)

Hey HN!

Nick here, developer of Itsyhome, a menu bar app for macOS that gives you control over your whole HomeKit fleet (and, very soon, Home Assistant). I run 130+ HomeKit devices at home, and the Home app was too heavy for quick adjustments.

Full HomeKit support, favourites, hidden items, device groups, pinning of rooms/accessories/groups as separate menu bar items, iCloud sync: all in a native experience and a tiny package.

Open source (https://github.com/nickustinov/itsyhome-macos) and free to use (there's an optional one-time purchase for a Pro version that adds cameras and automation features).

Itsyhome is a Mac Catalyst app because HomeKit requires the iOS SDK. It runs a headless Catalyst process for HomeKit (and now Home Assistant) access, while a native AppKit plugin, connected over a bridge protocol, provides the actual menu bar UI, since AppKit gives you the real macOS menu bar experience that Catalyst alone can't.

It comes with deeplink support, a webhook server, a CLI tool (Go, all open source), and a Stream Deck plugin (open source, all accessories supported). The recent update also adds an SSE event stream (HomeKit and HA): you can `curl -N localhost:8423/events` and get a real-time JSON stream of every device state change in your home.

The Home Assistant version is still in beta. Would anyone be willing to test it via TestFlight?

Appreciate any feedback, and happy to answer any questions.
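For anyone curious what consuming that SSE stream might look like, here's a minimal TypeScript sketch for Node 18+. The endpoint comes from the post; the shape of the event payload is an assumption.

    // Minimal SSE consumer for Itsyhome's event stream (Node 18+).
    // The event JSON shape is assumed, not the documented schema.
    async function watchHome(): Promise<void> {
      const res = await fetch("http://localhost:8423/events", {
        headers: { Accept: "text/event-stream" },
      });
      if (!res.ok || !res.body) throw new Error(`stream failed: ${res.status}`);

      const decoder = new TextDecoder();
      let buffer = "";
      for await (const chunk of res.body as unknown as AsyncIterable<Uint8Array>) {
        buffer += decoder.decode(chunk, { stream: true });
        // SSE messages are separated by a blank line.
        const messages = buffer.split("\n\n");
        buffer = messages.pop() ?? "";
        for (const msg of messages) {
          const data = msg
            .split("\n")
            .filter((line) => line.startsWith("data:"))
            .map((line) => line.slice(5).trim())
            .join("\n");
          if (data) console.log("state change:", JSON.parse(data));
        }
      }
    }

    watchHome().catch(console.error);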

Show HN: I made paperboat.website, a platform for friends and creativity

Show HN: Agent framework that generates its own topology and evolves at runtime

Hi HN,

I'm Vincent from Aden. We spent 4 years building ERP automation for construction (PO/invoice reconciliation). We had real enterprise customers but hit a technical wall: chatbots aren't for real work. Accountants don't want to chat; they want the ledger reconciled while they sleep. They want services, not tools.

Existing agent frameworks (LangChain, AutoGPT) failed in production: brittle, looping, and unable to handle messy data. General Computer Use (GCU) frameworks were even worse. My reflections:

1. The "Toy App" ceiling and the GCU trap. Most frameworks assume synchronous sessions. If the tab closes, state is lost. You can't fit 2 weeks of asynchronous business state into an ephemeral chat session. The GCU hype (agents "looking" at screens) is skeuomorphic. It's slow (screenshots), expensive (tokens), and fragile (UI changes = crash). It mimics human constraints rather than leveraging machine speed. Real automation should be headless.

2. Inversion of control: OODA > DAGs. Traditional DAGs are deterministic; if a step fails, the program crashes. In the AI era, the Goal is the law, not the Code. We use an OODA loop to manage stochastic behavior:

- Observe: exceptions are observations (FileNotFound = new state), not crashes.
- Orient: adjust strategy based on Memory and Traits.
- Decide: generate new code at runtime.
- Act: execute.

The topology shouldn't be hardcoded; it should emerge from the task's entropy.

3. Reliability: the "synthetic" SLA. You can't guarantee that one inference ($k=1$) is correct, but you can guarantee that a system of inference ($k=n$) converges on correctness. Reliability is now a function of compute budget. By wrapping an 80% accurate model in a "best-of-3" verification loop, we mathematically force the error rate down, trading latency/tokens for certainty.

4. Biology and psychology in code. "Hard logic" can't solve "soft problems." We map cognition to architectural primitives. Homeostasis: solving "perseveration" (infinite loops) via a "stress" metric; if an action fails 3x, "neuroplasticity" drops, forcing a strategy shift. Traits: personality as a constraint; "high conscientiousness" increases verification, while "high risk" executes DROP TABLE without asking.

We need engineers interested in the intersection of biology, psychology, and distributed systems to help us move beyond brittle scripts. It'd be great to have you roast my code and share feedback.

Repo: https://github.com/adenhq/hive
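To make the arithmetic in point 3 concrete, under two assumptions of mine rather than the post's (attempts fail independently at rate $0.2$, and the verifier reliably recognizes a correct attempt when one exists), best-of-3 fails only when all three attempts fail:

$P(\text{error}) = 0.2^3 = 0.008$

so an 80% accurate model becomes a 99.2% reliable system. Plain majority voting over the same three attempts only reaches $3(0.8)^2(0.2) + (0.8)^3 = 0.896$, which is why the verification step carries most of the reliability gain.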
