The best Hacker News Show HN stories from the past day
Latest posts:
Show HN: Stop playing my matchstick puzzles, start building your own in seconds
Show HN: Large Scale Article Extract of Newspapers 1730s-1960s
Hello HN, over the past 7 months I've spent nearly 3,000 hours building SNEWPAPERS, the first historical newspaper archive with full-text extractions, nearly perfect OCR, a vast categorization taxonomy, and of course semantic and agentic search capabilities.<p>Problem:
I wanted to search through newspaper archives, but every service I tried only lets you search by keywords and dates, and gives you back raw images of the papers, too many of them and with no context. A sea of noise.<p>Solution:
I taught machines how to read the newspapers, and so far I've extracted the content from more than 600k pages (about 5TB) from the Chronicling America collection. Problems I had to deal with included an infinite variety of layouts, font sizes, image scan qualities, resolutions, and aspect ratios, plus navigating around the images on the page. I also had to figure out how to get OCR to be nearly perfect so people wouldn't hate reading the extracts. I stitched together a multi-model pipeline (layout tech, OCR tech, LLM, vLLM) with heuristics to go from layout -> segmentation -> classification. I put it all in OpenSearch / Postgres and made it semantically searchable, and also put an agentic search tool on top that knows how to use the API really well and helps you write queries to find what you're looking for. Happy to discuss AWS architecture and scaling as well, that was tough!<p>If you have five minutes and you just want to jump in and have your own personalized experience, what I would suggest is:<p>Before searching for anything, go to the Sleuth page
Ask it about anything from 1736 to 1963, maybe 1 or 2 follow up questions
Then go to the search page so you can see the queries it wrote for you (bottom left "saved queries") and uncover more info on whatever it is you're interested in<p>If you think it's cool and you want to learn more, then there's about 10 minutes of video guides on the various capabilities in "Guide" on the nav bar<p>Some other people have also taken a crack at this, notably:<p><a href="https://dell-research-harvard.github.io/resources/americanstories" rel="nofollow">https://dell-research-harvard.github.io/resources/americanst...</a> (very good attempt)
<a href="https://labs.loc.gov/work/experiments/newspaper-navigator/" rel="nofollow">https://labs.loc.gov/work/experiments/newspaper-navigator/</a> (focused on images)
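The layout -> segmentation -> classification pipeline described above can be sketched roughly as staged functions; everything here (the function names, the `Article` dataclass, the placeholder outputs) is hypothetical illustration, not code from SNEWPAPERS:

```python
from dataclasses import dataclass, field

@dataclass
class Article:
    # One extracted article from a scanned newspaper page.
    text: str
    category: str = "uncategorized"
    metadata: dict = field(default_factory=dict)

def detect_layout(page_image):
    # Stage 1: find article/column regions on the scanned page.
    # A real system would run a layout-detection model here.
    return [{"bbox": (0, 0, 100, 100), "image": page_image}]

def segment_and_ocr(region):
    # Stage 2: OCR the region, then clean the raw text (e.g. with an
    # LLM pass) to approach "nearly perfect" output.
    return Article(text="...ocr text...", metadata={"bbox": region["bbox"]})

def classify(article):
    # Stage 3: assign the article a node in the categorization taxonomy.
    article.category = "news"  # placeholder for a real classifier
    return article

def extract_page(page_image):
    # Compose the stages; the resulting articles would then be indexed
    # into OpenSearch/Postgres alongside embeddings for semantic search.
    return [classify(segment_and_ocr(r)) for r in detect_layout(page_image)]
```

The value of staging it this way is that each stage can be swapped or tuned (different layout model, different OCR cleaner) without touching the others.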
Show HN: Mljar Studio – local AI data analyst that saves analysis as notebooks
Hi HN,<p>I’ve been working on mljar-supervised (open-source AutoML for tabular data) for a few years. Recently I built a desktop app around it called MLJAR Studio.<p>The idea is simple: you talk to your data in natural language, the AI generates Python code, executes it locally, and the whole conversation becomes a reproducible notebook (*.ipynb file). So instead of just chatting with data, you end up with something you can inspect, modify, and rerun.<p>What MLJAR Studio does:<p>- Sets up a local Python environment automatically, runs on Mac, Windows, and Linux<p>- Installs missing packages during the conversation<p>- Built-in AutoML for tabular data (classification, regression, multiclass)<p>- Works with standard Python libraries (pandas, matplotlib, etc.)<p>- Works with any data file: CSV, Excel, Stata, Parquet ...<p>- Connects to PostgreSQL, MySQL, SQL Server, Snowflake, Databricks, and Supabase.<p>For AI: use Ollama locally (zero data egress), bring your own OpenAI key, or use the MLJAR AI add-on.<p>I built this because I wanted something between Jupyter Notebook (flexible but manual) and AI tools that generate code but don’t preserve the workflow. Most tools I tried either hide too much, don’t give reproducible results, or are cloud-based.<p>Demos:<p>- 60-second demo: <a href="https://youtu.be/BjxpZYRiY4c" rel="nofollow">https://youtu.be/BjxpZYRiY4c</a><p>- Full 3-minute analysis: <a href="https://youtu.be/1DHMMxaNJxI" rel="nofollow">https://youtu.be/1DHMMxaNJxI</a><p>Pricing is $199 one-time, with a 7-day trial.<p>Curious if this is useful for others doing real data work, or if I’m solving my own problem here.<p>Happy to answer questions.
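The "conversation becomes a reproducible notebook" idea maps directly onto the .ipynb JSON format; here is a minimal stdlib-only sketch of that conversion (my own illustration, not MLJAR's implementation):

```python
import json

def conversation_to_notebook(turns, path):
    """Save a list of (prompt, generated_code) turns as a .ipynb file.

    Each prompt becomes a markdown cell and each generated snippet a code
    cell, so the whole session can be inspected, modified, and rerun.
    """
    cells = []
    for prompt, code in turns:
        cells.append({"cell_type": "markdown", "metadata": {},
                      "source": prompt})
        cells.append({"cell_type": "code", "metadata": {},
                      "execution_count": None, "outputs": [],
                      "source": code})
    nb = {"cells": cells, "metadata": {},
          "nbformat": 4, "nbformat_minor": 5}
    with open(path, "w") as f:
        json.dump(nb, f, indent=1)
    return nb
```

A session like `conversation_to_notebook([("Show basic stats", "df.describe()")], "analysis.ipynb")` yields a file any Jupyter frontend can open and rerun.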
Show HN: Filling PDF forms with AI using client-side tool calling
Hey HN!<p>I built SimplePDF Copilot: an AI assistant that can interact with the PDF editor. It fills fields, answers questions, focuses on a specific field, adds fields, deletes pages, and so on.<p>It's built on top of SimplePDF, which I started 7 years ago, pioneering privacy-respecting client-side PDF editing; it's now used monthly by 200k+ people.<p>As for the privacy model: the PDF itself never leaves the browser. Parsing, rendering, and field detection all run client-side.<p>The text the model needs (and your messages) goes to whatever LLM you point at. By default that's our demo proxy (DeepSeek V4 Flash, rate-capped), but you can BYOK and point it at any cloud provider, or go fully local (I've been testing with LM Studio).<p>Unlike the existing "Chat with PDF" tools that only retrieve the text/OCR layer, Copilot can act on the PDF: filling fields, adding fields (detected client-side using CommonForms by Joe Barrow [1], jbarrow on HN, with some post-processing heuristics I added on top), focusing on fields, deleting pages, and so on.<p>I built this because SimplePDF is mostly used by healthcare customers where document privacy is paramount, and I wanted an AI experience that didn't require shipping PII to a third party.
Stack is pretty standard:<p>- Tanstack Start<p>- AI SDK from Vercel<p>- Tailwind (I personally prefer CSS modules, I'm old-school, but since I'm open-sourcing it, I figured Tailwind would be a better fit)<p>The more interesting part is the client-side tool calling: events are passed back and forth via iframe postMessage.<p>If you're not familiar with "tool calling" and "client-side tool calling", a quick primer:<p>Tool calling is what LLMs use to take actions. When Claude runs grep or ls, or hits an MCP server, those are tool calls.<p>Client-side tool calling means the intent to call a tool comes from the LLM, but the execution happens in the browser.<p>That matters for speed (you can't go faster than client-to-client operations), and it also lets you limit the data you expose to the LLM. For the demo I do feed the content of the document to the LLM, but that connection could be severed simply by removing the tool that exposes the content data.<p>The demo is fully open source, available on GitHub [2], and the demo is the link of this post [3].<p>What's not open source is SimplePDF itself (loaded as the iframe).<p>I could talk on and on about this, let me know if you have any questions, anything goes!<p>[1] <a href="https://github.com/jbarrow/commonforms" rel="nofollow">https://github.com/jbarrow/commonforms</a><p>[2] <a href="https://github.com/SimplePDF/simplepdf-embed/tree/main/copilot" rel="nofollow">https://github.com/SimplePDF/simplepdf-embed/tree/main/copil...</a><p>[3] <a href="https://copilot.simplepdf.com/?share=a7d00ad073c75a75d493228e6ff7b11eb3f2d945b6175913e87898ec96ca8076&form=w9&lang=en" rel="nofollow">https://copilot.simplepdf.com/?share=a7d00ad073c75a75d493228...</a>
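The dispatch pattern behind client-side tool calling (the model emits a structured intent; the client executes it locally) can be sketched language-agnostically. In the real app the handlers forward events into the SimplePDF iframe via postMessage; this Python sketch stands in for that with a hypothetical tool registry, and the tool names are illustrative, not SimplePDF's actual API:

```python
# Minimal client-side tool-calling dispatcher: the LLM only returns a
# structured intent, execution stays on the client, so the document
# itself never has to be uploaded.

TOOLS = {}

def tool(name):
    # Register a handler for a tool the LLM is allowed to call.
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("fill_field")
def fill_field(field_id, value, state):
    # In the browser this would postMessage into the editor iframe.
    state[field_id] = value
    return f"filled {field_id}"

@tool("delete_page")
def delete_page(page, state):
    state.setdefault("deleted", []).append(page)
    return f"deleted page {page}"

def dispatch(call, state):
    """Execute one tool-call intent of the form {"tool": ..., "args": {...}}."""
    handler = TOOLS.get(call["tool"])
    if handler is None:
        return f"unknown tool: {call['tool']}"
    return handler(**call["args"], state=state)
```

Limiting what the LLM sees then amounts to choosing which tools are registered, e.g. dropping the tool that exposes document content.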
Show HN: Piruetas – A self-hosted diary app I built for my girlfriend
I searched for a simple, self-hosted journal app for my girlfriend, and everything I found was either too complex, too feature-heavy, or too bare-bones for what I needed, or required trusting a cloud service.<p>So I built Piruetas (it means pirouettes in Spanish; she chose the name, btw).<p>It's a day-per-page diary with rich text editing, drag-and-drop image uploads, auto-save, public share links, and a clean mobile UI.
It can be set up for personal or multi-user usage via a Docker Compose deployment.<p>She seems to like it, so I decided to give back to the community and make it available for everyone (after some QA).<p>Live demo: <a href="https://piruet.app" rel="nofollow">https://piruet.app</a> (login: demo / piruetas — data resets every 30 min!)
GitHub: <a href="https://github.com/patillacode/piruetas" rel="nofollow">https://github.com/patillacode/piruetas</a>
Show HN: Ableton Live MCP
Ever wanted to control Ableton with just your voice? Me too! I made this MCP server so I could just ask Codex to do anything in Ableton Live for me, while I was nap-trapped by my baby.<p>The chat messages I sent to Codex to make this:<p><i>in ableton, make a self reflective song, with audio vocals (via macos say) and chip tunes and 80's drum machines. should be a real edm banger<p>i want midi for everything but vocals please, with ableton devices. not prerendered audio for instruments<p>needs some fills<p>and should hit way harder after "3-2-1 i become the sound"<p>the vocals are squished too much (read too quickly), give them a little more length<p>add some dynamics, the song is basically one volume. and some pumping side chain<p>improve dynamics of the clap, seems a bit flat and indistinguished, want it harder after the 3-2-1 drop<p>introduce a new element on a new track after the 3-2-1 drop, that comes in but then recedes before the final exit<p>doesn't seem like the new thing has any notes<p>the element is a bit muddy/indistinct. perhaps it needs simplification and more space, different instrument choice, i dunno</i>
Show HN: DAC – open-source dashboard as code tool for agents and humans
Hi all, this is Burak.<p>When agents became a reality, one of the first things I wanted to do was automate building dashboards. The first and most obvious wall I ran into was that most tools are driven purely by UI, which meant that unless the agents could drive a browser, they couldn't build dashboards at all. In addition, it would be impossible to review any changes the agent made.<p>The first instinct there is to get your agent to build a React app for the dashboard. This works beautifully for the happy path, but I quickly ran into other issues there:
- every dashboard turns out to be different
- have to implement a backend to centralize the query execution
- there is no centralized mechanism to control the rules and standards around visualizations
- there is no way to get a semantic layer working with the dashboards easily<p>In the end, agents ended up reinventing the wheel for every new dashboard, even under the same project. Building a standardized, local project for these turned out to be building a BI tool from scratch.<p>After trying these out, I asked myself: what if the dashboards were built for agents as the primary user?<p>A product like that would need to have a couple of features:
- First of all, everything needs to be driven by version-controllable text. YAML is fine.
- Changes to the dashboards should be easy to review and understand by humans.
- Agents are great at writing code, so the dashboards should be driven by code to support dynamic behavior: JSX is a good fit.
- Static analysis being a first-class citizen: validate dashboards before deploying. Agents can check their work too.
- A standardized way of deploying these based on a couple of files in a folder: operationally very simple.
- Built-in semantic layer to standardize metrics.<p>That's what I ended up building: dac (Dashboard-As-Code) is an open-source tool and a spec to define dashboards, well, as code. It contains an implementation in Go that can be deployed as a single binary anywhere. The dashboards are defined in YAML and JSX: YAML for static stuff, JSX for dynamic dashboards. You can run queries at load time to define conditional charts, generate tabs on the fly per customer, or list charts for each A/B test you are running.<p>I built it in Go because I do love Go, and I think it is the greatest language at the moment to work with AI agents.<p>dac runs as a single binary; you can get started with a `dac init` command and it'll automatically create some sample dashboards for you based on DuckDB. It supports 10+ SQL backends, with more to come. It supports validation, custom themes, and whatnot.<p>You can see it here: <a href="https://github.com/bruin-data/dac" rel="nofollow">https://github.com/bruin-data/dac</a><p>I would love to hear what can be improved here, please let me know your thoughts.
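The "static analysis as a first-class citizen" point above can be illustrated with a toy pre-deploy validator. The schema below (title, charts with name/type/query) is invented for illustration and is not dac's actual spec:

```python
def validate_dashboard(spec):
    """Validate a dashboard definition before deploying.

    Returns a list of human-readable errors; an empty list means the spec
    passes. An agent can run this on its own output to check its work.
    """
    errors = []
    if "title" not in spec:
        errors.append("dashboard is missing a title")
    charts = spec.get("charts", [])
    if not charts:
        errors.append("dashboard defines no charts")
    for i, chart in enumerate(charts):
        for key in ("name", "type", "query"):
            if key not in chart:
                errors.append(f"chart {i} is missing '{key}'")
        if chart.get("type") not in (None, "bar", "line", "table"):
            errors.append(f"chart {i} has unknown type {chart['type']!r}")
    return errors
```

Because the spec is plain text, the same checks can run in CI on every pull request, which is what makes agent-written dashboards reviewable.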
Show HN: Pollen – distributed WASM runtime, no control plane, single binary
Show HN: Apple's SHARP running in the browser via ONNX runtime web
Hi HN, author here. SHARP is Apple's recent single-image 3D Gaussian splatting model (<a href="https://arxiv.org/abs/2512.10685" rel="nofollow">https://arxiv.org/abs/2512.10685</a>). Their reference code is PyTorch + a pretty heavy pipeline; I wanted to see if it could run in a browser with no server hop, so I exported the predictor to ONNX and ran it via onnxruntime-web with the WebGPU EP.<p>What works: drop in an image, get a .ply you can download or preview live, all on your machine — your image never leaves the tab. The model is large (~2.4 GB sidecar) so first load is slow on a cold cache, but inference itself is a few seconds on a recent Mac.<p>Caveats: SHARP's released weights are research-use only (Apple's model license, not the code's). I host the exported ONNX on R2 so the demo "just works", but you can also export your own from the upstream Apple repo and upload locally.<p>Happy to talk about it in the comments :)
Show HN: State of the Art of Coding Models, According to Hacker News Commenters
Hello HN,<p>I was away from my computer for two weeks, and after coming back and reading the latest discussions on HN about coding assistants (models, harnesses), I felt very out of the loop. My normal process would have been to keep reading and figure out the latest and greatest from people's comments, but I wanted to try to automate this process.<p>Basically, the goal is to get a quick overview of which coding models are popular on HN. A next iteration could also scan for harnesses that people use, or info on self-hosting or hardware setups.<p>I wrote a short intro on the page about the pipeline that collects and analyzes the data, but feel free to ask for more details or check the Google Sheet for more info.<p><a href="https://hnup.date/hn-sota" rel="nofollow">https://hnup.date/hn-sota</a>
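The aggregation step of a pipeline like this boils down to counting model mentions across comment texts. A hypothetical sketch (the alias table and the once-per-comment rule are my guesses, not the author's actual method):

```python
import re
from collections import Counter

# Aliases normalize the many ways commenters refer to the same model family.
MODEL_ALIASES = {
    "claude": "Claude", "sonnet": "Claude", "opus": "Claude",
    "gpt": "GPT", "codex": "Codex",
    "gemini": "Gemini", "qwen": "Qwen", "deepseek": "DeepSeek",
}

def count_model_mentions(comments):
    """Count how many HN comments mention each coding model family."""
    counts = Counter()
    for text in comments:
        seen = set()
        for word in re.findall(r"[a-z]+", text.lower()):
            model = MODEL_ALIASES.get(word)
            if model and model not in seen:
                seen.add(model)   # count each model at most once per comment
                counts[model] += 1
    return counts
```

Counting a model once per comment (rather than once per token) keeps a single enthusiastic commenter from dominating the popularity ranking.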