The best Hacker News stories from Show HN from the past day
Latest posts:
Show HN: Chonky – a neural approach for text semantic chunking
TL;DR: I’ve made a transformer model and a wrapper library that segments text into meaningful semantic chunks.

Current text-splitting approaches rely on heuristics (although you can use a neural embedder to group semantically related sentences). I propose a fully neural approach to semantic chunking.

I took the base DistilBERT model and trained it on BookCorpus to split concatenated text paragraphs back into the original paragraphs. Basically it’s a token classification task. Fine-tuning took a day and a half on 2x1080 Ti GPUs.

The library can be used as a text-splitter module in a RAG system, or for splitting transcripts, for example. The usage pattern I see is the following: strip all markup tags to produce plain text and feed that text into the model.

The problem is that although in theory this should improve overall RAG pipeline performance, I haven’t managed to measure it properly. Other limitations: the model only supports English for now, and the output text is lowercased.

Please give it a try. I’d appreciate any feedback.

The Python library: https://github.com/mirth/chonky

The transformer model: https://huggingface.co/mirth/chonky_distilbert_base_uncased_1
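As a concrete illustration, here is a minimal sketch that loads the published checkpoint through the Hugging Face token-classification pipeline; the aggregation setting and the way predicted spans are read as paragraph-boundary candidates are my assumptions (the chonky library wraps this interpretation for you):

```python
# Sketch: calling the published checkpoint directly via transformers.
# How the predicted spans map to paragraph boundaries is an assumption here;
# the chonky library handles this interpretation for you.
from transformers import pipeline

splitter = pipeline(
    "token-classification",
    model="mirth/chonky_distilbert_base_uncased_1",
    aggregation_strategy="simple",
)

# Markup already stripped; note that the model lowercases its output.
text = "first paragraph about one topic. second paragraph about another topic."

for span in splitter(text):
    # Each predicted span is a candidate split point in the input text.
    print(span["word"], span["start"], span["end"], round(span["score"], 3))
```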
Show HN: Pg_CRDT – CRDTs in Postgres Using Automerge
Show HN: Python at the Speed of Rust
I’m sure many of you are familiar with this, but there’s a treacherous gap between finding (or building) a model that works in PyTorch and getting it deployed into your application, especially a consumer-facing one.

I’ve been very interested in solving this problem with a great developer experience. Over time, I gradually realized that the highest-impact thing to build was a way to go from existing Python code to a self-contained native binary; in other words, a Python compiler.

I was already pretty familiar with a successful attempt: when Apple introduced armv8 on the iPhone 5s, they quickly mandated 64-bit support for all apps. Unity (where I had been programming since I was 11) kinda got f*cked because they used Mono to run developers’ C# code, and Mono didn’t support 64-bit ARM. Unity ended up building IL2CPP, which transpiles the C# intermediate language into C++ and then cross-compiles it. To this day, this is perhaps the most impressive technical feat Unity has achieved, in my opinion.

I set out to build something similar, but this time starting from Python. It’s a pretty difficult problem, given the dynamic nature of the language. The key unlock was the PyTorch 2.0 release, where they pioneered the use of symbolic tracing to power `torch.compile`. In a nutshell, they register a callback with the Python interpreter (using CPython’s frame evaluation API), run a function with fake inputs, and record an IR graph of everything that happened in the function.

Once you have an IR graph, you can lower it to C++/Rust code, operation by operation, by propagating type information throughout the program (see the blog post for an example). And now is the perfect time to have this infrastructure, because LLMs can do the hard work of writing and validating the required operations in native code.

Anyway, I wanted to share the proof-of-concept and gather feedback. Using Function is pretty simple: decorate a module-level function with `@compile`, then use the CLI to compile it: `fxn compile module.py`.

TL;DR: Get Rust performance without having to learn Rust ;)
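To make the tracing idea concrete, here is a minimal sketch using PyTorch’s public torch.fx tracer. It is a simplified stand-in for the frame-evaluation tracing described above, not Function’s actual implementation (the real entry points are the `@compile` decorator and `fxn compile module.py`):

```python
# Sketch: extracting an IR graph from plain Python with torch.fx.
# A simplified stand-in for the frame-evaluation tracing described above,
# not Function's actual implementation.
import torch
import torch.fx as fx

def scale_and_sum(x: torch.Tensor, factor: torch.Tensor) -> torch.Tensor:
    # Ordinary Python/PyTorch code; nothing special needed for tracing.
    return (x * factor).sum()

graph_module = fx.symbolic_trace(scale_and_sum)

# The recorded IR: one node per operation, which a backend could lower to
# C++/Rust operation by operation using propagated type information.
print(graph_module.graph)
```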
Show HN: Omiword – A daily, sector-based word puzzle
Hi everybody. I occasionally make little browser-based games, and this is my latest attempt. It's not quite done, but it's quite playable (note: it does include audio):

https://www.omiword.com/

This has been my occasional tinker target for ~5 years now, starting in the early days of Covid. The objective is to drag letter tiles within certain boundaries to spell four common American-English words.

It hasn't got ads or anything; it's just supposed to be fun for its own sake. If people happen to like it, I might add an option for folks to make a small, one-time payment to unlock access to the archive.

I'm happy to hear any feedback, or about any shortcomings that you might discover.
Show HN: memEx, a personal knowledge base inspired by zettelkasten and org-mode
Show HN: Building better base images
This project addresses the inefficiencies of traditional Dockerfile-based container builds: each customization layer creates storage bloat through duplicate dependencies from repeated apt-get install commands, network inefficiency from redundant package downloads across different images, and slow iteration cycles that require fully rebuilding all previous steps. Our solution builds minimal base images from scratch using debootstrap, so the initial build includes only the components that are actually required, and then creates specialized variants (Java, Kafka, etc.) from these common foundations. The result is significantly leaner images, faster builds, and more efficient resource utilization than standard Docker layer stacking.
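To make the flow concrete, here is a minimal sketch of the debootstrap-then-import step written as a Python build script; the Debian suite, package list, and image names are illustrative assumptions, not the project's actual configuration:

```python
# Sketch of the build flow described above: create a minimal Debian rootfs
# with debootstrap, then import it as a single-layer Docker base image.
# Suite, paths, and image names are illustrative assumptions.
import subprocess

ROOTFS = "/tmp/minimal-rootfs"
SUITE = "bookworm"                          # Debian release to bootstrap
BASE_IMAGE = "example/base:bookworm-minimal"

# 1. Build a minimal root filesystem containing only required packages.
subprocess.run(
    ["debootstrap", "--variant=minbase", "--include=ca-certificates",
     SUITE, ROOTFS, "http://deb.debian.org/debian"],
    check=True,
)

# 2. Tar the rootfs and import it as a single-layer Docker image.
tar = subprocess.Popen(["tar", "-C", ROOTFS, "-c", "."], stdout=subprocess.PIPE)
subprocess.run(["docker", "import", "-", BASE_IMAGE], stdin=tar.stdout, check=True)
tar.stdout.close()
tar.wait()

# Specialized variants (Java, Kafka, ...) can then start FROM this lean base
# instead of re-stacking apt-get layers on a generic image.
```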
Show HN: I built an app that reduces podcast preparation effort by 95% +
Show HN: Time Travel with Your SQL
Hi, my name is Anguel and I am one of the developers of WhoDB (https://github.com/clidey/whodb).

I am not a fan of the DBeaver, Beekeeper, Adminer, etc. experience: they are bloated, ugly, and at best okay but not great. That's why I started working on WhoDB.

The approach:

- browser-based (Chrome/Firefox)
- no bloat
- Jupyter-notebook-like experience (Scratchpad)
- built-in AI co-pilot with Ollama (local) or OpenAI/Anthropic

We just shipped query history and replay (time travel?) in the Scratchpad.

Would love for you to check it out and give some feedback, aka roast us into oblivion:

docker run -p 8080:8080 clidey/whodb

Food for thought:

1. What's your biggest database pain point?

2. Any killer feature missing from current tools?
Show HN: Atari Missile Command Game Built Using AI Gemini 2.5 Pro
A modern HTML5 canvas remake of the classic 1980 Atari game. Defend your cities and missile bases from incoming enemy attacks using your missile launchers. Initially built with Google's Gemini 2.5 Pro model.
Show HN: Fermi – A Wordle-style game for order-of-magnitude thinking
I always thought it was cool when someone could make a plausible estimate from a few reasonable guesses. I recently learned that these are sometimes called Fermi estimates, after Enrico Fermi, the famous physicist, and it's the same kind of estimation behind his famous Fermi paradox.

You build a rough logic chain using a few sliders and fixed quantities (e.g. weeks per year), and the goal is to get within an order of magnitude of the true answer. The math is simple; the thinking is the game.

Would love feedback.
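For anyone unfamiliar with the mechanic, here is a tiny sketch of what such a chain looks like; the quantities and the "true" value are made-up illustrations, not the game's actual puzzles:

```python
# Sketch of a Fermi-style estimate: chain a few rough factors together,
# then check whether the result lands within an order of magnitude of the
# true value. All numbers here are illustrative, not from the game.
import math

def fermi_estimate(factors: list[float]) -> float:
    """Multiply a chain of rough quantities into one estimate."""
    estimate = 1.0
    for f in factors:
        estimate *= f
    return estimate

def within_order_of_magnitude(estimate: float, truth: float) -> bool:
    """Win condition: |log10(estimate) - log10(truth)| <= 1."""
    return abs(math.log10(estimate) - math.log10(truth)) <= 1

# Example: cups of coffee drunk per year in a city of one million people.
factors = [
    1_000_000,  # people in the city
    0.5,        # fraction who drink coffee
    1.5,        # cups per coffee drinker per day
    365,        # days per year (a fixed quantity, like "weeks per year")
]
estimate = fermi_estimate(factors)
print(f"estimate: {estimate:.2e}")
print("within an order of magnitude?", within_order_of_magnitude(estimate, 3e8))
```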
Show HN: I turned my kid's worksheet into a math game in 10 minutes with Claude