The best Show HN stories from Hacker News from the past day
Latest posts:
Show HN: B-field, a novel probabilistic key-value data structure (`rust-bfield`)
`rust-bfield` is a Rust implementation of our novel "B-field" data structure, which functions like a Bloom filter for key-value lookups instead of set membership queries.<p>The B-field allows you to compactly store data using only a few bytes per key-value pair. We've successfully utilized it in genomics to associate billions of "k-mers" with taxonomic identifiers while maintaining an efficient memory footprint. But the data structure is also useful beyond computational biology, particularly where you have large unique key domains and constrained value ranges.<p>Available under an Apache 2 license. We hope it proves useful, and we're happy to answer any questions!
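To make the "Bloom filter for key-value lookups" idea concrete, here is a much simplified sketch in Python: instead of marking mere presence, each hashed slot ORs in the value's bit pattern, and a lookup ANDs the slots together. This is an illustration of the general idea only, not the actual `rust-bfield` algorithm (the class name, parameters, and encoding here are invented for the example).

```python
import hashlib

class TinyBField:
    """Simplified Bloom-filter-style key->value map (illustration only;
    the real B-field uses a more sophisticated value encoding)."""

    def __init__(self, n_slots=1024, n_hashes=4, value_bits=8):
        self.n_slots = n_slots
        self.n_hashes = n_hashes
        self.value_bits = value_bits
        # Each slot holds the OR of all value bit patterns hashed to it.
        self.slots = [0] * n_slots

    def _positions(self, key):
        # Derive n_hashes slot indices from independent hashes of the key.
        for i in range(self.n_hashes):
            h = hashlib.blake2b(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.n_slots

    def insert(self, key, value):
        assert 0 < value < (1 << self.value_bits)
        for pos in self._positions(key):
            self.slots[pos] |= value

    def lookup(self, key):
        # AND the patterns from every slot; the result always contains the
        # inserted value's bits, but may contain extra bits (false positives).
        result = (1 << self.value_bits) - 1
        for pos in self._positions(key):
            result &= self.slots[pos]
        return result
```

For an inserted key, the returned pattern is always a superset of the stored value's bits; the false-positive rate shrinks as the table grows and fills less, which is what makes the few-bytes-per-pair footprint possible.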
Show HN: PBT – A property-based testing library for Ruby
Hello HN,<p>I'd like to introduce a property-based testing tool for Ruby.
Ruby is widely praised for its ease of test-writing and rich testing ecosystem. However, property-based testing is not as widely used in Ruby as in languages such as Haskell or Elixir, and I think this is because there is no de facto tool for it.<p>Beyond the basic functionality of stateless property-based testing, this gem currently offers the following features:<p>- A verbose mode that lets you see the shrinking procedure and algorithms.<p>- Several concurrency/parallelism options for running each test. As of now, Ractor, Thread, and Process backends are available. (The default is sequential execution, based on benchmark results and actual use cases.)<p>Happy hacking!
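For readers new to the technique, the generate-then-shrink loop at the heart of any property-based testing tool can be sketched in a few lines. This is a minimal illustration of the concept in Python, not the PBT gem's API; the function names and the naive halving shrinker are invented for the example.

```python
import random

def check_property(prop, gen, runs=100, seed=0):
    """Generate random inputs; on failure, shrink toward a smaller
    counterexample. A toy version of what property-based testers do."""
    rng = random.Random(seed)
    for _ in range(runs):
        x = gen(rng)
        if not prop(x):
            # Naive integer shrinking: halve toward zero while the
            # smaller input still violates the property.
            while x != 0 and not prop(x // 2):
                x //= 2
            return x  # a minimal-ish counterexample
    return None  # no counterexample found

# Property that fails for large ints: "x + 1 fits in one byte".
cex = check_property(lambda x: x + 1 < 256, lambda r: r.randint(0, 10_000))
```

Real tools add type-aware generators, smarter shrinking strategies, and reporting; the verbose mode mentioned above is essentially a trace of this shrink loop.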
Show HN: Neosync – Open-Source Data Anonymization for Postgres and MySQL
Hey HN, we're Evis and Nick and we're excited to be launching Neosync (<a href="https://www.github.com/nucleuscloud/neosync">https://www.github.com/nucleuscloud/neosync</a>). Neosync is an open source platform that helps developers anonymize production data, generate synthetic data and sync it across their environments for better testing, debugging and developer experience.<p>Most developers and teams have some version of a database seed script that creates some mock data for their local and stage databases. The problem is that production data is messy and it’s very difficult to replicate that with mock data. This causes two big problems for developers.<p>The first problem is that features seem to work locally/stage but have bugs and edge cases in production because the seed data you used to develop against was not representative of production data.<p>The second problem we saw was that debugging production errors would take a long time and would often resurface. When we see a bug in production, the first thing we want to do is reproduce it locally, but if we can’t reproduce the state of the data locally, then we’re kind of flying blind.<p>Working directly with production data would solve both of these problems but most teams can’t because of: (1) privacy/security issues and (2) scale. So we set out to solve these two problems with Neosync.<p>We solve the privacy and security problem using anonymization and synthetic data. We have 40+ pre-built transformers (or you can write your own in code) that can anonymize PII or sensitive data so that it’s safe to use locally. Additionally, you can generate synthetic data from scratch that fits your existing schema across your database.<p>The second problem is scale. Some production databases are too big to fit locally or just have more data than you need. Also, in some cases, you may want to debug a certain customer’s data and you only want their data. We solve this with subsetting. 
You can pass in a SQL query to filter your table(s) and Neosync will handle all of the heavy lifting, including referential integrity.<p>At its core, Neosync does three things: (1) It streams data from a source to one or multiple destination databases. We never store your sensitive data. (2) While that data is being streamed, we transform it. You define which schemas and tables you want to sync and, at the column level, select a transformer that defines how you want to anonymize the data or generate synthetic data. (3) We subset your data based on your filters.<p>We do all of this while handling referential integrity. Whether you have primary keys, foreign keys, unique constraints, circular dependencies (within a table and across tables), sequences and more, Neosync preserves those references.<p>We also ship with APIs, a Terraform provider, a CLI and a GitHub Action that you can use to hydrate a CI database.<p>Neosync is an open source project written in Go and TypeScript and can be run on Docker Compose, bare metal, or Kubernetes via Helm. You can use our managed platform, which you can deploy in your VPC, or our hosted platform, which has a generous free tier - <a href="https://neosync.dev">https://neosync.dev</a><p>Here's a brief Loom demo: <a href="https://www.loom.com/share/ac21378d01cd4d848cf723e4960e8338?sid=2faf613c-92be-44fa-9278-c8087e777356" rel="nofollow">https://www.loom.com/share/ac21378d01cd4d848cf723e4960e8338?...</a><p>We'd love any feedback you have!
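To illustrate why column transformers and referential integrity go hand in hand: if the same input always maps to the same fake value, foreign-key references to an anonymized column stay consistent across tables. Here is a deterministic pseudonymization sketch in Python — not Neosync's implementation; the secret key and function name are invented for the example.

```python
import hashlib
import hmac

SECRET = b"rotate-me"  # hypothetical key; a real tool would manage this

def pseudonymize_email(email: str) -> str:
    """Deterministic transformer sketch: HMAC the normalized input so the
    same email always yields the same pseudonym, preserving joins on the
    column, while the original value is not recoverable without the key."""
    digest = hmac.new(SECRET, email.lower().encode(), hashlib.sha256)
    return f"user_{digest.hexdigest()[:12]}@example.com"
```

Because the mapping is keyed and one-way, anonymized copies are safe to hand to developers, yet `users.email` and `orders.customer_email` still line up after transformation.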
Show HN: Route your prompts to the best LLM
Hey HN, we've just finished building a dynamic router for LLMs, which takes each prompt and sends it to the most appropriate model and provider. We'd love to know what you think!<p>Here is a quick(ish) screen recording explaining how it works: <a href="https://youtu.be/ZpY6SIkBosE" rel="nofollow">https://youtu.be/ZpY6SIkBosE</a><p>Best results when training a custom router on your own prompt data: <a href="https://youtu.be/9JYqNbIEac0" rel="nofollow">https://youtu.be/9JYqNbIEac0</a><p>The router balances user preferences for quality, speed and cost. The end result is higher quality and faster LLM responses at lower cost.<p>The quality for each candidate LLM is predicted ahead of time using a neural scoring function, which is a BERT-like architecture conditioned on the prompt and a latent representation of the LLM being scored. The different LLMs are queried across the batch dimension, with the neural scoring architecture taking a single latent representation of the LLM as input per forward pass. This makes the scoring function very modular to query for different LLM combinations. It is trained in a supervised manner on several open LLM datasets, using GPT-4 as a judge. The cost and speed data is taken from our live benchmarks, updated every few hours across all continents. The final "loss function" is a linear combination of quality, cost, inter-token-latency and time-to-first-token, with the user effectively scaling the weighting factors of this linear combination.<p>Smaller LLMs are often good enough for simple prompts, but knowing exactly how and when they might break is difficult. Simple perturbations of the phrasing can cause smaller LLMs to fail catastrophically, making them hard to rely on. 
For example, Gemma-7B converts numbers to strings and returns the "largest" string when asked for the "largest" number in a set, but works fine when asked for the "highest" or "maximum".<p>The router learns these quirky distributions and ensures that the smaller, cheaper and faster LLMs are only used when there is high confidence that they will get the answer correct.<p>Pricing-wise, we charge the same rates as the backend providers we route to, without taking any margins. We also give $50 in free credits to all new signups.<p>The router can be used off-the-shelf, or it can be trained directly on your own data for improved performance.<p>What do people think? Could this be useful?<p>Feedback of all kinds is welcome!
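The selection step described above — a linear combination of predicted quality, cost, inter-token latency and time-to-first-token, with user-scaled weights — can be sketched in a few lines. This is a toy illustration of the routing objective, not the actual product's code; the model names, score tuples, and weight keys are invented for the example (the real system predicts the quality term with a learned neural scorer).

```python
def route(model_scores, weights):
    """Pick the model maximizing a weighted quality/cost/latency trade-off.

    model_scores: {name: (quality, cost, inter_token_latency, ttft)}
    weights: user preferences scaling each term of the linear objective.
    """
    def utility(name):
        quality, cost, itl, ttft = model_scores[name]
        return (weights["quality"] * quality
                - weights["cost"] * cost
                - weights["itl"] * itl
                - weights["ttft"] * ttft)
    return max(model_scores, key=utility)
```

Turning the cost and latency weights up steers prompts toward the smaller models whenever their predicted quality is close enough; turning them to zero always picks the highest-quality model.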
Show HN: Oracolo – A minimalist Nostr blog in a single HTML file
Oracolo is a minimalist blog powered by Nostr that consists of a single HTML file weighing only ~140 KB. It also works without a web server; for example, you can send it via email as a business card.<p>Take it as a didactic experiment: it's not production-ready and has some limitations (no SEO-friendly structure), but it can work as a temporary solution (e.g. coming-soon and parking pages), and it is still an example of how easy it is to create a Nostr-powered web app and deploy it on low-tech infrastructure.<p>Comments and suggestions on how to improve it are welcome!
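As a hint at why a Nostr-powered blog needs so little infrastructure: a client just opens a websocket to any relay and sends a subscription message asking for the author's long-form posts (kind 30023, per NIP-23). Here is a sketch of building that subscription message in Python — the function name and filter values are illustrative, not Oracolo's code.

```python
import json

def nostr_blog_request(author_pubkey_hex, sub_id="blog"):
    """Build a NIP-01 REQ message asking a relay for an author's
    long-form articles (kind 30023). Sent as one websocket text frame;
    the relay replies with EVENT messages followed by EOSE."""
    event_filter = {
        "kinds": [30023],              # long-form content (NIP-23)
        "authors": [author_pubkey_hex],
        "limit": 20,
    }
    return json.dumps(["REQ", sub_id, event_filter])
```

Everything else — rendering, styling, navigation — can live in the single HTML file, which is what keeps the whole blog deployable anywhere a static file can go.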
Show HN: Openpanel – An open-source alternative to Mixpanel
I have created an open-source alternative to Mixpanel and will explain a bit about why I decided to do this.<p>Mixpanel is a GREAT tool and quite easy to understand (compared to GA4 and similar). I have used Mixpanel extensively for one of my React Native apps, but the last invoice was $300, which was way over my budget. I think I was paying for MTU (monthly tracked users), which was around 7000-10k users.<p>However, a downside of Mixpanel is that it is purely a product analytics tool; you don't get any basic web analytics similar to what GA4 or Plausible offers.<p>Therefore, I have combined the best features of Mixpanel and Plausible to create what I believe is the ultimate experience in an analytics tool (product and web).<p>The focus has always been: it should be easy yet also powerful. This has been a challenging balance, but I think I have managed to keep it somewhat simple.<p>Key Features:
- Privacy-first
- Visualize your events like Mixpanel
- Plausible-like overview
- Self-hostable
- Better support for React Native than Plausible
- Real-time (no delays for events)
- Ability to access all individual events and sessions<p>It's currently in beta and completely free during the beta period.<p>Give it a spin: <a href="https://openpanel.dev" rel="nofollow">https://openpanel.dev</a>
Show HN: I built a game to help you learn neural network architectures