The best Show HN stories from Hacker News from the past day
Latest posts:
Show HN: Excel to Python Compiler
We (me and @aarondia) built a tool to help you turn pseudo-software Excel files into real-software Python. Ideally, Pyoneer helps you automate your manual Excel processes. You can try it today here: https://pyoneer.ai.

How it works:

1. You upload an Excel file.

2. We statically parse the Excel file and build a dependency graph of all the cells, tables, formulas, and pivots.

3. We do a graph traversal and translate nodes as we hit them. We use OpenAI APIs to translate formulas. There's a bunch of extra work here, because even with the best prompt engineering a fella like me can do, OpenAI sucks at translating formulas (primarily because it doesn't know what datatypes it's dealing with). We augment this translation with a mapping from ranges to variable names and types, which in our experience improves the percentage of correctly translated formulas by about 5x. (A minimal sketch of this parse-and-translate flow follows at the end of this post.)

4. We also generate test cases for our translations, to make sure the Python process matches your Excel process.

5. We give you back a Jupyter notebook that contains the code we generated.

If there are pieces of the Excel file we can't translate successfully (currently, complex formulas or pivot tables), we leave them as TODOs in the code. This makes it easy for you to hop in and finish the script.

Who is this for:

Developers who know Python, primarily! Pyoneer might be useful if:

1. You've got an Excel file you're looking to move to Python (usually for speed, size, or maintenance reasons).

2. There's enough logic contained in the workbook that rewriting it from scratch would be a hassle.

3. Or you don't know the logic in the Excel workbook well, since you didn't write it in the first place :)

Post-translation, even if Pyoneer doesn't nail everything or translate all the formulas, you'll be able to pop into the notebook, clean up the TODOs, and finish writing the formulas.

What the alpha launch supports:

We launched early! Currently we're focused on supporting:

1. Any number of sheets, with any reference structure between them.

2. Cells that translate directly as variables. We'll translate the formulas to Python code that has the same result, or else generate a TODO letting you know we failed to translate that cell.

3. Tables that translate as pandas dataframes. We support at most one table per sheet, and the tables must be contiguous. If the formulas in a column are consistent, we will try to translate the column as a single pandas statement.

We do not support pivot tables or complex formulas; when we fail to translate these, we generate TODO statements. We also don't support charts or macros, and currently you won't see these reflected in the output at all.

Why we built this:

We did YC S20 and built an open source tool called Mito (https://trymito.io). It's been a good journey since then: we've scaled revenue and grown to over 2k GitHub stars (https://github.com/mito-ds/mito). But fundamentally, Mito is a tool for Excel users who want to start writing Python code more effectively.

We wanted to take another stab at the Excel-to-Python pain point with something more developer focused: a tool that helps developers who have to translate Excel files into Python do so much more quickly.

Hence, Pyoneer!

I'll be in the comments today if you've got feedback, criticism, questions, or comments.
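As promised above, here is a minimal sketch of the parse-and-translate flow, assuming openpyxl for parsing and Python's graphlib for the traversal. Everything here, including the translate_formula stub standing in for the LLM call, is hypothetical and is not Pyoneer's actual code:

import re
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

from openpyxl import load_workbook  # pip install openpyxl

# Naive same-sheet reference matcher; real parsing must also handle
# cross-sheet references, ranges, and function names that look like refs.
CELL_REF = re.compile(r"\b([A-Z]{1,3}[0-9]+)\b")

def build_dependency_graph(path):
    """Map each formula cell to the set of cells its formula references."""
    wb = load_workbook(path)  # formulas load as strings starting with "="
    graph, formulas = {}, {}
    for ws in wb.worksheets:
        for row in ws.iter_rows():
            for cell in row:
                if isinstance(cell.value, str) and cell.value.startswith("="):
                    formulas[cell.coordinate] = cell.value
                    graph[cell.coordinate] = set(CELL_REF.findall(cell.value))
    return graph, formulas

def translate_formula(formula):
    """Stand-in for the LLM translation step; returns None on failure."""
    return None

def translate_workbook(path):
    graph, formulas = build_dependency_graph(path)
    lines = []
    # Topological order guarantees a cell is emitted after its inputs.
    for coord in TopologicalSorter(graph).static_order():
        formula = formulas.get(coord)
        if formula is None:
            continue  # plain value, nothing to translate
        python_expr = translate_formula(formula)
        if python_expr is None:
            lines.append(f"# TODO: could not translate {coord}: {formula}")
        else:
            lines.append(f"{coord.lower()} = {python_expr}")
    return "\n".join(lines)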
Show HN: Porter Cloud – PaaS with an eject button
Hi HN! Porter Cloud (https://porter.run/porter-cloud) is a Platform as a Service (PaaS) like Heroku, but we make it easy for you to migrate to AWS, Azure, or GCP when you're ready.

Like Heroku, Porter takes care of a lot of generic DevOps work for you (like setting up CI/CD, containerizing your applications, autoscaling, SSL certificates, and setting up a reverse proxy) and lets you deploy your apps with a few clicks, saving you a lot of time while developing. However, as you probably know, there's a downside: platforms like this become constraining if and when your app takes off and you need to scale. The time you saved while developing can get pretty expensive once you're paying for a lot of users, and the platforms tend to try to keep you locked in!

Our idea is to give you the best of both worlds: use Porter Cloud for as long as it saves you time and development cost, but at any time you can press the "eject button" to migrate your app to your own AWS, Azure, or GCP account as you please. We make it seamless to break out, so you're no longer subject to the rigid constraints of a conventional PaaS. You can migrate in a few simple steps, outlined here: https://docs.porter.run/other/eject.

A bit of background: we first launched on HN almost 3 years ago with our original product (https://news.ycombinator.com/item?id=26993421, https://porter.run), which deploys your applications to your own AWS, Azure, or GCP account with the simple experience of a PaaS.

Since then, we've helped countless companies migrate from a PaaS to one of the big three cloud providers. Most of them had started on a PaaS in the early days to optimize for speed and ease of use, but ultimately had to go through a painful migration to AWS, Azure, or GCP as they scaled and ran into various constraints on their original PaaS.

Interestingly, we learned that many companies that start on a PaaS are fully aware that they'll have to migrate to one of the big three public clouds [1] at some point. Yet they choose to deploy on a PaaS anyway, because outgrowing a cloud platform is a "champagne problem" when you're focused on getting something off the ground. It becomes a very tangible problem, however, when you need to migrate your entire production infrastructure while serving many users at scale. It's a "nice problem to have", until it isn't.

We've built Porter Cloud so that the next generation of startups can get off the ground as quickly as possible, with the peace of mind that you can effortlessly move to one of the tried-and-true hyperscalers when you're ready to scale.

We're excited to see what people build on Porter Cloud. If you've ever dealt with a migration from a PaaS to one of the big three cloud providers, we'd also love to hear about your experience in the comments. Looking forward to feedback and discussion!

[1] By "big three clouds" we mean the lower-level primitives of each cloud provider. We don't mean their higher-level offerings like AWS App Runner, Google Cloud Run, or Azure App Service, since those run into the same PaaS problems described above.
Show HN: HackerNews but for research papers
Hey guys, I love HN! I wanted to extend the same aesthetic and community to things beyond tech-related news.

I thought it would be cool to get the same quality of community gathered around the latest and greatest research coming out.

Let me know what you think of what I have so far. It's still early, so there are probably bugs and other quality issues.

If there are any features missing that you'd want, let me know.

ALSO, if any of you are familiar with the map of the territory of any particular field, please let me know! I'd love to pick your brain and come up with a 'most important papers' section for each field.

Thank you!!

-stefan
Show HN: Adblock for Podcasts
This is a small app that achieves surprisingly good podcast adblocking. It transcribes the podcast, identifies ad segments in the transcript, then creates a new version of the podcast without the ads.
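For the curious, here is a rough sketch of that pipeline in Python, using openai-whisper for transcription and pydub for the re-cut. The keyword heuristic is an assumption standing in for whatever ad classifier the app actually uses:

import whisper                  # pip install openai-whisper
from pydub import AudioSegment  # pip install pydub (requires ffmpeg)

# Hypothetical stand-in for a real ad classifier.
AD_HINTS = ("promo code", "use code", "sponsored by", "brought to you by")

def strip_ads(in_path, out_path):
    model = whisper.load_model("base")
    result = model.transcribe(in_path)  # segments carry start/end in seconds

    audio = AudioSegment.from_file(in_path)
    kept = AudioSegment.empty()
    for seg in result["segments"]:
        if any(hint in seg["text"].lower() for hint in AD_HINTS):
            continue  # drop segments that read like ad copy
        # pydub slices audio by milliseconds
        kept += audio[int(seg["start"] * 1000):int(seg["end"] * 1000)]
    kept.export(out_path, format="mp3")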
Show HN: B-field, a novel probabilistic key-value data structure (`rust-bfield`)
`rust-bfield` is a Rust implementation of our novel "B-field" data structure, which functions like a Bloom filter for key-value lookups instead of set membership queries.

The B-field allows you to compactly store data using only a few bytes per key-value pair. We've successfully utilized it in genomics to associate billions of "k-mers" with taxonomic identifiers while maintaining an efficient memory footprint. But the data structure is also useful beyond computational biology, particularly where you have large unique key domains and constrained value ranges.

Available under an Apache 2 license. We hope it proves useful, and we're happy to answer any questions!
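As a loose intuition for "a Bloom filter for key-value lookups", here is a toy Python map that keeps one Bloom-filter bit array per value bit. This is only an illustration of the probabilistic key-value idea; it is not the actual B-field construction, which encodes values as fixed-weight error-correcting codewords in a single bit array:

import hashlib

class ToyBloomMap:
    def __init__(self, m_bits=1 << 20, k_hashes=7, value_bits=8):
        self.m, self.k, self.vbits = m_bits, k_hashes, value_bits
        # One Bloom-filter bit array per bit of the value.
        self.planes = [bytearray(m_bits // 8) for _ in range(value_bits)]

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.blake2b(key, digest_size=8, salt=i.to_bytes(8, "little"))
            yield int.from_bytes(h.digest(), "little") % self.m

    def insert(self, key: bytes, value: int):
        for b in range(self.vbits):
            if value >> b & 1:
                for p in self._positions(key):
                    self.planes[b][p // 8] |= 1 << (p % 8)

    def get(self, key: bytes) -> int:
        # One-sided error, like a Bloom filter: colliding keys can flip
        # extra value bits on, but set bits never read back as off.
        value = 0
        for b in range(self.vbits):
            if all(self.planes[b][p // 8] >> (p % 8) & 1 for p in self._positions(key)):
                value |= 1 << b
        return value

m = ToyBloomMap()
m.insert(b"ACGTACGT", 42)        # e.g. k-mer -> taxonomic id
assert m.get(b"ACGTACGT") == 42  # exact unless other keys collide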
Show HN: PBT – A property-based testing library for Ruby
Hello HN,

I'd like to introduce a property-based testing tool for Ruby.

Ruby's ease of test-writing and rich ecosystem are widely acclaimed. However, property-based testing is not as widely used as in other languages such as Haskell or Elixir, and I think this is because there is no de facto testing tool.

Beyond the basic functionality of stateless property-based testing, this gem currently has the following features:

- A verbose mode that lets you see its shrinking procedure and algorithms.

- Several concurrency/parallelism options for executing each test. As of now, Ractor, Thread, and Process are available. (The default is sequential, considering benchmark results and actual use cases.)

Happy hacking!
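For readers new to the technique, this is what a property test looks like, shown here in Python with the hypothesis library purely as an illustration (not this gem's Ruby API): the framework generates many random inputs and, on failure, shrinks them to a minimal counterexample.

from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sort_is_idempotent(xs):
    # Property: sorting twice gives the same result as sorting once.
    assert sorted(sorted(xs)) == sorted(xs)

@given(st.lists(st.integers()), st.integers())
def test_membership_preserved(xs, x):
    # Property: after appending x, x appears in the sorted list.
    assert x in sorted(xs + [x])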
Show HN: Neosync – Open-Source Data Anonymization for Postgres and MySQL
Hey HN, we're Evis and Nick and we're excited to be launching Neosync (https://www.github.com/nucleuscloud/neosync). Neosync is an open source platform that helps developers anonymize production data, generate synthetic data, and sync it across their environments for better testing, debugging, and developer experience.

Most developers and teams have some version of a database seed script that creates mock data for their local and staging databases. The problem is that production data is messy, and it's very difficult to replicate that with mock data. This causes two big problems for developers.

The first problem is that features seem to work locally and in staging but have bugs and edge cases in production, because the seed data you developed against was not representative of production data.

The second problem we saw was that debugging production errors would take a long time, and the errors would often resurface. When we see a bug in production, the first thing we want to do is reproduce it locally, but if we can't reproduce the state of the data locally, then we're kind of flying blind.

Working directly with production data would solve both of these problems, but most teams can't because of: (1) privacy/security issues and (2) scale. So we set out to solve these two problems with Neosync.

We solve the privacy and security problem using anonymization and synthetic data. We have 40+ pre-built transformers (or you can write your own in code) that can anonymize PII or sensitive data so that it's safe to use locally. Additionally, you can generate synthetic data from scratch that fits your existing schema.

The second problem is scale. Some production databases are too big to fit locally or just have more data than you need. Also, in some cases you may want to debug a certain customer's data, and you only want their data. We solve this with subsetting. You can pass in a SQL query to filter your table(s), and Neosync will handle all of the heavy lifting, including referential integrity.

At its core, Neosync does three things: (1) it streams data from a source to one or multiple destination databases (we never store your sensitive data); (2) while that data is being streamed, we transform it: you define which schemas and tables you want to sync and, at the column level, select a transformer that defines how you want to anonymize the data or generate synthetic data; and (3) we subset your data based on your filters.

We do all of this while handling referential integrity. Whether you have primary keys, foreign keys, unique constraints, circular dependencies (within a table and across tables), sequences, and more, Neosync preserves those references. (A toy illustration of why deterministic anonymization preserves foreign-key joins follows at the end of this post.)

We also ship with APIs, a Terraform provider, a CLI, and a GitHub Action that you can use to hydrate a CI database.

Neosync is an open source project written in Go and TypeScript and can be run on Docker Compose, bare metal, or Kubernetes via Helm. You can also use our hosted platform, which has a generous free tier (https://neosync.dev), or our managed platform that you can deploy in your VPC.

Here's a brief Loom demo: https://www.loom.com/share/ac21378d01cd4d848cf723e4960e8338?sid=2faf613c-92be-44fa-9278-c8087e777356

We'd love any feedback you have!
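Neosync itself is written in Go and TypeScript; purely to illustrate the referential-integrity point promised above, here is a hypothetical Python sketch (not Neosync's API or transformer implementation). Because a deterministic transformer maps the same input to the same pseudonym, a foreign key in a child table still joins to its parent row:

import hashlib
import hmac

SECRET = b"rotate-me"  # placeholder per-environment key

def anonymize_email(email: str) -> str:
    # Keyed hash: the same input always yields the same fake address.
    digest = hmac.new(SECRET, email.lower().encode(), hashlib.sha256).hexdigest()[:12]
    return f"user_{digest}@example.com"

users = [{"id": 1, "email": "jane@acme.com"}]
orders = [{"user_email": "jane@acme.com", "total": 42}]

users = [{**u, "email": anonymize_email(u["email"])} for u in users]
orders = [{**o, "user_email": anonymize_email(o["user_email"])} for o in orders]

# Both tables carry the same pseudonym, so the join key still matches.
assert users[0]["email"] == orders[0]["user_email"]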
Show HN: Route your prompts to the best LLM
Hey HN, we've just finished building a dynamic router for LLMs, which takes each prompt and sends it to the most appropriate model and provider. We'd love to know what you think!

Here is a quick(ish) screen recording explaining how it works: https://youtu.be/ZpY6SIkBosE

Best results come from training a custom router on your own prompt data: https://youtu.be/9JYqNbIEac0

The router balances user preferences for quality, speed, and cost. The end result is higher-quality and faster LLM responses at lower cost.

The quality of each candidate LLM is predicted ahead of time using a neural scoring function, which is a BERT-like architecture conditioned on the prompt and a latent representation of the LLM being scored. The different LLMs are queried across the batch dimension, with the neural scoring architecture taking a single latent representation of the LLM as input per forward pass. This makes the scoring function very modular to query for different LLM combinations. It is trained in a supervised manner on several open LLM datasets, using GPT-4 as a judge. The cost and speed data is taken from our live benchmarks, updated every few hours across all continents. The final "loss function" is a linear combination of quality, cost, inter-token latency, and time-to-first-token, with the user effectively scaling the weighting factors of this linear combination. (A toy version of this weighted objective is sketched at the end of this post.)

Smaller LLMs are often good enough for simple prompts, but knowing exactly how and when they might break is difficult. Simple perturbations of the phrasing can cause smaller LLMs to fail catastrophically, making them hard to rely on. For example, Gemma-7B converts numbers to strings and returns the "largest" string when asked for the "largest" number in a set, but works fine when asked for the "highest" or "maximum".

The router is able to learn these quirky distributions and ensure that the smaller, cheaper, and faster LLMs are only used when there is high confidence that they will get the answer correct.

Pricing-wise, we charge the same rates as the backend providers we route to, without taking any margins. We also give $50 in free credits to all new signups.

The router can be used off the shelf, or it can be trained directly on your own data for improved performance.

What do people think? Could this be useful?

Feedback of all kinds is welcome!
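Here is the toy version of that weighted objective, as promised above. The weights and candidate numbers are made up: the quality field stands in for the neural scoring function's prediction, and the cost/latency fields stand in for the live benchmark data:

from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    quality: float  # predicted quality in [0, 1] for this prompt
    cost: float     # $ per 1M tokens
    itl: float      # inter-token latency, seconds
    ttft: float     # time to first token, seconds

def route(candidates, w_quality=1.0, w_cost=0.2, w_itl=0.1, w_ttft=0.1):
    """Pick the model maximizing quality minus weighted cost and latency."""
    def objective(c):
        return (w_quality * c.quality
                - w_cost * c.cost
                - w_itl * c.itl
                - w_ttft * c.ttft)
    return max(candidates, key=objective)

best = route([
    Candidate("small-7b", quality=0.62, cost=0.2, itl=0.01, ttft=0.15),
    Candidate("frontier", quality=0.93, cost=10.0, itl=0.03, ttft=0.60),
])
print(best.name)  # "small-7b" here; lowering w_cost flips the choice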