The best Show HN stories from Hacker News from the past day
Latest posts:
Show HN: Airweave – Let agents search any app
Hey HN, we're Lennert and Rauf. We're building Airweave (https://github.com/airweave-ai/airweave), an open-source tool that lets agents search and retrieve data from any app or database. Here's a general intro: https://www.youtube.com/watch?v=EFI-7SYGQ48, and here's a longer one that shows more real-world use cases, including how Airweave is used from Cursor (0:33) and Claude Desktop (2:04): https://youtu.be/p2dl-39HwQo

A couple of months ago we were building agents that interacted with different apps, and we were frustrated when they struggled with vague natural-language requests like "resolve that one Linear issue about missing auth configs", "if you get an email from an unsatisfied customer, reimburse their payment in Stripe", or "what were the returns for Q1 based on the financials sheet in gdrive?". The agent would either chain together loads of function calls inefficiently to find the data, or fail to find it at all and hallucinate.

We also noticed that even though the rise of MCP has created more demand for agents that interact with external resources, most agent dev tooling focuses on function calling and actions rather than search. Annoyed by the lack of tooling that lets agents semantically search workspace or database contents, we first built Airweave as an internal solution. We then decided to open-source it and pursue it full time after getting positive reactions from coworkers and other agent builders.

Airweave connects to productivity tools, databases, or document stores via their APIs and transforms their contents into searchable knowledge bases, accessible to the agent through a standardized interface. The search interface is exposed via REST or MCP; when using MCP, Airweave essentially builds a semantically searchable MCP server on top of the resource. The platform handles the entire data pipeline, from connection and extraction to chunking, embedding, and serving. To keep knowledge current, it has automated sync with configurable schedules and change detection through content hashing.

We built it with support for white-labeled multi-tenancy to provide OAuth2-based integration across multiple user accounts while maintaining privacy and security boundaries. We're also actively working on permission-awareness (i.e., RBAC on the data).

We're happy to share learnings and get insights from your experiences. Looking forward to your comments!
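For a concrete picture of the search side described above, here is a minimal sketch of what an agent-facing query against a synced workspace might look like over REST. The base URL, endpoint path, auth header, and response shape are placeholders for illustration, not Airweave's documented API.

```python
import requests

# Hypothetical endpoint and credentials, for illustration only.
AIRWEAVE_URL = "http://localhost:8001"
API_KEY = "your-api-key"

def search(query: str, limit: int = 5) -> dict:
    """Send a natural-language query to a semantic search endpoint."""
    resp = requests.post(
        f"{AIRWEAVE_URL}/search",
        headers={"x-api-key": API_KEY},
        json={"query": query, "limit": limit},
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    # An agent-style request against the synced workspace data.
    print(search("that one Linear issue about missing auth configs"))
```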
Show HN: GlassFlow – OSS streaming dedup and joins from Kafka to ClickHouse
Hi HN! We are Ashish and Armend, founders of GlassFlow. We just launched our open-source streaming ETL that deduplicates and joins Kafka streams before ingesting them into ClickHouse: https://github.com/glassflow/clickhouse-etl

Why we built this:
Dedup with batch data is straightforward. You load the data into a temporary table, keep only the latest version of each record (identified by hashes or keys), and then move the clean data into your main table. But have you tried this with streaming data?
Users of our previous product were running real-time analytics pipelines from Kafka to ClickHouse and noticed that their analyses were wrong due to duplicates. The source systems produced duplicates as they ingested similar user data from CRMs, shop systems, and click streams.

We wanted to solve this with the existing ClickHouse options, but ClickHouse's ReplacingMergeTree has an uncontrollable background merging process: the new data is in the system, but you never know when the merge will finish, and until then your queries return incorrect results.

We looked into using FINAL but weren't happy with the speed for real-time workloads.

We tried Flink, but there is too much overhead in managing Java Flink jobs, and a self-built solution would have required us to set up and maintain state storage, possibly a very large one (number of unique keys), to track whether we had already encountered a record. And if your dedupe service fails, you need to rehydrate that state before processing new records. That was too much maintenance for us.

So we decided to solve it by building a new product, and we're excited to share it with you. The key difference is that the streams are deduplicated before they are ingested into ClickHouse, so ClickHouse always has clean data and less load, eliminating the risk of wrong results. We want more people to benefit from it, so we open-sourced it (Apache-2.0).

Main components:

- Streaming deduplication: You define the deduplication key and a time window (up to 7 days), and it handles the checks in real time to avoid duplicates before they hit ClickHouse. The state store is built in. (A minimal sketch of this windowed dedup idea appears at the end of this post.)

- Temporal stream joins: You can join two Kafka streams on the fly with a few config inputs. You set the join key, choose a time window (up to 7 days), and you're good.

- Built-in Kafka source connector: There is no need to build custom consumers or manage polling logic. Just point it at your Kafka cluster, and it auto-subscribes to the topics you define. Payloads are parsed as JSON by default, so you get structured data immediately. As the underlying tech, we chose NATS to keep it lightweight and low-latency.

- ClickHouse sink: Data gets pushed into ClickHouse through a native connector optimized for performance. You can tweak batch sizes and flush intervals to match your throughput needs. It handles retries automatically, so you don't lose data on transient failures.

We'd love to hear your feedback and to know whether you've solved this nicely with existing tools. Thanks for reading!
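To make the windowed dedup idea concrete, here is a minimal, self-contained sketch of deduplicating records by key within a time window. The in-memory dict standing in for the state store and all names are purely illustrative; GlassFlow's actual implementation and configuration differ.

```python
import time

WINDOW_SECONDS = 7 * 24 * 3600  # time windows of up to 7 days, as described above

seen: dict[str, float] = {}  # dedup key -> first-seen timestamp (toy state store)

def is_duplicate(record: dict, dedup_key: str) -> bool:
    """Return True if this record's key was already seen inside the window."""
    now = time.time()
    key = str(record[dedup_key])
    # Evict keys that have fallen out of the time window.
    for k, ts in list(seen.items()):
        if now - ts > WINDOW_SECONDS:
            del seen[k]
    if key in seen:
        return True
    seen[key] = now
    return False

events = [
    {"user_id": 1, "event": "signup"},
    {"user_id": 1, "event": "signup"},  # duplicate from a second source system
    {"user_id": 2, "event": "purchase"},
]
clean = [e for e in events if not is_duplicate(e, "user_id")]
print(clean)  # only the first occurrence of user_id=1 survives the window
```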
Show HN: I’m 16 years old and working on my first startup, a study app
As a student with a lot of notes, I had a problem studying quickly for tests. So I created Notiv, an AI study app that analyzes your notes and prepares you for tests.
Show HN: LoopMix128 – Fast C PRNG (.46ns), 2^128 Period, BigCrush/PractRand Pass
LoopMix128 is a fast C PRNG I wrote for non-cryptographic tasks.

GitHub (MIT): https://github.com/danielcota/LoopMix128

Highlights:

* ~0.37 ns/value (GCC 11.4, -O3 -march=native), 98% faster than xoroshiro128++ and PCG64.
* Passes TestU01 BigCrush & PractRand (32TB).
* Guaranteed 2^128 period.
* Proven injective (192-bit state) via Z3 SMT solver; allows parallel streams.
* Core requires only stdint.h.

Seeking feedback on design, use cases, or further testing.
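As a side note on the injectivity bullet above, here is a toy illustration of that kind of Z3 check: ask the solver for two distinct states that map to the same successor and confirm the query is unsatisfiable. The mixing step below is a simplified 64-bit example made up for this sketch, not LoopMix128's actual 192-bit state update.

```python
from z3 import BitVec, LShR, RotateLeft, Solver, unsat

def step(s):
    # A simplified, invertible 64-bit mixing step: xorshift, rotate, constant add.
    s = s ^ LShR(s, 29)
    s = RotateLeft(s, 13)
    return s + 0x9E3779B97F4A7C15

a, b = BitVec("a", 64), BitVec("b", 64)
solver = Solver()
# Ask Z3 for two distinct states that collide after one step.
solver.add(a != b, step(a) == step(b))

# unsat means no collision exists, i.e. the step function is injective.
print("injective" if solver.check() == unsat else "not injective")
```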
Show HN: Code Claude Code
In the spirit of open source, I'm releasing something I'm actively working on; it's insanely simple and would likely be built by someone anyway.

It is an SDK for scripting Claude Code: a lightweight (155 lines) and free wrapper around the Claude Code CLI.

This matters because using Claude Code and Cursor has become largely repetitive. My workflow typically goes like this: plan out my task into a file, then have Claude Code implement the plan in my code.

I'm actively building a product with this, but still wanted to make it OSS!

Use it now with
`pip install codesys`
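To show the kind of workflow being scripted, here is a rough sketch of the plan-file-then-implement loop driving the Claude Code CLI directly via subprocess. This illustrates the idea, not codesys's actual API; it assumes the `claude` binary is on PATH and uses its non-interactive `-p` mode.

```python
import subprocess
from pathlib import Path

def run_plan(plan: str, plan_file: str = "PLAN.md") -> str:
    """Write a plan to a file, then ask Claude Code to implement it."""
    Path(plan_file).write_text(plan)
    result = subprocess.run(
        ["claude", "-p", f"Implement the plan in {plan_file}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(run_plan("1. Add a /health endpoint\n2. Write a test for it\n"))
```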
Show HN: Xenolab – Rasp Pi monitor for my pet carnivorous plants
Show HN: BlenderQ – A TUI for managing multiple Blender renders
Hi HN,

I'm a solo content creator and Blender user, and I developed this tool as an easy way to manage and run multiple Blender render jobs locally.

The TUI portion is written in TypeScript because it gave me a good way to build a front end with some complex components in a language I'm intimately familiar with; the portions that interact with Blender are actually Python scripts.
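For context, the underlying queue-and-render idea looks roughly like the sketch below: run each .blend file headlessly through Blender's CLI, one after another. This is not BlenderQ's code, just an illustration; it assumes `blender` is on PATH and uses the standard --background / --render-output / --render-anim flags.

```python
import subprocess
from pathlib import Path

def render_queue(blend_files: list[str], output_dir: str = "renders") -> None:
    """Render each .blend file's animation headlessly, in order."""
    Path(output_dir).mkdir(exist_ok=True)
    for blend in blend_files:
        out_pattern = str(Path(output_dir) / f"{Path(blend).stem}_####")
        subprocess.run(
            [
                "blender",
                "--background", blend,           # run without the UI
                "--render-output", out_pattern,  # frame naming pattern
                "--render-anim",                 # render the full animation
            ],
            check=True,
        )

if __name__ == "__main__":
    render_queue(["shot01.blend", "shot02.blend"])
```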
Show HN: Hydra (YC W22) – Serverless Analytics on Postgres
Hi HN, Hydra cofounders (Joe and JD) here (https://www.hydra.so/)! We enable real-time analytics on Postgres without requiring an external analytics database.

Traditionally, this was unfeasible: Postgres is a rowstore database that's 1000X slower at analytical processing than a columnstore database.

(A quick refresher for anyone interested: a rowstore means table rows are stored sequentially, making it efficient to insert or update a record but inefficient to filter and aggregate data. At most businesses, analytical reporting scans large volumes of events, traces, and time-series data. As the volume grows, the inefficiency of the rowstore compounds: it's not scalable for analytics. In contrast, a columnstore stores all the values of each column in sequence.)

For decades, businesses had to manage the differences between the rowstore's and columnstore's relative strengths by maintaining two separate systems. This led to large gaps in functionality, syntax, and the background knowledge of engineers. For example, here are the gaps between Redshift (a popular columnstore) and Postgres (rowstore) features: https://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-features.html

We think there's a better, simpler way: unify the rowstore and columnstore, keep the data in one place, and stop the costs and hassle of managing an external analytics database. With Hydra, events, traces, time-series data, user sessions, clickstream, IoT telemetry, etc. are accessible as a columnstore right alongside your standard rowstore tables.

Our solution: Hydra separates compute from storage to bring an analytics columnstore, serverless processing, and automatic caching to your Postgres database.

The term "serverless" can be a bit confusing, because a server always exists; it means compute is ephemeral and spun up and down automatically. The database automatically provisions and isolates dedicated compute resources for each query process. Serverless is different from managed compute, where the user explicitly allocates and scales CPU and memory continuously, and potentially overpays during idle time.

How is serverless useful? It's important that every analytics query has its own resources per process. The major hurdles with running analytics on Postgres are 1) rowstore performance and 2) resource contention. #2 is very often overlooked, but in practice analytics queries tend to hog resources (RAM and CPU) from Postgres transactional work, so a slightly expensive analytics query can slow down the entire database. That's why serverless matters: it guarantees that expensive queries are isolated and run on dedicated database resources per process.

Why is Hydra so fast at analytics? (https://tinyurl.com/hydraDBMS) 1) columnstore by default, 2) metadata for efficient file-skipping and retrieval, 3) parallel, vectorized execution, 4) automatic caching.

What's the killer feature? Hydra can quickly join columnstore tables with standard row tables within Postgres using direct SQL.

Example: "Segment events as a table." Instead of dumping Segment event data into an S3 bucket or an external analytics database, use Hydra to store and join events (clicks, signups, purchases) with user profile data within Postgres. Know your users in real time: "what events predict churn?" or "which user will likely convert?" becomes immediately actionable.

Thanks for reading! We would love to hear your feedback, and if you'd like to try Hydra now, we offer a $300 credit and 14 days free per account. We're excited to see how bringing the columnstore and rowstore side by side can help your project.
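Returning to the "Segment events as a table" example above, here is an illustrative sketch of a columnstore events table joined directly with a rowstore users table in plain SQL from Python. The `USING columnar` clause, the connection string, and the assumed users(id, plan) rowstore table are placeholders for illustration; check Hydra's docs for the exact syntax.

```python
import psycopg2

# Hypothetical connection string; assumes an existing rowstore users(id, plan) table.
conn = psycopg2.connect("postgresql://user:pass@localhost:5432/app")
with conn, conn.cursor() as cur:
    # Events land in a columnstore table alongside the regular rowstore tables.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            user_id     bigint,
            event_type  text,
            occurred_at timestamptz
        ) USING columnar;
    """)
    # Join the columnstore events with the rowstore user profiles in one query.
    cur.execute("""
        SELECT u.plan, e.event_type, count(*) AS n
        FROM events e
        JOIN users u ON u.id = e.user_id
        WHERE e.occurred_at > now() - interval '7 days'
        GROUP BY u.plan, e.event_type
        ORDER BY n DESC;
    """)
    for row in cur.fetchall():
        print(row)
```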