The best Hacker News stories from Show HN from the past day
Latest posts:
Show HN: I built this Postgres logger
Hey HN,

Some of you were really interested in Postgres logging with pgAudit in my previous post here: https://news.ycombinator.com/item?id=37082827

So I built this logger: https://rocketgraph.io/logger-demo

It uses pgAudit to show what can be done with Postgres auditing, and it offers some powerful features like "get me all the CREATE queries that ran in the past hour". The logs are generated by an AWS RDS instance running on my Rocketgraph account and are forwarded to CloudWatch for complex querying. In the future we plan to connect these logs to Slack so you can get alerts when a developer accidentally DROPs a table.

If you like my work, please check it out here: https://github.com/RocketsGraphQL/rgraph

And if you want this logging on your own Postgres instance, use https://rocketgraph.io/ and set up a project. pgAudit is installed automatically.
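For anyone curious what "all the CREATE queries from the past hour" can look like once the RDS logs land in CloudWatch, here is a minimal sketch using boto3. The log group name and filter pattern are assumptions for illustration; substitute the ones for your own instance.

# Minimal sketch: pull pgAudit entries mentioning CREATE from the last hour
# out of the CloudWatch log group that RDS ships Postgres logs to.
# The log group name below is an assumption; use the one for your instance.
import time
import boto3

logs = boto3.client("logs")
now_ms = int(time.time() * 1000)

resp = logs.filter_log_events(
    logGroupName="/aws/rds/instance/my-postgres/postgresql",  # assumed name
    startTime=now_ms - 60 * 60 * 1000,  # one hour ago
    endTime=now_ms,
    filterPattern="AUDIT CREATE",  # match lines containing both terms
)

for event in resp.get("events", []):
    print(event["timestamp"], event["message"])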
Show HN: Use Code Llama as Drop-In Replacement for Copilot Chat
Hi HN,

Code Llama was released, but we noticed a ton of questions in the main thread about how/where to use it: not just from an API or the terminal, but *in your own codebase* as a drop-in replacement for Copilot Chat. Without this, developers don't get much utility from the model.

This concern is also important because benchmarks like HumanEval don't perfectly reflect the quality of responses. There's likely to be a flurry of improvements to coding models in the coming months, and rather than relying on the benchmarks to evaluate them, the community will get better feedback from people actually using the models. This means *real* usage in *real*, everyday workflows.

We've worked to make this possible with Continue (https://github.com/continuedev/continue) and want to hear what you find to be the real capabilities of Code Llama. Is it on par with GPT-4, does it require fine-tuning, or does it excel at certain tasks?

If you'd like to try Code Llama with Continue, it only takes a few steps to set up (https://continue.dev/docs/walkthroughs/codellama), either locally with Ollama, or through TogetherAI or Replicate's APIs.
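As a quick illustration of the local route (separate from the Continue editor setup, which the walkthrough above covers), here is a minimal sketch of querying a locally running Code Llama through Ollama's HTTP API. It assumes you have already run `ollama pull codellama` and that Ollama is listening on its default port.

# Minimal sketch: ask a local Code Llama (served by Ollama) a coding question.
# Assumes Ollama is running on its default port, 11434, with the codellama model pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "codellama",
        "prompt": "Write a Python function that checks whether a string is a palindrome.",
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["response"])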
Show HN: Shimmer – ADHD coaching for adults, now on web
Hi, I’m Chris, one of the co-founders of Shimmer. Last October, following my ADHD diagnosis, I launched Shimmer (https://shimmer.care), one-to-one ADHD coaching for adults. Our HN launch was here: https://news.ycombinator.com/item?id=33468611.

A quick recap before I dive into our new launch: Shimmer is an ADHD coaching service for adults. We took apart the traditionally expensive, inaccessible ADHD coaching offering ($300-600+/session) and redesigned it from first principles. You get matched with one of our expert ADHD coaches, meet weekly over video, and get supported throughout the week via text and with learning tools. This solution is special to me personally (and to our community) because it doesn’t just give you “knowledge” or offer another “tool”: our coaches help you set realistic goals, take personalized steps towards them, and keep you accountable.

Today we’re excited to launch our most-requested feature: web.

Over the past 9 months, we learned (and iterated) a lot with our members and coaches. A few key challenges pointed to the need for a web version:
(1) ADHD “object permanence” challenges (e.g., out of sight, out of mind) meant we needed to be multi-platform, so that when you finish a task or goal or encounter a challenge, you can check it off and ping your coach right away regardless of whether you’re near your laptop or your phone,
(2) members used reflection modules (e.g. after each task, you’re prompted to reflect on what worked and didn’t work, and it informs your coach) more thoroughly than we originally anticipated, and web allows for deeper reflection and typing,
(3) overarching coaching goals were often forgotten during the day-to-day, and the web makes it easier to use visual cues to keep goals top of mind for motivation,
(4) many of our members struggle with phone addiction, and driving them to the mobile app often landed them in TikTok/IG, whereas the web app offers a focused environment to get into their “coaching zone”.

Our new web app was designed alongside over 1,200 members and 22 coaches, with countless hours of testing and iterating. We’re excited (but nervous!) to unveil this new version. If you have ADHD (or think you do), we’d love for you to check out our platform and give us critical feedback (or positive reinforcement!). It’s a super streamlined, ADHD-friendly signup process, and in honor of our web launch and back to school/work, the first month is 30% off.

Our pricing: $115/mo. for the Essentials plan (15-min weekly sessions), $230/mo. for the Standard plan (30-min weekly sessions), $345/mo. for the Immersive plan (45-min weekly sessions); all plans are an additional 30% off the first month and HSA/FSA-eligible.

We know these prices are expensive for many people with ADHD, and we’re committed to bringing costs down over time. It’s more affordable than what many people are paying for coaches, but the fact that we’re relying on humans, and not going the “we can automate all this with AI” route, puts a floor on how low the costs can drop. That said, here are some actions we’re taking to drive down costs for those who need it: (1) we offer needs-based scholarships and aim to have 5% of members on them at any time, (2) we often run fully sponsored scholarships with our partners; over 40 full-ride scholarships and 100 group coaching spots have been disbursed alongside the Asian Mental Health Project, the government of Canada, and more, and (3) we have aligned our coaching model with Health & Wellness Coaching, which is expected to become reimbursable in 2024. If you have ideas or expertise here, please reach out to me directly at chris@shimmer.care.

On behalf of our small but mighty and passionate Shimmer team, I’m excited for the Hacker News community to share your thoughts, feedback, and ideas. If you feel comfortable, I’d also love to hear your personal ADHD story and what has worked / hasn’t worked for you.

Co-founders Christal & Vikram
Show HN: Open-source obsidian.md sync server
<a href="https://github.com/acheong08/obsidian-sync">https://github.com/acheong08/obsidian-sync</a><p>Hello HN,<p>I'm a recent high school graduate and can't afford $8 per month for the official sync service, so I tried my hand at replicating the server.<p>It's still missing a few features, such as file recovery and history, but the basic sync is working.<p>To the creators of Obsidian.md: I'm probably violating the TOS, and I'm sorry. I'll take down the repository if asked. It's not ready for production and is highly inefficient; Not competition, so I hope you'll be lenient.
Show HN: SQL Formatter
Show HN: Gentrace – evaluation and observability for generative AI
Hi HN,

Gentrace is our new evaluation and observability tool for generative AI (open beta).

Generative pipelines are hard to evaluate because outputs are subjective. Lots of developers end up just doing “gut checks” on a few inputs before shipping changes, or they build up a spreadsheet of test cases that they manually run through the pipeline. Some companies outsource filling out the spreadsheet. In any of these cases, you end up with a very slow and expensive process for evaluation.

At one point, we did this too. Gentrace is the result of a pivot: it started as an internal tool we used to automatically grade new PRs as developers shipped changes to generative pipelines, and other people thought it might be useful.

Gentrace makes pre-production testing of generative pipelines continuous and nearly instantaneous. In Gentrace, you:

- Import and/or construct suites of test data
- Use a combination of AI and heuristic evaluators to grade for quality, hallucination, safety, etc.
- Use our interface to correct automated grades or add your own (yourself or a member of your team)

Gentrace integrates at the code level for evaluation, meaning we test your generative AI pipeline the way you would test normal code. This allows you to test more than just prompt changes; for example, you can compare models (e.g., Claude 2 vs. GPT-4 vs. GPT-3.5 vs. Llama 2) or see the effects of additional chained steps (”Rewrite the previous answer in the following tone:”).

Here’s a video overview that goes into a bit more detail: https://youtu.be/XxgDPSrTWIw

In production, Gentrace observes speed, cost, and data flow, and it also shows real user feedback. We do this by integrating via our SDK at the code level; Gentrace does not proxy requests.

Soon, we’ll allow you to convert production data into test cases, allowing customer support to turn bad production generations into “failing tests” for AI teams to make pass.

We process interim steps and multiple outputs as well, helping evaluate agent flows / chains where the “last output” isn’t always the only thing that matters.

There have been a lot of observability tools published recently. We differ from those by focusing more strongly on blending observability with strong evaluation and by using an SDK rather than a “man-in-the-middle” approach to capturing data (i.e., Gentrace can be down and your request to OpenAI will still succeed).

Within the evaluation landscape, we differentiate by integrating with code (see above for benefits) for capturing generative outputs and by providing a customizable UI workflow for building evaluators. In Gentrace, you start with off-the-shelf automated evaluators and then customize them to your specific task. You can also build and run new evaluators on old generative outputs. Finally, you can easily override automated evaluators and/or blend automated evaluation with evaluation by humans on your team.

We also focus on being suitable for business use. We are SOC 2 Type 1 compliant (Type 2 coming shortly), have robust legal documentation around data processing, security, and privacy, and have already passed several vendor legal and security reviews at large technology companies.

Our standard usage-based pricing is available on the website: https://gentrace.ai/pricing

If you are building features with generative AI, we would love to get your feedback. You can self-serve sign up (without a credit card) for a 14-day trial here: https://gentrace.ai/

We’re available right here for feedback and questions. We’re also available at support@gentrace.ai.

Best,
Doug, Vivek, and Daniel
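To make the code-level testing pattern described in the post concrete, here is a rough, library-agnostic sketch: run a suite of test cases through your generative pipeline and grade each output with a heuristic evaluator. The pipeline, evaluator, and test cases below are placeholders for illustration; this is not the Gentrace SDK's actual API.

# Library-agnostic sketch of the evaluation pattern: run test cases through a
# generative pipeline and grade each output with a simple heuristic evaluator.
from dataclasses import dataclass

@dataclass
class TestCase:
    prompt: str
    must_contain: str  # crude heuristic: a keyword the output should mention

def my_pipeline(prompt: str) -> str:
    # Placeholder for your real LLM call / chain.
    return f"Stubbed answer mentioning {prompt.split()[0]}"

def heuristic_grade(output: str, case: TestCase) -> float:
    return 1.0 if case.must_contain.lower() in output.lower() else 0.0

suite = [
    TestCase("Summarize the refund policy", "refund"),
    TestCase("Draft a polite decline email", "thank"),
]

scores = [heuristic_grade(my_pipeline(c.prompt), c) for c in suite]
print(f"pass rate: {sum(scores) / len(scores):.0%}")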
Show HN: Pip install inference, open source computer vision deployment
Deploying vision models is time-consuming and tedious. Setting up dependencies. Fixing conflicts. Configuring TRT acceleration. Flashing (and re-flashing) NVIDIA Jetsons. A streamlined, developer-friendly solution for inference is needed.

We, the Roboflow team, have been hard at work open sourcing Inference, an open source vision deployment solution. It is designed with developers in mind and offers an HTTP-based interface: run models on your hardware without having to write architecture-specific inference code. Here's a demo showing how to go from a model to GPU inference on a video of a football game in ~10 minutes:

https://www.youtube.com/watch?v=at-yuwIMiN4

Inference powers millions of daily API calls for global sports broadcasts, one of the world’s largest railways, a leading electric car manufacturer, and multiple other Fortune 500 companies, along with countless hackers’ hobby and research projects. Inference works in Docker and supports CPU (ARM and x86), NVIDIA GPU, and TRT. Inference manages the dependencies and the environment; all you need to do is make HTTP requests to the server.

YOLOv5, YOLOv8, YOLACT, CLIP, SAM, and other popular vision models are supported (some models need to be hosted on Roboflow first, see the docs; bring-your-own-model-weights support is in the works!).

Try it out and tell us what you think!
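To give a feel for the "just make HTTP requests" workflow, here is a rough sketch of posting an image to a locally running Inference server from Python. The port, route, model ID, and payload fields are assumptions for illustration; check the Inference docs for the exact API of your model type.

# Rough sketch: send an image to a locally running Inference server and read
# back predictions. Route, model id, and payload shape are assumed, not exact.
import base64
import requests

with open("frame.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:9001/infer/object_detection",  # assumed route and port
    json={
        "model_id": "my-project/3",  # assumed Roboflow model id
        "image": {"type": "base64", "value": image_b64},
        "api_key": "YOUR_ROBOFLOW_API_KEY",
    },
    timeout=60,
)
print(resp.json())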
Show HN: Fast vector similarity using Rust and Python
I recently found myself computing the similarity between lots of very high-dimensional vectors (i.e., sentence embedding vectors from LLMs), and I wanted to try some more powerful measures of similarity/dependency than just cosine similarity, which seems to be the default for everything nowadays because of its computational efficiency.

There are many other, more involved measures that can detect more subtle relationships, but the problem is that some of them are quite slow to compute, especially if you're trying to do it in Python. For my favorite measure of statistical dependency, Hoeffding's D, that's true even if you use NumPy. Since I recently learned Rust and wanted to learn how to make Python packages using Rust, I put together this new library that I call Fast Vector Similarity.

I was blown away by the performance of Rust and the quality of the tooling while making this. And even though it required a lot of fussing with GitHub Actions, I was also really impressed with just how easy it was to make a Python library using Rust that could be automatically compiled into wheels for every combination of platform (Linux, Windows, Mac) and Python version (3.8 through 3.11) and uploaded to PyPI, all triggered by a commit to the repo and handled by GitHub's servers, and all for free if you're working on a public repo!

Anyway, this library can easily be installed using `pip install fast_vector_similarity`, and you can see some simple demo Python code in the readme that shows how to use it.

Aside from exposing some very high-performance implementations of some very nice similarity measures, I also included the ability to get robust estimates of these measures using the bootstrap method. Basically, if you have two very high-dimensional vectors, instead of using the entire vector to measure similarity, you can take the same random subset of indices from both vectors and compute the similarity of just those elements. Then you repeat the process hundreds or thousands of times and look at the robust average (i.e., throw away the results outside the 25th-to-75th-percentile range and average the remaining ones, to reduce the impact of outliers) and the standard deviation of the results. Obviously this is very demanding of performance, but it's still reasonable if you're not trying to compute it for too many vectors.

Everything is designed to fully saturate the performance of multi-core machines through extensive use of broadcasting/vectorization and parallel processing via the Rayon library. I was really impressed with how easy and low-overhead it is to write highly parallelized code in Rust, especially coming from Python, where you have to jump through a lot of hoops to use multiprocessing and there is a ton of overhead.

Anyway, please let me know what you think. I'm looking to add more measures of similarity if I can find ones that can be efficiently computed (I already gave up on including HSIC because I couldn't get it to go fast enough, even using BLAS/LAPACK).
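For readers who want to see the bootstrap idea spelled out, here is a plain-NumPy sketch of the procedure described above. It is not the library's actual API (that comes via `pip install fast_vector_similarity`), and it uses cosine similarity as the per-subset statistic for brevity rather than Hoeffding's D; the subset fraction and iteration count are illustrative choices.

# Plain-NumPy sketch of the bootstrap estimate described in the post:
# sample the same random indices from both vectors, compute a similarity on
# that subset, repeat many times, then take a trimmed (25th-75th percentile)
# mean plus the standard deviation across bootstrap iterations.
import numpy as np

def robust_bootstrap_similarity(x, y, n_boot=1000, subset_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    k = max(2, int(n * subset_frac))
    stats = []
    for _ in range(n_boot):
        idx = rng.choice(n, size=k, replace=False)  # same indices for both vectors
        xs, ys = x[idx], y[idx]
        stats.append(xs @ ys / (np.linalg.norm(xs) * np.linalg.norm(ys) + 1e-12))
    stats = np.sort(np.array(stats))
    lo, hi = int(0.25 * n_boot), int(0.75 * n_boot)
    return stats[lo:hi].mean(), stats.std()

x, y = np.random.default_rng(1).standard_normal((2, 4096))
mean, sd = robust_bootstrap_similarity(x, y)
print(f"robust mean similarity: {mean:.4f}  (sd over bootstraps: {sd:.4f})")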