The best Hacker News stories from Show from the past day
Latest posts:
Show HN: A real time AI video agent with under 1 second of latency
Hey, it’s Hassaan & Quinn – co-founders of Tavus, an AI research company and developer platform for video APIs. We’ve been building AI video models for ‘digital twins’ or ‘avatars’ since 2020.<p>We’re sharing some of the challenges we faced building an AI video interface that has realistic conversations with a human, including getting it to under 1 second of latency.<p>To try it, talk to Hassaan’s digital twin: <a href="https://www.hassaanraza.com" rel="nofollow">https://www.hassaanraza.com</a>, or to our "demo twin" Carter: <a href="https://www.tavus.io">https://www.tavus.io</a><p>We built this because until now, we've had to adapt communication to the limits of technology. But what if we could interact naturally with a computer? Conversational video makes it possible – we think it'll eventually be a key human-computer interface.<p>To make conversational video effective, it has to have really low latency and conversational awareness. A fast-paced conversation between friends has ~250 ms between utterances, but if you’re talking about something more complex or with someone new, there is additional “thinking” time. So, less than 1000 ms latency makes the conversation feel pretty realistic, and that became our target.<p>Our architecture decisions had to balance 3 things: latency, scale, & cost. Getting all of these was a huge challenge.<p>The first lesson learned was that to make it low-latency, we had to build it from the ground up. We went from a team that cared about seconds to a team that counts every millisecond. We also had to support thousands of conversations happening all at once, without getting destroyed on compute costs.<p>For example, during early development, each conversation had to run on an individual H100 in order to fit all components and model weights into GPU memory just to run our Phoenix-1 model faster than 30fps. This was unscalable & expensive.<p>We developed a new model, Phoenix-2, with a number of improvements, including inference speed. 
We switched from a NeRF-based backbone to Gaussian Splatting for a multitude of reasons, one being the requirement that we could generate frames faster than realtime, at 70+ fps on lower-end hardware.
We exceeded this and focused on optimizing memory and core usage on GPU to allow for lower-end hardware to run it all. We did other things to save on time and cost like using streaming vs batching, parallelizing processes, etc. But those are stories for another day.<p>We still had to lower the utterance-to-utterance time to hit our goal of under a second of latency. This meant each component (vision, ASR, LLM, TTS, video generation) had to be hyper-optimized.<p>The worst offender was the LLM. It didn’t matter how fast the tokens per second (t/s) were; it was the time-to-first-token (ttft) that really made the difference. That meant services like Groq were actually too slow – they had high t/s, but slow ttft. Most providers were too slow.<p>The next worst offender was actually detecting when someone stopped speaking. This is hard. Basic solutions use time after silence to ‘determine’ when someone has stopped talking. But it adds latency. If you tune it to be too short, the AI agent will talk over you. Too long, and it’ll take a while to respond. The model had to be dedicated to accurately detecting end-of-turn based on conversation signals, and speculating on inputs to get a head start.<p>We went from 3-5 seconds to <1 second (& as fast as 600 ms) with these architectural optimizations while running on lower-end hardware.<p>All this allowed us to ship with less than 1 second of latency, which we believe is the fastest out there. We have a bunch of customers, including Delphi, a professional coach and expert cloning platform. They have users that have conversations with digital twins that span from minutes, to one hour, to even four hours (!) - which is mind blowing, even to us.<p>Thanks for reading! Let us know what you think and what you would build. If you want to play around with our APIs after seeing the demo, you can sign up for free from our website <a href="https://www.tavus.io">https://www.tavus.io</a>.
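The "time after silence" baseline described in the post can be sketched as follows. This is only an illustration of that basic approach and its tuning trade-off, not Tavus's end-of-turn model; the class name, the `silence_ms` knob, the 20 ms frame size, and the sample audio timeline are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SilenceEndOfTurn:
    """Declare end-of-turn once speech is followed by enough silence."""
    silence_ms: int          # hypothetical tuning knob: required silence
    frame_ms: int = 20       # assumed duration of one audio frame
    _heard_speech: bool = False
    _silent_for: int = 0

    def push(self, is_speech: bool) -> bool:
        """Feed one frame's voice-activity flag; return True when the turn ends."""
        if is_speech:
            self._heard_speech = True
            self._silent_for = 0
            return False
        if not self._heard_speech:
            return False  # leading silence: nobody has spoken yet
        self._silent_for += self.frame_ms
        if self._silent_for >= self.silence_ms:
            self._heard_speech = False
            self._silent_for = 0
            return True
        return False


def first_end_of_turn_ms(frames, detector):
    """Return the time (ms) at which the detector first fires, or None."""
    for i, is_speech in enumerate(frames):
        if detector.push(is_speech):
            return (i + 1) * detector.frame_ms
    return None


# Made-up timeline: 200 ms of speech, a 300 ms mid-sentence pause,
# 200 ms more speech (ending at 700 ms), then silence.
frames = [True] * 10 + [False] * 15 + [True] * 10 + [False] * 50

short = first_end_of_turn_ms(frames, SilenceEndOfTurn(silence_ms=200))
long = first_end_of_turn_ms(frames, SilenceEndOfTurn(silence_ms=600))
print(short, long)
```

With the short threshold the detector fires at 400 ms, in the middle of the pause, so an agent would talk over the speaker. With the long threshold it waits until 1300 ms, a full 600 ms after the speech actually ended at 700 ms, and that wait is added to every response.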
Show HN: Venator – Open-source threat detection
A flexible threat detection platform that simplifies rule execution and management with k8s CronJobs and Helm. Flexible enough to run standalone or with other schedulers like HashiCorp Nomad.
Show HN: Facad – A colorful directory listing tool for the command line
Facad is about functionality, not just aesthetics. Key features:<p>- Intuitive file type representation<p>- Smart sorting (directories first, then by extension)<p>- Four-column layout for quick directory analysis<p>It evolved from this alias:<p><pre><code> alias ls='ls -A -F --group-directories-first --sort=extension --color=always'
</code></pre>
Facad takes this concept further, offering more flexibility and visual clarity.
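The "directories first, then by extension" ordering can be expressed as a single sort key. The sketch below is in Python for illustration and is not Facad's code; the function names and the sample entries are hypothetical, and breaking ties by case-insensitive name is an assumption.

```python
import os


def sort_key(name: str, is_dir: bool):
    """Directories first, then by extension, then by name (case-insensitive)."""
    ext = "" if is_dir else os.path.splitext(name)[1].lower()
    # False sorts before True, so directories (not is_dir == False) come first.
    return (not is_dir, ext, name.lower())


def sorted_entries(entries):
    """entries: iterable of (name, is_dir) pairs; returns names in display order."""
    return [name for name, is_dir in sorted(entries, key=lambda e: sort_key(*e))]


entries = [
    ("notes.txt", False),
    ("src", True),
    ("main.c", False),
    ("Makefile", False),
    ("docs", True),
    ("a.txt", False),
]
print(sorted_entries(entries))
# → ['docs', 'src', 'Makefile', 'main.c', 'a.txt', 'notes.txt']
```

Files without an extension (such as `Makefile`) get an empty extension and therefore sort ahead of all other files.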
Show HN: qrframe – generate beautiful qr codes with javascript code
I originally built a QR code generator as a resume project using Rust, and I realized a web interface would make customization way easier.<p>This still generates the "data" using that Rust library via wasm, but the rendering is all editable JavaScript to make an SVG or paint on an HTML canvas.<p>I was especially inspired by <a href="https://qrbtf.com" rel="nofollow">https://qrbtf.com</a> which had some unique style options I had never seen before, which I ended up copying, and then I made some more.
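The split between generating the QR "data" and rendering it can be illustrated with a small renderer that turns a grid of dark/light modules into an SVG string. This is a Python sketch of the general idea only; qrframe's renderers are JavaScript and its actual API is not shown here. The function name, parameters, and the tiny demo grid (not a valid QR code) are assumptions.

```python
def matrix_to_svg(matrix, module_px=10, margin=4, fg="#000", bg="#fff"):
    """Render a square grid of booleans (True = dark module) as an SVG string.

    `margin` is the quiet zone around the code, measured in modules.
    """
    n = len(matrix)
    size = (n + 2 * margin) * module_px
    rects = [
        f'<rect x="{(x + margin) * module_px}" y="{(y + margin) * module_px}" '
        f'width="{module_px}" height="{module_px}" fill="{fg}"/>'
        for y, row in enumerate(matrix)
        for x, dark in enumerate(row)
        if dark
    ]
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {size} {size}">'
        f'<rect width="{size}" height="{size}" fill="{bg}"/>'
        + "".join(rects)
        + "</svg>"
    )


# Stand-in 2x2 grid; a real matrix would come from a QR encoder.
demo = [[True, False], [False, True]]
svg = matrix_to_svg(demo)
print(svg)
```

Because the renderer only sees the module grid, styling changes (rounded dots, colors, different shapes per module) stay independent of the encoding step.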
Show HN: A macOS app to prevent sound quality degradation on AirPods
Right, here's the thing: If you are using AirPods (or any Bluetooth headphones with a mic, in fact) on a Mac and something activates the mic (e.g. you Shazam a song), the sound will be interrupted momentarily and will return in very low quality. This happens because Bluetooth can't handle high-quality streaming in both directions, so the bandwidth is decreased to make it work.<p>It's a known issue and here's what Apple recommends to fix it: <a href="https://support.apple.com/en-hk/102217" rel="nofollow">https://support.apple.com/en-hk/102217</a><p>Most of the time (unless you are on a Mac Mini/Studio/Pro), you have much higher quality microphones built in, so in most use cases, you want to hear from your AirPods but be heard from your internal microphone. That means that if, every time you connect your AirPods, you go into the settings and set the default input device to the internal mic, you won't have sound quality degradation on mic activation, and if you use your mic to talk to people or record something, you will have better sound quality too.<p>Based on this observation, I first tried to create a script or some automation that could do it for me, but found that it can be clunky or needlessly complex.<p>Here's someone who used this approach to fix this issue: <a href="https://www.dermitch.de/post/macos-force-microphone-when-using-airpods/" rel="nofollow">https://www.dermitch.de/post/macos-force-microphone-when-usi...</a><p>Anyway, I decided to take the "build your app for that" route and created this app and called it CrystalClear Audio, which doesn't involve any technical setup to use. 
Making it was also not as easy as I hoped. I was expecting this to be a half-hour project but ended up filing bug reports with Apple because some API wasn't behaving as expected or mysterious things were happening when using it (like phantom device changes).<p>After spending that much time with all this, I decided to publish it on the Mac App Store, and after too many rejections (all my mistakes) I got it published: <a href="https://apps.apple.com/us/app/crystalclear-sound/id6695723746" rel="nofollow">https://apps.apple.com/us/app/crystalclear-sound/id669572374...</a><p>The app is not free but comes with a free trial. I decided to go with a very cheap subscription model because I suspect further development might be needed as bugs emerge or API behavior changes. I know it's a hated business model but IMHO it's better than ads or tracking of any sort to justify the work done. It's not free because supporting a free app is just as hard as supporting a paid one, and it's not a one-time payment because I don't know what the right price would be for supporting an app for years to come and still have people willing to pay for it.<p>I hope other people find this useful and if you do, you can support it by upvoting on Product Hunt so even more people can find it useful: <a href="https://www.producthunt.com/posts/crystalclear-sound" rel="nofollow">https://www.producthunt.com/posts/crystalclear-sound</a><p>PS: the app is also useful for quickly switching the sound output between the laptop speakers and the headphones; I ended up using that quite often.
Show HN: Bringing multithreading to Python's async event loop
This project explores the integration of multithreading into the asyncio event loop in Python.<p>While this was initially built with enhancing CPU utilization for FastAPI servers in mind, the approach can be used with more general async programs too.<p>If you’re interested in diving deeper into the details, I’ve written a blog post about it here: <a href="https://www.neilbotelho.com/blog/multithreaded-async.html" rel="nofollow">https://www.neilbotelho.com/blog/multithreaded-async.html</a>
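For context, the standard library already offers one way to combine threads with asyncio: handing blocking calls to a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch below shows that baseline only; it is not the approach this project takes, which is described in the linked blog post. `blocking_work` is a hypothetical stand-in function.

```python
import asyncio
import threading
import time


def blocking_work(n: int) -> tuple[int, str]:
    """Stand-in for a call that would otherwise block the event loop."""
    time.sleep(0.05)
    return n * n, threading.current_thread().name


async def main() -> list[tuple[int, str]]:
    # asyncio.to_thread runs each call in the loop's default thread pool,
    # so the event loop stays free to serve other coroutines meanwhile.
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_work, n) for n in range(4))
    )


results = asyncio.run(main())
print(results)
```

On a standard CPython build the GIL still serializes pure-Python bytecode, so this baseline mainly helps with blocking calls rather than CPU-bound Python code.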
Show HN: An experimental AntiBot, AntiCrawl reverse proxy for the web
Show HN: Open-source app builder for comfy workflows
Hey, we’ve been working on an open-source project built on top of Comfy for the last few weeks. It is still very much a work in progress, but I think it is at a place where it could start to be useful. The idea is that you can turn a workflow into a web app with an easy-to-use UI.<p>Currently, it should work with any workflows that take images and text as input and return images. We are aiming to add video support over the next few days.<p>Feedback and contributions are more than welcome!
Show HN: GitHub Repo Visualizer Using D3
I built this as part of my quest to properly learn data visualization. The code is the easy part!<p>Some lessons learned:<p>- personal verification of the general truth that pie charts are tough! And the returns are not great for the effort, due to people's difficulties perceiving angles
- I may not use "vanilla" d3 without React again; it was difficult to adapt for mobile
- the GitHub API provides fairly standardized responses, so building dynamic charts wasn't too bad. But when working with streaming data (say Kafka) I can see this getting interesting... a schema registry should help, but creating a view into the data with a lookback would be interesting with d3; I've done it with Altair before.
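The "view into the data with a lookback" idea from the last bullet can be sketched as a time-bounded window over a stream, which a chart would then redraw from. This is a generic Python illustration, not code from the project; the class name is hypothetical and events are assumed to arrive in timestamp order.

```python
from collections import deque


class LookbackWindow:
    """Keep only events whose timestamp falls within the last `lookback_s` seconds."""

    def __init__(self, lookback_s: float):
        self.lookback_s = lookback_s
        self._events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp: float, value) -> None:
        """Append an event and evict everything older than the lookback."""
        self._events.append((timestamp, value))
        cutoff = timestamp - self.lookback_s
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def values(self) -> list:
        """Current contents of the window, oldest first."""
        return [v for _, v in self._events]


window = LookbackWindow(lookback_s=10)
window.add(0, "a")
window.add(5, "b")
window.add(12, "c")   # cutoff is 2, so "a" is evicted
print(window.values())
# → ['b', 'c']
```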
Show HN: Enable right click and copy on websites that disabled it
Show HN: htmgo - build simple and scalable systems with golang + htmx
Hey all, I just wanted to share a project I've been working on for the past month.<p>After years of heavy frameworks, I really like the idea of using htmx, but it’s a little too low level for me and needs a thin layer above it to facilitate things like components, better syntax for complex JS inside of an attribute, etc.<p>To try and solve this problem with a very minimal stack (golang + htmx) that I've been really enjoying, I'm building this project to cater to my needs, and I thought it would be useful for other developers.