The best Hacker News stories from All from the past day
Latest posts:
Google won't be mandating a strict return-to-office plan
Were RNNs all we needed?
Filed: WP Engine Inc. v Automattic Inc. and Matthew Charles Mullenweg [pdf]
Canvas is a new way to write and code with ChatGPT
Patent troll Sable pays up, dedicates all its patents to the public
I made a game you can play without anyone knowing (no visuals/sound)
Hello everyone! I just launched an iOS game called Tik! and it has no visuals or sound of any kind.
So the obvious question is: how do you play it?<p>The game uses your phone’s haptics to play a rhythm of “Tiks” (haptic vibrations). The user then has to try to recreate the timing of the rhythm they just felt by tapping it anywhere on the screen. It sounds easy, but getting the timing right is tricky, so it usually takes a couple of tries before you’re able to get it right.<p>The inspiration for the game came from wanting something to do in a really boring presentation. It would have been disrespectful to look at my phone, but I also needed a distraction. I typically hold my phone in these kinds of scenarios and fiddle with the case, when it occurred to me: what if there was a game I could play just holding the phone anywhere (under a desk, in my pocket, to the side, etc.)? Sometime later Tik! was born :)<p>I would love your feedback on it. The game is paid, but if someone would like a promo code to try it, please let me know below.
Link: <a href="https://apps.apple.com/app/id6720712299" rel="nofollow">https://apps.apple.com/app/id6720712299</a>
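The post does not say how Tik! decides whether a tapped rhythm is close enough. One plausible approach, sketched below in Python rather than the game's own code, is to compare the gaps between taps with the gaps between the played "Tiks". The function names and the 0.12-second tolerance are assumptions made for illustration only.

```python
def intervals(timestamps):
    """Gaps between consecutive events, in seconds."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def rhythm_matches(target, taps, tolerance=0.12):
    """Return True if the taps reproduce the target rhythm.

    Gaps are compared instead of absolute times, so the player can
    start tapping whenever they like. `tolerance` (seconds) is an
    assumed value, not one taken from the game.
    """
    if len(target) != len(taps):
        return False
    return all(
        abs(t - p) <= tolerance
        for t, p in zip(intervals(target), intervals(taps))
    )


target = [0.0, 0.5, 0.75, 1.5]        # when each haptic "Tik" was played
good = [10.0, 10.46, 10.77, 11.48]    # same spacing, started later
bad = [10.0, 10.2, 10.9, 11.5]        # wrong spacing

print(rhythm_matches(target, good))
print(rhythm_matches(target, bad))
```

Comparing gaps rather than raw timestamps keeps the check independent of when the player starts; a stricter or looser tolerance would be one way to vary difficulty.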
Don't build your castle in other people's kingdoms (2021)
American WWII bomb explodes at Japanese airport, causing large crater in taxiway
An adult fruit fly brain has been mapped
Pledging $300k to the Zig Software Foundation
The Fastest Mutexes
Boris Vallejo and the pixel art of the demoscene
Ask HN: Who is hiring? (October 2024)
Please state the location and include REMOTE for remote work, REMOTE (US)
or similar if the country is restricted, and ONSITE when remote work is <i>not</i> an option.<p>Please only post if you personally are part of the hiring company—no
recruiting firms or job boards. One post per company. If it isn't a household name,
explain what your company does.<p>Commenters: please don't reply to job posts to complain about
something. It's off topic here.<p>Readers: please only email if you are personally interested in the job.<p>Searchers: try <a href="https://vawogbemi-whoishiring.web.val.run" rel="nofollow">https://vawogbemi-whoishiring.web.val.run</a>, <a href="http://nchelluri.github.io/hnjobs/" rel="nofollow">http://nchelluri.github.io/hnjobs/</a>, <a href="https://hnresumetojobs.com" rel="nofollow">https://hnresumetojobs.com</a>,
<a href="https://hnhired.fly.dev" rel="nofollow">https://hnhired.fly.dev</a>, <a href="https://kennytilton.github.io/whoishiring/" rel="nofollow">https://kennytilton.github.io/whoishiring/</a>, <a href="https://hnjobs.emilburzo.com" rel="nofollow">https://hnjobs.emilburzo.com</a>.<p>Don't miss these other fine threads:<p><i>Who wants to be hired?</i> <a href="https://news.ycombinator.com/item?id=41709299">https://news.ycombinator.com/item?id=41709299</a><p><i>Freelancer? Seeking freelancer?</i> <a href="https://news.ycombinator.com/item?id=41709300">https://news.ycombinator.com/item?id=41709300</a>
Ask HN: Should you reply STOP to unwanted texts?
I have been advising people I know to block the sender of unwanted texts, then delete the message and report it as junk (iOS). Others have argued with me that you should reply STOP. I disagree, especially after checking a shortened link in a “campaign” text and finding the link was a phishing attempt. What do you think?
Show HN: A real time AI video agent with under 1 second of latency
Hey, it’s Hassaan & Quinn – co-founders of Tavus, an AI research company and developer platform for video APIs. We’ve been building AI video models for ‘digital twins’ or ‘avatars’ since 2020.<p>We’re sharing some of the challenges we faced building an AI video interface that has realistic conversations with a human, including getting it to under 1 second of latency.<p>To try it, talk to Hassaan’s digital twin: <a href="https://www.hassaanraza.com" rel="nofollow">https://www.hassaanraza.com</a>, or to our "demo twin" Carter: <a href="https://www.tavus.io">https://www.tavus.io</a><p>We built this because until now, we've had to adapt communication to the limits of technology. But what if we could interact naturally with a computer? Conversational video makes it possible – we think it'll eventually be a key human-computer interface.<p>To make conversational video effective, it has to have really low latency and conversational awareness. A fast-paced conversation between friends has ~250 ms between utterances, but if you’re talking about something more complex or with someone new, there is additional “thinking” time. So, less than 1000 ms latency makes the conversation feel pretty realistic, and that became our target.<p>Our architecture decisions had to balance 3 things: latency, scale, & cost. Getting all of these was a huge challenge.<p>The first lesson we learned was that to make it low-latency, we had to build it from the ground up. We went from a team that cared about seconds to a team that counts every millisecond. We also had to support thousands of conversations happening all at once, without getting destroyed on compute costs.<p>For example, during early development, each conversation had to run on an individual H100 in order to fit all components and model weights into GPU memory just to run our Phoenix-1 model faster than 30 fps. This was unscalable & expensive.<p>We developed a new model, Phoenix-2, with a number of improvements, including inference speed.
We switched from a NeRF-based backbone to Gaussian Splatting for a multitude of reasons, one being the requirement that we could generate frames faster than realtime, at 70+ fps on lower-end hardware.
We exceeded this and focused on optimizing memory and core usage on GPU to allow for lower-end hardware to run it all. We did other things to save on time and cost like using streaming vs batching, parallelizing processes, etc. But those are stories for another day.<p>We still had to lower the utterance-to-utterance time to hit our goal of under a second of latency. This meant each component (vision, ASR, LLM, TTS, video generation) had to be hyper-optimized.<p>The worst offender was the LLM. It didn’t matter how high the tokens per second (t/s) were; it was the time to first token (ttft) that really made the difference. That meant services like Groq were actually too slow: they had high t/s, but slow ttft. Most providers were too slow.<p>The next worst offender was actually detecting when someone stopped speaking. This is hard. Basic solutions use time after silence to ‘determine’ when someone has stopped talking, but that adds latency. If you tune it to be too short, the AI agent will talk over you. Too long, and it’ll take a while to respond. The model had to be dedicated to accurately detecting end-of-turn based on conversation signals, and speculating on inputs to get a head start.<p>We went from 3-5 seconds to under 1 second (and as fast as 600 ms) with these architectural optimizations while running on lower-end hardware.<p>All this allowed us to ship with less than 1 second of latency, which we believe is the fastest out there. We have a bunch of customers, including Delphi, a professional coach and expert cloning platform. They have users that have conversations with digital twins that span from minutes, to one hour, to even four hours (!), which is mind-blowing, even to us.<p>Thanks for reading! Let us know what you think and what you would build. If you want to play around with our APIs after seeing the demo, you can sign up for free from our website <a href="https://www.tavus.io">https://www.tavus.io</a>.
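To illustrate why time to first token can matter more than tokens per second in a streaming pipeline, here is a rough back-of-the-envelope sketch in Python. It is not Tavus's code: the provider names, every millisecond figure, the 8-token first chunk, and the simple additive latency model are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LLMProvider:
    name: str
    ttft_ms: float        # time to first token (assumed figure)
    tokens_per_s: float   # steady-state decode speed (assumed figure)


def first_chunk_ms(p, chunk_tokens=8):
    """Time until the first `chunk_tokens` tokens can be handed to TTS."""
    return p.ttft_ms + 1000.0 * chunk_tokens / p.tokens_per_s


def utterance_latency_ms(p, silence_timeout_ms, other_stages_ms):
    """Simplified additive model of utterance-to-utterance latency.

    silence_timeout_ms: wait used by a silence-based end-of-turn detector.
    other_stages_ms: lumped ASR + TTS + video generation time (assumed).
    """
    return silence_timeout_ms + first_chunk_ms(p) + other_stages_ms


# Hypothetical providers: one decodes fast but starts slowly, one the reverse.
fast_decode = LLMProvider("fast-decode", ttft_ms=600, tokens_per_s=500)
low_ttft = LLMProvider("low-ttft", ttft_ms=150, tokens_per_s=80)

for p in (fast_decode, low_ttft):
    total = utterance_latency_ms(p, silence_timeout_ms=300, other_stages_ms=250)
    print(f"{p.name}: first chunk {first_chunk_ms(p):.0f} ms, total {total:.0f} ms")
```

With these made-up numbers, the provider with lower decode speed still responds sooner because only the first few tokens are needed before speech can start. The `silence_timeout_ms` term also shows why a fixed silence wait adds directly to response time, which is the tradeoff the post describes for end-of-turn detection.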