The best Show HN stories from Hacker News from the past day

Latest posts:

Show HN: Infinity – Realistic AI characters that can speak

Hey HN, this is Lina, Andrew, and Sidney from Infinity AI (https://infinity.ai/). We've trained our own foundation video model focused on people. As far as we know, this is the first time someone has trained a video diffusion transformer driven by audio input. This is exciting because it allows for expressive, realistic-looking characters that actually speak. Here's a blog post with a bunch of examples: https://toinfinityai.github.io/v2-launch-page/

If you want to try it out, you can either (1) go to https://studio.infinity.ai/try-inf2, or (2) post a comment in this thread describing a character and we'll generate a video for you and reply with a link. For example: "Mona Lisa saying 'what the heck are you smiling at?'": https://bit.ly/3z8l1TM; "A 3D Pixar-style gnome with a pointy red hat reciting the Declaration of Independence": https://bit.ly/3XzpTdS; "Elon Musk singing Fly Me To The Moon by Sinatra": https://bit.ly/47jyC7C

Our tool at Infinity lets creators type out a script with what they want their characters to say (and eventually, what they want their characters to do) and get a video out. We've trained for about 11 GPU-years (~$500k) so far, and the model recently started producing good results, so we wanted to share it here. We are still actively training.

We had trouble creating videos of good characters with existing AI tools. Generative video models (like Runway and Luma) don't allow characters to speak, and talking-avatar companies (like HeyGen and Synthesia) just do lip syncing on top of previously recorded videos. This means you often get facial expressions and gestures that don't match the audio, resulting in the "uncanny" look you can't quite put your finger on. See the blog for examples.

When we started Infinity, our V1 model took the lip-syncing approach. In addition to mismatched gestures, this method had many limitations, including a finite library of actors (we had to fine-tune a model for each one with existing video footage) and an inability to animate imaginary characters.

To address these limitations in V2, we decided to train an end-to-end video diffusion transformer that takes in a single image, audio, and other conditioning signals and outputs video. We believe this end-to-end approach is the best way to capture the full complexity and nuance of human motion and emotion. One drawback of our approach is that the model is slow, despite using rectified flow (2-4x speedup) and a 3D VAE embedding layer (2-5x speedup).

Here are a few things the model does surprisingly well: (1) it can handle multiple languages; (2) it has learned some physics (e.g. it generates earrings that dangle properly and infers a matching pair on the other ear); (3) it can animate diverse types of images (paintings, sculptures, etc.) despite not being trained on them; and (4) it can handle singing. See the blog for examples.

Here are some failure modes: (1) it cannot handle animals (only humanoid images); (2) it often inserts hands into the frame (very annoying and distracting); (3) it's not robust on cartoons; and (4) it can distort people's identities (most noticeable on well-known figures). See the blog for examples.

Try the model here: https://studio.infinity.ai/try-inf2

We'd love to hear what you think!
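The post doesn't document a programmatic API, but conceptually the workflow is: a single reference image plus a script (or audio) in, a talking-character video out. A minimal client-side sketch of that shape is below; the endpoint URL, request fields, and response shape are all hypothetical assumptions for illustration, not Infinity's documented API.

```typescript
// Hypothetical sketch only: the endpoint, field names, and response shape
// are assumptions for illustration, not Infinity's documented API.
type GenerationRequest = {
  imageUrl: string; // single reference image of the character
  script: string;   // text the character should say
  voice?: string;   // optional voice preset
};

async function generateTalkingVideo(req: GenerationRequest): Promise<string> {
  const res = await fetch("https://studio.infinity.ai/api/generate", { // assumed URL
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`generation failed: ${res.status}`);
  const { videoUrl } = await res.json(); // assumed response field
  return videoUrl;
}

// Example: the "Mona Lisa" prompt from the post.
generateTalkingVideo({
  imageUrl: "https://example.com/mona-lisa.jpg", // placeholder image URL
  script: "What the heck are you smiling at?",
}).then((url) => console.log("video at", url));
```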

Show HN: Wealthfolio: Private, open-source investment tracker

Thank you for your comments, just some context:

- The app is a simple desktop application that works on macOS, Windows, and Ubuntu.
- I developed this app for my own needs; I was getting tired of SaaS subscriptions and had privacy concerns.
- For now, activities are logged manually or imported from a CSV file. No integration with Plaid or other platforms.
- No monetization is planned for now (just a "buy me a coffee" link if you use and appreciate the app).

Show HN: I made CMS less than 15 kilobytes, flat file

https://github.com/turboblack/HamsterCMS

http://web1.0hosting.net/ - free hosting based on this CMS

Show HN: We built a FOSS documentation CMS with a pretty GUI

Kalmia started as a small hobby project about two months ago, but it quickly evolved when we needed a better solution to manage our office's documentation. It has now become our go-to tool for both internal and user-facing docs.

Recently, we decided to open source it, as we believe others might benefit from a lightweight, customizable documentation system like this. We're excited to see how the community can take it further, contribute, and adapt it to their own needs!

Show HN: Node.js ORM to query SQL database through an array-like API

Hello everyone! I'm excited to share a Node.js package I've been working on for the past two months.

The package is designed to simplify querying SQL databases through an array-like API. Qustar supports PostgreSQL, SQLite, MySQL, and MariaDB, and offers TypeScript support for a robust development experience.

It's in an early stage of development. I would like to hear your thoughts about it.
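The post doesn't include a code sample, so here is a self-contained sketch of the general idea behind an "array-like" query API: chained calls that read like Array methods but build a single SQL statement instead of iterating in memory. The class and method names below are invented for illustration and are not Qustar's actual API.

```typescript
// Illustrative sketch of an array-like query builder; not Qustar's implementation.
class Query<T> {
  private wheres: string[] = [];
  private selects = '*';
  private limitN?: number;

  constructor(private table: string) {}

  filter(condition: string): Query<T> { // analogous to Array.prototype.filter
    this.wheres.push(condition);
    return this;
  }

  map(...columns: (keyof T & string)[]): Query<Partial<T>> { // analogous to Array.prototype.map
    this.selects = columns.join(', ');
    return this as unknown as Query<Partial<T>>;
  }

  limit(n: number): Query<T> { // analogous to Array.prototype.slice
    this.limitN = n;
    return this;
  }

  toSql(): string {
    const where = this.wheres.length ? ` WHERE ${this.wheres.join(' AND ')}` : '';
    const limit = this.limitN !== undefined ? ` LIMIT ${this.limitN}` : '';
    return `SELECT ${this.selects} FROM ${this.table}${where}${limit}`;
  }
}

interface User { id: number; name: string; age: number; }

const sql = new Query<User>('users')
  .filter('age >= 18')
  .map('id', 'name')
  .limit(10)
  .toSql();

console.log(sql); // SELECT id, name FROM users WHERE age >= 18 LIMIT 10
```

In a real ORM the chain would of course execute against a connection and return typed rows; the point here is just how Array-style chaining can translate into one SQL statement.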

Show HN: Laminar – Open-Source DataDog + PostHog for LLM Apps, Built in Rust

Hey HN, we're Robert, Din and Temirlan from Laminar (https://www.lmnr.ai), an open-source observability and analytics platform for complex LLM apps. It's designed to be fast, reliable, and scalable. The stack is RabbitMQ for message queues, Postgres for storage, ClickHouse for analytics, and Qdrant for semantic search - all powered by Rust.

How is Laminar different from the swarm of other "LLM observability" platforms?

On the observability side, we're focused on handling full execution traces, not just LLM calls. We built a Rust ingestor for OpenTelemetry (Otel) spans with GenAI semantic conventions. As LLM apps get more complex (think agents with hundreds of LLM and function calls, or complex RAG pipelines), full tracing is critical. With Otel spans, we can: 1. Cover the entire execution trace. 2. Keep the platform future-proof. 3. Leverage OpenLLMetry (https://github.com/traceloop/openllmetry), an amazing open-source package for span production.

The key difference is that we tie text analytics directly to execution traces. Rich text data makes LLM traces unique, so we let you track "semantic metrics" (like what your AI agent is actually saying) and connect those metrics to where they happen in the trace. If you want to know whether your AI drive-through agent made an upsell, you can design an LLM extraction pipeline in our builder (more on that later), host it on Laminar, and let it handle everything from event requests to output logging. Processing requests simply arrive as events in the Otel span.

We think it's a win to separate core app logic from LLM event processing. Most devs don't want to manage background queues for LLM analytics processing but still want insights into how their agents or RAG pipelines are working.

Our Pipeline Builder uses a graph UI where nodes are LLM and utility functions and edges show data flow. We built a custom task execution engine with support for parallel execution, cycles, and branches (it's overkill for simple pipelines, but it's extremely cool and we've spent a lot of time designing a robust engine). You can also call pipelines directly as API endpoints. We found them to be extremely useful for iterating on and separating LLM logic. Laminar also traces pipelines directly, which removes the overhead of sending large outputs over the network.

One thing missing from all LLM observability platforms right now is adequate search over traces. We're attacking this problem by indexing each span in a vector DB and performing hybrid search at query time. This feature is still in beta, but we think it's going to be a crucial part of our platform going forward.

We also support evaluations. We loved the "run everything locally, send results to a server" approach from Braintrust and Weights & Biases, so we did that too: a simple SDK and nice dashboards to track everything. Evals are still early, but we're pushing hard on them.

Our goal is to make Laminar the Supabase for LLMOps - the go-to open-source, comprehensive platform for all things LLMs / GenAI. In its current shape, Laminar is just a few weeks old and developing rapidly. We'd love any feedback, or for you to give Laminar a try in your LLM projects!
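The post describes Otel spans with GenAI semantic conventions flowing into Laminar's Rust ingestor, but doesn't show what producing such a span looks like. Below is a minimal hand-rolled sketch using the @opentelemetry/api package; in practice OpenLLMetry would emit these spans automatically, and the specific attribute names and values here are illustrative assumptions rather than Laminar's required schema.

```typescript
// Minimal sketch of emitting an LLM call as an OpenTelemetry span with
// GenAI-style attributes, the kind of span a backend like Laminar ingests.
// In practice OpenLLMetry produces these automatically; the attribute names
// are based on the OTel GenAI semantic conventions and may lag the current spec.
import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('drive-through-agent');

async function tracedCompletion(prompt: string): Promise<string> {
  return tracer.startActiveSpan('llm.completion', async (span) => {
    try {
      span.setAttributes({
        'gen_ai.system': 'openai',        // which LLM provider was called
        'gen_ai.request.model': 'gpt-4o', // assumed model name, for illustration
      });

      const reply = await callModel(prompt); // your actual LLM client call goes here

      // Token counts and any "semantic metrics" (e.g. whether an upsell happened)
      // ride along as span attributes and get analyzed server-side.
      span.setAttribute('gen_ai.usage.output_tokens', 42); // placeholder count
      span.setAttribute('upsell_detected', reply.includes('large fries'));
      return reply;
    } finally {
      span.end();
    }
  });
}

// Placeholder so the sketch compiles; replace with a real LLM client.
async function callModel(prompt: string): Promise<string> {
  return `echo: ${prompt}`;
}
```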

Show HN: Hacker League – Open-Source Rocket League on Linux
