The best Show HN stories from Hacker News from the past day
Latest posts:
Show HN: Llama Running on a Microcontroller
Show HN: Multi-Object Tracking in Python
Hello! I've created a small library for tracking, along with a tutorial. I plan to continue developing it.

Tracking is an important topic, closely related to object detection. However, I've noticed that it doesn't receive as much attention as machine learning approaches, or that the focus falls on filters like the Kalman filter. This tutorial begins with single-object tracking and progressively increases the difficulty of the tasks, introducing various models and a hypothesis tree to solve them.
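For readers new to the topic, here is a minimal sketch (plain NumPy, not this library's API) of the kind of single-object Kalman tracker such a tutorial typically starts from: predict a constant-velocity state forward, then correct it with each noisy detection.

```python
import numpy as np

# Minimal constant-velocity Kalman filter for 2D single-object tracking.
# Illustrative only -- not the library's actual API.
class KalmanTracker:
    def __init__(self, x0, y0, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])    # state: [x, y, vx, vy]
        self.P = np.eye(4)                        # state covariance
        self.F = np.eye(4)                        # constant-velocity motion model
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.zeros((2, 4))                 # we observe position only
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = q * np.eye(4)                    # process noise
        self.R = r * np.eye(2)                    # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = np.asarray(z) - self.H @ self.x       # innovation
        S = self.H @ self.P @ self.H.T + self.R   # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

# Usage: predict, then correct with each noisy detection.
tracker = KalmanTracker(0.0, 0.0)
for z in [(1.1, 0.9), (2.0, 2.1), (3.2, 2.8)]:
    tracker.predict()
    tracker.update(z)
print(tracker.x[:2])  # filtered position estimate
```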
Show HN: MonkeyPatch – Cheap, fast and predictable LLM functions in Python
Hi HN, Jack here! I'm one of the creators of MonkeyPatch, an easy tool that helps you build LLM-powered functions and apps that get cheaper and faster the more you use them.

For example, if you need to classify PDFs, extract product feedback from tweets, or auto-generate synthetic data, you can spin up an LLM-powered Python function in under 5 minutes to power your application. Unlike existing LLM clients, these functions generate well-typed outputs with guardrails to mitigate unexpected behavior.

After about 200-300 calls, these functions begin to get cheaper and faster. We've seen an 8-10x reduction in cost and latency in some use cases! This happens via progressive knowledge distillation: MonkeyPatch incrementally fine-tunes smaller, cheaper models in the background, tests them against the constraints defined by the developer, and retains the smallest model that meets the accuracy requirements, which typically has significantly lower cost and latency.

As an LLM researcher, I kept getting asked by startups and friends to build specific LLM features that they could embed into their applications. I realized that most developers have to either 1) use existing low-level LLM clients (GPT-4/Claude), which can be unreliable, untyped, and pricey, or 2) pore over LangChain documentation for days to build something.

We built MonkeyPatch to make it easy for developers to inject LLM-powered functions into their code and create tests to ensure they behave as intended. Our goal is to help developers easily build apps and functions without worrying about reliability, cost, and latency, while following software engineering best practices.

We're only available in Python currently but are actively working on a TypeScript version. The repo has all the instructions you need to get up and running in a few minutes.

The world of LLMs is changing by the day, so we're not 100% sure how MonkeyPatch will evolve. For now, I'm just excited to share what we've been working on with the HN community. Would love to know what you think!

Open-source repo: https://github.com/monkeypatch/monkeypatch.py

Sample use cases: https://github.com/monkeypatch/monkeypatch.py/tree/master/examples

Benchmarks: https://github.com/monkeypatch/monkeypatch.py#scaling-and-finetuning
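To give a feel for the pattern described above, here is a hedged sketch of what a typed LLM-powered function might look like. The decorator below is a local stand-in, not the repo's actual API; see the linked repo for real usage.

```python
# Hypothetical sketch of the pattern described above: a function whose type
# hints and docstring define an LLM task with validated, typed output. The
# decorator here is a local stand-in, NOT the library's real API.
from dataclasses import dataclass
from typing import Literal

@dataclass
class Feedback:
    sentiment: Literal["positive", "negative", "neutral"]
    feature: str

def llm_function(fn):
    # In the real tool, the signature and docstring would drive an LLM call,
    # and the output would be parsed and validated against the return type.
    def wrapper(tweet: str) -> Feedback:
        return Feedback(sentiment="positive", feature="stubbed example")
    return wrapper

@llm_function
def extract_feedback(tweet: str) -> Feedback:
    """Extract product feedback from a tweet as a typed Feedback record."""

print(extract_feedback("Love the new dark mode!"))
```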
Show HN: Watermelon – copilot for code review
Bay Bridge: the cheapest H100 training clusters
Show HN: SvelteKit SaaS Boilerplate to help launch your product fast
Hi HN!

I'm an indie hacker and love building apps with SvelteKit, so I built a boilerplate with the tech stack I always use.

It has almost everything needed to launch a SaaS/tool/AI app: auth, db + ORM, email, payments, and styling.

You can view everything that's included on the website and in the docs (https://docs.launchleopard.com).

Would love to hear what other features or tech you'd want to see in a boilerplate like this!
Soccer video analysis from your match videos
I created a tool to generate awesome soccer video analysis from match videos.

I'm no pro player; I just play with my friends weekly, record our matches, and use this tool to check our performance.

My friends really enjoy it and have suggested adding features like measuring player speed, tracking player positions, and more.
Show HN: Easily Visualize Your SQLAlchemy Data Models in a Nice SVG Diagram
Show HN: Twogether AI – Multi-Person Photo Generation API
Hey everyone, at Magicflow (YC W23) we're helping our customers run AI image generation in production, enabling them to produce high-quality photos at scale.

We are launching a scalable API today that makes it possible to create multi-person portrait photos: real-looking photos of any two people interacting with each other in some way, generated from nothing more than a prompt and the persons' pose. Generating this kind of photo requires a deep understanding of the AI ecosystem, a knowledge gap many companies face. Making the photos look real with high consistency and at low cost requires chaining many models together, along with an excellent understanding of how to tweak each one's various parameters.

We also handle the infrastructure required to generate the photos, which can be a challenge to tackle alone, especially for companies with a small backend team (we can scale to thousands of requests per day and generate 100 photos in about 3 minutes). Our customers today use this technology for the following use cases: creating new photo albums from old scanned albums, providing personalized content for user-acquisition campaigns, enabling new kinds of experiences in physical venues, and creating humorous photos with celebrities.

There is a significant tradeoff between creating a robust abstraction layer on top of Stable Diffusion's capabilities and providing customers with more control over the various options. The API currently allows you to manipulate the following parameters: the pose of the couple (hugging, taking a selfie, etc.), their facial expressions, the style of the photo (realistic, cartoon, painting, etc.), as well as the location, theme, and outfits (e.g., ski vacation, on the beach).

We created a free demo app for you to view examples and try it live: https://twogether.ai?source=hn (no account or payment needed).
For full API access, contact me at yardenst@magicflow.ai. We can typically set you up within a day, but an onboarding session is required to ensure responsible API usage.
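To make the parameter list above concrete, here is a hypothetical request sketch. The endpoint URL, field names, and auth header are all assumptions for illustration; the real API shape may differ.

```python
# Hypothetical request sketch -- endpoint, field names, and auth header are
# assumptions for illustration; the actual API shape may differ.
import requests

payload = {
    "prompt": "two friends taking a selfie on a ski vacation",
    "pose": "selfie",                      # e.g. "hugging", "selfie"
    "facial_expressions": ["smiling", "laughing"],
    "style": "realistic",                  # e.g. "cartoon", "painting"
    "person_images": ["person_a.jpg", "person_b.jpg"],
}

resp = requests.post(
    "https://api.twogether.ai/v1/generate",   # illustrative URL
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=300,
)
resp.raise_for_status()
print(resp.json())  # e.g. URLs of the generated photos
```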
Show HN: Interactive AI Resume/LinkedIn for better networking/job hunting
Generally, I found resumes too vague for getting to know anyone (hence why no one networks with them), professional blogs too low-ROI, walking up to people unscalable, and cold messaging fairly low in success rate.

I wanted the 'marketing tool' of networking to get myself out there. Something that lets me:
1) Draw people into a conversation before they've realized it
2) Make them remember me and ultimately reach out to me
3) See what people asked me so I can further refine my interactive profile and start the networking cycle again

So I built a website where anyone can create these interactive profiles, starting with a resume import.

The one I linked is a test profile, but on my personal one I got:
1) >10x more people reaching out to me when I put myself out there to network (some were VCs actually; though I'm not fundraising right now)
2) A bunch of engagement questions where I can see what people want to know about me, so I can further enhance my profile and improve my own outreach

This is still in its early stages, but if I go to a conference, join a new team at a new job, or need to network for some other reason, I think I'll put this on my LinkedIn, business card, etc.

The (limited) data so far suggests people are more willing to talk to the interactive profile first before reaching out to me. I guess that makes sense; it's less commitment than emailing me. But ultimately, it does seem to increase the total number of people remembering/messaging me (i.e., improving the professional networking funnel, as it were).

I would love y'all's thoughts on it.

Edit: I can see some of you asking questions lol. Way more fun than LinkedIn's 'This random person looked at your profile, but what did they want to know? We have no idea'.
Show HN: SirixDB – Bitemporal binary JSON database system and event store
I had already posted the project a couple of years ago, and it gained some interest, but a lot has been done since then: performance work especially, a completely new JSON store, a REST API, various refactored internals, an improved JSONiq-based query engine allowing updates, set-oriented join optimizations, a (now already dated) web UI, a new Kotlin-based CLI, and Python and TypeScript clients to ease the use of SirixDB. The first prototypes of a precursor date back to 2005.

So, what is it all about?

The system borrows ideas from ZFS (a keyed index trie, storing checksums in parent pages...) and Git (a persistent index structure that shares unchanged pages between revisions), but appends new tree roots on each commit [1][2].

It is a JSON DBS. The system stores fine-grained JSON nodes, so there's almost no limit to the structure and size of an object. Objects can be arbitrarily nested, and updates are cheap.

At a high level, it supports space-efficient snapshots, tracking changes by author with optional commit messages, time-travel queries, reverting to previous revisions (while all revisions in between still exist for audits...), and retrieving the changes to whole (sub)trees.

On the one hand it's thus a bitemporal DBS, but on the other hand it can be used as a simple event store: it stores the state after an event or change occurs, and it tracks the changes.

Thus an entity, a node in the JSON structure, can be updated to new values and eventually be removed, while the history remains easily retrievable, or we can easily revert to a previous state. The system assigns a unique ID to each new node, which never changes and is never reused (even after the node's deletion). The system therefore stores both the state after the change/event and the event itself (the change event).

The leaf pages of the index structures are not simply copied during a write; instead, a sliding-window algorithm is applied, such that only modified nodes, and nodes that fall out of the sliding window, have to be written. The window length is configurable. This avoids the write peaks that full snapshots would cause, as well as having to read long chains of incremental changes.

It's thus best suited for fast flash drives with fast random reads and sequential writes. Data is never overwritten, so audit trails come for free.

Another aspect is that the system does not need a WAL (which is basically a second data store) thanks to atomic switches of a root index page, with a single permitted read/write transaction (txn) running concurrently and in parallel with N read-only txns, which are bound to specific revisions when they start. Reads do not involve any locks. [2]

A path summary, an unordered set of all paths to leaf nodes in the tree, is built and enables various optimizations. Furthermore, a rolling hash can optionally be built, whereby all ancestor node hashes are adapted during inserts.

A dated Jupyter notebook with some examples can be found in [3], and the overall documentation in [4].

The query engine [5], Brackit, is retargetable (a couple of interfaces and rewrite rules have to be implemented for a DB system); in particular, it finds implicit joins and applies well-known algorithms from the relational DB world to optimize joins and aggregate functions, thanks to set-oriented processing of the operators. [6]

I've given an interview in [7], but I'm usually very nervous, so don't judge too harshly.

Give it a try, and happy coding!

Kind regards

Johannes

[1] https://sirix.io | https://github.com/sirixdb/sirix

[2] https://sirix.io/docs/concepts.html

[3] https://colab.research.google.com/drive/1NNn1nwSbK6hAekzo1YbED52RI3NMqqbG#scrollTo=CBWQIvc0Ov3P

[4] https://sirix.io/docs/

[5] http://brackit.io

[6] https://colab.research.google.com/drive/19eC-UfJVm_gCjY--koOWN50sgiFa5hSC

[7] https://youtu.be/Ee-5ruydgqo?si=Ift73d49w84RJWb2
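To illustrate the sliding-window versioning idea described above, here is a deliberately simplified Python toy, not SirixDB's actual (Java) implementation: each commit writes only the changed records plus any unchanged record about to fall out of the read window, so a read never has to combine more than a fixed number of page fragments.

```python
# Toy sketch of sliding-window page versioning -- a simplification for
# illustration, not SirixDB's actual code.
WINDOW = 4  # max number of page fragments a read has to combine

class VersionedPage:
    def __init__(self):
        self.fragments = []     # one record-dict appended per revision
        self.last_write = {}    # record key -> revision of its last write

    def commit(self, changes):
        revision = len(self.fragments)
        fragment = dict(changes)
        # Rewrite unchanged records about to fall out of the read window, so
        # reconstruction never needs more than WINDOW fragments. This avoids
        # both full-snapshot write peaks and long incremental chains.
        for key, rev in self.last_write.items():
            if key not in fragment and revision - rev >= WINDOW:
                fragment[key] = self.read(key)
        self.fragments.append(fragment)
        for key in fragment:
            self.last_write[key] = revision

    def read(self, key):
        # Newest-first scan, bounded by the window length.
        for fragment in reversed(self.fragments[-WINDOW:]):
            if key in fragment:
                return fragment[key]
        return None

# Usage: each commit writes only deltas plus window-expiring records.
page = VersionedPage()
page.commit({"a": 1, "b": 2})
for i in range(6):
    page.commit({"b": i})   # "a" is transparently rewritten when needed
print(page.read("a"))        # -> 1
```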
Show HN: Send me an IRL message and watch it arrive
- This page is running directly on the Pico W in the stream, using microdot (but proxied out via a Cloudflare Tunnel).

- There's a buzzer that plays a sound as well, but the stream has no audio.

- There's a Pi Zero 2 W with a Camera Module 3 streaming the video. Streaming low-latency video is painful; I tried MJPEG, Nginx + HLS/DASH, YouTube, and Twitch. Twitch is definitely the easiest and best-performing.

- There are still a few seconds of delay between sending your message and the video updating.
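As a rough idea of what such a microdot endpoint can look like on a Pico W, here is a minimal sketch; the route name, message handling, and display hookup are assumptions, not the author's actual code.

```python
# Minimal microdot sketch of a message-receiving endpoint on a Pico W.
# The route and display hookup are illustrative assumptions.
from microdot import Microdot

app = Microdot()

@app.route('/message', methods=['POST'])
def receive(request):
    text = (request.json or {}).get('text', '')[:64]  # cap message length
    # The real device would draw `text` on its display and fire the buzzer here.
    return {'ok': True, 'received': text}

# On a Pico W you would join Wi-Fi via the `network` module first, then:
app.run(port=80)
```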
Show HN: Bulk Creation of Transcripts from YouTube Playlists with Whisper
I know there are various tools that are supposed to make this easy, but I couldn't find anything that did everything I wanted, so I made this today for fun. The web-based offerings all take forever and seem flaky, and you have to process one video at a time, with no control over the transcription settings. In contrast, my script lets you convert a whole playlist in bulk with full control over everything.

It's truly easy to use: you can clone the repo, install into a venv, and be generating a folder full of high-quality transcript text files in under 5 minutes. All you need to do is supply the URL of a YouTube playlist (or of an individual video) and the tool does the rest automatically. It uses faster-whisper with a high beam_size, so it's a bit slower than you might expect, but this results in higher accuracy. The best way to use it is to take an existing playlist, or create a new one on YouTube, start the script, and come back the next morning to all your finished transcripts. It attempts to "upgrade" Whisper's output by taking all the transcript segments, gluing them together, and then splitting them back into sentences (using Spacy, or a simpler regex-based function). You end up with a single text file containing the full transcript for each video in the playlist, ready to go, with a sensible file name based on the video's title.

If you have CUDA installed, it will try to use it, but as with all things CUDA, it's annoyingly fragile and picky, so don't be surprised if you get a CUDA error even when you know for a fact CUDA is installed on your system. If you're looking for reliability, disable CUDA. But if you need to transcribe a LOT of videos, it does go much, much faster on a GPU.

Even if you don't have a GPU, if you have a powerful machine with a lot of RAM and cores, the script will fully saturate them and can download and process multiple videos at the same time. The default settings are pretty good for that situation. But if you have a slower machine, you might want to use a smaller Whisper model (like `base.en` or even `tiny.en`) and dial the beam_size down to 2.
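Here is a condensed sketch of the core transcription step described above, using the real faster-whisper API but with an illustrative model choice, file path, and regex-based sentence splitter; the repo contains the full, parallelized pipeline with downloading and Spacy-based splitting.

```python
# Condensed sketch of the transcription step: transcribe with a high
# beam_size, glue segments together, then re-split into sentences.
# Model name, input path, and the regex splitter are illustrative.
import re
from faster_whisper import WhisperModel

model = WhisperModel("base.en", device="cpu", compute_type="int8")
segments, _info = model.transcribe("video_audio.mp3", beam_size=10)

# Glue all transcript segments together, then split back into sentences.
full_text = " ".join(seg.text.strip() for seg in segments)
sentences = re.split(r"(?<=[.!?])\s+", full_text)

with open("transcript.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(sentences))
```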