The best Hacker News stories from Show HN from the past day
Latest posts:
Show HN: I built an LLM-powered Ask HN: like Perplexity, but for HN comments
Hi HN!

I'm Jonathan and I built Ask Hacker Search (https://hackersearch.net/ask), an LLM-powered version of Hacker News' Ask HN.

Unlike Ask HN, Ask Hacker Search doesn't solicit new contributions from HN readers. Instead, it draws on Hacker News' historical data to answer questions and returns LLM-generated summaries of the relevant comments. I've used it for questions like "Should I use Drizzle or Prisma?" or "What is a good screen capture that allows easy zooming effects on Mac?".

It is particularly useful when you want to understand HN readers' sentiment about a topic, or when you're looking for expert insights on topics HN readers care about. I've been using it continually while building it, and have found it especially handy for finding software libraries recommended by HN or getting quick vibe checks on hot topics.

This builds on my release of Hacker Search two weeks ago (https://news.ycombinator.com/item?id=40238509), which offered a semantic search engine over top HN submissions. It's not just a small upgrade: covering comments was the #1 requested feature after that launch, so I rebuilt nearly the entire product to support it.

Please try it out and let me know what you think! I have to limit the number of LLM summaries each person can get for free, as this is entirely self-funded. If you hit the limit, you can subscribe for summaries generated by a better model ($8/month), or bring your own compute by running inference with Ollama on your machine.
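For readers curious how a retrieve-then-summarize flow like this can be wired together, here is a minimal sketch. It is not Hacker Search's code: the embedding model and prompt shape are assumptions, and search_comments/build_summary_prompt are hypothetical helpers standing in for the real pipeline.

```python
# Illustrative retrieve-then-summarize sketch (not Hacker Search's actual code).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def search_comments(query: str, comments: list[str], top_k: int = 5) -> list[str]:
    """Rank stored HN comments by cosine similarity to the query."""
    vectors = model.encode([query] + comments, normalize_embeddings=True)
    query_vec, comment_vecs = vectors[0], vectors[1:]
    scores = comment_vecs @ query_vec
    best = np.argsort(scores)[::-1][:top_k]
    return [comments[i] for i in best]

def build_summary_prompt(query: str, hits: list[str]) -> str:
    """Assemble the prompt handed to whichever LLM writes the summary."""
    context = "\n\n".join(f"- {c}" for c in hits)
    return f"Summarize what HN commenters say about: {query}\n\nComments:\n{context}"
```

The resulting prompt could then go to a hosted model or, as the post's bring-your-own-compute option suggests, a local model served by Ollama.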
Show HN: I'm 17 and wrote a guide on how to build your own programming language
Hey! I’m JC. I’m 17 and part of Hack Club, a nonprofit where we help teenagers ship programming projects with their friends while growing technically.

A while ago, I asked myself the question, “How exactly do programming languages work behind the scenes?”

It seemed really daunting until I went to a half-hour workshop at a high school hackathon about writing a tree-walk interpreter and realized that getting started was actually super fun.

This guide is designed in that vein: to get people, especially teenagers, started on building a programming language in a literal weekend by actually shipping one. It’s a stepping stone for learning the big things: compilers, performance optimizations, and so on.

It’s heavily inspired by Crafting Interpreters and why’s poignant guide, but meant to be approachable in a weekend.

Some backstory on me: a year ago I finished high school early and joined Hack Club full-time to build projects like this. I’ve been programming since COVID, and learned how to code primarily by shipping things that seemed daunting to me and taking inspiration from people who took the time to break down various topics online.

Give it a try and take it out for a spin! Constructive feedback is really appreciated.

It’s open source on GitHub at https://github.com/hackclub/easel
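To make the “tree-walk interpreter” idea concrete, here is a tiny standalone sketch of the classic tokenize, parse, then walk-the-tree loop, written in Python. It is purely illustrative and not code from the guide or its repo.

```python
# Tiny tree-walk interpreter for arithmetic: tokenize, parse into a tree,
# then walk the tree to evaluate it. Illustrative only; not from the guide.
import re

TOKEN = re.compile(r"\s*(\d+|[()+\-*/])")

def tokenize(src):
    src, pos, tokens = src.strip(), 0, []
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if not match:
            raise SyntaxError(f"bad character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

# Grammar: expr   -> term (('+'|'-') term)*
#          term   -> factor (('*'|'/') factor)*
#          factor -> NUMBER | '(' expr ')'
def parse(tokens):
    def expr(i):
        node, i = term(i)
        while i < len(tokens) and tokens[i] in "+-":
            op = tokens[i]
            rhs, i = term(i + 1)
            node = (op, node, rhs)
        return node, i

    def term(i):
        node, i = factor(i)
        while i < len(tokens) and tokens[i] in "*/":
            op = tokens[i]
            rhs, i = factor(i + 1)
            node = (op, node, rhs)
        return node, i

    def factor(i):
        if tokens[i] == "(":
            node, i = expr(i + 1)
            return node, i + 1  # skip the closing ')'
        return int(tokens[i]), i + 1

    node, _ = expr(0)
    return node

def evaluate(node):
    """Walk the tree: leaves are numbers, interior nodes apply an operator."""
    if isinstance(node, int):
        return node
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    return {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]

print(evaluate(parse(tokenize("1 + 2 * (3 - 4)"))))  # prints -1
```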
Show HN: I made a Chrome extension to clean up your Gmail inbox locally
Hi everyone,

My motivation for building this was to address the hassle of mass-unsubscribing from unwanted emails and bulk-deleting old ones, while keeping privacy and control over the process. With this Chrome extension, your emails are never sent to any external servers; all calls to the Gmail API happen locally on your device.

Feedback and suggestions are welcome!
Show HN: I wrote a symmetry game with a daily puzzle
I’ve been working on a puzzle game for a few years as a side project.

The game is based on a small region of tiles in a grid that is mirrored, modified, and mirrored again. It’s based on a novel phenomenon I noticed: once these mutations happen a few times, the original region can be hard to recognize. You have a feeling there are symmetries in the image, but they are just out of grasp. Going further, if only part of the region is visible, such as when it mirrors off the edge of the board, it adds to the feeling and becomes a satisfying puzzle to solve.

I originally published this on the app stores, but after spending some money on marketing, I didn’t have line of sight to something people would pay for. I had built it around a hosted level API so I could tweak levels between version releases, which made it prone to software rot over a few years: SSL certs would expire, credit cards would expire, App Store agreements needed to be renewed, and so on. Without a continual drip of effort I wasn’t motivated to put in, it defaulted to broken.

With that software rot in mind, and hoping to make something that would be around for a while for my friends and family to play, I started making a web-only, client-side-only version. Still prone to rot, but with far fewer moving pieces. It’s missing some of the polish of the original, and the puzzles aren’t hand-curated with a gradually increasing difficulty, but it’s playable on many modern devices. The generated puzzles are good ~75% of the time. I’m still working out how to detect the dud puzzles: they are playable, but not fun. I’ve got ideas on what defines a “good” puzzle, but haven’t formalized them into a fitness function yet.

One other note: while there are almost certainly UI bugs (please report!), if it says the puzzle can be solved in X taps and flips, it can. Those numbers are derived from how the puzzle itself renders, so it’s (thankfully) not prone to producing impossible-to-solve puzzles, merely ones that may appear so at first; hence the name.

Today’s puzzle is a good difficulty to start with; a new one is generated daily at midnight EST. There’s no cookie or tracking, so let me know if you’re playing!
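To illustrate the mirror, modify, mirror-again idea from the first paragraph, here is a toy sketch on a small boolean grid. It is only an assumed reading of the mechanic for illustration, not the game’s actual generator.

```python
# Toy illustration of mirror -> modify -> mirror on a small grid
# (an assumed reading of the mechanic, not the game's code).
import numpy as np

rng = np.random.default_rng(0)
region = rng.integers(0, 2, size=(4, 4)).astype(bool)      # original tile region

step1 = np.hstack([region, np.fliplr(region)])              # mirror horizontally
step1[rng.integers(0, 4), rng.integers(0, 8)] ^= True       # modify: flip one random tile
step2 = np.vstack([step1, np.flipud(step1)])                 # mirror again, vertically

print(step2.astype(int))  # the underlying symmetry is now harder to spot by eye
```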
Show HN: Tarsier – Vision utilities for web interaction agents
Hey HN! I built a tool that gives LLMs the ability to understand the visual structure of a webpage even if they don't accept image input. We've found that unimodal GPT-4 + Tarsier's textual webpage representation consistently beats multimodal GPT-4V/4o + a webpage screenshot by 10-20%, probably because multimodal LLMs still aren't as performant as they're hyped to be.

Over the course of experimenting with pruned HTML, accessibility trees, and other perception systems for web agents, we've iterated on Tarsier's components to maximize downstream agent/codegen performance.

Here's the Tarsier pipeline in a nutshell:

1. Tag interactable elements with IDs for the LLM to act upon, and grab a full-sized webpage screenshot.

2. For text-only LLMs, run OCR on the screenshot and convert it to whitespace-structured text (this is the coolest part, imo).

3. Map LLM intents back to actions on elements in the browser via an ID-to-XPath dict.

Humans interact with the web through visually rendered pages, and agents should too. We run Tarsier in production for thousands of web data extraction agents a day at Reworkd (https://reworkd.ai).

By the way, we're hiring backend/infra engineers with experience in compute-intensive distributed systems! https://reworkd.ai/careers
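As a rough illustration of steps 1 and 3 above, here is a minimal Playwright sketch of tagging interactable elements with IDs and keeping an ID-to-XPath map so an LLM's "click [5]" intent can be replayed in the browser. The helper name and the data-agent-id attribute are assumptions for illustration; this is not Tarsier's actual API, and the OCR step (2) is omitted.

```python
# Sketch of the tag-and-map idea with Playwright (not Tarsier's actual API).
import asyncio
from playwright.async_api import async_playwright

async def tag_interactable_elements(page):
    """Step 1: assign numeric IDs to clickable/typable elements and return an
    ID -> XPath map so LLM intents can be replayed later (step 3)."""
    elements = await page.query_selector_all("a, button, input, textarea, select")
    id_to_xpath = {}
    for i, el in enumerate(elements):
        await el.evaluate("(node, i) => node.setAttribute('data-agent-id', i)", i)
        id_to_xpath[i] = f"//*[@data-agent-id='{i}']"
    return id_to_xpath

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://news.ycombinator.com")
        id_to_xpath = await tag_interactable_elements(page)
        await page.screenshot(path="page.png", full_page=True)  # input to step 2 (OCR)
        # Step 3: if the LLM answers "click [5]", resolve it back to an element:
        await page.click(f"xpath={id_to_xpath[5]}")
        await browser.close()

asyncio.run(main())
```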
Show HN: I made a Mac app to search my images and videos locally with ML
Desktop Docs is a Mac app that lets you search all your photos and videos in seconds with AI.

Once you find the file you're looking for, you can resize it, export it to Adobe Premiere Pro, or drag and drop it into another app.

I built Desktop Docs because I keep tons of media files on my computer and can never remember where I save stuff (lots of screenshots, memes, and downloads). The Apple Photos app also only supports photos in your iCloud.

Desktop Docs lets you add folders or individual files to an AI Library, where you can search by the contents of your files, not just their file names.

You can search by object ("cardboard box"), action ("man smiling", "car driving"), emotion ("surprised woman", "sad cowboy"), or the text in the frame (great for screenshots or memes).

It's also 100% private: any media can be made searchable without it ever leaving your computer.

How I built it:

- 100% JavaScript (Electron and React).
- Embedding generation (OpenAI's CLIP computes the image embeddings and the text embeddings for user queries).
- Redis (stores the embeddings and handles KNN search over them).
- Image/video editing (the app ships with FFmpeg binaries to explode videos into individual frames and scale images).

Demo: https://www.youtube.com/watch?v=EIUgPNHOKKc

If there are any features you'd like to see in Desktop Docs, or you want to learn more about how I built it, drop me a comment below. Happy to share more.
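For anyone curious what the CLIP-plus-KNN core looks like, here is a minimal sketch. The app itself is JavaScript and uses Redis for the KNN step; this Python version, with an in-memory index and an assumed open-source CLIP checkpoint, only illustrates the idea and is not the app's code.

```python
# Minimal sketch of CLIP-based image search (illustration only, not Desktop Docs' code):
# embed images with CLIP, embed the text query, rank by cosine similarity.
# A plain in-memory matrix stands in for the Redis KNN index described above.
from pathlib import Path
from PIL import Image
import numpy as np
from sentence_transformers import SentenceTransformer

clip = SentenceTransformer("clip-ViT-B-32")  # assumed CLIP checkpoint

paths = sorted(Path("~/Pictures").expanduser().glob("*.png"))  # example folder/extension
images = [Image.open(p) for p in paths]
image_vecs = clip.encode(images, normalize_embeddings=True)    # one vector per image

def search(query: str, top_k: int = 5):
    query_vec = clip.encode([query], normalize_embeddings=True)[0]
    scores = image_vecs @ query_vec                             # cosine similarity
    best = np.argsort(scores)[::-1][:top_k]
    return [(paths[i].name, float(scores[i])) for i in best]

print(search("surprised woman"))
```

For video, the same idea applies once FFmpeg has split a clip into frames: each frame gets an embedding, and a hit on any frame points back to the source file.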