The best Hacker News stories from Show from the past day
Latest posts:
Show HN: I wrote a symmetry game with a daily puzzle
I’ve been working on a puzzle game for a few years as a side project.<p>The game is based on a small region of tiles in a grid that is mirrored, modified, and mirrored again. It’s based on a novel phenomenon I noticed where, once these mutations happen a few times, the original region can be hard to recognize. You have a feeling there are symmetries in the image, but they are just out of grasp. Going further, if only part of the region is visible, such as mirroring off the edge of the board, it adds to the feeling and it becomes a satisfying puzzle to solve.<p>I originally published this on the app stores, but after spending some money on marketing, I didn’t have line of sight to something people would pay for. I had built it around a hosted level API to tweak levels between version releases, making it prone to software rot over a few years. SSL certs would expire, credit cards would expire, App Store agreements would need to be renewed, etc. Without a continual drip of effort, which I wasn’t motivated to put in, it defaulted to broken.<p>With that software rot in mind, and hoping to make something that would be around for a while for my friends and family to play, I started making a web-only, client-side-only version. Still prone to rot, but not nearly as many moving pieces. It’s missing some of the polish of the original and the puzzles aren't hand-curated with a graduating difficulty, but it’s playable on many modern devices. The generated puzzles are good ~75% of the time. I’m still working out how to detect the dud puzzles: they are playable, but not fun. I’ve got ideas on what defines a “good” puzzle, but haven’t formalized them into a fitness function yet.<p>One other note: while there are almost definitely UI bugs (please report!), if it says the puzzle can be solved in X taps & flips, it can. Those numbers are derived from how the puzzle itself renders, so it’s (thankfully) not prone to producing impossible-to-solve puzzles, merely ones that may appear so at first, hence the name.<p>Today’s puzzle is a good difficulty to start with; a new one generates daily at midnight EST. There's no cookie or tracking, so let me know if you're playing!
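The "mirrored, modified, and mirrored again" idea can be illustrated with a minimal sketch. This is not the game's code: the grid representation (a list of rows of 0/1 tiles), the function names, and the choice of a tile toggle as the "modification" are all assumptions made for illustration.

```python
def mirror_horizontal(region):
    """Append each row's left-right mirror image to the row."""
    return [row + row[::-1] for row in region]


def mirror_vertical(region):
    """Append the top-bottom mirror image below the region."""
    return region + [list(row) for row in region[::-1]]


def toggle_tile(region, r, c):
    """Return a copy with tile (r, c) switched between 0 and 1 (hypothetical edit)."""
    out = [list(row) for row in region]
    out[r][c] = 1 - out[r][c]
    return out


seed = [[1, 0],
        [0, 0]]
step1 = mirror_horizontal(seed)   # 2 x 4 grid
step2 = toggle_tile(step1, 1, 0)  # break the symmetry in one place
step3 = mirror_vertical(step2)    # 4 x 4 grid

for row in step3:
    print(row)
```

After only three steps the 2 x 2 seed is already hard to pick out of the 4 x 4 result, which is the effect the post describes.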
Show HN: Tarsier – Vision utilities for web interaction agents
Hey HN! I built a tool that gives LLMs the ability to understand the visual structure of a webpage even if they don't accept image input. We've found that unimodal GPT-4 + Tarsier's textual webpage representation consistently beats multimodal GPT-4V/4o + webpage screenshot by 10-20%, probably because multimodal LLMs still aren't as performant as they're hyped to be.<p>Over the course of experimenting with pruned HTML, accessibility trees, and other perception systems for web agents, we've iterated on Tarsier's components to maximize downstream agent/codegen performance.<p>Here's the Tarsier pipeline in a nutshell:<p>1. tag interactable elements with IDs for the LLM to act upon & grab a full-sized webpage screenshot<p>2. for text-only LLMs, run OCR on the screenshot & convert it to whitespace-structured text (this is the coolest part imo)<p>3. map LLM intents back to actions on elements in the browser via an ID-to-XPath dict<p>Humans interact with the web through visually-rendered pages, and agents should too. We run Tarsier in production for thousands of web data extraction agents a day at Reworkd (<a href="https://reworkd.ai">https://reworkd.ai</a>).<p>By the way, we're hiring backend/infra engineers with experience in compute-intensive distributed systems!<p><a href="https://reworkd.ai/careers">https://reworkd.ai/careers</a>
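Steps 2 and 3 of the pipeline can be sketched roughly as follows. This is not Tarsier's implementation or API: the `(text, x, y)` word format, the fixed cell sizes, the `[1]` tag style, and the XPath values are all assumptions chosen to show the idea of turning OCR output into whitespace-structured text and resolving a tag back to an element.

```python
def layout_text(words, char_w=10, line_h=20):
    """Place OCR words on a character grid so spacing mirrors the page layout.

    words: iterable of (text, x, y), the pixel position of each word's
    top-left corner. char_w and line_h are assumed cell sizes in pixels.
    """
    rows = {}
    for text, x, y in words:
        rows.setdefault(y // line_h, []).append((x // char_w, text))

    lines = []
    for row in range(max(rows) + 1 if rows else 0):
        line = ""
        for col, text in sorted(rows.get(row, [])):
            # Pad out to the word's column; keep at least one space between words.
            pad = max(col - len(line), 1 if line else 0)
            line += " " * pad + text
        lines.append(line)
    return "\n".join(lines)


# Hypothetical OCR output for a page with two tagged elements.
words = [
    ("[1]", 0, 0), ("Search", 40, 0),
    ("[2]", 200, 0), ("Login", 240, 0),
    ("Welcome", 0, 40),
]
page_text = layout_text(words)
print(page_text)

# Hypothetical ID-to-XPath dict used to turn an LLM's chosen tag into an element.
xpath_for_tag = {1: "//input[@name='q']", 2: "//a[@href='/login']"}


def resolve(tag_id):
    """Look up the XPath for a tag ID the LLM chose to act on."""
    return xpath_for_tag[tag_id]
```

A text-only model would receive `page_text`, answer with something like "click 2", and the agent would then act on `resolve(2)` in the browser.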
Show HN: I made a Mac app to search my images and videos locally with ML
Desktop Docs is a Mac app that lets you search all your photos and videos in seconds with AI.<p>Once you find the file you're looking for you can resize it, export it to Adobe Premiere Pro, or drag and drop it into another app.<p>I built Desktop Docs because I keep tons of media files on my computer and I can never remember where I save stuff (lots of screenshots, memes, and downloads). The Apple Photos app also only supports photos in your iCloud.<p>Desktop Docs supports adding folders or individual files to an AI Library where you can search by the contents of your files, not just file titles.<p>You can search by objects ("cardboard box"), actions ("man smiling", "car driving"), by emotion ("surprised woman", "sad cowboy"), or the text in the frame (great for screenshots or memes).<p>It's also 100% private. Make any media searchable without it ever leaving your computer.<p>How I built it:
- 100% JavaScript (I'm using Electron JS and React JS).
- Embedding generation (CLIP from OpenAI is used to compute the image embeddings and text embeddings for user queries).
- Redis (storing and doing KNN search on the embeddings with this DB).
- Image/video editing (the app ships with FFmpeg binaries to explode videos into individual frames and scale images).<p>Demo: <a href="https://www.youtube.com/watch?v=EIUgPNHOKKc" rel="nofollow">https://www.youtube.com/watch?v=EIUgPNHOKKc</a><p>If there are any features you'd like to see in Desktop Docs or want to learn more about how I built it, drop me a comment below. Happy to share more.
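The embedding search described in the build notes can be sketched in a few lines. This is only an illustration under stated assumptions: the three-dimensional vectors are made up (real CLIP embeddings have hundreds of dimensions), the file names are hypothetical, and a brute-force cosine ranking in plain Python stands in for the KNN query the app runs in Redis.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query, index, k=2):
    """Return the k keys whose embeddings are most similar to the query."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Made-up embeddings: one per image, or one per extracted video frame.
index = {
    "box.png": [0.9, 0.1, 0.0],
    "cowboy.mp4#frame12": [0.1, 0.9, 0.2],
    "meme.jpg": [0.0, 0.2, 0.9],
}

# Made-up embedding of the text query "cardboard box".
query = [0.8, 0.2, 0.1]
results = knn(query, index, k=2)
print(results)
```

Because CLIP maps text and images into the same vector space, ranking stored image vectors against a text query vector is what makes searching by content possible.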
Show HN: I built a math website the internet loved, I'm back with more features
A few months back, I published my website, teachyourselfmath, which shows you a list of math problems parsed automatically from PDFs around the world. It received a tremendous amount of feedback and interest. I was honestly overwhelmed by the response, and then life happened.<p>Over the past few weeks, I have been actively working on this project, trying to incorporate all the feedback, and I’d love to share it with the world again. New features:
1. Filter problems by difficulty and category
2. Bookmark your favorite problems
3. Editor in the comment section supports markdown formatting
4. ...and some UI improvements throughout the website<p>I am also starting a small Telegram community of math nerds who would like to discuss all things math, as well as talk about upcoming features and feedback for the website. Here is the link - (<a href="https://t.me/teachyourselfmath" rel="nofollow">https://t.me/teachyourselfmath</a>)<p>If you’d like to support my work through small donations, you can do it here - (<a href="https://www.buymeacoffee.com/viveknathani">https://www.buymeacoffee.com/viveknathani</a>). Right now, teachyourselfmath runs for free. Later, I’d love to build features that people would be glad to pay for, but fundamentally the goal is to make math accessible through technology. There’s a lot of peer learning involved in the comments section of these math problems. All of this gives me more reason to keep working on this.<p>Happy hacking!
Show HN: Open-source BI and analytics for engineers
We are building Quary (<a href="https://quary.dev">https://quary.dev</a>), an engineer-first BI/analytics product. You can find our repo at <a href="https://github.com/quarylabs/quary">https://github.com/quarylabs/quary</a> and our website at <a href="https://www.quary.dev/">https://www.quary.dev/</a>. There’s a demo video here: <a href="https://www.youtube.com/watch?v=o3hO65_lkGU" rel="nofollow">https://www.youtube.com/watch?v=o3hO65_lkGU</a><p>As engineers who have worked on data at startups and Amazon, we were frustrated by self-serve BI tools. They seemed dumbed down and they always required us to abandon the local dev tools we know and love (e.g. Copilot, Git). For us and for everyone we speak to, they end up being a mess.<p>Based on this, we decided there was a need for engineer-oriented BI and analytics software.<p>Quary solves these pain points by bringing standard software practices (version control, testing, refactoring, CI/CD, open source, etc.) to the BI and analytics workflow.<p>We integrate with many databases, but we’re showcasing our slick Supabase integration, because it: (1) keeps your data safe by running on your machine without data flowing through our servers; and (2) enables you to quickly build an analytics layer on top of your Supabase Postgres instances. Check out our Supabase guide: <a href="https://www.quary.dev/docs/quickstart-supabase">https://www.quary.dev/docs/quickstart-supabase</a><p>What we’re launching today is open source under the Apache 2.0 license. We plan to keep the developer core open source and add paid features like a web platform to easily share data models (per-seat pricing), and an orchestration engine to materialize your data models.<p>Please try Quary at <a href="https://quary.dev">https://quary.dev</a> and let us know what you think! We're excited to put the power of BI and analytics into the hands of engineers.
Show HN: I built an AI tool to help with ADHD task paralysis
Show HN: I open sourced Athena Crisis, a game built with React and CSS
Hey HN! I'm so excited to open source Athena Crisis under the MIT License and fund contributions to the game and the genre.<p>If you like the game and want to support its development, please check it out on Steam or on athenacrisis.com.
Show HN: Boxwood – simple templating engine for JavaScript, in JavaScript
Hey!<p>A while back, I wrote a templating engine (MIT License), which I mostly use for small side projects, either static or SSR-generated. Simplicity is one of the main goals.<p>I'll be happy to answer any questions. The main goal of sharing it is simply to get some feedback.<p>Best,
Emil
Show HN: Pico: An open-source Ngrok alternative built for production traffic
Pico is an open-source alternative to Ngrok. Unlike most other open-source tunnelling solutions, Pico is designed to serve production traffic and be simple to host (particularly on Kubernetes).<p>Upstream services connect to Pico and register endpoints. Pico will then route requests for an endpoint to a registered upstream service via its outbound-only connection. This means you can expose your services without opening a public port.<p>Pico runs as a cluster of nodes in order to be fault tolerant, scale horizontally and support zero-downtime deployments. It is also easy to host, for example as a Kubernetes Deployment or StatefulSet behind an HTTP load balancer.
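The register-then-route flow can be sketched as a small in-memory table. This is not Pico's code or API: the class and method names are hypothetical, the upstream values are plain strings standing in for open outbound connections, and round-robin selection among upstreams is an assumption made for illustration.

```python
class EndpointRouter:
    """Toy routing table: endpoint ID -> upstreams that registered for it."""

    def __init__(self):
        self._upstreams = {}  # endpoint -> list of registered upstreams
        self._cursor = {}     # endpoint -> index of the next upstream to use

    def register(self, endpoint, upstream):
        """Record that an upstream has connected and registered this endpoint."""
        self._upstreams.setdefault(endpoint, []).append(upstream)

    def unregister(self, endpoint, upstream):
        """Drop an upstream, for example when its connection closes."""
        conns = self._upstreams.get(endpoint, [])
        if upstream in conns:
            conns.remove(upstream)

    def route(self, endpoint):
        """Pick an upstream for the endpoint, or None if nothing is registered."""
        conns = self._upstreams.get(endpoint)
        if not conns:
            return None
        i = self._cursor.get(endpoint, 0) % len(conns)
        self._cursor[endpoint] = i + 1
        return conns[i]


router = EndpointRouter()
router.register("my-endpoint", "upstream-a")
router.register("my-endpoint", "upstream-b")
picks = [router.route("my-endpoint") for _ in range(3)]
print(picks)
```

Since the upstream opens the connection outward and the proxy only hands requests back over it, the upstream never needs to listen on a public port.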