The best Hacker News stories from Show from the past day
Latest posts:
Show HN: Zerox – Document OCR with GPT-mini
This started out as a weekend hack with gpt-4-mini, using the very basic strategy of "just ask the ai to ocr the document".<p>But this turned out to be better performing than our current implementation of Unstructured/Textract. At pretty much the same cost.<p>I've tested almost every variant of document OCR over the past year, especially trying things like table / chart extraction. I've found the rules based extraction has always been lacking. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. Using a vision model just make sense!<p>In general, I'd categorize this solution as slow, expensive, and non deterministic. But 6 months ago it was impossible. And 6 months from now it'll be fast, cheap, and probably more reliable!
Show HN: Zerox – Document OCR with GPT-mini
This started out as a weekend hack with gpt-4-mini, using the very basic strategy of "just ask the ai to ocr the document".<p>But this turned out to be better performing than our current implementation of Unstructured/Textract. At pretty much the same cost.<p>I've tested almost every variant of document OCR over the past year, especially trying things like table / chart extraction. I've found the rules based extraction has always been lacking. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. Using a vision model just make sense!<p>In general, I'd categorize this solution as slow, expensive, and non deterministic. But 6 months ago it was impossible. And 6 months from now it'll be fast, cheap, and probably more reliable!
Show HN: Zerox – Document OCR with GPT-mini
This started out as a weekend hack with gpt-4-mini, using the very basic strategy of "just ask the ai to ocr the document".<p>But this turned out to be better performing than our current implementation of Unstructured/Textract. At pretty much the same cost.<p>I've tested almost every variant of document OCR over the past year, especially trying things like table / chart extraction. I've found the rules based extraction has always been lacking. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. Using a vision model just make sense!<p>In general, I'd categorize this solution as slow, expensive, and non deterministic. But 6 months ago it was impossible. And 6 months from now it'll be fast, cheap, and probably more reliable!
Show HN: Shade/Bs – Modern Web UIs Without Node.js
Show HN: OpenDataCapture an electronic data capture platform for data collection
Hi HN,<p>We're the Douglas Neuroinformatics Platform[1], and we've been working on Open Data Capture, a web-based electronic data capture (EDC) platform for continuous clinical and research data collection. You can use it to administer instruments (like forms and interactive tasks) either in-person or remotely.<p>The platform is based on a fundamentally longitudinal data model. Unlike other EDC platforms, which are centered around the concept of a study with rigid timepoints, Open Data Capture is designed for continuous data capture. Data is associated with a given session, which includes metadata such as date, time, and mode (i.e., in-person or remote).<p>We've designed the system around the core restriction that many hospital institutions demand that data remain on-premise, while clinician-researchers often want to evaluate clients outside the institution with research questions. This has resulted in our innovative gateway concept, where assigned remote assessments are pushed onto an internet accessible service, and responses are encrypted in-place with HPKE[2] until the backend pulls them into the backend database. This makes the deployment firewall-friendly provided you can launch a minimal VPS or VM host somewhere globally accessible.<p>We're also a big fan of making things easy to deploy, so we supply a docker-compose stack which can bring up a demo instance easily to run locally.<p>The platform is free, open source, and written in TypeScript, with a NoSQL database underneath. Users can write instruments in TypeScript using a type-safe declarative form system (with native i18n support built in) or wrap and integrate completely arbitrary interactive tasks written in JavaScript (with optional support for TypeScript and JSX). Under the hood, this is based on dynamic imports and native ESM. There’s a browser-based IDE (the Instrument Playground) with live reloading and full Intellisense where you can try creating your own instruments.<p>We have a local deployment going live at our institution and appropriately-licensed (free) instruments we're deploying here will be integrated directly into the codebase.<p>Our future plans include expanding our instrument types to allow for binary data storage with an s3-like backend, and with abstractions for data types, like actigraphy, and MRI.<p>Check it out on GitHub[3], try the Instrument Playground[4], or see the Live Demo[5].<p>Would love to hear everybody’s thoughts!<p>Links:<p>[1] <a href="https://github.com/DouglasNeuroInformatics/">https://github.com/DouglasNeuroInformatics/</a><p>@gdevenyi @joshunrau<p>[2] <a href="https://datatracker.ietf.org/doc/rfc9180/" rel="nofollow">https://datatracker.ietf.org/doc/rfc9180/</a><p>[3] <a href="https://github.com/DouglasNeuroInformatics/OpenDataCapture">https://github.com/DouglasNeuroInformatics/OpenDataCapture</a><p>[4] <a href="https://playground.opendatacapture.org" rel="nofollow">https://playground.opendatacapture.org</a><p>[5] <a href="https://demo.opendatacapture.org" rel="nofollow">https://demo.opendatacapture.org</a>
Show HN: OpenDataCapture an electronic data capture platform for data collection
Hi HN,<p>We're the Douglas Neuroinformatics Platform[1], and we've been working on Open Data Capture, a web-based electronic data capture (EDC) platform for continuous clinical and research data collection. You can use it to administer instruments (like forms and interactive tasks) either in-person or remotely.<p>The platform is based on a fundamentally longitudinal data model. Unlike other EDC platforms, which are centered around the concept of a study with rigid timepoints, Open Data Capture is designed for continuous data capture. Data is associated with a given session, which includes metadata such as date, time, and mode (i.e., in-person or remote).<p>We've designed the system around the core restriction that many hospital institutions demand that data remain on-premise, while clinician-researchers often want to evaluate clients outside the institution with research questions. This has resulted in our innovative gateway concept, where assigned remote assessments are pushed onto an internet accessible service, and responses are encrypted in-place with HPKE[2] until the backend pulls them into the backend database. This makes the deployment firewall-friendly provided you can launch a minimal VPS or VM host somewhere globally accessible.<p>We're also a big fan of making things easy to deploy, so we supply a docker-compose stack which can bring up a demo instance easily to run locally.<p>The platform is free, open source, and written in TypeScript, with a NoSQL database underneath. Users can write instruments in TypeScript using a type-safe declarative form system (with native i18n support built in) or wrap and integrate completely arbitrary interactive tasks written in JavaScript (with optional support for TypeScript and JSX). Under the hood, this is based on dynamic imports and native ESM. There’s a browser-based IDE (the Instrument Playground) with live reloading and full Intellisense where you can try creating your own instruments.<p>We have a local deployment going live at our institution and appropriately-licensed (free) instruments we're deploying here will be integrated directly into the codebase.<p>Our future plans include expanding our instrument types to allow for binary data storage with an s3-like backend, and with abstractions for data types, like actigraphy, and MRI.<p>Check it out on GitHub[3], try the Instrument Playground[4], or see the Live Demo[5].<p>Would love to hear everybody’s thoughts!<p>Links:<p>[1] <a href="https://github.com/DouglasNeuroInformatics/">https://github.com/DouglasNeuroInformatics/</a><p>@gdevenyi @joshunrau<p>[2] <a href="https://datatracker.ietf.org/doc/rfc9180/" rel="nofollow">https://datatracker.ietf.org/doc/rfc9180/</a><p>[3] <a href="https://github.com/DouglasNeuroInformatics/OpenDataCapture">https://github.com/DouglasNeuroInformatics/OpenDataCapture</a><p>[4] <a href="https://playground.opendatacapture.org" rel="nofollow">https://playground.opendatacapture.org</a><p>[5] <a href="https://demo.opendatacapture.org" rel="nofollow">https://demo.opendatacapture.org</a>
Show HN: TinkerBird – A Chrome-native vector database
Show HN: TinkerBird – A Chrome-native vector database
Show HN: A source-available billing system I've spent 18 months building
Show HN: A source-available billing system I've spent 18 months building
Show HN: A source-available billing system I've spent 18 months building
Show HN: A source-available billing system I've spent 18 months building
Show HN: I made helpers for Web Components
Show HN: I made helpers for Web Components
Show HN: JSON-Threat-Protection Rust High-Performance Crate
Show HN: JSON-Threat-Protection Rust High-Performance Crate
Show HN: Ristretto, an OSS sandboxed code playground/notebook written in itself
Show HN: Ristretto, an OSS sandboxed code playground/notebook written in itself
Show HN: I made a tool to HTTPS your localhost
It's been 4 month since I work on Lokal full-time, I finally feel confident to share it publicly on YCombinator.<p>Lokal is a software for Tunneling, Local Development, and HTTP Debugging, It's support HTTP, TCP and UDP Tunnel.<p>The different with other tunneling solution is that Lokal has mDNS support with https enabled by default, while other might be only offer public-facing tunnel service.<p>On the latest version 0.3.0, Lokal support Self-hosted Lokal Tunnel Server, which allow you to use your own domain and your own VPS, allow you to have Premium but self-hosted Tunneling Solution.<p>Download -> Lokal.so/download
Self-hosting Tutorial -> <a href="https://docs.lokal.so/lokal-server/installation/" rel="nofollow">https://docs.lokal.so/lokal-server/installation/</a>
Show HN: I made a tool to HTTPS your localhost
It's been 4 month since I work on Lokal full-time, I finally feel confident to share it publicly on YCombinator.<p>Lokal is a software for Tunneling, Local Development, and HTTP Debugging, It's support HTTP, TCP and UDP Tunnel.<p>The different with other tunneling solution is that Lokal has mDNS support with https enabled by default, while other might be only offer public-facing tunnel service.<p>On the latest version 0.3.0, Lokal support Self-hosted Lokal Tunnel Server, which allow you to use your own domain and your own VPS, allow you to have Premium but self-hosted Tunneling Solution.<p>Download -> Lokal.so/download
Self-hosting Tutorial -> <a href="https://docs.lokal.so/lokal-server/installation/" rel="nofollow">https://docs.lokal.so/lokal-server/installation/</a>