The best Hacker News stories from the past day
Latest posts:
When MFA isn't MFA, or how we got phished
LÖVE: a framework to make 2D games in Lua
Meduza co-founder's phone infected with Pegasus
Bug in macOS 14 Sonoma prevents our app from working
Credit card debt collection
Any sufficiently advanced uninstaller is indistinguishable from malware
Chrome: Heap buffer overflow in WebP
New world record with an electric racing car: From 0 to 100 in 0.956 seconds
Quiet – Encrypted P2P team chat with no servers, just Tor
We built the fastest CI and it failed
Apple iPhone 15 Pro and iPhone 15 Pro Max
Death by a Thousand Microservices
Fine-tune your own Llama 2 to replace GPT-3.5/4
There has been a lot of interest on HN in fine-tuning open-source LLMs recently (e.g. Anyscale's post at https://news.ycombinator.com/item?id=37090632). I've been playing around with fine-tuning models for a couple of years and wanted to share some insights and practical code. I've condensed what I've learned into a small set of notebooks at https://github.com/OpenPipe/OpenPipe/tree/main/examples/classify-recipes, covering labeling data, fine-tuning, running efficient inference, and evaluating costs/performance. The 7B model we train here matches GPT-4's labels 95% of the time on the test set, and for the 5% of cases where they disagree it's often because the correct answer is genuinely ambiguous.

What is fine-tuning? You can think of it as a more powerful form of prompting: instead of writing your instructions in text, you encode them in the weights of the model itself. You do this by training an existing model on example input/output pairs that demonstrate the task you want your fine-tuned model to learn. Fine-tuning can work with as few as 50 examples, but I usually try to get 1,000+ if possible.

Prompting still has some big advantages over fine-tuning. It's much easier and faster to iterate on your instructions than to label data and re-train a model. And operationally it's easier to deploy one big model and adjust its behavior as necessary than to deploy many small fine-tuned models that will each likely see lower utilization.

Fine-tuning has one huge advantage, though: it is far more effective at guiding a model's behavior than prompting, so you can often get away with a much smaller model. That gets you faster responses and lower inference costs. A fine-tuned Llama 7B model is 50x cheaper than GPT-3.5 on a per-token basis, and for many use cases can produce results that are as good or better!

For example, classifying the 2M recipes at https://huggingface.co/datasets/corbt/all-recipes with GPT-4 would cost $23k. Even with GPT-3.5 it would cost over $1k. The model we fine-tuned performs similarly to GPT-4 and costs just $19 to run over the entire dataset.

Disclaimer: My brother David and I are working on an open-source product called OpenPipe (https://github.com/openpipe/openpipe) to help engineers adopt fine-tuning as simply as possible. But none of the information above depends on our startup; this post is just about sharing what we've learned about fine-tuning. I hope it's useful!
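The "example input/output pairs" the post describes are typically stored as a JSONL file, one example per line. Here is a minimal sketch of building such a dataset; the `prompt`/`completion` field names and the vegetarian-classification task are illustrative assumptions, not the post's actual schema:

```python
import json

# Hypothetical labeled examples for a recipe-classification task.
# Real fine-tuning datasets work the same way, just with 50-1000+ rows.
examples = [
    {"prompt": "Recipe: Garlic butter shrimp\nIs this vegetarian?",
     "completion": "no"},
    {"prompt": "Recipe: Caprese salad\nIs this vegetarian?",
     "completion": "yes"},
]

# Write one JSON object per line (the JSONL format most trainers accept).
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

A fine-tuning job then reads this file and trains the base model to produce each `completion` given its `prompt`.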
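The cost figures above are back-of-envelope arithmetic: tokens processed times a per-token price. The sketch below roughly reproduces them; the per-1K-token prices and the tokens-per-recipe count are my own assumptions (ballpark 2023-era rates), not numbers from the post:

```python
# Rough inference-cost comparison for classifying 2M recipes.
RECIPES = 2_000_000
TOKENS_PER_RECIPE = 350  # assumed prompt + completion length

# Assumed blended USD prices per 1K tokens.
PRICE_PER_1K = {
    "gpt-4": 0.033,
    "gpt-3.5-turbo": 0.00175,
    "llama-7b-finetuned": 0.000035,  # ~50x cheaper than GPT-3.5
}

def total_cost(model: str) -> float:
    """Total dataset cost in USD for the given model."""
    total_tokens = RECIPES * TOKENS_PER_RECIPE
    return total_tokens / 1000 * PRICE_PER_1K[model]

for model in PRICE_PER_1K:
    print(f"{model}: ${total_cost(model):,.2f}")
```

With these assumed numbers the totals come out around $23k for GPT-4, $1.2k for GPT-3.5, and tens of dollars for the fine-tuned 7B model, matching the order of magnitude the post reports.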
iPhone 15 and iPhone 15 Plus
A group of open source Android apps without ads and unnecessary permissions
Chronic fatigue syndrome may have a post-viral infection origin