The best Hacker News stories from the past day


Latest posts:

Quiet – Encrypted P2P team chat with no servers, just Tor

We built the fastest CI and it failed

Apple iPhone 15 Pro and iPhone 15 Pro Max

Death by a Thousand Microservices

Fine-tune your own Llama 2 to replace GPT-3.5/4

There has been a lot of interest on HN in fine-tuning open-source LLMs recently (e.g. Anyscale's post at https://news.ycombinator.com/item?id=37090632). I've been playing around with fine-tuning models for a couple of years and wanted to share some insights and practical code. I've condensed what I've learned into a small set of notebooks at https://github.com/OpenPipe/OpenPipe/tree/main/examples/classify-recipes, covering labeling data, fine-tuning, running efficient inference, and evaluating costs and performance. The 7B model we train here matches GPT-4's labels 95% of the time on the test set, and in the 5% of cases where they disagree, it's often because the correct answer is genuinely ambiguous.

What is fine-tuning? You can think of it as a more powerful form of prompting: instead of writing your instructions in text, you encode them in the weights of the model itself. You do this by training an existing model on example input/output pairs that demonstrate the task you want your fine-tuned model to learn. Fine-tuning can work with as few as 50 examples, but I usually try to get 1,000+ if possible.

Prompting still has some big advantages over fine-tuning. It's much easier and faster to iterate on your instructions than to label data and retrain a model. And operationally, it's easier to deploy one big model and adjust its behavior as necessary than to deploy many small fine-tuned models that will likely each see lower utilization.

Fine-tuning has one huge advantage, though: it is far more effective at guiding a model's behavior than prompting, so you can often get away with a much smaller model. That gets you faster responses and lower inference costs. A fine-tuned Llama 7B model is 50x cheaper than GPT-3.5 on a per-token basis, and for many use cases it can produce results that are as good or better!

For example, classifying the 2M recipes at https://huggingface.co/datasets/corbt/all-recipes with GPT-4 would cost $23k. Even with GPT-3.5 it would cost over $1k. The model we fine-tuned performs similarly to GPT-4 and costs just $19 to run over the entire dataset.

Disclaimer: My brother David and I are working on an open-source product called OpenPipe (https://github.com/openpipe/openpipe) to help engineers adopt fine-tuning as simply as possible, but none of the information above depends on our startup. This post is just about sharing what we've learned about fine-tuning. I hope it's useful!
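The input/output-pair setup described above can be sketched as a small data-preparation step. This is a minimal, hypothetical example (the recipe titles, prompt wording, and JSONL field names are assumptions for illustration, not taken from the linked notebooks): each training example pairs a task prompt with the label the fine-tuned model should learn to produce.

```python
import json

def to_training_example(recipe_title: str, label: str) -> dict:
    # One fine-tuning example: the instruction plus input goes in "prompt",
    # and the answer we want the model to learn goes in "completion".
    return {
        "prompt": (
            "Classify this recipe as 'vegetarian' or 'non-vegetarian'.\n\n"
            f"Recipe: {recipe_title}\n\nAnswer:"
        ),
        "completion": f" {label}",
    }

# Hypothetical labeled examples; in practice the labels might come from
# a stronger model like GPT-4, as in the post's labeling step.
labeled = [
    ("Grilled Chicken Tacos", "non-vegetarian"),
    ("Caprese Salad", "vegetarian"),
]

# Write JSONL, one example per line, ready to feed a fine-tuning run.
with open("train.jsonl", "w") as f:
    for title, label in labeled:
        f.write(json.dumps(to_training_example(title, label)) + "\n")
```

With 50 to 1,000+ such lines, a trainer (e.g. a LoRA fine-tune of Llama 7B) learns the task from the pairs themselves, so the instruction no longer has to be carried in every inference prompt.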
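The cost comparison above can be reproduced with back-of-the-envelope arithmetic. The token counts and per-1K-token prices below are my assumptions (roughly mid-2023 OpenAI list prices, and a stand-in amortized GPU figure for the self-hosted 7B model), not exact numbers from the post, so the outputs only approximate the $23k / $1k+ figures:

```python
# Assumed per-example token budget for a recipe-classification prompt.
PROMPT_TOKENS = 350
COMPLETION_TOKENS = 5
N_EXAMPLES = 2_000_000  # size of the all-recipes dataset

# Assumed prices per 1K tokens: (input, output).
PRICES = {
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo": (0.0015, 0.002),
    # No per-token list price for a self-hosted fine-tuned 7B model;
    # this is a stand-in for amortized GPU cost.
    "llama-7b-finetuned": (0.0001, 0.0001),
}

def dataset_cost(model: str) -> float:
    """Total cost in dollars to run the whole dataset through `model`."""
    in_price, out_price = PRICES[model]
    per_example = (PROMPT_TOKENS * in_price + COMPLETION_TOKENS * out_price) / 1000
    return per_example * N_EXAMPLES

for model in PRICES:
    print(f"{model}: ${dataset_cost(model):,.0f}")
```

Under these assumptions GPT-4 lands in the low tens of thousands of dollars and GPT-3.5 just over a thousand, which is the shape of the gap the post describes; the small model is orders of magnitude cheaper regardless of the exact amortization figure.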

iPhone 15 and iPhone 15 Plus

A group of open source Android apps without ads and unnecessary permissions

Chronic fatigue syndrome may have a post-viral infection origin

X sues Calif. to avoid revealing how it makes “controversial” content decisions

The Project Gutenberg Open Audiobook Collection

In Germany, 27 are in 'preventive detention' because they might do climate protests

To make dishwashers great again? (2020)

Google Chrome just rolled out a new way to track you and serve ads

UK air traffic control meltdown

Microsoft has not stopped forcing Edge on Windows 11 users

The Decline of Usability (2020)
