The best Hacker News stories from All from the past day
Latest posts:
Show HN: Factorio Learning Environment – Agents Build Factories
I'm Jack, and I'm excited to share a project that has channeled my Factorio addiction recently: the Factorio Learning Environment (FLE).
FLE is an open-source framework for developing and evaluating LLM agents in Factorio. It provides a controlled environment where AI models can attempt complex automation, resource management, and optimisation tasks in a grounded world with meaningful constraints.
A critical advantage of Factorio as a benchmark is its unbounded nature. Many evals are quickly saturated by newer models, but Factorio's complexity scales geometrically, so it won't be "solved" in the next 6 months (or possibly even years). This lets us meaningfully compare models by the order of magnitude of resources they can produce, which gives the benchmark longevity.
I started the project 18 months ago, after years of playing Factorio had made me recognise its potential as an AI research testbed. A few months ago, our team (myself, Akbir, and Mart) came together to create a benchmark that tests agent capabilities in spatial reasoning and long-term planning.
Two technical innovations drove this project forward. First, we discovered that piping Lua into the Factorio console over TCP lets us run (almost) arbitrary code without directly modding the game. Second, we developed a first-class Python API that wraps these Lua programs, giving AI agents a clean, type-hinted interface to Factorio through familiar programming paradigms.
Agents interact with FLE through a REPL pattern:
1. Observe the world (the output of their last action)
2. Generate Python code to perform their next action
3. Receive detailed feedback (including exceptions and stdout)
We provide two main evaluation settings:
- Lab-play: 24 structured tasks with fixed resources
- Open-play: an unbounded task of building the largest possible factory on a procedurally generated map
We found that while LLMs show promising short-horizon skills, they struggle with spatial reasoning in constrained environments. They can discover basic automation strategies (like electric-powered drilling) but fail to achieve more complex automation (like electronic circuit manufacturing). Claude 3.5 Sonnet is currently the best model, by a significant margin.
The code is available at https://github.com/JackHopkins/factorio-learning-environment.
You'll need:
- Factorio (version 1.1.110)
- Docker
- Python 3.10+
The README contains detailed installation instructions and examples of how to run evaluations with different LLM agents.
We would love to hear your thoughts and see what others can do with this framework!
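The observe / generate / feedback loop described in the post can be sketched in plain Python. This is a minimal sketch and not FLE's implementation. The `policy` callable stands in for the LLM, and `run_episode` is a hypothetical name. The code runs in an in-process `exec` namespace rather than against a live Factorio server.

```python
import contextlib
import io
import traceback
from typing import Callable

# Hypothetical: a policy maps the last observation to the next Python snippet.
# In FLE this role is played by an LLM; here any callable will do.
Policy = Callable[[str], str]


def run_episode(policy: Policy, steps: int) -> list[str]:
    """Run `steps` observe -> generate -> execute iterations; return each observation."""
    namespace: dict = {}  # variables persist between steps, as in a REPL
    observation = ""      # nothing has been executed yet
    history: list[str] = []
    for _ in range(steps):
        # 1-2. The agent sees the previous output and writes its next action.
        code = policy(observation)
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)
            observation = buffer.getvalue()
        except Exception:
            # 3. Exceptions are fed back alongside any stdout produced so far.
            observation = buffer.getvalue() + traceback.format_exc()
        history.append(observation)
    return history


# Usage with a scripted stand-in for the model:
scripts = iter(["ore = 10\nprint(ore)", "print(ore * 2)", "print(missing)"])
history = run_episode(lambda obs: next(scripts), steps=3)
```

Keeping one `namespace` dict across steps mirrors a REPL: the second snippet can use `ore` from the first. Returning the traceback instead of raising it lets the agent see its own `NameError` and correct it on the next turn.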
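The post says Lua is piped into the Factorio console over TCP but does not name the protocol. The sketch below assumes the channel is Factorio's built-in RCON interface, which a server enables with `--rcon-port` and `--rcon-password` and which uses Valve's Source RCON packet framing. `Position` and `place_entity_lua` are hypothetical stand-ins for a type-hinted Python wrapper that renders Lua; they are not FLE's API. The code only builds and parses bytes. A real client would also authenticate with a `SERVERDATA_AUTH` packet and read responses from a socket.

```python
import struct
from dataclasses import dataclass

# Source RCON packet types (assumption: Factorio's RCON follows this framing).
SERVERDATA_AUTH = 3
SERVERDATA_EXECCOMMAND = 2


def encode_packet(request_id: int, packet_type: int, body: str) -> bytes:
    """Frame one packet: size, id, type, NUL-terminated body, then an empty NUL string."""
    payload = struct.pack("<ii", request_id, packet_type) + body.encode("utf-8") + b"\x00\x00"
    # The little-endian size field counts every byte after itself.
    return struct.pack("<i", len(payload)) + payload


def decode_packet(data: bytes) -> tuple[int, int, str]:
    """Inverse of encode_packet for a single, complete packet."""
    (size,) = struct.unpack_from("<i", data, 0)
    request_id, packet_type = struct.unpack_from("<ii", data, 4)
    body = data[12 : 4 + size - 2].decode("utf-8")  # drop the two trailing NULs
    return request_id, packet_type, body


@dataclass
class Position:  # hypothetical typed argument
    x: float
    y: float


def place_entity_lua(name: str, pos: Position) -> str:
    """Hypothetical wrapper: render Lua that places an entity and reports the result."""
    return (
        f'local e = game.surfaces[1].create_entity{{name="{name}", '
        f'position={{{pos.x}, {pos.y}}}, force="player"}} '
        f'rcon.print(e and "ok" or "failed")'
    )


def exec_command(request_id: int, lua: str) -> bytes:
    # /silent-command runs the Lua without echoing the command to the game console.
    return encode_packet(request_id, SERVERDATA_EXECCOMMAND, "/silent-command " + lua)


packet = exec_command(7, place_entity_lua("stone-furnace", Position(1.5, 2.5)))
```

Generating Lua from typed Python functions keeps the agent-facing surface in Python, while the game only ever sees console commands. This matches the post's point that no game mod is needed.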
Show HN: Seven39, a social media app that is only open for 3 hours every evening
I built this site as a quick test of whether a time-boxed social media experience feels better than an endless one. So far I've only been using it with friends and it feels nice, but it seems like it's time to bring it to a larger audience.
Let me know what you think! The schedule only follows EST for now, sorry.
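A time-boxed app mostly needs one check: is the current moment inside tonight's window? This is a minimal sketch under two assumptions the post does not confirm. First, "EST" is taken literally as a fixed UTC-5 offset with no daylight saving. Second, the 19:00 opening time is hypothetical, since the post only says "3 hours every evening".

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: "EST" as a fixed UTC-5 offset (no daylight saving).
EST = timezone(timedelta(hours=-5), "EST")
OPEN_AT = time(19, 0)        # hypothetical opening time
WINDOW = timedelta(hours=3)  # from the post: open for 3 hours


def is_open(now: datetime) -> bool:
    """Return True if the timezone-aware `now` falls inside an evening window in EST."""
    local = now.astimezone(EST)
    # Check yesterday's window too, in case the 3 hours ever cross midnight.
    for day in (local.date(), local.date() - timedelta(days=1)):
        start = datetime.combine(day, OPEN_AT, tzinfo=EST)
        if start <= local < start + WINDOW:
            return True
    return False
```

Converting to EST before comparing means a user anywhere in the world sees the same window. For example, 00:30 UTC corresponds to 19:30 EST on the previous day.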
Happy 20th birthday, Y Combinator
A 10x Faster TypeScript
Canon EF and RF Lenses – All Autofocus Motors
Hi there! I have written an e-book about all autofocus motor types used in Canon EF & RF lenses from the past 40 years.
Music labels will regret coming for the Internet Archive, sound historian says
Apple Exclaves
What made the Irish famine so deadly
Ecosia is teaming up with Qwant to build a European search index
Performance of the Python 3.14 tail-call interpreter
Tesla created secret team to suppress driving range complaints (2023)
Go European: Discover European products and services
uBlock Origin is no longer available on the Chrome Store
Show HN: I built an app to get daily wisdom from Mr. Worldwide
Pitbull is coming to Stockholm. As part of the prep, I built a glassmorphism-style app that counts down to the big day.