A developer built an HTML5 parsing library from scratch with LLMs, and it passes 100% of the tests.

- December 16, 2025

Emil Stenström, Head of AI Product at Odevo, shared his experience building JustHTML, an HTML5 parser written entirely in Python with no external libraries. He relied on LLMs as his key programming aid.

The goal was a library that passes the html5lib-tests suite. Stenström started with GitHub Copilot in Agent Mode, building from scratch until the parser passed 30% of the tests, and then hit a wall. He switched to Claude Sonnet 3.7 and made gradual improvements until the pass rate reached 100%, but the code was very slow. Along the way he ported the tokenizer to Rust, which improved performance, although Stenström did not understand the Rust code himself.

He then looked at html5ever, a highly efficient HTML parser written in Rust. He ported its structure and kept developing the project until it passed all the tests. Finally, he used Gemini 3 Pro to optimize the Python code directly, which made it significantly faster. Next, he ran the tests to find and remove unused code, and ran a fuzzer that generated 3 million sample inputs to check for potential crashes. At that point, the library was ready for use.
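The fuzzing step can be illustrated with a minimal sketch. This is not Stenström's fuzzer and does not use JustHTML's API: `FRAGMENTS`, `random_document`, `fuzz`, and `fragile_parse` are hypothetical names chosen for illustration, and the toy parser stands in for whatever parser is under test. The idea is simply to feed a parser many random, malformed inputs and record every input that raises an exception.

```python
import random

# Hand-picked pieces that tend to stress HTML parsers (an illustrative list,
# not taken from the JustHTML project).
FRAGMENTS = [
    "<", ">", "</", "<!--", "-->", "<div", "<p>", "</p>", "&amp;", "&#x",
    '"', "'", "=", " ", "a", "<![CDATA[", "\x00", "<script>", "</script>",
]


def random_document(rng, max_parts=20):
    """Glue a random number of fragments into one (usually malformed) input."""
    count = rng.randint(0, max_parts)
    return "".join(rng.choice(FRAGMENTS) for _ in range(count))


def fuzz(parse, iterations=1000, seed=0):
    """Call parse() on random inputs and collect every input that raises."""
    rng = random.Random(seed)  # fixed seed keeps each run reproducible
    crashes = []
    for _ in range(iterations):
        document = random_document(rng)
        try:
            parse(document)
        except Exception as exc:  # any exception counts as a crash here
            crashes.append((document, exc))
    return crashes


def fragile_parse(document):
    """Toy stand-in parser (hypothetical) that chokes on NUL characters."""
    if "\x00" in document:
        raise ValueError("unexpected NUL character")
    return document


if __name__ == "__main__":
    found = fuzz(fragile_parse, iterations=10_000)
    print(f"{len(found)} crashing inputs found")
```

Seeding the generator means any crashing input can be reproduced and turned into a regression test. A real run would pass the actual parser's entry point to `fuzz` in place of `fragile_parse`.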

Stenström summarized five lessons learned from developing the JustHTML library and coding with AI:

1. Give the AI measurable goals: Don't just tell it to improve the code, but specify the tests you want to run.
2. Always review the code the AI writes, and learn from it.
3. Encourage the AI to rethink. If the code looks bad, simply telling the AI you don't like what it wrote can prompt it to suggest new approaches.
4. Always store code in version control for rollback.
5. Allow the AI to make mistakes so it can learn from them. Don't fix every single error.
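
The first lesson, a measurable goal, can be sketched as a pass-rate number that an agent is asked to raise. The example below is an assumption-laden illustration: `pass_rate`, `toy_tag_names`, and `CASES` are hypothetical, the cases are not in the html5lib-tests file format, and the toy function is not JustHTML. It only shows how "improve the code" becomes "make this number reach 1.0".

```python
import re


def pass_rate(func, cases):
    """Return (fraction of cases passed, list of failing inputs)."""
    cases = list(cases)
    failures = []
    for source, expected in cases:
        try:
            ok = func(source) == expected
        except Exception:
            ok = False  # a crash counts as a failed case
        if not ok:
            failures.append(source)
    rate = (len(cases) - len(failures)) / len(cases) if cases else 0.0
    return rate, failures


def toy_tag_names(html):
    """Deliberately naive stand-in: list the names of opening tags."""
    return re.findall(r"<([a-zA-Z][a-zA-Z0-9]*)", html)


# Hypothetical cases: (input, expected lowercase names of real opening tags).
CASES = [
    ("<p>One<p>Two", ["p", "p"]),
    ("<DIV><span>x</span></DIV>", ["div", "span"]),  # toy does not lowercase
    ("plain text", []),
    ("<!-- <b> -->", []),  # toy wrongly sees a tag inside the comment
]

if __name__ == "__main__":
    rate, failures = pass_rate(toy_tag_names, CASES)
    print(f"pass rate: {rate:.0%}")
    for source in failures:
        print("FAILED:", source)
```

The list of failing inputs gives the agent concrete cases to work on, and the rate gives an unambiguous signal of whether a change helped.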

The JustHTML project took almost a year to develop (judging by the different AI models used; Gemini 3 Pro was released only recently). The code totals 3,000 lines. Stenström says he could not have developed it this quickly without an AI agent, though the work still required a significant amount of code review and decision-making.
