When LLMs Try to Draw: A Weird Zoo of AI Line Art

Or: Vibe-coding my way through Cursor, I got GPT, Gemini, and Claude to sketch animals with JSON polylines — no diffusion, no code, just vibes.

Aug 25, 2025

A few days ago, I stumbled across an intriguing LessWrong post where someone used different LLMs to describe Earth's geography in the simplest way possible: by asking each model to classify individual latitude/longitude points as land or water. The results were fascinating.

This got me thinking: if language models can conceptualize geography, what happens when you ask them to create visual art? Not generate images through diffusion models, but actually construct line drawings using pure coordinate geometry. How well can an LLM that was never explicitly trained on this task do it?

Thus began my weekend project: getting 9 different language models to draw animals using nothing but JSON polylines, then rendering those drawings into actual images. The twist? I wouldn't write a single line of code myself.

The Method: "Vibe Coding" with Cursor

Instead of painstakingly coding a pipeline, I “vibe coded” it in Cursor. My process was beautifully simple:

Open Cursor, describe my vision, and let Cursor write the code. I started with nine models (the list can be customized); each model received an identical prompt requesting JSON output with normalized coordinates (0-1 space) and named polyline strokes. The five animals I picked were the African elephant, giraffe, domestic dog, domestic cat, and bald eagle.
More often than not, Cursor detected when things went wrong and fixed them itself, which was amazingly helpful! It handled API integration quirks (GPT-5 models don't support temperature=0!?), error handling for malformed responses, batch processing logic, and even helpful emoji-filled progress indicators. The code quality is genuinely impressive, better than what many humans would write. Occasionally I had to explicitly describe a problem to Cursor to get it fixed, or instruct it to make particular tweaks.
Repeat until everything worked.
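
To make the requested output concrete, here is a minimal sketch of what one such drawing might look like and how it could be sanity-checked before rendering. The field names (`animal`, `strokes`, `name`, `points`) and the `validate_drawing` helper are assumptions for illustration; the post does not spell out the exact schema or the checks Cursor generated.

```python
import json

# Hypothetical example of a model reply: named strokes, each a polyline
# of [x, y] points in normalized 0-1 space. The schema is an assumption.
SAMPLE = """
{
  "animal": "domestic cat",
  "strokes": [
    {"name": "body", "points": [[0.2, 0.6], [0.7, 0.6], [0.7, 0.8], [0.2, 0.8], [0.2, 0.6]]},
    {"name": "tail", "points": [[0.7, 0.65], [0.85, 0.5], [0.9, 0.35]]}
  ]
}
"""


def validate_drawing(drawing):
    """Return a list of problems found; an empty list means the drawing is usable."""
    problems = []
    strokes = drawing.get("strokes") if isinstance(drawing, dict) else None
    if not isinstance(strokes, list) or not strokes:
        return ["missing or empty 'strokes' list"]
    for i, stroke in enumerate(strokes):
        if not isinstance(stroke, dict):
            problems.append(f"stroke {i}: not an object")
            continue
        label = stroke.get("name", f"#{i}")
        points = stroke.get("points")
        if not isinstance(points, list) or len(points) < 2:
            problems.append(f"stroke {label}: needs at least two points")
            continue
        for p in points:
            # Each point must be a pair of real numbers inside the unit square.
            ok = (
                isinstance(p, list)
                and len(p) == 2
                and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in p)
                and all(0.0 <= v <= 1.0 for v in p)
            )
            if not ok:
                problems.append(f"stroke {label}: bad point {p!r}")
    return problems


drawing = json.loads(SAMPLE)
print(validate_drawing(drawing))
```

Returning a list of problems instead of raising on the first one makes it easier to see at a glance how badly a given model strayed from the requested format.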

The result? An evaluation pipeline consisting of:

A JSON generation orchestrator that prompts different models and collects their JSON-based “line drawings”
A matplotlib-based renderer that converts JSON polylines into PNG images
A config loader for API key management and other configuration
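
The renderer step in the list above can be sketched roughly as follows. This is not the code Cursor produced, only an illustration of the idea; the `strokes`/`points` field names, the function names, and the choice to flip the y-axis (treating y as growing downward, like image coordinates) are all assumptions.

```python
def stroke_to_xy(stroke):
    """Split a stroke's [x, y] points into separate x and y lists for plotting."""
    xs = [float(p[0]) for p in stroke["points"]]
    ys = [float(p[1]) for p in stroke["points"]]
    return xs, ys


def render_drawing(drawing, out_path, size_inches=4.0, dpi=150):
    """Draw every stroke as a black polyline and save the result as a PNG."""
    # Imported lazily so the helper above works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # headless backend: write files, open no windows
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(size_inches, size_inches))
    for stroke in drawing["strokes"]:
        xs, ys = stroke_to_xy(stroke)
        ax.plot(xs, ys, color="black", linewidth=2)
    ax.set_xlim(0, 1)
    ax.set_ylim(1, 0)  # assumption: y grows downward, as in image coordinates
    ax.set_aspect("equal")  # keep the unit square square
    ax.axis("off")  # no ticks or frame, just the line art
    fig.savefig(out_path, dpi=dpi)
    plt.close(fig)  # free the figure when rendering many drawings in a batch
```

Calling `render_drawing(drawing, "cat.png")` on a parsed reply would write one image per model and animal. If a model assumes y grows upward instead, its animals come out upside down, which is one more reason to state the convention in the prompt.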

Here’s the lineup of nine contestants: gpt-4o, gpt-5-nano, gpt-5-mini, gpt-5, gemini-1.5-pro, gemini-2.5-flash, gemini-2.5-pro, claude-sonnet-3.7, claude-sonnet-4
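
For a sense of scale, the batch amounts to one request per model and animal pair. The sketch below builds that grid without calling any API. The prompt wording and the job dictionary layout are hypothetical, the model names are the labels used in this post rather than exact API identifiers, and it assumes the non-GPT-5 models were run at temperature 0, which the temperature quirk mentioned earlier suggests but the post does not state outright.

```python
from itertools import product

MODELS = [
    "gpt-4o", "gpt-5-nano", "gpt-5-mini", "gpt-5",
    "gemini-1.5-pro", "gemini-2.5-flash", "gemini-2.5-pro",
    "claude-sonnet-3.7", "claude-sonnet-4",
]
ANIMALS = ["African elephant", "giraffe", "domestic dog", "domestic cat", "bald eagle"]

# Hypothetical prompt wording; the post does not quote the real prompt.
PROMPT_TEMPLATE = (
    "Draw a {animal} as JSON: a list of named polyline strokes, "
    "each with points in normalized 0-1 coordinates. Reply with JSON only."
)


def build_jobs(models=MODELS, animals=ANIMALS):
    """Return one request description per (model, animal) pair."""
    jobs = []
    for model, animal in product(models, animals):
        job = {
            "model": model,
            "animal": animal,
            "prompt": PROMPT_TEMPLATE.format(animal=animal),
        }
        # GPT-5 models reject temperature=0, so only set it for the others.
        if not model.startswith("gpt-5"):
            job["temperature"] = 0
        jobs.append(job)
    return jobs


jobs = build_jobs()
print(len(jobs))  # 9 models x 5 animals = 45
```

Keeping the prompt identical across models and varying only the model name is what makes the drawings comparable.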

The Results: A Weird Zoo of Wobbly Outlines

Within the same family of models, bigger variants usually did better than the smaller ones. Smaller models often devolved into geometric blobs. As you can see below, GPT-4o did the absolute worst. Gemini-2.5-flash produced what I think are the best drawings. Claude Sonnet 4’s dog looked like an alien donkey with antennae. GPT-5-nano went for something rodent-like. Definitely a weird kind of zoo. Gemini-2.5-pro even added hexagonal “spots”, an imaginative touch!
Broadly speaking, Google’s Gemini generated better drawings than OpenAI’s GPT models. Anthropic’s Claude series had issues generating valid JSON. A different prompt with more detailed instructions helped Claude produce more workable drawings, but I haven’t included them, to keep the comparison apples to apples.
Latency and API woes. All GPT-5 variants were quite slow: mini and nano averaged just over 67 seconds per prompt, and GPT-5 averaged over 137 seconds per prompt! API access felt throttled compared to simply pasting the same prompt into the chat interface, which responded faster. GPT-4o was the fastest at 5 seconds per prompt, yet it performed worst! The Gemini 2.5 models took under 45 seconds on average, and Gemini 1.5 Pro averaged 11 seconds.
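
On the invalid-JSON problem noted above: models often wrap their JSON in a code fence or surround it with chatty prose, and a tolerant parser can recover some of those replies. The post does not show how its pipeline handled malformed responses, so the `extract_json` helper below is a hypothetical sketch of one possible approach, not the method actually used.

```python
import json
import re

TICKS = "`" * 3  # a code-fence marker, built indirectly to keep this listing readable
FENCE = re.compile(TICKS + r"(?:json)?\s*(.*?)" + TICKS, re.DOTALL | re.IGNORECASE)


def extract_json(reply):
    """Best-effort parse of a model reply; returns None if no JSON object is found."""
    # Try, in order: the raw reply, any fenced blocks, then the outermost {...} span.
    candidates = [reply]
    candidates += FENCE.findall(reply)
    start, end = reply.find("{"), reply.rfind("}")
    if 0 <= start < end:
        candidates.append(reply[start:end + 1])
    for text in candidates:
        try:
            parsed = json.loads(text.strip())
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None


print(extract_json('Sure! {"strokes": []} Hope that helps.'))
```

A reply that is truncated mid-array or contains trailing commas still comes back as `None`, so this rescues only the easy cases. For a fair comparison, any such leniency would need to be applied to every model equally.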

Closing Thoughts

I started this experiment as a fun curiosity, vibe-coding my way through Cursor. Language models do seem to have a kind of mental map of animals. That map, however, is fuzzy, distorted, and full of amusing failures: diagonal rectangles for cats, alien-donkeys for dogs, and cartoonish polygons for bald eagles.

I did try providing more detailed instructions (e.g., “make sure the elephant has tusks and four legs”), but that didn’t really help much. Perhaps there exist better prompts that can help here!

What do you think - which of these sketches is your favorite? Any ideas worth pursuing in this general area?

ModelAnalysis.ai
