I’ve been GPU poor for a long time: my home workstation runs a 16 GB RTX 5060 Ti, and that has been the ceiling on what I can tinker with locally.

That ceiling just moved. I bought — well, financed — a DGX Spark.

Okay, fine: technically it’s a Lenovo ThinkStation PGX, one of the OEM versions of NVIDIA’s DGX Spark reference design. But “DGX Spark” is the name everyone actually knows, so that’s what I’m calling it. Either way, the silicon underneath is the same: a GB10 Grace Blackwell machine, with an SoC Blackwell GPU (sm_121) sharing a 128 GB pool of unified memory with the CPU. It’s the kind of box that’s about to show up everywhere now that the same chip is heading into laptops as the RTX Spark.

So naturally, the first thing I did was put it to work.

I’ve been bringing NVFP4 KV cache (native 4-bit KV on consumer and SoC Blackwell) to the local-inference stacks, validated across Gemma 3, Gemma 4, and DiffusionGemma. On a bandwidth-bound machine the KV cache is the single most valuable thing to shrink, because every decoded token has to read the cached keys and values back out of memory. And it turns out that in most cases you can take it to 4 bits without the quality falling over.
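To make “4-bit KV” concrete, here’s a minimal pure-Python sketch of NVFP4-style quantization as NVIDIA describes the format publicly: E2M1 (FP4) values, one FP8 E4M3 scale per 16-element block, and an FP32 per-tensor scale. It is not the kernel code from my patches. Choosing the tensor scale as `amax / (6 * 448)` so every block scale fits E4M3’s range is an assumption about one common recipe, and `kv_cache_bytes` plus the model dimensions used with it are hypothetical, for sizing arithmetic only.

```python
import math

# E2M1 (FP4) magnitudes; a separate sign bit covers the negatives.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
E4M3_MAX = 448.0
BLOCK = 16                      # elements that share one FP8 block scale
NVFP4_BITS = 4 + 8 / BLOCK      # 4.5 bits/element (tensor scale amortized away)


def round_e4m3(x: float) -> float:
    """Round a non-negative float to the nearest FP8 E4M3 value, saturating at 448."""
    if x <= 0.0:
        return 0.0
    x = min(x, E4M3_MAX)
    if x < 2.0 ** -6:
        step = 2.0 ** -9                              # subnormal spacing
    else:
        step = 2.0 ** (math.floor(math.log2(x)) - 3)  # 3 mantissa bits
    return min(round(x / step) * step, E4M3_MAX)


def round_e2m1(x: float) -> float:
    """Round to the nearest signed E2M1 value, saturating at +/-6."""
    mag = min(abs(x), E2M1_MAX)
    best = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return math.copysign(best, x)


def quantize_nvfp4(values):
    """Two-level scaling: one FP32 tensor scale, one E4M3 scale per 16-value block."""
    amax = max((abs(v) for v in values), default=0.0)
    # Assumed recipe: map the tensor's amax onto (max E2M1) * (max E4M3).
    tensor_scale = amax / (E2M1_MAX * E4M3_MAX) if amax > 0 else 1.0
    blocks = []
    for i in range(0, len(values), BLOCK):
        block = values[i:i + BLOCK]
        block_amax = max(abs(v) for v in block)
        scale = round_e4m3(block_amax / E2M1_MAX / tensor_scale)
        codes = [round_e2m1(v / (scale * tensor_scale)) if scale > 0 else 0.0
                 for v in block]
        blocks.append((scale, codes))
    return tensor_scale, blocks


def dequantize_nvfp4(tensor_scale, blocks):
    """Rebuild approximate values: code * block scale * tensor scale."""
    out = []
    for scale, codes in blocks:
        out.extend(c * scale * tensor_scale for c in codes)
    return out


def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bits_per_elem):
    """Hypothetical sizing helper: K and V tensors for every layer and token."""
    return 2 * layers * kv_heads * head_dim * tokens * bits_per_elem / 8


if __name__ == "__main__":
    kv = [0.1 * i - 1.5 for i in range(32)]  # stand-in for one K/V slice
    ts, blocks = quantize_nvfp4(kv)
    err = max(abs(a - b) for a, b in zip(kv, dequantize_nvfp4(ts, blocks)))
    print(f"max abs error: {err:.4f}")
    # Hypothetical model: 32 layers, 8 KV heads of dim 128, 32K-token context.
    for name, bits in (("bf16", 16), ("nvfp4", NVFP4_BITS)):
        gib = kv_cache_bytes(32, 8, 128, 32768, bits) / 2**30
        print(f"{name}: {gib:.3f} GiB")
```

The sizing part is the point for a bandwidth-bound box: at 4.5 effective bits instead of 16, that hypothetical 32K-token cache drops from 4 GiB to 1.125 GiB, roughly 3.6x less data to read per decoded token. Two-level scaling is what keeps the quality from falling over: each 16-value block gets its own scale, so one outlier only costs precision inside its own block.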

Two threads on it have done unexpectedly well:

A longer write-up is coming. For now: the GPU poor days are, at least temporarily, behind me.