Jason Fletcher
VJ Loop Artist
PACK ⬕ Cloud Control
- This pack contains 120 VJ loops (53 GB)

We've all seen strange shapes in a cloud-filled sky. But what if the sky was actually daydreaming? Inspired by Paul Trillo’s short film Etherea, I invited Palpa to collaborate with me on this unique challenge and we jumped right in.

We started off by creating a bunch of motion reference videos that would be used to guide ComfyUI in bringing the clouds to life. I scoured Envato for video clips on each of the themes that I wanted to visualize, such as ballerinas dancing, birds flying, dolphins swimming, horses galloping, and flowers blooming. But often these video clips featured a bunch of other characters that I didn't want included. So I used the Mask Prompter 3 plugin to automatically rotoscope the footage based on just a text prompt. Under the hood this plugin uses the Segment Anything 3 model, which is very powerful and able to do the cutouts with very little cleanup necessary. From there I animated each of the characters in the ways I was imagining; each motion reference is a combination of 5-12 different cutout video clips. It was also useful to have the motion reference videos mimic the appearance of a real sky, with a solid blue background and the characters as white gradients, so I used the Tint, Tritone, or Colorama FX to precisely gradient map each layer in the comp. Then I looped the ending back to the beginning and rendered out these motion references at 512x288 at 12fps.
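If you wanted to rough out that white-on-blue gradient map trick outside of After Effects, a minimal numpy/OpenCV sketch might look like this. The file names and colors here are placeholders, not my actual Tint/Tritone/Colorama settings:

```python
# Rough sketch: turn a grayscale cutout matte into a "cloud on blue sky" frame.
# Placeholder paths/colors; the real comps used Tint/Tritone/Colorama in AE.
import cv2
import numpy as np

SKY_BLUE = np.array([210, 140, 60], dtype=np.float32)     # BGR solid background
CLOUD_WHITE = np.array([255, 255, 255], dtype=np.float32)

def sky_style(matte_gray):
    """Map a grayscale cutout matte to white-on-blue, like the motion refs."""
    m = (matte_gray.astype(np.float32) / 255.0)[..., None]  # 0..1 alpha
    out = SKY_BLUE * (1.0 - m) + CLOUD_WHITE * m             # simple gradient map
    return out.astype(np.uint8)

frame = cv2.imread("cutout_frame.png")                       # placeholder file
matte = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)              # luma as the matte
cv2.imwrite("motion_ref_frame.png", sky_style(matte))
```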

Palpa worked in ComfyUI to carefully engineer a vid2vid workflow which combined a text prompt, image reference, custom LoRA, control nets, and motion reference to guide the AnimateDiff model, with the Dreamshaper model as the generator backbone (a fine-tuned version of the Stable Diffusion 1.5 model). It took a series of experiments to nail down the ideal slider values so that the clouds rode the line between following the motion reference and still appearing as a typical cloud formation. What I love about AnimateDiff, and how Palpa uses it, is that it visualizes stuff I can see in my head but cannot animate on my own. So satisfying! But since VRAM limits the maximum duration of a video clip that can be rendered out, we aimed for motion reference videos no longer than 1 minute in total. Even so, each one of these "Base" clips took several hours to render for just this first step of the pipeline, rendered out at 1024x576 at 12fps. A few of these "Base" clips were beautiful in themselves and made it into the VJ pack after being uprezzed in Topaz Video AI. But the other "Base" clips contained some artifacts and needed to be refined in the next step of the pipeline.
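Everything was built as a ComfyUI graph, so there's no single script to share, but for the curious, the general recipe can be sketched with the diffusers library: an AnimateDiff motion module on top of an SD 1.5 Dreamshaper checkpoint, a custom LoRA, and the motion reference fed in as the vid2vid source. The model IDs, paths, and slider values below are illustrative assumptions, not our actual settings:

```python
# Minimal AnimateDiff vid2vid sketch in diffusers (illustrative, not our graph).
import imageio.v3 as iio
import torch
from diffusers import AnimateDiffVideoToVideoPipeline, MotionAdapter
from diffusers.utils import export_to_video
from PIL import Image

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
pipe = AnimateDiffVideoToVideoPipeline.from_pretrained(
    "Lykon/DreamShaper",                  # SD 1.5 fine-tune as the backbone
    motion_adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("./abstract_clouds_lora.safetensors")  # placeholder path

# Motion reference rendered out of AE at 512x288, 12fps.
frames = [Image.fromarray(f) for f in iio.imiter("motion_ref.mp4")]
result = pipe(
    prompt="towering cumulus clouds shaped like dancers, blue sky",
    video=frames,
    strength=0.65,             # how far the clouds may drift from the reference
    guidance_scale=7.5,
    num_inference_steps=25,
)
export_to_video(result.frames[0], "base_render.mp4", fps=12)
```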

After doing a few tests, though, we realized that the Dreamshaper model was likely trained on realistic clouds but not on abstract cloud shapes. So we collected various images from Pixabay.com of things that we wanted to make into clouds and then used Nano Banana Pro to reimagine those images into what we wanted. Now we had a small dataset of 30 images of abstract clouds in exactly the style we wanted, and we used it to train a custom LoRA for SD 1.5. Typically we've relied on the CivitAI LoRA trainer, but the tool was offline at the time. So instead I did some research and used the OneTrainer codebase to train a LoRA locally on my tower. Then we loaded the custom LoRA into the "Base" Comfy workflow and it solved the issues we were having with the "Base" renders.
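The dataset prep for a LoRA like this is pretty simple: the 30 images plus one caption text file per image, with a trigger word baked into each caption. A tiny helper along these lines (the folder, trigger word, and caption text are made-up stand-ins) writes the caption files in the image-plus-.txt layout that most SD 1.5 LoRA trainers can consume:

```python
# Write one caption .txt per training image, with a shared trigger word.
# Folder name, trigger word, and caption text are placeholders for illustration.
from pathlib import Path

DATASET = Path("./abstract_clouds_dataset")
TRIGGER = "abstrcloud"  # hypothetical rare token to bind the LoRA concept to

for img in sorted(DATASET.glob("*.png")):
    caption = f"{TRIGGER}, abstract cloud formation, blue sky, soft volumetric light"
    img.with_suffix(".txt").write_text(caption)
    print(f"{img.name} -> {img.with_suffix('.txt').name}")
```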

After this step of the pipeline, we loaded the "Base" video back into ComfyUI and ran it again through a different vid2vid "Rebase" workflow that aimed to refine the visuals and slightly uprez them to 1200x680 at 12fps. This workflow is a customized version of the HighRes-Fix setup which Palpa has engineered. It's an elegant solution that uses a combination of the DepthAnythingV2 and canny control nets to keep the "Rebase" render very close to the original "Base" video clip, with the help of a text prompt, image reference, and custom LoRA so that new details could be imagined. But it is only through the many prior iterations of experimentation that Palpa has managed to create this masterpiece. Major props to Palpa!
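The control signals themselves are easy to reproduce outside of Comfy. Here's a hedged sketch of how the canny and depth maps for a single frame could be derived; the edge thresholds and the Depth Anything V2 checkpoint ID are my assumptions, not Palpa's tuned values:

```python
# Derive canny + depth control images for one frame (illustrative values).
import cv2
from PIL import Image
from transformers import pipeline

frame = cv2.imread("base_frame.png")                  # placeholder Base frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Canny edges keep the "Rebase" render locked to the Base clip's structure.
edges = cv2.Canny(gray, 100, 200)                     # assumed thresholds
cv2.imwrite("control_canny.png", edges)

# Depth map via a Depth Anything V2 checkpoint from the Hugging Face hub.
depth_estimator = pipeline(
    "depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf"
)
depth = depth_estimator(Image.open("base_frame.png"))["depth"]
depth.save("control_depth.png")
```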

Then we took the "Rebase" and a few of the "Base" video clips into Topaz Video and uprezzed them to 3840x2160 (using the Gaia model) and did an 8x slowmo frame interpolation (using the Apollo model). Palpa has figured out that 12fps is the minimum frame rate that can be interpolated and still look wonderfully smooth for most use-cases. Then I rendered out everything to frame sequences and imported the footage into After Effects, where I did some minor color correction.
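For a sense of the frame counts involved, a quick back-of-the-envelope in Python (the one-minute clip length is a placeholder based on our max "Base" duration):

```python
# Back-of-the-envelope frame math for the 8x slowmo interpolation.
clip_seconds = 60          # placeholder: roughly our longest Base clip
base_fps = 12
interp_factor = 8

base_frames = clip_seconds * base_fps            # 720 frames
interp_frames = base_frames * interp_factor      # 5760 frames
print(interp_frames, "frames ->", interp_frames / 60, "s at 60fps playback")
# 5760 frames played back at 60fps = 96 seconds of smooth slow motion
```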

I had a late night idea, which was to use the same motion references as before but this time generate rainbows and prisms. So we created another LoRA focusing on abstract rainbows, used the "Base" workflow to render out video clips, processed the footage in Topaz Video, and then imported it into After Effects. These renders are wonderfully bizarre and exactly what I had in mind. But the rainbow footage was getting blown out when I layered it on top of the cloud renders and applied the screen blend mode. So I used the rainbow footage as a luma track matte against the clouds so that it could perfectly cut out the area behind the rainbows and allow the rainbow colors to shine through. In my opinion this technique is akin to sidechaining within a DAW app.
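Expressed as pixel math, the luma track matte trick uses the rainbow layer's luminance to carve a hole in the clouds and then adds the rainbow back in, instead of screening it on top. A rough single-frame numpy sketch (placeholder files, and a simplification of AE's actual matte handling):

```python
# Luma track matte instead of screen blend (single-frame numpy sketch).
import cv2
import numpy as np

clouds = cv2.imread("clouds.png").astype(np.float32) / 255.0    # placeholder
rainbow = cv2.imread("rainbow.png").astype(np.float32) / 255.0  # placeholder

# The rainbow layer's luminance acts as the matte.
luma = cv2.cvtColor((rainbow * 255).astype(np.uint8), cv2.COLOR_BGR2GRAY)
matte = (luma.astype(np.float32) / 255.0)[..., None]

# Carve out the clouds behind the rainbow, then let the rainbow shine through.
composite = clouds * (1.0 - matte) + rainbow * matte
cv2.imwrite("composite.png", (composite * 255).astype(np.uint8))

# For comparison, the screen blend that blew out the highlights:
# screen = 1 - (1 - clouds) * (1 - rainbow)
```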

With that issue fixed, I wondered if I could add more rainbows into the rainbow footage. So I did some experiments where I pre-comped the rainbow footage, applied a gradient ramp onto a solid so that I could have a mapping space, applied the Colorama FX with some rainbow gradients, keyframed the Phase Shift attribute, and set the layer to the lighten blend mode. This allowed the rainbow colors in the footage to be augmented by the animated rainbows of the Colorama FX. Meta rainbows forever.
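Under the hood this layering boils down to a lighten blend between the footage and an animated gradient map. As pixel math (the horizontal ramp and phase animation are stand-ins for my actual Colorama settings):

```python
# Lighten blend with a phase-shifted rainbow ramp (stand-in for Colorama FX).
import cv2
import numpy as np

frame = cv2.imread("rainbow_precomp.png").astype(np.float32) / 255.0  # placeholder
h, w = frame.shape[:2]

# Horizontal ramp as the mapping space; the phase sweeps the hues over time.
ramp = np.tile(np.linspace(0.0, 1.0, w, dtype=np.float32), (h, 1))
phase = 0.25                                    # keyframed 0..1 over time in AE
hue = (((ramp + phase) % 1.0) * 179).astype(np.uint8)
hsv = np.stack([hue, np.full_like(hue, 255), np.full_like(hue, 255)], axis=-1)
rainbow = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR).astype(np.float32) / 255.0

lighten = np.maximum(frame, rainbow)            # lighten blend = per-channel max
cv2.imwrite("meta_rainbow.png", (lighten * 255).astype(np.uint8))
```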

And of course it's time to do some slitscan processing. There's this wonderfully wild warping that happens when the motion vectors in the footage match up with the slitscan vectors; it's kinda like a distortion whip. Normally it's tricky to make happen on purpose, but with this footage the clouds move in all directions and so it happens frequently on its own. Figuring out which slitscan variations work nicely for a given clip has always been a slow process of previewing and curating. But I realized that I could render out some low resolution proxy clips at 512x288, pick the ideal slitscan settings per clip, and then hand off those settings to a 1920x1080 comp. This saved a bunch of time. The 1920x1080 comp typically works fine when the input footage features lots of black background, as I've often done in the past, but this footage contained complex gradients and therefore significantly increased the amount of image data for the slitscan processing to ingest. And so the render times ballooned to 12 hours per scene. Over the years I've optimized my slitscan setup in After Effects: the input footage is a JPG sequence (240 fps) stored on an NVMe drive, Multi-Frame Rendering is enabled, 16 bpc mode is required to avoid time aliasing, and the Mercury Playback Engine is disabled. And yet I had selected 30 scenes that I wanted to slitscan, which is just too heavy of a render job for my tower. So I was just about to do some curation and slim down the number of scenes, when I had a realization...
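For anyone new to the technique, the core of slitscan is just remapping time per scanline: each row of the output frame is sampled from a different frame in a rolling time buffer. A bare-bones sketch (the buffer depth and vertical scan direction are placeholders, not my actual AE setup):

```python
# Bare-bones vertical slitscan: each output row samples a different past frame.
import numpy as np

def slitscan(frames, depth):
    """frames: (T, H, W, C) array; depth: how many frames the scan spans."""
    T, H, W, C = frames.shape
    out = np.empty_like(frames)
    for t in range(T):
        for y in range(H):
            # Row y looks further back in time the lower it sits in the frame.
            offset = int(y / H * (depth - 1))
            out[t, y] = frames[max(t - offset, 0), y]
    return out

# Tiny synthetic test: 240 frames of 64x64 noise, scan spanning 120 frames.
frames = (np.random.rand(240, 64, 64, 3) * 255).astype(np.uint8)
warped = slitscan(frames, depth=120)
```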

Inspired by the earlier steps in this project, I realized that I could downrez the input footage to 1200x680 and then the slitscan render would only take 12 minutes per scene. I did the math and it actually made sense: substantially fewer pixels to process per frame made a huge difference. But it still required that I use Topaz to preprocess the input footage to 1200x680 at 240fps, so as to avoid the time aliasing artifacts that are common in slitscan renders. After I interpolated the 24 pieces of footage in Topaz Video, I ended up with 846,760 interpolated frames... The insane things I do for slitscan! A huge bonus that I didn't realize until later on is that the reduced 1200x680 resolution gives the slitscan effect more time resolution to squeeze onto the canvas, which effectively cleaned up any remnants of the time aliasing artifacts that I've seen in prior VJ packs. So it's not an exaggeration to say these are my finest slitscan renders yet, and it's quite satisfying to further refine my slitscan technique. From here I took the slitscan renders into Topaz Video and uprezzed them from 1200x680 at 60fps to 3840x2160 at 60fps. It really blows my mind how much can be pulled off with all these various stages of interpolation. Feet on the ground, head in the clouds, brain in space, spirit far away.
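The back-of-the-envelope math, for the curious: 1920x1080 is about 2.07 million pixels per frame versus about 0.82 million at 1200x680, roughly 2.5x less data for every one of the 240fps input frames the slitscan has to ingest. A quick sanity check in Python (the per-clip figures are simply derived from the totals above):

```python
# Sanity-check the downrez savings and the interpolated frame totals.
full_px = 1920 * 1080        # 2,073,600 pixels per frame
small_px = 1200 * 680        #   816,000 pixels per frame
print(f"pixel reduction: {full_px / small_px:.2f}x")    # ~2.54x per frame

total_frames = 846_760       # interpolated frames across all clips
clips = 24
per_clip = total_frames / clips
print(f"~{per_clip:,.0f} frames per clip, ~{per_clip / 240:.0f}s each at 240fps")
```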

Released March 2026