Jason Fletcher
Motion Designer
PACK ⬕ Robotics Foundry
- This pack contains 83 VJ loops (71 GB)

The AI software is here and maturing quickly. And I think having it control all sorts of robot hardware is just around the corner. Things are gonna get weirder.

I finally made the jump to the A1111 Stable Diffusion web UI, and it renders images so much faster thanks to the xFormers library being compatible with my GPU. There are also tons of unique extensions that people have shared, and I have much spelunking to do. I figured out how to run two instances of A1111 so that both of my GPUs can render different jobs, which is hugely beneficial.
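Running two instances comes down to pinning each one to its own GPU and port. A minimal sketch, assuming the stock webui launch flags (`--device-id`, `--port`, `--xformers`); the launcher path is a placeholder:

```python
# Sketch: build launch commands for two A1111 instances, one per GPU,
# each on its own port. The flags are standard webui options; the
# launcher path here is a placeholder.
import subprocess

WEBUI = "launch.py"  # hypothetical path to the A1111 launcher

def build_cmd(gpu, port):
    # Each instance gets a dedicated GPU and a dedicated port.
    return ["python", WEBUI, f"--device-id={gpu}", f"--port={port}", "--xformers"]

cmds = [build_cmd(0, 7860), build_cmd(1, 7861)]
# In practice: procs = [subprocess.Popen(c) for c in cmds]
```

Each instance then gets its own browser tab, so both GPUs can chew through separate render queues.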

For the last few months I've been running backburner experiments inputting various videos into SD to see how it extrapolates upon a frame sequence. The "Stop Motion" scenes are the fruit of these experiments. The main trouble I've had is that the exported frames are very jittery, which I'm typically not a fan of. This is because the input video frames are used as the noise source for the diffusion process, so I have to set the Denoising Strength between 0.6 and 0.8 to give SD enough room to extrapolate on top of the input frames. SD has no temporal awareness; it assumes you're exporting a solo frame, not an animated frame sequence, so all of this is a hack. But I found that if I chose a subject matter such as robots, then I could embrace the stop motion animation vibe and match the feeling of incessant tech upgrades that we are currently living through.
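The per-frame hack can be scripted against A1111's HTTP API (launch the webui with `--api`). A hedged sketch building one img2img request per input frame; the endpoint `/sdapi/v1/img2img` and the payload fields are the stock API names, while the prompt, seed, and frame bytes are placeholders of my own:

```python
# Sketch: one img2img call per frame, with a locked seed so only the
# init image changes between frames.
import base64

def frame_payload(frame_png_bytes, prompt, seed=1234, denoise=0.7):
    return {
        "init_images": [base64.b64encode(frame_png_bytes).decode()],
        "prompt": prompt,
        "seed": seed,                   # locked seed keeps the style consistent across frames
        "denoising_strength": denoise,  # 0.6-0.8 leaves SD room to extrapolate
    }

frames = [b"\x89PNG-frame-0", b"\x89PNG-frame-1"]  # stand-ins for real PNG bytes
payloads = [frame_payload(f, "rusty robot, stop motion") for f in frames]
# In practice: requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=p)
```

Because each frame is diffused independently, the jitter described above is baked in; the locked seed and denoise range just keep it within a consistent style.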

I tried all sorts of different videos, but ultimately my StyleGAN2 videos with a black background were by far the most successful inputs for SD. I believe this is because my SG2 videos typically feature slowly morphing content. Plus the black background allows SD to focus the given text prompt onto a single object, narrowing its focus and shortcutting SD into strange new territories. But the real key is inputting a video whose content contextually parallels the SD text prompt, at least in the overall form, but definitely in the necessary color palette; SD's dreaming is limited in that regard. Also, finding the ideal SD seed to lock down is important, since many seeds didn't match the style I was aiming for.

My initial tests at 60fps were far too intense and difficult to watch. So I experimented with limiting the frame rate of the input video and landed on exporting a 30fps frame sequence from After Effects. After processing the AE frames in SD, I passed the SD frames into Topaz Video AI and interpolated every other frame to bring it back up to 60fps. Typically I don't interpolate footage that moves this fast, since it makes things feel too morphy, but in this context I think it gives the stop motion aspect a buttery quality.
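Topaz does motion-aware interpolation; as a rough illustration of the 30→60fps step, here's a naive midpoint-blend doubler in numpy. The function name and array shapes are my own, and a real interpolator estimates motion rather than cross-fading:

```python
import numpy as np

def double_fps(frames):
    # frames: (T, H, W, C) float array at 30fps. Returns (2T-1, ...) at
    # roughly 60fps by inserting a 50/50 blend between each adjacent
    # pair. A crude stand-in for motion-aware interpolation (Topaz, etc.).
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        out.append(a)
        out.append((a + b) / 2.0)  # naive cross-fade midpoint
    out.append(frames[-1])
    return np.stack(out)
```

On fast-moving footage a blend like this reads as ghosting, which is exactly the "morphy" quality described above; motion estimation is what makes it buttery instead.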

For the "Circuit Map" scenes I grabbed the CPU videos from the Machine Hallucinations pack and ran them through the stop motion technique described above. From there I jammed with the results in After Effects and couldn't resist applying all sorts of slitscan experiments to make it feel as though the circuits are alive in various ways. And of course some liberal use of color saturation and Deep Glow was useful in making it feel electric and pulsing with energy.
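The core idea of slitscan, that each output row samples a different moment in time, fits in a few lines of numpy. A minimal sketch; the function name, array shapes, and delay parameter are my assumptions:

```python
import numpy as np

def slitscan(frames, rows_per_frame_delay=1):
    # frames: (T, H, W, C). Output frame t takes row y from input frame
    # t - y*delay, so lower rows lag further behind in time and the
    # image "smears" through the clip.
    T, H = frames.shape[0], frames.shape[1]
    out = np.empty_like(frames)
    for t in range(T):
        for y in range(H):
            src = max(0, t - y * rows_per_frame_delay)
            out[t, y] = frames[src, y]
    return out
```

In After Effects the equivalent is typically built with the Time Displacement effect driven by a gradient; the numpy version just makes the row-by-row time offset explicit.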

For the "Factory Arm" scenes I wanted an industrial robot arm swinging around and insanely distorting. So I started by creating a text prompt in SD and rendering out 13,833 images. For the first time I didn't curate the dataset by hand and left in any images that showcased strange croppings, which saved tons of time. In the past I've worried that StyleGAN2 would learn the undesired croppings, but I've since learned that with datasets this large these details tend to get averaged out by the gamma, or I can just stay away from the seeds where they become visible. From there I did transfer learning from the FFHQ-512 model and trained it on my dataset until 1296kimg.
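A hypothetical sketch of that transfer-learning run, assuming the stylegan2-ada-pytorch repo's train.py flags (`ffhq512` is one of its stock `--resume` shorthands); the dataset path, output dir, and GPU count are placeholders:

```python
# Sketch: transfer learning from FFHQ-512 with stylegan2-ada-pytorch.
# Paths are hypothetical; the dataset zip would be the ~13.8k SD renders
# packed via the repo's dataset_tool.py.
cmd = [
    "python", "train.py",
    "--outdir=training-runs",
    "--data=datasets/factory-arms.zip",
    "--gpus=2",
    "--resume=ffhq512",  # start from the mature FFHQ-512 weights
    "--freezed=4",       # Freeze-D setting
    "--kimg=1296",       # training duration in kimg
]
# In practice: subprocess.run(cmd) from inside the repo checkout
```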

After that I tried a new experiment that I've been tinkering with lately. I typically train using FreezeD=4, since I've found that this setting allows the model to remain a bit more free flowing in its interpolations when rendering out video. I reason that the super low resolution layers contain very little detail, and it's maybe better to just leave these layers unchanged from the mature original state of the FFHQ model. Maybe this is because I currently rarely train for more than 5000kimg. But I'm just going by intuition here as an amateur, and the devs have shared little about this aspect. Anyways, after the training had stabilized using FreezeD=4, I switched over to FreezeD=13 and trained further until 2532kimg. This allowed the training to progress a little bit faster, about 30 seconds per tick, which adds up to significant time savings... Yet I noticed it's dangerous to switch too early, since it can introduce undesirable blobby artifacts into the model. Using FreezeD=13 means that only the very last layer of a 512x512 model receives training, and all of the smaller resolution layers are frozen. I have found this useful for when the model seems to have hit a threshold and isn't learning any more details, so instead I just focus on the very last layer. I believe this is because the layers are connected in a way that lets smaller resolution layers affect the downstream layers during training, so freezing the smaller layers allows it to train differently. But I need to do more testing, as I'm not confident about this technique.
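The FreezeD switch amounts to resuming the same run from a snapshot with the `--freezed` value raised. A hypothetical sketch, again assuming stylegan2-ada-pytorch's train.py flags; the snapshot filename is made up:

```python
# Sketch: phase two of the run, resuming from a phase-one snapshot with
# a higher Freeze-D layer count. The snapshot path is a hypothetical
# example of the repo's network-snapshot-XXXXXX.pkl naming.
phase2_cmd = [
    "python", "train.py",
    "--outdir=training-runs",
    "--data=datasets/factory-arms.zip",
    "--gpus=2",
    "--resume=training-runs/phase1/network-snapshot-001296.pkl",
    "--freezed=13",  # raise the Freeze-D layer count for the final stretch
    "--kimg=2532",   # continue training toward the 2532kimg mark
]
```

Per the caveat above, switching too early risks blobby artifacts, so it's worth waiting until the FreezeD=4 phase has visibly stabilized before resuming like this.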

From there I had an SG2 model of industrial robot arms moving around in a way that I didn't dig, and I almost discarded it. But as I've often experienced, it's vital to render out about 10,000 seeds, curate through them, and organize a selection into an interpolation video export. Sometimes an SG2 model can look funky when rendering a freeform interpolation video, since it's moving through the latent space without a human guiding it. After that I jammed with the videos by applying a slitscan and then mirroring it. To be honest, I typically steer clear of the mirror effect, since I think it's heavily overused in VJ loops. But it's always good to break your own rules occasionally, and in this context I think it's well deserved, since having multiple industrial robot arms move in unison looks appropriate and really cool.
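Turning a curated seed list into an interpolation path is straightforward, because in stylegan2-ada-pytorch a seed is just a seeded Gaussian z vector. A minimal sketch; the linear interpolation and function names are my own simplification of what the repo's scripts do:

```python
import numpy as np

def seed_latents(seeds, z_dim=512):
    # The same seed always maps to the same z: RandomState(seed).randn(z_dim)
    return np.stack([np.random.RandomState(s).randn(z_dim) for s in seeds])

def interp_path(latents, steps_per_leg=60):
    # Walk the curated latents in order, inserting evenly spaced in-betweens.
    path = []
    for a, b in zip(latents[:-1], latents[1:]):
        for t in np.linspace(0.0, 1.0, steps_per_leg, endpoint=False):
            path.append((1 - t) * a + t * b)
    path.append(latents[-1])
    return np.stack(path)

curated = [97, 4021, 8833]  # hypothetical keeper seeds out of the ~10,000 rendered
path = interp_path(seed_latents(curated), steps_per_leg=60)
# Each z in path is then fed to the generator to render one video frame.
```

Curating the waypoints by hand is what keeps the latent walk from wandering into the funky regions a freeform interpolation would hit.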

For the "Mecha Mirage" scenes I grabbed a bunch of videos from the Machine Hallucinations pack and applied the SD stop motion technique. These were quite satisfying, since they were more in line with how I imagined SD could extrapolate and dream up strange new mutating machines. I think these videos look extra spicy when sped up 400%, but I kept the original speed for all VJ-ing purposes. It is so bizarre what these AI tools can visualize, mashing together things that I would have never fathomed. Again I applied an X-axis mirror effect, since the strange tech equipment takes on a new life, although this time it isn't a traditional mirror: I flipped the X axis and then purposefully overlapped the two pieces of footage with a lighten blend mode, so you don't see a strict mirror line and everything blends better. And the pixel stretch effect was a last minute addition that was some real tasty icing on the cake. I think it works because machines are often symmetrical, and this really drives home that feeling. In the future I want to experiment with Stable WarpFusion, but getting it to run locally is such a pain. Hello my AI friend, what did you have for lunch today?
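The flip-and-lighten comp can be sketched per frame in numpy. The function name and shift amount are mine, and `np.roll` wraps pixels around where AE would simply offset the layer:

```python
import numpy as np

def flip_lighten(frame, shift=40):
    # frame: (H, W, C). Flip on the X axis, offset the copy, then take
    # the per-pixel max ("lighten" blend). The offset is what hides the
    # hard mirror seam.
    flipped = frame[:, ::-1]
    offset = np.roll(flipped, shift, axis=1)  # wraps at the edge; AE would just slide the layer
    return np.maximum(frame, offset)
```

Because lighten keeps whichever pixel is brighter, the two copies melt into each other instead of meeting at a strict mirror line.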

Released July 2023