Making of the Artcorn

This post is a study in how trying out a cool thing so easily gets out of hand, or as we Finns say, slips out of the woollen glove.

My colleague Ville Tainio showed me a small picture he had made with neural-style, a torch implementation of the paper A Neural Algorithm of Artistic Style by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. He had used it on our Social Responsibility Program logo. I saw potential and decided to just quickly try. It’s been three weeks now and not nearly enough sleep.

There are a lot of pictures after the next wall of text. Moving pictures! For the impatient, here’s a two minute video demonstration with music.

“Making of the Artcorn” by Teemu Turunen [CC BY-SA 4.0]

The music composed by Markus Koskinen.

The sciencey stuff

The algorithm implementation (written in Lua and well documented by jcjohnson) uses neural representations to first separate, then recombine, the content and style of different images to create a new image. The neural network tries to work out what is unique in the artist’s style, then applies that style to the provided content.

It uses a convolutional neural network, a feed-forward neural network inspired by biological processes, in which the neurons respond to overlapping regions in the visual field. The network consists of multiple layers of small neuron collections, each looking at a small portion of the input image called a receptive field. When doing propagation, the momentum and weight decay values are chosen to reduce oscillation during stochastic gradient descent. I have, at this point, or any future point in time, no idea what that last sentence means.

This algorithm is, however, pretty well explained in the paper. These neurons are computational units working as a collection of image filters, extracting a certain feature from the input image. Each layer produces a differently filtered version of the image. Through the processing hierarchy the input image is transformed into representations of the actual content of the image, not just pixel values. High-level content representation is in terms of objects and their arrangement.

For the style image, texture information is captured, but the content is not relevant. As a result we have separate content and style representations, which can be used to produce new imagery that still makes sense.

Read the paper for more information. I am just a random guy who wants to make Artcorns.

Artcorns?

I am going to use this marvel of modern science to create surreal versions of our logo. These images will be used to print promotional beverage coasters! We can then sneak some of them into random bars, and… that’s about as far as I’ve planned this.

“Prototype Artcorn Coasters” by Teemu Turunen [CC BY-SA 4.0]

Coaster prototypes

Is that a fat cat under the prototypes? Why yes, yes it is.

Here’s the logo original by Pekka Pulli, the Chilicorn [copyright Futurice, CC BY-SA 4.0]

Chilicorn original

Using the algorithm

There are plenty of parameters you can tweak in the algorithm. I’ll get into some of those later in this post. For the Artcorns (artistically altered Chilicorns) I go with the basics:

-style_image starry_night.jpg (the image from which the style is learned)
-content_image chilicorn_no_text-1024.png (the content to which the learned style is applied)
-image_size 1024 (the size of the output image, in pixels)
-num_iters 1000 (how many iterations are run)
-save_iter 5 (save every 5th iteration to a file)
-output_image starrycorn.png (name of the output image)
-gpu -1 (use the CPU even if a GPU is available)
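Put together, the run is a single command. A sketch using the flags listed above; the `th` launcher and `neural_style.lua` script come from the neural-style checkout, and the file paths are only examples:

```shell
# Sketch of a full neural-style run with the flags listed above.
# Run from inside the neural-style checkout; paths are examples.
th neural_style.lua \
  -style_image starry_night.jpg \
  -content_image chilicorn_no_text-1024.png \
  -image_size 1024 \
  -num_iters 1000 \
  -save_iter 5 \
  -output_image starrycorn.png \
  -gpu -1
```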

The algorithm is run with 1000-2000 iterations. I want to save 200 versions of the developing output while it churns away, both to study them and to combine them into a video. The implementation seems to restrict how frequently you can save the output; using a number as low as five may cause it to decline service and exit.

If that happens, I try save_iter 6 with num_iters 1200, or save_iter 7 with num_iters 1400, which still yields 200 frames. I’ll see if I can disable that particular check at some point; it does not make sense to me.

The resulting two hundred PNG images are combined into an MP4-encoded video using ffmpeg. I created 20-second videos.
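Roughly like this (a sketch; the intermediate frame names are an assumption based on my `-output_image starrycorn.png` run, and 200 frames at 10 fps gives the 20-second length):

```shell
# Copy the ~200 intermediate saves into a zero-padded sequence
# (ls -v sorts starrycorn_5.png before starrycorn_10.png correctly),
# then encode at 10 fps: 200 frames / 10 fps = 20 seconds.
mkdir -p frames
i=0
for f in $(ls -v starrycorn_*.png); do
  i=$((i + 1))
  cp "$f" "$(printf 'frames/frame_%04d.png' "$i")"
done
ffmpeg -framerate 10 -i frames/frame_%04d.png \
  -c:v libx264 -pix_fmt yuv420p starrycorn.mp4
```

The `yuv420p` pixel format keeps the video playable in most players and browsers.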

Environments

I created a GitHub Gist with the installation and operation procedures.

Installing the dependencies for the neural-style algorithm was quite straightforward. I started by running it on a virtualized CentOS server, for which I allocated 10GB of memory, assuming it to be plenty.

“I Got This” by Teemu Turunen, [No Rights Reserved, CC0 1.0 Universal]

I Got This

It was not plenty.

I quickly learned that with 10GB I can create 600px images, but nothing larger. My CentOS server also does not have a GPU suitable for this purpose, and this type of computing is much, much faster on a GPU. By increasing the server memory to 27GB I managed to create 1024px images on the CPU - at seven hours per image. I could maybe even do 1280px, but the time required grows steeply with image size.

That is too much time to wait only to find out that the end result sucks, which is why I booked an AWS G2 instance (g2.2xlarge) that comes with a suitable GPU. Unfortunately that GPU has only 4GB of memory, again limiting the picture size to 512px. A run on the AWS GPU takes only 40 minutes though, which is great for prototyping. Installing neural-style with its dependencies on the AWS instance was even easier; it’s a five-minute job.

All videos on this site have been created on the CentOS server at 1024px, using the CPU for the computing. Most of them were first prototyped as 512px versions on the AWS GPU.

“Top output while running neural-style on CentOS on CPU” by Teemu Turunen [CC BY-SA 4.0]

Top output while running neural-style on CentOS on CPU

If you can get a GPU with 8GB or more memory, I highly recommend it… this is excruciatingly slow! A thousand iterations is a lot though - you can usually see by around iteration 250-300 whether the result will be interesting. On AWS with smaller images that’s just 10 minutes each.

Artcorns!

I picked some well-known paintings for the algorithm to learn the style of. They are also old enough to fall into the Public Domain. I assume that using the style of a copyrighted work should not be an infringement, but I have zero interest in becoming the precedent in court.

Rousseaucorn

Style is learned by the neural algorithm from Henri Rousseau’s Tiger in a Tropical Storm [Public Domain, image from Wikimedia Commons]

Henri Rousseau - Tiger in a Tropical Storm

The video [copyright Teemu Turunen, CC BY-SA 4.0]

Starrycorn

Style is learned by the neural algorithm from Vincent van Gogh’s The Starry Night [Public Domain, image from Wikimedia Commons]

Vincent Van Gogh - The Starry Night

The video [copyright Teemu Turunen, CC BY-SA 4.0]

Picassocorn

Style is learned by the neural algorithm from Pablo Picasso’s Self Portrait (1907) [Public Domain, image from WikiArt]

Pablo Picasso - Self Portrait (1907)

The video [copyright Teemu Turunen, CC BY-SA 4.0]

Egyptcorn

Style is learned by the neural algorithm from an Egyptian tomb wall-painting from the 10th tomb at Gourna, Tebes [Public Domain, image from Wikimedia Commons]

Egyptian tomb wall-painting from the 10th tomb at Gourna, Tebes

The video [copyright Teemu Turunen, CC BY-SA 4.0]

Mapcorn

Style is learned by the neural algorithm from a hand-drawn map of Stralsund, from the book History of the Thirty Years’ War by Anton Gindely, 1884 [Public Domain, image from the British Library Collection in Flickr]

A hand-drawn map of Stralsund

The video [copyright Teemu Turunen, CC BY-SA 4.0]

Attack of the 50 foot Corn

Style is learned by the neural algorithm from the movie poster for Attack of the 50 Foot Woman, by Allied Artists, 1958 [Public Domain, image from Wikipedia]

Movie poster - Attack of the 50 Foot Woman

The video [copyright Teemu Turunen, CC BY-SA 4.0]

Enough Artcorns!

That is true, I got carried away. With my CentOS server it took about 48 hours of computing to create those.

What did I learn, other than that we still really need more powerful computers?

The algorithm obviously learns the color and texture from the paintings very well. Even fairly colorless images, such as the map, transform the content greatly.

Using a style image with very uniform texture is pretty much the same as applying a simple Photoshop filter on the content image, just a lot slower. Having variance in the style image makes things more interesting.

Chilicorn, as an image, is also not ideal content, as it is very simple and compact; there is no room for the larger style elements to manifest.

I have the images for the coasters now. They are going to be the envy of the town! Stop by our office a bit later this year, and you might get one.

Let’s try something else then…

Applying styles on a photograph

I have a picture of the Futurice Helsinki office with some people hanging around.

I’ll try a few different styles on that next.

The Futurice Helsinki office [copyright Futurice, All Rights Reserved - used with permission]

The Futurice Helsinki office

The style image is a picture of an illuminated copper engraving, from Metamorphosis insectorum Surinamensium, Plate XLV. 1705, by the awesome scientist Maria Sibylla Merian [Public Domain, image from WikiArt]

Maria Sibylla Merian - Metamorphosis insectorum Surinamensium, Plate XLV

The video [copyright Teemu Turunen, CC BY-SA 4.0]

Bamboo under Spring Rain, Xia Chang, 1460AD, Ink on paper and mounted as a handscroll. [Public Domain, image from Wikimedia Commons]

Xia Chang - Bamboo under Spring Rain

The video [copyright Teemu Turunen, CC BY-SA 4.0]

Image of an oil painting presenting a bazaar, credited to John Varley [copyright Wellcome Library, CC BY-SA 4.0]

Image of an oil painting presenting a bazaar

The video [copyright Teemu Turunen, CC BY-SA 4.0]

Pablo Picasso, Still Life with Liqueur Bottle, 1909, [Public Domain, image from nonsite.org]

Pablo Picasso - Still Life with Liqueur Bottle, 1909

The video [copyright Teemu Turunen, CC BY-SA 4.0]

Combining multiple styles and setting their weight

You can also give multiple style pictures and define their relative weighting. For instance, let’s take the content from another office pic, taken at our weekly tech-sharing event WWWeeklies by someone who caught Andre watching a video of himself instead.

Photograph by Futurice

The first style is chosen to see if we can apply some finer texture to the content, using a photograph.

This Awesome Moss Growing On My Backyard by Teemu Turunen [CC BY-SA 4.0]

Photograph of moss by Teemu Turunen

The second style brings strong shapes that overlap the people in the office pic.

Note that, if I have understood the workings of this algorithm correctly, the fact that the spirals overlap people has no relevance, since the algorithm separates the style from the content.

Freehand Golden Spirals in MS Paint Like a Boss by Teemu Turunen [CC BY-SA 4.0]

Image of spirals by Teemu Turunen

I want to emphasize shape over texture, so I weight the styles at 70% for the spirals and 30% for the moss.
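In this implementation, multiple styles and their relative weights are passed as comma-separated lists via `-style_image` and `-style_blend_weights`; a sketch, with file names made up for illustration:

```shell
# Two style images with a 70/30 weighting between them.
# File names are hypothetical; other flags as in the earlier runs.
th neural_style.lua \
  -content_image office_wwweeklies.jpg \
  -style_image golden_spirals.png,backyard_moss.jpg \
  -style_blend_weights 70,30 \
  -output_image spiralmoss.png
```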

This is what happens:

The video [copyright Teemu Turunen, CC BY-SA 4.0]

The shapes aren’t there. Time to increase style-weight. The default is 100, and raising or lowering the number should change how strongly the style image(s) influence the end result.
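As I understand it, `-style_weight` scales the overall influence of the style, on top of the per-image blend weights. A sketch (file names are again made up):

```shell
# Same two-style run, but with the overall style influence doubled
# from the default of 100 to 200.
th neural_style.lua \
  -content_image office_wwweeklies.jpg \
  -style_image golden_spirals.png,backyard_moss.jpg \
  -style_blend_weights 70,30 \
  -style_weight 200 \
  -output_image spiralmoss_sw200.png
```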

As a high-roller, I’ll double it to 200. Behold!

The video [copyright Teemu Turunen, CC BY-SA 4.0]

Yeah, I rolled too high. Reducing style-weight to 150.

The video [copyright Teemu Turunen, CC BY-SA 4.0]

Perfect! I have combined two decent enough pictures and a doodle, spent 24 hours and a lot of electricity, and created something very ugly.

At least it was educational. Style-weight clearly has a huge impact on the result. The outcome of using multiple styles on the same image is hard to predict; I’m not sure I see a reason for it.

Also due to the spiral experiment I am now convinced that style is indeed separated from the content; having the spirals overlap the people in the style image does not seem to matter.

The mutant invasion

Recently an already iconic pic was taken at our company annual party. Our Tampere office band, BAD finance, rocked the stage and this bandfie was taken by Mikko Pohja [CC BY-SA 4.0]

BAD finance performing at Kaapelitehdas, by Mikko Pohja

I was pretty excited about applying neural-style to this picture, but…

Nope.

The people in the front are just the wrong size, I guess. You can easily recognise facial features in the original, but there’s not enough space for the algorithm to twist them in a way that does not create mutants.

I’ll just link images for these. First I tried with Camille Pissarro’s Kastanienbäume in Louveciennes [Public Domain, image from Wikimedia Commons]

Camille Pissarro - Kastanienbäume in Louveciennes

Spotted mutants! [copyright Teemu Turunen, CC BY-SA 4.0]

What’s with the Guy Fawkes in front middle? :-o

Then with Hugo Simberg’s Kuoleman puutarha [Public Domain, image from Wikimedia Commons]

Hugo Simberg - Kuoleman puutarha

Nightmare mutants, as expected! [copyright Teemu Turunen, CC BY-SA 4.0]

The New York Times’ front page from 1914 [Public Domain, image from Wikimedia Commons]

New York Times headline 1914

Papercut mutants! [copyright Teemu Turunen, CC BY-SA 4.0]

The Great Wave off Kanagawa [Public Domain, image from Wikipedia]

The Great Wave off Kanagawa

Artsy tranquil mutants! [copyright Teemu Turunen, CC BY-SA 4.0]

And finally, the PAL test pattern PM5544 [Public Domain, image from Wikipedia]

PAL Test Pattern PM5544

Cool cubistic mutants! [copyright Teemu Turunen, CC BY-SA 4.0]

A character from Doom on the front left though?

Conclusion: this type of image does not make good content, unless you are after mutants.

Apologies to BAD finance!

Practical uses

I still really have none, except for the coasters.

Landscapes are where this algorithm really shines: a good balance of detail (trees, buildings, people) and open space (sky, sea…).

Here’s a photo taken on our office balcony in Helsinki, featuring a bottle of our company blend, Cache Buster. The pic was taken by our whisky club president, Rauli Poikela [CC BY-SA 4.0]

Helsinki skyline with Cache Buster by Rauli Poikela

Let’s spice it up with some neural voodoo using this serene painting - Oulu Fire 1882, by Herman Josef Kesti [Public Domain, image from Wikipedia]

Painting of Oulu Fire 1882, by Herman Josef Kesti

The video [copyright Teemu Turunen, CC BY-SA 4.0]

That is actually quite impressive. Landscapes are clearly the way to go, if you want pretty images.

Earlier I tried the newspaper texture on the band pic. It didn’t work out, but the result was interesting. Let’s see how it works on this.

The New York Times’ front page from 1914 [Public Domain, image from Wikimedia Commons]

New York Times headline 1914

The video [copyright Teemu Turunen, CC BY-SA 4.0]

That is actually a pretty cool example of texture transfer.

This neural-style stuff is highly addictive, so I warn you against trying. It is a horrible time sink.

If you disregard my advice, you can find the instructions in my GitHub Gist.

Thanks for reading!

The author, style from A colourful assortment of paper clips by Purple Sherbet Photography, [copyright Teemu Turunen, CC BY-SA 4.0]

Teemu Turunen neural-styled with paper clips