Having worked as an architect for the last six years, I have watched hand drawing and sketching gradually disappear from the architect’s toolbox, even though this tool remains essential for the architect to reflect and to find innovative ideas and aesthetic principles.

The vision driving this project is to evolve the architect’s workflow with the help of Artificial Intelligence, and more specifically the tool of architectural sketching itself: to invent a more organic way of drawing and shaping buildings.

With the help of artificial intelligence methods, the end product will generate realistic views interpreted from the sketch as it is being drawn. By giving instant feedback, it will let users work with a more objective and informed representation of their idea.

In this article we will present our approach and research process, and explain some of the artificial intelligence techniques we are using.

1. Finding the right model

To map a hand-drawn sketch to a realistic image, we need the help of artificial neural networks. In recent years they have proven to be quite efficient at vision and image generation tasks. These neural networks are huge mathematical functions with an enormous number of adjustable parameters. They learn a task by seeing a collection of input and output examples. By “seeing” we mean passing each input through the model, comparing the result with the target output using a “loss function”, and correcting the model’s parameters. The loss function quantifies the difference between the two outputs. Finally, this learning process is driven by an optimizer algorithm that allows the neural network to learn faster and better.
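As a rough illustration of this learning loop, here is a minimal sketch in PyTorch; the model, data and hyper-parameters are toy placeholders, not our actual setup:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for a real dataset of (input, target) example pairs.
inputs = torch.randn(256, 64)
targets = torch.randn(256, 64)
dataloader = DataLoader(TensorDataset(inputs, targets), batch_size=32)

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))
loss_fn = nn.MSELoss()                                     # the "loss function"
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # the optimizer

for epoch in range(10):
    for x, y in dataloader:
        prediction = model(x)          # "seeing": pass each input through the model
        loss = loss_fn(prediction, y)  # measure the difference with the target output
        optimizer.zero_grad()
        loss.backward()                # compute how to correct the parameters
        optimizer.step()               # let the optimizer adjust the parameters
```

Each pass over the data nudges the parameters in the direction that reduces the loss, which is exactly the “correcting” step described above.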

For our project we need a specific kind of neural network called a “generative adversarial network” (GAN). These models have the amazing ability to generate new images in different ways. Currently, two major types of GANs seem to be suited to our task: U-net GANs and CycleGAN.

We will begin by exploring the capabilities of these two models and how they fit our project’s needs.

1.1. What is a U-net model

The U-net architecture was initially created for biomedical image segmentation (detecting and outlining objects in an image).

The general logic behind this architecture is to have a downsampling phase called the “encoder” and an upsampling phase called the “decoder”. During the encoding phase, the image size is progressively reduced while more and more semantic content is extracted. At the end of the encoding phase we get a “semantic” vector that is then progressively upsampled. But since this vector has lost all shape information, we progressively mix the generated image with the original image’s shape information using what we call “skip connections”. (You can read more about this architecture in the original paper: https://arxiv.org/abs/1505.04597)

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/254e811e-9adf-41ed-82fc-62e572490265/att_00002.png
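To make the encoder, decoder and skip connections more concrete, here is a deliberately tiny U-net-style sketch in PyTorch; the channel sizes and depth are arbitrary and far smaller than in the original paper:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-net-style sketch: one downsampling step, one upsampling step."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)   # decoder: upsampling step
        self.dec1 = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(16, 3, 1)

    def forward(self, x):
        skip = self.enc1(x)           # high-resolution features, kept for the skip connection
        bottleneck = self.down(skip)  # "semantic" representation at lower resolution
        upsampled = self.up(bottleneck)
        # Skip connection: re-inject the original shape information
        merged = torch.cat([upsampled, skip], dim=1)
        return self.out(self.dec1(merged))

img = torch.randn(1, 3, 64, 64)
print(TinyUNet()(img).shape)  # torch.Size([1, 3, 64, 64])
```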

This kind of architecture has also proven efficient for image enhancement and restoration when paired with a “feature loss” function. It can increase image resolution, clean up noise or even fill in holes in the image.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/b64fce3c-b905-4cf7-8974-b612a5bab141/att_00003.png

For a generative model to learn, we need a tool to evaluate the accuracy of the generated image. We usually use a second model called a “critic” that learns to tell generated images apart from real ones. But this method has not always produced good results for realistic image generation. Instead, we use a pre-trained “classification” model, which is normally able to predict what objects are in an image. Rather than using this model’s final output (“it’s a car” or “it’s a horse”), we pick values inside the model’s layers that represent features found in the image (textures, shapes, etc.). When we pass in the generated image and the target image, we want those feature values to be as close as possible.


https://s3-us-west-2.amazonaws.com/secure.notion-static.com/82ec8a23-a3c1-4db5-ab26-48cbbc394fe3/att_00004.png
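As a rough sketch of such a feature loss, the example below compares two images through intermediate activations of a pre-trained VGG16 classifier; the chosen layers and the L1 distance are illustrative assumptions rather than our exact configuration, and it assumes a recent torchvision:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class FeatureLoss(nn.Module):
    """Compare images through the intermediate activations of a pre-trained classifier."""
    def __init__(self, layer_ids=(3, 8, 15)):            # illustrative layer choices
        super().__init__()
        features = vgg16(weights="DEFAULT").features.eval()
        for p in features.parameters():
            p.requires_grad_(False)                       # the classifier itself stays frozen
        self.features = features
        self.layer_ids = set(layer_ids)

    def forward(self, generated, target):
        loss, x, y = 0.0, generated, target
        for i, layer in enumerate(self.features):
            x, y = layer(x), layer(y)
            if i in self.layer_ids:
                # Push the feature values (textures, shapes, ...) to be as close as possible
                loss = loss + nn.functional.l1_loss(x, y)
        return loss

fake = torch.rand(1, 3, 128, 128)
real = torch.rand(1, 3, 128, 128)
print(FeatureLoss()(fake, real))
```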

1.2. What is CycleGAN

The CycleGAN model can transfer the texture style of an image into another texture style (style transfer). It owes its name to its ability to make the conversion in both directions. The most popular example is the photo-to-painting application and its reverse:

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/55658891-c50c-4412-9391-eab8e881339a/att_00005.png
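The “cycle” in CycleGAN refers to a cycle-consistency constraint: translating an image to the other domain and back should return the original image. Below is a very simplified sketch of that term; the generators are placeholder layers standing in for the real convolutional networks, and the adversarial part is only mentioned in comments:

```python
import torch
import torch.nn as nn

# Placeholder generators: in a real CycleGAN these are deep convolutional networks.
G_photo_to_paint = nn.Conv2d(3, 3, 3, padding=1)   # domain A -> domain B
G_paint_to_photo = nn.Conv2d(3, 3, 3, padding=1)   # domain B -> domain A
l1 = nn.L1Loss()

photo = torch.rand(1, 3, 128, 128)
painting = torch.rand(1, 3, 128, 128)

# Cycle consistency: A -> B -> A should reconstruct the original image, and vice versa.
cycle_loss = (
    l1(G_paint_to_photo(G_photo_to_paint(photo)), photo)
    + l1(G_photo_to_paint(G_paint_to_photo(painting)), painting)
)
# In the full model this term is added to the adversarial losses of two critics,
# one per domain, which is what lets the conversion be learned in both directions.
print(cycle_loss)
```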