Sound Space

AR/VR

Overview


Augmented Reality is a technology that lets us enrich our visually perceptible world with additional elements that are themselves primarily perceived visually on the display medium. The focus of this project and the resulting application Sound Space was to make the visually perceptible parts of our environment experienceable through sound.

Similar to the synesthesia some people experience, sound frequencies are generated from the registered color. With a typical synthesizer or other instrument, the musician can play every pitch the instrument makes available. With Sound Space, by contrast, you can only play the sounds that result from the colors of the environment you are in at that moment.

The intention behind this is to actively connect the musician to their environment, since the available tones are unique to that moment. Depending on time of day, light, weather, material, texture and other external influences, the result is a distinctive composition.

The resulting application is divided into two functional blocks. In the first, the so-called game mode, the user has to hit given frequencies, which trains the conscious perception of the environment and the musical ear. In the free mode, the user can then concentrate fully on generating sounds. The app was developed using Unity and Vuforia.

Instructions


Splash screen

Main menu

When the app starts, the user is greeted by the splash screen, which gives first hints about the function of the application: it shows the various waveforms that matter for the later use of the app.

After the splash screen disappears, the user can choose between two menu items corresponding to the two modes of the app. In game mode, a target frequency to hit is shown in the upper left corner. An indicator next to the frequency display shows whether the current tone is above or below the given frequency; while no sound is played, a circle indicates the idle state. As soon as the displayed pitch is hit, a new random target is shown immediately. In the lower left corner, a back arrow returns the user to the main menu. The graphic in the lower right corner shows the waveform with which the sound is currently played; this can be changed by scanning the respective waveform's image target. I will go into more detail about the individual waveforms in the chapter Implementation.

Game mode

Changing the waveform

When switching to another waveform, a particle system whose elements consist of the respective wave shape is displayed, and the UI color scheme changes to that waveform's color to show the user that the change was successful.

In the center of the screen there is a circular element (the so-called color extractor) that extracts a color tone from the camera image. As soon as the user taps the screen, the extractor enlarges and the corresponding sound is played. Moving the finger vertically changes the pitch of this tone, whereas a horizontal movement changes its volume. After a color tone has been registered, the pitch can thus be fine-tuned until the target frequency is reached.

The free mode differs in that no preset frequency has to be reached and therefore the frequency display in the upper left corner is not present.

Implementation


In this chapter I will discuss the mechanisms that are important for the function of the application and go beyond the standard Vuforia setup. The most important feature is the extraction of a color value from the environment. For this, the current frame is copied from the camera stream into a texture; the color value of the pixel exactly in the middle of that texture is then set as the color value of the color extractor.

// Save the image from the stream as texture
var image =
    VuforiaBehaviour.Instance.CameraDevice.GetCameraImage(PIXEL_FORMAT);
image.CopyToTexture(mTexture, true);

if (dynamicColor)
{
    // Get the color of the center pixel from the texture and apply it to the color extractor
    sample = mTexture.GetPixel((image.Width / 2), (image.Height / 2));
    circle.color = sample;
}

Color extraction

Based on the luminosity method, which is mainly used for converting color images to grayscale, a single value is generated from the red, green and blue components of the sampled pixel. The method models human perception: we perceive green tones most strongly, so they receive the highest weight.

// Get luminosity based on color value of sample
float luminosity = GetLuminosity(sample);

// Generate a weighted mean of the R, G and B channels based on the perceived brightness of a color
public float GetLuminosity(Color color)
{
    return (0.21f * color.r) + (0.72f * color.g) + (0.07f * color.b);
}

Get luminosity

The frequency derived from the luminosity value is used as the basis for calculating a phase, which in turn is needed to generate the various waveforms. Four different waveforms are integrated in the Sound Space application: sine, square, triangle and sawtooth. The waveform determines the timbre of the respective tone.
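The mapping from luminosity to frequency and the per-sample phase update are not shown in the listings below, so here is a minimal sketch of one plausible setup using Unity's OnAudioFilterRead callback. The frequency range, field names and gain value are illustrative assumptions, not the original code:

```csharp
using UnityEngine;

public class ToneGenerator : MonoBehaviour
{
    const float minFreq = 220.0f;  // assumed range: A3 ...
    const float maxFreq = 880.0f;  // ... up to A5
    float sampleRate;
    double phase;
    public float luminosity;       // set from GetLuminosity(sample)
    public float gain = 0.5f;      // keep samples within [-1, 1]

    void Start()
    {
        sampleRate = AudioSettings.outputSampleRate;
    }

    // Unity calls this on the audio thread to fill the output buffer
    void OnAudioFilterRead(float[] data, int channels)
    {
        // Map the luminosity value (0..1) linearly into the frequency range
        float frequency = minFreq + luminosity * (maxFreq - minFreq);
        double increment = 2.0 * Mathf.PI * frequency / sampleRate;

        for (int i = 0; i < data.Length; i += channels)
        {
            phase += increment;
            if (phase > 2.0 * Mathf.PI)
                phase -= 2.0 * Mathf.PI;

            // Sine case shown; the waveform branches below slot in here
            data[i] = Mathf.Sin((float)phase) * gain;

            // Duplicate the sample to the remaining channels
            for (int c = 1; c < channels; c++)
                data[i + c] = data[i];
        }
    }
}
```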

Sine wave

// Sine wave
if (waveform == 1)
{
    // Calculate the sample value for a sine wave;
    // the gain of 0.5f keeps the sample within [-1, 1] so the output does not clip
    data[i] = Mathf.Sin((float)phase) * 0.5f;
}

Square wave

// Square wave
if (waveform == 2)
{
    // Calculate the sample value for a square wave
    if (Mathf.Sin((float)phase) >= 0)
    {
        data[i] = 1.0f;
    }

    else
    {
        data[i] = -1.0f;
    }
}

Triangle wave

// Triangle wave
if (waveform == 3)
{
    // Calculate the sample value for a triangle wave:
    // x runs from 0 to 2 over one period, and the tent function
    // 1 - 2*|x - 1| ramps from -1 up to 1 and back down to -1
    float x = (float)phase / Mathf.PI;
    data[i] = 1.0f - 2.0f * Mathf.Abs(x - 1.0f);
}

Sawtooth wave

// Sawtooth wave
if (waveform == 4)
{
    // Calculate the sample value for a sawtooth wave
    data[i] = ((float)phase / Mathf.PI) - 1.0f;

    // Wrap the phase back to 0 when it reaches 2*pi
    if (phase > 2.0f * Mathf.PI)
    {
        phase -= 2.0f * Mathf.PI;
    }
}

Volume and pitch of the respective tone are controlled by the user's touch input. The first touch position is registered and serves as the reference point for all subsequent changes: the distance and direction of the current touch relative to this starting position are translated into a change of pitch and volume.
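This touch handling can be sketched with Unity's legacy Input API as follows; the sensitivity constants and field names are assumptions for illustration, not the original implementation:

```csharp
using UnityEngine;

public class TouchControl : MonoBehaviour
{
    Vector2 startPosition;                // reference point set on the first touch
    public float pitchOffset;             // added to the base frequency elsewhere
    public float volume = 0.5f;
    public float pitchPerPixel = 0.5f;    // assumed sensitivity values
    public float volumePerPixel = 0.002f;

    void Update()
    {
        if (Input.touchCount == 0) return;
        Touch touch = Input.GetTouch(0);

        if (touch.phase == TouchPhase.Began)
        {
            // Register the first touch position as the reference point
            startPosition = touch.position;
        }
        else if (touch.phase == TouchPhase.Moved)
        {
            Vector2 delta = touch.position - startPosition;
            // Vertical distance changes the pitch, horizontal distance the volume
            pitchOffset = delta.y * pitchPerPixel;
            volume = Mathf.Clamp01(0.5f + delta.x * volumePerPixel);
        }
    }
}
```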

The objects invoked by capturing image targets ultimately behave like buttons, since they can be tapped on the display. This is achieved with a raycast that checks whether such an object is hit. If no object is hit, the tone is simply played; if an object is hit, the waveform changes to the respective shape and the user is informed of the successful interaction by a color inversion of the object and a particle system.
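A minimal sketch of this raycast check, assuming Unity's Physics.Raycast and a hypothetical tag on the waveform objects (the tag name is an assumption for illustration):

```csharp
using UnityEngine;

public class WaveformSelector : MonoBehaviour
{
    void Update()
    {
        if (Input.touchCount == 0 ||
            Input.GetTouch(0).phase != TouchPhase.Began) return;

        // Cast a ray from the touch position into the scene
        Ray ray = Camera.main.ScreenPointToRay(Input.GetTouch(0).position);

        if (Physics.Raycast(ray, out RaycastHit hit))
        {
            if (hit.collider.CompareTag("WaveformTarget"))  // assumed tag
            {
                // Switch to the hit object's waveform, invert its colors
                // and play the particle system as feedback
            }
        }
        else
        {
            // Nothing was hit: treat the tap as "play the current tone"
        }
    }
}
```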

Details


Tools used

Unity, Vuforia, Visual Studio, C#, Android, Inkscape, Affinity Designer

Institution

University of Applied Sciences Upper Austria

Course

Augmented Reality

Lecturer

FH-Prof. Dr. techn. Dipl.-Inf. (FH) Christoph Anthes, MSc