Saturday, March 8, 2014

Strong Typing and Initialization for Generic Vectors

With C++11 becoming more and more stable, I decided to talk about some of the features C++11 brings to the table that can enforce strong typing, and how they will affect parts of MiniLib, my minimalistic general purpose library, or more specifically the part of the library relating to linear algebra.

I often see vector and matrix types using hard-coded dimensions, often taking the shape of separate Vector2, Vector3, Vector4, Matrix2, Matrix3, and Matrix4 classes. There are some benefits to this. Knowing the exact dimensions of your data types can lead to some good hand optimizations, most notably using SIMD functionality. This method also enforces strict typing, where type mismatches are caught at compile-time. However, it also means a lot of redundant coding, which is a waste of time as well as an increase in error rate (more code = more errors).

Another way of writing vector and matrix classes is to dynamically allocate dimensions. However, this brings a severe problem: there is nothing inherently different between, say, a three-dimensional vector and a two- or four-dimensional vector. This means that vectors of different sizes can mix with each other unless the programmer does sanity checks when performing operations on two vectors or matrices, moving errors from compile-time to run-time when the former should be favored. Aside from this, dynamically allocated vectors and matrices are difficult for the compiler to automatically optimize, since their size is not known at compile-time.

A third option for implementing linear algebra types is templates. Templates can be used to define the sizes of arrays. This brings a few advantages: it enforces strong typing, it avoids unnecessary duplication of code, and it makes automated optimizations more obvious to the compiler, while at the same time leaving the option open for the programmer to specialize a template for specific hand optimizations. This is the implementation that I have chosen for MiniLib. All is not well, however. The old C++ standard has had a long-standing, glaring hole where construction of templated vectors and matrices has proven problematic. Simply put, there is no way to initialize a number of elements in a non-aggregate through a constructor without writing a specialized constructor taking the exact number and type of arguments expected by the object construction.
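
As a rough illustration of the approach (the names below are invented for this post and are not MiniLib's actual types), a fixed-size vector template might look something like this:

template <typename type_t, int t_dim>
class Vector
{
private:
  type_t e[t_dim];

public:
  type_t &operator[](int i) { return e[i]; }
  const type_t &operator[](int i) const { return e[i]; }

  // Arithmetic is only defined between vectors of the same element type
  // and dimension, so mixing sizes is caught at compile-time.
  Vector &operator+=(const Vector &r)
  {
    for (int i = 0; i < t_dim; ++i) { e[i] += r.e[i]; }
    return *this;
  }
};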

There have been a few ways around this, though: passing a single data structure that holds the data to be copied to the constructor, using a makeshift initializer list by overloading the comma operator and passing that to the constructor, or using the variable argument count language construct. While the first option is the lesser of the three evils, using a separate data structure to hold data does not really solve the problem; it only shifts it from the main data type to the data structure holding the data to be copied. The second option is very much like the first one, only it tries to solve the problem through obscure means (i.e. overloading an operator that is best left alone). It should be noted that the initializer list might be implemented without using the comma operator, but in the end it is still a very inelegant solution. The makeshift initializer might also not be performant unless care is taken in its implementation.

Taking this into account I landed on using the variable argument count method, the same method used for the standard C function printf(). It looks something like this:

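The sketch below captures the idea, reusing the invented Vector template from above (illustrative code, not the actual MiniLib implementation):

#include <cstdarg>

template <typename type_t, int t_dim>
class Vector
{
private:
  type_t e[t_dim];

public:
  // The constructor blindly assumes that exactly t_dim numeric arguments
  // are passed; nothing enforces the count or the types.
  explicit Vector(double e0, ...)
  {
    e[0] = type_t(e0);
    va_list args;
    va_start(args, e0);
    for (int i = 1; i < t_dim; ++i) {
      // floats are promoted to double when passed through "..."
      e[i] = type_t(va_arg(args, double));
    }
    va_end(args);
  }
};

// Usage: Vector<float, 3> v(1.0, 2.0, 3.0);
// Passing too few arguments, or the wrong types, still compiles and
// silently misbehaves at run-time.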


However, this is a somewhat dangerous road to tread. Not only is there a performance penalty involved in iterating through the variable argument list, but there is also nothing in the way of type safety. There is not even any mechanism in place for guaranteeing that the actual number of input parameters matches the expected number of input parameters. This could potentially lead to trouble if the programmer using the library is not careful.

This is where the C++11 standard comes in and saves the day. C++11 provides some excellent tools for overcoming these limitations. There is the option of std::initializer_list, and there is the option of variadic templates. One is an extension of the standard library, the other is an extension of the language itself, but both of these alternatives provide compile-time safety. To be fair, the former option most likely also uses language extensions behind the scenes.

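Sticking with the invented Vector template from above (again a sketch rather than MiniLib's actual code), the two approaches might look roughly like this:

#include <initializer_list>

template <typename type_t, int t_dim>
class Vector
{
private:
  type_t e[t_dim];

public:
  // Option 1: std::initializer_list. Element types are checked at
  // compile-time, but the element count is only known at run-time.
  Vector(std::initializer_list<type_t> list)
  {
    int i = 0;
    for (const type_t &v : list) {
      if (i >= t_dim) { break; }
      e[i++] = v;
    }
  }

  // Option 2: a variadic template. Both the element types and the element
  // count are checked at compile-time. (A production version would
  // constrain this template so that it cannot shadow the copy constructor.)
  template <typename... args_t>
  explicit Vector(args_t... args) : e{ type_t(args)... }
  {
    static_assert(sizeof...(args_t) == t_dim, "wrong number of elements");
  }
};

// Vector<float, 3> a = { 1.0f, 2.0f, 3.0f }; // initializer_list
// Vector<float, 3> b(1.0f, 2.0f, 3.0f);      // variadic template; two or
//                                            // four arguments will not compile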




As you can see, C++11 has opened up the possibility for elegant handling of purely mathematical data types by providing a number of tools used here to create strong typing that can be enforced at compile-time. Unfortunately, while I am incredibly tempted to implement these solutions in my own linear algebra library, I still feel somewhat hamstrung by the current rate of adoption of C++11. Newer compilers may handle all of this with ease, but I seldom have the luxury of working with the latest and greatest. Still, people looking to implement flexible linear algebra libraries should look into the new features in C++11.

Monday, January 13, 2014

Voxel Rendering Using Discrete Ray Tracing

About

About a year ago now, I decided to do some research into voxel rendering. Information on the subject was fairly limited, however, and I ultimately chose to implement voxel rendering based on my own prior knowledge in related areas. The result was a working voxel renderer that utilizes discrete ray tracing to march through a voxel field and render the first voxel it intersects. The most interesting thing about this technique is how well it lends itself to parallel speed-ups as well as acceleration structures such as sparse voxel octrees, as made famous by Jon Olick during his time at id Software.

At about the same time I had finished the renderer I decided to type up a small tutorial that would hopefully be of interest to someone trying to find out more about the subject without knowing where to start. Not really having any outlet to post this tutorial, I simply put the text in a folder somewhere, where it mostly went unread. At this point I think posting the tutorial here makes perfect sense. The tutorial was written fairly hastily, and it will be updated to be clearer on some points, but I do hope it will still be of use to someone.

Summary

The algorithm in this paper is based on old methods for rendering geometry using ray casting in a similar fashion to Wolfenstein 3D. Such an algorithm is extended by one dimension to be able to handle rendering voxels in a fully three-dimensional context by discretely traversing a voxel volume along a ray. Ray traversal is performed using a three-dimensional form of digital differential analysis. As with most forms of ray tracing, the algorithm lends itself well to parallel computing, and can be extended to be more efficient using a sparse voxel octree structure.

Prerequisite Knowledge

Before reading any further there are some core concepts that this article will assume the reader already has a firm grasp on.

First and foremost, the reader is assumed to have a decent grasp of programming, preferably C++. However, the basic principles of the code, coupled with the accompanying descriptions, should make the algorithms as expressed through C++ fairly clear to any programmer.

Secondly, the reader is assumed to have working knowledge of linear algebra. Such knowledge is fundamental to understanding the principles of the algorithm. Furthermore, some of the code in this article depends on code not presented here, which requires an understanding of linear algebra to implement.

Third, the reader is assumed to be familiar with graphics rendering as well as the concepts and jargon contained therein. Concepts like pixels, texels, view ports, color channels, and rasterization will not be further explained. No knowledge of specific graphics APIs is required.

What is a Voxel?

Over the years the term voxel has become somewhat dubious. Seen in isolation, a voxel can be said to be a cube. This has led many to believe that any game that predominantly renders cubes is a voxel renderer. This is not necessarily true. Seen in context, a voxel is merely an index in an axis-aligned three-dimensional grid of data, and, much like the texels in a texture, voxels must adhere to that grid at all times. A voxel can not move freely around the grid; in fact, a voxel can not “move” at all since it is merely an index, and, as such, has no notion of position. Because of this, it is not correct to view voxels as geometrical cubes. The cube shape is merely the visual representation of voxel data.

An even more disputed term is voxel renderer. Voxels are data, not an algorithm. As such, a voxel renderer could mean anything that visualizes voxel data, be that through ray tracing, ray casting, rasterization, or any other method. In essence, there might be no straight answer as to what a voxel renderer actually is. However, it stands to reason that a voxel renderer must use voxel data as the primary source of rendering data. Merely rendering cubes does not qualify as voxel rendering unless the cubes are stored as voxels. Thus, we know what a voxel renderer is not.

What is Ray Tracing?

Ray tracing is the process of following the progression of a ray from a starting point to an impact point. Generally, a ray consists of two components; a position (the origin), and a direction. As such, a ray has a starting point, but not an end point. Used for computer graphics, the starting point of a ray is generally the same as the position of the virtual camera. The direction of the ray is the normalized differential vector between the camera starting point and the position of the pixel on the view port that is currently being rendered. Given these initial conditions, a ray is traced for every pixel on the screen in order to render a complete scene.

In many cases ray tracing is used to analytically determine an impact by iterating through a list of arbitrarily shaped objects in the scene and doing an intersection test for the current ray. However, for voxels, analytical ray tracing might be severely impractical. Iterating through a volume of voxels to perform a ray-cube intersection test is in many cases not going to be efficient enough for real-time use; by doubling the dimensional resolution the volume has effectively grown by a factor of eight. A discrete method for finding intersections is therefore preferred, where the maximum number of intersection tests is limited to Pythagorean complexity rather than cubic. Furthermore, discrete ray tracing guarantees that the first intersection found is the closest of all possible intersections, since the technique involves traversing each voxel a ray intersects in the order that the ray intersects them.

The Algorithm

The basic algorithm for rendering voxels in this paper is based on the algorithm presented by Lode Vandevenne in his excellent ray casting tutorials for writing a Wolfenstein 3D styled renderer. The algorithm, which involves discretely stepping through a two-dimensional map by tracing a ray for every pixel column on the screen in order to find the closest intersection with a wall, manages to infer enough data from such an intersection to render a three-dimensional view of the two-dimensional map. There are severe limitations to using this method for rendering, though: players are limited to one plane, meaning that the full Y (up) axis is lost, and the camera can not be arbitrarily rotated. While it may seem counterintuitive that such an algorithm, used to render the graphics of one of the first first-person shooters ever made, can be of any use today, upon closer inspection the ray caster can already be said to render voxels, albeit locked to a single plane.

Extending the algorithm by one dimension enables us to render much more complex and interesting geometry by removing the “one plane” restriction as well as allowing the camera a full six degrees of freedom. Such a change entails another two fundamental alterations of the original algorithm: first, the use of volumetrically stored data in place of two-dimensional data for geometry, and second, tracing rays for every pixel on the screen rather than for every column of pixels. Since the algorithm steps through the indices of the volumetric data in the order that a ray traverses them, the algorithm belongs to the class of discrete ray tracing algorithms.

Implementation

This section of the article aims to provide C++ code of the components of the algorithm while at the same time explaining what the segments of code do and how they work.

The algorithm can be broken down into two major sections; the main loop that initializes data and steps through each pixel on the screen, and the actual ray tracing which traverses the voxel volume until an impact point is found.

Setup and Main Loops

The first section of code handles the main function that will trace rays through the view port. A ray consists of two components: a position (origin), and a unit direction vector. The position of the ray equals the position of the camera and will remain the same throughout the rendering process. The direction of the ray will have to be recalculated for each point on the view port the ray is to pierce.

Instead of calculating the normal of each ray individually, we can calculate the normals of the rays at each of the four view port coordinates by normalizing the view port vectors. The direction for each pixel on the screen can then be bilinearly interpolated based on these four normalized vectors. See figure 1.

Figure 1: Camera components. The red point is the camera position, the black frame is the view port, and the blue lines are the view port vectors. The normalized view port vectors then correspond to the direction of the rays to be traced at the corners of the view port.

In code this corresponds to:

// Calculate normals at view port coordinates
const Vec3 upperLeftNormal = normalize(camera.GetUpperLeftVector());
const Vec3 upperRightNormal = normalize(camera.GetUpperRightVector());
const Vec3 lowerLeftNormal = normalize(camera.GetLowerLeftVector());
const Vec3 lowerRightNormal = normalize(camera.GetLowerRightVector());

The camera view port vectors in the code above are assumed to originate from (0, 0, 0), i.e. they are independent of camera position. If they depend on the camera position simply subtract the camera position before normalizing the vectors. However, the camera view port vectors above are assumed to rotate as the camera rotates. This then consolidates important camera properties such as direction and field of view.

With the view port normals calculated we need to determine the interpolation deltas for every pixel on the screen in the y direction.

// Calculate the inverse of the screen dimensions
const float invHeight = 1.f / (float)screen.GetHeight();
const float invWidth = 1.f / (float)screen.GetWidth();

// Left and right y deltas for normal interpolation
const Vec3 leftNormalYDelta = (lowerLeftNormal - upperLeftNormal) * invHeight;
const Vec3 rightNormalYDelta = (lowerRightNormal - upperRightNormal) * invHeight;

// Interpolation variables
Vec3 leftNormal = upperLeftNormal;
Vec3 rightNormal = upperRightNormal;

The interpolation variables are initialized to the upper side of the view port. This is done due to the order in which we will traverse the pixels on the screen as the rendering process works its magic: top to bottom, left to right. As the rendering process creeps down the screen, the interpolation variables are incremented by the calculated deltas. The inverse screen width will be used later inside the main loops in order to complete the bilinear interpolation of the direction vectors.

The next step initializes the origin of the ray:

Ray ray;
ray.origin = camera.GetPosition();

Only the ray origin is initialized outside of the main loops since it is the only one of the two components of the ray that remains constant throughout the rendering process. The ray direction will constantly change as the pixel to be rendered on screen changes.

There are two main loops; One iterates through the y (down) direction of the screen pixels, while the other iterates through the x (right) direction of the screen pixels. Starting with the first part of the y loop:

for (int y = 0; y < screen.GetHeight(); ++y) {
  
  ray.direction = leftNormal;
  
  // Calculate new x delta
  const Vec3 normalXDelta = (rightNormal - leftNormal) * invWidth;

Here the ray direction is initialized to the left interpolation variable due to the order in which we iterate through the screen pixels in the x direction; Left to right. At this point we also calculate the final component of the bilinear interpolation which will be used later.

  for (int x = 0; x < screen.GetWidth(); ++x) {
    // Trace a ray and render on screen
    Color color = TraceRay(ray, volume);
    screen.SetPixel(x, y, color);

As we step through the columns of pixels on the screen we also call the function for tracing our ray through the voxel volume in order to find the color of the closest intersecting voxel. This function plays such an important role in the algorithm that it is described in its own code walkthrough section. For now, a simple function call will suffice as a description.

The closing sections of both for loops perform the bilinear interpolation on the interpolation variables.

    // Interpolate x
    ray.direction += normalXDelta;
  }

  // Interpolate y
  leftNormal += leftNormalYDelta;
  rightNormal += rightNormalYDelta;
}

The ray direction is incremented one step to the right, while the left and right normals are incremented one step down. As the code loops back to the beginning of the y loop the ray direction vector is once again set to the left normal. This time, however, the left normal has been incremented one step down. A new x delta is also calculated based upon the new left and right normal values.

Tracing the Rays

With the ray setup done for each pixel on the screen the next step will be to implement the heart of the voxel rendering algorithm; volume traversal along a ray.

For discretely traversing the volume we will use a form of three-dimensional digital differential analysis (DDA) which will guarantee that we always stop and test a voxel index at exactly one of the x, y, or z dividing planes. Figure 2 shows an example using a two-dimensional implementation of DDA.


Figure 2: Two-dimensional digital differential analysis. The blue line is the ray, and the red point is the ray origin. The algorithm finds intersections (green) at the point of axial dividing planes in the order of occurrence with the ray origin as a starting point. The numbering indicates the order the intersections occur in, and what voxels they refer to. The final intersection point is omitted since the neighboring voxel is out of bounds.

To start off, the function signature must be defined:

Color TraceRay(Ray ray, const VoxelVolume &volume)
{

This indicates what was probably already evident from the previous code walkthrough section; That the function takes a ray and the voxel volume as parameters, and returns the color of the closest voxel intersection.

The second part delves into the setup of the DDA algorithm:

  // Calculate distances to axis boundaries and direction of discrete DDA steps
  int coord[3] = { int(ray.origin[0]), int(ray.origin[1]), int(ray.origin[2]) };
  Vec3 deltaDist;
  Vec3 next;
  int step[3];
  for (int i = 0; i < 3; ++i) {
    const float x = (ray.direction[0] / ray.direction[i]);
    const float y = (ray.direction[1] / ray.direction[i]);
    const float z = (ray.direction[2] / ray.direction[i]);
    deltaDist[i] = sqrt( x*x + y*y + z*z );
    if (ray.direction[i] < 0.f) {
      step[i] = -1;
      next[i] = (ray.origin[i] - coord[i]) * deltaDist[i];
    } else {
      step[i] = 1;
      next[i] = (coord[i] + 1.f - ray.origin[i]) * deltaDist[i];
    }
  }

First off, the starting point of the ray in volume coordinates is calculated by truncating the excess decimal values from the ray origin. These variables will be used later as indexes to access the volume. This is followed by determining the length of the individual components of a vector from one axis plane to the next, based on the ray direction and calculated using the Pythagorean theorem.

If a ray direction component is less than zero then the direction that component will step in is negative one, while if a ray direction component is greater than or equal to zero the direction that component will step in is positive one. This is due to the fact that we will always only step one voxel index away from the previous voxel index tested. At the same time we calculate the distance from the ray origin to the closest axis plane the ray intersects. The calculation differs depending on the sign of the ray direction component. Figure 3 clarifies this discrepancy in a two-dimensional example.


Figure 3: DDA setup. The red point is the ray origin. The blue line is the ray. The green point is the coordinate of the grid cell the ray originates from. The numbers indicate the sign of the x and y components of the ray direction vector. As the x component of the ray's direction is positive the distance is calculated as [gridX + 1 - positionX] * deltaX (+1 since the ray intersects the next axis plane in relation to the grid coordinate), while y is negative and calculated as [positionY - gridY] * deltaY. The multiplication is necessary in order to project the distance to the grid boundary.

It should be noted that index 0 corresponds to the x axis, 1 to the y axis, and 2 to the z axis. Indexing through numbers allows us to make use of loops to perform much of the repetitive work. Do not worry too much about the performance penalties of using loops at this point. If the compiler of choice has any sense it will unroll these loops in order to eliminate those inefficiencies.

The next part of the algorithm performs the actual DDA:

  while (true) {
    // Perform DDA
    int side = 0;
    for (int i = 1; i < 3; ++i) {
      if (next[side] > next[i]) {
        side = i;
      }
    }
    next[side] += deltaDist[side];
    coord[side] += step[side];

By finding the smallest component of one of our previously defined vectors we can determine what component of that vector to increment. At the same time we know what component of the volume coordinate to increment, thereby knowing what voxel coordinate the ray is currently traversing. The DDA algorithm is performed until either the ray leaves the volume, and thus no collision can ever be found, or a voxel intersection is determined.

    if (coord[side] < 0 || coord[side] >= volume.GetDimension(side)) {      
      break;
    }

The above code makes sure that the current map coordinates are safe to use as indexes for the volume in order to avoid access violations. Should that not be the case, the algorithm will break the loop and return a black color, as will be seen in the next code segment.

The astute reader may notice that only the last altered map coordinate component is actually tested. This means that the other two components of the coordinate could access outside of the volume and not be detected by the stopping condition. This imposes the precondition that the ray origin must be guaranteed to be inside the volume at the time of rendering. This precondition is perfectly reasonable for voxel environments, but could be a severe limitation when rendering movable voxel characters. The algorithm can be safely extended to handle cases where the camera is located outside the volume, but since voxel characters generally suffer from a plethora of technical issues other than the aforementioned, the presented technique might not be suitable for such rendering regardless.

The final part of the algorithm involves retrieving the intersected voxel data, if any, and terminating the algorithm.

    // Sample volume data at calculated position and make collision calculations
    Voxel voxel = volume.GetVoxel(coord[0], coord[1], coord[2]);
    if (voxel.valid) {
      voxel.color.r >>= side;
      voxel.color.g >>= side;
      voxel.color.b >>= side;
      return voxel.color;
    }
  }
  return RGB_BLACK; // No voxel found
}

A voxel flagged as valid means that the current voxel index is occupied. This will act as the second stopping condition of the DDA algorithm. By bit shifting the integer color channel value by the last altered dimension we can achieve a crude, static lighting model. This sub-segment is completely optional, and can be safely removed if a different lighting model is chosen. The color of the voxel is then returned and rendered to the screen by the main loops. If no voxel was intersected the color black is returned instead.

Further Improvements

Undoubtedly, there are many ways in which the algorithm as presented above can be improved, both in terms of performance and in terms of visual fidelity.

Performance could be improved by inserting voxel data into a sparse voxel octree which can quickly skip tracing large empty spaces, and comes with other benefits such as automatic LOD. Performance could also be improved by optimizing data storage for cache coherence, threading the rendering process, or rewriting the algorithm to be executed on the GPU.

Secondary rays coupled with an actual lighting model could be introduced to produce shadows, reflections, and refractions that would greatly enhance visual appeal. This improvement is trivial to implement in the existing algorithm. The “blocky” nature of voxel geometry might however not lend itself well to realistic lighting. “Smoothing” out the voxels could help alleviate this issue and requires a normal to be stored at each voxel, but the rendering process would also have to be altered, making this improvement less than trivial.

Source

The source code for this project can be found here:
SirJonthe/VoxelRayTrace at GitHub

Friday, January 3, 2014

Texture Coordinates and Valid Ranges


When writing a software rasterizer for textured geometry you might have come across the following conversion from UV coordinates to texture coordinates:

int texX = int(u * (width - 1));
int texY = int(v * (height - 1));

It is simple, it never results in access violations for valid ranges of UV coordinates, and it is relatively fast. However, this conversion is actually wrong, though you might never have noticed it. In fact, the problem only really becomes noticeable with low resolution textures.

First of all, let us take a look at the valid range of UV coordinate components and texture coordinate components. UV coordinates are real numbers in the range [0.0, 1.0], while texture coordinates, which are laid out in memory in discrete steps of 1, have a valid discrete range of [0, N-1], where N is the dimension of the texture. However, since floating point texture coordinates are converted to integers by truncation, the valid range for floating point texture coordinates, rather than discrete texture coordinates, becomes [0.0, N).

Let us try to map a quad with the entirety of this 2x2 texture
 

using the UV coordinates and conversion technique specified above. We will then get the following result:

 
 
We want to map the entire texture across the quad, yet we only get the lower left texel of the texture, with a sliver of the top left texel at the topmost scanline of the quad. This is clearly wrong. But what exactly is wrong? Well, there are actually two connected issues in our UV to texture coordinate conversion.
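
A small standalone test makes the first of them easy to see (a quick illustration, not code from the rasterizer):

#include <cstdio>

int main()
{
  const int width = 2;
  const float u[] = { 0.0f, 0.25f, 0.5f, 0.75f, 0.99f, 1.0f };
  for (int i = 0; i < 6; ++i) {
    // the conversion under scrutiny: u * (width - 1), truncated
    printf("u = %.2f -> texX = %d\n", u[i], int(u[i] * (width - 1)));
  }
  return 0;
}

// Every value prints texX = 0 except u = 1.00, which prints texX = 1;
// the second texel column is only ever reached at exactly u = 1.0.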

The first issue that I will bring up is the fact that our conversion function will only produce the texture coordinate component N-1 when a UV coordinate component is exactly 1.0, the very end of the valid range for UV coordinates. For UV coordinate components of anything below 1.0 the texture coordinate rendered will not be N-1. This means that the texel N-1 will only be rendered at the edges of our quad where the UV coordinate components are exactly 1.0, or maybe not at all depending on the orientation of our quad and the subpixel accuracy scheme used for rasterization. We partially solve the problem by using the full width and height of the texture as multipliers to the UV coordinates when converting to texture coordinates. However, we must be wary of not reading outside of the memory allocated for the given texture, so we need to wrap those values to the valid range of the texture. The conversion function is therefore updated to look like the following:

int texX = int(u * width) % width;
int texY = int(v * height) % height;

Using the modulo operator we make sure to wrap the texture around should the resulting texture coordinate lie outside the discrete range [0, N-1]. For power-of-two textures we can use the AND operator (with a mask of N-1) instead of the more expensive modulo operator. By making these changes we get the following result:

 

This looks much better, but there is still a problem where the UV coordinate components are exactly 1.0, see the top scanline of the quad where the texture has been wrapped around to sample from the bottom of the texture. This is actually a correctly rendered quad, but it is not the way we want it to look. Inspecting the ranges of UV that correspond to a unique texture coordinate we will find that the texturing process is performing exactly as it should, but that we as programmers expect it to do something illogical.

What our algorithm does is map each UV coordinate range [1.0/N * i, 1.0/N * (i+1)), where i is a discrete number, to a unique texture coordinate for every unique i, or so we might expect. However, both i = N and i = 0 result in the texture coordinate 0, which is something we do not always want.

You might be tempted to say that rounding is the solution, but the problem does not exactly stem from rounding per se. It is rather caused by a discrepancy between the valid range of UV coordinates and the valid range of texture coordinates. Notice the difference in range notation between the two coordinate systems. UV coordinates are valid for the real value 1.0, but texture coordinates are not valid for the real value N; they are, however, valid for all real values below N, since the built-in floating point to integer conversion in C and C++ simply discards the decimals.

So how can we actually solve the problem? The answer is UV hinting (I made up that term). UV hinting is a simple process in which the rasterizer adds or subtracts a fraction of the relevant UV interpolation deltas from the UV coordinates used for interpolation. This makes sure that UV coordinates are interpolated within the range (a, b) rather than [a, b] and, ultimately, avoids the edge cases where 1.0 is converted to N.

The result speaks for itself:


So, then the question becomes what fraction should be added or subtracted? Well, the answer is fairly simple. Instead of sampling texture coordinates at the edges of the screen pixels, you instead sample from the center of the pixels, thereby shifting the UV coordinates that are sampled. This means that when interpolating a scanline with the end-point U coordinates [1.0, 0.0] (which means that interpolation starts from 1.0 and decrements until 0.0 is reached) between the screen pixels 0 and 100 you will start sampling from the U coordinate 1.0 - 1.0 / (100 - 0) * 0.5 = 1.0 - 0.005 = 0.995. Then this starting point is interpolated for every pixel in the screen x-axis using the delta (0.0 - 1.0) / (100 - 0) = -0.01.

The end point is also shifted in the same direction as the starting point. However, since we should be stopping one pixel short of hitting the end-point [for (int x = x1; x < x2; ++x)] we will not overshoot, or even hit, the end-point of the valid UV range. Generalized, the entire interpolation for UV coordinates inside the scanline function, with x being the pixel offset from the start of the scanline, should look like the following:

u = u1 + (u2 - u1) / (x2 - x1) * 0.5 + (u2 - u1) / (x2 - x1) * x
v = v1 + (v2 - v1) / (x2 - x1) * 0.5 + (v2 - v1) / (x2 - x1) * x

Remember that this correction will also have to be done for the y-axis on the screen (this will most likely occur inside the triangle edge interpolation stage rather than the scanline interpolation stage).
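
As a sketch (not the original renderer's code), the inner scanline loop with this correction applied might look like the following, where texture and screen stand in for whatever texture and framebuffer abstractions the rasterizer already uses:

// Scanline span [x1, x2) with UV end points (u1, v1) -> (u2, v2).
void DrawScanline(int y, int x1, int x2, float u1, float v1, float u2, float v2)
{
  const float uDelta = (u2 - u1) / float(x2 - x1);
  const float vDelta = (v2 - v1) / float(x2 - x1);

  // Shift the starting point by half a delta so that sampling happens at
  // pixel centers rather than pixel edges.
  float u = u1 + uDelta * 0.5f;
  float v = v1 + vDelta * 0.5f;

  for (int x = x1; x < x2; ++x) {
    const int texX = int(u * texture.GetWidth()) % texture.GetWidth();
    const int texY = int(v * texture.GetHeight()) % texture.GetHeight();
    screen.SetPixel(x, y, texture.GetTexel(texX, texY));
    u += uDelta;
    v += vDelta;
  }
}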

Saturday, December 21, 2013

Extending C++, The “const public” Section

A few years ago I came into contact with a nifty language called C#. For situations where the language itself will not be the bottleneck, C# can prove a huge improvement in productivity over C and C++. Still, I mainly program in C++ as I often dabble in graphics engines, where the language can sometimes have far too big an impact to disregard. C# does however carry some really cool ideas that I would have wanted C++ to have. One idea in particular strikes me as a feature that I sorely miss.

The concept is called “properties” in C#. Basically, they are an elegant way to eliminate accessor and mutator functions (more commonly known as “getters” and “setters”). What these properties do is hide the fact that the user of an object is effectively calling a function that returns a private field, by making it look like the user is simply reading or writing data directly.



However, since properties are essentially functions, these functions may perform any number of operations before reading or writing. Sometimes, there lies a really costly algorithm beneath calling a property that the programmer might be unaware of. This is generally considered bad practice, but not prohibited.

In C++, I would write the above code differently. If I am absolutely certain that the accessors are pure getters and setters, free from indirect effects, I would simply declare the data public and be done with it. However, sometimes declaring something public might be too invasive an action. Sometimes, you just want a programmer to be able to read, but not write, from a public scope. You could write a function that returns the data, but that results in more code (not by a lot) and function calls all over the place. You could achieve this by declaring the data itself const, but that would prohibit the object it belongs to from manipulating it save for at the time of construction. Because of this I thought of a construct that takes the best from both worlds: the “const public” section.


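The idea, in hypothetical syntax, together with the closest approximation I can think of in standard C++ (class and member names made up for illustration):

// Hypothetical syntax (not valid C++): data readable from any scope, but
// writable only by the class that declares it.
//
// class Timer
// {
// const public:
//   float elapsed;
// };
//
// An approximation in standard C++: keep the member private and expose a
// public const reference to it.
class Timer
{
private:
  float m_elapsed;

public:
  const float &elapsed; // read-only alias, visible from any scope

  Timer() : m_elapsed(0.0f), elapsed(m_elapsed) {}

  void Tick(float dt) { m_elapsed += dt; } // the class itself can still write
};

The reference trick costs extra storage per object and suppresses the compiler-generated assignment operator, which is part of why a dedicated language construct would be preferable.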

The “const public” section is essentially only a subset of the concept of properties by only exposing a bare bones way of reading data from any scope. Therefore, the construct still falls in line with the low-level thinking that pervades C++. By extension, a “const protected” section might also make sense, where data is not readable or writable publicly, read-only from a subclass, and readable and writable from the base class that declares it.

Initially, I thought this was a clear win for the language as a whole. However, if you are a somewhat well versed C++ programmer, or you come from any object oriented language background, you have probably already spotted the flaw of the construct: encapsulation, or more specifically, the way it breaks encapsulation. This is where I am currently struggling to reconcile my own differing views. On the one hand, a good interface is agnostic to the implementation, where if the implementation changes the interface remains the same, and thus keeps the amount of code changes that need to be made to a minimum. On the other hand, C++ is a very special kind of language. It sails the seas between different programming paradigms by being flexible enough for the programmer to craft her own paradigms to adhere to.

Extending a language should always be done with caution. The main question is, will a const public section result in a big enough improvement, or detriment, to the language as a whole to be considered worth the effort? My guess is that it will have a minimal impact either way, but can prove to be a major convenience when the situation arises. Regardless, const public sections are not intended to be abused, and should be approached with the same caution as declaring data public. However, const public might at the same time be a better alternative to the programmer than just declaring data public.

Tuesday, December 10, 2013

The Illusive Multiplication Operator


As you may know, vectors define a multitude of operations. Three of these operations are commonly used as a basis to overload the multiplication operator in C++: component-wise multiplication, the cross product, and the dot product. However, there seems to be no actual consensus regarding which operation to overload the operator with. Most often, the multiplication operator is overloaded to mean the dot product. However, some people prefer to overload it to mean the cross product, which is a completely separate concept.

Some, myself included, have always preferred to define the multiplication operator as component-wise multiplication, as there is too much confusion regarding whether the operator means the cross or the dot product, and we 'bypass' the problem by simply not having the operator defined as either. For those operations it is better to explicitly call functions named after what they do, so as to eliminate any confusion. Overloading the multiplication operator to mean component-wise multiplication is also more in line with vector subtraction and addition, and becomes very useful for various forms of interpolation, not to mention its usefulness with regards to color arithmetic.
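
As a sketch of the explicitly named operations (invented free functions, not any particular library's API):

struct Vec3 { float x, y, z; };

// Explicitly named operations leave no room for misreading a '*'.
inline float Dot(const Vec3 &a, const Vec3 &b)
{
  return a.x*b.x + a.y*b.y + a.z*b.z;
}

inline Vec3 Cross(const Vec3 &a, const Vec3 &b)
{
  Vec3 r = { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
  return r;
}

inline Vec3 Scale(const Vec3 &a, const Vec3 &b) // component-wise multiplication
{
  Vec3 r = { a.x*b.x, a.y*b.y, a.z*b.z };
  return r;
}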

However, a few months ago Jonathan Blow tweeted a simple insight that struck me as profound.


Actually, he is completely right: avoid all confusion by simply not overloading the multiplication operator at all. While I am sad to say that I still implement my vector multiplication as a component-wise multiplication, I feel that the point being made has affected me in other ways: if there is a risk of confusion in anything I do, I try to rethink the situation instead of perpetuating the problem.

Monday, December 9, 2013

The Importance of a Stable Clipping Algorithm

I recently started working on a new project involving a software renderer, while at the same time trying out rendering techniques that I never got around to learning before. One such technique is the binary space partitioning (BSP) tree. While the concept of BSP trees is fairly straightforward, it surfaced an error in some old code that I imported into the project, namely the code that clips polygons against a single plane and outputs two polygons: one polygon wholly in front of the clipping plane, and one polygon wholly behind the clipping plane.

The general idea behind the creation of a BSP tree is to recursively subdivide the geometry by picking an existing polygon's hyperplane to split the geometry by, and sorting the polygons into three lists: a list for geometry in front of the hyperplane, a list for geometry behind the hyperplane, and a list for geometry that is coplanar in relation to the hyperplane. This process then recursively repeats for each of the newly created lists, except for the coplanar list which is already sorted. The exact stopping condition is determined by the programmer, although common stopping conditions include lists that only contain a single polygon, or lists that only contain a certain minimum number of polygons. Care has to be taken when choosing the hyperplane to split by so as to keep the tree from becoming too deep, or unbalanced. This is a science of its own and will be discussed in some other context.

When determining which list a polygon should be sorted into there can be four separate cases: a polygon is entirely in front of the hyperplane, a polygon is entirely behind the hyperplane, a polygon is coplanar in relation to the hyperplane, or a polygon intersects the hyperplane. For this last case a clipping algorithm is used. This algorithm literally splits a polygon into two parts, one part wholly in front of the hyperplane and one part wholly behind the hyperplane, and then sorts the resulting polygons into their respective lists for further processing. This is the situation where a stable clipping algorithm becomes very important.

A stable clipping algorithm should never clip a polygon by a plane that it has already been clipped by once. More precisely, it should not generate a polygon on the opposite side of the plane, since that part has already been clipped away from the input polygon. An unstable clipping algorithm might generate two polygons despite the fact that such a polygon should not exist. For view frustum clipping a stable algorithm is not all that important, since the polygons behind the clipping planes are discarded; as such, I was never able to identify the error earlier. For BSP tree generation, however, an unstable clipping algorithm will almost certainly cause an infinite loop by constantly clipping the polygons that share an edge with the hyperplane.

I eventually came up with an algorithm that looks like the following:

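One way to implement the clipping step, using invented Vec3 and Plane types (a sketch along the lines of the rundown below, not necessarily the exact code the project uses):

#include <vector>

struct Vec3 { float x, y, z; };

inline Vec3 operator+(const Vec3 &a, const Vec3 &b) { Vec3 r = { a.x + b.x, a.y + b.y, a.z + b.z }; return r; }
inline Vec3 operator-(const Vec3 &a, const Vec3 &b) { Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }
inline Vec3 operator*(const Vec3 &a, float s) { Vec3 r = { a.x * s, a.y * s, a.z * s }; return r; }
inline float Dot(const Vec3 &a, const Vec3 &b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

struct Plane { Vec3 point, normal; };

// Returns the part of 'poly' in front of 'plane'. The part behind the
// plane is obtained by calling the function again with the normal negated.
std::vector<Vec3> ClipFront(const std::vector<Vec3> &poly, const Plane &plane)
{
  std::vector<Vec3> out;
  out.reserve(poly.size() + 1); // clipping adds at most one vertex

  // Pre-calculate the signed distance from every vertex to the plane.
  std::vector<float> dist(poly.size());
  bool anyInFront = false;
  for (size_t i = 0; i < poly.size(); ++i) {
    dist[i] = Dot(poly[i] - plane.point, plane.normal);
    if (dist[i] > 0.0f) { anyInFront = true; }
  }
  // Nothing strictly in front: return an empty polygon rather than a
  // degenerate sliver. This is what keeps the clipper stable.
  if (!anyInFront) { return out; }

  for (size_t i = 0; i < poly.size(); ++i) {
    const size_t j = (i + 1) % poly.size(); // next vertex, wrapping around
    // Keep vertices that are in front of, or coinciding with, the plane.
    if (dist[i] >= 0.0f) { out.push_back(poly[i]); }
    // If the edge strictly crosses the plane, add the intersection point.
    if ((dist[i] > 0.0f && dist[j] < 0.0f) || (dist[i] < 0.0f && dist[j] > 0.0f)) {
      const float t = dist[i] / (dist[i] - dist[j]);
      out.push_back(poly[i] + (poly[j] - poly[i]) * t);
    }
  }
  return out;
}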

The basic rundown of the algorithm is as follows. First, make room for the output polygon. Then pre-calculate all of the input vertices' signed distances to the plane. If all vertices are either behind the plane, or coinciding with the plane, then return an empty output polygon. The next part is the actual clipping. This requires comparing two connected vertices' distances. If the first vertex is in front of the plane, and the other is behind or coinciding with it, or if the first vertex is behind or coinciding with the plane, and the other is in front of it, then linearly project the first vertex to coincide with the plane. Additionally, if the first vertex is in front or coinciding with the plane then add it to the array of output vertices. A counter keeps track of the number of output vertices generated so far. Outside of the loop this counter will be used to re-size the output vertex array to correspond to the number of generated vertices. In order to get the polygon on the opposite side of the clipping plane, simply negate the plane normal and apply the algorithm to the original input data.

Remember that the output polygon has to be able to hold room for one more vertex than the number of input vertices since clipping against a single plane can logically only output that many vertices at most. Alternatively, you could use a list as the output data structure to avoid preallocation and resizing of data. If your algorithm potentially outputs more than one extra vertex then your algorithm is unstable and you should reconsider using it.

While the algorithm now ideally should work as expected, we are working with a finite number of bits for our floating point types. This means that in practice the algorithm will not work reliably for anything but axis-aligned planes. This is where numerical stability becomes important. Implementing numerical stability in this case usually means nothing more than allowing some leeway in the classification of points as being either behind, in front, or coinciding. Therefore, the general algorithm remains exactly the same, with only a few adjustments in the implementation:

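For instance, a small tolerance can replace the exact comparisons against zero (again only a sketch; the epsilon needs tuning to the scale of the geometry):

static const float PLANE_EPSILON = 1e-5f;

// Classify a signed distance as in front (+1), behind (-1), or coinciding (0).
inline int Classify(float dist)
{
  if (dist >  PLANE_EPSILON) { return  1; }
  if (dist < -PLANE_EPSILON) { return -1; }
  return 0;
}

// Inside the clipping loop the tests then become, for example:
//   if (Classify(dist[i]) >= 0) { out.push_back(poly[i]); }
//   if (Classify(dist[i]) * Classify(dist[j]) < 0) { /* add the intersection point */ }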

I wish to express some degree of caution. While the code works as expected for any input that I have given it, I can not guarantee it to work for all possible inputs. Even I can err. As such, there may yet be some classes of cases that the above algorithm does not handle properly. If so, my apologies.

Resources

BSP Tree Frequently Asked Questions (FAQ) - Question 8 
Clipping a Convex Polygon Against a Plane

Sunday, December 8, 2013

What is a Voxel Renderer?

I often see both scholars and novices alike point out what a voxel engine is and what it is not. 'Minecraft does not use voxels', 'Resogun does not use voxels', and '3D Dot Heroes does not use voxels' are tossed around a lot. I agreed with them for a long while, basing my opinion on the fact that many, if not all, of the examples thrown around are hardware accelerated and as such are rendered using triangles as rendering primitives. After actually writing my own voxel renderer I came to a different conclusion about what constitutes a voxel renderer by separating the term 'voxel', which is a spatial data structure, from 'renderer', which is an algorithm used for the visualization of spatial data. In this post I will attempt to show the logic of that argument by examining the individual terms of 'voxel renderer', in the hope of showing people that the concept is far less binary than originally thought.

The first component of the term 'voxel renderer' is of course 'voxel'. Thus, we should have a look at what the basic concept of voxels means. Simply put, voxels are volumetric data stored in a three dimensional grid, meant to convey a three dimensional image. It can therefore be said that a voxel is to three dimensions what a pixel is to two dimensions. Therefore, a way to think of a single voxel is to imagine a singularly colored, axis aligned cube (although it is ultimately the renderer that decides how to visualize the input data). It should also be noted that a cube is really only a convenient way of imagining a voxel. A cube is a volume defined by eight corner points in an unrestricted three dimensional space, while a voxel is a single grid cell in an axis aligned space. As such, cubes are free to move around the space, while a voxel is forever confined to the location of the cell in which it resides. Because of this we can really only say that a voxel is represented by a cube rather than actually being a cube.

Voxels can often contain more data than a color. They can also contain many properties that describe the surface that the voxels represent, such as surface normals and roughness. However, voxels do not need to contain information about position or size, since their position is defined by their relation to one another in the voxel grid, while their size is homogeneous along the individual axes. Voxel volumes can still be made to rotate and move independently of other, separate voxel volumes by attaching individual transformation parameters to each volume for the engine to regard when handling the data.

Given the information provided above it should be obvious that any game storing spatial data in a three dimensional grid can be said to use voxels in one form or another. Yet another interesting facet of voxels is the rendering of voxels. A game engine encompasses many components that work together in tandem. Undeniably, one of those components is the renderer, which is often the single most characteristic and generally identifiable component in an engine. 'Renderer' is an umbrella term for all kinds of visualization algorithms: rasterizers, ray casters, ray tracers, splatters and so on, but it also has a much broader significance by encompassing things like transformation, hidden surface removal, and lighting and shadowing, amongst other things.

Taken at face value, 'voxel renderer' does not make any pretense to dictate the actual algorithm used in the rendering process. In fact, it does not dictate anything except the predominant use of voxels in the process. A predominant use of voxels should be an obvious prerequisite when determining if a renderer is a voxel renderer, since the term voxel is used as a classification of the renderer. Classifying by something other than a main characteristic feature would be misleading. However, 'renderer' is a much too open ended term to actually mean much (try substituting the word 'pixel' for 'voxel' in 'voxel renderer', and it should become abundantly clear just how little meaning the term carries on its own). Additionally, 'voxel renderer', as it stands, has not been used in such a way as to load it with a specific meaning regarding what method of delivery is employed in the rendering process. Therefore, the only prerequisite for a correct use of the term 'voxel renderer' is the use of voxels as the main source of data used for rendering.

This leads to the question 'when are voxels considered the main source of rendering information?'. It is common knowledge that modern day graphics cards provide an ad hoc solution for rendering triangles. Barring the use of OpenCL or the like, any rendering algorithm that seeks to accelerate the process via hardware ultimately has to render primitives as triangles. However, a voxel is converted into triangles merely as a final step in the rendering process, which arguably is the smallest segment of the rendering pipeline. Aside from that, the mere fact that a voxel is rendered as a set of triangles does not mean that the end result is not a voxel. Rasterization is merely a delivery method for rendering voxels and should not, in and of itself, exclude a renderer that otherwise predominantly uses voxels from being considered a voxel renderer.

In conclusion, voxels are a form of spatial data, while rendering is a form of visualization algorithm. Voxel rendering is a form of algorithm (unspecified) used for the visualization of voxels. Thus, many different things can be considered voxel renderers, as long as there is a predominant, albeit not necessarily persistent, use of voxels throughout the renderer so as to not undermine the classifying term. Other than that, the definition is very open ended and may be used liberally until practice establishes a narrower use for the term.