Optimizing Voxels - Greedy Meshing pt. 2

tags:
  - post
  - voxel
  - gamedev
  - optimization
  - rust
date: 2025-11-13

Let's talk about rewriting our logic from Optimizing Voxels - Greedy Meshing pt. 1 to become a Binary Greedy Mesher and make it lightning fast.

How?

You may be wondering, "my data isn't binary, my voxels all have types!" This is true, but the mesh does not necessarily have to care about types. It can make texturing a bit more difficult, I will cover that some time in the future, but for now we can just proceed knowing that it's possible with potential side effects (as any decision tends to come with). The only thing the mesh needs to care about, with all other details stripped away, is if there is a voxel present or not. That can be represented as binary, and our processor will love us for it.

Interestingly, the logic is not that different from before. It is done in a different way, though, so that we can make use of bitwise operations. A quick aside though is that our chunks should be no larger than 32x32x32 or 64x64x64 for performance reasons. CPUs work very fast on 32bit and 64bit integers, so we want to make use of them here. The first step is to encode your chunk as binary data. You can continue to store the chunk with all its regular data and store this extra data along with it.

Encoding the bits

What we're going to need is 6 different arrays of integers, either 32 or 64bit depending on your chunk size, that hold information on whether the faces of each voxel are visible. Each voxel gets 1 bit in the integer that is 1 if the voxel is there and the face is visible, or 0 if the voxel is not there or the face is not visible. These arrays are a little more confusing than it seems. In order for our method to use bitwise as much as possible, the integers themselves should be aligned to an axis where the faces would connect and be touching; with the other axes just being dimensions of the array. I'll explain specifically why soon, just know for now that it will dramatically help us merge faces that are next to each other. But which axis? Well, there's a problem.

Consider, in a Y-up world, if we used the X axis as the axis our integers here cover. Faces pointing along the Z axis can utilize these integers well, since the edges themselves run along the X axis and thus the faces run together. This is also true for faces pointing along the Y axis. However, faces pointing along the X axis have edges running along the Z axis, and thus are never connected along the X axis. For this special case, our X axis faces actually have to utilize the integers as if each bit was along the Z axis instead. I would argue this is one of the biggest pain points in looking at and understanding the code if ignored, so consider designing around this problem to make it more clear. You should also know that this problem area can make non-cube voxel sizes more painful or even impossible.

So we store our face visibility as binary, 1 bit per voxel, X axis along the integer's bits except for faces pointing along the X axis which use the Z axis instead. What a mess, but what next? Now we can start translating our previous logic using this data. I'm going to use Rust's binary helper functions in this next section, so you may need to translate them to your own language. I'm also going to be using 64 bit integers, but it won't differ too much.

Rewriting our logic

Finding the start

Just like our first step in the previous logic, our first step here is to find where to start. Our last one looped through everything every time, which is slow. In general this can be sped up, even in our original logic, by keeping track of the last place we started. That was a bit too complicated for our previous logic, and a waste of time since we were going to replace it, but since we want speed we should implement it here. In our case here, it's actually handled for us by using a for loop. For the example, I will focus on a single array to keep it simple. It should be noted that if all of your arrays are the same size, they can all fit their logic into this single for loop.

let mut left_rows = chunk.face_bitmasks.left;
for i in 0..left_rows.len() {
	// ...
}

We actually deliberately copy the bitmask arrays because we need a working copy we can modify as we go. The reason we don't just loop over the array itself is we will utilize the index in various ways, not just to work with the other sides' arrays but for other details too.

Now we've found a row to start with, we need to find a bitwise position inside that row. So we grab our row, but we might need to run multiple times over our row before we're done processing it. A perfect case for a while loop.

let mut row = left_rows[i];
while row != 0 {
	// ...
}

Our plan here is to find a section of bits that we can greedy mesh with, and then when we're done we erase that data and check again if there's anything that remains to be done all the same. Once all the bits for the voxels in the row are set to 0 (processed), the whole row will be equal to 0. This is also great because CPUs love to compare against 0.

Now in our while loop, we can find a bitwise position to start from. Luck would have it that there are functions that can do this for us extremely quickly.

let offset = row.trailing_zeros();

This function returns the number of zeros from the end of the binary representation of the number. For example, if we had a binary number such as 0b1000, which is 8, trailing_zeros() would return 3. There are also functions for the other side of the binary representation, the leading_*() functions, but generally I believe it would be harder to work with them in our head and it doesn't affect much otherwise.

Getting the length and window

So now we have our starting position, defined by our row and offset into that row. Now we need to get the length by figuring out how many faces are connected. In this case, I like to call this a "window." If you guessed "I bet there's a fast built in function for this too," you'd be right! Just like how trailing_zeros() can count zeros, there is an equivalent trailing_ones() that can count ones on the bitwise level very quickly. However, if there are any trailing zeros in the way, we need to dispose of them. This is simple, you can just shift the int over by your offset.

let window_len = (row >> offset).trailing_ones();

Now we have 2 pieces of data about where our window of connected faces is, but we actually need the binary representation of the window itself for it to be useful for us. So we must construct it. I made my own little function to take the information and construct a binary window of zeros, for example if you fed it an offset of 1 and a len of 2, it would return 0b11111111111111111111111111111001. You could make one that constructs a window of ones instead if it makes more sense to you, I don't think it matters too much since we end up using both anyway.

/// Returns len zeros at offset bits
pub fn bit_window(offset: u32, len: u32) -> u64 {
	// Rust note: Unbounded is important here, or the shift will not always behave like we want it to. Specifically, it will not stop us from shifting into a 0 value. Otherwise it is equivalent to << or >>.
    let window_left = u64::MAX.unbounded_shl(offset + len);
    // 64 is our number of bits
    let window_right = u64::MAX.unbounded_shr(64 - offset);
    window_left | window_right
}

At this point, technically we've already processed this section of the row. So we can clear it with our zeros.

row &= window;

If you created a window of ones, you can just use !window (or binary inverse equivalent for your language) here of course. Consider storing it as its own variable though, since we're about to use it more. Let's just do that right now.

let cmp_window = !window;

In my case, this is now my window of ones.

Getting the width

Now let's find our width. This is trickier than it may seem, since our "width" will depend on which axis we're working with already. We want to work with the other axis where the faces are touching. In our case here, that would be the Y (up) axis. Since I'm working with 1D arrays to handle 3D data, there's a small bit of math to loop over the rows across the Y axis. There is probably a few ways to go about this, but I will just go over one. Let's get into it.

First, we need to know what Y level we're on already. This is pretty simple, divide our index by our size.

let y = i / 64;

In theory this is extremely fast since I'm using sizes that are a power of 2. Some compilers may even do this optimization for you, so you might not have to worry about it, but for powers of two instead of dividing you can use shifts to divide even faster. Our trailing_zeros() function comes back to save us here, if we need to do it.

// Alternative manual fast division
let y = i >> 64_u32.trailing_zeros();

Anyway, this y value doesn't change much, so we can even store it outside of our while loop.

Now we can loop over the Y axis based on our offset using some math to accumulate width. We need a for loop with a whole separate index here.

let mut width = 1;
for yi in 1..64 - y {
	// ...
}

Since our current row is technically yi = 0, we don't need to do it. That's why we start our range at 1. Then, we only need to cover the amount of rows that remain between our current Y and our maximum (64 in my case), which is simple subtraction. Then, we can use this to add a value to our i to get the next row that would be down/up from us by multiplying it by how many rows there would be in between in the array.

let cmp_i = i + (yi * 64);

Now, we can compare against it with our window and see if we can extend across it.

if left_rows[cmp_i] & cmp_window == cmp_window {
	width += 1;
	// unset the bits we added
	left_rows[cmp_i] &= window;
} else {
	break;
}

If our window doesn't match, we can't extend any further, so we break out of the loop. After our loop has ended, we have all of the data we need to draw the face. Our complete start position (xyz), width and length. All at blazing fast speeds. How you draw this is up to you, as those details may vary. At this point, you also have to figure out how you will pass your texture data over. I'll also write about that eventually.

Outro

That's just about it. Our bitwise greedy mesher is technically done, it just needs to handle all of the other faces. An exercise left to the reader is that the up and down faces do not extend their width across Y, and so the loop is slightly different. Hopefully this little guide is considered helpful, and good luck.