3D Engine 05

Swine'z pages

Czech Welcome Guff Coding 3D - Engine Guestbook Links Downloads

About author Mail me Mailform

3D Engine 5 - Z-buffer and texture mapping box_en

Hallelujah ! Believe in god ? Praise him for next part of 3d-engine serial. I'm here again with new topic (and lots of jams, but i'm not here to share these ...) Today i'd like to tell you how does Z-buffer and texture mapping work.
Now it's time to thank to Mats Byggmastar a.k.a. MRI / Doomsday for his didactic (;-)) document called "Fatmap" (Fast affine texture mapping), you can find original on the internet. It's much more optimalisated, but unfortunately it's not perspective correct (some of you don't know what that is, but you soon will - so get to it !)
I think the first one who did realtime texturemapping was >doommaker< mr. Carmack, who could optimalise it such way so we could play wolfenstein or doom even at these slow computers. Maybe you're thinking about connection of textures and z-buffer - it's simple : interpolation. Z-buffer is memory of same size as screen, but inside are z-values of pixels. When you're about to draw pixel, you compare it's z-value with z-value of previously drawn pixel and when it's nearer, you overdraw old pixel (and update z-value). It's quite simple, but quite slow and since z-buffer requires floats also quite big in memory (but both problems was solved with 3d-cards) Big advantage of z-buffer is perfect object intersection :

With invention of z-buffer also begins our everlasting fight with fillrate. Fillrate is amount of pixels you're able to draw in a second. With z-buffer you have to compute z-value of each pixel and compare it with the older. That's also quite big amount of data to be transfered. Now, we need some function to tell what the z-value in our particular pixel is. We will use the plane equation again. What plane ? Our plane is defined by three points of face and their coordinates are perspective correct x and y and linear (just after transformation) z. That's also connection with texturemapping. We will have two more planes for u and v coordinate, that tells us position of pixel in texture ! Now we will look at the plane equation :

We can tell how much Z is :

Now we can see that some numbers can be precomputed :

dzdx = - a / c dzdy = - b / c begz = - d / c

Where dzdx is delta-z-per-delta-x, dzdy is delta-z-per-delta-y and begz is z-coordinate in point x = 0, y = 0 (upper left screen corner). Now it's simple - for every next line add dzdy, for every pixel on that line add dzdx and we have our z. To find begz shouldn't be hard :

So we have our first c-function today :


struct Vertex {
    float x, y, z;
    float u, v;
    //
    float illum;
    //
    Vector normal;
};

struct Polygon {
    int n_vertex_num;
    Vertex vertex[7];
};

float didx, didy, begi;
// i for illumination

int Compute_Interpolation_Consts(POLYGON *p)
{
    float x[2], y[2], l[2];
    float a, b, c;
    
    x[0] = p->vtx[1].x - p->vtx[0].x;
    x[1] = p->vtx[2].x - p->vtx[0].x;

    y[0] = p->vtx[1].y - p->vtx[0].y;
    y[1] = p->vtx[2].y - p->vtx[0].y;
    
    l[0] = (float)p->vtx[1].illum - (float)p->vtx[0].illum; 
    l[1] = (float)p->vtx[2].illum - (float)p->vtx[0].illum;
    // vektory lying on our imaginary plane ...

    a = y[0] * l[1] - y[1] * l[0];
    b = l[0] * x[1] - l[1] * x[0];
    c = x[0] * y[1] - x[1] * y[0];
    // cross product - calculate normal

    if(c == 0) 
        return 0;
    
    didx = - a / c;
    didy = - b / c;
    // according to equations above ...

    begi = (float)p->vtx[0].illum - p->vtx[0].x * didx - p->vtx[0].y * didy;
    // faster sollution for - d / c

    return 1;
}

int DrawPolygon(Polygon *p, unsigned __int32 *video, unsigned __int32 color)
{
    int start_y, left_x, right_x;
    int didx_fp, k, r_k;
    float illum;
    int l, c;

    if(!Create_Edges(p)) 
        return 0;

    if(!Compute_Interpolation_Consts(p)) 
        return 0;

    start_y = p_right_edge->y;
    left_x = p_left_edge->x;
    right_x = p_right_edge->x;
    // first edge properties ...

    video += n_Width * start_y;
    // set pointer to first scanline

    illum = begi + didy * start_y;
    // illumination of first pixel of first scanline

    didx_fp = (int)(didx * 65536);
    // 16:16 fixed point

    while(1) {
        l = left_x >> 16;
        c = right_x >> 16;
        // back to int

        k = (int)(illum * 65536) + c * didx_fp;
        // 16:16 fixed point

        for(; c < l; c ++) {
            r_k = k >> 16;
            // back to int ...

            if(r_k > 255)
                r_k = 255;
            if(r_k < 0)
                r_k = 0;
            // check limits ... (shouldn't be here, but our
            // rasterizer isn't sub-pixel accurate)

            video[c] = ((((color & 0xff0000) * r_k) & 0xff000000) +
                (((color & 0x00ff00) * r_k) & 0x00ff0000) +
                (((color & 0x0000ff) * r_k) & 0x0000ff00)) >> 8;
            // mix color
            
            k += didx_fp;
            // next pixel
        }
        // segment drawing loop

        video += n_Width;
        illum += didy;
        // next scanline

        if(-- p_left_edge->num == 0) {
            if(p_left_edge -- == left_edge_buff)
                return 1;
            // check for end of polygon
            left_x = p_left_edge->x;
        } else
            left_x += p_left_edge->dxdy;

        if(-- p_right_edge->num == 0) {
            p_right_edge ++;
            right_x = p_right_edge->x;
        } else
            right_x += p_right_edge->dxdy;
    }

    return 1;
}

It's not going to be this simple, because of perspective distortion. You can see that texturemapping (ie. any surface-interpolated value) won't be linear :

You can see that nearer quads are bigger than the far-off ones ...

But i can't make you down. So i'll put here smooth shading example. It's practically the same like before, but we simply average neighbour faces normals to get vertex pseudonormals, that are used for lighting. So we have lighting value per vertex, not per face like before. (note we must include illumination to edge-splitting function in clipping pyramid !) You can see function ComputeVertexNormals() in load3ds.cpp module. The example is running with directx 7, the only difference is that you must link ddraw.lib and dxguid.lib. (and speed, of course) :

Smooth engine - Font engine + Antialiassing

Hope you like it :-), i had to do some work ... It's still using painter's algorithm, simple font-engine is included and it's using something-like quincunx for image antialiassing. But now we move on ...

Perspective correct mapping

I already told you mapping won't be linear. But how to make it linear ? We will use the fact that z itself is not linear, but 1 / z is linear. When interpolating more coordinates, we will have to multiply them by 1 / z. But that means we will have to recompute them back when drawing ... well, we wouldn't have to do it with z, we would just inverted comparation sign when drawing (far objects will have bigger z than near, but smaller 1 / z !) But when texturemapping we would have to do two divisions per pixel ! (assume 2d texture) Allright, if you wan't write high qulity rasterizer you will have to. But for realtime maniacs like me (and some of you) we have a simple optimalization. (don't say Carmack, don't say Carmack ... well it was propably him ...) We will do division each 16-th pixel and between them just linearly interpolate with certain error, that will be bigger on short and dip surfaces :

We won't normally recognize that error, only when looking parallel with some surface when close to it. (but collision detection can fix some of that case simply by don't letting camera to be too close) Now look at some code :


int n_Width, n_Height;
// screen dimensions

__int32 *z_buffer;
// allocated to size n_Width * n_Height

float dzdx, dzdy, begz;
// mapping constants (only Z)

int Compute_Interpolation_Consts(Polygon *p_poly)
{
    float x[2], y[2], z[2];
    float a, b, c;
    
    x[0] = p_poly->vertex[1].x - p_poly->vertex[0].x;
    x[1] = p_poly->vertex[2].x - p_poly->vertex[0].x;

    y[0] = p_poly->vertex[1].y - p_poly->vertex[0].y;
    y[1] = p_poly->vertex[2].y - p_poly->vertex[0].y;
    
    z[0] = (1 / p_poly->vertex[1].z) - (1 / p_poly->vertex[0].z); 
    z[1] = (1 / p_poly->vertex[2].z) - (1 / p_poly->vertex[0].z);
    // our vectors
    // notice 1/z is used in place of z

    a = y[0] * z[1] - y[1] * z[0];
    b = z[0] * x[1] - z[1] * x[0];
    c = x[0] * y[1] - x[1] * y[0];
    // cross product

    if(c == 0) 
        return 0;
    
    dzdx = - a / c;
    dzdy = - b / c;
    // interpolation constants ...

    begz = (1 / p_poly->vertex[0].z) -
        p_poly->vertex[0].x * dzdx -
        p_poly->vertex[0].y * dzdy;
    // z-value in top left corner

    return 1;
}
// so we have interpolation constants ...
// now how to use them :

void Interpolate_Segment(int z1, int dz, unsigned __int32 *video,
    unsigned __int32 *v2, __int32 *z_buff, unsigned __int32 color)
{
    do {
        if((unsigned)(*z_buff) > (unsigned)z1){
            *video = color;
            *z_buff = z1;
        }
        // pixel is nearer than before then
        // draw to both color and z-buffer

        video ++;
        z_buff ++;
        // move to next pixel

        z1 += dz;
        // z - interpolation
    } while(video < v2);
    // it's faster to compare pointers than
    // compare i and extra i ++ when using for
}

void Draw_Segment(int c, int l, float z, __int32 *z_buff,
    unsigned __int32 *video, unsigned __int32 color)
{
    int dz, z1, z2;

    z1 = (int)(0x10000 / z);
    // back to linear values in 16:16 fp

    while(c >= 16) {
        z += dzdx * 16;
        // every 16th pixel for optimalisation purposes

        z2 = (int)(0x10000 / z);
        // next linear z

        dz = (z2 - z1) / 16;
        // compute delta-z for interpolation
        // along our segment (in fact it's linear dzdx)

        Interpolate_Segment(z1, dz, video, video + 16, z_buff, color);
        // call drawing loop

        c -= 16;
        // reduce pixel count by 16
        
        z_buff += 16;
        video += 16;
        // shift pointers to next 16 pixels

        z1 = z2;
        // clever saving of computations
    }

    if(c > 0) { // draws the rest of segment (less than 16 pixels)
        z += dzdx * c;

        z2 = (int)(0x10000 / z);

        dz = (z2 - z1) / 16;

        Interpolate_Segment(z1, dz, video, video + c, z_buff, color);
    }
}

int DrawPolygon(Polygon *p, unsigned __int32 *video, unsigned __int32 color)
{
    int start_y, left_x, right_x;
    __int32 *z_buff;
    int l, c;
    float z;

    if(!Create_Edges(p)) 
        return 0;

    if(!Compute_Interpolation_Consts(p)) 
        return 0;

    start_y = p_right_edge->y;
    left_x = p_left_edge->x;
    right_x = p_right_edge->x;
    // first edge properties

    video += n_Width * start_y;
    z_buff = &z_buffer[n_Width * start_y];
    // set video and z-buffer to the first line of raster

    z = begz + dzdy * start_y;
    // Z on first scanline

    while(1) {
        c = left_x >> 16;
        l = right_x >> 16;
        // back to int

        c -= l ++;

        if(c > 0)
            Draw_Segment(c, l, z + dzdx * l, z_buff + l, video + l, color);
        // draw our segment

        video += n_Width;
        z_buff += n_Width;
        z += dzdy;
        // move to next scanline

        if(-- p_left_edge->num == 0) {
            if(p_left_edge -- == left_edge_buff)
                return 1;
            // end of buffer is end of polygon
            left_x = p_left_edge->x;
        } else 
            left_x += p_left_edge->dxdy;

        if(-- p_right_edge->num == 0) {
            p_right_edge ++;
            right_x = p_right_edge->x;
        } else 
            right_x += p_right_edge->dxdy;
    }

    return 1;
}

void Clean_ZBuffer()
{
    int i, m;

    m = Width * Height;

    for(i = 0; i > m; i ++)
        z_buffer[i] = 0xffffffff;
    // fills z-buffer with maximal values (= very far)
    // (if you fill it with zeros = near, nothing will be drawn !)
    // don't norget to call otherwise it will draw rubbish !
}

Have a look at the picture :
z-buffer engine

Z_Buffer + Flat shading

And we have engine with z-buffering. Works pretty cool and accurate (and slow). Major disadvantage of z-buffer is overdraw. Some pixels could (and will be) overdrawn even arround 10 times ! So in the next part we will introduce the BSP tree, used also to precompute visibility. Maybe you ask - well, i have 3D card, i don't need that. But when you do some optimalisations, you will be able to afford for example more detail actor models ...
But now the textures ... Maybe you noticed the font engine contains function for bitmap loading. I will put it here just to be complete :


#include <windows.h>
#include <stdio.h>
// in windows.h are some bitmap structures
// (to be concrete BITMAPFILEHEADER and BITMAPINFOHEADER)

struct BMP {
    unsigned __int32 *bytes; // pixels - 32 bpp
    int width, height;       // dimensions of bitmap
};

BMP *Load_TrueColorBMP(char *path)
{
    BITMAPFILEHEADER bmfh;
    BITMAPINFOHEADER bmih;
    int Width, Height;
    char *lpbBuf;
    int BuffLen;
    FILE *BMPfp;
    BMP *tmp;
    int i, j;
    
    if((BMPfp = fopen(path, "rb")) == NULL)
        return NULL;

    fread(&bmfh, sizeof(bmfh), 1, BMPfp);
    fread(&bmih, sizeof(bmih), 1, BMPfp);
    if(bmfh.bfType != ('B'|('M' << 8)))
        return NULL; 

    if(bmih.biBitCount == 24 && bmih.biCompression == BI_RGB) {
        if((tmp = (BMP*)malloc(sizeof(BMP))) == NULL)
            return NULL;

        Width = bmih.biWidth;
        Height = bmih.biHeight;

        tmp->n_Width = Width;
        tmp->n_Height = Height;
        if((tmp->bytes = (unsigned __int32*)malloc(Width * Height * 4)) == NULL)
            return NULL;
        BuffLen = ((Width * 3 + 3) >> 2) << 2;

        if((lpbBuf = (char*)malloc(BuffLen)) == NULL)
            return NULL;

        for(i = Height - 1; i >= 0; i --) {
            if((j = fread(lpbBuf, 1, BuffLen, BMPfp)) != BuffLen)
                return NULL;
            for(j = 0; j < Width; j ++) {
                tmp->bytes[j + Width * i] = RGB(lpbBuf[j * 3],
                    lpbBuf[j * 3 + 1], lpbBuf[j * 3 + 2]);
            }
        }
    } else
        return NULL;

    free(lpbBuf);
    return tmp;
}

I won't go deep ... first read headers, check if it's really bitmap, if it's 24bpp and uncompressed and then alloc memory and read pixels.
So, you should know how to make perspective non-correct mapping. But how to make perspective correct ? We will certeinly want to use linear interpolation, because it's fast. So we will make from non-linear values linear values. How ? By multiplying 1 / z :


int Compute_Interpolation_Consts(Polygon *p_poly)
{
    float x[2], y[2], z[2], u[2], v[2];
    float a, b, c;
    
    x[0] = p_poly->vertex[1].x - p_poly->vertex[0].x;
    x[1] = p_poly->vertex[2].x - p_poly->vertex[0].x;

    y[0] = p_poly->vertex[1].y - p_poly->vertex[0].y;
    y[1] = p_poly->vertex[2].y - p_poly->vertex[0].y;
    
    z[0] = (1 / p_poly->vertex[1].z) - (1 / p_poly->vertex[0].z); 
    z[1] = (1 / p_poly->vertex[2].z) - (1 / p_poly->vertex[0].z);
    
    u[0] = 1 / p_poly->vertex[1].z * p_poly->vertex[1].u -
        1 / p_poly->vertex[0].z * p_poly->vertex[0].u;
    u[1] = 1 / p_poly->vertex[2].z * p_poly->vertex[2].u -
        1 / p_poly->vertex[0].z * p_poly->vertex[0].u;

    v[0] = 1 / p_poly->vertex[1].z * p_poly->vertex[1].v -
        1 / p_poly->vertex[0].z * p_poly->vertex[0].v;
    v[1] = 1 / p_poly->vertex[2].z * p_poly->vertex[2].v -
        1 / p_poly->vertex[0].z * p_poly->vertex[0].v;
    // our plane - parallel vectors
    // vectors u and v are multiplied by 1 / z

    a = y[0] * z[1] - y[1] * z[0];
    b = z[0] * x[1] - z[1] * x[0];
    c = x[0] * y[1] - x[1] * y[0];
    // cross product

    if(c == 0) 
        return 0;
    
    dzdx = - a / c;
    dzdy = - b / c;
    // interpolation consts for z (to be precise - 1/z)

    begz = (1 / p_poly->vertex[0].z) -
        p_poly->vertex[0].x * dzdx -
        p_poly->vertex[0].y * dzdy;
    // compute - d / c
    // Z (1/z)

    a = y[0] * u[1] - y[1] * u[0];
    b = u[0] * x[1] - u[1] * x[0];
    //c = x[0] * y[1] - x[1] * y[0];
    // c - component will the same for u, v and z

    dudx = - a / c;
    dudy = - b / c;
    // interpolation consts for U (u/z)

    begu = (1 / p_poly->vertex[0].z * p_poly->vertex[0].u) -
        p_poly->vertex[0].x * dudx - p_poly->vertex[0].y * dudy;
    // cumpute - d / c
    // U (u/z)
    
    a = y[0] * v[1] - y[1] * v[0];
    b = v[0] * x[1] - v[1] * x[0];
    //c = x[0] * y[1] - x[1] * y[0];
    // and same again ...

    dvdx = - a / c;
    dvdy = - b / c;
    // consts for V (v/z)

    begv = (1 / p_poly->vertex[0].z * p_poly->vertex[0].v) -
        p_poly->vertex[0].x * dvdx - p_poly->vertex[0].y * dvdy;
    // compute - d / c
    // V (v/z)

    return 1;
}

Next we use almost the same source for drawing polygons as for z-buffer (but interpolate u-s and v-s moreover), but we will need to modify routine for drawing segments :


int DrawPolygon(Polygon *p, unsigned __int32 *video)
{
    int start_y, left_x, right_x;
    __int32 *z_buff;
    float z, u, v;
    int l, c;

    if(!Create_Edges(p)) 
        return 0;

    if(!Compute_Interpolation_Consts(p)) 
        return 0;

    start_y = p_right_edge->y;
    left_x = p_left_edge->x;
    right_x = p_right_edge->x;
    // first edge properties

    video += n_Width * start_y;
    z_buff = &z_buffer[n_Width * start_y];
    // set video a z-buffer (just helper pointer,
    // not the global one !) to the first scanline

    z = begz + dzdy * start_y;
    u = begu + dudy * start_y;
    v = begv + dvdy * start_y;
    // first scanline coords ...

    while(1) {
        c = left_x > 16;
        l = right_x > 16;
        // back to int

        c -= l ++;

        if(c &ht; 0) {
            Draw_Segment(c, l, u + dudx * l, v + dvdx * l,
                z + dzdx * l, z_buff + l, video + l);
        }
        // draw horizontal segment

        video += n_Width;
        z_buff += n_Width;
        z += dzdy;
        u += dudy;
        v += dvdy;
        // move to next scanline

        if(-- p_left_edge->num == 0) {
            if(p_left_edge -- == left_edge_buff)
                return 1;
            // check for end of polygon
            left_x = p_left_edge->x;
        } else 
            left_x += p_left_edge->dxdy;

        if(-- p_right_edge->num == 0) {
            p_right_edge ++;
            right_x = p_right_edge->x;
        } else 
            right_x += p_right_edge->dxdy;
    }

    return 1;
}

void Draw_Segment(int c, int l, float u, float v, float z,
    __int32 *z_buff, unsigned __int32 *video)
{
    float dz, z1, z2;
    int du, u1, u2;
    int dv, v1, v2;

    z1 = (0x10000 / z);
    // back to linear, to 16:16 fp

    u1 = (int)(u * z1);
    v1 = (int)(v * z1);
    // u and v is de-linearized using z

    while(c >= 16) {
        z += dzdx * 16;
        u += dudx * 16;
        v += dvdx * 16;
        // for optimalisation purposes delinearize only each 16th pixel

        z2 = (0x10000 / z);
        // delinearize next z
        
        u2 = (int)(u * z2);
        v2 = (int)(v * z2);
        // u and v is de-linearized using z

        dz = (z2 - z1) / 16;
        du = (u2 - u1) / 16;
        dv = (v2 - v1) / 16;
        // compute deltas

        Interpolate_Segment(u1, v1, z1, du, dv, dz, video, video + 16, z_buff);
        // call drawing loop

        c -= 16;
        // substract 16 drawn pixels ...
        
        z_buff += 16;
        video += 16;
        // move to next pixels

        z1 = z2;
        u1 = u2;
        v1 = v2;
        // use precomputed values ...
    }

    if(c > 0) { // draws the rest of segment (less than 16 pixels)
        z += dzdx * c;
        u += dudx * c;
        v += dvdx * c; // won't cause big error when multiplied by 16

        z2 = (0x10000 / z);
        u2 = (int)(u * z2);
        v2 = (int)(v * z2);

        dz = (z2 - z1) / c;
        du = (u2 - u1) / c;
        dv = (v2 - v1) / c;
        // ... and divided (compiller will use shifts)

        Interpolate_Segment(u1, v1, z1, du, dv, dz, video, video + c, z_buff);
    }
}

So, Interpolate_Segment() gets coordinates of texture and z in 16:16 fixed point. We could normally multiply them to get the index of pixel, but there is simple trick how to make it faster. You propably know texture has to have size of power of two. So the multiplying can be replaced with bit shift and modulo division (because the texture can repeat multiple times on a single surface) by logical ands :


void Interpolate_Segment(int u1, int v1, int z1,
    int du, int dv, int dz, unsigned __int32 *video,
    unsigned __int32 *v2, __int32 *z_buff)
{
    do {
        if((unsigned)(*z_buff) > (unsigned)z1) {
            *video = _texture[((u1 >> 16) & 0xff) + ((v1 >> 8) & 0xff00)];
            *z_buff = z1;
        }
        // apply texture

        video ++;
        z_buff ++;
        // next pixel

        z1 += dz;
        u1 += du;
        v1 += dv;
        // coordinate interpolation
    } while(video < v2);
}

So, that's our oldloop from z-buffer code, but instead of color the texture is drawn. What the code means ? "u1 >> 16" is conversion of U (x-coordinate of texture) from 16:16fp to int and "& 0xff" means texture will be 256 pixels wide. "v1 >> 8" means conversion of V (y-coord) from 16:16fp to int and at the same time multiply by 256 to get the exact index in texture. The "& 0xff00" means texture will be 256 pixels high, but V is already multiplied by 256 ...
It shouldn't be hard to replace these numbers by variables and support textures of variable size, not only 256 per 256. So first ">> 16" will be always there, "& 0xff" will be replaced with "& (tex_width - 1)". ">> 8" will be replaced with ">> (16 - log₂(tex_height))" and final "& 0xff00" replace with "& ((tex_height - 1) * tex_width)". Just one advice - don't use log(tex_height) / log(2) to obtain logarithm with base of 2, it's sometimes inaccurate and texture will be displayed in half resolution. (the result is 7.9999, but when rounding, the result will be 7 ... - and using round() seems not good to me) Write your own log₂() based on some loop ...
So stop reading now, get your source and get outa here (it's gonna blow ! ;-)) :

z-buffer engine 2

Z_Buffer + Texturemapping

back

Valid HTML 4.01