How to find the time complexity of 6502 Assembly Code

How to find the time complexity of 6502 Assembly Code

First of all, in order to run and assemble our 6502 assembler code, we would need a 6502 chip or an Emulator, in this blog post we will be using a 6502 Emulator which can be found here.

The Emulator looks something like this:

The Emulator has the following Sections:

  • Input Box - This is the box on the left and it is used for writing our assembly code

  • A Bitmap Display - This is the black display in the middle which is used for Graphical Output

  • Console Box - This is the box on the right which acts as the I/O for our Program

Let's start by finding the efficiency of the following source code written for the 6502 Assembler:

         LDA #$00    ; set a pointer in memory location $40 to point to $0200
         STA $40        ; ... low byte ($00) goes in address $40
         LDA #$02    
         STA $41        ; ... high byte ($02) goes into address $41
         LDA #$07    ; colour number
        LDY #$00    ; set index to 0
 LOOP:    STA ($40),y    ; set pixel colour at the address (pointer)+Y
         INY        ; increment index
         BNE LOOP    ; continue until done the page (256 pixels)
         INC $41        ; increment the page
         LDX $41        ; get the current page number
         CPX #$06    ; compare with 6
         BNE LOOP    ; continue until done all pages

What does this code do?

This code-snippet when run on the emulator colors the bitmap display bright yellow.

How to find the Run-Time of the program?

First Step: Calculate the number of processor cycles the program takes to complete.

We can do that by referring to 6502's Documentation, where the processor cycles for each command is specified. So let's start by calculating the number of cycles for each line of our code. Commands like BNE (branch on not equal) have different cycle counts depending if the result is zero or not.

LabelCommandCyclesCycle CountAlt CyclesAlt CountTotal
LDA #$00212
STA $40313
LDA #$02212
STA $41313
LDA #$07212
LDY #$00212
loop:STA ($40),y610246144
INY210242048
BNE LOOP31020243068
INC $415420
LDX $413412
CPX #$06248
BNE loop332111

So if we add all the values in this table, we would get the total number of cycles of the program.

$$2+3+2+...+8+11=11325$$

Second Step: Now that we have the total number of cycles we can easily get the run-time of the program by multiplying it by the time it takes for our 6502 processor to complete one cycle. The clock speed for our 6502 processor is 1 microsecond (that is it takes one microsecond to complete one clock cycle).

$$Runtime=Cycles\times ClockSpeed$$

Total CyclesRuntimeUnit
1132511325Micro Seconds
11.325Milli Seconds
0.011325Seconds

Now let us find ways to improve the runtime:

If we observe this table, the runtime of the whole program majorly depends on these 3 lines of code:

CommandCyclesCycles CountAlt CyclesAlt CountTotal Cycles
STA ($40),y610246144
INY210242048
BNE LOOP31020243068

Since this is a loop, we can try loop unrolling(Unrolling means to repeating the inner portion of the loop multiple times making multiple copies of the code. This takes additional space, but reduces the number of times that the loop control condition is evaluated (Tyler, 2024).) the loop. I have unrolled the loop 32 times but "loop unrolling is often beneficial as long as the unrolled loop fits into cache; unrolling past that point will reduce performance"(Tyler, 2024). So our code should look something like this after unrolling.

    LDA #$00       ; Set a pointer in memory location $40 to point to $0200
    STA $40         ; Low byte ($00) goes in address $40
    LDA #$02
    STA $41         ; High byte ($02) goes into address $41
    LDA #$07        ; Color number
    LDY #$00        ; Set index to 0

; Unrolled loop: Process 32 pixels per iteration

LOOP:
    STA ($40),y ; Set pixel color for the next pixel
    INY         ; Increase the value of Y register
    STA ($40),y
    INY
    STA ($40),y
    INY
    STA ($40),y
    INY
    STA ($40),y
    INY
    STA ($40),y
    INY
    STA ($40),y
    INY
    STA ($40),y
    INY
    STA ($40),y
    INY
    STA ($40),y
    INY
    STA ($40),y
    INY
    STA ($40),y
    INY
    STA ($40),y
    INY
    STA ($40),y
    INY
    STA ($40),y
    INY
    STA ($40),y
    INY
    BNE LOOP
    INC $41     ; Increment the page
    LDX $41     ; Get the current page number
    CPX #$06    ; Compare with 6
    BNE LOOP    ; Continue until done all pages

Now Let's change the initial code to display light a blue color on the bitmap display. The codes corresponding to the color can be found here. We just have to change the value stored in the accumulator from #7 to #e.

         LDA #$00    ; set a pointer in memory location $40 to point to $0200
         STA $40        ; ... low byte ($00) goes in address $40
         LDA #$02    
         STA $41        ; ... high byte ($02) goes into address $41
         LDA #$e        ; set the colour number to light blue
        LDY #$00    ; set index to 0
 LOOP:    STA ($40),y    ; set pixel colour at the address (pointer)+Y
         INY        ; increment index
         BNE LOOP    ; continue until done the page (256 pixels)
         INC $41        ; increment the page
         LDX $41        ; get the current page number
         CPX #$06    ; compare with 6
         BNE LOOP    ; continue until done all pages

Now let's make the 4 sections of the bit map display of different colors. One way to achieve that would be to store the value of color in a memory location($42), increment the value at that location and load it in the accumulator again.

    LDA #$00    ; set a pointer in memory location $40 to point to $0200
     STA $40        ; ... low byte ($00) goes in address $40
     LDA #$02    
     STA $41        ; ... high byte ($02) goes into address $41
     LDA #$07    ; color number
     STA $42
    LDY #$00    ; set index to 0
 LOOP:    
     STA ($40),y    ; set pixel color at the address (pointer)+Y
     INY            ; increment index
     BNE LOOP    ; continue until done the page (256 pixels)
    INC $42        ; Increasing our color value by one
    LDA $42        ; Loading the new value in the accumulator
     INC $41        ; increment the page
     LDX $41        ; get the current page number
     CPX #$06    ; compare with 6
     BNE LOOP    ; continue until done all pages

Sources:

Tyler, C. (n.d.). Software portability and optimization. matrix.senecapolytechnic.ca. https://matrix.senecapolytechnic.ca/~chris.tyler/wiki/doku.php?id=spo600:start

Did you find this article valuable?

Support Steven David Pillay by becoming a sponsor. Any amount is appreciated!