Help you learn SPARC assembly language branching and optimization
Paul, Chapters 1, 2, 3, 4, 8, 9 (for several precepts)
Study many small C programs and corresponding hand-written assembly language programs
After studying each program, refer to summary sheets to reinforce and generalize the new material that the program illustrates
Illustrates branching instructions
See sum.c, sumflat.c, and sum.S
What it does
Reads two integers, the second of which must be greater than the first
Computes and prints the sum of all integers between the two
How it works
Uses a counting loop, in the obvious way
The code: sum.c
Obvious
The code: sumflat.c
Eliminates control structures as a stepping stone toward assembly language
Strongly suggested technique!
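A minimal sketch of the flattening technique, not the actual sum.c or sumflat.c. It assumes the loop sums the integers from the first through the second inclusive, which is consistent with the bg test in sum.S. The function names sum_structured and sum_flat and the parameter names iFirst and iSecond are hypothetical; the labels loop1 and loopend1 are borrowed from sum.S.

```c
/* Structured version: an ordinary counting loop. */
int sum_structured(int iFirst, int iSecond)
{
    int iSum = 0;
    int i;
    for (i = iFirst; i <= iSecond; i++)
        iSum += i;
    return iSum;
}

/* Flattened version: the for statement is rewritten using only
   if and goto, so each line maps to one or two instructions. */
int sum_flat(int iFirst, int iSecond)
{
    int iSum = 0;
    int i = iFirst;
loop1:
    if (i > iSecond) goto loopend1;   /* cmp, then bg loopend1 */
    iSum += i;
    i++;
    goto loop1;                       /* ba loop1 */
loopend1:
    return iSum;
}
```

Both functions compute the same result; the flat form simply makes every jump explicit, which is what the assembly language version must do.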
The code: sum.S
.word 0
Assembler pseudo-op
Allocate 4 bytes, and initialize them to 0
Appropriate for data and rodata sections
Illegal in bss section
cmp %l0, %l1
Synthetic instruction for subcc %l0, %l1, %g0
Sets condition codes
Sets N iff result's high-order bit is 1
Sets Z iff the result is 0
Sets V iff the carry into the sign bit was unequal to the carry out of the sign bit (for signed arithmetic)
Sets C iff there was a carry out of the sign bit (for unsigned arithmetic); for a subtraction such as cmp, that means a borrow
See summary sheet for more precise descriptions
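The condition-code rules above can be modeled in C. This is only a sketch: the struct Icc and the function cmp_flags are hypothetical names, and the carry bit is modeled as a borrow, which is how subtraction is assumed to behave here. The summary sheet remains the authoritative description.

```c
#include <stdint.h>

/* Hypothetical container for the four integer condition codes. */
struct Icc { int n, z, v, c; };

/* Model of "cmp a, b", i.e. subcc a, b, %g0: compute a - b,
   discard the result, and keep only the condition codes. */
struct Icc cmp_flags(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t r = ua - ub;            /* wraps modulo 2^32 */
    struct Icc icc;

    icc.n = (int)(r >> 31);          /* high-order bit of the result */
    icc.z = (r == 0);                /* result is zero */
    /* Signed overflow: the operands have different signs and the
       result's sign differs from the first operand's sign. */
    icc.v = (int)(((ua ^ ub) & (ua ^ r)) >> 31);
    /* Assumption: for a subtraction, C records an unsigned borrow. */
    icc.c = (ua < ub);
    return icc;
}
```

For example, comparing 3 with 5 produces a negative result without overflow, so N and C are set while Z and V are clear.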
bg loopend1
Branch instruction
Branch to the location loopend1 if the condition codes indicate a "greater than" condition
See summary sheet for more precise description
A Delayed Control Transfer Instruction (DCTI)
Next instruction (the delay instruction) will be executed before the destination instruction is executed
For now, next instruction should be nop
Subsequent examples will describe how to optimize by eliminating nop instructions
ba loop1
Branch instruction
Branch always to the location loop1
Also a DCTI
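As a rough illustration of how the branch instructions consult the condition codes, the sketch below encodes the signed "greater than" test as not (Z or (N xor V)), with ble as its complement and ba unconditional. The function names are hypothetical, each flag is assumed to be 0 or 1, and the summary sheet should be consulted for the precise definitions.

```c
/* bg: taken iff the preceding cmp found a signed "greater than",
   that is, not (Z or (N xor V)). Flags must be 0 or 1. */
int bg_taken(int n, int z, int v)
{
    return !(z || (n != v));
}

/* ble: the complement of bg, taken iff Z or (N xor V). */
int ble_taken(int n, int z, int v)
{
    return z || (n != v);
}

/* ba: taken regardless of the condition codes. */
int ba_taken(void)
{
    return 1;
}
```

Note that N alone is not enough to decide the signed comparison: when the subtraction overflows (V set), the sign of the result is inverted, which is why N is combined with V.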
Introduced these assembly language features:
Integer branch instructions
ba, bg
See summary sheet for others
Arithmetic instruction
cmp (synthetic)
Pseudo-op
.word
Illustrates optimization by minimizing memory access
See sumopt1.S
What it does
Same as sum
How it works
Essentially the same, but faster
Minimizes memory access by storing variables exclusively in registers when possible
The code
Variables i and iSum are no longer allocated memory
Register map
Important documentation
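The difference between the two versions can be approximated in C. This is a sketch under stated assumptions, not the contents of sumopt1.S: the function and variable names are hypothetical, volatile is used only to force the C compiler to keep the loads and stores, and the register map in the comment is an invented example of the kind of documentation meant above.

```c
/* Memory-resident style (as in sum.S): i and iSum live in memory,
   so every use implies a load and every update implies a store. */
static volatile int i_mem;
static volatile int iSum_mem;

int sum_memory(int iFirst, int iSecond)
{
    iSum_mem = 0;
    for (i_mem = iFirst; i_mem <= iSecond; i_mem++)
        iSum_mem += i_mem;
    return iSum_mem;
}

/* Register-resident style (as in sumopt1.S).
   Hypothetical register map:
     i       -> %l0
     iSecond -> %l1
     iSum    -> %l2 */
int sum_registers(int iFirst, int iSecond)
{
    int i;
    int iSum = 0;
    for (i = iFirst; i <= iSecond; i++)
        iSum += i;
    return iSum;
}
```

Once variables no longer have names in memory, the register map comment is the only record of which register holds which variable, which is why it counts as important documentation.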
Introduced these assembly language features:
(None)
Illustrates optimization by eliminating nop instructions
See sumopt2.S
What it does
Same as sumopt1
How it works
Essentially the same as sumopt1, but faster
The code
"set, call, nop" sequence
Same as "sethi, or, call, nop"
Replaced with "sethi, call, or"
for loop optimized using standard pattern
ble,a loop1
",a" sets the branch instruction's annul bit
If the branch is not taken, then don't execute (annul) the delay instruction
Trace
Note: only 2 overhead instructions per loop iteration
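The loop shape behind this optimization can be sketched in flattened C. This is an assumption about the general pattern rather than a transcription of sumopt2.S: the test moves to the bottom of the loop, so each iteration pays only for one compare and one conditional branch, and a guard before the loop handles the case of zero iterations. The function name sum_bottom_test and the parameter names are hypothetical. Filling the delay slot has no C equivalent and is not shown.

```c
int sum_bottom_test(int iFirst, int iSecond)
{
    int iSum = 0;
    int i = iFirst;
    if (i > iSecond) goto loopend1;   /* guard, executed once */
loop1:
    iSum += i;
    i++;
    if (i <= iSecond) goto loop1;     /* cmp + ble: the per-iteration overhead */
loopend1:
    return iSum;
}
```

Compared with the top-tested loop, the unconditional ba at the bottom and its nop are gone; the conditional branch both tests the condition and returns to the top.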
Introduced these assembly language features:
The annul bit in integer branch instructions
For all integer branch instructions except ba, when the annul bit is set:
Take the branch => execute the delay instruction
Don't take the branch => annul the delay instruction
For the ba instruction, when the annul bit is set:
Always annul the delay instruction
Patterns for eliminating nop instructions:
SPARC assembly language "for" statement optimization
SPARC assembly language "while" statement optimization
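The annul-bit rules listed above reduce to a small decision function. The sketch below uses hypothetical names; it also encodes the fact that, without the ",a" suffix, the delay instruction always executes.

```c
/* Returns 1 if the delay instruction executes, 0 if it is annulled.
   is_ba:     nonzero for ba, zero for a conditional branch such as ble
   annul_bit: nonzero if the instruction was written with ",a"
   taken:     nonzero if the branch is taken (always true for ba) */
int delay_executes(int is_ba, int annul_bit, int taken)
{
    if (!annul_bit)
        return 1;          /* no ",a": the delay instruction always executes */
    if (is_ba)
        return 0;          /* ba,a: the delay instruction is always annulled */
    return taken != 0;     /* conditional,a: executes only if the branch is taken */
}
```

For ble,a loop1, this means the delay instruction runs on every pass that returns to loop1 and is skipped on the final pass that falls out of the loop.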
Copyright © 2002 by Robert M. Dondero, Jr.