Princeton University
COS 217:  Introduction to Programming Systems

Precept 14:  SPARC Assembly Language Branching and Optimization

Purpose

Help you learn SPARC assembly language branching and optimization

Reading

Paul, Chapters 1, 2, 3, 4, 8, 9 (for several precepts)

Approach

Study many small C programs and corresponding hand-written assembly language programs

After studying each program, refer to summary sheets to reinforce and generalize the new material that the program illustrates

Example:  sum

Illustrates branching instructions

See sum.c, sumflat.c, and sum.S

What it does

Reads two integers, the second of which must be greater than the first

Computes and prints the sum of all integers from the first through the second, inclusive

How it works

Uses a counting loop that adds each integer in the range to a running sum

The code:  sum.c

A straightforward for loop
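
A minimal sketch of what the counting loop in sum.c might look like. The actual file is not reproduced here: the function name sumRange and the parameter names are assumptions made for illustration, the bounds are assumed to be inclusive, and the input/output code is omitted.

```c
/* Hypothetical sketch of the counting loop in sum.c.
   The names sumRange, iLower, and iUpper are assumed, not taken from the file. */
int sumRange(int iLower, int iUpper)
{
   int i;
   int iSum = 0;

   /* Add each integer from iLower through iUpper (inclusive). */
   for (i = iLower; i <= iUpper; i++)
      iSum += i;

   return iSum;
}
```

If iUpper is less than iLower, the loop body never runs and the sketch returns 0.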

The code:  sumflat.c

Eliminates control structures as a stepping stone toward assembly language

Strongly suggested technique!!!!!!
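
A sketch of what "eliminating control structures" can look like for the counting loop. This is an assumed reconstruction, not the contents of sumflat.c: the function and parameter names are hypothetical, and only the labels loop1 and loopend1 are borrowed from the assembly language discussed in this precept.

```c
/* Hypothetical flattened version of the counting loop.
   Only "if ... goto" and "goto" remain, so each statement maps
   almost directly onto one or two assembly language instructions. */
int sumRangeFlat(int iLower, int iUpper)
{
   int i;
   int iSum = 0;

   i = iLower;
loop1:
   if (i > iUpper) goto loopend1;   /* roughly: cmp, then bg loopend1 */
   iSum += i;
   i++;
   goto loop1;                      /* roughly: ba loop1 */
loopend1:
   return iSum;
}
```

Note that the loop test is inverted: the for loop continues while i <= iUpper, so the flattened code exits when i > iUpper.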

The code:  sum.S

.word 0

Assembler pseudo-op

Allocate 4 bytes, and initialize them to 0

Appropriate for data and rodata sections

Illegal in bss section

cmp %l0, %l1

Synthetic instruction for subcc %l0, %l1, %g0

Sets condition codes

Sets N iff result's high-order bit is 1
Sets Z iff the result is 0
Sets V iff the carry into the sign bit was unequal to the carry out of the sign bit (for signed arithmetic)
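Sets C iff there was a carry out of the sign bit (for unsigned arithmetic); for a subtraction such as cmp, that means a borrow, i.e., the first operand is less than the second when both are treated as unsigned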

See summary sheet for more precise descriptions
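
The condition code rules can be modeled in C. The sketch below is an illustration only, not a substitute for the summary sheet: the struct and function names are hypothetical, and it assumes 32-bit operands and that C records a borrow for subtraction.

```c
#include <stdint.h>

/* Hypothetical container for the four integer condition codes. */
struct CondCodes { int n, z, v, c; };

/* Model of "cmp a, b", i.e., "subcc a, b, %g0", on 32-bit operands. */
struct CondCodes compareCodes(uint32_t a, uint32_t b)
{
   struct CondCodes cc;
   uint32_t result = a - b;          /* unsigned arithmetic wraps modulo 2^32 */

   cc.n = (int)(result >> 31);       /* N: the result's high-order bit */
   cc.z = (result == 0);             /* Z: the result is 0 */
   /* V: signed overflow. The operands have different signs, and the
      result's sign differs from the sign of the first operand. */
   cc.v = (int)(((a ^ b) & (a ^ result)) >> 31);
   /* C: for subtraction, a borrow, i.e., a < b as unsigned values. */
   cc.c = (a < b);
   return cc;
}
```

For example, comparing 3 with 5 yields a negative result with a borrow (N and C set), while comparing 0x80000000 with 1 overflows in signed arithmetic (V set).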

bg loopend1

Branch instruction

Branch to the location loopend1 if the condition codes indicate a "greater than" condition

See summary sheet for more precise description

A Delayed Control Transfer Instruction (DCTI)

The next instruction (the delay instruction) is executed before the instruction at the branch destination

For now, the delay instruction should be a nop

Subsequent examples will describe how to optimize by eliminating nop instructions
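
How the condition codes indicate a signed "greater than" condition can be sketched in C. The function names below are hypothetical. The rule modeled is that bg is taken iff not (Z or (N xor V)), and ble is its complement.

```c
/* Returns 1 iff bg would branch, given condition codes n, z, v (each 0 or 1).
   Signed "greater than": the result is nonzero, and N agrees with V. */
int bgTaken(int n, int z, int v)
{
   return !(z || (n != v));
}

/* Returns 1 iff ble would branch: the exact complement of bg. */
int bleTaken(int n, int z, int v)
{
   return !bgTaken(n, z, v);
}
```

The N xor V term is what keeps the comparison correct when the subtraction overflows: a result whose sign bit is wrong because of overflow still compares properly.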

ba loop1

Branch instruction

Branch always to the location loop1

Also a DCTI

Introduced these assembly language features:

Integer branch instructions

ba, bg

See summary sheet for others

Arithmetic instruction

cmp (synthetic)

Pseudo-op

.word

Example:  sumopt1

Illustrates optimization by minimizing memory access

See sumopt1.S

What it does

Same as sum

How it works

Essentially the same, but faster

Minimizes memory access by storing variables exclusively in registers when possible

The code

Variables i and iSum are no longer allocated memory

Register map

Important documentation

Introduced these assembly language features:

(None)

Example:  sumopt2

Illustrates optimization by eliminating nop instructions

See sumopt2.S

What it does

Same as sumopt1

How it works

Essentially the same as sumopt1, but faster

The code

"set, call, nop" sequence

Same as "sethi, or, call, nop"

Replaced with "sethi, call, or": the or instruction fills the delay slot of the call, eliminating the nop
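
Why set expands into two instructions can be sketched in C: sethi supplies the high-order 22 bits of a 32-bit value, and or supplies the low-order 10 bits. The function names below are hypothetical; they model the assembler's %hi and %lo operators.

```c
#include <stdint.h>

/* Model of %hi(x): the high-order 22 bits of x. */
uint32_t hi22(uint32_t x)
{
   return x >> 10;
}

/* Model of %lo(x): the low-order 10 bits of x. */
uint32_t lo10(uint32_t x)
{
   return x & 0x3ffu;
}

/* Model of "sethi %hi(x), r" followed by "or r, %lo(x), r". */
uint32_t setViaSethiOr(uint32_t x)
{
   uint32_t r = hi22(x) << 10;   /* sethi: high 22 bits set, low 10 bits cleared */
   r = r | lo10(x);              /* or: fill in the low 10 bits */
   return r;
}
```

Because the two halves are independent instructions, the or can be deferred to the delay slot of the call without changing the value that ends up in the register.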

for loop optimized using standard pattern

ble,a loop1

",a" sets the branch instruction's annul bit

If the branch is not taken, then don't execute (annul) the delay instruction

Trace

Note:  only 2 overhead instructions per loop iteration
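
A flattened C sketch of one common form of this kind of loop optimization, offered as an assumption about the pattern rather than a transcription of sumopt2.S. The function and parameter names are hypothetical, and the delay-slot placement itself cannot be expressed in C. The idea shown is that the loop test moves to the bottom of the loop, so each iteration needs one compare and one conditional branch instead of a compare, a conditional branch, and an unconditional branch.

```c
/* Hypothetical flattened form of a bottom-tested counting loop. */
int sumRangeOpt(int iLower, int iUpper)
{
   int i = iLower;
   int iSum = 0;

   if (i > iUpper) goto loopend1;   /* tested once, before entering the loop */
loop1:
   iSum += i;
   i++;
   if (i <= iUpper) goto loop1;     /* roughly: cmp, then ble loop1 */
loopend1:
   return iSum;
}
```

The result is the same as that of the top-tested loop; only the number of branch instructions executed per iteration changes.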

Introduced these assembly language features:

The annul bit in integer branch instructions

When the annul bit is set, for all integer branch instructions except ba:

Take the branch => execute the delay instruction

Don't take the branch => annul the delay instruction

When the annul bit is set, for the ba instruction (ba,a):

Always annul the delay instruction
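
The annul rules can be summarized as a small decision function. This is a sketch with hypothetical names. It also covers the case in which the annul bit is not set, where the delay instruction always executes.

```c
/* Returns 1 if the delay instruction executes, 0 if it is annulled.
   isBranchAlways: 1 for ba, 0 for the other integer branch instructions.
   annulBit:       1 if the instruction was written with ",a".
   branchTaken:    1 if the branch condition holds. */
int delayExecutes(int isBranchAlways, int annulBit, int branchTaken)
{
   if (!annulBit)
      return 1;            /* no annul bit: the delay instruction always executes */
   if (isBranchAlways)
      return 0;            /* ba,a: the delay instruction is always annulled */
   return branchTaken;     /* otherwise: executes only if the branch is taken */
}
```

For example, with ble,a loop1 the delay instruction executes on every iteration that branches back to loop1 and is skipped on the final, not-taken pass.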

Patterns for eliminating nop instructions:

SPARC assembly language "for" statement optimization

SPARC assembly language "while" statement optimization

SPARC assembly language "if" statement optimization

SPARC assembly language "if...else" statement optimization

Copyright © 2002 by Robert M. Dondero, Jr.