Stacks
Hardware stacks
Area of memory that grows and shrinks according to the LIFO (last in, first out) principle
Stack models
There are four models of different position sp register points.
Full Ascending
Empty Ascending
Full Descending
Empty Descending
AAPCS specifies Full Descending stack, 8-byte width.
Stack pointer points to an address divisible by 8, say, 0x400.
In the block of 0x400, there are 8 bytes. The next block is 0x408.
Store multiple decrement before (overwrite)
STMDB sp!, {r0, r1}
SUB sp, sp, #8 STR r0, [sp] STR r1, [sp, #4]
push {r0, r1}
STM can operate on other registers, whilst push only operates on the stack pointer.
STMDB sp!, {r0, r1} ADD sp, sp, #8 @ equivalent to pop, but pop doesn't work here.
LDMIA sp!, {r0, r1} @ increment after (pop)Published on November 20, 2015
Reference
Basic Char syntax
?\a ⇒ 7 ; control-g, C-g ?\b ⇒ 8 ; backspace, BS, C-h ?\t ⇒ 9 ; tab, TAB, C-i ?\n ⇒ 10 ; newline, C-j ?\v ⇒ 11 ; vertical tab, C-k ?\f ⇒ 12 ; formfeed character, C-l ?\r ⇒ 13 ; carriage return, RET, C-m ?\e ⇒ 27 ; escape character, ESC, C-[ ?\s ⇒ 32 ; space character, SPC ?\\ ⇒ 92 ; backslash character, \ ?\d ⇒ 127 ; delete character, DELPublished on November 20, 2015
Bitwise Instructions
A | B | A AND B | A OR B | NOT A | A XOR B | A BIC B |
---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 1 | 0 | 0 |
0 | 1 | 0 | 1 | 1 | 1 | 0 |
1 | 0 | 0 | 1 | 0 | 1 | 1 |
1 | 1 | 1 | 1 | 0 | 0 | 0 |
XOR
[A AND (NOT B)] OR [(NOT A) AND B]
A XOR 0 = A
A XOR 1 = NOT A
In ARM | In C | |
---|---|---|
AND | AND r0, r1,r2 | r0 = r1 & r2; |
OR | ORR r0, r1, r2 | r0 = r1 || r2; |
XOR | EOR r0, r1, r2 | r0 = r1 ^ r2; |
BIC | BIC r0, r1, r2 | r0 = r1 & (~r2) |
NOT | MVN r0, r1 | r0 = ~r1 |
ARM
MVN r0, #0
-1 is stored in r0.
#0 is automatically extended to 32 bits of 0’s. Flipping 32 0’s to 32 1’s gets -1 in two’s complement.
ADD r3, r0, r1
r1 is the mask.
Bit shifts and multiplication
LSL
Logical shift by n bits – unsigned multiplication by 2n
LSL r0, r0, #4
The above code multiplies r0 with 24, i.e. 16.
LSR
Logical shift by n bits – unsigned division by 2n
ASR
Arithmetic shift by n bits – signed division by 2n
ROR
Logical rotate by n bits – 32 bit rotate
ror r1, r0, #1
Rotate amount is non-negative.
Shift 4 bits: r0 = 0x12345678, r1 would be 0x81234567.
Shift 8 bits: r0 = 0x12345678, r1 would be 0x78123456.
ARM | C | |
---|---|---|
LSL | LSL r0, r1, #n | r0 = r1 << n; |
LSR | LSR r0, r1, #n | r0 = (unsigned int) r1 >> n; |
ASR | ASR r0, r1, #n | r0 = r1 >> n; |
ROR | ROR r0, r1, #n | UNFINISHED |
RSB
Reverse subtract, used for easier multiplication
RSB r0, r1, r2 @ r0 = r2 - r1
RSB r10, r9, r9, LSL #3 @ r10 = r9 * (8 - 1)
Addressing modes
ADD r0, r1, r2, LSL #2 @ r0 = r1 + (r2 << 2)
LSL r2, r2, #r2 ADD r0, r1, r2
LDR r0, [r1, r2] @ r0 = [r1 + r2]
LDR r0, [r1, r2, LSL #n] @ LSL applied to r2
ADD r0, r1, #0xff, #8 @ r0 = r1 + 0xff000000 @ performs ROR, #8 is the rot
ADD r2, r2, LSL, #2 @ r2 = 5 * r2Published on November 6, 2015
ARM Nested Procedures
Nested Procedures
int sumSquare(int x, int y){ return (mult(x, x) + y); }
sumSquare: push {r4-r11, lr} MOV r4, r1 MOV r1, r0 BL mult ADD r0, r0, r4 pop {r4-r11, lr} BX lr @ then?
main: MOV r0, #2 MOV r1, #3 BL sumSquare MOV r3, r0
Without pushing lr, calling another function overwrites it.
String constants in ARM
x = 0xaabbccdd; printf("%d\n", x);
// how printf works %int printf(const char *format, ...)
Where to store
- String “%d\n” — in data segment
- Value 0xaabbccdd —
String formats (unfinished)
.ascii
.asciz
.data mystr: .asciz "%d\n" mynum: .word 0xaabbccdd print_fun: push {lr} LDR r0, =mystr @ location with label mystr is store in r0 LDR r1, =mynum LDR r1, [r1] BL printf pop {lr} BX lr
Alignment
.text .align 2
mystr is placed in a location that is a multiple of 2^2 = 4.
Data objects should start at memory addresses that are divisible by the size of the data object
- short (2-byte) at address divisible by 2
- int (4-bytes) at address divisible by 4
- double (8-byte) at address divisible by 8
struct p { char x; int y; }; struct p sp;
In this case, sizeof(p) is 8. int should be aligned.
Published on November 2, 2015
ARM Loops
While loops
while (a < 0){ a++; }
loop: CMP r0, #0 BGE end ADD r0, r0, #1 @ Body of loop B loop end:
@ Equivalent to above loop: CMP r0, #0 ADDLT r0, r0, #1 BLT loop
Do while
do { a++; } while (a < 0);
@ loop part different to while loop loop: ADD r0, r0, #1 @ body of 'do' CMP r0, #0 BLT loop
For loops
for (i = 0; i < 10; i++){ a++; b--; }
MOV r0, #0 loop: CMP r0, #10 BGE end ADD r1, r1, #1 SUB r2, r2, #1 B loop end:
@ Equivalent to above MOV r0, #0 loop: CMP r0, #10 ADDLT r1, r1, #1 SUBLT r2 ,r2, #1 ADDLT r0, r0, #1 BLT loop
C functions
Caller and callee functions:
main() { // main in this case is a caller int a = 10, b = 20; int c = sum(a, b); } int sum(int x, y){ // sum is a callee return x + y; }
Steps in function call
- Pass parameters (using registers)
- Call functions (sum)
- Compute return value, put it in the right register
- Transfer control back to caller (return)
1 and 2 are in main();
Making the function call
- Simple branch instruction
BL sum @ function call BX @ return
... B sum return ... ... sum: ADD r0, r0, r1 B return_loc ...
Problems:
- Cannot call a function twice
- Always go to return_loc
Reason:
- sum() might be called by many functions, so we cannot return to a fixed place.
- The calling proc to sum() must be able to say “return back here” somehow.
- BX instruction returns to the location specified in lr.
- BL instruction stores the location of next line so that branch could return to where it was called.
lr (link register): register reserved to store the return address
... BL sum ... ... sum: ADD r0, r0, r1 BX lr ; MOV pc,lr i.e., return ...
Passing arguments
main() { int a = 10, b = 20, c; c = sum(a, b); c++; }
main: MOV r0, #20 MOV r1, #20 BL sum sum: ADD r0, r0, r1 @ return value in r0 BX lr
Register conventions
A set of generally accepted rules as to which registers are guaranteed to be unchanged after a procedure call and which may be changed.
r0 – r3
Arguments and return values, otherwise corruptible.
r4 – r11
Callee has to make sure that the value in r4-r11 are unchanged by it.
r12 scratch register
Push and pop
Push registers onto, and pop registers off a full descending stack (usually within function call).
push {r4-r11} ... pop {r4-r11}Published on October 30, 2015
Directed Acyclic Graphs
Def Directed graph with no cycles are called directed acyclic graphs.
Lemma 1: every DAG has at least one source (proof by contradiction)
Lemma 2: If v is a souce vertex of G, then G is a DAG if and only if G-v is a DAG
Find topological ordering
While G has at least one vertex
If G has some source,
Choose one source and output it
Delete the source and all its outgoing edges from G
Else
Return that G is not a DAG
Compile By Hand
Basic arithmetic
g = h + A[8]; // r0 stores A array; r1 stores h; r2 to store g.
ADD r0, r0, #32 LDR r3, [r0] @ r0 = &A + 8 * 4, since int is of 4 bytes.
LDR r3, [r0, #32]
Accessing arrays
void foo (int *p, int size){ // p stored in r2, size stored in r1 *p = 0; *(p + 1) = 0; *(p + size - 1) = 0; }
foo: MOV r2, #0; @ store 0 in *p STR r2, [r0] @ *p = 0 MOV r3, #4 ADD r4, r0, r3 @ p + 1 in C STR r2, [r4] @ store 0 in *(p + 1) SUB r1, r1, #1 @ r1 = size - 1 MUL r4, r1, r3 @ r4 = r1 * r3 STR r2, [r0, r4] @ store 0 in *(p + size - 1)
Data transfer instructions
void swap (int *x, int *y){ // x stored in r0, y stored in r1 int tmp = *x; *x = *y; *y = tmp; }
swap: LDR r2, [r0] @ Load *x from memory and store in r2 LDR r3, [r1] @ Load *y from memory and store in r3 STR r3, [r0] @ Store r3 in register to *x in memory MOV r2, [r1] @ Store r2 (tmp) in register to *y in memory
- For short, LDR -> LDRSH, STR -> STRSH
- For unsigned short, LDR -> LDRH, STR -> STRH
Linked lists
- head -> *next -> *next -> … -> value
head -> value = 100; head -> *next -> value = 100; // r0 stores the first value
MOV r1, #100 STR r1, [r0] @ store to memory ... LDR r3, [r0, #4] @ store 100 to next value STR r1, [r3]Published on October 26, 2015
ARM Instruction
ARM goto Instruction
goto label (C) is the same as B label in ARM.
Conditional Branch
- Set the condition bits (N, Z, V, C) in the program status register.
- Then check on these condition bits to branch conditionally.
Conditional bits
N = 1 if result is Negative
Z = 1 if result is Zero
V = 1 if there’s overflow
C = 1 if there’s carry out
SUBS (with the ending S) sets conditional bits in cpsr register.
SUBS r1, r0, #0
r0 = r1 – #0, and set the condition bits NZVC based on the result.
If r1==r2, then N = 0, Z= 1, V = 0, C =1
If r < r2 (no overflow), then N =1, Z = 0, V=0
Another way to do that is to use a ‘Comparison Instruction’
CMP r1, r2 BEQ label @if Z = 1, then go to label
- Compare and set condition bits
- Subtract a register or an immediate value from a register value and updates status register
- Then check on these condition bits to branch conditionally
COM r0, #4 BEQ this_instr ADD r4, r5, r6 B end this_instr: SUB r0, r0, #1 end:
if(r0 == 4) r0 --; r4 = r5 + r6;
if(r0 != 4) r4 = r5 + r6; r0 --;
ARM Decision Instructions
BEQ label | == | BRANCH EQUAL |
BNE label | != | BRANCH NOT EQUAL |
BLE label | <= | BRANCH LESS THAN EQUAL |
BLT label | < | BRANCH LESS THAN |
BGE label | >= | BRANCH GREATER THAN EQUAL |
BGT label | > | BRANCH GREATER THAN |
The condition is T/F based upon the fields in the Program Status Register.
If…else
if (x==y) x += y; else x -= y;
CMP r01, r1 BNE else_aprt ADD r0, r0, r1 B end else_part: SUB r0, r0,r1 end:Published on October 23, 2015
ARM Memory Map / Basic Instructions
ARM Memory Map
Section (high address to low address) | Function |
---|---|
OS and Memory-Mapped IO | |
Dynamic Data | Stack and heap |
BSS | Uninitialized global data Zero-initialized global data Static data |
Data | Initialized global data |
Text | Machine code (read only) |
Exception Handlers |
- “Text” (instructions in machine language)
- “Data” contains any global or static variables which have a pre-defined value and can be modified. That is any variables that are not defined within a function (and thus can be accessed from anywhere) or are defined in a function but are defined as static so they retain their value across subsequent calls.
- “BSS” also known as uninitialized data, is usually adjacent to the data segment. The BSS segment contains all global variables and static variables that are initialized to zero or do not have explicit initialization in source code.
- “Heap” (for dynamically allocated data)
- “Stack” (for function local variables)
- Heap and stack change in size as the program executes.
View the memory map using “size”
$ gcc -O foo.c -c
$ size foo.o
text data bss dec hex filename
52 0 0 52 34 foo.o
Program Execution
Interactions between CPU, registers, and memory. Each instruction (4 bytes) handles the data from registers (more frequently) and memory.
Registers
Each ARM register is 32 bits wide. There are 30 general purpose registers (6 status registers, 1 program counter).
r0 to r12, general
r13, stack pointer,
r14, subroutine Link Register
r15, program counter
Basic Type of Instructions
- Arithmetic (involves only CPU and registers)
Compute the sum (or difference) of two registers, and store the value in a register.
Move the contents of one register to another
- Memory Instructions (transfer data between memory and registers)
- Control Transfer Instructions: Change flow of execution
In C: a = b + 10; In ARM: ADD r0, r1, #10
#10 is an immediate (constant), which has fixed length of 4 bytes.
Assignment Instructions
In C: a = b; // a and b are integers a = 10; In ARM: mov r0, r1 @r0 = r1
Data Transfer
Separate instructions to transfer data between registers and memory:
- Memory to register (LOAD)
- Register to memory (STORE)
Load/store Syntax
LDR r0, [r1]
- r1 must be a register that contains a valid memory location (as a pointer).
- Copy 4 bytes from the location pointed by r1 into r0
- Equivalent to r0 = * r1; (in C)
STR r0, [r1]
- r1 stores the address of destination memory.
- Store contents of r0 to location pointed by r1.
- *r1 = r0; (in C)
In Little Endian machines (ARM and Intel x86), the least significant byte (like 5 in 0x00000005) goes into the lowest memory address (like 0x200 of 0x200 to 0x0203).
Variations on LDR/STR
LOAD
LDRH r0, [r1] @ r1 points to 2 bytes (unsigned)
- Fill the significant bits with zeros.
- Treat content pointed by r1 as unsigned.
LDRSH r0, [r1] @ r1 points to 2 bytes (signed)
- Treat content pointed by r1 as signed.
- Identify if the most significant bit is 0 or 1 (positive or negative) and do sign expansion.
- Fill F’s in the spare bits.
LDRB r0, [r1] @ r1 points to 1 byte (unsigned)
LDRSB r0, [r1] @ r1 points to 1 byte (signed)
STORE
STR r0, [r1] @ store 4 bytes STRH r0, [r1] @ store least significant 2 bytes STRB r0, [r1] @ store least significant byte
There’s no STRSH or STRSB, since the two bytes or one byte can be signed or unsigned. Raw data in r0 ought not be changed.
Published on October 19, 2015Struct
A struct is a data structure composed of simpler data types.
There is no methods or inheritance.
struct point { int x; int y; }
main(){ struct point p1; // get memory allocated for two int p1.x = 5; p2.y = 10; }
void printPoint(struct point p){ printf("(%d,%d)", p.x, p.y); }
struct point *p; p = malloc(sizeof(struct point)); printf("x is %d\n",(*p).x); printf("x is %d\n", p->x);Published on October 16, 2015