A significant share of computational problems requires floating point calculations. Unlike integer calculations, floating point calculations operate on real numbers, represented with limited precision.
A floating point number has the following structure:
+----------------------+
|Sign|Exponent|Mantissa|
+----------------------+
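This structure encodes the following equation (for normalized numbers):
$$x = (-1)^{\text{Sign}} \times \left(1 + \frac{\text{Mantissa}}{2^{p}}\right) \times 2^{\text{Exponent} - \text{bias}}$$
where p is the number of mantissa bits and bias is a fixed offset. A single precision number has 1 sign bit, 8 exponent bits and p = 23 mantissa bits with bias = 127. A double precision number has 1 sign bit, 11 exponent bits and p = 52 mantissa bits with bias = 1023.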
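The equation can be checked with a short C sketch. C is used here only as a convenient host language, and the struct and function names are hypothetical. The sketch splits a single precision value into its three fields and rebuilds a normalized value from them:

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE 754 single precision value:
   1 sign bit, 8 exponent bits (bias 127), 23 mantissa bits. */
typedef struct {
    uint32_t sign;
    uint32_t exponent;
    uint32_t mantissa;
} float_fields;

float_fields decode_float(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* copy the raw bits, no conversion */
    float_fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}

/* Rebuild a normalized value (0 < exponent < 255) using
   x = (-1)^sign * (1 + mantissa / 2^23) * 2^(exponent - 127). */
double encode_normalized(float_fields f) {
    double value = 1.0 + (double)f.mantissa / 8388608.0;   /* 2^23 */
    int e = (int)f.exponent - 127;
    while (e > 0) { value *= 2.0; e--; }                   /* exact scaling */
    while (e < 0) { value /= 2.0; e++; }
    return f.sign ? -value : value;
}
```

For example, 2.53 lies between 2 and 4, so its unbiased exponent is 1 and the stored exponent field is 1 + 127 = 128.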
Note that this representation differs from the integer representation: reinterpreting the bits of a floating point value as an integer (or the bits of an integer as a floating point value) produces a meaningless result. Values must be converted between the two representations with dedicated FPU instructions.
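A minimal C sketch of the difference, with hypothetical function names: `memcpy` copies the raw bits unchanged, as a register move does, while a cast computes the numeric value, as a conversion instruction does.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret: the float's 32 bits, read as an unsigned integer. */
uint32_t float_bits(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Reinterpret the other way: an integer's bits, read as a float. */
float bits_as_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Convert: compute the integer value. C casts from float to integer
   truncate toward zero. */
int32_t float_to_int(float x) {
    return (int32_t)x;
}
```

Reading the integer 5 as a float gives a tiny subnormal number (about 7e-45), which is why an integer moved into an FPU register has to be converted before it is used in floating point arithmetic.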
Typically, floating point arithmetic is carried out not in the processor core itself but in a Floating Point Unit (FPU), a coprocessor. The FPU has its own general purpose and control registers, and data must be moved into the FPU registers from core registers or memory before a computation. Keep in mind that FPUs are frequently disabled by default to save energy, so the FPU has to be enabled before use. Not all processors and microcontrollers support floating point arithmetic in hardware.
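The ARM FPU provides 32 single precision registers named s0 to s31; pairs of them can also be accessed as 16 double precision registers d0 to d15 (s0 and s1 form d0, and so on). Their contents can be viewed in the GDB debugger with the following command, followed by a list of registers: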
info registers s0 s1 s2
or with its shorthand:
i reg s0 s1 s2
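VMOV{cond}.F<32|64> sd, #C
Move constant #C into register sd. Only a small set of constants (for example 1.0, 2.0, 0.5 or 1.25) can be encoded as an immediate; other values must be loaded from memory with vldr.
VMOV{cond} sd, r1
Move the contents of core register r1 into FPU register sd. The bits are copied unchanged; no conversion takes place.
VMOV{cond} rd, s1
Move the contents of FPU register s1 into core register rd, again without conversion.
If cond is present, the operation is performed only if the condition is met.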
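On VFPv3 and later, the immediate form accepts only values of the form ±(1 + m/16) × 2^e with m from 0 to 15 and e from -3 to 4 (so 0.0 itself is not encodable). The hypothetical C helper below enumerates that set to test whether a constant can be used directly; it is a sketch of the encoding rule, not an assembler feature:

```c
#include <stdbool.h>

/* True if x equals +/-(1 + m/16) * 2^e for m in 0..15 and e in -3..4,
   the values a VFPv3 VMOV.F32 immediate can encode. */
bool vfp_imm_encodable(float x) {
    float a = x < 0.0f ? -x : x;             /* sign is encoded separately */
    for (int e = -3; e <= 4; e++) {
        /* exact power of two 2^e */
        float scale = e >= 0 ? (float)(1 << e) : 1.0f / (float)(1 << -e);
        for (int m = 0; m < 16; m++) {
            if ((1.0f + m / 16.0f) * scale == a)
                return true;
        }
    }
    return false;                            /* includes 0.0, NaN, infinity */
}
```

The constant 2.0 used in the example program passes this test, while 2.53 does not and has to be loaded from memory.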
vpush {s1, s2, s3}
vpop {s1, s2, s3}
Push FPU registers onto the stack and pop them back from it.
Both instructions take a list of registers as an argument; the registers in the list must be consecutive.
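The stack layout produced by vpush can be modelled in C. This is a hypothetical sketch, assuming the usual ARM full descending stack: sp moves down by one word per register, and the lowest-numbered register is stored at the lowest address.

```c
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 16

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* sp indexes the last word pushed; the stack grows toward lower indices. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int sp;                      /* starts at STACK_WORDS (empty stack) */
} stack_model;

/* Model of vpush {s[first] .. s[first + count - 1]}. */
void vpush_model(stack_model *st, const float *s, int first, int count) {
    st->sp -= count;
    for (int i = 0; i < count; i++)
        memcpy(&st->mem[st->sp + i], &s[first + i], sizeof(uint32_t));
}

/* Model of vpop with the same register list: reads the words back in
   the same order and moves sp up again. */
void vpop_model(stack_model *st, float *s, int first, int count) {
    for (int i = 0; i < count; i++)
        memcpy(&s[first + i], &st->mem[st->sp + i], sizeof(uint32_t));
    st->sp += count;
}
```

So after vpush {s0, s1, s2, s3}, s0 sits at [sp] and s3 at [sp, #12].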
vldr{cond}.F<32|64> sd, [r1 {, #off}]
vldr{cond}.F<32|64> sd, label
Load data from memory at the address stored in register r1 (or at label) into register sd.
vstr{cond}.F<32|64> s1, [rd {, #off}]
Store the contents of register s1 to memory at the address stored in register rd.
#off is an optional byte offset added to the address; it must be a multiple of 4 in the range -1020 to +1020. If cond is present, the operation is performed only if the condition is met.
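A hypothetical C model of the addressing, assuming the standard vldr/vstr immediate encoding (an 8 bit word count, so the byte offset must be a multiple of 4 between -1020 and +1020):

```c
#include <stdbool.h>
#include <string.h>

/* True if off can be encoded as a vldr/vstr immediate offset. */
bool vldr_offset_ok(int off) {
    return off % 4 == 0 && off >= -1020 && off <= 1020;
}

/* Model of vldr sd, [base, #off]: reads 4 bytes at base + off as a float.
   Returns false for offsets the instruction cannot encode. */
bool vldr_model(const unsigned char *base, int off, float *sd) {
    if (!vldr_offset_ok(off))
        return false;
    memcpy(sd, base + off, sizeof *sd);
    return true;
}
```

With an array of floats at base, #8 reaches the third element; larger or unaligned offsets need the address to be computed in a core register first.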
VCVT{R}{cond}.<S32|U32>.F<32|64> sd, s1
Convert to integer representation.
VCVT{cond}.F<32|64>.<S32|U32> sd, s1
Convert to floating point representation.
Convert the value in register s1 and store the result in register sd. F32 and F64 are the 32 bit and 64 bit floating point representations respectively. S32 and U32 are the 32 bit signed and unsigned integer representations respectively. If cond is present, the operation is performed only if the condition is met. If R is present, rounding is controlled by the rounding mode set in the FPSCR control register; otherwise the value is rounded toward zero (truncated).
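A hypothetical C model of the two rounding behaviours for .S32.F32. Without R the value is truncated toward zero, like a C cast. With R, this sketch assumes FPSCR is in its reset state, in which the value is rounded to the nearest integer with ties to even. Out-of-range values saturate and NaN becomes 0:

```c
#include <stdint.h>

/* Model of vcvt.s32.f32: round toward zero. */
int32_t vcvt_s32_rz(float x) {
    if (x != x) return 0;                        /* NaN */
    if (x >= 2147483648.0f) return INT32_MAX;    /* >= 2^31: saturate */
    if (x <= -2147483648.0f) return INT32_MIN;   /* <= -2^31: saturate */
    return (int32_t)x;                           /* C casts truncate */
}

/* Model of vcvtr.s32.f32 with FPSCR set to round to nearest, ties to even. */
int32_t vcvt_s32_rn(float x) {
    if (x != x) return 0;
    if (x >= 2147483648.0f) return INT32_MAX;
    if (x <= -2147483648.0f) return INT32_MIN;
    double d = x;
    int64_t t = (int64_t)d;                      /* truncated toward zero */
    double frac = d - (double)t;                 /* exact, in (-1, 1) */
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        t++;                                     /* round up, or tie to even */
    else if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        t--;                                     /* round down, or tie to even */
    return (int32_t)t;
}
```

For example, 2.7 becomes 2 without R and 3 with R, while 2.5 becomes 2 in both cases because ties go to the even neighbour.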
VCVT{cond}.<S16|U16|S32|U32>.F<32|64> sd, sd, #N
Convert to fixed point representation.
VCVT{cond}.F<32|64>.<S16|U16|S32|U32> sd, sd, #N
Convert to floating point representation.
Convert the value in register sd in place; for these forms the source and destination must be the same register. F32 and F64 are the 32 bit and 64 bit floating point representations respectively. S32 and U32 are the 32 bit signed and unsigned integer representations respectively. S16 and U16 are the 16 bit signed and unsigned integer representations respectively. #N is the number of fraction bits. If cond is present, the operation is performed only if the condition is met.
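A fixed point value with N fraction bits is the real value multiplied by 2^N and stored as an integer. The hypothetical C sketch below models the S32 forms, assuming float to fixed conversion rounds toward zero; saturation of out-of-range values is left out for brevity:

```c
#include <stdint.h>

/* Model of vcvt.s32.f32 sd, sd, #n: scale by 2^n, then round toward zero.
   This sketch supports n from 0 to 31 and assumes the result fits. */
int32_t float_to_fixed_s32(float x, int n) {
    double scaled = (double)x * (double)(1u << n);  /* exact: power of two */
    return (int32_t)scaled;                          /* C casts truncate */
}

/* Model of vcvt.f32.s32 sd, sd, #n: divide by 2^n. */
float fixed_s32_to_float(int32_t q, int n) {
    return (float)((double)q / (double)(1u << n));
}
```

With N = 8, 2.53 becomes 647 (2.53 × 256 = 647.68, truncated), and converting back gives 647 / 256 = 2.52734375, so eight fraction bits give a resolution of 1/256.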
VCVT<B|T>{cond}.F<32|64>.F16 sd, s1
VCVT<B|T>{cond}.F16.F<32|64> sd, s1
Convert between single or double precision and half precision floating point representations. Convert the value in register s1 and store the result in register sd. B or T selects whether the half precision value occupies the bottom (bits 15:0) or the top (bits 31:16) half of the register.
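A half precision value has 1 sign bit, 5 exponent bits (bias 15) and 10 mantissa bits. The hypothetical C decoder below shows what a .F32.F16 conversion computes, and which 16 bits the B and T variants read:

```c
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 half precision bit pattern into a float. */
float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        /* zero or subnormal: mant * 2^-24, exact in single precision */
        float v = (float)mant / 16777216.0f;
        return sign ? -v : v;
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13);               /* inf or NaN */
    } else {
        bits = sign | ((exp - 15 + 127) << 23) | (mant << 13);  /* rebias */
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* The B variant uses bits 15:0 of the register, the T variant bits 31:16. */
uint16_t bottom_half(uint32_t reg) { return (uint16_t)(reg & 0xFFFFu); }
uint16_t top_half(uint32_t reg)    { return (uint16_t)(reg >> 16); }
```

A register holding 0x3C00C000 therefore contains 1.0 in its top half and -2.0 in its bottom half.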
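VCMP{E}{cond}.f<32|64> <s1>, <s2>
VCMP{E}{cond}.f<32|64> <s1>, #0.0
Compare s1 with s2, or s1 with zero. The result is written to the condition flags in the FPSCR register, so it must be copied to the core APSR flags with vmrs APSR_nzcv, fpscr before conditional instructions can use it. If cond is present, the operation is performed only if the condition is met. If E is present, any NaN operand (quiet or signaling) raises an Invalid Operation exception; without E, only signaling NaNs do.
Note that in Thumb state conditional instructions must also be preceded by an IT instruction.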
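A hypothetical C model of the flag values vcmp.f32 produces, with N, Z, C and V in bits 3 to 0 of the return value:

```c
#include <stdint.h>

/* NZCV flags written by vcmp.f32 to FPSCR bits 31:28. They reach the
   APSR only after vmrs APSR_nzcv, fpscr. */
uint32_t vcmp_nzcv(float a, float b) {
    if (a != a || b != b) return 0x3u;   /* unordered (NaN): C = 1, V = 1 */
    if (a == b) return 0x6u;             /* equal: Z = 1, C = 1 */
    if (a < b) return 0x8u;              /* less than: N = 1 */
    return 0x2u;                         /* greater than: C = 1 */
}
```

Because an unordered result sets V, the greater-than condition (GT: Z = 0 and N = V) is false when either operand is NaN.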
<op>{cond}.f<32|64> <sd>, <s1>
Perform operation <op> on the 32 bit or 64 bit number in register s1 and put the result into register sd. If cond is present, the operation is performed only if the condition is met.
<op> is one of the following:
vabs: absolute value
vneg: negation
vsqrt: square root
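<op>{cond}.f<32|64> <sd>, <s1>, <s2>
Perform operation <op> on the 32 bit or 64 bit numbers in registers s1 and s2 and put the result into register sd. If cond is present, the operation is performed only if the condition is met.
<op> is one of the following:
vadd: addition, s1 + s2
vsub: subtraction, s1 - s2
vmul: multiplication, s1 × s2
vdiv: division, s1 / s2
vnmul: negated multiplication, -(s1 × s2)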
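The following operations are fused (VFPv4 and later): they compute the multiplication and the accumulation with a single rounding. This improves accuracy, but the result can differ from a separate vmul followed by vadd, which rounds twice:
vfma: sd = sd + s1 × s2
vfms: sd = sd - s1 × s2
vfnma, vfnms: negated variants of vfma and vfms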
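The effect of the single rounding can be seen in C. In the hypothetical sketch below, `mul_then_add` rounds twice like vmul.f32 followed by vadd.f32. For the chosen inputs, evaluating in double happens to be exact, so `fused_for_example` returns what a fused vfma.f32 would; it is not a general FMA emulation.

```c
/* Separately rounded multiply-add. The volatile store forces the product
   to be rounded to float, so the compiler cannot fuse the two steps. */
float mul_then_add(float a, float b, float c) {
    volatile float p = a * b;   /* first rounding */
    return p + c;               /* second rounding */
}

/* The product of two floats is exact in double (at most 48 significant
   bits). For the inputs used here the sum is exact too, so the single
   final rounding matches a fused multiply-add. */
float fused_for_example(float a, float b, float c) {
    return (float)((double)a * (double)b + (double)c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. Rounding it to single precision drops the 2^-24 term, so the separate version returns 0 while the fused version returns 2^-24.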
Template code:
.syntax unified
.section .isr_vector,"a" @ 1
b _reset @ entry point
b _reset @ Reset: relative branch allows remap
b . @ Undefined Instruction
b . @ Software Interrupt
b . @ Prefetch Abort
b . @ Data Abort
b . @ Reserved
b . @ IRQ
b . @ FIQ
.section .data
@@ your constants go here
@@ your constants go here
.section .bss
@@ your variables go here
@@ your variables go here
.section .text
_reset:
.extern _stack_start @ 2
_init_stack:
ldr sp, =_stack_start @ 3
_enable_fpu: @ 4
mrc p15, 0, r0, c1, c0, 2
orr r0, r0, #0xF00000 @ full access to CP10 and CP11 (VFP); both fields must be set to the same value
mcr p15, 0, r0, c1, c0, 2
mov r0, #0x40000000
fmxr fpexc, r0
.global _start
_start:
@@ your code goes here
@@ your code goes here
@@ your code goes here
wait:
add ip, ip, #1
b wait
nop
.end
Code explanation line-by-line:
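2. The _stack_start variable is defined in the linker script; this line declares that it is defined outside this file, so the assembler does not look for it locally.
The following code goes inside the .data section of the template code; the result variables a and b used below go into the .bss section.
.section .data
x: .single 2.53 @ 1
y: .single 3.17
.section .bss
a: .space 4
b: .space 4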
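Code explanation line-by-line:
1. .single defines a 32 bit (single precision) floating point constant; 64 bit (double precision) constants are defined with the .double directive.
The following code goes inside the _start function of the template code.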
_start:
ldr r0, =x @ 1
ldr r0, [r0]
ldr r1, =y
ldr r1, [r1]
ldr r3, =a
ldr r4, =b
mov r2, #5 @ 2
vmov s0, r0 @ 3
vmov s1, r1
vmov s2, r2
vmov.f32 s3, #2.0 @ 4
vcvt.f32.u32 s2, s2 @ 5
vadd.f32 s4, s0, s1 @ 6
vsub.f32 s5, s1, s2
vdiv.f32 s6, s2, s3
vmul.f32 s7, s3, s4
vabs.f32 s8, s5
vpush {s0, s1, s2, s3} @ 7
vpush {s4, s5, s6}
vstr s7, [r3] @ 8
vstr s8, [r4]
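To check the register values in the debugger, the arithmetic of the assembly example can be mirrored in C. This is a hypothetical sketch: local names follow the FPU registers, plain float arithmetic stands in for the FPU instructions, and the struct fields stand for the pushed s6 and the words stored at labels a and b.

```c
typedef struct {
    float s6;   /* pushed to the stack by vpush */
    float a;    /* value stored by vstr s7, [r3] */
    float b;    /* value stored by vstr s8, [r4] */
} example_results;

example_results run_example(void) {
    float s0 = 2.53f;                    /* x */
    float s1 = 3.17f;                    /* y */
    float s2 = (float)5u;                /* integer 5 moved in, then vcvt.f32.u32 */
    float s3 = 2.0f;                     /* immediate constant 2.0 */
    float s4 = s0 + s1;                  /* vadd.f32 s4, s0, s1 */
    float s5 = s1 - s2;                  /* vsub.f32 s5, s1, s2 */
    float s6 = s2 / s3;                  /* vdiv.f32 s6, s2, s3 */
    float s7 = s3 * s4;                  /* vmul.f32 s7, s3, s4 */
    float s8 = s5 < 0.0f ? -s5 : s5;     /* vabs.f32 s8, s5 */
    example_results r = { s6, s7, s8 };
    return r;
}
```

This gives s6 = 2.5, s7 = 2 × (2.53 + 3.17), about 11.4, and s8 = |3.17 - 5|, about 1.83, which can be compared with the output of `info registers s6 s7 s8` in GDB.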
Code explanation line-by-line: