ARM assembly language
Assembly language programming on ARM microprocessors with examples of working code.
Monday, 12 October 2020
Friday, 7 March 2014
Triangle numbers
The sequence of triangle numbers is generated by adding the natural numbers. So the 7th triangle number would be 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28. The first ten terms would be:
1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...
Let us list the factors of the first seven triangle numbers:
1: 1
3: 1,3
6: 1,2,3,6
10: 1,2,5,10
15: 1,3,5,15
21: 1,3,7,21
28: 1,2,4,7,14,28
We can see that 28 is the first triangle number to have over five divisors.
What is the value of the first triangle number to have over five hundred divisors?
.syntax unified
.align 2
.section .rodata
resstring:
.asciz "%d\n"
.equ last, 250 @ we count up to the square root - so half of the divisors will
@ be necessarily less than the square root and half above.
icount .req r4 @ it is possible to write FORTRAN in any language :-)
jcount .req r5
tmp .req r6
num .req r7
.text
.align 2
.global get_num_divisors
.type get_num_divisors, %function
@ we use jcount to hold current divisor and icount to hold the number of divisors seen
@ num is passed as r0 and the number of divisors is returned.
get_num_divisors:
stmfd sp!, {r4-r7, lr}
mov num, r0
mov icount, 0
mov jcount, 1
jstart:
mov r0, num
mov r1, jcount
bl divide
teq r1, 0
beq factor
b nextj
factor:
add icount, icount, 1
nextj:
add jcount, jcount, 1
mul tmp, jcount, jcount
cmp num, tmp
bgt jstart
mov r0, icount
ldmfd sp!, {r4-r7, pc}
.align 2
.global main
.type main, %function
main:
stmfd sp!, {r4-r7, lr}
mov num, 0
mov icount, 0
mov jcount, 1
loop:
cmp num, #last
bge printme
add icount, icount, jcount
mov r0, icount
bl get_num_divisors
mov num, r0
add jcount, jcount, 1
b loop
printme:
mov r1, icount
ldr r0, =resstring @ store address of start of string to r0
bl printf
mov r0, 0
ldmfd sp!, {r4-r7, pc}
mov r7, 1 @ set r7 to 1 - the syscall for exit
swi 0 @ then invoke the syscall from linux
divide.s
.syntax unified # divide takes value in r0, divisor in r1 and returns dividend in r0 and modulus in r1 .global divide .type divide, %function divide: stmfd sp!, {lr} # see http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pdf bl __aeabi_uidivmod ldmfd sp!, {pc}
Back on track...
Thursday, 22 August 2013
Greatest product of adjacent numbers in a grid
A note to any readers : This will be the last post for a month.
In the 20
20 grid below, four numbers along a diagonal line have been marked in red.
08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48
The product of these numbers is 26
63
78
14 = 1788696.
What is the greatest product of four adjacent numbers in the same direction (up, down, left, right, or diagonally) in the 20
20 grid?
The solution to this problem is quite interesting because it demonstrates the use of a macro. A macro is a construct available within as which permits the same sort of abstraction as a procedure|subroutine|function would in a high-level language.
Inside the macro, the parameter a is accessed via \a and the construct \@ expands out to a unique value each time the macro is invoked; this allows the use in labels which are distinct for each macro invocation.
.syntax unified
.align 2
.section .rodata
buffer:
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 8, 2, 22, 97, 38, 15, 0, 40, 0, 75, 4, 5, 7, 78, 52, 12, 50, 77, 91, 8, 0, 0, 0
.byte 0, 0, 0, 49, 49, 99, 40, 17, 81, 18, 57, 60, 87, 17, 40, 98, 43, 69, 48, 4, 56, 62, 0, 0, 0, 0
.byte 0, 0, 0, 81, 49, 31, 73, 55, 79, 14, 29, 93, 71, 40, 67, 53, 88, 30, 3, 49, 13, 36, 65, 0, 0, 0
.byte 0, 0, 0, 52, 70, 95, 23, 4, 60, 11, 42, 69, 24, 68, 56, 1, 32, 56, 71, 37, 2, 36, 91, 0, 0, 0
.byte 0, 0, 0, 22, 31, 16, 71, 51, 67, 63, 89, 41, 92, 36, 54, 22, 40, 40, 28, 66, 33, 13, 80, 0, 0, 0
.byte 0, 0, 0, 24, 47, 32, 60, 99, 3, 45, 2, 44, 75, 33, 53, 78, 36, 84, 20, 35, 17, 12, 50, 0, 0, 0
.byte 0, 0, 0, 32, 98, 81, 28, 64, 23, 67, 10, 26, 38, 40, 67, 59, 54, 70, 66, 18, 38, 64, 70, 0, 0, 0
.byte 0, 0, 0, 67, 26, 20, 68, 2, 62, 12, 20, 95, 63, 94, 39, 63, 8, 40, 91, 66, 49, 94, 21, 0, 0, 0
.byte 0, 0, 0, 24, 55, 58, 5, 66, 73, 99, 26, 97, 17, 78, 78, 96, 83, 14, 88, 34, 89, 63, 72, 0, 0, 0
.byte 0, 0, 0, 21, 36, 23, 9, 75, 0, 76, 44, 20, 45, 35, 14, 0, 61, 33, 97, 34, 31, 33, 95, 0, 0, 0
.byte 0, 0, 0, 78, 17, 53, 28, 22, 75, 31, 67, 15, 94, 3, 80, 4, 62, 16, 14, 9, 53, 56, 92, 0, 0, 0
.byte 0, 0, 0, 16, 39, 5, 42, 96, 35, 31, 47, 55, 58, 88, 24, 0, 17, 54, 24, 36, 29, 85, 57, 0, 0, 0
.byte 0, 0, 0, 86, 56, 0, 48, 35, 71, 89, 7, 5, 44, 44, 37, 44, 60, 21, 58, 51, 54, 17, 58, 0, 0, 0
.byte 0, 0, 0, 19, 80, 81, 68, 5, 94, 47, 69, 28, 73, 92, 13, 86, 52, 17, 77, 4, 89, 55, 40, 0, 0, 0
.byte 0, 0, 0, 4, 52, 8, 83, 97, 35, 99, 16, 7, 97, 57, 32, 16, 26, 26, 79, 33, 27, 98, 66, 0, 0, 0
.byte 0, 0, 0, 88, 36, 68, 87, 57, 62, 20, 72, 3, 46, 33, 67, 46, 55, 12, 32, 63, 93, 53, 69, 0, 0, 0
.byte 0, 0, 0, 4, 42, 16, 73, 38, 25, 39, 11, 24, 94, 72, 18, 8, 46, 29, 32, 40, 62, 76, 36, 0, 0, 0
.byte 0, 0, 0, 20, 69, 36, 41, 72, 30, 23, 88, 34, 62, 99, 69, 82, 67, 59, 85, 74, 4, 36, 16, 0, 0, 0
.byte 0, 0, 0, 20, 73, 35, 29, 78, 31, 90, 1, 74, 31, 49, 71, 48, 86, 81, 16, 23, 57, 5, 54, 0, 0, 0
.byte 0, 0, 0, 1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52, 1, 89, 19, 67, 48, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
resstring:
.asciz "%d\n"
.equ ocolumns, 26
.equ icolumns, 20
.equ colstart, 23
.equ border, 3
.equ offset, 2
.equ points, 4
.equ north, -ocolumns
.equ north_east, -ocolumns+1
.equ east, 1
.equ south_east, ocolumns+1
.equ south, ocolumns
.equ south_west, ocolumns - 1
.equ west, -1
.equ north_west, -ocolumns-1
.equ point_set, 1
tmp .req r2
pointv .req r3
data_ptr .req r4
icount .req r5
jcount .req r6
kcount .req r7
maxv .req r8
dpstore .req r9
.macro direction a
mov data_ptr, dpstore
mov kcount, points
mov pointv, point_set
kloop\@:
ldrb tmp, [data_ptr], \a
mul pointv, pointv, tmp
subs kcount, kcount, 1
bne kloop\@
cmp maxv, pointv
movlt maxv, pointv @ set maxv to max of maxv and pointv
.endm
.text
.align 2
.global main
.type main, %function
main:
stmfd sp!, {r4-r9, lr}
mov icount, icolumns
iloop:
mov jcount, icolumns
jloop:
mov tmp, ocolumns
add data_ptr, icount, offset
mul tmp, tmp, data_ptr
add tmp, tmp, jcount
add tmp, tmp, offset
ldr data_ptr, =buffer
add dpstore, data_ptr, tmp @ data[i][j]
direction north
direction north_east
direction east
direction south_east
direction south
direction south_west
direction west
direction north_west
subs jcount, jcount, 1
bne jloop
subs icount, icount, 1
bne iloop
printme:
mov r1, maxv
ldr r0, =resstring @ store address of start of string to r0
bl printf
mov r0, 0
ldmfd sp!, {r4-r9, pc}
mov r7, 1 @ set r7 to 1 - the syscall for exit
swi 0 @ then invoke the syscall from linux
Thursday, 15 August 2013
Summing prime numbers
The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
Find the sum of all the primes below two million.
.syntax unified
.equ word,4
.equ limit,2000000
.equ numprimes4,595732 @ sufficient to hold all the bytes in all the elements of the vector we use to hold the primes
sum_hi .req r4
sum_lo .req r5
numprimes .req r6
primes_ptr .req r7
number .req r8
limit .req r9
.section .bss
.lcomm primes_vector,numprimes4 @ allocate space for vector of primes
.section .rodata
llustring:
.asciz "%llu\n"
.text
.align 8
.global main
.type main, %function
main:
stmfd sp!, {r4-r9, fp, lr}
ldr primes_ptr, =primes_vector
mov numprimes, 1
mov number, 2
str number, [primes_ptr] @ store 2 to primes vector
ldr limit, =limit
mov sum_hi, 0 @ initialise sum
mov sum_lo, 2
mov number, 3 @ start with 3 as first odd prime
loop:
cmp number, limit
bge printme @ leave if we have exceeded limit
mov r0, number
ldr r1, =primes_vector
mov r2, numprimes
bl prime_vector
teq r0, 1 @ check for primality of number
bne nexti
str number, [primes_ptr, numprimes, lsl 2] @ if prime, add to vector
add numprimes, numprimes, 1 @ increment prime count
adds sum_lo, sum_lo, number @ and add to sum
adc sum_hi, sum_hi, 0
nexti:
add number, number, 2 @ increment number by 2
b loop
printme:
mov r2, sum_lo
mov r3, sum_hi
ldr r0, =llustring @ store address of start of string to r0
bl printf
mov r0, 0
ldmfd sp!, {r4-r9, fp, pc}
mov r7, 1 @ set r7 to 1 - the syscall for exit
swi 0 @ then invoke the syscall from linux
prime_vector.s
.syntax unified .equ word, 4 # this subroutine returns 1 if the passed number is prime; 0 if not # # inputs # r0 - integer to test # r1 - pointer to vector of prime integers smaller than r0 # r2 - length of vector passed in r1 # # outputs # r0 - prime boolean number .req r4 vptr .req r5 tmp .req r6 squared .req r7 vsize .req r8 .global prime_vector .type prime_vector, %function .text prime_vector: stmfd sp!, {r4-r8, lr} mov number, r0 mov vptr, r1 mov vsize, r2 nexti: ldr tmp, [vptr], word @ get vector element to tmp mul squared, tmp, tmp @ and square it cmp squared, number @ compare the square to the input number movgt r0, 1 @ store 1 to r0 if square is greater (i.e. number is primes) bgt last @ and jump out mov r0, number mov r1, tmp bl divide @ divide number by tmp teq r1, 0 @ and check if it is a divisor moveq r0, 0 @ if so, not prime, so set r0 to 0 beq last @ and leave subs vsize, vsize, 1 @ decrement prime counter bgt nexti @ and try next vector element mov r0, 1 last: ldmfd sp!, {r4-r8, pc}
divide.s
.syntax unified # divide takes value in r0, divisor in r1 and returns dividend in r0 and modulus in r1 .global divide .type divide, %function divide: stmfd sp!, {lr} # see http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pdf bl __aeabi_uidivmod ldmfd sp!, {pc}
Sunday, 28 July 2013
Generating processor-specific code from a source file
There are many ARM cores as described at Wikipedia. The processors get more complex as they evolve and occasionally new instructions are introduced. One instance is the rev instruction which will reverse the bytes in a word e.g. for converting big-endian numbers to little-endian and vice versa.
In my Sheeva Plug there is a version 5TE processor (which does not support this instruction); my Raspberry Pi has a version 6 processor (which does).
I would like to keep my code in one source file, and dynamically assemble the optimum code for the processor upon which it was built.
We can use the macro assembler of the GNU assembler to achieve this.
.if ARCH >= 6 rev r0, r0 .else and r2, r0, #0xff000000 @ load the top 2 bytes to r2 and r3, r0, #0x00ff0000 @ load the next 2 bytes to r3 and r4, r0, #0x0000ff00 @ load the next 2 bytes to r4 and r5, r0, #0x000000ff @ load the bottom 2 bytes to r5 mov r0, r2, lsr #24 @ shift r2 bytes to bottom and store to r0 orr r0, r3, lsr #8 @ or the remaining shifted data orr r0, r4, lsl #8 orr r0, r5, lsl #24 .endif
We can interrogate the machine at build time by looking inside /proc/cpuinfo via
cat /proc/cpuinfo | grep ^Processor which gives :
- On A SheevaPlug : Processor : Feroceon 88FR131 rev 1 (v5l)
- On A Raspberry Pi : Processor : ARMv6-compatible processor rev 7 (v6l)
And then we pull the appropriate number within the Makefile and then we can build via make endian using the following Makefile
AS := /usr/bin/as
LD := /usr/bin/ld
ASOPTS := -gstabs
ARCH = $(shell sed -ne 's/^Processor\s*:.*(v\([0-9]\)l)/\1/p' /proc/cpuinfo)
endian: endian.s
echo ARCH is $(ARCH)
$(AS) $(ASOPTS) --defsym ARCH=$(ARCH) -march=armv$(ARCH) -o $@.o $^
$(LD) -o $@ $@.o
Tuesday, 23 July 2013
Pythagorean triplets
A Pythagorean triplet is a set of three natural numbers, a
b
c, for which,
For example, 32 + 42 = 9 + 16 = 25 = 52.
There exists exactly one Pythagorean triplet for which a + b + c = 1000.
Find the product abc.
.syntax unified
.equ limit,1000
.align 4
icount .req r4 @ use of i, j and k as loop indices is historic :-)
jcount .req r5 @ we use icount for a, jcount for b and kcount for c
kcount .req r6
tmp .req r8
jksum .req r9
.section .rodata
resstring:
.asciz "%d\n"
.text
.global main
.type main, %function
main:
stmfd sp!, {r4-r9, lr}
ldr icount, =limit
istart:
subs jcount, icount, 1 @ i > j so start j at i-1
beq nexti
jstart:
subs kcount, jcount, 1 @ j > k too
beq nextj
add tmp, icount, jcount
kstart:
add tmp, tmp, kcount
cmp tmp, #limit
bne nextk
mul jksum, kcount, kcount @ we re-use the register later as jksum (this is ksquared here)
@ we could also alias ksquared to r9 - but then we have to avoid
@ collision with jksum
mul tmp, jcount, jcount @ could alias jsquared to r8 (or just re-use tmp) as here
add jksum, jksum, tmp
mul tmp, icount, icount @ and also could alias isquared 9but choose to reuse tmp)
cmp jksum, tmp @ this is the important comparison - if i*i == (j*j) + (k*k)
beq printme @ then leave since there is only one solution
nextk:
add tmp, icount, jcount
subs kcount, kcount, 1
bne kstart
nextj:
subs jcount, jcount, 1
bne jstart
nexti:
subs icount, icount, 1
bne istart
printme:
mul tmp, icount, jcount
mul tmp, tmp, kcount
mov r1, tmp
ldr r0, =resstring
bl printf
mov r0, 0
ldmfd sp!, {r4-r9, pc}
mov r7, 1
swi 0