ARM assembly language

Monday, 12 October 2020

I got distracted ... I now have a Pinebook Pro and a Pine phone. Both have an ARM64 CPU - I will write something about ARM64 assembler soon!

Friday, 7 March 2014

Triangle numbers

The sequence of triangle numbers is generated by adding the natural numbers. So the 7^th triangle number would be 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28. The first ten terms would be:

1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...

Let us list the factors of the first seven triangle numbers:

1: 1
3: 1,3
6: 1,2,3,6
10: 1,2,5,10
15: 1,3,5,15
21: 1,3,7,21
28: 1,2,4,7,14,28

We can see that 28 is the first triangle number to have over five divisors.

What is the value of the first triangle number to have over five hundred divisors?

.syntax unified

.align 2

.section .rodata
resstring:
.asciz "%d\n"

.equ    last, 250       @ we count up to the square root - so half of the divisors will
                        @ be necessarily less than the square root and half above.

icount .req r4          @ it is possible to write FORTRAN in any language :-)
jcount .req r5
tmp    .req r6
num    .req r7

.text
.align  2
.global get_num_divisors
.type   get_num_divisors, %function

@ we use jcount to hold current divisor and icount to hold the number of divisors seen
@ num is passed as r0 and the number of divisors is returned.
get_num_divisors:
 stmfd   sp!, {r4-r7, lr}
 mov     num, r0
 mov     icount, 0
 mov     jcount, 1
jstart:
 mov     r0, num
 mov     r1, jcount
 bl      divide
 teq     r1, 0
 beq     factor
 b       nextj
factor:
 add     icount, icount, 1
nextj:
 add     jcount, jcount, 1
 mul     tmp, jcount, jcount
 cmp     num, tmp
 bgt     jstart
 mov     r0, icount
 ldmfd   sp!, {r4-r7, pc}

.align  2
.global main
.type   main, %function

main:
 stmfd   sp!, {r4-r7, lr}
 mov     num, 0
 mov     icount, 0
 mov     jcount, 1
loop:
 cmp     num, #last
 bge     printme
 add     icount, icount, jcount
 mov     r0, icount
 bl      get_num_divisors
 mov     num, r0

 add     jcount, jcount, 1
 b       loop

printme:
 mov     r1, icount
 ldr     r0, =resstring  @ store address of start of string to r0
 bl      printf

 mov     r0, 0
 ldmfd   sp!, {r4-r7, pc}
 mov     r7, 1   @ set r7 to 1 - the syscall for exit
 swi     0               @ then invoke the syscall from linux

divide.s

.syntax unified # divide takes value in r0, divisor in r1 and returns dividend in r0 and modulus in r1 .global divide .type divide, %function divide: stmfd sp!, {lr} # see http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pdf bl __aeabi_uidivmod ldmfd sp!, {pc}

Back on track...

A month turns into 6 so easily. While away I have replaced my desktop machine with a CubieTruck - does all I need and runs Debian well. Also just got a udroid u3 for evaluation purposes and getting that to work. More code soon.

Thursday, 22 August 2013

Greatest product of adjacent numbers in a grid

A note to any readers : This will be the last post for a month.

In the 2020 grid below, four numbers along a diagonal line have been marked in red.

08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48

The product of these numbers is 26 63 78 14 = 1788696.

What is the greatest product of four adjacent numbers in the same direction (up, down, left, right, or diagonally) in the 2020 grid?

The solution to this problem is quite interesting because it demonstrates the use of a macro. A macro is a construct available within as which permits the same sort of abstraction as a procedure|subroutine|function would in a high-level language.

Inside the macro, the parameter a is accessed via \a and the construct \@ expands out to a unique value each time the macro is invoked; this allows the use in labels which are distinct for each macro invocation.

.syntax unified

.align 2

.section .rodata
buffer:
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 8, 2, 22, 97, 38, 15, 0, 40, 0, 75, 4, 5, 7, 78, 52, 12, 50, 77, 91, 8, 0, 0, 0
.byte 0, 0, 0, 49, 49, 99, 40, 17, 81, 18, 57, 60, 87, 17, 40, 98, 43, 69, 48, 4, 56, 62, 0, 0, 0, 0
.byte 0, 0, 0, 81, 49, 31, 73, 55, 79, 14, 29, 93, 71, 40, 67, 53, 88, 30, 3, 49, 13, 36, 65, 0, 0, 0
.byte 0, 0, 0, 52, 70, 95, 23, 4, 60, 11, 42, 69, 24, 68, 56, 1, 32, 56, 71, 37, 2, 36, 91, 0, 0, 0
.byte 0, 0, 0, 22, 31, 16, 71, 51, 67, 63, 89, 41, 92, 36, 54, 22, 40, 40, 28, 66, 33, 13, 80, 0, 0, 0
.byte 0, 0, 0, 24, 47, 32, 60, 99, 3, 45, 2, 44, 75, 33, 53, 78, 36, 84, 20, 35, 17, 12, 50, 0, 0, 0
.byte 0, 0, 0, 32, 98, 81, 28, 64, 23, 67, 10, 26, 38, 40, 67, 59, 54, 70, 66, 18, 38, 64, 70, 0, 0, 0
.byte 0, 0, 0, 67, 26, 20, 68, 2, 62, 12, 20, 95, 63, 94, 39, 63, 8, 40, 91, 66, 49, 94, 21, 0, 0, 0
.byte 0, 0, 0, 24, 55, 58, 5, 66, 73, 99, 26, 97, 17, 78, 78, 96, 83, 14, 88, 34, 89, 63, 72, 0, 0, 0
.byte 0, 0, 0, 21, 36, 23, 9, 75, 0, 76, 44, 20, 45, 35, 14, 0, 61, 33, 97, 34, 31, 33, 95, 0, 0, 0
.byte 0, 0, 0, 78, 17, 53, 28, 22, 75, 31, 67, 15, 94, 3, 80, 4, 62, 16, 14, 9, 53, 56, 92, 0, 0, 0
.byte 0, 0, 0, 16, 39, 5, 42, 96, 35, 31, 47, 55, 58, 88, 24, 0, 17, 54, 24, 36, 29, 85, 57, 0, 0, 0
.byte 0, 0, 0, 86, 56, 0, 48, 35, 71, 89, 7, 5, 44, 44, 37, 44, 60, 21, 58, 51, 54, 17, 58, 0, 0, 0
.byte 0, 0, 0, 19, 80, 81, 68, 5, 94, 47, 69, 28, 73, 92, 13, 86, 52, 17, 77, 4, 89, 55, 40, 0, 0, 0
.byte 0, 0, 0, 4, 52, 8, 83, 97, 35, 99, 16, 7, 97, 57, 32, 16, 26, 26, 79, 33, 27, 98, 66, 0, 0, 0
.byte 0, 0, 0, 88, 36, 68, 87, 57, 62, 20, 72, 3, 46, 33, 67, 46, 55, 12, 32, 63, 93, 53, 69, 0, 0, 0
.byte 0, 0, 0, 4, 42, 16, 73, 38, 25, 39, 11, 24, 94, 72, 18, 8, 46, 29, 32, 40, 62, 76, 36, 0, 0, 0
.byte 0, 0, 0, 20, 69, 36, 41, 72, 30, 23, 88, 34, 62, 99, 69, 82, 67, 59, 85, 74, 4, 36, 16, 0, 0, 0
.byte 0, 0, 0, 20, 73, 35, 29, 78, 31, 90, 1, 74, 31, 49, 71, 48, 86, 81, 16, 23, 57, 5, 54, 0, 0, 0
.byte 0, 0, 0, 1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52, 1, 89, 19, 67, 48, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

resstring:
.asciz "%d\n"

.equ    ocolumns, 26
.equ    icolumns, 20
.equ    colstart, 23
.equ    border, 3
.equ    offset, 2
.equ    points, 4
.equ    north, -ocolumns
.equ    north_east, -ocolumns+1
.equ    east, 1
.equ    south_east, ocolumns+1
.equ    south, ocolumns
.equ    south_west, ocolumns - 1
.equ    west, -1
.equ    north_west, -ocolumns-1
.equ    point_set, 1

tmp      .req r2
pointv   .req r3
data_ptr .req r4
icount   .req r5
jcount   .req r6
kcount   .req r7
maxv     .req r8
dpstore  .req r9

.macro direction a
 mov     data_ptr, dpstore
 mov     kcount, points
 mov     pointv, point_set
kloop\@:
 ldrb    tmp, [data_ptr], \a
 mul     pointv, pointv, tmp
 subs    kcount, kcount, 1
 bne     kloop\@
 cmp     maxv, pointv
 movlt   maxv, pointv     @ set maxv to max of maxv and pointv
.endm

.text
.align  2
.global main
.type   main, %function
main:
 stmfd   sp!, {r4-r9, lr}
 mov     icount, icolumns
iloop:
 mov     jcount, icolumns
jloop:
 mov     tmp, ocolumns
 add     data_ptr, icount, offset
 mul     tmp, tmp, data_ptr

 add     tmp, tmp, jcount
 add     tmp, tmp, offset
 ldr     data_ptr, =buffer
 add     dpstore, data_ptr, tmp @ data[i][j]

 direction north
 direction north_east
 direction east
 direction south_east
 direction south
 direction south_west
 direction west
 direction north_west

 subs    jcount, jcount, 1
 bne     jloop

 subs    icount, icount, 1
 bne     iloop
printme:
 mov     r1, maxv
 ldr     r0, =resstring         @ store address of start of string to r0
 bl      printf

 mov     r0, 0
 ldmfd   sp!, {r4-r9, pc}
 mov     r7, 1                  @ set r7 to 1 - the syscall for exit
 swi     0                      @ then invoke the syscall from linux

Thursday, 15 August 2013

Summing prime numbers

The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.

Find the sum of all the primes below two million.

.syntax unified
.equ    word,4

.equ    limit,2000000
.equ    numprimes4,595732                       @ sufficient to hold all the bytes in all the elements of the vector we use to hold the primes

sum_hi          .req r4
sum_lo          .req r5
numprimes       .req r6
primes_ptr      .req r7
number          .req r8
limit           .req r9

.section .bss
.lcomm primes_vector,numprimes4                 @ allocate space for vector of primes
.section .rodata
llustring:
.asciz "%llu\n"
.text
.align 8
.global main
.type main, %function
main:
 stmfd   sp!, {r4-r9, fp, lr}
 ldr     primes_ptr, =primes_vector
 mov     numprimes, 1
 mov     number, 2
 str     number, [primes_ptr]                   @ store 2 to primes vector
 ldr     limit, =limit
 mov     sum_hi, 0                              @ initialise sum
 mov     sum_lo, 2
 mov     number, 3                              @ start with 3 as first odd prime
loop:
 cmp     number, limit
 bge     printme                                @ leave if we have exceeded limit

 mov     r0, number
 ldr     r1, =primes_vector
 mov     r2, numprimes
 bl      prime_vector
 teq     r0, 1                                  @ check for primality of number
 bne     nexti
 str     number, [primes_ptr, numprimes, lsl 2] @ if prime, add to vector
 add     numprimes, numprimes, 1                @ increment prime count
 adds    sum_lo, sum_lo, number                 @ and add to sum
 adc     sum_hi, sum_hi, 0
nexti:
 add     number, number, 2                      @ increment number by 2
 b       loop
printme:
 mov     r2, sum_lo
 mov     r3, sum_hi
 ldr     r0, =llustring                         @ store address of start of string to r0
 bl      printf

 mov     r0, 0
 ldmfd   sp!, {r4-r9, fp, pc}
 mov     r7, 1                                  @ set r7 to 1 - the syscall for exit
 swi     0                                      @ then invoke the syscall from linux

prime_vector.s

.syntax unified .equ word, 4 # this subroutine returns 1 if the passed number is prime; 0 if not # # inputs # r0 - integer to test # r1 - pointer to vector of prime integers smaller than r0 # r2 - length of vector passed in r1 # # outputs # r0 - prime boolean number .req r4 vptr .req r5 tmp .req r6 squared .req r7 vsize .req r8 .global prime_vector .type prime_vector, %function .text prime_vector: stmfd sp!, {r4-r8, lr} mov number, r0 mov vptr, r1 mov vsize, r2 nexti: ldr tmp, [vptr], word @ get vector element to tmp mul squared, tmp, tmp @ and square it cmp squared, number @ compare the square to the input number movgt r0, 1 @ store 1 to r0 if square is greater (i.e. number is primes) bgt last @ and jump out mov r0, number mov r1, tmp bl divide @ divide number by tmp teq r1, 0 @ and check if it is a divisor moveq r0, 0 @ if so, not prime, so set r0 to 0 beq last @ and leave subs vsize, vsize, 1 @ decrement prime counter bgt nexti @ and try next vector element mov r0, 1 last: ldmfd sp!, {r4-r8, pc}

divide.s

Sunday, 28 July 2013

Generating processor-specific code from a source file

There are many ARM cores as described at Wikipedia. The processors get more complex as they evolve and occasionally new instructions are introduced. One instance is the rev instruction which will reverse the bytes in a word e.g. for converting big-endian numbers to little-endian and vice versa.

In my Sheeva Plug there is a version 5TE processor (which does not support this instruction); my Raspberry Pi has a version 6 processor (which does).

I would like to keep my code in one source file, and dynamically assemble the optimum code for the processor upon which it was built.

We can use the macro assembler of the GNU assembler to achieve this.

.if ARCH >= 6
 rev r0, r0
.else
 and r2, r0, #0xff000000 @ load the top 2 bytes to r2
 and r3, r0, #0x00ff0000 @ load the next 2 bytes to r3
 and r4, r0, #0x0000ff00 @ load the next 2 bytes to r4
 and r5, r0, #0x000000ff @ load the bottom 2 bytes to r5
 mov r0, r2, lsr #24  @ shift r2 bytes to bottom and store to r0
 orr r0, r3, lsr #8  @ or the remaining shifted data
 orr r0, r4, lsl #8
 orr r0, r5, lsl #24
.endif

We can interrogate the machine at build time by looking inside /proc/cpuinfo via

cat /proc/cpuinfo | grep ^Processor which gives :

On A SheevaPlug : Processor : Feroceon 88FR131 rev 1 (v5l)
On A Raspberry Pi : Processor : ARMv6-compatible processor rev 7 (v6l)

And then we pull the appropriate number within the Makefile and then we can build via make endian using the following Makefile

AS      := /usr/bin/as
LD      := /usr/bin/ld

ASOPTS  := -gstabs

ARCH = $(shell sed -ne 's/^Processor\s*:.*(v\([0-9]\)l)/\1/p' /proc/cpuinfo)

endian: endian.s
        echo ARCH is $(ARCH)
        $(AS) $(ASOPTS) --defsym ARCH=$(ARCH) -march=armv$(ARCH) -o $@.o $^
        $(LD) -o $@ $@.o

Tuesday, 23 July 2013

Pythagorean triplets

A Pythagorean triplet is a set of three natural numbers, a b c, for which,

a² + b² = c²

For example, 3² + 4² = 9 + 16 = 25 = 5².

There exists exactly one Pythagorean triplet for which a + b + c = 1000.
Find the product abc.

.syntax unified
.equ limit,1000

.align 4

icount .req r4           @ use of i, j and k as loop indices is historic :-)
jcount .req r5           @ we use icount for a, jcount for b and kcount for c
kcount .req r6
tmp    .req r8
jksum  .req r9

.section .rodata
resstring:
.asciz "%d\n"
.text
.global main
.type   main, %function
main:
 stmfd sp!, {r4-r9, lr}
 ldr icount, =limit
istart:
 subs jcount, icount, 1    @ i > j so start j at i-1
 beq nexti
jstart:
 subs kcount, jcount, 1    @ j > k too
 beq nextj
 add tmp, icount, jcount
kstart:
 add tmp, tmp, kcount
 cmp tmp, #limit
 bne nextk
 mul jksum, kcount, kcount @ we re-use the register later as jksum (this is ksquared here)
                           @ we could also alias ksquared to r9 - but then we have to avoid 
                           @ collision with jksum
 mul tmp, jcount, jcount   @ could alias jsquared to r8 (or just re-use tmp) as here
 add jksum, jksum, tmp
 mul tmp, icount, icount   @ and also could alias isquared 9but choose to reuse tmp)
 cmp jksum, tmp            @ this is the important comparison - if i*i == (j*j) + (k*k) 
 beq printme               @ then leave since there is only one solution
nextk:
 add tmp, icount, jcount
 subs kcount, kcount, 1
 bne kstart
nextj:
 subs jcount, jcount, 1
 bne jstart
nexti:
 subs icount, icount, 1
 bne istart
printme:
 mul tmp, icount, jcount
 mul tmp, tmp, kcount
 mov r1, tmp
 ldr r0, =resstring
 bl  printf

 mov r0, 0
 ldmfd sp!, {r4-r9, pc}
 mov r7, 1
 swi 0