## Monday, 12 October 2020

I got distracted ... I now have a Pinebook Pro and a Pine phone. Both have an ARM64 CPU - I will write something about ARM64 assembler soon!

## Friday, 7 March 2014

### Triangle numbers

The sequence of triangle numbers is generated by adding the natural numbers. So the 7th triangle number would be 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28. The first ten terms would be:

1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...

Let us list the factors of the first seven triangle numbers:

1: 1
3: 1,3
6: 1,2,3,6
10: 1,2,5,10
15: 1,3,5,15
21: 1,3,7,21
28: 1,2,4,7,14,28

We can see that 28 is the first triangle number to have over five divisors.

What is the value of the first triangle number to have over five hundred divisors?

```.syntax unified

.align 2

.section .rodata
resstring:
.asciz "%d\n"

.equ    last, 250       @ we count up to the square root - so half of the divisors will
@ be necessarily less than the square root and half above.

icount .req r4          @ it is possible to write FORTRAN in any language :-)
jcount .req r5
tmp    .req r6
num    .req r7

.text
.align  2
.global get_num_divisors
.type   get_num_divisors, %function

@ we use jcount to hold current divisor and icount to hold the number of divisors seen
@ num is passed as r0 and the number of divisors is returned.
get_num_divisors:
stmfd   sp!, {r4-r7, lr}
mov     num, r0
mov     icount, 0
mov     jcount, 1
jstart:
mov     r0, num
mov     r1, jcount
bl      divide
teq     r1, 0
beq     factor
b       nextj
factor:
nextj:
mul     tmp, jcount, jcount
cmp     num, tmp
bgt     jstart
mov     r0, icount
ldmfd   sp!, {r4-r7, pc}

.align  2
.global main
.type   main, %function

main:
stmfd   sp!, {r4-r7, lr}
mov     num, 0
mov     icount, 0
mov     jcount, 1
loop:
cmp     num, #last
bge     printme
mov     r0, icount
bl      get_num_divisors
mov     num, r0

b       loop

printme:
mov     r1, icount
ldr     r0, =resstring  @ store address of start of string to r0
bl      printf

mov     r0, 0
ldmfd   sp!, {r4-r7, pc}
mov     r7, 1   @ set r7 to 1 - the syscall for exit
swi     0               @ then invoke the syscall from linux
```
```divide.s
.syntax unified
# divide takes value in r0, divisor in r1 and returns dividend in r0 and modulus in r1
.global divide
.type divide, %function

divide:
stmfd   sp!, {lr}
# see http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pdf
bl  __aeabi_uidivmod
ldmfd   sp!, {pc}
```

### Back on track...

A month turns into 6 so easily. While away I have replaced my desktop machine with a CubieTruck - does all I need and runs Debian well. Also just got a udroid u3 for evaluation purposes and getting that to work. More code soon.

## Thursday, 22 August 2013

### Greatest product of adjacent numbers in a grid

A note to any readers : This will be the last post for a month.

In the 2020 grid below, four numbers along a diagonal line have been marked in red.

08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48

The product of these numbers is 26 63 78 14 = 1788696.

What is the greatest product of four adjacent numbers in the same direction (up, down, left, right, or diagonally) in the 2020 grid?

The solution to this problem is quite interesting because it demonstrates the use of a macro. A macro is a construct available within as which permits the same sort of abstraction as a procedure|subroutine|function would in a high-level language.

Inside the macro, the parameter a is accessed via \a and the construct \@ expands out to a unique value each time the macro is invoked; this allows the use in labels which are distinct for each macro invocation.

```.syntax unified

.align 2

.section .rodata
buffer:
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 8, 2, 22, 97, 38, 15, 0, 40, 0, 75, 4, 5, 7, 78, 52, 12, 50, 77, 91, 8, 0, 0, 0
.byte 0, 0, 0, 49, 49, 99, 40, 17, 81, 18, 57, 60, 87, 17, 40, 98, 43, 69, 48, 4, 56, 62, 0, 0, 0, 0
.byte 0, 0, 0, 81, 49, 31, 73, 55, 79, 14, 29, 93, 71, 40, 67, 53, 88, 30, 3, 49, 13, 36, 65, 0, 0, 0
.byte 0, 0, 0, 52, 70, 95, 23, 4, 60, 11, 42, 69, 24, 68, 56, 1, 32, 56, 71, 37, 2, 36, 91, 0, 0, 0
.byte 0, 0, 0, 22, 31, 16, 71, 51, 67, 63, 89, 41, 92, 36, 54, 22, 40, 40, 28, 66, 33, 13, 80, 0, 0, 0
.byte 0, 0, 0, 24, 47, 32, 60, 99, 3, 45, 2, 44, 75, 33, 53, 78, 36, 84, 20, 35, 17, 12, 50, 0, 0, 0
.byte 0, 0, 0, 32, 98, 81, 28, 64, 23, 67, 10, 26, 38, 40, 67, 59, 54, 70, 66, 18, 38, 64, 70, 0, 0, 0
.byte 0, 0, 0, 67, 26, 20, 68, 2, 62, 12, 20, 95, 63, 94, 39, 63, 8, 40, 91, 66, 49, 94, 21, 0, 0, 0
.byte 0, 0, 0, 24, 55, 58, 5, 66, 73, 99, 26, 97, 17, 78, 78, 96, 83, 14, 88, 34, 89, 63, 72, 0, 0, 0
.byte 0, 0, 0, 21, 36, 23, 9, 75, 0, 76, 44, 20, 45, 35, 14, 0, 61, 33, 97, 34, 31, 33, 95, 0, 0, 0
.byte 0, 0, 0, 78, 17, 53, 28, 22, 75, 31, 67, 15, 94, 3, 80, 4, 62, 16, 14, 9, 53, 56, 92, 0, 0, 0
.byte 0, 0, 0, 16, 39, 5, 42, 96, 35, 31, 47, 55, 58, 88, 24, 0, 17, 54, 24, 36, 29, 85, 57, 0, 0, 0
.byte 0, 0, 0, 86, 56, 0, 48, 35, 71, 89, 7, 5, 44, 44, 37, 44, 60, 21, 58, 51, 54, 17, 58, 0, 0, 0
.byte 0, 0, 0, 19, 80, 81, 68, 5, 94, 47, 69, 28, 73, 92, 13, 86, 52, 17, 77, 4, 89, 55, 40, 0, 0, 0
.byte 0, 0, 0, 4, 52, 8, 83, 97, 35, 99, 16, 7, 97, 57, 32, 16, 26, 26, 79, 33, 27, 98, 66, 0, 0, 0
.byte 0, 0, 0, 88, 36, 68, 87, 57, 62, 20, 72, 3, 46, 33, 67, 46, 55, 12, 32, 63, 93, 53, 69, 0, 0, 0
.byte 0, 0, 0, 4, 42, 16, 73, 38, 25, 39, 11, 24, 94, 72, 18, 8, 46, 29, 32, 40, 62, 76, 36, 0, 0, 0
.byte 0, 0, 0, 20, 69, 36, 41, 72, 30, 23, 88, 34, 62, 99, 69, 82, 67, 59, 85, 74, 4, 36, 16, 0, 0, 0
.byte 0, 0, 0, 20, 73, 35, 29, 78, 31, 90, 1, 74, 31, 49, 71, 48, 86, 81, 16, 23, 57, 5, 54, 0, 0, 0
.byte 0, 0, 0, 1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52, 1, 89, 19, 67, 48, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

resstring:
.asciz "%d\n"

.equ    ocolumns, 26
.equ    icolumns, 20
.equ    colstart, 23
.equ    border, 3
.equ    offset, 2
.equ    points, 4
.equ    north, -ocolumns
.equ    north_east, -ocolumns+1
.equ    east, 1
.equ    south_east, ocolumns+1
.equ    south, ocolumns
.equ    south_west, ocolumns - 1
.equ    west, -1
.equ    north_west, -ocolumns-1
.equ    point_set, 1

tmp      .req r2
pointv   .req r3
data_ptr .req r4
icount   .req r5
jcount   .req r6
kcount   .req r7
maxv     .req r8
dpstore  .req r9

.macro direction a
mov     data_ptr, dpstore
mov     kcount, points
mov     pointv, point_set
kloop\@:
ldrb    tmp, [data_ptr], \a
mul     pointv, pointv, tmp
subs    kcount, kcount, 1
bne     kloop\@
cmp     maxv, pointv
movlt   maxv, pointv     @ set maxv to max of maxv and pointv
.endm

.text
.align  2
.global main
.type   main, %function
main:
stmfd   sp!, {r4-r9, lr}
mov     icount, icolumns
iloop:
mov     jcount, icolumns
jloop:
mov     tmp, ocolumns
mul     tmp, tmp, data_ptr

ldr     data_ptr, =buffer
add     dpstore, data_ptr, tmp @ data[i][j]

direction north
direction north_east
direction east
direction south_east
direction south
direction south_west
direction west
direction north_west

subs    jcount, jcount, 1
bne     jloop

subs    icount, icount, 1
bne     iloop
printme:
mov     r1, maxv
ldr     r0, =resstring         @ store address of start of string to r0
bl      printf

mov     r0, 0
ldmfd   sp!, {r4-r9, pc}
mov     r7, 1                  @ set r7 to 1 - the syscall for exit
swi     0                      @ then invoke the syscall from linux
```

## Thursday, 15 August 2013

### Summing prime numbers

The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.

Find the sum of all the primes below two million.

```.syntax unified
.equ    word,4

.equ    limit,2000000
.equ    numprimes4,595732                       @ sufficient to hold all the bytes in all the elements of the vector we use to hold the primes

sum_hi          .req r4
sum_lo          .req r5
numprimes       .req r6
primes_ptr      .req r7
number          .req r8
limit           .req r9

.section .bss
.lcomm primes_vector,numprimes4                 @ allocate space for vector of primes
.section .rodata
llustring:
.asciz "%llu\n"
.text
.align 8
.global main
.type main, %function
main:
stmfd   sp!, {r4-r9, fp, lr}
ldr     primes_ptr, =primes_vector
mov     numprimes, 1
mov     number, 2
str     number, [primes_ptr]                   @ store 2 to primes vector
ldr     limit, =limit
mov     sum_hi, 0                              @ initialise sum
mov     sum_lo, 2
loop:
cmp     number, limit
bge     printme                                @ leave if we have exceeded limit

mov     r0, number
ldr     r1, =primes_vector
mov     r2, numprimes
bl      prime_vector
teq     r0, 1                                  @ check for primality of number
bne     nexti
str     number, [primes_ptr, numprimes, lsl 2] @ if prime, add to vector
add     numprimes, numprimes, 1                @ increment prime count
nexti:
add     number, number, 2                      @ increment number by 2
b       loop
printme:
mov     r2, sum_lo
mov     r3, sum_hi
ldr     r0, =llustring                         @ store address of start of string to r0
bl      printf

mov     r0, 0
ldmfd   sp!, {r4-r9, fp, pc}
mov     r7, 1                                  @ set r7 to 1 - the syscall for exit
swi     0                                      @ then invoke the syscall from linux
```
```prime_vector.s
.syntax unified
.equ word, 4

# this subroutine returns 1 if the passed number is prime; 0 if not
#
# inputs
#   r0 - integer to test
#   r1 - pointer to vector of prime integers smaller than r0
#   r2 - length of vector passed in r1
#
# outputs
#   r0 - prime boolean

number   .req r4
vptr     .req r5
tmp      .req r6
squared  .req r7
vsize    .req r8

.global prime_vector
.type prime_vector, %function
.text
prime_vector:
stmfd   sp!, {r4-r8, lr}
mov     number, r0
mov     vptr, r1
mov     vsize, r2
nexti:
ldr     tmp, [vptr], word                      @ get vector element to tmp
mul     squared, tmp, tmp                      @ and square it
cmp     squared, number                        @ compare the square to the input number
movgt   r0, 1                                  @ store 1 to r0 if square is greater (i.e. number is primes)
bgt     last                                   @ and jump out
mov     r0, number
mov     r1, tmp
bl      divide                                 @ divide number by tmp
teq     r1, 0                                  @ and check if it is a divisor
moveq   r0, 0                                  @ if so, not prime, so set r0 to 0
beq     last                                   @ and leave
subs    vsize, vsize, 1                        @ decrement prime counter
bgt     nexti                                  @ and try next vector element
mov     r0, 1
last:
ldmfd   sp!, {r4-r8, pc}
```
```divide.s
.syntax unified
# divide takes value in r0, divisor in r1 and returns dividend in r0 and modulus in r1
.global divide
.type divide, %function
divide:
stmfd   sp!, {lr}
# see http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pdf
bl      __aeabi_uidivmod
ldmfd   sp!, {pc}
```

## Sunday, 28 July 2013

### Generating processor-specific code from a source file

There are many ARM cores as described at Wikipedia. The processors get more complex as they evolve and occasionally new instructions are introduced. One instance is the rev instruction which will reverse the bytes in a word e.g. for converting big-endian numbers to little-endian and vice versa.

In my Sheeva Plug there is a version 5TE processor (which does not support this instruction); my Raspberry Pi has a version 6 processor (which does).

I would like to keep my code in one source file, and dynamically assemble the optimum code for the processor upon which it was built.

We can use the macro assembler of the GNU assembler to achieve this.

```.if ARCH >= 6
rev r0, r0
.else
and r2, r0, #0xff000000 @ load the top 2 bytes to r2
and r3, r0, #0x00ff0000 @ load the next 2 bytes to r3
and r4, r0, #0x0000ff00 @ load the next 2 bytes to r4
and r5, r0, #0x000000ff @ load the bottom 2 bytes to r5
mov r0, r2, lsr #24  @ shift r2 bytes to bottom and store to r0
orr r0, r3, lsr #8  @ or the remaining shifted data
orr r0, r4, lsl #8
orr r0, r5, lsl #24
.endif
```

We can interrogate the machine at build time by looking inside /proc/cpuinfo via

cat /proc/cpuinfo | grep ^Processor which gives :

• On A SheevaPlug : Processor : Feroceon 88FR131 rev 1 (v5l)
• On A Raspberry Pi : Processor : ARMv6-compatible processor rev 7 (v6l)

And then we pull the appropriate number within the Makefile and then we can build via make endian using the following Makefile

```AS      := /usr/bin/as
LD      := /usr/bin/ld

ASOPTS  := -gstabs

ARCH = \$(shell sed -ne 's/^Processor\s*:.*(v\([0-9]\)l)/\1/p' /proc/cpuinfo)

endian: endian.s
echo ARCH is \$(ARCH)
\$(AS) \$(ASOPTS) --defsym ARCH=\$(ARCH) -march=armv\$(ARCH) -o \$@.o \$^
\$(LD) -o \$@ \$@.o
```

## Tuesday, 23 July 2013

### Pythagorean triplets

A Pythagorean triplet is a set of three natural numbers, a b c, for which,

a2 + b2 = c2

For example, 32 + 42 = 9 + 16 = 25 = 52.

There exists exactly one Pythagorean triplet for which a + b + c = 1000.
Find the product abc.

```.syntax unified
.equ limit,1000

.align 4

icount .req r4           @ use of i, j and k as loop indices is historic :-)
jcount .req r5           @ we use icount for a, jcount for b and kcount for c
kcount .req r6
tmp    .req r8
jksum  .req r9

.section .rodata
resstring:
.asciz "%d\n"
.text
.global main
.type   main, %function
main:
stmfd sp!, {r4-r9, lr}
ldr icount, =limit
istart:
subs jcount, icount, 1    @ i > j so start j at i-1
beq nexti
jstart:
subs kcount, jcount, 1    @ j > k too
beq nextj
kstart:
cmp tmp, #limit
bne nextk
mul jksum, kcount, kcount @ we re-use the register later as jksum (this is ksquared here)
@ we could also alias ksquared to r9 - but then we have to avoid
@ collision with jksum
mul tmp, jcount, jcount   @ could alias jsquared to r8 (or just re-use tmp) as here
mul tmp, icount, icount   @ and also could alias isquared 9but choose to reuse tmp)
cmp jksum, tmp            @ this is the important comparison - if i*i == (j*j) + (k*k)
beq printme               @ then leave since there is only one solution
nextk:
subs kcount, kcount, 1
bne kstart
nextj:
subs jcount, jcount, 1
bne jstart
nexti:
subs icount, icount, 1
bne istart
printme:
mul tmp, icount, jcount
mul tmp, tmp, kcount
mov r1, tmp
ldr r0, =resstring
bl  printf

mov r0, 0
ldmfd sp!, {r4-r9, pc}
mov r7, 1
swi 0
```