ARM assembly language

2020-10-12T04:22:00.001-07:00

I got distracted ... I now have a Pinebook Pro and a Pine phone. Both have an ARM64 CPU - I will write something about ARM64 assembler soon!

Triangle numbers

2014-03-07T12:22:00.000-08:00

The sequence of triangle numbers is generated by adding the natural numbers. So the 7^th triangle number would be 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28. The first ten terms would be:

1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...

Let us list the factors of the first seven triangle numbers:

1: 1
3: 1,3
6: 1,2,3,6
10: 1,2,5,10
15: 1,3,5,15
21: 1,3,7,21
28: 1,2,4,7,14,28

We can see that 28 is the first triangle number to have over five divisors.

What is the value of the first triangle number to have over five hundred divisors?

.syntax unified

.align 2

.section .rodata
resstring:
.asciz "%d\n"

.equ    last, 250       @ we count up to the square root - so half of the divisors will
                        @ be necessarily less than the square root and half above.

icount .req r4          @ it is possible to write FORTRAN in any language :-)
jcount .req r5
tmp    .req r6
num    .req r7

.text
.align  2
.global get_num_divisors
.type   get_num_divisors, %function

@ we use jcount to hold current divisor and icount to hold the number of divisors seen
@ num is passed as r0 and the number of divisors is returned.
get_num_divisors:
 stmfd   sp!, {r4-r7, lr}
 mov     num, r0
 mov     icount, 0
 mov     jcount, 1
jstart:
 mov     r0, num
 mov     r1, jcount
 bl      divide
 teq     r1, 0
 beq     factor
 b       nextj
factor:
 add     icount, icount, 1
nextj:
 add     jcount, jcount, 1
 mul     tmp, jcount, jcount
 cmp     num, tmp
 bgt     jstart
 mov     r0, icount
 ldmfd   sp!, {r4-r7, pc}

.align  2
.global main
.type   main, %function

main:
 stmfd   sp!, {r4-r7, lr}
 mov     num, 0
 mov     icount, 0
 mov     jcount, 1
loop:
 cmp     num, #last
 bge     printme
 add     icount, icount, jcount
 mov     r0, icount
 bl      get_num_divisors
 mov     num, r0

 add     jcount, jcount, 1
 b       loop

printme:
 mov     r1, icount
 ldr     r0, =resstring  @ store address of start of string to r0
 bl      printf

 mov     r0, 0
 ldmfd   sp!, {r4-r7, pc}
 mov     r7, 1   @ set r7 to 1 - the syscall for exit
 swi     0               @ then invoke the syscall from linux

divide.s

.syntax unified # divide takes value in r0, divisor in r1 and returns dividend in r0 and modulus in r1 .global divide .type divide, %function divide: stmfd sp!, {lr} # see http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pdf bl __aeabi_uidivmod ldmfd sp!, {pc}

Back on track...

2014-03-07T11:47:00.002-08:00

A month turns into 6 so easily. While away I have replaced my desktop machine with a CubieTruck - does all I need and runs Debian well. Also just got a udroid u3 for evaluation purposes and getting that to work. More code soon.

Greatest product of adjacent numbers in a grid

2013-08-22T09:52:00.001-07:00

A note to any readers : This will be the last post for a month.

In the 2020 grid below, four numbers along a diagonal line have been marked in red.

08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48

The product of these numbers is 26 63 78 14 = 1788696.

What is the greatest product of four adjacent numbers in the same direction (up, down, left, right, or diagonally) in the 2020 grid?

The solution to this problem is quite interesting because it demonstrates the use of a macro. A macro is a construct available within as which permits the same sort of abstraction as a procedure|subroutine|function would in a high-level language.

Inside the macro, the parameter a is accessed via \a and the construct \@ expands out to a unique value each time the macro is invoked; this allows the use in labels which are distinct for each macro invocation.

.syntax unified

.align 2

.section .rodata
buffer:
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 8, 2, 22, 97, 38, 15, 0, 40, 0, 75, 4, 5, 7, 78, 52, 12, 50, 77, 91, 8, 0, 0, 0
.byte 0, 0, 0, 49, 49, 99, 40, 17, 81, 18, 57, 60, 87, 17, 40, 98, 43, 69, 48, 4, 56, 62, 0, 0, 0, 0
.byte 0, 0, 0, 81, 49, 31, 73, 55, 79, 14, 29, 93, 71, 40, 67, 53, 88, 30, 3, 49, 13, 36, 65, 0, 0, 0
.byte 0, 0, 0, 52, 70, 95, 23, 4, 60, 11, 42, 69, 24, 68, 56, 1, 32, 56, 71, 37, 2, 36, 91, 0, 0, 0
.byte 0, 0, 0, 22, 31, 16, 71, 51, 67, 63, 89, 41, 92, 36, 54, 22, 40, 40, 28, 66, 33, 13, 80, 0, 0, 0
.byte 0, 0, 0, 24, 47, 32, 60, 99, 3, 45, 2, 44, 75, 33, 53, 78, 36, 84, 20, 35, 17, 12, 50, 0, 0, 0
.byte 0, 0, 0, 32, 98, 81, 28, 64, 23, 67, 10, 26, 38, 40, 67, 59, 54, 70, 66, 18, 38, 64, 70, 0, 0, 0
.byte 0, 0, 0, 67, 26, 20, 68, 2, 62, 12, 20, 95, 63, 94, 39, 63, 8, 40, 91, 66, 49, 94, 21, 0, 0, 0
.byte 0, 0, 0, 24, 55, 58, 5, 66, 73, 99, 26, 97, 17, 78, 78, 96, 83, 14, 88, 34, 89, 63, 72, 0, 0, 0
.byte 0, 0, 0, 21, 36, 23, 9, 75, 0, 76, 44, 20, 45, 35, 14, 0, 61, 33, 97, 34, 31, 33, 95, 0, 0, 0
.byte 0, 0, 0, 78, 17, 53, 28, 22, 75, 31, 67, 15, 94, 3, 80, 4, 62, 16, 14, 9, 53, 56, 92, 0, 0, 0
.byte 0, 0, 0, 16, 39, 5, 42, 96, 35, 31, 47, 55, 58, 88, 24, 0, 17, 54, 24, 36, 29, 85, 57, 0, 0, 0
.byte 0, 0, 0, 86, 56, 0, 48, 35, 71, 89, 7, 5, 44, 44, 37, 44, 60, 21, 58, 51, 54, 17, 58, 0, 0, 0
.byte 0, 0, 0, 19, 80, 81, 68, 5, 94, 47, 69, 28, 73, 92, 13, 86, 52, 17, 77, 4, 89, 55, 40, 0, 0, 0
.byte 0, 0, 0, 4, 52, 8, 83, 97, 35, 99, 16, 7, 97, 57, 32, 16, 26, 26, 79, 33, 27, 98, 66, 0, 0, 0
.byte 0, 0, 0, 88, 36, 68, 87, 57, 62, 20, 72, 3, 46, 33, 67, 46, 55, 12, 32, 63, 93, 53, 69, 0, 0, 0
.byte 0, 0, 0, 4, 42, 16, 73, 38, 25, 39, 11, 24, 94, 72, 18, 8, 46, 29, 32, 40, 62, 76, 36, 0, 0, 0
.byte 0, 0, 0, 20, 69, 36, 41, 72, 30, 23, 88, 34, 62, 99, 69, 82, 67, 59, 85, 74, 4, 36, 16, 0, 0, 0
.byte 0, 0, 0, 20, 73, 35, 29, 78, 31, 90, 1, 74, 31, 49, 71, 48, 86, 81, 16, 23, 57, 5, 54, 0, 0, 0
.byte 0, 0, 0, 1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52, 1, 89, 19, 67, 48, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

resstring:
.asciz "%d\n"

.equ    ocolumns, 26
.equ    icolumns, 20
.equ    colstart, 23
.equ    border, 3
.equ    offset, 2
.equ    points, 4
.equ    north, -ocolumns
.equ    north_east, -ocolumns+1
.equ    east, 1
.equ    south_east, ocolumns+1
.equ    south, ocolumns
.equ    south_west, ocolumns - 1
.equ    west, -1
.equ    north_west, -ocolumns-1
.equ    point_set, 1

tmp      .req r2
pointv   .req r3
data_ptr .req r4
icount   .req r5
jcount   .req r6
kcount   .req r7
maxv     .req r8
dpstore  .req r9

.macro direction a
 mov     data_ptr, dpstore
 mov     kcount, points
 mov     pointv, point_set
kloop\@:
 ldrb    tmp, [data_ptr], \a
 mul     pointv, pointv, tmp
 subs    kcount, kcount, 1
 bne     kloop\@
 cmp     maxv, pointv
 movlt   maxv, pointv     @ set maxv to max of maxv and pointv
.endm

.text
.align  2
.global main
.type   main, %function
main:
 stmfd   sp!, {r4-r9, lr}
 mov     icount, icolumns
iloop:
 mov     jcount, icolumns
jloop:
 mov     tmp, ocolumns
 add     data_ptr, icount, offset
 mul     tmp, tmp, data_ptr

 add     tmp, tmp, jcount
 add     tmp, tmp, offset
 ldr     data_ptr, =buffer
 add     dpstore, data_ptr, tmp @ data[i][j]

 direction north
 direction north_east
 direction east
 direction south_east
 direction south
 direction south_west
 direction west
 direction north_west

 subs    jcount, jcount, 1
 bne     jloop

 subs    icount, icount, 1
 bne     iloop
printme:
 mov     r1, maxv
 ldr     r0, =resstring         @ store address of start of string to r0
 bl      printf

 mov     r0, 0
 ldmfd   sp!, {r4-r9, pc}
 mov     r7, 1                  @ set r7 to 1 - the syscall for exit
 swi     0                      @ then invoke the syscall from linux

Summing prime numbers

2013-08-15T08:44:00.000-07:00

The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.

Find the sum of all the primes below two million.

.syntax unified
.equ    word,4

.equ    limit,2000000
.equ    numprimes4,595732                       @ sufficient to hold all the bytes in all the elements of the vector we use to hold the primes

sum_hi          .req r4
sum_lo          .req r5
numprimes       .req r6
primes_ptr      .req r7
number          .req r8
limit           .req r9

.section .bss
.lcomm primes_vector,numprimes4                 @ allocate space for vector of primes
.section .rodata
llustring:
.asciz "%llu\n"
.text
.align 8
.global main
.type main, %function
main:
 stmfd   sp!, {r4-r9, fp, lr}
 ldr     primes_ptr, =primes_vector
 mov     numprimes, 1
 mov     number, 2
 str     number, [primes_ptr]                   @ store 2 to primes vector
 ldr     limit, =limit
 mov     sum_hi, 0                              @ initialise sum
 mov     sum_lo, 2
 mov     number, 3                              @ start with 3 as first odd prime
loop:
 cmp     number, limit
 bge     printme                                @ leave if we have exceeded limit

 mov     r0, number
 ldr     r1, =primes_vector
 mov     r2, numprimes
 bl      prime_vector
 teq     r0, 1                                  @ check for primality of number
 bne     nexti
 str     number, [primes_ptr, numprimes, lsl 2] @ if prime, add to vector
 add     numprimes, numprimes, 1                @ increment prime count
 adds    sum_lo, sum_lo, number                 @ and add to sum
 adc     sum_hi, sum_hi, 0
nexti:
 add     number, number, 2                      @ increment number by 2
 b       loop
printme:
 mov     r2, sum_lo
 mov     r3, sum_hi
 ldr     r0, =llustring                         @ store address of start of string to r0
 bl      printf

 mov     r0, 0
 ldmfd   sp!, {r4-r9, fp, pc}
 mov     r7, 1                                  @ set r7 to 1 - the syscall for exit
 swi     0                                      @ then invoke the syscall from linux

prime_vector.s

.syntax unified .equ word, 4 # this subroutine returns 1 if the passed number is prime; 0 if not # # inputs # r0 - integer to test # r1 - pointer to vector of prime integers smaller than r0 # r2 - length of vector passed in r1 # # outputs # r0 - prime boolean number .req r4 vptr .req r5 tmp .req r6 squared .req r7 vsize .req r8 .global prime_vector .type prime_vector, %function .text prime_vector: stmfd sp!, {r4-r8, lr} mov number, r0 mov vptr, r1 mov vsize, r2 nexti: ldr tmp, [vptr], word @ get vector element to tmp mul squared, tmp, tmp @ and square it cmp squared, number @ compare the square to the input number movgt r0, 1 @ store 1 to r0 if square is greater (i.e. number is primes) bgt last @ and jump out mov r0, number mov r1, tmp bl divide @ divide number by tmp teq r1, 0 @ and check if it is a divisor moveq r0, 0 @ if so, not prime, so set r0 to 0 beq last @ and leave subs vsize, vsize, 1 @ decrement prime counter bgt nexti @ and try next vector element mov r0, 1 last: ldmfd sp!, {r4-r8, pc}

divide.s

Generating processor-specific code from a source file

2013-07-28T05:00:00.001-07:00

There are many ARM cores as described at Wikipedia. The processors get more complex as they evolve and occasionally new instructions are introduced. One instance is the rev instruction which will reverse the bytes in a word e.g. for converting big-endian numbers to little-endian and vice versa.

In my Sheeva Plug there is a version 5TE processor (which does not support this instruction); my Raspberry Pi has a version 6 processor (which does).

I would like to keep my code in one source file, and dynamically assemble the optimum code for the processor upon which it was built.

We can use the macro assembler of the GNU assembler to achieve this.

.if ARCH >= 6
 rev r0, r0
.else
 and r2, r0, #0xff000000 @ load the top 2 bytes to r2
 and r3, r0, #0x00ff0000 @ load the next 2 bytes to r3
 and r4, r0, #0x0000ff00 @ load the next 2 bytes to r4
 and r5, r0, #0x000000ff @ load the bottom 2 bytes to r5
 mov r0, r2, lsr #24  @ shift r2 bytes to bottom and store to r0
 orr r0, r3, lsr #8  @ or the remaining shifted data
 orr r0, r4, lsl #8
 orr r0, r5, lsl #24
.endif

We can interrogate the machine at build time by looking inside /proc/cpuinfo via

cat /proc/cpuinfo | grep ^Processor which gives :

On A SheevaPlug : Processor : Feroceon 88FR131 rev 1 (v5l)
On A Raspberry Pi : Processor : ARMv6-compatible processor rev 7 (v6l)

And then we pull the appropriate number within the Makefile and then we can build via make endian using the following Makefile

AS      := /usr/bin/as
LD      := /usr/bin/ld

ASOPTS  := -gstabs

ARCH = $(shell sed -ne 's/^Processor\s*:.*(v\([0-9]\)l)/\1/p' /proc/cpuinfo)

endian: endian.s
        echo ARCH is $(ARCH)
        $(AS) $(ASOPTS) --defsym ARCH=$(ARCH) -march=armv$(ARCH) -o $@.o $^
        $(LD) -o $@ $@.o

Pythagorean triplets

2013-07-23T12:57:00.004-07:00

A Pythagorean triplet is a set of three natural numbers, a b c, for which,

a² + b² = c²

For example, 3² + 4² = 9 + 16 = 25 = 5².

There exists exactly one Pythagorean triplet for which a + b + c = 1000.
Find the product abc.

.syntax unified
.equ limit,1000

.align 4

icount .req r4           @ use of i, j and k as loop indices is historic :-)
jcount .req r5           @ we use icount for a, jcount for b and kcount for c
kcount .req r6
tmp    .req r8
jksum  .req r9

.section .rodata
resstring:
.asciz "%d\n"
.text
.global main
.type   main, %function
main:
 stmfd sp!, {r4-r9, lr}
 ldr icount, =limit
istart:
 subs jcount, icount, 1    @ i > j so start j at i-1
 beq nexti
jstart:
 subs kcount, jcount, 1    @ j > k too
 beq nextj
 add tmp, icount, jcount
kstart:
 add tmp, tmp, kcount
 cmp tmp, #limit
 bne nextk
 mul jksum, kcount, kcount @ we re-use the register later as jksum (this is ksquared here)
                           @ we could also alias ksquared to r9 - but then we have to avoid 
                           @ collision with jksum
 mul tmp, jcount, jcount   @ could alias jsquared to r8 (or just re-use tmp) as here
 add jksum, jksum, tmp
 mul tmp, icount, icount   @ and also could alias isquared 9but choose to reuse tmp)
 cmp jksum, tmp            @ this is the important comparison - if i*i == (j*j) + (k*k) 
 beq printme               @ then leave since there is only one solution
nextk:
 add tmp, icount, jcount
 subs kcount, kcount, 1
 bne kstart
nextj:
 subs jcount, jcount, 1
 bne jstart
nexti:
 subs icount, icount, 1
 bne istart
printme:
 mul tmp, icount, jcount
 mul tmp, tmp, kcount
 mov r1, tmp
 ldr r0, =resstring
 bl  printf

 mov r0, 0
 ldmfd sp!, {r4-r9, pc}
 mov r7, 1
 swi 0

Find product of digits

2013-07-14T04:12:00.001-07:00

Find the greatest product of five consecutive digits in the 1000-digit number.

73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450

.syntax unified

.equ limit,10000
.equ outer, 996
.equ inner, 5 

.align 4

address  .req r4
thisbyte .req r5
icounter .req r6
ocounter .req r7
maxv     .req r8
tmp      .req r9

.section .data
buffer:
.byte 7, 3, 1, 6, 7, 1, 7, 6, 5, 3, 1, 3, 3, 0, 6, 2, 4, 9, 1, 9, 2, 2
.byte 5, 1, 1, 9, 6, 7, 4, 4, 2, 6, 5, 7, 4, 7, 4, 2, 3, 5, 5, 3, 4, 9, 1, 9
.byte 4, 9, 3, 4, 9, 6, 9, 8, 3, 5, 2, 0, 3, 1, 2, 7, 7, 4, 5, 0, 6, 3, 2, 6
.byte 2, 3, 9, 5, 7, 8, 3, 1, 8, 0, 1, 6, 9, 8, 4, 8, 0, 1, 8, 6, 9, 4, 7, 8
.byte 8, 5, 1, 8, 4, 3, 8, 5, 8, 6, 1, 5, 6, 0, 7, 8, 9, 1, 1, 2, 9, 4, 9, 4
.byte 9, 5, 4, 5, 9, 5, 0, 1, 7, 3, 7, 9, 5, 8, 3, 3, 1, 9, 5, 2, 8, 5, 3, 2
.byte 0, 8, 8, 0, 5, 5, 1, 1, 1, 2, 5, 4, 0, 6, 9, 8, 7, 4, 7, 1, 5, 8, 5, 2
.byte 3, 8, 6, 3, 0, 5, 0, 7, 1, 5, 6, 9, 3, 2, 9, 0, 9, 6, 3, 2, 9, 5, 2, 2
.byte 7, 4, 4, 3, 0, 4, 3, 5, 5, 7, 6, 6, 8, 9, 6, 6, 4, 8, 9, 5, 0, 4, 4, 5
.byte 2, 4, 4, 5, 2, 3, 1, 6, 1, 7, 3, 1, 8, 5, 6, 4, 0, 3, 0, 9, 8, 7, 1, 1
.byte 1, 2, 1, 7, 2, 2, 3, 8, 3, 1, 1, 3, 6, 2, 2, 2, 9, 8, 9, 3, 4, 2, 3, 3
.byte 8, 0, 3, 0, 8, 1, 3, 5, 3, 3, 6, 2, 7, 6, 6, 1, 4, 2, 8, 2, 8, 0, 6, 4
.byte 4, 4, 4, 8, 6, 6, 4, 5, 2, 3, 8, 7, 4, 9, 3, 0, 3, 5, 8, 9, 0, 7, 2, 9
.byte 6, 2, 9, 0, 4, 9, 1, 5, 6, 0, 4, 4, 0, 7, 7, 2, 3, 9, 0, 7, 1, 3, 8, 1
.byte 0, 5, 1, 5, 8, 5, 9, 3, 0, 7, 9, 6, 0, 8, 6, 6, 7, 0, 1, 7, 2, 4, 2, 7
.byte 1, 2, 1, 8, 8, 3, 9, 9, 8, 7, 9, 7, 9, 0, 8, 7, 9, 2, 2, 7, 4, 9, 2, 1
.byte 9, 0, 1, 6, 9, 9, 7, 2, 0, 8, 8, 8, 0, 9, 3, 7, 7, 6, 6, 5, 7, 2, 7, 3
.byte 3, 3, 0, 0, 1, 0, 5, 3, 3, 6, 7, 8, 8, 1, 2, 2, 0, 2, 3, 5, 4, 2, 1, 8
.byte 0, 9, 7, 5, 1, 2, 5, 4, 5, 4, 0, 5, 9, 4, 7, 5, 2, 2, 4, 3, 5, 2, 5, 8
.byte 4, 9, 0, 7, 7, 1, 1, 6, 7, 0, 5, 5, 6, 0, 1, 3, 6, 0, 4, 8, 3, 9, 5, 8
.byte 6, 4, 4, 6, 7, 0, 6, 3, 2, 4, 4, 1, 5, 7, 2, 2, 1, 5, 5, 3, 9, 7, 5, 3
.byte 6, 9, 7, 8, 1, 7, 9, 7, 7, 8, 4, 6, 1, 7, 4, 0, 6, 4, 9, 5, 5, 1, 4, 9
.byte 2, 9, 0, 8, 6, 2, 5, 6, 9, 3, 2, 1, 9, 7, 8, 4, 6, 8, 6, 2, 2, 4, 8, 2
.byte 8, 3, 9, 7, 2, 2, 4, 1, 3, 7, 5, 6, 5, 7, 0, 5, 6, 0, 5, 7, 4, 9, 0, 2
.byte 6, 1, 4, 0, 7, 9, 7, 2, 9, 6, 8, 6, 5, 2, 4, 1, 4, 5, 3, 5, 1, 0, 0, 4
.byte 7, 4, 8, 2, 1, 6, 6, 3, 7, 0, 4, 8, 4, 4, 0, 3, 1, 9, 9, 8, 9, 0, 0, 0
.byte 8, 8, 9, 5, 2, 4, 3, 4, 5, 0, 6, 5, 8, 5, 4, 1, 2, 2, 7, 5, 8, 8, 6, 6
.byte 6, 8, 8, 1, 1, 6, 4, 2, 7, 1, 7, 1, 4, 7, 9, 9, 2, 4, 4, 4, 2, 9, 2, 8
.byte 2, 3, 0, 8, 6, 3, 4, 6, 5, 6, 7, 4, 8, 1, 3, 9, 1, 9, 1, 2, 3, 1, 6, 2
.byte 8, 2, 4, 5, 8, 6, 1, 7, 8, 6, 6, 4, 5, 8, 3, 5, 9, 1, 2, 4, 5, 6, 6, 5
.byte 2, 9, 4, 7, 6, 5, 4, 5, 6, 8, 2, 8, 4, 8, 9, 1, 2, 8, 8, 3, 1, 4, 2, 6
.byte 0, 7, 6, 9, 0, 0, 4, 2, 2, 4, 2, 1, 9, 0, 2, 2, 6, 7, 1, 0, 5, 5, 6, 2
.byte 6, 3, 2, 1, 1, 1, 1, 1, 0, 9, 3, 7, 0, 5, 4, 4, 2, 1, 7, 5, 0, 6, 9, 4
.byte 1, 6, 5, 8, 9, 6, 0, 4, 0, 8, 0, 7, 1, 9, 8, 4, 0, 3, 8, 5, 0, 9, 6, 2
.byte 4, 5, 5, 4, 4, 4, 3, 6, 2, 9, 8, 1, 2, 3, 0, 9, 8, 7, 8, 7, 9, 9, 2, 7
.byte 2, 4, 4, 2, 8, 4, 9, 0, 9, 1, 8, 8, 8, 4, 5, 8, 0, 1, 5, 6, 1, 6, 6, 0
.byte 9, 7, 9, 1, 9, 1, 3, 3, 8, 7, 5, 4, 9, 9, 2, 0, 0, 5, 2, 4, 0, 6, 3, 6
.byte 8, 9, 9, 1, 2, 5, 6, 0, 7, 1, 7, 6, 0, 6, 0, 5, 8, 8, 6, 1, 1, 6, 4, 6
.byte 7, 1, 0, 9, 4, 0, 5, 0, 7, 7, 5, 4, 1, 0, 0, 2, 2, 5, 6, 9, 8, 3, 1, 5
.byte 5, 2, 0, 0, 0, 5, 5, 9, 3, 5, 7, 2, 9, 7, 2, 5, 7, 1, 6, 3, 6, 2, 6, 9
.byte 5, 6, 1, 8, 8, 2, 6, 7, 0, 4, 2, 8, 2, 5, 2, 4, 8, 3, 6, 0, 0, 8, 2, 3
.byte 2, 5, 7, 5, 3, 0, 4, 2, 0, 7, 5, 2, 9, 6, 3, 4, 5, 0

.section .rodata
.align  2
resstring:
        .asciz "%d\n"
.text
.align  2
.global main
.type   main, %function
main:
 stmfd   sp!, {r4-r9, lr}
 ldr address, =buffer
 ldr ocounter, =outer
outer_start:
 ldr icounter, =inner
 mov tmp, #1
inner_start:
 ldrb thisbyte, [address], 1
 mul tmp, tmp, thisbyte
 subs icounter, icounter, 1
 bne inner_start
 cmp maxv, tmp
 movlt maxv, tmp            @ overwrite maxv with tmp if tmp is bigger 
 sub address, address, 4
 subs ocounter, ocounter, 1
 bne outer_start

printme:
 mov     r1, maxv
 ldr     r0, =resstring     @ store address of start of string to r0
 bl      printf

 mov r0, 0
 ldmfd   sp!, {r4-r9, pc} 
 mov     r7, 1              @ set r7 to 1 - the syscall for exit
 swi     0                  @ then invoke the syscall from linux

Finding prime numbers

2013-07-11T12:08:00.000-07:00

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.

What is the 10 001st prime number?

.syntax unified

.equ limit,10000
.equ limit4,40000

.align 4

number     .req r4
count      .req r5
numprimes  .req r6
primes_ptr .req r7

.section .bss
.lcomm primes_vector,limit4 # sufficient for 10000 32-bit ints

.section .rodata
 .align 2
resstring:
 .asciz "%d\n"
.text
 .align 2
 .global main
 .type main, %function
main:
 stmfd sp!, {r4-r7, lr}

 ldr     primes_ptr, =primes_vector
 mov     numprimes, 1
 mov     number, 2
 str     number, [primes_ptr]

 ldr count, =limit
 mov number, 3                                 @ 2 is the first prime
loop:
 mov     r0, number
 ldr     r1, =primes_vector
 mov     r2, numprimes
 bl      prime_vector
 teq     r0, 1
 bne     nexti
 str     number, [primes_ptr, numprimes, lsl 2]
 add     numprimes, numprimes, 1
 subs count, count, 1
 beq printme
nexti:
 add number, number, 2
 b loop
 
printme:
 mov r1, number
 ldr r0, =resstring                            @ store address of start of string to r0
 bl printf

 mov r0, 0
 ldmfd sp!, {r4-r7, pc}
 mov r7, 1                                     @ set r7 to 1 - the syscall for exit
 swi 0                                         @ then invoke the syscall from linux

prime_vector.s

.syntax unified .equ word, 4 # this subroutine returns 1 if the passed number is prime; 0 if not # # inputs # r0 - integer to test # r1 - pointer to vector of prime integers smaller than r0 # r2 - length of vector passed in r1 # # outputs # r0 - prime boolean number .req r4 vptr .req r5 tmp .req r6 squared .req r7 vsize .req r8 .global prime_vector .type prime_vector, %function .text .align 2 prime_vector: stmfd sp!, {r4-r8, lr} mov number, r0 mov vptr, r1 mov vsize, r2 nexti: ldr tmp, [vptr], word mul squared, tmp, tmp cmp squared, number @ check if vector element squared is greater than test number movgt r0, 1 @ if so test numer is prime so leave bgt last mov r0, number mov r1, tmp bl divide @ divide number by vector element - if no remainder teq r1, 0 @ it is not prime so leave moveq r0, 0 beq last subs vsize, vsize, 1 @ check next element bgt nexti mov r0, 0 last: ldmfd sp!, {r4-r8, pc}

divide.s

Sum squares and square sums

2013-07-08T12:35:00.003-07:00

The sum of the squares of the first ten natural numbers is,

1² + 2² + ... + 10² = 385

The square of the sum of the first ten natural numbers is,

(1 + 2 + ... + 10)² = 55² = 3025

Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is 3025 - 385 = 2640.

Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.

.syntax unified

.equ limit,100

number .req r4
sumsq .req r5
sqsum .req r6
tmp .req r7

.section .rodata
 .align 2
string:
 .asciz "%d\n"
.text
 .align 2
 .global main
 .type main, %function
main:
 stmfd sp!, {r4-r7, lr}
 mov sqsum, 0
 mov sumsq, 0
 ldr number, =limit
loop:
 mul tmp, number, number
 add sqsum, sqsum, tmp
# decrement number and loop or exit
 subs number, number, 1
 beq end_loop
 b loop
end_loop:
 ldr number, =limit
 add number, number, 1
 ldr sumsq, =limit
 mul sumsq, sumsq, number
 lsr sumsq, sumsq, 1
 mul sumsq, sumsq, sumsq
last:
 sub tmp, sumsq, sqsum 
 mov r1, tmp
 ldr r0, =string         @ store address of start of string to r0
 bl printf

 mov r0, 0
 ldmfd sp!, {r4-r7, pc}
 mov r7, 1     @ set r7 to 1 - the syscall for exit
 swi 0     @ then invoke the syscall from linux

Divisibility

2013-07-02T11:35:00.002-07:00

2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.

What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?

.syntax unified

.equ limit,20

.align 4

@ algorithm
@ initialise try_products to 1
@ foreach number > 1 and <= limit
@ test if it is prime
@ if try_products is set, then multiply the number by itself 
@ while it does not exceed limit, then multiply the total by
@ this product. if the number squared exceeds the limit, then 
@ set try_product to 0.
@ if try_products is 0 and the number is prime, then multiply
@ the total by number.

try_product .req r4
number      .req r5
last        .req r6
total       .req r7
tmp         .req r8

.section .rodata
 .align 2
resstring:
 .asciz "%d\n"
.text
 .align 2
 .global main
 .type main, %function
main:
 stmfd sp!, {r4-r8, lr}

 mov total, 1
 mov try_product, 1
 mov number, 2
loop:
 mov r0, number
 bl isprime20
 cmp r0, 1
 bne nexti
 
 cmp try_product, 1
 bne no_product
 mul tmp, number, number
 cmp tmp, limit
 ble prod_start
 mov try_product, 0
 b no_product
prod_start:
 mov tmp, number
 mov last, tmp
prod_loop:
 cmp tmp, limit
 bgt last_mul
 mov last, tmp
 mul tmp, tmp, number
 b prod_loop
last_mul:
 mul total, total, last
no_product:
 cmp try_product, 0
 bne nexti
 mul total, total, number
nexti:
 cmp number, limit
 beq printme
 add number, number, 1
 b loop
 
printme:
 mov r1, total
 ldr r0, =resstring        @ store address of start of string to r0
 bl printf

 mov r0, 0
 ldmfd sp!, {r4-r8, pc}
 mov r7, 1                @ set r7 to 1 - the syscall for exit
 swi 0                    @ then invoke the syscall from linux

# this subroutine returns 1 if the passed number (<= 20) is prime; 0 if not
#
# inputs
#   r0 - integer to test
#
# outputs
#   r0 - prime boolean

.global isprime20
.type isprime20, %function
.text
.align 2

isprime20:
 stmfd sp!, {lr}
 mov r1, r0
 ands r2, r1, 1
 bne odd
 mov r0, 0
 cmp r1, 2                @ 2 is the only prime even r1
 bne last
 mov r0, 1
 b last
odd:
 mov r0, 1
 cmp r1, 9                @ 9 and 15 are the only non-prime 
 bne test15               @ odd numbers less than 20   
 mov r0, 0
 b last
test15:
 cmp r1, 15
 bne last
 mov r0, 0
last:
 ldmfd sp!, {pc}

Palindromic numbers

2013-07-01T11:49:00.002-07:00

A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 99.

Find the largest palindrome made from the product of two 3-digit numbers.

.syntax unified

.equ max3,999
.equ min3,100
.equ maxdigits,6

i       .req r4
j       .req r5
product .req r6
maxp    .req r7
mini    .req r8
minj    .req r9
maxj    .req r10

.section .rodata
 .align 2
sumstring:
 .asciz "%d\n"

.text
 .align 2
 .global main
 .type main, %function
main:
 stmfd sp!, {r4-r10, lr}
 ldr i, =max3
 ldr mini, =min3
 ldr maxj, =max3
 ldr minj, =min3
iloop:
 mov j, maxj
jloop:
 mul product, i, j

 mov r0, product
 bl is_palindromic
 cmp r0, #1
 bne next
 cmp product, maxp
 ble next
 mov maxp, product
 mov r0, product
 bl divide_by_10         @ divides r0 by 10 
 bl divide_by_10         @ so 3 consecutive calls
 bl divide_by_10         @ will divide by 1000
 mov minj, r0
 mov minj, r0

next:
 subs j, j, 1
 cmp j, minj
 bgt jloop

 subs i, i, 1
 mov maxj, i
 cmp i, mini
 bgt iloop

last:
 mov r1, maxp  
 ldr r0, =sumstring      @ store address of start of string to r0
 bl printf

 mov r0, 0
 ldmfd sp!, {r4-r10, pc}
 mov r7, 1               @ set r7 to 1 - the syscall for exit
 swi 0                   @ then invoke the syscall from linux

A division of the found product by 1000 to reset the minima is an optimisation to speed termination.

We divide by 10 three times in succession - we could write a routine to divide by 1000, but one to divide by 10 is much more useful.

ispalindromic.s

.syntax unified .equ datum_size, 1 .equ digits, 6 # this subroutine returns 1 if the passed 6-digit number is palindromic; 0 if not # the number is a product of 2 3-digit numbers so we assume the product has 6 digits # # inputs # r0 - integer to test # # outputs # r0 - palindromic boolean # # local # left .req r4 right .req r5 counter .req r6 buffer_address .req r7 running .req r8 tmp .req r9 # .section .bss .lcomm buffer, 6 @ use digits here? # .global is_palindromic .type is_palindromic, %function .global get_digits .type get_digits, %function .text .align 2 # is_palindromic: stmfd sp!, {r4-r9, lr} bl get_digits mov counter, 3 @ digits >> 1 ldr buffer_address, =buffer ip_last: @ the next few lines are the nub of the problem sub left, buffer_address, counter @ on the first pass we compare x[0] and x[5] ldrb tmp, [left, 3] @ if they are equal we compare x[1] and x[4] add right, buffer_address, counter @ if they are equal we compare x[2] and x[3] ldrb running, [right, 2] teq tmp, running bne no @ mismatch between tested digits subs counter, counter, 1 bgt ip_last @ test next pair for equality mov r0, 1 b last no: mov r0, 0 last: ldmfd sp!, {r4-r9, pc} # get_digits: @ this routine writes a number into a string stmfd sp!, {r7-r8, lr} ldr running, =digits ldr buffer_address, =buffer gd_loop: subs running, running, 1 bl divide_by_10_remainder strb r1, [buffer_address], #datum_size bgt gd_loop gd_last: ldmfd sp!, {r7-r8, pc}

divide_by_10.s

.syntax unified # this subroutine divides the passed number by 10 and # returns the dividend and remainder # # The const -0x33333333 is 0xccccccd (2s complement) # 0xcccccccc is 12/15th (0.8) of 0xffffffff and we use this as # a multiplier, then shift right by 3 bits (divide by 8) to # effect a multiplication by 0.1 # # We multiply this number by 10 (multiply by 4, add 1 then multiply by 2) # and subtract from the original number to give the remainder on division # by 10. # # inputs # r0 - integer to divide # # outputs # r0 - the dividend # r1 - the remainder .equ const,-0x33333333 #.equ const,0xcccccccd .text .align 2 .global divide_by_10_remainder .type divide_by_10_remainder, %function divide_by_10_remainder: stmfd sp!, {lr} cmp r0, 10 blt rsmall ldr r1, =const umull r2, r3, r1, r0 mov r2, r3, lsr #3 @ r2 = r3 / 8 == r0 / 10 mov r3, r2 @ r3 = r2 mov r3, r3, asl #2 @ r3 = 4 * r3 add r3, r3, r2 @ r3 = r3 + r2 mov r3, r3, asl #1 @ r3 = 2 * r3 rsb r3, r3, r0 @ r3 = r0 - r3 = r0 - 10*int(r0/10) mov r1, r3 @ the remainder mov r0, r2 @ the dividend b rlast rsmall: mov r1, r0 mov r0, 0 rlast: ldmfd sp!, {pc} # this subroutine divides the passed number by 10 # returns the dividend # # The const -0x33333333 is 0xccccccd (2s complement) # 0xcccccccc is 12/15th (0.8) of 0xffffffff and we use this as # a multiplier, then shift right by 3 bits (divide by 8) to # effect a multiplication by 0.1 # # We multiply this number by 10 (multiply by 4, add 1 then multiply by 2) # # inputs # r0 - integer to divide # # outputs # r0 - the dividend .align 2 .global divide_by_10 .type divide_by_10, %function divide_by_10: stmfd sp!, {lr} cmp r0, 10 blt small ldr r1, =const umull r2, r3, r1, r0 mov r2, r3, lsr #3 @ r2 = r3 / 8 == r0 / 10 mov r0, r2 @ the dividend b last small: mov r0, 0 last: ldmfd sp!, {pc}

Prime factors

2013-06-30T06:31:00.002-07:00

The prime factors of 13195 are 5, 7, 13 and 29.

What is the largest prime factor of the number 600851475143 ?


.syntax unified

@ 600851475143 is 8BE589EAC7 in hex

.equ numhi,139        @ 0x8b
.equ numlo,3851020999 @ 0xe589eac7

num_hi  .req r4
num_lo  .req r5
maxdiv  .req r6
n       .req r7

.section .rodata
 .align 2
resstring:
 .asciz "%d\n"
.text
 .align 2
 .global main
 .type main, %function
main:
 stmfd sp!, {r4-r7, lr}

 mov maxdiv, 0
 mov n, 1
 ldr num_lo, =numlo
 ldr num_hi, =numhi
loop:
 add n, n, 2           @ start at 3 and increment by 2
loop1:
 mov r0, num_lo
 mov r1, num_hi
 mov r2, n
 mov r3, 0
 bl  long_divide
 teq r2, 0
 bne loop
 teq r3, 0
 bne loop
 mov num_lo, r0         @ only get here when we have no remainder
 mov num_hi, r1         @ save the divisor as the new number
 mov r0, n
 bl isprime
 teq r0, 1
 bne loop               @ increment n if n is non-prime
 cmp maxdiv, n
 movlt maxdiv, n        @ save n as the largest divisor if it is larger
 teq num_lo, 1          @ we know it has prime factors
 beq printme
 b loop1
printme:
 mov r1, maxdiv
 ldr r0, =resstring     @ store address of start of string to r0
 bl printf

 mov r0, 0
 ldmfd sp!, {r4-r7, pc}
 mov r7, 1              @ set r7 to 1 - the syscall for exit
 swi 0                  @ then invoke the syscall from linux

The ARM is a RISC microprocessor and does not provide a divide command in the instruction set. We use a division routine from libgcc.

Using the -S option to gcc when a c file is compiled will generate an assembly language file which can be read for enlightenment.

isprime.s

.syntax unified # this subroutine returns 1 if the passed number is prime; 0 if not # inputs # r0 - integer to test # outputs # r0 - prime boolean number .req r4 divisor .req r5 tmp .req r6 .global isprime .type isprime, %function .text .align 2 isprime: stmfd sp!, {r4-r6, lr} mov number, r0 ands tmp, number, 1 bne odd mov r0, 0 cmp number, 2 @ 2 is the only prime even number bne last mov r0, 1 b last odd: mov divisor, 3 cmp number, 8 bgt big mov r0, 1 cmp number, 1 @ 1 is the only odd number < 8 not prime bne last mov r0, 0 b last big: mov r0, number mov r1, divisor bl divide teq r1, 0 beq factor add divisor, divisor, 2 mul tmp, divisor, divisor subs tmp, tmp, number ble big mov r0, 1 b last factor: mov r0, 0 last: ldmfd sp!, {r4-r6, pc}

The isprime function uses a naive, yet straightforward, algorithm to determine whether a given number is prime.

divide.s

Our divide and long_divide routines are wrappers around the libgcc functions to allow comments.

long_divide.s

.syntax unified # long_divide takes low numerator in r0, high numerator in r1, low denominator in r2 and high denominator in r3 # returns low quotient in r0, high quotient in r1, low remainder in r2 and high remainder in r3 .global long_divide .type long_divide, %function long_divide: stmfd sp!, {lr} # see https://codereview.chromium.org/5302007/diff/12001/arch/arm/lib/_uldivmod.S bl __aeabi_uldivmod ldmfd sp!, {pc}

The Fibonacci sequence

2013-06-28T00:50:00.002-07:00

Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.

.syntax unified

.equ maxfib,4000000

previous .req r4
current  .req r5
next     .req r6
sum      .req r7
max      .req r8
tmp      .req r9

.section .rodata
 .align 2
fibstring:
 .asciz "fib is %d\n"
sumstring:
 .asciz "%d\n"

.text
 .align 2
 .global main
 .type main, %function
main:
 stmfd sp!, {r4-r9, lr}
 ldr max,   =maxfib
 mov previous, 1 
 mov current, 1
 mov sum, 0

loop:
 cmp current, max
 bgt last

 add next, current, previous

 movs tmp, current, lsr 1      @ set carry flag from lsr - for the odd-valued terms
                               @ we discard the result of the movs and are only interested
                               @ in the side effect of the lsr which pushes the lower bit
                               @ of current (1 for odd; 0 for even) into the carry flag
                               @ movs will update the status register (c.f. mov which will not)
 addcc sum, sum, current       @ we add current to the sum ONLY when cc (carry clear) is true 
                               @ these are even-valued fibonacci terms

 mov previous, current
 mov current, next
 b loop

last:
 mov r1, sum  
 ldr r0, =sumstring            @ store address of start of string to r0
 bl printf

 mov r0, 0
 ldmfd sp!, {r4-r9, pc}
 mov r7, 1                     @ set r7 to 1 - the syscall for exit
 swi 0                         @ then invoke the syscall from linux

Find the sum of all the multiples of 3 or 5

2013-06-25T14:54:00.001-07:00

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

.syntax unified

.equ max3start,999
.equ max5start,995

number  .req r4
matched .req r5
sum     .req r6
max3    .req r7
max5    .req r8

.section .rodata
 .align 2
string:
 .asciz "%d\n"
.text
 .align 2
 .global main
 .type main, %function
main:
 stmfd sp!, {r4-r8, lr}
 ldr max5,   =max5start
 ldr max3,   =max3start
 ldr number, =max3start      @ start at 1000 - 1 ; numbers < 1000
 mov matched, 0
 mov sum, 0
loop:
 cmp number, max3
 bne test5

# matched a multiple of 3 - decrement max3
# add to sum and set matched to 1
 mov matched, 1
 add sum, sum, number
 subs max3, max3, 3

test5:
 cmp number, max5
 bne last

# matched a multiple of 5 - decrement max5
# add to sum and set matched to 1
 subs max5, max5, 5
 cmp matched, 1              @ have we already added it?
 addne sum, sum, number      @ if not add it to the total

last:
# decrement number and reset matched and loop
 mov matched, 0
 subs number, number, 1
 bne loop

 mov r1, sum  
 ldr r0, =string             @ store address of start of string to r0
 bl printf

 mov r0, 0
 ldmfd sp!, {r4-r8, pc}
 mov r7, 1                   @ set r7 to 1 - the syscall for exit
 swi 0                       @ then invoke the syscall from linux

Armstrong numbers

2013-06-23T07:23:00.000-07:00

Armstrong (or Narcissistic) numbers are a special set of integers.

An n-digit number is an Armstrong number if the sum of the n-th power of the digits is equal to the original number. For example, 371 is an Armstrong number because the sum of 3 cubed, 7 cubed and 1 cubed is equal to 371.

Here we compute all the Armstrong numbers of 2, 3 and 4 digits.

Power function

# this subroutine returns the passed digit to the passed power
#
# inputs
#   r0 - digit
#   r1 - power 
#
# outputs
#   r0 - digit ** power
#
# locals
#   r4
.globl _power
.align 2
        .text
_power:
 nop
 stmfd sp!, {r4, lr}           @ save variables to stack

 subs r1, r1, #1               @ leave unless power > 1
 ble _power_end

 mov r4, r0                    @ copy digit
_power_loop_start:
 mul r0, r4, r0                @ raise to next power
 subs r1, r1, #1  
 beq _power_end                @ leave when done
 b _power_loop_start           @ next iteration
_power_end:
 ldmfd   sp!, {r4, pc}         @ restore state from stack and leave subroutime

Armstrong function

# inputs
#   r0 - number
#
# outputs
#   r0 - armstrong number
#
# local r4, r5, r6, r7, r8

.equ ten,10
.equ hundred,100
.equ thousand,1000
.equ ten_thousand,10000

number .req r4
width .req r5
digit .req r6
current .req r7
armstrong .req r8

.globl _armstrong
.align 2
        .text
_armstrong:
 nop
 stmfd   sp!, {r4, r5, r6, r7, r8, lr}          @ save variables to stack

 mov     number, r0                             @ copy passed parameter to working number
 cmp number, #ten                               @ exit unless number > 10
 blt _end

 ldr     current, =ten_thousand                 @ exit unless number < 10000
 cmp number, current
 bge _end

 mov width, #0                                  @ initialise
 mov digit, #0
 mov armstrong, #0
 ldr current, =thousand                         @ handle 1000 digit
_thousand_start:
 cmp number, current
 blt _thousand_end                              @ exit thousand code if none left
 
 mov width, #4                                  @ width must be 4
 add current, current, #thousand                @ bump thousand counter
 add digit, digit, #1                           @ and corresponding digit count
 b _thousand_start                              @ and loop
_thousand_end:
 add number, number, #thousand                  @ need number modulo thousand
 sub number, number, current
 mov r0, digit                                  @ push digit
 mov r1, width                                  @ and width
 bl _power                                      @ to compute digit **width
 add armstrong, r0, armstrong                   @ and update armstrong number with this value

 ldr current, =hundred                          @ then we do the hundreds as we did the thousands
 mov digit, #0
_hundred_start:
 cmp number, current
 blt _hundred_end
 
 teq width, #0                                  @ and only set width if it is currently unset
 moveq width, #3
_hundred_width:
 add current, current, #hundred                 @ yada yada as thousands above
 add digit, digit, #1
 b _hundred_start
_hundred_end:
 add number, number, #hundred
 sub number, number, current
 mov r0, digit
 mov r1, width
 bl _power
 add armstrong, r0, armstrong

 ldr current, =ten                              @ then the tens as the hundred and thousands above
 mov digit, #0
_ten_start:
 cmp number, current
 blt _ten_end
 
 teq width, #0
 moveq width, #2
_ten_width:
 add current, current, #ten
 add digit, digit, #1
 b _ten_start
_ten_end:
 add number, number, #ten
 sub number, number, current
 mov r0, digit
 mov r1, width
 bl _power
 add armstrong, r0, armstrong

 mov r0, number                                 @ then add in the trailing digits
 mov r1, width
 bl _power
 add armstrong, r0, armstrong

 mov r0, armstrong                              @ and copy the armstrong number back to r0 for return
_end:
 ldmfd   sp!, {r4, r5, r6, r7, r8, pc}          @ restore state from stack and leave subroutine

We improve the legibility of the code by e.g. number .req r4. This textual substitution does increase the semantic value of the code considerably.

Armstrong function with a macro to abstract repeated code

N.B. This is functionally equivalent but much shorter than the previous function. The variable \@ here is a magic variable, incremented each time the macro is instantiated. This enables the use of distinct labels, which we need here.

# inputs
#   r0 - number
#
# outputs
#   r0 - armstrong number
#
# local r4, r5, r6, r7, r8

.equ ten,10
.equ hundred,100
.equ thousand,1000
.equ ten_thousand,10000

number .req r4
width .req r5
digit .req r6
current .req r7
armstrong .req r8

.macro armstrong_digit a, b
 ldr current, =\a
 mov digit, #0
_start\@:
 cmp number, current
 blt _end\@
 
 teq width, #0                                  @ and only set width if it is currently unset
 moveq width, #\b
 add current, current, #\a
 add digit, digit, #1
 b _start\@
_end\@:
 add number, number, #\a
 sub number, number, current
 mov r0, digit
 mov r1, width
 bl _power
 add armstrong, r0, armstrong
.endm

.globl _armstrong
.align 2
        .text
_armstrong:
 nop
 stmfd   sp!, {r4, r5, r6, r7, r8, lr}          @ save variables to stack

 mov     number, r0                             @ copy passed parameter to working number
 cmp number, #ten                               @ exit unless number > 10
 blt _end

 ldr     current, =ten_thousand                 @ exit unless number < 10000
 cmp number, current
 bge _end

 mov width, #0                                  @ initialise
 mov armstrong, #0

 armstrong_digit thousand 4
 armstrong_digit hundred 3
 armstrong_digit ten 2

 mov r0, number                                 @ then add in the trailing digits
 mov r1, width
 bl _power
 add armstrong, r0, armstrong

 mov r0, armstrong                              @ and copy the armstrong number back to r0 for return
_end:
 ldmfd   sp!, {r4, r5, r6, r7, r8, pc}          @ restore state from stack and leave subroutine

Here we introduce the use of the macro assembly functions of as Again this change leads to improved code - we eliminate duplication thus improving maintainability and legibility.

Armstrong_main function

.equ ten,10
.equ ten_thousand,10000

.section .rodata
 .align 2
string:
 .asciz "armstrong number of %d is %d\n"
.text
 .align 2
 .global main
 .type main, %function
main:
 ldr r5, =ten
 ldr r6, =ten_thousand

 mov r4, r5            @ start with n = 10
_main_loop:
 cmp r4, r6            @ leave if n = 10_000
 beq _main_end

 mov r0, r4            @ call the _armstrong function
 bl _armstrong

 teq r0, r4            @ if the armstong value = n print it
 bne _main_next        @ else skip

 mov r2, r0
 mov r1, r4
 ldr r0, =string       @ store address of start of string to r0
 bl printf             @ call the c function to display information
_main_next:
 add r4, r4, #1
 b _main_loop
_main_end:
 mov r7, #1            @ set r7 to 1 - the syscall for exit
 swi 0                 @ then invoke the syscall from linux

Result

bob@poland:~/src/problems_for_computer_solution/07_armstrong_numbers$ ./armstrong4macro 
armstrong number of 153 is 153
armstrong number of 370 is 370
armstrong number of 371 is 371
armstrong number of 407 is 407
armstrong number of 1634 is 1634
armstrong number of 8208 is 8208
armstrong number of 9474 is 9474
bob@poland:~/src/problems_for_computer_solution/07_armstrong_numbers$

Use of the floating point unit

2013-06-22T11:52:00.001-07:00

This demonstrate the simple use of the floating-point unit and printing of floating point vaues - works on the Raspberry Pi

.section .rodata
string:
 .asciz "pi (%f) times e (%f) is %f\n"
string2:
 .asciz "float (%f)\n"
.text
.global main
.type main, %function
main:
 stmfd sp!, {fp, lr}    @ save the frame pointer (r11) and link register (r14) to stack
 sub sp, sp, #16        @ allocate space on stack for 16 bytes (stack grows downwards)

 ldr r3, const_pi       @ put the address of const_pi in r3
 str r3, [r1]           @ store this address (in r3) to the address pointed to by r1
 flds s14, [r1]         @ load a 32 bit word (s14) from the address pointed to by r1 (const_pi)
 fcvtds d5, s14         @ convert the single precision (s14) to double precision (d5)

 ldr r3, const_e        @ same comments as const_pi 
 str r3, [r2]
 flds s15, [r2]
 fcvtds d6, s15         @ convert the single precision (s14) to double precision (d5)  

 fmuls s15, s14, s15    @ multiply s14 (pi) by s15 (e) and store in r15
 fcvtds d7, s15         @ convert the result in s15 to a double (d7)

 fstd d6, [sp]          @ push the double d6 (const_e) on to the stack
 fstd d7, [sp, #8]      @ push the double d7 (const_pi * const_e) on to the stack
 ldr r0, =string        @ put the address of the first string in r0
 fmrrd r2, r3, d5       @ fmrrd transfers the contents of the low half of d5 into r2,
                        @ and the contents of the high half of d5 into r3.                 
 bl printf              @ we can then print 3 double precision numbers

 ldr r0, =string2     
 fmrrd r2, r3, d5      
 bl printf              @ print const_pi

 ldr r0, =string2
 fmrrd r2, r3, d6
 bl printf              @ print const_e

 ldr r0, =string2
 fmrrd r2, r3, d7
 bl printf              @ print the product of pi * e
    
 add sp, sp, #16        @ restore the stack pointer (discard the 2 double precision locals)
 ldmfd sp!, {fp, pc}    @ restore the frame pointer and link register from the stack

 mov r0, #0
 mov r7, #1             @ set r7 to 1 - the syscall for exit
 swi 0                  @ then invoke the syscall from linux

 .align 2
const_pi:
 .float 3.1415926
const_e:
 .float 2.718281

bob@sweden:~/src/asm$ gcc -gstabs -o fp_mul fp_mul.s
bob@sweden:~/src/asm$ ./fp_mul
pi (3.141593) times e (2.718281) is 8.539731
float (3.141593)
float (2.718281)
float (8.539731)
bob@sweden:~/src/asm$

printf requires that the first double precision number is stored in the 2 integer registers, r2 and r3, and any subsequent double precision numbers are pushed on to the stack. This program demonstrates this usage.

Using gdb can show more details

(gdb) b main
Breakpoint 1 at 0x83d0: file fp_mul.s, line 11.
(gdb) run
Starting program: /home/bob/src/asm/fp_mul 

Breakpoint 1, main () at fp_mul.s:11
11  sub sp, sp, #16
(gdb) i r
r0             0x1 1
r1             0xbefffd24 3204447524
r2             0xbefffd2c 3204447532
r3             0x83cc 33740
r4             0x0 0
r5             0x0 0
r6             0x8320 33568
r7             0x0 0
r8             0x0 0
r9             0x0 0
r10            0xb6fff000 3070226432
r11            0x0 0
r12            0xb6fb5000 3069923328
sp             0xbefffbd0 0xbefffbd0
lr             0xb6ea181c -1226172388
pc             0x83d0 0x83d0 
cpsr           0x60000010 1610612752
(gdb) s
13  ldr r3, const_pi
(gdb) i r r3
r3             0x83cc 33740
(gdb) s
14  str r3, [r1]
(gdb) i r r3
r3             0x40490fda 1078530010
(gdb) i r r1
r1             0xbefffd24 3204447524
(gdb) s
15  flds s14, [r1]
(gdb) s
16  fcvtds d5, s14
(gdb) i r s14
s14            3.1415925 (raw 0x40490fda)
(gdb) x/4x const_pi
0x8448 : 0xda 0x0f 0x49 0x40
(gdb) i r r1
r1             0xbefffd24 3204447524
(gdb) i r r3
r3             0x40490fda 1078530010
(gdb) x/4x 0xbefffd24
0xbefffd24: 0xda 0x0f 0x49 0x40
(gdb) s
18  ldr r3, const_e
(gdb) 
19  str r3, [r2]
(gdb) 
20  flds s15, [r2]
(gdb) 
21  fcvtds d6, s15
(gdb) 
23  fmuls s15, s14, s15
(gdb) i r s14
s14            3.1415925 (raw 0x40490fda)
(gdb) i r s15
s15            2.71828103 (raw 0x402df851)
(gdb) s
24  fcvtds d7, s15
(gdb) i r s15
s15            8.53973103 (raw 0x4108a2bd)
(gdb) s
26  fstd d6, [sp]
(gdb) i r d7
d7             8.5397310256958008 (raw 0x40211457a0000000)
(gdb) s
27  fstd d7, [sp, #8]
(gdb) s
28  ldr r0, =string
(gdb) s
29  fmrrd r2, r3, d5
(gdb) s
30  bl printf
(gdb) i r r2
r2             0x40000000 1073741824
(gdb) i r r3
r3             0x400921fb 1074340347
(gdb) i r d5
d5             3.1415925025939941 (raw 0x400921fb40000000)
(gdb)

I do think that the code may be more robust if we were to define some storage space for the pointers r1 and r2 - if these were null when we enter the routine we would dereference null pointers which is never good. The registers, r0-r3 are specified by as temporary registers by the calling convention so we are at the mercy of the contents of these registers from the previous call?

A bubble sort function

2013-06-22T09:45:00.001-07:00

A bubble sort is the simplest sorting algorithm. It compares each vector element with it's successor and swaps them if they are out of order. It is inefficient, but sufficient for small vectors.

# this subroutine bubble sorts vectors of words
#
# inputs
#   r0 - start of vector
#   r1 - number of elements to sort
#
# no outputs
#
# locals
#   r4 - current pointer
#   r5 - inner counter
#   r6 - keep_going flag
#   r7 - first element
#   r8 - second element

.equ datum_size,4                                   @ works with 4 byte (32 bit) words
.globl _bubble
.text
_bubble:
 stmfd sp!, {r4, r5, r6, r7, r8, lr}                @ save variables to stack
 cmp r1, #1                                         @ number of elements must be > 1
 ble end_outer                                      @ stop if nothing to do

 sub r5, r1, #1                                     @ need n-1 comparisons
 mov r4, r0                                         @ initialise current pointer
 mov r6, #0                                         @ this register set when we swap

loop_start:
 ldr r7, [r4], #datum_size                          @ load one element
 ldr r8, [r4]                                       @ and next one
 cmp r7, r8                                         @ compare them
 ble no_swap                                        @ branch if second greater

 mov r6, #1                                         @ set keep_going flag
 sub r4, r4, #datum_size                            @ reset pointer to first element
 swp r8, r8, [r4]                                   @ exchange value in r8 and address in r4
 str r8, [r4, #datum_size]!                         @ store new r8 to incremented address
no_swap:
 subs r5, r5, #1                                    @ decrement counter
 bne loop_start                                     @ and restart loop if more needed

end_inner:
 cmp r6, #0                                         @ check keep_going flag
 beq end_outer                                      @ and leave if not set

 mov r6, #0                                         @ clear keep_going flag 
 mov r4, r0                                         @ reset pointer
 sub r5, r1, #1                                     @ reset counter
 b loop_start                                       @ start another iteration

end_outer:
        ldmfd   sp!, {r4, r5, r6, r7, r8, pc}       @ restore state from stack and leave subroutime

A bubble sort test harness to use the subroutine

@ An example of moving data from memory to a register
.globl main
.equ items,16                       @ maybe would use evalues - values >> 2 if it weren't a demo?
.section .rodata
before:
 .asciz "Before sorting, values are : %d" 
after:
 .asciz "After sorting,  values are : %d"
comma:
 .asciz ", %d"
new_line:
 .asciz "\n"
ok:
 .asciz "ok\n"
.section .data
values:
 .word 105, -7, 235, 61, 28, 315, 456, 63, 134, 97, 221, 53, 1000899, 145, 117, 5
evalues:
.text
main:
 ldr r0, =values                    @ the start of the vector
 mov r1, #items                     @ how many elements to print
 ldr r2, =before                    @ the start string
 ldr r3, =comma                     @ string for subsequent elements
 bl _vprintw                        @ print the unsorted vector
 ldr r0, =new_line
 bl  printf                         @ print the newline

 ldr r0, =values                    @ the start of the vector
 mov r1, #items                     @ how many elements to sort
 bl _bubble                         @ call the sort routine

 ldr r0, =values
 mov r1, #items
 ldr r2, =after
 ldr r3, =comma
 bl _vprintw                        @ print the sorted vector
 ldr r0, =new_line
 bl  printf

 mov     r7, #1                     @ set r7 to 1 - the syscall for exit
 swi     0                          @ then invoke the syscall from linux

bob@poland:~/www/examples$ make bubble
/usr/bin/gcc -gstabs -o bubble bubble.s bubble_sub.s vprintw.s
bob@poland:~/www/examples$ ./bubble 
Before sorting, values are : 105, -7, 235, 61, 28, 315, 456, 63, 134, 97, 221, 53, 1000899, 145, 117, 5
After sorting,  values are : -7, 5, 28, 53, 61, 63, 97, 105, 117, 134, 145, 221, 235, 315, 456, 1000899
bob@poland:~/www/examples$

Improving the output format

2013-06-22T04:56:00.000-07:00

This subroutine to display a vector of words using different strings for first and subsequent elements

# this subroutine prints vectors of words
#
# inputs
#   r0 - start of vector
#   r1 - number of elements to print
#   r2 - pointer to start of string used to print first element
#   r3 - pointer to start of string used to print subsequent elements
#
# no outputs
#
.globl _vprintw
.equ datum_size,4                @ working with 4 byte (32 bit) words
_vprintw:
 stmfd sp!, {r4, r5, r6, r7, lr} @ save registers on the stack
 cmp r1, #0                      @ exit if no elements
 ble last
 mov r4, r0                      @ copy the parameters to locals
 mov r5, r1
 mov r6, r2
 mov r7, r3
 ldr r1, [r4], #datum_size       @ load first vector element to r0 and bump pointer
 mov r0, r6                      @ address of first string to r0
 bl  printf                      @ and print it
 nop
 subs r5, r5, #1                 @ decrement counter
 beq last                        @ and fall out if zero
vprintw_loop:
 ldr r1, [r4], #datum_size       @ load next vector item to r0 and bump pointer
 mov r0, r7                      @ address of subsequent string to r0
 bl  printf                      @ and print it
 subs r5, r5, #1                 @ decrement counter
 bne vprintw_loop                @ and loop if non-zero
last:

 ldmfd sp!, {r4, r5, r6, r7, pc} @ restore registers from stack and return

A simple test harness to use the subroutine is given by

.globl main
.section .rodata
first:
 .asciz "Vector of words - values : %d"
subsequent:
 .asciz ", %d"
final:
 .asciz "\n"
values:
 .word 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60
endvalues:
.align 2
.text
main:
        ldr r0, =values
        mov r1, #11
        ldr r2, =first
        ldr r3, =subsequent
        bl _vprintw              @ print the vector elements
        ldr r0, =final
        bl printf                @ print the terminating newline

        mov     r7, #1           @ set r7 to 1 - the syscall for exit
        swi     0                @ then invoke the syscall from linux

bob@poland:~/www/examples$ make test_vprintw
/usr/bin/gcc -gstabs -o test_vprintw test_vprintw.s vprintw.s
bob@poland:~/www/examples$ ./test_vprintw
Vector of words - values : 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60
bob@poland:~/www/examples$

Structured Programs

2013-06-22T04:39:00.001-07:00

A subroutine to print a vector of bytes is:

# this subroutine prints vectors of bytes
#
# inputs
#   r0 - start of vector
#   r1 - number of elements to print
#   r2 - pointer to start of string used to print each element
#
# no outputs
#
.globl _vprintb
.equ datum_size,1                     @ 1 because we are using bytes
_vprintb:
 stmfd sp!, {r4, r5, r6, lr}          @ stash the registers we use
 mov r4, r0                           @ r4 is the pointer to the data
 mov r5, r1                           @ r5 is the counter
 mov r6, r2                           @ r6 is the pointer to the string
vprintb_loop:
 ldrb r1, [r4], #datum_size           @ put the vector element in r1 
                                      @ implicitly increment the pointer r4 by datum_size
 mov r0, r6                
 bl  printf                           @ print the string containing the vector element
 subs r5, r5, #1                      @ decrement the counter
 bne vprintb_loop                     @ loop if the counter is not zero

 ldmfd sp!, {r4, r5, r6, pc}          @ restore the stashed registers

A simple test harness to use the subroutine is given by

.equ datum_size,1
.globl main
.section .rodata
output:
 .asciz "The value is %d\n"
values:
 .byte 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60
endvalues:
.text
main:
 ldr r0, =values               @ the start of the vector
 mov r1, #11                   @ the number of elements
 ldr r2, =output               @ the output string
 bl _vprintb                   @ call the subroutine

 mov     r7, #1                @ set r7 to 1 - the syscall for exit
 swi     0                     @ then invoke the syscall from linux

bob@poland:~/www/examples$ make test_vprintb
/usr/bin/gcc -gstabs -o test_vprintb test_vprintb.s vprintb.s
bob@poland:~/www/examples$ ./test_vprintb
The value is 10
The value is 15
The value is 20
The value is 25
The value is 30
The value is 35
The value is 40
The value is 45
The value is 50
The value is 55
The value is 60
bob@poland:~/www/examples$

Converting endianness

2013-06-21T14:55:00.000-07:00

ARM v6 has a rev instruction to convert big- and little-endian values. A sheevaplug uses ARM v5TE so we have to do the inversion explicitly

# swaptest.s – Converting big-endian to little-endian and vice-versa
# In Arm V6 we can use the  rev instruction - but not supported on v5TE
.globl _start
.section .data
vstart:
 .word 0x12345678
.text
.align 2
_start:
 ldr r1, =vstart
 ldr r0, [r1]             @ load word to r0
 and r2, r0, #0xff000000  @ load the top 2 bytes to r2
 and r3, r0, #0x00ff0000  @ load the next 2 bytes to r3
 and r4, r0, #0x0000ff00  @ load the next 2 bytes to r4
 and r5, r0, #0x000000ff  @ load the bottom 2 bytes to r5
 mov r0, r2, lsr #24      @ shift r2 bytes to bottom and store to r0
 orr r0, r3, lsr #8       @ or the remaining shifted data
 orr r0, r4, lsl #8
 orr r0, r5, lsl #24
_stop:
 mov     r7, #1           @ set r7 to 1 - the syscall for exit
 swi     0                @ then invoke the syscall from linux

bob@poland:~/www/examples$ grep architecture /proc/cpuinfo
CPU architecture: 5TE
bob@poland:~/www/examples$
bob@poland:~/www/examples$ gdb endian
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabi".
For bug reporting instructions, please see:
...
Reading symbols from /home/bob/www/examples/endian...done.
(gdb) b _start
Breakpoint 1 at 0x8078: file endian.s, line 10.
(gdb) b _stop
Breakpoint 2 at 0x80a0: file endian.s, line 21.
(gdb) run
Starting program: /home/bob/www/examples/endian 

Breakpoint 1, _start () at endian.s:11
11  ldr r0, [r1]  @ load word to r0
Current language:  auto
The current source language is "auto; currently asm".
(gdb) x/x &vstart
0x100a8 : 0x12345678
(gdb) cont
Continuing.

Breakpoint 2, _stop () at endian.s:22
22  swi     0   @ then invoke the syscall from linux
(gdb) p/x $r0
$2 = 0x78563412
(gdb) quit
A debugging session is active.

 Inferior 1 [process 17053] will be killed.

Quit anyway? (y or n) y
bob@poland:~/www/examples$

So we see the inverted data in r0 when the program stops.

Storing data to an element of a vector

2013-06-21T10:07:00.000-07:00

We write a constant to the second element of the vector and then read and return it.

# movtest4.s – An example of indirect addressing
.equ datum_size,1
.globl _start
.align 2
.section .data
values:
   .byte 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60
.text
.align 2
_start:
 ldr r4, =values
 mov r5, #100
 strb r5, [r4, #datum_size]!            @ store 100 to the second element of values
                                        @ we update the pointer r4 via the trailing !
 ldrb r0, [r4]                          @ load that value to r0 for return to OS
        mov     r7, #1                  @ set r7 to 1 - the syscall for exit
        swi     0                       @ then invoke the syscall from linux

We write a constant to the second element of the vector and then read and return it.

bob@poland:~/www/examples$ make movtest4
/usr/bin/as -gstabs -o movtest4.o movtest4.s
/usr/bin/ld -o movtest4 movtest4.o
bob@poland:~/www/examples$ ./movtest4 
bob@poland:~/www/examples$ echo $?
100
bob@poland:~/www/examples$

Again, it can be instructive to walk through the code with gdb.

(gdb) b _start
Breakpoint 1 at 0x8078: file movtest4.s, line 12.
(gdb) run
Starting program: /home/bob/src/professional_assembly_language/arm/chap05/movtest4 

Breakpoint 1, _start () at movtest4.s:12
12  mov r5, #100
(gdb) i r r5
r5             0x0 0
(gdb) x/11b &values
0x10090 : 10 15 20 25 30 35 40 45
0x10098 : 50 55 60
(gdb) s
13  strb r5, [r4, #datum_size]!       @ write back the incremented data pointer
(gdb) i r r5
r5             0x64 100
(gdb) x/11b &values
0x10090 : 10 15 20 25 30 35 40 45
0x10098 : 50 55 60
(gdb) s
14  ldrb r0, [r4]                    @ load that value to r0 for return to OS
(gdb) x/11b &values
0x10090 : 10 100 20 25 30 35 40 45
0x10098 : 50 55 60
(gdb) i r r0
r0             0x0 0
(gdb) s
15         mov     r7, #1            @ set r7 to 1 - the syscall for exit
(gdb) i r r0
r0             0x64 100
(gdb)

Using indexed memory locations

2013-06-21T09:49:00.000-07:00

.equ datum_size,1
.globl main
.align 2
.section .rodata
output:
 .asciz "The value is %d\n"
values:
 .byte 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60
endvalues:
.align 2
.text
main:
 stmfd sp!, {r5, r6, lr}       @ save the registers we use to the stack
 ldr r5, =endvalues
 ldr r6, =values
loop:
 ldrb r1, [r6], #datum_size    @ r6 will be implicitly incremented so r1 will contain each element in turn
 ldr r0, =output               @ the address of the string is put in r0
 bl  printf                    @ we print the value of r1 in the string pointed to by r0
 cmp r6, r5                    @ have we reached the endvalues label?
 bne loop                      @ if not, read and print the next element

 ldmfd sp!, {r4, r5, r6, pc}   @ restore registers before exit
 mov     r7, #1                @ set r7 to 1 - the syscall for exit
 swi     0                     @ then invoke the syscall from linux

Here the .equ directive is introduced which aliases a value to a name.

We do so because we can easily change from using bytes for storage of the value to using half-words (of 2 bytes) or words (of 4 bytes) in the data section and modification of the ldrb instruction to ldrh or ldr respectively.

bob@poland:~/www/examples$ make movtest3
/usr/bin/gcc -gstabs -o movtest3 movtest3.s
bob@poland:~/www/examples$ ./movtest3
The value is 10
The value is 15
The value is 20
The value is 25
The value is 30
The value is 35
The value is 40
The value is 45
The value is 50
The value is 55
The value is 60
bob@poland:~/www/examples$

Again we use gcc rather than as to assemble since we use the C function, printf.

Moving registers to the data section

2013-06-21T09:35:00.000-07:00

.global _start
.section .data
        value: .byte 42
.text
_start:
 ldr r1, =value
 mov r0, #9               @ r0 is now 9
 strb r0, [r1], #0        @ store the register r0 to the address in r1
 add r0, r0, r0           @ r0 is now 18
 ldrb    r0, [r1], #0     @ load the byte at address r1 to r0
 mov     r7, #1           @ set r7 to 1 - the syscall for exit
 swi     0                @ then invoke the syscall from linux

We initialise r0, and store it to memory, and then double the value of r0, and then read back from memory to r0.

Our return code (the value of r0) is the same as that stored rather than the doubled version.

By implication, our store and load were executed correctly.

If implication is suspicious, then it is possible to walk through the code using gdb to verify.

bob@poland:~/www/examples$ make movtest2
/usr/bin/as -gstabs -o movtest2.o movtest2.s
/usr/bin/ld -o movtest2 movtest2.o
bob@poland:~/www/examples$ ./movtest2 
bob@poland:~/www/examples$ echo $?
9
bob@poland:~/www/examples$

Moving data from memory to a register

2013-06-20T16:19:00.000-07:00

We use rodata (for read-only data).

# movtest1.s
@ An example of moving data from memory to a register
.global _start
.section .rodata
value: .byte 42
.align 2
.text
_start:
 ldr r1, =value
 ldrb r0, [r1], #1 @ load the byte at address r1 to r0 
                   @ r0 being the return value when we exit
 mov     r7, #1    @ set r7 to 1 - the syscall for exit
 swi     0         @ then invoke the syscall from linux

We use a makefile to build the software.

bob@poland:~/www/examples$ make movtest1 /usr/bin/as -gstabs -o movtest1.o movtest1.s /usr/bin/ld -o movtest1 movtest1.o bob@poland:~/www/examples$ ./movtest1 bob@poland:~/www/examples$ echo $? 42 bob@poland:~/www/examples$