ARM Instruction Set: Overview

Introduction to this Guide

The purpose of this guide is to provide you with the necessary information
to allow you to:

- Encode or decode ARM instructions, in order to write an assembler,
  disassembler or compiler
- Improve your awareness of the features of the ARM instruction set
- Know and understand the limitations of the current encoding

Because ARM assembler is often explained without any reference to the
encoding of ARM instructions, I felt it necessary to create this guide. In
this overview, I also explain the features of the ARM processor in general
terms, and their implications for the processor's performance and
efficiency.

Kade Hansson
January 1994

Note About the Acronym ARM

In this guide, ARM stands for Advanced/Acorn RISC Machine/Microprocessor.
For example, ARM2 stands for Acorn RISC Microprocessor (revision 2),
because this processor was designed for Acorn. However, ARM610 stands for
Advanced RISC Microprocessor (revision 6.10), as it was designed for
Advanced RISC Machines (ARM) Limited and is used in the Apple Newton
computer. Acorn also use the ARM acronym to stand for Acorn RISC Machine.

Format of this Guide

This guide is divided into separate text files, each of which contains
information on a particular aspect of the encoding of the ARM instruction
set.
Filename            Title

Base.txt            Base Instructions
Condition.txt       Conditions
Version1.txt        This overview
ALU/Instrs.txt      Arithmetic, Logic and Comparative Base Instructions
ALU/Constants.txt   Immediate Constants
ALU/ShftInstr.txt   Barrel Shifter Subinstructions
ALU/AL1.txt         Unary Arithmetic and Logic Instructions
ALU/AL2.txt         Binary Arithmetic and Logic Instructions
ALU/CT.txt          Comparative Instructions
ALU/MULMLA.txt      Multiplication Instructions
Branch/BBL.txt      Branch Instructions
Branch/SWI.txt      SWI Instruction
FP/Constants.txt    Floating Point Immediate Constants
FP/Precision.txt    Floating Point Precision Codes
FP/Rounding.txt     Floating Point Rounding Codes
FP/FPO1.txt         Floating Point Unary Operations
FP/FPO2.txt         Floating Point Binary Operations
FP/FPRT0.txt        Floating Point Flag Register Transfer Instructions
FP/FPRT1.txt        Floating Point Register Transfer Instructions
FP/FPST.txt         Floating Point Status Transfer Instructions
FP/LDFSTF.txt       Floating Point Load and Store Instructions
Data/LDMSTM.txt     Multiple Register Load and Store Instructions
Data/LDRSTR.txt     Single Register Load and Store Instructions

Where possible, a reference to a section of text which expands on
information given will appear in parentheses after that information. e.g.

0       MUL (MUL/MLA)
1       MLA (MUL/MLA)

This file gives an overview of the ARM processor, its characteristics, its
features and, most of all, its instruction set.

Introduction to the ARM

The ARM family of chips are arguably the most efficient microprocessors
available at the present time (Jan 1994). They are the first reduced
instruction set microprocessors to be used in a microcomputer. Their most
notable application to date is their use in innovative RISC microcomputer
technology, such as the Acorn Archimedes series, Acorn UNIX workstations
and the Apple Newton.

This guide will deal with the ARM chip set currently used by Acorn in RISC
OS computers, although much of the information given will have widespread
applicability across all ARM incarnations.
The latest incarnations of the ARM belong to the ARM7 series. But because
Acorn have not yet incorporated this new technology into the present series
of RISC OS machines, we will limit this description of the ARM instruction
set to the ARM2, ARM250 and ARM3 incarnations. For the sake of
compatibility, we will only discuss instructions common to all these ARM
processors.

N.B. The ARM1 is now considered to be obsolete. The only major difference
between ARM1 and ARM2 is the lack of MUL and MLA instructions on the ARM1.

Aliases for the ARM

ARM chips all have special names which are used by Very Large Scale
Integration Technology Incorporated (VLSI) to refer to them. VLSI currently
manufacture all ARM processors.

Common name     Special name    Usage

ARM1            VL86C00?        Archimedes prototypes
ARM2            VL86C010        Acorn A3xx, A4xx, A3000, R140
ARM250          VL86C0?? (?)    Acorn A30x0, A4000
ARM3            VL86C020        Acorn A540, A5000, A4, R2xx
ARM610          VL86C051 (?)    Apple Newton
ARM7            VL86C06? (?)    Next generation Acorn machines?
MEMC1           VL86C110        Acorn A3xx
MEMC1a          VL86C110        A4xx, A3000, A30x0, A4000, A5000, A4, R140,
                                R2xx
VIDC10/VIDC1a   VL86C310        All Acorn machines up until the A5000 alpha
VIDC20          VL86C3?? (?)    Next generation Acorn machines?
IOC1            VL86C410        All Acorn machines up until the A5000 alpha

Word and instruction size

The ARM2/3 is a 32-bit RISC processor with a 32-bit data bus and a 26-bit
address bus. The memory architecture is based around a 32-bit word,
although byte access is permitted through the use of the LDRB and STRB
instructions.

ARM instructions are encoded into 32 bits, which contain the base
instruction opcode, the condition code, any barrel shifter subinstruction
opcodes, all register specifications, any address offsets, all immediate
constants and any other data, such as SWI numbers.

Features

The ARM processor incorporates many innovative features which add greatly
to its performance and functionality.
Features such as multiple register load and store instructions,
simultaneous execution, decoding and fetching of successive instructions, a
compact and efficient instruction set, and a powerful barrel shifter ensure
its efficiency. A versatile memory controller provides features which allow
paged memory mapping to be employed, and a fully programmable video
controller provides access to high quality audio and video facilities.
Also, an impressive range of functions is provided by the input-output
controller. Privileged processor modes with private register banks also
improve efficiency and utility. Additionally, a floating point coprocessor
is supported.

The compact and efficient RISC instruction set offers many advantages over
the more traditional CISC instruction set. Because there are fewer
instructions and they are encoded more simply, RISC instructions typically
execute faster than their CISC equivalents. A CISC processor spends most of
its time executing a small, simple subset of its instructions. As a result,
the speed advantages of the rarely used complex instructions are outweighed
by the slowdown which an overly complicated processor design imposes on the
essential instructions. RISC processors also typically use much less power
than their CISC competitors, and are cheaper to produce.

The ARM has 27 32-bit registers, 16 of which are available at any one time.
The program counter and program status register (combined in R15) and the
link register (R14) are the only two registers which are bound by the
processor hardware to specific purposes, and all other registers may be
freely used. The high number of available registers means that the ARM is
very well suited to complex tasks, and is often able to perform them
without accessing memory as often as other processors. This is a
particularly important consideration when the memory is clocked slower than
the CPU, as it is in most ARM-based machines.
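The register figures quoted above can be checked with a little arithmetic.
The following Python sketch is an illustration only and is not part of the
original guide. It assumes the private register banks listed in the
Processor Modes table later in this overview (R13 and R14 for SVC and IRQ,
R8 to R14 for FIQ); the names PRIVATE and visible_registers are
hypothetical.

```python
# Private (banked) registers per mode, as listed in the Processor Modes
# table of this overview. USR mode has no private registers.
PRIVATE = {
    "USR": [],
    "SVC": [13, 14],
    "IRQ": [13, 14],
    "FIQ": [8, 9, 10, 11, 12, 13, 14],
}


def visible_registers(mode):
    """Return the names of the 16 registers visible in the given mode."""
    suffix = mode.lower()
    return [
        f"R{n}_{suffix}" if n in PRIVATE[mode] else f"R{n}"
        for n in range(16)
    ]


# Every distinct physical register across all four modes.
all_registers = {name for mode in PRIVATE for name in visible_registers(mode)}

for mode in PRIVATE:
    print(mode, len(visible_registers(mode)), "registers visible")
print("distinct registers in total:", len(all_registers))
```

Each mode sees 16 registers, and the 16 user registers plus the 2 + 2 + 7
private ones give the total of 27.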
Pipelining is the process which allows the ARM to simultaneously decode the
next instruction whilst the current one is being executed and a third
instruction is being fetched from memory. This process improves raw
processor performance by up to a factor of three when executing a
sequential body of code. It also improves the utilization of processor
resources (reducing power consumption), as without pipelining the circuitry
which decodes an instruction would sit idle while execution is taking
place.

Multiple load and store instructions allow very fast memory block transfers
and efficient register stacking during subroutine calls. In combination
with a callee register saving protocol, these instructions offer
significant speed improvements over more traditional programming
techniques.

A barrel shifter is located on one of the inputs to the ALU of the ARM, and
provides the option of shift and rotate operations for all arithmetic and
logic instructions. This allows various optimizations to code which makes
frequent use of shift operations, which would not be possible on processors
which lack this feature.

In addition, the ARM3 chip has a 4 kilobyte fast memory cache, which
reduces access times to frequently used memory locations. The caching
techniques also dramatically improve loop timings, as the instructions in
the loop are often copied into the cache, and so can be fetched much more
quickly.

Addressing System

The addressing scheme is managed by the memory controller (MEMC), a
separate chip which works so closely with the ARM CPU that it is
essentially an integral part of it. MEMC also communicates with the
input-output controller (IOC) and the video controller (VIDC), which make
up the remainder of the ARM chip set. In this discussion we are not
concerned with the functions of either IOC or VIDC.

MEMC1a (as used with most ARM2/3 machines) can address up to 64 megabytes
of memory (16 million words).
However, limitations currently imposed by RISC OS and the architecture of
memory devices reduce the amount of physical memory that can actually be
installed. The following table of maximum physical memory sizes takes into
account current limitations of standard Acorn machine hardware:

            Max.    Max.
            RAM     ROM

A3xx        8Mb     2Mb
A4          4Mb     2Mb
A4xx        8Mb     2Mb
A540        16Mb    2Mb
A3000       4Mb     2Mb
A30x0       4Mb     2Mb
A4000       4Mb     2Mb
A5000       8Mb     2Mb
R140        4Mb     2Mb
R2xx        16Mb    2Mb

These hardware limitations may be eliminated by future hardware upgrades
(such as slave MEMC chips).

RISC OS itself also limits the amount of physical memory which is
supported. At the present moment, an absolute 16Mb limit on physical RAM
exists, due to RISC OS making intelligent use of the MEMC's logical to
physical address translation mechanisms.

Physical memory is mapped into the present RISC OS memory map from address
&2000000 onwards. The mappings from &3000000 onward include ROM, the
input-output controllers, the video controller, the DMA address generators
and the logical to physical address translator. These locations, with the
exception of those mapped onto ROM, are only accessed by low level routines
within RISC OS.

Below &2000000 on the memory map lies the logical memory. The physical
memory is rarely addressed directly, and it is by addressing the logical
memory that RISC OS operates.

To set up the logical memory, RISC OS divides physical memory into pages.
It can then map these pages at arbitrary positions within the logical
memory slot. The size of each of these pages is typically 8, 16 or 32K.
Which of these page sizes is chosen currently depends on the total amount
of RAM available, although this may change. To read the page size in use,
use the SWI OS_ReadMemMapInfo (SWI &51).

Dividing memory into pages which can be shuffled into any order is a
powerful feature of MEMC as used under RISC OS.
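To make the paging idea concrete, here is a minimal Python sketch of a
logical to physical translation. It is an illustration under stated
assumptions, not a description of MEMC's hardware or of any RISC OS
interface: the page table is modelled as a plain dictionary with made-up
entries, and the names PAGE_SIZE, page_table and translate are
hypothetical. The base address &2000000 for physical memory is the one
given in the memory map described above.

```python
PAGE_SIZE = 32 * 1024        # one of the typical page sizes (8, 16 or 32K)
PHYSICAL_BASE = 0x2000000    # physical RAM starts here in the memory map

# Hypothetical mapping: logical page number -> physical page number.
# The entries are invented; pages may sit in any order in the logical slot.
page_table = {0: 5, 1: 2, 2: 7}


def translate(logical_address):
    """Map a logical address to its address in the physical memory area."""
    logical_page, offset = divmod(logical_address, PAGE_SIZE)
    if logical_page not in page_table:
        raise LookupError(f"logical page {logical_page} is not mapped")
    return PHYSICAL_BASE + page_table[logical_page] * PAGE_SIZE + offset


print(hex(translate(0x0010)))   # logical page 0, offset &10
print(hex(translate(0x8004)))   # logical page 1, offset 4
```

Only the page number is changed by the translation; the offset within the
page is carried across untouched.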
Shuffling of pages in the logical slot is achieved by manipulating 128
logical to physical memory mapping descriptors. These descriptors are held
in content-addressable memory inside MEMC, and can be accessed quickly,
maintaining short memory access times.

As an additional feature of the memory mapping scheme, three levels of
memory protection are supported:

Protection level    Privileges

Supervisor          Privileged access to all memory
Operating System    Privileged access to logical memory (not used by
                    RISC OS)
User                Access to unprotected pages of logical memory and read
                    cycles to addresses mapped onto the ROM

Exceptions are generated if an illegal memory access is attempted.

Processor Modes

There are four processor modes available when using ARM2/3:

Mode            Mnemonic  Normally entered             Private registers

User            USR       by default                   none (access user
                                                       bank)
Supervisor      SVC       upon software interrupt      R13_svc and R14_svc
Interrupt       IRQ       upon interrupt request       R13_irq and R14_irq
Fast Interrupt  FIQ       upon fast interrupt request  R8_fiq to R14_fiq

The SVC, IRQ and FIQ modes are privileged, and provide more control over
the system. These three modes also have their own private registers, which
reduce the time overheads (due to stacking registers) when interrupts are
dealt with.

Performance

The performance of each Acorn RISC computer is given in the following
table. Some of these figures have been estimated, and should not be relied
on.

               ARM     ARM    RAM     ROM     ROM     Speed*  Speed*  Speed*
               CPU     Clock  Timing  (std.)  (opm.)  (avg.)  (pk.)   (sus.)

A3xx series    ARM2    8MHz   125ns   450ns   375ns   4.0     7.1     3.4
A3000          ARM2    8MHz   125ns   450ns   375ns   5.6     7.1     4.0
A30x0 series   ARM250  12MHz  83ns    333ns   333ns   7.6     12.5    6.0
A4             ARM3    25MHz  83ns    333ns   333ns   16.0    25.0    12.0
A4xx series    ARM2    8MHz   125ns   450ns   375ns   5.6     7.1     4.0
A540           ARM3    25MHz  83ns    333ns   333ns   16.0    25.0    12.0
A4000          ARM250  12MHz  83ns    333ns   333ns   7.6     12.5    6.0
A5000          ARM3    25MHz  83ns    333ns   333ns   16.0    25.0    12.0
A5000 alpha    ARM3    33MHz  83ns    333ns   333ns   20.0    32.0    15.4
R140           ARM2    8MHz   125ns   450ns   375ns   5.6     7.1     4.0
R2xx           ARM3    25MHz  83ns    333ns   333ns   14.0    25.0    12.0

* Speeds in millions of instructions per second (MIPS)

N.B. 1. Optimal ROM timings quoted for RISC OS 3.10 2Mb ROMs.
     2. Optimal timing for RISC OS 2.00 0.5Mb ROMs is 200ns.
     3. Future ROM chips may also have 166/200/250ns as optimal ROM timing.
     4. Future ARM chips may allow further memory access time optimization,
        at the user's risk.

The speed of instructions varies across the ARM instruction set, as
indicated by the following table. However, compared with CISC processors,
the ARM is extremely efficient.

Instruction type        Execution time

ALU (except multiply)   1 processor cycle
ALU (PC destination)    See branch
Single load/store       1 processor cycle plus 1 memory cycle
Branch                  1 memory cycle and up to 3 processor cycles
Multiple load/store     1 processor cycle, plus 1 memory cycle per register
Multiply                Up to 17 processor cycles

The ARM Instruction Set

The following table lists the assembler mnemonics of all instructions
provided by the ARM2/3 processor.
Arithmetic and logic instructions

ADC     ADd with Carry                      Rd=Rn+Rm+C
ADD     ADD (without carry)                 Rd=Rn+Rm
SBC     SuBtract with Carry                 Rd=Rn-Rm-(1-C)
SUB     SUBtract (without carry)            Rd=Rn-Rm
RSC     Reverse Subtract with Carry         Rd=Rm-Rn-(1-C)
RSB     Reverse SuBtract (without carry)    Rd=Rm-Rn
AND     Bitwise AND                         Rd=Rn AND Rm
BIC     BIt Clear (AND with NOT)            Rd=Rn AND NOT Rm
ORR     Bitwise OR                          Rd=Rn OR Rm
EOR     Bitwise EOR (XOR)                   Rd=Rn EOR Rm
MOV     MOVe                                Rd=Rm
MVN     MOVe bitwise NOT                    Rd=NOT Rm

Comparison instructions

CMP     CoMPare                             Rn-Rm
CMN     CoMpare Negative                    Rn+Rm
TEQ     Test EQuivalence                    Rn EOR Rm
TST     TeST bits                           Rn AND Rm

Multiply instructions

MUL     Multiply                            Rd=Rm*Rs
MLA     Multiply and Accumulate             Rd=Rm*Rs+Rn

Branch instructions

B       Branch
BL      Branch with Link

Register load and save

LDR     LoaD Register
STR     STore Register
LDM     LoaD Multiple registers
STM     STore Multiple registers

Software interrupt

SWI     Perform SoftWare Interrupt

The following condition suffixes may be appended to any ARM instruction.
In the Flags column, ~ denotes NOT, ^ denotes AND and v denotes OR.

                                                                Flags

AL  ALways            Always performed (the default)            TRUE
NV  NeVer             Never use (reserved)                      undef.
CS  Carry Set         Performed if Carry flag set               C
CC  Carry Clear       Opposite to CS (Carry not set)            ~C
EQ  EQual             Performed if Zero flag set        n1= n2  Z
NE  Not Equal         Opposite to EQ (Zero flag unset)  n1<>n2  ~Z
VS  oVerflow Set      Performed if oVerflow flag set            V
VC  oVerflow Clear    Opposite to VS (oVerflow unset)           ~V
MI  MInus             Performed if Negative flag set            N
PL  PLus              Opposite to MI (Negative unset)           ~N

Cardinal (unsigned)

HS  Higher or Same    Same as CS (Carry set)            n1>=n2  C
LO  LOwer             Same as CC (Carry clear)          n1< n2  ~C
LS  Lower or Same     Performed when less or equal      n1<=n2  ~CvZ
HI  Higher            Performed when greater than       n1> n2  C^~Z

2's-complement (signed)

GE  Greater or Equal  Performed when greater or equal   n1>=n2  (N^V)v(~N^~V)
LT  Less Than         Performed when less               n1< n2  (N^~V)v(~N^V)
LE  Less or Equal     Performed when less or equal      n1<=n2  (N^~V)v(~N^V)vZ
GT  Greater Than      Performed when greater            n1> n2  ((N^V)v(~N^~V))^~Z

The shift and rotate mnemonics, available as an option on all arithmetic
and logic (AL) instructions, are:

LSL     Logical Shift Left
ASL     Arithmetic Shift Left (same as LSL)
LSR     Logical Shift Right
ASR     Arithmetic Shift Right (the sign, bit 31, is copied into the
        vacated bits)
ROR     ROtate Right
RRX     Rotate Right one bit with eXtend (uses carry flag as bit 32)

The following table gives the mnemonics for the various addressing modes
available as suffixes to LDM and STM instructions:

DA      Decrement After each store/load (post-indexed decremental form)
DB      Decrement Before each store/load (pre-indexed decremental form)
IA      Increment After each store/load (post-indexed incremental form)
IB      Increment Before each store/load (pre-indexed incremental form)
EA      Empty Ascending stack (i.e. use LDMDB and STMIA)
ED      Empty Descending stack (i.e. use LDMIB and STMDA)
FA      Full Ascending stack (i.e. use LDMDA and STMIB)
FD      Full Descending stack (i.e. use LDMIA and STMDB), as used by
        RISC OS

The following suffixes may be applied to certain instructions in order to
modify their normal operation:

Suffix  Applied to                           Meaning

!       LDR/STR (address)                    Write back is used
        LDM/STM (address register)           Write back is used
^       LDM with R15 (register list)         Force update of PSR
        Other LDM/STM (register list)        Use user bank
B       LDR/STR (mnemonic)                   Load/store byte value
P       Comparative instructions (mnemonic)  Copy calculation result to PSR
S       AL instructions with R15 dest.       Force update of PSR
        AL instructions (mnemonic)           Force update of flags
        MUL/MLA instructions (mnemonic)      Force update of flags
        Comparative instructions (mnemonic)  Force update of flags (implied)
T       LDR/STR (mnemonic)                   Force address translation

The Floating Point Coprocessor

In addition to the standard ARM instruction set, there is a group of
floating point coprocessor instructions. These instructions are designed to
be executed by a coprocessor chip attached to the ARM CPU, such as that
provided in the Acorn FPA10 upgrade. Software emulation of the coprocessor
is provided by the Acorn FPEmulator module.

The following table gives information on the availability and form of the
FPA10 upgrade for each machine in the Acorn range.

        FPA10 upgrade

A3xx    card
A3000   card
A3010   N/A
A4      card
A4xx    card
A540    chip/card
A4000   N/A
A5000   chip
R140    card
R2xx    chip/card

In the floating point coprocessor there are eight floating point registers
designated by F0-F7. There is also a floating point status register whose
bits are as follows:

Bits    Usage

31-21   Unused
20      INX interrupt mask
19      UFL interrupt mask
18      OFL interrupt mask
17      DVZ interrupt mask
16      IVO interrupt mask
15-05   Unused
04      INX cumulative flag
03      UFL cumulative flag
02      OFL cumulative flag
01      DVZ cumulative flag
00      IVO cumulative flag

The bottom five flag bits indicate the following exceptions:

IVO     InValid Operation
DVZ     DiVision by Zero
OFL     OverFLow
UFL     UnderFLow
INX     INeXact value obtained due to rounding

The interrupt mask bits, when set, cause exceptions to generate fatal
errors.
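The status register layout above can be unpacked with shifts and masks. The
following Python sketch is an illustration only: it uses just the bit
positions given in the table, and the names EXCEPTIONS, MASK_SHIFT and
decode_fpsr are hypothetical rather than part of any Acorn interface.

```python
# Exception names in bit order: IVO is bit 0 and INX is bit 4, as in the
# status register table of this overview.
EXCEPTIONS = ["IVO", "DVZ", "OFL", "UFL", "INX"]
MASK_SHIFT = 16   # the interrupt mask bits occupy bits 16 to 20


def decode_fpsr(value):
    """Split a 32-bit status word into cumulative flags and mask bits."""
    cumulative = [
        name for bit, name in enumerate(EXCEPTIONS)
        if (value >> bit) & 1
    ]
    masks = [
        name for bit, name in enumerate(EXCEPTIONS)
        if (value >> (MASK_SHIFT + bit)) & 1
    ]
    return {"cumulative": cumulative, "masks": masks}


# Made-up example word: the DVZ and INX cumulative flags are set, and the
# IVO and DVZ interrupt mask bits are set.
example = (1 << 1) | (1 << 4) | (1 << 16) | (1 << 17)
print(decode_fpsr(example))
```

Bits marked Unused in the table are simply ignored by this sketch.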
The Floating Point Instruction Set

The following table lists the assembler mnemonics of the instructions
provided by the floating point coprocessor (or suitable emulation).

Data transfer instructions

LDF     LoaD Floating point register
STF     STore Floating point register

Register transfer instructions

FLT     FLoat ARM register into an FP register
FIX     FIX FP register into an ARM register
WFS     Write Floating Status flags from an ARM register
RFS     Read Floating Status flags to an ARM register
WFC     Write Floating interrupt mask
RFC     Read Floating interrupt mask

Comparison operations

CMF     CoMpare Floating point numbers
CNF     Compare (second argument Negated) Floating point numbers
CMFE    CoMpare Floating point numbers and generate Error if unordered
CNFE    Compare Negated Floating numbers and generate Error if unordered

Binary operations

ADF     ADd Floating point registers            Fd=Fn+Fm
MUF     MUltiply Floating point registers       Fd=Fn*Fm
SUF     SUbtract Floating point registers       Fd=Fn-Fm
RSF     Reverse Subtract Floating registers     Fd=Fm-Fn
DVF     DiVide Floating point registers         Fd=Fn/Fm
RDF     Reverse Divide Floating registers       Fd=Fm/Fn
POW     POWer                                   Fd=Fn^Fm
RPW     Reverse POWer                           Fd=Fm^Fn
RMF     ReMainder of Floating division          Fd=remainder of Fn/Fm
FML     Fast MuLtiply (Single precision)        Fd=Fn*Fm
FDV     Fast DiVide (Single precision)          Fd=Fn/Fm
FRD     Fast Reverse Divide (Single)            Fd=Fm/Fn
POL     POLar angle conversion                  Fd=ATN(Fn/Fm)

Unary operations

MVF     MoVe Floating point register            Fd=Fm
MNF     Move Negated Floating register          Fd=-Fm
ABS     ABSolute value                          Fd=ABS(Fm)
RND     RouND to integer                        Fd=INT(Fm)
SQT     SQuare Root                             Fd=SQR(Fm)
LOG     LOGarithm to the base 10                Fd=LOG(Fm)
LGN     LoGarithm to the base e (Natural)       Fd=LN(Fm)
EXP     EXPonent of the base e (Natural)        Fd=e^Fm
SIN     SINe                                    Fd=SIN(Fm)
COS     COSine                                  Fd=COS(Fm)
TAN     TANgent                                 Fd=TAN(Fm)
ASN     Arc SiNe (Inverse sine)                 Fd=ASN(Fm)
ACS     Arc CoSine (Inverse cosine)             Fd=ACS(Fm)
ATN     Arc TaNgent (Inverse tangent)           Fd=ATN(Fm)

Each of the instructions listed above can be suffixed with any of the ARM's
16 condition codes.
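Those condition codes are the ones tabulated with the ARM instruction set
earlier in this overview. As an illustration only, the following Python
sketch evaluates each suffix from the four flags, using the flag
expressions given in that table. The function name condition_passes is
hypothetical, and NV is treated here as never executed although the table
lists it as reserved.

```python
def condition_passes(cond, n, z, c, v):
    """Return True if an instruction with this suffix would be executed.

    n, z, c and v are the Negative, Zero, Carry and oVerflow flags.
    """
    tests = {
        "AL": True,
        "NV": False,               # reserved; treated as never executed
        "CS": c, "HS": c,          # HS is the same test as CS
        "CC": not c, "LO": not c,  # LO is the same test as CC
        "EQ": z,
        "NE": not z,
        "VS": v,
        "VC": not v,
        "MI": n,
        "PL": not n,
        "LS": (not c) or z,
        "HI": c and not z,
        "GE": n == v,              # (N^V)v(~N^~V)
        "LT": n != v,              # (N^~V)v(~N^V)
        "LE": (n != v) or z,
        "GT": (n == v) and not z,
    }
    return bool(tests[cond])


# Flags as left by comparing 3 with 5 (CMP computes 3-5): the result is
# negative and a borrow occurred, so N is set and C is clear.
flags = dict(n=True, z=False, c=False, v=False)
for cond in ("LT", "LO", "NE", "GE", "HI", "EQ"):
    print(cond, condition_passes(cond, **flags))
```

The signed tests reduce to comparing N with V, which is why GE and LT need
only a single equality check in the sketch.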
In addition to a condition suffix, a precision suffix needs to be
specified. The precision suffixes are:

                                    Mantissa    Exponent

S       Single precision            23 bits     8 bits
D       Double precision            52 bits     11 bits
E       Double Extended precision   64 bits     15 bits
P       Packed BCD storage format   19 digits   4 digits

A rounding suffix may also be added. These are:

P       Round up
M       Round down
Z       Round to zero

If no rounding suffix is specified then the number will be rounded to the
nearest available in the current precision.

Breakdown of an Encoded ARM Instruction

All encoded ARM instructions occupy one 32-bit word. The common elements
which constitute all instructions are given below.

Bits    Size    Usage

31-28   Nybble  Condition code (Condition)
27-24   Nybble  Base instruction code (Base)
23-00   24      Depends on base instruction code

Learning about ARM Instruction Encoding

The file Base.txt is a good starting point for those interested in
exploring the encoding of ARM instructions. This file refers to other files
which give more specific information on particular instruction groups. The
encoding of condition codes is explained in the file Condition.txt.

Encoding an ARM Instruction

1. The first step to encoding an ARM instruction is to specify the
   condition code. The condition code is stored in the high nybble (bits
   31-28), and a list of condition codes is given in the file
   Condition.txt.

2. Next, it is necessary to obtain the base instruction code, stored in
   the seventh nybble (bits 27-24). These are listed in the file Base.txt.

3. After selecting the instruction code, examine the file associated with
   the instruction (given in parentheses in the table of base instruction
   codes). This file will explain the various elements of the bottom 3
   bytes (bits 23-00) of the instruction.

Decoding an ARM Instruction

1. The first step to decoding an ARM instruction is to discover the
   condition code. The condition code is stored in the high nybble (bits
   31-28), and a list of condition codes is given in the file
   Condition.txt.

2. Next, it is necessary to find the base instruction code, stored in the
   seventh nybble (bits 27-24). These are listed in the file Base.txt.

3. After finding the instruction code, examine the file associated with
   the instruction (given in parentheses in the table of base instruction
   codes). This file will explain the various elements of the bottom 3
   bytes (bits 23-00) of the instruction.

- Text ©1994 Kade Hansson

This text may be freely distributed, provided that it is unmodified, it is
not offered for sale at any price, published in a periodical or book or
offered on a diffusion service without the author's written consent. Kade
Hansson retains copyright in all parts of this guide.
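Editorial sketch: the three common fields described under "Breakdown of an
Encoded ARM Instruction" above can be separated and recombined with shifts
and masks. The following Python code is an illustration only and is not
part of the original guide; it covers just the first two decoding and
encoding steps, and the names split_instruction and join_instruction are
hypothetical. Interpreting the fields still requires Condition.txt,
Base.txt and the per-instruction files.

```python
def split_instruction(word):
    """Split a 32-bit instruction word into its three common fields."""
    word &= 0xFFFFFFFF
    return {
        "condition": (word >> 28) & 0xF,   # bits 31-28
        "base": (word >> 24) & 0xF,        # bits 27-24
        "rest": word & 0xFFFFFF,           # bits 23-00
    }


def join_instruction(condition, base, rest):
    """Rebuild the 32-bit word from the three common fields."""
    if not (0 <= condition <= 0xF and 0 <= base <= 0xF
            and 0 <= rest <= 0xFFFFFF):
        raise ValueError("field out of range")
    return (condition << 28) | (base << 24) | rest


word = 0xEF000011   # an arbitrary example word
fields = split_instruction(word)
print({name: hex(value) for name, value in fields.items()})
print(hex(join_instruction(**fields)))
```

Splitting and then joining returns the original word, which is a quick
sanity check for an assembler or disassembler built on these fields.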