ARM architecture
ARM's headquarters in Cambridge, UK
The ARM architecture (originally the Acorn RISC Machine) is a 32-bit
RISC processor architecture that is widely used in embedded designs.
Because of their power-saving features, ARM CPUs are dominant in the
mobile electronics market, where low power consumption is a critical
design goal.
Today, the ARM family accounts for over 75% of all 32-bit embedded
CPUs, making it one of the most widely used 32-bit architectures in
the world. ARM CPUs are found in all corners of consumer electronics,
from portable devices (PDAs, mobile phones, media players, handheld
gaming units, and calculators) to computer peripherals (hard drives,
desktop routers). The most prominent branch of this family today is
Intel's XScale.
History

A Conexant ARM processor used mainly in routers
The ARM design was started
in 1983 as a development project at Acorn Computers Ltd.
The team, led by Roger
Wilson and Steve Furber, started development of what in some ways
resembles an advanced MOS Technology 6502. Acorn had a long line
of computers based on the 6502, so a chip that was similar to program
could represent a significant advantage for the company.
The team completed development samples called ARM1 by 1985, and the
first "real" production systems as ARM2 the following year. The ARM2
featured a 32-bit data bus, a 26-bit address space giving a 64 Mbyte
address range, and sixteen 32-bit registers. One of these registers
served as the (word-aligned) program counter, with its top 6 bits and
lowest 2 bits holding the processor status flags. The ARM2 was
possibly the simplest useful 32-bit microprocessor in the world, with
only 30,000 transistors (compare with Motorola's 68000, introduced in
1979, with around 68,000). Much of this simplicity came from not
having microcode (which represents about one-quarter to one-third of
the 68000) and, like most CPUs of the day, not including any cache.
This simplicity led to its low power usage, while it still performed
better than the Intel 80286. A successor, ARM3, was produced with a
4 KB cache, which further improved performance.
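The register layout described above can be sketched in a few lines of
Python. This is only an illustrative model, not Acorn's documentation:
the function and constant names are hypothetical, and the two status
fields are treated as opaque bit groups. Only the bit positions (top 6
bits, lowest 2 bits, 26-bit word-aligned address) come from the text.

```python
# Illustrative model of the ARM2 combined program counter / status register.
# Names are hypothetical; only the bit positions are taken from the text.

ADDRESS_BITS = 26
ADDRESS_RANGE_BYTES = 1 << ADDRESS_BITS  # 2**26 bytes = 64 Mbyte

HIGH_STATUS_MASK = 0xFC000000  # top 6 bits (bits 26..31)
PC_MASK = 0x03FFFFFC           # word-aligned address (bits 2..25)
LOW_STATUS_MASK = 0x00000003   # lowest 2 bits (bits 0..1)


def pack_r15(pc, high_status, low_status):
    """Combine a word-aligned address and the two status fields into 32 bits."""
    if pc & ~PC_MASK:
        raise ValueError("pc must be word aligned and fit in 26 bits")
    if not 0 <= high_status < 64:
        raise ValueError("high_status must fit in 6 bits")
    if not 0 <= low_status < 4:
        raise ValueError("low_status must fit in 2 bits")
    return (high_status << 26) | pc | low_status


def unpack_r15(r15):
    """Split a 32-bit register value into (pc, high_status, low_status)."""
    pc = r15 & PC_MASK
    high_status = (r15 & HIGH_STATUS_MASK) >> 26
    low_status = r15 & LOW_STATUS_MASK
    return pc, high_status, low_status


if __name__ == "__main__":
    value = pack_r15(0x8000, 0b101000, 0b11)
    print(hex(value), unpack_r15(value), ADDRESS_RANGE_BYTES)
```

Because the address is word aligned, its two lowest bits are always
zero, which is what frees them to carry status information.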
In the late 1980s Apple Computer started working with Acorn on newer
versions of the ARM core. The work was so important that Acorn spun
off the design team in 1990 into a new company called Advanced RISC
Machines. For this reason ARM is often expanded as Advanced RISC
Machine instead of Acorn RISC Machine. Advanced RISC Machines became
ARM Limited when the company floated on the London Stock Exchange and
NASDAQ in 1998.
This work would eventually
turn into the ARM6. The first models were released in 1991, and
Apple used the ARM6-based ARM 610 as the basis for their Apple Newton
PDA. In 1994, Acorn used the ARM 610 as the main CPU in their RiscPC
computers.
The core has remained largely the same size throughout these changes.
ARM2 had 30,000 transistors, while the ARM6 grew to only 35,000. The
idea is that the licensee combines the ARM core with a number of
optional parts to produce a complete CPU, one that can be built on
old semiconductor fabs and still deliver good performance at a low
cost.
The most successful implementation has been the ARM7TDMI, with
hundreds of millions sold in mobile phones and handheld video game
systems. While ARM's business has always been to sell IP cores, some
of the licensees have produced microcontrollers based on this core.
The Sega Dreamcast's main CPU is an SH4 processor, which only borrows
concepts from ARM (low power consumption, optional compact
instruction set, etc.) and is otherwise different from an ARM. The
Dreamcast does, however, feature a sound chip designed by Yamaha with
an ARM7 core. Nintendo's Game Boy Advance uses the ARM7TDMI at
16.78 MHz.
DEC licensed the architecture (which caused some confusion because
they also produced the DEC Alpha) and produced the StrongARM. At
233 MHz this CPU drew only 1 watt of power (more recent versions draw
far less). This work was later passed to Intel as part of a lawsuit
settlement, and Intel took the opportunity to supplement its aging
i960 line with the StrongARM. Intel has since developed its own
high-performance implementation, known as XScale.
ARM Cores
Thumb
Perhaps in part because the conditional execution facility uses up
four bits of every instruction, newer ARM processors have a 16-bit
instruction mode, called Thumb. The smaller opcodes have less
functionality; for example, only branches can be conditional, and
many opcodes cannot access all of the CPU's registers. However, the
shorter opcodes give improved code density overall, even though some
operations require more opcodes to be executed. Particularly in
situations where the memory port or bus width is constrained to less
than 32 bits, the shorter Thumb opcodes allow greater performance
than 32-bit code because they make more efficient use of the limited
memory bandwidth. Typically in embedded applications a small range of
addresses has a 32-bit datapath and the rest are 16 bits wide or
narrower (e.g. the Game Boy Advance); in this situation, it usually
makes sense to compile Thumb code and hand-optimise a few of the most
CPU-intensive sections using the 32-bit instruction set, placing them
in the limited 32-bit-wide memory.
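The bandwidth argument above can be made concrete with a rough
back-of-the-envelope model. The sketch below is deliberately
simplified: it counts only instruction-fetch bus cycles, assumes each
instruction is fetched on its own, and ignores wait states, caches
and data accesses. The routine sizes (1000 ARM instructions versus
1300 Thumb instructions for the same work) are hypothetical figures
chosen for illustration, not measurements.

```python
# Rough model of instruction-fetch cost on buses of different widths.
# All workload numbers are hypothetical and only illustrate the trade-off.


def fetch_cycles(instruction_count, instruction_bits, bus_bits):
    """Bus cycles needed to fetch every instruction, one instruction at a time."""
    cycles_per_instruction = -(-instruction_bits // bus_bits)  # ceiling division
    return instruction_count * cycles_per_instruction


def code_size_bytes(instruction_count, instruction_bits):
    """Memory occupied by the routine."""
    return instruction_count * instruction_bits // 8


ARM_INSTRUCTIONS = 1000    # hypothetical routine compiled as 32-bit ARM code
THUMB_INSTRUCTIONS = 1300  # hypothetical: same routine needs 30% more Thumb opcodes

if __name__ == "__main__":
    for bus_bits in (16, 32):
        arm = fetch_cycles(ARM_INSTRUCTIONS, 32, bus_bits)
        thumb = fetch_cycles(THUMB_INSTRUCTIONS, 16, bus_bits)
        print(f"{bus_bits}-bit bus: ARM {arm} cycles, Thumb {thumb} cycles")
    print("ARM size:", code_size_bytes(ARM_INSTRUCTIONS, 32), "bytes")
    print("Thumb size:", code_size_bytes(THUMB_INSTRUCTIONS, 16), "bytes")
```

Under these assumptions Thumb wins on the 16-bit bus, where every
32-bit opcode costs two fetches, while the ARM version needs fewer
fetches on the 32-bit bus. That matches the advice to keep only the
hottest routines as 32-bit code in the 32-bit-wide memory.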
The first processor with Thumb technology was the ARM7TDMI. All ARM9
and later families, including XScale, have included Thumb technology.
Jazelle
ARM has implemented a technology that allows some of its
architectures to execute Java bytecode natively in hardware, in
another execution mode alongside the existing ARM and Thumb modes,
accessed in a similar fashion to ARM/Thumb interworking.
The first processor with Jazelle technology was the ARM926EJ-S, with
Jazelle denoted by the 'J' in the CPU name. It has been used by
mobile phone manufacturers to speed up execution of Java ME games and
applications, which is probably what drove development of the
technology.
Thumb-2
Thumb-2 technology made its debut in the ARM1156 core, announced in
2003. Thumb-2 extends the limited 16-bit instruction set of Thumb
with additional 32-bit instructions to give the instruction set more
breadth. The stated aim for Thumb-2 is to achieve code density
similar to Thumb with performance similar to the ARM instruction set
on 32-bit memory.
Thumb-2 also extends both the ARM and Thumb instruction sets with yet
more instructions, including bit-field manipulation, table branches,
and conditional execution.
Thumb-2EE
Thumb-2EE, marketed as
Jazelle RCT, was announced in 2005, first appearing in the Cortex-A8
processor. Thumb-2EE provides a small extension to Thumb-2, making
the instruction set particularly suited to code generated at runtime
(e.g. by JIT compilation) in managed Execution Environments. Thumb-2EE
is a target for languages such as Limbo, Java, C#, Perl and Python,
and allows JIT compilers to output smaller compiled code without
impacting performance.
New features provided
by Thumb-2EE include automatic null pointer checks on every load
and store instruction, an instruction to perform an array bounds
check, and the ability to branch to handlers, which are small sections
of frequently called code, commonly used to implement a feature
of a high level language, such as allocating memory for a new object.
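To illustrate what these checks mean for generated code, the sketch
below spells out in software the null and bounds checks that a
managed runtime must otherwise perform around every array access. It
is only a model of the semantics described above: the exception
classes and the checked_load function are hypothetical names, and
nothing here corresponds to actual Thumb-2EE instructions or
encodings.

```python
# Software model of the checks a managed runtime performs on array access.
# Names are hypothetical; this does not represent real Thumb-2EE instructions.


class NullReference(Exception):
    """Raised when a load is attempted through a null reference."""


class IndexOutOfRange(Exception):
    """Raised when an array index falls outside the valid range."""


def checked_load(array, index):
    """Load array[index] after explicit null and bounds checks."""
    if array is None:                    # null pointer check
        raise NullReference("load through null reference")
    if not 0 <= index < len(array):      # array bounds check
        raise IndexOutOfRange(f"index {index} outside 0..{len(array) - 1}")
    return array[index]


if __name__ == "__main__":
    print(checked_load([10, 20, 30], 1))
```

When such checks are handled by the instruction set rather than by
extra compare-and-branch sequences, a JIT compiler can emit less code
per access, which is the code-size benefit the text refers to.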
NEON
NEON technology is a combined 64- and 128-bit SIMD (Single
Instruction Multiple Data) instruction set that provides standardized
acceleration for media and signal processing applications. NEON can
run an MP3 audio decoder on a CPU running at 10 MHz and can run the
GSM AMR (Adaptive Multi-Rate) speech codec on a CPU running at no
more than 13 MHz. It features a comprehensive instruction set,
separate register files and independent execution hardware. NEON
supports 8-, 16-, 32- and 64-bit integer and single-precision
floating-point data, and applies SIMD operations to audio/video
processing as well as graphics and gaming workloads. SIMD is a
crucial element in vector supercomputers, which perform multiple
operations simultaneously. In NEON, SIMD supports up to 16 operations
at the same time.
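The figure of 16 simultaneous operations follows from the register
and element widths: a 128-bit register divided into 8-bit elements
gives 16 lanes. The short Python sketch below works through that
arithmetic and models a lane-wise add. It is a conceptual model only;
the function names are hypothetical, and the wrap-around behaviour on
overflow is an assumption of this model rather than a description of
any particular NEON instruction.

```python
# Conceptual model of SIMD lanes; not a binding to real NEON instructions.


def lane_count(register_bits, element_bits):
    """Number of elements of a given width that fit in one register."""
    if register_bits % element_bits:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


def simd_add(a, b, element_bits):
    """Add two vectors lane by lane, wrapping each lane on overflow."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")
    mask = (1 << element_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


if __name__ == "__main__":
    for bits in (8, 16, 32, 64):
        print(f"{bits}-bit elements: {lane_count(128, bits)} lanes per 128-bit register")
    print(simd_add([250, 1, 2, 3], [10, 2, 3, 4], 8))
```

In hardware all lanes are processed by one instruction; the list
comprehension here only imitates the result, not the parallelism.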
VFP
VFP technology is a coprocessor
extension to the ARM architecture. It provides low-cost single-precision
and double-precision floating-point computation that is fully compliant
with the ANSI/IEEE Std 754-1985 Standard for Binary Floating-Point
Arithmetic. VFP provides floating-point computation suitable for
a wide spectrum of applications such as PDAs, smartphones, voice
compression and decompression, three-dimensional graphics and digital
audio, printers, set-top boxes, and automotive applications. The
VFP architecture also supports execution of short vector instructions
allowing SIMD (Single Instruction Multiple Data) parallelism. This
is useful in graphics and signal-processing applications by reducing
code size and increasing throughput.
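The difference between the single-precision and double-precision
formats mentioned above can be seen by rounding a value through the
32-bit format. The sketch below uses Python's standard struct module
and assumes, as is the case on common platforms, that Python floats
are IEEE 754 double-precision values. The helper name to_single is
hypothetical, and the code illustrates the number formats only, not
VFP hardware.

```python
import struct


def to_single(value):
    """Round a double-precision float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


if __name__ == "__main__":
    for x in (0.5, 0.1):
        single = to_single(x)
        print(f"double {x!r} -> single {single!r}, error {abs(single - x):.3e}")
```

Values such as 0.5 that are exact in binary survive the round trip
unchanged, while 0.1 picks up a small error because single precision
keeps fewer significand bits than double precision.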
ARM licensees
ARM Ltd does not manufacture and sell CPU devices based on its own
designs, but rather licenses the processor architecture to interested
parties. ARM offers a variety of licensing terms, varying in cost and
deliverables. To all licensees, ARM provides an integrable hardware
description of the ARM core, a complete software development toolset
(compiler, debugger, SDK), and the right to sell manufactured silicon
containing the ARM CPU. Fabless licensees, who wish to integrate an
ARM core into their own chip design, are usually only interested in
acquiring a ready-to-manufacture, pre-verified IP core. For these
customers, ARM delivers a gate-netlist description of the chosen ARM
core, along with an abstracted simulation model and test programs to
aid design integration and verification. More ambitious customers,
including integrated device manufacturers (IDMs) and foundry
operators, choose to acquire the processor IP in synthesizable RTL
(Verilog) form. With the synthesizable RTL, the customer can perform
architectural-level optimizations and extensions. These allow the
designer to achieve exotic design goals not otherwise possible with
an unmodified netlist (high clock speed, very low power consumption,
instruction-set extensions, etc.). While ARM does not grant the
licensee the right to resell the ARM architecture itself, licensees
may freely sell manufactured products (chip devices, evaluation
boards, complete systems, etc.). Merchant foundries can be a special
case: not only are they allowed to sell finished silicon containing
ARM cores, they generally also hold the right to re-manufacture ARM
cores for other customers.
Like most IP vendors, ARM prices its IP based on perceived value. In
architectural terms, the lower-performance ARM cores command a lower
license cost than the higher-performance cores. In terms of silicon
implementation, a synthesizable core is more expensive than a
hard-macro (black-box) core. Complicating price matters, merchant
foundries that hold an ARM license (such as Samsung and Fujitsu) can
offer reduced licensing costs to their fab customers. In exchange for
acquiring the ARM core through the foundry's in-house design
services, the customer can reduce or eliminate payment of ARM's
upfront license fee. Compared to dedicated semiconductor foundries
(such as TSMC and UMC) without in-house design services, Fujitsu and
Samsung charge two to three times more per manufactured wafer. For
low- to mid-volume applications, a design-service foundry offers
lower overall pricing (through subsidization of the license fee). For
high-volume mass-produced parts, the long-term cost reduction
achievable through lower wafer pricing reduces the impact of ARM's
NRE cost, making the dedicated foundry a better choice.
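The volume trade-off described above amounts to a simple break-even
calculation. The sketch below is a minimal model under stated
assumptions: the wafer price of 1,000 and the license fee of 200,000
are hypothetical round numbers, the design-service foundry is assumed
to waive the license fee entirely and to charge 2.5 times the wafer
price (within the two-to-three-times range given in the text), and
all other costs are ignored.

```python
# Break-even model for the two foundry options. All figures are hypothetical.

LICENSE_FEE = 200_000         # hypothetical upfront license fee
DEDICATED_WAFER_PRICE = 1_000 # hypothetical price per wafer at a dedicated foundry
SERVICE_PRICE_MULTIPLIER = 2.5  # assumed, within the 2-3x range from the text


def total_cost(wafers, wafer_price, license_fee):
    """Upfront license fee plus per-wafer manufacturing cost."""
    return license_fee + wafers * wafer_price


def break_even_wafers(license_fee, dedicated_price, service_price):
    """Wafer volume at which both options cost the same."""
    return license_fee / (service_price - dedicated_price)


if __name__ == "__main__":
    service_price = DEDICATED_WAFER_PRICE * SERVICE_PRICE_MULTIPLIER
    crossover = break_even_wafers(LICENSE_FEE, DEDICATED_WAFER_PRICE, service_price)
    print(f"break-even at about {crossover:.0f} wafers")
    for wafers in (100, 200):
        dedicated = total_cost(wafers, DEDICATED_WAFER_PRICE, LICENSE_FEE)
        service = total_cost(wafers, service_price, 0)
        print(wafers, "wafers: dedicated", dedicated, "design-service", service)
```

Below the crossover volume the waived fee outweighs the higher wafer
price; above it, the cheaper wafers of the dedicated foundry win.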
Many high-tech semiconductor firms hold ARM licenses: Broadcom,
Cirrus Logic, Freescale (spun off from Motorola in 2004), Fujitsu,
Intel (through its settlement with DEC), IBM, Infineon Technologies,
Texas Instruments, Nintendo, Philips, VLSI, Atmel, Sharp, Samsung,
and STMicroelectronics are some of the companies that have licensed
the ARM in one form or another. Although ARM's license terms are
covered by NDA, ARM is widely known within the IP industry to be
among the most expensive CPU cores. A single customer product
containing a basic ARM core can incur a one-time license fee in
excess of US$200,000. Where significant quantity and architectural
modification are involved, the license fee can exceed US$10 million.
http://www.arm.com/