Asynchronous-logic design has a complex recent history. For a number of years, it has been obvious that many blocks would benefit from a self-timed design that eliminated the clocks that form the skeleton of modern synthesis-based logic methodology. But unfamiliarity, lack of libraries and tools, and, frankly, the extreme inertia of many logic designers have kept this promising realm nearly unexplored.

Two significant exceptions—companies that have doggedly pursued asynchronous design up through the launch of commercial products—made separate presentations at the Hot Chips conference Tuesday at Stanford University. One company updated designers on its continuing progress with asynchronous Ethernet switch fabrics, while the other detailed startling progress with a project that has been running for many years: a fully asynchronous ARM CPU core.

Fulcrum Microsystems reported on the switch fabric. Interestingly, Fulcrum, which evolved from research work on asynchronous methodology at the California Institute of Technology, long ago stopped mentioning asynchronous design unless someone asks. The company has been in a position to let the specifications of its parts speak for themselves, without detailing how those specs arise.

Nonetheless, Fulcrum has provided incontrovertible evidence that an asynchronous design can go head-to-head against conventionally designed large-scale devices in similar technologies and win. In addition, Fulcrum has shown that an asynchronous methodology can deliver the stability and reusability required to produce multiple generations of successful products.

Many skeptics have been willing to conveniently forget about Fulcrum's methodology, or discount the company as an exception. But a presentation by the other quiet asynchronous powerhouse, Handshake Solutions, may serve as a sharp slap upside the head for these doubters.

Handshake described a fully asynchronous ARM9 CPU core, including interfaces to AMBA (Advanced Microcontroller Bus Architecture) bus structures and conventional memory chips, that can stand toe-to-toe with the best conventional ARM9 implementations on performance and die area, but win hands down on energy efficiency.

The 996HS core, which Handshake announced in February, targets performance equivalent to that of an ARM 968E-S running at about 50 MHz, though architectural differences make a direct comparison tricky. The core executes the v5TE instruction set to ARM's satisfaction.

The core's foundation is a library of asynchronous elements that rely on a single-rail asynchronous protocol with latches between stages and more or less centralized latch-control circuitry. In large, organized structures such as pipelines, this approach should result in lower logic overhead compared with multirail asynchronous signaling approaches.

Handshake has performed the core design in such a way that the asynchronous nature of the logic is essentially hidden from a design team that licenses the core and integrates it into a conventional SOC (system on chip). The core drops into a conventional flow using industry-standard tools and requires no special libraries or provisions from the integrator. Even the dual AMBA-lite bus ports and the Tightly-Coupled Memory port will be familiar to ARM9 aficionados.

Differences between the asynchronous ARM 996HS and the conventional 968E-S are small but interesting. For one, the low gate count required for the asynchronous design permitted Handshake the luxury of an asynchronous hardware divide unit, replacing the software divide capability on the 968E-S. The company also added a nonmaskable interrupt, providing a controlled-latency access from a critical external event to code executing on the core.

The really interesting differences become apparent in performance comparisons. The 996HS, because it is asynchronous, always runs at the full available speed of the logic, whatever temperature, voltage, and process variables determine that to be. So the core can show a significant variation in performance between worst-case and normal operating conditions while still operating correctly. Handshake rates the core at 54 DMIPS worst case and 83 DMIPS in normal operation, compared with 107 DMIPS worst case for a 968E-S clocked at 100 MHz.
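For readers who want to check how these ratings line up with the "about 50 MHz" equivalence quoted earlier, the arithmetic can be sketched as follows. This is only a back-of-the-envelope calculation: it assumes that 968E-S throughput scales linearly with clock frequency, and the function and variable names are illustrative, not anything Handshake or ARM publishes.

```python
# Figures quoted in the article.
ARM968_DMIPS_WORST = 107.0   # 968E-S, worst case
ARM968_CLOCK_MHZ = 100.0     # clock at which that figure applies
HS996_DMIPS_WORST = 54.0     # 996HS, worst case
HS996_DMIPS_NORMAL = 83.0    # 996HS, normal operating conditions


def equivalent_mhz(dmips, ref_dmips=ARM968_DMIPS_WORST, ref_mhz=ARM968_CLOCK_MHZ):
    """Clock at which the reference core would deliver `dmips`.

    Assumption: the reference core's DMIPS scale linearly with its clock.
    """
    return dmips * ref_mhz / ref_dmips


worst_equiv = equivalent_mhz(HS996_DMIPS_WORST)    # roughly 50 MHz
normal_equiv = equivalent_mhz(HS996_DMIPS_NORMAL)  # roughly 78 MHz
speedup = HS996_DMIPS_NORMAL / HS996_DMIPS_WORST   # normal vs. worst case

print(f"worst-case equivalent clock:  {worst_equiv:.1f} MHz")
print(f"normal-case equivalent clock: {normal_equiv:.1f} MHz")
print(f"normal / worst-case ratio:    {speedup:.2f}x")
```

Under that linear-scaling assumption, the worst-case rating corresponds to a 968E-S at roughly 50 MHz, and the same asynchronous core delivers about half again as much throughput under normal conditions, with no change to the design.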

In power, the differences are starker. The asynchronous 996HS runs consistently at 45 µW/equivalent-MHz, while the most efficient implementation of the 968E-S burns through 130 µW/MHz. Equally interesting is the power spectrum. Asynchronous-design enthusiasts have argued for years that because asynchronous designs lack clocks, and hence lack a surge of supply current on every clock edge, the power spectrum of an asynchronously designed CPU should be relatively smooth and flat, without the big spikes at the clock frequency and its harmonics that are characteristic of a synchronous design. Handshake's measured data confirm this. In the time domain, the 996HS not only draws on average about a third as much current as the 968E-S core but also exhibits spikes in supply current that appear more or less at random, not synchronously. Furthermore, these spikes rise to only about 15 mA from a steady-state level of around 3 to 5 mA, while the current for the synchronous 968E-S core swings nearly all the way from 0 to its 25- to 35-mA peak on each spike. In the frequency domain, the 996HS's power spectrum is a decaying profile with only small spikes.
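The "about a third" figure follows directly from the two efficiency numbers, as a quick calculation shows. The sketch below uses only the figures quoted above; the 50 MHz operating point is an assumed example chosen to match the core's stated performance target, and the variable names are illustrative.

```python
# Efficiency figures quoted in the article (microwatts per MHz).
HS996_UW_PER_MHZ = 45.0     # asynchronous core, per equivalent MHz
ARM968_UW_PER_MHZ = 130.0   # most efficient synchronous implementation

power_ratio = HS996_UW_PER_MHZ / ARM968_UW_PER_MHZ  # about 0.35


def power_mw(uw_per_mhz, mhz):
    """Power in milliwatts at a given (equivalent) clock rate."""
    return uw_per_mhz * mhz / 1000.0


# Assumed example operating point: 50 (equivalent) MHz.
hs996_mw = power_mw(HS996_UW_PER_MHZ, 50.0)
arm968_mw = power_mw(ARM968_UW_PER_MHZ, 50.0)

# Supply-current swing per spike, in mA, as (low estimate, high estimate).
hs996_swing_ma = (15 - 5, 15 - 3)    # peak of ~15 mA over a 3-5 mA baseline
arm968_swing_ma = (25 - 0, 35 - 0)   # near 0 up to a 25-35 mA peak

print(f"power ratio (async / sync): {power_ratio:.2f}")
print(f"at 50 MHz: {hs996_mw:.2f} mW vs {arm968_mw:.2f} mW")
print(f"current swing: {hs996_swing_ma} mA vs {arm968_swing_ma} mA")
```

On these numbers the asynchronous core uses roughly 35 percent of the power at the same throughput, and each current spike swings by about 10 to 12 mA rather than 25 to 35 mA.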

Handshake adds the ARM core to a considerable portfolio of asynchronous circuit designs executed mostly for its original corporate parent, Philips. But unlike many of those designs, the ARM 996HS is available for license. The core should go a long way toward proving the case for all those patient supporters of asynchronous design.


Read original article on EDN