The centerpiece of the “Bulldozer” module is its two tightly linked processor cores (Figure 1). These cores share several high-bandwidth resources, such as the Floating Point Unit, to provide chip-multithreading (CMT), which efficiently executes multiple instruction threads in parallel. Bulldozer’s CMT is a marked design improvement over current threading approaches, which either funnel multiple instruction threads through one processor core (SMT) or statically replicate cores (CMP); both approaches have inherent constraints and performance bottlenecks.
The Bulldozer Floating Point Unit (FPU) is shown in Figure 2. High-performance computing relies heavily on vector (packed integer) and floating point operations, both of which are handled in the FPU. Bulldozer was designed to execute these operations at higher performance and lower power than the current generation of microprocessors. Key to these performance and power improvements are changes to the FPU, including completely redesigned arithmetic units and control structures. This paper covers the logic and circuit design goals and tradeoffs for the FP scheduler, datapaths, and register files. As previously described at HotChips 2010, the Bulldozer FPU supports new instructions including SSSE3, SSE4.1, SSE4.2, AVX, AES, and advanced Multiply-Add/Accumulate operations. Fitting these features within the available silicon area, power, and frequency targets required significant circuit innovations, including pipeline restructuring and a completely new floorplan.
Bulldozer’s integer data and processor control sequencing are handled in the Integer Execution Unit (EX). This unit consists of a 1-cycle out-of-order instruction scheduler, four integer pipelines, and a Level 1 Data Cache. The Integer Execution Unit will be described in another ISSCC paper (Session 4.6) titled “40-entry Unified, Out-of-Order Scheduler and Integer Execution Unit for the AMD Bulldozer x86-64 Core” (http://isscc.org/program/index.html).
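As a rough illustration of what an out-of-order scheduler feeding four pipelines does each cycle, the C sketch below models a 40-entry window (the size comes from the cited paper title) and picks up to four ready instructions per call. It is a toy model, not the Bulldozer design: the oldest-first selection policy, the treatment of the four pipelines as interchangeable, and the names SchedEntry and pick_ready are all assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define WINDOW_ENTRIES 40  /* window size taken from the cited paper title */
#define NUM_PIPES 4        /* four integer pipelines */

typedef struct {
    bool valid;  /* entry holds an instruction that has not issued yet */
    bool ready;  /* all source operands are available */
} SchedEntry;

/* Hypothetical select step. Entries are assumed to be kept in program
 * order, with index 0 the oldest. Picks up to NUM_PIPES valid, ready
 * entries, oldest first, marks them issued (valid = false), and writes
 * their indices to picked[]. Returns the number of entries picked. */
size_t pick_ready(SchedEntry window[WINDOW_ENTRIES], size_t picked[NUM_PIPES]) {
    size_t count = 0;
    for (size_t i = 0; i < WINDOW_ENTRIES && count < NUM_PIPES; i++) {
        if (window[i].valid && window[i].ready) {
            window[i].valid = false;
            picked[count++] = i;
        }
    }
    return count;
}
```

Calling pick_ready once per modeled cycle lets a younger ready instruction issue ahead of an older one that is still waiting on operands, which is the essence of out-of-order issue. A real scheduler also has to wake up dependent entries and respect per-pipeline capabilities, both of which are omitted here.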