Intelligent Speculation for Pipelined Multithreading (thesis)
Abstract:
In recent years, microprocessor manufacturers have shifted their focus from single-core to multi-core processors. Since many of today's applications are single-threaded and since it is likely that many of tomorrow's applications will have far fewer threads than there will be processor cores, automatic thread extraction is an essential tool for effectively leveraging today's multi-core and tomorrow's many-core processors. A recently proposed technique, Decoupled Software Pipelining (DSWP), has demonstrated promise by partitioning loops into long-running threads organized into a pipeline. Using a pipeline organization and execution decoupled by inter-core communication queues, DSWP offers increased execution efficiency that is largely independent of inter-core communication latency and variability in intra-thread performance.
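As a rough illustration of the pipeline organization described above, the following minimal sketch splits one loop into two long-running stages connected by a bounded queue. It is hand-written, not the output of the DSWP compiler transformation. The names `stage_traverse`, `stage_work`, `run_pipeline`, and `queue_depth` are hypothetical, and Python threads are used only to show the structure, not to obtain a speedup.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream between stages


def stage_traverse(items, out_q):
    """Stage 1: owns the loop-carried traversal (think `node = node.next`)."""
    for item in items:
        out_q.put(item)      # blocks only when the queue is full
    out_q.put(SENTINEL)      # tell the downstream stage the loop has ended


def stage_work(in_q, results):
    """Stage 2: performs the per-iteration work on values it receives."""
    while True:
        item = in_q.get()    # blocks only when the queue is empty
        if item is SENTINEL:
            break
        results.append(item * item)  # stand-in for the loop body


def run_pipeline(items, queue_depth=32):
    """Run the two stages as decoupled threads and return stage 2's results."""
    q = queue.Queue(maxsize=queue_depth)  # models the inter-core queue
    results = []
    t1 = threading.Thread(target=stage_traverse, args=(items, q))
    t2 = threading.Thread(target=stage_work, args=(q, results))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results
```

Because values flow in one direction only, from the first stage to the second, the first stage can run ahead by up to `queue_depth` iterations. In this sketch, that one-way flow is what lets the stages tolerate communication latency and short stalls in either thread.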
This dissertation extends the pipelined parallelism paradigm with speculation. Using speculation, the compiler can safely ignore dependences that manifest infrequently or are easily predictable, allowing it to carve more, and better-balanced, thread-based pipeline stages from a single thread of execution. Prior speculative threading proposals were obligated to speculate most, if not all, loop-carried dependences to squeeze the code segment under consideration into the mold required by the parallelization paradigm. Unlike those techniques, this dissertation demonstrates that speculation need only break the longest few dependence cycles to enhance the applicability and scalability of the pipelined multi-threading paradigm. By speculatively breaking these cycles, instructions that were formerly restricted to a single thread to ensure decoupling are now free to span multiple threads. To demonstrate the effectiveness of speculative pipelined multi-threading, this dissertation presents the design and experimental evaluation of our fully automatic compiler transformation, Speculative Decoupled Software Pipelining, a speculative extension to DSWP.
This dissertation additionally introduces multi-threaded transactional memories to support speculative pipelined multi-threading. Similar to past speculative parallelization approaches, speculative pipelined multi-threading relies on runtime-system support to buffer speculative modifications to memory. However, this dissertation demonstrates that existing proposals for buffering speculative memory state, namely transactional memories, are insufficient for speculative pipelined multi-threading because the speculative buffers are restricted to a single thread. Further, this dissertation demonstrates that this limitation leads to modularity and composability problems even for transactional programming, limiting the potential of that approach as well. To surmount these limitations, this dissertation introduces multi-threaded transactional memories and presents an initial hardware implementation.