Patent application number | Description | Published |
20150035841 | MULTI-THREADED GPU PIPELINE - Techniques are disclosed relating to a multithreaded execution pipeline. In some embodiments, an apparatus is configured to assign a number of threads to an execution pipeline that is an integer multiple of a minimum number of cycles that an execution unit is configured to use to generate an execution result from a given set of input operands. In one embodiment, the apparatus is configured to require strict ordering of the threads. In one embodiment, the apparatus is configured so that the same thread access (e.g., reads and writes) a register file in a given cycle. In one embodiment, the apparatus is configured so that the same thread does not write back an operand and a result to an operand cache in a given cycle. | 02-05-2015 |
20150039661 | TYPE CONVERSION USING FLOATING-POINT UNIT - Techniques are disclosed relating to type conversion using a floating-point unit. In one embodiment, to convert a floating-point value to a normalized integer format, a floating-point unit is configured to perform an operation to generate a result having a significant portion and an exponent portion, where the operation includes multiplying the floating-point value by a constant. In one embodiment, the apparatus is further configured to add a value to the exponent portion of the result, and set a rounding mode to round to nearest. The constant may be a greatest value less than one that can be represented using the particular number of unsigned bits. The value added to the initial exponent may be equal to the number of unsigned bits of the normalized integer format. The apparatus may perform this conversion in response to a pack instruction. | 02-05-2015 |
20150039867 | INSTRUCTION SOURCE SPECIFICATION - Techniques are disclosed relating to specification of instruction operands. In some embodiments, this may involve assigning operands to source inputs. In one embodiment, an instruction includes one or more mapping values, each of which corresponds to a source of the instruction and each of which specifies a location value. In this embodiment, the instruction includes one or more location values that are each usable to identify an operand for the instruction. In this embodiment, a method may include accessing operands using the location values and assigning accessed operands to sources using the mapping values. In one embodiment, the sources may correspond to inputs of an execution block. In one embodiment, a destination mapping value in the instruction may specify a location value that indicates a destination for storing an instruction result. | 02-05-2015 |
20150058389 | EXTENDED MULTIPLY - Techniques are disclosed relating to performing extended multiplies without a carry flag. In one embodiment, an apparatus includes a multiply unit configured to perform multiplications of operands having a particular width. In this embodiment, the apparatus also includes multiple storage elements configured to store operands for the multiply unit. In this embodiment, each of the storage elements is configured to provide a portion of a stored operand that is less than an entirety of the stored operand in response to a control signal from the apparatus. In one embodiment, the apparatus is configured to perform a multiplication of given first and second operands having a width greater than the particular width by performing a sequence of multiply operations using the multiply unit, using portions of the stored operands and without using a carry flag between any of the sequence of multiply operations. | 02-26-2015 |
20150058571 | HINT VALUES FOR USE WITH AN OPERAND CACHE - Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Hint values may be used in some embodiments to suggest that a particular operand should be stored in the operand cache (so that is available for current or future use). In one embodiment, a hint value indicates that an operand should be cached whenever possible. Hint values may be determined by software, such as a compiler, in some embodiments. One or more criteria may be used to determine hint values, such as how soon in the future or how frequently an operand will be used again. | 02-26-2015 |
20150058572 | INTELLIGENT CACHING FOR AN OPERAND CACHE - Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive An operand cache may store a subset of operands, and may use less power and have quicker access times than the register file. In some embodiments, intelligent operand prefetching may speed execution by reducing memory bank conflicts (e.g., conflicts within a register file containing multiple memory banks). An unused operand slot for another instruction (e.g., an instruction that does not require a maximum number of source operands allowed by an instruction set architecture) may be used to prefetch an operand for another instruction in one embodiment. Prefetched operands may be stored in an operand cache, and prefetching may occur based on software-provided information. | 02-26-2015 |
20150058573 | OPERAND CACHE DESIGN - Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Selectors (e.g., multiplexers) may be used to read operands from the operand cache. Power savings may be achieved in some embodiments by activating only a subset of the selectors, which may be done by activators (e.g. flip-flops). Operands may also be concurrently provided to two or more locations via forwarding, which may be accomplished via a source selection unit in some embodiments. Operand forwarding may also reduce power and/or speed execution in one or more embodiments. | 02-26-2015 |
20150205324 | CLOCK ROUTING TECHNIQUES - Techniques are disclosed relating to clock routing techniques in processors with both pipelined and non-pipelined circuitry. In some embodiments, an apparatus includes execution units that are non-pipelined and configured to perform instructions without receiving a clock signal. In these embodiments, one or more clock lines routed throughout the apparatus do not extend into the one or more execution units in each pipeline, reducing the length of the clock lines. In some embodiments, the apparatus includes multiple such pipelines arranged in an array, with the execution units located on an outer portion of the array and clocked control circuitry located on an inner portion of the array. In some embodiments, clock lines do not extend into the outer portion of the array. In some embodiments, the array includes one or more rows of execution units. These arrangements may further reduce the length of clock lines. | 07-23-2015 |