Patent application number | Description | Published |
20090158013 | Method and Apparatus Implementing a Minimal Area Consumption Multiple Addend Floating Point Summation Function in a Vector Microprocessor - Embodiments of the invention provide methods and apparatus for executing a multiple operand instruction. Executing the multiple operand instruction comprises transferring more than two operands to a vector unit, each operand being transferred to a respective one of a plurality of processing lanes of the vector unit. The operands may be transferred from the vector unit to a dot product unit wherein an arithmetic operation using the more than two operands may be performed. | 06-18-2009 |
20090182990 | Method and Apparatus for a Pipelined Multiple Operand Minimum and Maximum Function - Embodiments of the invention provide methods and apparatus for executing a multiple operand minimum or maximum instruction. Executing the multiple operand minimum or maximum instruction comprises transferring more than two operands to one or more processing lanes of a vector unit. A first compare operation may be performed in at least one processing lane of the vector unit to determine the greater or smaller of a first operand and a second operand. The greater (or smaller) operand may be transferred to a dot product unit, wherein, in a second compare operation, the transferred operand is compared to at least a third operand to determine the greatest or smallest of the more than two operands. | 07-16-2009 |
20100023568 | Dynamic Range Adjusting Floating Point Execution Unit - A floating point execution unit is capable of selectively repurposing a subset of the significand bits in a floating point value for use as additional exponent bits to dynamically provide an extended range for floating point calculations. A significand field of a floating point operand may be considered to include first and second portions, with the first portion capable of being concatenated with the second portion to represent the significand for a floating point value, or, to provide an extended range, being concatenated with the exponent field of the floating point operand to represent the exponent for a floating point value. | 01-28-2010 |
20100042812 | Data Dependent Instruction Decode - A circuit arrangement and method support data dependent instruction decoding, whereby instructions are decoded, in part, using decode data that is stored in operand registers identified by such instructions. An instruction may include an opcode and at least one operand that identifies a register. During execution of the instruction, the instruction is first decoded using the opcode, and then decode data stored in the operand register is retrieved and used to further decode the instruction, e.g., to select from among a plurality of operations or instruction types associated with the same opcode. | 02-18-2010 |
20100042813 | Redundant Execution of Instructions in Multistage Execution Pipeline During Unused Execution Cycles - A pipelined execution unit uses the bubbles that occur during execution to selectively repeat operations performed in one or more stages of a multistage execution pipeline to verify the results of such operations during otherwise unused execution cycles for the execution pipeline. Whenever a bubble follows a particular instruction within an execution pipeline, the result of an operation that is performed for that instruction by a particular stage of the execution pipeline may be stored, and the operation may be repeated by the stage in a subsequent execution cycle in which no productive operation would otherwise be performed due to the presence of the bubble. The results of the operations may then be compared and used to either verify the original result or identify a potential error in the execution of the instruction. | 02-18-2010 |
20100091787 | DIRECT INTER-THREAD COMMUNICATION BUFFER THAT SUPPORTS SOFTWARE CONTROLLED ARBITRARY VECTOR OPERAND SELECTION IN A DENSELY THREADED NETWORK ON A CHIP - A computer-implemented method, system and computer program product for retrieving arbitrarily aligned vector operands within a highly threaded Network On a Chip (NOC) processor are presented. Multiple nodes in a NOC are able to access a single Compressed Direct Interthread Communication Buffer (CDICB), which contains a misaligned but compacted set of operands. Using information from a Special Purpose Register (SPR) within the NOC, each node is able to selectively extract one or more operands from the CDICB for use in an execution unit within that node. Output from the execution unit is then sent to the CDICB to update the compacted set of operands. | 04-15-2010 |
20100100707 | DATA STRUCTURE FOR CONTROLLING AN ALGORITHM PERFORMED ON A UNIT OF WORK IN A HIGHLY THREADED NETWORK ON A CHIP - A computer-implemented method, system and computer program product for controlling an algorithm that is performed on a unit of work in a subsequent software pipeline stage in a Network On a Chip (NOC) are presented. In one embodiment, the method executes a first operation in a first node of the NOC. The first node generates a payload, and then loads that payload into a message. The message with the payload is transmitted to a nanokernel that controls a second node in the NOC. The nanokernel calls an algorithm that is needed by a second operation in the second node, which uses the algorithm to execute the second operation. | 04-22-2010 |
20100100770 | SOFTWARE DEBUGGER FOR PACKETS IN A NETWORK ON A CHIP - A breakpoint packet is dispatched to a Network On A Chip (NOC). The breakpoint packet directs one or more specified nodes on the NOC, or a core or hardware thread within a specified node, to execute in “single step” mode in order to enable debugging of a work packet that is dispatched to the specified node. | 04-22-2010 |
20100100934 | SECURITY METHODOLOGY TO PREVENT USER FROM COMPROMISING THROUGHPUT IN A HIGHLY THREADED NETWORK ON A CHIP PROCESSOR - A computer-implemented method, system and computer program product for preventing an untrusted work unit message from compromising throughput in a highly threaded Network On a Chip (NOC) processor are presented. A security message, which is associated with the untrusted work unit message, directs other resources within the NOC to operate in a secure mode while a specified node within the NOC executes instructions from the work unit message in a less privileged non-secure mode. Thus, throughput within the NOC is not compromised, because resources other than the specified node are protected from the untrusted work unit message. | 04-22-2010 |
20100106940 | Processing Unit With Operand Vector Multiplexer Sequence Control - Operand vector multiplexer sequence control is used in a vector-based execution unit to control the shuffling of data elements in operand vectors used by a sequence of vector instructions processed by the vector-based execution unit. A swizzle sequence instruction is defined in an instruction set for the vector-based execution unit and is used to selectively apply a sequence of vector data element shuffle orders to one or more operand vectors to be used by the associated sequence of vector instructions. As a result, when a common sequence of data element shuffle orders is used frequently for a sequence of vector instructions, a single swizzle sequence instruction may be used to select the desired sequence of custom data element ordering for each of the vector instructions in the sequence. | 04-29-2010 |
20100125722 | Multithreaded Processing Unit With Thread Pair Context Caching - A circuit arrangement and method utilize thread pair context caching, where a pair of hardware threads in a multithreaded processor, which are each capable of executing a process, are effectively paired together, at least temporarily, to perform context switching operations such as context save and/or load operations in advance of context switches performed in one or more of such paired hardware threads. By doing so, the overall latency of a context switch, in which the context of the process being switched from must be saved and the context of the process being switched to must be loaded, may be reduced. | 05-20-2010 |
20100188402 | User-Defined Non-Visible Geometry Featuring Ray Filtering - A method, system and computer program product for managing secondary rays during ray-tracing are presented. A non-visible unidirectional ray tracing object logically surrounds a user-selected virtual object in a computer generated illustration. This unidirectional ray tracing object prevents secondary tracing rays from emanating from the user-selected virtual object during ray tracing. | 07-29-2010 |
20100189111 | STREAMING DIRECT INTER-THREAD COMMUNICATION BUFFER PACKETS THAT SUPPORT HARDWARE CONTROLLED ARBITRARY VECTOR OPERAND ALIGNMENT IN A DENSELY THREADED NETWORK ON A CHIP - A computer-implemented method, system and computer program product for arbitrarily aligning vector operands, which are transmitted in inter-thread communication buffer packets within a highly threaded Network On a Chip (NOC) processor, are presented. A set of multiplexers in a node in the NOC realigns and extracts data word aggregations from an incoming compressed inter-thread communication buffer packet. The extracted data word aggregations are used as operands by an execution unit within the node. | 07-29-2010 |
20100191939 | TRIGONOMETRIC SUMMATION VECTOR EXECUTION UNIT - A unique instruction and exponent adjustment adder selectively shift outputs from multiple execution units, including a plurality of multipliers, in a processor core in order to scale mantissas for related trigonometric functions used in a vector dot product. | 07-29-2010 |
20100191940 | SINGLE STEP MODE IN A SOFTWARE PIPELINE WITHIN A HIGHLY THREADED NETWORK ON A CHIP MICROPROCESSOR - A hardware thread is selectively forced to single step the execution of software instructions from a work packet granule. A “single step” packet is associated with a work packet granule. The work packet granule, with the associated “single step” packet, is dispatched as an appended work packet granule to a preselected hardware thread in a processor core, which, in one embodiment, is located at a node in a Network On a Chip (NOC). The work packet granule then executes in a single step mode until completion. | 07-29-2010 |
20100192014 | PSEUDO RANDOM PROCESS STATE REGISTER FOR FAST RANDOM PROCESS TEST GENERATION - A method, system and computer program product are presented for providing pseudo-random input test data to a test program. A seed value is generated and stored in a seed register. Using the seed value as an input, a pseudo-random input test value is generated by a Linear Feedback Shift Register (LFSR) and stored in a general purpose register (GPR) within a processor core. Using the pseudo-random input test value from the GPR, a test program is executed within the processor core. | 07-29-2010 |
20100199067 | Split Vector Loads and Stores with Stride Separated Words - A method, system and computer program product are presented for causing a parallel load/store of stride-separated words from a data vector using different memory chips in a computer. | 08-05-2010 |
20100269123 | Performance Event Triggering Through Direct Interthread Communication On a Network On Chip - Performance event triggering is carried out through direct interthread communication (‘DITC’) on a network on chip (‘NOC’). The NOC includes integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, where each memory communications controller controls communications between an IP block and memory, and each network interface controller controls inter-IP block communications through routers. Performance event triggering includes enabling performance event monitoring in a selected set of IP blocks distributed throughout the NOC, each IP block within the selected set of IP blocks having one or more event counters; collecting performance results from the one or more event counters; and returning performance results from the one or more event counters to a destination repository, the returning being initiated by a triggering event occurring within the NOC. | 10-21-2010 |
20110047355 | Offset Based Register Address Indexing - A circuit arrangement and method support offset based register address indexing, wherein register addresses to be used by an instruction are calculated using offsets to the full target register address, and the offsets are contained in the instruction and occupy less instruction space than the full address widths. An instruction may include at least one offset value that identifies a register address. During decoding of the instruction, an offset and a full target address are retrieved from the instruction, and then a register address is calculated by addition of the offset to the full target address. | 02-24-2011 |
20110283090 | Instruction Addressing Using Register Address Sequence Detection - A circuit arrangement and method support efficient indexing into large register files by utilizing register address sequence detection, wherein register addresses to be used by an instruction are produced by concatenating a portion of the address that is contained in the instruction with another portion that is speculatively produced by sequence detection logic. The portion of the correct full address that is not contained in the instruction is stored in a software-accessible special purpose register. If the end of a particular sequence of addresses is detected by the sequence detection logic, the invention speculatively assumes that the next address in the sequence will be used. Since only a portion of each full address is stored in the instruction, the address portions occupy less instruction space than the full address widths. An instruction may include at least one address portion that identifies a register address. | 11-17-2011 |
20110285709 | Allocating Resources Based On A Performance Statistic - A method includes rendering an object of a three-dimensional image via a pixel shader based on a render context data structure associated with the object. The method includes measuring a performance statistic associated with rendering the object. The method also includes storing the performance statistic in the render context data structure associated with the object. The performance statistic is accessible to a host interface processor to determine whether to allocate a second pixel shader to render the object in a subsequent three-dimensional image. | 11-24-2011 |
20110285710 | Parallelized Ray Tracing - A method includes assigning a priority to a ray data structure of a plurality of ray data structures based on one or more priorities. The ray data structure includes properties of a ray to be traced from an illumination source in a three-dimensional image. The method includes identifying a portion of the three-dimensional image through which the ray passes. The method also includes identifying a slave processing element associated with the portion of the three-dimensional image. The method further includes sending the ray data structure to the slave processing element. | 11-24-2011 |
20110289485 | Software Trace Collection and Analysis Utilizing Direct Interthread Communication On A Network On Chip - Trace data is collected and analyzed in a software debug mode through direct interthread communication (‘DITC’) on a network on chip (‘NOC’). The NOC includes integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, where each memory communications controller controls communications between an IP block and memory, and each network interface controller controls inter-IP block communications through routers. Collection and analysis include enabling the collection of software debug information in a selected set of IP blocks distributed throughout the NOC, each IP block within the selected set of IP blocks having a set of trace data; collecting software debugging information via the set of trace data; communicating the set of trace data to a destination repository; and analyzing the set of trace data at the destination repository. | 11-24-2011 |
20110298788 | PERFORMING VECTOR MULTIPLICATION - A method includes receiving packed data corresponding to pixel components to be processed at a graphics pipeline. The method includes unpacking the packed data to generate floating point numbers that correspond to the pixel components. The method also includes routing each of the floating point numbers to a separate lane of the graphics pipeline. Each of the floating point numbers is to be processed by multiplier units of the graphics pipeline. | 12-08-2011 |
20110302450 | FAULT TOLERANT STABILITY CRITICAL EXECUTION CHECKING USING REDUNDANT EXECUTION PIPELINES - A circuit arrangement and method utilize existing redundant execution pipelines in a processing unit to execute multiple instances of stability critical instructions in parallel so that the results of the multiple instances of the instructions can be compared for the purpose of detecting errors. For other types of instructions for which fault tolerant or stability critical execution is not required or desired, the redundant execution pipelines are utilized in a more conventional manner, enabling multiple non-stability critical instructions to be concurrently issued to and executed by the redundant execution pipelines. As such, for non-stability critical program code, the performance benefits of having multiple redundant execution units are preserved, yet in the instances where fault tolerant or stability critical execution is desired for certain program code, the redundant execution units may be repurposed to provide greater assurances as to the fault-free execution of such instructions. | 12-08-2011 |
20110316855 | Parallelized Streaming Accelerated Data Structure Generation - A method includes receiving at a master processing element primitive data that includes properties of a primitive. The method includes partially traversing a spatial data structure that represents a three-dimensional image to identify an internal node of the spatial data structure. The internal node represents a portion of the three-dimensional image. The method also includes selecting a slave processing element from a plurality of slave processing elements. The selected slave processing element is associated with the internal node. The method further includes sending the primitive data to the selected slave processing element to traverse a portion of the spatial data structure to identify a leaf node of the spatial data structure. | 12-29-2011 |
20110316864 | MULTITHREADED SOFTWARE RENDERING PIPELINE WITH DYNAMIC PERFORMANCE-BASED REALLOCATION OF RASTER THREADS - A multithreaded rendering software pipeline architecture dynamically reallocates regions of an image space to raster threads based upon performance data collected by the raster threads. The reallocation of the regions typically includes resizing the regions assigned to particular raster threads and/or reassigning regions to different raster threads to better balance the relative workloads of the raster threads. | 12-29-2011 |
20110317712 | Recovering Data From A Plurality of Packets - A method includes receiving a plurality of packets at an integrated processor block of a network on a chip device. The plurality of packets includes a first packet that includes an indication of a start of data associated with a pixel shader application. The method includes recovering the data from the plurality of packets. The method also includes storing the recovered data in a dedicated packet collection memory within the network on a chip device. The method further includes retaining the data stored in the dedicated packet collection memory during an interruption event. Upon completion of the interruption event, the method includes copying packets stored in the dedicated packet collection memory prior to the interruption event to an inbox of the network on a chip device for processing. | 12-29-2011 |
20110320719 | PROPAGATING SHARED STATE CHANGES TO MULTIPLE THREADS WITHIN A MULTITHREADED PROCESSING ENVIRONMENT - A circuit arrangement and method make state changes to shared state data in a highly multithreaded environment by propagating or streaming the changes to multiple parallel hardware threads of execution in the multithreaded environment using an on-chip communications network and without attempting to access any copy of the shared state data in a shared memory to which the parallel threads of execution are also coupled. Through the use of an on-chip communications network, changes to the shared state data may be communicated quickly and efficiently to multiple threads of execution, enabling those threads to locally update their local copies of the shared state. Furthermore, by avoiding attempts to access a shared memory, the interface to the shared memory is not overloaded with concurrent access attempts, thus preserving memory bandwidth for other activities and reducing memory latency. Particularly for larger shared states, propagating the changes, rather than an entire shared state, further improves performance by reducing the amount of data communicated over the on-chip communications network. | 12-29-2011 |
20110320724 | DMA-BASED ACCELERATION OF COMMAND PUSH BUFFER BETWEEN HOST AND TARGET DEVICES - Direct Memory Access (DMA) is used in connection with passing commands between a host device and a target device coupled via a push buffer. Commands passed to a push buffer by a host device may be accumulated by the host device prior to forwarding the commands to the push buffer, such that DMA may be used to collectively pass a block of commands to the push buffer. In addition, a host device may utilize DMA to pass command parameters for commands to a command buffer that is accessible by the target device but is separate from the push buffer, with the commands that are passed to the push buffer including pointers to the associated command parameters in the command buffer. | 12-29-2011 |
20110320771 | INSTRUCTION UNIT WITH INSTRUCTION BUFFER PIPELINE BYPASS - A circuit arrangement and method selectively bypass an instruction buffer for selected instructions so that bypassed instructions can be dispatched without having to first pass through the instruction buffer. Thus, for example, in the case that an instruction buffer is partially or completely flushed as a result of an instruction redirect (e.g., due to a branch mispredict), instructions can be forwarded to subsequent stages in an instruction unit and/or to one or more execution units without the latency associated with passing through the instruction buffer. | 12-29-2011 |
20110321049 | Programmable Integrated Processor Blocks - An integrated processor block of a network on a chip is programmable to perform a first function. The integrated processor block includes an inbox to receive incoming packets from other integrated processor blocks of the network on a chip, an outbox to send outgoing packets to the other integrated processor blocks, an on-chip memory, and a memory management unit to enable access to the on-chip memory. | 12-29-2011 |
20110321057 | MULTITHREADED PHYSICS ENGINE WITH PREDICTIVE LOAD BALANCING - A circuit arrangement and method utilize predictive load balancing to allocate the workload among hardware threads in a multithreaded physics engine. The predictive load balancing is based at least in part upon the detection of predicted future collisions between objects in a scene, such that the reallocation of respective loads of a plurality of hardware threads may be initiated prior to detection of the actual collisions, thereby increasing the likelihood that hardware threads will be optimally allocated when the actual collisions occur. | 12-29-2011 |
20120084535 | Opcode Space Minimizing Architecture Utilizing Instruction Address to Indicate Upper Address Bits - Due to the ever-expanding number of registers and new instructions in modern microprocessor cores, the address widths present in the instruction encoding continue to widen, and fewer instruction opcodes are available, making it more difficult to add new instructions to existing architectures without resorting to inelegant tricks that have drawbacks such as source-destructive operations. The disclosed invention utilizes specialized decode and address calculation hardware that concatenates a fixed number of least significant bits of the instruction address onto the upper address bits of each register address portion contained in the instruction, yielding the full register address, instead of providing the full register address widths for every register used in the instruction. This frees up valuable opcode space for other instructions and avoids compiler complexity. The approach aligns well with how most loops are unrolled in assembly language, where independent operations are near each other in memory. | 04-05-2012 |
20120176364 | REUSE OF STATIC IMAGE DATA FROM PRIOR IMAGE FRAMES TO REDUCE RASTERIZATION REQUIREMENTS - An apparatus, program product and method reuse static image data generated during rasterization of static geometry to reduce the processing overhead associated with rasterizing subsequent image frames. In particular, static image data generated during one image frame may be reused in a subsequent image frame such that the subsequent image frame is generated without having to re-rasterize the static geometry from the scene, i.e., with only the dynamic geometry rasterized. The resulting image frame includes dynamic image data generated as a result of rasterizing the dynamic geometry during that image frame, and static image data generated as a result of rasterizing the static geometry during a prior image frame. | 07-12-2012 |
20130036296 | FLOATING POINT EXECUTION UNIT WITH FIXED POINT FUNCTIONALITY - A floating point execution unit is capable of selectively repurposing one or more adders in an exponent path of the floating point execution unit to perform fixed point addition operations, thereby providing fixed point functionality in the floating point execution unit. | 02-07-2013 |
20130044117 | VECTOR REGISTER FILE CACHING OF CONTEXT DATA STRUCTURE FOR MAINTAINING STATE DATA IN A MULTITHREADED IMAGE PROCESSING PIPELINE - Frequently accessed state data used in a multithreaded graphics processing architecture is cached within a vector register file of a processing unit to optimize accesses to the state data and minimize memory bus utilization associated therewith. A processing unit may include a fixed point execution unit as well as a vector floating point execution unit, and a vector register file utilized by the vector floating point execution unit may be used to cache state data used by the fixed point execution unit and transferred as needed into the general purpose registers accessible by the fixed point execution unit, thereby reducing the need to repeatedly retrieve and write back the state data from and to an L1 or lower level cache accessed by the fixed point execution unit. | 02-21-2013 |
20130046518 | MULTITHREADED PHYSICS ENGINE WITH IMPULSE PROPAGATION - A circuit arrangement and method implement impulse propagation in a multithreaded physics engine by assigning ownership of objects in a scene to individual threads and propagating impulses between objects that are in contact with one another by passing inter-thread impulse messages between the threads that own the contacting objects, while locally propagating impulses through objects using the threads to which such objects are assigned. | 02-21-2013 |
20130111186 | INSTRUCTION ADDRESS ADJUSTMENT IN RESPONSE TO LOGICALLY NON-SIGNIFICANT OPERATIONS | 05-02-2013 |
20130111190 | OPERATIONAL CODE EXPANSION IN RESPONSE TO SUCCESSIVE TARGET ADDRESS DETECTION | 05-02-2013 |
20130138918 | DIRECT INTERTHREAD COMMUNICATION DATAPORT PACK/UNPACK AND LOAD/SAVE - A circuit arrangement, method, and program product for compressing and decompressing data in a node of a system including a plurality of nodes interconnected via an on-chip network. Compressed data may be received and stored at an input buffer of a node, and in parallel with moving the compressed data to an execution register of the node, decompression logic of the node may decompress the data to generate uncompressed data, such that uncompressed data is stored in the execution register for utilization by an execution unit of the node. Uncompressed data may be output by the execution unit into the execution register, and in parallel with moving the uncompressed data to an output buffer of the node connected to the on-chip network, compression logic may compress the uncompressed data to generate compressed data, such that compressed data is stored at the output buffer. | 05-30-2013 |
20130138925 | PROCESSING CORE WITH SPECULATIVE REGISTER PREPROCESSING - A method and circuit arrangement speculatively preprocess data stored in a register file during otherwise unused cycles in an execution unit, e.g., to prenormalize denormal floating point values stored in a floating point register file, to decompress compressed values stored in a register file, to decrypt encrypted values stored in a register file, or to otherwise preprocess data that is stored in an unprocessed form in a register file. | 05-30-2013 |
20130145128 | PROCESSING CORE WITH PROGRAMMABLE MICROCODE UNIT - A method and circuit arrangement utilize a programmable microcode unit that is capable of being programmed via software to modify the instruction sequences output by the microcode unit in response to microcode instructions issued to the microcode unit. Among other benefits, a programmable microcode unit consistent with the invention enables customization of a processor design to handle specific applications or tasks, as well as to support specific hardware configurations such as specific execution units. In addition, a programmable microcode unit may be updatable, e.g., to correct bugs or faults found in previous instruction sequences supported by the unit. | 06-06-2013 |
20130159668 | PREDECODE LOGIC FOR AUTOVECTORIZING SCALAR INSTRUCTIONS IN AN INSTRUCTION BUFFER - A circuit arrangement, method, and program product for substituting a plurality of scalar instructions in an instruction stream with a functionally equivalent vector instruction for execution by a vector execution unit. Predecode logic is coupled to an instruction buffer which stores instructions in an instruction stream to be executed by the vector execution unit. The predecode logic analyzes the instructions passing through the instruction buffer to identify a plurality of scalar instructions that may be replaced by a vector instruction in the instruction stream. The predecode logic may generate the functionally equivalent vector instruction based on the plurality of scalar instructions, and the functionally equivalent vector instruction may be substituted into the instruction stream, such that the vector execution unit executes the vector instruction in lieu of the plurality of scalar instructions. | 06-20-2013 |
20130159674 | INSTRUCTION PREDICATION USING INSTRUCTION FILTERING - A method and circuit arrangement for selectively predicating instructions in an instruction stream based upon a predication filter criteria defined by a predication filter, which describes types or patterns of instructions that should be predicated. Predication logic compares a respective instruction of an instruction stream to predication filter criteria to determine whether the respective instruction matches the predication filter criteria, and the respective instruction is selectively predicated based on whether the respective instruction matches the predication filter criteria. | 06-20-2013 |
20130159675 | INSTRUCTION PREDICATION USING UNUSED DATAPATH FACILITIES - A method and circuit arrangement for selectively predicating an instruction in an instruction stream based upon a value corresponding to a predication register address indicated by a portion of an operand associated with the instruction. A first compare instruction in an instruction stream stores a compare result at a register address of a predication register. The register address of the predication register is stored in a portion of an operand associated with a second instruction. During decoding of the second instruction, the predication register is accessed to determine the value stored at that register address, and the second instruction is selectively predicated based on that value. | 06-20-2013 |
20130159676 | INSTRUCTION SET ARCHITECTURE WITH EXTENDED REGISTER ADDRESSING - A method and circuit arrangement selectively repurpose bits from a primary opcode portion of an instruction for use in decoding one or more operands for the instruction. Decode logic of a processor, for example, may be placed in a predetermined mode that decodes a primary opcode for an instruction that is different from that specified in the primary opcode portion of the instruction, and then utilize one or more bits in the primary opcode portion to decode one or more operands for the instruction. By doing so, additional space is freed up in the instruction to support a larger register file and/or additional instruction types, e.g., as specified by a secondary or extended opcode. | 06-20-2013 |
20130159683 | INSTRUCTION PREDICATION USING INSTRUCTION ADDRESS PATTERN MATCHING - A particular method includes receiving, at a processor, an instruction and an address of the instruction. The method also includes preventing execution of the instruction based at least in part on determining that the address is within a range of addresses. | 06-20-2013 |
20130191649 | MEMORY ADDRESS TRANSLATION-BASED DATA ENCRYPTION/COMPRESSION - A method and circuit arrangement selectively stream data to an encryption or compression engine based upon encryption and/or compression-related page attributes stored in a memory address translation data structure such as an Effective To Real Translation (ERAT) or Translation Lookaside Buffer (TLB). A memory address translation data structure may be accessed, for example, in connection with a memory access request for data in a memory page, such that attributes associated with the memory page in the data structure may be used to control whether data is encrypted/decrypted and/or compressed/decompressed in association with handling the memory access request. | 07-25-2013 |
20130191651 | MEMORY ADDRESS TRANSLATION-BASED DATA ENCRYPTION WITH INTEGRATED ENCRYPTION ENGINE - A method and circuit arrangement utilize an integrated encryption engine within a processing core of a multi-core processor to perform encryption operations, i.e., encryption and decryption of secure data, in connection with memory access requests that access such data. The integrated encryption engine is utilized in combination with a memory address translation data structure such as an Effective To Real Translation (ERAT) or Translation Lookaside Buffer (TLB) that is augmented with encryption-related page attributes to indicate whether pages of memory identified in the data structure are encrypted such that secure data associated with a memory access request in the processing core may be selectively streamed to the integrated encryption engine based upon the encryption-related page attribute for the memory page associated with the memory access request. | 07-25-2013 |
20130191824 | VIRTUALIZATION SUPPORT FOR BRANCH PREDICTION LOGIC ENABLE/DISABLE - A hypervisor and one or more guest operating systems resident in a data processing system and hosted by the hypervisor are configured to selectively enable or disable branch prediction logic through separate hypervisor-mode and guest-mode instructions. By doing so, different branch prediction strategies may be employed for different operating systems and user applications hosted thereby to provide finer grained optimization of the branch prediction logic for different operating scenarios. | 07-25-2013 |
20130191825 | VIRTUALIZATION SUPPORT FOR SAVING AND RESTORING BRANCH PREDICTION LOGIC STATES - A hypervisor and one or more programs hosted by the hypervisor, e.g., guest operating systems and/or user processes or applications, are configured to selectively save and restore the state of branch prediction logic through separate hypervisor-mode, guest-mode, and/or user-mode instructions. By doing so, different branch prediction strategies may be employed for different operating systems and the user applications they host, providing finer-grained optimization of the branch prediction logic. | 07-25-2013 |