Michael K. Gschwind, Chappaqua, NY US

Patent application number | Description | Published
20080229066System and Method for Compiling Scalar Code for a Single Instruction Multiple Data (SIMD) Execution Engine - A system, method, and computer program product are provided for performing scalar operations using a SIMD data parallel execution unit. With the mechanisms of the illustrative embodiments, scalar operations in application code are identified that may be executed using vector operations in a SIMD data parallel execution unit. The scalar operations are converted, such as by a static or dynamic compiler, into one or more vector load instructions and one or more vector computation instructions. In addition, control words may be generated to adjust the alignment of the scalar values for the scalar operation within the vector registers to which these scalar values are loaded using the vector load instructions. The alignment amounts for adjusting the scalar values within the vector registers may be statically or dynamically determined.09-18-2008
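
The alignment-and-splat idea above can be modeled in a few lines of C. This is a minimal sketch, not code from the application: a 4-wide SIMD register is a struct, the scalar arrives via a vector load into some lane, and a splat replicates it across all lanes so later vector arithmetic no longer depends on where the scalar landed. All names are illustrative.

```c
#include <stdio.h>

/* Model of a 4-wide SIMD register. */
typedef struct { float e[4]; } vec4;

/* Vector load: in this model, simply copies four contiguous floats. */
static vec4 vec_load(const float *p) {
    vec4 v;
    for (int i = 0; i < 4; i++) v.e[i] = p[i];
    return v;
}

/* "Splat": replicate one lane across the register so the scalar's
 * alignment within the vector register no longer matters. */
static vec4 vec_splat(vec4 v, int lane) {
    vec4 r;
    for (int i = 0; i < 4; i++) r.e[i] = v.e[lane];
    return r;
}

int main(void) {
    float mem[4] = { 3.0f, 1.0f, 4.0f, 1.0f };
    vec4 loaded = vec_load(mem);
    vec4 a = vec_splat(loaded, 2);   /* scalar happened to land in lane 2 */
    vec4 b = vec_splat(loaded, 0);   /* second scalar was in lane 0       */
    vec4 sum;                        /* scalar add performed as vector add */
    for (int i = 0; i < 4; i++) sum.e[i] = a.e[i] + b.e[i];
    printf("scalar result in any lane: %f\n", sum.e[0]);  /* 7.0 */
    return 0;
}
```
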
20080256347METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR PATH-CORRELATED INDIRECT ADDRESS PREDICTIONS - A method, system, and computer program product are provided for maintaining a path history register of register indirect branches. A set of bits is generated based on a set of target address bits using a bit selection and/or a hash function operation, and the generated set of bits is inserted into a path history register by shifting bits in the path history register and/or applying a hash operation. Information corresponding to prior history is removed from the path history register using a shift-out operation and/or a hash operation. The path history register is used to maintain a recent target table and to generate register-indirect branch target address predictions based on path history correlation between register-indirect branches captured by the path history register.10-16-2008
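
A path history register of this kind is easy to model. The following C sketch (field widths, table size, and hash choices are invented for illustration) shifts hashed target-address bits into the history on every register-indirect branch and indexes a recent-target table with it:

```c
#include <stdint.h>
#include <stdio.h>

#define TABLE_SIZE 256

/* Shift out old history, hash in bits selected from the new target. */
static uint32_t update_path_history(uint32_t phr, uint32_t target) {
    uint32_t bits = (target >> 2) & 0xFF;   /* bit selection */
    return (phr << 4) ^ bits;               /* shift + hash  */
}

/* Recent-target table indexed by hashed path history. */
static uint32_t target_table[TABLE_SIZE];

static uint32_t predict(uint32_t phr, uint32_t pc) {
    return target_table[(phr ^ pc) % TABLE_SIZE];
}

static void train(uint32_t phr, uint32_t pc, uint32_t actual_target) {
    target_table[(phr ^ pc) % TABLE_SIZE] = actual_target;
}

int main(void) {
    uint32_t phr = 0, pc = 0x1000, target = 0x2040;
    train(phr, pc, target);
    printf("predicted: %#x\n", (unsigned)predict(phr, pc));
    phr = update_path_history(phr, target);  /* history advances */
    return 0;
}
```
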
20080263280LOW COMPLEXITY SPECULATIVE MULTITHREADING SYSTEM BASED ON UNMODIFIED MICROPROCESSOR CORE - A system, method and computer program product for supporting thread level speculative execution in a computing environment having multiple processing units adapted for concurrent execution of threads in speculative and non-speculative modes. Each processing unit includes a cache memory hierarchy of caches operatively connected therewith. The apparatus includes an additional cache level local to each processing unit for use only in a thread level speculation mode, each additional cache for storing speculative results and status associated with its associated processor when handling speculative threads. The additional local cache levels at the processing units are interconnected so that speculative values and control data may be forwarded between parallel executing threads. A control implementation is provided that enables speculative coherence between speculative threads executing in the computing environment.10-23-2008
20080276069METHOD AND APPARATUS FOR PREDICTIVE DECODING - Predictive decoding is achieved by fetching an instruction; accessing a predictor containing predictor information including prior instruction execution characteristics; obtaining predictor information for the fetched instruction from the predictor; and generating a selected one of a plurality of decode operation streams corresponding to the fetched instruction. The decode operation stream is selected based on the predictor information.11-06-2008
20090064152SYSTEMS, METHODS AND COMPUTER PRODUCTS FOR CROSS-THREAD SCHEDULING - Systems, methods and computer products for cross-thread scheduling. Exemplary embodiments include a cross thread scheduling method for compiling code, the method including scheduling a scheduling unit with a scheduler sub-operation in response to the scheduling unit being in a non-multithreaded part of the code and scheduling the scheduling unit with a cross-thread scheduler sub-operation in response to the scheduling unit being in a multithreaded part of the code.03-05-2009
20090077571METHOD AND APPARATUS FOR EFFICIENT PERFORMANCE MONITORING OF A LARGE NUMBER OF SIMULTANEOUS EVENTS - A system for monitoring a large number of simultaneous events implements a hybrid counter array device having a first counter portion comprising counter devices, each counter device for receiving signals representing occurrences of events from an event source and providing a first count value corresponding to the lower order bits of the hybrid counter array. A second counter portion comprises a memory array device having addressable memory locations in correspondence with the counter devices, each addressable memory location for storing a second count value representing higher order bits. A control device monitors each of the counter devices and initiates updating a value of a corresponding second count value stored at the corresponding addressable memory location. The system includes interrupt pre-indication for providing a fast interrupt trigger to a processor device when a count value related to an event equals a threshold value. A data transfer sub-system additionally enables one or more of: read access or write access to both the count values in the first and second counter portions over a narrow bus, the read/write access being for purposes of initializing and determining status of the count values for a monitored event type in response to a processor device request.03-19-2009
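
The hybrid counter split can be illustrated in C. In this sketch (widths and array sizes are illustrative assumptions), the low-order bits live in a small fast counter and a wrap-around spills a carry into a memory-array word holding the high-order bits:

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_EVENTS 4
#define LOW_BITS   8   /* width of the fast counter portion */

/* Low-order bits: small, fast counters; high-order bits: memory array. */
static uint8_t  low[NUM_EVENTS];
static uint64_t high[NUM_EVENTS];

static void count_event(int id) {
    if (++low[id] == 0)      /* low-order counter wrapped:        */
        high[id]++;          /* spill carry into the memory array */
}

/* Concatenate the two portions to read the full count. */
static uint64_t read_count(int id) {
    return (high[id] << LOW_BITS) | low[id];
}

int main(void) {
    for (int i = 0; i < 1000; i++) count_event(2);
    printf("event 2 count = %llu\n", (unsigned long long)read_count(2));
    return 0;
}
```
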
20090119494DESIGN STRUCTURE FOR PREDICTIVE DECODING - A design structure embodied in a machine readable medium used in a design process includes an apparatus for predictive decoding, the apparatus including register logic for fetching an instruction; predictor logic containing predictor information including prior instruction execution characteristics; logic for obtaining predictor information for the fetched instruction from the predictor; and decode logic for generating a selected one of a plurality of decode operation streams corresponding to the fetched instruction, wherein the decode operation stream is selected based on the predictor information.05-07-2009
20090198966Multi-Addressable Register File - A single register file may be addressed using both scalar and SIMD instructions. That is, subsets of registers within a multi-addressable register file, according to the illustrative embodiments, are addressable with different instruction forms, e.g., scalar instructions, SIMD instructions, etc., while the entire set of registers may be addressed with yet another form of instructions, referred to herein as Vector-Scalar Extension (VSX) instructions. The operation set that may be performed on the entire set of registers using the VSX instruction form is substantially similar to that of the operation sets of the subsets of registers. Such an arrangement allows legacy instructions to access subsets of registers within the multi-addressable register file while new instructions, i.e. the VSX instructions, may access the entire range of registers within the multi-addressable register file.08-06-2009
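
A rough C model of the addressing scheme follows. The overlay chosen here (legacy scalar forms reaching registers 0-31, legacy SIMD forms reaching 32-63, VSX forms reaching all 64) is an assumption for illustration, not a statement of the actual encoding:

```c
#include <stdio.h>

/* One 64-entry composite register file; each legacy instruction form
 * addresses a 32-register subset, while the VSX form addresses all 64.
 * The exact overlay below is an assumption of this sketch. */
enum form { SCALAR_FP, SIMD_VECTOR, VSX };

static int physical_register(enum form f, int architected_reg) {
    switch (f) {
    case SCALAR_FP:   return architected_reg;        /* FPRs -> 0..31  */
    case SIMD_VECTOR: return 32 + architected_reg;   /* VRs  -> 32..63 */
    case VSX:         return architected_reg;        /* full 0..63     */
    }
    return -1;
}

int main(void) {
    printf("FPR 5  -> register %d\n", physical_register(SCALAR_FP, 5));
    printf("VR 5   -> register %d\n", physical_register(SIMD_VECTOR, 5));
    printf("VSX 37 -> register %d\n", physical_register(VSX, 37));
    return 0;
}
```
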
20090198967METHOD AND STRUCTURE FOR LOW LATENCY LOAD-TAGGED POINTER INSTRUCTION FOR COMPUTER MICROARCHITECTURE - A methodology and implementation of a load-tagged pointer instruction for RISC based microarchitecture are presented. A first lower latency, speculative implementation reduces overall throughput latency for a microprocessor system by estimating the results of a particular instruction and confirming the integrity of the estimate slightly later than the normal instruction execution latency. A second higher latency, non-speculative implementation that always produces correct results is invoked by the first when the first guesses incorrectly. The methodologies and structures disclosed herein are intended to be combined with predictive techniques for instruction processing to ultimately improve processing throughput.08-06-2009
20090198977Sharing Data in Internal and Memory Representations with Dynamic Data-Driven Conversion - Illustrative embodiments determine the data type of the operand being accessed as well as analyze the data value subrange of the input operand data type. If the operand's data type does not match the required format of the instruction being processed, a determination is made as to whether a subrange of data values of the data type of the input operand is supported natively. If the subrange of data values of the input operand is not supported natively, then a format conversion is performed on the data and the instruction may then operate on the data. Otherwise, the data may be operated on directly by the instruction without a format conversion operation and thus, the conversion is not performed.08-06-2009
20090307656Optimized Scalar Promotion with Load and Splat SIMD Instructions - Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.12-10-2009
20100095086Dynamically Aligning Enhanced Precision Vectors Based on Addresses Corresponding to Reduced Precision Vectors - Mechanisms for aligning enhanced precision vectors based on reduced precision data values are provided. At least one data value, having a first precision type, is received for storing in a vector register. The vector register stores data as a vector having a plurality of vector elements. The first precision type is modified to have a second precision type different in precision than the first precision type to thereby generate at least one modified data value. The at least one modified data value is stored in at least one vector element of the plurality of vector elements. An alignment of the at least one modified data value is determined relative to a boundary of a vector element of the vector register. An alignment operation to re-align the at least one modified data value based on the boundary of the vector element of the vector register is performed.04-15-2010
20100095087Dynamic Data Driven Alignment and Data Formatting in a Floating-Point SIMD Architecture - Mechanisms are provided for dynamic data driven alignment and data formatting in a floating point SIMD architecture. At least two operand inputs are input to a permute unit of a processor. Each operand input contains at least one floating point value upon which a permute operation is to be performed by the permute unit. A control vector input, having a plurality of floating point values that together constitute the control vector input, is input to the permute unit of the processor for controlling the permute operation of the permute unit. The permute unit performs a permute operation on the at least two operand inputs according to a permutation pattern specified by the plurality of floating point values that constitute the control vector input. Moreover, a result output of the permute operation is output from the permute unit to a result vector register of the processor.04-15-2010
20100095097Floating Point Only Single Instruction Multiple Data Instruction Set Architecture - Mechanisms for implementing a floating point only single instruction multiple data instruction set architecture are provided. A processor is provided that comprises an issue unit, an execution unit coupled to the issue unit, and a vector register file coupled to the execution unit. The execution unit has logic that implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA). The floating point vector registers of the vector register file store both scalar and floating point values as vectors having a plurality of vector elements. The processor may be part of a data processing system.04-15-2010
20100095098Generating and Executing Programs for a Floating Point Single Instruction Multiple Data Instruction Set Architecture - Mechanisms for generating and executing programs for a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA) are provided. A computer program product comprising a computer recordable medium having a computer readable program recorded thereon is provided. The computer readable program, when executed on a computing device, causes the computing device to receive one or more instructions and execute the one or more instructions using logic in an execution unit of the computing device. The logic implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA), based on data stored in a vector register file of the computing device. The vector register file is configured to store both scalar and floating point values as vectors having a plurality of vector elements.04-15-2010
20100095285Array Reference Safety Analysis in the Presence of Loops with Conditional Control Flow - Mechanisms are provided for analyzing and optimizing loops with conditional control flow in source code based on array reference safety. Mechanisms are provided for analyzing blocks of the source code to identify a conditional control flow loop having loop source code specifying a total access range for an array reference. A safe access range, of the total access range of the array reference in the loop source code, is identified over which a compiler-based optimization of the loop source code can be safely applied without introducing new exception conditions. The compiler-based optimization of the loop source code is performed based on the identified safe access range to generate optimized code. The optimized code is output for generation of executable code for execution on a processor.04-15-2010
20100223429Hybrid Caching Techniques and Garbage Collection Using Hybrid Caching Techniques - Hybrid caching techniques and garbage collection using hybrid caching techniques are provided. A determination of a measure of a characteristic of a data object is performed, the characteristic being indicative of an access pattern associated with the data object. A selection of one caching structure, from a plurality of caching structures, is performed in which to store the data object based on the measure of the characteristic. Each individual caching structure in the plurality of caching structures stores data objects having a similar measure of the characteristic with regard to each of the other data objects in that individual caching structure. The data object is stored in the selected caching structure and at least one processing operation is performed on the data object stored in the selected caching structure.09-02-2010
20110040821Matrix Multiplication Operations with Data Pre-Conditioning in a High Performance Computing Architecture - Mechanisms for performing matrix multiplication operations with data pre-conditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and replicate the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product of the matrix multiplication operation is accumulated with other partial products of the matrix multiplication operation.02-17-2011
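
The load-and-splat formulation corresponds to the classic rank-1-update loop ordering of matrix multiply. Here is a scalar C sketch of the access pattern; the inner j-loop stands in for one vector multiply-add, and the 4-wide size and names are illustrative:

```c
#include <stdio.h>

#define N 4   /* matrix dimension; also stands in for the vector width */

/* C = A * B: for each row of C, splat one scalar A[i][k] and
 * multiply-add it against row k of B, accumulating partial products. */
static void matmul_splat(const float A[N][N], const float B[N][N],
                         float C[N][N]) {
    for (int i = 0; i < N; i++) {
        float acc[N] = { 0.0f };              /* accumulator "register" */
        for (int k = 0; k < N; k++) {
            float splat = A[i][k];            /* load-and-splat         */
            for (int j = 0; j < N; j++)       /* vector multiply-add    */
                acc[j] += splat * B[k][j];
        }
        for (int j = 0; j < N; j++) C[i][j] = acc[j];
    }
}

int main(void) {
    float A[N][N] = {{1,0,0,0},{0,1,0,0},{0,0,1,0},{0,0,0,1}};
    float B[N][N] = {{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}};
    float C[N][N];
    matmul_splat(A, B, C);
    printf("C[2][3] = %g\n", C[2][3]);   /* identity * B: prints 12 */
    return 0;
}
```
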
20110040822Complex Matrix Multiplication Operations with Data Pre-Conditioning in a High Performance Computing Architecture - Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.02-17-2011
20110047334Checkpointing in Speculative Versioning Caches - Mechanisms for generating checkpoints in a speculative versioning cache of a data processing system are provided. The mechanisms execute code within the data processing system, wherein the code accesses cache lines in the speculative versioning cache. The mechanisms further determine whether a first condition occurs indicating a need to generate a checkpoint in the speculative versioning cache. The checkpoint is a speculative cache line which is made non-speculative in response to a second condition occurring that requires a roll-back of changes to a cache line corresponding to the speculative cache line. The mechanisms also generate the checkpoint in the speculative versioning cache in response to a determination that the first condition has occurred.02-24-2011
20110047358In-Data Path Tracking of Floating Point Exceptions and Store-Based Exception Indication - Mechanisms are provided for tracking exceptions in the execution of vectorized code. A speculative instruction is executed on a vector element of a vector. An exception condition is detected in association with the vector element based on a result of executing the speculative instruction on the vector element. A special exception value is stored in the vector element in a vector register corresponding to the vector, indicative of the exception condition, without invoking an exception handler for the exception condition. The special exception value is propagated with the vector element of the vector through a processor architecture of the processor, without invoking the exception handler for the exception condition. An exception corresponding to the exception condition indicated by the special exception value is generated only in response to a non-speculative instruction being executed that performs a non-speculative operation on the vector element.02-24-2011
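
One common way to realize such deferred, in-data-path exception tracking is to encode the exception as a quiet NaN payload that propagates through speculative arithmetic and is only inspected at a non-speculative use. The C sketch below demonstrates that general technique; the payload encoding is an invented example, not the patent's:

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <math.h>

/* Special exception value: a quiet NaN carrying an illustrative payload
 * bit marking "speculative exception occurred here". */
static double make_exception_value(void) {
    uint64_t bits = 0x7FF8000000000001ULL;  /* qNaN + payload 1 */
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

static int is_exception_value(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return isnan(d) && (bits & 1);          /* check our payload bit */
}

int main(void) {
    double lane[4] = { 1.0, 2.0, 0.0, 4.0 };
    double out[4];
    /* Speculative vector divide: record the condition, don't trap. */
    for (int i = 0; i < 4; i++)
        out[i] = (lane[i] == 0.0) ? make_exception_value() : 8.0 / lane[i];
    /* Non-speculative use: only now is the exception surfaced. */
    for (int i = 0; i < 4; i++)
        if (is_exception_value(out[i]))
            printf("deferred exception surfaced at lane %d\n", i);
    return 0;
}
```
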
20110047359Insertion of Operation-and-Indicate Instructions for Optimized SIMD Code - Mechanisms are provided for inserting operation-and-indicate instructions for tracking and indicating exceptions in the execution of vectorized code. A portion of first code is received for compilation. The portion of first code is analyzed to identify non-speculative instructions performing designated non-speculative operations in the first code that are candidates for replacement by replacement operation-and-indicate instructions that perform the designated non-speculative operations and further perform an indication operation for indicating any exception conditions corresponding to special exception values present in vector register inputs to the replacement operation-and-indicate instructions. The replacement is performed and second code is generated based on the replacement of the at least one non-speculative instruction. The data processing system executing the compiled code is configured to store special exception values in vector output registers, in response to a speculative instruction generating an exception condition, without initiating exception handling.02-24-2011
20110047533Generating Code Adapted for Interlinking Legacy Scalar Code and Extended Vector Code - Mechanisms for intermixing code are provided. Source code is received for compilation using an extended Application Binary Interface (ABI) that extends a legacy ABI and uses a different register configuration than the legacy ABI. First compiled code is generated based on the source code, the first compiled code comprising code for accommodating the difference in register configurations used by the extended ABI and the legacy ABI. The first compiled code and second compiled code are intermixed to generate intermixed code, the second compiled code being compiled code that uses the legacy ABI. The intermixed code comprises at least one call instruction that is one of a call from the first compiled code to the second compiled code or a call from the second compiled code to the first compiled code. The code for accommodating the difference in register configurations is associated with the at least one call instruction.02-24-2011
20110082980HIGH PERFORMANCE UNALIGNED CACHE ACCESS - A cache memory device and method for operating the same. One embodiment of the cache memory device includes an address decoder decoding a memory address and selecting a target cache line. A first cache array is configured to output a first cache entry associated with the target cache line. An alignment unit coupled to the address decoder selects either the target cache line or a neighbor cache line proximate the target cache line as an alignment cache line, based on an alignment bit in the memory address. A second cache array is configured to output a second cache entry associated with the alignment cache line. A tag array cache is split into even and odd cache line tags, and provides one or two tags for every cache access.04-07-2011
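
Splitting the data array into even and odd line banks lets line N and line N+1 be read together, which is what makes a line-crossing access single-cycle. A small C model of the bank selection and byte-window extraction (line and array sizes are illustrative):

```c
#include <stdio.h>

#define LINE_BYTES 16
#define NUM_LINES  8   /* per bank; even lines in one, odd in the other */

/* Two physical banks so line N and line N+1 are readable together. */
static unsigned char even_bank[NUM_LINES][LINE_BYTES];
static unsigned char odd_bank [NUM_LINES][LINE_BYTES];

static unsigned char *line_ptr(unsigned line) {
    return (line % 2) ? odd_bank[line / 2] : even_bank[line / 2];
}

/* Fetch LINE_BYTES starting at addr even if it straddles a line. */
static void unaligned_fetch(unsigned addr, unsigned char out[LINE_BYTES]) {
    unsigned line   = addr / LINE_BYTES;
    unsigned offset = addr % LINE_BYTES;
    unsigned char *first  = line_ptr(line);      /* target line   */
    unsigned char *second = line_ptr(line + 1);  /* neighbor line */
    /* Concatenate the two lines and extract the unaligned window. */
    for (unsigned i = 0; i < LINE_BYTES; i++)
        out[i] = (offset + i < LINE_BYTES) ? first[offset + i]
                                           : second[offset + i - LINE_BYTES];
}

int main(void) {
    /* Fill the model cache with bytes equal to their address (mod 256). */
    for (unsigned a = 0; a < 2 * NUM_LINES * LINE_BYTES; a++)
        line_ptr(a / LINE_BYTES)[a % LINE_BYTES] = (unsigned char)a;
    unsigned char buf[LINE_BYTES];
    unaligned_fetch(21, buf);                    /* crosses lines 1 and 2 */
    printf("first byte fetched: %u\n", buf[0]);  /* prints 21 */
    return 0;
}
```
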
20110214016Performing Aggressive Code Optimization with an Ability to Rollback Changes Made by the Aggressive Optimizations - Mechanisms for aggressively optimizing computer code are provided. With these mechanisms, a compiler determines an optimization to apply to a portion of source code and determines if the optimization as applied to the portion of source code will result in unsafe optimized code that introduces a new source of exceptions being generated by the optimized code. In response to a determination that the optimization is an unsafe optimization, the compiler generates an aggressively compiled code version, in which the unsafe optimization is applied, and a conservatively compiled code version in which the unsafe optimization is not applied. The compiler stores both versions and provides them for execution. Mechanisms are provided for switching between these versions during execution in the event of a failure of the aggressively compiled code version. Moreover, predictive mechanisms are provided for predicting whether such a failure is likely.09-01-2011
20110221473SOFT ERROR DETECTION FOR LATCHES - A system and method for soft error detection in digital ICs is disclosed. The system includes an observing circuit coupled to a latch, which circuit is capable of a response upon a state change of the latch. The system further includes synchronized clocking provided to the latch and to the observing circuit. For the latch, the clocking defines a window in time during which the latch is prevented from receiving data, and in a synchronized manner the clocking enables a response in the observing circuit. The clocking is synchronized in such a manner that the circuit is enabled for its response only inside the window when the latch is prevented from receiving data. The system may also have additional circuits that are respectively coupled to latches, with each additional circuit and its respective latch receiving the synchronized clocking. Responses of a plurality of circuits may be coupled in a configuration corresponding to a logical OR.09-15-2011
20110296421METHOD AND APPARATUS FOR EFFICIENT INTER-THREAD SYNCHRONIZATION FOR HELPER THREADS - A monitor bit per hardware thread may be allocated in a memory location in a multiprocessing computer system having a plurality of hardware threads, the plurality of hardware threads sharing the memory location, and each of the allocated monitor bits corresponding to one of the plurality of hardware threads. A condition bit may be allocated for each of the plurality of hardware threads, the condition bit being allocated in each context of the plurality of hardware threads. In response to detecting the memory location being accessed, it is determined whether a monitor bit corresponding to a hardware thread in the memory location is set. In response to determining that the monitor bit corresponding to a hardware thread is set in the memory location, a condition bit corresponding to a thread accessing the memory location is set in the hardware thread's context.12-01-2011
20110296431METHOD AND APPARATUS FOR EFFICIENT HELPER THREAD STATE INITIALIZATION USING INTER-THREAD REGISTER COPY - This disclosure describes a method and system that may enable fast, hardware-assisted, producer-consumer style communication of values between threads. The method, in one aspect, uses a dedicated hardware buffer as an intermediary storage for transferring values from registers in one thread to registers in another thread. The method may provide a generic, programmable solution that can transfer any subset of register values between threads in any given order, where the source and target registers may or may not be correlated. The method also may allow for determinate access times, since it completely bypasses the memory hierarchy. Also, the method is designed to be lightweight, focusing on communication, and keeping synchronization facilities orthogonal to the communication mechanism. It may be used by a helper thread that performs data prefetching for an application thread, for example, to initialize the upward-exposed reads in the address computation slice of the helper thread code.12-01-2011
20120011348Matrix Multiplication Operations Using Pair-Wise Load and Splat Operations - Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pair-wise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.01-12-2012
20120060015Vector Loads with Multiple Vector Elements from a Same Cache Line in a Scattered Load Operation - Mechanisms for performing a scattered load operation are provided. With these mechanisms, an extended address is received in a cache memory of a processor. The extended address has a plurality of data element address portions that specify a plurality of data elements to be accessed using the single extended address. Each of the plurality of data element address portions is provided to corresponding data element selector logic units of the cache memory. Each data element selector logic unit in the cache memory selects a corresponding data element from a cache line buffer based on a corresponding data element address portion provided to the data element selector logic unit. Each data element selector logic unit outputs the corresponding data element for use by the processor.03-08-2012
20120060016Vector Loads from Scattered Memory Locations - Mechanisms for performing a scattered load operation are provided. With these mechanisms, a gather instruction is received in a logic unit of a processor, the gather instruction specifying a plurality of addresses in a memory from which data is to be loaded into a target vector register of the processor. A plurality of separate load instructions for loading the data from the plurality of addresses in the memory are automatically generated within the logic unit. The plurality of separate load instructions are sent, from the logic unit, to one or more load/store units of the processor. The data corresponding to the plurality of addresses is gathered in a buffer of the processor. The logic unit then writes data stored in the buffer to the target vector register.03-08-2012
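
Functionally, the cracked gather behaves like the following C sketch: one call carrying several addresses is expanded into separate scalar loads whose results collect in a buffer before a single write to the target vector register. All names are illustrative:

```c
#include <stdio.h>

#define VLEN 4

/* Model of a gather: one "instruction" carrying several addresses is
 * cracked into separate scalar loads whose results collect in a buffer
 * before being written to the target vector register. */
static void gather(const int *addrs[VLEN], int target_vreg[VLEN]) {
    int buffer[VLEN];                 /* gather buffer                 */
    for (int i = 0; i < VLEN; i++)
        buffer[i] = *addrs[i];        /* generated scalar load #i      */
    for (int i = 0; i < VLEN; i++)    /* single write to the register  */
        target_vreg[i] = buffer[i];
}

int main(void) {
    int a = 10, b = 20, c = 30, d = 40;
    const int *addrs[VLEN] = { &c, &a, &d, &b };  /* scattered locations */
    int vreg[VLEN];
    gather(addrs, vreg);
    for (int i = 0; i < VLEN; i++)
        printf("%d ", vreg[i]);       /* prints: 30 10 40 20 */
    printf("\n");
    return 0;
}
```
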
20120290816Optimized Scalar Promotion with Load and Splat SIMD Instructions - Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.11-15-2012
20130013863Hybrid Caching Techniques and Garbage Collection Using Hybrid Caching Techniques - Hybrid caching techniques and garbage collection using hybrid caching techniques are provided. A determination of a measure of a characteristic of a data object is performed, the characteristic being indicative of an access pattern associated with the data object. A selection of one caching structure, from a plurality of caching structures, is performed in which to store the data object based on the measure of the characteristic. Each individual caching structure in the plurality of caching structures stores data objects having a similar measure of the characteristic with regard to each of the other data objects in that individual caching structure. The data object is stored in the selected caching structure and at least one processing operation is performed on the data object stored in the selected caching structure.01-10-2013
20130073836FINE-GRAINED INSTRUCTION ENABLEMENT AT SUB-FUNCTION GRANULARITY - Fine-grained enablement at sub-function granularity. An instruction encapsulates different sub-functions of a function, in which the sub-functions use different sets of registers of a composite register file, and therefore, different sets of functional units. At least one operand of the instruction specifies which set of registers, and therefore, which set of functional units, is to be used in performing the sub-function. The instruction can perform various functions (e.g., move, load, etc.) and a sub-function of the function specifies the type of function (e.g., move-floating point; move-vector; etc.).03-21-2013
20130073838MULTI-ADDRESSABLE REGISTER FILES AND FORMAT CONVERSIONS ASSOCIATED THEREWITH - A multi-addressable register file is addressed by a plurality of types of instructions, including scalar, vector and vector-scalar extension instructions. It may be determined that data is to be translated from one format to another format. If so determined, a convert machine instruction is executed that obtains a single precision datum in a first representation in a first format from a first register; converts the single precision datum of the first representation in the first format to a converted single precision datum of a second representation in a second format; and places the converted single precision datum in a second register.03-21-2013
20130080745FINE-GRAINED INSTRUCTION ENABLEMENT AT SUB-FUNCTION GRANULARITY - Fine-grained enablement at sub-function granularity. An instruction encapsulates different sub-functions of a function, in which the sub-functions use different sets of registers of a composite register file, and therefore, different sets of functional units. At least one operand of the instruction specifies which set of registers, and therefore, which set of functional units, is to be used in performing the sub-function. The instruction can perform various functions (e.g., move, load, etc.) and a sub-function of the function specifies the type of function (e.g., move-floating point; move-vector; etc.).03-28-2013
20130086338LINKING CODE FOR AN ENHANCED APPLICATION BINARY INTERFACE (ABI) WITH DECODE TIME INSTRUCTION OPTIMIZATION - A code sequence made up of multiple instructions and specifying an offset from a base address is identified in an object file. The offset from the base address corresponds to an offset location in a memory configured for storing an address of a variable or data. The identified code sequence is configured to perform a memory reference function or a memory address computation function. It is determined that the offset location is within a specified distance of the base address and that a replacement of the identified code sequence with a replacement code sequence will not alter program semantics. The identified code sequence in the object file is replaced with the replacement code sequence that includes a no-operation (NOP) instruction or that has fewer instructions than the identified code sequence. Linked executable code is generated based on the object file and the linked executable code is emitted.04-04-2013
20130086361Scalable Decode-Time Instruction Sequence Optimization of Dependent Instructions - Producer-consumer instructions, comprising a first instruction and a second instruction in program order, are fetched requiring in-order execution. The second instruction is modified by the processor so that the first instruction and second instruction can be completed out-of-order, the modification comprising any one of extending an immediate field of the second instruction using immediate field information of the first instruction or providing a source location of the first instruction as an additional source location to source locations of the second instruction.04-04-2013
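
The immediate-field-extension case is the easiest to picture: a "load upper immediate" followed by a dependent add of a low immediate can be rewritten at decode into one internal operation with a widened immediate. The C sketch below models that idea with invented encodings:

```c
#include <stdint.h>
#include <stdio.h>

/* Producer-consumer pair: an instruction that sets the upper immediate
 * bits of a register, then an add of a low immediate into the same
 * register. At decode, the consumer's immediate field is widened with
 * the producer's bits so the pair no longer must execute in order. */
typedef struct { int reg; int32_t imm; } insn;

typedef struct { int reg; int64_t wide_imm; int fused; } decoded;

static decoded decode_pair(insn hi, insn lo) {
    decoded d = { lo.reg, lo.imm, 0 };
    if (hi.reg == lo.reg) {                      /* dependent pair      */
        d.wide_imm = ((int64_t)hi.imm << 16) + lo.imm;
        d.fused = 1;                             /* one internal op now */
    }
    return d;
}

int main(void) {
    insn load_upper = { 3, 0x1234 };             /* r3 = 0x1234 << 16 */
    insn add_lower  = { 3, 0x5678 };             /* r3 = r3 + 0x5678  */
    decoded d = decode_pair(load_upper, add_lower);
    printf("fused=%d  r%d = %#llx\n", d.fused, d.reg,
           (unsigned long long)d.wide_imm);      /* r3 = 0x12345678 */
    return 0;
}
```
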
20130086362Managing a Register Cache Based on an Architected Computer Instruction Set Having Operand First-Use Information - A prefix instruction is executed and passes operands to a next instruction without storing the operands in an architected resource, such that the execution of the next instruction uses the operands provided by the prefix instruction to perform an operation. The operands may be a prefix instruction immediate field or a target register of the prefix instruction execution.04-04-2013
20130086363Computer Instructions for Activating and Deactivating Operands - An instruction set architecture (ISA) includes instructions for selectively indicating last-use architected operands having values that will not be accessed again, wherein architected operands are made active or inactive after an instruction-specified last-use, wherein the architected operands are made active by performing a write operation to an inactive operand, wherein the activation/deactivation may be performed by the instruction having the last-use of the operand or by another (prefix) instruction.04-04-2013
20130086364Managing a Register Cache Based on an Architected Computer Instruction Set Having Operand Last-Use Information - A multi-level register hierarchy is disclosed comprising a first level pool of registers for caching registers of a second level pool of registers in a system wherein programs can dynamically release and re-enable architected registers, such that released architected registers need not be maintained by the processor and the processor accesses operands from the first level pool of registers. A last-use instruction is identified as having a last use of an architected register before that register is released; releasing the last-use architected register causes the multi-level register hierarchy to discard any correspondence of an entry to said last-use architected register.04-04-2013
20130086365Exploiting an Architected Last-Use Operand Indication in a Computer System Operand Resource Pool - A pool of available physical registers is provided for architected registers, wherein operations are performed that activate and deactivate selected architected registers, such that the deactivated selected architected registers need not retain values and physical registers can be deallocated to the pool. Deallocation of physical registers is performed after a last-use by a designated last-use instruction, wherein the last-use information is provided either by the last-use instruction or by a prefix instruction, and reads to deallocated architected registers return an architected default value.04-04-2013
20130086367Tracking operand liveness information in a computer system and performing a function based on the liveness information - Operand liveness state information is maintained during context switches for current architected operands of executing programs, the current operand state information indicating whether corresponding current operands are any one of enabled or disabled for use by a first program module, the first program module comprising machine instructions of an instruction set architecture (ISA) for disabling current architected operands, wherein a current operand is accessed by a machine instruction of said first program module, the accessing comprising using the current operand state information to determine whether a previously stored current operand value is accessible by the first program module.04-04-2013
20130086368Using Register Last Use Information to Perform Decode-Time Computer Instruction Optimization - Two computer machine instructions are fetched for execution but replaced by a single optimized instruction to be executed, wherein a temporary register used by the two instructions is identified as a last-use register, where a last-use register has a value that is not to be accessed by later instructions, whereby the two computer machine instructions are replaced by a single optimized internal instruction for execution, the single optimized instruction not including the last-use register.04-04-2013
20130086548GENERATING COMPILED CODE THAT INDICATES REGISTER LIVENESS - Object code is generated from an internal representation that includes a plurality of source operands. The generating includes, for each source operand in the internal representation, determining whether a last use has occurred for the source operand. The determining includes accessing a data flow graph to determine whether all uses of a live range have been emitted. If it is determined that a last use has occurred for the source operand, an architected resource associated with the source operand is marked for last-use indication. A last-use indication is then generated for the architected resource. Instructions and the last-use indications are emitted into the object code.04-04-2013
20130086570LINKING CODE FOR AN ENHANCED APPLICATION BINARY INTERFACE (ABI) WITH DECODE TIME INSTRUCTION OPTIMIZATION - A code sequence made up of multiple instructions and specifying an offset from a base address is identified in an object file. The offset from the base address corresponds to an offset location in a memory configured for storing an address of a variable or data. The identified code sequence is configured to perform a memory reference function or a memory address computation function. It is determined that the offset location is within a specified distance of the base address and that a replacement of the identified code sequence with a replacement code sequence will not alter program semantics. The identified code sequence in the object file is replaced with the replacement code sequence that includes a no-operation (NOP) instruction or that has fewer instructions than the identified code sequence. Linked executable code is generated based on the object file and the linked executable code is emitted.04-04-2013
20130086581PRIVILEGE LEVEL AWARE PROCESSOR HARDWARE RESOURCE MANAGEMENT FACILITY - Multiple machine state registers are included in a processor core to permit distinction between use of hardware facilities by applications, supervisory threads and the hypervisor. All facilities are initially disabled by the hypervisor when a partition is initialized. When any access is made to a disabled facility, the hypervisor receives an indication of which facility was accessed and sets a corresponding hardware flag in the hypervisor's machine state register. When an application attempts to access a disabled facility, the supervisor managing the operating system image receives an indication of which facility was accessed and sets a corresponding hardware flag in the supervisor's machine state register. The multiple register implementation permits the supervisor to determine whether particular hardware facilities need to have their state saved when an application context swap occurs and the hypervisor can determine which hardware facilities need to have their state saved when a partition swap occurs.04-04-2013
20130086598GENERATING COMPILED CODE THAT INDICATES REGISTER LIVENESS - Object code is generated from an internal representation that includes a plurality of source operands. The generating includes, for each source operand in the internal representation, determining whether a last use has occurred for the source operand. The determining includes accessing a data flow graph to determine whether all uses of a live range have been emitted. If it is determined that a last use has occurred for the source operand, an architected resource associated with the source operand is marked for last-use indication. A last-use indication is then generated for the architected resource. Instructions and the last-use indications are emitted into the object code.04-04-2013
20130103932MULTI-ADDRESSABLE REGISTER FILES AND FORMAT CONVERSIONS ASSOCIATED THEREWITH - A multi-addressable register file is addressed by a plurality of types of instructions, including scalar, vector and vector-scalar extension instructions. It may be determined that data is to be translated from one format to another format. If so determined, a convert machine instruction is executed that obtains a single precision datum in a first representation in a first format from a first register; converts the single precision datum of the first representation in the first format to a converted single precision datum of a second representation in a second format; and places the converted single precision datum in a second register.04-25-2013
20130179736TICKET CONSOLIDATION - A method of ticket consolidation in a computing environment may, in one aspect, analyze problem reports; determine whether problems reported by machines are caused by the same or substantially the same run-time configuration error, or are occurring on the same physical server, and are within a given sensitivity time window; and, if so, consolidate the problem tickets and increase the priority of the consolidated ticket.07-11-2013
20130212139MIXED PRECISION ESTIMATE INSTRUCTION COMPUTING NARROW PRECISION RESULT FOR WIDE PRECISION INPUTS - A technique is provided for performing a mixed precision estimate. A processing circuit receives an input of a first precision having a wide precision value. The processing circuit computes an output in an output exponent range corresponding to a narrow precision value based on the input having the wide precision value.08-15-2013
20130243325COMPARING SETS OF CHARACTER DATA HAVING TERMINATION CHARACTERS - Multiple sets of character data having termination characters are compared using parallel processing and without causing unwarranted exceptions. Each set of character data to be compared is loaded within one or more vector registers. In particular, in one embodiment, for each set of character data to be compared, an instruction is used that loads data in a vector register to a specified boundary, and provides a way to determine the number of characters loaded. Further, an instruction is used to find the index of the first delimiter character, i.e., the first zero or null character, or the index of unequal characters. Using these instructions, a location of the end of one of the sets of data or a location of an unequal character is efficiently provided.09-19-2013
20130246699FINDING THE LENGTH OF A SET OF CHARACTER DATA HAVING A TERMINATION CHARACTER - The length of character data having a termination character is determined. The character data for which the length is to be determined is loaded, in parallel, within one or more vector registers. An instruction is used that loads data in a vector register to a specified boundary, and provides a way to determine the number of characters loaded, using, for instance, another instruction. Further, an instruction is used to find the index of the first termination character, e.g., the first zero or null character. This instruction searches the data in parallel for the termination character. By using these instructions, the length of the character data is determined using only one branch instruction.09-19-2013
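
Together, these instructions support the classic boundary-safe vectorized strlen: the first load stops at the next block boundary, so no access can fault past the page holding the terminator, and each loaded block is scanned in parallel for the null character. A portable C model of the access pattern follows, with an illustrative 16-byte block standing in for the vector register:

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define BLOCK 16   /* model vector width / block boundary */

/* strlen in BLOCK-byte steps. The first "load" is truncated at the
 * next BLOCK boundary (a load to block boundary), so every access
 * stays inside a region the string is known to occupy. */
static size_t vec_strlen(const char *s) {
    size_t len = 0;
    size_t avail = BLOCK - ((uintptr_t)s % BLOCK);  /* count to boundary */
    for (;;) {
        /* scan the bytes loaded this iteration for the terminator */
        for (size_t i = 0; i < avail; i++)
            if (s[len + i] == '\0')
                return len + i;
        len += avail;
        avail = BLOCK;   /* subsequent loads are full aligned blocks */
    }
}

int main(void) {
    const char *msg = "terminated character data";
    printf("%zu == %zu\n", vec_strlen(msg), strlen(msg));
    return 0;
}
```
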
20130246738INSTRUCTION TO LOAD DATA UP TO A SPECIFIED MEMORY BOUNDARY INDICATED BY THE INSTRUCTION - A Load to Block Boundary instruction is provided that loads a variable number of bytes of data into a register while ensuring that a specified memory boundary is not crossed. The boundary may be specified a number of ways, including, but not limited to, a variable value in the instruction text, a fixed instruction text value encoded in the opcode, or a register based boundary.09-19-2013
20130246739COPYING CHARACTER DATA HAVING A TERMINATION CHARACTER FROM ONE MEMORY LOCATION TO ANOTHER - Copying characters of a set of terminated character data from one memory location to another memory location using parallel processing and without causing unwarranted exceptions. The character data to be copied is loaded within one or more vector registers. In particular, in one embodiment, an instruction (e.g., a Vector Load to Block Boundary instruction) is used that loads data in parallel in a vector register to a specified boundary, and provides a way to determine the number of characters loaded. To determine the number of characters loaded (a count), another instruction (e.g., a Load Count to Block Boundary instruction) is used. Further, an instruction (e.g., a Vector Find Element Not Equal instruction) is used to find the index of the first delimiter character, i.e., the first termination character, such as a zero or null character within the character data. This instruction checks a plurality of bytes of data in parallel.09-19-2013
20130246740INSTRUCTION TO LOAD DATA UP TO A DYNAMICALLY DETERMINED MEMORY BOUNDARY - A Load to Block Boundary instruction is provided that loads a variable number of bytes of data into a register while ensuring that a specified memory boundary is not crossed. The boundary is dynamically determined based on a specified type of boundary and one or more characteristics of the processor executing the instruction, such as cache line size or page size used by the processor.09-19-2013
20130246751VECTOR FIND ELEMENT NOT EQUAL INSTRUCTION - Processing of character data is facilitated. A Find Element Not Equal instruction is provided that compares data of multiple vectors for inequality and provides an indication of inequality, if inequality exists. An index associated with the unequal element is stored in a target vector register. Further, the same instruction, the Find Element Not Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare.09-19-2013
20130246752VECTOR FIND ELEMENT EQUAL INSTRUCTION - Processing of character data is facilitated. A Find Element Equal instruction is provided that compares data of multiple vectors for equality and provides an indication of equality, if equality exists. An index associated with the equal element is stored in a target vector register. Further, the same instruction, the Find Element Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare.09-19-2013
20130246754RUN-TIME INSTRUMENTATION INDIRECT SAMPLING BY ADDRESS - Embodiments of the invention relate to implementing run-time instrumentation indirect sampling by address. An aspect of the invention includes reading sample-point addresses from a sample-point address array, and comparing, by a processor, the sample-point addresses to an address associated with an instruction from an instruction stream executing on the processor. A sample point is recognized upon execution of the instruction associated with the address matching one of the sample-point addresses. Run-time instrumentation information is obtained from the sample point. The run-time instrumentation information is stored in a run-time instrumentation program buffer as a reporting group.09-19-2013
20130246757VECTOR FIND ELEMENT EQUAL INSTRUCTION - Processing of character data is facilitated. A Find Element Equal instruction is provided that compares data of multiple vectors for equality and provides an indication of equality, if equality exists. An index associated with the equal element is stored in a target vector register. Further, the same instruction, the Find Element Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare.09-19-2013
20130246759VECTOR FIND ELEMENT NOT EQUAL INSTRUCTION - Processing of character data is facilitated. A Find Element Not Equal instruction is provided that compares data of multiple vectors for inequality and provides an indication of inequality, if inequality exists. An index associated with the unequal element is stored in a target vector register. Further, the same instruction, the Find Element Not Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare.09-19-2013
20130246762INSTRUCTION TO LOAD DATA UP TO A DYNAMICALLY DETERMINED MEMORY BOUNDARY - A Load to Block Boundary instruction is provided that loads a variable number of bytes of data into a register while ensuring that a specified memory boundary is not crossed. The boundary is dynamically determined based on a specified type of boundary and one or more characteristics of the processor executing the instruction, such as cache line size or page size used by the processor.09-19-2013
20130246763INSTRUCTION TO COMPUTE THE DISTANCE TO A SPECIFIED MEMORY BOUNDARY - A Load Count to Block Boundary instruction is provided that provides a distance from a specified memory address to a specified memory boundary. The memory boundary is a boundary that is not to be crossed in loading data. The boundary may be specified a number of ways, including, but not limited to, a variable value in the instruction text, a fixed instruction text value encoded in the opcode, or a register based boundary; or it may be dynamically determined.09-19-2013
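
For the fixed- or register-based boundary forms, the computed count is just the distance to the next boundary, capped at the register size. A one-function C sketch, assuming a 16-byte register size for illustration:

```c
#include <stdint.h>
#include <stdio.h>

/* Distance from addr to the next boundary, capped at the 16-byte
 * register size -- the count a Load to Block Boundary would load.
 * boundary must be a power of two in a real implementation. */
static unsigned count_to_boundary(uintptr_t addr, unsigned boundary) {
    unsigned dist = (unsigned)(boundary - (addr % boundary));
    return dist < 16 ? dist : 16;
}

int main(void) {
    printf("%u\n", count_to_boundary(0x1FFA, 64));   /* 6 bytes left  */
    printf("%u\n", count_to_boundary(0x2000, 64));   /* full 16 bytes */
    return 0;
}
```
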
20130246764INSTRUCTION TO LOAD DATA UP TO A SPECIFIED MEMORY BOUNDARY INDICATED BY THE INSTRUCTION - A Load to Block Boundary instruction is provided that loads a variable number of bytes of data into a register while ensuring that a specified memory boundary is not crossed. The boundary may be specified a number of ways, including, but not limited to, a variable value in the instruction text, a fixed instruction text value encoded in the opcode, or a register based boundary.09-19-2013
20130246766TRANSFORMING NON-CONTIGUOUS INSTRUCTION SPECIFIERS TO CONTIGUOUS INSTRUCTION SPECIFIERS - Emulation of instructions that include non-contiguous specifiers is facilitated. A non-contiguous specifier specifies a resource of an instruction, such as a register, using multiple fields of the instruction. For example, multiple fields of the instruction (e.g., two fields) include bits that together designate a particular register to be used by the instruction. Non-contiguous specifiers of instructions defined in one computer system architecture are transformed to contiguous specifiers usable by instructions defined in another computer system architecture. The instructions defined in the another computer system architecture emulate the instructions defined for the one computer system architecture.09-19-2013
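
A minimal C sketch of the transformation: a 6-bit register number arrives as a 5-bit field plus an extension bit elsewhere in the instruction word, and emulation glues them into one contiguous specifier. The field positions are invented for illustration:

```c
#include <stdint.h>
#include <stdio.h>

/* A 6-bit register number split across an instruction word as a
 * 5-bit primary field plus an extension bit elsewhere; emulation
 * combines them into one contiguous specifier. */
static unsigned contiguous_reg(uint32_t insn) {
    unsigned low5 = (insn >> 21) & 0x1F;   /* primary 5-bit field */
    unsigned ext  = insn & 0x1;            /* extension bit       */
    return (ext << 5) | low5;              /* contiguous 0..63    */
}

int main(void) {
    uint32_t insn = (7u << 21) | 1u;       /* field = 7, ext = 1 */
    printf("register %u\n", contiguous_reg(insn));  /* prints 39 */
    return 0;
}
```
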
20130246767INSTRUCTION TO COMPUTE THE DISTANCE TO A SPECIFIED MEMORY BOUNDARY - A Load Count to Block Boundary instruction is provided that provides a distance from a specified memory address to a specified memory boundary. The memory boundary is a boundary that is not to be crossed in loading data. The boundary may be specified a number of ways, including, but not limited to, a variable value in the instruction text, a fixed instruction text value encoded in the opcode, or a register based boundary; or it may be dynamically determined.09-19-2013
20130246768TRANSFORMING NON-CONTIGUOUS INSTRUCTION SPECIFIERS TO CONTIGUOUS INSTRUCTION SPECIFIERS - Emulation of instructions that include non-contiguous specifiers is facilitated. A non-contiguous specifier specifies a resource of an instruction, such as a register, using multiple fields of the instruction. For example, multiple fields of the instruction (e.g., two fields) include bits that together designate a particular register to be used by the instruction. Non-contiguous specifiers of instructions defined in one computer system architecture are transformed to contiguous specifiers usable by instructions defined in another computer system architecture. The instructions defined in the another computer system architecture emulate the instructions defined for the one computer system architecture.09-19-2013
20130246772RUN-TIME INSTRUMENTATION INDIRECT SAMPLING BY INSTRUCTION OPERATION CODE - Embodiments of the invention relate to implementing run-time instrumentation indirect sampling by instruction operation code. An aspect of the invention includes a method for implementing run-time instrumentation indirect sampling by instruction operation code. The method includes reading sample-point instruction operation codes from a sample-point instruction array, and comparing, by a processor, the sample-point instruction operation codes to an operation code of an instruction from an instruction stream executing on the processor. The method also includes recognizing a sample point upon execution of the instruction with the operation code matching one of the sample-point instruction operation codes. The run-time instrumentation information is obtained from the sample point. The method further includes storing the run-time instrumentation information in a run-time instrumentation program buffer as a reporting group.09-19-2013
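
The opcode-matching sample logic can be modeled in a few lines of C. In this sketch (opcodes, buffer layout, and sizes are all illustrative), each executed instruction is compared against the sample-point opcode array, and a hit stores a reporting group in the program buffer:

```c
#include <stdio.h>

#define NUM_SAMPLE_OPS 2
#define BUF_ENTRIES    8

/* Opcodes to sample, and a program buffer that collects one
 * "reporting group" per recognized sample point. */
static const unsigned sample_ops[NUM_SAMPLE_OPS] = { 0xA7, 0xE3 };

struct report { unsigned op; unsigned long addr; };
static struct report rti_buffer[BUF_ENTRIES];
static int rti_count;

static void execute(unsigned op, unsigned long addr) {
    for (int i = 0; i < NUM_SAMPLE_OPS; i++)
        if (op == sample_ops[i] && rti_count < BUF_ENTRIES) {
            rti_buffer[rti_count].op = op;      /* store the       */
            rti_buffer[rti_count].addr = addr;  /* reporting group */
            rti_count++;
            break;
        }
    /* ...normal execution of the instruction would happen here... */
}

int main(void) {
    unsigned stream[] = { 0x18, 0xA7, 0x50, 0xE3, 0xA7 };
    for (unsigned long i = 0; i < sizeof stream / sizeof *stream; i++)
        execute(stream[i], 0x1000 + i * 4);
    printf("sampled %d instructions\n", rti_count);   /* prints 3 */
    return 0;
}
```
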
20130246774RUN-TIME INSTRUMENTATION SAMPLING IN TRANSACTIONAL-EXECUTION MODE - Embodiments of the invention relate to implementing run-time instrumentation indirect sampling by address. An aspect of the invention includes a method for implementing run-time instrumentation indirect sampling by address. The method includes reading sample-point addresses from a sample-point address array, and comparing, by a processor, the sample-point addresses to an address associated with an instruction from an instruction stream executing on the processor. The method further includes recognizing a sample point upon execution of the instruction associated with the address matching one of the sample-point addresses. Run-time instrumentation information is obtained from the sample point. The method also includes storing the run-time instrumentation information in a run-time instrumentation program buffer as a reporting group.09-19-2013
20130246775RUN-TIME INSTRUMENTATION SAMPLING IN TRANSACTIONAL-EXECUTION MODE - Embodiments of the invention relate to implementing run-time instrumentation sampling in transactional-execution mode. An aspect of the invention includes a method for implementing run-time instrumentation sampling in transactional-execution mode. The method includes determining, by a processor, that the processor is configured to execute instructions of an instruction stream in a transactional-execution mode, the instructions defining a transaction. The method also includes interlocking completion of storage operations of the instructions to prevent instruction-directed storage until completion of the transaction. The method further includes recognizing a sample point during execution of the instructions while in the transactional-execution mode. The method additionally includes run-time-instrumentation-directed storing, upon successful completion of the transaction, of run-time instrumentation information obtained at the sample point.09-19-2013
20130247009RUN-TIME INSTRUMENTATION INDIRECT SAMPLING BY INSTRUCTION OPERATION CODE - Embodiments of the invention relate to implementing run-time instrumentation indirect sampling by instruction operation code. An aspect of the invention includes reading sample-point instruction operation codes from a sample-point instruction array, and comparing, by a processor, the sample-point instruction operation codes to an operation code of an instruction from an instruction stream executing on the processor. A sample point is recognized upon execution of the instruction with the operation code matching one of the sample-point instruction operation codes. The run-time instrumentation information is obtained from the sample point. The run-time instrumentation information is stored in a run-time instrumentation program buffer as a reporting group.09-19-2013
20130247010RUN-TIME INSTRUMENTATION SAMPLING IN TRANSACTIONAL-EXECUTION MODE - Embodiments of the invention relate to implementing run-time instrumentation sampling in transactional-execution mode. An aspect of the invention includes run-time instrumentation sampling in transactional-execution mode. The method includes determining, by a processor, that the processor is configured to execute instructions of an instruction stream in a transactional-execution mode, the instructions defining a transaction. Completion of storage operations of the instructions is interlocked to prevent instruction-directed storage until completion of the transaction. A sample point is recognized during execution of the instructions while in the transactional-execution mode. Upon successful completion of the transaction, run-time-instrumentation-directed storing of the run-time instrumentation information obtained at the sample point is performed.09-19-2013
20130247011TRANSFORMATION OF A PROGRAM-EVENT-RECORDING EVENT INTO A RUN-TIME INSTRUMENTATION EVENT - Embodiments of the invention relate to transforming a program-event-recording event into a run-time instrumentation event. An aspect of the invention includes enabling run-time instrumentation for collecting instrumentation information of an instruction stream executing on a processor. Detecting is performed, by the processor, of a program-event-recording (PER) event, the PER event associated with the instruction stream executing on the processor. A PER event record is written to a collection buffer as a run-time instrumentation event based on detecting the PER event, the PER event record identifying the PER event.09-19-2013
20130247012TRANSFORMATION OF A PROGRAM-EVENT-RECORDING EVENT INTO A RUN-TIME INSTRUMENTATION EVENT - Embodiments of the invention relate to transforming a program-event-recording event into a run-time instrumentation event. An aspect of the invention includes a method for transforming a program-event-recording event into a run-time instrumentation event. The method includes enabling run-time instrumentation for collecting instrumentation information of an instruction stream executing on a processor. The method also includes detecting, by the processor, a program-event-recording (PER) event, the PER event associated with the instruction stream executing on the processor. The method further includes writing a PER event record to a collection buffer as a run-time instrumentation event based on detecting the PER event, the PER event record identifying the PER event.09-19-2013
20130262815HYBRID ADDRESS TRANSLATION - Embodiments of the invention relate to hybrid address translation. An aspect of the invention includes receiving a first address, the first address referencing a location in a first address space. The computer searches a segment lookaside buffer (SLB) for an SLB entry corresponding to the first address, the SLB entry comprising a type field and an address field, and determines whether a value of the type field in the SLB entry indicates a hashed page table (HPT) search or a radix tree search. Based on determining that the value of the type field indicates the HPT search, an HPT is searched to determine a second address, the second address comprising a translation of the first address into a second address space; and based on determining that the value of the type field indicates the radix tree search, a radix tree is searched to determine the second address.10-03-2013
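A minimal control-flow sketch of the type-field dispatch, with invented types and stubbed-out table walks standing in for the real HPT and radix searches:

    #include <stdint.h>

    typedef enum { XLATE_HPT, XLATE_RADIX } xlate_type_t;

    typedef struct {
        xlate_type_t type;  /* type field: selects the search mechanism   */
        uint64_t     base;  /* address field: table base for this segment */
    } slb_entry_t;

    /* Stub table walks, standing in for the real HPT and radix searches. */
    static uint64_t hpt_search(uint64_t base, uint64_t va) { (void)base; return va; }
    static uint64_t radix_walk(uint64_t base, uint64_t va) { (void)base; return va; }

    /* Sketch: the SLB entry's type field dispatches the same first address
       to either a hashed page table search or a radix tree walk. */
    static uint64_t translate(const slb_entry_t *e, uint64_t va)
    {
        return (e->type == XLATE_HPT) ? hpt_search(e->base, va)
                                      : radix_walk(e->base, va);
    }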
20130262817HYBRID ADDRESS TRANSLATION - Embodiments of the invention relate to hybrid address translation. An aspect of the invention includes receiving a first address, the first address referencing a location in a first address space. The computer searches a segment lookaside buffer (SLB) for an SLB entry corresponding to the first address, the SLB entry comprising a type field and an address field, and determines whether a value of the type field in the SLB entry indicates a hashed page table (HPT) search or a radix tree search. Based on determining that the value of the type field indicates the HPT search, an HPT is searched to determine a second address, the second address comprising a translation of the first address into a second address space; and based on determining that the value of the type field indicates the radix tree search, a radix tree is searched to determine the second address.10-03-2013
20130262821PERFORMING PREDECODE-TIME OPTIMIZED INSTRUCTIONS IN CONJUNCTION WITH PREDECODE TIME OPTIMIZED INSTRUCTION SEQUENCE CACHING - A method for performing predecode-time optimized instructions in conjunction with predecode-time optimized instruction sequence caching. The method includes receiving a first instruction of an instruction sequence and a second instruction of the instruction sequence and determining if the first instruction and the second instruction can be optimized. In response to determining that the first instruction and second instruction can be optimized, the method includes performing a pre-decode optimization on the instruction sequence and generating a new second instruction, wherein the new second instruction is not dependent on a target operand of the first instruction, and storing a pre-decoded first instruction and a pre-decoded new second instruction in an instruction cache. In response to determining that the first instruction and second instruction cannot be optimized, the method includes storing the pre-decoded first instruction and a pre-decoded second instruction in the instruction cache.10-03-2013
20130262822CACHING OPTIMIZED INTERNAL INSTRUCTIONS IN LOOP BUFFER - Embodiments of the invention relate to a computer system for storing an internal instruction loop in a loop buffer. The computer system includes a loop buffer and a processor. The computer system is configured to perform a method including fetching instructions from memory to generate an internal instruction to be executed, detecting a beginning of a first instruction loop in the instructions, determining that a first internal instruction loop corresponding to the first instruction loop is not stored in the loop buffer, fetching the first instruction loop, optimizing one or more instructions corresponding to the first instruction loop to generate a first optimized internal instruction loop, and storing the first optimized internal instruction loop in the loop buffer based on the determination that the first internal instruction loop is not stored in the loop buffer.10-03-2013
20130262823INSTRUCTION MERGING OPTIMIZATION - A computer system for optimizing instructions includes a processor including an instruction execution unit configured to execute instructions, an instruction optimization unit configured to optimize instructions, and memory to store machine instructions to be executed by the instruction execution unit. The computer system is configured to perform a method including analyzing machine instructions from among a stream of instructions to be executed by the instruction execution unit, the machine instructions including a memory load instruction and a data processing instruction to perform a data processing function based on the memory load instruction, identifying the machine instructions as being eligible for optimization, merging the machine instructions into a single optimized internal instruction, and executing the single optimized internal instruction to perform a memory load function and a data processing function corresponding to the memory load instruction and the data processing instruction.10-03-2013
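A hypothetical sketch of the eligibility test and merge; the internal-opcode names and instruction layout are invented, not the application's:

    #include <stdint.h>
    #include <stdbool.h>

    typedef enum { OP_LOAD, OP_ADD, OP_LOAD_ADD } opcode_t;  /* OP_LOAD_ADD: fused */

    typedef struct { opcode_t op; int target; int src; uint64_t addr; } insn_t;

    /* Hypothetical sketch: a load followed by a dependent add is eligible
       for merging into a single internal load-and-add operation. */
    static bool try_merge(const insn_t *ld, const insn_t *add, insn_t *out)
    {
        if (ld->op == OP_LOAD && add->op == OP_ADD && add->src == ld->target) {
            out->op     = OP_LOAD_ADD;
            out->target = add->target;
            out->src    = ld->target;
            out->addr   = ld->addr;
            return true;    /* merged: one internal op does both functions  */
        }
        return false;       /* not eligible: keep both machine instructions */
    }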
20130262829DECODE TIME INSTRUCTION OPTIMIZATION FOR LOAD RESERVE AND STORE CONDITIONAL SEQUENCES - A technique is provided for replacing an atomic sequence. A processing circuit receives the atomic sequence. The processing circuit detects the atomic sequence. The processing circuit generates an internal atomic operation to replace the atomic sequence.10-03-2013
20130262839INSTRUCTION MERGING OPTIMIZATION - A computer system for optimizing instructions is configured to identify two or more machine instructions as being eligible for optimization, to merge the two or more machine instructions into a single optimized internal instruction that is configured to perform functions of the two or more machine instructions, and to execute the single optimized internal instruction to perform the functions of the two or more machine instructions. Being eligible includes determining that the two or more machine instructions include a first instruction specifying a first target register and a second instruction specifying the first target register as a source register and a target register. The second instruction is a next sequential instruction of the first instruction in program order, wherein the first instruction specifies a first function to be performed, and the second instruction specifies a second function to be performed.10-03-2013
20130262840INSTRUCTION MERGING OPTIMIZATION - A computer-implemented method includes determining that two or more instructions of an instruction stream are eligible for optimization. Eligibility is based on a first instruction specifying a first target register and a second instruction specifying the first target register as a source register and a target register. The method includes merging the two or more machine instructions into a single optimized internal instruction that is configured to perform first and second functions of two or more machine instructions employing operands specified by the two or more machine instructions. The single optimized internal instruction specifies the first target register only as a single target register and the single optimized internal instruction specifies the first and second functions to be performed. The method includes executing the single optimized internal instruction to perform the first and second functions of the two or more instructions.10-03-2013
20130262841INSTRUCTION MERGING OPTIMIZATION - A computer-implemented method includes determining that two or more instructions of an instruction stream are eligible for optimization, where the two or more instructions include a memory load instruction and a data processing instruction to process data based on the memory load instruction. The method includes merging, by a processor, the two or more instructions into a single optimized internal instruction and executing the single optimized internal instruction to perform a memory load function and a data processing function corresponding to the memory load instruction and the data processing instruction.10-03-2013
20130263145METHOD AND APPARATUS FOR EFFICIENT INTER-THREAD SYNCHRONIZATION FOR HELPER THREADS - A monitor bit per hardware thread in a memory location may be allocated, in a multiprocessing computer system having a plurality of hardware threads, the plurality of hardware threads sharing the memory location, and each of the allocated monitor bits corresponding to one of the plurality of hardware threads. A condition bit may be allocated for each of the plurality of hardware threads, the condition bit being allocated in each context of the plurality of hardware threads. In response to detecting the memory location being accessed, it is determined whether a monitor bit corresponding to a hardware thread is set in the memory location. In response to determining that the monitor bit corresponding to a hardware thread is set in the memory location, a condition bit corresponding to the thread accessing the memory location is set in that hardware thread's context.10-03-2013
20130263153OPTIMIZING SUBROUTINE CALLS BASED ON ARCHITECTURE LEVEL OF CALLED SUBROUTINE - A technique is provided for generating stubs. A processing circuit receives a call to a called function. The processing circuit retrieves a called function property of the called function. The processing circuit generates a stub for the called function based on the called function property.10-03-2013
20130339651MANAGING PAGE TABLE ENTRIES - Embodiments relate to managing page table entries in a processing system. A first page table entry (PTE) of a page table for translating virtual addresses to main storage addresses is identified. The page table includes a second page table entry contiguous with the first page table entry. It is determined whether the first PTE may be joined with the second PTE, based on the respective pages of main storage being contiguous. A marker is set in the page table for indicating that the main storage pages identified by the first and second PTEs are contiguous.12-19-2013
20130339652Radix Table Translation of Memory - Embodiments relate to managing memory page tables in a processing system. A request to access a desired block of memory is received. The request includes an effective address that includes an effective segment identifier (ESID) and a linear address, the linear address including a most significant portion and a byte index. An entry in a buffer that includes the ESID of the effective address is located. Based on the entry including a radix page table pointer (RPTP), the RPTP is used to locate a translation table of a hierarchy of translation tables, the located translation table is used to translate the most significant portion of the linear address to obtain an address of a block of memory, and, based on the obtained address, the requested access to the desired block of memory is performed.12-19-2013
20130339653Managing Accessing Page Table Entries - A system for accessing memory locations includes translating, by a processor, a virtual address to locate a first page table entry (PTE) in a page table. The first PTE includes a marker and an address of a page of main storage. It is determined whether a marker is set in the first PTE. The system identifies a large page size of a large page associated with the first PTE based on determining that the marker is set in the first PTE. The large page consists of contiguous pages of main storage. An origin address of the large page is determined based on determining that the marker is set in the first PTE. The virtual address is used to index into the large page at the origin address to access main storage.12-19-2013
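A minimal sketch, assuming invented page sizes and an invented marker-bit position, of how a set marker could redirect translation to index into the large page from its origin:

    #include <stdint.h>

    #define PAGE_SHIFT  12              /* hypothetical 4 KB base pages     */
    #define LARGE_SHIFT 16              /* hypothetical 64 KB large page    */
    #define PTE_MARKER  (1ull << 0)     /* hypothetical contiguous marker   */

    /* Sketch: with the marker set, the PTE belongs to a large page built
       from contiguous base pages; the virtual address indexes into the
       large page starting at its origin. */
    static uint64_t translate_page(uint64_t pte, uint64_t va)
    {
        uint64_t page = pte & ~((1ull << PAGE_SHIFT) - 1);   /* address field */
        if (pte & PTE_MARKER) {
            uint64_t origin = page & ~((1ull << LARGE_SHIFT) - 1);
            return origin | (va & ((1ull << LARGE_SHIFT) - 1));
        }
        return page | (va & ((1ull << PAGE_SHIFT) - 1));
    }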
20130339654Radix Table Translation of Memory - A method includes receiving a request to access a desired block of memory. The request includes an effective address that includes an effective segment identifier (ESID) and a linear address, the linear address comprising a most significant portion and a byte index. An entry including the ESID of the effective address is located in a buffer. Based on the entry including a radix page table pointer (RPTP), the RPTP is used to locate a translation table of a hierarchy of translation tables, the located translation table is used to translate the most significant portion of the linear address to obtain an address of a block of memory, and, based on the obtained address, the requested access to the desired block of memory is performed.12-19-2013
20130339658MANAGING PAGE TABLE ENTRIES - A method includes identifying, by a processor, a first page table entry (PTE) of a page table for translating virtual addresses to main storage addresses, the page table comprising a second page table entry contiguous with the first page table entry; determining with the processor whether the first PTE may be joined with the second PTE, the determining based on the respective pages of main storage being contiguous; and setting a marker in the page table for indicating that the main storage pages identified by the first and second PTEs are contiguous.12-19-2013
20130339659MANAGING ACCESSING PAGE TABLE ENTRIES - A method for accessing memory locations includes translating, by a processor, a virtual address to locate a first page table entry (PTE) in a page table. The first PTE includes a marker and an address of a page of main storage. It is determined, by the processor, whether a marker is set in the first PTE. A large page size of a large page associated with the first PTE is identified based on determining that the marker is set in the first PTE. The large page is made up of contiguous pages of main storage. An origin address of the large page is determined based on determining that the marker is set in the first PTE. The virtual address is used to index into the large page at the origin address to access main storage.12-19-2013
20140025927REDUCING REGISTER READ PORTS FOR REGISTER PAIRS - Embodiments relate to reducing a number of read ports for register pairs. An aspect includes executing an instruction. The instruction identifies a pair of registers as containing a wide operand which spans the pair of registers. It is determined if a pairing indicator associated with the pair of registers has a first value or a second value. The first value indicates that the wide operand is stored in a wide register, and the second value indicates that the wide operand is not stored in the wide register. Based on the pairing indicator having the first value, the wide operand is read from the wide register. Based on the pairing indicator having the second value, the wide operand is read from the pair of registers. An operation is performed using the wide operand.01-23-2014
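A hypothetical register-file sketch (it relies on the GCC/Clang __int128 extension; all structure names are invented) showing how a pairing indicator selects one wide read instead of two narrow reads:

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical register-file sketch: the pairing indicator chooses
       between one wide read and two narrow reads for an operand that
       spans an even/odd register pair. */
    typedef struct {
        uint64_t          narrow[32];  /* architected 64-bit registers  */
        unsigned __int128 wide[16];    /* shadow wide register per pair */
        bool              paired[16];  /* pairing indicator per pair    */
    } regfile_t;

    static unsigned __int128 read_pair(const regfile_t *rf, int even_reg)
    {
        int p = even_reg / 2;
        if (rf->paired[p])                        /* first value: one read port */
            return rf->wide[p];
        return ((unsigned __int128)rf->narrow[even_reg] << 64)  /* second value */
             | rf->narrow[even_reg + 1];
    }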
20140025928PREDICTING REGISTER PAIRS - Embodiments relate to reducing a number of read ports for register pairs. An aspect includes executing an instruction. The instruction identifies a pair of registers as containing a wide operand which spans the pair of registers. The executing of the instruction includes determining whether a pairing indicator associated with the pair of registers has a first value, a second value or a third value. Based on the pairing indicator having the first value, the wide operand is read from the wide register. Based on the pairing indicator having the second value, the wide operand is read from the pair of registers. Based on the pairing indicator having the third value, the wide operand is speculatively read from a predetermined register. The predetermined register consists of the wide register or the pair of registers.01-23-2014
20140025929MANAGING REGISTER PAIRING - Embodiments relate to reducing a number of read ports for register pairs. An aspect includes maintaining an active pairing indicator that is configured to have a first value or a second value. The first value indicates that a wide operand is stored in a wide register. The second value indicates that the wide operand is not stored in the wide register. The operand is read from either the wide register or a pair of registers based on the active pairing indicator. The active pairing indicator and the values of the set of wide registers are stored to a storage based on a request to store a register pairing status. A saved pairing indicator and saved values of the set of wide registers are loaded from the storage respectively into an active pairing register and wide registers.01-23-2014
20140047216Scalable Decode-Time Instruction Sequence Optimization of Dependent Instructions - Producer-consumer instructions, comprising a first instruction and a second instruction in program order that require in-order execution, are fetched. The second instruction is modified by the processor so that the first instruction and the second instruction can be completed out-of-order, the modification comprising either extending an immediate field of the second instruction using immediate field information of the first instruction or providing a source location of the first instruction as an additional source location to the source locations of the second instruction.02-13-2014
20140047219Managing A Register Cache Based on an Architected Computer Instruction Set having Operand Last-Use Information - A multi-level register hierarchy is disclosed, comprising a first level pool of registers for caching registers of a second level pool of registers, in a system wherein programs can dynamically release and re-enable architected registers such that released architected registers need not be maintained by the processor, the processor accessing operands from the first level pool of registers. A last-use instruction is identified as having a last use of an architected register before that register is released; releasing the last-use architected register causes the multi-level register hierarchy to discard any correspondence of an entry to the released architected register.02-13-2014
20140053154PRIVILEGE LEVEL AWARE PROCESSOR HARDWARE RESOURCE MANAGEMENT FACILITY - Multiple machine state registers are included in a processor core to permit distinction between use of hardware facilities by applications, supervisory threads and the hypervisor. All facilities are initially disabled by the hypervisor when a partition is initialized. When any access is made to a disabled facility, the hypervisor receives an indication of which facility was accessed and sets a corresponding hardware flag in the hypervisor's machine state register. When an application attempts to access a disabled facility, the supervisor managing the operating system image receives an indication of which facility was accessed and sets a corresponding hardware flag in the supervisor's machine state register. The multiple register implementation permits the supervisor to determine whether particular hardware facilities need to have their state saved when an application context swap occurs and the hypervisor can determine which hardware facilities need to have their state saved when a partition swap occurs.02-20-2014
20140089636CACHING OPTIMIZED INTERNAL INSTRUCTIONS IN LOOP BUFFER - Embodiments of the invention relate to a computer system for storing an internal instruction loop in a loop buffer. The computer system includes a loop buffer and a processor. The computer system is configured to perform a method including fetching instructions from memory to generate an internal instruction to be executed, detecting a beginning of a first instruction loop in the instructions, determining that a first internal instruction loop corresponding to the first instruction loop is not stored in the loop buffer, fetching the first instruction loop, optimizing one or more instructions corresponding to the first instruction loop to generate a first optimized internal instruction loop, and storing the first optimized internal instruction loop in the loop buffer based on the determination that the first internal instruction loop is not stored in the loop buffer.03-27-2014
20140095833Prefix Computer Instruction for Compatibly Extending Instruction Functionality - A prefix instruction is executed and passes operands to a next instruction without storing the operands in an architected resource, such that the execution of the next instruction uses the operands provided by the prefix instruction to perform an operation; the operands may be a prefix instruction immediate field or a target register of the prefix instruction execution.04-03-2014
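A minimal sketch of the immediate-passing idea with invented field widths; real prefixed-instruction encodings split their immediates differently:

    #include <stdint.h>

    /* Hypothetical sketch (field widths invented): a prefix instruction
       carries the upper immediate bits; the next instruction's 16-bit
       immediate is concatenated with them, with no architected register
       used to pass the value between the two instructions. */
    static int64_t combine_immediates(int32_t prefix_upper, uint16_t insn_lower16)
    {
        return ((int64_t)prefix_upper << 16) | insn_lower16;
    }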
20140095835PERFORMING PREDECODE-TIME OPTIMIZED INSTRUCTIONS IN CONJUNCTION WITH PREDECODE TIME OPTIMIZED INSTRUCTION SEQUENCE CACHING - A method for performing predecode-time optimized instructions in conjunction with predecode-time optimized instruction sequence caching. The method includes receiving a first instruction of an instruction sequence and a second instruction of the instruction sequence and determining if the first instruction and the second instruction can be optimized. In response to determining that the first instruction and second instruction can be optimized, the method includes performing a pre-decode optimization on the instruction sequence and generating a new second instruction, wherein the new second instruction is not dependent on a target operand of the first instruction, and storing a pre-decoded first instruction and a pre-decoded new second instruction in an instruction cache. In response to determining that the first instruction and second instruction cannot be optimized, the method includes storing the pre-decoded first instruction and a pre-decoded second instruction in the instruction cache.04-03-2014
20140095848Tracking Operand Liveness Information in a Computer System and Performing Function Based on the Liveness Information - Operand liveness state information is maintained during context switches for current architected operands of executing programs, the current operand state information indicating whether corresponding current operands are either enabled or disabled for use by a first program module, the first program module comprising machine instructions of an instruction set architecture (ISA) for disabling current architected operands, wherein a current operand is accessed by a machine instruction of said first program module, the accessing comprising using the current operand state information to determine whether a previously stored current operand value is accessible by the first program module.04-03-2014
20140101216MIXED PRECISION ESTIMATE INSTRUCTION COMPUTING NARROW PRECISION RESULT FOR WIDE PRECISION INPUTS - A technique is provided for performing a mixed precision estimate. A processing circuit receives an input of a first precision having a wide precision value. The processing circuit computes an output in an output exponent range corresponding to a narrow precision value based on the input having the wide precision value.04-10-2014
20140101359ASYMMETRIC CO-EXISTENT ADDRESS TRANSLATION STRUCTURE FORMATS - An address translation capability is provided in which translation structures of different types are used to translate memory addresses from one format to another format. Multiple translation structure formats (e.g., multiple page table formats, such as hash page tables and hierarchical page tables) are concurrently supported in a system configuration. This facilitates provision of guest access in virtualized operating systems, and/or the mixing of translation formats to better match the data access patterns being translated.04-10-2014
20140101360ADJUNCT COMPONENT TO PROVIDE FULL VIRTUALIZATION USING PARAVIRTUALIZED HYPERVISORS - A system configuration is provided with a paravirtualizing hypervisor that supports different types of guests, including those that use a single level of translation and those that use a nested level of translation. When an address translation fault occurs during a nested level of translation, an indication of the fault is received by an adjunct component. The adjunct component addresses the address translation fault, at least in part, on behalf of the guest.04-10-2014
20140101361SYSTEM SUPPORTING MULTIPLE PARTITIONS WITH DIFFERING TRANSLATION FORMATS - A system configuration is provided with multiple partitions that supports different types of address translation structure formats. The configuration may include partitions that use a single level of translation and those that use a nested level of translation. Further, differing types of translation structures may be used. The different partitions are supported by a single hypervisor.04-10-2014
20140101362SUPPORTING MULTIPLE TYPES OF GUESTS BY A HYPERVISOR - A system configuration is provided that includes multiple partitions that have differing translation mechanisms associated therewith. For instance, one partition has associated therewith a single level translation mechanism for translating guest virtual addresses to host physical addresses, and another partition has a nested level translation mechanism for translating guest virtual addresses to host physical addresses. The different translation mechanisms and partitions are supported by a single hypervisor. Although the hypervisor is a paravirtualized hypervisor, it provides full virtualization for those partitions using nested level translations.04-10-2014
20140101363SELECTABLE ADDRESS TRANSLATION MECHANISMS WITHIN A PARTITION - An address translation capability is provided in which translation structures of different types are used to translate memory addresses from one format to another format. Multiple translation structure formats (e.g., multiple page table formats, such as hash page tables and hierarchical page tables) are concurrently supported in a system configuration. For a system configuration that includes partitions, the translation mechanism to be used for a partition or a portion thereof is selectable and may be different for different partitions or even portions within a partition.04-10-2014
20140101364SELECTABLE ADDRESS TRANSLATION MECHANISMS WITHIN A PARTITION - An address translation capability is provided in which translation structures of different types are used to translate memory addresses from one format to another format. Multiple translation structure formats (e.g., multiple page table formats, such as hash page tables and hierarchical page tables) are concurrently supported in a system configuration. For a system configuration that includes partitions, the translation mechanism to be used for a partition or a portion thereof is selectable and may be different for different partitions or even portions within a partition.04-10-2014
20140101365SUPPORTING MULTIPLE TYPES OF GUESTS BY A HYPERVISOR - A system configuration is provided that includes multiple partitions that have differing translation mechanisms associated therewith. For instance, one partition has associated therewith a single level translation mechanism for translating guest virtual addresses to host physical addresses, and another partition has a nested level translation mechanism for translating guest virtual addresses to host physical addresses. The different translation mechanisms and partitions are supported by a single hypervisor. Although the hypervisor is a paravirtualized hypervisor, it provides full virtualization for those partitions using nested level translations.04-10-2014
20140101402SYSTEM SUPPORTING MULTIPLE PARTITIONS WITH DIFFERING TRANSLATION FORMATS - A system configuration is provided with multiple partitions that supports different types of address translation structure formats. The configuration may include partitions that use a single level of translation and those that use a nested level of translation. Further, differing types of translation structures may be used. The different partitions are supported by a single hypervisor.04-10-2014
20140101404SELECTABLE ADDRESS TRANSLATION MECHANISMS - An address translation capability is provided in which translation structures of different types are used to translate memory addresses from one format to another format. Multiple translation structure formats (e.g., multiple page table formats, such as hash page tables and hierarchical page tables) are concurrently supported in a system configuration, and the use of a particular translation structure format in translating an address is selectable.04-10-2014
20140101406ADJUNCT COMPONENT TO PROVIDE FULL VIRTUALIZATION USING PARAVIRTUALIZED HYPERVISORS - A system configuration is provided with a paravirtualizing hypervisor that supports different types of guests, including those that use a single level of translation and those that use a nested level of translation. When an address translation fault occurs during a nested level of translation, an indication of the fault is received by an adjunct component. The adjunct component addresses the address translation fault, at least in part, on behalf of the guest.04-10-2014
20140101407SELECTABLE ADDRESS TRANSLATION MECHANISMS - An address translation capability is provided in which translation structures of different types are used to translate memory addresses from one format to another format. Multiple translation structure formats (e.g., multiple page table formats, such as hash page tables and hierarchical page tables) are concurrently supported in a system configuration, and the use of a particular translation structure format in translating an address is selectable.04-10-2014
20140101408ASYMMETRIC CO-EXISTENT ADDRESS TRANSLATION STRUCTURE FORMATS - An address translation capability is provided in which translation structures of different types are used to translate memory addresses from one format to another format. Multiple translation structure formats (e.g., multiple page table formats, such as hash page tables and hierarchical page tables) are concurrently supported in a system configuration. This facilitates provision of guest access in virtualized operating systems, and/or the mixing of translation formats to better match the data access patterns being translated.04-10-2014
20140101677OPTIMIZING SUBROUTINE CALLS BASED ON ARCHITECTURE LEVEL OF CALLED SUBROUTINE - A technique is provided for generating stubs. A processing circuit receives a call to a called function. The processing circuit retrieves a called function property of the called function. The processing circuit generates a stub for the called function based on the called function property.04-10-2014
20140108768Computer instructions for Activating and Deactivating Operands - An instruction set architecture (ISA) includes instructions for selectively indicating last-use architected operands having values that will not be accessed again, wherein architected operands are made active or inactive after an instruction-specified last use, wherein the architected operands are made active by performing a write operation to an inactive operand, and wherein the activation/deactivation may be performed by the instruction having the last use of the operand or by another (prefix) instruction.04-17-2014
20140108772Exploiting an Architected Last-Use Operand Indication in a System Operand Resource Pool - A pool of available physical registers is provided for architected registers, wherein operations are performed that activate and deactivate selected architected registers, such that the deactivated selected architected registers need not retain values and physical registers can be deallocated to the pool, wherein deallocation of physical registers is performed after a last use by a designated last-use instruction, wherein the last-use information is provided either by the last-use instruction or by a prefix instruction, and wherein reads to deallocated architected registers return an architected default value.04-17-2014
20140164739Modify and Execute Next Sequential Instruction Facility and Instructions Therefore - A modify next sequential instruction (MNSI) instruction, when executed, modifies a field of the fetched copy of the next sequential instruction (NSI) to enable a program to dynamically provide parameters to the NSI being executed. Thus the MNSI instruction is a non-disruptive prefix instruction to the NSI. The NSI may be modified to effectively extend the length of the NSI field, thus providing more registers or more range (in the case of a length field) than otherwise available to the NSI instruction according to the instruction set architecture (ISA).06-12-2014
20140164740Branch-Free Condition Evaluation - A compare instruction of an instruction set architecture (ISA), when executed, tests one or more operands for an instruction-defined condition. The result of the test is stored as an operand, with leading zeros, in a general register of the ISA. The general register is identified (explicitly or implicitly) by the compare instruction. Thus, the result of the test can be manipulated by standard register operations of the computer system. In a superscalar processor, no special “condition code” renaming is required, as the standard register renaming takes care of out-of-order processing of the conditions.06-12-2014
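A hypothetical C analogue of the pattern: the compare result sits in an ordinary register with leading zeros and feeds plain register arithmetic in place of a branch:

    #include <stdint.h>

    /* Hypothetical sketch of the branch-free pattern: the compare result
       lands, zero-extended, in a general register, and plain register
       arithmetic consumes it instead of a conditional branch. */
    static uint64_t select_max(uint64_t a, uint64_t b)
    {
        uint64_t cond = (uint64_t)(a < b);   /* 0 or 1 with leading zeros */
        uint64_t mask = (uint64_t)0 - cond;  /* all-ones if cond, else 0  */
        return (b & mask) | (a & ~mask);     /* no branch needed          */
    }

Because cond is an ordinary register value, standard register renaming handles it out of order, matching the abstract's point about avoiding special condition-code renaming.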
20140164741Modify and Execute Next Sequential Instruction Facility and Instructions Therefore - A modify next sequential instruction (MNSI) instruction, when executed, modifies a field of the fetched copy of the next sequential instruction (NSI) to enable a program to dynamically provide parameters to the NSI being executed. Thus the MNSI instruction is a non-disruptive prefix instruction to the NSI. The NSI may be modified to effectively extend the length of the NSI field, thus providing more registers or more range (in the case of a length field) than otherwise available to the NSI instruction according to the instruction set architecture (ISA).06-12-2014
20140164744Tracking Multiple Conditions in a General Purpose Register and Instruction Therefor - An operate-and-insert instruction of a program, when executed, performs an operation based on one or more operands; results of an instruction-specified test of the operation performed are stored in an instruction-specified location of an instruction-specified general register. The instruction-specified general register is therefore able to hold the results of many operate-and-insert instructions. The program can then use non-branch-type instructions to evaluate conditions saved in the register, thus avoiding the performance penalty of branch instructions.06-12-2014
20140164746Tracking Multiple Conditions in a General Purpose Register and Instruction Therefor - An operate-and-insert instruction of a program, when executed, performs an operation based on one or more operands; results of an instruction-specified test of the operation performed are stored in an instruction-specified location of an instruction-specified general register. The instruction-specified general register is therefore able to hold the results of many operate-and-insert instructions. The program can then use non-branch-type instructions to evaluate conditions saved in the register, thus avoiding the performance penalty of branch instructions.06-12-2014
20140164747Branch-Free Condition Evaluation - A compare instruction of an instruction set architecture (ISA), when executed, tests one or more operands for an instruction-defined condition. The result of the test is stored as an operand, with leading zeros, in a general register of the ISA. The general register is identified (explicitly or implicitly) by the compare instruction. Thus, the result of the test can be manipulated by standard register operations of the computer system. In a superscalar processor, no special “condition code” renaming is required, as the standard register renaming takes care of out-of-order processing of the conditions.06-12-2014
20140208086VECTOR EXCEPTION CODE - Vector exception handling is facilitated. A vector instruction is executed that operates on one or more elements of a vector register. When an exception is encountered during execution of the instruction, a vector exception code is provided that indicates a position within the vector register that caused the exception. The vector exception code also includes a reason for the exception.07-24-2014
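A hypothetical encoding sketch; the field layout is invented for illustration and is not the architected format:

    #include <stdint.h>

    /* Hypothetical encoding (layout invented): pack the failing element's
       index and a reason code into a single exception-code byte. */
    enum vxc_reason { VXC_INVALID = 1, VXC_DIV_ZERO = 2, VXC_OVERFLOW = 3 };

    static uint8_t make_vxc(unsigned element_index, enum vxc_reason reason)
    {
        return (uint8_t)(((element_index & 0xFu) << 4) | ((unsigned)reason & 0xFu));
    }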
20140281430EXECUTION OF CONDITION-BASED INSTRUCTIONS - Execution of condition-based instructions is facilitated. A condition-based instruction is obtained, as well as a confidence level associated with the instruction. The confidence level is checked, and based on the confidence level being a first value, a predicted operation of the instruction, which is based on a predictor, is unconditionally performed. Further, based on the confidence level being a second value, a specified operation of the instruction, which is based on a determined condition, is conditionally performed.09-18-2014
20150039858REDUCING REGISTER READ PORTS FOR REGISTER PAIRS - Embodiments relate to reducing a number of read ports for register pairs. An aspect includes executing an instruction. The instruction identifies a pair of registers as containing a wide operand which spans the pair of registers. It is determined if a pairing indicator associated with the pair of registers has a first value or a second value. The first value indicates that the wide operand is stored in a wide register, and the second value indicates that the wide operand is not stored in the wide register. Based on the pairing indicator having the first value, the wide operand is read from the wide register. Based on the pairing indicator having the second value, the wide operand is read from the pair of registers. An operation is performed using the wide operand.02-05-2015
20150082008FORMING INSTRUCTION GROUPS BASED ON DECODE TIME INSTRUCTION OPTIMIZATION - Instructions are grouped into instruction groups based on optimizations that may be performed. An instruction is obtained, and a determination is made as to whether the instruction is to be included in a current instruction group or another instruction group. This determination is made based on whether the instruction is a candidate for optimization, such as decode time instruction optimization. If it is determined that the instruction is to be included in another group, then the other group is formed to include the instruction.03-19-2015
20150088929COMPARING SETS OF CHARACTER DATA HAVING TERMINATION CHARACTERS - Multiple sets of character data having termination characters are compared using parallel processing and without causing unwarranted exceptions. Each set of character data to be compared is loaded within one or more vector registers. In particular, in one embodiment, for each set of character data to be compared, an instruction is used that loads data in a vector register to a specified boundary, and provides a way to determine the number of characters loaded. Further, an instruction is used to find the index of the first delimiter character, i.e., the first zero or null character, or the index of unequal characters. Using these instructions, a location of the end of one of the sets of data or a location of an unequal character is efficiently provided.03-26-2015
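A scalar, hypothetical rendering of what the vector instructions accomplish in parallel (names invented):

    #include <stddef.h>

    /* Hypothetical scalar rendering of the vector comparison: return the
       index of the first terminator (zero byte) or first unequal pair in
       two buffers of at most n characters, whichever comes first. */
    static size_t find_term_or_mismatch(const char *a, const char *b, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            if (a[i] != b[i] || a[i] == '\0')
                return i;       /* mismatch or end of string */
        return n;               /* neither found in n chars  */
    }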
20150089152MANAGING HIGH-CONFLICT CACHE LINES IN TRANSACTIONAL MEMORY COMPUTING ENVIRONMENTS - Cache lines in a computing environment with transactional memory are configurable with a coherency mode. Cache lines in full-line coherency mode are operated or managed with full-line granularity. Cache lines in sub-line coherency mode are operated or managed as sub-cache line portions of a full cache line. When a transaction accessing a cache line in full-line coherency mode results in a transactional abort, the cache line may be placed in sub-line coherency mode if the cache line is a high-conflict cache line. The cache line may be associated with a counter in a conflict address detection table that is incremented whenever a transaction conflict is detected for the cache line. The cache line may be a high-conflict cache line when the counter satisfies a high-conflict criterion, such as reaching a threshold value. The cache line may be returned to full-line coherency mode when a reset criterion is satisfied.03-26-2015
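A minimal sketch of the counter-and-threshold idea, with an invented table-entry layout and threshold:

    #include <stdint.h>
    #include <stdbool.h>

    #define CONFLICT_THRESHOLD 4u   /* invented threshold value */

    /* Hypothetical conflict-address-detection entry: count transactional
       conflicts per cache line; on reaching the threshold, switch the
       line to sub-line coherency mode. */
    typedef struct { uint64_t line_addr; unsigned conflicts; bool subline; } cad_entry_t;

    static void note_conflict(cad_entry_t *e)
    {
        if (!e->subline && ++e->conflicts >= CONFLICT_THRESHOLD)
            e->subline = true;   /* high-conflict: manage at sub-line grain */
    }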
20150089153IDENTIFYING HIGH-CONFLICT CACHE LINES IN TRANSACTIONAL MEMORY COMPUTING ENVIRONMENTS - Cache lines in a computing environment with transactional memory are configurable with a coherency mode and are associated with a high-conflict indicator. Cache lines in full-line coherency mode are operated or managed with full-line granularity. Cache lines in sub-line coherency mode are operated or managed as sub-cache line portions of a full cache line. A cache line is placed in sub-line coherency mode based on examining the high-conflict indicator. A transaction accessing a memory address in a cache line in sub-line coherency mode marks only the sub-cache line portion associated with the memory address as transactionally accessed. The high-conflict indicator may be included in a set of descriptive bits associated with the cache line. A copy of the high-conflict indicator for a cache line in a first cache may be updated with the high-conflict indicator for the cache line in a second cache.03-26-2015
20150089154MANAGING HIGH-COHERENCE-MISS CACHE LINES IN MULTI-PROCESSOR COMPUTING ENVIRONMENTS - Cache lines in a multi-processor computing environment are configurable with a coherency mode. Cache lines in full-line coherency mode are operated or managed with full-line granularity. Cache lines in sub-line coherency mode are operated or managed as sub-cache line portions of a full cache line. A high-coherence-miss cache line may be placed in sub-line coherency mode. A cache line may be associated with a counter in a coherence miss detection table that is incremented whenever an access of the cache line results in a coherence request. The cache line may be a high-coherence-miss cache line when the counter satisfies a high-coherence-miss criterion, such as reaching a threshold value. The cache line may be returned to full-line coherency mode when a reset criterion is satisfied.03-26-2015
20150089155CENTRALIZED MANAGEMENT OF HIGH-CONTENTION CACHE LINES IN MULTI-PROCESSOR COMPUTING ENVIRONMENTS - Cache lines in a multi-processor computing environment are configurable with a coherency mode. Cache lines in full-line coherency mode are operated or managed with full-line granularity. Cache lines in sub-line coherency mode are operated or managed as sub-cache line portions of a full cache line. Communications detected on a coherence interconnect may indicate that a cache line is associated with performance-reducing events. A high-contention cache line may be placed in sub-line coherency mode. Caches accessing the cache line are notified that the cache line is in sub-line coherency mode. The cache line may be associated with a counter in a centralized detection table that is incremented based on detecting the communications. The cache line may be a high-contention cache line when the counter satisfies a high-contention criterion, such as reaching a threshold value. The cache line may be returned to full-line coherency mode when a reset criterion is satisfied.03-26-2015
20150089159MULTI-GRANULAR CACHE MANAGEMENT IN MULTI-PROCESSOR COMPUTING ENVIRONMENTS - Cache lines in a multi-processor computing environment are configurable with a coherency mode. Cache lines in full-line coherency mode are operated or managed with full-line granularity. Cache lines in sub-line coherency mode are operated or managed as sub-cache line portions of a full cache line. Each cache is associated with a directory having a number of directory entries and with a side table having a smaller number of entries. The directory entry for a cache line associates the cache line with a tag and a set of full-line descriptive bits. Creating a side table entry for the cache line places the cache line in sub-line coherency mode. The side table entry associates each of the sub-cache line portions of the cache line with a set of sub-line descriptive bits. Removing the side table entry may return the cache line to full-line coherency mode.03-26-2015
20150089193PREDICTIVE FETCHING AND DECODING FOR SELECTED RETURN INSTRUCTIONS - Predictive fetching and decoding for selected instructions. A determination is made as to whether an instruction to be executed in a pipelined processor is a selected return instruction, the pipelined processor having a plurality of stages including an execute stage. Based on the instruction being the selected return instruction, a predicted return address is obtained from a data structure, the predicted return address being the address of an instruction to which it is predicted that processing is to be returned. Additionally, based on the instruction being the selected return instruction, an operating state for the instruction at the predicted return address is predicted. The instruction at the predicted return address is fetched prior to the selected return instruction reaching the execute stage, and decoding of the fetched instruction is initiated based on the predicted operating state.03-26-2015
20150089194PREDICTIVE FETCHING AND DECODING FOR SELECTED INSTRUCTIONS - Predictive fetching and decoding for selected instructions (e.g., operating system instructions, hypervisor instructions or other such instructions). A determination is made that a selected instruction, such as a system call instruction, an asynchronous interrupt, a return from system call instruction or return from asynchronous interrupt, is to be executed. Based on determining that such an instruction is to be executed, a predicted address is determined for the selected instruction, which is the address to which processing transfers in order to provide the requested services. Then, fetching of instructions beginning at the predicted address prior to execution of the selected instruction is commenced. Further, speculative state relating to a selected instruction, including, for instance, an indication of the privilege level of the selected instruction or instructions executed on behalf of the selected instruction, is predicted and maintained.03-26-2015
20150089208PREDICTOR DATA STRUCTURE FOR USE IN PIPELINED PROCESSING - A predictor data structure is used for pipelined processing by a pipelined processor. The predictor data structure includes a predicted address to be used in return from execution of a selected instruction, and a predicted operating state associated with the predicted address. Based on determining a selected return instruction is to be executed, the predicted address to which processing is to be returned is obtained from the predictor data structure. Further, based on determining the selected return instruction is to be executed, a transitional operating state to be entered based on the predicted operating state stored in the predictor data structure is predicted, wherein at least one of the predicted address and the predicted transitional operating state are to be used to validate execution of the selected return instruction.03-26-2015
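A hypothetical sketch of such a predictor entry and its validation; the field names and the privilege-level representation are invented:

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical predictor entry: pair the predicted return address with
       the operating state expected on return, so fetch and decode can start
       before the return instruction reaches the execute stage. */
    typedef struct {
        uint64_t return_addr;  /* predicted address to resume fetching at   */
        uint8_t  priv_level;   /* predicted operating state at that address */
        bool     valid;
    } ret_predict_t;

    /* Validation at execute time: the actual address and state must match
       the prediction used for speculative fetch and decode. */
    static bool prediction_valid(const ret_predict_t *p,
                                 uint64_t actual_addr, uint8_t actual_priv)
    {
        return p->valid && p->return_addr == actual_addr
                        && p->priv_level  == actual_priv;
    }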
