MIPS Technologies, Inc.

Patent application number	Title	Published
MIPS Technologies, Inc. Patent applications
20140281413	Superforwarding Processor - Methods and systems that allow the processor to effectively and efficiently reduce or eliminate the latency associated with instructions that copy the value of one register to another register. A processor includes a superforwarding table, a superforwarding logic block, and a computation engine. The superforwarding table stores an entry, wherein the entry has a valid bit, a key field, and a forward field. The superforwarding logic block determines which register contains the information needed for an instruction. The computation engine executes instructions.	09-18-2014
20140258697	Apparatus and Method for Transitive Instruction Scheduling - A processor includes a multiple stage pipeline with a scheduler with a wakeup block and select logic. The wakeup block is configured to wake, in a first cycle, all instructions dependent upon a first selected instruction to form a wake instruction set. In a second cycle, the wakeup block wakes instructions dependent upon the wake instruction set to augment the wake instruction set. The select logic selects instructions from the wake instruction set based upon program order.	09-11-2014
20140258694	Apparatus and Method for Branch Instruction Bonding - A processor is configured to identify a branch instruction immediately followed by an architectural delay slot. A single bonded instruction comprising the branch instruction immediately followed by the architectural delay slot is created. The single bonded instruction is loaded into an instruction buffer.	09-11-2014
20140258667	Apparatus and Method for Memory Operation Bonding - A processor is configured to evaluate memory operation bonding criteria to selectively identify memory operation bonding opportunities within a memory access plan. Memory operations are combined in response to the memory operation bonding opportunities to form a revised memory access plan with accelerated memory access.	09-11-2014
20140258624	Apparatus and Method for Operating a Processor with an Operation Cache - A processor includes a computation engine to produce a computed value for a set of operands. A cache stores the set of operands and the computed value. The cache is configured to selectively identify a match and a miss for a new set of operands. In the event of a match the computed value is supplied by the cache and a computation engine operation is aborted. In the event of a miss a new computed value for the new set of operands is computed by the computation engine and is stored in the cache.	09-11-2014
20140250289	Branch Target Buffer With Efficient Return Prediction Capability - Improved branch target buffers (BTBs) and methods of processing data in a microprocessor with a pipeline are provided. According to various embodiments, a BTB is provided that includes a non-return buffer, a return buffer, and a multiplexer. The non-return buffer is designed to store a multiple of non-return entries. Each non-return entry corresponds to a non-return type instruction. The return buffer is designed to store a plurality of return entries that each correspond to a return type instruction. Additionally, the return buffer may generate a control signal. The multiplexer also generates a control signal and outputs either data from the non-return buffer or data from a return prediction stack (RPS). Whether the multiplexer returns data from the non-return buffer or the RPS depends on the control signal.	09-04-2014
20140245317	Resource Sharing Using Process Delay - Methods and systems that reduce the number of instance of a shared resource needed for a processor to perform an operation and/or execute a process without impacting function are provided. a method of processing in a processor is provided. Aspects include determining that an operation to be performed by the processor will require the use of a shared resource. A command can be issued to cause a second operation to not use the shared resources N cycles later. The shared resource can then be used for a first aspect of the operation at cycle X and then used for a second aspect of the operation at cycle X+N. The second operation may be rescheduled according to embodiments.	08-28-2014
20140244987	Precision Exception Signaling for Multiple Data Architecture - Methods and systems that perform one or more operations on a plurality of elements using a multiple data processing element processor are provided. An input vector comprising a plurality of elements is received by a processor. The processor determines if performing a first operation on a first element will cause an exception and if so, writes an indication of the exception caused by the first operation to a first portion of an output vector stored in an output register. A second operation can be performed on a second element with the result of the second operation being written to a second portion of the output vector stored in the output register.	08-28-2014
20140244977	Deferred Saving of Registers in a Shared Register Pool for a Multithreaded Microprocessor - A method of sharing a plurality of registers in a register pool among a plurality of microprocessor threads begins by allocating a first set of registers in the register pool to a first thread, the first thread executing a first instruction using the first set of registers in the register pool. The first thread is descheduled without saving values stored in the first set of registers. A second thread is scheduled to execute a second instruction using registers allocated in the register pool. Finally, the first thread is rescheduled, the first thread reusing the allocated first set of registers.	08-28-2014
20140244933	Way Lookahead - Methods and systems that identify and power up ways for future instructions are provided. A processor includes an n-way set associative cache and an instruction fetch unit. The n-way set associative cache is configured to store instructions. The instruction fetch unit is in communication with the n-way set associative cache and is configured to power up a first way, where a first indication is associated with an instruction and indicates the way where a future instruction is located and where the future instruction is two or more instructions ahead of the current instruction.	08-28-2014
20140068138	Embedded Processor with Virtualized Security Controls Using Guest Identifications, a Common Kernel Address Space and Operational Permissions - A method includes assigning unique guest identifications to different guests, specifying an address region and permissions for the different guests and controlling a guest jump from one physical memory segment to a second physical memory segment through operational permissions defined in a root memory management unit that supports guest isolation and protection.	03-06-2014
20140006470	Carry Look-Ahead Adder with Generate Bits and Propagate Bits Used for Column Sums	01-02-2014
20130332703	Shared Register Pool For A Multithreaded Microprocessor - A method of sharing a plurality of registers in a shared register pool among a plurality of microprocessor threads begins with a determination that a first instruction to be executed by a microprocessor in a first microprocessor thread requires a first logical register. Next a determination is made that a second instruction to be executed by the microprocessor in a second microprocessor thread requires a second logical register. A first physical register in the shared register pool is allocated to the first microprocessor thread for execution of the first instruction and the first logical register is mapped to the first physical register. A second physical register in the shared register pool is allocated to the second microprocessor thread for execution of the second instruction. Finally, the second logical register is mapped to the second physical register.	12-12-2013
20130263124	Apparatus and Method for Guest and Root Register Sharing in a Virtual Machine - A computer readable storage medium includes executable instructions to define a processor with guest mode control registers supporting guest mode operating behavior defined by guest context specified in the guest mode control registers. The guest mode control registers include a control bit to specify a guest access blocked register state and a shared register state. Root mode control registers support root mode operating behavior defined by root context specified in the root mode control registers. The root mode control registers include control bits to enable replicated register state access and shared register state access. The guest context and the root context support virtualization of hardware resources such that multiple operating systems supporting multiple applications are executed by the hardware resources.	10-03-2013
20130191426	Merged Floating Point Operation Using a Modebit - A first floating-point operation unit receives first and second variables and performs a first operation generating a first output. A first rounding unit receives and rounds the first output to generate a second output if a control bit is in a first state. A second floating-point operation unit receives a third variable and either the first output or the second output and performs a second operation on the third variable and either the first output or the second output, to generate a third output. The second floating-point operation unit receives and operates on the first output if the control bit is in the first state, or the second output if the control bit is in the second state. A second rounding unit receives and rounds the third output.	07-25-2013
20130159781	System For Compression Of Fixed Width Values In A Processor Hardware Trace - A method of tracing processor instructions includes forming a compressed trace stream with a dynamic unit width indicator and a value block. The dynamic unit width indicator includes an address/data width indicator qualified by a unit indicator. The unit value block has a width that is a function of the address/data width indicator and the unit indicator.	06-20-2013
20130159667	Vector Size Agnostic Single Instruction Multiple Data (SIMD) Processor Architecture - A computer has a memory adapted to store a first plurality of instructions encoded with a first vector size and a second plurality of instructions encoded with a second vector size. An execution unit executes the first plurality of instructions and the second plurality of instructions by processing vector units in a uniform manner regardless of vector size.	06-20-2013
20130159578	System and method for Automatic Hardware Interrupt Handling - A processing system is provided consisting of an interrupt pin, multiple registers, a stack pointer, and an automatic interrupt system. The multiple registers store a number of processor states values. When the system detects an interrupt on the interrupt pin the system prepares to enter an exception mode where the automatic interrupt system causes an interrupt vector to be fetched, the stack pointer to be updated, and the processor state values to be read in parallel from the registers and stored in memory locations based on the updated stack pointer, prior to the execution of an interrupt service routine. A method for automatic hardware interrupt handling is also presented.	06-20-2013
20130132760	Apparatus and Method for Achieving Glitch-Free Clock Domain Crossing Signals - A computer implemented method includes identifying in an original circuit output signals that drive domain crossing logic separating a first clock domain from a second clock domain. A revised circuit is formed with a register attached to the domain crossing logic. The register receives an output signal and a synchronization signal that precludes the output signal from transitioning at selected clock cycle intervals.	05-23-2013
20130132702	Processor with Kernel Mode Access to User Space Virtual Addresses - A computer includes a memory and a processor connected to the memory. The processor includes memory segment configuration registers to store defined memory address segments and defined memory address segment attributes such that the processor operates in accordance with the defined memory address segments and defined memory address segment attributes to allow kernel mode access to user space virtual addresses for enhanced kernel mode memory capacity.	05-23-2013
20130067284	Apparatus and Method for Low Overhead Correlation of Multi-Processor Trace Information - A method of coordinating trace information in a multiprocessor system includes receiving processor trace information from a set of processors. The processor trace information from each processor includes a processor identity and a coherence indicator that demarks selective shared memory transactions. Coherence manager trace information is generated for each of the processors. The coherence manager trace information for each processor includes trace metrics and a coherence indicator.	03-14-2013
20130061060	Systems and Methods for Controlling the Use of Processing Algorithms, and Applications Thereof - Embodiments provide systems and methods for controlling the use of processing algorithms, and applications thereof. In an embodiment, authorization to use an algorithm is validated in a system having a processor capable of executing user defined instructions, by executing a user defined instruction that writes a first value to a first storage of a user defined instruction block, uses the first value to transform a second value located in a second storage of the user defined instruction block, and compares the transformed second value to a third value located in a third storage. Use of the algorithm is permitted only if the comparison of the transformed second value to the third value indicates that use of the algorithm is authorized. In another embodiment, authorization to use an at least partially decrypted algorithm is validated via a key for enablement.	03-07-2013
20130031314	Support for Multiple Coherence Domains - A number of coherence domains are maintained among the multitude of processing cores disposed in a microprocessor. A cache coherency manager defines the coherency relationships such that coherence traffic flows only among the processing cores that are defined as having a coherency relationship. The data defining the coherency relationships between the processing cores is optionally stored in a programmable register. For each source of a coherent request, the processing core targets of the request are identified in the programmable register. In response to a coherent request, an intervention message is forwarded only to the cores that are defined to be in the same coherence domain as the requesting core. If a cache hit occurs in response to a coherent read request and the coherence state of the cache line resulting in the hit satisfies a condition, the requested data is made available to the requesting core from that cache line.	01-31-2013
20120331265	Apparatus and Method for Accelerated Hardware Page Table Walk - A method of walking page tables includes comparing a virtual address to a plurality of virtual address bit segments to identify a match. Each virtual address bit segment is associated with a page table level that has a page table base address. A designated page table base address is received in response to the match. The page table walk starts at the designated page table, thereby skipping over earlier page tables.	12-27-2012
20120324164	Programmable Memory Address - A method includes storing defined memory address segments and defined memory address segment attributes for a processor. The processor is operated in accordance with the defined memory address segments and defined memory address segment attributes.	12-20-2012
20120323552	Apparatus and Method for Hardware Initiation of Emulated Instructions - A method of emulating an instruction includes identifying a fault instruction. The fault instruction is saved in a register. The fault instruction is associated with a software emulated operation. The software emulated operation is initiated with an access to the fault instruction in the register.	12-20-2012
20120290780	Multithreaded Operation of A Microprocessor Cache - A method of fetching data from a cache begins by preparing to fetch a first set of cache ways for a first data word of a first cache line a using a first thread. Next, in parallel, a second set cache ways for a first data word of a second cache line is prepared to be fetched using a second thread, and data associated with each cache way of the first set of cache ways are fetched using the first thread. Also performed in parallel, data associated with each cache way of the second set of cache ways is fetched using the second thread and a third set of cache ways for a second data word of the first cache line is prepared to be fetched using the first thread based on a selected cache way, the selected cache way selected from the first set of cache ways.	11-15-2012
20120221838	SOFTWARE PROGRAMMABLE HARDWARE STATE MACHINES - The present invention provides software programmable hardware state machines to detect a cause of an error in a processor and prevent the error from occurring. A processor core is provided that includes an execution unit, a programmable mask register and a buffer that stores values representing instructions dispatched to the execution unit. The processor core also includes control logic to determine whether there is a match between a sequence in the mask register and a sequence in the buffer and, upon detecting a match, to generate control signals to perform a desired action. The desired action prevents an unwanted change from occurring to the architectural state of the processor. The processor core further comprises a programmable fix register. In an embodiment, the control logic generates the control signals based on control bits stored in the fix register.	08-30-2012
20120082167	Method and Apparatus for Predicting Characteristics of Incoming Data Packets to Enable Speculative Processing to Reduce Processor Latency - A system for processing data packets in a data packet network has at least one input port for receiving data packets, at least one output port for sending out data packets, a processor for processing packet data, and a packet predictor for predicting a future packet based on a received packet, such that at least some processing for the predicted packet may be accomplished before the predicted packet actually arrives at the system. The system is used in preferred embodiments in Internet routers.	04-05-2012
20120036380	Clock Ratio Controller For Dynamic Voltage and Frequency Scaled Digital Systems, and Applications Thereof - The present invention provides a clock ratio controller for dynamic voltage and frequency scaled digital systems, and applications thereof. In an embodiment, a digital system is provided that includes a first digital circuit that operates at a first rate determined by a first clock signal and a second digital circuit that operates at a second rate determined by a second clock signal. The first digital circuit is coupled to the second digital circuit by a bus that is used for communications between the first digital circuit and the second digital circuit. A clock ratio controller is used to adjust the frequency of the first clock signal and/or the second clock signal in response to a power management signal without causing a loss of synchronization between the first digital circuit and the second digital circuit.	02-09-2012
20120030392	System and Method for Automatic Hardware Interrupt Handling - A processing system is provided consisting of an interrupt pin, multiple registers, a stack pointer, and an automatic interrupt system. The multiple registers store a number of processor states values. When the system detects an interrupt on the interrupt pin the system prepares to enter an exception mode where the automatic interrupt system causes an interrupt vector to be fetched, the stack pointer to be updated, and the processor state values to be read in parallel from the registers and stored in memory locations based on the updated stack pointer, prior to the execution of an interrupt service routine. A method for automatic hardware interrupt handling is also presented	02-02-2012
20110153945	Apparatus and Method for Controlling the Exclusivity Mode of a Level-Two Cache - A method of controlling the exclusivity mode of a level-two cache includes generating level-two cache exclusivity control information at a processor in response to an exclusivity mode indicator, and utilizing the level-two cache exclusivity control information to configure the exclusivity mode of the level-two cache.	06-23-2011
20110138349	Automated Digital Circuit Design Tool That Reduces or Eliminates Adverse Timing Constraints Due To An Inherent Clock Signal Skew, and Applications Thereof - The present invention provides an automated digital circuit design tool that reduces or eliminates adverse timing constraints due to an inherent clock signal skew, and applications thereof In an embodiment, an automated design tool according to the invention generates a clocking system that includes a clock signal generator, control logic, enable logic, and at least one clock gater. The clock signal generator generates a clock signal that is distributed to various logic blocks of the digital circuit using a buffered clock tree. The enable logic receives input values from the control logic and provides a control signal to the clock gater. When enabled, the clock gater allows a clock signal to pass through to multiple registers. An early clock signal is provided to register(s) in the control logic, which allows for an increased clock frequency while still meeting timing constraints.	06-09-2011
20110099353	System and Method for Extracting Fields from Packets Having Fields Spread Over More Than One Register - Systems and methods that allow for extracting a field from data stored in a pair of registers using two instructions. A first instruction extracts any part of the field from a first register designated as a first source register, and executes a second instruction extracting any part of the field from a second general register designated as a second source register. The second instruction inserts any extracted field parts in a result register.	04-28-2011
20110055497	Alignment and Ordering of Vector Elements for Single Instruction Multiple Data Processing - The present invention provides alignment and ordering of vector elements for SIMD processing. In the alignment of vector elements for SIMD processing, one vector is loaded from a memory unit into a first register and another vector is loaded from the memory unit into a second register. The first vector contains a first byte of an aligned vector to be generated. Then, a starting byte specifying the first byte of an aligned vector is determined. Next, a vector is extracted from the first register and the second register beginning from the first bit in the first byte of the first register continuing through the bits in the second register. Finally, the extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for SIMD processing. In the ordering of vector elements for SIMD processing, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register. Then, a subset of elements are selected from the first register and the second register. The elements from the subset are then replicated into the elements in the third register in a particular order suitable for subsequent SIMD vector processing.	03-03-2011
20110055488	HORIZONTALLY-SHARED CACHE VICTIMS IN MULTIPLE CORE PROCESSORS - A processor includes multiple processor core units, each including a processor core and a cache memory. Victim lines evicted from a first processor core unit's cache may be stored in another processor core unit's cache, rather than written back to system memory. If the victim line is later requested by the first processor core unit, the victim line is retrieved from the other processor core unit's cache. The processor has low latency data transfers between processor core units. The processor transfers victim lines directly between processor core units' caches or utilizes a victim cache to temporarily store victim lines while searching for their destinations. The processor evaluates cache priority rules to determine whether victim lines are discarded, written back to system memory, or stored in other processor core units' caches. Cache priority rules can be based on cache coherency data, load balancing schemes, and architectural characteristics of the processor.	03-03-2011
20110040956	Symmetric Multiprocessor Operating System for Execution On Non-Independent Lightweight Thread Contexts - A multiprocessing system is disclosed. The system includes a multithreading microprocessor, including a plurality of thread contexts (TCs), each comprising a first control indicator for controlling whether the TC is exempt from servicing interrupt requests to an exception domain for the plurality of TCs, and a virtual processing element (VPE), comprising the exception domain, configured to receive the interrupt requests, wherein the interrupt requests are non-specific to the plurality of TCs, wherein the VPE is configured to select a non-exempt one of the plurality of TCs to service each of the interrupt requests, the VPE further comprising a second control indicator for controlling whether the VPE is enabled to select one of the plurality of TCs to service the interrupt requests. The system also includes a multiprocessor operating system (OS), configured to initially set the second control indicator to enable the VPE to service the interrupts, and further configured to schedule execution of threads on the plurality of TCs, wherein each of the threads is configured to individually disable itself from servicing the interrupts by setting the first control indicator, rather than by clearing the second control indicator.	02-17-2011
20100312991	Microprocessor with Compact Instruction Set Architecture - A re-encoded instruction set architecture (ISA) provides smaller bit-width instructions or a combination of smaller and larger bit-width instructions to improve instruction execution efficiency and reduce code footprint. The ISA can be re-encoded from a legacy ISA having larger bit-width instructions, and the re-encoded ISA can maintain assembly-level compatibility with the ISA from which it is derived. In addition, the re-encoded ISA can have new and different types of additional instructions, including instructions with encoded arguments determined by statistical analysis and instructions that have the effect of combinations of instructions.	12-09-2010
20100306513	Processor Core and Method for Managing Program Counter Redirection in an Out-of-Order Processor Pipeline - A processor core and method for managing program counter redirection in an out-of-order processor pipeline. In one embodiment, the pipeline of the processor core includes a front-end instruction fetch portion, a back-end instruction execution portion, and pipeline control logic. Operation of the instruction fetch portion is decoupled from operation of the instruction execution portion. Following detection of a control transfer misprediction, operation of the instruction fetch portion is halted and instructions residing in the instruction fetch portion are invalidated. When the instruction associated with the misprediction reaches a selected pipeline stage, instructions residing in the instruction execution portion of the pipeline are invalidated and the flow of instructions from the instruction fetch portion to the instruction execution portion of the processor pipeline is restarted. A mispredict instruction identification checker and instruction identification tags are used to determine if a control transfer instruction is permitted to redirect instruction fetching.	12-02-2010
20100287359	VARIABLE REGISTER AND IMMEDIATE FIELD ENCODING IN AN INSTRUCTION SET ARCHITECTURE - A method and apparatus provide means for compressing instruction code size. An Instruction Set Architecture (ISA) encodes instructions compact, usual or extended bit lengths. Commonly used instructions are encoded having both compact and usual bit lengths, with compact or usual bit length instructions chosen based on power, performance or code size requirements. Instructions of the ISA can be used in both privileged and non-privileged operating modes of a microprocessor. The instruction encodings can be used interchangeably in software applications. Instructions from the ISA may be executed on any programmable device enabled for the ISA, including a single instruction set architecture processor or a multi-instruction set architecture processor.	11-11-2010
20100199054	System and Method for Improving Memory Transfer - System and method for performing a high-bandwidth memory copy. Memory transfer instructions allow for copying of data from a first memory location to a second memory location without the use of load and store word instructions thereby achieving a high-bandwidth copy. In one embodiment, the method includes the steps of (1) decoding a destination address from a first memory transfer instruction, (2) storing the destination address in a register in the bus interface unit, (3) decoding a source address from a second memory transfer instruction, and (4) copying the contents of a memory location specified by the source memory address to a memory location specified by the contents of the register. Other methods and a microprocessor system are also presented.	08-05-2010
20100115244	MULTITHREADING MICROPROCESSOR WITH OPTIMIZED THREAD SCHEDULER FOR INCREASING PIPELINE UTILIZATION EFFICIENCY - A multithreading processor for concurrently executing multiple threads is provided. The processor includes an execution pipeline and a thread scheduler that dispatches instructions of the threads to the execution pipeline. The execution pipeline execution pipeline is configured for generating a thread context (TC) flush indicator associated with a thread context when one or more instructions of the thread context would stall in the execution pipeline. One or more instructions in the pipeline of the thread context associated with the thread context flush signal can be flushed or nullified.	05-06-2010
20100115243	Apparatus, Method and Instruction for Initiation of Concurrent Instruction Streams in a Multithreading Microprocessor - A fork instruction for execution on a multithreaded microprocessor and occupying a single instruction issue slot is disclosed. The fork instruction, executing in a parent thread, includes a first operand specifying the initial instruction address of a new thread and a second operand. The microprocessor executes the fork instruction by allocating context for the new thread, copying the first operand to a program counter of the new thread context, copying the second operand to a register of the new thread context, and scheduling the new thread for execution. If no new thread context is free for allocation, the microprocessor raises an exception to the fork instruction. The fork instruction is efficient because it does not copy the parent thread general purpose registers to the new thread. The second operand is typically used as a pointer to a data structure in memory containing initial general purpose register set values for the new thread.	05-06-2010
20100103938	Context Sharing Between A Streaming Processing Unit (SPU) and A Packet Management Unit (PMU) In A Packet Processing Environment - A context-selection mechanism is provided for selecting a best context from a pool of contexts for processing a data packet. The context selection mechanism comprises, an interface for communicating with a multi-streaming processor; circuitry for computing input data into a result value according to logic rule and for selecting a context based on the computed value and a loading mechanism for preloading the packet information into the selected context for subsequent processing. The computation of the input data functions to enable identification and selection of a best context for processing a data packet according to the logic rule at the instant time such that a multitude of subsequent context selections over a period of time acts to balance load pressure on functional units housed within the multi-streaming processor and required for packet processing. In preferred aspects, programmable singular or multiple predictive rules of logic are utilized in the selection process.	04-29-2010
20100070257	Methods, Systems, and Computer Program Products for Evaluating Electrical Circuits From Information Stored in Simulation Dump Files - Disclosed are methods, systems, and computer program products for evaluating performance aspects of electrical circuits, and particularly digital logic circuits. An exemplary method comprises obtaining access to a simulation dump file comprising state indications of the values of a plurality of signals of an electrical circuit at a plurality of simulation time points, and receiving an evaluation task that defines an output based on one or more input signals, with each input signal being a signal for which state indications are provided in the simulation dump file. The method further comprises generating, from the simulation dump file, one or more state representations for the input signals of the evaluation task, with each state representation being representative of the state of an input signal over a period of simulation time, and generating values of the output of the evaluation task at a plurality of simulation time points from the state representations.	03-18-2010
20100049953	DATA CACHE RECEIVE FLOP BYPASS - A microprocessor includes an N-way cache and a logic block that selectively enables and disables the N-way cache for at least one clock cycle if a first register load instructions and a second register load instruction, following the first register load instruction, are detected as pointing to the same index line in which the requested data is stored. The logic block further provides a disabling signal to the N-way cache for at least one clock cycle if the first and second instructions are detected as pointing to the same cache way.	02-25-2010
20100049912	DATA CACHE WAY PREDICTION - A microprocessor includes one or more N-way caches and a way prediction logic that selectively enables and disables the cache ways so as to reduce the power consumption. The way prediction logic receives an address and predicts in which one of the cache ways the data associated with the address is likely to be stored. The way prediction logic causes an enabling signal to be supplied only to the way predicted to contain the requested data. The remaining (N−1) of the cache ways do not receive the enabling signal. The power consumed by the cache is thus significantly reduced.	02-25-2010
20100031005	Instruction Encoding For System Register Bit Set And Clear - An instruction encoding architecture is provided for a microprocessor to allow atomic modification of privileged architecture registers. The instructions include an opcode that designates to the microprocessor that the instructions are to execute in privileged (kernel) state only, and that the instructions are to communicate with privileged control registers, a field for designating which of a plurality of privileged architecture registers is to be modified, a field for designating which bit fields within the designated privileged architecture register is to be modified, and a field to designate whether the whether the designated bit fields are to be set or cleared. The instruction encoding allows a single instruction to atomically set or clear bit fields within privileged architecture registers, without reading the privileged architecture registers into a general purpose register. In addition, the instruction encoding allows a programmer to specify whether the previous content of a privileged architecture register is to be saved to a general purpose register during the atomic modification.	02-04-2010
20100011166	Data Cache Virtual Hint Way Prediction, and Applications Thereof - A virtual hint based data cache way prediction scheme, and applications thereof. In an embodiment, a processor retrieves data from a data cache based on a virtual hint value or an alias way prediction value and forwards the data to dependent instructions before a physical address for the data is available. After the physical address is available, the physical address is compared to a physical address tag value for the forwarded data to verify that the forwarded data is the correct data. If the forwarded data is the correct data, a hit signal is generated. If the forwarded data is not the correct data, a miss signal is generated. Any instructions that operate on incorrect data are invalidated and/or replayed.	01-14-2010
20100005247	Method and Apparatus for Global Ordering to Insure Latency Independent Coherence - A method and apparatus is described for insuring coherency between memories in a multi-agent system where the agents are interconnected by one or more fabrics. A global arbiter is used to segment coherency into three phases: request; snoop; and response, and to apply global ordering to the requests. A bus interface having request, snoop, and response logic is provided for each agent. A bus interface having request, snoop and response logic is provided for the global arbiter, and a bus interface is provided to couple the global arbiter to each type of fabric it is responsible for. Global ordering and arbitration logic tags incoming requests from the multiple agents and insures that snoops are responded to according to the global order, without regard to latency differences in the fabrics.	01-07-2010
20090327649	Three-Tiered Translation Lookaside Buffer Hierarchy in a Multithreading Microprocessor - A three-tiered TLB architecture in a multithreading processor that concurrently executes multiple instruction threads is provided. A macro-TLB caches address translation information for memory pages for all the threads. A micro-TLB caches the translation information for a subset of the memory pages cached in the macro-TLB. A respective nano-TLB for each of the threads caches translation information only for the respective thread. The nano-TLBs also include replacement information to indicate which entries in the nano-TLB/micro-TLB hold recently used translation information for the respective thread. Based on the replacement information, recently used information is copied to the nano-TLB if evicted from the micro-TLB.	12-31-2009
20090313457	System and Method for Extracting Fields from Packets Having Fields Spread Over More Than One Register - Systems and methods that allow for extracting a field from data stored in a pair of registers using two instructions. A first instruction extracts any part of the field from a first register designated as a first source register, and executes a second instruction extracting any part of the field from a second general register designated as a second source register. The second instruction inserts any extracted field parts in a result register.	12-17-2009
20090282220	Microprocessor with Compact Instruction Set Architecture - A re-encoded instruction set architecture (ISA) provides smaller bit-width instructions or a combination of smaller and larger bit-width instructions to improve instruction execution efficiency and reduce code footprint. The ISA can be re-encoded from a legacy ISA having larger bit-width instructions and can be used to unify one or more ISA extensions such as application specific ASEs. The re-encoded ISA maintains assembly-level compatibility with the ISA from which it is derived. In addition, the re-encoded ISA can have new and different types of additional instructions.	11-12-2009
20090271592	Apparatus For Storing Instructions In A Multithreading Microprocessor - A circuit for selecting one of N requesters in a round-robin fashion is disclosed. The circuit 1-bit left rotatively increments a first addend by a second addend to generate a sum that is ANDed with the inverse of the first addend to generate a 1-hot vector indicating which of the requestors is selected next. The first addend is an N-bit vector where each bit is false if the corresponding requester is requesting access to a shared resource. The second addend is a 1-hot vector indicating the last selected requestor. A multithreading microprocessor dispatch scheduler employs the circuit for N concurrent threads each thread having one of P priorities. The dispatch scheduler generates P N-bit 1-hot round-robin bit vectors, and each thread's priority is used to select the appropriate round-robin bit from P vectors for combination with the thread's priority and an issuable bit to create a dispatch level used to select a thread for instruction dispatching.	10-29-2009
20090249351	Round-Robin Apparatus and Instruction Dispatch Scheduler Employing Same For Use In Multithreading Microprocessor - An apparatus for selecting one of N requesters of a shared resource in a round-robin fashion is disclosed. One or more of the N requestors may be disabled from being selected in a selection cycle. The apparatus includes a first input that receives a first value specifying which of the N requestors was last selected. A second input receives a second value specifying which of the N requestors is enabled to be selected. A barrel incrementer, coupled to receive the first and second inputs, 1-bit left-rotatively increments the second value by the first value to generate a sum. Combinational logic, coupled to the barrel incrementer, generates a third value specifying which of the N requestors is selected next.	10-01-2009
20090249046	APPARATUS AND METHOD FOR LOW OVERHEAD CORRELATION OF MULTI-PROCESSOR TRACE INFORMATION - A method of coordinating trace information in a multiprocessor system includes receiving processor trace information from a set of processors. The processor trace information from each processor includes a processor identity and a coherence indicator that demarks selective shared memory transactions. Coherence manager trace information is generated for each of the processors. The coherence manager trace information for each processor includes trace metrics and a coherence indicator.	10-01-2009
20090249045	APPARATUS AND METHOD FOR CONDENSING TRACE INFORMATION IN A MULTI-PROCESSOR SYSTEM - A computer readable storage medium includes executable instructions to characterize a coherency controller. The executable instructions define ports to receive processor trace information from a set of processors. The processor trace information from each processor includes a processor identity and a condensed coherence indicator. Circuitry produces a trace stream with trace metrics and condensed coherence indicators.	10-01-2009
20090249039	Providing Extended Precision in SIMD Vector Arithmetic Operations - The present invention provides extended precision in SIMD arithmetic operations in a processor having a register file and an accumulator. A first set of data elements and a second set of data elements are loaded into first and second vector registers, respectively. Each data element comprises N bits. Next, an arithmetic instruction is fetched from memory. The arithmetic instruction is decoded. Then, the first vector register and the second vector register are read from the register file. The present invention executes the arithmetic instruction on corresponding data elements in the first and second vector registers. The resulting element of the execution is then written into the accumulator. Then, the resulting element is transformed into an N-bit width element and written into a third register for further operation or storage in memory. The transformation of the resulting element can include, for example, rounding, clamping, and/or shifting the element.	10-01-2009
20090248988	MECHANISM FOR MAINTAINING CONSISTENCY OF DATA WRITTEN BY IO DEVICES - A multi-core microprocessor includes, in part, a cache coherence manager that maintains coherence among the multitude of microprocessor cores, and an I/O coherence unit that maintains coherent traffic between the I/O devices and the multitude of processing cores of the microprocessor. The I/O coherence unit stalls non-coherent I/O write requests until it receives acknowledgement that all pending coherent I/O write requests issued prior to the non-coherence I/O write requests have been made visible to the processing cores. The I/O coherence unit ensures that MMIO read responses are not delivered to the processing cores until after all previous I/O write requests are made visible to the processing cores. Deadlock conditions are prevented by limiting MMIO requests in such a way that they can never block I/O write requests from completing.	10-01-2009
20090198986	Configurable Instruction Sequence Generation - A configurable instruction set architecture is provided whereby a single virtual instruction may be used to generate a sequence of instructions. Dynamic parameter substitution may be used to substitute parameters specified by a virtual instruction into instructions within a virtual instruction sequence.	08-06-2009
20090193177	Virtual Processor Based Security For On-Chip Memory, and Applications Thereof - A processor-based method, system and apparatus to comprise a method, system and apparatus to access a memory location in an on-chip memory based on a virtual processing element identification associated with an instruction. The system comprises multiple virtual processing elements, an access list and a comparator coupled to the memory and the access list. In response to an instruction from a virtual processing element to access a memory location in the memory, the comparator compares a first virtual processing identification associated with the instruction to a second virtual processing identification stored in the access list and grants access to the virtual processing element to read from or write to the memory location if the first virtual processing element identification is equal to the second virtual processing element identification. The data in the memory is allocated and de-allocated by software. In one embodiment, the access list is instantiated in hardware and cannot be read from or written to by software. A virtual processing element comprises multiple hardware thread contexts with each thread context being associated with a distinct register file.	07-30-2009
20090164733	APPARATUS AND METHOD FOR CONTROLLING THE EXCLUSIVITY MODE OF A LEVEL-TWO CACHE - A method of controlling the exclusivity mode of a level-two cache includes generating level-two cache exclusivity control information at a processor in response to an exclusivity mode indicator, and utilizing the level-two cache exclusivity control information to configure the exclusivity mode of the level-two cache.	06-25-2009
20090158078	Clock ratio controller for dynamic voltage and frequency scaled digital systems, and applications thereof - The present invention provides a clock ratio controller for dynamic voltage and frequency scaled digital systems, and applications thereof. In an embodiment, a digital system is provided that includes a first digital circuit that operates at a first rate determined by a first clock signal and a second digital circuit that operates at a second rate determined by a second clock signal. The first digital circuit is coupled to the second digital circuit by a bus that is used for communications between the first digital circuit and the second digital circuit. A clock ratio controller is used to adjust the frequency of the first clock signal and/or the second clock signal in response to a power management signal without causing a loss of synchronization between the first digital circuit and the second digital circuit.	06-18-2009
20090157981	COHERENT INSTRUCTION CACHE UTILIZING CACHE-OP EXECUTION RESOURCES - A multiprocessor system maintains cache coherence among processors in a coherent domain. Within the coherent domain, a first processor can receive a command to perform a cache maintenance operation. The first processor can determine whether the cache maintenance operation is a coherent operation. For coherent operations, the first processor sends a coherent request message for distribution to other processors in the coherent domain and can cancel execution of the cache maintenance operation pending receipt of intervention messages corresponding to the coherent request. The intervention messages can reflect a global ordering of coherence traffic in the multiprocessor system and can include instructions for maintaining a data cache and an instruction cache of the first processor. Cache maintenance operations that are determined to be non-coherent can be executed at the first processor without sending the coherent request.	06-18-2009
20090132841	Processor Accessing A Scratch Pad On-Demand To Reduce Power Consumption - The present invention provides processing systems, apparatuses, and methods that access a scratch pad on-demand to reduce power consumption. In an embodiment, an instruction fetch unit initiates an instruction fetch. When a scratch pad is enabled, an instruction is retrieved from the scratch pad in parallel with a translation of a virtual address to a physical address. If the physical address is associated with the scratch pad, the retrieved instruction is provided to an execution unit. Otherwise, the scratch pad is disabled to reduce power consumption and the instruction fetch is re-initiated. When the scratch pad is disabled, an instruction is retrieved from another instruction source, such as an instruction cache, in parallel with the translation of the virtual address to the physical address. If the physical address is associated with the scratch pad, the scratch pad is enabled and the instruction fetch is re-initiated.	05-21-2009
20090125660	Interrupt and Exception Handling for Multi-Streaming Digital Processors - A multi-streaming processor has a plurality of streams for streaming one or more instruction threads, a set of functional resources for processing instructions from streams, and interrupt handler logic. The logic detects and maps interrupts and exceptions to one or more specific streams. In some embodiments, one interrupt or exception may be mapped to two or more streams, and in others two or more interrupts or exceptions may be mapped to one stream. Mapping may be static and determined at processor design, programmable, with data stored and amendable, or conditional and dynamic, the interrupt logic executing an algorithm sensitive to variables to determine the mapping. Interrupts may be external interrupts generated by devices external to the processor software (internal) interrupts generated by active streams, or conditional, based on variables. After interrupts are acknowledged, streams to which interrupts or exceptions are mapped are vectored to appropriate service routines. In a synchronous method, no vectoring occurs until all streams to which an interrupt is mapped acknowledge the interrupt.	05-14-2009
20090113365	Automated digital circuit design tool that reduces or eliminates adverse timing constraints due to an inherent clock signal skew, and applications thereof - The present invention provides an automated digital circuit design tool that reduces or eliminates adverse timing constraints due to an inherent clock signal skew, and applications thereof. In an embodiment, an automated design tool according to the invention generates a clocking system that includes a clock signal generator, control logic, enable logic, and at least one clock gater. The clock signal generator generates a clock signal that is distributed to various logic blocks of the digital circuit using a buffered clock tree. The enable logic receives input values from the control logic and provides a control signal to the clock gater. When enabled, the clock gater allows a clock signal to pass through to multiple registers. An early clock signal is provided to register(s) in the control logic, which allows for an increased clock frequency while still meeting timing constraints.	04-30-2009
20090113180	Fetch Director Employing Barrel-Incrementer-Based Round-Robin Apparatus For Use In Multithreading Microprocessor - A fetch director in a multithreaded microprocessor that concurrently executes instructions of N threads is disclosed. The N threads request to fetch instructions from an instruction cache. In a given selection cycle, some of the threads may not be requesting to fetch instructions. The fetch director includes a circuit for selecting one of threads in a round-robin fashion to provide its fetch address to the instruction cache. The circuit 1-bit left rotatively increments a first addend by a second addend to generate a sum that is ANDed with the inverse of the first addend to generate a 1-hot vector indicating which of the threads is selected next. The first addend is an N-bit vector where each bit is false if the corresponding thread is requesting to fetch instructions from the instruction cache. The second addend is a 1-hot vector indicating the last selected thread. In one embodiment threads with an empty instruction buffer are selected at highest priority; a last dispatched but not fetched thread at middle priority; all other threads at lowest priority. The threads are selected round-robin within the highest and lowest priorities.	04-30-2009
20090089510	SPECULATIVE READ IN A CACHE COHERENT MICROPROCESSOR - A cache coherence manager, disposed in a multi-core microprocessor, includes a request unit, an intervention unit, a response unit and an interface unit. The request unit receives coherent requests and selectively issues speculative requests in response. The interface unit selectively forwards the speculative requests to a memory. The interface unit includes at least three tables. Each entry in the first table represents an index to the second table. Each entry in the second table represents an index to the third table. The entry in the first table is allocated when a response to an associated intervention message is stored in the first table but before the speculative request is received by the interface unit. The entry in the second table is allocated when the speculative request is stored in the interface unit. The entry in the third table is allocated when the speculative request is issued to the memory.	04-02-2009
20090083493	SUPPORT FOR MULTIPLE COHERENCE DOMAINS - A number of coherence domains are maintained among the multitude of processing cores disposed in a microprocessor. A cache coherency manager defines the coherency relationships such that coherence traffic flows only among the processing cores that are defined as having a coherency relationship. The data defining the coherency relationships between the processing cores is optionally stored in a programmable register. For each source of a coherent request, the processing core targets of the request are identified in the programmable register. In response to a coherent request, an intervention message is forwarded only to the cores that are defined to be in the same coherence domain as the requesting core. If a cache hit occurs in response to a coherent read request and the coherence state of the cache line resulting in the hit satisfies a condition, the requested data is made available to the requesting core from that cache line.	03-26-2009
20090080651	SEMICONDUCTOR WITH HARDWARE LOCKED INTELLECTUAL PROPERTY AND RELATED METHODS - A computer readable medium includes executable instructions to describe an intellectual property core with a key check mechanism configured to compare an external key with an internal key in response to a specified event. A pending instruction is executed in response to a match between the external key and the internal key. An unexpected act is performed in response to a mismatch between the external key and the internal key.	03-26-2009
20090077321	Microprocessor with Improved Data Stream Prefetching - A microprocessor coupled to a system memory by a bus includes an instruction decode unit that decodes an instruction that specifies a data stream in the system memory and a stream prefetch priority. The microprocessor also includes a load/store unit that generates load/store requests to transfer data between the system memory and the microprocessor. The microprocessor also includes a stream prefetch unit that generates a plurality of prefetch requests to prefetch the data stream from the system memory into the microprocessor. The prefetch requests specify the stream prefetch priority. The microprocessor also includes a bus interface unit (BIU) that generates transaction requests on the bus to transfer data between the system memory and the microprocessor in response to the load/store requests and the prefetch requests. The BIU prioritizes the bus transaction requests for the prefetch requests relative to the bus transaction requests for the load/store requests based on the stream prefetch priority.	03-19-2009
20090073974	Method and Apparatus for Predicting Characteristics of Incoming Data Packets to Enable Speculative Processing To Reduce Processor Latency - A system for processing data packets in a data packet network has at least one input port for receiving data packets, at least one output port for sending out data packets, a processor for processing packet data, and a packet predictor for predicting a future packet based on a received packet, such that at least some processing for the predicted packet may be accomplished before the predicted packet actually arrives at the system. The system is used in preferred embodiments in Internet routers.	03-19-2009
20090063881	Low-overhead/power-saving processor synchronization mechanism, and applications thereof - A low-overhead/power-saving processor synchronization mechanism, and applications thereof. In an embodiment, the present invention provides a processor having a load-linked register. The processor implements instructions related to the load-linked register. A first instruction, when executed by the processor, causes the processor to load a first value specified by the first instruction in a first register of a register file and to load a second value in the load-linked register. A second instruction, when executed by the processor, causes the processor to suspend execution of a stream of instructions associated with the load-linked register if the second value in the load-linked register is unaltered until the second value in the load-linked register is altered. A third instruction, when executed by the processor, causes the processor to conditionally move a third value to a memory location specified by the third instruction and to move a value representing the state of the load-linked register to the third register.	03-05-2009
20090049312	Power Management for System Having One or More Integrated Circuits - Power management control software including power management policies is provided with those policies divided into observation code and response code. When predetermined execution points within the operating system	02-19-2009
20090037886	APPARATUS AND METHOD FOR EVALUATING A FREE-RUNNING TRACE STREAM - A system includes a processor to generate a free-running trace stream and a probe with a real-time decoder to dynamically detect a trigger included in the free-running trace stream.	02-05-2009
20090037704	TRACE CONTROL FROM HARDWARE AND SOFTWARE - A system and method for program counter and data tracing is disclosed. The tracing mechanism of the present invention enables increased visibility into the hardware and software state of the processor core.	02-05-2009
20080320233	Reduced Handling of Writeback Data - The complexity of the logic of the cache coherency manager unit is reduced by leveraging the data path for intervention messages and responses to carry data associated with writeback requests. A processor core unit sends a writeback request to the cache coherency manager unit. The request does not include the writeback data. Upon receiving an intervention message associated with the writeback request, the processor core unit provides an intervention message response to the cache coherency manager unit indicating that the writeback operation should not be cancelled. The intervention message response includes the writeback data. Because the cache coherency manager already requires a data path to handle data transfers between processor core units, little or no additional overhead needs to be added to the cache coherency manager to handle data associated with writeback request.	12-25-2008
20080320232	Preventing Writeback Race in Multiple Core Processors - A processor prevents writeback race condition errors by maintaining responsibility for data until the writeback request is confirmed by an intervention message from a cache coherency manager. If a request for the same data arrives before the intervention message, the processor core unit provides the requested data and cancels the pending writeback request. The cache coherency data associated with cache lines indicates whether a request for data has been received prior to the intervention message associated with the writeback request. The cache coherency data of a cache line has a value of “modified” when the writeback request is initiated. When the intervention message associated with the writeback request is received, the cache lines's cache coherency data is examined. A change in the cache coherency data from the value of “modified” indicates that the request for data has been received prior to the intervention and the writeback request should be cancelled.	12-25-2008
20080320231	Avoiding Livelock Using Intervention Messages in Multiple Core Processors - Livelocks are prevented in multiple core processors by canceling data access requests upon determining that they conflict with other data access requests. A requesting processor core sends a data access request potentially causing livelock to a cache coherency manager. A cache coherency manager receives data access requests from multiple processor. The cache coherency manager sends intervention messages to all of the processor cores in response to all data access requests that may cause livelock. Upon receiving an intervention message from the cache coherency manager, the processor core determines if the intervention message corresponds with any of its own pending data access requests. If the intervention message is associated with a data access request conflicting with one of its own pending data access requests, the processor core responds to the invention message by directing the cache coherency manager to cancel its own conflicting pending data access request.	12-25-2008
20080320230	Avoiding Livelock Using A Cache Manager in Multiple Core Processors - Livelocks are prevented in multiple core processors by verifying that a data access request is still valid before sending messages to processor cores that may cause other data access requests to fail. A cache coherency manager receives data access requests from multiple processor cores. Upon receiving a data access request that may cause a livelock, the cache coherency manager first sends an intervention message back to the requesting processor core to confirm that this data access request will succeed. If the requesting processor core determines that the data access request is still valid, it directs the cache coherency manager to proceed with the data access request. The cache coherency manager may then send intervention messages to other processor cores to complete the data access request. If the requesting processor core determines that the data access request is invalid, it directs the cache coherency manager to abandon the data access request.	12-25-2008
20080282087	System debug and trace system and method, and applications thereof - An embedded system or system on chip (SoC) includes a secure JTAG system and method to provide secure on-chip control, capture, and export of on chip information in an embedded environment to a probe. In one embodiment, the system comprises encryption logic associated with a JTAG subsystem and decryption logic in the probe for encrypted JTAG read traffic. Inverted encryption/decryption logic provides bi-directional encryption and decryption of JTAG traffic. Encrypted information includes both authentication of valid probe/target interface and encryption of debug data.	11-13-2008
20080270757	Fetch and Dispatch Disassociation Apparatus for Multistreaming Processors - A dynamic multistreaming processor has instruction queues, each instruction queue corresponding to an instruction stream, and execution units. The dynamic multistreaming processor also has a dispatch stage to select at least one instruction from one of the instruction queues and to dispatch the selected at least one instruction to one of the execution units. Lastly the dynamic multistreaming processor has a queue counter, associated with each instruction queue, for indicating the number of instructions in each queue, and a fetch counter, associated with each instruction queue, for indicating an address from which to obtain instructions when the associated instruction queue is not full. The dynamic multistreaming processor might also have fetch counters for indicating a next instruction address from which to obtain at least one instruction when the associated instruction queue is not full. The dynamic multistreaming processor could also have a second counter for indicating a next instruction address.	10-30-2008
20080222589	Protecting Trade Secrets During the Design and Configuration of an Integrated Circuit Semiconductor Design - A system and method for facilitating the design process of an integrated circuits (IC) is described. The system and method utilizes a plurality of repositories, rules engines and design and verification tools to analyze the workload and automatically produce a hardened GDSII description or other representation of the IC. Synthesizable RTL is securely maintained on a server in a data center while providing designers graphical access to customizable IP block by way of a network portal.	09-11-2008
20080222581	Remote Interface for Managing the Design and Configuration of an Integrated Circuit Semiconductor Design - A software system for facilitating the design process and minimizing the time and effort required to complete the design and fabrication of an integrated circuits (IC) is described. The software system utilizes a data center having a plurality of repositories, rules engines and design and verification tools to automatically produce a hardened GDSII description or other representation of the device in response to the formation of a electronic license agreement. Designers select contractual terms for incorporating third party intellectual property and then design and initiate manufacture of the IC by way of a network portal.	09-11-2008
20080222580	SYSTEM AND METHOD FOR MANAGING THE DESIGN AND CONFIGURATION OF AN INTEGRATED CIRCUIT SEMICONDUCTOR DESIGN - A system and methods that facilitate the design process and minimize the time and effort required to complete the design and fabrication of an integrated circuits (IC) are described. The system and method utilize a plurality of repositories, rules engines and design and verification tools to analyze the workload and automatically produce a hardened GDSII description or other representation of the device. The system and method securely maintains synthesizable RTL on a server in a data center while providing designers access to portions of the mechanism by way of a network portal.	09-11-2008
20080215857	Method For Latest Producer Tracking In An Out-Of-Order Processor, And Applications Thereof - Methods for latest producer tracking in a processor. In one embodiment, the method includes the steps of (1) writing a physical register identification value in a first register rename map location specified by a first instruction, (2) writing a first in-register status value in a second register rename map location specified by the first instruction, (3) writing a producer tracking status value at a producer tracking map location specified by the physical register identification value, and (4) modifying, upon graduation of the first instruction, the first in-register status value only if the producer tracking map location stores the producer tracking status value written in step (3). Other methods are also presented.	09-04-2008

Patent applications by MIPS Technologies, Inc.

Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

MIPS Technologies, Inc.