Mill Computing, Inc. Patent applications |
Patent application number | Title | Published |
20160132331 | Computer Processor Employing Instruction Block Exit Prediction - A computer processor is provided that executes sequences of instructions stored in memory. The sequences of instructions are organized as one or more instruction blocks each having an entry point and at least one exit point offset from the entry point. An apparatus for predicting control flow through sequences of instructions includes a table storing a plurality of entries each associated with an instruction block or part thereof. At least one entry of the table corresponding to a given instruction block or part thereof includes a predictor corresponding to a predicted execution path that exits the given instruction block or part thereof. The table is queried in order to generate a chain of predictors corresponding to a sequence of instruction blocks or parts thereof that is predicted to be executed by the computer processor. | 05-12-2016 |
20150370738 | Computer Processor Employing Split Crossbar Circuit For Operand Routing And Slot-Based Organization Of Functional Units - A computer processor including a plurality of functional units that performs operations that produce result operands at different characteristic latencies over multiple cycles. An interconnect network provides data paths for transfer of operand data between functional units. The interconnect network includes first and second crossbar parts. The first crossbar part is configured to route result operands produced with the lowest characteristic latency to any other functional unit. The second crossbar part is configured to route result operands with higher characteristic latency relative to the lowest characteristic latency to the first crossbar part where such result operands are in turn routed to any functional unit. In another aspect, the functional units can be organized as multiple slots where each slot can produce multiple result operands of different characteristic latencies in the same cycle, and wherein each slot employs separate result registers for each characteristic latency present on the slot. | 12-24-2015 |
20150370717 | Computer Processor Employing Byte-Addressable Dedicated Memory For Operand Storage - A computer processor including a first memory structure that operates over multiple cycles to temporarily store operands referenced by at least one instruction. A plurality of functional units performs operations that produce and access operands stored in the first memory structure. A second memory structure is provided, separate from the first memory structure. The second memory structure is configured as a dedicated memory for storage of operands copied from the first memory structure. The second memory structure is organized with a byte-addressable memory space and each operand stored in the second memory structure is accessed by a given byte address into the byte-addressable memory space. | 12-24-2015 |
20150370570 | Computer Processor Employing Temporal Addressing For Storage Of Transient Operands - A computer processor including a plurality of storage elements logically organized as a fixed length queue referenced by logical temporal addresses. The fixed length queue operates over multiple cycles to temporarily store operands referenced by at least one instruction utilizing the logical temporal addresses. A plurality of functional units performs operations over the multiple cycles, wherein the operations produce and access operands stored in the logical fixed length queue. Operands can be added to the front of the logical fixed length queue according to the temporal order that operands are produced by the functional units, and operands can drop from the end of the logical fixed length queue as operands are added to the front of the fixed length queue. A plurality of operands produced by the plurality of functional units (possibly with different latencies in producing such operands) can be added to the logical fixed length queue in a single cycle. A plurality of operands operated on by the functional units can be accessed from the logical fixed length queue in a single cycle. | 12-24-2015 |
20150347143 | Computer Processor Employing Instructions With Elided NOP Operations - A computer processor that operates on distinct first and second instruction streams that have a predefined timed semantic relationship. At least one of the first and second instruction streams includes variable-length instructions having a header and associated bundle bounded by a head end and a tail end. An alignment hole within the bundle encodes information representing at least one nop operation. The computer processor includes first and second multi-stage instruction processing components configured to process in parallel the first and second instruction streams. At least one of the first and second multi-stage instruction processing components includes an instruction buffer operably coupled to a decode stage. The decode stage is configured to process a variable-length instruction by isolating and interpreting the alignment hole of the variable-length instruction in order to initiate zero or more nop operations that follow the timed semantic relationship between the first and second instruction streams. | 12-03-2015 |
20150347142 | Computer Processor Employing Double-Ended Instruction Decoding - A computer processor including an instruction buffer configured to store at least one variable-length instruction having a bit bundle bounded by a head end and a tail end with a plurality of slots each defining a corresponding operation, wherein the plurality of slots and corresponding operations are logically partitioned into a plurality of distinct blocks with a first group of blocks extending from the head end of the bit bundle toward the tail end of the bit bundle and a second group of blocks extending from the tail end of the bit bundle toward the head end of the bit bundle, wherein the second group of blocks includes a tail end block disposed adjacent the tail end of the bit bundle. A decode stage is operably coupled to the instruction buffer and configured to process a given variable-length instruction stored by the instruction buffer by decoding at least one operation of a particular block belonging to the first group of blocks in parallel with decoding at least one operation of the tail end block. Additional aspects are described and claimed. | 12-03-2015 |
20150347130 | Computer Processor Employing Split-Stream Encoding - A computer processor is operably coupled to a memory system. The memory system is configured to store instruction blocks, wherein each instruction block is associated with an entry address and multiple distinct instruction streams within the instruction block. The multiple distinct instruction streams include at least a first instruction stream and a second instruction stream. The first instruction stream has an instruction order that logically extends in a direction of increasing memory space relative to the entry address of the instruction block. The second instruction stream has an instruction order that logically extends in a direction of decreasing memory space relative to the entry address of the instruction block. The computer processor includes a number of multi-stage instruction processing components corresponding to the multiple distinct instruction streams within each instruction block. The number of multi-stage instruction processing components are configured to access and process in parallel instructions belonging to multiple distinct instruction streams of a particular instruction block stored in the memory system. | 12-03-2015 |
20150220343 | Computer Processor Employing Phases of Operations Contained in Wide Instructions - A computer processor employs an instruction processing pipeline that processes a sequence of wide instructions each having an encoding that represents a plurality of different operations. The plurality of different operations of a given wide instruction are logically organized into a number of phases having a predefined ordering such that at least one operation of the given wide instruction produces data that is consumed by at least one other operation of the given wide instruction. In certain circumstances where stalling is absent, the plurality of different operations of the phases of the given wide instruction can be issued for execution by the instruction processing pipeline over a plurality of consecutive machine cycles. | 08-06-2015 |
20150205609 | Computer Processor Employing Operand Data With Associated Meta-Data - A computer processor is provided that employs a plurality of operand storage elements that store operand data values and associated meta-data as unitary operand data elements as well as at least one functional unit that performs operations that produce and access the unitary operand data elements stored in the plurality of operand storage elements. The meta-data associated with a given operand data value as part of a unitary operand data element can specify the type of the unitary operand data element (e.g., vector or scalar), elemental width and floating-point error flags. The meta-data can also be used to define special operand data values (e.g., Not-a-Result and None). The meta-data is useful in optimizing execution, such as in speculation and vectorized SIMD operations. The computer processor can also support a number of particular vector operations that are useful in optimizing execution of vectorized SIMD operations. | 07-23-2015 |
20150106598 | Computer Processor Employing Efficient Bypass Network For Result Operand Routing - A computer processor is provided with a plurality of functional units that performs operations specified by at least one instruction over multiple machine cycles, wherein the operations produce result operands. The processor also includes circuitry that generates result tags dynamically according to the number of operations that produce result operands in a given machine cycle. A bypass network is configured to provide data paths for transfer of operand data between the plurality of functional units according to the result tags. | 04-16-2015 |
20150106597 | Computer Processor With Deferred Operations - A computer processor and corresponding method of operation employ execution logic that includes at least one functional unit and operand storage that stores data that is produced and consumed by the at least one functional unit. The at least one functional unit is configured to execute a deferred operation whose execution produces result data. The execution logic further includes a retire station that is configured to store and retire the result data of the deferred operation in order to store such result data in the operand storage, wherein the retire of such result data occurs at a machine cycle following issue of the deferred operation as controlled by statically-assigned parameter data included in the encoding of the deferred operation. | 04-16-2015 |
20150106588 | Computer Processor Employing Hardware-Based Pointer Processing - A computer processor is provided with execution logic that performs operations that utilize pointers stored in memory. In one aspect, each pointer is associated with a predefined number of event bits. The execution logic processes the event bits of a given pointer in conjunction with processing a predefined pointer-related operation involving the given pointer in order to selectively output an event-of-interest signal. | 04-16-2015 |
20150106567 | Computer Processor Employing Cache Memory With Per-Byte Valid Bits - A computer processing system with a hierarchical memory system that associates a number of valid bits with each cache line of the hierarchical memory system. The valid bits are provided for each cache line stored in a respective cache and make explicit which bytes are semantically defined and which are not for the associated given cache line. Memory requests to the cache(s) of the hierarchical memory system can include an address specifying a requested cache line as well as a mask that includes a number of bits each corresponding to a different byte of the requested cache line. The values of the bits of the byte mask indicate which bytes of the requested cache line are to be returned from the hierarchical memory system. The memory request is processed by the top level cache of the hierarchical memory system, looking for one or more valid bytes of the requested cache line corresponding to the target address of the memory request. The valid bytes of the cache line corresponding to the byte mask as stored in cache can be identified by reading out the valid bit(s) and data byte(s) stored by the cache for putative matching cache lines for those data bytes that are specified by the byte mask of the memory request, while ignoring the valid bit(s) and data byte(s) stored by the cache for putative matching cache lines for those data bytes that are not specified by the byte mask of the memory request. Extensions to shared multiprocessor systems are also described and claimed. | 04-16-2015 |
20150106566 | Computer Processor Employing Dedicated Hardware Mechanism Controlling The Initialization And Invalidation Of Cache Lines - A computer processing system includes execution logic that generates memory requests that are supplied to a hierarchical memory system. The computer processing system includes a hardware map storing a number of entries associated with corresponding cache lines, where each given entry of the hardware map indicates whether a corresponding cache line i) currently stores valid data in the hierarchical memory system, or ii) does not currently store valid data in the hierarchical memory system and should be interpreted as being implicitly zero throughout. | 04-16-2015 |
20150106545 | Computer Processor Employing Cache Memory Storing Backless Cache Lines - A computer processing system with a hierarchical memory system having at least one cache and physical memory, and a processor having execution logic that generates memory requests that are supplied to the hierarchical memory system. The at least one cache stores a plurality of cache lines including at least one backless cache line. | 04-16-2015 |
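
The temporal addressing described in the abstract of application 20150370570 can be illustrated with a small software model. This is only a sketch of the queue behaviour stated there (operands added at the front, dropped from the end, several added or read per cycle), not the claimed hardware. The class name `TemporalQueue`, its `add` and `read` methods, the queue length of 4, and the rule that same-cycle results enter in list order are all assumptions made for the example.

```python
from collections import deque


class TemporalQueue:
    """Hypothetical model of a fixed-length operand queue.

    Operands are referenced by logical temporal address:
    address 0 is the most recently added operand, higher addresses are older.
    """

    def __init__(self, length):
        # A bounded deque discards from the far end when a full queue grows.
        self._slots = deque(maxlen=length)

    def add(self, results):
        """Add all operands produced in one cycle to the front of the queue.

        Assumption: results from the same cycle enter in list order, so the
        last item in `results` ends up at temporal address 0.
        """
        for value in results:
            # When the queue is full, appendleft drops the oldest operand
            # from the opposite end, as the abstract describes.
            self._slots.appendleft(value)

    def read(self, *addresses):
        """Access several operands by temporal address in a single cycle."""
        return tuple(self._slots[a] for a in addresses)


q = TemporalQueue(length=4)
q.add([10, 20])         # two functional units produce results in one cycle
q.add([30])             # one result in the next cycle
first = q.read(0, 2)    # newest operand and the one two positions behind it
q.add([40, 50])         # the queue overflows, so the oldest operand drops
second = q.read(0, 3)   # addresses now refer to different operands
```

Note that an operand's address changes as newer operands arrive: the value 20 sits at address 1 after the second cycle and at address 3 after the third, while 10 has dropped off the end.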