Vinodh Gopal, Westborough US

Vinodh Gopal, Westborough, MA US

Patent application number	Description	Published
20090157784	DETERMINING A MESSAGE RESIDUE - A description of techniques of determining a modular remainder with respect to a polynomial of a message comprised of a series of segments. An implementation can include repeatedly accessing a strict subset of the segments and transforming the strict subset of segments into a smaller set of segments that are equivalent to the strict subset of the segments with respect to the modular remainder. The implementation can also include determining the modular remainder based on a set of segments output by the repeatedly accessing and transforming and storing the determined modular remainder.	06-18-2009
20090157790	METHOD AND APPARATUS FOR MULTIPLYING POLYNOMIALS WITH A PRIME NUMBER OF TERMS - An efficient method and apparatus to compute a product of polynomials of degree n−1 where n is an arbitrary prime is provided. The total number of multiply operations and Arithmetic Logical Unit (ALU) operations to compute the product is minimized through the judicious use of polynomial evaluations at few points to decrease the number of multiplications while using only simple ALU operations.	06-18-2009
20090158132	Determining a message residue - In one aspect, circuitry to determine a modular remainder with respect to a polynomial of a message comprised of a series of segments. In another aspect, circuitry to access at least a portion of a first number having a first endian format, determine a second number based on a bit reflection and shift of a third number having an endian format opposite to that of the first endian format, and perform a polynomial multiplication of the first number and the at least a portion of the first number.	06-18-2009
20090164546	METHOD AND APPARATUS FOR EFFICIENT PROGRAMMABLE CYCLIC REDUNDANCY CHECK (CRC) - A method and apparatus to optimize each of the plurality of reduction stages in a Cyclic Redundancy Check (CRC) circuit to produce a residue for a block of data decreases area used to perform the reduction while maintaining the same delay through the plurality of stages of the reduction logic. A hybrid mix of Karatsuba algorithm, classical multiplications and serial division in various stages in the CRC reduction circuit results in about a twenty percent reduction in area on the average with no decrease in critical path delay.	06-25-2009
20100141488	ACCELERATED DECOMPRESSION - Techniques for decompressing a compressed input by determining, according to an ordering of allowable codewords, an offset for a variable length codeword detected in the input; accessing a record at the determined offset in a data structure having one record for each of the allowable codewords, each record including a portion for at least one of a literal value and a length value and a portion for a type value indicative of whether the record is for a literal or a length; and determining a decompressed output based at least in part on the accessed record.	06-10-2010
20100153829	RESIDUE GENERATION - In one embodiment, circuitry is provided to generate a residue based at least in part upon operations and a data stream generated based at least in part upon a packet. The operations may include at least one iteration of at least one reduction operation including (a) multiplying a first value with at least one portion of the data stream, and (b) producing a reduction by adding at least one other portion of the data stream to a result of the multiplying. The operations may include at least one other reduction operation including (c) producing another result by multiplying with a second value at least one portion of another stream based at least in part upon the reduction, (d) producing a third value by adding at least one other portion of the another stream to the another result, and (e) producing the residue by performing a Barrett reduction based at least in part upon the third value.	06-17-2010
20100153830	CARRY BUCKET-AWARE MULTIPLICATION - An apparatus comprising an integrated circuit configured to accept a plurality of operands; multiply the operands producing a result in a first binary format; and distribute the result in the first binary format over a plurality of data units in a second binary format, each unit having W bits with k>0 most significant bits set to zero.	06-17-2010
20100161536	PATTERN MATCHING - A method and apparatus to perform pattern matching is provided. The apparatus includes a first storage to store data representing a first set of pattern components, and a second storage to store data representing a second set of pattern components each corresponding to one or more components of the first set of pattern components. A first pattern matcher is configured to detect in an input stream a first component of one or more patterns and to generate a signal indicative of the detection of the first component. A second pattern matcher is configured to receive the signal from the first pattern matcher and to detect if a second component of the one or more patterns of the set of patterns immediately follows the first component in the input stream.	06-24-2010
20100205455	DIFFUSION AND CRYPTOGRAPHIC-RELATED OPERATIONS - An embodiment includes at least one processing unit to perform at least first and second sets of diffusion-related operations to produce a resulting block from a data block, and that includes at least one stage and at least one other stage. The at least one stage is to select one of first operands and second operands input to the at least one other stage. The first and second operands are associated with the first and second sets of operations, respectively. The at least one other stage involves arithmetic and logical operations common to both the first and second sets of operations. At least one other processing unit is to perform at least one set of cryptographic-related operations (different, at least in part, from the first and second sets of operations) on at least one of (1) another block to produce the data block and (2) the resulting block.	08-12-2010
20100332578	Method and apparatus for performing efficient side-channel attack resistant reduction - A time-invariant method and apparatus for performing modular reduction that is protected against cache-based and branch-based attacks is provided. The modular reduction technique adds no performance penalty and is side-channel resistant. The side-channel resistance is provided through the use of lazy evaluation of carry bits, elimination of data-dependent branches and use of even cache accesses for all memory references.	12-30-2010
20110153700	Method and apparatus for performing a shift and exclusive or operation in a single instruction - Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.	06-23-2011
20110153993	Add Instructions to Add Three Source Operands - A method in one aspect may include receiving an add instruction. The add instruction may indicate a first source operand, a second source operand, and a third source operand. A sum of the first, second, and third source operands may be stored as a result of the add instruction. The sum may be stored partly in a destination operand indicated by the add instruction and partly in a plurality of flags. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium.	06-23-2011
20110153994	Multiplication Instruction for Which Execution Completes Without Writing a Carry Flag - A method in one aspect may include receiving a multiply instruction. The multiply instruction may indicate a first source operand and a second source operand. A product of the first and second source operands may be stored in one or more destination operands indicated by the multiply instruction. Execution of the multiply instruction may complete without writing a carry flag. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium.	06-23-2011
20110153997	Bit Range Isolation Instructions, Methods, and Apparatus - Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result may have: (1) a first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) a second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of the instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.	06-23-2011
20110154169	SYSTEM, METHOD, AND APPARATUS FOR A SCALABLE PROCESSOR ARCHITECTURE FOR A VARIETY OF STRING PROCESSING APPLICATIONS - Systems, methods, and apparatus for a scalable processor architecture for a variety of string processing applications are described. In one such apparatus, an input first in, first out (FIFO) buffer stores an input stream. A plurality of memory banks store data from the input stream. A re-configurable controller processes the input stream. And an output FIFO buffer stores the processed input stream.	06-23-2011
20110161635	Rotate instructions that complete execution without reading carry flag - A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.	06-30-2011
20120150887	PATTERN MATCHING - An embodiment may include circuitry to determine, at least in part, whether one or more reference patterns are present in a data stream in a packet flow. The circuitry may include first pattern matching circuitry communicatively coupled to second pattern matching circuitry. The first pattern matching circuitry may determine, based at least in part upon one or more deterministic pattern matching operations, whether at least one portion of the one or more reference patterns is present in the stream. If the first pattern matching circuitry determines that the at least one portion of the one or more reference patterns is present in the stream, the second pattern matching circuitry may determine, based at least in part upon one or more pattern matching threads, whether at least one other portion of the one or more reference patterns is present in the stream. Many modifications are possible without departing from this embodiment.	06-14-2012
20120151183	ENHANCING PERFORMANCE BY INSTRUCTION INTERLEAVING AND/OR CONCURRENT PROCESSING OF MULTIPLE BUFFERS - An embodiment may include circuitry to execute, at least in part, a first list of instructions and/or to concurrently process, at least in part, first and second buffers. The execution of the first list of instructions may result, at least in part, from invocation of a first function call. The first list of instructions may include at least one portion of a second list of instructions interleaved, at least in part, with at least one other portion of a third list of instructions. The portions may be concurrently carried out, at least in part, by one or more sets of execution units of the circuitry. The second and third lists of instructions may implement, at least in part, respective algorithms that are amenable to being invoked by separate respective function calls. The concurrent processing may involve, at least in part, complementary algorithms.	06-14-2012
20130007573	EFFICIENT AND SCALABLE CYCLIC REDUNDANCY CHECK CIRCUIT USING GALOIS-FIELD ARITHMETIC - Embodiments of the present disclosure describe methods, apparatus, and system configurations for cyclic redundancy check circuits using Galois-field arithmetic.	01-03-2013
20130191699	INSTRUCTION-SET ARCHITECTURE FOR PROGRAMMABLE CYCLIC REDUNDANCY CHECK (CRC) COMPUTATIONS - A method and apparatus to perform Cyclic Redundancy Check (CRC) operations on a data block using a plurality of different n-bit polynomials is provided. A flexible CRC instruction performs a CRC operation using a programmable n-bit polynomial. The n-bit polynomial is provided to the CRC instruction by storing the n-bit polynomial in one of two operands.	07-25-2013
20130227252	Add Instructions to Add Three Source Operands - A method in one aspect may include receiving an add instruction. The add instruction may indicate a first source operand, a second source operand, and a third source operand. A sum of the first, second, and third source operands may be stored as a result of the add instruction. The sum may be stored partly in a destination operand indicated by the add instruction and partly in a plurality of flags. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium.	08-29-2013
20130275722	METHOD AND APPARATUS TO PROCESS KECCAK SECURE HASHING ALGORITHM - A processor includes a plurality of registers, an instruction decoder to receive an instruction to process a KECCAK state cube of data representing a KECCAK state of a KECCAK hash algorithm, to partition the KECCAK state cube into a plurality of subcubes, and to store the subcubes in the plurality of registers, respectively, and an execution unit coupled to the instruction decoder to perform the KECCAK hash algorithm on the plurality of subcubes respectively stored in the plurality of registers in a vector manner.	10-17-2013
20130283064	METHOD AND APPARATUS TO PROCESS SHA-1 SECURE HASHING ALGORITHM - A processor includes an instruction decoder to receive a first instruction to process a SHA-1 hash algorithm, the first instruction having a first operand to store a SHA-1 state, a second operand to store a plurality of messages, and a third operand to specify a hash function, and an execution unit coupled to the instruction decoder to perform a plurality of rounds of the SHA-1 hash algorithm on the SHA-1 state specified in the first operand and the plurality of messages specified in the second operand, using the hash function specified in the third operand.	10-24-2013
20130290285	DIGEST GENERATION - In one embodiment, circuitry may generate digests to be combined to produce a hash value. The digests may include at least one digest and at least one other digest generated based at least in part upon at least one CRC value and at least one other CRC value. The circuitry may include cyclical redundancy check (CRC) generator circuitry to generate the at least one CRC value based at least in part upon at least one input string. The CRC generator circuitry also may generate the at least one other CRC value based at least in part upon at least one other input string. The at least one other input string may result at least in part from at least one pseudorandom operation involving, at least in part, the at least one input string. Many modifications, variations, and alternatives are possible without departing from this embodiment.	10-31-2013
20130311756	ROTATE INSTRUCTIONS THAT COMPLETE EXECUTION WITHOUT READING CARRY FLAG - A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.	11-21-2013
20130326201	PROCESSOR-BASED APPARATUS AND METHOD FOR PROCESSING BIT STREAMS - An apparatus and method are described for processing bit streams using bit-oriented instructions. For example, a method according to one embodiment includes the operations of: executing an instruction to get bits for an operation, the instruction identifying a start bit address and a number of bits to be retrieved; retrieving the bits identified by the start bit address and number of bits from a bit-oriented register or cache; and performing a sequence of specified bit operations on the retrieved bits to generate results.	12-05-2013
20140006536	TECHNIQUES TO ACCELERATE LOSSLESS COMPRESSION	01-02-2014
20140006753	MATRIX MULTIPLY ACCUMULATE INSTRUCTION	01-02-2014
20140013086	ADDITION INSTRUCTIONS WITH INDEPENDENT CARRY CHAINS - A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register.	01-09-2014
20140016773	INSTRUCTIONS PROCESSORS, METHODS, AND SYSTEMS TO PROCESS BLAKE SECURE HASHING ALGORITHM - A method of an aspect includes receiving an instruction indicating a first source having at least one set of four state matrix data elements, which represent a complete set of four inputs to a G function of a cryptographic hashing algorithm. The algorithm uses a sixteen data element state matrix, and alternates between updating data elements in columns and diagonals. The instruction also indicates a second source having data elements that represent message and constant data. In response to the instruction, a result is stored in a destination indicated by the instruction. The result includes updated state matrix data elements including at least one set of four updated state matrix data elements. Each of the four updated state matrix data elements represents a corresponding one of the four state matrix data elements of the first source, which has been updated by the G function.	01-16-2014
20140016774	INSTRUCTIONS TO PERFORM GROESTL HASHING - A method is described. The method includes executing an instruction to perform one or more Galois Field (GF) multiply by 2 operations on a state matrix and executing an instruction to combine results of the one or more GF multiply by 2 operations with exclusive or (XOR) functions to generate a result matrix.	01-16-2014
20140019693	PARALLEL PROCESSING OF A SINGLE DATA BUFFER - Technologies for executing a serial data processing algorithm on a single variable length data buffer include streaming segments of the buffer into a data register, executing the algorithm on each of the segments in parallel, and combining the results of executing the algorithm on each of the segments to form the output of the serial data processing algorithm.	01-16-2014
20140019694	PARALLEL PROCESSING OF A SINGLE DATA BUFFER - Technologies for executing a serial data processing algorithm on a single variable-length data buffer include padding data segments of the buffer, streaming the data segments into a data register and executing the serial data processing algorithm on each of the segments in parallel.	01-16-2014
20140019725	METHOD FOR FAST LARGE-INTEGER ARITHMETIC ON IA PROCESSORS - Methods, systems, and apparatuses are disclosed for implementing fast large-integer arithmetic within an integrated circuit, such as on IA (Intel Architecture) processors, in which such means include receiving a 512-bit value for squaring, the 512-bit value having eight sub-elements each of 64-bits and performing a 512-bit squaring algorithm by: (i) multiplying every one of the eight sub-elements by itself to yield a square of each of the eight sub-elements, the eight squared sub-elements collectively identified as T1, (ii) multiplying every one of the eight sub-elements by the other remaining seven of the eight sub-elements to yield an asymmetric intermediate result having seven diagonals therein, wherein each of the seven diagonals are of a different length, (iii) reorganizing the asymmetric intermediate result having the seven diagonals therein into a symmetric intermediate result having four diagonals each of 7×1 sub-elements of the 64-bits in length arranged across a plurality of columns, (iv) adding all sub-elements within their respective columns, the added sub-elements collectively identified as T2, and (v) yielding a final 512-bit squared result of the 512-bit value by adding the value of T2 twice with the value of T1 once. Other related embodiments are disclosed.	01-16-2014
20140019764	METHOD FOR SIGNING AND VERIFYING DATA USING MULTIPLE HASH ALGORITHMS AND DIGESTS IN PKCS - Methods, systems, and apparatuses are disclosed for signing and verifying data using multiple hash algorithms and digests in PKCS including, for example, retrieving, at the originating computing device, a message for signing at the originating computing device to yield a signature for the message; identifying multiple hashing algorithms to be supported by the signature; for each of the multiple hashing algorithms identified to be supported by the signature, hashing the message to yield multiple hashes of the message corresponding to the multiple hashing algorithms identified; constructing a single digest having therein each of the multiple hashes of the messages corresponding to the multiple hashing algorithms identified and further specifying the multiple hashing algorithms to be supported by the signature; applying a signing algorithm to the single digest using a private key of the originating computing device to yield the signature for the message; and distributing the message and the signature to receiving computing devices. Other related embodiments are disclosed.	01-16-2014
20140053000	INSTRUCTIONS TO PERFORM JH CRYPTOGRAPHIC HASHING - A method is described. The method includes executing one or more JH_SBOX_L instructions to perform S-Box mappings and a linear (L) transformation on a JH state and executing one or more JH_Permute instructions to perform a permutation function on the JH state once the S-Box mappings and the L transformation have been performed.	02-20-2014
20140082328	METHOD AND APPARATUS TO PROCESS 4-OPERAND SIMD INTEGER MULTIPLY-ACCUMULATE INSTRUCTION - According to one embodiment, a processor includes an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand. The first operand is to specify a first storage location to store an accumulated value; the second operand is to specify a second storage location to store a first value and a second value; and the third operand is to specify a third storage location to store a third value. The processor further includes an execution unit coupled to the instruction decoder to perform the multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of a third value to an accumulated value based on the fourth operand.	03-20-2014
20140082451	EFFICIENT AND SCALABLE CYCLIC REDUNDANCY CHECK CIRCUIT USING GALOIS-FIELD ARITHMETIC - Embodiments of the present disclosure describe methods, apparatus, and system configurations for cyclic redundancy check circuits using Galois-field arithmetic.	03-20-2014
20140093068	INSTRUCTION SET FOR SKEIN256 SHA3 ALGORITHM ON A 128-BIT PROCESSOR - According to one embodiment, a processor includes an instruction decoder to receive a first instruction to perform first SKEIN256 MIX-PERMUTE operations, the first instruction having a first operand associated with a first storage location to store a plurality of odd words, a second operand associated with a second storage location to store a plurality of even words, and a third operand. The processor further includes a first execution unit coupled to the instruction decoder, in response to the first instruction, to perform multiple rounds of the first SKEIN256 MIX-PERMUTE operations based on the odd words and even words using a first rotate value obtained from a third storage location indicated by the third operand, and to store new odd words in the first storage location indicated by the first operand.	04-03-2014
20140093069	INSTRUCTION SET FOR MESSAGE SCHEDULING OF SHA256 ALGORITHM - A processor includes a first execution unit to receive and execute a first instruction to process a first part of secure hash algorithm 256 (SHA256) message scheduling operations, the first instruction having a first operand associated with a first storage location to store a first set of message inputs and a second operand associated with a second storage location to store a second set of message inputs. The processor further includes a second execution unit to receive and execute a second instruction to process a second part of the SHA256 message scheduling operations, the second instruction having a third operand associated with a third storage location to store an intermediate result of the first part and a third set of message inputs and a fourth operand associated with a fourth storage location to store a fourth set of message inputs.	04-03-2014
20140095844	Systems, Apparatuses, and Methods for Performing Rotate and XOR in Response to a Single Instruction - Disclosed herein are systems, apparatuses, and methods of performing, in a computer processor, a rotate and XOR in response to a single rotate and XOR instruction, wherein the rotate and XOR instruction includes a first and second source operand, a destination operand, and an immediate value.	04-03-2014
20140095845	APPARATUS AND METHOD FOR EFFICIENTLY EXECUTING BOOLEAN FUNCTIONS - An apparatus and method are described for performing efficient Boolean operations in a pipelined processor which, in one embodiment, does not natively support three operand instructions. For example, a processor according to one embodiment of the invention comprises: a set of registers for storing packed operands; Boolean operation logic to execute a single instruction which uses three or more source operands packed in the set of registers, the Boolean operation logic to read at least three source operands and an immediate value to perform a Boolean operation on the three source operands, wherein the Boolean operation comprises: combining a bit read from each of the three operands to form an index to the immediate value, the index identifying a bit position within the immediate value; reading the bit from the identified bit position of the immediate value; and storing the bit from the identified bit position of the immediate value in a destination register.	04-03-2014
20140095891	INSTRUCTION SET FOR SHA1 ROUND PROCESSING ON 128-BIT DATA PATHS - According to one embodiment, a processor includes an instruction decoder to receive a first instruction to process a SHA1 hash algorithm, the first instruction having a first operand, a second operand, and a third operand, the first operand specifying a first storage location storing four SHA states, the second operand specifying a second storage location storing a plurality of SHA1 message inputs in combination with a fifth SHA1 state. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to perform at least four rounds of the SHA1 round operations on the SHA1 states and the message inputs obtained from the first and second operands, using a combinational logic function specified in the third operand.	04-03-2014
20140122839	APPARATUS AND METHOD OF EXECUTION UNIT FOR CALCULATING MULTIPLE ROUNDS OF A SKEIN HASHING ALGORITHM - An apparatus is described that includes an execution unit within an instruction pipeline. The execution unit has multiple stages of a circuit that includes a) and b) as follows. a) a first logic circuitry section having multiple mix logic sections each having: i) a first input to receive a first quad word and a second input to receive a second quad word; ii) an adder having a pair of inputs that are respectively coupled to the first and second inputs; iii) a rotator having a respective input coupled to the second input; iv) an XOR gate having a first input coupled to an output of the adder and a second input coupled to an output of the rotator. b) permute logic circuitry having inputs coupled to the respective adder and XOR gate outputs of the multiple mix logic sections.	05-01-2014
20140156790	BITSTREAM PROCESSING USING COALESCED BUFFERS AND DELAYED MATCHING AND ENHANCED MEMORY WRITES - Methods and apparatus for processing bitstreams and byte streams. According to one aspect, bitstream data is compressed using coalesced string match tokens with delayed matching. A matcher is employed to perform search string match operations using a shortened maximum string length search criteria, resulting in generation of a token stream having data and literal data. A distance match operation is performed on sequentially adjacent tokens to determine if they contain the same distance data. If they do, the len values of the tokens are added through use of a coalesce buffer. Upon detection of a distance non-match, a final coalesced length of a matching string is calculated and output along with the prior matching distance as a coalesced token. Also disclosed is a scheme for writing variable-length tokens into a bitstream under which token data is input into a bit accumulator and written to memory (or cache to be subsequently written to memory) as each token is processed in a manner that eliminates branch mispredict operations associated with detecting whether the bit accumulator is full or close to full.	06-05-2014
20140164467	APPARATUS AND METHOD FOR VECTOR INSTRUCTIONS FOR LARGE INTEGER ARITHMETIC - An apparatus is described that includes a semiconductor chip having an instruction execution pipeline having one or more execution units with respective logic circuitry to: a) execute a first instruction that multiplies a first input operand and a second input operand and presents a lower portion of the result, where, the first and second input operands are respective elements of first and second input vectors; b) execute a second instruction that multiplies a first input operand and a second input operand and presents an upper portion of the result, where, the first and second input operands are respective elements of first and second input vectors; and, c) execute an add instruction where a carry term of the add instruction's adding is recorded in a mask register.	06-12-2014
20140177823	METHODS, SYSTEMS AND APPARATUS TO REDUCE PROCESSOR DEMANDS DURING ENCRYPTION - Methods and apparatus are disclosed to reduce processor demands during encryption. A disclosed example method includes detecting a request for the processor to execute an encryption cipher, determining whether the encryption cipher is associated with a byte reflection operation, preventing the byte reflection operation when a buffer associated with the encryption cipher will not cause a carryover condition, and incrementing the buffer via a shift operation before executing the encryption cipher.	06-26-2014
20140185793	INSTRUCTIONS PROCESSORS, METHODS, AND SYSTEMS TO PROCESS SECURE HASH ALGORITHMS - A method of an aspect includes receiving an instruction. The instruction indicates a first source of a first packed data including state data elements a	07-03-2014
20140189289	INSTRUCTION FOR ACCELERATING SNOW 3G WIRELESS SECURITY ALGORITHM - Vector instructions for performing SNOW 3G wireless security operations are received and executed by the execution circuitry of a processor. The execution circuitry receives a first operand of the first instruction specifying a first vector register that stores a current state of a finite state machine (FSM). The execution circuitry also receives a second operand of the first instruction specifying a second vector register that stores data elements of a linear feedback shift register (LFSR) that are needed for updating the FSM. The execution circuitry executes the first instruction to produce an updated state of the FSM and an output of the FSM in a destination operand of the first instruction.	07-03-2014
20140189290	INSTRUCTION FOR FAST ZUC ALGORITHM PROCESSING - Vector instructions for performing ZUC stream cipher operations are received and executed by the execution circuitry of a processor. The execution circuitry receives a first vector instruction to perform an update to a linear feedback shift register (LFSR), and receives a second vector instruction to perform an update to a state of a finite state machine (FSM), where the FSM receives inputs from re-ordered bits of the LFSR. The execution circuitry executes the first vector instruction and the second vector instruction in a single-instruction multiple-data (SIMD) pipeline.	07-03-2014
20140189293	Instructions for Sliding Window Encoding Algorithms - A processor is described having an instruction execution pipeline having a functional unit to execute an instruction that compares vector elements against an input value. Each of the vector elements and the input value have a first respective section identifying a location within data and a second respective section having a byte sequence of the data. The functional unit has comparison circuitry to compare respective byte sequences of the input vector elements against the input value's byte sequence to identify a number of matching bytes for each comparison. The functional unit also has difference circuitry to determine respective distances between the input vector's elements' byte sequences and the input value's byte sequence within the data.	07-03-2014
20140189368	INSTRUCTION AND LOGIC TO PROVIDE SIMD SECURE HASHING ROUND SLICE FUNCTIONALITY - Instructions and logic provide SIMD secure hashing round slice functionality. Some embodiments include a processor comprising: a decode stage to decode an instruction for a SIMD secure hashing algorithm round slice, the instruction specifying a source data operand set, a message-plus-constant operand set, a round-slice portion of the secure hashing algorithm round, and a rotator set portion of rotate settings. Processor execution units are responsive to the decoded instruction to perform a secure hashing round-slice set of round iterations upon the source data operand set, applying the message-plus-constant operand set and the rotator set, and store a result of the instruction in a SIMD destination register. One embodiment of the instruction specifies a hash round type as one of four MD5 round types. Other embodiments may specify a hash round type by an immediate operand as one of three SHA-1 round types or as a SHA-2 round type.	07-03-2014
20140189369	Instructions Processors, Methods, and Systems to Process Secure Hash Algorithms - A method of an aspect includes receiving an instruction. The instruction indicates a first source of a first packed data including state data elements a	07-03-2014
20140195782	METHOD AND APPARATUS TO PROCESS SHA-2 SECURE HASHING ALGORITHM - A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.	07-10-2014
20140195817	THREE INPUT OPERAND VECTOR ADD INSTRUCTION THAT DOES NOT RAISE ARITHMETIC FLAGS FOR CRYPTOGRAPHIC APPLICATIONS - A method is described that includes performing the following within an instruction execution pipeline implemented on a semiconductor chip: summing three input vector operands through execution of a single instruction; and, not raising any arithmetic flags even though a result of the summing creates more bits than circuitry designed to transport the summation is able to transport.	07-10-2014
20140205084	INSTRUCTIONS TO PERFORM JH CRYPTOGRAPHIC HASHING IN A 256 BIT DATA PATH - A method is described. The method includes executing one or more JH_SBOX_L instructions to perform S-Box mappings and a linear (L) transformation on a JH state and executing one or more JH_P instructions to perform a permutation function on the JH state once the S-Box mappings and the L transformation have been performed.	07-24-2014
20140237218	SIMD INTEGER MULTIPLY-ACCUMULATE INSTRUCTION FOR MULTI-PRECISION ARITHMETIC - A multiply-and-accumulate (MAC) instruction allows efficient execution of unsigned integer multiplications. The MAC instruction indicates a first vector register as a first operand, a second vector register as a second operand, and a third vector register as a destination. The first vector register stores a first factor, and the second vector register stores a partial sum. The MAC instruction is executed to multiply the first factor with an implicit second factor to generate a product, and to add the partial sum to the product to generate a result. The first factor, the implicit second factor and the partial sum have a same data width and the product has twice the data width. The most significant half of the result is stored in the third vector register, and the least significant half of the result is stored in the second vector register.	08-21-2014
20150043729	INSTRUCTION AND LOGIC TO PROVIDE A SECURE CIPHER HASH ROUND FUNCTIONALITY - Instructions and logic provide secure cipher hashing algorithm round functionality. Some embodiments include a processor comprising: a decode stage to decode an instruction for a secure cipher hashing algorithm, the instruction specifying source data and one or more key operands. Processor execution units are responsive to the decoded instruction to perform one or more secure cipher hashing algorithm round iterations upon the source data, using the one or more key operands, and store a result of the instruction in a destination register. One embodiment of the instruction specifies a secure cipher hashing algorithm round iteration using a Feistel cipher algorithm such as DES or TDES. In one embodiment, a result of the instruction may be used in generating a resource assignment from a request, for load balancing requests across a set of processing resources.	02-12-2015
20150082047	EFFICIENT MULTIPLICATION, EXPONENTIATION AND MODULAR REDUCTION IMPLEMENTATIONS - In one embodiment, the present disclosure provides a method that includes segmenting an n-bit exponent e into a first segment e	03-19-2015
20150089195	METHOD AND APPARATUS FOR PERFORMING A SHIFT AND EXCLUSIVE OR OPERATION IN A SINGLE INSTRUCTION - Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.	03-26-2015
20150089196	METHOD AND APPARATUS FOR PERFORMING A SHIFT AND EXCLUSIVE OR OPERATION IN A SINGLE INSTRUCTION - Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.	03-26-2015
20150089197	METHOD AND APPARATUS FOR PERFORMING A SHIFT AND EXCLUSIVE OR OPERATION IN A SINGLE INSTRUCTION - Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.	03-26-2015
20150089199	ROTATE INSTRUCTIONS THAT COMPLETE EXECUTION EITHER WITHOUT WRITING OR READING FLAGS - A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.	03-26-2015
20150089200	ROTATE INSTRUCTIONS THAT COMPLETE EXECUTION EITHER WITHOUT WRITING OR READING FLAGS - A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.	03-26-2015
20150089201	ROTATE INSTRUCTIONS THAT COMPLETE EXECUTION EITHER WITHOUT WRITING OR READING FLAGS - A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.	03-26-2015
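
The rotate entries above (20150089199 through 20150089201) describe a result that is the source operand rotated by the rotate amount, with execution completing without reading a carry flag. A minimal Python sketch of that behavior follows. It assumes a 32-bit operand and a rotate-right direction, neither of which the abstracts specify, and the name `rotate_right32` is hypothetical. It models only the arithmetic result as a pure function of the two inputs, not any processor instruction.

```python
MASK32 = 0xFFFFFFFF  # assumed operand width of 32 bits (not stated in the abstracts)


def rotate_right32(source: int, amount: int) -> int:
    """Return `source` rotated right by `amount` bits within 32 bits.

    The result depends only on the two arguments: no carry bit or other
    flag-like state is read or written, mirroring the flag-free behavior
    the abstracts describe.
    """
    amount %= 32          # rotating by a multiple of the width is a no-op
    source &= MASK32      # keep only the low 32 bits of the input
    # Bits shifted out on the right re-enter on the left.
    return ((source >> amount) | (source << (32 - amount))) & MASK32


if __name__ == "__main__":
    # The low byte 0x78 wraps around to the top of the word.
    print(hex(rotate_right32(0x12345678, 8)))
```

Because the function takes no hidden input, two calls with the same source and amount always give the same result, which is the property that lets such an instruction avoid a dependency on earlier flag-writing instructions.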

Patent applications by Vinodh Gopal, Westborough, MA US
