Yen-Kuang Chen, Cupertino US

Yen-Kuang Chen, Cupertino, CA US

Patent application number	Description	Published
20090249026	Vector instructions to enable efficient synchronization and parallel reduction operations - In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed.	10-01-2009
20090271572	Dynamically Re-Classifying Data In A Shared Cache - In one embodiment, the present invention includes a method for determining if a state of data is indicative of a first class of data, re-classifying the data from a second class to the first class based on the determination, and moving the data to a first portion of a shared cache associated with a first requester unit based on the re-classification. Other embodiments are described and claimed.	10-29-2009
20100005241	DETECTION OF STREAMING DATA IN CACHE - An apparatus to detect streaming data in memory is presented. In one embodiment the apparatus use reuse bits and S-bits status for cache lines wherein an S-bit status indicates the data in the cache line are potentially streaming data. To enhance the efficiency of a cache, different measures can be applied to make the streaming data become the next victim during a replacement.	01-07-2010
20100138607	Increasing concurrency and controlling replication in a multi-core cache hierarchy - In one embodiment, the present invention includes a directory of a private cache hierarchy to maintain coherency between data stored in the cache hierarchy, where the directory is to enable concurrent cache-to-cache transfer of data to two private caches. Other embodiments are described and claimed.	06-03-2010
20100153649	SHARED CACHE MEMORIES FOR MULTI-CORE PROCESSORS - Embodiments of shared cache memories for multi-core processors are presented. In one embodiment, a cache memory comprises a group of sampling cache sets and a controller to determine a number of misses that occur in the group of sampling cache sets. The controller is operable to determine a victim cache line for a cache set based at least in part on the number of misses.	06-17-2010
20110025700	Using a Texture Unit for General Purpose Computing - An interpolation unit, such as may be found in a texture unit or texture sampler, may be used utilized to perform general purpose mathematical computations such as dot products. This enables some general purpose computations and operations to be offloaded from a central processing unit to an interpolation unit. The interpolation unit may use linear interpolators in order to perform the dot product calculations.	02-03-2011
20110035426	Bitstream Buffer Manipulation with a SIMD Merge Instruction - Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.	02-10-2011
20110066806	System and method for memory bandwidth friendly sorting on multi-core architectures - In some embodiments, the invention involves utilizing a tree merge sort in a platform to minimize cache reads/writes when sorting large amounts of data. An embodiment uses blocks of pre-sorted data residing in “leaf nodes” residing in memory storage. A pre-sorted block of data from each leaf node is read from memory and stored in faster cache memory. A tree merge sort is performed on the nodes that are cache resident until a block of data migrates to a root node. Sorted blocks reaching the root node are written to memory storage in an output list until all pre-sorted data blocks have been moved to cache and merged upward to the root. The completed output list in memory storage is a list of the fully sorted data. Other embodiments are described and claimed.	03-17-2011
20110078340	VIRTUAL ROW BUFFERS FOR USE WITH RANDOM ACCESS MEMORY - Methods, apparatuses and systems to decrease the energy consumption of a memory chip while increasing its effect bandwidth during the execution of any workload. Methods, apparatuses and systems may allow a memory chip utilize a plurality of virtual row buffers to respond to requests for data included in a memory array block. Methods, apparatuses and systems may further eliminate or reduce the cost associated with transferring unnecessary data from a memory array block to row buffers by altering the data transfer size between a memory array block and a row buffer.	03-31-2011
20110134137	Texture Unit for General Purpose Computing - A texture unit may be used utilized to perform general purpose mathematical computations such as dot products. This enables some general purpose computations and operations to be offloaded from a central processing unit to the texture unit. The texture unit may use linear interpolators in order to perform the dot product calculations.	06-09-2011
20110138122	GATHER AND SCATTER OPERATIONS IN MULTI-LEVEL MEMORY HIERARCHY - Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed.	06-09-2011
20110138128	Technique for tracking shared data in a multi-core processor or multi-processor system - A technique to track shared information in a multi-core processor or multi-processor system. In one embodiment, core identification information (“core IDs”) are used to track shared information among multiple cores in a multi-core processor or multiple processors in a multi-processor system.	06-09-2011
20110145184	METHODS AND SYSTEMS TO TRAVERSE GRAPH-BASED NETWORKS - Methods and systems to translate input labels of arcs of a network, corresponding to a sequence of states of the network, to a list of output grammar elements of the arcs, corresponding to a sequence of grammar elements. The network may include a plurality of speech recognition models combined with a weighted finite state machine transducer (WFST). Traversal may include active arc traversal, and may include active arc propagation. Arcs may be processed in parallel, including arcs originating from multiple source states and directed to a common destination state. Self-loops associated with states may be modeled within outgoing arcs of the states, which may reduce synchronization operations. Tasks may be ordered with respect to cache-data locality to associate tasks with processing threads based at least in part on whether another task associated with a corresponding data object was previously assigned to the thread.	06-16-2011
20110148896	Grouping Pixels to be Textured - A region or group of pixels may be textured as a unit, using a range specifier and one or more anchor pixels to define the group. In some embodiments, processing grouped pixels improves efficiency.	06-23-2011
20120042121	Scatter-Gather Intelligent Memory Architecture For Unstructured Streaming Data On Multiprocessor Systems - A scatter/gather technique optimizes unstructured streaming memory accesses, providing off-chip bandwidth efficiency by accessing only useful data at a fine granularity, and off-loading memory access overhead by supporting address calculation, data shuffling, and format conversion.	02-16-2012
20120159130	MECHANISM FOR CONFLICT DETECTION USING SIMD - A system and method are configured to detect conflicts when converting scalar processes to parallel processes (“SIMDifying”). Conflicts may be detected for an unordered single index, an ordered single index and/or ordered pairs of indices. Conflicts may be further detected for read-after-write dependencies. Conflict detection is configured to identify operations (i.e., iterations) in a sequence of iterations that may not be done in parallel.	06-21-2012
20120166761	VECTOR CONFLICT INSTRUCTIONS - A processing core implemented on a semiconductor chip is described having first execution unit logic circuitry that includes first comparison circuitry to compare each element in a first input vector against every element of a second input vector. The processing core also has second execution logic circuitry that includes second comparison circuitry to compare a first input value against every data element of an input vector.	06-28-2012
20120290799	GATHER AND SCATTER OPERATIONS IN MULTI-LEVEL MEMORY HIERARCHY - Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed.	11-15-2012
20130179633	SCATTER-GATHER INTELLIGENT MEMORY ARCHITECTURE FOR UNSTRUCTURED STREAMING DATA ON MULTIPROCESSOR SYSTEMS - A scatter/gather technique optimizes unstructured streaming memory accesses, providing off-chip bandwidth efficiency by accessing only useful data at a fine granularity, and off-loading memory access overhead by supporting address calculation, data shuffling, and format conversion.	07-11-2013
20130185541	BITSTREAM BUFFER MANIPULATION WITH A SIMD MERGE INSTRUCTION - Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.	07-18-2013
20130297878	GATHER AND SCATTER OPERATIONS IN MULTI-LEVEL MEMORY HIERARCHY - Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed.	11-07-2013
20140040542	SCATTER-GATHER INTELLIGENT MEMORY ARCHITECTURE FOR UNSTRUCTURED STREAMING DATA ON MULTIPROCESSOR SYSTEMS - A scatter/gather technique optimizes unstructured streaming memory accesses, providing off-chip bandwidth efficiency by accessing only useful data at a fine granularity, and off-loading memory access overhead by supporting address calculation, data shuffling, and format conversion.	02-06-2014
20140068226	VECTOR INSTRUCTIONS TO ENABLE EFFICIENT SYNCHRONIZATION AND PARALLEL REDUCTION OPERATIONS - In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed.	03-06-2014
20140156274	METHODS AND SYSTEMS TO TRAVERSE GRAPH-BASED NETWORKS - Methods and systems to translate input labels of arcs of a network, corresponding to a sequence of states of the network, to a list of output grammar elements of the arcs, corresponding to a sequence of grammar elements. The network may include a plurality of speech recognition models combined with a weighted finite state machine transducer (WFST). Traversal may include active arc traversal, and may include active arc propagation. Arcs may be processed in parallel, including arcs originating from multiple source states and directed to a common destination state. Self-loops associated with states may be modeled within outgoing arcs of the states, which may reduce synchronization operations. Tasks may be ordered with respect to cache-data locality to associate tasks with processing threads based at least in part on whether another task associated with a corresponding data object was previously assigned to the thread.	06-05-2014
20140176590	Texture Unit for General Purpose Computing - A texture unit may be used to perform general purpose mathematical computations such as dot products. This enables some general purpose computations and operations to be offloaded from a central processing unit to the texture unit. The texture unit may use linear interpolators in order to perform the dot product calculations.	06-26-2014
20140337580	GATHER AND SCATTER OPERATIONS IN MULTI-LEVEL MEMORY HIERARCHY - Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed.	11-13-2014

Patent applications by Yen-Kuang Chen, Cupertino, CA US

Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Yen-Kuang Chen, Cupertino US

Yen-Kuang Chen, Cupertino, CA US