Grover, WA

Benjamin T. Grover, Issaquah, WA US

Patent application number	Description	Published
20100266208	Automated Image Cropping to Include Particular Subjects - A digital image is automatically cropped to fit within a desired frame. The cropping is based on one or more of two identified portions of the image. One of the portions is an all-subjects portion that includes all the identified subjects of a particular type in the image. The other portion is an attention portion that identifies an intended focus of the image. An attempt to crop the image to include both of these portions is made, and if unsuccessful then an attempt to crop the image to include at least the all-subjects portion is made. If neither of these attempts is successful, then the image is cropped to include one or more, but less than all, of the identified subjects of the particular type in the image.	10-21-2010
20120131387	MANAGING AUTOMATED AND MANUAL APPLICATION TESTING - An application for which approval is requested is identified and multiple automated tests are applied to the application in groups of automated tests. Each of the groups of automated tests includes multiple ones of the multiple automated tests. If one or more automated tests in a group of automated tests returns an inconclusive result, then a manual check is initiated for the application based on the one or more automated tests that returned the inconclusive result. If one or more automated tests in a group, or a manual test applied in the manual check, returns a fail result then an indication that the application is rejected is returned, the indication that the application is rejected including an identification of why the application is rejected. If none of the multiple automated tests returns a fail result, then a manual testing phase is initiated.	05-24-2012
20130185246	APPLICATION QUALITY TESTING TIME PREDICTIONS - Techniques for application quality testing time predictions are described that provide customized processing time predictions for application testing. Testing times for various stages of testing are collected during application testing. The collected testing data may be stored in association with various attribute values ascertained for the applications. The collected timing data is used to predict processing times for applications that match defined profiles. The profiles may be defined to include selected attributes and values used to identify applications having corresponding attributes and values. The collected timing data may be processed to derive predictions for processing times on a profile-by-profile basis. When a new application is submitted, the application is matched to a particular profile based on attribute values possessed by the application. Then, processing time predictions associated with the matching profile may be obtained, assigned to the application, and output for presentation to a developer of the application.	07-18-2013

Patent applications by Benjamin T. Grover, Issaquah, WA US

Benjamin T. Grover, Maple Valley, WA US

Patent application number	Description	Published
20140261097	TABLE WITH A ROTATABLE TABLETOP - A table having a vertical support structure and a tabletop that is rotatable relative to the vertical support structure. The rotatable tabletop is configured to lock and unlock at a variety of positions relative to the vertical support structure via a latch mechanism.	09-18-2014
20140264113	MECHANICALLY LATCHING SOLENOID VALVE - A linear actuated valve controls the flow of fluid between a first passageway and a second passageway. The valve can be solenoid actuated so as to selectively permit and prohibit fluid flow communication between the first passageway and the second passageway. The valve can include a mechanical latch that provides a de-energized yet valve open position.	09-18-2014

Vinod Grover, Mercer Island, WA US

Patent application number	Description	Published
20090259828	EXECUTION OF RETARGETTED GRAPHICS PROCESSOR ACCELERATED CODE BY A GENERAL PURPOSE PROCESSOR - One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.	10-15-2009
20090259829	THREAD-LOCAL MEMORY REFERENCE PROMOTION FOR TRANSLATING CUDA CODE FOR EXECUTION BY A GENERAL PURPOSE PROCESSOR - One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.	10-15-2009
20090259832	RETARGETTING AN APPLICATION PROGRAM FOR EXECUTION BY A GENERAL PURPOSE PROCESSOR - One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.	10-15-2009
20090259996	PARTITIONING CUDA CODE FOR EXECUTION BY A GENERAL PURPOSE PROCESSOR - One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.	10-15-2009
20090259997	VARIANCE ANALYSIS FOR TRANSLATING CUDA CODE FOR EXECUTION BY A GENERAL PURPOSE PROCESSOR - One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.	10-15-2009
20100191930	TRANSACTIONAL MEMORY COMPATIBILITY MANAGEMENT - Transactional memory compatibility type attributes are associated with intermediate language code to specify, for example, that intermediate language code must be run within a transaction, or must not be run within a transaction, or may be run within a transaction. Attributes are automatically produced while generating intermediate language code from annotated source code. Default rules also generate attributes. Tools use attributes to statically or dynamically check for incompatibility between intermediate language code and a transactional memory implementation.	07-29-2010
20100218195	Software filtering in a transactional memory system - A method and apparatus for utilizing hardware mechanisms of a transactional memory system is herein described. Various embodiments relate to software-based filtering of operations from read and write barriers and read isolation barriers during transactional execution. Other embodiments relate to software-implemented read barrier processing to accelerate strong atomicity. Other embodiments are also described and claimed.	08-26-2010
20110145637	Performing Mode Switching In An Unbounded Transactional Memory (UTM) System - In one embodiment, the present invention includes a method for selecting a first transaction execution mode to begin a first transaction in a unbounded transactional memory (UTM) system having a plurality of transaction execution modes. These transaction execution modes include hardware modes to execute within a cache memory of a processor, a hardware assisted mode to execute using transactional hardware of the processor and a software buffer, and a software transactional memory (STM) mode to execute without the transactional hardware. The first transaction execution mode can be selected to be a highest performant of the hardware modes if no pending transaction is executing in the STM mode, otherwise a lower performant mode can be selected. Other embodiments are described and claimed.	06-16-2011
20120079215	Performing Mode Switching In An Unbounded Transactional Memory (UTM) System - In one embodiment, the present invention includes a method for selecting a first transaction execution mode to begin a first transaction in a unbounded transactional memory (UTM) system having a plurality of transaction execution modes. These transaction execution modes include hardware modes to execute within a cache memory of a processor, a hardware assisted mode to execute using transactional hardware of the processor and a software buffer, and a software transactional memory (STM) mode to execute without the transactional hardware. The first transaction execution mode can be selected to be a highest performant of the hardware modes if no pending transaction is executing in the STM mode, otherwise a lower performant mode can be selected. Other embodiments are described and claimed.	03-29-2012
20120254875	Method for Transforming a Multithreaded Program for General Execution - A technique is disclosed for executing a program designed for multi-threaded operation on a general purpose processor. Original source code for the program is transformed from a multi-threaded structure into a computationally equivalent single-threaded structure. A transform operation modifies the original source code to insert code constructs for serial thread execution. The transform operation also replaces synchronization barrier constructs in the original source code with synchronization barrier code that is configured to facilitate serialization. The transformed source code may then be conventionally compiled and advantageously executed on the general purpose processor.	10-04-2012
20130113809	TECHNIQUE FOR INTER-PROCEDURAL MEMORY ADDRESS SPACE OPTIMIZATION IN GPU COMPUTING COMPILER - A device compiler and linker is configured to optimize program code of a co-processor enabled application by resolving generic memory access operations within that program code to target specific memory spaces. In situations where a generic memory access operation cannot be resolved and may target constant memory, constant variables associated with those generic memory access operations are transferred to reside in global memory.	05-09-2013
20130117548	ALGORITHM FOR VECTORIZATION AND MEMORY COALESCING DURING COMPILING - One embodiment of the present invention sets forth a technique for reducing the number of assembly instructions included in a computer program. The technique involves receiving a directed acyclic graph (DAG) that includes a plurality of nodes, where each node includes an assembly instruction of the computer program, hierarchically parsing the plurality of nodes to identify at least two assembly instructions that are vectorizable and can be replaced by a single vectorized assembly instruction, and replacing the at least two assembly instructions with the single vectorized assembly instruction.	05-09-2013
20130117734	TECHNIQUE FOR LIVE ANALYSIS-BASED REMATERIALIZATION TO REDUCE REGISTER PRESSURES AND ENHANCE PARALLELISM - A device compiler and linker within a parallel processing unit (PPU) is configured to optimize program code of a co-processor enabled application by rematerializing a subset of live-in variables for a particular block in a control flow graph generated for that program code. The device compiler and linker identifies the block of the control flow graph that has the greatest number of live-in variables, then selects a subset of the live-in variables associated with the identified block for which rematerializing confers the greatest estimated profitability. The profitability of rematerializing a given subset of live-in variables is determined based on the number of live-in variables reduced, the cost of rematerialization, and the potential risk of rematerialization.	05-09-2013
20130117735	ALGORITHM FOR 64-BIT ADDRESS MODE OPTIMIZATION - One embodiment of the present invention sets forth a technique for extracting a memory address offset from a 64-bit type-conversion expression included in high-level source code of a computer program. The technique involves receiving the 64-bit type-conversion expression, where the 64-bit type-conversion expression includes one or more 32-bit expressions, determining a range for each of the one or more 32-bit expressions, calculating a total range by summing the ranges of the 32-bit expressions, determining that the total range is a subset of a range for a 32-bit unsigned integer, calculating the memory address offset based on the ranges for the one or more 32-bit expressions, and generating at least one assembly-level instruction that references the memory address offset.	05-09-2013
20130198494	METHOD FOR COMPILING A PARALLEL THREAD EXECUTION PROGRAM FOR GENERAL EXECUTION - A technique is disclosed for executing a compiled parallel application on a general purpose processor. The compiled parallel application comprises parallel thread execution code, which includes single-instruction multiple-data (SIMD) constructs, as well as references to intrinsic functions conventionally available in a graphics processing unit. The parallel thread execution code is transformed into an intermediate representation, which includes vector instruction constructs. The SIMD constructs are mapped to vector instructions available within the intermediate representation. Intrinsic functions are mapped to corresponding emulated runtime implementations. The technique advantageously enables parallel applications compiled for execution on a graphics processing unit to be executed on a general purpose central processing unit configured to support vector instructions.	08-01-2013
20130300752	SYSTEM AND METHOD FOR COMPILER SUPPORT FOR KERNEL LAUNCHES IN DEVICE CODE - A system and method for compiling source code (e.g., with a compiler). The method includes accessing a portion of device source code and determining whether the portion of the device source code comprises a piece of work to be launched on a device from the device. The method further includes determining a plurality of application programming interface (API) calls based on the piece of work to be launched on the device and generating compiled code based on the plurality of API calls. The compiled code comprises a first portion operable to execute on a central processing unit (CPU) and a second portion operable to execute on the device (e.g., GPU).	11-14-2013
20130304996	METHOD AND SYSTEM FOR RUN TIME DETECTION OF SHARED MEMORY DATA ACCESS HAZARDS - A system and method for detecting shared memory hazards are disclosed. The method includes, for a unit of hardware operating on a block of threads, mapping a plurality of shared memory locations assigned to the unit to a tracking table. The tracking table comprises an initialization bit as well as access type information, collectively called the state tracking bits for each shared memory location. The method also includes, for an instruction of a program within a barrier region, identifying a second access to a location in shared memory within a block of threads executed by the hardware unit. The second access is identified based on a status of the state tracking bits. The method also includes determining a hazard based on a first type of access and a second type of access to the shared memory location. Information related to the first access is provided in the table.	11-14-2013
20130305021	METHOD FOR CONVERGENCE ANALYSIS BASED ON THREAD VARIANCE ANALYSIS - Basic blocks within a thread program are characterized for convergence based on variance analysis or corresponding instructions. Each basic block is marked as divergent based on transitive control dependence on a block that is either divergent or comprising a variant branch condition. Convergent basic blocks that are defined by invariant instructions are advantageously identified as candidates for scalarization by a thread program compiler.	11-14-2013
20130305252	METHOD AND SYSTEM FOR HETEROGENEOUS FILTERING FRAMEWORK FOR SHARED MEMORY DATA ACCESS HAZARD REPORTS - A system and method for detecting, filtering, prioritizing and reporting shared memory hazards are disclosed. The method includes, for a unit of hardware operating on a block of threads, mapping a plurality of shared memory locations assigned to the unit to a tracking table. The tracking table comprises initialization information for each shared memory location. The method also includes, for an instruction of a program within a barrier region, identifying a potential conflict by identifying a second access to a location in shared memory within a block of threads executed by the hardware unit. First information associated with a first access and second information associated with the second access to the location is determined. Filter criteria is applied to the first and second information to determine whether the instruction causes a reportable hazard. The instruction is reported when it causes the reportable hazard.	11-14-2013
20140096147	SYSTEM AND METHOD FOR LAUNCHING CALLABLE FUNCTIONS - A system and method are provided for launching a callable function. A processing system includes a host processor, a graphics processing unit, and a driver for launching a callable function. The driver is adapted to recognize at load time of a program that a first function within the program is a callable function. The driver is further adapted to generate a second function. The second function is adapted to receive arguments and translate the arguments from a calling convention for launching a function into a calling convention for calling a callable function. The second function is further adapted to call the first function using the translated arguments. The driver is also adapted to receive from the host processor or the GPU a procedure call representing a launch of the first function and, in response, launch the second function.	04-03-2014
20140164727	SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR OPTIMIZING THE MANAGEMENT OF THREAD STACK MEMORY - A system, method, and computer program product for optimizing thread stack memory allocation is disclosed. The method includes the steps of receiving source code for a program, translating the source code into an intermediate representation, analyzing the intermediate representation to identify at least two objects that could use a first allocated memory space in a thread stack memory, and modifying the intermediate representation by replacing references to a first object of the at least two objects with a reference to a second object of the at least two objects.	06-12-2014
20150205590	CONFLUENCE ANALYSIS AND LOOP FAST-FORWARDING FOR IMPROVING SIMD EXECUTION EFFICIENCY - One embodiment of the present invention sets forth a method for causing thread convergence. The method includes determining that a control flow graph representing a first section of a program includes at least two non-overlapping paths that extend from a first divergent node to a candidate node. The method also includes determining that the first divergent node is not a dominator of the candidate node or that the candidate node is not a post-dominator of the first divergent node. The method further includes identifying an external node and inserting a first instruction configured to cause a predicate variable to be set to true for a first set of threads that is to execute the external node. The method additionally includes inserting into the program a second divergent node configured to cause various threads to execute or not execute a first control flow path associated with the external node.	07-23-2015

Patent applications by Vinod Grover, Mercer Island, WA US

Vinod Grover, Bellevue, WA US

Patent application number	Description	Published
20140143755	SYSTEM AND METHOD FOR INSERTING SYNCHRONIZATION STATEMENTS INTO A PROGRAM FILE TO MITIGATE RACE CONDITIONS - A system and method are provided for inserting synchronization statements into a program file to mitigate race conditions. The method includes reading a program file and determining one or more convergent statements in the program file. The method also includes inserting one or more synchronization statements in the program file between the determined convergent statements. The method further includes removing one or more of the inserted synchronization statements and writing the modified program file. The method may include, after removing the inserted synchronization statements, identifying to a user any remaining inserted synchronization statements.	05-22-2014

Vinod Grover, Redmond, WA US

Patent application number	Description	Published
20150212933	METHODS FOR REDUCING MEMORY SPACE IN SEQUENTIAL OPERATIONS USING DIRECTED ACYCLIC GRAPHS - Various disclosed embodiments are directed to methods and systems for reducing memory space in sequential computer-implemented operations. The method includes generating a directed acyclic graph (DAG) having a plurality of vertices and directed edges, wherein each edge connects a predecessor vertex to a successor vertex. Each vertex represents one of the computer-implemented operations and each directed edge represents output data generated by the operations. The method includes merging one of the predecessor vertex with one of the successor vertex by combining the operations of the predecessor vertex and the successor vertex if the predecessor and successor vertices are connected by a directed edge and there is only one directed edge originating from the predecessor vertex. The merger of the predecessor and the successor vertices reduces the number of directed edges in the DAG, resulting in a reduction of intermediate buffer memory required to store the output data.	07-30-2015

Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Grover, WA

Benjamin T. Grover, Issaquah, WA US

Benjamin T. Grover, Maple Valley, WA US

Vinod Grover, Mercer Island, WA US

Vinod Grover, Bellevue, WA US

Vinod Grover, Redmond, WA US