Patent application number | Description | Published
--- | --- | ---
20090165646 | EFFLUENT GAS RECOVERY PROCESS FOR SILICON PRODUCTION - Effluent gas from a polysilicon reactor is directed to a gas separation membrane; the permeate gas is recycled to the reactor, and the retentate is chilled in a cryogenic condenser using liquid cryogen. Liquid cryogen vaporized by the hot effluent gas may be stored, or used to seal and/or chill the reactor, or to blanket a Si feed to a SiHCl3 reactor. | 07-02-2009
20090165647 | EFFLUENT GAS RECOVERY PROCESS FOR SILICON PRODUCTION - Purified SiHCl3 is used as a sweep gas across a permeate side of a gas separation membrane receiving effluent gas from a polysilicon reactor. The combined sweep gas and permeate is recycled to the reactor. | 07-02-2009 |
20090166173 | Effluent gas recovery process for silicon production - Purified SiHCl3 is used as a sweep gas across a permeate side of a gas separation membrane receiving effluent gas from a polysilicon reactor. The combined sweep gas and permeate is recycled to the reactor. | 07-02-2009 |
20090320519 | Recovery of Hydrofluoroalkanes - A mixture of air and one or more halogenated alkanes is directed to a gas separation membrane where it is separated into an oxygen-, nitrogen-, and moisture-enriched, halogenated alkane-depleted permeate and a halogenated alkane-enriched, oxygen-, nitrogen-, and moisture-depleted retentate. The retentate is directed to a cryogenic condenser, where an amount of halogenated alkane is condensed. | 12-31-2009
20100077796 | Hybrid Membrane/Distillation Method and System for Removing Nitrogen from Methane - A hybrid gas separation membrane/cryogenic distillation method and system produces high purity gaseous methane from a gas mixture containing a majority of methane and a minority of nitrogen. | 04-01-2010 |
20100313750 | Method and System for Membrane-Based Gas Recovery - A fast gas is recovered from a feed gas containing a fast gas and at least one slow gas using a gas separation membrane. A controller may control a control valve associated with a partial recycle of a permeate gas from the membrane for combining with the feed gas. A controller may control a control valve associated with the backpressure of a residue gas from the membrane (a minimal control-loop sketch follows this table). | 12-16-2010
20110000257 | Effluent Gas Recovery System in Polysilicon and Silane Plants - Purified SiHCl3 | 01-06-2011
20130247761 | Method and System for Membrane-Based Gas Recovery - A fast gas is recovered from a feed gas containing a fast gas and at least one slow gas using a gas separation membrane. A controller may control a control valve associated with a partial recycle of a permeate gas from the membrane for combining with the feed gas. A controller may control a control valve associated with the backpressure of a residue gas from the membrane. | 09-26-2013 |
20130255483 | Method and System for Membrane-Based Gas Recovery - A fast gas is recovered from a feed gas containing a fast gas and at least one slow gas using a gas separation membrane. A controller may control a control valve associated with a partial recycle of a permeate gas from the membrane for combining with the feed gas. A controller may control a control valve associated with the backpressure of a residue gas from the membrane. | 10-03-2013 |
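The membrane-based recovery scheme of 20100313750 (and the later applications sharing its abstract) is built around two control loops: a valve on the partial permeate recycle and a valve on the residue backpressure. A minimal behavioral sketch follows, assuming simple proportional control and hypothetical setpoints; every name, gain, and unit below is an illustrative assumption, not taken from the patents.

```python
# Minimal sketch of the two control loops described in 20100313750: one
# valve on the partial permeate recycle, one setting residue backpressure.
# All names, setpoints, and gains are hypothetical.

class ProportionalValve:
    """Valve whose opening (0..1) is nudged proportionally toward a setpoint."""
    def __init__(self, opening=0.5, gain=0.05):
        self.opening = opening
        self.gain = gain

    def update(self, measured, setpoint):
        # Positive error -> open the valve further, clamped to [0, 1].
        error = setpoint - measured
        self.opening = min(1.0, max(0.0, self.opening + self.gain * error))
        return self.opening


def control_step(fast_gas_purity, residue_pressure_bar,
                 recycle_valve, backpressure_valve,
                 purity_setpoint=0.95, pressure_setpoint=8.0):
    """One controller cycle: trim the permeate recycle to hold product
    purity, and trim the residue backpressure to hold the membrane's
    driving force. Returns the two valve openings."""
    recycle = recycle_valve.update(fast_gas_purity, purity_setpoint)
    backpressure = backpressure_valve.update(residue_pressure_bar,
                                             pressure_setpoint)
    return recycle, backpressure


if __name__ == "__main__":
    recycle_valve = ProportionalValve()
    backpressure_valve = ProportionalValve(gain=0.02)
    # Pretend sensor readings; a real plant would read these from instruments.
    print(control_step(0.91, 7.4, recycle_valve, backpressure_valve))
```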
Patent application number | Description | Published
--- | --- | ---
20120198214 | N-WAY MEMORY BARRIER OPERATION COALESCING - One embodiment sets forth a technique for N-way memory barrier operation coalescing. When a first memory barrier is received for a first thread group, execution of subsequent memory operations for the first thread group is suspended until the first memory barrier is executed. Subsequent memory barriers for different thread groups may be coalesced with the first memory barrier to produce a coalesced memory barrier that represents memory barrier operations for multiple thread groups. When the coalesced memory barrier is being processed, execution of subsequent memory operations for the different thread groups is also suspended. However, memory operations for other thread groups that are not affected by the coalesced memory barrier may be executed (a behavioral sketch follows this table). | 08-02-2012
20130124838 | INSTRUCTION LEVEL EXECUTION PREEMPTION - One embodiment of the present invention sets forth a technique for instruction-level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline: no new instructions are issued and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity. | 05-16-2013
20130132711 | COMPUTE THREAD ARRAY GRANULARITY EXECUTION PREEMPTION - One embodiment of the present invention sets forth a technique for instruction-level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline: no new instructions are issued and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity. | 05-23-2013
20130166877 | SHAPED REGISTER FILE READS - One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers. | 06-27-2013 |
20130166882 | METHODS AND APPARATUS FOR SCHEDULING INSTRUCTIONS WITHOUT INSTRUCTION DECODE - Systems and methods for scheduling instructions without instruction decode. In one embodiment, a multi-core processor includes a scheduling unit in each core for scheduling instructions from two or more threads scheduled for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The scheduling unit includes a macro-scheduler unit for performing a priority sort of the two or more threads and a micro-scheduler arbiter for determining the highest order thread that is ready to execute. The macro-scheduler unit and the micro-scheduler arbiter use pre-decode data to implement the scheduling algorithm. The pre-decode data may be generated by decoding only a small portion of the instruction or received along with the instruction. Once the micro-scheduler arbiter has selected an instruction to dispatch to the execution unit, a decode unit fully decodes the instruction. | 06-27-2013 |
20130212364 | PRE-SCHEDULED REPLAYS OF DIVERGENT OPERATIONS - One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced. One advantage of the disclosed technique is that divergent operations requiring one or more replay operations execute with reduced latency. | 08-15-2013 |
20130232322 | UNIFORM LOAD PROCESSING FOR PARALLEL THREAD SUB-SETS - One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set (a sketch follows this table). A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available. | 09-05-2013
20130268715 | DYNAMIC BANK MODE ADDRESSING FOR MEMORY ACCESS - One embodiment sets forth a technique for dynamically mapping addresses to banks of a multi-bank memory based on a bank mode. Application programs may be configured to read and write memory using different numbers of bits per bank, e.g., 32 bits per bank, 64 bits per bank, or 128 bits per bank. On each clock cycle, an access request may be received from one of the application programs, and the per-thread addresses of the access request are dynamically mapped based on the bank mode to produce a set of bank addresses. The bank addresses are then used to access the multi-bank memory. Allowing different bank mappings enables each application program to avoid bank conflicts when the memory is accessed, compared with using a single bank mapping for all accesses (a sketch follows this table). | 10-10-2013
20130311686 | MECHANISM FOR TRACKING AGE OF COMMON RESOURCE REQUESTS WITHIN A RESOURCE MANAGEMENT SUBSYSTEM - One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests (a sketch follows this table). | 11-21-2013
20130311996 | MECHANISM FOR WAKING COMMON RESOURCE REQUESTS WITHIN A RESOURCE MANAGEMENT SUBSYSTEM - One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests. | 11-21-2013 |
20130311999 | RESOURCE MANAGEMENT SUBSYSTEM THAT MAINTAINS FAIRNESS AND ORDER - One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests. | 11-21-2013 |
20140165072 | TECHNIQUE FOR SAVING AND RESTORING THREAD GROUP OPERATING STATE - A streaming multiprocessor (SM) included within a parallel processing unit (PPU) is configured to suspend a thread group executing on the SM and to save the operating state of the suspended thread group. A load-store unit (LSU) within the SM re-maps local memory associated with the thread group to a location in global memory. Subsequently, the SM may re-launch the suspended thread group. The LSU may then perform local memory access operations on behalf of the re-launched thread group with the re-mapped local memory that resides in global memory. | 06-12-2014 |
20140168245 | TECHNIQUE FOR PERFORMING MEMORY ACCESS OPERATIONS VIA TEXTURE HARDWARE - A texture processing pipeline can be configured to service memory access requests that represent texture data access operations or generic data access operations. When the texture processing pipeline receives a memory access request that represents a texture data access operation, the texture processing pipeline may retrieve texture data based on texture coordinates. When the memory access request represents a generic data access operation, the texture pipeline extracts a virtual address from the memory access request and then retrieves data based on the virtual address. The texture processing pipeline is also configured to cache generic data retrieved on behalf of a group of threads and to then invalidate that generic data when the group of threads exits. | 06-19-2014 |
20140173193 | TECHNIQUE FOR ACCESSING CONTENT-ADDRESSABLE MEMORY - A tag unit configured to manage a cache unit includes a coalescer that implements a set hashing function. The set hashing function maps a virtual address to a particular content-addressable memory unit (CAM). The coalescer implements the set hashing function by splitting the virtual address into upper, middle, and lower portions. The upper portion is further divided into even-indexed bits and odd-indexed bits. The even-indexed bits are reduced to a single bit using an XOR tree, and the odd-indexed bits are reduced in like fashion. Those single bits are combined with the middle portion of the virtual address to provide a CAM number that identifies a particular CAM. The identified CAM is queried to determine the presence of a tag portion of the virtual address, indicating a cache hit or cache miss (a sketch follows this table). | 06-19-2014
20140173258 | TECHNIQUE FOR PERFORMING MEMORY ACCESS OPERATIONS VIA TEXTURE HARDWARE - A texture processing pipeline can be configured to service memory access requests that represent texture data access operations or generic data access operations. When the texture processing pipeline receives a memory access request that represents a texture data access operation, the texture processing pipeline may retrieve texture data based on texture coordinates. When the memory access request represents a generic data access operation, the texture pipeline extracts a virtual address from the memory access request and then retrieves data based on the virtual address. The texture processing pipeline is also configured to cache generic data retrieved on behalf of a group of threads and to then invalidate that generic data when the group of threads exits. | 06-19-2014 |
20140189260 | APPROACH FOR CONTEXT SWITCHING OF LOCK-BIT PROTECTED MEMORY - A streaming multiprocessor in a parallel processing subsystem processes atomic operations for multiple threads in a multi-threaded architecture. The streaming multiprocessor receives a request from a thread in a thread group to acquire access to a memory location in a lock-protected shared memory, and determines whether an address lock in a plurality of address locks is asserted, where the address lock is associated with the memory location. If the address lock is asserted, then the streaming multiprocessor refuses the request. Otherwise, the streaming multiprocessor asserts the address lock, asserts a thread group lock in a plurality of thread group locks, where the thread group lock is associated with the thread group, and grants the request. One advantage of the disclosed techniques is that acquired locks are released when a thread is preempted. As a result, a preempted thread that has previously acquired a lock does not retain the lock indefinitely (a sketch follows this table). | 07-03-2014
20140189329 | COOPERATIVE THREAD ARRAY GRANULARITY CONTEXT SWITCH DURING TRAP HANDLING - Techniques are provided for handling a trap encountered in a thread that is part of a thread array that is being executed in a plurality of execution units. In these techniques, a data structure with an identifier associated with the thread is updated to indicate that the trap occurred during the execution of the thread array. Also in these techniques, the execution units execute a trap handling routine that includes a context switch. The execution units perform this context switch for at least one of the execution units as part of the trap handling routine while allowing the remaining execution units to exit the trap handling routine before the context switch. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors. | 07-03-2014 |
20140189711 | COOPERATIVE THREAD ARRAY GRANULARITY CONTEXT SWITCH DURING TRAP HANDLING - Techniques are provided for restoring thread groups in a cooperative thread array (CTA) within a processing core. Each thread group in the CTA is launched to execute a context restore routine. Each thread group executes the context restore routine to restore from memory a first portion of the context associated with the thread group, and determines whether the thread group completed an assigned function prior to executing the context restore routine. If the thread group completed an assigned function prior to executing the context restore routine, then the thread group exits the context restore routine. If the thread group did not complete the assigned function prior to executing the context restore routine, then the thread group executes one or more operations associated with a trap handler routine. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors. | 07-03-2014
20140281679 | SELECTIVE FAULT STALLING FOR A GPU MEMORY PIPELINE IN A UNIFIED VIRTUAL MEMORY SYSTEM - One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a selective fault-stalling pipeline. Upon detecting a memory access fault associated with an operation executing on a particular SM, a replay unit in the selective fault-stalling pipeline considers the operation a faulting operation. Subsequently, instead of notifying the SM of the memory access fault, the replay unit recirculates the operation—reinserting the operation into the selective fault-stalling pipeline. Recirculating faulting operations in such a fashion enables the SM to execute other operations while the replay unit stalls the faulting operation until the associated access fault is resolved. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a memory access fault, cancel the associated operation and subsequent operations. | 09-18-2014
20140372703 | SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR WARMING A CACHE FOR A TASK LAUNCH - A system, method, and computer program product for warming a cache for a task launch is described. The method includes the steps of receiving a task data structure that defines a processing task, extracting information stored in a cache warming field of the task data structure, and, prior to executing the processing task, generating a cache warming instruction that is configured to load one or more entries of a cache storage with data fetched from a memory. | 12-18-2014 |
20150103087 | SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR DISCARDING PIXEL SAMPLES - A system, method, and computer program product are provided for discarding pixel samples. The method includes the steps of completing shading operations for a pixel set including one or more pixels to generate per-sample shaded attributes according to a shader program executed by a processing pipeline. Discard information for the pixel set is evaluated and one or more per-sample shaded attributes for at least one pixel in the pixel set are discarded based on the evaluated discard information. | 04-16-2015 |
20150113254 | EFFICIENCY THROUGH A DISTRIBUTED INSTRUCTION SET ARCHITECTURE - A subsystem is configured to support a distributed instruction set architecture with primary and secondary execution pipelines. The primary execution pipeline supports the execution of a subset of instructions in the distributed instruction set architecture that are issued frequently. The secondary execution pipeline supports the execution of another subset of instructions in the distributed instruction set architecture that are issued less frequently. Both execution pipelines also support the execution of FFMA instructions as well as a common subset of instructions in the distributed instruction set architecture. When dispatching a requested instruction, an instruction scheduling unit is configured to select between the two execution pipelines based on various criteria. Those criteria may include the power efficiency with which the instruction can be executed and the availability of execution units to support execution of the instruction. | 04-23-2015
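Several of the abstracts above describe mechanisms concrete enough that a small behavioral model clarifies them. First, the N-way memory barrier coalescing of 20120198214: barriers arriving from different thread groups merge into one coalesced barrier, and only the covered groups have their memory operations suspended. This is a minimal Python sketch of that behavior, not the patented hardware; the class and method names are invented for illustration.

```python
# Sketch of N-way memory barrier coalescing from 20120198214: barriers from
# different thread groups are merged into one coalesced barrier; memory
# operations from every covered group stay suspended until the coalesced
# barrier completes, while unaffected groups keep executing.

class CoalescedBarrier:
    def __init__(self):
        self.groups = set()   # thread groups covered by this barrier

    def add(self, group):
        self.groups.add(group)

class BarrierUnit:
    def __init__(self):
        self.pending = None

    def barrier(self, group):
        # A new barrier either starts a coalesced barrier or joins one.
        if self.pending is None:
            self.pending = CoalescedBarrier()
        self.pending.add(group)

    def may_issue_memory_op(self, group):
        # Groups under the coalesced barrier are suspended; others proceed.
        return self.pending is None or group not in self.pending.groups

    def complete(self):
        # Executing the coalesced barrier releases every covered group.
        self.pending = None

unit = BarrierUnit()
unit.barrier("group0"); unit.barrier("group1")   # coalesced 2-way
print(unit.may_issue_memory_op("group0"))        # False: suspended
print(unit.may_issue_memory_op("group2"))        # True: unaffected
unit.complete()
print(unit.may_issue_memory_op("group0"))        # True again
```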
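Next, the uniform load processing of 20130232322. The simplest uniform pattern is "every thread in the sub-set requests the same address," in which case one broadcast read replaces per-thread reads. The sub-set width and the pattern chosen here are illustrative assumptions; the patent defines a variety of patterns.

```python
# Sketch of the uniform-pattern check from 20130232322: if every thread in
# a sub-set requests the same address (one possible uniform pattern), the
# load/store unit issues a single read instead of one per thread.

SUBSET_SIZE = 4  # assumed sub-set width

def read_requests(thread_addresses):
    """Return the read requests actually issued: one per uniform sub-set,
    or one per thread when a sub-set is not uniform."""
    requests = []
    for i in range(0, len(thread_addresses), SUBSET_SIZE):
        subset = thread_addresses[i:i + SUBSET_SIZE]
        if len(set(subset)) == 1:          # all threads match: uniform
            requests.append(subset[0])     # single broadcast read
        else:
            requests.extend(subset)        # fall back to per-thread reads
    return requests

# 8 threads, both sub-sets uniform: 2 reads instead of 8.
print(read_requests([0x100] * 4 + [0x200] * 4))
```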
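The dynamic bank mode addressing of 20130268715 can be modeled the same way: the identical byte address lands on a different (bank, word) pair depending on whether the mode is 32, 64, or 128 bits per bank. The bank count and address layout below are assumptions; the point is that an access pattern conflicting in one mode can be conflict-free in another.

```python
# Sketch of the dynamic bank mapping from 20130268715. NUM_BANKS and the
# address layout are illustrative assumptions.

NUM_BANKS = 32

def map_address(byte_address, bank_mode_bits):
    """Map a per-thread byte address to (bank, word-within-bank) for a
    bank mode of 32, 64, or 128 bits per bank."""
    assert bank_mode_bits in (32, 64, 128)
    bytes_per_bank_word = bank_mode_bits // 8
    word = byte_address // bytes_per_bank_word
    return word % NUM_BANKS, word // NUM_BANKS

def has_bank_conflict(thread_byte_addresses, bank_mode_bits):
    """True if two threads hit the same bank at different words: in
    hardware those accesses would be serialized."""
    seen = {}
    for addr in thread_byte_addresses:
        bank, row = map_address(addr, bank_mode_bits)
        if bank in seen and seen[bank] != row:
            return True
        seen[bank] = row
    return False

addrs = [i * 8 for i in range(32)]      # one 8-byte element per thread
print(has_bank_conflict(addrs, 32))     # True: 2-way conflicts in 32-bit mode
print(has_bank_conflict(addrs, 64))     # False: conflict-free in 64-bit mode
```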
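The total order queue of 20130311686/20130311996/20130311999 grants contended common resources oldest-first, with younger contenders sleeping in the queue until the resource frees up. This sketch models only that age-priority core; the resource model and names are assumptions, and resource stealing for deadlock avoidance is omitted.

```python
# Sketch of oldest-first TOQ scheduling from 20130311686 and its siblings.
import heapq

class TOQ:
    """Oldest-first scheduling of common-resource access requests."""
    def __init__(self):
        self._heap = []    # (age, name, resource); smaller age = older
        self._age = 0

    def enqueue(self, name, resource):
        heapq.heappush(self._heap, (self._age, name, resource))
        self._age += 1

    def schedule(self, free_resources):
        """Grant every request whose resource is free, oldest first; a
        resource taken by an older request is busy for younger ones,
        which go back to sleep in the queue."""
        granted, asleep = [], []
        while self._heap:
            entry = heapq.heappop(self._heap)
            age, name, resource = entry
            if resource in free_resources:
                free_resources.remove(resource)   # older request takes it
                granted.append(name)
            else:
                asleep.append(entry)              # sleep until woken
        for entry in asleep:
            heapq.heappush(self._heap, entry)
        return granted

toq = TOQ()
toq.enqueue("old_req", "cache_line_A")
toq.enqueue("young_req", "cache_line_A")   # contends with the older request
print(toq.schedule({"cache_line_A"}))      # ['old_req']: age wins
print(toq.schedule({"cache_line_A"}))      # ['young_req'] on the next cycle
```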
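The set hashing function of 20140173193 is spelled out almost completely in its abstract: split the virtual address into upper, middle, and lower portions; XOR-reduce the even- and odd-indexed bits of the upper portion to one parity bit each; combine those two bits with the middle portion to select a CAM. Only the field widths below are assumptions.

```python
# Sketch of the set-hashing function from 20140173193. The field widths
# are illustrative, not taken from the patent.

LOWER_BITS = 7    # e.g. byte offset within a cache line (assumed)
MIDDLE_BITS = 2   # middle portion contributes 2 bits of the CAM number
UPPER_SHIFT = LOWER_BITS + MIDDLE_BITS

def xor_reduce(bits):
    """Reduce an iterable of bits to one parity bit, as an XOR tree would."""
    result = 0
    for b in bits:
        result ^= b
    return result

def cam_number(virtual_address, upper_width=32):
    middle = (virtual_address >> LOWER_BITS) & ((1 << MIDDLE_BITS) - 1)
    upper = (virtual_address >> UPPER_SHIFT) & ((1 << upper_width) - 1)
    upper_bits = [(upper >> i) & 1 for i in range(upper_width)]
    even = xor_reduce(upper_bits[0::2])   # even-indexed bits of the upper part
    odd = xor_reduce(upper_bits[1::2])    # odd-indexed bits of the upper part
    # Two parity bits concatenated with the middle portion pick 1 of 16 CAMs.
    return (even << (MIDDLE_BITS + 1)) | (odd << MIDDLE_BITS) | middle

print(cam_number(0x7F3A9C40))
```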
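Finally, the lock-bit scheme of 20140189260: a request for a lock-protected location is refused if that location's address lock is asserted; otherwise the address lock and the requesting thread group's lock bit are asserted, and the group's locks are released on preemption so none are held indefinitely. A behavioral model only; all names are invented.

```python
# Sketch of the lock-bit scheme from 20140189260. clear_group() models the
# patent's key point: a preempted thread group's locks are released.

class LockBits:
    def __init__(self):
        self.address_locks = set()    # asserted address locks
        self.group_locks = {}         # thread group -> held addresses

    def acquire(self, group, address):
        if address in self.address_locks:
            return False                      # refused: already locked
        self.address_locks.add(address)       # assert the address lock
        self.group_locks.setdefault(group, set()).add(address)
        return True                           # granted

    def clear_group(self, group):
        """On preemption, release every lock the group holds so it cannot
        retain a lock indefinitely."""
        for address in self.group_locks.pop(group, set()):
            self.address_locks.discard(address)

locks = LockBits()
print(locks.acquire("tg0", 0x40))   # True: granted
print(locks.acquire("tg1", 0x40))   # False: refused, tg0 holds the lock
locks.clear_group("tg0")            # tg0 preempted: its locks are released
print(locks.acquire("tg1", 0x40))   # True now
```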