Patent application number | Description | Published |
20090006399 | COMPRESSION METHOD FOR RELATIONAL TABLES BASED ON COMBINED COLUMN AND ROW CODING - A robust method to compress relations close to their entropy while still allowing efficient queries. Column values are encoded into variable length codes to exploit skew in their frequencies. The codes in each tuple are concatenated and the resulting tuplecodes are sorted and delta-coded to exploit the lack of ordering in a relation. Correlation is exploited either by co-coding correlated columns, or by using a sort order that can leverage the correlation. Also presented is a novel Huffman coding scheme, called segregated coding, that preserves maximum compression while allowing range and equality predicates on the compressed data, without even accessing the full dictionary. Delta coding is exploited to speed up queries, by reusing computations performed on nearly identical records. | 01-01-2009 |
20090094258 | OFF-LOADING STAR JOIN OPERATIONS TO A STORAGE SERVER - A method, storage server, and computer readable medium for off-loading star-join operations from a host information processing system to a storage server. At least a first and second set of keys from a first and second dimension table, respectively are received from a host system. Each of the first and second set of keys is associated with at least one fact table. A set of locations associated with a set of foreign key indexes are received from the host system. A set of fact table indexes are traversed. At least a first set of Row Identifiers (“RIDs”) associated with the first set of keys and at least a second set of RIDs associated with the second set of keys are identified. An operation is performed on the first and second sets of RIDs to identify an intersecting set of RIDs. The intersecting set of RIDs are then stored. | 04-09-2009 |
20090113309 | ADAPTIVE GREEDY METHOD FOR FAST LIST INTERSECTION VIA SAMPLING - The embodiments of the invention provide a method of intersecting a group of lists. The method begins by performing a first selecting process including selecting a top list from the group of lists to leave remaining lists. The top list can be the smallest list of the group of lists. The method can also select a pair of lists from the group of lists, such that the pair of lists has the smallest intersection size relative to other pairs of lists of the group of lists. Next, the method estimates intersections of the remaining lists with the top list by estimating an amount of intersection between the remaining lists and the top list. This involves sampling a portion of the remaining lists. The method also includes identifying larger list pairs having smaller intersections sizes when compared to smaller list pairs having larger intersections sizes. | 04-30-2009 |
20090174583 | Method for Compressed Data with Reduced Dictionary Sizes by Coding Value Prefixes - The speed of dictionary based decompression is limited by the cost of accessing random values in the dictionary. If the size of the dictionary can be limited so it fits into cache, decompression is made to be CPU bound rather than memory bound. To achieve this, a value prefix coding scheme is presented, wherein value prefixes are stored in the dictionary to get good compression from small dictionaries. Also presented is an algorithm that determines the optimal entries for a value prefix dictionary. Once the dictionary fits in cache, decompression speed is often limited by the cost of mispredicted branches during Huffman code processing. A novel way is presented to quantize Huffman code lengths to allow code processing to be performed with few instructions, no branches, and very little extra memory. Also presented is an algorithm for code length quantization that produces the optimal assignment of Huffman codes and show that the adverse effect of quantization on the compression ratio is quite small. | 07-09-2009 |
20090248648 | METHOD FOR EVALUATING A CONJUNCTION OF EQUITY AND RANGE PREDICATES USING A CONSTANT NUMBER OF OPERATIONS - Methods are described to simultaneously apply conjugates of equality, range, and in-list predicates. A first set of methods are described for the simultaneous application of equality predicates. A second set of methods are described for the simultaneous application of a mixture of range and equality predicates. A third method is described for the simultaneous applying a mixture of in-list predicates. The described methods allow for quick evaluation of complex predicates as they efficiently implement the computation done per record, while maintaining the same execution time irrespective of the number of fields. | 10-01-2009 |
20090248699 | SYSTEM TO DELEGATE VIRTUAL STORAGE ACCESS METHOD RELATED FILE OPERATIONS TO A STORAGE SERVER USING AN IN-BAND RPC MECHANISM - A method is disclosed that places data-intensive subprocesses in close physical and logical proximity to the facility responsible for storing the data, so that high efficiencies at reduced cost are achieved. In one specific example, new computer programs, termed adjuncts, are added and placed in a logical partition on a storage facility so that they can be invoked using appropriate commands issued on the I/O channel. Further, programs or changes are added to existing programs on the host machine, wherein such programs or changes discover the function extensions and invoke them to perform data processing. | 10-01-2009 |
20090249023 | APPLYING VARIOUS HASH METHODS USED IN CONJUNCTION WITH A QUERY WITH A GROUP BY CLAUSE - A novel method is described for applying various hash methods used in conjunction with a query with a Group By clause. A plurality of drawers are identified, wherein each of the drawers is made up of a collection of cells from a single partition of a Group By column and each of the drawers being defined for a specific query. A separate hash table is independently computed for each of the drawers and a hashing scheme (picked from among a plurality of hashing schemes) is independently applied for each of the drawers. | 10-01-2009 |
20090254521 | FREQUENCY PARTITIONING: ENTROPY COMPRESSION WITH FIXED SIZE FIELDS - A frequency partitioning technique is introduced that amortizes the work of computing codeword lengths within a tuplecode by grouping together tuples that have the same pattern of codeword lengths. Specifically, the technique entropy codes and partitions column values in each column into disjoint sets called column partitions, assigns a codeword length to each of the column partitions, identifies cells (a combination of codeword lengths), and collectively storing tuples associated with each of the cells. | 10-08-2009 |
20090292714 | ADAPTIVE LAZY MERGING - A query processing method intersects two or more unsorted lists based on a conjunction of predicates. Each list comprises a union of multiple sorted segments. The method performs lazy segment merging and an adaptive n-ary intersecting process. The lazy segment merging comprises starting with each list being a union of completely unmerged segments, such that lookups into a given list involve separate lookups into each segment of the given list. The method intersects the lists according to the predicates while performing the lazy segment merging, such that the lazy segment merging reads in only those portions of each segment that are needed for the intersecting. As the intersecting proceeds and the lookups are performed, the intersecting selectively merges the segments together, based on a cost-benefit analysis of the cost of merging compared to the benefit produced by reducing a number of lookups. | 11-26-2009 |
20100042587 | Method for Laying Out Fields in a Database in a Hybrid of Row-Wise and Column-Wise Ordering - A method, system, and article are provided for employment of a hybrid layout of representation of data objects in computer memory. Columns of the database are separated based upon a classification of the columns. A vertical partition in the form of a bank is provided to receive an assignment of one or more data objects identified in the columns. Each bank is sized to be a divisor of a size of an associated hardware register. Assignment of data objects to banks organizes the data in a manner that support efficient query processing that mitigates the quantity of banks required to respond to the query. | 02-18-2010 |
20100049730 | EFFICIENT PREDICATE EVALUATION VIA IN-LIST - A predicate over a single column of a table is converted into at least one IN-list, wherein the IN-list is generated for a set of tuples of the column, and the generation is done over a data structure representing a set of distinct values of the column where the predicate applies and having a smaller cardinality than the table. The generated IN-list is evaluated over the set of tuples and the results of the evaluation are outputted as an evaluation of the predicate. | 02-25-2010 |
20100250517 | SYSTEM AND METHOD FOR PARALLEL COMPUTATION OF FREQUENCY HISTOGRAMS ON JOINED TABLES - According to one embodiment of the present invention, a method for the parallel computation of frequency histograms in joined tables is provided. The method includes reading data in a table row-by-row from a database system using a coordinator unit and distributing each read row to separate worker units. Each worker unit computes a partial frequency histogram for each column in the table in parallel. The partial histograms from the worker units are then merged and the coordinator unit sends the merged frequency histograms to the worker units. | 09-30-2010 |
20110040744 | SYSTEM, METHOD, AND APPARATUS FOR SCAN-SHARING FOR BUSINESS INTELLIGENCE QUERIES IN AN IN-MEMORY DATABASE - A computer-implemented method for scan sharing across multiple cores in a business intelligence (BI) query. The method includes receiving a plurality of BI queries, storing a block of data in a first cache, scanning the block of data in the first cache against a first batch of queries on a first processor core, and scanning the block of data against a second batch of queries on a second processor core. The first cache is associated with a first processor core. The block of data includes a subset of data stored in an in-memory database (IMDB). The first batch of queries includes two or more of the BI queries. The second batch of queries includes one or more of the BI queries that are not included in the first batch of queries. | 02-17-2011 |
20120078980 | COMPACT AGGREGATION WORKING AREAS FOR EFFICIENT GROUPING AND AGGREGATION USING MULTI-CORE CPUS - A system is described for creating compact aggregation working areas for efficient grouping and aggregation using multi-core CPUs. The system implements operations including computing a running aggregate for a group within a business intelligence (BI) query, and identifying a location to store running aggregate information within an aggregation working area of a cache. The aggregation working area includes first and second data structures. The first data structure stores running aggregate information that is associated with a group that is accessed frequently relative to a threshold. The second data structure stores running aggregate information that is associated with a group that is accessed infrequently relative to the threshold. The operations also include storing the running aggregate information in either the first or second data structure of the aggregation working area based on a characterization of the group as a frequently or infrequently accessed group. | 03-29-2012 |
20120117064 | ADAPTIVE CELL-SPECIFIC DICTIONARIES FOR FREQUENCY-PARTITIONED MULTI-DIMENSIONAL DATA - A cell-specific dictionary is applied adaptively to adequate cells, where the cell-specific dictionary subsequently optimizes the handling of frequency-partitioned multi-dimensional data. This includes improved data partitioning with super cells or adjusting resulting cells by sub-dividing very large cells and merging multiple small cells, both of which avoid the highly skewed data distribution in cells and improve the query processing. In addition, more efficient encoding is taught within a cell in case the distinct values that actually appear in that cell are much smaller than the size of the column dictionary. | 05-10-2012 |
20130262370 | Fast Predicate Table Scans Using Single Instruction, Multiple Data Architecture - An approach is provided in which a processor receives a scan request to scan data included in a data table. The processor selects a column in the data table corresponding to the scan request and retrieves column data entries from the selected column. In addition, the processor identifies the width of the selected column and selects a scan algorithm based upon the identified column width. In turn, the processor loads the column data entries into column data vectors and computes scan results from the column data vectors using the selected scan algorithm. | 10-03-2013 |
20130262519 | Fast Predicate Table Scans Using Single Instruction, Multiple Data Architecture - An approach is provided in which a processor receives a scan request to scan data included in a data table. The processor selects a column in the data table corresponding to the scan request and retrieves column data entries from the selected column. In addition, the processor identifies the width of the selected column and selects a scan algorithm based upon the identified column width. In turn, the processor loads the column data entries into column data vectors and computes scan results from the column data vectors using the selected scan algorithm. | 10-03-2013 |
20130325900 | INTRA-BLOCK PARTITIONING FOR DATABASE MANAGEMENT - A method for storing database information includes storing a table having data values in a column major order. The data values are stored in a list of blocks. The method also includes assigning a tuple sequence number (TSN) to each data value in each column of the table according to a sequence order in the table. The data values that correspond to each other across a plurality of columns of the table have equivalent TSNs. The method also includes assigning each data value to a partition based on a representation of the data value. The method also includes assigning a tuple map value to each data value. The tuple map value identifies the partition in which each data value is located. | 12-05-2013 |
20130325901 | INTRA-BLOCK PARTITIONING FOR DATABASE MANAGEMENT - A method for storing database information, including: storing a table having data values in a column major order, wherein the data values are stored in a list of blocks, assigning a tuple sequence number (TSN) to each data value in each column of the table according to a sequence order in the table, wherein data values that correspond to each other across a plurality of columns of the table have equivalent TSNs; assigning each data value to a partition based on a representation of the data value; and assigning a tuple map value to each data value, wherein the tuple map value identifies the partition in which each data value is located. | 12-05-2013 |
20140006379 | EFFICIENT PARTITIONED JOINS IN A DATABASE WITH COLUMN-MAJOR LAYOUT | 01-02-2014 |
20140006380 | EFFICIENT PARTITIONED JOINS IN A DATABASE WITH COLUMN-MAJOR LAYOUT | 01-02-2014 |
20140006381 | PREDICATE PUSHDOWN WITH LATE MATERIALIZATION IN DATABASE QUERY PROCESSING | 01-02-2014 |
20140006382 | PREDICATE PUSHDOWN WITH LATE MATERIALIZATION IN DATABASE QUERY PROCESSING | 01-02-2014 |
20140032569 | SYSTEMS, METHODS AND COMPUTER PROGRAM PRODUCTS FOR REDUCING HASH TABLE WORKING-SET SIZE FOR IMPROVED LATENCY AND SCALABILITY IN A PROCESSING SYSTEM - System, method and computer program products for storing data by computing a plurality of hash functions of data values in a data item, and determining a corresponding memory location for one of the plurality of hash functions of data values in the data item. Each memory location is of a cacheline size wherein a data item is stored in a memory location. Each memory location can store a plurality of data items. A key portion of all data items is contiguously stored within the memory location, and a payload portion is contiguously stored within the memory location. Payload portions are packed as bit-aligned in a fixed-sized memory location, comprising a bucket in a bucketized hash table, each bucket sized to store multiple key portions and payload portions that are packed as bit-aligned in a fixed-sized bucket. Corresponding key portions are stored as compressed keys in said fixed-sized bucket. | 01-30-2014 |
20140074818 | MULTIPLICATION-BASED METHOD FOR STITCHING RESULTS OF PREDICATE EVALUATION IN COLUMN STORES - A system joins predicate evaluated column bitmaps having varying lengths. The system includes a column unifier for querying column values with a predicate and generating an indicator bit for each of the column values that is then joined with the respective column value. The system also includes a bitmap generator for creating a column-major linear bitmap from the column values and indicator bits. The column unifier also determines an offset between adjacent indicator bits. The system also includes a converter for multiplying the column-major linear bitmap with a multiplier to shift the indicator bits into consecutive positions in the linear bitmap. | 03-13-2014 |
20140085115 | DATA COMPRESSION USING DICTIONARY ENCODING - Embodiments relate to data compression using dictionary encoding. An aspect includes subdividing a table of uncompressed data into a first block and a second block of complete rows. Another aspect includes determining information about a frequency of occurrence of different values for each column of the first block. Another aspect includes selecting a row of the first block to be removed out of the first block using frequency of occurrence-information. Another aspect includes removing the a row out of the first block to form an updated first block and determining information about a frequency of occurrence of different values for each column of the updated first block. Another aspect includes deriving a dictionary containing code-words for encoding the values of the updated first block. Another aspect includes encoding the values of the updated first block based on the code-words. Another aspect includes adding the removed row to the second block. | 03-27-2014 |
20140214794 | JOIN OPERATION PARTITIONING - Partitioned join operations are performed between a first database object and a second database object by determining an agent group for an agent in response to the agent receiving rows of the second database object to process; partitioning the rows to determine a target hash table for each row and adding the partitioned rows to work to be performed by the agent group; and distributing the work for the group to agents of the group by assigning to a single agent all the rows associated with a particular hash table to perform a join operation on the assigned rows. Each partition is assigned a first counter value indicating an upper bound of a task id range that is most recently assigned to an agent in the agent group for processing, and a second counter value indicating the highest task id that has been processed for that partition. | 07-31-2014 |
20140214795 | DYNAMICALLY DETERMINING JOIN ORDER - A weight is determined for each of a plurality of join predicates for a join between one or more first database objects and one or more second database objects based on a join selectivity for each of the plurality of join predicates. The plurality of join predicates are sorted based on the determined weights. The join operation is performed joining the one or more first database objects with the one or more second database objects in accordance with an order of the sorted plurality of join predicates. | 07-31-2014 |
20140214796 | EFFICIENT JOIN WITH ONE OR MORE LARGE DIMENSION TABLES - Embodiments of the invention relate to processing queries that utilize fact and/or dimension tables. In one aspect, a pre-join filtering phase precedes a star join. The necessary conditions for the pre-join filtering are considered for a given SQL query, including an estimated size of the hash table exceeding a threshold and presence of a local predicate either on the fact table or one or more dimension tables that is not a large dimension table. Once the necessary conditions are satisfied, the execution of the query exploits the pre-join filtering to build a pre-join output filter from columns of a reduced fact table that joins with each large dimension table. Thereafter, all the dimension tables and the fact table are joined in a star join while exploiting each pre-join filter. Accordingly, the order of when joins occur is changed in order to reduce the size of the fact table and to work from the fact table to reduce the size of large dimension tables. | 07-31-2014 |
20140214855 | REDUCING COLLISIONS WITHIN A HASH TABLE - Collisions in hash tables are reduced by removing each empty bucket from a hash table and compacting the non-empty buckets, generating a map of the hash table indicating a status of the buckets of the hash table, and accessing data in the hash table by applying a hash key to the generated map to determine a corresponding bucket containing the data. | 07-31-2014 |
20140214900 | SUPPORTING FLEXIBLE TYPES IN A DATABASE - Providing database support. A first group of data are received, the first group of data being expressed in a first format, and a second group of data are received, the second group of data being expressed in a second format, the second format being different from the first format. The first and second groups of data are merged, and are represented in at least one common column. Such representing includes: maintaining the first and second formats; and providing a tuple map which provides reference to the first and second formats. | 07-31-2014 |
20140214912 | PERFORMING BATCHES OF SELECTIVE ASSIGNMENTS IN A VECTOR FRIENDLY MANNER - Embodiments of the invention relate to processing queries. A query operation to be performed on a table of data is translated into a series of bit level logical operations using expansion and/or saturation operations. A mask is created from the series of bit level logical operations. This mask is then simultaneously applied to multiple rows from the table of data. | 07-31-2014 |
20140372388 | HASHING SCHEME USING COMPACT ARRAY TABLES - Embodiments include a method, system, and computer program product for creating an array table. In one embodiment the method includes identifying keys associated with values in a database and identifying bits common between the plurality of keys using logical functions and removing the common bits to form condensed keys. The method also includes modulating the condensed keys using identified common bits to create transformed keys and populating the plurality of array tables using the transformed keys and associated values. | 12-18-2014 |
20140372392 | REDUCING COLLISIONS WITHIN A HASH TABLE - Collisions in hash tables are reduced by removing each empty bucket from a hash table and compacting the non-empty buckets, generating a map of the hash table indicating a status of the buckets of the hash table, and accessing data in the hash table by applying a hash key to the generated map to determine a corresponding bucket containing the data. | 12-18-2014 |
20140372407 | JOIN OPERATION PARTITIONING - Partitioned join operations are performed between a first database object and a second database object by determining an agent group for an agent in response to the agent receiving rows of the second database object to process; partitioning the rows to determine a target hash table for each row and adding the partitioned rows to work to be performed by the agent group; and distributing the work for the group to agents of the group by assigning to a single agent all the rows associated with a particular hash table to perform a join operation on the assigned rows. Each partition is assigned a first counter value indicating an upper bound of a task id range that is most recently assigned to an agent in the agent group for processing, and a second counter value indicating the highest task id that has been processed for that partition. | 12-18-2014 |
20140372411 | ON-THE-FLY ENCODING METHOD FOR EFFICIENT GROUPING AND AGGREGATION - Embodiments include a method and computer program product for encoding data while it is being processed as part of a query is provided. The method includes receiving a query request and determining a set of values associated with data to be encoded for completing the query request. The method also includes encoding those values such that any subsequent processing operations can be performed on the encoded values to complete the requested query. After performing the subsequent processing operations to complete the requested query, each value is decoded back to its original value. | 12-18-2014 |
20140372470 | ON-THE-FLY ENCODING METHOD FOR EFFICIENT GROUPING AND AGGREGATION - Embodiments include a system for encoding data while it is being processed. The system includes a processor, an encoder and a decoder. The processor is configured to process a query request by determining a set of values. The encoder is configured for encoding the set of values, such that a subsequent processing operation can be performed on the encoded values. The processor performs the subsequent processing operations. The decoder is configured for decoding each value back to its value prior to being encoded upon completion of the processor completing the requested query. | 12-18-2014 |
20140379985 | MULTI-LEVEL AGGREGATION TECHNIQUES FOR MEMORY HIERARCHIES - Embodiments include method, system, and computer program product for providing aggregation hierarchy that is related memory hierarchies. In one embodiment, the method includes determining capacity of a first level memory of a memory hierarchy for processing data relating to completion of an aggregation process and generating a per thread local look-up table in said first level memory upon determining said capacity. Upon the first level memory reaching capacity, a plurality of per thread partitions to store remaining data to complete the aggregation process in a second level memory of the memory hierarchy is generated such that each of said per-thread partitions includes an identical amount of data portion on each thread. The method also includes storing the per thread partitions in said second level memory and providing a single global look up table for each of the identical data portions. | 12-25-2014 |