Class / Patent application number | Description | Number of patent applications / Date published |
707742000 | Inverted index | 74 |
20100094877 | SYSTEM AND METHOD FOR DISTRIBUTED INDEX SEARCHING OF ELECTRONIC CONTENT - There are provided methods and systems for efficient search in a peer-to-peer network topology. In various embodiments, search methods and systems provide for response times and network traffic that are independent from the number of query terms, thereby producing constant run-time searches and bandwidth hits in a P2P network search implementation. By distributing inverse indexes between peers, and storing with each inverse index a Bloom filter populated with selected keywords, multi-term search and analysis can be conducted on one network node without requiring exchange of posting lists between various network nodes. | 04-15-2010 |
20100100552 | ROUTING XML QUERIES - A vast amount of information currently accessible over the Web, and in corporate networks, is stored in a variety of databases, and is being exported as XML data. However, querying this totality of information in a declarative and timely fashion is problematic because this set of databases is dynamic, and a common schema is difficult to maintain. The present invention provides a solution to the problem of issuing declarative, ad hoc XPath queries against such a dynamic collection of XML databases, and receiving timely answers. There are proposed decentralized architectures, under the open and the agreement cooperation models between a set of sites, for processing queries and updates to XML data. Each site consists of XML data nodes (which export their data as XML, and also pose queries) and one XML router node (which manages the query and update interactions between sites). The architectures differ in the degree of knowledge individual router nodes have about data nodes containing specific XML data. There is therefore provided a method for accessing data over a wide area network comprising: providing a decentralized architecture comprising a plurality of data nodes each having a database, a query processor and a path index, and a plurality of router nodes each having a routing state, maintaining a routing state in each of the router nodes, broadcasting routing state updates from each of the databases to the router nodes, routing path queries to each of the databases by accessing the routing state. | 04-22-2010 |
20100131515 | DOCUMENT SIMILARITY SCORING AND RANKING METHOD, DEVICE AND COMPUTER PROGRAM PRODUCT - A device, computer program product and a method for computing the similarity of a set of documents that avoids the large, wasted computational effort involved in calculating very small similarity scores by using thresholds to stop a similarity calculation between documents, thus ensuring that, with high probability, all document pairs with higher similarity than the thresholds have been found. | 05-27-2010 |
20100211572 | INDEXING AND SEARCHING JSON OBJECTS - Disclosed is a method of encoding JavaScript Object Notation (JSON) documents in an inverted index, wherein a tree representation of a JSON document is first generated, and, next, the JSON document is shredded into a list of tuples for each atom node, n, in the tree, where value is a label associated with n, path is a concatenation of node labels associated with ancestors of n, type is a description of a type of value, and jdewey of n is a partial Dewey code of its closest ancestor array node, if one exists, or empty, otherwise. Lastly, an inverted index is built using as index term, and jdewey as payload. A method is also described to search the inverted index. | 08-19-2010 |
20100262607 | System and Method for Automatic Matching of Contracts to Impression Opportunities Using Complex Predicates and an Inverted Index - A method for indexing advertising contracts for rapid retrieval and matching in order to match satisfying contracts to advertising slots. The descriptions of the advertising contracts include logical predicates indicating applicability to a particular demographic. Also, the descriptions of advertising slots contain logical predicates indicating applicability to a particular demographic, thus matches can be performed using at least matches on the basis of intersecting demographics. The disclosure contains structure and techniques for receiving a set of contracts with predicates, preparing a data structure index of the set of contracts, receiving an advertising slot with predicates, and structure and techniques for retrieving from the data structure contracts that satisfy a match to the advertising slot predicates. The disclosure includes cases where the predicates are presented in conjoint forms and in disjoint forms, and techniques are provided to consider indexing and matching in cases of IN predicates as well as NOT-IN predicates. | 10-14-2010 |
20100262608 | INDEX AGING AND MERGING - Systems and methods for processing an index are described. An index may be merged with another index of comparable age and size into a single index. Since older indexes are less likely to need updating, they are “set aside” to age based on certain adaptive criteria such as the age and size of the index, percentage of deletions, and how long it takes to update the index. An index that has been set aside may be compacted into a format that is optimized for fast searching. | 10-14-2010 |
20100318519 | Incremental Maintenance of Inverted Indexes for Approximate String Matching - In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than a few hours for rebuilding an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy. | 12-16-2010 |
20110022600 | METHOD OF DATA RETRIEVAL, AND SEARCH ENGINE USING SUCH A METHOD - A method of data retrieval from a data repository in response to a query having a list of keywords and/or a list of attribute-value pairs, the method comprising the steps of: | 01-27-2011 |
20110066623 | Methods and Systems for Compressing Indices - Systems and methods for compressing indices are described. In one aspect, a plurality of items are selected where each item has an entry in an inverted index and each item entry comprises a listing of articles that the item appears in. At least a first item entry and a second item entry are determined for compression and the second item entry is compressed into the first item entry resulting in a compressed first item entry. | 03-17-2011 |
20110202541 | RAPID UPDATE OF INDEX METADATA - Systems and methods for performing an updating process to an in-memory index are provided. Upon receiving notice of document modifications covered by an inverted index associated with a search engine, in the form of an update file, a representation of the modification is published onto various index serving machines. Each index serving machine receiving the update file determines if the modifications are applicable to the index serving machine. If an index serving machine determines that it contains mapping information corresponding to the modified documents, the index serving machine utilizes the update file and associated mapping information to update an in-memory index. In embodiments, the in-memory index is used to provide results to user queries in tandem with the inverted index. In some embodiments, an extra in-memory index is maintained that is revised with constantly incoming metadata updates and the existing in-memory index is periodically swapped with the revised in-memory index. | 08-18-2011 |
20110219008 | INDEXING MULTIPLE TYPES OF DATA TO FACILITATE RAPID RE-INDEXING OF ONE OR MORE TYPES OF DATA - A method and indexing system indexes the content of a body of documents into a content index, and the metadata of the documents into a metadata index which is a parallel index to the content index. The metadata is copied into a data store that is easily accessible by the indexing system and is stored in native form. The indexing system can dynamically re-index the metadata from the native metadata in the data store to produce a new metadata index which is used to replace the original metadata index. Search queries received by a search engine associated with the indexing system are applied to both the content and metadata index and the results are merged for return. | 09-08-2011 |
20110258198 | USING BEHAVIOR DATA TO QUICKLY IMPROVE SEARCH RANKING - Systems and methods for applying user behavior data to improve search query result ranking are provided. Upon receiving an update file indicating that recent, significant user behavior data is available for a document associated with an inverted index, the update file is published periodically and frequently to an index server. After filtering out the relevant update information from the update file, the index server extracts identifiers of the documents having the associated user behavior data. The update file and the identifier of the documents are utilized to update an in-memory index containing representations of metadata indicative of the user behavior. The in-memory index is continuously updated and utilized to serve search query results in response to user search queries. Search query results from the in-memory index are ranked using the user behavior data prior to serving. Thus, results associated with recent, significant user-behavior metadata receive prominent placement on the search results page. | 10-20-2011 |
20110289093 | UPDATING AN INVERTED INDEX - Systems and methods for processing an index are described. To ensure that the most updated index is available without having to update the index after every change (which can consume enormous resources), a specially marked postings list is generated for a changed item. During retrieval, the specially marked postings list supplements the existing content of an inverted index referencing the changed item. In this manner, the retrieval result for items containing the term under which the changed item was originally indexed is updated in accordance with the specially marked postings list to ensure the most accurate retrieval result. | 11-24-2011 |
20120005214 | ORDERED INDEX - Systems and methods for processing an index are described. A postings list of items containing a particular term is ordered in a desired retrieval order, e.g., most recent first. The ordered items are inserted into an inverted index in the desired retrieval order, resulting in an ordered inverted index from which items may be efficiently retrieved in the desired retrieval order. During retrieval, items may first be retrieved from a live index, and the retrieved items from the live and ordered indexes may be merged. The retrieved items may also be filtered in accordance with the items' file grouping parameters. | 01-05-2012 |
20120059828 | Methods and Systems for Compressing Indices - Systems and methods for compressing indices are described. In one aspect, a plurality of items are selected where each item has an entry in an inverted index and each item entry comprises a listing of articles that the item appears in. At least a first item entry and a second item entry are determined for compression and the second item entry is compressed into the first item entry resulting in a compressed first item entry. | 03-08-2012 |
20120084296 | Method and Apparatus for Searching a Hierarchical Database and an Unstructured Database with a Single Search Query - Techniques for searching a hierarchical database and an unstructured database with a single search query are described herein. | 04-05-2012 |
20120089611 | METHOD OF UPDATING AN INVERTED INDEX, AND A SERVER IMPLEMENTING THE METHOD - This method of updating an inverted index from at least one electronic document in which each electronic document is constituted by at least one ordered set of objects comprises, for each of said objects: | 04-12-2012 |
20120109970 | METHODS FOR INDEXING AND SEARCHING BASED ON LANGUAGE LOCALE - In response to a search query having a search term received from a client, a current language locale is determined. A state machine is built based on the current language locale, where the state machine includes one or more nodes to represent variance of the search term having identical meaning of the search term. Each node of the state machine is traversed to identify one or more postings lists of an inverted index corresponding to each node of the state machine. One or more item identifiers obtained from the one or more postings lists are returned to the client, where the item identifiers identify one or more files that contain the variance of the search term represented by the state machine. | 05-03-2012 |
20120150867 | CLUSTERING A COLLECTION USING AN INVERTED INDEX OF FEATURES - Provided are techniques for creating an inverted index for features of a set of data elements, wherein each of the data elements is represented by a vector of features, wherein the inverted index, when queried with a feature, outputs one or more data elements containing the feature. The features of the set of data elements are ranked. For each feature in the ranked list, the inverted index is queried for data elements having the feature and not having any previously selected feature and a cluster of the data elements is created based on results returned in response to the query. | 06-14-2012 |
20120166445 | METHOD, APPARATUS AND COMPUTER READABLE MEDIUM FOR INDEXING ADVERTISEMENTS TO COMBINE RELEVANCE WITH CONSUMER CLICK FEEDBACK - A method and apparatus are provided for better web ad matching by combining relevance with consumer click feedback. In one example, the method includes receiving a query page, extracting features from the query page, re-weighting the query page, evaluating the query page in light of each ad in order to score each ad and pick substantially best ad matches of the indexed ads, and returning the substantially best ad matches to the consumer computer. | 06-28-2012 |
20120179689 | DIRECTORY TREE SEARCH - Directory tree searching uses a path index to determine a set of documents for a directory path portion of a search query. The set of documents for the directory path portion is evaluated with a set of documents for an indexed term portion of the search query to determine common documents. | 07-12-2012 |
20120259862 | Method and apparatus for processing A query - Provided are a method and apparatus for processing a query. The method includes generating string sets comprising a plurality of partial strings from a query string, determining a subset of the string sets as a candidate set, and searching for a document comprising the query string from the candidate set. | 10-11-2012 |
20120303632 | COMPUTERIZED SEARCHABLE DOCUMENT REPOSITORY USING SEPARATE METADATA AND CONTENT STORES AND FULL TEXT INDEXES - A computerized searchable repository stores documents as structured metadata parts and unstructured content parts using single instancing. A full text index used for keyword searching includes a metadata index and a content index. A linking structure includes metadata-to-content (MD to CT) links and content-to-metadata (CT to MD) linking entries, with each MD to CT link linking a metadata part of a document to each content part of the document, and each CT to MD linking entry having one or more CT to MD links collectively linking a content part to the metadata parts of the documents that include the content part. Indexing includes metadata indexing a metadata part, conditionally content indexing a content part, and updating the linking structure. Content indexing is performed only if the content part does not match a content part already stored and indexed. Index entries each associate a key word or key value with corresponding metadata or content parts containing the key word or key value. Updating the linking structure includes generating new MD to CT and CT to MD links between the metadata part and either the new content part or an existing matching content part if present. | 11-29-2012 |
20120323927 | Method and System for Inverted Indexing of a Dataset - Methods and systems for providing an inverted index for a dataset are disclosed. The inverted index includes a position vector, with fields that correspond to values in the indexed dataset. The fields include data to be used in determining where each value appears in the dataset. The position vector is populated differently for different value types. A 1:1 value appears once in the dataset; a 1:n value appears multiple times. For a 1:1 value, the position vector stores information for where that value appears. For a 1:n value, the position vector stores a pointer, e.g. a memory reference, that identifies a list of locations where the value appears. The list can be encoded or otherwise compressed. A set of indicators can be stored for the fields indicating whether the field has 1:n or 1:1 value information. The indicator is used to control interpretation of the information in a field. | 12-20-2012 |
20130007004 | METHOD AND APPARATUS FOR CREATING A SEARCH INDEX FOR A COMPOSITE DOCUMENT AND SEARCHING SAME - A tool for generating at least one search index for a composite document, wherein the composite document comprises multiple component documents. The search index is generated by extracting characters from the document, segregating the characters into tokens of one or more characters, and determining location information of the tokens. The location information can include the page number of the component document and X, Y page coordinates for the tokens. The tool also provides a user interface that allows for searching of the composite document using at least one of the generated indexes. The user interface allows the user to enter one or more search terms and to select the criteria that will be used during the search. Results are presented to the user via a list of document names that are also hyperlinks to the document. The results documents are listed in order of relevancy, and fragments of text that contain the searched terms are also available to the user, for each document. | 01-03-2013 |
20130013616 | Systems and Methods for Natural Language Searching of Structured Data - The invention relates to searching structured data using natural language searches. More specifically and preferably, the invention relates to the use of an inverted file index built from generated documents to make data, typically unsearchable using a natural language search, searchable. | 01-10-2013 |
20130018891 | REAL-TIME SEARCH OF VERTICALLY PARTITIONED, INVERTED INDEXES - Provided are techniques for processing a query. A query including constraints for at least two vertically partitioned, inverted indexes is received. The constraints in the query are separated based on the vertically partitioned, inverted indexes. A document identifier iterator is obtained for each of the constraints, wherein each document identifier iterator is associated with a posting list, and wherein each posting list is ordered by document identifier order. A run-time join of the posting lists is performed to obtain a final result set. | 01-17-2013 |
20130073559 | Methods for Indexing and Searching Based on Language Locale - In response to a search query having a search term received from a client, a current language locale is determined. A state machine is built based on the current language locale, where the state machine includes one or more nodes to represent variance of the search term having identical meaning of the search term. Each node of the state machine is traversed to identify one or more postings lists of an inverted index corresponding to each node of the state machine. One or more item identifiers obtained from the one or more postings lists are returned to the client, where the item identifiers identify one or more files that contain the variance of the search term represented by the state machine. | 03-21-2013 |
20130086071 | AUGMENTING SEARCH WITH ASSOCIATION INFORMATION - Techniques and tools are described for augmenting search using association information. Searches can be performed using a combination of index information and association information. In some examples, index information is stored in a first data store and association information is stored in a second data store. Search queries can be received and modified using association information. Modified search queries can be executed using a combination of index information and association information. Index information can be generated by indexing a set of documents. Association information can be generated by monitoring user activity occurring between users and a set of documents. | 04-04-2013 |
20130097174 | Calculating Valence of Expressions within Documents for Searching a Document Index - Tools and techniques related to calculating valence of expressions within documents. These tools may provide methods that include receiving input documents for processing, and extracting expressions from the documents for valence analysis, with scope relationships occurring between terms contained in the expressions. The methods may calculate valences of the expressions, based on the scope relationships between terms in the expressions. | 04-18-2013 |
20130138660 | SYSTEM AND METHOD FOR DISTRIBUTED INDEX SEARCHING OF ELECTRONIC CONTENT - There are provided methods and systems for efficient search in a peer-to-peer network topology. In various embodiments, search methods and systems provide for response times and network traffic that are independent from the number of query terms, thereby producing constant run-time searches and bandwidth hits in a P2P network search implementation. By distributing inverse indexes between peers, and storing with each inverse index a Bloom filter populated with selected keywords, multi-term search and analysis can be conducted on one network node without requiring exchange of posting lists between various network nodes. | 05-30-2013 |
20130151533 | PROVISION OF QUERY SUGGESTIONS INDEPENDENT OF QUERY LOGS - Described herein are various technologies pertaining to provision of query suggestions to a user independent of a query log. Key phrases are automatically identified in documents of a document corpus, and a forward index and inverted index are generated. The forward index indexes key phrases by documents, and the inverted index indexes documents by key phrases. A query is received from a user, and documents relevant to the query are retrieved. Key phrases in the retrieved documents are identified via the forward index, and a subset of the key phrases are selected as query suggestions by determining coverage of the key phrases as identified in the inverted index. | 06-13-2013 |
20130151534 | MULTIMEDIA METADATA ANALYSIS USING INVERTED INDEX WITH TEMPORAL AND SEGMENT IDENTIFYING PAYLOADS - The addition of relative term positions, temporal positions, and segment identifiers to an inverted index allows for temporal and phrase queries of multimedia assets. Segment identifiers enable any search results to be examined in context. The system makes advantageous use of Lucene's binary payload functionality to store temporal data and segment identifiers as additional binary data for each term instance in the inverted index. The payloads are made up of three variable-length integers, which account for twelve extra bytes of metadata, which are stored for each term instance. A content database on a Master/Administrator server node provides the indexes for search into content in response to user events, returning results in JSON format. The search results may then be used to locate and present content segments to a user containing both requested search term results and the time location within the multimedia asset in which the search term(s) is found. | 06-13-2013 |
20130238631 | INDEXING AND SEARCHING ENTITY-RELATIONSHIP DATA - Method, system, and computer program product for indexing and searching entity-relationship data are provided. The method includes: defining a logical document model for entity-relationship data including: representing an entity as a document containing the entity's searchable content and metadata; dually representing the entity as a document and as a category; and representing each relationship instance for the entity as a category set that contains categories of all participating entities in the relationship. The method also includes: translating entity-relationship data into the logical document model; and indexing the entity-relationship data of the populated logical document model as an inverted index. The method may include searching indexed entity-relationship data using a faceted search, wherein the categories are all categories required for supporting faceted navigation. | 09-12-2013 |
20130254211 | MATCHING DOCUMENTS AGAINST MONITORS - Techniques and tools are described for matching documents against monitors. An index can be generated from a plurality of monitors, where the index represents the query logic of the plurality of monitors. The index can be searched using the documents as search queries. The searching can comprise matching the documents against the monitors using the query logic represented in the index. An index can be distributed to a plurality of computing devices to be searched at the plurality of computing devices, where each computing device searches a subset of a plurality of documents against the full index. Searching at the plurality of computing devices can be performed in parallel, and results can be aggregated at a central location. | 09-26-2013 |
20130262470 | DATA STRUCTURE, INDEX CREATION DEVICE, DATA SEARCH DEVICE, INDEX CREATION METHOD, DATA SEARCH METHOD, AND COMPUTER-READABLE RECORDING MEDIUM - In an inverted list of each node in a taxonomy, among each node, an inverted list of the highest node is a list of integer values indicating an identifier of search subject data, and an inverted list of a node other than the highest node, in place of the identifier, is a list of integer values indicating a position in an inverted list corresponding to a node that is higher by one than the node. Furthermore, a list of integer values in an inverted list of each node is divided into two or more blocks, and a differential value between an integer value and an integer value directly before the integer value in the block is converted into a bit string of a variable length integer code. | 10-03-2013 |
20130262471 | REAL TIME MAPPING OF USER MODELS TO AN INVERTED DATA INDEX FOR RETRIEVAL, FILTERING AND RECOMMENDATION - A catalog record is bridged to information stored in at least one inverted index by receiving an application user interface call associated with a predetermined filter request including a record identifier identifying a record in a relational database. A bitset is generated based on item identifiers in the record. The bitset is applied to at least one inverted index to obtain metadata associated with the item identifiers. | 10-03-2013 |
20130275436 | PSEUDO-DOCUMENTS TO FACILITATE DATA DISCOVERY - Various embodiments promote the discoverability of data that can be contained within a database. In one or more embodiments, data within a database is organized in a structure having a schema. The structure and data can be processed in a manner that renders one or more pseudo-documents each of which constitutes a sub-structure that can be indexed. Once produced and indexed, the pseudo-documents constitute a set of searchable objects each of which relationally points back to its associated structure within the database. Searches can now be performed against the pseudo-documents which, in turn, return a set of search results. The set of search results can include multiple sub-sets of pseudo-documents, each sub-set of which is associated with a different structure. | 10-17-2013 |
20130290345 | String and Sub-String Searching Using Inverted Indexes - Inverted indexes for terms and for term separators are separately provided to minimize data redundancy. Search queries are parsed to identify terms and term separators, if any, and the corresponding inverted indexes are searched for responsive documents. Related apparatus, systems, techniques and articles are also described. | 10-31-2013 |
20130339369 | Search Method and Apparatus - The present disclosure provides techniques to solve problems (e.g., the low efficiency and a waste of resources) derived from conventional methods. These techniques may include extracting, by a computing device, the first N keywords appearing the most in target information published by target users as target words, and creating an inverted index based on information on a page of the target users and the target words, wherein the inverted index includes a target field and a page information field, and N is an integer. The computing device may receive an inquiry phrase and determine target users matching the inquiry phrase in the inverted index based on the inquiry phrase. The computing device may calculate a relevance between the matched target users and the inquiry phrase through the target field and the page information field, and return a certain result based on the relevance. | 12-19-2013 |
20140032567 | RESOURCE EFFICIENT DOCUMENT SEARCH - The present document relates to a system and method for searching a document using one or more search terms. In particular, the present document relates to a resource efficient method for searching a document within a database of documents. A method for determining an inverse index on an electronic device including a database is described. The inverse index is configured to map a plurality of text data entities from the database to a search term. The method includes determining a plurality of relevance vectors for a plurality of text data entities from the database. Determining a relevance vector for a text data entity from the database includes: selecting N terms which are descriptive of the text data entity; and determining the relevance vector from the selected N terms. Furthermore, the method includes determining the inverse index comprising a plurality of records. | 01-30-2014 |
20140059054 | PARALLEL GENERATION OF TOPICS FROM DOCUMENTS - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhanced parallel latent Dirichlet allocation (PLDA+). A PLDA+ system is a system of multiple processors that are configured to generate topics from multiple documents. The multiple processors are designated as two types: document processors and matrix processors. The documents are distributed among the document processors. Generated topics are distributed among the matrix processors. Tasks performed on the document processors and matrix processors are segregated into two types of tasks: computation-bound tasks and communication-bound tasks. Computation-bound tasks are CPU intensive tasks; communication-bound tasks are network intensive tasks. Data placement and pipeline strategies are employed such that the computation-bound tasks and the communication-bound tasks are distributed to the processors in a balanced manner, and performed in parallel. | 02-27-2014 |
20140101167 | Creation of Inverted Index System, and Data Processing Method and Apparatus - The present disclosure relates to techniques for establishing an inverted indexing system and related data processing. The techniques may include writing, by a computing device, inverted indexes of a massive amount of data records into at least one inverted file. The computing device may then write description information of the written inverted file into a description file associated with the inverted file, and establish the inverted indexing system based on the inverted file and the description file of the inverted file. The techniques enhance efficiency in establishing the inverted indexing system and in processing data using the system. | 04-10-2014 |
20140129566 | Method and Apparatus for Geographic Document Retrieval - A geographic document retrieval method (GDR) can be executed by a computer system to index, retrieve and rank geographical documents. Textual and spatial attributes of geographical documents are indexed separately using inverted index and spatial index, respectively. Spatial attributes of a document are represented as one or more contiguously closed regions of arbitrary shapes. Upon receiving an input query carrying a geographic representation of a location using arbitrary regions, the GDR method retrieves one or more documents by executing an overlap test between arbitrary regions from the query and the arbitrary regions associated with the documents. | 05-08-2014 |
20140129567 | SYSTEM FOR GENERATING INDEX RESISTANT AGAINST DIVULGING OF INFORMATION, INDEX GENERATION DEVICE, AND METHOD THEREFOR - In the present invention, scope search can be effectively performed in a database having encrypted registration information. A plurality of values, first identification information to identify the plurality of values, and a key are accepted as input. A value group is generated from the plurality of values. The value group is treated as a word group, and a secure index is generated from the word group, the first identification information, and the key. On the basis of a value to be retrieved and a key, trapdoor information for the value to be retrieved is generated. With respect to the generated secure index, a secure index assessment process is performed using the trapdoor information. When the value to be retrieved is assessed to be contained in the secure index as a result of the assessment process, second identification information to identify the secure index is output. | 05-08-2014 |
20140164388 | QUERY AND INDEX OVER DOCUMENTS - A document index is generated from a set of documents and is used to identify documents that match one or more queries. A tree is generated for each document with a node corresponding to each object of the document. The nodes of the generated trees are merged or combined to generate the document index, which is itself a tree. In addition, an inverted index is generated for each node of the index that identifies the tree(s) that the node originated from. When a query is received, the query is first executed against the document index tree: during the execution, proper set operations are applied to the inverted indices associated with the nodes matched by the query. The resulting set identifies the documents that may match the query. The query is then executed on the identified documents. | 06-12-2014 |
20140214853 | SEQUENTIAL CHAIN REGISTRY - Systems and methods are disclosed for tracking an object as it traverses a sequential chain. The relationships between the object, its movement through space and time, and the entities associated with the object at a discrete point in time are captured by a sequential chain. A unique identifier may be created that is continuously modified as the object traverses the sequential chain. The unique identifier may be used to capture relationship information between the object and its related entities and movements. | 07-31-2014 |
20140236962 | Updating An Inverted Index In A Real Time Fashion - Systems and methods for regularly updating portions of a merged index are provided. Initially, upon receiving an indication that modifications have occurred to content of web-based documents, dynamic update of index (DUI) objects that identify the documents and expose the modified content are composed by ascertaining relative positions of the modified content within the documents, and packaging identifiers of the documents, the relative positions, and metadata underlying the modified content into a message. The DUI objects are applied to an overloading index that maintains structured records of recent modifications. In particular, portions of the overloading index are targeted utilizing the document identifiers and the relative positions specified by the DUI object, thereby updating the targeted portions within the overloading index corresponding to the modified content without rewriting the entire overloading index. Periodically, an association process is invoked for grouping the merged index with the overloading index for search purposes. | 08-21-2014 |
20140324882 | METHOD AND SYSTEM FOR NAVIGATING COMPLEX DATA SETS - The present invention relates to systems and methods for storing, navigating and retrieving information. In particular, the present invention is concerned with systems and methods for storing data in, for retrieving data from, and for navigating large and/or complex datasets. The systems and methods of the present invention in particular are concerned with the materialization/denormalization of complex data sets comprising a plurality of large, interconnected but distinct data record collections. The materialization/denormalization of such data sets can be performed in a precomputation phase, prior to a browsing/searching operation. | 10-30-2014 |
20140337355 | Indexed Natural Language Processing - A method and computer program product for implementing indexed natural language processing are disclosed. Source document features including but not limited to terms, punctuation, parts-of-speech, phrases (including the syntactic types of the phrases), dependent clauses (including the syntactic types of the dependent clauses), independent clauses (including the syntactic types of the independent clauses), sentences, paragraphs, labeled document sections and document type and cognitive grammar constraints on the scope of influence and binding for the same are entered into an index by their begin and end byte offsets (or some alternative indexing method). Queries against the source documents are implemented as nested constructs that specify queries as sets that have terms or other sets as set elements and where sets may be constructed according to: 1) ordering (or the lack thereof); 2) boolean relations; 3) fuzzy relations; and 4) scoping according to: a) proximity; b) phrase inclusion; c) clause inclusion; d) sentence inclusion; e) paragraph inclusion; f) section inclusion; g) document type; and cognitive grammar constraints. Further, terms that are the components of a query are divided into sets according to the expected cognitive grammar relations between those terms as they would appear as surface forms in the source documents. As an aid to constructing queries in this manner, in some implementations, a surface form ontology is implemented in which the surface forms from which desired concepts can be expressed are represented according to their cognitive grammar compositions. Using these methods, queries can be composed that analyze the source documents via the intermediary of an index at a level of detail that has heretofore been possible only by application of standard Natural Language Processing (NLP) techniques directly to the source document. This novel application combining the strengths of cognitive grammar, surface form ontology and indexing results in information retrieval (IR) with significantly improved levels of recall and precision and information extraction (IE) with significantly improved flexibility and processing speeds over very large sets of data. | 11-13-2014 |
20140372450 | METHODS OF VIEWING AND ANALYZING HIGH CONTENT BIOLOGICAL DATA - The invention relates generally to a method for interactive viewing and analysis of high content data in a biological pathway context. The high content data may be related to the expression of biomarkers within a tissue, a cell, or a cellular compartment of an individual cell, such that the data may reveal patterns of expression to identify a biological process, a clinical diagnosis or prognosis. | 12-18-2014 |
20140379728 | MULTIFACETED SEARCH - A query is received that includes two or more facets of a multidimensional inverted index for a collection of documents. Each document is associated with at least one facet. Generation of the multidimensional inverted index includes creating one or more entries. Each entry includes a combination of two or more facets and a posting list of indications for the documents associated with respective facets of each entry. Each indication identifies a document. Generation of the index also includes determining documents associated with respective facets of the combination of each entry. The multidimensional inverted index is searched for an entry having the combination of two or more facets included in the query and a search result is returned. An indication for a document may be included in a posting list if it is determined that the document is associated with each facet of the combination of facets of the entry. | 12-25-2014 |
20150066947 | INDEXING APPARATUS AND METHOD FOR SEARCH OF SECURITY MONITORING DATA - An indexing apparatus and method for search of security monitoring data are provided. The indexing apparatus includes a data collection unit and a data index generation unit. The data collection unit collects data, that is, a basis of search of monitoring information, from a database in which security monitoring data has been stored. The data index generation unit generates file structure-based data in which indices have been assigned to multiple search elements of the data collected by the data collection unit. | 03-05-2015 |
20150088901 | Extract Operator - In one embodiment, a method includes receiving, from a user, a search query requesting objects of a first object type. The search query includes an inner query requesting objects of a second object type. The method includes identifying objects of the second object type requested by the inner query using an inverted index of a data store corresponding to the second object type; identifying objects of the first object type requested by the search query using the identified objects of the second object type and a forward index of the data store corresponding to the second object type; and sending search results to the user responsive to the search query, each search result corresponding to an identified object of the first object type. | 03-26-2015 |
20150142821 | DATABASE SYSTEM FOR ANALYSIS OF LONGITUDINAL DATA SETS - A database system performs analytics on longitudinal data, such as medical histories with events occurring to patients over time. Input data is processed into streams of events. A set of indexes of event characteristics is generated. A set of patient event histories, partitioned by patient, is generated. Several copies of event data are stored, each copy being structured to support a specific analytical task. Data is partitioned and distributed over several hardware nodes to allow parallel queries. Definitions of sets of candidate patients are translated into sets of filters applied to the set of indexes. Data for these candidates are input to analytical modules. Reports from analysis are automatically generated to be compatible with standard guidelines for reporting. Workflows support one task or a set of closely related tasks by offering the user a defined sequence of query options and analytic choices specifically arranged for the task. | 05-21-2015 |
20150324421 | TRANSFORMING QUERIES IN A MULTI-TENANT DATABASE SYSTEM - In a method, system, and computer-readable medium having instructions for executing a query in a database system, a query request is received with a query predicate to filter data returned in response to the query request and the query predicate has a formula, the query request is transformed to a transformed query request by preprocessing the formula in the query predicate, and the query request is optimized using the transformed query request. | 11-12-2015 |
20150339408 | UNIVERSAL TRANSACTION REPOSITORY - Embodiments disclosed herein relate to systems, methods, and computer program products for providing an extensible input database and associated reference database. In some embodiments, the system and method provide an extensible input database and a graphical user interface for inputting data into the extensible input database; receive data from a user via the graphical user interface, the data comprising content for the extensible input database; generate a key for a reference database based on the content received from the user; populate the extensible input database with the content; and associate the content in the extensible input database with the key in the reference database. The extensible input database is flexible in receiving different types of data and reduces the number of databases needed in order to store different types of data. | 11-26-2015 |
20150347482 | OPTIMIZING A CONTENT INDEX FOR TARGET AUDIENCE QUERIES - Apparatus and methods are provided for indexing electronic content to be served to users' mobile and/or stationary communications and computing devices. An index is composed of multiple slices, with each slice storing multiple entries and each entry representing one content item or one campaign or collection of content items. An entry is populated with tokens representing attribute/value pairs of a target audience of the content item and/or property/value pairs of the item or the item's campaign. A query or request to identify content items for serving to a particular user is similarly formatted with tokens representing attribute/value pairs of the user and/or item/campaign. Queries can then be executed rapidly across any or all index entries in any or all slices. Within a slice, entries may be sorted by value or score, and integer components within an individual entry may be sorted to facilitate rapid comparison with a query. | 12-03-2015 |
20150356169 | METHODS AND SYSTEMS FOR INDEXING REFERENCES TO DOCUMENTS OF A DATABASE AND FOR LOCATING DOCUMENTS IN THE DATABASE - Methods and systems allow indexing references to documents of a database according to database reference profiles. Documents may then be located in the database using decoding protocols based on the database reference profiles. To this end, the documents are stored in the database and searchable terms extracted therefrom are associated with posting lists. Each posting list is divided into blocks of M database references. The blocks are encoded according to a pattern that depends on the M database references. A corresponding pointer to a table of encoding patterns is appended to each block. When a query is received for a searchable term, blocks are extracted from a posting list corresponding to the searchable term and a pointer for each block is used to extract a decoding protocol related to an encoding pattern for the block. | 12-10-2015 |
20160004736 | SIMILAR DATA SEARCH DEVICE, SIMILAR DATA SEARCH METHOD, AND COMPUTER-READABLE STORAGE MEDIUM - A similar data search device includes: an inverted index generating unit which determines size ranges of sets of search targets for each of inverted indexes so that the number of sets of search targets is not smaller than a specified number and generates inverted indexes by dividing the sets of search targets according to the determined size ranges; an unnecessary inverted index identifying unit which determines, based on a size of a set of search conditions and a threshold value specified for a similarity between sets, a condition necessary for the similarity to be no smaller than the threshold value, and identifies, as an inverted index unnecessary for searches, any inverted index other than those inverted indexes containing a set whose minimum size value satisfies the condition; and a data search unit which conducts a search on a non-identified inverted index. | 01-07-2016 |
20160012090 | Inverted index and inverted list process for storing and retrieving information | 01-14-2016 |
20160012092 | INVERTED TABLE FOR STORING AND QUERYING CONCEPTUAL INDICES | 01-14-2016 |
20160034563 | FACILITATING EXECUTION OF CONCEPTUAL QUERIES CONTAINING QUALITATIVE SEARCH TERMS - The disclosed embodiments relate to a system that facilitates performing searches based on qualitative search terms. During operation, the system receives a query that applies a qualitative search term to an attribute of data items in a set of data items. While executing the query, the system processes each data item in the set of data items by extracting an attribute value from the data item and then using a concept-mapping to determine a compatibility index for the attribute value, wherein the concept-mapping associates each attribute value with a numerical compatibility index that indicates a compatibility between the attribute value and the qualitative search term. Finally, the system uses the compatibility index as a factor in determining whether to include the data item in a set of query results. | 02-04-2016 |
20160048585 | BLOOM FILTER WITH MEMORY ELEMENT - Techniques are provided for determining if an element is contained in a set of elements. In one aspect, an element may be received and inserted into a bloom filter. The element may also be inserted into a memory associative on the bloom filter indexes. In another aspect, a search element may be received and compared to a bloom filter. If the search element is included in the bloom filter, a memory may be used to determine if the search element is included in the set of elements. | 02-18-2016 |
20160062731 | SHORTLIST COMPUTATION FOR SEARCHING HIGH-DIMENSIONAL SPACES - Techniques are disclosed for indexing and searching high-dimensional data using inverted file structures and product quantization encoding. An image descriptor is quantized using a form of product quantization to determine which of several inverted lists the image descriptor is to be stored. The image descriptor is appended to the corresponding inverted list with a compact coding using a product quantization encoding scheme. When processing a query, a shortlist is computed that includes a set of candidate search results. The shortlist is based on the orthogonality between two random vectors in high-dimensional spaces. The inverted lists are traversed in the order of the distance between the query and the centroid of a coarse quantizer corresponding to each inverted list. The shortlist is ranked according to the distance estimated by a form of product quantization, and the top images referred to by the ranked shortlist are reported as the search results. | 03-03-2016 |
20160063046 | METHODS AND SYSTEMS FOR INDEXING REFERENCES TO DOCUMENTS OF A DATABASE AND FOR LOCATING DOCUMENTS IN THE DATABASE - Methods and systems allow indexing references to documents of a database according to database reference profiles. Documents may then be located in the database using decoding protocols based on the database reference profiles. To this end, the documents are stored in the database and searchable terms extracted therefrom are associated with posting lists. Each posting list is divided into blocks of M database references. The blocks are encoded according to a pattern that depends on the M database references. A corresponding pointer to a table of encoding patterns is appended to each block. When a query is received for a searchable term, blocks are extracted from a posting list corresponding to the searchable term and a pointer for each block is used to extract a decoding protocol related to an encoding pattern for the block. | 03-03-2016 |
20160070734 | METHODS AND SYSTEMS FOR INDEXING REFERENCES TO DOCUMENTS OF A DATABASE AND FOR LOCATING DOCUMENTS IN THE DATABASE - Methods and systems allow indexing references to documents of a database according to database reference profiles. Documents may then be located in the database using decoding protocols based on the database reference profiles. To this end, the documents are stored in the database and searchable terms extracted therefrom are associated with posting lists. Each posting list is divided into blocks of M database references. The blocks are encoded according to a pattern that depends on the M database references. A corresponding pointer to a table of encoding patterns is appended to each block. When a query is received for a searchable term, blocks are extracted from a posting list corresponding to the searchable term and a pointer for each block is used to extract a decoding protocol related to an encoding pattern for the block. | 03-10-2016 |
20160070770 | SUGGESTING SOCIAL GROUPS FROM USER SOCIAL GRAPHS - A system and computer-implemented method for suggesting social groups is provided. Direct contacts connected to a user of a social networking service are identified. Secondary contacts are further identified, where each of the secondary contacts is connected to at least one of the direct contacts. A set of direct contacts is determined from the direct contacts based on connections between the direct contacts and the secondary contacts. The set of direct contacts is provided as a suggested social group. | 03-10-2016 |
20160092466 | SYSTEM AND METHOD FOR SUPPORTING ZERO-COPY BINARY RADIX TREE IN A DISTRIBUTED COMPUTING ENVIRONMENT - A system and method supports key management in a distributed computing environment such as a distributed data grid. A binary radix tree is used to intern a plurality of binary keys. The binary radix tree is serialized to a byte buffer and a view of the binary is created. A byte sequence interface to the nodes of the serialized binary radix tree allows use of references which refer to positions in the serialized binary radix tree instead of requiring byte array copies of the interned keys. Use of references into the byte array in place of byte array copies of interned keys reduces the memory overhead associated with referrers such as reverse indices which make reference to values associated with the plurality of binary keys. The reduction in memory overhead enhances performance and capabilities of a distributed computing environment such as a distributed data grid. | 03-31-2016 |
20160103906 | GENERATING AND IMPLEMENTING LOCAL SEARCH ENGINES OVER LARGE DATABASES - Embodiments described herein are directed to providing local search engines over large databases. In one scenario, a computing system receives as inputs data records stored in a database. The computing system parses the data records into file pairs that each include a keyword file and record ID file and merges file pairs into a keyword file and record ID file, where the keyword file includes keywords in sorted order, and where the record ID file includes a list of record IDs for keywords in the keyword file. The computing system further creates an offset file which stores offset values for starting addresses of record ID lists in the record ID file, and generates an index of keywords by assigning unique identifiers to keywords in the keyword file. The computing system also provides a query interface that allows the database's data records to be searched using the generated index of keywords. | 04-14-2016 |
20160110366 | Method and System for Storing, Retrieving, and Managing Data for Tags - This invention relates generally to a method and system for storing, retrieving, and managing data for tags that are associated in some manner to any type of object. More particularly, the present invention writes data to these tags, reads data from these tags, and manages data that is written to and/or read from these tags. In addition, the invention accesses and/or stores data associated with tags from or into repositories, constructs and maintains data structures from these repositories and responds to queries using the data structures. | 04-21-2016 |
20160124938 | CACHING OF DEEP STRUCTURES FOR EFFICIENT PARSING - A parsing method and system. The method includes generating an n-gram model of a domain and computing a tf-idf frequency associated with n-grams of the n-gram model. A list including a frequently occurring group of n-grams based on the tf-idf frequency is generated. The frequently occurring group of n-grams is transmitted to a deep parser component and a deep parse output from the deep parser component is generated. The deep parse output is stored within a cache and a processor verifies if a specified text word sequence of the deep parse output is available in the cache. | 05-05-2016 |
20160253359 | EFFICIENT IMAGE MATCHING FOR LARGE SETS OF IMAGES | 09-01-2016 |
20160378848 | POLYGON-BASED INDEXING OF PLACES - In particular embodiments, a method includes receiving a query for a specified place or a type of place, receiving an identification of a location of the computing device within a first map tile, identifying first places that are located at least partially within the first map tile and correspond to the query, the first places being identified in an index by records that correspond to the first map tile, and identifying second places that correspond to the query and are each located at least partially within second map tiles that include a parent map tile associated with the first map tile. The second places are identified in the index by one or more records that correspond to the second map tiles. The method further includes determining scores for places that include the first and second places based on one or more relevance factors. | 12-29-2016 |
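
Several entries in this class describe the same basic layout: a sorted keyword list, a concatenated list of record IDs, and a table of offsets marking where each keyword's record IDs begin (see, for example, entry 20160103906). The Python sketch below illustrates that general layout only. It is not taken from any of the listed applications. The function names `build_index` and `lookup`, the use of in-memory lists in place of files, the whitespace tokenizer, and the choice of a keyword's position in the sorted list as its identifier are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(records):
    """Build three parallel structures from a {record_id: text} mapping.

    keywords   -- sorted list of distinct keywords (stands in for a keyword file)
    record_ids -- concatenated, sorted record-ID lists (stands in for a record ID file)
    offsets    -- offsets[i] is where the ID list for keywords[i] starts;
                  a final sentinel marks the end of the last list
    """
    postings = defaultdict(set)
    for record_id, text in records.items():
        # Assumption: naive lowercase whitespace tokenization.
        for word in text.lower().split():
            postings[word].add(record_id)

    keywords = sorted(postings)
    record_ids, offsets = [], []
    for word in keywords:
        offsets.append(len(record_ids))
        record_ids.extend(sorted(postings[word]))
    offsets.append(len(record_ids))  # sentinel: end of the last list
    return keywords, record_ids, offsets


def lookup(index, word):
    """Return the sorted record IDs containing `word`, or [] if absent."""
    keywords, record_ids, offsets = index
    word = word.lower()
    # Binary search over the sorted keywords; the position doubles as the
    # keyword's identifier in this sketch.
    i = bisect_left(keywords, word)
    if i == len(keywords) or keywords[i] != word:
        return []
    return record_ids[offsets[i]:offsets[i + 1]]


if __name__ == "__main__":
    records = {
        1: "inverted index search",
        2: "bloom filter search",
        3: "inverted list",
    }
    index = build_index(records)
    print(lookup(index, "search"))
    print(lookup(index, "inverted"))
    print(lookup(index, "missing"))
```

Keeping the offsets separate from the record IDs means a lookup costs one binary search plus one contiguous slice, which is the property that makes this layout attractive when the lists live in files rather than in memory.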