Patent application number | Description | Published |
20080208856 | Classification-Based Method and Apparatus for String Selectivity Estimation - Histogram construction and selectivity estimation for string and substring match queries in databases of data having strings associated with attributes. The histogram construction counts string-attribute pairs in the documents, and outputs string-attribute-count triples sorted by count. The collection is partitioned into buckets. A synopsis is generated for the partition, having an average selectivity or count of the string-attribute-count triples in the partition and summary information representing the set of string-attribute pairs belonging to the bucket. Subsequent queries, both for exact and substring matches, use the synopsis to estimate the selectivity of buckets. | 08-28-2008 |
20080215542 | Method For Supporting Ontology-Related Semantic Queries in DBMSs with XML Support - A method for supporting semantic matching queries in a database management system (DBMS) by extracting and storing the transitive/subsumption relationships from given ontology data in a DBMS with native XML support. These transitive relationships are transformed into a set of XML documents that are natural mappings of the hierarchical structure of the transitive relationships. A table function construct expresses semantic matching queries in a declarative manner. The semantic matching queries are automatically rewritten or translated into standard SQL/XML search operators such as XQuery, XPath and XMLExists, and executed by the SQL/XML DBMS on the given instance data and the extracted transitive relationships data. | 09-04-2008 |
20080259084 | METHOD AND APPARATUS FOR ORGANIZING DATA SOURCES - A method and apparatus for organizing deep Web services are provided. In one aspect, the method and apparatus obtains a collection of sources and their associated attributes and/or input modes, for instance, using a crawling algorithm. The method and apparatus uses this information to organize the sources into communities. A mining algorithm such as the hyperclique mining algorithm is used to obtain cliques of highly correlated attributes. A clustering algorithm such as the hierarchical agglomerative clustering algorithm is used to further cluster the cliques of attributes into larger cliques, which in the present disclosure are referred to as signatures. The sources that are associated with each signature form a community and a graph representation of the communities is constructed, where the vertices are communities and the edges are the shared attributes. | 10-23-2008 |
20080270367 | SYSTEM AND METHOD FOR SEARCHING DEEP WEB SERVICES - A system and method for searching deep web services are provided. The system and method in one aspect allow organizing communities, sources and schema attributes in a multi-tier containment relationship; searching representative schema attributes in one or more communities; searching representative services in one or more communities; searching for related schema attributes; and searching for related communities. | 10-30-2008 |
20080270374 | METHOD AND SYSTEM FOR COMBINING RANKING AND CLUSTERING IN A DATABASE MANAGEMENT SYSTEM - A system for combining ranking and clustering in a query. Bit vectors are intersected on Boolean attributes resulting in a vector. Two summary grids are constructed by intersecting bit vectors on clustering and ranking attributes. The vector is intersected with each summary grid to obtain a filtered clustering and ranking grid. An algorithm is applied on the clustering grid to obtain clusters. Vectors associated with buckets in the clusters are intersected resulting in one vector for each cluster. The vector corresponding to each cluster is intersected with the ranking grid to obtain a modified grid. Buckets are pruned according to bounds of each bucket in the modified grid and a predetermined number to obtain candidate buckets containing the predetermined number of data. The data are retrieved and a ranking score is calculated. The top predetermined number of data are sorted according to ranking scores and a result is returned. | 10-30-2008 |
20080281626 | Enabling Interoperability Between Participants in a Network - Interoperability is enabled between participants in a network by determining values associated with a value metric defined for at least a portion of the network. Information flow is directed between two or more of the participants based at least in part on semantic models corresponding to the participants and on the values associated with the value metric. The semantic models may define interactions between the participants and define at least a portion of information produced or consumed by the participants. The determination of the values and the direction of the information flow may be performed multiple times in order to modify the one or more value metrics. The direction of information flow may allow participants to be deleted from the network, may allow participants to be added to the network, or may allow behavior of the participants to be modified. | 11-13-2008 |
20080307104 | Methods and Apparatus for Functional Model-Based Data Provenance in Stream Processing Environments - Techniques for deriving a provenance of one or more of a plurality of output data elements generated from a given output port of a processing component (PC) are provided. At least one dependency function is created that relates the one or more output data elements to a set of one or more input ports of the PC and a corresponding plurality of input data elements. The dependency function comprises an encoding of at least one of one or more temporal filters and one or more sequence filters relating to the plurality of input data elements. The at least one dependency function is stored. A history of stream-level bindings of one or more input streams to one or more input ports of the processing component and one or more output streams from one or more output ports of the processing component is stored. The plurality of input data elements belonging to the one or more input streams and the plurality of output data elements belonging to the one or more output streams are stored. The set of one or more input data elements, from the plurality of input data elements, that relate to the one or more output data elements is determined in accordance with the at least one dependency function and the history of stream-level bindings. | 12-11-2008 |
20090204551 | Learning-Based Method for Estimating Costs and Statistics of Complex Operators in Continuous Queries - A learning-based method for estimating costs or statistics of an operator in a continuous query includes a cost estimation model learning procedure and a model applying procedure. The model learning procedure builds a cost estimation model from training data, and the applying procedure uses the model to estimate the cost associated with a given query. The learning procedure uses a feature extractor, a confidence adjuster and a cost estimator. The feature extractor collects relevant training data and obtains feature values. The extracted feature values are associated with costs and used to create the cost estimator. The extracted feature values, the associated costs, the cost estimator, and a user interface are used to create the confidence adjuster. When applying the confidence adjuster and the cost estimator to a continuous stream of data, the feature extractor extracts feature values from the data stream, uses the extracted feature values as input into the confidence adjuster to determine whether or not the cost estimator should be used, and if so, uses the extracted feature values as inputs into the cost estimator to obtain the desired cost values. | 08-13-2009 |
20090292729 | Method and Apparatus for Maintaining and Processing Provenance Data in Data Stream Processing System - Techniques are disclosed for maintaining and processing provenance data in data stream processing systems. For example, a method for processing data associated with a data stream received by a data stream processing system, wherein the system comprises a plurality of processing elements, comprises the following steps. A portion of data associated with the data stream is maintained. The maintained data comprises inputs to each processing element that contributed to an output of each processing element. In response to an alert generated by one of the processing elements, a scheduler is triggered to determine when a pre-calculation of a prospective query related to the alert should be executed. In response to the scheduler, at least a portion of the maintained data is used to determine a set of data that contributed to the alert such that the alert-contributing set of data can be used to respond to the prospective query upon arrival thereof. | 11-26-2009 |
20090292818 | Method and Apparatus for Determining and Validating Provenance Data in Data Stream Processing System - Techniques are disclosed for determining and validating provenance data in data stream processing systems. For example, a method for processing data associated with a data stream received by a data stream processing system, wherein the system comprises a plurality of processing elements, comprises the following steps. Input data elements and output data elements associated with at least one processing element of the plurality of processing elements are obtained. One or more intervals are computed for the processing element using data representing observations of associations between input elements and output elements of the processing element, wherein, for a given one of the intervals, one or more particular input elements contained within the given interval are determined to have contributed to a particular output element. In another method, intervals are specified, and then validated by comparing the specified intervals against intervals computed based on observations. | 11-26-2009 |
20090307006 | METHOD OF COLLABORATIVE EVALUATION INFRASTRUCTURE TO ASSESS THE QUALITY OF HEALTHCARE CLINICAL DECISION ACTORS - A voting system employing individual healthcare actors is described wherein votes representing the relation between target measurements and actual measurements are aggregated and used to determine treatment of patients. | 12-10-2009 |
20100011030 | STATISTICS COLLECTION USING PATH-IDENTIFIERS FOR RELATIONAL DATABASES - Disclosed are a system, method, and computer readable medium for collecting statistics associated with data in a database. The method comprises determining an amount of memory needed to collect statistics for data associated with a defined data type in a relational database. The defined data type is based upon a mark-up language using a tree structure with one or more root-to-node paths therein. The amount of memory as determined is allocated for collecting the statistics for the data of the defined data type. A statistics collection is performed for the data of the defined data type in a single pass through the database and within the amount of memory which has been allocated. | 01-14-2010 |
20100145986 | Querying Data and an Associated Ontology in a Database Management System - A method, apparatus, and computer program for querying data and an associated ontology in a database. An ontology is associated with data in a database. Responsive to receiving a query from a requestor, relational data in the database is identified using the query to form identified relational data. Ontological knowledge in the ontology is identified using the identified relational data and the ontology. A result is returned to the requestor. | 06-10-2010 |
20100312779 | ONTOLOGY-BASED SEARCHING IN DATABASE SYSTEMS - A method, information processing system, and computer program storage product retrieve data from a database. A search request is received from a user for a set of data in at least one database. In response to receiving the search request from the user, an ontology query is performed over at least one ontology associated with the at least one database, resulting in an ontological dataset associated with the search request. The ontological dataset includes at least one of a set of synonyms, a set of hypernyms, and a set of hyponyms, associated with the search request. A data query is performed over data in the at least one database using the ontological dataset in response to performing the ontology query. The set of data is returned to the user based on the data query that has been performed. | 12-09-2010 |
20110282652 | MAPPING OF RELATIONSHIP ENTITIES BETWEEN ONTOLOGIES - Methods, apparatus and systems, including computer program products, for reducing an error rate when mapping entities between a first ontology and a second ontology. One or more of a general language dictionary and an industry-specific dictionary are provided. Natural language processing of the first ontology is performed to identify one or more candidate relationship entities in the first ontology. Each candidate relationship entity includes a compound name having two or more semantic labels, and each candidate relationship entity has a name that exists in neither the general language dictionary nor the industry-specific dictionary. Each of the one or more candidate relationship entities in the first ontology is mapped to one or more entities in the second ontology using one or more configurable computer-implemented mapping algorithms. | 11-17-2011 |
20120036110 | Automatically Reviewing Information Mappings Across Different Information Models - A computer-implemented method, system, and program product for automatically reviewing a mapping between information models. The method includes receiving a mapping between an element in the first information model and an element in the second information model. Each element is associated with an element identifier and an element value, and the mapping signifies a relationship between the element in the first information model and the element in the second information model. The method further includes comparing the received mapping against one or more known indications of suspicious mappings to determine if the received mapping resembles one of the indications of suspicious mappings. If the received mapping is determined to be suspicious, it is identified as one that requires review. | 02-09-2012 |
20120047114 | ENFORCING QUERY POLICIES OVER RESOURCE DESCRIPTION FRAMEWORK DATA - A method of performing a graph query issued by a user is provided. The method includes performing on a processor: receiving a user graph query; rewriting the user graph query as a new query based on a query policy expressed in a graph query language; and performing the new query on graph data to obtain a result. | 02-23-2012 |
20120047124 | DATABASE QUERY OPTIMIZATIONS - A method of processing a query is provided. The method includes performing on a processor: receiving a database query that includes a plurality of predicates that associate a subject with an object, where one or more of the predicates is a variable predicate; generating at least one new query by selectively replacing the at least one variable predicate in the database query with a non-variable predicate; and performing the at least one new database query on a database to obtain a query result. | 02-23-2012 |
20130179464 | SYSTEM AND METHOD FOR PROVENANCE FUNCTION WINDOW OPTIMIZATION - A system and method for managing provenance data are disclosed. In accordance with one method, input data elements assessed by a processing element are evaluated. The method further includes determining whether an input window comprising the input data elements includes a sufficient amount of relevant input data. If the input window does not include a sufficient amount of relevant input data, then the input data elements are designated for reference in response to a provenance query. | 07-11-2013 |
20150046482 | TWO-LEVEL CHUNKING FOR DATA ANALYTICS - Two-level chunking for data analytics is disclosed. An example method includes dividing an array into fixed-size chunks. The method also includes dynamically combining the fixed-size chunks into a super-chunk, wherein a size of the super-chunk is based on parameters of a subsequent operation. | 02-12-2015 |
20150088936 | Statistical Analysis using a graphics processing unit - A data structure having plural elements may be divided into plural sections, each section including a portion of the plural elements. The data structure may include information related to statistical analysis. Instructions may be generated to execute a function on the data structure on a section-by-section basis. These instructions may be executed by a graphics processing unit. | 03-26-2015 |
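
The two-level chunking idea summarized for application 20150046482 can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the method claimed in the application: the function names `fixed_chunks` and `super_chunks` are hypothetical, and an element budget (`element_budget`) stands in for the "parameters of a subsequent operation" that determine the super-chunk size.

```python
from typing import List, Sequence


def fixed_chunks(data: Sequence[int], chunk_size: int) -> List[List[int]]:
    """Level 1: divide an array into fixed-size chunks (the last may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def super_chunks(chunks: List[List[int]], chunk_size: int,
                 element_budget: int) -> List[List[int]]:
    """Level 2: combine consecutive fixed-size chunks into super-chunks.

    element_budget is an assumed parameter of the next operation, for example
    how many elements it can process at once. Each super-chunk holds as many
    whole chunks as fit in that budget, and always at least one.
    """
    per_super = max(1, element_budget // chunk_size)
    return [
        [x for chunk in chunks[i:i + per_super] for x in chunk]
        for i in range(0, len(chunks), per_super)
    ]


data = list(range(10))
chunks = fixed_chunks(data, chunk_size=2)
supers = super_chunks(chunks, chunk_size=2, element_budget=4)
```

Choosing the super-chunk size at call time, rather than when the array is first chunked, lets the same stored chunks serve operations with different working-set sizes.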