Patent application number | Description | Published |
20090198646 | SYSTEMS, METHODS AND COMPUTER PROGRAM PRODUCTS FOR AN ALGEBRAIC APPROACH TO RULE-BASED INFORMATION EXTRACTION - Systems, methods and computer program products for an algebraic approach to rule-based information extraction. Exemplary embodiments include a method for rule-based information extraction, the method including specifying an annotator using algebraic operators, wherein each algebraic operator describes annotations identification from text documents. | 08-06-2009 |
20100042587 | Method for Laying Out Fields in a Database in a Hybrid of Row-Wise and Column-Wise Ordering - A method, system, and article are provided for employment of a hybrid layout of representation of data objects in computer memory. Columns of the database are separated based upon a classification of the columns. A vertical partition in the form of a bank is provided to receive an assignment of one or more data objects identified in the columns. Each bank is sized to be a divisor of a size of an associated hardware register. Assignment of data objects to banks organizes the data in a manner that support efficient query processing that mitigates the quantity of banks required to respond to the query. | 02-18-2010 |
20110040744 | SYSTEM, METHOD, AND APPARATUS FOR SCAN-SHARING FOR BUSINESS INTELLIGENCE QUERIES IN AN IN-MEMORY DATABASE - A computer-implemented method for scan sharing across multiple cores in a business intelligence (BI) query. The method includes receiving a plurality of BI queries, storing a block of data in a first cache, scanning the block of data in the first cache against a first batch of queries on a first processor core, and scanning the block of data against a second batch of queries on a second processor core. The first cache is associated with a first processor core. The block of data includes a subset of data stored in an in-memory database (IMDB). The first batch of queries includes two or more of the BI queries. The second batch of queries includes one or more of the BI queries that are not included in the first batch of queries. | 02-17-2011 |
20110295853 | EXTENSIBLE SYSTEM AND METHOD FOR INFORMATION EXTRACTION IN A DATA PROCESSING SYSTEM - A data mashup system having information extraction capabilities for receiving multiple streams of textual data, at least one of which contains unstructured textual data. A repository stores annotators that describe how to analyze the streams of textual data for specified unstructured data components. The annotators are applied to the data streams to identify and extract the specified data components according to the annotators. The extracted data components are tagged to generate structured data components and the specified unstructured data components in the input data streams are replaced with the tagged data components. The system then combines the tagged data from the multiple streams to form a mashup output data stream. | 12-01-2011 |
20110295854 | AUTOMATIC REFINEMENT OF INFORMATION EXTRACTION RULES - A method and system for automatically refining information extraction (IE) rules. A provenance graph for IE rules on a set of test documents is determined. The provenance graph indicates a sequence of evaluations of the IE rules that generates an output of each operator of the IE rules. Based on the provenance graph, high-level rule changes (HLCs) of the IE rules are determined. Low-level rule changes (LLCs) of the IE rules are determined to specify how to implement the HLCs. Each LLC specifies changing an operator's structure or inserting a new operator in between two operators. Based on how the LLCs affect the IE rules and previously received correct results of applying the rules on the test documents, a ranked list of the LLCs is determined. The IE rules are refined based on the ranked list. | 12-01-2011 |
20120078980 | COMPACT AGGREGATION WORKING AREAS FOR EFFICIENT GROUPING AND AGGREGATION USING MULTI-CORE CPUS - A system is described for creating compact aggregation working areas for efficient grouping and aggregation using multi-core CPUs. The system implements operations including computing a running aggregate for a group within a business intelligence (BI) query, and identifying a location to store running aggregate information within an aggregation working area of a cache. The aggregation working area includes first and second data structures. The first data structure stores running aggregate information that is associated with a group that is accessed frequently relative to a threshold. The second data structure stores running aggregate information that is associated with a group that is accessed infrequently relative to the threshold. The operations also include storing the running aggregate information in either the first or second data structure of the aggregation working area based on a characterization of the group as a frequently or infrequently accessed group. | 03-29-2012 |
20120209844 | EXTENSIBLE SYSTEM AND METHOD FOR INFORMATION EXTRACTION IN A DATA PROCESSING SYSTEM - A data mashup system having information extraction capabilities for receiving multiple streams of textual data, at least one of which contains unstructured textual data. A repository stores annotators that describe how to analyze the streams of textual data for specified unstructured data components. The annotators are applied to the data streams to identify and extract the specified data components according to the annotators. The extracted data components are tagged to generate structured data components and the specified unstructured data components in the input data streams are replaced with the tagged data components. The system then combines the tagged data from the multiple streams to form a mashup output data stream. | 08-16-2012 |
20130318075 | DICTIONARY REFINEMENT FOR INFORMATION EXTRACTION - A method for refining a dictionary for information extraction, the operations including: inputting a set of extracted results from execution of an extractor comprising the dictionary on a collection of text, wherein the extracted results are labeled as correct results or incorrect results; processing the extracted results using an algorithm configured to set a score of the extractor above a score threshold, wherein the score threshold balances a precision and a recall of the extractor; and outputting a set of candidate dictionary entries corresponding to a full set of dictionary entries, wherein the candidate dictionary entries are candidates to be removed from the dictionary based on the extracted results. | 11-28-2013 |
20130318076 | REFINING A DICTIONARY FOR INFORMATION EXTRACTION - A method for refining a dictionary for information extraction, the operations including: inputting a set of extracted results from execution of an extractor comprising the dictionary on a collection of text, wherein the extracted results are labeled as correct results or incorrect results; processing the extracted results using an algorithm configured to set a score of the extractor above a score threshold, wherein the score threshold balances a precision and a recall of the extractor; and outputting a set of candidate dictionary entries corresponding to a full set of dictionary entries, wherein the candidate dictionary entries are candidates to be removed from the dictionary based on the extracted results. | 11-28-2013 |
20140244554 | NON-DETERMINISTIC FINITE STATE MACHINE MODULE FOR USE IN A REGULAR EXPRESSION MATCHING SYSTEM - A non-deterministic finite state machine module for use in a regular expression matching system. The system includes a computational unit implementing a non-deterministic finite state machine representing a regular expression, wherein the computational unit is configured to: receive an input data stream, wherein an occurrence of the regular expression is determined, and an activation signal; process the input data stream with respect to the non-deterministic finite state machine depending on the activation signal; and provide at least one branch data output for initializing an additional non-deterministic finite state machine module if the processing of an element of the input data stream according to the non-deterministic finite state machine results in a branching of a processing thread. | 08-28-2014 |