Patent application number | Description | Published |
20080256063 | TECHNIQUE FOR SEARCHING FOR KEYWORDS DETERMINING EVENT OCCURRENCE - A keyword search system including a text input unit for inputting subtexts obtained by dividing each text into parts, while associating the subtexts with an event through a process recorded in the text; a prediction device adjuster for adjusting a corresponding event prediction device to maximize the percentage of text in which the inputted event is identical to a prediction result in a first text group selected from the subtexts; a prediction processor for generating a prediction result for each section, by inputting each text in a second text group selected from the corresponding subtexts in the adjusted event prediction device; and a search unit for calculating the prediction precision for the second text group of the event prediction device using a comparison between the inputted event and the prediction result for each subtext, and searching for keywords in sections with a certain degree of prediction precision. | 10-16-2008 |
20090063150 | METHOD FOR AUTOMATICALLY IDENTIFYING SENTENCE BOUNDARIES IN NOISY CONVERSATIONAL DATA - Sentence boundaries in noisy conversational transcription data are automatically identified. Noise and transcription symbols are removed, and a training set is formed with sentence boundaries marked based on long silences or on manual markings in the transcribed data. Frequencies of head and tail n-grams that occur at the beginning and ending of sentences are determined from the training set. N-grams that occur a significant number of times in the middle of sentences in relation to their occurrences at the beginning or ending of sentences are filtered out. A boundary is marked before every head n-gram and after every tail n-gram occurring in the conversational data and remaining after filtering. Turns are identified. A boundary is marked after each turn, unless the turn ends with an impermissible tail word or is an incomplete turn. The marked boundaries in the conversational data identify sentence boundaries. | 03-05-2009 |
20090132442 | Method and Apparatus for Determining Decision Points for Streaming Conversational Data - A method for determining a decision point in real-time for a data stream from a conversation includes receiving streaming conversational data; and determining when to classify the streaming conversational data, using a measure of certainty, by performing certainty calculations at a plurality of time instances during the conversation and by selecting a decision point in response to the certainty calculations, the decision point not being based on a fixed window of conversational data but being based on accumulated conversational data available at different ones of the plurality of time instances. Systems and computer program products are also provided. | 05-21-2009 |
20110078158 | Automatic Taxonomy Enrichment - Techniques for enriching a taxonomy using one or more additional taxonomies are provided. The techniques include receiving two or more taxonomies, wherein the two or more taxonomies comprise a destination taxonomy and one or more additional taxonomies, determining one or more relevant portions of the two or more taxonomies by identifying one or more common terms between the two or more taxonomies, importing one or more relevant portions from the one or more additional taxonomies into the destination taxonomy, and using the one or more imported taxonomy portions to enrich the destination taxonomy. | 03-31-2011 |
20110191781 | RESOURCES MANAGEMENT IN DISTRIBUTED COMPUTING ENVIRONMENT - A method, system and a computer program product for determining resources allocation in a distributed computing environment. An embodiment may include identifying resources in a distributed computing environment, computing provisioning parameters, computing configuration parameters and quantifying service parameters in response to a set of service level agreements (SLA). The embodiment may further include iteratively computing a completion time required for completion of the assigned task and a cost. Embodiments may further include computing an optimal resources configuration and computing at least one of an optimal completion time and an optimal cost corresponding to the optimal resources configuration. Embodiments may further include dynamically modifying the optimal resources configuration in response to at least one change in at least one of provisioning parameters, computing parameters and quantifying service parameters. | 08-04-2011 |
20120047179 | SYSTEMS AND METHODS FOR STANDARDIZATION AND DE-DUPLICATION OF ADDRESSES USING TAXONOMY - Systems and associated methods for address standardization and applications related thereto are described. Embodiments exploit a common context in a taxonomy and a given address to detect and correct deviations in the address. Embodiments establish a possible path from a root of the taxonomy to a leaf in the taxonomy that can possibly generate a given address. Given a new address, embodiments use complete addresses, and/or segments or elements thereof, to compute the representations of the elements and find a closest matching leaf in the taxonomy. Embodiments then traverse the path to a root node to detect the agreement and disagreement between the path and the address entry. Taxonomical structure is thus used to detect, segregate and standardize the expected fields. | 02-23-2012 |
20120150825 | Cleansing a Database System to Improve Data Quality - According to one embodiment of the present invention, a system controls cleansing of data within a database system, and comprises a computer system including at least one processor. The system receives a data set from the database system, and one or more features of the data set are selected for determining values for one or more characteristics of the selected features. The determined values are applied to a data quality estimation model to determine data quality estimates for the data set. Problematic data within the data set are identified based on the data quality estimates, where the cleansing is adjusted to accommodate the identified problematic data. Embodiments of the present invention further include a method and computer program product for controlling cleansing of data within a database system in substantially the same manner described above. | 06-14-2012 |
20120179658 | Cleansing a Database System to Improve Data Quality - According to one embodiment of the present invention, a system controls cleansing of data within a database system, and comprises a computer system including at least one processor. The system receives a data set from the database system, and one or more features of the data set are selected for determining values for one or more characteristics of the selected features. The determined values are applied to a data quality estimation model to determine data quality estimates for the data set. Problematic data within the data set are identified based on the data quality estimates, where the cleansing is adjusted to accommodate the identified problematic data. Embodiments of the present invention further include a method and computer program product for controlling cleansing of data within a database system in substantially the same manner described above. | 07-12-2012 |
20120221508 | SYSTEMS AND METHODS FOR EFFICIENT DEVELOPMENT OF A RULE-BASED SYSTEM USING CROWD-SOURCING - Described herein are methods, systems, apparatuses and products for efficient development of a rule-based system. An aspect provides a method including accessing data records; converting said data records to an intermediate form; utilizing intermediate forms to compute similarity scores for said data records; and selecting as an example to be provided for rule making at least one record of said data records having a maximum dissimilarity score indicative of dissimilarity to already considered examples. | 08-30-2012 |
20120323866 | EFFICIENT DEVELOPMENT OF A RULE-BASED SYSTEM USING CROWD-SOURCING - Described herein are methods, systems, apparatuses and products for efficient development of a rule-based system. An aspect provides a method including accessing data records; converting said data records to an intermediate form; utilizing intermediate forms to compute similarity scores for said data records; and selecting as an example to be provided for rule making at least one record of said data records having a maximum dissimilarity score indicative of dissimilarity to already considered examples. | 12-20-2012 |
20130238610 | Automatically Mining Patterns For Rule Based Data Standardization Systems - Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group. | 09-12-2013 |
20130238611 | Automatically Mining Patterns for Rule Based Data Standardization Systems - Methods, computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group. | 09-12-2013 |
20140156673 | MEASURING AND ALTERING TOPIC INFLUENCE ON EDITED AND UNEDITED MEDIA - Methods and arrangements for measuring and utilizing media topic influence. A publicly disseminated media transmission is received. Public influence of the media transmission is measured via: identifying one or more media sources used to disseminate the media transmission; and obtaining one or more predetermined influence values associated with the one or more media sources. | 06-05-2014 |
20140214832 | INFORMATION GATHERING VIA CROWD-SENSING - Methods and arrangements for gathering and managing crowd-sourced information. An event is identified using crowd-sourced information, and component parts of the event are identified using the crowd-sourced information. Information missing from the event is identified using the crowd-sourced information. Individuals associated with the event are identified, and additional crowd-sourced information on the event is harvested from the individuals. | 07-31-2014 |
20140244611 | KEYWORD REFINEMENT IN TEMPORALLY EVOLVING ONLINE MEDIA - Methods and arrangements for keyword refinement and enhancement. There is received an initial keyword list comprising one or more keywords. Information is harvested from one or more information feeds, and an item is ascertained from the harvested information. One or more keywords from the initial keyword list are associated with the item. One or more new keywords are developed based on the associating of one or more keywords from the initial keyword list with the item. Other variants and embodiments are broadly contemplated herein. | 08-28-2014 |
20150066990 | SYSTEMS AND METHODS FOR DISCOVERING TEMPORAL PATTERNS IN TIME VARIANT BIPARTITE GRAPHS - Systems and methods for identifying entities sharing a temporal pattern using bipartite graphs are described. In one embodiment, a method includes identifying a temporal pattern in a sequence of bipartite graphs for a sequence of records involving two entity types, where records of the sequence of bipartite graphs vary according to time. An embodiment may color code the edges between entity types in the sequence of bipartite graphs according to the at least one temporal pattern identified (e.g., increasing sales between a business representative and a customer). An embodiment may therefore identify a time-based relationship between at least two entities according to the coded edges. Given the identification of entities having time-based relationships, groups of these entities may be identified and trends may be derived therefrom (e.g., increasing sales for business units of a particular geographic region). | 03-05-2015 |
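
The edge-coding idea in the last entry (20150066990) can be illustrated with a minimal sketch. This is not the method claimed in the application. It assumes each bipartite graph is a plain dictionary mapping a hypothetical (representative, customer) pair to a sales figure, that an edge missing from a snapshot has weight 0, and that a "temporal pattern" simply means a strictly increasing or strictly decreasing weight. The names `label_edge_trends` and `group_by_trend` are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical edge type: (business representative, customer).
Edge = Tuple[str, str]


def label_edge_trends(snapshots: List[Dict[Edge, float]]) -> Dict[Edge, str]:
    """Label every edge by how its weight changes across time-ordered snapshots."""
    # Collect every edge that appears in at least one snapshot.
    edges = set().union(*snapshots)
    labels: Dict[Edge, str] = {}
    for edge in edges:
        # Assumption: an edge absent from a snapshot has weight 0.
        series = [snap.get(edge, 0.0) for snap in snapshots]
        diffs = [later - earlier for earlier, later in zip(series, series[1:])]
        if diffs and all(d > 0 for d in diffs):
            labels[edge] = "increasing"
        elif diffs and all(d < 0 for d in diffs):
            labels[edge] = "decreasing"
        else:
            labels[edge] = "other"
    return labels


def group_by_trend(labels: Dict[Edge, str], trend: str) -> List[str]:
    """Return the sorted left-side entities having at least one edge with the given label."""
    return sorted({left for (left, _right), label in labels.items() if label == trend})


# Three time-ordered snapshots with made-up sales figures.
snapshots = [
    {("rep_a", "cust_1"): 10.0, ("rep_b", "cust_2"): 30.0},
    {("rep_a", "cust_1"): 15.0, ("rep_b", "cust_2"): 20.0},
    {("rep_a", "cust_1"): 22.0, ("rep_b", "cust_2"): 25.0},
]
labels = label_edge_trends(snapshots)
increasing_reps = group_by_trend(labels, "increasing")
```

In this toy data the edge between `rep_a` and `cust_1` grows in every step and is labeled "increasing", while the edge between `rep_b` and `cust_2` falls and then rises, so it is labeled "other". Grouping the labeled edges by their left-side entity is one simple way to derive the kind of group-level trend the abstract mentions.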