Latent semantic index or analysis (LSI or LSA)

Subclass of:

707 - Data processing: database and file management or data structures

707705000 - DATABASE AND FILE ACCESS

707736000 - Preparing data for information retrieval

707737000 - Clustering and grouping

Document	Title	Date
Entries
20100100546	Context-aware semantic virtual community for communication, information and knowledge management - A method for creation of a semantic information management environment, said method comprised of steps of: providing said semantic information environment consisting of an architecture partitioned according to the classification of the use of natural language by information scale, dynamical properties, or semantic classifications; detection, classification, and storage of semantic and contextual information detected and stored by recording of observed contextual parameters associated with events in said semantic information management environment; said interactions including the use of information management or electronic communication applications embedded or linked to said architecture, or separate from said architecture; said observations including the use of natural language as parameters that have specific semantic properties; detection, classification and storage of use of natural language in said semantic information environment; representation of semantic processes containing said detected, classified, and stored contextual information and natural language use in said semantic information environment; said representations of semantic processes used to link and associate natural language use with objects, entities, facts, communication, information, and digital files in said semantic information environment; providing said users of said semantic information environment with information and knowledge management tools, reports, representations, and interfaces that utilize said semantic process representations.	04-22-2010
20100100547	METHOD, SYSTEM AND APPARATUS FOR GENERATING RELEVANT INFORMATIONAL TAGS VIA TEXT MINING - A method and system for generating information tags from product-related documents. The system includes an accessible storage storing text documents, wherein the text documents are related to a plurality of products. The system includes a memory access module for retrieving a document from the accessible storage related to a specified product selected from the plurality of products. The system includes a parser module for parsing the retrieved document into sentences, wherein each sentence is stored as an array. The system includes a filter module for filtering the parsed sentences into a result set, wherein the result set includes a set of tags extracted from the retrieved document relevant to the selected product. The system includes an output module for outputting the result set to the accessible storage.	04-22-2010
20100114894	Semantically Aware Relational Database Management System and Related Methods - A semantically aware relational database management system includes suitable programming to relate attributes of the relational database to semantic equivalents of such attributes. In response to receiving a query, the relational database management system performs at least one semantically aware operation on the data in the relational database in order to determine what data is to be retrieved in response to the query. Results of the query presented to a user may include data derived from performing the semantically aware operations.	05-06-2010
20100191734	SYSTEM AND METHOD FOR CLASSIFYING DOCUMENTS - A method of classifying a plurality of documents that form part of a data set comprises retrieving the plurality of documents from a computing device and applying a hashing representation scheme to the plurality of documents from the data set to obtain a feature vector representation of each of the plurality of documents. A classification label is associated with selected documents of the plurality of documents in the data set. A learning algorithm is executed to learn a functional relationship between the feature vector representations of the plurality of documents and the classification label associated with the at least one document. The functional relationship learned is utilized to associate classification labels with feature vector representations of other documents of the data set so as to provide document classifications.	07-29-2010
20100198827	METHOD FOR FINDING TEXT READING ORDER IN A DOCUMENT - A method for finding text reading order in a document such as a scanned newspaper or magazine includes the steps of pruning unnecessary text zones using semantic analysis (	08-05-2010
20100198828	FORMING CROWDS AND PROVIDING ACCESS TO CROWD DATA IN A MOBILE ENVIRONMENT - A system and method are provided for forming crowds of users and providing access to corresponding crowd data. In one embodiment, a central system, which includes one or more servers, operates to obtain current locations for users of mobile devices. The central system forms a crowd including a number of users based on the current locations of the number of users. The central system then generates crowd data for the crowd and provides access to the crowd data for the crowd. In one embodiment, the crowd data for the crowd includes an aggregate profile for the crowd. In another embodiment, the crowd data includes data characterizing the crowd. The central system provides access to the crowd data by serving crowd data requests.	08-05-2010
20100211570	DISTRIBUTED SYSTEM - The present invention relates to distributed systems in which resource utilisation decisions depend upon the semi-automatic categorisation of resource descriptions stored in the distributed system. In the principal embodiment, the resource descriptions are web service descriptions which are augmented with tags (i.e. descriptive words or phrases) entered by users and/or by web service administrators. The initial use of automatic categorisation of these descriptions, followed by a user-driven fine-tuning of the automatically-generated categories enables the rapid creation of reliable categorisation of the resource descriptions, which in turn results in better resource utilisation decisions and hence a more efficient use of the resources of the distributed system.	08-19-2010
20100228733	Method and System For Semantic Distance Measurement - A system and method for performing classification using semantic distance measurements. Items of electronic content accessed by individuals over a global communications network are identified. A set of content that includes the plurality of identified items of electronic content is stored. The set of content is normalized. Each of the keywords contained in the set of content is identified and a semantic distance between each of the identified keywords is measured.	09-09-2010
20100250544	PROCESS FOR ORGANIZING MULTIMEDIA DATA - A multimedia data organization process, i.e. creation, of a photo album or slideshow, said multimedia data being represented by contingent individuals (	09-30-2010
20100293166	System And Method For A Unified Semantic Ranking of Compositions of Ontological Subjects And The Applications Thereof - The present invention discloses methods, systems, and tools for unified semantic ranking of compositions of ontological subjects. The method breaks a composition into a plurality of partitions as well as its constituent ontological subjects of different orders and builds a participation matrix indicating the participation of ontological subjects of the composition in other ontological subjects, i.e. the partitions, of the composition. Using the participation information of the OSs into each other, a similarity matrix is built from which the semantic importance ranks of the partitions of the composition are calculated. The method systematically enables the calculation of the semantic ranks of ontological subjects of different orders of the composition. Various systems for implementing the method and numerous applications and services are disclosed.	11-18-2010
20110004595	DIAGNOSTIC REPORT SEARCH SUPPORTING APPARATUS AND DIAGNOSTIC REPORT SEARCHING APPARATUS - According to embodiments, a diagnostic report search supporting apparatus and a diagnostic report searching apparatus each have a report registering part, a structuring processing part, a related-term analyzing part, a counting part, and a keyword extracting part. The structuring processing part extracts terms from a sentence written in a diagnostic report, and classifies the terms into predetermined kinds. The related-term analyzing part generates combinations each composed of two or more terms based on the plurality of terms having been extracted. The counting part counts the existence number of same combinations in the plurality of combinations, and extracts combinations whose existence numbers are a predetermined number or more. The keyword extracting part extracts a combination including a desired keyword, and extracts a term other than the desired keyword as a related keyword.	01-06-2011
20110022598	MIXING KNOWLEDGE SOURCES FOR IMPROVED ENTITY EXTRACTION - The disclosed embodiments of computer systems and techniques utilize an ensemble semantics framework to combine knowledge acquisition systems that yield significantly higher quality resources than each system in isolation. Gains in entity extraction are achieved by combining state-of-the-art distributional and pattern-based systems with a large set of features from, for example, a webcrawl, query logs, and wisdom of the crowd sources. This results in improved query interpretation and greater relevancy in providing search results and advertising, for example.	01-27-2011
20110066618	QUERY TERM RELATIONSHIP CHARACTERIZATION FOR QUERY RESPONSE DETERMINATION - Methods, apparatuses, and systems are provided to determine a response to a user submitted query based, at least in part, on a relationship between and/or among a plurality of terms of the query.	03-17-2011
20110066619	AUTOMATICALLY FINDING CONTEXTUALLY RELATED ITEMS OF A TASK - Architecture for enabling a user to automatically recover documents and other information associated with work contexts and recover documents and other information artifacts associated with a specific project. The architecture enables monitoring and recording of activity information related to user interactions with information artifacts pertaining to a particular work context. The user can select a document having a portion of work content (e.g., a term or other type of reference item in a document) related to the work context. A lexical analysis is performed on the activity information and the reference item to identify lexical similarities. A list of candidate items (e.g., related documents) is inferred from the information artifacts based on the lexical similarities. The candidate items related to the work context are presented to the user, who can select specific items to reestablish the work context.	03-17-2011
20110072020	Expediting Reverse Geocoding With A Bounding Region - A method for reverse geocoding location information obtained by a wireless communications device comprises determining the location information for a location, communicating the location information to a reverse geocoding server that reverse-geocodes the location information to generate location description data for a bounding region that geographically surrounds the location, receiving the location description data from the reverse geocoding server for the bounding region containing the location, and caching the location description data for the bounding region in a memory cache on the device. When the current location remains within one or more bounding regions cached on the device, location description data is fetched from the cache, thus improving application responsiveness. Only when the current location is no longer within the bounding region(s) does the device communicate a new request to the reverse geocoding server.	03-24-2011
20110072021	Semantic and Text Matching Techniques for Network Search - In one embodiment, access a search query comprising one or more query words, at least one of the query words representing one or more query concepts; access a network document identified for a search query by a search engine, the network document comprising one or more document words, at least one of the document words representing one or more document concepts; semantic-text match the search query and the network document to determine one or more negative semantic-text matches; and construct one or more negative features based on the negative semantic-text matches.	03-24-2011
20110078146	SYSTEMS AND METHODS FOR USING METADATA TO ENHANCE DATA IDENTIFICATION OPERATIONS - Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system.	03-31-2011
20110082863	SEMANTIC ANALYSIS OF DOCUMENTS TO RANK TERMS - A method, apparatus and computer program product provides for a semantic analyzer to produce and rank semantic terms to reflect their relationship to the theme and topics of a document. The text and the document can have no relationship to any pre-selected keywords before the semantic analyzer performs text extraction. The semantic analyzer extracts text from a document and performs semantic analysis on the extracted text. The semantic analyzer provides a plurality of ranked semantic terms as a result of the semantic analysis and associates semantic terms with the document as semantic keywords. The semantic terms define content to be presented with the document where the content is an advertisement, a link to a remote information resource or a second document.	04-07-2011
20110106807	SYSTEMS AND METHODS FOR INFORMATION INTEGRATION THROUGH CONTEXT-BASED ENTITY DISAMBIGUATION - Described within are systems and methods for disambiguating entities, by generating entity profiles and extracting information from multiple documents to generate a set of entity profiles, determining equivalence within the set of entity profiles using similarity matching algorithms, and integrating the information in the correlated entity profiles. Additionally, described within are systems and methods for representing entities in a document in a Resource Description Framework and leveraging the features to determine the similarity between a plurality of entities. An entity may include a person, place, location, or other entity type.	05-05-2011
20110119272	SEMANTIC RECONSTRUCTION - Determining a semantic relationship is disclosed. Source content is received. Cluster analysis is performed at least in part by using at least a portion of the source content. At least a portion of a result of the cluster analysis is used to determine the semantic relationship between two or more content elements comprising the source content.	05-19-2011
20110173200	APPARATUS AND METHOD FOR AUTHORING DATA IN COMMUNICATION SYSTEM - An apparatus for authoring data in a communication system includes: an extraction unit configured to receive media corresponding to contents and extract contents information regarding the contents from the received media; a generation unit configured to generate a DMB ECG XML-based metadata comprising the extracted contents information; and a processing unit configured to visualize particulars of the DMB ECG XML-based metadata through a user interface and process the user interface so that the DMB ECG XML-based metadata is generated and edited on a template.	07-14-2011
20110173201	METHOD OF DETERMINING A RELIABILITY INDICATOR FOR SIGNATURES OBTAINED FROM CLINICAL DATA AND USE OF THE RELIABILITY INDICATOR FOR FAVORING ONE SIGNATURE OVER THE OTHER - This invention relates to a method and an apparatus for determining a reliability indicator for at least one set of signatures obtained from clinical data collected from a group of samples. The signatures are obtained by detecting characteristics in the clinical data from the group of samples and each of the signatures generates a first set of stratification values that stratify the group of samples. At least one additional and parallel stratification source to the signatures obtained from the group of samples is provided, the at least one additional and parallel stratification source to the signatures being independent from the signatures and generating a second set of stratification values. A comparison is done for each respective sample, where the first stratification values are compared with true reference stratification values, and where the second stratification values are compared with the true reference stratification values. The signatures are assigned similarity measure indicators indicating whether the first and the second stratification values match the true reference stratification values. These are then implemented as input in determining the reliability of the signatures.	07-14-2011
20110179035	CITATION NETWORK VIEWER AND METHOD - A visualization-based interactive legal research tool that generates from a multi-dimensional citation network a semantics-constrained citation sub-network that focuses on one individual issue in which a user is interested, and puts the sub-network on an interactive user interface (“UI”), which allows the researcher to browse, navigate, and jump over to start new sub-networks on different issues that are relevant to original issues.	07-21-2011
20110179036	Methods and Apparatuses For Abstract Representation of Financial Documents - Systems and methods are provided for creating abstracted, normalized, reusable, and combinable representations of information contained in multiple documents and information of any supported format, and allowing for exporting of information in any other desired and supported format. Further, the system and methods provide for uploading documents based on a known template, where the data members can be automatically recognized and the document stored in normalized format without end-user or developer intervention. Normalization of data is achieved transparently on upload and denormalization performed transparently on download. Further, embodiments provide for the reuse and recombination of data members to create entirely new representations.	07-21-2011
20110191344	AUTOMATIC ORGANIZATION OF BROWSING HISTORIES - An automatic organization into topics for a browsing history. In one embodiment, a system identifies groups of browsing actions as related, and clusters the browsing history (e.g. a web browsing history) into sessions based on heuristics used to determine relationships. Latent semantic analysis can be used to determine the relationships which can be considered topics. User interfaces for displaying or otherwise presenting these sessions can include icons representative of topics, and these icons can have different sizes depending on a frequency of web page visits within a topic. The topics can be displayed in time ranges or in a cover flow view or both time ranges and cover flow view.	08-04-2011
20110191345	DOCUMENT ANALYSIS SYSTEM - An information processing apparatus (	08-04-2011
20110202535	SYSTEM AND METHOD FOR DETERMINING THE PROVENANCE OF A DOCUMENT - A method of identifying a provenance of a document is provided. The method may include obtaining a query document that is included in a document set comprising a plurality of documents. The method may also include grouping the plurality of documents into a plurality of fine clusters based on a textual similarity between the plurality of documents. The method may also include identifying a target fine cluster within the plurality of fine clusters, the target fine cluster including the query document. The method may also include ordering the documents included in the target fine cluster based, at least in part, on metadata associated with each of the documents to identify a source document. The method may also include generating a query response that includes the source document.	08-18-2011
20110219003	DETERMINATION OF PASSAGES AND FORMATION OF INDEXES BASED ON PARAGRAPHS - A method for retrieving information from a document includes a process of grouping paragraphs in the document to form passages, and forming indexes relating to a number of words in the passages. The number of paragraphs in a passage is determined based on the number of paragraphs considered optimum for a writer to cover a particular topic. Passages are formed by merging each N consecutive paragraphs in the document, where N is an integer greater than 1. Thus, individual passages may include paragraphs that are identical to other passages.	09-08-2011
20110225159	SYSTEM AND METHOD OF STRUCTURING DATA FOR SEARCH USING LATENT SEMANTIC ANALYSIS TECHNIQUES - The disclosed embodiments provide a system and method for using modified Latent Semantic Analysis techniques to structure data for efficient search and display. The present invention creates a hierarchy of clustered documents, representing the topics of a domain corpus, through a process of optimal agglomerative clustering. The output from a search query is displayed in a fisheye view corresponding to the hierarchy of clustered documents. The fisheye view may link to a two-dimensional self-organizing map that represents semantic relationships between documents.	09-15-2011
20110225160	COMPUTER PRODUCT, OPERATION AND MANAGEMENT SUPPORT APPARATUS AND METHOD - A computer-readable, non-transitory medium stores therein an operation management support program that causes a computer to execute a process that includes acquiring execution history information recording for each element group included in activity diagrams expressing work procedures for operation processes executed by a system, correlations between elements and access destinations thereof; searching among elements not yet selected from among all element groups, for a second element having an access destination coinciding with that of a first element selected from among all element groups, the searching performed by referring to the acquired execution history information; setting the first and the second elements as synonymous elements, if a second element is retrieved at the searching; extracting from among the element groups included in the activity diagrams including synonymous elements, a common element string of elements common among the activity diagrams that include the synonymous elements; and outputting the extracted common element string.	09-15-2011
20110246467	EXTRACTION OF ATTRIBUTES AND VALUES FROM NATURAL LANGUAGE DOCUMENTS - One or more classification algorithms are applied to at least one natural language document in order to extract both attributes and values of a given product. Supervised classification algorithms, semi-supervised classification algorithms, unsupervised classification algorithms or combinations of such classification algorithms may be employed for this purpose. The at least one natural language document may be obtained via a public communication network. Two or more attributes (or two or more values) thus identified may be merged to form one or more attribute phrases or value phrases. Once attributes and values have been extracted in this manner, association or linking operations may be performed to establish attribute-value pairs that are descriptive of the product. In a presently preferred embodiment, an (unsupervised) algorithm is used to generate seed attributes and values which can then support a supervised or semi-supervised classification algorithm.	10-06-2011
20110252036	Domain-Specific Sentiment Classification - A domain-specific sentiment classifier that can be used to score the polarity and magnitude of sentiment expressed by domain-specific documents is created. A domain-independent sentiment lexicon is established and a classifier uses the lexicon to score sentiment of domain-specific documents. Sets of high-sentiment documents having positive and negative polarities are identified. The n-grams within the high-sentiment documents are filtered to remove extremely common n-grams. The filtered n-grams are saved as a domain-specific sentiment lexicon and are used as features in a model. The model is trained using a set of training documents which may be manually or automatically labeled as to their overall sentiment to produce sentiment scores for the n-grams in the domain-specific sentiment lexicon. This lexicon is used by the domain-specific sentiment classifier.	10-13-2011
20110258193	METHOD FOR CALCULATING ENTITY SIMILARITIES - One embodiment of the present invention provides a system for estimating a similarity level between semantic entities. During operation, the system selects two or more semantic entities associated with a number of documents. The system subsequently parses the documents into sub-parts, and calculates the similarity level between the semantic entities based on occurrences of the semantic entities within the sub-parts of the documents.	10-20-2011
20110295857	SYSTEM AND METHOD FOR ALIGNING AND INDEXING MULTILINGUAL DOCUMENTS - A system and method for aligning multilingual content and indexing multilingual documents, to a computer readable data storage medium having stored thereon computer code means for indexing multilingual documents, to a system for presenting multilingual content. The method for aligning multilingual content and indexing multilingual documents comprises the steps of generating multiple bilingual terminology databases, wherein each bilingual terminology database associates respective terms in a pivot language with one or more terms in another language; and combining the multiple bilingual terminology databases to form a multilingual terminology database, wherein the multilingual terminology database associates terms in different languages via the pivot language terms.	12-01-2011
20110302168	GRAPHICAL MODELS FOR REPRESENTING TEXT DOCUMENTS FOR COMPUTER ANALYSIS - In a method for representing a text document with a graphical model, a document including a plurality of ordered words is received and a graph data structure for the document is created. The graph data structure includes a plurality of nodes and edges, with each node representing a distinct word in the document and each edge identifying a number of times two nodes occur within a predetermined distance from each other. The graph data structure is stored in an information repository.	12-08-2011
20110314022	K engine - process count after build in threads - In a KStore having a plurality of K nodes with count fields a method for updating count fields, receiving a particle to provide a received particle, updating selected node counts of the plurality of nodes counts in response to the received particle to provide first updated K node count fields, and saving selected K node count fields for later updating to provide second updated count fields are recited. The K nodes include elemental root nodes and the second updated K node count fields include elemental root nodes of the plurality of elemental root nodes. The second updated K node count fields include only elemental root nodes of the plurality of elemental root nodes. The first updated K node count fields include no elemental root nodes. The second updated K node count fields include K nodes pointed to by the Result pointers of the first updated K node count fields.	12-22-2011
20110320454	MULTI-FACET CLASSIFICATION SCHEME FOR CATALOGING OF INFORMATION ARTIFACTS - A system and method for constructing a hierarchical multi-faceted classification structure includes organizing a plurality of visual categories into a multi-relational reference ontology that accounts for a plurality of different types of relationships. Media artifacts are categorized into the plurality of visual categories. The categories of artifacts are refined based on faceted ontology relationships or constraints from the multi-relational reference ontology. The multi-relational reference ontology and the one or more media artifacts with relationships are stored as the hierarchical multi-faceted classification structure in computer readable memory storage.	12-29-2011
20120011124	UNSUPERVISED DOCUMENT CLUSTERING USING LATENT SEMANTIC DENSITY ANALYSIS - According to one embodiment, a latent semantic mapping (LSM) space is generated from a collection of a plurality of documents, where the LSM space includes a plurality of document vectors, each representing one of the documents in the collection. For each of the document vectors considered as a centroid document vector, a group of document vectors is identified in the LSM space that are within a predetermined hypersphere diameter from the centroid document vector. As a result, multiple groups of document vectors are formed. The predetermined hypersphere diameter represents a predetermined closeness measure among the document vectors in the LSM space. Thereafter, a group from the plurality of groups is designated as a cluster of document vectors, where the designated group contains a maximum number of document vectors among the plurality of groups.	01-12-2012
20120023103	Generation of Annotation Tags Based on Multimodal Metadata and Structured Semantic Descriptors - In one embodiment, a method of generating annotation tags (	01-26-2012
20120041953	TEXT MINING OF MICROBLOGS USING LATENT TOPIC LABELS - A latent topic labels text mining system and method to mine and analyze the content of textual data. Embodiments of the system and method are particularly well suited for use on microblog data to help people identify posts they want to read and to find people that they want to follow. Embodiments of the system and method use a modified Labeled LDA technique (called an L+LDA technique) that analyzes content using a combination of labeled and latent topics. The resultant data is assigned one of four labels to generate a lower-dimensional representation of the data than the individual words in a microblog post. This learned topic representation is used to characterize, summarize, filter, find, suggest, and compare the content of microblog posts. Embodiments of the system and method also include visualization techniques such as a tag cloud visualization that is used to visualize microblogging data.	02-16-2012
20120072423	Semantic Grouping for Program Performance Data Analysis - Particular portions of program execution data are specified and organized in semantic groups. A grouping expression written in a transformation syntax language specifies a pattern and a replacement, for grouping performance data samples. An exception to the pattern can also be specified. In response to the grouping expression, a cost accounting shows groups and their costs. The grouping expression may operate on names and/or name-associated characteristics such as private/public status, author, directory, and the like. Samples may represent nodes in a directed acyclic graph memorializing call stacks or memory allocation. Grouping expressions are used to group nodes and consolidate costs by various procedures when making modified sample stacks: clustering-by-name, entry-group-clustering, folding-by-name, and folding-by-cost. An entry group clustering shows at least one entry point name while avoiding unwanted detail.	03-22-2012
20120109964	ADAPTIVE MULTIMEDIA SEMANTIC CONCEPT CLASSIFIER - A method of classifying a set of semantic concepts on a second multimedia collection based upon adapting a set of semantic concept classifiers and updating concept affinity relations that were developed to classify the set of semantic concepts for a first multimedia collection. The method comprises providing the second multimedia collection from a different domain and a processor automatically classifying the semantic concepts from the second multimedia collection by adapting the semantic concept classifiers and updating the concept affinity relations to the second multimedia collection based upon the local smoothness over the concept affinity relations and the local smoothness over data affinity relations.	05-03-2012
20120124050	SYSTEM AND METHOD FOR HS CODE RECOMMENDATION - A system for harmonized commodity description and coding system (HS) code recommendation includes an ontology editor for creating an HS code ontology based on HS codes of export and import items, and a feature vector processor for extracting, in response to a request, feature vectors of a product of a company requesting an HS code of the product, with reference to the description of the product. An HS code recommendation unit extracts one or more HS codes appropriate for the product by comparing the extracted feature vectors with feature vectors of the product searched from a feature vector database. The extracted HS codes are provided to the company requesting an HS code of the product.	05-17-2012
20120124051	ONTOLOGICAL INFORMATION RETRIEVAL SYSTEM - An ontological information retrieval system is provided. According to an embodiment, the subject ontological information retrieval system can be utilized for computer-aided clinical Traditional Chinese Medicine (TCM) practice. In one implementation, a graphical user interface (GUI) is provided, enabling a user to input a query with symptoms determined from a patient, and the system's parser can find instances of the symptoms in a document object model (DOM) tree of the TCM ontological information. Diagnosis based upon the symptoms can be communicated to the user through the GUI. A relevance index (RI) and/or a frequency index (FI) can be further provided for evaluating a diagnosis by comparing the symptoms determined from a patient with the expected symptoms of the diagnosed illness and returning a value based on the number of matched symptoms, or a weighted index of matched symptoms.	05-17-2012
20120136865	METHOD AND APPARATUS FOR DETERMINING CONTEXTUALLY RELEVANT GEOGRAPHICAL LOCATIONS - An approach is provided for determining and utilizing geographical locations contextually relevant to a user. A contextually relevant location platform determines location-based data associated with a user and/or user device. The contextually relevant location platform determines stationary points based, at least in part, on the location-based data. The contextually relevant location platform determines context data associated with the stationary points. The contextually relevant location platform determines at least one location anchor based, at least in part, on the stationary points and the associated context data, wherein the at least one location anchor represents a bounded geographical area of contextual relevance to the user.	05-31-2012
20120173531	SYSTEMS AND METHODS FOR USING METADATA TO ENHANCE DATA IDENTIFICATION OPERATIONS - Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system.	07-05-2012
20120173532	DETERMINATION TREE GENERATING APPARATUS - According to one embodiment, a determination tree generating apparatus includes a determination unit, a condition generating unit, a determining unit, and a point branch generating unit. The determination unit provisionally and sequentially determines all component categories to be classification component categories for a first point of a determination tree. The point branch generating unit generates a first point assigned to a classification component category, and generates component names to be assigned to one or more branches leading from an assigned first point to one or more child points.	07-05-2012
20120209851	APPARATUS AND METHOD FOR MANAGING MOBILE TRANSACTION COUPON INFORMATION IN MOBILE TERMINAL - An apparatus and a method manage a received mobile transaction coupon in a mobile terminal. The apparatus includes a communication unit, an information analyzer, a schedule manager, an output unit, and a controller. The communication unit receives a mobile transaction coupon. The information analyzer obtains the received mobile transaction coupon information. The schedule manager registers the obtained mobile transaction coupon information in an alarm program. The output unit outputs the registered mobile transaction coupon information on a relevant date via the alarm program. The controller controls to register the mobile transaction coupon information in the alarm program, and controls to store the received mobile transaction coupon in a storage area corresponding to a reception type or a folder for a widget function.	08-16-2012
20120221574	HIGH-ACCURACY SIMILARITY SEARCH SYSTEM - A pivot is determined from enrolled data by a pivot determination unit, raw data is acquired, features are extracted from the raw data, a score is calculated as one of a distance and a degree of similarity between the features, an index vector is generated by using the score for the pivot, a Δ score is calculated as one of a distance and a degree of similarity between the index vectors, a parameter of each non-pivot including a regression coefficient is trained by using training data, an order in which to select the non-pivots is determined, in descending order of posterior probability through logistic regression, by using the Δ score between search data and the non-pivot as well as the regression coefficient, and a search result is outputted based on the score between the search data and the enrolled data.	08-30-2012
20120239655	DISTRIBUTED STORAGE AND METADATA SYSTEM - A system for storing digital images and accessing and storing digital image information using a communication network includes a plurality of independently controlled digital storage repositories associated with one or more different authorization groups, wherein a first digital storage repository includes a first digital image with associated first semantic information and wherein a second digital storage repository in a common authorization group with the first digital storage repository includes a second digital image with associated second semantic information and an associated second category, and wherein the processor of the first digital storage repository uses its computer program to independently access and match the first semantic information with the second semantic information, to associate the second category with the first semantic information, and to store the second category in association with the first semantic information in the first digital storage repository.	09-20-2012
20120259853	Real Time Association of Related Breaking News Stories Across Different Content Providers - Methods and systems for relating breaking news stories across content providers include receiving a breaking news headline for a breaking news from a content provider. The breaking news headline is tokenized in substantial real time by identifying a plurality of headline tokens. A plurality of news stories is received from a plurality of content providers. Each of the plurality of news stories is tokenized to identify a plurality of story tokens. The plurality of headline tokens and story tokens are analyzed to determine if one or more of the news stories are related to the breaking news headline. Based on the analysis, one or more of the news stories are mapped to the breaking news headline. The mapping enables presentation of the one or more news stories from one or more of the content providers while rendering the breaking news headline.	10-11-2012
20120259854	Conversion Path Based Segmentation - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, include receiving user interaction data, wherein the user interaction data specifies user interactions with content items and conversion items. A conversion item is a user action that satisfies predetermined conversion criteria. The method includes receiving conversion data including conversion path data for a plurality of conversion paths, wherein each conversion path includes user interaction data prior to and including a conversion event. The method includes determining a first interaction, an assist interaction or a last interaction with content items for the conversion event. The method includes providing an ability to segment, using a processor, the conversion path data based on path-level dimensions and path-level metrics.	10-11-2012
20120259855	DOCUMENT CLUSTERING SYSTEM, DOCUMENT CLUSTERING METHOD, AND RECORDING MEDIUM - In the provided document clustering system (	10-11-2012
20120259856	CATEGORIZING OBJECTS, SUCH AS DOCUMENTS AND/OR CLUSTERS, WITH RESPECT TO A TAXONOMY AND DATA STRUCTURES DERIVED FROM SUCH CATEGORIZATION - A Website may be automatically categorized by (a) accepting Website information, (b) determining a set of scored clusters (e.g., semantic, term co-occurrence, etc.) for the Website using the Website information, and (c) determining at least one category (e.g., a vertical category) of a predefined taxonomy using at least some of the set of clusters.	10-11-2012
20120271828	Localized Translation of Keywords - In one implementation, a method includes receiving a request for translation of one or more first keywords from a source language to a target language; and translating, using a machine translation process, the first keywords from the source language into a plurality of second keywords in the target language. The method can also include determining, by a computer system, frequencies with which each of the second keywords occur in a corpus associated with the target language. The method can further include selecting, by the computer system, a subset of the second keywords to use in the target language based on the determined frequencies of occurrence.	10-25-2012
20120310939	Systems And Methods For Clustering Time Series Data Based On Forecast Distributions - In accordance with the teachings described herein, systems and methods are provided for clustering time series based on forecast distributions. A method for clustering time series based on forecast distributions may include: receiving time series data relating to one or more aspects of a physical process; applying a forecasting model to the time series data to generate forecasted values and confidence intervals associated with the forecasted values, the confidence intervals being generated based on distribution information relating to the forecasted values; generating a distance matrix that identifies divergence in the forecasted values, the distance matrix being generated based on the distribution information relating to the forecasted values; and performing a clustering operation on the plurality of forecasted values based on the distance matrix. The distance matrix may be generated using a symmetric Kullback-Leibler divergence algorithm.	12-06-2012
20120323920	CREATING A SEMANTICALLY AGGREGATED INDEX IN AN INDEXER-AGNOSTIC INDEX BUILDING SYSTEM - A method for creating a semantically aggregated index in an indexer-agnostic index building system includes: extracting documents from a data source, each document including a data object; distributing the documents to a plurality of processing nodes within the system; for each node: indexing the data objects for each document into fields using semantic rules; and grouping indexed data objects for related fields by: classifying the documents into logical groups based on the semantic rules; and creating a searchable index shard for related logical groups.	12-20-2012
20120330959	Method and Apparatus for Assessing a Person's Security Risk - A method for assessing a person's security risk includes receiving data from a plurality of disparate data sources in which at least two of the plurality of disparate data sources maintain their respective data in different manners. The method also includes identifying at least one item of data from at least two different data sources that correspond to a first real-world person. The method further includes merging the items from the at least two different data sources into a first record associated with the first real-world person. The method additionally includes identifying one or more relationships between the first real-world person and one or more other real-world people. The method also includes adding the identified one or more relationships to the first record associated with the first real-world person. The method further includes determining a level of risk associated with the first real-world person based on the first record.	12-27-2012
20130013612	TECHNIQUES FOR COMPARING AND CLUSTERING DOCUMENTS - Certain example embodiments relate to techniques for analyzing documents. A plurality of documents/document portions are imported into a database, with at least some of the documents/document portions being structured and at least some being unstructured. The imported documents/document portions are organized into one or more collections. A selection of at least one of the one or more collections is made. An index of words and/or groups of words is built (and optionally refined in accordance with one or more predefined rules) based on each document or document portion in each selection. A document-word matrix is built (and optionally weighted using a semantic approach), with the matrix including a value indicative of a number of times each word and/or group of words in the index appears in each document/document portion. One or more clusters of documents are generated using the document-word matrix.	01-10-2013
20130031100	Generating a Discussion Group in a Social Network Based on Similar Source Materials - The present invention includes a system and method for generating a discussion group based on different electronic images. A mixed media reality database receives MMR objects that correspond to source material and indexes the MMR objects. A content management engine generates a cluster that includes MMR objects based on a similarity of source material. An MMR engine receives an electronic image from a user device, performs a visual search and identifies an MMR object that is associated with the electronic image. A social network application identifies a discussion group associated with the cluster that includes the MMR object and provides the user device with access to the discussion group.	01-31-2013
20130054604	METHOD AND APPARATUS FOR INFORMATION CLUSTERING BASED ON PREDICTIVE SOCIAL GRAPHS - An approach is provided for providing information clustering based on predictive social graphs. An information clustering platform processes and/or facilitates a processing of one or more social graphs associated with one or more users to cause, at least in part, a prediction of one or more future states of the one or more social graphs. The information clustering platform further causes, at least in part, a clustering of one or more data items associated with at least one information space based, at least in part, on the one or more social graphs, the one or more future states, or a combination thereof.	02-28-2013
20130117268	Identifying and suggesting classifications for financial data according to a taxonomy - A method includes identifying a table within a first document. The method includes analyzing at least one of: a column heading in the table, a row heading in the table, and data in a cell in the table. The method includes determining, based on the analysis, that the table contains financial data classifiable according to a taxonomy. The method includes analyzing, by a classification component comprising at least one classification engine, at least one of a column heading in the table and a row heading in the table. The method includes generating, by the classification component, a classification suggestion for at least one element in the table, based on the analysis of the classification component.	05-09-2013
20130159313	Multi-Concept Latent Semantic Analysis Queries - A method includes accessing text, identifying a plurality of terms from the text, determining a plurality of term vectors associated with the identified plurality of terms, and clustering the determined plurality of term vectors into a plurality of clusters, the plurality of clusters comprising a first and a second cluster, the first and second clusters each comprising two or more of the determined term vectors. The method further includes creating a first pseudo-document according to the first cluster, creating a second pseudo-document according to the second cluster, identifying a first set of terms associated with the first cluster using latent semantic analysis (LSA) of the first pseudo-document, identifying a second set of terms associated with the second cluster using LSA of the second pseudo-document, and combining the first and second sets of terms into a list of output terms.	06-20-2013
20130166561	SYMANTIC FRAMEWORK FOR DYNAMICALLY CREATING A PROGRAM GUIDE - An application server includes a Semantic Analysis Core Service (SACS) function that communicates with a Semantic Analysis Client (SAC) in a Set Top Box (STB). The SACS groups programs available for rendering to a subscriber into program clusters. The SACS generates the program clusters based on a determined semantic similarity between the programs, and on parameters that indicate a subscriber's preference for certain program content. The programs that are semantically similar to existing clusters within a predetermined viewing window are provided to the STB and output to the subscriber on a display as a program preference list or channel line-up. The STB also monitors the subscriber's interaction with the programs and calculates a preference score for each program indicating the subscriber's continuing, or waning, interest in a given program. The preference score is used to update the score of the program cluster to which the program belongs.	06-27-2013
20130204877	ATTRIBUTION USING SEMANTIC ANALYISIS - A method, system, and computer program product for semantic attribution of a request. Source data statements for the request are received. A selection of a domain for the received source data statements is received. The received source data statements are semantically analyzed, which includes matching elements in the received source data statements to respective one or more entries in an ontology associated with the selected domain. The ontology includes items and relationships that define the selected domain. Each element in the received source data statements is a word or a phrase. The one or more entries are assigned to the matched elements, respectively, to annotate each matched element with a respective annotation consisting of the respective one or more entries. The annotated elements are saved with the respective annotations.	08-08-2013
20130212107	INFORMATION PROCESSING APPARATUS AND CONTROL METHOD THEREOF - Enlargement values indicating a degree of enlargement when spatial data is stored in a partial spatial region are calculated for one or more partial spatial regions within a multidimensional index, and in the case where the enlargement value is greater than or equal to a threshold value, a new partial spatial region that contains at least the spatial data is generated.	08-15-2013
20130275432	SERVER, INFORMATION-MANAGEMENT METHOD, INFORMATION-MANAGEMENT PROGRAM, AND COMPUTER-READABLE RECORDING MEDIUM WITH SAID PROGRAM RECORDED THEREON - A server includes an input information database (	10-17-2013
20130290338	METHOD AND APPARATUS FOR PROCESSING ELECTRONIC DATA	10-31-2013
20130332458	LEXICAL ENRICHMENT OF STRUCTURED AND SEMI-STRUCTURED DATA - Generally discussed herein are systems and methods for lexically enriching structured and semi-structured data. In one or more embodiments, a method can include receiving a code, lexicalizing the code, lexically combining the lexicalized code with a lexical descriptor, and sending the lexical combination to a keyword database.	12-12-2013
20140040270	METHOD AND APPARATUS FOR ANALYZING A DOCUMENT - Method, apparatus, and computer-readable medium are provided for analyzing a document including text. In one example, a method for identifying patterns in a document is described. The method includes identifying a plurality of candidate phrases in the document based on candidate identification criteria, grouping the candidate phrases of the plurality of candidate phrases with a phrase family based on family criteria and comparison between candidate phrases of the plurality of candidate phrases to obtain consistent phrases, and, for remaining phrases not meeting all of the candidate identification criteria, associating at least one of the remaining phrases with a phrase family based on inconsistent phrase criteria to obtain inconsistent phrases. Identified in this manner, the inconsistent phrase may be displayed via a user interface to permit a user the opportunity to determine whether an inconsistent phrase requires modification.	02-06-2014
20140067815	Labeling Product Identifiers and Navigating Products - The present disclosure provides example methods and apparatuses of labeling product identifiers and methods of navigating products. Description information of one or more products is extracted. The description information of the products is clustered into a text. A subject analysis is applied to the text by using a text analysis method based on subject models to obtain one or more subjects and definition names for the subjects. A subject that is correlated to the description information of the product is used as an identifier of the product to label the product. The present techniques label the products with identifiers that have one or more user dimension attributes so that users may easily and intuitively find their desired products.	03-06-2014
20140074844	METHOD AND SYSTEM FOR IMPLEMENTING SEMANTIC ANALYSIS OF INTERNAL SOCIAL NETWORK CONTENT - Disclosed is a method, system, and computer program product for semantically analyzing the content within an internal social network. Using the results of the analysis, the executives can gain a better understanding of, and insight into, the organization and its employees. A dashboard tool may be used in some embodiments of the invention to visualize the results of the semantic analysis.	03-13-2014
20140074845	METHODS, SYSTEMS, AND COMPUTER-READABLE MEDIA FOR SEMANTICALLY ENRICHING CONTENT AND FOR SEMANTIC NAVIGATION - Content of different formats may be sourced from various data sources such as content servers and ingested into a data integration server by an ingestion broker embodied on a non-transitory computer readable medium. The ingestion broker may normalize the content of different formats into a uniform representation that can be indexed and delivered across multiple digital channels for a variety of applications. The normalized content may be analyzed and semantic metadata may be determined from the normalized content. The normalized content can be semantically enriched by associating the semantic metadata and the like with the content. The semantic metadata can be stored in a semantic index that can be used for searching via the data integration server. During search, the semantic metadata can be instantiated as facets for user navigation and refinement of search criteria and additional semantic relationships can be assigned to the words in the normalized content.	03-13-2014
20140101162	METHOD AND SYSTEM FOR RECOMMENDING SEMANTIC ANNOTATIONS - A method for recommending semantic annotations on a main document and sub documents is provided. The method includes: extracting a keyword of the main document; extracting a keyword or a set of keywords of each sub document; and generating a keyword similarity or a set of keyword similarities for each of the sub documents based on a degree of similarity between the keyword of the main document and the keyword of each of the sub documents. The method also includes: obtaining a plurality of words appearing in each of the sub documents and calculating a frequency of each of the words; generating a semantic capacity of each of the sub documents according to the frequencies; grouping the main document and at least one of the sub documents into a semantic document set based on the semantic capacities and the keyword similarities; and annotating the main document according to the semantic document set.	04-10-2014
20140114978	METHOD AND SYSTEM FOR SOCIAL MEDIA BURST CLASSIFICATIONS - The present invention is directed to a method, system, and article of manufacture for systematically and automatically identifying abnormal or collective behavior patterns in microblogging messages that produce burst phenomena, such as Twitter storms. A microblogging storm engine in a storm detection server is configured to detect and classify the volume, shape, and type of a Twitter storm when keying on topics such as, but not limited to, a brand, an event, a person, an entity, a country, or a controversial issue. The microblogging storm engine comprises a storm detection module, a storm classification module, a database interface module, and a sentiment process module. The storm detection module is configured to detect different patterns of microblogging storms by capturing the volume of a particular storm to assist in output statistical analysis. The storm classification module is configured to classify the storms into different types of a particular storm category.	04-24-2014
20140143253	STOCHASTIC DOCUMENT CLUSTERING USING RARE FEATURES - Systems, methods, and apparatus for clustering resources using rare features are provided. For example, an environment includes an extraction module, an index module, and a cluster module. The extraction module identifies a set of resources and extracts a plurality of features from the resources. The plurality of features may be rare features. The index module identifies and generates a rare features index. The cluster module identifies at least two resources that share rare features, creates one or more clusters based on the identified at least two resources, and associates resources that share similar features with the one or more clusters. Resources that do not share similar features are not associated with the one or more clusters. Identifying at least two resources that share rare features is based at least upon a threshold.	05-22-2014
20140156665	AUTOMATIC DOCUMENT CLASSIFICATION VIA CONTENT ANALYSIS AT STORAGE TIME - Techniques are disclosed for efficiently and automatically classifying textual documents or files. In some embodiments, the classification process is integrated into or otherwise made part of the storage function, such that when the user initiates a save process for a given file, the file is processed through a classifier prior to (or contemporaneously with) completing the save function. In some such embodiments, textual content of the file is analyzed using natural language processing to identify a main or substantial concept discussed in the file, and one or more corresponding tags are then assigned to that file. Subsequently, the user can access that file based on the one or more tags, for instance, through a user interface that allows the user to select one or more content categories associated with the assigned tags. The files can be text-based, but may include other content as well, such as images, video, and audio.	06-05-2014
20140172858	HEADER-TOKEN DRIVEN AUTOMATIC TEXT SEGMENTATION - A method and a system to automatically segment text based on header tokens is described. A relevance value and an irrelevance value are determined for each token in a description, assuming no tokens are left out of computations. The irrelevance value is based on occurrences of a token in a sample set of descriptions. The relevance value is an estimated probability of relevance based on the header of the description being segmented.	06-19-2014
20140188885	Utilization and Power Efficient Hashing - Methods, systems, and computer readable storage medium embodiments for hashing with improved utilization and power efficiency are disclosed. Some embodiments include inserting a key in a selected bucket in accordance with a bucket identifier generated by a hash function, wherein the selected bucket is one of a plurality of buckets of a hash table configured in at least one memory, determining respective unique bit strings based upon corresponding bit positions for a plurality of keys in the selected bucket including the inserted key, and inserting the respective unique bit strings in a table location corresponding to the bucket identifier, wherein the table location is one of a plurality of table locations in at least one control table configured in the at least one memory. Other embodiments include lookup operations in a hash table.	07-03-2014
20140195539	SYSTEM AND METHOD FOR AUTOMATICALLY GENERATING SYSTEMATIC REVIEWS OF A SCIENTIFIC FIELD - A system and method are provided for automatically generating systematic reviews of received information in a field of science and technology, such as scientific literature, where the systematic review includes a systematic review of a research field in the scientific literature. The method includes the steps of constructing time series networks of words, passages, documents, and citations and/or co-citations within received information into a synthesized network, decomposing the networks into clusters of fields or topics, performing part-of-speech tagging of text within the received information to provide tagged text, constructing semantic structures of concepts and/or assertions extracted from the source text, generating citation-based and content-based summaries of the clusters of fields or topics and the semantic structures, and generating structured narratives of the clusters of fields or topics and the summaries of the generated semantic structures. Narratives of the citation-based and content-based summaries are merged into a systematic review.	07-10-2014
20140207782	SYSTEM AND METHOD FOR COMPUTERIZED SEMANTIC PROCESSING OF ELECTRONIC DOCUMENTS INCLUDING THEMES - System and method for computerized identification of themes in a large data set, the system comprising reducing the number of data set members in a large data set, using at least one computerized data set member pruning technique other than random selection; and using a computerized theme identification technique for identifying a plurality of themes in the reduced data set.	07-24-2014
20140207783	SYSTEM AND METHOD FOR COMPUTERIZED IDENTIFICATION AND EFFECTIVE PRESENTATION OF SEMANTIC THEMES OCCURRING IN A SET OF ELECTRONIC DOCUMENTS - System and method for computerized identification and presentation of semantic themes occurring in a set of electronic documents, comprising performing topic modeling on the set of documents thereby to yield a set of topics and for each topic, a topic-modeling output list of words; and using a processor performing a matching algorithm to match only a subset of each topic-modeling output list of words, to the output list's corresponding topic, such that each word appears in no more than a predetermined number of subsets from among said subsets.	07-24-2014
20140207784	SAMPLING OF EVENTS TO USE FOR DEVELOPING A FIELD-EXTRACTION RULE FOR A FIELD TO USE IN EVENT SEARCHING - Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including specifying a data source and one or more subset types desired, including one or more of latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether a sufficient number of clusters and/or cluster types have been generated that exceed at least one threshold and when not exceeded, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.	07-24-2014
20140214841	Semantic Product Classification - The present disclosure extends to methods, systems, and computer program products for updating a merchant database with new product items and placing the new product items within a hierarchy of existing merchant product offerings. In operation, the new product is represented by a title and description that can be semantically classified using a plurality of classification models and reviewed by users for accuracy.	07-31-2014
20140214842	SUMMARIZATION OF SHORT COMMENTS - A method and a system for summarization of short comments are provided. The system comprises a memory to store a comments collection. The comments collection stores a plurality of comments for later access. The comments respectively include an overall rating and at least one phrase. The system also includes one or more processors to implement an aspect module to map a portion of the plurality of comments to a first aspect corresponding to an attribute of the entity. The one or more processors also implement a rating module to determine an aspect rating corresponding to the first aspect based on the respective overall rating of the portion of the plurality of comments.	07-31-2014
20140250127	SYSTEM AND METHOD FOR CLUSTERING CONTENT ACCORDING TO SIMILARITY - Systems and methods for clustering content according to similarity are provided that identify and group similar content using a set of tags associated with the content. A topic model of a group of content is built, producing a probability distribution of topic membership for the content. Individual items of content are then clustered using a clustering algorithm, and a distance matrix from the probability distribution is built. Based on the distance matrix, individual items of content are labeled as “must-link” or “cannot-link” pairs with the group of content. The topic model is then embedded into successively smaller dimensions using a kernel method, until the clustering is stable with respect to both the behavioral and content domains.	09-04-2014
20140258301	ENTITY DISAMBIGUATION IN NATURAL LANGUAGE TEXT - A device analyzes first text to identify a pair of terms, within the first text, that are alias terms. The device analyzes the first text by performing two or more of: a latent semantic analysis of the pair of terms, based on the pair of terms being associated with a particular tag; a tag-based analysis that determines that the pair of terms are associated with compatible tags; a transitive analysis that determines that a pair of neighbor terms, associated with the pair of terms, are associated with compatible tags; or a co-location analysis based on a distance between the pair of terms in the first text. The device generates, based on analyzing the first text, a glossary that includes the pair of terms identified as alias terms. The device replaces terms within the first text or a second text that is different from the first text, using the glossary.	09-11-2014
20140258302	INFORMATION RETRIEVAL DEVICE AND INFORMATION RETRIEVAL METHOD - An information retrieval device for retrieving information related to a word includes: an input unit for inputting a word; a pattern generation unit which, upon input of a new word after input of a given number of words, generates a word group in a case of adding the new word to a previously input word and a word group in a case of replacing a previously input word with the new word; an occurrence information derivation unit which, for each of the word groups generated, derives occurrence information corresponding to a probability of occurrence of the word group; and a determination unit which determines a word group to be used in new retrieval, based on the derived occurrence information.	09-11-2014
20140280168	METHOD AND SYSTEM FOR IMPLEMENTING AUTHOR PROFILING - Disclosed is an improved method, system, and computer program product for analyzing interests of consumers, where semantic analysis is performed on writings by authors on social media sites. The results of the semantic analysis provide a profile of the authors. These author profiles can be used to identify and correlate topical interests by consumers. An enterprise or business can more effectively market to the consumers based upon this knowledge of the consumers' interests.	09-18-2014
20140280169	Method And Apparatus For A Frequently-Asked Questions Portal Workflow - In FAQ based systems, associating questions with answers can be a time consuming task if performed manually. In one embodiment, a method of building a frequently-asked questions (FAQ) portal can include creating cluster labels. The labels can include predefined universal semantic labels and application-specific labels. The method can further include applying the cluster labels to clusters of queries within an FAQ application. The method can additionally include adjusting the application-specific labels to support combined and newly created clusters of queries based on application-specific queries within the FAQ application on an ongoing basis and reapplying the universal semantic labels and the adjusted application-specific labels to the combined and newly created clusters of queries. The method and system proposed herein allow for the automated clustering of queries and association with applicable answers, which leads to higher efficiencies for a faster response time for a user.	09-18-2014
20140280170	COMPUTER-READABLE STORAGE MEDIUM STORING A GROUPING SUPPORT PROGRAM, GROUPING SUPPORT METHOD AND GROUPING SUPPORT SERVER - A computer-readable storage medium storing a grouping support program of opinion information that, when executed by a computer, performs a grouping support method includes specifying a related opinion information related to a first opinion information and a second opinion information among a plurality of collected opinion information based on a related word guided from the first opinion information and the second opinion information selected from the plurality of collected opinion information; and grouping the specified related opinion information in the same group as the first opinion information and the second opinion information.	09-18-2014
20140317121	SUPPORTING ACQUISITION OF INFORMATION - An apparatus supports acquisition of information from a document including a plurality of words. An acquisition hardware unit acquires first information that shows a degree to which the document belongs to each of a plurality of clusters based on a concept included in the document. Second information shows a degree to which a single word among the plurality of words appears in each of the plurality of clusters based on a concept of the single word. A generation hardware unit, based on the first and second information, generates third information that shows a degree of overlap between the concept included in the document and the concept of the single word. A determination hardware unit determines whether or not the third information shows a degree of overlap that is lower than a predetermined criterion, and an output hardware unit outputs a result of this determination.	10-23-2014
20140351255	METHOD AND SYSTEM FOR RECOMMENDING KEYWORD BASED ON SEMANTIC AREA - A method of recommending a keyword includes redefining a semantic area using a search log including location information, and providing a keyword associated with the semantic area to a user located in the semantic area as a recommended keyword.	11-27-2014
20140365494	SEARCH TERM CLUSTERING - When conducting the same or similar search, different users can use different search terms and phrases, resulting in an increase in the quantity of unique search terms and phrases. The intent of the various search terms and phrases is determined based on clustering of the terms and phrases of the various users. User search terms are clustered using semantic and syntactic distances. Thus, the search engine receives a search query from a user and computes a similarity between and among user search terms. The computation uses syntactic techniques to analyze lexical aspects of linguistic terms, and semantic techniques to consider activity of the user in the particular field of interest. A similarity metric is used to determine the similarity between two search terms by computing their syntactic and semantic distances. A clustering technique is then used to cluster search terms based on their pair-wise distance.	12-11-2014
20150019558	IDENTIFICATION OF SEMANTIC RELATIONSHIPS WITHIN REPORTED SPEECH - Methods and computer-readable media for associating words or groups of words distilled from content, such as reported speech or an attitude report, of a document to form semantic relationships collectively used to generate a semantic representation of the content are provided. Semantic representations may include elements identified or parsed from a text portion of the content, the elements of which may be associated with other elements that share a semantic relationship, such as an agent, location, or topic relationship. Relationships may also be developed by associating one element that is in relation to, or is about, another element, thereby allowing for rapid and effective comparison of associations found in a semantic representation with associations derived from queries. The semantic relationships may be determined based on semantic information, such as potential meanings and grammatical functions of each element within the text portion of the content.	01-15-2015
20150046459	MINING MULTILINGUAL TOPICS - Techniques for utilizing data mining technology to extract universal topics with multilingual representations from a multilingual database, and to organize existing or new documents in different languages by analyzing their respective topic distributions.	02-12-2015
20150066939	GROUPING SEMANTICALLY RELATED NATURAL LANGUAGE SPECIFICATIONS OF SYSTEM REQUIREMENTS INTO CLUSTERS - A device may analyze text to identify a set of text portions of interest, and may analyze the text to identify a set of terms included in the set of text portions. The device may perform a similarity analysis to determine a similarity score. The similarity score may be determined between each term, included in the set of terms, and each text portion, included in the set of text portions, or the similarity score may be determined between each term and each other term included in the set of terms. The device may determine a set of dominant terms based on performing the similarity analysis. The set of dominant terms may include at least one term with a higher average degree of similarity than at least one other term. The device may provide information that identifies the set of dominant terms.	03-05-2015
20150074112	Multimedia Question Answering System and Method - An embodiment provides a multimedia question answering system and method. The system includes a question input unit, configured to receive a text question input by a user, a parsing unit, configured to acquire feature information and a semantic category of the text question, a category determining unit, configured to determine whether the semantic category exists in a preset multimedia database. The system further includes a similarity acquiring unit, configured to, when a determination result is yes, match the feature information with all text features corresponding to the semantic category in the database, so as to acquire a similarity between each text feature and the feature information. The system also includes a multimedia answer output unit, configured to acquire a corresponding text feature when the similarity is greater than a preset threshold, and output multimedia answer information corresponding to the text feature and prestored in the multimedia database.	03-12-2015
20150081714	Active Knowledge Guidance Based on Deep Document Analysis - An approach is provided for an information handling system to present knowledge-based information. In the approach, a semantic analysis is performed on the document with the analysis resulting in various sets of semantic content. Each of the sets of semantic content corresponds to an area in the document. The areas of the document are visually highlighted using visual indicators that show the availability of the sets of semantic content to a user via a user interface. In response to a user selection, such as a selection using the user interface or a user specified configuration setting, a selected set of semantic content is displayed to the user using the interface.	03-19-2015
20150081715	RETRIEVAL DEVICE AND METHOD - A processor performs semantic analysis on a query and generates one or more semantic structures where each structure is expressed by a graph. The processor generates retrieval keys corresponding to combinations of nodes connected directly or indirectly in the semantic structures, in addition to retrieval keys corresponding to minimum units of semantic connections between nodes in the generated semantic structures. The processor retrieves relevant documents whose sentences are matched to combinations of nodes, by using the generated retrieval keys, in the semantic structures stored in an index for retrieval on a database storing the documents.	03-19-2015
20150088896	CATEGORIZING OBJECTS, SUCH AS DOCUMENTS AND/OR CLUSTERS, WITH RESPECT TO A TAXONOMY AND DATA STRUCTURES DERIVED FROM SUCH CATEGORIZATION - A Website may be automatically categorized by (a) accepting Website information, (b) determining a set of scored clusters (e.g., semantic, term co-occurrence, etc.) for the Website using the Website information, and (c) determining at least one category (e.g., a vertical category) of a predefined taxonomy using at least some of the set of clusters.	03-26-2015
20150100584	METHOD, COMPUTER PROGRAM AND APPARATUS FOR ANALYZING SYMBOLS IN A COMPUTER SYSTEM - The present invention provides a computer-implemented method of analyzing messages in a computer system to allow workflows constituted by the messages to be identified, the method comprising: analyzing a sequence of messages in a computer system in order to classify the messages, thereby producing a corresponding sequence of classifications of the messages; and, applying sequence induction to the sequence of classifications of the messages to produce (i) a set of sub-sequences of the classifications of the messages and (ii) a sequence grammar for the sub-sequences, from which a workflow constituted by the sequence of messages can be identified.	04-09-2015
20150120738	SYSTEM AND METHOD FOR DOCUMENT CLASSIFICATION BASED ON SEMANTIC ANALYSIS OF THE DOCUMENT - A computer based method and system for classifying a document into one or more categories. The method and system can be configured to identify one or more clusters of clauses or sentences from a plurality of semantically similar clauses of the document and determine one or more representative concepts for each cluster of the document. Accordingly, one or more categories for the document are determined from the one or more representative concepts and the document is classified into the one or more categories.	04-30-2015
20150127652	LABELING/NAMING OF THEMES - The disclosed solution uses machine learning-based methods to improve the knowledge extraction process in a specific domain or business environment. By formulizing a specific company's internal knowledge and terminology, the ontology programming accounts for linguistic meaning to surface relevant and important content for analysis. For example, the disclosed ontology programming adapts to the language used in a specific domain, including linguistic patterns and properties, such as word order, relationships between terms, and syntactical variations. Based on the self-training mechanism developed by the inventors, the ontology programming automatically trains itself to understand the domain or environment of the communication data by processing and analyzing a defined corpus of communication data.	05-07-2015
20150134666	DOCUMENT RETRIEVAL USING INTERNAL DICTIONARY-HIERARCHIES TO ADJUST PER-SUBJECT MATCH RESULTS - Techniques for managing big data include retrieval using per-subject dictionaries having multiple levels of sub-classification hierarchy within the subject. Entries may include subject-determining-power (SDP) scores that provide an indication of the descriptive power of the entry term with respect to the subject of the dictionary containing the term. The same term may have entries in multiple dictionaries with different SDP scores in each of the dictionaries. A retrieval request for one or more documents containing search terms descriptive of the one or more documents can be processed by identifying a set of candidate documents tagged with subjects, i.e., identifiers of per-subject dictionaries having entries corresponding to a search term, then using affinity values to adjust the aggregate score for the terms in the dictionaries. Documents are then selected for best match to the subject based on the adjusted scores. Alternatively, the adjustment may be performed after selecting the documents by re-ordering them according to adjusted scores.	05-14-2015
20150142812	Methods And Systems For Query Segmentation In A Search - The present application discloses a method, a server and a computer readable storage medium for segmenting a search query. The server receives a query segmentation request including a search query, and the search query further includes an ordered sequence of semantic elements. Each semantic element is correlated with one or more predetermined search terms each at least including the respective semantic element. The server further modifies the search terms by replacing irrelevant semantic elements with segmentation identifiers. The modified search terms are then combined to form combined search queries each of which includes the ordered sequence of semantic elements and at least one segmentation identifier that separates the semantic elements. A specific combined search query is identified based on search probabilities of the combined search queries, and the search query is segmented according to a location of at least one segmentation identifier in the specific combined search query.	05-21-2015
20150293982	DISPLAYING A REPRESENTATIVE ITEM FOR A COLLECTION OF ITEMS - Displaying a representative item for a collection of items includes obtaining, from at least one source, a history of interests associated with a user, analyzing the history of interests associated with the user to determine preference criteria for the user, identifying, based on the preference criteria for the user, a representative item for a collection of items, and displaying, to the user, the representative item.	10-15-2015
20150302013	SEMANTIC LABELING APPARATUS AND METHOD THEREOF - A semantic labeling apparatus and method thereof include a place identifier processor configured to, based on location data of a user, generate place attributes of places that indicate information of a user visit for each place, wherein user location remains unchanged within the places for a predetermined period of time. A group identifier processor is configured to cluster the places based on the place attributes, classify the places into groups, acquire a semantic label for each of the groups, and designate the acquired semantic label as the semantic label of each of the groups. A label determiner is configured to determine the semantic label of each of the groups as a semantic label of each member place of each of the groups.	10-22-2015
20150302014	SYSTEM AND PROCESS FOR BUILDING A CATALOG USING VISUAL OBJECTS - A method including: clustering a plurality of records, each record comprising at least one object image and at least one textual field associated with the object, to yield a plurality of clusters such that the object images in each cluster exhibit between them a visual similarity above a specified value; associating each cluster with a label by applying a dictionary function to the textual fields of each cluster, wherein the label reflects a common semantic factor of the textual fields of each cluster, wherein the common semantic factor has a value above a specified threshold. Accordingly, the visual similarity provides a measure of resemblance between two visual objects that can be based on at least one of: the fit between their color distributions, such as the correlation between their HSV color histograms, the fit between their textures, the fit between their shapes, the correlation between their edge histograms, and face similarity.	10-22-2015
20150309986	METHODS, SYSTEMS, AND DEVICES FOR MACHINES AND MACHINE STATES THAT FACILITATE MODIFICATION OF DOCUMENTS BASED ON VARIOUS CORPORA AND/OR MODIFICATION DATA - Computationally implemented methods and systems include accepting a submission of a particular document that includes at least one particular lexical unit, facilitating acquisition of document modification data that includes data configured to be used to determine a modification to the particular document, and receiving an updated document in which at least a portion of at least one occurrence of the at least one particular lexical unit has been replaced with at least a portion of an acquired replacement lexical unit that is at least partly based on the document modification data. In addition to the foregoing, other aspects are described in the claims, drawings, and text.	10-29-2015
20150309991	INPUT SUPPORT DEVICE, INPUT SUPPORT METHOD, AND INPUT SUPPORT PROGRAM - An input support device according to one embodiment includes a receiving unit, a search unit and an output unit. The receiving unit receives an input letter string in Roman letters. The search unit performs first processing that searches a storage unit storing alphabetical words/phrases in a first language and romanized words/phrases in a second language corresponding to the words/phrases in the first language in a way their correspondence can be specified for words/phrases in the first language containing the input letter string, and second processing that searches the storage unit for words/phrases in the first language corresponding to the romanized word/phrase containing the input letter string. The output unit outputs a result of the first processing and a result of the second processing as input candidates.	10-29-2015
20150310004	DOCUMENT MANAGEMENT SYSTEM, DOCUMENT MANAGEMENT METHOD, AND DOCUMENT MANAGEMENT PROGRAM - It is possible to reduce a review load of a reviewer. A document management system acquires digital information recorded in a plurality of computers or a server and analyzes the acquired digital information for relevance to a lawsuit. The document management system includes a thread classification unit that verifies supplementary information of each piece of document data included in the digital information and classifies the document data into threads based on the supplementary information, a similarity analysis unit that extracts elements included in the supplementary information of the classified document data for each thread and analyzes similarity between the threads based on the extracted elements, and an integration unit that integrates the threads based on the similarity.	10-29-2015
20150310099	System And Method For Generating Labels To Characterize Message Content - A system and method for generating labels to characterize message content are provided. At least one component, associated with a document, is extracted from a message. Words regarding the extracted component are extracted from the message as candidate labels. Those candidate labels that are discriminative of the document associated with the extracted component are identified by comparing the candidate labels for the component with other candidate labels extracted from other messages with at least one of a same and a different component. Content of the message is characterized using the discriminative candidate labels.	10-29-2015
20150331847	APPARATUS AND METHOD FOR CLASSIFYING AND ANALYZING DOCUMENTS INCLUDING TEXT - A document classification and analysis system includes a processor, a memory including one or more storage regions, and a non-transitory computer-readable medium having stored thereon instructions that, when executed, cause the processor to perform a method. The method includes receiving a document including a plurality of words, performing morpheme analysis on the document to extract original forms of the words, tagging each of the words based on a corresponding part-of-speech, determining location information of the words based on an order of the words in the document, applying one or more lexicon lists to the document to classify each of the words, and storing the location information.	11-19-2015
20150331929	NATURAL LANGUAGE IMAGE SEARCH - Natural language image search is described, for example, whereby natural language queries may be used to retrieve images from a store of images automatically tagged with image tags being concepts of an ontology (which may comprise a hierarchy of concepts). In various examples, a natural language query is mapped to one or more of a plurality of image tags, and the mapped query is used for retrieval. In various examples, the query is mapped by computing one or more distance measures between the query and the image tags, the distance measures being computed with respect to the ontology and/or with respect to a semantic space of words computed from a natural language corpus. In examples, the image tags may be associated with bounding boxes of objects depicted in the images, and a user may navigate the store of images by selecting a bounding box and/or an image.	11-19-2015
20150331936	METHOD AND SYSTEM FOR EXTRACTING A PRODUCT AND CLASSIFYING TEXT-BASED ELECTRONIC DOCUMENTS - A system to automatically enhance, tag, classify, categorize, cluster and index products described in unstructured text-based electronic documents. The system and method incorporate the use of text normalization, regular expressions, product number matching rules, text segmentation, entity detection, language models, predictive modeling, hierarchical subspace clustering, formal concept analysis, and a weighted combination of all techniques to detect and infer knowledge extracted from a digital version of raw, unstructured product text. Knowledge extracted and inferred comprises knowledge units including: main conceptual entity, entity text patterns, product language models, and conceptual hierarchies. The extracted knowledge units are utilized to store and index products in a product knowledge database and the products and knowledge units are made available to users via a user interface.	11-19-2015
20150339369	GENERATING PARTITIONED HIERARCHICAL GROUPS BASED ON DATA SETS FOR BUSINESS INTELLIGENCE DATA MODELS - Techniques are described for generating a hierarchical group based on a set of data. In one example, a method includes classifying two or more data items from a set of data with respect to a library of ontological concepts. The method further includes classifying the two or more data items with respect to lexical correlations between the two or more data items. The method further includes generating a hierarchical group in which the two or more data items are partitioned into one or more hierarchical partitions based at least in part on the classifying with respect to the library of ontological concepts and the classifying with respect to the lexical correlations, wherein each of the one or more hierarchical partitions comprises the two or more data items.	11-26-2015
20150339376	NATURAL LANGUAGE DATA ANALYTICS PLATFORM - A system for natural language analytics, stored and operating on a network-connected computing device, comprising a natural language application data importer, a natural language application data augmenter that enriches the data, and an analytics component which provides a means of querying structured as well as unstructured data and which also contains a method for providing adaptive natural language analytics.	11-26-2015
20150347562	DERIVING USER CHARACTERISTICS FROM USERS' LOG FILES - Systems and methods for generating a grammar describing activities of a user are disclosed. An aspect receives log data for the user, clusters the log data around a plurality of cluster centroids, assigns one or more semantic labels to each of the plurality of cluster centroids based on determining that a threshold number of log data points have been assigned to each of the plurality of cluster centroids, determines a sequence in which the log data points were clustered around the plurality of cluster centroids, generates one or more grammars representing a sequence of possible activities of the user based on the sequence in which the log data points were clustered around the plurality of cluster centroids and the one or more semantic labels of each of the plurality of cluster centroids, and filters the assigned one or more semantic labels for each of the plurality of cluster centroids.	12-03-2015
20150356418	METHODS AND APPARATUS FOR IDENTIFYING CONCEPTS CORRESPONDING TO INPUT INFORMATION - Techniques for use in identifying one or more concepts in a knowledge representation (KR). The techniques include obtaining user context information associated with a user, wherein the user context information comprises a plurality of words. Also included are semantic disambiguation techniques comprising obtaining user context information associated with a user, wherein the user context information comprises a first portion and a second portion different from the first portion; and disambiguating between a first and second concept in a knowledge representation (KR) associated with a first meaning of the first portion. Semantic disambiguation techniques further include obtaining user context information associated with a user, wherein the user context information comprises a first portion and a second portion different from the first portion; and disambiguating between a first concept and second concept in a knowledge representation (KR) using measures of dominance and semantic coherence. Additionally, techniques are disclosed for calculating a measure of semantic coherence based on a graph of a knowledge representation (KR) and an overlap of semantic context of a first concept and a second concept in the KR.	12-10-2015
20150370783	METHOD OF USING A SEMANTIC WEB DATA SOURCE IN A TARGET APPLICATION - A method of using a semantic web data source in a target application includes the target application calling the application program interface of a bridge component, and the bridge component retrieves the required data from the semantic web data source, translates the retrieved data semantically and syntactically to reflect the meaning and syntax of the target application, and returns the translated data in the format of the target application.	12-24-2015
20150370887	SEMANTIC MERGE OF ARGUMENTS - A method comprising using at least one hardware processor for: receiving a topic under consideration (TUC) and a set of claims referring to the TUC; identifying semantic similarity relations between claims of the set of claims; clustering the claims into a plurality of claim clusters based on the identified semantic similarity relations, wherein said claim clusters represent semantically different claims of the set of claims; and generating a list of non-redundant claims comprising said semantically different claims.	12-24-2015
20150379147	SYSTEM AND METHODS FOR PREDICTING USER BEHAVIORS BASED ON PHRASE CONNECTIONS - A method and system for predicting user behaviors based on term taxonomies are provided. The system comprises generating phrases respective of user generated content, wherein each phrase is a sentiment phrase or a non-sentiment phrase, each sentiment phrase including at least one word describing a sentiment; identifying at least one connection between at least two of the generated phrases, wherein each connection is a direct connection or a hidden connection; generating at least one term taxonomy based on the identified at least one connection, wherein each term taxonomy is an association between a non-sentiment phrase and at least one of a plurality of sentiment phrases; periodically analyzing the at least one term taxonomy to determine at least one trend of each non-sentiment phrase respective of the associated plurality of sentiment phrases; and generating a prediction of future behavior of the at least one trend with respect to the at least one term taxonomy.	12-31-2015
20160012039	PROVIDING CONTEXT IN FUNCTIONAL TESTING OF WEB SERVICES	01-14-2016
20160012051	COMPUTING FEATURES OF STRUCTURED DATA	01-14-2016
20160012120	REAL-TIME DATA MANAGEMENT FOR A POWER GRID	01-14-2016
20160012122	AUTOMATICALLY LINKING TEXT TO CONCEPTS IN A KNOWLEDGE BASE	01-14-2016
20160012123	METHODS OF EVALUATING SEMANTIC DIFFERENCES, METHODS OF IDENTIFYING RELATED SETS OF ITEMS IN SEMANTIC SPACES, AND SYSTEMS AND COMPUTER PROGRAM PRODUCTS FOR IMPLEMENTING THE SAME	01-14-2016
20160026690	CONVERSATION ANALYTICS - A system for conversation analytics comprising an analytics server stored and operating on a network-connected device that receives and processes conversations from various sources over a communications network, a plurality of communication bridges that may connect to and receive communication data from various communication endpoints, a media server that receives communication data from the bridges and provides this data to the analytics server for analysis, and a database that may store data from various elements of the system and provide them as needed for future reference.	01-28-2016
20160034560	METHOD AND SYSTEM FOR SECURELY STORING PRIVATE DATA IN A SEMANTIC ANALYSIS SYSTEM - Disclosed is an approach for allowing an entity to perform semantic analysis in a SaaS semantic analysis platform upon private data possessed by one or more entities. In one or more embodiments, separate processing pipelines may be provided to the plurality of entities thereby keeping private data secure within the semantic analysis platform. In one or more embodiments, a common processing pipeline is provided with data associated with a first entity being assigned a first identification code, and data associated with a second entity being assigned a second identification code.	02-04-2016
20160042053	METHODS AND SYSTEMS FOR MAPPING DATA ITEMS TO SPARSE DISTRIBUTED REPRESENTATIONS - A method of mapping data items to sparse distributed representations (SDRs) includes clustering in a two-dimensional metric space, by a reference map generator, a set of data documents selected according to at least one criterion, generating a semantic map. The semantic map associates a coordinate pair with each of the set of data documents. A parser generates an enumeration of data items occurring in the set of data documents. A representation generator determines, for each data item in the enumeration, occurrence information. The representation generator generates a distributed representation using the occurrence information. A sparsifying module receives an identification of a maximum level of sparsity. The sparsifying module reduces a total number of set bits within the distributed representation based on the maximum level of sparsity to generate an SDR having a normative fillgrade.	02-11-2016
20160063044	DISTRIBUTED SORTING OF EVENT LOG FILES - In an example, composite keys for an event log may be provided. A partitioner may be configured to extract a natural key from the composite keys and distribute log lines of event log files to a plurality of reducer nodes based on a value of the natural key. A comparator may use a log time of the composite key to sort a received portion of the distributed log lines.	03-03-2016
20160063095	UNSTRUCTURED DATA GUIDED QUERY MODIFICATION - A method, system, and computer program product for unstructured data guided query modification are provided in the illustrative embodiments. A set of parameters is identified in a structured database query. Using a Natural language processing (NLP) engine, a set of tokens is identified in an unstructured data. Using the NLP engine, corresponding to a subset of the set of parameters, sets of variations are obtained. A fit is found between a first token from the set of tokens and a first variant of a first parameter, the first variant of the first parameter being a member of a first set of variations corresponding to the first parameter. The first parameter in the structured database query is substituted with the first variant to produce a substituted query, wherein the substituted query produces a result set that is related to the unstructured data.	03-03-2016
20160063100	SEMANTIC DATA STRUCTURE AND METHOD - Disclosed is a method, a device, a system and/or a manufacture of a semantic data structure. In one embodiment, a physical memory usable to store information within a datastore comprises a number of domains. Each domain includes a unique identifier and organizes data into a domain structure that includes an identity element, a content element, and a context element, each of which may be implemented as an EAV triplet. A fundamental instantiation of the domain structure contains a primitive data and a relational instantiation of the domain structure contains references to other domains. The references of the content element may be constrained, for example to a directed acyclic graph architecture, while references of the context element may reference any domain. Additional instantiations may build orders of referential structure, provide security and control of data resources within the datastore, and model users and application programs.	03-03-2016
20160078126	Computer-Implemented System And Method For Generating Document Groupings For Display - A computer-implemented system and method for generating document groupings is provided. A lexicon of terms extracted from a set of documents is generated. The lexicon includes a frequency of each extracted term within each document in the set. Concepts each having two or more of the extracted terms are generated. A subset of the documents in the set is selected based on the term frequencies. The subset of documents is grouped into clusters based on the concepts. A similarity of each document cluster is calculated with one or more documents based on a distance by summing the frequency of each term in that document and a weight of the cluster for each of the terms. The weights are updated until a rate of change for each cluster becomes constant.	03-17-2016
20160085843	PERSPECTIVE DATA ANALYSIS AND MANAGEMENT - A system and computer-implemented method for managing perspective data is disclosed. The method may include identifying a variant feature of an item having a first set of perspective data. The method may include grouping, based on the variant feature, the first set of perspective data into a first group and a second group. The method may include determining a first set of relevancy scores for the first group and a second set of relevancy scores for the second group. The method may also include establishing, using at least one of the first and second relevancy scores, a second set of perspective data configured to include a subset of the first set of perspective data.	03-24-2016
20160085855	PERSPECTIVE DATA ANALYSIS AND MANAGEMENT - A system and computer implemented method for managing perspective data is disclosed. The method may include collecting a first lot of perspective data for an item. The method may include introducing a variant feature to the item to constitute a modified item. The method may include collecting a second lot of perspective data for the modified item. The method may also include evaluating the first and second lots of perspective data to ascertain a sentiment fluctuation based on information relevant to the variant feature.	03-24-2016
20160092448	Method For Deducing Entity Relationships Across Corpora Using Cluster Based Dictionary Vocabulary Lexicon - An approach is provided for identifying entity relationships based on word classifications extracted from business documents stored in a plurality of corpora. In the approach, performed by an information handling system, a plurality of cluster classifications are identified for the business documents so that entity information from the business documents can be classified or assigned to the cluster classifications, such as by performing natural language processing (NLP) analysis of the business documents. The approach applies semantic analysis to identify and score entity relationships between the entity information classified in the cluster classifications, and based on the scored entity relationships, cluster relationships between the cluster classifications are identified.	03-31-2016
20160092549	Information Handling System and Computer Program Product for Deducing Entity Relationships Across Corpora Using Cluster Based Dictionary Vocabulary Lexicon - An approach is provided for identifying entity relationships based on word classifications extracted from business documents stored in a plurality of corpora. In the approach, performed by an information handling system, a plurality of cluster classifications are identified for the business documents so that entity information from the business documents can be classified or assigned to the cluster classifications, such as by performing natural language processing (NLP) analysis of the business documents. The approach applies semantic analysis to identify and score entity relationships between the entity information classified in the cluster classifications, and based on the scored entity relationships, cluster relationships between the cluster classifications are identified.	03-31-2016
20160098483	AUTOMATIC DOCUMENT CLASSIFICATION VIA CONTENT ANALYSIS AT STORAGE TIME - Techniques are disclosed for efficiently and automatically classifying textual documents or files. In some embodiments, the classification process is integrated into or otherwise made part of the storage function, such that when the user initiates a save process for a given file, the file is processed through a classifier prior to (or contemporaneously with) completing the save function. In some such embodiments, textual content of the file is analyzed using natural language processing to identify a main or substantial concept discussed in the file, and one or more corresponding tags are then assigned to that file. Subsequently, the user can access that file based on the one or more tags, for instance, through a user interface that allows the user to select one or more content categories associated with the assigned tags. The files can be text-based, but may include other content as well, such as images, video, and audio.	04-07-2016
20160103885	SYSTEM FOR, AND METHOD OF, BUILDING A TAXONOMY - A taxonomy is built by associating metadata with search terms. A body of data records is analyzed to identify pairs of search terms co-occurring in individual data records and to obtain an observed measure of the frequency of such co-occurrences between identified pairs. A taxonomy is then built by constructing metadata and associating the search terms with respective metadata, the metadata for each co-occurring search term identifying at least one other search term with which it co-occurs, together with a measure of relatedness based on the observed co-occurrence frequency measure between the co-occurring pair.	04-14-2016
20160110762	EXTRACTING PRODUCT PURCHASE INFORMATION FROM ELECTRONIC MESSAGES - Improved systems and methods for extracting product purchase information from electronic messages transmitted between physical network nodes to convey product purchase information to designated recipients. These examples provide a product purchase information extraction service that is able to extract product purchase information from electronic messages with high precision across a wide variety of electronic message formats and thereby solve the practical problems that have arisen as a result of the proliferation of different electronic message formats used by individual merchants and across different merchants and different languages. In this regard, these examples are able to automatically learn the structures and semantics of different message formats, which accelerates the ability to support new message sources, new markets, and different languages.	04-21-2016
20160117402	SYSTEMS AND METHODS FOR INTEGRATING PERSONAL SOCIAL NETWORKS WITHIN AN ORGANIZATION - A method includes receiving, from a user device, a first signal, the first signal including an authorization indicator associated with a social network system profile of a first user. A second signal is sent, the second signal including a first request for social network information associated with the first user. The first request is based at least in part on the authorization indicator. A third signal is received, the third signal including social network information associated with the first user. The method further includes receiving, from a second user, a fourth signal, the fourth signal including a second request for social network information associated with the first user. One or more metrics is defined, based at least in part on the social network information associated with the first user. A fifth signal is sent, the fifth signal sent such that a visual element based at least in part on the one or more metrics is displayed at an output device.	04-28-2016
20160125006	INDEXING CONTENT AND SOURCE CODE OF A SOFTWARE APPLICATION - In a method for generating a searchable index from an analysis of a software application, receiving a first software application. The one or more processors determine that a first source code of the first software application is inaccessible. The one or more processors stimulate the first software application. The one or more processors analyze textual data resulting from the stimulation of the first software application. The one or more processors classify one or more images resulting from the stimulation of the first software application. The one or more processors index the analyzed textual data and the classified one or more images resulting from the stimulation of the first software application.	05-05-2016
20160125065	STORAGE MEDIUM, INFORMATION PRESENTATION METHOD, AND INFORMATION PRESENTATION APPARATUS - A non-transitory computer-readable storage medium stores a program that causes a computer to execute a process. The process includes obtaining data including multiple character strings that are separated from each other, identifying a first character string and second character strings in the obtained data, extracting third character strings indicating relationships between the first character string and the second character strings from character string collections stored in a database, selecting a character string collection from the character string collections based on proportions of the extracted third character strings included in the respective character string collections, and outputting information on the selected character string collection. Each of the character string collections includes character string sets each of which includes two character strings and a third character string indicating a relationship between the two character strings.	05-05-2016
20160132521	SYSTEMS AND METHODS FOR FILE CLUSTERING, MULTI-DRIVE FORENSIC ANALYSIS AND DATA PROTECTION - A system and method for file clustering, multi-drive forensic analysis and protection of sensitive data. Multiple memory devices can store files. A module can extract characteristics from the stored files, identify similarities between the files based on the extracted characteristics and generate file clusters based on the identified similarities. A visual representation of the file clusters, which can be generated to show the identified similarities among the files, can be displayed by a user interface module.	05-12-2016
20160147735	MEDIA CONTENT SEARCH BASED ON A RELATIONSHIP TYPE AND A RELATIONSHIP STRENGTH - Provided are techniques for a media content search based on a relationship type and a relationship strength. Selection of two objects in a media file in media content is received. Search criteria for a relationship type and a relationship strength between the two objects is received. One or more media files in the media content are identified in which the two objects have the relationship type and the relationship strength.	05-26-2016
20160147868	MEDIA CONTENT SEARCH BASED ON A RELATIONSHIP TYPE AND A RELATIONSHIP STRENGTH - Provided are techniques for a media content search based on a relationship type and a relationship strength. Selection of two objects in a media file in media content is received. Search criteria for a relationship type and a relationship strength between the two objects is received. One or more media files in the media content are identified in which the two objects have the relationship type and the relationship strength.	05-26-2016
20160162542	Model Navigation Constrained by Classification - A method, system and computer-usable medium are disclosed for efficient searching of a semantic model of resources and resource relationships. A query is received from an application. In turn the query is processed to determine an application usage classification for the application, which is then used to reference an index of subsets of the semantic model to identify a subset of the semantic model associated with the application usage classification. The identified subset of the semantic model is then used to modify the query, which is then used as a modified query to query the semantic model. In response, a sub-graph of the semantic model corresponding to the subset of the semantic model is received, which in turn is provided to the application.	06-09-2016
20160162569	METHODS AND SYSTEMS FOR IMPROVING MACHINE LEARNING PERFORMANCE - Systems and methods are presented for providing improved machine performance in natural language processing. In some example embodiments, an API module is presented that is configured to drive processing of a system architecture for natural language processing. Aspects of the present disclosure allow for a natural language model to classify documents while other documents are being retrieved in real time. The natural language model and the documents are configured to be stored in a stateless format, which also allows for additional functions to be performed on the documents while the natural language model is used to continue classifying other documents.	06-09-2016
20160162576	AUTOMATED CONTENT CLASSIFICATION/FILTERING - Apparatuses, components, methods, and techniques for classifying content are provided. An example method classifies textual content as objectionable. Another example identifies relevant attributes for the content. The example method includes analyzing a body of the content to determine a level of similarity between text in the content and a corpus of predetermined content. The example method further includes upon determining that the level of similarity is greater than a predefined threshold using natural language processing to extract a plurality of features from the content, the features being associated with concepts related to the body of the content. The example method further includes analyzing the extracted features to determine a second level of similarity between the content and the corpus of predetermined content. The example method further includes upon determining that the second level of similarity is greater than a second predefined threshold, classifying the content as objectionable.	06-09-2016
20160171088	DATA RELATIONSHIPS IN A QUESTION-ANSWERING ENVIRONMENT	06-16-2016
20160179902	MANAGING ANSWER FEASIBILITY	06-23-2016
20160179932	System And Method Of Personalized Message Threading For A Multi-Format, Multi-Protocol Communication System	06-23-2016
20160179940	Responding to Data Requests Related to Constrained Natural Language Vocabulary Terms	06-23-2016
20160179945	SYSTEM AND METHOD FOR THE INDEXING AND RETRIEVAL OF SEMANTICALLY ANNOTATED DATA USING AN ONTOLOGY-BASED INFORMATION RETRIEVAL MODEL	06-23-2016
20160179973	METHOD AND SERVER FOR ANALYZING SOCIAL MEDIA CONTENT BASED ON SURVEY PARTICIPATION DATA RELATED TO A WEBSITE	06-23-2016
20160188698	SYSTEM AND METHOD FOR BUILDING, VERIFYING AND MAINTAINING AN ONTOLOGY - A system and method for building, maintaining and verifying an ontology are disclosed. The ontology may be built based on a meronymy and a taxonomy. In one example, the generated ontology may be an automotive mechanical systems ontology and the meronymy is a vehicle systems meronymy and the taxonomy is a universal parts taxonomy. The system may utilize consistent lexical derivation of the elements in the generated taxonomy.	06-30-2016
20160203141	METHOD FOR LEARNING A LATENT INTEREST TAXONOMY FROM MULTIMEDIA METADATA	07-14-2016
20160253364	SYSTEM FOR LINKING DIVERSE DATA SYSTEMS	09-01-2016
20160378847	DISTRIBUTIONAL ALIGNMENT OF SETS - Technology for classifying a data set includes extracting one or more features from items of the data set, computing a specificity measure for the extracted features, and measuring the similarity of the extracted features to a set of characteristic features associated with the property of one or more reference models.	12-29-2016
20170235813	METHODS AND SYSTEMS FOR MODELING COMPLEX TAXONOMIES WITH NATURAL LANGUAGE UNDERSTANDING	08-17-2017
20190146981	LARGE SCALE SOCIAL GRAPH SEGMENTATION	05-16-2019
20220138245	SYSTEM AND METHOD TO AUTOMATICALLY CREATE, ASSEMBLE AND OPTIMIZE CONTENT INTO PERSONALIZED EXPERIENCES - A method and system provide the ability to personalize a digital channel. Multiple content assets are obtained and include an image content associate. Each of the assets is associated with an associated set of semantic elements. The multiple content assets are clustered into content clusters based on a similarity of the semantic elements. A first content asset is selected. The clustering is used as a metric to estimate distances between the first content asset and remaining multiple content assets. The remaining multiple content assets are scored based on the distances. One of the remaining multiple content assets is selected based on the scoring and provided for a personalized component of the digital channel. In addition, a coverage map that includes both users and content may be generated based on the clusters and then utilized to select the content asset.	05-05-2022
20220138431	METHOD AND SYSTEM FOR SECURELY STORING PRIVATE DATA IN A SEMANTIC ANALYSIS SYSTEM - Disclosed is an approach for allowing an entity to perform semantic analysis in a SaaS semantic analysis platform upon private data possessed by one or more entities. In one or more embodiments, separate processing pipelines may be provided to the plurality of entities thereby keeping private data secure within the semantic analysis platform. In one or more embodiments, a common processing pipeline is provided with data associated with a first entity being assigned a first identification code, and data associated with a second entity being assigned a second identification code.	05-05-2022