Patent application number | Description | Published |
20090265363 | FORUM WEB PAGE CLUSTERING BASED ON REPETITIVE REGIONS - Described is a technology by which forum web pages are processed into clusters for classification purposes, including by determining repetitive regions between pages and associating pages that have similar repetitive regions into a common cluster. Patterns corresponding to the regions are determined, and a feature set based at least in part on those patterns (e.g., pattern frequency) is extracted from the page. The feature set of a page is compared against the feature set of another page to determine similarity therewith, e.g., via a feature space distance computation that is evaluated against a threshold distance. | 10-22-2009 |
20090327237 | WEB FORUM CRAWLING USING SKELETAL LINKS - A method and system for identifying informative links of a web site for use in crawling the web site is provided. A forum crawler analyzes sample web pages of a web forum to identify informative links and then crawls the web forum by following links determined to be informative and not following other links. The forum crawler system determines whether links are informative based on whether they are part of the overall structure of the web site or are used to select sequential information that has been split onto multiple web pages. | 12-31-2009 |
20100169300 | Ranking Oriented Query Clustering and Applications - Techniques described herein provide tools for improving search engine performance. Specifically, these tools focus on producing more relevant search engine results via a URL-based query clustering method. The tools first extract tokens from Uniform Resource Locators (URLs) associated with search queries. Using these tokens, the tools group queries that share common tokens into clusters. The resulting clusters can be used to understand similarities among user search queries and thereby produce more relevant search results. | 07-01-2010 |
20100205168 | Thread-Based Incremental Web Forum Crawling - The incremental web forum crawling technique described herein employs a thread-wise strategy that takes into account thread-level statistics, for example, the number of replies and the frequency of replies, to estimate the activity trend of each thread. To obtain these statistics, the technique uses a simple yet robust approach to extract the timestamp of each post in a discussion thread. It also employs a regression model to predict the time of the next post for each thread. | 08-12-2010 |
20100211533 | EXTRACTING STRUCTURED DATA FROM WEB FORUMS - The web forum data extraction technique extracts structured data from web forums using both page-level information and site-level knowledge. To do this, the technique finds the kinds of page objects a forum site has, which object a page belongs to, and how different page objects are connected with each other. This information can be obtained by reconstructing the sitemap of the target forum, which is based on a Data Object Model of the target forum. The technique collects three kinds of evidence for data extraction: 1) inner-page features, which cover both semantic and layout information on an individual page; 2) inter-vertex features, which describe linkage-related observations; and 3) inner-vertex features, which characterize interrelationships among pages in one vertex. The technique employs Markov Logic Networks to combine these types of evidence statistically for inference and can thereby extract the desired structures. | 08-19-2010 |
20110077848 | TRAVELOGUE-BASED TRAVEL ROUTE PLANNING - A location extraction component analyzes a set of travelogues to identify locations mentioned therein. A co-occurrence extraction component computes co-occurrence values for the identified locations. When a request to generate a travel route from a starting location to an ending location is received, suggested locations on or near the travel route are identified through the use of the co-occurrence values. A suggested travel route is then generated that passes through the starting location, the ending location, and the suggested locations. A map may be displayed showing the starting location, the ending location, the suggested locations, and the suggested travel route. | 03-31-2011 |
20110078139 | Travelogue locating mining for travel suggestion - A location extraction component analyzes a set of travelogues to identify all of the locations mentioned therein. A co-occurrence extraction component computes co-occurrence values for the identified locations. When the identity of a specified location is received, suggested locations for the specified location are identified through the use of the co-occurrence values. A map is displayed that encompasses an area including the specified location and the suggested locations. The map might include indicators for the specified location and for each of the suggested locations. Attributes of the indicators, such as their size or color, can be modified based upon the co-occurrence value associated with the corresponding suggested location. | 03-31-2011 |
20110078575 | TRAVELOGUE-BASED CONTEXTUAL MAP GENERATION - A map user interface control provides functionality for displaying a map in conjunction with the display of a Web page. The map control operates in combination with a location extraction component that analyzes the contents of the Web page to identify locations mentioned therein. Once the location extraction component has identified these locations, a map is generated that encompasses them, and the map control displays the map in conjunction with the display of the Web page. The map might include visual indicators corresponding to the locations mentioned in the Web page. The map might also include visual indicators corresponding to other nearby locations that have been identified using co-occurrence values generated through an analysis of a set of travelogues. | 03-31-2011 |
20110264655 | LOCATION CONTEXT MINING - Concepts and technologies are described herein for mining location contexts within document text. Through an implementation of the concepts and technologies presented herein, functionality can be provided for location context mining within articles, websites, travelogues, or other such documents. A location context is a concept associated with a specific location. For example, the contexts “beach” and “hula” are associated with Hawaii. Similarly, “glacier” and “polar bear” are contexts associated with Alaska. Location context mining can automatically discover locations and location contexts by mining information from a set of documents. User interfaces to support queries of the mined information are also presented herein. | 10-27-2011 |
20110264664 | IDENTIFYING LOCATION NAMES WITHIN DOCUMENT TEXT - Concepts and technologies are described herein for identifying location names within document text. Through an implementation of the concepts and technologies presented herein, functionality can be provided for identifying location names within articles, websites, travelogues, or other such documents. For instance, documents containing the names of cities, regions, countries, landmarks, or other locations may be associated with those locations. The location names may be unambiguously identified even when the location names may also have common word meanings that are not location associated or when the location name may be associated with more than one location. | 10-27-2011 |
20120117052 | WEB FORUM CRAWLING USING SKELETAL LINKS - A method and system for identifying informative links of a web site for use in crawling the web site is provided. A forum crawler analyzes sample web pages of a web forum to identify informative links and then crawls the web forum by following links determined to be informative and not following other links. The forum crawler system determines whether links are informative based on whether they are part of the overall structure of the web site or are used to select sequential information that has been split onto multiple web pages. | 05-10-2012 |
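
Three of the entries above (20110077848, 20110078139, 20110078575) rely on co-occurrence values computed for locations mentioned in a set of travelogues. The abstracts do not say how those values are defined, so the following is only a minimal sketch under stated assumptions: a co-occurrence value is taken to be the number of travelogues that mention both locations, and locations are found by naive case-insensitive substring matching against a known list. The function names `cooccurrence_counts` and `suggest` are hypothetical and do not come from the applications.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(travelogues, known_locations):
    """Count, for each pair of locations, how many travelogues mention both.

    Assumption: a location is "mentioned" if its name appears as a
    case-insensitive substring of the text. Real location extraction
    would need disambiguation, which this sketch does not attempt.
    """
    counts = Counter()
    for text in travelogues:
        lowered = text.lower()
        # Sort so each unordered pair is always stored under the same key.
        mentioned = sorted(
            loc for loc in known_locations if loc.lower() in lowered
        )
        for a, b in combinations(mentioned, 2):
            counts[(a, b)] += 1
    return counts


def suggest(location, counts, top_n=3):
    """Return up to top_n locations that co-occur most often with `location`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == location:
            scores[b] += n
        elif b == location:
            scores[a] += n
    return [loc for loc, _ in scores.most_common(top_n)]
```

Given a specified location, `suggest` ranks the other locations by how often they share a travelogue with it, which loosely mirrors the suggestion step described in 20110078139. A route planner or map control as described in the other two entries could consume the same counts.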