Patent application number | Description | Published |
20090049062 | Method for Organizing Structurally Similar Web Pages from a Web Site - Techniques are described for organizing structurally similar web pages for a website. Fingerprints are made of the structure of the web pages using shingling by placing the web page's HTML tags and attributes in sequence and encoding the tags and attributes using a standard encoding technique. Fixed-size portions of the encoded sequence are taken and a set of values extracted using independent hash functions to compute the shingles. Alternatively, a DOM tree representation of HTML of the web page is generated and each path of the DOM tree encoded and values extracted using independent hash functions to compute the shingles. A specified number of shingles are retained as the fingerprint. The pages are then clustered based upon the URL and the similarity of the shingles. The clustered hierarchical organization of pages is further pruned by various criteria including similarity of shingles or support of the cluster node in the hierarchy. | 02-19-2009 |
20090063538 | METHOD FOR NORMALIZING DYNAMIC URLS OF WEB PAGES THROUGH HIERARCHICAL ORGANIZATION OF URLS FROM A WEB SITE - Techniques are described for normalizing dynamic URLs using a hierarchical organization of a web site. Given web pages associated with a web site, an information extraction method is used to generate data structures that represent the content or structure of each of the web pages. These data structures are appended to the corresponding dynamic URLs. The modified URLs with the data structures are tokenized with the resulting tokens clustered to create a hierarchical organization. Nodes of the hierarchical organization may be merged based upon occurrence or patterns of content and structure. The merged hierarchical organization may then be pruned to remove irrelevant information and to reduce the memory footprint of the hierarchical organization. When a new dynamic URL is received, the new dynamic URL is matched to the hierarchical organization. Important parameters are taken into account and irrelevant information may be removed. Based upon the matching to the hierarchical organization, a normalized URL is returned. | 03-05-2009 |
20090171986 | TECHNIQUES FOR CONSTRUCTING SITEMAP OR HIERARCHICAL ORGANIZATION OF WEBPAGES OF A WEBSITE USING DECISION TREES - A decision tree may be determined that is a site map for a domain of web pages. A clustering of a plurality of web pages of a domain is determined, in an unsupervised fashion, based on content-related features of the plurality of web pages. Each determined cluster includes a plurality of web pages, each of the plurality of web pages characterized by a resource locator and each of the resource locators being characterized by at least one resource locator token. The clustering is processed to organize indications of the content-related features of the plurality of web pages into a decision tree characterized by a plurality of nodes, each node characterized by a feature and a value, the feature being at least one of the resource locator tokens and the value being a value of that resource locator token. | 07-02-2009 |
20090319481 | FRAMEWORK FOR AGGREGATING INFORMATION OF WEB PAGES FROM A WEBSITE - The present invention is directed towards systems and methods for extending media annotations using collective knowledge. The method according to one embodiment of the present invention comprises receiving a plurality of content items and associated annotations. The method further normalizes the plurality of associated annotations and calculates pair frequencies for the plurality of associated annotations. The method then retrieves a plurality of alternative annotations and provides the plurality of alternative annotations. | 12-24-2009 |
20100169311 | APPROACHES FOR THE UNSUPERVISED CREATION OF STRUCTURAL TEMPLATES FOR ELECTRONIC DOCUMENTS - A method and apparatus for creating templates for electronic documents is provided. One or more attributes are extracted, using a seed template, from a first document, such as a web page. A second document that contains a particular attribute, extracted from the first document, is identified. The second document may be in a different cluster than the first document. The second document is annotated, using an extracted attribute, to create an annotated document. The second document is annotated without human intervention. A new template for the annotated document is generated. The new template facilitates extraction of information from the annotated document. The new template may be used to extract additional attributes from all documents in the cluster of documents of which the second document is a member. The process may continue over numerous iterations to generate a large number of templates in an automated fashion. | 07-01-2010 |
20100241486 | REDUCING REVENUE RISK IN ADVERTISEMENT ALLOCATION - Methods, systems, and apparatuses are provided for selecting advertisements in an advertisement auction. A plurality of bids for an advertisement placement is received. An average expected payout for each bid of the plurality of bids is calculated to determine a plurality of average expected payouts. A plurality of possible allocations of the advertisements is determined. An expected revenue value for each of the possible allocations is calculated based on the calculated average expected payouts to generate a plurality of expected revenue values. A risk value is calculated for each of the possible allocations to generate a plurality of risk values. A bid of the plurality of bids is enabled to be selected based on the calculated expected revenue values and risk values. | 09-23-2010 |
20100250362 | System and Method for an Online Advertising Exchange with Submarkets Formed by Portfolio Optimization - A system and method to distribute computation for an exchange in which advertisers buy online advertising space from publishers. The exchange maintains submarkets, each containing a subset of the ad calls supplied by publishers and a subset of the offers and budgets representing demand from advertisers. Portfolio optimization techniques allocate the supply of ad calls from publishers over the submarkets, with the goal of maximizing profits for publishers while limiting the volatility of those profits. Portfolio optimization techniques allocate the demand from advertisers over the submarkets, with the goal of maximizing return on investment for advertisers. The exchange re-allocates supply and demand over submarkets periodically. Also, periodically, the most effective submarkets are replicated and the least effective submarkets are eliminated. | 09-30-2010 |
20110029477 | INFORMATION SIMILARITY AND RELATED STATISTICAL TECHNIQUES FOR USE IN DISTRIBUTED COMPUTING ENVIRONMENTS - Embodiments of methods, systems and/or apparatuses relating to data processing in distributed computing environments are disclosed. In particular, methods, systems, and/or apparatuses for determining information similarity and/or performing related statistical techniques which may be implemented or operated in a distributed computing environment are disclosed. | 02-03-2011 |
20110166927 | Dynamic Pricing Model For Online Advertising - The present invention provides methods and systems for use in association with an online advertising auction. Advertiser bid information may be obtained, including a maximum amount per impression and a target click through rate (“CTR”). Following serving, if a delivered CTR is equal to or greater than the target CTR, then pricing per impression is at the maximum amount. If, however, the delivered CTR is less than the target CTR, then pricing per impression is at an amount equal to the maximum amount per impression multiplied by the ratio of the delivered CTR to the target CTR. | 07-07-2011 |
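
The pricing rule summarized in entry 20110166927 can be sketched in a few lines of Python. This is only an illustration of the rule as the abstract states it, not the patented implementation; the function name `price_per_impression`, its parameter names, and the sample figures are assumptions introduced here.

```python
def price_per_impression(max_price: float, target_ctr: float, delivered_ctr: float) -> float:
    """Per-impression price under the rule described in the abstract.

    All names are hypothetical; the abstract only specifies the rule itself.
    """
    if target_ctr <= 0:
        # Assumption: a non-positive target CTR is treated as invalid input,
        # since the ratio below would be undefined.
        raise ValueError("target_ctr must be positive")
    if delivered_ctr >= target_ctr:
        # Target met or exceeded: charge the advertiser's maximum amount.
        return max_price
    # Target missed: scale the maximum by delivered CTR / target CTR.
    return max_price * (delivered_ctr / target_ctr)


# Illustrative figures only (not taken from the source).
met = price_per_impression(max_price=2.0, target_ctr=0.04, delivered_ctr=0.05)
missed = price_per_impression(max_price=2.0, target_ctr=0.04, delivered_ctr=0.02)
```

With these sample inputs, the first call is charged the full maximum because the delivered CTR exceeds the target, while the second is charged half the maximum because the delivered CTR is half the target.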