Patent application number | Description | Published |
20080267497 | IMAGE SEGMENTATION AND ENHANCEMENT - Methods, apparatus, and machine-readable media for segmenting and enhancing images are described. In one aspect, gradient magnitude values at respective pixels of a given image are determined. The gradient magnitude values are thresholded with a global threshold to produce thresholded gradient magnitude values. The pixels are segmented into respective groups in accordance with a watershed transform of the thresholded magnitude values. A classification record is generated. The classification record labels as background pixels ones of the pixels segmented into one of the groups determined to be largest in size and labels as non-background pixels ones of the pixels segmented into any of the groups except the largest group. | 10-30-2008 |
20100189345 | System And Method For Removing Artifacts From A Digitized Document - A system and method are disclosed for removing artifacts from a digitized document. The method discloses receiving a digitized document, having an image format, and including content and an artifact; identifying a content boundary within the digitized document; enhancing the digitized document after identifying the content boundary; and removing the artifact by cropping the digitized document to the content boundary after enhancing the digitized document. The system discloses a processor configured to operate a series of functional modules, including: a means for receiving a digitized document, having an image format, and including content and an artifact; a content boundary identification module, for identifying a content boundary within the digitized document; an image enhancement module, for enhancing the digitized document after identifying the content boundary; and a content cropping module, for removing the artifact by cropping the digitized document to the content boundary after enhancing the digitized document. | 07-29-2010 |
20100225937 | IMAGED PAGE WARP CORRECTION - A method of correcting warp on an imaged page includes generating projection profiles for pixels on the imaged page and determining a reference baseline based on the projection profiles; calculating a deviation away from the reference baseline for points along a boundary; and mapping the points along the boundary to the reference baseline. | 09-09-2010 |
20120102388 | TEXT SEGMENTATION OF A DOCUMENT - A system and method are provided for segmenting text from a portable document format (PDF) document. The system includes a memory for storing computer executable instructions and a processing unit for accessing the memory and executing the computer executable instructions. The computer executable instructions include an engine to group line segments into text blocks using a homogeneity measure based on relative line space difference between line segments and a homogeneity measure based on difference in font size between line segments, where the line segments comprise text elements extracted from the PDF document. | 04-26-2012 |
20120275694 | System and Method of Foreground-background Segmentation of Digitized Images - A system and method for segmenting foreground and background regions on a digitized image uses a computer, having a processor and system memory, to segment the image into initial regions and identify background regions from the initial regions. A complete background surface of the image is estimated, and pixels of the image are rectified with the estimated background surface to normalize the image. Normalized pixels are compared with a threshold color to determine a final segmentation of background regions. | 11-01-2012 |
20120303636 | System and Method for Web Content Extraction - A method and system for extracting Web content is disclosed. In one embodiment, Web content in a Webpage is extracted by identifying paragraphs in the Web content based on line-break node determination. A range of text-body associated with the identified paragraphs is then identified using a maximum scoring subsequence. Further, the identified text-body is refined using a heuristic rule of substantially horizontal alignment. Furthermore, one or more titles and one or more images associated with the Web content are extracted. Moreover, the Web content including the identified paragraphs, the one or more titles and the one or more images are outputted. | 11-29-2012 |
20130091150 | DETERMINING SIMILARITY BETWEEN ELEMENTS OF AN ELECTRONIC DOCUMENT - Disclosed is a computer-implemented method of determining similarity between first and second elements of an electronic document. The method uses a computer to calculate a plurality of measures of similarity between the first and second elements in at least two representations of the electronic document. A computer program product and system implementing this method are also disclosed. | 04-11-2013 |
20130114105 | Semantically Ranking Content in a Website - Semantically ranking content in a website ( | 05-09-2013 |
20130124684 | VISUAL SEPARATOR DETECTION IN WEB PAGES USING CODE ANALYSIS - A method for detection of visual separators in web pages using code analysis includes receiving a web page and its associated web code by a web page analysis device and analyzing the web code to detect visual separators in the web page. A web page analysis device for visual separator detection in web pages is also provided. | 05-16-2013 |
20130124953 | PRODUCING WEB PAGE CONTENT - A method for producing web page content includes identifying blocks within a web page. The blocks are selectively assembled into sections. The sections are selectively assembled into article candidates. An article candidate that includes article content is distinguished from article candidates that do not include article content. Content is produced only from the article candidate distinguished as including article content. | 05-16-2013 |
20130145255 | SYSTEMS AND METHODS FOR FILTERING WEB PAGE CONTENTS - A system and method for selectively filtering web page contents are disclosed. In one example embodiment a document object model (DOM) structure and visual information of the web page contents are generated. The document object model (DOM) structure and the visual information are analyzed to determine multiple web page content attributes. One or more filtering parameters are selected from the multiple web page content attributes. The web page is filtered based on the one or more filtering parameters. | 06-06-2013 |
20130159889 | Obtaining Rendering Co-ordinates Of Visible Text Elements - A computer-implemented method for obtaining the rendering co-ordinates of visible text elements on a web page is disclosed. The web page is represented by an input data structure comprising a plurality of text nodes, each of which represents a text element on the web page. The method comprises the following steps: | 06-20-2013 |
20130205202 | Transformation of a Document into Interactive Media Content - Systems and methods are provided for transforming a document into interactive media content. A system can include a memory for storing computer executable instructions and a processing unit for accessing the memory and executing the computer executable instructions. The computer executable instructions can include an engine to generate a dynamic composition of the text blocks and visual blocks of the document, based on semantic features of the text blocks and the visual blocks, to provide the interactive media content. | 08-08-2013 |
20130275854 | Segmenting a Web Page into Coherent Functional Blocks - Segmenting a web page ( | 10-17-2013 |
20130283148 | Extraction of Content from a Web Page - A system and method are provided for extracting main content from a web page. Web page segmentation is performed on a web page to provide affinity-grouped segments. Descriptive features of at least one of the affinity-grouped segments are computed. At least one of the affinity-grouped segments is classified as a main body segment based on the computed descriptive features. Additional affinity-grouped segments are classified as to a document function based on the computed descriptive features. Classified affinity-grouped segments are assembled according to their classified document functions to provide the main content. | 10-24-2013 |
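
The entry for application 20120303636 mentions locating the text-body range with a maximum scoring subsequence. The following is a minimal sketch of that general technique only, not the method claimed in the application: it assumes each paragraph has already been given a numeric score (positive for likely body text, negative for likely boilerplate) and finds the contiguous run of paragraphs with the highest total. The function name `max_scoring_subsequence` and the example scores are hypothetical.

```python
from typing import List, Optional, Tuple


def max_scoring_subsequence(scores: List[float]) -> Optional[Tuple[int, int, float]]:
    """Return (start, end, total) for the contiguous run with the highest sum.

    Indices are inclusive. Returns None for an empty input.
    The per-paragraph scores are an assumption of this sketch.
    """
    if not scores:
        return None

    best_sum = cur_sum = scores[0]
    best_start = best_end = cur_start = 0

    for i in range(1, len(scores)):
        if cur_sum < 0:
            # A negative running total can only hurt: restart the run here.
            cur_sum = scores[i]
            cur_start = i
        else:
            cur_sum += scores[i]

        if cur_sum > best_sum:
            best_sum = cur_sum
            best_start, best_end = cur_start, i

    return best_start, best_end, best_sum


# Hypothetical scores for seven paragraphs of a page.
paragraph_scores = [-2.0, 3.0, 4.0, -1.0, 5.0, -6.0, 1.0]
print(max_scoring_subsequence(paragraph_scores))  # (1, 4, 11.0)
```

The scan is linear in the number of paragraphs, so it stays cheap even on long pages. How the scores themselves are assigned is left open here, since the listing does not describe it.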