Patent application number | Description | Published |
20130188875 | Vector Graphics Classification Engine - A vector graphics classification engine and associated method for classifying vector graphics in a fixed format document is described herein and illustrated in the accompanying figures. The vector graphics classification engine defines a pipeline for categorizing vector graphics parsed from the fixed format document as font, text, paragraph, table, and page effects, such as shading, borders, underlines, and strikethroughs. Vector graphics that are not otherwise classified are designated as basic graphics. By sequencing the detection operations in a selected order, misclassification is minimized or eliminated. | 07-25-2013 |
20130191366 | Pattern Matching Engine - A pattern matching engine and associated method for detecting one or more of headers, footers, watermarks, page numbering, page colors, and page borders appearing in a fixed format document. The pattern matching engine performs pattern matching across pages of the fixed format document to identify repeating patterns. Using heuristic analysis, repeating patterns meeting selected criteria are classified as headers, footers, or watermarks. Filtering removes repeating patterns unlikely to represent headers, footers, or watermarks. The information produced by the pattern matching engine allows the repeating elements to be properly reconstructed as flowable elements when converting a fixed format document into a flow format document. | 07-25-2013 |
20130191715 | Borderless Table Detection Engine - A borderless table detection engine and associated method for identifying borderless tables appearing in data extracted from a fixed format document. Due to the lack of visible borders, reliable automated detection of a borderless table is difficult. The borderless table detection engine uses whitespace, rather than content, to detect borderless table candidates. Applying heuristic analysis, the borderless table detection engine discards borderless table candidates with a layout that lacks sufficient characteristics of a table and is unlikely to be a valid borderless table. | 07-25-2013 |
20130191732 | Fixed Format Document Conversion Engine - A fixed format document conversion engine and associated method for converting a fixed format document into a flow format document. The fixed format document conversion engine includes a sequence of layout analysis engines and semantic analysis engines that analyze the base physical layout information obtained from the fixed format document to enrich, modify, and classify the physical layout information into progressively more advanced physical layout information and, ultimately, semantic layout information. The semantic layout information is mapped and serialized into a selected flow format document with a high level of flowability. | 07-25-2013 |
20140013215 | Paragraph Alignment Detection and Region-Based Section Reconstruction - A paragraph alignment detection engine and a section reconstruction engine. The paragraph alignment detection engine determines the paragraph alignment of a paragraph and updates the paragraph alignment property of the paragraph in the data store for single line and multi-line paragraphs. The paragraph alignment detection engine employs per paragraph comparisons and relative comparisons to other paragraphs to determine the paragraph alignment of a single line paragraph. The paragraph alignment detection engine employs per paragraph comparisons and relative comparisons of the lines of a paragraph to determine the paragraph alignment of a multi-line paragraph. The section reconstruction engine minimizes the number of sections created in the flow format document by identifying the columns on each page, combining contiguous pages with the same column layout into a single section, and creating alternative objects to contain regions associated with special cases in lieu of creating additional sections. | 01-09-2014 |
20140208191 | Grouping Fixed Format Document Elements to Preserve Graphical Data Semantics After Reflow - Determining relationships between graphical elements in a fixed format document is provided. Graphical element sizes and their relative positions may be analyzed to determine whether two or more graphical elements should be aggregated together or whether the graphical elements should belong to different graphical groups. Graphs and figures comprising objects that are absolutely positioned may be detected, as well as objects where inter-element positions need to be preserved from regular document flow. Additionally, background objects may be differentiated from regular text flow when the objects overlap with text. | 07-24-2014 |
20140257789 | Detection and Reconstruction of East Asian Layout Features in a Fixed Format Document - Detection of East Asian layout features and reconstruction of East Asian layout features is provided. Vertically written text in the fixed format document is detected and rotated for layout analysis. After layout analysis, the rotated text is rotated back and restructured in a flow format document. When a plurality of characters is written horizontally in a vertical line of text, vertically overlapping text runs are detected, designated as horizontal-in-vertical text, and are restructured as horizontal-in-vertical text in a flow format document. Lines of text are analyzed for attributes of a ruby line and are designated as ruby text, associated with corresponding text in a ruby base line, and restructured as ruby text in a flow format document. Text in a fixed format document is analyzed for detection of a particular East Asian language so that a font for the language is designated in a flow format document. | 09-11-2014 |
20140258851 | Table of Contents Detection in a Fixed Format Document - Detection of table of contents entries in a fixed format document for reconstruction of table of contents entries in a flow format document is provided. One or more table of contents entries are detected in a fixed format document, and table of contents entry candidates are generated by grouping one or more lines containing suspected table of contents entries. Each grouping is compared to text contained in the fixed format document for locating matching headings, subheadings, and associated text in the fixed format document. After non-matching candidates and false positives are discarded, headings found in the fixed format document matching headings contained in table of contents entry candidates are used to reconstruct table of contents entries in a table of contents page, area, or section in a reconstructed flow format document. | 09-11-2014 |
20140258852 | Detection and Reconstruction of Right-to-Left Text Direction, Ligatures and Diacritics in a Fixed Format Document - Detection of right-to-left text direction, left-to-right text direction, ligatures and diacritics in fixed format documents for reconstruction of fixed format documents into flow format documents is provided. Each text run of a fixed format document is analyzed for directionality. If text runs contain ligatures, the ligatures are mapped to corresponding characters so that they read in the proper order in context with the other characters that make up the containing text run or that neighbor the ligatures. Each text run is collected based on determined text directionality for reconstruction in a flow format document. Proper text directionality for columns of text is determined in the same manner as proper text directionality for text runs in paragraphs of text. If diacritics are present in association with one or more characters or glyphs, a determination may be made as to a carrier character or glyph associated with each diacritic. | 09-11-2014 |
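
The Pattern Matching Engine entry (20130191366) describes matching patterns across pages, classifying the repeating ones heuristically, and filtering out unlikely candidates. The following Python sketch illustrates that general idea only; it is not the method claimed in the application. The function name `find_repeating_elements`, the `(text, y)` line representation, the 792-point page height, and both thresholds (`min_share`, `margin`) are assumptions made for illustration.

```python
import re
from collections import defaultdict


def find_repeating_elements(pages, page_height=792.0, min_share=0.6, margin=0.15):
    """Classify lines that repeat across pages as headers or footers.

    pages: list of pages; each page is a list of (text, y) tuples, with y
    measured in points from the top of the page (assumed representation).
    """
    # Record the set of pages on which each (normalized text, position) occurs.
    seen = defaultdict(set)
    for page_no, lines in enumerate(pages):
        for text, y in lines:
            # Replace digit runs so "Page 1" and "Page 2" count as one pattern.
            pattern = re.sub(r"\d+", "#", text.strip())
            seen[(pattern, round(y))].add(page_no)

    result = {"header": [], "footer": []}
    for (pattern, y), page_set in seen.items():
        # Filter: drop patterns that appear on too few pages.
        if len(page_set) / len(pages) < min_share:
            continue
        # Heuristic: only patterns in the top or bottom margin band qualify.
        if y <= page_height * margin:
            result["header"].append(pattern)
        elif y >= page_height * (1 - margin):
            result["footer"].append(pattern)
    return result


# Hypothetical three-page document used only to exercise the sketch.
pages = [
    [("Annual Report", 30), ("Introduction text", 400), ("Page 1", 770)],
    [("Annual Report", 30), ("More body text", 400), ("Page 2", 770)],
    [("Annual Report", 30), ("Closing remarks", 400), ("Page 3", 770)],
]
found = find_repeating_elements(pages)
print(found)
```

Normalizing digits before counting lets page numbering register as a single repeating pattern. A fuller treatment would also need to tolerate small position shifts and to handle watermarks, which this sketch does not attempt.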