Class / Patent application number | Description | Number of patent applications / Date published |
382177000 | Segmenting individual characters or words | 75 |
20080205759 | Distortion Correction of a Scanned Image - Disclosed are embodiments of systems and methods for eliminating or reducing the distortion in a scanned image. In embodiments, the image is segmented into foreground and background pixels. Foreground pixels may be grouped into “letters.” Using index-based searching, “letters” may be grouped into “words” and “words” may be grouped into baselines. One or more dominant baselines may be selected and the characteristics of the dominant baseline or baselines may be used to unwarp the image. | 08-28-2008 |
20080267503 | Increasing Retrieval Performance of Images by Providing Relevance Feedback on Word Images Contained in the Images - An interactive system increases retrieval performance for images depicting text by allowing users to provide relevance feedback on words contained in the images. The system includes a user interface through which the user queries the system with query terms for images contained in the system. Word image suggestions are displayed to the user through the user interface, where each word image suggestion contains text, as recognized from the word image by the system, that is the same as or a slight variant of the particular query terms. The user can include word image suggestions in the system to increase recall of images for the one or more query terms, and can exclude them to increase the precision of image retrieval results for particular query terms. | 10-30-2008 |
20080292186 | WORD RECOGNITION METHOD AND WORD RECOGNITION PROGRAM - A word recognition method is provided that performs recognition processing on each word candidate obtained by reading characters in character information written on a reading material. The method includes a matching processing step of collating each word candidate with a plurality of words in a word dictionary and calculating, for every word, a matching score indicating the degree to which the word candidate matches that word; a character quality score calculating step of calculating a character quality score indicating the degree to which a character candidate constituting each word candidate matches an arbitrary character; and a correcting step of correcting the matching score obtained at the matching processing step based on the character quality score acquired at the character quality score calculating step. | 11-27-2008 |
20080304746 | METHOD AND APPARATUS FOR CHARACTER STRING RECOGNITION - A method and apparatus for character string recognition that improve character recognition accuracy while maintaining high-speed recognition performance. | 12-11-2008 |
20090060335 | SYSTEM AND METHOD FOR CHARACTERIZING HANDWRITTEN OR TYPED WORDS IN A DOCUMENT - A method of characterizing a word image includes traversing the word image stepwise with a window to provide a plurality of window images. For each of the plurality of window images, the method includes splitting the window image to provide a plurality of cells. A feature, such as a gradient direction histogram, is extracted from each of the plurality of cells. The word image can then be characterized based on the features extracted from the plurality of window images. | 03-05-2009 |
20090060336 | Document image processing apparatus, document image processing method and computer readable medium - A document image processing apparatus includes a specifying section, an extracting section, a recognizing section, an interpreting section, an arranging section and a generating section. The specifying section specifies a sentence region including a character row from a document image. The extracting section extracts at least one character row image included in the specified sentence region. The recognizing section recognizes the respective characters included in the extracted character row image. The interpreting section interprets an original sentence character row comprising the recognized characters and generates an interpreted sentence character row. The arranging section arranges the respective character row images in the sentence region by contracting them, and arranges the generated interpreted sentence character rows in the vacant region of the sentence region, outside the region where the character row images are arranged. | 03-05-2009 |
20090080775 | IMAGE-PROCESSING APPARATUS WHICH HAS AN IMAGE REGION DISTINCTION PROCESSING CAPABILITY, AND AN IMAGE REGION DISTINCTION PROCESSING METHOD - In an image-processing apparatus capable of region distinction processing, and in the corresponding image region discrimination processing method, a first region distinction unit uses a preset threshold value for image region distinction to distinguish character from non-character regions in image data read from an original document, yielding an edge feature amount image and a character determination signal. A second region distinction unit performs region distinction on the edge feature amount image based on the threshold value, and generates and displays sub-region images obtained by dividing the edge feature amount image into plural parts. The character discrimination strength is adjusted on a display screen while each sub-region image is visually checked, the correction parameter is reflected in the edge feature amount image, and the region distinction processing is performed again. | 03-26-2009 |
20090103808 | CORRECTION OF DISTORTION IN CAPTURED IMAGES - An image processing method comprises analysing an image of a portion of text, and detecting the inter-line spacing and the inter-word spacing across the area of the image. Based on the inter-line and inter-word spacings, a quadrilateral shape is derived which represents the deformation of the text image from an undistorted image. The image is modified to perform perspective correction based on the derived quadrilateral. | 04-23-2009 |
20090129676 | Segmenting a String Using Similarity Values - Disclosed are systems and methods for segmenting a string comprised of one or more string segments using similarity values. In embodiments, each string segment may contain at least a variation of a marker string that may be used to separate string segments in the string. In embodiments, a similarity value representing the result of comparing the marker string to substrings of the string may be computed, and a similarity vector representing the set of comparisons for the locations on the string may be generated. In embodiments, the similarity vector may be used to identify candidate segmentation locations in the string. In embodiments, a set of segmentation locations in the string may be derived from the candidate segmentation locations in the string, and the string may be segmented according to the set of segmentation locations. | 05-21-2009 |
20090161955 | METADATA DETERMINATION METHOD AND IMAGE FORMING APPARATUS - A method for extracting a character string from print data rasterizes the print data into a raster image. Then, the method divides the raster image into a character region and non-character region and determines character data used for metadata based on the raster image of the character region and character data extracted from the print data and drawn at approximately the same position as the character region. | 06-25-2009 |
20090169106 | Method and Computer Program Product for Recognition Error Correction Data - A method for altering a recognition error correction data structure includes: altering at least one key out of a set of semantically similar keys, in response to the text appearance probabilities of the keys in that set, to provide at least one altered key; and replacing the at least one key with the at least one altered key. | 07-02-2009 |
20090202151 | FORMAT PROCESSING APPARATUS FOR DOCUMENT IMAGE AND FORMAT PROCESSING METHOD FOR THE SAME - An image processing apparatus of an embodiment of the invention includes a character region characteristic determination unit to identify a character region of an image and to output a character region characteristic determination signal; a character region image separation unit to separate the image, based on the character region characteristic determination signal, into at least two attribute regions, namely plural character region images and an other-region image; and a separated image processing unit to process each of the plural character region images and the other-region image. In at least the separated image processing unit, according to a characteristic of each of the plural character region images, at least one of the compression method, compression ratio, resolution, and multi-value number used for at least one of the character region images differs from that used for the other-region image or another character region image. | 08-13-2009 |
20090290797 | IMAGE PROCESSING FOR STORING OBJECTS SEPARATED FROM AN IMAGE IN A STORAGE DEVICE - An image processing apparatus has a separation unit for separating objects constituting an image input by an image input unit, a setting unit for setting a criterion to determine whether or not a separated object is stored, and a determination unit for determining whether the separated object is stored based on the criterion set by the setting unit. The image processing apparatus also has a unit for displaying the separated object, responding to a user access via an interface unit, when the separated object is determined to be stored by the determination unit and storing the separated object such that the separated object can be reused. | 11-26-2009 |
20090324081 | METHOD AND APPARATUS FOR RECOGNIZING CHARACTER IN CHARACTER RECOGNIZING APPARATUS - Disclosed are a method and an apparatus for recognizing characters and efficiently removing misrecognized characters. The method includes detecting character regions including at least one character in an input image, converting the input image into a binary image, discriminating characters from non-characters, reclassifying any character region containing a number of characters equal to or less than a threshold as a non-character region, and outputting only the characters present in the character regions. | 12-31-2009 |
20100008581 | WORD DETECTION METHOD AND SYSTEM - A method of characterizing a word image includes traversing the word image in steps with a window and at each of a plurality of the steps, identifying a window image. For each of the plurality of window images, a feature is extracted. The word image is characterized, based on the features extracted from the plurality of window images, wherein the features are considered as a loose collection with associated sequential information. | 01-14-2010 |
20100008582 | METHOD FOR RECOGNIZING AND TRANSLATING CHARACTERS IN CAMERA-BASED IMAGE - A method for recognizing an image photographed by a camera and translating characters in connection with an electronic dictionary is provided. The method includes directly selecting an area to be recognized from the photographed character image and performing character recognition, translating and recognizing characters of a word selected by the user in connection with dictionary data, and displaying translation result information for the user's selected character or word, in connection with dictionary data, on a screen device. The recognition includes providing the user with information on the location of the selected character image area and the locations of the recognized character string words, and then translating a character string or word in a location area selected by the user. The electronic dictionary-connected search and translation searches the electronic dictionary database for the selected character or word and provides the translation result to the user. | 01-14-2010 |
20100034461 | METHOD AND APPARATUS FOR GENERATING MEDIA SIGNAL - A method of generating a media signal is provided. The method detects a pattern indicating a request for a media signal to be generated from an input image, extracts a region identified by the detected pattern and generates the media signal for the extracted region. | 02-11-2010 |
20100040287 | Segmenting Printed Media Pages Into Articles - Methods and systems for segmenting printed media pages into individual articles quickly and efficiently. A printed media based image that may include a variety of columns, headlines, images, and text is input into the system, which comprises a block segmenter and an article segmenter system. The block segmenter identifies and produces blocks of textual content from a printed media image, while the article segmenter system determines which blocks of textual content belong to one or more articles in the printed media image based on a classifier algorithm. A method for segmenting printed media pages into individual articles is also presented. | 02-18-2010 |
20100054599 | DOCUMENT PROCESSING APPARATUS, DOCUMENT PROCESSING METHOD, AND COMPUTER READABLE MEDIUM - A document processing apparatus includes: a character segmentation unit that segments a plurality of character images from a document image; a character image classifying unit that classifies the character images into categories corresponding to each of the character images; an average character image obtaining unit that obtains an average character image for each category of the character images classified by the character image classifying unit; a character recognizing unit that performs character recognition on the character contained in each of the average character images; and an output unit that outputs character discriminating information as the character recognition result obtained by the character recognizing unit. | 03-04-2010 |
20100104188 | Systems And Methods For Defining And Processing Text Segmentation Rules - Computer-implemented methods and systems are provided for segmenting textual data. One or more rules are accessed that define, through pattern matching, how an input stream is to be segmented into textual data elements. The rules are applied to the input stream to determine the textual data elements in it, which are then provided as output. | 04-29-2010 |
20100177964 | TRIGGERING ACTIONS IN RESPONSE TO OPTICALLY OR ACOUSTICALLY CAPTURING KEYWORDS FROM A RENDERED DOCUMENT - A system for processing text captured from rendered documents is described. The system receives a sequence of one or more words optically or acoustically captured from a rendered document by a user. The system identifies among words of the sequence a word with which an action has been associated. The system then performs the associated action with respect to the user. | 07-15-2010 |
20100189352 | Classifying an Input Character - A method for classifying an input character is disclosed. Character models are used. Each character model is associated with an output character and defines a model specific segmentation scheme for that output character and an associated segment model. The model specific segmentation scheme defines a minimum length corresponding to a number of points in a stroke of the output character and a minimum length threshold. Using each of the character models, the input character is decomposed into segments and the segments are evaluated against the segment model of the respective character model to produce a score indicative of the conformity of the segments with the segment model. The character model that produced the highest score is selected and the input character is classified as the output character associated with the character model that produces the highest score. | 07-29-2010 |
20100208996 | DEVICES AND METHODS FOR RESTORING LOW-RESOLUTION TEXT IMAGES - A system that extracts text from an image includes a capture device that captures the image having a low resolution. An image segmentation subsystem partitions the image into image segments. An image restoration subsystem generates a resolution-expanded image from the image segments and negates degradation effects of the low-resolution image by transforming the image segments from a first domain to a second domain and deconvolving the transformed image segments in the second domain to determine parameters of the low-resolution image. A text recognition subsystem transforms the restored image data into computer readable text data based on the determined parameters. | 08-19-2010 |
20100215270 | System and Methods for Automatically Accessing a Web Site on Behalf of a Client - A system for performing an automated network-based login procedure on an interactive keypad image includes a software agent executable from a digital medium connected to the network for navigating to a login page, accessing the keypad image, and performing an automated login, and an automated login support application executable from the same or a different digital medium connected to the network, the support application including at least an image processor, an optical character recognizer, and an image data encoder and decoder. The software agent performs a login at the virtual keypad image based on character image matching and location information acquisition for each character of a client's specific set of credential characters included in the image of the keypad. | 08-26-2010 |
20100278427 | METHOD AND SYSTEM FOR PROCESSING TEXT - The present invention provides a method and system for text processing. The method comprises determining at least some of the characters in a text; dividing the text into a plurality of text segments using those characters as separators; and decoding each of the text segments. | 11-04-2010 |
20100278428 | APPARATUS, METHOD AND PROGRAM FOR TEXT SEGMENTATION - There is provided an apparatus including a model based topic segmentation section that segments a text using a topic model representing semantic coherence; a parameter estimation section that estimates a control parameter used in segmenting the text based on detection of a change point of word distribution in the text, using the result of segmentation by the model based topic segmentation section as training data; and a change point detection topic segmentation section that segments the text, based on detection of the change point of word distribution in the text, using the parameter estimated by the parameter estimation section. | 11-04-2010 |
20110116715 | Computer-Implemented System And Method For Recognizing Patterns In A Digital Image Through Document Image Decomposition - A computer-implemented system and method for retrieving a digital image through document image decomposition is provided. A stored digital image is retrieved. Generic visual features are extracted. The features are grouped into a primitive layer including word-graphs that each include words and features. The words are grouped into a layout layer including zone hypotheses that each include one or more of the words. Causal dependencies between the word-graphs and the zone hypotheses are expressed through zone models that include a joint probability defining a pair of probabilistic models generated through a learned binary edge classifier. Each pair of probabilistic models is expressed as an optimal set selection problem including a set of cost functions and constraints. The optimal set selection problem is evaluated through a heuristic search of the cost functions and constraints and a non-overlapping optimal set of the zone hypotheses is provided that characterize the stored digital image. | 05-19-2011 |
20110150335 | Triggering Actions in Response to Optically or Acoustically Capturing Keywords from a Rendered Document - A system for processing text captured from rendered documents is described. The system receives a sequence of one or more words optically or acoustically captured from a rendered document by a user. The system identifies among words of the sequence a word with which an action has been associated. The system then performs the associated action with respect to the user. | 06-23-2011 |
20110170777 | TIME-SERIES ANALYSIS OF KEYWORDS - Processing for a time-series analysis of keywords comprises clustering or classifying pieces of document data, each of which is description of a phenomenon in a natural language, on the basis of frequencies of occurrence of keywords in the pieces of document data, individual keywords being also clustered or classified by clustering or classifying the pieces of document data, and performing a time-series analysis of frequencies of occurrence of pieces of document data containing individual keywords in clusters or classes into which the pieces of document data are clustered or classified or a time-series analysis of frequencies of occurrence of pieces of document data containing clusters or classes into which the individual keywords are clustered or classified. Frequency distribution showing variation of the frequencies of occurrence of the pieces of document data is acquired by the time-series analysis. | 07-14-2011 |
20110182513 | WORD-BASED DOCUMENT IMAGE COMPRESSION - Locations of word images corresponding to words in a document image are ascertained. The word images are grouped into clusters. For each of multiple clusters, a respective compressed word image cluster is determined based on a joint compression of the word images grouped into that cluster. The positions of the word images in the document image are associated with the compressed word image clusters corresponding to the clusters that contain those word images. | 07-28-2011 |
20110243445 | DETECTING POSITION OF WORD BREAKS IN A TEXTUAL LINE IMAGE - Line segmentation in an OCR process is performed to detect the positions of words within an input textual line image by extracting features from the input to locate breaks and then classifying the breaks into one of two break classes which include inter-word breaks and inter-character breaks. An output including the bounding boxes of the detected words and a probability that a given break belongs to the identified class can then be provided to downstream OCR or other components for post-processing. Advantageously, by reducing line segmentation to the extraction of features, including the position of each break and the number of break features, and break classification, the task of line segmentation is made less complex but with no loss of generality. | 10-06-2011 |
20110249897 | CHARACTER RECOGNITION - Systems and methods for character recognition by performing lateral view-based analysis on the character data and generating a feature vector based on the lateral view-based analysis. | 10-13-2011 |
20110268360 | WORD RECOGNITION OF TEXT UNDERGOING AN OCR PROCESS - A method for identifying words in a textual image undergoing optical character recognition includes receiving a bitmap of an input image which includes textual lines that have been segmented by a plurality of chop lines. Each chop line is associated with a confidence level reflecting the degree to which it properly segments the textual line into individual characters. At least a first word is identified in one of the textual lines based at least in part on that textual line and a first subset of the chop lines whose confidence level is above a first threshold value. If the first word is not associated with a sufficiently high word confidence level, at least a second word in the textual line is identified based at least in part on a second subset of the chop lines whose confidence level is above a second threshold value lower than the first threshold value. | 11-03-2011 |
20110274354 | SEGMENTATION OF A WORD BITMAP INTO INDIVIDUAL CHARACTERS OR GLYPHS DURING AN OCR PROCESS - An image processing apparatus is provided that includes a character chopper component that segments words into individual characters in a bitmap of a textual image undergoing an OCR process. The character chopper component is configured to produce a set of (possibly curved) chop lines which divide a bitmap of any given word into its individual character or glyph candidates. Cases where an input bitmap contains two separate words are handled by marking the place where those words should be split. The character segmentation algorithm computes the set of vertically oriented, curved chop lines by considering glyph and background colors in a given word bitmap. The set is then filtered using various heuristics in order to preserve the lines that do separate a word's glyphs and to minimize the number of those that do not. | 11-10-2011 |
20110280481 | USER CORRECTION OF ERRORS ARISING IN A TEXTUAL DOCUMENT UNDERGOING OPTICAL CHARACTER RECOGNITION (OCR) PROCESS - An electronic model of an image document is created as the document undergoes an OCR process. The electronic model includes elements (e.g., words, text lines, paragraphs, images) of the image document that have been determined by each of a plurality of sequentially executed stages in the OCR process. The electronic model serves as input information which is supplied to each stage by the previous stage that processed the image document. A graphical user interface is presented so that the user can provide input data correcting a mischaracterized item appearing in the document. Based on the user input data, the processing stage that produced the initial error giving rise to the mischaracterized item corrects that error. Stages of the OCR process subsequent to this stage then correct any consequential errors arising in their respective stages as a result of the initial error. | 11-17-2011 |
20120020561 | METHOD AND SYSTEM FOR OPTICAL CHARACTER RECOGNITION USING IMAGE CLUSTERING - The present disclosure provides a computer-implemented method of translating an image-based electronic document into a text-based electronic document. The method includes electronically scanning an image-based document to determine positions of word images in the image-based document. The method also includes extracting the word images from the image-based document and storing the word images to an electronic storage device. The method also includes grouping a subset of the word images into a word cluster based on a similarity of the word images, wherein the word images in the word cluster correspond to a same actual word. The method also includes generating a character-encoded transcription for the word cluster based on the word images in the word cluster. The method also includes adding the character-encoded transcription to a text-based electronic document at locations corresponding to the positions of the word images in the image-based document. | 01-26-2012 |
20120099791 | Straightening Out Distorted Text Lines of Images - A method for correcting distortions in a scanned image of a page, paragraph, sentence or other portion of text is disclosed. The method comprises identifying at least one set of collinear elements in the scanned image, and generating a corrected image based on the scanned image, which includes, for at least some of the collinear elements in each set, applying a spatial location correction that positions all collinear elements in the set on a common horizontal rectilinear baseline in the corrected image. | 04-26-2012 |
20120128249 | SCRIPT-AGNOSTIC TEXT REFLOW FOR DOCUMENT IMAGES - Script-agnostic text reflow technique embodiments are presented that generally reflow text found in an image of a document in a manner that functions across multiple scripts, multiple fonts of a script and multiple languages using the same script. This generally involves segmenting regions of text in a document image into individual words and doing this without relying on any script-specific characteristics or requiring any form of character recognition. While segmenting text, the possible presence of accents, diacritics and punctuation marks is considered. | 05-24-2012 |
20120281919 | METHOD AND SYSTEM FOR TEXT SEGMENTATION - A method and system for segmenting a text into a plurality of sections is provided. The text may be received in the form of an image. The method involves receiving one or more input labels from a user corresponding to one or more segmentation points of a plurality of segmentation points of the text. The plurality of segmentation points of the text are obtained by applying one or more segmentation heuristics over the text. The one or more input labels provided by the user are utilized to label the plurality of segmentation points of the text. In response to labeling, validation is performed to identify whether a segmentation point of the plurality of segmentation points is a valid segmentation point. Thereafter, based on the validation, a set of valid segmentation points is updated with one or more segmentation points of the plurality of segmentation points. The set of valid segmentation points facilitates segmentation of the text for recognizing the plurality of sections. | 11-08-2012 |
20120308135 | METHODS FOR DIGITAL MAPPING AND ASSOCIATED APPARATUS - A method comprises extracting a local identifier ( | 12-06-2012 |
20120321189 | SYSTEMS AND METHODS FOR AUTOMATED EXTRACTION OF MEASUREMENT INFORMATION IN MEDICAL VIDEOS - Systems and methods providing automated extraction of information contained in video data and uses thereof are described. In particular, systems and associated methods are described that provide techniques for extracting data embedded in video, for example measurement-value pairs of medical videos, for use in a variety of applications, for example video indexing, searching and decision support applications. | 12-20-2012 |
20130094760 | AUTOMATIC IDENTIFICATION OF DIGITAL CONTENT RELATED TO A BLOCK OF TEXT, SUCH AS A BLOG ENTRY - A system for identifying digital content related to a portion of a block of text receives, automatically or via input by a user, an indication of one or more words included in the block of text. The system searches a database of digital content based on the one or more words and retrieves from the database one or more digital content items or identifiers of digital content items that are related to the one or more words. The system provides the retrieved digital content items or identifiers to the user, and receives a selection of one or more of the provided items or identifiers from the user. The system associates for display or replay the one or more selected digital content items with the one or more words in the block of text. Other embodiments of the system are also disclosed. | 04-18-2013 |
20130108159 | METHOD AND APPARATUS FOR AUTOMATICALLY IDENTIFYING CHARACTER SEGMENTS FOR CHARACTER RECOGNITION | 05-02-2013 |
20130108160 | CHARACTER RECOGNITION DEVICE, CHARACTER RECOGNITION METHOD, CHARACTER RECOGNITION SYSTEM, AND CHARACTER RECOGNITION PROGRAM | 05-02-2013 |
20130136359 | SEGMENTATION OF TEXTUAL LINES IN AN IMAGE THAT INCLUDE WESTERN CHARACTERS AND HIEROGLYPHIC CHARACTERS - An image processing apparatus segments Western and hieroglyphic portions of textual lines. The apparatus includes an input component that receives an input image having at least one textual line. The apparatus also includes an inter-character break identifier component that identifies candidate inter-character breaks along a textual line and an inter-character break classifier component. The inter-character break classifier component classifies each of the candidate inter-character breaks as an actual break, a non-break or an indeterminate break based at least in part on the geometrical properties of each respective candidate inter-character break and the bounding boxes adjacent thereto. A character recognition component recognizes the candidate characters based at least in part on a feature set extracted from each respective candidate character that can be histogram features, Gabor features or any other feature set applicable to character recognition. A Western and hieroglyphic text classifier component finds and classifies textual line segments as Western text segments or hieroglyphic text segments and further passes the recognition results to an output component. | 05-30-2013 |
20130170751 | METHODS AND DEVICES FOR PROCESSING SCANNED BOOK'S DATA - A method for processing data of a scanned book having a plurality of pages is disclosed. The method includes obtaining page image data from a page. The method further includes segmenting and recognizing the page image data to obtain locations of rectangular boxes corresponding to the respective characters and text codes for the respective characters. The method also includes obtaining respective aggregated character line information for each line of characters. The method further includes adjusting the rectangular boxes in accordance with the obtained aggregated character line information. | 07-04-2013 |
20130251263 | VARIABLE GLYPH SYSTEM AND METHOD - Using methods, computer-readable storage media, and apparatuses for computer-implemented processing, a passage of text may be variably rendered. For each glyph in the passage of text, a glyph representation is varied according to a geometric transformation that was determined from statistical measurements of at least one geometric property from an ensemble of representations of the current glyph. Each varied glyph representation is included in renderable output data, such that when the passage of text is rendered to an output device, a given rendered representation of a given glyph subtly differs from other rendered representations of the given glyph. | 09-26-2013 |
20130259378 | METHODS AND SYSTEMS FOR ASSESSING THE QUALITY OF AUTOMATICALLY GENERATED TEXT - A set of ordered characters is received in association with information specifying the locations of the characters within the image of the document. Language-conditional character probabilities for each character are determined based on a set of language models and the ordering of the characters. Neighbor characters associated with a target character are identified based on the locations of the characters. Language-conditional character probabilities associated with the neighbor characters and language-conditional character probabilities associated with the target character are combined to generate a local language-conditional likelihood associated with the target character, the local language-conditional likelihood representing a concordance of the target character to a language model. | 10-03-2013 |
20140023273 | TRELLIS BASED WORD DECODER WITH REVERSE PASS - Systems, apparatuses, and methods to relate images of words to a list of words are provided. A trellis based word decoder analyses a set of OCR characters and probabilities using a forward pass across a forward trellis and a reverse pass across a reverse trellis. Multiple paths may result; however, the most likely path from the trellises has the highest probability with valid links. A link is valid if some dictionary word traverses it. The most likely path is compared with a list of words to find the word closest to the most likely path. | 01-23-2014 |
20140105496 | System and Method for Selecting Segmentation Parameters for Optical Character Recognition - A computer-implemented method for selecting at least one segmentation parameter for optical character recognition is provided. The method can include receiving an image having a character string that includes one or more characters. The method can also include receiving a character string identifying each of the one or more characters. The method can also include automatically generating at least one segmentation parameter. The method can also include performing segmentation on the image having the character string using the at least one segmentation parameter. The method can also include determining if a resultant segmentation satisfies one or more criteria and, if the resultant segmentation satisfies the one or more criteria, selecting the at least one segmentation parameter. | 04-17-2014 |
20140105497 | System and Method for Selecting and Displaying Segmentation Parameters for Optical Character Recognition - A computer-implemented method for selecting at least one segmentation parameter for optical character recognition is provided. The method can include receiving an image having a character string that includes one or more characters. The method can also include receiving a character string identifying each of the one or more characters. The method can also include automatically generating at least one segmentation parameter. The method can also include performing segmentation on the image having the character string using the at least one segmentation parameter. The method can also include determining if a resultant segmentation satisfies one or more criteria and, if the resultant segmentation satisfies the one or more criteria, selecting the at least one segmentation parameter. | 04-17-2014 |
20140219562 | SYSTEM AND METHODS FOR ARABIC TEXT RECOGNITION BASED ON EFFECTIVE ARABIC TEXT FEATURE EXTRACTION - A method for automatically recognizing Arabic text includes building an Arabic corpus comprising Arabic text files written in different writing styles and ground truths corresponding to each of the Arabic text files, storing writing-style indices in association with the Arabic text files, digitizing an Arabic word to form an array of pixels, dividing the Arabic word into line images, forming a text feature vector from the line images, training a Hidden Markov Model using the Arabic text files and ground truths in the Arabic corpus in accordance with the writing-style indices, and feeding the text feature vector into a Hidden Markov Model to recognize the Arabic words. | 08-07-2014 |
20140270526 | METHOD FOR SEGMENTING TEXT WORDS IN DOCUMENT IMAGES - A word segmentation method for processing a document image applies clustering analysis to the spacing segments of a line. The spacing segments are generated by thresholding a one-dimensional vertical projection profile of the line. Taking advantage of the bimodal distribution of spacing lengths in text lines, a k-means clustering algorithm, with the number of clusters preset to two, is used to classify the spacing segments as either character spacing or word spacing. Moreover, k-means++ initialization is used to improve the performance of the cluster analysis. Clustering results such as the cluster centers and compactness are used to prune single-word text lines, single table items, etc. The locations of the word-spacing segments are then used to segment the line of text into words. | 09-18-2014 |
20140294302 | TRIGGERING ACTIONS IN RESPONSE TO OPTICALLY OR ACOUSTICALLY CAPTURING KEYWORDS FROM A RENDERED DOCUMENT - A system for processing text captured from rendered documents is described. The system receives a sequence of one or more words optically or acoustically captured from a rendered document by a user. The system identifies among words of the sequence a word with which an action has been associated. The system then performs the associated action with respect to the user. | 10-02-2014 |
20150071542 | AUTOMATED REDACTION - In embodiments, one or more computer-readable media may have instructions stored thereon which, when executed by a processor of a computing device, provide the computing device with a redaction module. The redaction module may be configured to receive a request to redact a selection of text from a document and identify instances of the text occurring within the document through an analysis of word coordinate information of an image of the document. The redaction module may further be configured to generate redaction information, including redaction coordinates; the redaction coordinates may be based on the word coordinate information associated with respective instances of the text occurring within the document. The redactions, when applied to the image in accordance with the redaction coordinates, may redact the respective instances of the text. Other embodiments may be described and/or claimed. | 03-12-2015 |
20150146982 | METHODS AND APPARATUS RELATING TO TEXT ITEMS IN IMAGES - A method and an electronic device are provided for obtaining an image or a video frame, including applying at least one image processing technique to the image or the video frame, scanning the image or the video frame to identify a text item, determining an item type for the identified text item, and determining an action corresponding to the item type. | 05-28-2015 |
20150324355 | TRIGGERING ACTIONS IN RESPONSE TO OPTICALLY OR ACOUSTICALLY CAPTURING KEYWORDS FROM A RENDERED DOCUMENT - A system for processing text captured from rendered documents is described. The system receives a sequence of one or more words optically or acoustically captured from a rendered document by a user. The system identifies among words of the sequence a word with which an action has been associated. The system then performs the associated action with respect to the user. | 11-12-2015 |
20150356365 | OPTICAL CHARACTER RECOGNITION METHOD - The optical character recognition method applies a first OCR engine to provide an identification of characters of at least a first type of characters and zones of at least a second type of characters in the character string image. A second OCR engine is applied to the zones of the at least second type of characters to provide an identification of characters of a second type of characters. The characters identified by the first OCR engine and by the second OCR engine are then combined in a further step to obtain the identification of the characters of the character string image. | 12-10-2015 |
20160063340 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD AND STORAGE MEDIUM - According to one embodiment, an information processing apparatus includes an image acquisition module, an elevation-angle acquisition module, a character deformation specification module, a character detection dictionary storage, a character detection dictionary selector and a character detector. The elevation-angle acquisition module is configured to acquire an elevation angle of a photographic device assumed when the photographic device has obtained an acquired image. The character deformation specification module is configured to specify how an appearance of the character in the acquired image is deformed, based on the acquired elevation angle. | 03-03-2016 |
20160092745 | IMAGE PROCESSING APPARATUS AND IMAGE PROCESSING METHOD - When there is a possibility that a third character region is redundantly selected in both a case where the line extraction process is performed starting from a first character region and a case where the line extraction process is performed starting from a second character region located in a line different from a line containing the first character region, the line recognition unit determines which line to incorporate the third character region in, by comparing a case of incorporating the third character region into the line starting with the first character region, with a case of incorporating the third character region into the line starting with the second character region. | 03-31-2016 |
20160180164 | METHOD FOR CONVERTING PAPER FILE INTO ELECTRONIC FILE | 06-23-2016 |
20160188990 | METHOD AND SYSTEM FOR RECOGNIZING CHARACTERS - The present disclosure relates to a method and a system for recognizing characters. In one embodiment, an input image comprising one or more characters to be recognized is received and processed to extract one or more nodes and edges of each character in the input image. Using the extracted nodes and edges, a graphical representation and adjacency matrix of each character are generated and compared with a predetermined graphical representation and adjacency matrix to determine a match. Based on the comparison, a matching probability is determined, on the basis of which one or more characters in the input image are recognized and displayed as output. The proposed recognition method and system recognize characters with greater accuracy and speed. Further, the present disclosure is simple and cost-effective, and reduces the complexity involved in automatic recognition of characters. | 06-30-2016 |
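Entry 20140270526 above describes word segmentation as two-cluster k-means over the gap lengths found in a line's vertical projection profile. A minimal Python sketch of that idea follows, assuming the line image has already been reduced to a per-column ink count (`profile`) and that any column with zero ink counts as spacing. The function names and the `min_center_ratio` pruning rule are hypothetical stand-ins (the abstract does not specify its compactness test); this is not the applicant's implementation.

```python
import random
from typing import List, Tuple

Gap = Tuple[int, int]  # (start column, length)


def spacing_segments(profile: List[int]) -> List[Gap]:
    """Runs of ink-free columns between the first and last inked columns."""
    gaps: List[Gap] = []
    start = None
    for x, count in enumerate(profile):
        if count == 0:
            if start is None:
                start = x
        elif start is not None:
            if start > 0:  # a run beginning at column 0 is the left margin
                gaps.append((start, x - start))
            start = None
    return gaps  # a run still open at the end is the right margin


def kmeans2_1d(values: List[float], seed: int = 0,
               iters: int = 50) -> Tuple[List[int], List[float]]:
    """Two-cluster k-means on scalars, seeded with k-means++."""
    rng = random.Random(seed)
    first = rng.choice(values)
    d2 = [(v - first) ** 2 for v in values]
    if sum(d2) == 0:  # all gaps equal: only one cluster exists
        return [0] * len(values), [first, first]
    # k-means++: choose the second center with probability proportional to D^2
    centers = [first, rng.choices(values, weights=d2)[0]]

    def assign(cs: List[float]) -> List[int]:
        return [0 if abs(v - cs[0]) <= abs(v - cs[1]) else 1 for v in values]

    for _ in range(iters):  # Lloyd iterations until the centers stop moving
        labels = assign(centers)
        new = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new.append(sum(members) / len(members) if members else centers[k])
        if new == centers:
            break
        centers = new
    return assign(centers), centers


def segment_words(profile: List[int],
                  min_center_ratio: float = 2.0) -> List[Tuple[int, int]]:
    """Split a text line into (start, end) column ranges, one per word."""
    inked = [x for x, c in enumerate(profile) if c > 0]
    if not inked:
        return []
    left, right = inked[0], inked[-1] + 1
    gaps = spacing_segments(profile)
    if not gaps:
        return [(left, right)]
    labels, centers = kmeans2_1d([float(n) for _, n in gaps])
    word = 0 if centers[0] > centers[1] else 1  # larger center = word spacing
    # Hypothetical pruning rule: poorly separated clusters mean one word
    if centers[word] < min_center_ratio * centers[1 - word]:
        return [(left, right)]
    words, start = [], left
    for (s, n), lab in zip(gaps, labels):
        if lab == word:
            words.append((start, s))
            start = s + n
    words.append((start, right))
    return words
```

For example, `segment_words([0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0])` returns `[(1, 6), (10, 15)]`: the one-column gaps fall into the character-spacing cluster and the four-column gap into the word-spacing cluster. Clustering in one dimension keeps the sketch small; a real system would also need the profile thresholding and table/single-word pruning the abstract mentions.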
382178000 | Separating touching or overlapping characters | 4 |
20090067720 | COLOR CODED LETTER GUIDE - A writing sheet assembly including a sheet-like body portion and a guide portion on the body portion. The guide portion includes at least three generally parallel spaced apart lines defining first and second portions therebetween such that a user can write on the portions while using the spaced lines to guide the writing thereof. The first portion is of a first color and the second portion is of a second color different from the first color, and the first portion is positioned on top of the second portion. The first color is blue or green and the second color is green or brown. | 03-12-2009 |
20090074291 | IMAGE PROCESSING APPARATUS AND IMAGE PROCESSING METHOD - An image processing apparatus includes an acquisition unit configured to acquire a document image, a primary region segmentation unit configured to segment the acquired document image into a plurality of regions, a detection unit configured to detect a text region including an erroneous sentence from the regions segmented by the primary region segmentation unit, a secondary region segmentation unit configured to detect a second attribute region partly overlapped with an original sentence of the erroneous sentence and separate the detected region into the second attribute region and a part of the original sentence, and a combining unit configured to combine the part of the original sentence separated by the secondary region segmentation unit with the text region including the erroneous sentence. | 03-19-2009 |
20150086113 | System and Method for Detection and Segmentation of Touching Characters for OCR - The present disclosure relates to a system and a method for detecting touching characters in media, characterized by segmentation of adjoining character spaces. In the first step, an aspect ratio is calculated for each connected component. A candidate touching position of each character is determined by calculating a threshold aspect ratio for each character. Further, a candidate cut column is determined based on a relation between the column pixel densities and the corresponding column lengths, in order to segment the touching characters at the candidate cut column. | 03-26-2015 |
20150302598 | LINE SEGMENTATION METHOD - A line segmentation method starts by determining a first starting point coordinate and generating a list of potential character widths that depends on a maximum character width stored in a database and on characteristics of the portion of the line of text corresponding to the maximum character width. The method determines a second portion of the line of text corresponding to the first starting point coordinate and the first width on the list of potential character widths. A classification method is applied to the second portion, providing a likelihood of error for the first width and a candidate character. The likelihood of error is compared with a first threshold determined by a trade-off between speed and accuracy, and if the likelihood of error corresponding to the first width is lower than the threshold value, the candidate character is selected as the character, meaning that a segment is known. | 10-22-2015 |
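Entry 20150086113 above flags touching characters by aspect ratio and then cuts them at a column chosen from column pixel densities. The Python sketch below assumes one binary bitmap per connected component. As a simplification it uses a single fixed threshold (`max_aspect`, a hypothetical parameter) instead of a per-character threshold aspect ratio, and it cuts at the lowest-density column in the central part of the component rather than at the density-versus-length relation, which the abstract does not spell out.

```python
from typing import List, Optional

Bitmap = List[List[int]]  # rows of 0/1 pixels for one connected component


def aspect_ratio(component: Bitmap) -> float:
    """Width divided by height of the component's bounding box."""
    return len(component[0]) / len(component)


def candidate_cut_column(component: Bitmap, max_aspect: float = 1.0,
                         margin: float = 0.25) -> Optional[int]:
    """Return a column at which to split a suspected touching pair, or None."""
    if aspect_ratio(component) <= max_aspect:
        return None  # narrow enough to be a single character
    width = len(component[0])
    # Column pixel density: number of ink pixels in each column
    density = [sum(row[x] for row in component) for x in range(width)]
    # Search only the central region so an outer stroke is never chosen
    lo, hi = int(width * margin), int(width * (1 - margin))
    if hi <= lo:
        return None
    return min(range(lo, hi), key=lambda x: density[x])


def split_at(component: Bitmap, col: int) -> List[Bitmap]:
    """Split the bitmap into left and right parts, dropping the cut column."""
    return [[row[:col] for row in component],
            [row[col + 1:] for row in component]]
```

For two small rings joined by a one-pixel bridge, `[[1,1,1,0,1,1,1],[1,0,1,1,1,0,1],[1,1,1,0,1,1,1]]`, the bridge column has the lowest density, so `candidate_cut_column` returns `3` and `split_at` yields the two rings. A projection minimum is only a common heuristic; it can misfire on characters whose own strokes are thin in the middle.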
382179000 | Segmenting hand-printed characters | 9 |
20100067793 | HANDWRITTEN WORD SPOTTER USING SYNTHESIZED TYPED QUERIES - A wordspotting system and method are disclosed for processing candidate word images extracted from handwritten documents. In response to a user inputting a selected query string, such as a word to be searched in one or more of the handwritten documents, the system automatically generates at least one computer-generated image based on the query string in a selected font or fonts. A model is trained on the computer-generated image(s) and is thereafter used in scoring the candidate handwritten word images. The candidate or candidates with the highest scores and/or documents containing them can be presented to the user, tagged, or otherwise processed differently from other candidate word images/documents. | 03-18-2010 |
20100189353 | Method for Improving Character Outlines Using Multiple Alignment Zones - A method aligns a character to a sampling grid of an image, where an outline of the character is specified by input pen commands. Points and contours of the input pen commands are determined. An orientation of each contour is determined. A first directed acyclic graph (DAG) is constructed indicating a hierarchical relationship of related contours. Radicals are determined using the first DAG. Simple segments of the contours are determined and merged independently for each radical. Segment pairs and their hinted coordinates are determined. The segment pairs are sorted and a second DAG is constructed for the sorted segment pairs. Collisions between the segment pairs are resolved using the second DAG. The segment pairs, x-free points, and y-free points are fitted to the sampling grid independently for each radical and a result of the fitting is stored in output pen commands. | 07-29-2010 |
20100232699 | System For Line Extraction In Digital Ink - A system for line extraction in digital ink. The digital ink represents handwritten input and is comprised of a stroke sequence. The system comprises a processor configured for: receiving the digital ink from a pen device; segmenting the strokes into a sequence of substrokes; grouping substrokes about a selected substroke into a temporally preceding group of substrokes and a temporally subsequent group of substrokes; calculating a centroid for each substroke or group of substrokes; calculating angular differences between the selected substroke and its temporally neighbouring groups of substrokes; and determining positions of extrema of the angular differences. The extrema correspond to substrokes at line breaks, thereby enabling line extraction in the stroke sequence. | 09-16-2010 |
20110293181 | Handwritten Character Recognition System - A character classification system is disclosed. The character classification system has an input device for receiving a handwritten input character, and a processor. The processor is configured to, for each character model, each character model being associated with an output character and defining a model specific segmentation scheme for that output character and an associated segment model, the model specific segmentation scheme defining a minimum length corresponding to a number of points in a stroke of the output character: (i) decompose the handwritten input character into one or more segments in accordance with the model specific segmentation scheme of the respective character model; and (ii) evaluate the one or more segments against the segment model of the respective character model to produce a score indicative of the conformity of the one or more segments with the segment model. The processor then selects the character model that produced the highest score, and classifies the handwritten input character as the output character associated with the character model that produces the highest score. | 12-01-2011 |
20120201459 | Annotation Detection and Anchoring on Ink Notes - Systems and methods for detecting annotation digital ink strokes and further associating annotation digital ink strokes with word digital ink strokes are presented. Ink strokes are captured on a writing surface and then classified as words or annotations. Annotations are then anchored to corresponding words. When words are relocated or edited on the writing surface, the anchored annotations are also relocated and may even be reshaped according to the changes in the anchored words. | 08-09-2012 |
20120328195 | Handwritten Character Recognition System - A handwritten character recognition system is disclosed. The system has an input device for receiving handwritten strokes, and a processor for classifying the handwritten strokes as an output character. The processor does so by calculating a degree of membership of the handwritten strokes to each of a plurality of character models. The character model that produces the highest degree of membership is selected and the handwritten strokes are classified as the output character associated with the character model that produces the highest degree of membership. | 12-27-2012 |
20130315483 | HANDWRITTEN DOCUMENT RETRIEVAL APPARATUS AND METHOD - According to one embodiment, a handwritten character retrieval apparatus is provided with an acquisition unit, a separation unit, a feature extraction unit and a retrieval unit. The acquisition unit acquires a document including handwriting data. The separation unit separates the document into a plurality of parts. The feature extraction unit extracts feature values, each indicating a feature value of each part. The retrieval unit executes retrieval based on the feature values. | 11-28-2013 |
20130315484 | HANDWRITTEN CHARACTER RETRIEVAL APPARATUS AND METHOD - According to one embodiment, a handwritten character retrieval apparatus is provided with an acquisition unit, a feature extraction unit, a segmentation unit, an attribute append unit and a retrieval unit. The acquisition unit acquires a handwritten character string in units of a stroke. The feature extraction unit extracts a first feature value unique to each of the strokes from the handwritten character string. The segmentation unit segments the strokes into a plurality of sets. The attribute append unit appends a second feature value based on the sets to each of the strokes. The retrieval unit executes retrieval based on the first feature value and the second feature value. | 11-28-2013 |
20140112582 | CHARACTER RECOGNITION - Systems and methods for character recognition by performing lateral view-based analysis on the character data and generating a feature vector based on the lateral view-based analysis. | 04-24-2014 |
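Entry 20120201459 above anchors annotation strokes to words and moves the annotations when the words are relocated. The Python sketch below covers only the anchoring and relocation steps, assuming strokes have already been classified as words or annotations and reduced to axis-aligned bounding boxes. Anchoring each annotation to the nearest word box is an assumed rule chosen for illustration, the names `Box`, `anchor` and `relocate` are hypothetical, and the reshaping of annotations is omitted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a group of ink strokes."""
    x0: float
    y0: float
    x1: float
    y1: float

    def translated(self, dx: float, dy: float) -> "Box":
        return Box(self.x0 + dx, self.y0 + dy, self.x1 + dx, self.y1 + dy)


def box_distance(a: Box, b: Box) -> float:
    """Euclidean gap between two boxes; 0 when they overlap."""
    dx = max(b.x0 - a.x1, a.x0 - b.x1, 0.0)
    dy = max(b.y0 - a.y1, a.y0 - b.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def anchor(annotations: Dict[str, Box], words: Dict[str, Box]) -> Dict[str, str]:
    """Map each annotation id to the id of the nearest word (assumed rule)."""
    return {
        a_id: min(words, key=lambda w_id: box_distance(a_box, words[w_id]))
        for a_id, a_box in annotations.items()
    }


def relocate(annotations: Dict[str, Box], anchors: Dict[str, str],
             old_words: Dict[str, Box],
             new_words: Dict[str, Box]) -> Dict[str, Box]:
    """Translate each annotation by the displacement of its anchored word."""
    moved = {}
    for a_id, a_box in annotations.items():
        w_id = anchors[a_id]
        dx = new_words[w_id].x0 - old_words[w_id].x0
        dy = new_words[w_id].y0 - old_words[w_id].y0
        moved[a_id] = a_box.translated(dx, dy)
    return moved
```

An underline drawn just below the word "world" anchors to that word, and moving "world" down by 40 units moves the underline by the same amount. Plain nearest-box anchoring is easy to follow but ambiguous for annotations that span several words, which is one reason a real system would use richer features.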