20130077855 | SYSTEMS AND METHODS FOR PROCESSING DOCUMENTS OF UNKNOWN OR UNSPECIFIED FORMAT - A computer implemented method for extracting meaningful text from a document of unknown or unspecified format. In a particular embodiment, the method includes reading the document, thereby to extract raw encoded text, analysing the raw encoded text, thereby to identify one or more text chunks, and for a given chunk, performing compression identification analysis to determine whether compression is likely and, in the event that compression. The method can further include performing a decompression process, performing an encoding identification process thereby to identify a likely character encoding protocol, and converting the chunk using the identified likely character encoding protocol, thereby to output the chunk as readable text. | 03-28-2013 |