DJVUTXT(1)                       DjVuLibre-3.5                      DJVUTXT(1)



NAME
       djvutxt - Extract the hidden text from DjVu documents.


SYNOPSIS
       djvutxt [options] inputdjvufile [outputtxtfile]


DESCRIPTION
       The djvutxt program decodes the hidden text layer of the DjVu document inputdjvufile and writes it to the file
       outputtxtfile, or to the standard output when no output file is given.  The hidden text layer is usually
       generated by optical character recognition (OCR) software.

       Without the options --detail and --escape, this program simply outputs the UTF-8 text.  Option --detail causes
       the output of S-expressions describing the text and its location.  Option --escape uses C-style escape
       sequences to represent non-ASCII and nonprintable characters.




OPTIONS
       --page=pagespec
              Specify which pages should be processed.  When this option is not specified, the text of all pages of
              the document is concatenated into the output file.  The page specification pagespec contains one or
              more comma-separated page ranges.  A page range is either a page number, or two page numbers separated
              by a dash.  For instance, specification 1-10 outputs pages 1 to 10, and specification 1,3,99999-4
              outputs pages 1 and 3, followed by all the document pages in reverse order down to page 4.
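
              The expansion of a page specification can be illustrated with a short Python sketch.  It is not
              the code used by djvutxt.  The function name expand_pagespec and the rule that out-of-range
              numbers are clamped to the document's page count are assumptions made to reproduce the
              examples above.

```python
def expand_pagespec(spec: str, page_count: int) -> list[int]:
    """Expand a page specification such as "1,3,99999-4" into page numbers."""

    def clamp(text: str) -> int:
        # Assumption: numbers outside the document are clamped to 1..page_count.
        return max(1, min(int(text), page_count))

    pages: list[int] = []
    for part in spec.split(","):
        start_text, dash, end_text = part.strip().partition("-")
        start = clamp(start_text)
        end = clamp(end_text) if dash else start
        # A range whose first number is larger runs in reverse order.
        step = 1 if start <= end else -1
        pages.extend(range(start, end + step, step))
    return pages


# For a hypothetical six-page document:
print(expand_pagespec("1,3,99999-4", 6))  # [1, 3, 6, 5, 4]
```

              Clamping is what lets a large number such as 99999 stand for the last page of the document.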

       --detail=keyword
              This option causes djvutxt to output S-expressions specifying the position of the text in the page.
              See the manual page djvused(1) for a description of the output format.  Argument keyword  specifies  the
              maximum  level  of detail for which text location is reported.  The recognized values are: page, column,
              region, para, line, word, and char.  All other values are interpreted as char.
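
              The following Python sketch shows one way a script might read such output.  It assumes each
              zone has the form (type xmin ymin xmax ymax ...), where the remainder is either a quoted
              string or nested zones, as described in djvused(1).  The SAMPLE text is hand-written for
              illustration and is not real djvutxt output.  The names parse and words are hypothetical.

```python
import re

# Tokens: parentheses, double-quoted strings, or bare symbols and numbers.
TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')

# Hand-written sample in the assumed format (not real program output).
SAMPLE = """
(page 0 0 2550 3300
 (line 300 3000 900 3050
  (word 300 3000 520 3050 "Hello")
  (word 560 3000 900 3050 "world")))
"""


def parse(text):
    """Parse one S-expression into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing parenthesis
            return items
        if token.startswith('"'):
            # Escape sequences inside the string are left undecoded here.
            return token[1:-1]
        if token.lstrip("-").isdigit():
            return int(token)
        return token  # zone type such as "page" or "word"

    return read()


def words(zone):
    """Yield (text, (xmin, ymin, xmax, ymax)) for every word zone."""
    kind, xmin, ymin, xmax, ymax, *rest = zone
    if kind == "word" and rest and isinstance(rest[0], str):
        yield rest[0], (xmin, ymin, xmax, ymax)
    for child in rest:
        if isinstance(child, list):
            yield from words(child)


for text, box in words(parse(SAMPLE)):
    print(text, box)
```

              The sketch handles a single top-level expression and would need extending for multi-page
              output.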

       --escape
              Output escape sequences of the form "\ooo" for all non-ASCII or nonprintable UTF-8 characters and for
              the backslash character.
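
              Escaped output can be turned back into readable text with a few lines of Python.  This sketch
              assumes that every escape is a backslash followed by exactly three octal digits encoding one
              byte of the UTF-8 text, and that the rest of the output is plain ASCII.  The function name
              decode_escapes is hypothetical.

```python
import re


def decode_escapes(escaped: str) -> str:
    """Replace \\ooo octal escapes with bytes, then decode as UTF-8."""
    raw = re.sub(
        rb"\\([0-3][0-7]{2})",                  # assumed form: three octal digits
        lambda m: bytes([int(m.group(1), 8)]),  # one escape becomes one byte
        escaped.encode("ascii"),
    )
    return raw.decode("utf-8")


print(decode_escapes(r"caf\303\251"))  # café
print(decode_escapes(r"a\134b"))       # a\b
```

              Decoding is done on bytes because a single non-ASCII character spans several escapes.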





REMARKS
       Use program djvused(1) for more control over the text layer.


CREDITS
       This program was initially written by Andrei Erofeev <andrew_erofeevATyahoo.com> and was then improved by Bill
       Riemers <docbillATsourceforge.net> and many others.  It was then rewritten to use the ddjvuapi by Leon Bottou
       <leonbATsourceforge.net>.


SEE ALSO
       djvu(1), djvused(1)




DjVuLibre-3.5                     10/11/2001                        DJVUTXT(1)