eZpedia : The Free eZ Publish CMS Documentation Encyclopedia

Solution: indexing/searching PDF files with accented characters

eZ Publish needs an external tool to index PDF files. The path to that tool is set in binaryfile.ini.

Originally I was using the pstotext tool, but it does not work well and often produces garbage output. A better tool is pdftotext from the xpdf project, as suggested in this article.

In my case switching tools was not enough: the text extracted from the PDF files apparently came out in ISO-8859-1, while eZ Publish expects UTF-8 as input, so accented characters were not indexed correctly.

So, the contents of ezpdftotext should be:

#!/bin/sh
# Extract the text to stdout ("-"), then convert it from ISO-8859-1 to UTF-8.
pdftotext "$1" - | iconv -f ISO-8859-1 -t UTF-8
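
The conversion step can be checked on its own, without a PDF at hand. The following is only a minimal sketch: the function name to_utf8 is made up for illustration and is not part of eZ Publish or xpdf. It feeds a single ISO-8859-1 byte for "é" through the same iconv call and dumps the resulting bytes.

```shell
#!/bin/sh
# Hypothetical helper: the same iconv call the wrapper script uses.
to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# 0xE9 (octal 351) is "é" in ISO-8859-1.
# After conversion, od should show the two-byte UTF-8 sequence c3 a9.
printf '\351' | to_utf8 | od -An -tx1
```

If the dump shows a single e9 byte instead, the text is not being converted and accented characters will most likely still be indexed incorrectly.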

Article provided by eZpedia

All text is available under the terms of the GNU Free Documentation License
