Document indexing for Search - General Discussions - Progress Community

Document indexing for Search

  • Will version 4.0 be able to index the content of documents such as PDF or Word files so that they are searchable? If someone runs a search from the live site and there is a match in a PDF file that is linked from a page within the site, will it show in the results?
  • Hello apollo,

    There will be a search engine for all content items. The exact content of a given file will not be indexed in 4.0; we will try to provide this functionality out of the box in 4.1.

    Greetings,
    Ivan Dimitrov
    the Telerik team
    Do you want to have your say when we set our development plans? Do you want to know when a feature you care about is added or when a bug is fixed? Explore the Telerik Public Issue Tracking system and vote to affect the priority of the items.
  • Thanks for the response Ivan.

    Do you know of a solution that we could integrate in the meantime that will provide this functionality? Or do you have an idea when 4.1 will be available?
  • Hi apollo,

    The search engine for Sitefinity 4 has not been implemented so far; we will do this after the BETA. We will use the Lucene engine. To search inside document content, you have to use a third-party framework to extract the content of the files. Most probably we will provide an API that allows you to extract the content from DOC files. For other file types you could use open source libraries such as Apache PDFBox or iTextSharp.

    If you use PDFBox, you can read the stream with PDDocument.load(stream) and then call getText on a PDFTextStripper instance.

    You can use Apache PDFBox or iTextSharp with a custom provider in the 3.x editions as well.
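
To make the extract-then-index flow described above concrete, here is a minimal Java sketch. It is not Sitefinity's or Lucene's API: the class DocumentIndexSketch, the TextExtractor interface and the in-memory term map are all hypothetical names made up for illustration. The PDFBox calls mentioned in this post appear only in a comment, because they need the PDFBox jar on the classpath; a plain-text extractor stands in for them so the sketch runs with the JDK alone (Java 9 or later).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DocumentIndexSketch {

    /** Hypothetical extension point: turns a file stream into plain text. */
    interface TextExtractor {
        String extract(InputStream in) throws IOException;
    }

    // A PDF-backed extractor would wrap the PDFBox calls named in the post
    // (PDFBox 1.x/2.x style; newer releases may differ; not compiled here):
    //   PDDocument doc = PDDocument.load(in);
    //   try { return new PDFTextStripper().getText(doc); } finally { doc.close(); }

    /** Stand-in extractor for the demo: treats the stream as UTF-8 text. */
    static final class PlainTextExtractor implements TextExtractor {
        @Override
        public String extract(InputStream in) throws IOException {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // term -> URLs of the documents whose extracted text contains that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Extracts the text of one linked file and records each term under its URL. */
    void add(String url, InputStream content, TextExtractor extractor) throws IOException {
        String text = extractor.extract(content);
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(url);
            }
        }
    }

    /** Case-insensitive single-term lookup; returns an empty set when nothing matches. */
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) throws IOException {
        DocumentIndexSketch index = new DocumentIndexSketch();
        InputStream file = new ByteArrayInputStream(
                "Annual report: revenue grew.".getBytes(StandardCharsets.UTF_8));
        index.add("/docs/report.pdf", file, new PlainTextExtractor());
        System.out.println(index.search("Revenue"));
    }
}
```

Keeping extraction behind one small interface means a PDFBox-based or iTextSharp-style extractor could be swapped in per file type without touching the indexing code. A real index such as Lucene would replace the term map.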

    Greetings,
    Ivan Dimitrov
    the Telerik team
  • Hello,
    Can you please update me on this? Will I be able to search inside PDFs with the release of 14/01/2011?
    Thanks
  • Hello,

    In the official version of Sitefinity 4.0 we will not have PDF indexing. This will be implemented at a later stage, but we have not scheduled a time frame for the implementation yet.

    Kind regards,
    Ivan Dimitrov
    the Telerik team
  • And as for DOC/DOCX files, will that be available? Could I implement my own PDF search service and integrate it with Sitefinity?
    Thanks
  • Hi,

    Can you provide us with a sample of implementing a custom search provider in SF 4.0? We have gone through extensive development for document indexing in SF 3.7.

    Regards,
    Jean
  • Hi apollo,

    We do not have a sample that shows how to create a custom index. The index is based on pipes, so you can check this post.

    Regards,
    Ivan Dimitrov
    the Telerik team
  • Hello Ivan,
    When will 4.1 be released? Thanks
  • Ivan, where can I find a sample of using PagePipe to develop my custom index?
    Thanks
  • Hello Paolo,

    We made a sample that shows how to create custom pipes, and it will be included in the Q1 release scheduled for the middle of April.

    Best wishes,
    Ivan Dimitrov
    the Telerik team

  • Hello Ivan,
    Since I need to finish the first part of the project by the end of April, is it possible to have it sooner? I don't want to miss the deadline for the search index. Thanks
  • Hi Paolo,
     
    The sample will be released next week. We will post a link in this forum thread when we are done.
    We are sorry for not being able to speed up our delivery. We are currently focused on the coming Q1 release next week, and all our efforts go in this direction.

    I hope the suggested timing can work for you.

    All the best,
    Kalina
    the Telerik team