Search-Indexing large sites

Hey everyone,

First of all, I have to say I really love Pimcore. It’s the best CMS I’ve ever worked with, and I’ve seen a few! Keep up the good work :+1:
I have a question regarding the Lucene search indexer. We have a customer with a large site (20+ countries, each with ~8,000 unique URLs), and the Lucene search indexer is taking way too long, to the point where it no longer finishes and crashes instead. Is there a way to build the index in parallel for the different countries, use multiple indexes, or something similar?

Thanks in advance!

Lucene is quite slow because it crawls the website, and it doesn’t really support altering a document that is already indexed. Look at this: https://github.com/dachcom-digital/pimcore-dynamic-search. This is the new search engine from Dachcom, and it works way better and much faster.

I will, thanks. I came across this when I checked out which search engine to use, but since it is still 0.x I was assuming that it’s not ready for production use.

Hey @dpfaffenbauer, do you have any hints on how to use the bundle? Apart from the example config, there is no documentation yet. I tried running the indexer, but all I get is:
./bin/console dynamic-search:run -c default -f
[22:31:47] DEBUG : data queue cleared. Affected jobs: 0 ["queue","maintenance"]
[22:31:47] DEBUG : warm up provider ["trinity_data","default"]
[22:31:47] DEBUG : warm up provider ["lucene","default"]
[22:31:47] DEBUG : execute provider for dispatch type "index" ["trinity_data","default"]
[22:31:47] DEBUG : cooling down provider ["trinity_data","default"]
[22:31:47] DEBUG : cooling down provider ["lucene","default"]

When I submit a search query, it says “no results”. I expected it at least to index documents out of the box - what do I have to do here?
Thanks!

Update: I got one step further by playing around with the addSimpleDocumentFieldDefinition() calls and adding several object classes. These are now indexed and I can sorta make the results render. It’s still very hacky, but it’s starting to come together.
But it’s still not indexing documents, and also it seems to ignore object variants. Since the site makes heavy use of those, can they be indexed as well?

Okay, making some headway now. After more digging, I now assume I should create one Definition per object class / type of object I want to add, and tag them with dynamic_search.document_definition_builder so they get picked up by the engine to transform documents. Is this the right approach?

I managed to coerce it into indexing documents now by adding another Definition class. There are a couple of questions that I still have:

  • How do I make it index object variants?
  • Did I really have to create another definition class, or would you recommend using only one?
  • How do I index the actual content on pages (e.g. stuff in area bricks), rather than just properties like title or description?
  • What is the right approach for multilanguage/multisite instances? Do I create one context per language?
  • Is there a configuration reference somewhere?

Sorry for the multiple posts, and thanks in advance for your support!

Documentation would make things easier, yes.

So, here is how it works:

You need to create a Definition for every type you want to index (asset, object, document), and preferably a separate Definition per object class.

Such a Definition looks like this:

<?php

namespace AppBundle\DynamicSearch\Definition\Document;

use DynamicSearchBundle\Document\Definition\DocumentDefinitionBuilderInterface;
use DynamicSearchBundle\Document\Definition\DocumentDefinitionInterface;
use DynamicSearchBundle\Normalizer\Resource\ResourceMetaInterface;

class ProductDefinition implements DocumentDefinitionBuilderInterface
{
    /**
     * {@inheritdoc}
     */
    public function isApplicable(string $contextName, ResourceMetaInterface $resourceMeta)
    {
        if ($resourceMeta->getResourceCollectionType() !== 'object') {
            return false;
        }

        if ($resourceMeta->getResourceSubType() !== 'Product') {
            return false;
        }

        return true;
    }

    /**
     * {@inheritdoc}
     */
    public function buildDefinition(DocumentDefinitionInterface $definition, array $normalizerOptions)
    {
        $definition
            ->addSimpleDocumentFieldDefinition([
                'name' => 'type',
                'index_transformer' => [
                    'type' => 'keyword',
                ],
                'data_transformer' => [
                    'type' => 'normalizer_value_callback',
                    'configuration' => ['value' => 'object'],
                ],
            ])
            ->addSimpleDocumentFieldDefinition([
                'name' => 'data_type',
                'index_transformer' => [
                    'type' => 'keyword',
                ],
                'data_transformer' => [
                    'type' => 'normalizer_value_callback',
                    'configuration' => ['value' => 'product'],
                ],
            ])
            ->addSimpleDocumentFieldDefinition([
                'name' => 'product_id',
                'index_transformer' => [
                    'type' => 'keyword',
                ],
                'data_transformer' => [
                    'type' => 'object_getter_extractor',
                    'configuration' => ['method' => 'getId'],
                ],
            ])
            ->addSimpleDocumentFieldDefinition([
                'name'              => 'locale',
                'index_transformer' => [
                    'type' => 'keyword',
                ],
                'data_transformer'  => [
                    'type'          => 'normalizer_value_callback',
                    'configuration' => ['value' => $normalizerOptions['locale']]
                ]
            ])
            ->addSimpleDocumentFieldDefinition([
                'name' => 'title',
                'index_transformer' => [
                    'type' => 'text',
                    'configuration' => [
                        'boost' => 50,
                    ],
                ],
                'data_transformer' => [
                    'type' => 'object_getter_extractor',
                    'configuration' => ['method' => 'getName', 'arguments' => [$normalizerOptions['locale']]],
                ],
            ])
            ->addSimpleDocumentFieldDefinition([
                'name' => 'sub_title',
                'index_transformer' => [
                    'type' => 'text',
                    'configuration' => [
                        'boost' => 50,
                    ],
                ],
                'data_transformer' => [
                    'type' => 'object_getter_extractor',
                    'configuration' => ['method' => 'getSubTitle', 'arguments' => [$normalizerOptions['locale']]],
                ],
            ])
            ->addSimpleDocumentFieldDefinition([
                'name' => 'description',
                'index_transformer' => [
                    'type' => 'text',
                ],
                'data_transformer' => [
                    'type' => 'object_getter_extractor',
                    'configuration' => [
                        'method' => 'getProductDescription',
                        'arguments' => [$normalizerOptions['locale']],
                    ],
                ],
            ])
            ->addSimpleDocumentFieldDefinition([
                'name' => 'miscibility_description',
                'index_transformer' => [
                    'type' => 'text',
                ],
                'data_transformer' => [
                    'type' => 'object_getter_extractor',
                    'configuration' => [
                        'method' => 'getMiscibilityDescription',
                        'arguments' => [$normalizerOptions['locale']],
                    ],
                ],
            ])
            ->addSimpleDocumentFieldDefinition([
                'name' => 'uri',
                'index_transformer' => [
                    'type' => 'unIndexed',
                ],
                'data_transformer' => [
                    'type' => 'object_path_generator',
                    'configuration' => [
                        'arguments' => [
                            '_locale' => $normalizerOptions['locale'],
                        ],
                    ],
                ],
            ]);
    }
}

Then register it as a service with the dynamic_search.document_definition_builder tag:

AppBundle\DynamicSearch\Definition\Document\ProductDefinition:
    tags:
        - { name: dynamic_search.document_definition_builder }
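
Side note: most of the fields in that Definition have the same shape, so the arrays can be generated by a small helper. This is only a minimal sketch. localizedGetterField() is a hypothetical function name, not something the bundle ships; the array keys are copied from the ProductDefinition example.

```php
<?php

/**
 * Hypothetical helper (not part of DynamicSearchBundle): builds the array for one
 * localized text field read via a getter, in the shape used by ProductDefinition.
 */
function localizedGetterField(string $name, string $method, string $locale, ?int $boost = null): array
{
    $indexTransformer = ['type' => 'text'];

    // Only add a boost configuration when one was requested.
    if ($boost !== null) {
        $indexTransformer['configuration'] = ['boost' => $boost];
    }

    return [
        'name' => $name,
        'index_transformer' => $indexTransformer,
        'data_transformer' => [
            'type' => 'object_getter_extractor',
            'configuration' => ['method' => $method, 'arguments' => [$locale]],
        ],
    ];
}

// Same shape as the 'title' field in ProductDefinition, here for locale 'de'.
$titleField = localizedGetterField('title', 'getName', 'de', 50);
```

Inside buildDefinition() you would then pass the result to addSimpleDocumentFieldDefinition(), using $normalizerOptions['locale'] as the locale.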

In the Definition, you add fields to your document and use transformers to get data from your object/asset/document. A custom transformer looks like this:

<?php

namespace AppBundle\DynamicSearch\Transformer\Field;

use DynamicSearchBundle\Resource\Container\ResourceContainerInterface;
use DynamicSearchBundle\Resource\FieldTransformerInterface;
use Pimcore\Model\DataObject\Product;
use Symfony\Component\OptionsResolver\OptionsResolver;

class ProductIndicationTitlesExtractor implements FieldTransformerInterface
{
    /**
     * @var array
     */
    protected $options;

    /**
     * {@inheritdoc}
     */
    public function configureOptions(OptionsResolver $resolver)
    {
        $resolver->setRequired(['locale']);
        $resolver->setAllowedTypes('locale', ['string', 'null']);
    }

    /**
     * {@inheritdoc}
     */
    public function setOptions(array $options)
    {
        $this->options = $options;
    }

    /**
     * {@inheritdoc}
     */
    public function transformData(string $dispatchTransformerName, ResourceContainerInterface $resourceContainer)
    {
        $product = $resourceContainer->getResource();
        if (!$product instanceof Product) {
            return null;
        }

        $indications = $product->getProductIndications();
        if (!is_array($indications) || count($indications) === 0) {
            return null;
        }

        $values = [];
        foreach ($indications as $indication) {
            $name = $indication->getWebsiteTitle($this->options['locale']);

            if (!$name) {
                continue;
            }

            $values[] = $name;
        }

        if (count($values) === 0) {
            return null;
        }

        return implode(',', array_unique($values));
    }
}

AppBundle\DynamicSearch\Transformer\Field\ProductIndicationTitlesExtractor:
        tags:
            - { name: dynamic_search.resource.field_transformer, identifier: product_indications_titles_extractor, resource_scaffolder: trinity_data_scaffolder }
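
To use such a transformer, you reference it from a field in your Definition. A minimal sketch, assuming the data_transformer type is matched against the identifier from the service tag and that configuration is what ends up in setOptions(). The function name indicationTitlesField() and the field name indication_titles are made up for illustration.

```php
<?php

/**
 * Hypothetical helper: field definition that points at the custom transformer
 * tagged with the identifier "product_indications_titles_extractor".
 */
function indicationTitlesField(?string $locale): array
{
    return [
        // Assumed field name, pick whatever fits your index.
        'name' => 'indication_titles',
        'index_transformer' => ['type' => 'text'],
        'data_transformer' => [
            'type' => 'product_indications_titles_extractor',
            // 'locale' is the option declared as required in configureOptions().
            'configuration' => ['locale' => $locale],
        ],
    ];
}

$field = indicationTitlesField('de');
```

In buildDefinition() this would be chained like the other fields, with $normalizerOptions['locale'] passed in as the locale.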

You also have to define how your index and the search work. You do that by creating a new context config:

dynamic_search:
    enable_pimcore_element_listener: true
    context:
        website:
            index_provider:
                service: 'lucene'
                options:
                    database_name: 'website'
                    analyzer:
                        forced_locale: de
                        filter:
                            -
                                on_index_time: true
                                on_query_time: true
                                locale_aware: true
                                class: '\DsLuceneBundle\Lucene\Filter\Stemming\SnowBallStemmingFilter'
                        stop_words:
                            on_index_time: true
                            on_query_time: true
                            libraries:
                                -
                                    locale: de
                                    file: '%%dsl_stop_words_lib_path%%/de'
                                -
                                    locale: en
                                    file: '%%dsl_stop_words_lib_path%%/en'
            data_provider:
                service: 'trinity_data'
                normalizer:
                    service: 'trinity_localized_resource_normalizer'
                    options:
                        locales: ['de']
                options:
                    always:
                        index_object: true
                        index_document: true
                        object_ignore_unpublished: true
                        object_class_names:
                            - Product
                        object_types:
                            - object
            output_channels:
                autocomplete:
                    service: 'lucene_autocomplete'
                suggestions:
                    service: 'lucene_suggestions'
                    normalizer:
                        service: 'lucene_document_key_value_normalizer'
                    options:
                        phrased_search: true
                        fuzzy_search: true
                        wildcard_search: true
                        #restrict_search_fields:
                        #    - 'sku'
                        #    - 'title'
                search:
                    service: 'lucene_search'
                    use_frontend_controller: true
                    normalizer:
                        service: 'lucene_document_key_value_normalizer'
                    options:
                        phrased_search: true
                        fuzzy_search: true
                        wildcard_search: true
                    paginator:
                        enabled: true
                        max_per_page: 50

that is sort of it…

Hey @dpfaffenbauer, thanks for the input. I will play around with it some more tonight and see how far I get. Good to hear I haven’t been entirely on the wrong track until now. :wink: