Import definitions - import of 2 million objects


#1

I am about to run an import of about 2 million objects of the same class via the Import Definitions bundle.
Are there any tips for speeding this up?
I am afraid that with parallelization I might run into deadlocks on the DB layer, since it will be writing to the same tables.
Do you have any experience with this?
Thanks


#2

Well, it depends on how complex your import is. Things I do to optimize imports:

  • If you have a lot of assets to import, try checking whether the file actually changed before updating it (see the sketch after this list).
  • Only create relational objects in an interpreter if necessary.
  • Split up imports into several definitions and only run the necessary ones.
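
For the first point, here is a minimal sketch of what I mean, using the plain Pimcore asset API. updateAssetIfChanged() and the md5 comparison are my own illustration, not part of the bundle:

    <?php

    use Pimcore\Model\Asset;

    // Sketch: only write the asset when the binary content actually changed,
    // so unchanged files don't trigger a save (and new versions) on every run.
    function updateAssetIfChanged(string $localPath, string $assetPath): void
    {
        $asset = Asset::getByPath($assetPath);
        $newData = file_get_contents($localPath);

        if ($asset instanceof Asset) {
            // Compare checksums instead of saving unconditionally.
            if (md5($asset->getData()) !== md5($newData)) {
                $asset->setData($newData);
                $asset->save();
            }
            return;
        }

        // Asset does not exist yet, so create it.
        $asset = new Asset();
        $asset->setParent(Asset\Service::createFolderByPath(dirname($assetPath)));
        $asset->setFilename(basename($assetPath));
        $asset->setData($newData);
        $asset->save();
    }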

The biggest problem is Pimcore itself, though: its persistence model is quite slow. Also, use the version where they introduced delta updates; that should speed things up as well.


#3

I am importing only data objects, in this case with no relations. It’s the initial import, so all of the records are new.

#1
I installed the oci8 driver for the connection to Oracle, and the external SQL provider works well, except that I had to change the code in AbstractSqlProvider.php to fetch only one record in getColumns(), since I am querying many records and it would not get into the Mapping.

$data = $query->fetch();

instead of

$data = $query->fetchAll();

and change $data[0] to $data in the same function.
Is this worth a PR?
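
For context, the patched function then looks roughly like this. This is a sketch from memory, assuming a PDO statement and an associative fetch; the real code in AbstractSqlProvider.php may differ:

    protected function getColumns($query)
    {
        // Before: $data = $query->fetchAll(); with the column names read
        // from $data[0]. A single row is enough to derive the columns.
        $data = $query->fetch(\PDO::FETCH_ASSOC);

        $columns = [];
        foreach (array_keys($data) as $column) {
            $columns[] = $column;
        }

        return $columns;
    }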

#2
My query returns all of those 2 million records from the Oracle DB, and this kills the Pimcore server with an out-of-memory error.
Can this somehow be handled in chunks directly in the Import Definitions bundle?
Or do I need to split it up and handle it myself?

Thanks!


#4

#1 Yes, please create a PR.
#2 I don’t have a solution for that; I currently run bigger imports in batches: only 100 at once, then the next 100. With 2 million records, this will still take forever, though. A rough sketch of that batching is below.
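
For reference, here is roughly how I batch it, assuming an Oracle 12c+ source and a PDO-style connection. $pdo, my_source_table, id, and importRow() are placeholders, not bundle API; with oci8 or DBAL the pagination syntax is the same but the fetch calls differ:

    $batchSize = 100;
    $offset = 0;

    do {
        // Page through the source table instead of loading everything at once.
        $stmt = $pdo->prepare(
            'SELECT * FROM my_source_table ORDER BY id
             OFFSET :offset ROWS FETCH NEXT :limit ROWS ONLY'
        );
        $stmt->bindValue(':offset', $offset, \PDO::PARAM_INT);
        $stmt->bindValue(':limit', $batchSize, \PDO::PARAM_INT);
        $stmt->execute();

        $rows = $stmt->fetchAll(\PDO::FETCH_ASSOC);

        foreach ($rows as $row) {
            importRow($row); // placeholder for importing a single record
        }

        $offset += $batchSize;

        // Free Pimcore's internal caches between batches, otherwise memory
        // still grows over a long run.
        \Pimcore::collectGarbage();
    } while (count($rows) === $batchSize);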