
Bhavani Ikkurthi


Offline Content Migration in AEM using oak-upgrade

Content migration between any two Adobe Experience Manager (AEM) instances presents its own challenges, and in some cases the chosen method compounds them. In this post, I present a way to tackle content migration in offline mode using oak-upgrade. The oak-upgrade module has traditionally been used to upgrade from JCR 2.0 to the Oak node store. With the newest AEM platforms shipping with the Oak repository, it has found a new calling in migrating content between them, which is known as a sidegrade.

oak-upgrade is a Swiss army knife for copying content between virtually any repositories.

Below is a minimal rundown of the steps involved in migrating content between two repositories using oak-upgrade.


SETUP

  • Both the source and destination Oak repositories are the same version. This is typically the case when both the source and destination AEM instances are the same version. The Oak repository version can be looked up in a few places:
  • /crx/explorer/config/index.jsp. Look for jcr.repository.version.
  • /system/console/jmx/com.adobe.granite%3Atype%3DRepository. Look for jcr.repository.version.
  • This information is also readily available in other areas of AEM, such as the CRXDE|Lite home screen. However, with CRXDE|Lite disabled on vital AEM servers, it may not be easily accessible there.
    [Image: AEM Oak repository version information as shown in CRXDE|Lite]

  • The repository is owned by a restricted user. This is generally a best practice to secure the repository at the file system (FS) layer. In this case, the user crx owns the crx-quickstart/repository path on the FS.

  • sudo is available so the process can be started as user crx.

  • Ensure an FS snapshot of the crx-quickstart/repository folder from the source AEM instance is available.

  • Content is being migrated from author-to-author and publish-to-publish.
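
The ownership and snapshot assumptions above can be checked before starting. Below is a minimal sketch, not part of the original procedure: the function name check_repo_owner is hypothetical, and it only confirms that a directory exists and is owned by the expected user.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; the function name is illustrative only.

# Succeeds if the given directory exists and is owned by the expected user.
check_repo_owner() {
    repo_dir=$1
    expected_user=$2
    if [ ! -d "$repo_dir" ]; then
        echo "missing directory: $repo_dir" >&2
        return 1
    fi
    # The third field of `ls -ld` output is the owning user.
    owner=$(ls -ld "$repo_dir" | awk '{print $3}')
    if [ "$owner" != "$expected_user" ]; then
        echo "$repo_dir is owned by $owner, expected $expected_user" >&2
        return 1
    fi
    return 0
}
```

With the paths used later in this post, a call such as check_repo_owner /dest/crx-quickstart/repository crx would confirm the destination ownership, and a plain directory test on /source/crx-quickstart/repository/segmentstore would confirm that the snapshot is mounted.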

START

  • Remove all custom indexes on the repository. Depending on the type (sync, async, nrt), these indexes could prolong the migration. oak-upgrade opens the tar files in the repository to copy them. As a result, a reindex will occur when AEM starts up after the migration. For large sync or nrt indexes, this could add a few hours to the startup time.
  • Stop the destination AEM. Verify once, twice, several times that it has stopped. It also helps to verify that the shutdown was clean: ensure there were no repository or tar errors during the shutdown.
  • Mount the FS snapshot of the source AEM instance. At this juncture, ensure, ensure, ensure that no AEM Java processes are running.
  • Run the oak-upgrade:
#nohup sudo -u crx java -Xmx20000m -jar oak-upgrade-1.6.16.jar /source/crx-quickstart/repository /dest/crx-quickstart/repository --copy-binaries --src-datastore=/source/crx-quickstart/repository/datastore --datastore=/dest/crx-quickstart/repository/datastore --copy-versions=true --copy-orphaned-versions=false --include-paths=/content,/etc/cloudservices,/etc/cloudsettings,/etc/designs,/etc/segmentation,/etc/tags,/etc/workflow,/var/audit,/jcr:system/rep:namespaces &
  • Use absolute paths wherever possible. The content paths in the above command do not reflect the new 6.4+ content structure of AEM. Modify them as needed.
  • Scan nohup.out for any errors.
  • When finished, the output in nohup.out looks something like this:
15.01.2019 23:16:49.356 [main] *INFO*  org.apache.jackrabbit.oak.upgrade.RepositoryUpgrade - Updating indexes     ____
15.01.2019 23:16:49.745 [main] *INFO*  org.apache.jackrabbit.oak.upgrade.RepositoryUpgrade - Checking node types: Traversed #30000 /jcr:system/jcr:versionStorage/50/1f/78/501f7852-307a-4230-a201-d74f8be71b86
15.01.2019 23:16:51.008 [main] *INFO*  org.apache.jackrabbit.oak.plugins.index.IndexUpdate - /oak:index/uuid => Indexed 540000 nodes in 2.669 s ...
15.01.2019 23:16:51.606 [main] *INFO*  org.apache.jackrabbit.oak.upgrade.RepositoryUpgrade - Updating indexes    / ___|
15.01.2019 23:16:52.072 [main] *INFO*  org.apache.jackrabbit.oak.upgrade.RepositoryUpgrade - Checking node types: Traversed #40000 /jcr:system/jcr:versionStorage/f9/e1
15.01.2019 23:16:52.461 [main] *INFO*  org.apache.jackrabbit.oak.upgrade.RepositoryUpgrade - Checking node types: Traversed #50000 /jcr:system/jcr:versionStorage/ba/d4/7f
15.01.2019 23:16:52.926 [main] *INFO*  org.apache.jackrabbit.oak.plugins.index.IndexUpdate - /oak:index/uuid => Indexed 550000 nodes in 1.918 s ...
15.01.2019 23:17:15.032 [main] *INFO*  org.apache.jackrabbit.oak.upgrade.RepositoryUpgrade - Commit hook EditorHook : (CompositeEditorProvider : ([TypeEditorProvider, IndexEditorProvider])) processed commit in 4.266 min
15.01.2019 23:18:09.231 [main] *INFO*  org.apache.jackrabbit.oak.segment.file.FileStore - TarMK closed: /dest/crx-quickstart/repository/segmentstore
15.01.2019 23:18:11.195 [main] *INFO*  org.apache.jackrabbit.oak.segment.file.ReadOnlyFileStore - TarMK closed: /source/crx-quickstart/repository/segmentstore
  • Start up the destination AEM. Remember that the startup may take longer due to indexing. Ensure the async indexing lanes are running and not failing by monitoring the MBeans below. It is a good idea to let the indexer finish before restarting AEM. The time this takes depends on the size of the repository.

/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Dasync%2Ctype%3DIndexStats

/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Dfulltext-async%2Ctype%3DIndexStats
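
Two of the manual checks in the START steps lend themselves to a small script: confirming that no AEM Java process is running before the copy, and scanning nohup.out afterwards. The sketch below is only an illustration under stated assumptions. The function names are hypothetical. It assumes an AEM process can be recognised by "java" and "crx-quickstart" appearing in its command line, that failures show up in the log as *ERROR* entries or Java exception names, and that the "TarMK closed" line for the destination FileStore (as in the sample output above) marks a completed run.

```shell
#!/bin/sh
# Hypothetical helpers; names and match patterns are assumptions.

# Reads a process listing on stdin and prints the lines that look like an
# AEM Java process. Succeeds only if at least one such line is found.
find_aem_java() {
    grep 'java' | grep 'crx-quickstart'
}

# Succeeds if the log contains no *ERROR* entries or exception names and
# reports that the destination TarMK FileStore was closed.
upgrade_log_ok() {
    log_file=$1
    # Print any suspicious lines to stderr and fail if there are some.
    if grep -E '\*ERROR\*|Exception' "$log_file" >&2; then
        return 1
    fi
    # Matches the destination FileStore line, not the ReadOnlyFileStore one.
    grep -q 'file\.FileStore - TarMK closed' "$log_file"
}
```

Before starting oak-upgrade, something like ps -e -o args | find_aem_java should print nothing. After the run, upgrade_log_ok nohup.out returning success is a hint, not proof, that the copy finished cleanly, so reading the log by hand is still worthwhile.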

STOP


In my opinion, content migration has traditionally been a DevOps/AEM administrator task, but I encourage folks in developer roles to get their hands dirty too. This is an opportunity to learn the innards of Adobe Experience Manager (AEM).
References:
https://jackrabbit.apache.org/oak/docs/migration.html
Special thanks to:
@Adobe: Matt Vesely, Josh Hamer, Tom Blackford
