The following diagram describes the MTC custom resources (CRs). Each object is a standard Kubernetes CR. You can manage the MTC resources with the standard create, read, update, and delete operations by using the oc client or the MTC web console.
You can view the resources of a migration plan in the MTC web console:
Click the Options menu next to a migration plan and select View migration plan resources. The migration plan resources are displayed as a tree.
Click the arrow of a Backup or Restore resource to view its pods.
Click the Copy button of a pod to copy its oc get command to your clipboard, and paste the command on the command line (an illustrative example follows).
Click View Raw to inspect a pod. The resource is displayed in JSON format.
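For example, a copied command resembles the following; the pod name is illustrative:
$ oc get pod <pod_name> -n openshift-migration -o yaml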
The objects that you should examine usually depend on the stage at which the migration failed. The MTC debug flowchart provides information about which resources are relevant, depending on this failure stage.
Stage migrations have one Backup and one Restore resource. Final migrations have two Backup and two Restore resources. The first Backup resource captures the original, unaltered state of the application and its Kubernetes objects. This Backup is the source of truth. Then, the application is quiesced and a second Backup captures the storage-related resources (PVs, PVCs, data). The first Restore restores these storage objects on the target cluster. The final Restore restores the original application Backup to the target cluster.
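For example, you can list the Backup and Restore resources that a migration created; the namespace is assumed to be openshift-migration, where MTC creates these resources:
$ oc get backup,restore -n openshift-migration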
You can view the migration debug tree and query specific resources by using label selectors.
View the migmigration resources associated with the migration plan:
$ oc get migmigration -l 'migration.openshift.io/migplan-name=test'
NAME                                   READY   PLAN   STAGE   ITINERARY   PHASE
09a8bf20-fdc5-11ea-a447-cb5249018d21           test   false   Final       Completed
The columns display the associated plan name, itinerary step, and phase.
View the migmigration resources associated with the migration plan in a summarized form by using a Go template:
$ oc get migmigration -l 'migration.openshift.io/migplan-name=test' \
  -o go-template-file=./go-cli-templates/migmigration-completed-list.template
Name: 51886050-6d52-11eb-98e7-b515603f1bc7
Migplan: test
Result: SucceededWithWarnings

Name: b4d045f0-6d58-11eb-98e7-b515603f1bc7
Migplan: test
Result: SucceededWithWarnings
The Result parameter indicates the final status of each migration, in this case SucceededWithWarnings.
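The template files themselves are not shown here. As an illustration only, a minimal migmigration-completed-list.template could derive the result from the terminal status condition (shown later in this section), for example:
{{- range .items }}
Name: {{ .metadata.name }}
Migplan: {{ index .metadata.labels "migration.openshift.io/migplan-name" }}
{{- range .status.conditions }}
{{- if .reason }}{{- if eq .reason "Completed" }}
Result: {{ .type }}
{{- end }}{{- end }}
{{- end }}
{{ end }}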
View the warnings of the migmigration resources associated with the migration plan:
$ oc get migmigration -l 'migration.openshift.io/migplan-name=test' \
  -o go-template-file=go-cli-templates/migmigration-display-warning-list.template
Name: 51886050-6d52-11eb-98e7-b515603f1bc7
Migplan: test
Warning: DirectVolumeMigrationFailed
Message: DirectVolumeMigration (dvm): openshift-migration/51886050-6d52-11eb-98e7-b515603f1bc7-z9zfj failed. See in dvm status.Errors
errors: <no value>

Name: b4d045f0-6d58-11eb-98e7-b515603f1bc7
Migplan: test
Warning: DirectVolumeMigrationFailed
Message: DirectVolumeMigration (dvm): openshift-migration/b4d045f0-6d58-11eb-98e7-b515603f1bc7-4c894 failed. See in dvm status.Errors
errors: <no value>
View the Backup resources:
$ oc get backup -n openshift-migration
NAME                                   AGE
88435fe0-c9f8-11e9-85e6-5d593ce65e10   6m42s
You can use the same commands to view the Restore resources.
Describe a Backup resource to view its details:
$ oc describe backup 88435fe0-c9f8-11e9-85e6-5d593ce65e10 -n openshift-migration
View the DirectVolumeMigration (dvm) resources associated with a migmigration:
$ oc get dvm -l migmigration=<uid>
NAME                                         AGE
b4d045f0-6d58-11eb-98e7-b515603f1bc7-4c894   5d21h
$ oc get dvm -l migmigration=<uid> \
  -o go-template-file=go-cli-templates/dvm-display-failure-list.template
Name: b4d045f0-6d58-11eb-98e7-b515603f1bc7-4c894
State: Failed
Phase: WaitForRsyncClientPodsCompleted
Message: The migration has failed. See: Errors.
errors: [One or more pods are in error state]
See Viewing migration custom resources for more information.
Debugging Backup and Restore resources with the Velero CLI
You can debug Backup and Restore resources with the Velero command line interface (CLI). The Velero CLI is included in the velero pod.
The Velero CLI examines lower-level information in the object storage associated with a Backup or Restore resource. This information can reveal why a resource was not restored or why a Velero operation failed.
Velero CLI commands use the following syntax:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero <resource> <command> <resource_id>
You can specify velero-<pod> -n openshift-migration in place of $(oc get pods -n openshift-migration -o name | grep velero).
For a full list of commands, run
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero --help
The describe command provides a summary of warnings and errors associated with a Velero resource:
velero <resource> describe <resource_id>
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
The logs command provides the lower-level log output associated with a Velero resource:
velero <resource> logs <resource_id>
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
You can debug a partial migration failure by using the Velero logs command to examine the Restore logs. A partial failure occurs when Velero encounters an issue but the issue does not cause the migration to fail.
For example, if a custom resource (CR) cannot be restored because the custom resource definition (CRD) is missing or is a different version on the target cluster, Velero logs the issue and then processes the rest of the objects in the Backup.
In the future, Velero error reporting will be improved. See MIG-353: Enhance Velero error reporting so that problems that cause partial failures (and even full failures) are more visible in a structured way.
The following example describes how to debug a partially failed Restore caused by group, version, kind (GVK) incompatibility: the CRD version on the source cluster differs from the CRD version on the target cluster. See the GVK incompatibility exercise for details.
Check the status of the MigMigration instance associated with the partial failure:
$ oc get migmigration ccc7c2d0-6017-11eb-afab-85d0007f5a19 -o yaml
status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: "2021-01-26T20:48:40Z"
    message: 'Final Restore openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf: partially failed on destination cluster'
    status: "True"
    type: VeleroFinalRestorePartiallyFailed
  - category: Advisory
    durable: true
    lastTransitionTime: "2021-01-26T20:48:42Z"
    message: The migration has completed with warnings, please look at `Warn` conditions.
    reason: Completed
    status: "True"
    type: SucceededWithWarnings
Check the status of the Restore resource by running the Velero describe command:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -n openshift-migration -- ./velero restore describe ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
Phase: PartiallyFailed (run 'velero restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf' for more information)

Errors:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    gvk-demo:  error restoring gvkdemoes.konveyor.openshift.io/gvk-demo/gvk-demo: the server could not find the requested resource
Check the Restore logs by running the Velero logs command:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -n openshift-migration -- ./velero restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
time="2021-01-26T20:48:37Z" level=info msg="Attempting to restore GvkDemo: gvk-demo" logSource="pkg/restore/restore.go:1107" restore=openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf time="2021-01-26T20:48:37Z" level=info msg="error restoring gvk-demo: the server could not find the requested resource" logSource="pkg/restore/restore.go:1170" restore=openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
The Restore log error message, the server could not find the requested resource, indicates the cause of the partially failed migration.
The following error message might appear when you log in to the MTC console for the first time:
A certificate error has occurred, likely caused by using self-signed CA certificates in one of the clusters. Navigate to the following URL and accept the certificate: `https://www.example.com:6443/.well-known/oauth-authorization-server`. If an "Unauthorized" message appears after you have accepted the certificate, refresh the web page. To fix this issue permanently, add the certificate to your web browser's trust store.
Possible causes are self-signed certificates or network access issues.
Self-signed CA certificates: Navigate to the oauth-authorization-server URL in the error message and accept the certificate.
MTC 1.3.1 and earlier: The MTC console performs OAuth authentication on the client side.
The console requires uninterrupted network access to the API server and the OAuth server.
MTC 1.3.2 and later: OAuth authentication is performed on the backend.
If you accept a self-signed certificate and a blank page appears, followed by a Connection has timed out message, the likely cause is a web proxy blocking access to the OAuth server.
Configure the web proxy to allow access to the
oauth-authorization-server URL. See BZ#1890675.
You can use the must-gather tool to collect information for troubleshooting or for opening a customer support case on the Red Hat Customer Portal. The openshift-migration-must-gather-rhel8 image collects migration-specific logs and Custom Resource data that are not collected by the default must-gather image.
Run the must-gather command on your cluster:
$ oc adm must-gather --image=openshift-migration-must-gather-rhel8:v1.3.0
The must-gather tool generates a local directory that contains the collected data.
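The directory name is generated with a local suffix; for example, to locate it in the current working directory:
$ ls -d must-gather.local.*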
For direct volume migrations (DVM), MTC creates several dependencies on the source and target clusters, such as Rsync transfer pods, Rsync client pods, and routes.
If a direct volume migration fails to complete, the failure usually falls into one of the following categories of errors:
Error creating dependencies: If creating any of these dependencies fails, the migration fails. The error message is present in the DVM status errors field. Use the dvm failure list template shown earlier to find it.
Stuck waiting for dependencies to become healthy: The pods on the source or destination cluster might not be running, or the route might not be admitted. In both cases, DVM hangs until the pod is running or the route is admitted. When the pods are stuck in an unhealthy state, a Warning appears in the DVM status within 10 minutes. Use the dvm commands shown earlier to see the warning in the DVM status.
Rsync exits with an error: This happens when all the dependencies are met and healthy but Rsync fails for some reason. The DVM controller does not clean up failed Rsync client pods, so their logs can be inspected, as shown in the sketch after this list.
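Because the failed Rsync client pods are retained, you can inspect them directly. A sketch, assuming the pods remain in the namespace being migrated (the names are illustrative):
$ oc get pods -n <migrated_namespace> | grep rsync
$ oc logs <rsync_client_pod> -n <migrated_namespace>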
A very likely cause of direct volume migration failing to complete is that the Rsync transfer pods on the target cluster remain in a Pending state.
MTC migrates namespaces with all annotations in order to preserve security context constraints and scheduling requirements. During direct volume migration, MTC creates Rsync transfer pods on the target cluster in the namespaces that were migrated from the source cluster. If the target cluster does not have the same node labels as the source cluster, the Rsync transfer pods cannot be scheduled.
You can check the MigMigration CR status:
$ oc describe migmigration 88435fe0-c9f8-11e9-85e6-5d593ce65e10 -n openshift-migration
The output displays the following status message:
Some or all transfer pods are not running for more than 10 mins on destination cluster
To resolve this issue, perform the following steps:
Check the openshift.io/node-selector annotation of the migrated namespaces on the source cluster:
$ oc get namespace -o yaml
Add the openshift.io/node-selector annotation to each migrated namespace on the target cluster:
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: "region=east"
...
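Alternatively, a one-line equivalent; the annotation value region=east is illustrative and must match your source cluster:
$ oc annotate namespace <namespace> openshift.io/node-selector="region=east"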
You can use must-gather to create a metrics data directory dump for the last day:
$ oc adm must-gather --image quay.io/konveyor/must-gather:latest -- /usr/bin/gather_metrics_dump
You can view the data with a local Prometheus instance.
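As a minimal sketch, assuming the dump is a Prometheus TSDB directory (the path inside the must-gather output is illustrative), you can run a throwaway Prometheus container against it:
$ podman run --rm -p 9090:9090 \
    -v ./must-gather.local.<id>/metrics/prometheus:/prometheus:z \
    prom/prometheus
You can then browse http://localhost:9090 to run queries against the imported data.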
For information about the metrics recorded by the MTC controller, see the MTC metrics documentation, which includes useful queries for performance monitoring.
Ensure that stage pods are cleaned up. If a migration fails during stage or copy, the stage pods are retained to allow debugging. Before retrying a migration, you must delete the stage pods manually.
If your application was quiesced during migration, you should unquiesce it by scaling it back to its initial replica count. You can do this manually by editing the deployment primitive (Deployment, DeploymentConfig, and so on) and setting the spec.replicas field back to its original, non-zero value:
$ oc edit deployment <deployment_name>
Alternatively, you can scale your deployment with the
oc scale command:
$ oc scale deployment <deployment_name> --replicas=<desired_replicas>
When a source application is quiesced during migration, MTC adds an annotation indicating the original replica count to the deployment resource:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    migration.openshift.io/preQuiesceReplicas: "1"
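As a convenience, a sketch that reads the recorded replica count back into the oc scale command; this assumes the annotation is present on the Deployment:
$ oc scale deployment <deployment_name> \
    --replicas=$(oc get deployment <deployment_name> \
    -o go-template='{{ index .metadata.annotations "migration.openshift.io/preQuiesceReplicas" }}')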
The following procedure removes the MTC Operator and cluster-scoped resources:
Delete the Migration Controller and its resources:
$ oc delete migrationcontroller <resource_name>
Wait for the MTC Operator to finish deleting the resources.
Uninstall the MTC Operator:
$ oc delete ns openshift-migration
$ oc delete -f operator.yml
Delete the cluster-scoped resources:
$ oc delete $(oc get crds -o name | grep 'migration.openshift.io')
$ oc delete $(oc get crds -o name | grep 'velero')
$ oc delete $(oc get clusterroles -o name | grep 'migration.openshift.io')
$ oc delete clusterrole migration-operator
$ oc delete $(oc get clusterroles -o name | grep 'velero')
$ oc delete $(oc get clusterrolebindings -o name | grep 'migration.openshift.io')
$ oc delete clusterrolebindings migration-operator
$ oc delete $(oc get clusterrolebindings -o name | grep 'velero')
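To verify that the cluster-scoped resources were deleted, you can, for example, check that no MTC CRDs remain:
$ oc get crds -o name | grep 'migration.openshift.io'
The command returns no output when the deletion has succeeded.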