The following diagram describes the MTC custom resources (CRs). Each object is a standard Kubernetes CR.
You can manage MTC resources with the standard create, read, update, and delete operations by using the kubectl and oc clients, or by using the MTC web console.
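For example, the following command lists the MigPlan resources, assuming the default openshift-migration namespace:
$ oc get migplan -n openshift-migration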
You can view the resources of a migration plan in the MTC web console:
Click the Options menu next to a migration plan and select View migration plan resources.
The migration plan resources are displayed as a tree.
Click the arrow of a Backup or Restore resource to view its pods.
Click the Copy button of a pod to copy the oc get command for that pod to your clipboard, and then paste the command on the command line.
Click View Raw to inspect a pod.
The resource is displayed in JSON format.
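A command copied with the Copy button typically has the following form; the pod name is a placeholder:
$ oc get pod <pod_name> -n openshift-migration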
The objects that you need to examine usually depend on the stage at which the migration failed. The MTC debug flowchart shows which resources are relevant for each failure stage.
Stage migrations have one Backup resource and one Restore resource.
Final migrations have two Backup resources and two Restore resources. The first Backup resource captures the original, unaltered state of the application and its Kubernetes objects. This Backup resource is the source of truth. The application is then quiesced, and a second Backup resource captures the storage-related resources (PVs, PVCs, and data). The first Restore resource restores these storage objects on the target cluster. The final Restore resource restores the original application Backup to the target cluster.
In addition to viewing the migration debug tree in the web console, you can query specific resources on the command line by using label selectors.
List the migmigration resources associated with the test migration plan:
$ oc get migmigration -l 'migration.openshift.io/migplan-name=test'
Example output
NAME READY PLAN STAGE ITINERARY PHASE
09a8bf20-fdc5-11ea-a447-cb5249018d21 test false Final Completed
The columns display the associated plan name, itinerary step, and phase.
List the migmigration resources associated with the test migration plan, using a Go template to display the migration result:
$ oc get migmigration -l 'migration.openshift.io/migplan-name=test' -o \
go-template-file=./go-cli-templates/migmigration-completed-list.template
Example output
Name: 51886050-6d52-11eb-98e7-b515603f1bc7
Migplan: test
Result: SucceededWithWarnings
Name: b4d045f0-6d58-11eb-98e7-b515603f1bc7
Migplan: test
Result: SucceededWithWarnings
The Result parameter indicates a status of Succeeded, SucceededWithWarnings, or Failed.
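The template files referenced in these commands are not shown in this document. The following is a minimal sketch of what a completed-list template might contain, assuming the result is reported as a terminal status condition of type Succeeded, SucceededWithWarnings, or Failed; adjust the field paths to match your MigMigration status:
{{/* Iterate over the returned MigMigration resources. */}}
{{ range .items }}Name: {{ .metadata.name }}
Migplan: {{ index .metadata.labels "migration.openshift.io/migplan-name" }}
{{/* Print the terminal condition type as the result. */}}
{{ range .status.conditions }}{{ if eq .type "Succeeded" "SucceededWithWarnings" "Failed" }}Result: {{ .type }}
{{ end }}{{ end }}{{ end }}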
List the warnings for the migmigration resources associated with the test migration plan, using a Go template:
$ oc get migmigration -l 'migration.openshift.io/migplan-name=test' -o \
go-template-file=go-cli-templates/migmigration-display-warning-list.template
Example output
Name: 51886050-6d52-11eb-98e7-b515603f1bc7
Migplan: test
Warning: DirectVolumeMigrationFailed
Message: DirectVolumeMigration (dvm): openshift-migration/51886050-6d52-11eb-98e7-b515603f1bc7-z9zfj failed. See in dvm status.Errors
errors: <no value>
Name: b4d045f0-6d58-11eb-98e7-b515603f1bc7
Migplan: test
Warning: DirectVolumeMigrationFailed
Message: DirectVolumeMigration (dvm): openshift-migration/b4d045f0-6d58-11eb-98e7-b515603f1bc7-4c894 failed. See in dvm status.Errors
errors: <no value>
List the Backup resources:
$ oc get backup -n openshift-migration
Example output
NAME AGE
88435fe0-c9f8-11e9-85e6-5d593ce65e10 6m42s
You can use the same command to view Restore resources.
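For example:
$ oc get restore -n openshift-migration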
Describe a Backup resource:
$ oc describe backup 88435fe0-c9f8-11e9-85e6-5d593ce65e10 -n openshift-migration
List the DirectVolumeMigration (dvm) resources associated with a MigMigration by its UID:
$ oc get dvm -l migmigration=<uid>
Example output
NAME AGE
b4d045f0-6d58-11eb-98e7-b515603f1bc7-4c894 5d21h
Display the failure details of the DirectVolumeMigration resources by using a Go template:
$ oc get dvm -l migmigration=<uid> \
-o go-template-file=go-cli-templates/dvm-display-failure-list.template
Example output
Name: b4d045f0-6d58-11eb-98e7-b515603f1bc7-4c894
State: Failed
Phase: WaitForRsyncClientPodsCompleted
Message: The migration has failed. See: Errors.
errors: [One or more pods are in error state]
See Viewing migration custom resources for more information.
You can debug Backup and Restore resources with the Velero command line interface (CLI). The Velero CLI is included in the velero pod.
The Velero CLI examines lower-level information in the object storage associated with a Backup or Restore resource. This information can reveal why a resource was not restored or why a Velero operation failed.
Velero CLI commands use the following syntax:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero <resource> <command> <resource_id>
You can specify velero-<pod> -n openshift-migration in place of $(oc get pods -n openshift-migration -o name | grep velero).
For a full list of commands, run velero --help:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero --help
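As a convenience, you can store the pod lookup in a shell variable and reuse it. The velero backup get command, which lists Backup resources, is shown here only as an illustration:
$ VELERO_POD=$(oc get pods -n openshift-migration -o name | grep velero)
$ oc exec $VELERO_POD -n openshift-migration -- ./velero backup get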
The Velero describe command provides a summary of warnings and errors associated with a Velero resource:
velero <resource> describe <resource_id>
Example
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
The Velero logs command provides a lower-level output of the logs associated with a Velero resource:
velero <resource> logs <resource_id>
Example
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
You can debug a partial migration failure by using the Velero logs command to examine the Restore logs. A partial failure occurs when Velero encounters an issue but the issue does not cause the migration to fail.
For example, if a custom resource (CR) cannot be restored because the custom resource definition (CRD) is missing or is a different version on the target cluster, Velero logs the issue and then processes the rest of the objects in the Backup resource.
In the future, Velero error reporting will be improved. See MIG-353: Enhance Velero error reporting so problems that cause partial failures (and even full failures) are more visible in a structured way.
Debug example
The following example describes how to debug a Restore that partially failed because of GVK incompatibility. The CRD version on the source cluster differs from the CRD version on the target cluster. See the GVK incompatibility exercise for details.
Check the MigMigration CR associated with the partial failure:
$ oc get migmigration ccc7c2d0-6017-11eb-afab-85d0007f5a19 -o yaml
Example output
status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: "2021-01-26T20:48:40Z"
    message: 'Final Restore openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf: partially failed on destination cluster'
    status: "True"
    type: VeleroFinalRestorePartiallyFailed
  - category: Advisory
    durable: true
    lastTransitionTime: "2021-01-26T20:48:42Z"
    message: The migration has completed with warnings, please look at `Warn` conditions.
    reason: Completed
    status: "True"
    type: SucceededWithWarnings
Check the status of the Restore resource by running the Velero describe command:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -n openshift-migration -- ./velero restore describe ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
Example output
Phase: PartiallyFailed (run 'velero restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf' for more information)
Errors:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    gvk-demo:  error restoring gvkdemoes.konveyor.openshift.io/gvk-demo/gvk-demo: the server could not find the requested resource
Check the Restore logs by running the Velero logs command:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -n openshift-migration -- ./velero restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
Example output
time="2021-01-26T20:48:37Z" level=info msg="Attempting to restore GvkDemo: gvk-demo" logSource="pkg/restore/restore.go:1107" restore=openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
time="2021-01-26T20:48:37Z" level=info msg="error restoring gvk-demo: the server could not find the requested resource" logSource="pkg/restore/restore.go:1170" restore=openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
The Restore log error message, the server could not find the requested resource, indicates the cause of the partially failed migration.
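In a case like this, you can confirm whether the CRD exists on the target cluster and compare its versions with the source cluster. The CRD name in the following command is taken from the error message above:
$ oc get crd gvkdemoes.konveyor.openshift.io -o yaml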
The following error message might appear when you log in to the MTC console for the first time:
A certificate error has occurred, likely caused by using self-signed CA certificates in one of the clusters. Navigate to the following URL and accept the certificate:
`https://www.example.com:6443/.well-known/oauth-authorization-server`.
If an "Unauthorized" message appears after you have accepted the certificate, refresh the web page.
To fix this issue permanently, add the certificate to your web browser's trust store.
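If you need to export the certificate chain so that you can import it into the trust store, one possible way is with openssl; replace www.example.com with your cluster API host:
$ openssl s_client -connect www.example.com:6443 -showcerts </dev/null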
Possible causes are self-signed CA certificates or network access issues.
Self-signed CA certificates: Navigate to the oauth-authorization-server URL and accept the certificate.
Network access:
MTC 1.3.1 and earlier: The MTC console performs OAuth authentication on the client side.
The console requires uninterrupted network access to the API server and the OAuth server.
MTC 1.3.2 and later: OAuth authentication is performed on the backend.
The console requires uninterrupted network access to the Node.js server, which provides the JavaScript bundle and performs OAuth authentication, and the API server. See BZ#1878824.
If you accept a self-signed certificate and a blank page appears, followed by a Connection has timed out message, the likely cause is a web proxy blocking access to the OAuth server.
Update the web proxy configuration to allow access to the oauth-authorization-server URL. See BZ#1890675.
You can use the must-gather tool to collect information for troubleshooting or for opening a customer support case on the Red Hat Customer Portal. The openshift-migration-must-gather-rhel8 image collects migration-specific logs and custom resource data that are not collected by the default must-gather image.
Run the must-gather command on your cluster:
$ oc adm must-gather --image=openshift-migration-must-gather-rhel8:v1.3.0
The must-gather tool generates a local directory that contains the collected data.
Direct volume migration creates several dependencies. If a direct volume migration fails to complete, the failure usually falls into one of the following categories of errors:
Error in creating dependencies: If creating any of the dependencies fails, the migration fails. The error message is present in the DVM status and in the errors fields. Use the dvm failure list command shown earlier to find the error.
Stuck waiting for dependencies to become healthy: The pods on the source or destination cluster might not be running, or the route might not be admitted. In both cases, DVM hangs until the pod is running or the route is admitted. If the pods remain in an unhealthy state, a warning appears in the DVM status within 10 minutes. Use the dvm command to see the warning in the DVM status.
Rsync exits with an error: All the dependencies are met and healthy, but Rsync fails for some reason. The DVM controller does not clean up the failed Rsync client pods, so you can inspect their logs, as shown in the following example.
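For the Rsync error case, you can inspect the retained Rsync client pods in the migrated namespace. The grep pattern is an assumption about the pod names and might need adjusting:
$ oc get pods -n <namespace> | grep rsync
$ oc logs <rsync_client_pod> -n <namespace>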
A very likely cause of direct volume migration failing to complete is that the Rsync transfer pods on the target cluster remain in a Pending state.
MTC migrates namespaces with all annotations in order to preserve security context constraints and scheduling requirements. During direct volume migration, MTC creates Rsync transfer pods on the target cluster in the namespaces that were migrated from the source cluster. If the target cluster does not have the same node labels as the source cluster, the Rsync transfer pods cannot be scheduled.
You can check the migmigration CR status:
$ oc describe migmigration 88435fe0-c9f8-11e9-85e6-5d593ce65e10 -n openshift-migration
The output displays the following status message:
Some or all transfer pods are not running for more than 10 mins on destination cluster
To resolve this issue, perform the following steps:
Check the openshift.io/node-selector annotation of the migrated namespaces on the source cluster:
$ oc get namespace -o yaml
Add the openshift.io/node-selector annotation to each migrated namespace on the target cluster:
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: "region=east"
...
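Instead of editing the namespace YAML, you can add the annotation with oc annotate; the region=east value is taken from the example above and must match your source cluster:
$ oc annotate namespace <namespace> openshift.io/node-selector="region=east"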
You can use must-gather to create a metrics data directory dump from the last day:
$ oc adm must-gather --image quay.io/konveyor/must-gather:latest -- /usr/bin/gather_metrics_dump
You can view the data with a local Prometheus instance.
For information about the metrics recorded by the MTC controller, see the mig-operator documentation.
This documentation includes useful queries for performance monitoring.
Ensure that stage pods are cleaned up. If a migration fails during stage or copy, the stage pods are retained to allow debugging. Before retrying a migration, you must delete the stage pods manually.
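How the stage pods are named can vary; as a starting point, and only as an assumption, you can look for pods with stage in their names in the migrated namespace and delete them:
$ oc get pods -n <namespace> | grep stage
$ oc delete pod <stage_pod_name> -n <namespace>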
If your application was quiesced during migration, you should unquiesce it by scaling it back to its initial replica count.
This can be done manually by editing the deployment primitive (Deployment, DeploymentConfig, and so on) and setting the spec.replicas field back to its original, non-zero value:
$ oc edit deployment <deployment_name>
Alternatively, you can scale your deployment with the oc scale command:
$ oc scale deployment <deployment_name> --replicas=<desired_replicas>
When a source application is quiesced during migration, MTC adds an annotation indicating the original replica count to the Deployment resource:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    migration.openshift.io/preQuiesceReplicas: "1"
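You can read the annotation and use its value as the target replica count. This is a sketch; the jsonpath expression escapes the dots in the annotation key:
$ oc get deployment <deployment_name> -o jsonpath='{.metadata.annotations.migration\.openshift\.io/preQuiesceReplicas}'
$ oc scale deployment <deployment_name> --replicas=<value_from_annotation>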
The following procedure removes the MTC Operator and cluster-scoped resources:
Delete the Migration Controller and its resources:
$ oc delete migrationcontroller <resource_name>
Wait for the MTC Operator to finish deleting the resources.
Uninstall the MTC Operator:
$ oc delete ns openshift-migration
$ oc delete -f operator.yml
Delete the cluster-scoped resources:
$ oc delete $(oc get crds -o name | grep 'migration.openshift.io')
$ oc delete $(oc get crds -o name | grep 'velero')
$ oc delete $(oc get clusterroles -o name | grep 'migration.openshift.io')
$ oc delete clusterrole migration-operator
$ oc delete $(oc get clusterroles -o name | grep 'velero')
$ oc delete $(oc get clusterrolebindings -o name | grep 'migration.openshift.io')
$ oc delete clusterrolebindings migration-operator
$ oc delete $(oc get clusterrolebindings -o name | grep 'velero')