1.	Upgrade of the admin server has failed.
	Check the Administration Server upgrade log file `/var/log/crowbar/admin-server-upgrade.log` for empty, broken, and unsigned repositories. Fix the repository setup. Then manually upgrade the remaining packages to SUSE Linux Enterprise Server 12 SP4 and SUSE OpenStack Cloud 9 using the zypper dup command. Reboot the Administration Server.
2.	An upgrade step repeatedly fails due to timeout.
	Timeouts for most upgrade operations can be adjusted in the `/etc/crowbar/upgrade_timeouts.yml` file. If the file does not exist, create it from the following template and adjust the values to your needs:

	    :prepare_repositories: 120
	    :pre_upgrade: 300
	    :upgrade_os: 1500
	    :post_upgrade: 600
	    :shutdown_services: 600
	    :shutdown_remaining_services: 600
	    :evacuate_host: 300
	    :chef_upgraded: 1200
	    :router_migration: 600
	    :lbaas_evacuation: 600
	    :set_network_agents_state: 300
	    :delete_pacemaker_resources: 600
	    :delete_cinder_services: 300
	    :delete_nova_services: 300
	    :wait_until_compute_started: 60
	    :reload_nova_services: 120
	    :online_migrations: 1800

	The following entries may require higher values (all values are in seconds):

	`upgrade_os`: Time allowed for upgrading all packages of one node.
	`chef_upgraded`: Time allowed for the initial `crowbar_join` and `chef-client` run on a node that has been upgraded and rebooted.
	`evacuate_host`: Time allowed for live-migrating all VMs off a host.
3.	Node upgrade has failed during live migration.
	The problem may occur when it is not possible to live-migrate certain VMs anywhere. It may be necessary to shut down or suspend other VMs to make room for the migration. Note that the Bash shell script that starts the live migration for the Compute Node is executed from the Control Node. The error message generated by the crowbarctl upgrade status command contains the exact names of both nodes. Check the `/var/log/crowbar/node-upgrade.log` file on the Control Node for information that can help with troubleshooting. You might also need to check the OpenStack logs in `/var/log/nova` on the Compute Node as well as on the Control Nodes.

	Live migration of a certain VM can also take too long. This can happen if instances are very large or if the network connection between the compute hosts is slow or overloaded. In this case, try raising the timeout for this step (`evacuate_host`) in `/etc/crowbar/upgrade_timeouts.yml`.

	We recommend performing the live migration manually first. After it has completed successfully, run the crowbarctl upgrade command again.

	The following commands can be helpful for analyzing issues with live migrations:

	    nova server-migration-list
	    nova server-migration-show
	    nova instance-action-list
	    nova instance-action

	Note that these commands require OpenStack administrator privileges.

	The following log files may contain useful information:

	`/var/log/nova/nova-compute` on the Compute Nodes that the migration is performed from and to.
	`/var/log/nova/*.log` (especially the log files for the conductor, scheduler, and placement services) on the Control Nodes.

	Active instances and instances under heavy load may not be live-migrated in a reasonable time. In that case, you can abort a running live migration with the nova live-migration-abort `SERVER` `MIGRATION-ID` command and upgrade the specific node at a later time.

	Alternatively, you can force the live migration to complete with the nova live-migration-force-complete `SERVER` `MIGRATION-ID` command. However, this might pause the instance for a prolonged period and have a negative impact on the workload running inside it.
4.	Node has failed during OS upgrade.
	Possible reasons include an incorrect repository setup or package conflicts. Check the `/var/log/crowbar/node-upgrade.log` log file on the affected node. Check the repositories on the node using the zypper lr command and make sure the required repositories are available. To test the setup, install a package manually or run the zypper dup command (this is the command the upgrade script runs). Fix the repository setup and run the failed upgrade step again. If custom package versions or version locks are in place, make sure that they do not interfere with the zypper dup command.
5.	Node does not come up after reboot.
	In some cases, a node can take so long to reboot that the step times out. We recommend checking the node manually, making sure it is online, and then repeating the step.
6.	N nodes were provided to the compute upgrade using crowbarctl upgrade nodes node_1,node_2,...,node_N, but fewer than N were actually upgraded.
	If the live migration cannot be performed for certain nodes due to a timeout, Crowbar upgrades only the nodes that it was able to live-evacuate in the specified time. Because some nodes have now been upgraded, more resources may be available for live migration when you run this step again. See also question 3.
7.	Node has failed at the initial chef client run stage.
	An unsupported entry in the configuration file may prevent a service from starting. This causes the node to fail at the initial chef client run stage. Checking the `/var/log/crowbar/crowbar_join/chef.*` log files on the node is a good starting point.
8.	I need to change the OpenStack configuration during the upgrade, but I cannot access Crowbar.
	The Crowbar Web interface is accessible only when an upgrade is completed or when it is postponed. You can postpone the upgrade with the crowbarctl upgrade nodes postpone command, but only after all Control Nodes have been upgraded. You can then access Crowbar and save your modifications. Before you continue with the upgrade of the remaining nodes, resume the upgrade using the crowbarctl upgrade nodes resume command.
9.	Failure occurred when evacuating routers.
	Check the `/var/log/crowbar/node-upgrade.log` file on the node that performs the router evacuation (it should be mentioned in the error message). The ID of the router that failed to migrate (or of the affected network port) is logged to `/var/log/crowbar/node-upgrade.log`. Use the OpenStack CLI tools to check the state of the affected router and its ports. If necessary, fix the problem manually, for example by setting the router or port down and up again. The following commands can be useful for solving the issue:

	    openstack router show ROUTER-ID
	    openstack port list --router ROUTER-ID
	    openstack port show PORT-ID
	    openstack port set

	Resume the upgrade by running the failed upgrade step again to continue with the router migration.
10.	Some non-controller nodes were upgraded after performing crowbarctl upgrade nodes controllers.
	In the current upgrade implementation, OpenStack nodes are divided into Compute Nodes and other nodes. The crowbarctl upgrade nodes controllers command starts the upgrade of all the nodes that do not host compute services. This includes the controllers.
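The checks from item 1 can be bundled into a shell function. This is a minimal sketch, assuming it runs as root on the Administration Server after the repository setup has been fixed; the function name `finish_admin_server_upgrade` and the 50-line log excerpt are hypothetical choices, not part of Crowbar.

```bash
# Hypothetical helper (not part of Crowbar): re-check the repositories and
# finish the Administration Server upgrade by hand. Run as root.
finish_admin_server_upgrade() {
  local log=/var/log/crowbar/admin-server-upgrade.log

  # Show the end of the upgrade log, where repository problems are reported.
  tail -n 50 "$log"

  # List all repositories with their URIs to spot empty or wrong entries.
  zypper lr -u || return 1

  # Refresh the metadata; broken or unsigned repositories are reported here.
  zypper refresh || return 1

  # Upgrade the remaining packages (interactive: review the proposed changes).
  zypper dup || return 1

  echo "Package upgrade finished; reboot the Administration Server now."
}
```

A possible invocation is `finish_admin_server_upgrade && reboot`, which reboots only if every step succeeded.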
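To raise a single value from item 2 without editing the file by hand, a helper like the following could be used. It is a sketch that assumes the file keeps the flat `:key: seconds` layout of the template in item 2 and that GNU `sed -i` is available (as on SUSE Linux Enterprise Server); the function name `set_upgrade_timeout` is hypothetical.

```bash
# Hypothetical helper: set one timeout (in seconds) in the upgrade timeouts file.
# Usage: set_upgrade_timeout KEY SECONDS [FILE]
set_upgrade_timeout() {
  local key=$1 seconds=$2
  local file=${3:-/etc/crowbar/upgrade_timeouts.yml}

  # Template keys consist of lowercase letters and underscores only.
  case $key in
    ''|*[!a-z_]*) echo "invalid key: '$key'" >&2; return 1 ;;
  esac
  # Timeouts are whole numbers of seconds.
  case $seconds in
    ''|*[!0-9]*) echo "invalid timeout: '$seconds'" >&2; return 1 ;;
  esac

  # The file should be created from the template first.
  if [ ! -f "$file" ]; then
    echo "$file not found; create it from the template first" >&2
    return 1
  fi

  if grep -q "^:${key}:" "$file"; then
    # Replace the value of an existing entry.
    sed -i "s/^:${key}:.*/:${key}: ${seconds}/" "$file"
  else
    # Add the entry if the file does not contain it yet.
    printf ':%s: %s\n' "$key" "$seconds" >> "$file"
  fi
}
```

For example, `set_upgrade_timeout evacuate_host 1200` doubles the live-migration allowance of a host compared with the template value of 600 would for other steps; choose values that match the size of your instances and network.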
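The `nova` commands from item 3 can be wrapped as shown below. This sketch assumes OpenStack administrator credentials are already loaded in the environment; python-novaclient expects the server (name or ID) in addition to the migration ID for `server-migration-show`, `live-migration-abort`, and `live-migration-force-complete`. The function names `inspect_migration` and `end_migration` are hypothetical.

```bash
# Hypothetical helpers for analyzing and ending a live migration.
# Requires OpenStack administrator credentials in the environment.

# Usage: inspect_migration SERVER [MIGRATION-ID]
inspect_migration() {
  local server=$1 migration=${2:-}
  # In-progress live migrations of the instance, including their IDs.
  nova server-migration-list "$server"
  # Details of one migration, if an ID was given.
  if [ -n "$migration" ]; then
    nova server-migration-show "$server" "$migration"
  fi
  # Actions recorded for the instance, such as live migrations.
  nova instance-action-list "$server"
}

# Usage: end_migration SERVER MIGRATION-ID [abort|force]
end_migration() {
  local server=$1 migration=$2 mode=${3:-abort}
  case $mode in
    abort) nova live-migration-abort "$server" "$migration" ;;
    force) nova live-migration-force-complete "$server" "$migration" ;;
    *) echo "unknown mode: $mode (use abort or force)" >&2; return 1 ;;
  esac
}
```

Defaulting to `abort` reflects the trade-off in item 3: aborting postpones the node upgrade, while forcing completion may pause the instance for a long time.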
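For item 4, the repository checks can be run on the affected node without changing the system, by simulating the distribution upgrade. A sketch, assuming it runs as root on that node; the function name `check_node_repos` is hypothetical.

```bash
# Hypothetical helper: check the package setup on a node before re-running
# the failed OS upgrade step. Run as root on the affected node.
check_node_repos() {
  # End of the node upgrade log, where the failure is recorded.
  tail -n 50 /var/log/crowbar/node-upgrade.log
  # Configured repositories and their URIs.
  zypper lr -u || return 1
  # Package locks that might interfere with the distribution upgrade.
  zypper locks
  # Simulate the upgrade; dependency problems are reported, nothing is installed.
  zypper --non-interactive dup --dry-run
}
```

The function's exit status is that of the dry run, so a non-zero result points at a repository or dependency problem to fix before repeating the step.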
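Before repeating the step in item 5, you may want to wait until the node answers again. The sketch below only checks reachability with `ping`, assuming ICMP is allowed between the Administration Server and the node; a node that answers may still be starting services, so check it manually as well. The function name `wait_for_node`, the 10-second polling interval, and the 600-second default limit are arbitrary choices.

```bash
# Hypothetical helper: wait until a node answers ping again after a reboot.
# Usage: wait_for_node HOST [MAX_SECONDS]
wait_for_node() {
  local host=$1 max=${2:-600} waited=0
  # One echo request per attempt, waiting up to 2 seconds for a reply.
  until ping -c 1 -W 2 "$host" >/dev/null 2>&1; do
    if [ "$waited" -ge "$max" ]; then
      echo "$host is still unreachable after ${waited}s" >&2
      return 1
    fi
    sleep 10
    waited=$((waited + 10))
  done
  echo "$host is reachable"
}
```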
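For item 7, a quick way to start is to look at the error lines of the newest chef log. This sketch assumes the log lines carry the Chef log level (usually `ERROR` or `FATAL` for problems) and that the newest file under `/var/log/crowbar/crowbar_join/` belongs to the failed run; the function name `show_chef_errors` is hypothetical.

```bash
# Hypothetical helper: show error lines from the newest chef log written by
# crowbar_join. Usage: show_chef_errors [LOG_DIR]
show_chef_errors() {
  local dir=${1:-/var/log/crowbar/crowbar_join}
  local latest
  # Pick the most recently modified chef.* log file.
  latest=$(ls -t "$dir"/chef.* 2>/dev/null | head -n 1)
  if [ -z "$latest" ]; then
    echo "no chef.* logs found in $dir" >&2
    return 1
  fi
  echo "== $latest"
  # Show lines mentioning ERROR or FATAL, with their line numbers.
  grep -n -E 'ERROR|FATAL' "$latest" || echo "no ERROR or FATAL lines found"
}
```

The line numbers make it easy to open the full log around the first error, which usually names the service or configuration entry that failed.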
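The postpone and resume sequence from item 8 can be sketched as two functions run on the Administration Server. The function names `postpone_upgrade` and `resume_upgrade` are hypothetical; the `crowbarctl` subcommands are the ones named in this section.

```bash
# Hypothetical helpers around the postpone/resume sequence from item 8.
postpone_upgrade() {
  # Postponing is only possible once all Control Nodes are upgraded.
  crowbarctl upgrade nodes postpone || return 1
  echo "Upgrade postponed: apply your changes in Crowbar, then run resume_upgrade."
}

resume_upgrade() {
  crowbarctl upgrade nodes resume || return 1
  # Show where the upgrade stands before continuing with the remaining nodes.
  crowbarctl upgrade status
}
```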
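The router check from item 9, including setting a port down and up again, might look like the sketch below. It assumes OpenStack administrator credentials in the environment and takes the router and port IDs from `/var/log/crowbar/node-upgrade.log`; the function name `bounce_router_port` is hypothetical. To bounce the router itself instead, `openstack router set --disable ROUTER-ID` followed by `openstack router set --enable ROUTER-ID` works the same way.

```bash
# Hypothetical helper: inspect a router and set one of its ports down and up again.
# Usage: bounce_router_port ROUTER-ID PORT-ID
bounce_router_port() {
  local router=$1 port=$2
  # Current state of the router and its ports.
  openstack router show "$router" || return 1
  openstack port list --router "$router" || return 1
  openstack port show "$port" || return 1
  # Bring the port administratively down, then up again.
  openstack port set --disable "$port" || return 1
  openstack port set --enable "$port"
}
```

After the port is back up, run the failed upgrade step again to continue with the router migration.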

