
DBDOWN every 10 hours with HOSTNAME violation on Google Cloud


Information

 
Title: DBDOWN every 10 hours with HOSTNAME violation on Google Cloud
URL Name: DBDOWN-every-10-hours-with-HOSTNAME-violation
Article Number: 000154920
Environment
Product: OpenEdge
Version: 11.x
OS: CentOS 7.1
Other: Google Cloud Platform
Question/Problem Description
A database on a Google Cloud VM goes down (DBDOWN) every 10 hours with .lk file HOSTNAME violation errors 4192 and 4196.

The hostName in error 4192 is the alias of the internal IP address in /etc/hosts, which is added by Google:
hostName.c.[PROJECT_ID].internal
 
 
Clarifying Information
The Progress OpenEdge environment previously ran without issues on the Google infrastructure for a very long time.
 
Error Message
(4192) <dbname>.lk: HOSTNAME is <hostname> , expected <hostName.c.[PROJECT_ID].internal>
(4196) <dbname>.lk is not a valid .lk file for this server.
Cause
The hostname is changed from <hostname> to <hostName.c.[PROJECT_ID].internal> by the DHCP services. 

Shutting the database down when the hostname is changed is expected as designed.
  • Effectively this is to prevent a database from being started from a different machine that has the same filesystem mounted
  • It is also to prevent the database from being started/accessed when it has already been started or is being accessed single-user.
The database does not bind to a variable (hostname). The hostname recorded is whatever the hostname is when the database is started, and this is the information that is stored in the database .lk file together with the PID and mode.

The Watchdog process periodically checks for the existence and validity of the .lk file. If the file has been removed or is no longer valid, the database must be shut down, because at that point nothing prevents it from being started while it is already running, which is absolutely certain to lead to database corruption. For further information refer to Article:
Resolution
The following outlines the high-level approaches that were tried to prevent the DHCP service from resetting the hostname:

1.  Modify /usr/share/google/set-hostname and set the hostname in the metadata

This script is only called during boot time and ensures the hostname is the one needed.
It will not, however, prevent the system from changing the hostname after x hours.

Calling this script every minute, so that it changes the hostname back to the short version once it detects a change, helps but is not foolproof: the WDOG process can detect the hostname change during the minute before it is changed back and shut down the database.

Setting the hostname to the long version before starting the database causes semaphore errors and the database fails to start. This is due to various defects with long hostnames (long 'uname -srvn' output):
SYSTEM ERROR: Unable to semAdd semaphore set : semid = -(, errno = 43. (10839)
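
As a rough illustration of the once-a-minute check described in option 1, the sketch below compares the running hostname with its short form and resets it only when the two differ. The script path, the cron entry, and the --apply flag are hypothetical names chosen for this example, and the sketch assumes the short name is simply the part before the first dot. As noted above, this narrows the window in which the WDOG process can see the long hostname but does not close it.

```shell
#!/bin/bash
# Sketch only: reset a DHCP-lengthened hostname back to its short form.
# Hypothetical cron entry (as root), running once a minute:
#   * * * * * /usr/local/sbin/check-hostname.sh --apply

# Strip everything from the first dot onward.
short_name() {
    printf '%s\n' "${1%%.*}"
}

# Succeed (exit status 0) when the name carries a domain suffix.
is_long_name() {
    [ "$1" != "$(short_name "$1")" ]
}

# Only act when invoked with --apply, so sourcing the file has no side effects.
if [ "${1:-}" = "--apply" ]; then
    current=$(hostname)
    if is_long_name "$current"; then
        echo "$(date) cron: reset hostname $current to $(short_name "$current")" >> /var/log/hostname.log
        hostname "$(short_name "$current")"
    fi
fi
```

Logging to the same /var/log/hostname.log used elsewhere in this article makes it possible to see how often the hostname is actually being changed.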


2.   Create a DHCP exit hook to reset the hostname. 

This was the approach advised by Google Services, but DHCP hooks did not work on this CentOS 7.1 version.

3.  Trigger a hostname change with a NetworkManager dispatcher script. 

When a DHCP request comes in to change the hostname, the script kicks in and quickly changes the hostname back.
 
The following script example could be improved: it currently triggers on all network events, whereas it should ideally run only for DHCP requests:
 
## script /etc/NetworkManager/dispatcher.d/12-force-hostname

#!/bin/bash
#
# DHCP Hook to reset the hostname to the shortname to prevent Progress database errors
#

new_host_name=$(curl --fail --silent http://metadata/computeMetadata/v1/instance/attributes/hostname  -H "Metadata-Flavor: Google")

echo "$(date) NetworkManager: set hostname to $new_host_name" >> /var/log/hostname.log
# Strip the domain part and apply the short name
hostname "${new_host_name%%.*}"
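
One possible way to make the improvement mentioned above is to filter on the action that NetworkManager passes to dispatcher scripts (the interface name arrives as the first argument and the action as the second). The sketch below reacts only to the dhcp4-change and dhcp6-change actions. It is an untested variant, not the script that was used in this case; whether these two actions cover every hostname change on a given image is an assumption that should be verified, and the helper function names are made up for this example.

```shell
#!/bin/bash
# Sketch only: variant of 12-force-hostname that reacts to DHCP lease events.
# NetworkManager passes the interface as $1 and the action as $2.

# Succeed only for the DHCP lease-change actions.
is_dhcp_action() {
    case "$1" in
        dhcp4-change|dhcp6-change) return 0 ;;
        *) return 1 ;;
    esac
}

# Strip everything from the first dot onward.
short_name() {
    printf '%s\n' "${1%%.*}"
}

action="${2:-}"
if is_dhcp_action "$action"; then
    # Same custom "hostname" metadata attribute as the original script.
    new_host_name=$(curl --fail --silent -H "Metadata-Flavor: Google" \
        http://metadata/computeMetadata/v1/instance/attributes/hostname)
    # Skip the reset if the metadata lookup failed or returned nothing.
    if [ -n "$new_host_name" ]; then
        echo "$(date) NetworkManager ($action): set hostname to $(short_name "$new_host_name")" >> /var/log/hostname.log
        hostname "$(short_name "$new_host_name")"
    fi
fi
```

The empty-value check avoids calling hostname with no argument, which would only print the current name rather than set it.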

Reference: Storing and Retrieving Instance Metadata 
https://cloud.google.com/compute/docs/storing-retrieving-metadata#default
Last Modified Date: 11/15/2021 11:51 AM
