HA-ldom agent

There were two documents on the OpenSolaris site, a requirements document and a design document; both are an interesting read on HA-ldom.

There was also a code review of HA-ldom, which is an interesting read as well.

The official HA-ldom documentation for SC 3.3u1:

The HA for Logical Domains data service provides a mechanism for orderly startup and shutdown, fault monitoring, and automatic failover of the Logical Domains guest domain service. The Logical Domains component is protected by the HA for Logical Domains data service.

Configuration Restrictions

  • Logical Domains can be configured only as a failover data service and not as a scalable data service.

  • Do not configure the HA for Logical Domains data service with Logical Domains version 1.2 or earlier if you want to use the Logical Domains warm migration feature.

  • Logical Domains disk image files should be placed only on the global file system. Each file is exported as a raw disk by the virtual disk server.

  • Example: Logical Domains guest domains installed onto a global file system, with two guest domain instances (ldg0 and ldg1):

    • ls -l /global/ldoms/disks

      • -rw------- 1 root root 8589934592 Aug 23 03:31 ldg0.vdisk
      • -rw------- 1 root root 8589934592 Aug 23 03:31 ldg1.vdisk
  • Set the failure policy of the primary domain to reset: ldm set-domain failure-policy=reset primary
  • HA-ldom setup example:
    • clrt register SUNW.HAStoragePlus
    • clrg create ldom-fo-rg
    • clrs create -g ldom-fo-rg -t SUNW.HAStoragePlus -p FilesystemMountPoints=/global/ldoms ldom-hasp-rs
    • clrg online -M -n node1 ldom-fo-rg
    • clrt register SUNW.ldom
    • clrs create -g ldom-fo-rg -t SUNW.ldom \
          -p Domain_name=ldm1 \
          -p Password_file=/global/ldoms/passwd \
          -p Resource_dependencies=ldom-hasp-rs ldom-rs
      • For warm migration: clrs set -p Migration_type=MIGRATE ldom-rs (MIGRATE is already the default in the SUNW.ldom resource type file)
      • The password file, which holds the target host password, is required for the warm migration option.
    • clrs status
    • clrs enable ldom-rs
    • clrs status
    • Verify HA-ldom:
      • clrg switch -n n2 ldom-fo-rg (switch HA-ldom to n2)
      • ldm start ldm1
      • ldm list-domain ldm1
      • ldm stop ldm1
      • ldm list-domain ldm1
      • Repeat on n2.
    • Extension property Plugin_probe (a script or command to check the guest domain):
    • clrs set -p Plugin_probe="/opt/SUNWscxvm/bin/ppkssh -P fmuser:/export/fmuser/.ssh/id_dsa:ldm1:multi-user-server:online" ldom-rs
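The verification steps above read the guest domain state from ldm list-domain. A minimal sketch of scripting that check, assuming the default human-readable layout (a header line, then NAME, STATE, FLAGS, CONS, VCPU, MEMORY, UTIL, UPTIME columns); domain_state and wait_for_state are hypothetical helper names, not part of HA-ldom. The parser reads stdin so it can be tried against captured output.

```shell
#!/bin/sh
# Hypothetical helper: print the STATE column for one domain.
# Assumes the default `ldm list-domain` layout with a header line:
#   NAME  STATE  FLAGS  CONS  VCPU  MEMORY  UTIL  UPTIME
# Reads the listing on stdin; prints nothing if the domain is absent.
domain_state() {
    awk -v dom="$1" 'NR > 1 && $1 == dom { print $2; exit }'
}

# Hypothetical helper: poll every 2 seconds, up to N attempts (default 30),
# until the domain reaches the wanted state. Needs the real ldm CLI.
# Usage: wait_for_state ldm1 active 30
wait_for_state() {
    dom=$1 want=$2 tries=${3:-30}
    while [ "$tries" -gt 0 ]; do
        state=$(ldm list-domain "$dom" 2>/dev/null | domain_state "$dom")
        [ "$state" = "$want" ] && return 0
        tries=$((tries - 1))
        sleep 2
    done
    return 1
}
```

After ldm start ldm1 the expected state is active, and after ldm stop ldm1 it is bound; checking the state rather than the command's exit status makes the per-node test repeatable.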
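The Plugin_probe value hands ppkssh a single colon-separated string after -P. A minimal sketch that splits it into its five fields, assuming (from the example alone) that they are the user on the guest domain, that user's private key file, the guest domain name used as the ssh target, presumably an SMF service, and its expected state. The field names and the split_probe_spec helper are hypothetical; the ppkssh documentation is the authority on what each field means.

```shell
#!/bin/sh
# Hypothetical helper: split a ppkssh -P spec of the form
#   user:keyfile:domain:service:state
# into "name=value" lines. The field names are assumptions inferred
# from the example above, not taken from the ppkssh documentation.
split_probe_spec() {
    # Five fields means exactly four ':' separators. A key path that
    # itself contains ':' cannot be represented by this simple split.
    colons=$(printf '%s' "$1" | tr -cd ':' | wc -c | tr -d ' ')
    if [ "$colons" -ne 4 ]; then
        echo "expected 5 colon-separated fields, got $((colons + 1))" >&2
        return 1
    fi
    printf '%s\n' "$1" | awk -F: '{
        printf "user=%s\nkeyfile=%s\ndomain=%s\nservice=%s\nstate=%s\n",
            $1, $2, $3, $4, $5
    }'
}
```

For the example value, split_probe_spec prints user=fmuser, keyfile=/export/fmuser/.ssh/id_dsa, domain=ldm1, service=multi-user-server and state=online, one per line. Running it before clrs set catches stray spaces such as the "/ .ssh" split that is easy to introduce when the command is wrapped.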

Observations:

  • HA-ldom treats the LDom as a black box.
  • The whole LDom guest domain is moved from node1 to node2, either by halt/reset (cold migration) or by warm migration.
  • There is no Logical Hostname resource here; all applications are controlled inside the LDom.
  • Per the requirements document:
    • Cold Migration
      Cold migration is the ability to migrate a shut-down or “bound” guest domain from one platform to another with an intermediate reboot. This involves the orderly stop of the guest domain on the source node, followed by the orderly start of the guest domain on the target node.
    • Warm Migration
      Warm migration moves a running guest domain from the source node to the target node by suspending it, copying the guest domain’s memory pages, and resuming it.
    • Live Migration
      Live migration moves a running guest domain from the source node to the target node with no perceptible suspend or resume notification.
    • Storage Availability in Guest domains
      The guest domain storage for cold and warm migration modes can be any supported Sun Cluster storage or file system, provided it is accessible from all nodes simultaneously (e.g. PxFS, NFS, iSCSI, or a SAN cluster file system).
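The setup notes say the password file is required for warm migration (Migration_type=MIGRATE); with NORMAL the domain is presumably stopped on one node and started on the other, i.e. cold migration. A minimal sketch of that pre-flight rule, assuming the password file must exist, be readable and be non-empty; check_migration_config is a hypothetical helper and not the real validate_xvm method.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not the real validate_xvm):
#   MIGRATE -> warm migration, needs a readable, non-empty password file
#   NORMAL  -> stop on the source node, start on the target (cold migration)
# Prints "warm" or "cold" on success; prints an error and returns 1 otherwise.
check_migration_config() {
    mtype=$1 pwfile=$2
    case $mtype in
        MIGRATE)
            if [ -z "$pwfile" ]; then
                echo "Migration_type=MIGRATE requires Password_file" >&2
                return 1
            fi
            if [ ! -r "$pwfile" ] || [ ! -s "$pwfile" ]; then
                echo "Password_file $pwfile is missing, unreadable or empty" >&2
                return 1
            fi
            echo warm
            ;;
        NORMAL)
            echo cold
            ;;
        *)
            echo "unknown Migration_type: $mtype" >&2
            return 1
            ;;
    esac
}
```

For example, check_migration_config MIGRATE /global/ldoms/passwd prints warm once the file is in place. Because MIGRATE is the RTR default, forgetting Password_file is the likely mistake this guards against.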

Design doc’s diagrams

[Two diagrams from the design document; images not reproduced.]

Resource Type SUNW.ldom

#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the License).
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/CDDL.txt
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/CDDL.txt.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets [] replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2009, 2010, Oracle and/or its affiliates. All rights reserved.
#
#
#ident    "@(#)SUNW.ldom    1.3    10/03/31 SMI"
#

RESOURCE_TYPE = "ldom";
VENDOR_ID = SUNW;
RT_DESCRIPTION = "Oracle Solaris Cluster HA for xVM Server SPARC Guest Domains";

RT_version = "3.3";
API_version = 10;    

RT_basedir=/opt/SUNWscgds/bin;

Init                =       ../../SUNWscxvm/bin/init_xvm;
Boot                =       ../../SUNWscxvm/bin/boot_xvm;

Start                =    gds_svc_start;
Stop                =    gds_svc_stop;

Validate            =    ../../SUNWscxvm/bin/validate_xvm;
Update                 =    gds_update;

Monitor_start            =    gds_monitor_start;
Monitor_stop            =    gds_monitor_stop;
Monitor_check            =    gds_monitor_check;

Init_nodes = RG_PRIMARIES;
Failover = FALSE;

#
# Upgrade directives
#
#$upgrade
#$upgrade_from "1" anytime

# The paramtable is a list of bracketed resource property declarations
# that come after the resource-type declarations
# The property-name declaration must be the first attribute
# after the open curly of a paramtable entry
#
# The following are the system defined properties. Each of the system defined
# properties have a default value set for each of the attributes. Look at
# man rt_reg(4) for a detailed explanation.
#

{
    PROPERTY = Start_timeout;
    MIN = 60;
    DEFAULT = 300;
}
{
    PROPERTY = Stop_timeout;
    MIN = 60;
    DEFAULT = 300;
}
{
    PROPERTY = Validate_timeout;
    MIN = 60;
    DEFAULT = 300;
}
{
        PROPERTY = Update_timeout;
    MIN = 60;
        DEFAULT = 300;
}
{
    PROPERTY = Monitor_Start_timeout;
    MIN = 60;
    DEFAULT = 300;
}
{
    PROPERTY = Monitor_Stop_timeout;
    MIN = 60;
    DEFAULT = 300;
}
{
    PROPERTY = Monitor_Check_timeout;
    MIN = 60;
    DEFAULT = 300;
}
{
        PROPERTY = FailOver_Mode;
        DEFAULT = SOFT;
        TUNABLE = ANYTIME;
}
{
        PROPERTY = Network_resources_used;
        TUNABLE = ANYTIME;
    DEFAULT = "";
}
{
    PROPERTY = Thorough_Probe_Interval;
    MAX = 3600;
    DEFAULT = 60;
    TUNABLE = ANYTIME;
}
{
    PROPERTY = Retry_Count;
    MAX = 10;
    DEFAULT = 2;
    TUNABLE = ANYTIME;
}
{
    PROPERTY = Retry_Interval;
    MAX = 3600;
    DEFAULT = 370;
    TUNABLE = ANYTIME;
}

{
    PROPERTY = Port_list;
    DEFAULT = "";
    TUNABLE = ANYTIME;
}

{
        PROPERTY = Scalable;
        DEFAULT = FALSE;
        TUNABLE = AT_CREATION;
}

{
        PROPERTY = Load_balancing_policy;
        DEFAULT = LB_WEIGHTED;
        TUNABLE = AT_CREATION;
}

{
        PROPERTY = Load_balancing_weights;
        DEFAULT = "";
        TUNABLE = ANYTIME;
}

#
# Extension Properties
#

# These two control the restarting of the fault monitor itself
# (not the server daemon) by PMF.
{
    PROPERTY = Monitor_retry_count;
    EXTENSION;
    INT;
    DEFAULT = 4;
    TUNABLE = ANYTIME;
    DESCRIPTION = "Number of PMF restarts allowed for the fault monitor";
}

{
    PROPERTY = Monitor_retry_interval;
    EXTENSION;
    INT;
    DEFAULT = 2;
    TUNABLE = ANYTIME;
    DESCRIPTION = "Time window (minutes) for fault monitor restarts";
}

# Time out value for the probe
{
    PROPERTY = Probe_timeout;
    EXTENSION;
    INT;
    MIN = 2;
    DEFAULT = 30;
    TUNABLE = ANYTIME;
    DESCRIPTION = "Time out value for the probe (seconds)";
}

# Child process monitoring level for PMF (-C option of pmfadm)
# Default of -1 means: Do NOT use the -C option to PMFADM
# A value of 0-> indicates the level of child process monitoring
# by PMF that is desired.
{
    PROPERTY = Child_mon_level;
    EXTENSION;
    INT;
    DEFAULT = -1;
    TUNABLE = AT_CREATION;
    DESCRIPTION = "Child monitoring level for PMF";
}

# This is an optional property.  Any value provided will be used as
# the absolute path to a command to invoke to validate the application.
# If no value is provided, the validation will be skipped.
#
{
        PROPERTY = Validate_command;
        EXTENSION;
        STRING;
        DEFAULT = "";
        TUNABLE = NONE;
        DESCRIPTION = "Command to validate the application";
}

# This property must be specified, since this is the only mechanism
# that indicates how to start the application.  Since a value must
# be provided, there is no default.  The value must be an absolute path.
{
        PROPERTY = Start_command;
        EXTENSION;
        STRINGARRAY;
        DEFAULT = "/opt/SUNWscxvm/bin/control_xvm start -R %RS_NAME -T %RT_NAME -G %RG_NAME";
        TUNABLE = NONE;
        DESCRIPTION = "Command to start application";
}

# This is an optional property.  Any value provided will be used as
# the absolute path to a command to invoke to stop the application.
# If no value is provided, signals will be used to stop the application.
#
# It is assumed that Stop_command will not return until the
# application has been stopped.
{
        PROPERTY = Stop_command;
        EXTENSION;
        STRING;
        DEFAULT = "/opt/SUNWscxvm/bin/control_xvm stop -R %RS_NAME -T %RT_NAME -G %RG_NAME";
        TUNABLE = NONE;
        DESCRIPTION = "Command to stop application";
}

# This is an optional property.  Any value provided will be used as
# the absolute path to a command to invoke to probe the application.
# If no value is provided, the "simple_probe" will be used to probe
# the application.
#
{
    PROPERTY = Probe_command;
    EXTENSION;
    STRING;
    DEFAULT = "/opt/SUNWscxvm/bin/control_xvm probe -R %RS_NAME -G %RG_NAME -T %RT_NAME";
    TUNABLE = NONE;
    DESCRIPTION = "Command to probe application";
}

# This is an optional property.  It determines whether the application
# uses network to communicate with its clients.
#
{
    PROPERTY = Network_aware;
    EXTENSION;
    BOOLEAN;
    DEFAULT = FALSE;
    TUNABLE = AT_CREATION;
    DESCRIPTION = "Determines whether the application uses network";
}

# This is an optional property, which determines the signal sent to the
# application for being stopped.
#
{
    PROPERTY = Stop_signal;
    EXTENSION;
    INT;
    MIN = 1;
    MAX = 37;
    DEFAULT = 15;
    TUNABLE = WHEN_DISABLED;
    DESCRIPTION = "The signal sent to the application for being stopped";
}

# This is an optional property, which determines whether to failover when
# retry_count is exceeded during retry_interval.
#
{
    PROPERTY = Failover_enabled;
    EXTENSION;
    BOOLEAN;
    DEFAULT = TRUE;
    TUNABLE = WHEN_DISABLED;
    DESCRIPTION = "Determines whether to failover when retry_count is exceeded during retry_interval";
}

# This is an optional property that specifies the log level for GDS events.
#
{
    PROPERTY = Log_level;
    EXTENSION;
    ENUM { NONE, INFO, ERR };
    DEFAULT = "INFO";
    TUNABLE = ANYTIME;
    DESCRIPTION = "Determines the log level for event based traces";
}

{
    Property = Debug_level;
    Extension;
    Per_node;
    Int;
    Min = 0;
    Max = 2;
    Default = 0;
    Tunable = ANYTIME;
    Description = "Debug level";
}

{
     Property = Domain_name;
    Extension;
    String;
    Minlength = 1;
    Tunable = WHEN_DISABLED;
    Description = "LDoms Guest Domain name";
}

{
     Property = Migration_type;
    Extension;
    Enum { NORMAL, MIGRATE };
    Default = "MIGRATE";
    Tunable = ANYTIME;
    Description = "Type of guest domain migration to be performed";
}

{
    PROPERTY = Plugin_probe;
    EXTENSION;
    STRING;
    DEFAULT = "";
    TUNABLE = ANYTIME;
    DESCRIPTION = "Script or command to check the guest domain";
}

{
    PROPERTY = Password_file;
    EXTENSION;
    STRING;
    DEFAULT = "";
    TUNABLE = WHEN_DISABLED;
    DESCRIPTION = "The complete path to the file containing the target host password";
}
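Start_command, Stop_command and Probe_command all run control_xvm with %RS_NAME, %RT_NAME and %RG_NAME tokens, which GDS replaces with the resource, resource type and resource group names before invoking the command. A minimal sketch of that substitution, handy for seeing what actually runs for ldom-rs; expand_gds_tokens is hypothetical and only mimics the effect, not how GDS implements it. It assumes the names contain no '|', '&' or '\' characters, which would confuse sed.

```shell
#!/bin/sh
# Hypothetical helper: replace GDS-style tokens in a command string.
# Args: command resource_name resource_type resource_group
# Only %RS_NAME, %RT_NAME and %RG_NAME are handled here.
expand_gds_tokens() {
    printf '%s\n' "$1" | sed \
        -e "s|%RS_NAME|$2|g" \
        -e "s|%RT_NAME|$3|g" \
        -e "s|%RG_NAME|$4|g"
}

# Example with the setup above:
# expand_gds_tokens '/opt/SUNWscxvm/bin/control_xvm start -R %RS_NAME -T %RT_NAME -G %RG_NAME' \
#     ldom-rs SUNW.ldom ldom-fo-rg
# prints: /opt/SUNWscxvm/bin/control_xvm start -R ldom-rs -T SUNW.ldom -G ldom-fo-rg
```

This is why all three methods can share one control_xvm script: the resource name passed with -R is enough for it to look up Domain_name, Migration_type and the other extension properties.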
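Retry_Count, Retry_Interval and Failover_enabled (defaults 2, 370 seconds and TRUE in the file above) decide whether a failed guest domain is restarted in place or failed over; per the Failover_enabled comment, failover happens when retry_count is exceeded during retry_interval. A minimal sketch of that sliding-window decision, assuming restart times are recorded as epoch seconds; decide_action is hypothetical and simplifies what the RGM and GDS actually do (it ignores Failover_mode, for one).

```shell
#!/bin/sh
# Hypothetical sketch of the restart-vs-failover decision.
# Args: now retry_count retry_interval failover_enabled [restart_epoch...]
# Prints "restart" while the restart budget lasts, "failover" once it is
# spent and failover is enabled, otherwise "give-up".
decide_action() {
    now=$1 count_max=$2 interval=$3 failover=$4
    shift 4
    recent=0
    for t in "$@"; do
        # Count only restarts that fall inside the sliding window.
        if [ $((now - t)) -lt "$interval" ]; then
            recent=$((recent + 1))
        fi
    done
    if [ "$recent" -lt "$count_max" ]; then
        echo restart
    elif [ "$failover" = TRUE ]; then
        echo failover
    else
        echo give-up
    fi
}
```

With the defaults, two restarts within roughly six minutes use up the budget, and the next failure moves ldom-fo-rg (and the guest domain with it) to the other node.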

About laotsao 老曹

HopBit GridComputing LLC Rockscluster Gridengine Solaris Zone, Solaris Cluster, OVM SPARC/Ldom Exadata, SPARC SuperCluster
This entry was posted in LDOM, Solaris, SPARC, Sun Cluster.
