Thursday, March 16, 2006

VVR Cheat Sheet

From: http://www.blight.com/~rick/veritas/vvr.html

1) Creating a replicated volume on two hosts, hostA and hostB

Before configuring, you need to make sure two scripts have been run
from /etc/rc2.d: S94vxnm-host_infod and S94vxnm-vxnetd. VVR will not
work if these scripts don't get run AFTER VVR licenses have been
installed. So if you install VVR licenses and don't reboot
immediately after, run these scripts to get VVR to work.

Before the Primary can be set up, the Secondary must be configured.

First, use vxassist to create your datavolumes. Make sure to specify
the logtype as DCM (Data Change Map, which keeps track of data changes
if the Storage Replicator log fills up) if your replicated volumes are
asynchronous.

vxassist -g diskgroupB make sampleB 4g layout=log logtype=dcm

Then create the SRL (Storage Replicator Log) for the volume. Carefully
decide how big you want this to be, based on available bandwidth
between your hosts and how fast your writes happen.

See pages 18-25 of the SRVM Configuration Notes for (excruciatingly)
detailed notes on selecting your SRL size.
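
For a first rough guess before digging into those notes, the arithmetic
is basically "how fast does the backlog grow". The sketch below is NOT
the method from the SRVM Configuration Notes. It assumes a steady
application write rate and a steady link throughput (both whole numbers
in MB/s), and srl_fill_seconds is a made-up helper name, not a Veritas
command.

```shell
# Hypothetical helper: seconds until an SRL of srl_mb megabytes fills up.
# Assumes constant rates; real workloads are bursty, so treat the result
# as a rough guide only.
srl_fill_seconds() {
    srl_mb=$1        # SRL size in MB
    write_rate=$2    # application write rate in MB/s
    link_rate=$3     # replication link throughput in MB/s
    backlog_rate=$((write_rate - link_rate))
    if [ "$backlog_rate" -le 0 ]; then
        # The link keeps up with the writes, so the SRL never fills.
        echo "never"
        return 0
    fi
    echo $((srl_mb / backlog_rate))
}

# 500 MB SRL, writes at 12 MB/s, link moves 10 MB/s: backlog grows 2 MB/s.
srl_fill_seconds 500 12 10
```

If the number that comes out is shorter than the longest link outage or
write burst you expect, the SRL is probably too small.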

vxassist -g diskgroupB make srlB 500m

Next make the rlink object:

vxmake -g diskgroupB rlink rlinkB remote_host=hostA \
    remote_dg=diskgroupA remote_rlink=rlinkA local_host=hostB \
    synchronous=[off|override|fail] srlprot=dcm

Use synchronous=off only if you can stand to lose some data.
Otherwise, set synchronous=override or synchronous=fail. override runs
as synchronous (writes aren't committed until they reach the
secondary) until the link dies, then it switches to asynchronous,
storing pending writes to the secondary in the SRL. When the link
comes back, it resyncs the secondary and switches back to synchronous
mode. synchronous=fail fails new updates to the primary in the case of
a downed link.

In any of the above cases, you'll lose data if the link fails and,
before the secondary can catch up to the primary, there is a failure
of the primary data volume. This is why it's important to have both
redundant disks and redundant network paths.
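
As a quick reference, the three synchronous= settings described above
boil down to a small lookup table. This is only a summary of the
behaviour described in this post, written as a hypothetical shell
function; write_outcome is not a VVR command and does not talk to VVR.

```shell
# Hypothetical lookup: what happens to a new write on the primary for a
# given synchronous= setting and link state, per the description above.
write_outcome() {
    mode=$1    # off | override | fail
    link=$2    # up | down
    case "$mode:$link" in
        off:up|off:down)
            echo "asynchronous: write is queued in the SRL" ;;
        override:up|fail:up)
            echo "synchronous: write waits until it reaches the secondary" ;;
        override:down)
            echo "asynchronous: write is queued in the SRL until the link returns" ;;
        fail:down)
            echo "write fails on the primary" ;;
        *)
            echo "unknown mode or link state" >&2
            return 1 ;;
    esac
}

write_outcome override down
write_outcome fail down
```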

Now make the RVG, where you put together the datavolume, the SRL, and the rlink:

vxmake -g diskgroupB rvg rvgB rlink=rlinkB datavol=sampleB srl=srlB \
    primary=false

Attach the rlink to the rvg:

vxrlink -g diskgroupB att rlinkB

Start the RVG on the Secondary:

vxrvg -g diskgroupB start rvgB

Now work begins on the primary. As with the Secondary, make data
volumes, an SRL, and an rlink:

vxassist -g diskgroupA make sampleA 4g layout=log logtype=dcm

vxassist -g diskgroupA make srlA 500m

vxmake -g diskgroupA rlink rlinkA remote_host=hostB \
    remote_dg=diskgroupB remote_rlink=rlinkB local_host=hostA \
    synchronous=[off|override|fail] srlprot=dcm

Make the RVG for the primary. Only the last option is different:

vxmake -g diskgroupA rvg rvgA rlink=rlinkA datavol=sampleA srl=srlA primary=true

Now go back to the secondary. When we created the secondary,
brain-dead Veritas figured the volume on the Secondary and the Primary
would have the same name, but when we set this up, we wanted to have
the Primary datavolume named sampleA and the Secondary datavolume be
sampleB. So we need to tell the Secondary that the Primary is sampleA:

vxedit -g diskgroupB set primary_datavol=sampleA sampleB

Now you can attach the rlink to the RVG and start the RVG. On the Primary:

vxrlink -g diskgroupA att rlinkA

You should see output like this:

vxvm:vxrlink: INFO: Secondary data volumes detected with rvg rvgB as parent:
vxvm:vxrlink: INFO: sampleB: len=8388608 primary_datavol=sampleA
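
The len=8388608 in that output matches the 4g we gave vxassist,
assuming the length is reported in 512-byte sectors. A quick sanity
check you can run in any shell (gib_to_sectors is a made-up helper
name):

```shell
# Hypothetical helper: convert a size in GB (2^30 bytes) to 512-byte
# sectors. Assumes a 512-byte sector size.
gib_to_sectors() {
    gib=$1
    # 2^30 / 512 = 2097152 sectors per GB
    echo $((gib * 2097152))
}

gib_to_sectors 4
```

If the len reported for the Secondary volume doesn't match what you
expect for the Primary, check the volume sizes before going further.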

Finally, start I/O on the Primary:

vxrvg -g diskgroupA start rvgA

2) Removing a VVR volume

First, detach the rlinks on the Primary and then the Secondary:

vxrlink -g diskgroupA det rlinkA
vxrlink -g diskgroupB det rlinkB

Then stop the RVG on the primary and then the secondary:

vxrvg -g diskgroupA stop rvgA
vxrvg -g diskgroupB stop rvgB

On the primary, stop the datavolumes:

vxvol -g diskgroupA stop sampleA

If you want to keep the datavolumes, you need to disassociate them from the RVG:

vxvol -g diskgroupA dis sampleA
vxvol -g diskgroupB dis sampleB

Finally, on both the Primary and the Secondary, remove everything:

vxedit -rf rm rvgA
vxedit -rf rm rvgB

3) Growing/Shrinking a Volume or SRL

This is exactly the same as in regular Veritas. However, VVR doesn't
sync the volume changes. To grow a volume, you first need to grow the
secondary, then the primary. To shrink a volume, first the primary and
then the secondary. You always need to make sure the Secondary is
larger than or as large as the Primary, or you will get a
configuration error from VVR.
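
The ordering rule above is easy to get backwards, so here it is as a
tiny sketch. Both function names are made up for illustration, and the
lengths passed to sizes_ok are assumed to be in the same unit on both
hosts (for example, sectors).

```shell
# Hypothetical helper: which host to resize first for a given operation.
resize_order() {
    case "$1" in
        grow)   echo "secondary primary" ;;
        shrink) echo "primary secondary" ;;
        *)      echo "usage: resize_order grow|shrink" >&2; return 1 ;;
    esac
}

# Hypothetical check: succeeds only if the Secondary is at least as
# large as the Primary, which is the condition VVR insists on.
sizes_ok() {
    primary_len=$1
    secondary_len=$2
    [ "$secondary_len" -ge "$primary_len" ]
}

resize_order grow
sizes_ok 8388608 8388608 && echo "sizes OK"
```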

You may need to grow an SRL if your pipe shrinks (more likely if your
pipe gets busier) or the amount of data you are sending increases. See
pages 18-25 of the SRVM Configuration Notes for (excruciatingly)
detailed notes on selecting your SRL size.

To grow an SRL, you must first stop the RVG and disassociate the SRL
from the RVG:

vxrvg stop rvgA
vxrlink det rlinkA
vxvol dis srlA

From this point, you can grow your SRL (which is now just an ordinary volume):

vxassist growto srlA 2g

Once your SRL has been successfully grown, reassociate it with the
RVG, reattach the RLINK, and start the RVG:

vxvol aslog rvgA srlA
vxrlink -f att rlinkA
vxrvg start rvgA

4) Getting info out of VVR once it's set up

You can get useful stats out of the vxrlink command:

vxrlink [-i interval] stats rlinkA

Output should look similar to netstat. If you run without an interval,
you can see cumulative statistics. With an interval, you get the stats
that hit during the interval.

# vxrlink -i 5 stats rlinkA

       Messages                Errors                  Flow Control
       --------                ------                  ------------
   #  Blocks  RT(usec)  Timeout  Stream  Memory  Delays  NW Bytes  NW Delay
  28       0       512        0       0       0       0      5000         1
   0       0       512        0       0       0       0      5000         1
   0       0       512        0       0       0       0      5000         1
   0       0       512        0       0       0       0      5000         1
   0       0       512        0       0       0       0      5000         1
   0       0       512        0       0       0       0      5000         1
   0       0       512        0       0       0       0      5000         1
  10       0      2110        0       0       0       0      5000         1
 256       0     22766        0       0       0       0      5000         1
 468       0     15417       18       0       0       0      5000         4
  18       0      7818        0       0       0       0      5000         4
   0       0      7818        0       0       0       0      5000         4
   0       0      7818        0       0       0       0      5000         4

* # is the number of messages transmitted
* Blocks is the number of 512-byte blocks transmitted
* RT(usec) is the average round-trip time per message in microseconds
(the size of the messages affects the RT)
* Timeout is the number of timeouts or lost packets; this can be
affected by the time-out value of the RLINK.
* Stream is the number of stream errors that occur when the RLINK
attempts to send messages faster than the network can handle.
* Memory is the number of memory errors that occur when the
secondary has insufficient buffer space to handle incoming messages.
You can tune this by changing voliomem_max_nmcompool_sz on the
secondary.
* Delays, NW Bytes, and NW Delay are internal flow control
parameters that indicate how fast the RLINK is attempting to send.
The sample output is from a Secondary system, so there are no Blocks
Transmitted.
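
If you capture that output to a file, a little awk can tell you how
many intervals had any errors at all. This is a sketch that assumes the
nine-column layout shown in the sample above (Timeout, Stream, and
Memory in columns 4 to 6); error_intervals is a made-up name, so check
the column positions against your own output before trusting it.

```shell
# Hypothetical filter: count the data rows where Timeout + Stream +
# Memory is non-zero. Header and separator lines are skipped because
# they don't have exactly nine fields starting with a number.
error_intervals() {
    awk 'NF == 9 && $1 ~ /^[0-9]+$/ && ($4 + $5 + $6) > 0 { n++ }
         END { print n + 0 }'
}

# Two rows taken from the sample output above; only the second has errors.
printf '%s\n' \
    '256 0 22766 0 0 0 0 5000 1' \
    '468 0 15417 18 0 0 0 5000 4' | error_intervals
```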

You can use vxrlink to check the status of an rlink and SRL:

vxrlink -g diskgroupA status rlinkA

Output should look like this:

Rlink rlinkA has 8 outstanding writes, occupying 520 Kbytes (1%) on the SRL

If there are no outstanding writes and the SRL has been fully played,
you will see this message:

Rlink rlinkA is up to date

You can only use this command on the primary. Trying it on the
secondary will just result in Veritas yelling at you.
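
If you want to watch SRL usage from a script, the percentage can be
pulled out of that status line. The sketch below assumes the message
wording is exactly as shown above, which may differ between VVR
versions; srl_percent_used is a made-up helper name.

```shell
# Hypothetical parser: print the SRL percentage from a vxrlink status
# line, or 0 when the RLINK reports that it is up to date.
srl_percent_used() {
    case "$1" in
        *"is up to date"*)
            echo 0 ;;
        *)
            # Grab the digits inside "(N%)".
            printf '%s\n' "$1" | sed -n 's/.*(\([0-9][0-9]*\)%).*/\1/p' ;;
    esac
}

srl_percent_used 'Rlink rlinkA has 8 outstanding writes, occupying 520 Kbytes (1%) on the SRL'
srl_percent_used 'Rlink rlinkA is up to date'
```

On a real Primary you would feed it the output of the status command,
for example: srl_percent_used "$(vxrlink -g diskgroupA status rlinkA)"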

vxprint can also give you some useful info, and it comes with some shortcuts.

vxprint -Pl
vxprint -Vl

The first command lists all the RLINKs on the system. The second lists
all the RVGs.

You can use this info to check VVR settings and some basic status.
Check the flags for the RLINK to make sure that both systems are
connected and consistent, the IPs and ports are set right, etc.

In the output for an RVG, you can see listed in the flags line if an
RVG is set as the Primary or Secondary node.

5) Changing the Synchronous/Asynchronous setting

To set the synchronous variable for an RLINK, do the following:

vxedit set synchronous=[off|override|fail] rlinkA

6) Failing Over from a Primary

There are two situations where you would have to fail over from a
primary. The first is in preparation for an outage of the Primary, in
which case you can happily turn off your app, switch the Primary to a
Secondary, switch the Secondary to a Primary, and start things up again.

The second case is when your Primary goes down in flames and you need
to get your Secondary up as a Primary.

If your primary is still functioning:

First, you'll need to turn off your applications, unmount any
filesystems on your datavolumes, and stop the rvg:

/etc/rc3.d/S99start-app stop
umount /filesysA
vxrvg stop rvgA

If you can't umount your filesystems because of running apps, DON'T go
any further! You'll make your life harder in the future, and you might
lose data.

Once you've stopped the RVG, you need to detach the rlink,
disassociate the SRL volume (you can't edit the PRIMARY RVG attribute
while an SRL is associated), change PRIMARY to false, and bring
everything back up:

vxrlink det rlinkA
vxvol dis srlA
vxedit set primary=false rvgA
vxvol aslog rvgA srlA
vxrvg start rvgA
vxrlink -f att rlinkA
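
Since the order matters here, it can help to print the whole sequence
and read it over before typing anything. The function below is only a
dry run: it echoes the commands from this section (starting with the
vxrvg stop) and runs nothing. demote_primary_plan is a made-up name,
and the explicit -g diskgroup is my addition to match section 1; drop
it if you rely on a default disk group.

```shell
# Hypothetical dry run: print, in order, the commands that turn a
# stopped-app Primary into a Secondary. Nothing is executed.
demote_primary_plan() {
    dg=$1; rvg=$2; rlink=$3; srl=$4
    cat <<EOF
vxrvg -g $dg stop $rvg
vxrlink -g $dg det $rlink
vxvol -g $dg dis $srl
vxedit -g $dg set primary=false $rvg
vxvol -g $dg aslog $rvg $srl
vxrvg -g $dg start $rvg
vxrlink -g $dg -f att $rlink
EOF
}

demote_primary_plan diskgroupA rvgA rlinkA srlA
```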

Now go to work on the Old Secondary to bring it up as the new Primary.

First you need to stop the RVG, detach the rlink, disassociate the
SRL, and turn the PRIMARY attribute on:

vxrvg stop rvgB
vxrlink det rlinkB
vxvol dis srlB
vxedit set primary=true rvgB

Veritas recommends that you use vxedit to reinitialize some values on
the RLINK to make sure you're still cool:

vxedit set remote_host=hostA local_host=hostB remote_dg=diskgroupA \
    remote_rlink=rlinkA rlinkB

Before you can attach the rlink, you need to change the
PRIMARY_DATAVOL attribute on both hosts to point to the Veritas
volume name of the NEW Primary:

On the new primary (e.g. hostB):
vxedit set primary_datavol=sampleB sampleB
On the new secondary (e.g. hostA):
vxedit set primary_datavol=sampleB sampleA

Now that you have that, go back to the new Primary, attach the RLINK,
and start the RVG:

vxrlink -f att rlinkB
vxrvg start rvgB

The vxrlink command should show normal output as described in section 1 above.

If the Primary is down:

First you'll need to bring up the secondary as a primary. If your
secondary datavolume is inconsistent (this is only likely if an SRL
overflow occurred and the secondary was not resynchronized before the
Primary went down) you will need to disassociate the volumes from the
RVG, fsck them if they contain filesystems, and reassociate them with
VVR. If your volumes are consistent, the task is much easier:

On the secondary, first stop the RVG, detach the RLINK, and
disassociate the SRL:

vxrvg stop rvgB
vxrlink det rlinkB
vxvol dis srlB

Make the Secondary the new Primary:

vxedit -g diskgroupB set primary=true rvgB

Now reassociate the SRL and change the primary_datavol:

vxvol aslog rvgB srlB
vxedit set primary_datavol=sampleB sampleB

If the old Primary is still down, all you need to do is start the RVG
to be able to use the datavolumes:

vxrvg start rvgB

This will allow you to keep the volumes in VVR so that once you manage
to resurrect the former Primary, you can run the necessary VVR
commands to set it up as a secondary so it can resynchronize from the
backup system. Once it has resynchronized, you can use the process
listed at the beginning of section 6 (above) to fail from the Old
Secondary/New Primary back to the original configuration.

Here's how to resynchronize the old Primary once you bring it back up:

The RVG and RLINK should be stopped and detached. If not, stop and detach:

vxrvg stop rvgA
vxrlink det rlinkA

Disassociate the SRL and make the system a secondary:

vxvol dis srlA
vxedit set primary=false rvgA

Reassociate the SRL, change the primary_datavol attribute:

vxvol aslog rvgA srlA
vxedit set primary_datavol=sampleB sampleA

Attach the RLINK and then start the RVG:

vxrlink -f att rlinkA
vxrvg start rvgA

This won't do much, as the RLINK on hostB (the Primary) should still
be detached, preventing the Secondary from connecting. Now go back to
the Primary to turn the RLINK on:

vxedit set remote_host=hostA local_host=hostB remote_dg=diskgroupA \
    remote_rlink=rlinkA rlinkB
vxrlink -a att rlinkB

Giving the -a flag to vxrlink tells it to run in autosync mode. This
will automatically resync the secondary datavolumes from the Primary.
If the Primary is being updated faster than the Secondary can be
synced, the Secondary will never become synced, so this method is only
appropriate for certain implementations.

Once synchronization is complete, follow the instructions above (the
beginning of section 6) to transfer the Primary role back to the
original Primary system.

3 comments:

Anonymous said...

Hi,

thanks for publishing this document , its very usefull who are working on vvr. I would also rquest you to add some troubleshooting tips and tricks which might helpfull in adminstring vvr in v.cluster ,v.mgr envrnmnts.

Wrex's World said...

From Wrex:

Thanks. I am glad someone finds it usefull. I was wondering if anyone ever read this thing? Heh heh.

Yea, I reference it a bit, myself.

I'll try and make some time to add more to it. I've been working alot with VVR and VCS. Here's a hint: Grow volumes through the VVR Web console (if you have http availability). It's Sooo much easier and more intuative. I usually don't recommend GUIs, but sometimes, it just makes sense to use them. This, IMO, is one of them.

L8r,
Wrex

Anonymous said...

Hi,

Thanks for instant respond, am working in vcs and vmgr above vvr on it. I would appreciate if could mail or publish the docs.

Thanks & Regrds

Chennuri