ZFS recovery

ZFS is designed as a single-host file system. In a shared storage environment one can export a zpool from one host and then import it on a different host, but with the -f (force) option one can just as easily import the same zpool on two hosts at once.

If I/O comes from two hosts at the same time, ZFS metadata or data corruption can and will occur.

Since ZFS is a copy-on-write (COW) file system, there are earlier points in time at which the metadata and/or data were not yet corrupted.
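The clean hand-off can be sketched as below. This is only a sketch: the pool name tank is made up, and the script prints the commands (DRY_RUN=1) unless told otherwise. The point is that the taking-over host imports without -f, so ZFS itself can refuse a pool that was never exported.

```shell
#!/bin/sh
# Sketch of a clean pool hand-off between two hosts on shared storage.
# POOL is a hypothetical pool name; DRY_RUN=1 (the default) only prints commands.
POOL="${POOL:-tank}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On the host that currently owns the pool: release it cleanly.
release_pool() {
    run zpool export "$POOL"
}

# On the host taking over: import WITHOUT -f. If the pool was never
# exported, the import is refused, and that refusal is the safety check
# that -f would override.
take_pool() {
    run zpool import "$POOL"
}
```

Run release_pool on the old host first, then take_pool on the new one. If take_pool fails, find out which host still holds the pool before even thinking about -f.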

In Sun Cluster / Solaris Cluster one can only use ZFS in a fail-over configuration. Now you have three horses that can each import and export the zpool:

  1. End user/SA
  2. Solaris
  3. Solaris cluster

Whenever the SA or Solaris imports or exports a zpool, the information is stored in /etc/zfs/zpool.cache,

so on the next reboot ZFS consults zpool.cache to speed up the import of the zpool.

Solaris Cluster also creates its own cache file, but at a different location, under the CCR.

These competing forces can create a situation where both nodes of the cluster import the zpool, and corruption can and will occur. So follow the rule: if you want to use ZFS with Oracle Solaris Cluster (OSC), the SA should never manually import or export the zpool outside of OSC.

If possible, use a mirrored zpool and snapshot often, so that ZFS can self-heal and you always have an up-to-date backup.
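Where a pool records itself is controlled by the cachefile pool property. The sketch below shows how one could check it, and how to import a pool so that it is not written to the default boot-time cache. The pool name tank is made up, and whether the cluster framework uses exactly this property internally is my assumption, not something confirmed here.

```shell
#!/bin/sh
# Sketch: inspect and control where a pool is cached.
# POOL is a hypothetical pool name; DRY_RUN=1 (the default) only prints commands.
POOL="${POOL:-tank}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show which cache file (if any) the pool is recorded in.
show_cachefile() {
    run zpool get cachefile "$POOL"
}

# Import the pool without recording it in any cache file, so this host
# does not try to import it again automatically at the next boot.
import_without_cache() {
    run zpool import -o cachefile=none "$POOL"
}
```

A pool imported with cachefile=none is forgotten by this host at reboot, which is what you want for a pool that some other framework is supposed to own.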
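A minimal sketch of that advice follows. The pool name and the two disk names are made up, so substitute your own devices; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: create a two-way mirrored pool and take recursive snapshots.
# POOL, DISK1 and DISK2 are hypothetical names; DRY_RUN=1 only prints commands.
POOL="${POOL:-tank}"
DISK1="${DISK1:-c1t0d0}"
DISK2="${DISK2:-c1t1d0}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# A mirror gives ZFS a second copy to repair from when a checksum fails.
create_mirror() {
    run zpool create "$POOL" mirror "$DISK1" "$DISK2"
}

# Snapshot every dataset in the pool; the label defaults to a timestamp.
snapshot_pool() {
    label="${1:-$(date +%Y%m%d-%H%M%S)}"
    run zfs snapshot -r "$POOL@$label"
}
```

Calling snapshot_pool from cron with no argument gives one timestamped snapshot per run; pass a label such as nightly if you prefer fixed names.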

How to recover the data?

If one googles "zpool recovery", one will find names like Max Bruning and Victor Latushkin, and can read their stories of zpool recovery.

One needs deep knowledge of the ZFS on-disk format and a customized version of zdb to recover the zpool. Since the ZFS version changes with each Solaris, Solaris Express, or OpenSolaris update, the matching versions of zdb are available only through a special escalation process, and it takes a knowledgeable backend engineer to perform the task.

Since 2009 there has been PSARC 2009/479, which creates an end-user tool to recover a zpool. It was recently introduced in OpenSolaris/Solaris Express build 128a and will be introduced in U9 (per a Google post).

zpool clear  -F -n

-F
Initiates recovery mode for an unopenable pool. Attempts to discard the last few transactions in the pool to return it to an openable state. Not all damaged pools can be recovered by using this option. If successful, the data from the discarded transactions is irretrievably lost.

-n
Used in combination with the -F flag. Check whether discarding transactions would make the pool openable, but do not actually discard any transactions.
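Put together, a cautious way to use the two flags might look like the sketch below: always do the -n check first, and only discard transactions after an explicit confirmation. The pool name tank and the CONFIRM variable are made up for illustration, and the script only prints the commands by default.

```shell
#!/bin/sh
# Sketch: check first with -F -n, then recover with -F only on confirmation.
# POOL and CONFIRM are hypothetical; DRY_RUN=1 (the default) only prints commands.
POOL="${POOL:-tank}"
DRY_RUN="${DRY_RUN:-1}"
CONFIRM="${CONFIRM:-no}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_pool() {
    # Step 1: ask whether discarding the last transactions would help.
    # Nothing is changed on disk by this step.
    run zpool clear -F -n "$POOL"

    # Step 2: really discard the transactions, but only when asked to.
    # Data in the discarded transactions is lost for good.
    if [ "$CONFIRM" = "yes" ]; then
        run zpool clear -F "$POOL"
    else
        echo "check only; set CONFIRM=yes to discard transactions"
    fi
}
```

Read the output of the -n step before setting CONFIRM=yes; the script does not try to interpret it for you.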

See also the c0t0d0s0 blog.

ZFS was introduced in S10 11/06 (U3), almost 4 years ago. IMHO, taking 4 years to provide an end-user tool to recover a zpool is not acceptable.

Since update ?, ZFS for the root file system has been available. It introduces a new way of doing live upgrade, using snapshots to back up the boot environment, so one also needs to learn how to recover the root zpool.

For x86 environments, one should always keep a copy of the latest OpenSolaris preview LiveCD.

For SPARC systems, the http://www.genuix.org site also has the latest version of a LiveCD available.
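Once booted from the LiveCD, getting at the root pool might look like the sketch below. The pool name rpool, the alternate root /a, and the boot environment name myBE are all assumptions, so check yours with zpool import and zfs list. Here -f is needed because the pool was last in use by the installed system, which is safe only because that system is not running.

```shell
#!/bin/sh
# Sketch: import a root pool from a LiveCD under an alternate root.
# POOL, ALTROOT and BE are hypothetical names; DRY_RUN=1 only prints commands.
POOL="${POOL:-rpool}"
ALTROOT="${ALTROOT:-/a}"
BE="${BE:-myBE}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Import under ALTROOT so the pool's mount points do not land on top of
# the LiveCD's own file systems.
import_root_pool() {
    run zpool import -f -R "$ALTROOT" "$POOL"
}

# The boot environment dataset is usually not mounted automatically,
# so mount it by hand; it then shows up under ALTROOT.
mount_boot_env() {
    run zfs mount "$POOL/ROOT/$BE"
}
```

After that, the installed system's files are under /a and can be repaired or copied off. Export the pool again before rebooting.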

About laotsao 老曹

HopBit GridComputing LLC Rockscluster Gridengine Solaris Zone, Solaris Cluster, OVM SPARC/Ldom Exadata, SPARC SuperCluster
