
Asset ID: 1-71-1006816.1
Update Date: 2018-03-20
Keywords:

Solution Type: Technical Instruction

Solution 1006816.1: Performing and understanding the fsck procedure on the Sun StorageTek[TM] 5000 Series NAS arrays to recover from a filesystem crash


Related Items
  • Sun Storage 5310 NAS Gateway System
  • Sun Storage 5220 NAS Appliance
  • Sun Storage 5210 NAS Appliance
  • Sun Storage 5310 NAS Appliance
  • Sun Storage 5320 NAS Gateway
  • Sun Storage 5320 NAS Appliance
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: SE5xxx NAS
  • _Old GCS Categories>Sun Microsystems>Storage - Disk>Network Attached Storage

Previously Published As
209471


Oracle Confidential (INTERNAL). Do not distribute to customers
Reason: Migrated distribution from Sun

Description
There are instances when a NAS filesystem may crash.
In such cases, fsck must be run on the filesystem that has crashed.

To run fsck on the affected volume, the user needs to understand how fsck works on the Sun StorEdge[TM] 5210 and 5310 NAS arrays.

This understanding helps prevent wasted time, data loss, unexpected failures, and unrealistic expectations.



Steps to Follow
The first step in filesystem repair is to ensure that you have a complete, tested backup. The filesystem check carries some risk: directories, files, and filenames may be lost. A tested backup means that the data has been restored from tape and checked for validity.

Fsck must be started on volumes that are already mounted. The volumes may be mounted with errors (for example, offline or in an error state) or read-only.

The volume against which the filesystem check is being run will be unavailable for the duration of the filesystem check.

Take care when starting fsck on a live volume: fsck freezes the volume while it is being checked, so any I/O initiated against that volume will fail.

If the volume contains the /etc directory, then all volumes will be unavailable for the duration of the filesystem repair process.

The fsck procedure is run from the Sun StorEdge CLI.
It is recommended that the output of the fsck session be logged.
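One way to capture such a log is to drive the CLI session remotely, for example over telnet, and write everything the appliance prints to a file. The following is a minimal Python sketch only, not a supported tool: it assumes the standard telnetlib module (present up to Python 3.12), a placeholder hostname, the "jukebox > "-style prompt seen in the transcripts below, and a placeholder volume name; login handling is omitted and would need to be added for a real appliance.

# Minimal sketch: capture an fsck session log over telnet.
# HOST, PROMPT and VOLUME are placeholders; login handling is omitted.
from telnetlib import Telnet

HOST = "nas-head"          # hypothetical appliance hostname
PROMPT = b"> "             # CLI prompt fragment, as in the transcripts below
VOLUME = b"/home"

with Telnet(HOST) as tn, open("fsck_session.log", "wb") as log:
    log.write(tn.read_until(PROMPT, timeout=30))
    tn.write(b"fsck " + VOLUME + b"\n")
    log.write(tn.read_until(b"Make required repairs?", timeout=60))
    tn.write(b"yes\n")                       # answer 'yes' to make repairs
    # a single pass can take a long time (roughly an hour in the examples below)
    log.write(tn.read_until(PROMPT, timeout=7200))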

At the CLI, enter "fsck <volumename>"

You will then be asked whether repairs should be made if errors are found. Generally, the answer should be 'y' for 'yes'. The other potentially useful option is 'n' for 'no', which runs a check against the volume without writing any repairs.
As noted above, this can be used to decide whether to run the repairing filesystem check.

At least two back-to-back fsck runs are required to get the filesystem back in order; sometimes more than two runs are needed. When the filesystem has been repaired, the following message is displayed:

sfs2ck vol1: no errors
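To make the back-to-back rule concrete, the sketch below shows the control flow in Python. It assumes a hypothetical run_fsck(volume) helper that issues "fsck <volume>" at the CLI, answers the prompts, and returns the captured output as text (for example, built on the session driver sketched earlier); it simply re-runs fsck until the 'no errors' line appears or a pass limit is hit.

# Conceptual sketch of the back-to-back fsck loop described above.
# run_fsck() is a hypothetical helper that runs "fsck <volume>" at the CLI,
# answers the repair prompts, and returns the session output as text.
MAX_PASSES = 5   # arbitrary safety limit for this example

def repair_volume(volume, run_fsck):
    for attempt in range(1, MAX_PASSES + 1):
        output = run_fsck(volume)
        if "no errors" in output:            # e.g. "sfs2ck vol1: no errors"
            print(f"{volume}: clean after {attempt} pass(es)")
            return True
        print(f"{volume}: errors reported on pass {attempt}; running fsck again")
    # If the message never appears, the volume may have to be deleted and
    # restored from tape, as noted later in this document.
    return False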

This is how fsck works on the Sun StorEdge[TM] 5210 and 5310 NAS arrays.

The first run of fsck does the following:

If the FSOLF_ERROR error flag is set, fsck notes the error counts, temporarily disables the error flag, re-initializes the error counts, and then initializes the stats on the SFS2 volume. At the end of this run, it sets the volume error flag again.

The second run of fsck does the following:

It checks for errors again. If errors still exist, the step above is repeated. If there are no more errors, the error flag is cleared and the journal is aligned. If the volume was mounted read-only, it is remounted read-write.

So, as can be seen, if there are errors, fsck must be run at least two times back to back to fix the errors and clear the flags. More than two runs may be required if further errors are found.
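The following Python sketch is only a conceptual model of the behaviour just described; it is not the appliance's actual code. It illustrates why a run that finds errors leaves the error flag set, while a follow-up clean run is needed to clear the flag, align the journal, and return a read-only volume to read-write.

# Conceptual model only -- not the real SFS2/fsck implementation.
class Sfs2Volume:
    def __init__(self):
        self.error_flag = True      # FSOLF_ERROR set after the crash
        self.error_count = 305      # e.g. the link-count discrepancy above
        self.read_only = True
        self.journal_aligned = False

def fsck_pass(vol):
    if vol.error_flag and vol.error_count:
        # Run that finds errors: repair them, reinitialize counts and stats,
        # then set the error flag again for the next run to verify.
        vol.error_count = 0
        vol.error_flag = True
    elif vol.error_flag and vol.error_count == 0:
        # Clean follow-up run: clear the flag, align the journal,
        # and remount read-write if the volume was read-only.
        vol.error_flag = False
        vol.journal_aligned = True
        vol.read_only = False

vol = Sfs2Volume()
fsck_pass(vol)   # pass 1: repairs made, error flag re-armed
fsck_pass(vol)   # pass 2: flag cleared, journal aligned, back in service
assert not vol.error_flag and not vol.read_only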

Fsck is a single-threaded utility and can be preempted just like any other process.

It is also possible, but very rare, that the above message will never be seen. This can occur in extreme cases where the filesystem check is unable to completely repair a volume. In these cases, the volume should be deleted and restored from tape.

An example of a failed fsck run on a filesystem that has crashed is shown below, followed by examples of fsck runs on the same filesystem to repair it.

jukebox > fsck /home
Should repairs be needed, do you want them made?
By answering yes, required repairs will be made.
By answering no, the volume will only be checked
and no repairs will be made. By leaving blank,
you will be asked to decide only if a repair is
required.
Make required repairs? yes
sfs2ck home: Pass 1 - page and node allocation maps
sfs2ck home: Pass 2 - directories and reference counts
sfs2ck home: node 0:2 link count off by -305
sfs2ck home: 1 errors
Reinitializing free counts...done
Elapsed time: 59 minutes 29 seconds

jukebox > df
volume type use% free/size requests origin
/cvol dos 33% 60.685M/89.792M 74799+0 ide1d1,1
/dvol sfs2 2% 146.031M/147.851M 108386+0 ide1d1,2
/dvol.chkpnt sfs2cpv 100% 0/147.851M 8914+0 /dvol
/home OFFLINE 22% 785.114G/1006.23G 5827+0 isp4d000,2
/isp2v1 sfs2 57% 789.497G/1.785T 759748151+597 isp2d021,1
/isp2v1.chkpnt sfs2cpv 100% 0/1.785T 8881+0 /isp2v1
/isp4v1 sfs2 1% 251.557G/251.558G 66429+0 isp2d020,1

jukebox > umount /home
jukebox > mount home
!/home: security DB initialization error
/home: mount processed, see log for details

As seen above, the filesystem is still not repaired after the first run. Unmounting and remounting it still reports errors.

jukebox > df
volume type use% free/size requests origin
/cvol dos 33% 60.685M/89.792M 74801+0 ide1d1,1
/dvol sfs2 2% 146.031M/147.851M 108388+0 ide1d1,2
/dvol.chkpnt sfs2cpv 100% 0/147.851M 8914+0 /dvol
/home OFFLINE 100% 80K/1006.23G 0+0 isp4d000,2
/isp2v1 sfs2 57% 789.497G/1.785T 759752022+671 isp2d021,1
/isp2v1.chkpnt sfs2cpv 100% 0/1.785T 8881+0 /isp2v1
/isp4v1 sfs2 1% 251.557G/251.558G 66431+0 isp2d020,1

jukebox > fsck /home

Should repairs be needed, do you want them made?
By answering yes, required repairs will be made.
By answering no, the volume will only be checked
and no repairs will be made. By leaving blank,
you will be asked to decide only if a repair is
required.
Make required repairs? yes
sfs2ck home: Pass 1 - page and node allocation maps
sfs2ck home: Pass 2 - directories and reference counts
sfs2ck home: node 0:2 link count off by -305
sfs2ck home: 1 errors
Reinitializing free counts...done
Elapsed time: 48 minutes 42 seconds

jukebox > df
volume type use% free/size requests origin
/cvol dos 33% 60.685M/89.792M 75099+0 ide1d1,1
/dvol sfs2 2% 146.031M/147.851M 109062+0 ide1d1,2
/dvol.chkpnt sfs2cpv 100% 0/147.851M 8963+0 /dvol
/home OFFLINE 22% 785.114G/1006.23G 214+0 isp4d000,2
/isp2v1 sfs2 58% 779.459G/1.785T 762032229+4725 isp2d021,1
/isp2v1.chkpnt sfs2cpv 100% 0/1.785T 8930+0 /isp2v1
/isp4v1 sfs2 1% 251.557G/251.558G 66820+0 isp2d020,1

jukebox > fsck /home
Should repairs be needed, do you want them made?
By answering yes, required repairs will be made.
By answering no, the volume will only be checked
and no repairs will be made. By leaving blank,
you will be asked to decide only if a repair is
required.
Make required repairs? yes
sfs2ck home: Pass 1 - page and node allocation maps
sfs2ck home: Pass 2 - directories and reference counts
sfs2ck home: no errors --------> the 'no errors' message is seen
Clearing error flag of /home ... done.
/home is currently read-only.
Return it to normal service ? [no] yes
Aligning journal, stand by...
/home now in service

As can be seen above, the df command shows the filesystem as OFFLINE after the first run of fsck.
On the second back-to-back run, however, the filesystem errors are cleared, as indicated by the 'sfs2ck home: no errors' message, and the journal is aligned (the NAS filesystem is a journaling filesystem).
The filesystem is then returned to service.
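As a small illustration of how the OFFLINE state can be spotted in df output such as the listings above, the sketch below parses captured df text (for example, from a logged session) and reports any volume whose type column reads OFFLINE; the column layout is assumed from the examples in this document.

# Sketch: list volumes reported as OFFLINE in captured "df" output.
# Assumed column layout (from the examples above):
#   volume  type  use%  free/size  requests  origin
def offline_volumes(df_text):
    offline = []
    for line in df_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "OFFLINE":
            offline.append(fields[0])
    return offline

sample = """\
/dvol sfs2 2% 146.031M/147.851M 108388+0 ide1d1,2
/home OFFLINE 100% 80K/1006.23G 0+0 isp4d000,2
"""
print(offline_volumes(sample))   # ['/home']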

NOTE: Unmounting and mounting the filesystem between the fsck runs causes the errors to be set again after the first run and interferes with the fsck procedure. A run of fsck immediately after the unmount and mount does not, by itself, repair the affected volume.



Product
Sun StorageTek 5310 NAS Gateway/Cluster System
Sun StorageTek 5320 NAS Gateway/Cluster System
Sun StorageTek 5320 NAS Appliance
Sun StorageTek 5310 NAS Gateway System
Sun StorageTek 5310 NAS Appliance
Sun StorageTek 5210 NAS Appliance
Sun StorageTek 5220 NAS Appliance

Internal Comments
This document contains normalized content and is managed by the Domain Lead(s) of the respective domains. To notify content owners of a knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this document via the “Document Feedback” alias(es) listed below:

storage-nas-domain@sun.com
The Knowledge Work Queue for this article is KNO-STO-NAS.

fsck, NAS, 5210, 5310, 5310C, 5320, 5320C, filesystem, 5220, audited
Previously Published As
85250

Change History
Date: 2007-10-01
User Name: 71396
Action: Approved
Comment: Performed final review of article.

Check for Currency 05-MAR-2018
No changes required.

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.