Sunday, November 10, 2024

How to Recover Files from a Dropped ASM Disk Group

Alireza Kamrani 

11/10/2024


This post describes a few tips and techniques for recovering ASM files from a dropped disk group, and gives an example of how to 'undrop' an ASM disk group.


Read files from a dropped ASM diskgroup with amdu


amdu is a diagnostic utility shipped with Oracle installations since 11g (it can also be used for 10g, see references). It can read data from unmounted disk groups and is used in various support-type operations. In the case in question, we can read files from a dropped disk group with:


amdu -dis '/dev/mapper/MYLUN*p1' -former -extract ORCL_DATADG.256


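This command extracts ASM file number 256 from the unmounted disk group ORCL_DATADG, whose disks are physically accessible under the path /dev/mapper/MYLUN*p1 (edit the LUN path as relevant for your system). The -dis (-diskstring) option gives the disk discovery string, and -former makes amdu include disks with header status FORMER, which is the state of the disks of a dropped disk group. File 256 in the ORCL_DATADG disk group in this example is the controlfile of the test database we want to recover.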

Note: ASM starts numbering user-visible files from 256 (lower file numbers are used for the ASM system files, which are often hidden; see also X$KFFIL and more details in the references). File number 256 is very likely a controlfile in 10g and 11g systems, as the controlfile is often the first file created in a disk group used for Oracle databases. On a brand new 12c database, however, I have observed that file number 256 is the password file (storing the password file in ASM is a new 12c feature); in that system the DB controlfile is file number 257.


The amdu command above extracts a copy of the file into the local file system, in a new directory that amdu creates for the run.
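To extract more than one file, the same amdu call can simply be repeated for each file number. The following is a minimal shell sketch, assuming the disk string and disk group name of the example above; the function name extract_files and the DRY_RUN switch are illustrative only and not part of any Oracle tool. With DRY_RUN=1 it only prints the amdu commands it would run.

```shell
#!/bin/sh
# Minimal sketch: run one amdu extraction per ASM file number.
# DISKSTRING and DG default to the values used in this post; adjust them.
# DRY_RUN=1 (hypothetical switch) prints the commands instead of running them.
DISKSTRING=${DISKSTRING:-'/dev/mapper/MYLUN*p1'}
DG=${DG:-ORCL_DATADG}
DRY_RUN=${DRY_RUN:-0}

extract_files() {
  for fnum in "$@"; do
    if [ "$DRY_RUN" = 1 ]; then
      printf "amdu -diskstring '%s' -former -extract %s.%s\n" "$DISKSTRING" "$DG" "$fnum"
    else
      # amdu writes the extracted copy into the directory it creates for the run.
      amdu -diskstring "$DISKSTRING" -former -extract "$DG.$fnum"
    fi
  done
}
```

For example, DRY_RUN=1 followed by extract_files 256 257 258 lists the three commands, so they can be reviewed before anything is run.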

From the controlfile we can easily get a list of the remaining DB files if needed.

For example, we can run the strings command on the extracted controlfile and process its output to find the names of the remaining DB files.
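The strings step can be sketched as follows. This is a minimal shell sketch, assuming that the fully qualified ASM file names stored in the controlfile show up in the strings output in the usual +DG/.../name.<file number>.<incarnation> form; list_asm_files is a hypothetical helper name. It prints the file number (the second-to-last dot-separated field of the name) next to each name, ready to be used with amdu -extract.

```shell
#!/bin/sh
# Minimal sketch: find the ASM file names of disk group $1 in `strings`
# output read on stdin and print "<file number> <name>", sorted by number.
# Assumes names look like +DG/DB/DATAFILE/SYSTEM.258.1183036729.
list_asm_files() {
  grep -oE "\+$1/[^[:space:]]+\.[0-9]+\.[0-9]+" |
    sort -u |
    awk -F. '{ print $(NF-1), $0 }' |
    sort -n
}
```

A possible usage is strings ORCL_DATADG_256.f | list_asm_files ORCL_DATADG, where ORCL_DATADG_256.f is the name assumed here for the extracted controlfile copy. Names that strings splits across lines, or aliases, are missed by this simple filter.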


Where are we with our disk group recovery? We have demonstrated a simple method to extract any file from a dropped disk group. The good news is that, file by file, we could recover the entire 'lost DB' onto a local file system.

Can we do better, for example by recovering all the files in one go and directly into ASM?


Undrop an ASM diskgroup with kfed


kfed is another great ASM diagnostic and repair tool shipped with Oracle. It can read and write ASM metadata in the disk header structures. Obviously, writing into ASM disk headers is an unsupported activity that we do at our own risk (or rather under the guidance of Oracle Support if needed).

Block number 0 of ASM allocation unit number 0 (see references for details) of each ASM disk contains, among other fields, the header status kfdhdb.hdrsts. Dropped disks have kfdhdb.hdrsts=4 (KFDHDR_FORMER), while disks that are members of a disk group have kfdhdb.hdrsts=3 (KFDHDR_MEMBER).

The 'trick' here is to read all the disk headers one by one with kfed, change the value of kfdhdb.hdrsts from 4 to 3, and write the headers back to the disks.
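Before changing anything, it is worth checking the current header status of each candidate disk. Below is a minimal, read-only shell sketch, assuming kfed is in the PATH and that its default output (to stdout) contains the kfdhdb.hdrsts line in the format shown in the example below; report_hdrsts is a hypothetical helper name.

```shell
#!/bin/sh
# Minimal read-only sketch: print "<disk> <header status>" for each disk,
# taken from the kfdhdb.hdrsts line of block 0, AU 0, as dumped by kfed.
# Expected values: KFDHDR_FORMER for dropped disks, KFDHDR_MEMBER for members.
report_hdrsts() {
  for disk in "$@"; do
    status=$(kfed read "$disk" aunum=0 blknum=0 |
      awk '/^kfdhdb\.hdrsts:/ { print $NF }')
    echo "$disk ${status:-unknown}"
  done
}
```

For example, report_hdrsts /dev/mapper/MYLUN*p1 should list KFDHDR_FORMER for every disk of the dropped disk group; any other value is a reason to stop and investigate.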

Let's see this with a simple example, where we first create a disk group and then drop it to test the undrop procedure. The steps below mix SQL*Plus commands, run on the ASM and DB instances, with kfed commands run from the OS prompt, as indicated.


1. Test Scenario:


We create an ASM disk group for testing, add a DB tablespace to it, and then drop the disk group to prepare for the next step (the undrop, see point 2 below).


ASM_INSTANCE> create diskgroup ORCL_TESTDG external redundancy disk '/dev/mapper/MYLUN1_p1';



ORCL_DB> create tablespace testdrop datafile '+ORCL_TESTDG' size 1G;


ORCL_DB> alter tablespace testdrop offline; -- needed, otherwise the disk group drop fails, as you cannot drop a disk group with open files


ASM_INSTANCE> drop diskgroup ORCL_TESTDG including contents;


2. Example of how to undrop the disk group and recover its files


We read the header block (AU 0, block 0) of each disk of the disk group (1 disk only in this example) and copy it to a local text file:


kfed read /dev/mapper/MYLUN1_p1 aunum=0 blknum=0 text=dumpfile_MYLUN1_p1


Manual edit of the local copy of the header block:


vi dumpfile_MYLUN1_p1


📍 Replace the line:

kfdhdb.hdrsts:                        4 ; 0x027: KFDHDR_FORMER


with:

kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER


We write the modified header block back to each disk of the disk group (1 disk only in this example):


kfed write /dev/mapper/MYLUN1_p1 aunum=0 blknum=0 text=dumpfile_MYLUN1_p1
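For a disk group with several disks, the read, edit and write steps above can be scripted. This is a minimal shell sketch under the same unsupported, at-your-own-risk conditions; the function names, the .orig backup file and the DRY_RUN switch are hypothetical, and sed replaces the manual vi edit while keeping the column alignment of the kfed dump. It stops if a header is not in FORMER state, and it keeps the original dump of each header so it can be written back if something goes wrong. DRY_RUN defaults to 1, so nothing is written unless DRY_RUN=0 is set explicitly.

```shell
#!/bin/sh
# Minimal sketch of the kfed "undrop" steps for several disks (unsupported).
# DRY_RUN=1 (the default here) only prints the kfed commands.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Turn hdrsts 4 (KFDHDR_FORMER) into 3 (KFDHDR_MEMBER) in a kfed text dump
# read on stdin; the captured prefix keeps the original spacing.
patch_hdrsts() {
  sed 's/^\(kfdhdb\.hdrsts: *\)4 ; 0x027: KFDHDR_FORMER/\13 ; 0x027: KFDHDR_MEMBER/'
}

# Read, patch and write back block 0 of AU 0 for each disk given.
undrop_disks() {
  for disk in "$@"; do
    dump="dumpfile_$(basename "$disk")"
    run kfed read "$disk" aunum=0 blknum=0 text="$dump.orig"
    if [ "$DRY_RUN" != 1 ]; then
      # Stop if the header is not in FORMER state (nothing to undrop).
      grep -q '^kfdhdb\.hdrsts: *4 ; 0x027: KFDHDR_FORMER' "$dump.orig" ||
        { echo "header of $disk is not FORMER, stopping" >&2; return 1; }
      patch_hdrsts < "$dump.orig" > "$dump"
    fi
    run kfed write "$disk" aunum=0 blknum=0 text="$dump"
  done
}
```

A dry run such as undrop_disks /dev/mapper/MYLUN*p1 shows the read and write commands for every disk; only after reviewing them, and with a backup of the headers, would one rerun it with DRY_RUN=0.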



3. We can now check that our disk group and its files are back:


ASM_INSTANCE> alter diskgroup ORCL_TESTDG mount;

ORCL_DB> alter tablespace testdrop online; 


Note: this procedure has been tested with Oracle on Linux, versions 11.2.0.4 and 12.2.


🅾️ How to restore the database with AMDU after disk corruption?

 

NOTE: This is not a replacement for restoring the contents of corrupted ASM disk groups from RMAN backups (which is supported). 

This procedure is 'best effort' at best and is not supported! 

With this method we may not be able to restore the database completely. 

 

1. Create a pfile for the lost database from the startup messages in the database alert.log.

2. Start up the database in NOMOUNT mode.

3. Get the controlfile file number from the database alert log, where the control_files parameter is shown at startup. In this example it is 256:
e.g., control_files='+<DGNAME>/<DB_NAME>/controlfile/current.256.709676643'

If you already have a controlfile backup in a non-ASM location, edit the pfile to point to that location and then mount the database.
If you do not have a controlfile backup, go to step 4 with the file number determined in step 3.

4. $ amdu -diskstring <asm_diskstring> -extract <DGNAME>.256 

5. Shut down the database and change the control_files parameter in the pfile to point to the extracted file.

6. Start up the database in MOUNT mode.

7. Once mounted, get the datafile names, which contain the ASM file numbers, with "select name from v$datafile",
  and the online redo log files with "select member from v$logfile".

8. Extract all datafiles and online redo log files in a similar manner.

9. alter database rename file '<datafile 1>' to '<newly extracted location>';

10. Open the database.
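Steps 8 and 9 are repetitive for a database with many files. The following minimal shell sketch turns the file names from step 7 (spooled one per line, for example from v$datafile.name and v$logfile.member) into RENAME FILE statements. It assumes, hypothetically, that all extracted copies have been collected in one directory and keep <DGNAME>_<file number>.f names (the naming assumed here for amdu's output files); gen_renames is an illustrative helper name. Check the generated statements before running them in SQL*Plus.

```shell
#!/bin/sh
# Minimal sketch: read ASM file names (one per line) on stdin and print an
# ALTER DATABASE RENAME FILE statement for each, pointing at the extracted
# copy in directory $1. Assumes copies are named <DGNAME>_<file number>.f.
gen_renames() {
  dest=$1
  # Plain `read` trims leading/trailing blanks left by a SQL*Plus spool.
  while read -r name; do
    [ -n "$name" ] || continue                    # skip empty lines
    dg=${name#+}; dg=${dg%%/*}                    # +DG/DB/... -> DG
    fnum=$(printf '%s\n' "$name" | awk -F. '{ print $(NF-1) }')
    printf "alter database rename file '%s' to '%s/%s_%s.f';\n" \
      "$name" "$dest" "$dg" "$fnum"
  done
}
```

For example, gen_renames /u01/recovered < files.txt > rename.sql produces a script to review and then run while the database is mounted; /u01/recovered and files.txt are placeholder names.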


amdu_extract:

amdu_extract calls the Oracle ASM Metadata Dump Utility (AMDU) command to extract a file using an Oracle ASM alias name.


The following is an example of the amdu_extract command used to extract a file from the data disk group.


ASMCMD> amdu_extract data data/orcl/my_alias_filename /devices/disk*


Conclusions


We have discussed a few tips on how to read and recover files from a dropped disk group using the amdu utility and went through the main steps of an example procedure showing how to 'undrop' an ASM disk group using kfed.




