Crash Recovery in Databases: Definition, Phases, and ARIES Explained

What is Crash Recovery?

Crash recovery is the process by which the database is moved back to a consistent and usable state. This is done by rolling back incomplete transactions and completing committed transactions whose changes were still in memory (not yet written to disk) when the crash occurred.

When the database is in a consistent and usable state, it has attained what is known as a point of consistency. Following a transaction failure, the database must be recovered.

Conditions that can result in transaction failure:

1. A power failure on the machine causes the database manager and the database partitions on it to go down.

2. A hardware failure such as memory corruption, or disk, CPU, or network failure.

3. A serious operating system error that causes the database to go down.

Introduction to ARIES (Algorithms for Recovery and Isolation Exploiting Semantics)

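ARIES is a recovery algorithm designed to work with a no-force, steal approach to buffer management: committed changes do not have to be forced to disk at commit time (no-force), and pages modified by uncommitted transactions may be written to disk (steal). It is used by IBM DB2, Microsoft SQL Server and many other database systems.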

The 3 main principles behind the ARIES recovery algorithm:

1. Write-Ahead Logging: Any change to an object is first recorded in the log, and the log record must be written to stable storage before the changed object is written to disk.

2. Repeating History during Redo: On restart after a crash, ARIES retraces the actions of the database before the crash and brings the system back to the exact state it was in at the time of the crash. Then it undoes the transactions that were still active at crash time.

3. Logging Changes during Undo: Changes made to the database while undoing transactions are logged, so that such an action is not repeated in the event of repeated restarts.
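
The third principle can be illustrated with a minimal Python sketch. It assumes a toy log made of dictionaries; the names undo_transaction, undo_next, prev and crash_after are hypothetical and chosen only for this example. Each undone update appends a compensation log record (CLR) that points to the next record still to be undone, so a rollback interrupted by a second crash resumes where it stopped instead of undoing the same change twice. The db dictionary stands in for the state that the redo phase would have rebuilt.

```python
def undo_transaction(log, db, txn, crash_after=None):
    """Roll back txn, appending one CLR per undone update.

    crash_after (hypothetical test hook): stop after that many undo steps
    to simulate a crash in the middle of the rollback.
    Returns True when the rollback is complete.
    """
    # The newest record of txn tells us where to resume:
    # a CLR points at the next update to undo, an update points at itself.
    next_lsn = None
    for rec in reversed(log):
        if rec["txn"] == txn:
            next_lsn = rec["undo_next"] if rec["kind"] == "clr" else rec["lsn"]
            break

    steps = 0
    while next_lsn is not None:
        if crash_after is not None and steps == crash_after:
            return False  # simulated crash during undo
        rec = log[next_lsn - 1]  # LSNs are 1-based positions in this toy log
        db[rec["key"]] = rec["old"]  # restore the before-image
        log.append({"lsn": len(log) + 1, "txn": txn, "kind": "clr",
                    "key": rec["key"], "new": rec["old"],
                    "undo_next": rec["prev"]})
        next_lsn = rec["prev"]
        steps += 1
    return True


log = [
    {"lsn": 1, "txn": "T1", "kind": "update", "key": "A", "old": 0, "new": 1, "prev": None},
    {"lsn": 2, "txn": "T1", "kind": "update", "key": "B", "old": 0, "new": 2, "prev": 1},
    {"lsn": 3, "txn": "T1", "kind": "update", "key": "A", "old": 1, "new": 3, "prev": 2},
]
db = {"A": 3, "B": 2}

first = undo_transaction(log, db, "T1", crash_after=1)  # interrupted rollback
second = undo_transaction(log, db, "T1")                # rollback after restart
clr_count = sum(1 for rec in log if rec["kind"] == "clr")
```

In this run the first call undoes only the newest update before the simulated crash, and the second call continues from the undo_next pointer of the last CLR. Three updates produce exactly three CLRs, so no update is undone twice.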

What Is the Recovery Procedure after a Crash?

The recovery works in three phases.

1. Analysis Phase: The first phase, analysis, scans the log to determine which transactions were active at the time of the crash and which pages may have been modified but not yet written to disk.

2. REDO Phase: The redo phase restores the database to the exact state at the crash, including all the changes of uncommitted transactions that were running at that point in time.

3. UNDO Phase: After the redo phase, the database reflects the exact state at the crash, which still includes the changes of uncommitted transactions. The undo phase undoes these changes, leaving the database in a consistent state.
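
The three phases can be sketched in Python as follows. This is a deliberately simplified illustration, not a full ARIES implementation: it assumes a toy log of tuples, keeps the database in a dictionary, and leaves out page LSNs, the dirty page table, checkpoints and compensation log records. The function names analysis, redo, undo and recover are hypothetical.

```python
# Each record: (lsn, txn, kind, key, old, new); kind is "update" or "commit".
LOG = [
    (1, "T1", "update", "A", 0, 10),
    (2, "T2", "update", "B", 0, 20),
    (3, "T1", "commit", None, None, None),
    (4, "T2", "update", "A", 10, 30),
    # the crash happens here: T2 never committed
]


def analysis(log):
    """Return the transactions that were still active at the crash."""
    active = set()
    for lsn, txn, kind, *_ in log:
        if kind == "commit":
            active.discard(txn)
        else:
            active.add(txn)
    return active


def redo(log, db):
    """Repeat history: reapply every logged update, committed or not."""
    for lsn, txn, kind, key, old, new in log:
        if kind == "update":
            db[key] = new


def undo(log, db, losers):
    """Walk the log backwards and restore old values for uncommitted work."""
    for lsn, txn, kind, key, old, new in reversed(log):
        if kind == "update" and txn in losers:
            db[key] = old


def recover(log, disk_state):
    db = dict(disk_state)
    losers = analysis(log)
    redo(log, db)
    undo(log, db, losers)
    return db, losers


recovered, losers = recover(LOG, {"A": 0, "B": 0})
```

After redo the toy database holds A = 30 and B = 20, the state at the crash. Undo then removes the two updates of T2, leaving A = 10 (the committed change of T1) and B = 0.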

Other Recovery-Related Concepts and Data Structures

The Write-Ahead Log Protocol: Write-Ahead Logging (WAL) is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems. In a system using WAL, all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the log. WAL allows updates to a database to be done in place.

Atomicity: This is the property of transaction processing whereby either all the operations of a transaction are executed or none of them are executed (all-or-nothing).
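
A minimal Python sketch of the write-ahead rule is shown here. It assumes a toy in-memory model in which a list stands in for stable storage; the class names WriteAheadLog and BufferPool and the attribute flushed_lsn are hypothetical. The point it illustrates is that a modified page is written to disk only after every log record up to that page's latest change has been flushed.

```python
class WriteAheadLog:
    """Toy log: records lives in memory, stable stands in for stable storage."""

    def __init__(self):
        self.records = []     # (lsn, txn, key, old, new)
        self.stable = []      # records that have reached stable storage
        self.flushed_lsn = 0  # highest LSN known to be on stable storage

    def append(self, txn, key, old, new):
        lsn = len(self.records) + 1
        self.records.append((lsn, txn, key, old, new))
        return lsn

    def flush(self, up_to_lsn):
        # Copy the not-yet-flushed records with lsn <= up_to_lsn.
        self.stable.extend(self.records[self.flushed_lsn:up_to_lsn])
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)


class BufferPool:
    def __init__(self, log):
        self.log = log
        self.cache = {}  # key -> (value, lsn of the last change)
        self.disk = {}

    def update(self, txn, key, new):
        old = self.cache.get(key, (self.disk.get(key), 0))[0]
        lsn = self.log.append(txn, key, old, new)  # log first
        self.cache[key] = (new, lsn)               # then change the cached copy

    def write_page(self, key):
        value, page_lsn = self.cache[key]
        if self.log.flushed_lsn < page_lsn:
            self.log.flush(page_lsn)  # WAL rule: the log goes to disk first
        self.disk[key] = value


log = WriteAheadLog()
pool = BufferPool(log)
pool.update("T1", "A", 10)
pool.update("T1", "B", 20)
pool.write_page("A")
```

Writing page A forces only the first log record to stable storage. The record for B stays in memory until B itself is written out or the log is flushed for another reason, such as a commit.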

Durability: This is the ACID property which guarantees that transactions that have committed will survive permanently.

Log: A transaction log (also transaction journal, database log, binary log or audit trail) is a history of actions executed by a database management system to guarantee ACID properties over crashes or hardware failure. Physically, a log is a file of updates done to the database, stored in stable storage.

Check Pointing: Check pointing consists of storing a snapshot of the current application state and later using it to restart execution in case of failure. A checkpoint record is written into the log periodically, at the point when the system writes out to the database on disk all DBMS buffers that have been modified. This is a periodic operation that can reduce the time needed for recovery from a crash.

Checkpoints are used to make recovery more efficient and to control the reuse of primary and secondary log files. In the case of a crash, recovery begins from the most recent checkpoint record rather than from the start of the log, and the log records are used to recover the database to the point of the crash.
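
The effect of a checkpoint on recovery time can be sketched in Python. The example assumes, as an illustration only, that each checkpoint record stores the list of transactions active at that moment; the record layout and the function names last_checkpoint_index and analyse_from_checkpoint are hypothetical.

```python
# Each record is (kind, payload). For "begin", "update" and "commit" the
# payload is the transaction name; for "checkpoint" it is the tuple of
# transactions that were active when the checkpoint was taken.
LOG = [
    ("begin", "T1"), ("update", "T1"), ("commit", "T1"),
    ("begin", "T2"), ("update", "T2"),
    ("checkpoint", ("T2",)),
    ("begin", "T3"), ("update", "T3"), ("commit", "T3"),
]


def last_checkpoint_index(log):
    """Return the position of the newest checkpoint record, or None."""
    for i in range(len(log) - 1, -1, -1):
        if log[i][0] == "checkpoint":
            return i
    return None


def analyse_from_checkpoint(log):
    """Find the active transactions, scanning only the log tail."""
    start = last_checkpoint_index(log)
    if start is None:
        active, scan_from = set(), 0
    else:
        active, scan_from = set(log[start][1]), start + 1
    for kind, payload in log[scan_from:]:
        if kind == "begin":
            active.add(payload)
        elif kind == "commit":
            active.discard(payload)
    return active, len(log) - scan_from


active, scanned = analyse_from_checkpoint(LOG)
```

With the checkpoint in place only the three records after it are scanned, and T2 is still identified as active. Without a checkpoint the same function would have to read the whole log.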

Media Recovery: Media recovery deals with failure of the storage media holding the permanent database, in particular disk failures. The traditional database approach for media recovery uses archive copies (dumps) of the database as well as archive logs. Archive copies represent snapshots of the database and are periodically taken.

The archive log contains the log records for all committed changes which are not yet reflected in the archive copy. In the event of a media failure, the current database can be reconstructed by using the latest archive copy and redoing all changes in chronological order from the archive log.
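
The reconstruction described above can be sketched in a few lines of Python. The sketch assumes that the archive copy is a dictionary and that each archive log record is a tuple (lsn, key, value) for a committed change; the function name media_recover is hypothetical.

```python
def media_recover(archive_copy, archive_log):
    """Rebuild the database from the latest archive copy and the archive log."""
    db = dict(archive_copy)  # start from the snapshot
    # Redo the committed changes in chronological (LSN) order.
    for lsn, key, value in sorted(archive_log):
        db[key] = value
    return db


archive_copy = {"A": 1, "B": 2}
archive_log = [(12, "A", 5), (11, "B", 7), (13, "B", 9)]
rebuilt = media_recover(archive_copy, archive_log)
```

The order matters: B is changed twice after the archive copy was taken, and replaying the records by LSN ensures that the later value (9) is the one that survives.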

Faster recovery from disk failures is supported by disk organizations like RAID (redundant arrays of independent disks), which store data redundantly on several disks. However, they do not eliminate the need for archive-based media recovery, since they cannot completely rule out the possibility of data loss, e.g., when multiple disks fail.

Evaluation

1. Explain crash recovery

2. Explain the following terms in crash recovery (i) Media recovery (ii) Check point (iii) The Write-Ahead log protocol

3. Discuss the concepts of ARIES in crash recovery.

READING ASSIGNMENT

Understanding Data Processing for senior secondary schools by Dinehin Victoria pages 261 – 267

Questions

1. Discuss the concept of ARIES in crash recovery

2. Explain the difference between media recovery and checkpoint.

3. Explain the difference between a system crash and a media failure.

FAQs

What is crash recovery in a database?

Crash recovery is the process of restoring a database to a consistent state after a system failure by rolling back incomplete transactions and completing committed ones.

What are the three phases of ARIES crash recovery?

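ARIES crash recovery consists of the Analysis, Redo, and Undo phases. The analysis phase identifies the transactions that were active at the crash, the redo phase reapplies all logged changes (committed and uncommitted) to restore the exact state at the crash, and the undo phase reverts the changes of uncommitted transactions.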

How does Write-Ahead Logging (WAL) help in crash recovery?

Write-Ahead Logging (WAL) ensures that changes are logged before being applied to the database, enabling recovery by replaying logs and maintaining atomicity and durability.

 
What is the difference between media recovery and system crash recovery?

Media recovery restores a database after storage failures (e.g., disk crashes) using archive copies and archive logs, while system crash recovery deals with failures like power outages or memory corruption by using the log to redo committed changes and roll back incomplete transactions.

Why is checkpointing important in database recovery?

Checkpointing reduces recovery time by periodically writing modified buffers to disk and recording a checkpoint in the log, so that only recent log records need to be processed after a crash.

I.T Entrepreneur | Digital Marketing Specialist | Website Developer, | Computer Lecturer | Content Creator at Webbpedia Learning | +2347084530359 | admin@webbpedia.com

Samuel Okeke is a highly experienced and skilled Website Developer, Computer Lecturer, IT Instructor, Digital Marketing Expert, Computer Engineer, and Author with over a decade of experience in the educational, digital marketing and IT sectors. He has a proven track record of developing and sustaining successful educational projects, including Acadlly, Audio School, and Certifications Exam Prep. He possesses a strong passion for education and a commitment to making a positive impact on people and society.
