Crash Recovery in Transport Layer

This tutorial covers the basic concepts of crash recovery in the transport layer: how the transport layer handles crashes that interrupt data transfer between applications, including server failures and host crashes.

Contents:

  1. Crash Recovery
  2. Situation of Crashes
  3. Events on Server
  4. Recovery from the Upper Layers
  5. End-to-End Acknowledgement

Crash Recovery

Hosts and routers on a network are subject to crashes, and the risk matters most for long-lived connections such as large software or media downloads. Recovering from such crashes is therefore a real problem. If the transport entity is entirely within the hosts, recovery from network and router crashes is comparatively easy. Recovery from a host crash is harder, as the following scenario shows.

  • Let us assume the scenario in which the client host is sending a large file to a file server using a simple stop-and-wait protocol.
  • The file server's transport layer receives the chunks of the file sent by the client host and passes them up to the transport user.
  • The file server suddenly crashes during transmission. When the file server comes back up, its tables are reinitialized, so it no longer knows where it was before the crash.
  • The file server sends a broadcast segment to all other hosts, announcing that it has just crashed and asking its clients for the status of all open connections.
  • Each client is in one of two states: one segment outstanding (S1) or no segment outstanding (S0). Based on this state information alone, the client decides whether to retransmit the most recent segment.
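The two client states can be sketched in a few lines of Python. This is a minimal illustration, not a real transport implementation: the class name `StopAndWaitClient` and its methods are hypothetical, and the recovery rule shown (retransmit exactly when a segment is outstanding) is only one possible policy, which the next section shows can still fail.

```python
class StopAndWaitClient:
    """Hypothetical stop-and-wait sender that tracks S0/S1 (illustration only)."""

    def __init__(self):
        self.outstanding = None  # the segment awaiting an acknowledgment, if any

    @property
    def state(self):
        # S1: one segment outstanding, S0: no segment outstanding
        return "S1" if self.outstanding is not None else "S0"

    def send(self, segment):
        if self.outstanding is not None:
            raise RuntimeError("stop-and-wait allows only one unacknowledged segment")
        self.outstanding = segment
        return segment

    def on_ack(self):
        # The acknowledgment arrived, so nothing is outstanding any more.
        self.outstanding = None

    def on_crash_broadcast(self):
        # Assumed policy: retransmit the outstanding segment, if there is one.
        return self.outstanding


if __name__ == "__main__":
    client = StopAndWaitClient()
    client.send("chunk-1")
    print(client.state, client.on_crash_broadcast())
```

Note that the client decides purely from its own state. It has no way to see whether the server wrote the segment before crashing.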

Situation of Crashes

No matter how well the client and the server are programmed, there are always situations in which the transport layer’s protocol fails to recover properly.

The diagram below explains the failure status of the server.

Failure of Server
  • For example, suppose the server's transport entity first sends the acknowledgment and, once the acknowledgment has been sent, writes the segment to the application process.
  • Sending the acknowledgment and writing the segment to the application process are two separate events that cannot be performed as a single atomic step, so a crash can fall between them.
  • In the first scenario, the acknowledgment reaches client-1, but the server crashes before the segment is written to the application process. When the server comes back up, it sends a broadcast packet to each host. Client-1 has received the acknowledgment, so it is in state S0, assumes the segment arrived safely, and does not resend it. The segment is lost.
  • In the second scenario, the server writes the received segment to the application process first and then sends the acknowledgment, but it crashes after the write and before the acknowledgment is sent. When the server comes back up and sends the broadcast packet to all hosts, client-1 resends the segment because it never received an acknowledgment, and the application process gets a duplicate.
  • The server can be programmed in one of two ways: acknowledge first or write first.
  • The client can be programmed in one of four ways: always retransmit the last segment, never retransmit the last segment, retransmit only in state S0, or retransmit only in state S1.
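The two failure scenarios can be replayed with a small simulation. This is a sketch under stated assumptions: the helper names `deliver` and `run_scenario` are hypothetical, the server is assumed to crash after exactly one of its two steps, an acknowledgment that was sent is assumed to reach the client, and the client is assumed to retransmit only if it never saw the acknowledgment.

```python
def deliver(segment, order, crash_after, app_log, client_acked):
    """Run the server's steps in `order` ("AW" or "WA") and stop after
    `crash_after` steps, which models a crash at that point."""
    for step in order[:crash_after]:
        if step == "A":
            client_acked.add(segment)   # the acknowledgment reaches the client
        else:
            app_log.append(segment)     # the segment is written to the application


def run_scenario(order):
    """Return what the application process ends up with for one segment."""
    app_log, acked = [], set()
    # First attempt: the server crashes after completing only its first step.
    deliver("seg-1", order, crash_after=1, app_log=app_log, client_acked=acked)
    # After recovery the client retransmits only if it never saw the ack.
    if "seg-1" not in acked:
        deliver("seg-1", order, crash_after=2, app_log=app_log, client_acked=acked)
    return app_log


if __name__ == "__main__":
    print("acknowledge first:", run_scenario("AW"))  # [] : the segment is lost
    print("write first:      ", run_scenario("WA"))  # ['seg-1', 'seg-1'] : duplicate
```

Neither server ordering is safe: acknowledging first loses the segment, and writing first duplicates it.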

Events on Server

On the server, three events are possible: sending an acknowledgment (A), writing to the output process (W), and crashing (C). These events can occur in six different sequences: AC(W), AWC, C(AW), C(WA), WAC, and WC(A). The parentheses mark events that never happen, because neither A nor W can follow C once the server has crashed.
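The six sequences follow mechanically from the two server orderings (acknowledge first or write first) and the three points at which the crash can occur. A short sketch, with the hypothetical function name `crash_sequences`, enumerates them:

```python
def crash_sequences():
    """List every ordering of A, W and C, putting the events that the
    crash prevented in parentheses."""
    sequences = []
    for order in ("AW", "WA"):          # acknowledge first, or write first
        for crash_at in range(3):       # crash before, between, or after the steps
            done, prevented = order[:crash_at], order[crash_at:]
            suffix = f"({prevented})" if prevented else ""
            sequences.append(done + "C" + suffix)
    return sorted(sequences)


if __name__ == "__main__":
    print(crash_sequences())
    # ['AC(W)', 'AWC', 'C(AW)', 'C(WA)', 'WAC', 'WC(A)']
```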

The diagram below shows eight combinations of client and server strategies.

Events and Strategies of Client and Server
  • As the diagram shows, for every strategy used by the sending host there is some sequence of events that causes the protocol to fail.
  • For example, if the client always retransmits, the AWC sequence generates an undetected duplicate, even though the other two sequences of an acknowledge-first server, AC(W) and C(AW), work properly.
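The outcome for each combination can be derived rather than memorized. The sketch below is an illustration under assumptions that are not spelled out in the text: an acknowledgment sent before the crash is assumed to reach the client (putting it in S0), and the labels OK, DUP, and LOST as well as the names `outcome` and `table` are chosen here for readability.

```python
SEQUENCES = ["AC(W)", "AWC", "C(AW)", "C(WA)", "WAC", "WC(A)"]

# Each client strategy maps the client's state to "retransmit or not".
STRATEGIES = {
    "always retransmit": lambda state: True,
    "never retransmit": lambda state: False,
    "retransmit in S0": lambda state: state == "S0",
    "retransmit in S1": lambda state: state == "S1",
}


def outcome(sequence, strategy):
    done = sequence.split("C")[0]           # events completed before the crash
    state = "S0" if "A" in done else "S1"   # assumption: a sent ack reaches the client
    written = "W" in done
    retransmit = STRATEGIES[strategy](state)
    if written and retransmit:
        return "DUP"                        # the application sees the segment twice
    if not written and not retransmit:
        return "LOST"                       # the application never sees the segment
    return "OK"


def table():
    return {name: [outcome(seq, name) for seq in SEQUENCES] for name in STRATEGIES}


if __name__ == "__main__":
    print(f"{'client strategy':<20}" + "".join(f"{s:>7}" for s in SEQUENCES))
    for name, row in table().items():
        print(f"{name:<20}" + "".join(f"{o:>7}" for o in row))
```

The first three columns belong to an acknowledge-first server and the last three to a write-first server. Every row contains at least one DUP or LOST in each half, which is the point of the section: no pairing of client and server strategy is safe against every crash.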

Recovery From the Upper Layers

Making the protocol more elaborate does not help. Even if the client and the server exchange several segments before the server attempts to write to the application process, so that the client knows exactly what is about to happen, the client still cannot tell whether the server crashed just before or just after the write.

  • The write and the acknowledgment are separate events that occur one after the other, never simultaneously, so a crash can always fall between them. As a result, a host crash and its recovery cannot be made transparent to the higher layers.
  • In general terms, recovery from a crash at layer N can only be done by layer N+1, and only if that higher layer retains enough status information to reconstruct where things stood before the problem occurred.
  • This is consistent with the earlier observation that the transport layer can recover from failures in the network layer, provided each end of the connection keeps track of where it is.

End-to-End Acknowledgement

The recovery problem raises the question of what an end-to-end acknowledgment really means.

  • The transport layer protocol is end-to-end, which means it is not chained like the lower layers.
  • Consider a remote server that hosts a database, and a user who enters a request for a transaction against that remote database.
  • The server is programmed to pass segments up to the upper layers first and then send the acknowledgment for the segments received.
  • Even so, the fact that the server's transport entity has received a segment and sent back an acknowledgment does not mean that the server stayed up long enough to actually update the database.
  • A true end-to-end acknowledgment, whose receipt means that the work has actually been done and whose absence means that it has not, is probably impossible to achieve.

Key Points to Remember

Here is the list of key points we need to remember about “Crash Recovery in Transport Layer”.

  • No matter how well the client and the server are programmed, there are always situations in which the transport layer’s protocol fails to recover properly.
  • On the server, three events are possible: sending an acknowledgment (A), writing to the output process (W), and crashing (C).
  • Recovery from a crash at layer N can only be done by layer N+1, and only if that higher layer retains enough status information to reconstruct where things stood before the problem occurred.

Manish Bhojasia - Founder & CTO at Sanfoundry
Manish Bhojasia, a technology veteran with 20+ years @ Cisco & Wipro, is Founder and CTO at Sanfoundry. He lives in Bangalore, and focuses on development of Linux Kernel, SAN Technologies, Advanced C, Data Structures & Algorithms.
