Establish a connection directly

SRT allows a direct connection between the source and the destination, in contrast to many existing video transmission systems that require a centralized server to collect signals from remote locations and redirect them to one or more destinations. A central server-based architecture has a single point of failure that can also become a bottleneck during periods of high traffic. Routing signals through a hub also increases end-to-end transmission time and may double bandwidth costs, because two links are needed: one from the source to the hub and another from the hub to the destination. By using a direct source-to-destination connection, SRT can reduce latency, eliminate the central bottleneck, and reduce network costs.

Using the ARQ mechanism for packet delivery

Comparing the three packet delivery mechanisms: at the top is an uncorrected data stream, in which the output signal shows an error each time a packet is lost. In the middle is Forward Error Correction (FEC), which adds a fixed amount of extra data to the stream that can be used to recreate lost packets. At the bottom is Automatic Repeat reQuest (ARQ), in which the sender retransmits lost packets at the receiver's request, thereby avoiding the constant bandwidth overhead of FEC.
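The bandwidth trade-off between FEC and ARQ can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the 20% FEC ratio, and the 1% loss rate are assumptions chosen for the example, and packet losses are assumed to be independent.

```python
def fec_overhead_bps(stream_bps: float, fec_ratio: float) -> float:
    """Extra bandwidth consumed by FEC; it does not depend on the loss rate."""
    return stream_bps * fec_ratio


def arq_overhead_bps(stream_bps: float, loss_rate: float) -> float:
    """Extra bandwidth consumed by ARQ retransmissions.

    Assumes each (re)transmission is lost independently with probability p,
    so a packet needs p + p^2 + ... = p / (1 - p) retransmissions on average.
    The small retransmission requests sent by the receiver are ignored.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in the range [0, 1)")
    return stream_bps * loss_rate / (1.0 - loss_rate)


if __name__ == "__main__":
    stream = 5_000_000  # hypothetical 5 Mbit/s video stream
    print("FEC at 20% overhead:", round(fec_overhead_bps(stream, 0.20)), "bit/s")
    print("ARQ at 1% loss:     ", round(arq_overhead_bps(stream, 0.01)), "bit/s")
```

Under these assumptions FEC costs the same extra bandwidth whether or not any packet is lost, while the ARQ cost shrinks toward zero as the link gets cleaner.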

ARQ works by establishing a two-way connection between the video source and the destination. Each outbound packet is given a unique sequence number that the receiver uses to determine whether all incoming packets have been received in the correct order. If packets are lost in the network, the receiver can build a list of the sequence numbers of the lost packets and automatically send a request to the sender for retransmission. This process can be repeated multiple times on networks with high error rates (at a specific time or at the time of the failure). ARQ requires a buffer at the sending location to temporarily store packets in case retransmission is needed, and a buffer at the receiving location, ahead of the video decoder or other receiver, to put the packets back into the correct order.
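The receiver side of this mechanism can be sketched in a few lines. This is a minimal illustration of the general ARQ idea, not SRT's actual implementation: the class and method names are hypothetical, sequence-number wraparound is ignored, and sending the retransmission request over the network is left to the caller.

```python
from __future__ import annotations


class ArqReceiver:
    """Tracks sequence numbers, reports gaps, and releases packets in order."""

    def __init__(self, first_seq: int = 0) -> None:
        self.next_expected = first_seq    # lowest sequence number not yet seen
        self.next_to_deliver = first_seq  # next packet owed to the decoder
        self.lost: set[int] = set()       # sequence numbers awaiting retransmission
        self.buffer: dict[int, bytes] = {}  # packets held until they are in order

    def on_packet(self, seq: int, payload: bytes) -> list[int]:
        """Store a packet; return the sequence numbers that must be re-requested."""
        newly_lost: list[int] = []
        if seq >= self.next_expected:
            # Everything between the expected number and this one was skipped.
            newly_lost = list(range(self.next_expected, seq))
            self.lost.update(newly_lost)
            self.next_expected = seq + 1
        else:
            # A late or retransmitted packet fills an earlier gap.
            self.lost.discard(seq)
        if seq >= self.next_to_deliver:
            self.buffer[seq] = payload
        return newly_lost

    def deliver(self) -> list[bytes]:
        """Release buffered packets in sequence order, stopping at the first gap."""
        out: list[bytes] = []
        while self.next_to_deliver in self.buffer:
            out.append(self.buffer.pop(self.next_to_deliver))
            self.next_to_deliver += 1
        return out


if __name__ == "__main__":
    rx = ArqReceiver()
    for seq in (0, 1, 3, 4):  # packet 2 is lost in transit
        missing = rx.on_packet(seq, f"pkt{seq}".encode())
        if missing:
            print("request retransmission of", missing)
    print("delivered so far:", rx.deliver())
    rx.on_packet(2, b"pkt2")  # the retransmission arrives
    print("delivered after retransmission:", rx.deliver())
```

The receive buffer is what allows packets 3 and 4 to wait until the retransmitted packet 2 arrives, so the decoder still sees the stream in its original order.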

SRT uses the ARQ mechanism mainly because it handles the most common type of error on the Internet: random packet loss. These errors are easily fixed by having the sender retransmit any packets that do not arrive at the receiver. If a packet containing a bit error arrives at the receiver, it is treated as a lost packet and the sender is asked to retransmit it. Another benefit is that SRT provides a high-resolution timestamp for each packet, so that the timing of the media stream can be accurately reproduced when it is output at the receiving end. This helps ensure that downstream devices can correctly decode video and audio signals. FEC, by contrast, is only suitable for systems that can support the extra bandwidth required for FEC data and that can tolerate the signal interruptions that may occur when the network error rate exceeds the threshold that FEC can correct.

Use the UDP packet format

Each packet sent during an SRT session uses the UDP (User Datagram Protocol) packet format, which provides low-overhead, low-latency packet delivery. Most real-time media transport networks designed for professional applications use UDP because it provides a stable, repeatable packet delivery system with consistent throughput.
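To show how little machinery a UDP datagram needs, the sketch below sends one datagram over the loopback interface with Python's standard socket module. It is plain UDP, not an SRT session: there is no handshake, sequence numbering, or retransmission. The function name is hypothetical, and the 1316-byte payload (seven 188-byte MPEG transport stream packets) is an illustrative choice.

```python
import socket


def send_datagram_loopback(payload: bytes) -> bytes:
    """Send one UDP datagram to a local socket and return what was received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        receiver.settimeout(2.0)         # UDP gives no delivery guarantee
        # No connection setup: the datagram is simply addressed and sent.
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    payload = bytes(1316)  # illustrative size: 7 x 188-byte TS packets
    received = send_datagram_loopback(payload)
    print("received", len(received), "bytes")
```

Because UDP itself neither retries nor reorders, any recovery of lost packets has to be added on top, which is the role ARQ plays in SRT.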

SRT does not use TCP (Transmission Control Protocol) because TCP requires all bytes of a stream to be delivered exactly in their original order. Although this sounds like a good way to send video, experience has shown that it is not. With video, a few missing bytes can be corrected or, in the worst case, ignored. With TCP, it is not possible to skip bad bytes; instead, the protocol keeps retrying the lost data for as long as it takes. This is the source of many frozen frames and of the “rebuffering” symbol, and in a congested network it can have a significant impact on the viewer. A third effect of TCP is subtle but important for video transmission: TCP automatically reduces its packet transmission rate when network congestion occurs. Although this behavior helps to reduce overall congestion in the network, it is unsuitable for video signals, because a video signal cannot be sent at less than its nominal bit rate.

Start with a handshake and capability exchange

SRT provides three different handshake modes that allow devices to contact each other and set up the data needed to send and receive packets, such as IP addresses. The first is caller mode, in which an SRT endpoint attempts to connect to a remote device at a known address and UDP port number. The second is listener mode, in which an SRT device continuously monitors a defined address and port number for incoming traffic, waiting for a connection from a caller device. The third is rendezvous mode, in which both endpoints act as caller and listener at the same time, making it easier to establish a connection through certain types of firewall.

Each handshake requires a two-way confirmation of the endpoint ID and password by using a secure cookie before proceeding. After the handshake process is complete, the caller and listener exchange their capabilities and configuration. Both ends need to know the overall delay between the two endpoints in order to set a buffer size large enough to cover the packet retransmission delay. The connection bandwidth can also be estimated and communicated so that the video can be compressed to fit the capacity of the network. Optionally, the sender and receiver can exchange encryption keys so that the video and audio content inside the IP packets is encrypted with 128-, 192-, or 256-bit AES, making the transmission more secure.
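The link between the measured delay and the buffer size can be sketched as a short calculation. This is an illustration under stated assumptions rather than SRT's actual negotiation logic: the function names are hypothetical, and the multiplier of 4 applied to the round-trip time is an assumed safety margin, not a value taken from this text.

```python
import math


def receive_latency_ms(rtt_ms: float, rtt_multiplier: float = 4.0) -> float:
    """Latency budget that leaves room for several request/retransmit round trips.

    rtt_multiplier is an assumed safety factor; lossier links would need more.
    """
    return rtt_ms * rtt_multiplier


def buffer_size_bytes(bitrate_bps: float, latency_ms: float) -> int:
    """Bytes of stream data that must be held to cover the latency budget."""
    return math.ceil(bitrate_bps * latency_ms / 1000.0 / 8.0)


if __name__ == "__main__":
    rtt = 30.0           # hypothetical measured round-trip time in milliseconds
    bitrate = 8_000_000  # hypothetical 8 Mbit/s stream
    latency = receive_latency_ms(rtt)
    print("latency budget:", latency, "ms")
    print("buffer size:", buffer_size_bytes(bitrate, latency), "bytes")
```

A longer round-trip time or a higher bit rate both increase the amount of data each end has to buffer, which is why the endpoints need to agree on the delay before streaming starts.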