Optimizing your video-enabled drone design

January 30, 2017

Dennis Barrett

Texas Instruments

Over the next decade, it is estimated that nearly $98 billion will be spent globally on aerial drones and other unmanned aircraft [1]. While the commercial applications for drone technology are forecast to drive a global market of more than $127 billion [2], consumer drones are also reaping the benefits of these investments. To better understand the safe operation of drones, let's dive into a key requirement for enabling safe flight: low-latency video transmission.

To operate outside of a user’s line of sight, any drone needs to feature an onboard camera with real-time transmission capabilities. When selecting the right system for the job, it is important to keep these factors in mind:

Low power consumption: Lower power increases flight times
Low latency: Lower latency enables faster reactions
Wireless link robustness: Robust connections increase accuracy and responsiveness
Range: Longer range extends distance of operation
Autonomy: Additional sensors enable safer flights

Of these considerations, low-latency video compression and transmission is of the utmost importance. Each of the following stages contributes to the latency of a drone's video compression and transmission system, and each is a place where developers can reduce it:

Video capture: A higher frame rate means lower capture times (Tcap). For example, a 30-fps camera takes about 33 ms to capture each frame of video. This number is reduced to about 16.7 ms for 60-fps video capture.
Compression or encoding: Compression techniques are used to reduce the data rate needed for transmitting video frames. The H.264 compression standard is a very common technique for recording and compressing video in drones. Compression is generally a compute-intensive task. The time required to encode (Tenc) depends on the choice of encoding engine and features used.
Transmission: Drones communicate with the ground station using a wireless communication mechanism such as Wi-Fi connectivity. The resulting transmission delay (Ttx) depends on the available data bandwidth. For example, if a 720p30 stream is encoded at 1 Mbps and the available bandwidth is 2 Mbps, each frame occupies about 33 kb on average, so the time taken to send one frame to the ground station is about 16.7 ms.
Network: Depending on the need, an aerial system may be connected to remote ground stations via a network. If this is the case, additional delays (Tnw) may result within the network.
Receive: If the ground station is also wirelessly connected to the network, then additional latency (Trx) similar to that of transmission is involved in the system.
Decompression or decoding: The compressed video stream needs to be decompressed at the receiving station. Like encoding, this decoding process is also compute-intensive, introducing a decoding delay (Tdec) to the system.
Display: Just like video capture, there will be display latency (Tdisp) depending on the refresh rate.

Note also that a drone communicating directly with its ground station does not need to rely on a network, so the only link delay is a single transmission delay (Ttx) (i.e., Tnw = 0 and Trx = 0).
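The per-stage budget above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the capture and transmission figures come from the 30-fps and 1-Mbps/2-Mbps examples in the list (computed exactly, they are 33.3 ms and 16.7 ms), and the encode, decode, and display values are placeholders rather than measurements.

```python
def capture_time_ms(fps):
    """Time to capture one full frame, in milliseconds."""
    return 1000.0 / fps


def transmit_time_ms(stream_bps, link_bps, fps):
    """Time to send one encoded frame over the wireless link, in milliseconds."""
    bits_per_frame = stream_bps / fps  # average size of one encoded frame
    return 1000.0 * bits_per_frame / link_bps


def total_latency_ms(t_cap, t_enc, t_tx, t_dec, t_disp, t_nw=0.0, t_rx=0.0):
    """Frame-based (serial) latency: each stage waits for the previous one.

    t_nw and t_rx default to 0, which models a drone talking directly
    to its ground station.
    """
    return t_cap + t_enc + t_tx + t_nw + t_rx + t_dec + t_disp


t_cap = capture_time_ms(30)                         # 30-fps camera
t_tx = transmit_time_ms(1_000_000, 2_000_000, 30)   # 1-Mbps stream, 2-Mbps link

# Placeholder values (assumptions, not data from the article):
# 20 ms each for encode and decode, and a 60-Hz display.
total = total_latency_ms(t_cap, t_enc=20.0, t_tx=t_tx, t_dec=20.0,
                         t_disp=capture_time_ms(60))
print(f"Tcap = {t_cap:.1f} ms, Ttx = {t_tx:.1f} ms, total = {total:.1f} ms")
```

Swapping in measured values for each stage gives a quick view of which stage dominates the budget.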

To better illustrate the total latency from capture to display during frame-by-frame operations, Figure 1 details a timeline of this process.

Figure 1: Video capture and display timeline.

A specific example of total latency may be found in Table 1.

Table 1: Example of total latency.

Outlined in Table 1 is a high-latency scenario for controlling drone operations. Here it takes 118.7 ms for the operator to see the collected video. If a drone is traveling at 15 meters per second, it will have moved 1.8 meters when the remote operator sees the need for a flight change; during this time the drone could crash.
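The distance figure follows directly from speed multiplied by latency. A minimal check, using the 15 m/s and 118.7 ms values quoted above (the function name is hypothetical):

```python
def distance_during_latency_m(speed_m_per_s, latency_ms):
    """Distance the drone covers before the operator sees the frame."""
    return speed_m_per_s * latency_ms / 1000.0


blind_distance = distance_during_latency_m(15.0, 118.7)
print(f"{blind_distance:.2f} m")  # about 1.78 m, i.e. roughly 1.8 m
```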

To help reduce this possibility, the H.264 standard introduces the concept of slices. A slice is a group of macroblocks (two-dimensional units of a video frame) that is encoded independently of the other slices in the frame, so each slice can be decoded separately without referencing another. While H.264 offers flexibility in how to arrange these slices, using the natural row order is most efficient for low-latency encoding.

When the number of slices in a frame (N) is greater than one, developers have the ability to reduce not only encoding time but also overall latency. In this situation, the system only has to wait for one slice, rather than the whole frame, to be captured before encoding begins, and each encoded slice automatically triggers its own transmission. The impact is that the capture, encode, transmission, receive, decode and display process is no longer serial but parallelized, introducing a theoretical reduction of delay by a factor of N at each step after capture. This makes the overall latency: T = Tcap + (Tenc + Ttx + Tnw + Trx + Tdec + Tdisp)/N (Figure 2).

Figure 2: Slice-based impact on processing timeline.

In theory, the effective time will be reduced by a factor of N between the encode and display processes. However, practically, the time may not always scale linearly with the number of slices, an effect caused by the overhead required to set up and process individual slices. Table 2 shows sample latency for slice-based encoding based on a rate of 30 slices per frame.
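The formula and the overhead caveat can be explored with a short sketch. The function name, the stage times, and the fixed per-slice setup cost are all assumptions made for illustration; they are not the values from Table 1 or Table 2.

```python
def slice_latency_ms(t_cap, downstream_ms, n_slices, overhead_ms_per_stage=0.0):
    """Latency with N slices per frame.

    Ideal case: T = Tcap + (Tenc + Ttx + Tnw + Trx + Tdec + Tdisp) / N.
    overhead_ms_per_stage is a hypothetical fixed setup cost that each
    downstream stage pays for every slice, which is why real systems
    do not scale perfectly with N.
    """
    if n_slices < 1:
        raise ValueError("n_slices must be at least 1")
    per_slice = [t / n_slices + overhead_ms_per_stage for t in downstream_ms]
    return t_cap + sum(per_slice)


# Hypothetical stage times in ms: encode, transmit, decode, display.
downstream = [20.0, 16.5, 20.0, 16.5]

frame_based = slice_latency_ms(33.0, downstream, n_slices=1)
ideal = slice_latency_ms(33.0, downstream, n_slices=30)
with_overhead = slice_latency_ms(33.0, downstream, n_slices=30,
                                 overhead_ms_per_stage=0.5)

print(f"frame-based: {frame_based:.1f} ms")
print(f"30 slices, ideal: {ideal:.1f} ms")
print(f"30 slices, 0.5 ms setup per stage: {with_overhead:.1f} ms")
```

In this model the capture time sets a floor on latency, and the per-slice setup cost is what keeps the remaining stages from shrinking all the way to 1/N of their frame-based values.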

Table 2: Slice-based latency example.

As this example shows, even when an effective encoding/decoding time is achieved, the latency is still half that of frame-based encoding (a single slice per frame). With this approach, the remote drone pilot is able to react at least three times faster.

One tradeoff to consider: While a higher number of slices will speed up the encoding and transmission process, it also reduces the compression ratio. More slices mean more bits per frame, which in turn increases the effective transmission time. The designer must optimize the end-to-end system; ultimately, it will be up to them to adjust this parameter accordingly.
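One way to reason about this tradeoff is to sweep candidate slice counts and discard those whose extra bits no longer fit the link. The sketch below assumes a simple model in which every slice adds a fixed number of overhead bits to the frame; that model, the function names, and all numeric values are hypothetical and should be replaced with measurements from the actual encoder and link.

```python
def evaluate_slices(n, base_frame_bits, overhead_bits_per_slice,
                    link_bps, t_cap_ms, other_ms):
    """Return (bits per frame, latency in ms) for n slices per frame."""
    frame_bits = base_frame_bits + n * overhead_bits_per_slice
    t_tx_ms = 1000.0 * frame_bits / link_bps
    # other_ms lumps encode, decode, and display time for a whole frame.
    latency_ms = t_cap_ms + (other_ms + t_tx_ms) / n
    return frame_bits, latency_ms


def best_slice_count(candidates, fps, link_bps, **model):
    """Lowest-latency slice count whose frame still fits in one frame period."""
    budget_bits = link_bps / fps
    feasible = []
    for n in candidates:
        bits, latency = evaluate_slices(n, link_bps=link_bps, **model)
        if bits <= budget_bits:
            feasible.append((latency, n))
    return min(feasible)[1] if feasible else None


choice = best_slice_count(
    candidates=[1, 5, 15, 30, 60],
    fps=30,
    link_bps=2_000_000,
    base_frame_bits=1_000_000 / 30,   # 1-Mbps stream at 30 fps
    overhead_bits_per_slice=1000,     # assumed cost of each extra slice
    t_cap_ms=33.0,
    other_ms=50.0,                    # assumed encode + decode + display
)
print(choice)
```

With these assumed numbers, 60 slices per frame would exceed what the 2-Mbps link can carry in one frame period, so the sweep settles on 30.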

Digital media processors leverage integrated hardware engines along with frame-to-memory ISPs designed for the low-latency encoding and decoding of videos, using multiple slices per frame. Figure 3 shows a drone’s digital media processor in a low-latency video-encoding Wi-Fi system.

Figure 3: The TMS320DM368 digital media processor in a low-latency video-encoding Wi-Fi system for use in drones.

Wi-Fi and Bluetooth combo-connectivity devices are equipped with advanced features required for drones, such as antenna diversity, maximal ratio combining, dual-band support (2.4- and 5-GHz bands), rate management, and optimized data path.

If onboard monitoring of the aerial system is desired, the UART interface can be used to exchange control data with the drone's central control unit, which enables autonomous collision avoidance.

Drones represent an exciting technology platform for engineers, one constrained by design variables such as size, weight, power, and cost (SWaP-C). Controlled flight, a challenge unchanged since the time of the Wright brothers, requires low-latency video processing, whether the video is delivered over a wireless connection to an operator or, ultimately, used for fully autonomous operation. By employing slice-based processing of the full video frame and streaming multiple channels of compressed video, designers will be able to provide flexible, ultra-low-latency video delivery for drone flight.

References

1. Marcelo Balve, Business Insider, 13 Oct 2014 – Commercial Drones: Assessing the Potential for a New Drone-Powered Economy – www.businessinsider.com/the-market-forcommercial-drones-2014-2

2. PwC May 2016 – Clarity from above – PwC global report on the commercial applications of drone technology – www.pwc.pl/clarityfromabove

Dennis Barrett is a Product Marketing Engineer, Video and Vision Analytics, at Texas Instruments. He has worked on DSP, processor, and controller solutions for 30 years.

Texas Instruments www.ti.com