This article first appeared in the September 1995 issue of Communications Systems Design
MPEG video compression has been the subject of much media hype in the computer press during the past two years. Even though PC use of MPEG has lagged, we believe that the continuing convergence of computers with TV appliances will fuel the adoption of MPEG in the computer segment. While the computer industry has sung the praises of multimedia and MPEG, broadcast entrepreneurs have been methodically installing an infrastructure that now regularly delivers video via MPEG technology. Consumers in many markets can directly receive MPEG encoded video transmissions from providers like DirecTV. The direct satellite broadcast approach has resulted in one of the most rapid diffusions of technology into the consumer marketplace ever. In the first year of commercial availability, DirecTV sold over 350,000 units. That's over ten times the penetration rate of the VHS VCR. In addition, some broadcasters, premium channels, and cable companies use MPEG encoded streams as a part of their transmission network.
MPEG (Moving Picture Experts Group) was an outgrowth of earlier standards work for digital compression of still pictures. The international MPEG committee started in 1988 with the goal of standardizing video and audio for compact discs. The compression work for MPEG 1 was based upon film or other progressive sources. By 1990 the MPEG committee had created a data structure syntax for Source Input Format (SIF) video and compact disc audio using a combined data rate of 1.5 Mbit/sec. This system approximated the perceptual quality of VHS consumer video tape, making MPEG video compression a visually acceptable technology. Although this MPEG standard was viable for progressive sources like film, it lacked the techniques to compress standard broadcast interlaced video efficiently.
In 1992 over 200 companies from around the world were involved in the MPEG draft development, demonstrating strong support for the technology specification. Today, MPEG 2 syntax has been adopted for the United States Grand Alliance HDTV specification, the European Digital Video Broadcasting Group, and for the high density compact disc. MPEG 2 and MPEG 2 "near compliant" streams are also the backbone of commercially operational Direct Broadcast Satellite (DBS) systems like DirecTV. Since the standard was not finalized before the DBS systems had to begin their infrastructure development, there is a lag in making the video stream of commercial systems fully MPEG 2 compliant. This marketing choice has some implications for set top decoders that we'll discuss later. MPEG encoded video offers broadcasters the ability to transmit more programs on a given transmission channel. This means more channels at a lower infrastructure cost. In addition, digitally encoded streams eliminate drift, remove certain kinds of analog distortion, and reduce system wide maintenance.
If MPEG is so widely embraced by diverse groups and has so many benefits, then why isn't it more pervasive in personal computer systems? We categorize issues into three groups: cost to the consumer, title availability, and misunderstandings about MPEG capability.
The second PC alternative is based on what has been fairly expensive hardware decoders. MPEG decoder integrated chips remained at a high price due to the relative lack of competition. Two years ago Sigma Designs provided the sole PC based MPEG hardware decoder. Today there are about a dozen suppliers of add-in MPEG decoder boards. Now that most of the major semiconductor firms have produced or announced MPEG decoder devices, we expect prices to decline. We believe that the cost to the consumer for a hardware accelerated MPEG solution will likely fall below $50 within the next three years based on a scalable hardware/software decoder architecture that can be embedded either on the motherboard or in a high performance video card.
For computer video, consumers buy applications to satisfy a number of wants and needs. But, the technology is never the application. For motion video, applications include the control software to create a game, educational program, video player, or other video related product. Title availability had been held back by a lack of industry cooperation on the Application Programming Interface (API). Since no standards existed for the API in 1993, Sigma Designs created their own interface to play Philips CD-i discs and CDs containing proprietary extensions for interactive titles. Concern over Sigma Designs' ownership of the API and the long term potential for competitive conflicts of interest led to the formation of the Open PC MPEG Consortium (OM1). The group of 90 hardware, software, and content suppliers created a Windows MPEG 1 MCI Command Set that is very similar to Sigma's but eliminates the proprietary features. The existence of an open industry standard for MPEG 1 applications makes developing titles a less risky proposition, and when combined with direct OS API support will encourage title providers to embrace the technology.
Applications also require the availability of appropriate source materials. Since the broadcast industry is using MPEG, it is reasonable to expect that the library of source material will expand both for PC based interactive titles and pure playback products. Unlike many other technologies, MPEG has multiple market segments that will use the underlying technology. If for no other reason than access to the converted feature length films produced for other markets, personal computer video will include MPEG decode as an option. The limitation of source material may become an ongoing issue for PCs based on the broadcast industry's choice of MPEG 2 as compared to the current PC choice of MPEG 1. MPEG 1 is not a perfect subset of MPEG 2 even though the two standards use many of the same compression approaches. In general, MPEG 1 streams created by a full featured encoder will not be compatible with MPEG 2.
MPEG currently consists of two operating specifications, MPEG 1 and MPEG 2. MPEG 1 was developed for progressive source materials like film, while MPEG 2 was enhanced to address the interlaced materials common in broadcast TV. Both standards include video, audio, and systems components such as time stamping for synchronization. MPEG 1 defines a bit stream syntax for compressed audio and video optimized to not exceed a bandwidth of 1.5 Mbit/sec. The bandwidth restrictions fit the capabilities of single speed uncompressed CD ROM and Digital Audio Tape. Many people have taken the bandwidth design goal as a fundamental limit to MPEG 1 capability, but that is not the case. MPEG 1 defines the ability to process fields up to 4095 x 4095 and bit rates of 100 Mbit/sec. As a practical systems tradeoff, many suppliers produce systems capable of much lower levels of resolution. Table 1 details some common MPEG digital image resolutions and their existing counterparts.
Resolution   Existing counterpart
----------   ---------------------
180x120      QSIF (Quarter SIF), video clips (PC quasi standard image size)
352x240      SIF, CD Whitebook Movies, video games
352x480      HHR, VHS equivalent
480x480      Bandlimited 4.2 MHz broadcast NTSC
544x480      Laser disc, Bandlimited PAL/SECAM
640x480      Square pixel NTSC
720x480      CCIR 601, Studio D-1, upper limit of Main level
Table 1
The syntax of MPEG 1 and 2 provides efficient ways to represent image sequences in compact coded data form. Figure 1 depicts a taxonomy of image sources from progressive, such as movie film, to interlaced, such as television. In addition to the structure of the bit stream, the MPEG specification defines the reconstruction process. Algorithms that make up the decoding process are determined in large part by the semantics of the MPEG bit stream definition. The semantics of MPEG can be used to exploit video characteristics such as spatial redundancy, uniform motion, spatial masking, and temporal redundancy. Each of these characteristics can be the basis for image compression and data reduction. While the precise mechanism for decoding is not rigorously defined, the results are. The specification includes the maximum permitted error in the reconstructed image. Some of these video characteristics and their exploitation are detailed in Table 2.
Condition                                     Coding technique
--------------------------------------------  ---------------------------------------------
Spatial correlation                           transform coding with 8x8 DCT (Discrete
                                              Cosine Transform)
Visual response of eye                        lossy scalar quantization of DCT
                                              coefficients exploits reduced visual acuity
                                              at higher spatial frequencies
Wide area correlation                         prediction of DC coefficients in the 8x8 DCT
                                              block
Spatial masking                               macroblock quantization scale factor
Content dependent coding                      macroblock quantization scale factor
Bit stream token encoding                     variable length coding of macroblock address
                                              increment, macroblock type, coded block
                                              pattern, error magnitude of motion vector
                                              prediction and DC coefficient error
Sparse matrix of DCT coefs                    end of block token
Local picture characteristics                 adaptive quantization, block based coding,
                                              macroblock type
Constant step sizes in adaptive quantization  special macroblock type codes provide new
                                              quantization
Temporal redundancy                           motion vectors and forward/backward
                                              macroblock use 16x16 granularity
Smooth optical flow areas                     prediction of motion vectors
Occlusion                                     forwards or backwards temporal prediction in
                                              B pictures
Non-integer PEL boundaries                    half-pel interpolation
Limited motion in P pictures                  skipped macroblocks
Co-planar motion in P pictures                skipped macroblocks
Table 2
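The first rows of Table 2 can be illustrated with a small sketch. The Python below is not a conforming MPEG encoder: the uniform quantizer step of 16, the test ramp, and the function names are assumptions chosen for illustration (MPEG itself uses weighting matrices together with the macroblock quantization scale factor). It only shows that a spatially correlated 8x8 block transforms into a few significant DCT coefficients, which is what makes the end of block token effective.

```python
import math

N = 8  # MPEG transform coding works on 8x8 blocks

def dct_2d(block):
    """Forward 8x8 DCT (type II, orthonormal scaling), written out directly."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency
        for v in range(N):      # horizontal frequency
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def quantize(coefs, step):
    """Uniform scalar quantizer (an assumption; MPEG weights each frequency)."""
    return [[int(round(value / step)) for value in row] for row in coefs]

# Hypothetical test block: a smooth horizontal ramp, identical on every line.
block = [[16 * x for x in range(N)] for _ in range(N)]
levels = quantize(dct_2d(block), step=16)
nonzero = sum(1 for row in levels for value in row if value != 0)

for row in levels:
    print(row)
print(f"{nonzero} of {N * N} quantized coefficients are nonzero")
```

Because every line of the ramp is identical, all of the energy lands in the first row of coefficients; the remaining rows quantize to zero and would be covered by a single end of block token.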
While much attention has been placed on the decoding process by articles in the popular press, we wondered why there was so much variation in the quality of the images that we viewed. Since the decoders are defined to produce specific outputs, we were able to test several decoder implementations to verify that they were within specification. Still, we noticed artifacts and other annoying features appearing in the playback of some MPEG streams. We tried adjusting various parameters of the decoders, but the problems remained. We then turned our attention to the encoding process. Our experiments in encoding show that the source of video data and the quality of the encoding process have a very strong impact on the quality of the decoded image. MPEG encoding appears simple, but has many areas where preprocessing and compression choices will have a substantial impact both on compression efficiency and playback quality. Conceptually, MPEG encoding is a seven step process:
In our discussions with various companies, we discovered how the very high compression numbers were generated. Companies wrongly assume that quoting a higher compression ratio is preferable. The fact is that bit streams either conform to the MPEG standard or they don't. And, the compressed stream either meets the delivery channels' bandwidth limits or it doesn't. With the goal of quoting the highest possible compression numbers, people begin with the most popular studio signal known as D-1 or CCIR 601 digital video. This signal is coded at 270 Mbit/sec. We derive the 270 Mbit/sec using the following:
luminance  858 samples/line * 525 lines/frame * 30 frames/sec * 10 bits/sample ~= 135 Mbit/sec
R-Y        429 samples/line * 525 lines/frame * 30 frames/sec * 10 bits/sample ~= 68 Mbit/sec
B-Y        429 samples/line * 525 lines/frame * 30 frames/sec * 10 bits/sample ~= 68 Mbit/sec
Total      27 Msamples/sec * 10 bits/sample = 270 Mbit/sec
Using this simplistic approach to defining compression, we come up with an amazing 235:1 by dividing 270 Mbit/sec by the defined CD ROM data rate of 1.15 Mbit/sec.
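The arithmetic behind the headline figure is easy to reproduce. The following Python sketch simply restates the numbers quoted above; the function and constant names are ours, and 30 frames/sec is the article's rounding of the NTSC frame rate.

```python
# Raw CCIR 601 (D-1) bit rate, counting every sample including blanking.
LINES_PER_FRAME = 525
FRAMES_PER_SEC = 30
BITS_PER_SAMPLE = 10
CD_VIDEO_RATE_MBIT = 1.15  # CD ROM data rate used in the text, in Mbit/sec

def component_rate_mbit(samples_per_line):
    """Bit rate of one video component in Mbit/sec."""
    bits_per_sec = (samples_per_line * LINES_PER_FRAME
                    * FRAMES_PER_SEC * BITS_PER_SAMPLE)
    return bits_per_sec / 1e6

luminance = component_rate_mbit(858)
r_y = component_rate_mbit(429)
b_y = component_rate_mbit(429)
total = luminance + r_y + b_y
naive_ratio = total / CD_VIDEO_RATE_MBIT

print(f"luminance {luminance:.1f}, R-Y {r_y:.1f}, B-Y {b_y:.1f} Mbit/sec")
print(f"total {total:.1f} Mbit/sec, naive ratio {naive_ratio:.0f}:1")
```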
Let's look at this calculation a little closer. Television broadcast contains active information during the non-blanking intervals. So, only 720 out of the 858 luminance samples per line contain information. Actually, the number of 720 is in debate among TV engineers, with a consensus that the actual number is somewhere between 704 and 720. In a similar manner, there are only 480 lines that contain picture information, with some debate about an upper limit of 496. For MPEG 1 and MPEG 2 conformance points, Constrained Parameters Bitstreams and Main Level respectively, the numbers are chosen to be 704 samples * 480 lines for luminance and 352 * 480 lines for each of the two chrominance images. Now we compute a new compression rate:
luminance    704 samples/line * 480 lines * 30 fps * 10 bits/sample ~= 101.4 Mbit/sec
chrominance  2 channels * 352 samples/line * 480 lines * 30 fps * 10 bits/sample ~= 101.4 Mbit/sec
Total        ~202.8 Mbit/sec for a compression of 176:1 (202.8/1.15).
All of our calculations are based on studio quality equipment that uses 10 bit samples. MPEG defines 8 bit samples, so the actual compression ratio is 176 * (8/10) ~= 141:1. The additional 2 bits of quantization in the studio equipment are used to suppress noise in multi-generation video. In tests of MPEG alternative quantization levels, the additional 2 bits did not provide discernible improvement in the video for single generation video.
So far we've considered the obvious source representation issues. Now there are a number of important but easily hidden signal qualities to consider.
The studio standard CCIR 601 represents the chroma signals with half as many horizontal samples as the luminance signal. At the same time, it employs full vertical resolution. This ratio of subsampled components is designated 4:2:2. MPEG 1 and MPEG 2 both define the use of 4:2:0 for consumer applications. In this case both chrominance signals have half the resolution of the luminance signal in both the horizontal and vertical directions. By reducing the resolution in the vertical direction, we now have a chrominance frame of 352 x 240. This gives an average of 1.5 samples per pixel, 1 for Y, 0.25 for Cr, and 0.25 for Cb. Now we recompute the compression as:
704 pixels * 480 lines * 30 fps * 8 bits/sample * 1.5 samples/pixel ~= 122 Mbit/sec yielding a compression ratio of 106:1 (122/1.15).
The next question to address is the basic frame size. CCIR 601 is converted to a SIF image by subsampling 2:1 in both the horizontal and vertical directions. Overall, this results in a 4:1 drop in compression. Subsampling is an important step in pre-processing the source since it will have an impact on the quality of the compressed image. Computationally simple reduction is done by simply discarding every other line or sample. Higher quality compressed images can be developed from pre-processed images that are the result of FIR filtering or other decimation techniques. Improper decimation will guarantee the presence of image artifacts. Regardless of the quality of the subsampling, the compression ratio gets reduced one more time:
352 pixels * 240 lines * 30 fps * 8 bits/sample * 1.5 samples/pixel ~= 30 Mbit/sec for a ratio of 26:1.
This reduced ratio is more in line with our experience and suggests that MPEG is still a very aggressive compression scheme, but not at the more than 200:1 level often referenced. While the reduced ratio applies to broadcast images in North America, many source images are converted from film at a frame rate of 24 frames per second. This last factor can reduce the compression factor to 21:1 in many cases.
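The whole chain of corrections can be recomputed in a few lines. The Python below is a sketch under stated assumptions: it uses the 704 x 480 conformance size at every full resolution step (with 720 samples per line the early figures come out a few percent higher), it divides by the 1.15 Mbit/sec CD ROM rate throughout, and the step labels and function name are ours.

```python
CD_VIDEO_RATE_MBIT = 1.15  # delivery rate used throughout the text

def source_rate_mbit(width, height, fps, bits, samples_per_pixel):
    """Uncompressed source rate in Mbit/sec."""
    return width * height * fps * bits * samples_per_pixel / 1e6

# (label, width, height, frames/sec, bits/sample, average samples/pixel)
steps = [
    ("active picture, 10 bit, 4:2:2", 704, 480, 30, 10, 2.0),
    ("8 bit samples, 4:2:2",          704, 480, 30, 8, 2.0),
    ("8 bit samples, 4:2:0",          704, 480, 30, 8, 1.5),
    ("SIF, 4:2:0, 30 fps",            352, 240, 30, 8, 1.5),
    ("SIF, 4:2:0, 24 fps film",       352, 240, 24, 8, 1.5),
]

ratios = {}
for label, width, height, fps, bits, spp in steps:
    rate = source_rate_mbit(width, height, fps, bits, spp)
    ratios[label] = rate / CD_VIDEO_RATE_MBIT
    print(f"{label:32s} {rate:7.1f} Mbit/sec  {ratios[label]:5.0f}:1")
```

The last two rows reproduce the 26:1 and 21:1 figures quoted above for SIF at 30 and 24 frames per second.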
There is actually quite good news in the reduced compression ratio. Film sources have 20% fewer frames per second to code than broadcast sources, which leaves about 25% more bits for each frame at the same bit rate. This means that film sources can have better quality reconstructed images, which meets with consumer expectations.
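The earlier point about pre-processing, that discarding samples is cheaper than filtering but invites artifacts, can be seen on a single line of video. This Python sketch is illustrative only: the three tap (1/4, 1/2, 1/4) filter and the test pattern are assumptions, not the filters used in any particular encoder.

```python
def decimate_drop(samples):
    """2:1 decimation by discarding every other sample, with no filtering."""
    return samples[::2]

def decimate_fir(samples, taps=(0.25, 0.5, 0.25)):
    """2:1 decimation after a short low-pass FIR filter (edge samples repeated)."""
    half = len(taps) // 2
    padded = [samples[0]] * half + list(samples) + [samples[-1]] * half
    filtered = [
        sum(tap * padded[i + k] for k, tap in enumerate(taps))
        for i in range(len(samples))
    ]
    return filtered[::2]

# Hypothetical worst case: fine detail alternating black and white every sample.
line = [0, 255] * 8

print(decimate_drop(line))  # all zeros: the detail aliases to solid black
print(decimate_fir(line))   # mid gray (127.5) after the first, edge-affected sample
```

Dropping samples turns the fine pattern into a solid black line, a false image that the encoder will then faithfully compress. The filtered version loses the detail too, but replaces it with the average gray a viewer would expect.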
The luminance response curve of VHS places -3dB near 2 MHz. This VHS bandlimit is equivalent to 200 samples per line. VHS chroma is equivalent to about 80 samples per line. If we consider only the sampling density, MPEG is superior to all VHS parameters except vertical resolution, where VHS wins 480:240. When other real world analog factors like interfield crosstalk and the TV monitor's Kell factor are taken into account, the difference becomes less than 2:1, and may in fact be equal in many real world systems. In addition, timing errors and other tape related issues further reduce the real resolution. Regardless of the offsetting factors, a nominal VHS tape and monitor will outperform MPEG SIF for high complexity sources. For "normal" viewing, MPEG 1 SIF and VHS are very close in resolution, but MPEG will always suffer from the loss of detail for high spatial frequency materials. In particular, the loss of detail is readily discernible for images that contain text.
Progressive source broadcast NTSC quality can be approximated with a bitstream of about 3 Mbit/sec. PAL requires a higher rate of about 4 Mbit/sec. High spatial complexity sources like sports may require a higher bit rate of 5 to 6 Mbit/sec. Material that is broadcast from a 30 fps source will in addition require a proportionately higher bit rate to achieve similar results.
Laserdisc bandlimited signals are often defined by manufacturers to be capable of 425 lines and 567 samples per line. An equivalent digital representation can be approximated by a 567 * 480 * 30 fps system. Regardless of the superior theoretical representation, well encoded progressive sources with medium detail can achieve Laserdisc and SVHS clarity. As with the VHS analysis, higher MPEG bit rates are required to match Laserdisc capabilities.
From a purely theoretical point of view, MPEG can never achieve the same quality as an uncompressed signal. However, for a very large portion of the consumer segment, MPEG at SIF size and 1.15 Mbit/sec can be nearly impossible to discern from an analog VHS image. Likewise, with appropriate selections for image size and compressed bit rate, MPEG can approximate SVHS, broadcast NTSC, and Laserdisc. From our evaluation, MPEG 1 SIF format does not meet SVHS or Laserdisc quality. Moreover, MPEG 2 does nothing to improve the fundamental underlying issues that affect image quality.
The fact that multiple channels are broadcast using the same transponder allows yet another level of compression tradeoff to be made. Unlike a single source MPEG compression, a broadcast service can adapt the rate control for several image and audio sources simultaneously. Adaptive rate control spanning multiple programs can give the highest bandwidth to complex sources while reducing the bandwidth to another source, but requires a more complex uplink control facility. This type of statistical multiplexing will sometimes lead to blurs or artifacts when the combined ideal bandwidth of the programs on a single transponder is too high. In these cases, one or more programs will have their bit rates dropped to meet the real bandwidth of the transponder. By careful program scheduling, broadcasters can minimize the impact of the fixed transponder bandwidth on the broadcast quality. As a complicating factor, adaptive rate control of multiple sources requires real time encoding for some of the programs. Thus, compression and quality will not be as high as possible with non-real time compression. In addition, broadcasters also encrypt the signal at the uplink facility to reduce piracy of programs. Regardless of these complications, the real world results shown by DBS providers are excellent, and closely match what we've been able to produce in the laboratory from the viewer's perception.
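A toy model makes the tradeoff concrete. The Python below is a deliberately simplified sketch, not a description of how any DBS uplink actually works: the proportional scaling rule, the program names, the ideal rates, and the 10 Mbit/sec capacity are all assumptions chosen for illustration.

```python
def allocate_bit_rates(ideal_rates, transponder_capacity):
    """Share a fixed transponder capacity among several programs.

    Each program gets its ideal rate when everything fits. Otherwise every
    program is scaled down by the same factor so the total meets the capacity
    (an assumed policy; a real multiplexer may favor some programs).
    """
    total = sum(ideal_rates.values())
    if total <= transponder_capacity:
        return dict(ideal_rates)
    scale = transponder_capacity / total
    return {name: rate * scale for name, rate in ideal_rates.items()}

# Hypothetical ideal rates in Mbit/sec for one instant of programming.
ideal = {"film": 3.0, "news": 4.0, "sports": 6.0}
granted = allocate_bit_rates(ideal, transponder_capacity=10.0)

for name, rate in granted.items():
    print(f"{name:7s} wants {ideal[name]:.1f}, gets {rate:.2f} Mbit/sec")
```

In this example the three programs ask for 13 Mbit/sec in total, so each receives about 77% of its ideal rate. Scheduling a low complexity program alongside the sports feed would let all of them run at full rate, which is the effect of careful program scheduling described above.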
MPEG requires that satellite transponders have digital capabilities. Hughes' new generation of satellites for digital communications is the most powerful ever launched. The HS601 satellites for DBS operate in the BSS portion of the Ku band (12.2-12.7 GHz). The DirecTV satellites have 16 transponders, each capable of 120 watts. This higher output is needed to reach the 18 inch dish antennas installed at consumer sites. Radiated power from the transponders is between 48 and 53 dBW depending on the configuration of the transponders. While the requirements of satellites may seem simple, the actual command and control for these sophisticated systems is quite involved and requires a significant ground based infrastructure apart from the MPEG services.
Set top requirements have spawned much debate, but the reality of commercial DBS services has established at least the first few series of set top products. Set top decoders perform several functions in addition to the pure MPEG decode. These additional functions include decryption and pay per view billing. Commercial premium programs require some sort of encryption in order to prohibit signal piracy. Digital encryption is a stronger method than many of the existing quasi analog techniques, making the entire delivery system more secure. Set top units for DirecTV include a low speed modem capability for communicating information such as billing details to the DBS provider. In this approach to pay per view, the consumer chooses items to view and is billed monthly. This contrasts with current pay per view systems that require the consumer to make a phone call to activate the reception of the program. Marketing experts contend that eliminating the phone call will increase orders for premium programs. The set top box makes a toll free phone call once per month to download the pay per view information.
Architecturally, set top boxes are based on processors commonly found in computer workstations. In addition, these set top units are reprogrammable via the satellite broadcast. This decision was made for some very basic reasons. At the time that DBS originated in 1994, the only standard for MPEG was MPEG 1. Suppliers who participated in the MPEG standards committee anticipated MPEG 2 and in fact adopted some of the MPEG 2 capabilities before the standard was approved. The decision to enter the market before MPEG 2 was ratified meant that either a pseudo standard would be used for many years, or the set top units needed to be reprogrammable. Reprogrammability allows the broadcaster to download new code into the set top automatically. This permits bug fixes, standards upgrades, and inclusion of new services. Reprogrammability also means a higher unit cost as compared to non-reprogrammable approaches. As a rule of thumb, RAM is about four times as area intensive as ROM. This translates into at least a four-fold increase in program memory cost. On the other hand, memory requirements for MPEG decode are high. The set top must be able to store several frames of image in order to complete the decode process. As DBS based MPEG gains in maturity, we expect fixed function devices to replace the generally reprogrammable processor. When combined with ROM coded control programs, there is a lot of potential for cost reduction in the set top unit.
Distribution channels for MPEG encoded programs are varied and pose different tradeoffs. When developing an image compression system based on MPEG you will need to consider the source of your images: progressive or interlaced, the amount of time and budget that you will have for image encoding, the nature of the scenes to be encoded, your end customers' familiarity with studio quality video, the allowable point of decode cost, and any interactivity. Finally, you will need to examine the bandwidth capabilities of your delivery vehicle. Single speed CD drives will limit bandwidth to at most 1.5 Mbit/sec if used solely for video, or 1.15 Mbit/sec for a combined stream. Satellite transmission, fiber optics, and high speed networks will provide you with capacity to meet SVHS and higher quality requirements. MPEG provides you with a syntax to implement high performance compressed video systems. This toolkit approach permits many alternatives; you don't have to live with the solutions described in the popular press. The solution that you choose will be in large part determined by where your product line fits in the convergence of TVs and PCs.