COMPRESSION TECHNIQUES
VIDEO SYSTEM
The video system is concerned with capturing, compressing, and transmitting video on one thread, and with receiving, decompressing, and displaying it on the other. Because of the system's target bit rate, image compression is the main focus of the video system. To achieve a good compression ratio, image redundancy must be removed.
Digital video :
Digital video is well established in the broadcast industry, and with the emergence of powerful compression techniques it is now appearing in consumer products. However, the digital video used in television studios is quite different from what one might find in a personal computer. This becomes obvious when asking what essential characteristics the various forms of digital video share.
In analog television broadcast, analog video is a sequence of frames encoded as a continuous electrical signal, occurring at some constant rate, the frame rate. With digital video, we have a sequence of frames which are now digital images, perhaps compressed in some manner. The key concept that must be associated with digital video, however, is the notion of timing.
Digital video is not just any sequence of images, but a sequence with accompanying information indicating how long each frame is to be displayed (whereas in analog video, frame duration is constant). Frame duration in digital video is not always constant; many digital video sequences have varying frame rates.
Digital video representation:
Although there are many different formats for digital video, these formats can be compared along certain dimensions. These dimensions include:
a-Color model:
Digital color images require some form of color encoding. A procedure for specifying colors is called a color model (or color space).
i) RGB:
Colors are represented by a numeric triple specifying the red (R), green (G), and blue (B) intensities. These numeric values can be easily mapped to voltages for the red, green, and blue guns of color CRTs.
ii) HSB:
Colors are represented by a triple of hue (H) (the dominant color of a sample), saturation (S) (the intensity of the color), and brightness (B) (the amount of gray in the color; brighter colors have less gray).
CHAPTER 16
iii) YUV:
A color model used within the television industry. Y represents luminance and can be thought of as the black-and-white portion of a video signal. U and V are color difference signals (they involve the differences between B and Y, and between R and Y, respectively). U and V represent the color portion of a video signal and are called chrominance, or simply chroma.
YUV is well suited to video broadcast since it makes efficient use of bandwidth. The human eye is more sensitive to changes in luminance than to changes in color, so the available bandwidth is better utilized by allocating less of it to chroma and more to luminance.
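As a hedged sketch of this color model, the conversion from normalized RGB to YUV can be written with the standard BT.601 luminance weights (the U and V scale factors shown are the classical analog-video values):

```python
def rgb_to_yuv(r, g, b):
    """Convert normalized RGB (components in 0..1) to YUV.

    Uses the BT.601 luminance weights; U and V are scaled
    color-difference (chroma) signals."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (black-and-white portion)
    u = 0.492 * (b - y)                     # blue-difference chroma
    v = 0.877 * (r - y)                     # red-difference chroma
    return y, u, v
```

Note that any gray input (r = g = b) yields zero chroma, which is exactly why less bandwidth can be spent on U and V.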
b-Sampled analog formats:
Digital video frames are generally obtained in two ways: by synthesis, usually by computer program, or by sampling an analog video signal. The sampling rate is crucial in specifying a digital format for sampled analog video, since its value determines storage requirements and data transfer rates. Sampling theory gives a lower limit on the frequency at which to sample in order to reconstruct a signal properly. This limit, the Nyquist rate, is twice the highest frequency within the signal. Sampling below the Nyquist rate introduces frequency interference that leads to distortion in the reconstructed image.
c-Sample size & quantization:
Sample size is the number of bits used to represent sample values, while quantization refers to the mapping from the continuous range of the analog signal to discrete sample values.
Generally, digital video representations can be divided into two categories: high data rate formats and low data rate formats. High data rate formats have little or no compression; picture quality and ease of processing are of primary importance. Low data rate formats are intended for real-time applications like video phones, video conferencing, and television broadcasting, and it is here that compression of the transmitted video signal is truly needed.
Image compression, a global view:
Compression is the key to low data rate digital video, and there has been much recent work on compression techniques and algorithms. Methods for video compression can be compared along the following lines:
1) Lossy versus lossless compression:
In a lossy technique, the reconstructed image does not match the original exactly but differs slightly. An example of a lossy technique is any transform technique in which an image is represented by the coefficients of a mathematical transform, with some of those coefficients discarded; this loses some information from the original image, but yields a high compression ratio.
2) Real-time compression:
Real-time compression and decompression techniques are those employed in real-time applications like video conferencing. If digital video is stored in compressed form, then only real-time decompression is needed for playback, as in the case of CD drives.
3) Interframe versus intraframe compression:
Interframe compression exploits the fact that successive video frames tend to be similar: it removes this temporal redundancy by simply differencing successive frames and, moreover, predicts and interpolates intermediate frames from independently coded key frames (a key frame is a frame that differs significantly from the others). In intraframe compression there is no prediction or interpolation, and all frames are coded independently.
Current approaches to compression techniques:
Image compression addresses the problem of reducing the amount of data required to represent a digital image. The main basis of the reduction process is the removal of redundant data. Interest in image compression dates back more than 25 years; the initial focus of research in this field was on the development of analog methods for reducing video transmission bandwidth, a process called bandwidth compression.
Over the years the need for image compression has grown steadily. It is currently recognized as an "enabling technology": video compression makes possible many applications, from faxes to the control of remotely piloted vehicles in military and other large-scale applications.
As we have said, compressing video data efficiently is one of the most important functions in multimedia applications. Most existing video coding techniques, such as H.261 and MPEG, have been designed to compress video data using the discrete cosine transform (DCT), differential pulse code modulation (DPCM), and motion compensation (MC). One of the major problems of MC/DCT-based techniques is their high computational complexity, which inhibits real-time, software-only implementation. As a result, the cost of such video applications is high, and their implementation and maintenance are difficult.
Recently, realizing the video coder in software, to reduce both the cost and complexity of multimedia systems, has been widely discussed. The popular video coder (PVC) was proposed to achieve this goal; it was applied to implement video phone systems and showed good performance. However, the coding speed and compression ratio of the PVC are not high enough to apply it to higher resolution and/or higher speed video applications, such as video conferencing systems. This low compression ratio is the motivation for using another technique, the popular video coder 2 (PVC2).
The PVC2 improves the performance of the Huffman coder of the PVC and applies an adaptive resolution reduction technique for intraframe coding. With the aid of these new techniques, the PVC2 compresses video data more efficiently than the PVC does. More interestingly, the coding speed of the PVC2 is much faster than that of other well-known software video coders, such as Video for Windows and the MPEG coder.
1) Moving Picture Experts Group (MPEG):
The Moving Picture Experts Group (MPEG) is an ISO working group that has developed a video compression standard based on interframe coding and Discrete Cosine Transform (DCT) techniques. The standard also addresses audio encoding. The initial standard for video, known as MPEG-1, leads to data rates of about 1 Mbit/s. A more recent proposal, MPEG-2, is intended for broadcast-quality video at data rates between 2 and 15 Mbit/s. One problem with interframe coding is that reverse playback and random access are difficult, since every frame depends on another.
2) Joint Photographic Experts Group (JPEG):
The Joint Photographic Experts Group (JPEG) standard is a compression technique intended for still images. It can be used for digital video compression by treating each frame as an image to be compressed independently (this use of the technique is called Motion JPEG).
For a given frame rate and resolution, JPEG-compressed video will have a higher data rate than interframe techniques such as MPEG-1 and MPEG-2; but since frames are compressed independently, it is an easier matter to rearrange the order of frames and to play back in reverse or at variable rates.
IMAGE COMPRESSION
What Is Image Compression?
Image compression is the process of removing the redundancy in an image. This redundancy comes from the fact that video images have very low variation in their content (i.e. the colors in an image), so every pixel is related to the surrounding pixels, which means that a respectable amount of information about the neighboring pixels is found in every pixel.
Types Of Redundancy:
For a still image we have one type of redundancy, called spatial redundancy: redundancy along the horizontal and vertical axes, which appears as the repetition of pixel colors over large areas of the image.
For moving pictures we have, in addition to spatial redundancy, another type called temporal redundancy, or interframe redundancy. This type arises from the fact that only very small changes occur from one frame to the next.
1) Temporal redundancy removal:
Temporal redundancy can be removed by simply differencing between frames.
Redundancy Removal By Direct Differencing:
The simplest way is to apply direct differencing between corresponding pixels in the current and previous frames.
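As a minimal sketch (using plain Python lists of rows as frames; a real system would operate on pixel arrays), direct differencing and its reconstruction at the receiver might look like:

```python
def frame_difference(current, previous):
    """Difference frame: pixel-wise subtraction of the previous frame
    from the current one. For similar frames most entries are near
    zero, so the result compresses well."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, previous)]

def reconstruct(previous, diff):
    """Receiver side: add the transmitted difference back onto the
    previously displayed frame."""
    return [[p + d for p, d in zip(prow, drow)]
            for prow, drow in zip(previous, diff)]
```

For example, two nearly identical 2x2 frames produce a difference frame that is mostly zeros, and adding it back recovers the current frame exactly.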
N.B.:
a) The first frame, and subsequent refresh frames, are encoded and transmitted as-is, to overcome the effect of noise.
b) The purpose of removing the redundancy is to minimize the bandwidth; the price paid, of course, is added complexity at the transmitter and receiver. Significant time may also be spent on the process if complex motion estimation techniques are implemented.
Basic Concept:
First, frames are subdivided into blocks whose dimensions are matched with the other block dimension choices in the system. The process is to find the closest match for a block of the current frame in another frame. The difference block is then calculated as the difference between the current block and its closest match, and the motion vector (dx, dy) is produced. This information is used at the receiver to reproduce the image. The mismatch function M(p, q) describes the amount of mismatch between two blocks p and q.
Block Search Algorithm:
Block searching involves trying a set of blocks in a certain region (called the search window) and selecting the best match. An example of the algorithms used:
Full Search:
The algorithm can be summarized as follows:
1) For a given block in the current frame, construct a search window in the previous frame and scan all possible blocks at a 1-pixel step.
2) For each block, calculate the mismatch function.
3) Select the block with minimum mismatch.
4) Calculate the motion vector as (dx, dy) = (x_current - x_prev, y_current - y_prev).
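The four steps above can be sketched as follows. The mismatch function M(p, q) is not specified in the text, so the sum of absolute differences (SAD), a common choice, is assumed; the motion-vector sign convention (offset of the matched block in the previous frame relative to the current block) is also an illustrative assumption:

```python
def sad(block_a, block_b):
    """Assumed mismatch function M(p, q): sum of absolute differences."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
                          for a, b in zip(ra, rb))

def get_block(frame, x, y, n):
    """Extract the n x n block whose top-left corner is (x, y)."""
    return [row[x:x + n] for row in frame[y:y + n]]

def full_search(cur, prev, x, y, n, w):
    """Full search: scan every candidate block in a (2w+1) x (2w+1)
    window of the previous frame at 1-pixel steps, and return the
    offset (dx, dy) of the best match together with its mismatch."""
    target = get_block(cur, x, y, n)
    best = (0, 0, float('inf'))
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            px, py = x + dx, y + dy
            # keep the candidate block inside the previous frame
            if 0 <= px <= len(prev[0]) - n and 0 <= py <= len(prev) - n:
                m = sad(target, get_block(prev, px, py, n))
                if m < best[2]:
                    best = (dx, dy, m)
    return best
```

The receiver can then reproduce the block by copying it from the previous frame at the signalled offset and adding the transmitted difference block.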
Search Sequence:
Considering a block X in the current frame, the blocks in its search window (in the previous frame) can be labeled as:
UL | U  | UR
L  | X  | R
BL | B  | BR
The eight search blocks around the search point are checked in two cycles:
Basic cycle: X (only in the first stage), U, R, B, L
Secondary cycle: UR, BR, BL, UL
When speed is favored, the search may be limited to the 5 basic-cycle tests only (X plus the 4 neighbors U, R, B, L).
Timing:
Timing information is hardware and platform dependent and is affected significantly by changing any of the parameters described above. It also varies inversely with the frame rate. Finally, processing time changes from one frame to the next depending on the number of search cuts that occur and their location in the search tree.
2)Spatial redundancy removal :
This type of redundancy can be removed by the use of transform techniques.
What Is an Image Transform?
Due to the high repetition of pixel colors, we need a mathematical technique to transform the matrix representing the image's color values into another domain, in which the transformed matrix represents not the color values themselves but the variations in the pixel colors. One of the most common transforms is the Discrete Cosine Transform (DCT); it is widely used by very popular compression techniques, like JPEG for still images and MPEG for moving pictures, as the core of their compression algorithms.
Some other transforms, like the Discrete Fourier Transform (DFT) and the Discrete Sine Transform (DST), are also used, but not as widely as the DCT, owing to the complexity of computation and the large computation time associated with them.
Image compression using transformation:
The transformed image is defined as
[V] = [C] [U] [C]^T
where [V] is the transformed image matrix,
[U] is the original image matrix,
[C] is the transform matrix.
To get the [U] matrix back:
[U] = [C]^-1 [V] ([C]^T)^-1
How to choose the [C] matrix:
From the above equations and the previous discussion, we find that the [C] matrix has to satisfy two essential conditions:
1) It gives a transformed matrix that represents the variations of the pixel colors of the original matrix, not their values.
2) It should be a real and orthogonal matrix, so that [C]^-1 = [C]^T; hence the [U] matrix can be obtained back from the transformed matrix in a short time, from [U] = [C]^T [V] [C], since it is faster to take the transpose of a matrix than to compute its inverse. We find that the DCT satisfies these two conditions.
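A hedged sketch of these ideas: the code below builds the orthonormal DCT-II basis matrix [C] (one standard normalization; others exist), applies [V] = [C][U][C]^T, and inverts with [U] = [C]^T[V][C], relying only on the transpose rather than a matrix inverse:

```python
import math

def dct_matrix(n):
    """n x n orthonormal DCT-II basis matrix [C]; its rows are
    orthonormal, so the inverse of [C] is simply its transpose."""
    c = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n)])
    return c

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct2(u):
    """[V] = [C] [U] [C]^T"""
    c = dct_matrix(len(u))
    return matmul(matmul(c, u), transpose(c))

def idct2(v):
    """[U] = [C]^T [V] [C] -- valid because [C] is orthogonal."""
    c = dct_matrix(len(v))
    return matmul(matmul(transpose(c), v), c)
```

The top-left coefficient of [V] is the "DC" term, proportional to the average intensity; the remaining coefficients describe variations, and discarding the small ones gives the lossy compression discussed earlier.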
Color Quantization Of Images:
Transmitting a true-color image, in which the red, green, and blue components of each pixel are represented by 8 bits each, over a limited bandwidth is inefficient. Hence the need to choose a small number of colors that describe the image and reflect subjective image quality. This set of available colors is called a color palette. Since images usually contain a wide range of colors, these must be quantized by a palette of limited size. The color quantization problem is considered in two parts:
1) the selection of an optimal color palette;
2) the optimal mapping of each pixel of the image to a color from the palette.
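Part 2), the pixel mapping, can be sketched as a nearest-neighbor search in RGB space. This is a simple illustrative criterion (squared Euclidean distance), not the tree-structured method the text goes on to describe, which exists precisely to make this search cheaper:

```python
def nearest_palette_index(pixel, palette):
    """Map one (r, g, b) pixel to the index of the closest palette
    entry, using squared Euclidean distance in RGB space."""
    return min(range(len(palette)),
               key=lambda i: sum((p - q) ** 2
                                 for p, q in zip(pixel, palette[i])))

def map_image(image, palette):
    """Pixel mapping: replace each pixel by its palette index."""
    return [[nearest_palette_index(px, palette) for px in row]
            for row in image]
```

The brute-force cost is proportional to palette size for every pixel, which is what makes tree-structured palettes attractive for large images.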
Algorithm:
There is an algorithm for the design of tree-structured color palettes that incorporates performance criteria reflecting subjective evaluation of image quality. Tree-structured color palettes greatly reduce the computational requirements of the palette design and pixel mapping tasks while allowing colors to be properly allocated to densely populated areas of the color space. The algorithm produces higher quality displayed images and requires fewer computations than previous methods.
What is a palette? And what is the need for it?
Video monitors display color images by modulating the intensities of the three primary colors (red, green, and blue) of each pixel in the image. In a digitized color image, each primary color is usually quantized with 8 bits of resolution in order to eliminate distinguishable quantization steps in luminance, color hue, and color saturation. Thus, full-color display systems use 24 bits to specify the color of each pixel on the screen. An alternative approach, used by many currently available displays, is to provide a limited number of bits, such as 8, for specifying the color of each pixel. Each of these 2^8 = 256 values is then used as an index into a user-defined table of colors; each entry in the table contains a 24-bit value which specifies the color's red, green, and blue components. In this way the user is allowed to select a small subset of colors, called a palette, from the full range of 2^24 = 16777216 colors. The drawback of this scheme is that it restricts the number of colors which may be displayed simultaneously.
Since natural images typically contain a large number of distinguishable colors, displaying such images with a limited palette is difficult.
Palette Design and Pixel Mapping Algorithm:
The approach we take is structured around algorithms performing two tasks. The first task, called color palette design, selects the best possible set of colors for a particular image. The second task, called pixel mapping, associates each pixel of the image with a color from this palette to yield the highest quality image.
A number of approaches have been suggested for the design of color palettes involving iterative refinement of some initial palette. However, these algorithms suffer from the disadvantage of being computationally intensive. In many applications, the efficiency of the algorithms performing color palette design and pixel mapping is very important. This is particularly true of applications involving large image databases, in which images are stored in some common format but must be displayed on monitors with different limitations on color palette size. In such an environment, color palette design and pixel mapping must be performed at the time the image is displayed, which is the case in a video phone. This makes computational efficiency of critical importance.
Even if the color palette is transmitted with each image, standard image coding techniques require that the pixel mapping be performed locally at the time each image is displayed. This is because standard image coding techniques exploit the high correlation between the primary color values of neighboring pixels to reduce the transmission bit rate; the color palette indices of neighboring pixels are not highly correlated and therefore cannot be coded with high efficiency. Thus, even if a color palette of the correct size is provided with each stored image, the pixel mapping function must be performed after the image is retrieved from the database and decoded into a full-color format.
Quantizers:
A quantizer is the means of converting received analog data into digital data. The main idea of a quantizer is to convert data into a fixed number of levels. A quantizer may be implemented in either software or hardware, and may be either uniform or non-uniform. In a non-uniform quantizer the distances between the levels are not equal. An example of a quantizer is the analog-to-digital circuit in which analog signals are converted to a fixed number of levels.
One problem we face with quantizers is noise due to the difference between the true level of the signal and the quantized level; this noise is called granular noise.
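A minimal sketch of a uniform quantizer (mid-tread, levels centered on zero; the step size and level count are illustrative parameters). Values beyond the outermost levels are clipped, which is the amplitude-overload counterpart of the granular noise just described:

```python
def uniform_quantize(x, step, levels):
    """Uniform mid-tread quantizer: map x to the nearest of `levels`
    output levels spaced `step` apart, centered on zero.

    Within range, the error (granular noise) is at most step/2;
    out-of-range inputs are clipped (amplitude overload)."""
    q = round(x / step)
    half = levels // 2
    q = max(-half, min(half - 1, q))   # clip to the available levels
    return q * step
```

For example, with step 0.1 and 8 levels, an input of 0.26 is mapped to 0.3 (error under step/2), while an input of 10.0 is clipped to the top level.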
AUDIO SYSTEM
In many applications we transmit a large amount of data (for good quality speech using PCM we need a data rate of 64 kb/s), and this is exaggerated when we are transmitting more than speech alone (as in video phone applications). But we are limited by the 4 kHz channel bandwidth, which cannot bear this high data transmission rate; we also face limitations concerning the storage medium required to keep this large amount of data for future reuse.
Compression techniques serve as a solution to this problem.
Audio compression techniques :
Prediction and transform operations serve to remove the redundancy in an input waveform, and thus permit bit-rate-efficient digitization.
Speech waveform redundancy is exploited in time domain operations to realize straightforward reductions in bit rate for a specified quality of digitization.
Differential pulse code modulation (DPCM) coders, which are based on quantizing a prediction error signal, are important examples of predictive coding systems.
DPCM systems with one-bit quantizers constitute an important subclass called delta modulation (DM). Transform coding (block quantization) techniques use frequency domain approaches for redundancy removal.
These various compression techniques will now be discussed in more detail.
Differential pulse code modulation (DPCM) :
By representing a correlated waveform in terms of appropriate difference samples, or prediction error samples, one can achieve an increased SNR at a given bit rate or, equivalently, a reduction of bit rate for a given SNR requirement. This can be appreciated, at least qualitatively, for the simple but important case where the correlation between adjacent samples approaches unity.
In this case, assume that the encoder represents the waveform as a succession of adjacent-sample differences x(n) - x(n-1), that these difference samples are quantized for transmission, and that a decoder recovers an approximation to the input essentially by integrating the quantized adjacent-sample differences.
In general, the quantizer input of a DPCM coder is a prediction error, or difference signal, d(n) = x(n) - x`(n), where x`(n) is a prediction of x(n).
The simplest DPCM predictor is the unit delay operator defined by x`(n) = x(n-1). In this case the DPCM decoder is simply a perfect integrator .
With a more general (all-pole) predictor in the DPCM coder,
x`(n) = Σ (j = 1 to N) h_j x(n-j)
where N is the order of the predictor and h_j, for j = 1, ..., N, is the set of predictor coefficients.
The DPCM decoder takes a more general form as well; this more general decoder is sometimes referred to as a leaky integrator.
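A sketch of a DPCM coder with the simplest (unit-delay) predictor x`(n) = x(n-1) and a uniform quantizer on the difference signal; the decoder is the perfect integrator described above. The step size is an illustrative parameter, and the encoder predicts from its own reconstruction so that encoder and decoder stay in step:

```python
def dpcm_encode(samples, step):
    """DPCM with the unit-delay predictor: quantize d(n) = x(n) - x`(n),
    where the prediction x`(n) is the decoder's reconstruction of the
    previous sample (this keeps encoder and decoder synchronized)."""
    codes, pred = [], 0
    for x in samples:
        d = x - pred                 # prediction error d(n)
        q = round(d / step)          # quantize the error
        codes.append(q)
        pred += q * step             # mirror the decoder's integrator
    return codes

def dpcm_decode(codes, step):
    """Decoder: a perfect integrator of the quantized differences."""
    out, acc = [], 0
    for q in codes:
        acc += q * step
        out.append(acc)
    return out
```

With a slowly varying input the transmitted codes stay small, which is where the bit-rate saving over plain PCM comes from.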
Slope overloading & granular noise:
There are two types of reconstruction error in DPCM coders, namely overload distortion and granular noise. We are no longer quantizing x(n), as in the case of PCM, but the difference signal d(n). As a result, overload errors in DPCM are characterized not as amplitude overload but as slope overload. Slope overload occurs whenever the largest step size of the quantizer is too small for local values of d(n), resulting in the staircase output y(n) lagging behind a fast-changing input, for example the rapid increase of speech amplitude at the onset of a pitch period, or the rapid change of image intensity at a high-contrast edge. The opposite situation, granular noise, occurs when the input variations are too small in comparison with the smallest step size of the quantizer. Examples are the coding of silences in speech, or of constant or nearly constant gray areas in images.
The DPCM coders just described are appropriate for certain types of applications at 32 kb/s and 24 kb/s. They operate with a fixed first-order predictor for simplicity. The resulting speech quality is less than conservatively defined toll quality (that of 8-bit PCM). Nevertheless, 32 kb/s DPCM quality can be considered acceptable for certain speech transmission applications, especially those which do not require too many multiple encodings. On a subjective quality scale of 1 to 5, the mean opinion score for this coder is very close to 4 at 32 kb/s.
Voice storage systems, such as those for recorded announcements, have even less stringent demands on quality. Consequently they use a smaller speech bandwidth (say 0 to 2700 Hz) and achieve 24 kb/s coding with 4-bit quantization (Fs = 6 kHz, R = 4), where R is the number of bits per sample. The quality of a conventional-bandwidth 24 kb/s system (Fs = 8 kHz, R = 3) is lower, i.e. the greater quantization noise with R = 3 (instead of 4) is more objectionable than the loss of bandwidth in going from 3.2 to 2.7 kHz.
Adaptive Delta Modulation (ADM):
In ADM, instantaneous variation of the step size d is employed to enhance the quality of the encoded waveform. The several available algorithms differ in the quantizer output memory used and in the recommended step size multipliers or increments. Practical implementations of ADM algorithms presuppose appropriate constraints on the maximum and minimum step sizes: d_min < d(n) < d_max.
Certain types of adaptation processes also require an initial, or starting, step size d_start. In general this can be set to an arbitrary value; in speech coding, assuming that speech activity is preceded by silence, one can put d_start = d_min.
The value of d_max controls the amount of slope overload distortion, while the value of d_min controls the amount of granular noise.
In the most general adaptation logic, with m-bit memory, the step size d(n) will be a function of the present bit b(n), the m preceding bits, and another step size d`, which is either the minimum step size d_min or the previous step size d(n-1):
d(n) = F[ b(n), b(n-1), ..., b(n-m), d` ]
A useful adaptation rule is d(n) = M(n) d`, where M(n) is a step size multiplier that can be chosen so that the step size changes during successive increases or decreases are linear, exponential, or any other arbitrary function.
The non-availability of explicit level information in the one-bit quantizer makes the use of the most recent polarity b(n) very valuable in ADM. Using b(n) for the coding of sample x(n) means that the quantization is carried out in two steps: the first involves the determination of b(n), and the second is the application of this value to determine d(n) for the quantization itself.
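A hedged sketch of one-bit ADM using an exponential rule of the d(n) = M(n) d(n-1) form with one bit of memory (m = 1): the step grows while consecutive bits agree (fighting slope overload) and shrinks when they alternate (fighting granular noise). The multiplier values and d_start = d_min are illustrative choices, not prescribed by the text:

```python
def adm_encode(samples, d_min, d_max, m_up=1.5, m_down=0.66):
    """Adaptive delta modulation: one bit per sample.

    The step size is multiplied by m_up when the current bit equals the
    previous one (the signal is slewing; counters slope overload) and
    by m_down when the bits alternate (the signal is nearly flat;
    counters granular noise). The step is clamped to [d_min, d_max]."""
    bits, recon = [], []
    approx, step, prev_bit = 0.0, d_min, 0   # d_start = d_min
    for x in samples:
        bit = 1 if x >= approx else -1       # b(n): polarity of the error
        if bits:                             # adapt only after the first bit
            step *= m_up if bit == prev_bit else m_down
            step = max(d_min, min(d_max, step))
        approx += bit * step                 # staircase output y(n)
        bits.append(bit)
        prev_bit = bit
        recon.append(approx)
    return bits, recon
```

On a rising ramp the bits agree and the step grows toward d_max, so the staircase catches up; on a silent input the step decays toward d_min, keeping the granular noise small.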
Our Implementation:
In our video phone application we have very special requirements. We need a maximum compression ratio while retaining good sound quality: since we are not transmitting speech alone, we have a huge data rate which must be compressed, as we are aiming to transmit at a bit rate of 14.4 kb/s. We do not need very high sound quality; only an acceptable quality is required. (We have to bear in mind that the ear is more sensitive to noise than the eye, so there is a tradeoff between quality and compression ratio.)
Finally, the algorithm used for compression must have as small a computation time as possible if a real-time system is to be established. From the above requirements, one can conclude that we have to use either Delta Modulation (DM), ADM, or the Discrete Walsh-Hadamard Transform (DWHT).