x264 source for verification 2026-05-22

2026-05-22 16:45:04 +08:00
commit 4647f166e5
270 changed files with 166522 additions and 0 deletions
--- a/doc/ratecontrol.txt
+++ b/doc/ratecontrol.txt
@@ -0,0 +1,44 @@
+A qualitative overview of x264's ratecontrol methods
+By Loren Merritt
+
+Historical note:
+This document is outdated, but a significant part of it is still accurate.  Here are some important ways ratecontrol has changed since the authoring of this document:
+- By default, MB-tree is used instead of qcomp for weighting frame quality based on complexity.  MB-tree is effectively a generalization of qcomp to the macroblock level.  MB-tree also replaces the constant offsets for B-frame quantizers.  The legacy algorithm is still available for low-latency applications.
+- Adaptive quantization is now used to distribute quality among each frame; frames are no longer constant quantizer, even if MB-tree is off.
+- VBV runs per-row rather than per-frame to improve accuracy.
+
+x264's ratecontrol is based on libavcodec's, and is mostly empirical. But I can retroactively propose the following theoretical points which underlie most of the algorithms:
+
+- You want the movie to be somewhere approaching constant quality. However, constant quality does not mean constant PSNR nor constant QP. Details are less noticeable in high-complexity or high-motion scenes, so you can get away with somewhat higher QP for the same perceived quality.
+- On the other hand, you get more quality per bit if you spend those bits in scenes where motion compensation works well: A given artifact may stick around several seconds in a low-motion scene, and you only have to fix it in one frame to improve the quality of the whole scene.
+- Both of the above are correlated with the number of bits it takes to encode a frame at a given QP.
+- Given one encoding of a frame, we can predict the number of bits needed to encode it at a different QP. This prediction gets less accurate if the QPs are far apart.
+- The importance of a frame depends on the number of other frames that are predicted from it. Hence I-frames get reduced QP depending on the number and complexity of following inter-frames, disposable B-frames get higher QP than P-frames, and referenced B-frames are between P-frames and disposable B-frames.
+
+
+The modes:
+
+    2pass:
+Given some data about each frame of a 1st pass (e.g. generated by 1pass ABR, below), we try to choose QPs to maximize quality while matching a specified total size. This is separated into 3 parts:
+(1) Before starting the 2nd pass, select the relative number of bits to allocate between frames. This pays no attention to the total size of the encode. The default formula, empirically selected to balance between the first two theoretical points, is "complexity ** 0.6", where complexity is defined to be the bit size of the frame at a constant QP (estimated from the 1st pass).
+(2) Scale the results of (1) to fill the requested total size. Optional: Impose VBV limitations. Due to nonlinearities in the frame size predictor and in VBV, this is an iterative process.
+(3) Now start encoding. After each frame, update future QPs to compensate for mispredictions in size. If the 2nd pass is consistently off from the predicted size (usually because we use slower compression options than the 1st pass), then we multiply all future frames' qscales by the reciprocal of the error. Additionally, there is a short-term compensation to prevent us from deviating too far from the desired size near the beginning (when we don't have much data for the global compensation) and near the end (when global doesn't have time to react).
+
+    1pass, average bitrate:
+The goal is the same as in 2pass, but here we don't have the benefit of a previous encode, so all ratecontrol must be done during the encode.
+(1) This is the same as in 2pass, except that instead of estimating complexity from a previous encode, we run a fast motion estimation algo over a half-resolution version of the frame, and use the SATD residuals (these are also used in the decision between P- and B-frames). Also, we don't know the size or complexity of the following GOP, so I-frame bonus is based on the past.
+(2) We don't know the complexities of future frames, so we can only scale based on the past. The scaling factor is chosen to be the one that would have resulted in the desired bitrate if it had been applied to all frames so far.
+(3) Overflow compensation is the same as in 2pass. By tuning the strength of compensation, you can get anywhere from near the quality of 2pass (but unpredictable size, like +- 10%) to reasonably strict filesize but lower quality.
+
+    1pass, constant bitrate (VBV compliant):
+(1) Same as ABR.
+(2) Scaling factor is based on a local average (dependent on VBV buffer size) instead of all past frames.
+(3) Overflow compensation is stricter, and has an additional term to hard limit the QPs if the VBV is near empty. Note that no hard limit is done for a full VBV, so CBR may use somewhat less than the requested bitrate. Note also that if a frame violates VBV constraints despite the best efforts of prediction, it is not re-encoded.
+
+    1pass, constant ratefactor:
+(1) Same as ABR.
+(2) The scaling factor is a constant based on the --crf argument.
+(3) No overflow compensation is done.
+
+    constant quantizer:
+QPs are simply based on frame type.
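
Steps (1) and (2) of the 2pass mode described above can be sketched as follows. This is a minimal illustration, not x264's implementation: the function name `allocate_bits`, its parameters, and the sample complexities are all hypothetical, and the sketch assumes positive complexities while ignoring VBV limits and the iterative, nonlinear frame size prediction.

```python
def allocate_bits(complexities, target_total_bits, exponent=0.6):
    """Split target_total_bits across frames in proportion to complexity ** exponent.

    Hypothetical helper for illustration only; assumes every complexity is > 0.
    """
    if not complexities:
        return []
    # Step (1): relative weights, chosen without regard to the total size.
    weights = [c ** exponent for c in complexities]
    # Step (2): scale the weights so that they sum to the requested size.
    scale = target_total_bits / sum(weights)
    return [w * scale for w in weights]


# Made-up per-frame complexities (bit sizes at a constant QP).
frame_complexities = [1000.0, 4000.0, 16000.0]
bits = allocate_bits(frame_complexities, target_total_bits=30000.0)
```

With an exponent below 1, a frame that is 4 times as complex receives only about 2.3 times as many bits (4 ** 0.6), so complex scenes end up at a somewhat higher QP, in line with the first theoretical point.
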
--- a/doc/regression_test.txt
+++ b/doc/regression_test.txt
@@ -0,0 +1,24 @@
+Here is one test method which checks that the encoder's
+view of decoded pictures is the same as the decoder's view.
+This ensures that there is no distortion besides what is
+inherently caused by compression.
+
+# Install and compile x264 :
+git clone git://git.videolan.org/x264.git x264
+cd x264
+./configure
+make
+cd ..
+
+# Install and compile JM reference decoder :
+wget http://iphome.hhi.de/suehring/tml/download/jm17.2.zip
+unzip jm17.2.zip
+cd JM
+sh unixprep.sh
+cd ldecod
+make
+cd ../..
+
+./x264/x264 input.yuv --dump-yuv fdec.yuv -o output.h264
+./JM/bin/ldecod.exe -i output.h264 -o ref.yuv
+diff ref.yuv fdec.yuv
--- a/doc/standards.txt
+++ b/doc/standards.txt
@@ -0,0 +1,11 @@
+x264 is written in C. The particular variant of C is: intersection of C99 and gcc>=3.4.
+checkasm is written in GNU C (gcc's dialect of C), with no attempt at compatibility with anything else.
+
+We make the following additional assumptions which are true of real systems but not guaranteed by C99:
+* Two's complement.
+* Signed right-shifts are sign-extended.
+* int is 32-bit or larger.
+
+x86-specific assumptions:
+* The stack is 16-byte aligned. We align it on entry to libx264 and on entry to any thread, but the compiler must preserve alignment after that.
+* We call emms before any float operation and before returning from libx264, not after each mmx operation. So bad things could happen if the compiler inserts float operations where they aren't expected.
--- a/doc/threads.txt
+++ b/doc/threads.txt
@@ -0,0 +1,95 @@
+Historical notes:
+Slice-based threading was the original threading model of x264.  It was replaced with frame-based threading in r607.  This document was originally written at that time.  Slice-based threading was brought back (as an optional mode) in r1364 for low-latency encoding.  Furthermore, frame-based threading was modified significantly in r1246, with the addition of threaded lookahead.
+
+Old threading method: slice-based
+application calls x264
+x264 runs B-adapt and ratecontrol (serial)
+split frame into several slices, and spawn a thread for each slice
+wait until all threads are done
+deblock and hpel filter (serial)
+return to application
+In x264cli, there is one additional thread to decode the input.
+
+New threading method: frame-based
+application calls x264
+x264 requests a frame from lookahead, which runs B-adapt and ratecontrol parallel to the current thread, separated by a buffer of size sync-lookahead
+spawn a thread for this frame
+thread runs encode, deblock, hpel filter
+meanwhile x264 waits for the oldest thread to finish
+return to application, but the rest of the threads continue running in the background
+No additional threads are needed to decode the input, unless decoding is slower than slice+deblock+hpel, in which case an additional input thread would allow decoding in parallel.
+
+Penalties for slice-based threading:
+Each slice adds some bitrate (or equivalently reduces quality), for a variety of reasons: the slice header costs some bits, cabac contexts are reset, mvs and intra samples can't be predicted across the slice boundary.
+In CBR mode, multiple slices encode simultaneously, thus increasing the maximum misprediction possible with VBV.
+Some parts of the encoder are serial, so it doesn't scale well with lots of cpus.
+
+Some numbers on penalties for slicing:
+Tested at 720p with 45 slices (one per mb row) to maximize the total cost for easy measurement. Averaged over 4 movies at crf20 and crf30. Total cost: +30% bitrate at constant psnr.
+I enabled the various components of slicing one at a time, and measured the portion of that cost they contribute:
+    * 34% intra prediction
+    * 25% redundant slice headers, nal headers, and rounding to whole bytes
+    * 16% mv prediction
+    * 16% reset cabac contexts
+    * 6% deblocking between slices (you don't strictly have to turn this off just for standard compliance, but you do if you want to use slices for decoder multithreading)
+    * 2% cabac neighbors (cbp, skip, etc)
+The proportional cost of redundant headers should certainly depend on bitrate (since the header size is constant and everything else depends on bitrate). Deblocking should too (due to varying deblock strength).
+But none of the proportions should depend strongly on the number of slices: some are triggered per slice while some are triggered per macroblock-that's-on-the-edge-of-a-slice, but as long as there's no more than 1 slice per row, the relative frequency of those two conditions is determined solely by the image width.
+
+
+Penalties for frame-based threading:
+To allow encoding of multiple frames in parallel, we have to ensure that any given macroblock uses motion vectors only from pieces of the reference frames that have been encoded already. This is usually not noticeable, but can matter for very fast upward motion.
+We have to commit to one frame type before starting on the frame. Thus scenecut detection must run during the lowres pre-motion-estimation along with B-adapt, which makes it faster but less accurate than re-encoding the whole frame.
+Ratecontrol gets delayed feedback, since it has to plan frame N before frame N-1 finishes.
+
+Benchmarks:
+cpu: 8core Nehalem (2x E5520) 2.27GHz, hyperthreading disabled
+kernel: linux 2.6.34.7, 64-bit
+x264: r1732 b20059aa
+input: http://media.xiph.org/video/derf/y4m/1080p/park_joy_1080p.y4m
+
+NOTE: the "thread count" listed below does not count the lookahead thread, only encoding threads.  This is why for "veryfast", the speedup for 2 and 3 threads exceeds the logical limit.
+
+threads  speedup       psnr
+      slice frame   slice  frame
+x264 --preset veryfast --tune psnr --crf 30
+ 1:   1.00x 1.00x  +0.000 +0.000
+ 2:   1.41x 2.29x  -0.005 -0.002
+ 3:   1.70x 3.65x  -0.035 +0.000
+ 4:   1.96x 3.97x  -0.029 -0.001
+ 5:   2.10x 3.98x  -0.047 -0.002
+ 6:   2.29x 3.97x  -0.060 +0.001
+ 7:   2.36x 3.98x  -0.057 -0.001
+ 8:   2.43x 3.98x  -0.067 -0.001
+ 9:         3.96x         +0.000
+10:         3.99x         +0.000
+11:         4.00x         +0.001
+12:         4.00x         +0.001
+
+x264 --preset medium --tune psnr --crf 30
+ 1:   1.00x 1.00x  +0.000 +0.000
+ 2:   1.54x 1.59x  -0.002 -0.003
+ 3:   2.01x 2.81x  -0.005 +0.000
+ 4:   2.51x 3.11x  -0.009 +0.000
+ 5:   2.89x 4.20x  -0.012 -0.000
+ 6:   3.27x 4.50x  -0.016 -0.000
+ 7:   3.58x 5.45x  -0.019 -0.002
+ 8:   3.79x 5.76x  -0.015 -0.002
+ 9:         6.49x         -0.000
+10:         6.64x         -0.000
+11:         6.94x         +0.000
+12:         6.96x         +0.000
+
+x264 --preset slower --tune psnr --crf 30
+ 1:   1.00x 1.00x  +0.000 +0.000
+ 2:   1.54x 1.83x  +0.000 +0.002
+ 3:   1.98x 2.21x  -0.006 +0.002
+ 4:   2.50x 2.61x  -0.011 +0.002
+ 5:   2.93x 3.94x  -0.018 +0.003
+ 6:   3.45x 4.19x  -0.024 +0.001
+ 7:   3.84x 4.52x  -0.028 -0.001
+ 8:   4.13x 5.04x  -0.026 -0.001
+ 9:         6.15x         +0.001
+10:         6.24x         +0.001
+11:         6.55x         -0.001
+12:         6.89x         -0.001
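
The slicing penalty numbers in doc/threads.txt above can be turned into absolute bitrate costs with simple arithmetic. This is only a sketch: it assumes each listed portion is an additive share of the measured +30% total, and the variable names are made up for illustration.

```python
# Total measured cost of 45 slices at 720p, in percent of bitrate at constant psnr.
TOTAL_COST_PERCENT = 30.0

# Portion of the total cost attributed to each component, in percent.
portions = {
    "intra prediction": 34,
    "redundant headers and byte rounding": 25,
    "mv prediction": 16,
    "reset cabac contexts": 16,
    "deblocking between slices": 6,
    "cabac neighbors": 2,
}

# Absolute bitrate cost of each component, in percentage points.
absolute_cost = {
    name: TOTAL_COST_PERCENT * share / 100.0 for name, share in portions.items()
}

for name, cost in absolute_cost.items():
    print(f"{name}: +{cost:.1f}% bitrate")
```

Under that assumption, intra prediction alone accounts for roughly 10 percentage points of the 30% increase. The listed portions sum to 99 rather than 100, presumably due to rounding.
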
--- a/doc/vui.txt
+++ b/doc/vui.txt
@@ -0,0 +1,177 @@
+Video Usability Information (VUI) Guide
+by Christian Heine ( sennindemokrit at gmx dot net )
+
+1. Sample Aspect Ratio
+-----------------------
+
+* What is it?
+    The Sample Aspect Ratio (SAR) (sometimes called Pixel Aspect Ratio or just
+    Pel Aspect Ratio) is defined as the ratio of the width of the sample to the
+    height of the sample. While pixels on a computer monitor generally are
+    "square" meaning that their SAR is 1:1, digitized video usually has rather
+    odd SARs. Playback of material with a particular SAR on a system with
+    a different SAR will result in a stretched/squashed image. A correction is
+    necessary that relies on the knowledge of both SARs.
+
+* How do I use it?
+    You can derive the SAR of an image from the width, height and the
+    display aspect ratio (DAR) of the image as follows:
+
+    SAR_x   DAR_x * height
+    ----- = --------------
+    SAR_y   DAR_y * width
+
+    for example:
+    width x height = 704x576, DAR = 4:3 ==> SAR = 2304:2112 or 12:11
+
+    Please note that if your material is a digitized analog signal, you should
+    not use this equation to calculate the SAR. Refer to the manual of your
+    digitizing equipment or this link instead.
+
+    A Quick Guide to Digital Video Resolution and Aspect Ratio Conversions
+    http://www.iki.fi/znark/video/conversion/
+
+* Should I use this option?
+    In one word: yes. Most decoders/media players nowadays support automatic
+    correction of aspect ratios, with only a few exceptions. You should use
+    it even if the SAR of your material is 1:1, as the default of x264 is
+    "SAR not defined".
+
+2. Overscan
+------------
+
+* What is it?
+    The term overscan generally refers to all regions of an image that do
+    not contain information but are added to achieve a certain resolution or
+    aspect ratio. A "letterboxed" image therefore has overscan at the top and
+    the bottom. This is not the overscan this option refers to. Nor does it
+    refer to the overscan that is added as part of the process of digitizing an
+    analog signal. Instead it refers to the "overscan" process on a display
+    that shows only a part of the image. What that part is depends on the
+    display.
+
+* How do I use this option?
+    As I'm not sure about what part of the image is shown when the display uses
+    an overscan process, I can't provide you with rules or examples. The safe
+    assumption would be "overscan=show" as this always shows the whole image.
+    Use "overscan=crop" only if you are sure about the consequences. You may
+    also use the default value ("undefined").
+
+* Should I use this option?
+    Only if you know exactly what you are doing. Don't use it on video streams
+    that have general overscan. Instead try to crop the borders before
+    encoding and benefit from the higher bitrate/ image quality.
+
+    Furthermore the H264 specification says that the setting "overscan=show"
+    must be respected, but "overscan=crop" may be ignored. In fact most
+    playback equipment ignores this setting and shows the whole image.
+
+3. Video Format
+----------------
+
+* What is it?
+    A purely informative setting that records what type of analog video your
+    source was before you digitized it.
+
+* How do I use this option?
+    Just set it to the desired value. ( e.g. NTSC, PAL )
+    If you transcode from MPEG2, you may find the value for this option in the
+    m2v bitstream. (see ITU-T Rec. H262 / ISO/IEC 13818-2 for details)
+
+* Should I use this option?
+    That is entirely up to you. I have no idea how this information would ever
+    be relevant. I consider it to be informative only.
+
+4. Full Range
+--------------
+
+* What is it?
+    Another relic from digitizing analog video. When digitizing analog video
+    the digital representation of the luma and chroma levels is limited to lie
+    within 16..235 and 16..240 respectively. Playback equipment usually assumes
+    all digitized samples to be within this range. However most DVDs use the
+    full range of 0..255 for luma and chroma samples, possibly resulting in an
+    oversaturation when played back on that equipment. To avoid this a range
+    correction is needed.
+
+* How do I use this option?
+    If your source material is a digitized analog video/TV broadcast it is
+    quite possible that it is range limited. If you can make sure that it is
+    range limited you can safely set full range to off. If you are not sure
+    or want to make sure that your material is played back without
+    oversaturation, set it to on. Please note that the default for this option
+    in x264 is off, which is not a safe assumption.
+
+* Should I use this option?
+    Yes, but there are few decoders/ media players that distinguish
+    between the two options.
+
+5. Color Primaries, Transfer Characteristics, Matrix Coefficients
+-------------------------------------------------------------------
+
+* What is it?
+    A videophile setting. The average user won't ever need it.
+    Not all monitor models show all colors the same way. When comparing the
+    same image on two different monitor models you might find that one of them
+    "looks more blue", while the other "looks more green". Bottom line is, each
+    monitor model has a different color profile, which can be used to correct
+    colors so that images look almost the same on all monitors. The same
+    goes for printers and film/ video digitizing equipment. If the color
+    profile of the digitizing equipment is known, it is possible to correct the
+    colors and gamma of the decoded h264 stream in a way that the video stream
+    looks the same, regardless of the digitizing equipment used.
+
+* How do I use these options?
+    If you are able to find out which characteristics your digitizing equipment
+    uses, (see the equipment documentation or make reference measurements)
+    then find the most suitable characteristics in the list of available
+    characteristics (see H264 Annex E) and pass it to x264. Otherwise leave it
+    to the default (unspecified).
+    If you transcode from MPEG2, you may find the values for these options in
+    the m2v bitstream. (see ITU-T Rec. H262 / ISO/IEC 13818-2 for details)
+
+* Should I use these options?
+    Only if you know exactly what you are doing. The default setting is better
+    than a wrong one. Use of this option is not a bad idea though.
+    Unfortunately I don't know of any decoder/media player that ever even
+    attempted color/gamma/color matrix correction.
+
+6. Chroma Sample Location
+--------------------------
+
+* What is it?
+    A videophile setting. The average user won't ever notice a difference.
+    Due to a weakness of the eye, it is often economical to reduce the number of
+    chroma samples in a process called subsampling. In particular x264 uses
+    only one chroma sample of each chroma channel for every block of 2x2 luma
+    samples. There are a number of possibilities on how this subsampling is
+    done, each resulting in another relative location of the chroma sample
+    towards the luma samples. The Chroma Sample Location matters when the
+    subsampling process is reversed, i.e. the number of chroma samples is
+    increased. This is most likely to happen at color space conversions. If it
+    is not done correctly the chroma values may appear shifted compared to the
+    luma samples by at most 1 pixel, or strangely blurred.
+
+* How do I use this option?
+    x264 does no subsampling itself; it only accepts already subsampled
+    input frames, so you have to determine the method yourself.
+
+    If you transcode from MPEG1 with proper subsampled 4:2:0, and don't do any
+    color space conversion, you should set this option to 1.
+    If you transcode from MPEG2 with proper subsampled 4:2:0, and don't do any
+    color space conversion, you should set this option to 0.
+    If you transcode from MPEG4 with proper subsampled 4:2:0, and don't do any
+    color space conversion, you should set this option to 0.
+
+    If you do the color space conversion yourself this isn't that easy. If the
+    filter kernel of the subsampling is ( 0.5, 0.5 ) in one direction then the
+    chroma sample location in that direction is between the two luma samples.
+    If your filter kernel is ( 0.25, 0.5, 0.25 ) in one direction then the
+    chroma sample location in that direction is equal to one of the luma
+    samples. H264 Annex E contains images that tell you how to "transform" your
+    Chroma Sample Location into a value of 0 to 5 that you can pass to x264.
+
+* Should I use this option?
+    Unless you are a perfectionist, don't bother. Media players ignore this
+    setting, and favor their own (fixed) assumed Chroma Sample Location.
+
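
The SAR formula from section 1 of doc/vui.txt (Sample Aspect Ratio) can be sketched as a small helper. The function name `sample_aspect_ratio` is hypothetical; the sketch simply evaluates SAR = (DAR_x * height) / (DAR_y * width) and reduces the fraction. As the guide notes, this does not apply to digitized analog material.

```python
from fractions import Fraction


def sample_aspect_ratio(width, height, dar_x, dar_y):
    """Return the reduced SAR as (sar_x, sar_y) for a given frame size and DAR.

    Hypothetical helper; not valid for digitized analog signals.
    """
    # Fraction reduces the ratio to lowest terms automatically.
    sar = Fraction(dar_x * height, dar_y * width)
    return sar.numerator, sar.denominator


# The example from the guide: 704x576 with a 4:3 display aspect ratio.
sar_x, sar_y = sample_aspect_ratio(704, 576, 4, 3)
print(f"{sar_x}:{sar_y}")
```

For the guide's 704x576, 4:3 example this reduces 2304:2112 to 12:11, a value that could then be given to x264's --sar option.
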