Needs a branch with more work done first. Partial inlining has no visible effect on performance (5 KB, which is negligible), and the PGO + BOLT optimization path yields better results regardless.
Force inlining of the ExecuteCommand function to reduce CPU overhead in the GPU command processing hot path. Additionally, silence debug logging metadata within the function so that string-processing logic does not block compiler optimizations. Includes safeguards for multi-compiler and cross-platform compatibility.
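For illustration only, a portable force-inline guard of the kind described above might look like the following. The macro name `SKETCH_FORCE_INLINE` and the function `ExecuteCommandSketch` are hypothetical stand-ins; the actual macro and function body used by this change are not shown in this message.

```cpp
#include <cstdint>

// Hypothetical macro name, chosen for this sketch only.
// Pick the strongest inlining hint each compiler understands and
// fall back to plain `inline` everywhere else.
#if defined(_MSC_VER)
#define SKETCH_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_FORCE_INLINE inline
#endif

// Stand-in for a hot-path function; the body is a placeholder that packs
// a method id and a 16-bit argument into one word.
SKETCH_FORCE_INLINE std::uint32_t ExecuteCommandSketch(std::uint32_t method,
                                                       std::uint32_t argument) {
    return (method << 16) | (argument & 0xFFFFu);
}
```

The fallback branch keeps the code compiling on toolchains that support neither attribute, at the cost of leaving the inlining decision to the compiler.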
Signed-off-by: Collecting <collecting@noreply.localhost>
This commit aims to implement the NVDEC (Nvidia Decoder) functionality, with video frame decoding being handled by the FFmpeg library.
The process begins with Ioctl commands being sent to the NVDEC and VIC (Video Image Composer) emulated devices. These allocate the necessary GPU buffers for the frame data, along with providing information on the incoming video data. A Submit command then signals the GPU to process and decode the frame data.
To decode the frame, the respective codec's header must be manually composed from the information provided by NVDEC, then sent together with the raw frame data to the FFmpeg library.
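As a rough sketch of what composing a header by hand involves: many fields in H.264 parameter sets are stored as unsigned Exp-Golomb codes, ue(v), packed MSB-first. The class below is a hypothetical helper written for this note, not the code from this commit, and it omits everything else a real header needs (NAL start codes, emulation prevention bytes, the field layout itself).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration: packs bits MSB-first into bytes.
class BitWriterSketch {
public:
    // Append the lowest `count` bits of `value`, most significant bit first.
    void WriteBits(std::uint64_t value, int count) {
        for (int i = count - 1; i >= 0; --i) {
            const std::uint8_t bit = static_cast<std::uint8_t>((value >> i) & 1u);
            if (bit_pos == 0) {
                buffer.push_back(0); // start a new, zero-filled byte
            }
            buffer.back() |= static_cast<std::uint8_t>(bit << (7 - bit_pos));
            bit_pos = (bit_pos + 1) % 8;
        }
    }

    // Unsigned Exp-Golomb: N leading zero bits, then (value + 1) in N + 1 bits,
    // where N = floor(log2(value + 1)).
    void WriteUe(std::uint32_t value) {
        const std::uint64_t code = static_cast<std::uint64_t>(value) + 1;
        int length = 0;
        while ((code >> (length + 1)) != 0) {
            ++length;
        }
        WriteBits(0, length);
        WriteBits(code, length + 1);
    }

    // Bytes written so far; unused trailing bits of the last byte are zero.
    const std::vector<std::uint8_t>& Data() const {
        return buffer;
    }

private:
    std::vector<std::uint8_t> buffer;
    int bit_pos = 0; // next free bit within the last byte, 0 = most significant
};
```

Writing the values 0, 1, 2, 3 in sequence produces the bit string `1 010 011 00100`, which pads out to the two bytes 0xA6 0x40.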
Currently, H264 and VP9 are supported, with VP9 having some minor artifacting issues related mainly to the reference frame composition in its uncompressed header.
Async GPU is not properly implemented at the moment.
Co-Authored-By: David <25727384+ogniK5377@users.noreply.github.com>