
The stitching process is a critical step in determining the quality of panoramic images, and research to improve stitching performance is being actively conducted. Three hundred and sixty-degree image stitching tools use the intrinsic and extrinsic parameters of the cameras used in the shooting to stitch the images. Commercial 360-degree imaging equipment has intrinsic parameters determined by the camera model used and extrinsic parameters specific to the rig on which the cameras are mounted. The structure of commercial 360-degree video recording equipment is generally designed so that the optical axes of the cameras pass through a common point, with the cameras arranged radially to cover an omnidirectional field of view. Thus, only the rotational component is present in the extrinsic parameters of the cameras; the translational component is absent or small enough to be ignored.
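When the translation between cameras is negligible, two views are related by a rotation-only homography H = K2 R K1^{-1}. The following numpy sketch illustrates this relation; the intrinsic matrix and rotation angle are illustrative values, not the parameters of any particular commercial rig.

```python
import numpy as np

def rotation_homography(K1, K2, R):
    """Homography induced by a pure rotation R between two cameras."""
    return K2 @ R @ np.linalg.inv(K1)

def warp_point(H, uv):
    """Apply homography H to pixel (u, v), returning the warped pixel."""
    p = H @ np.array([uv[0], uv[1], 1.0])
    return p[:2] / p[2]

# Example: two identical cameras rotated 30 degrees about the vertical axis
# (illustrative intrinsics: focal length 800 px, principal point (640, 360)).
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])
theta = np.deg2rad(30.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [           0.0, 1.0,           0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])

H = rotation_homography(K, K, R)
# The principal point shifts horizontally by f * tan(theta); the vertical
# coordinate is unchanged for a rotation about the vertical axis.
print(warp_point(H, (640.0, 360.0)))
```

Because H depends only on intrinsics and rotation, no depth information is needed to align the views, which is why rotation-only rigs can be stitched with a single homography per image pair.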

With the development of information and communications technology (ICT) such as 5G and artificial intelligence, and with changes to the content creation environment, there is a growing demand for immersive media, which convey information for all the senses in a scene to maximize immersion and presence for the user. Immersive media may include multisensory information such as high-quality visual information, multichannel audio information, and tactile information. In particular, as a form of high-quality visual information, virtual reality (VR) media have attracted much attention because they can immerse users more deeply than 3D or ultra-high-definition (UHD) media. VR media are applied in many fields, such as broadcasting, education, and games, and are being developed or planned by many companies, together with augmented reality (AR) media, as a core application service in the 5G era. As VR media shift from graphics to real images, various types of 360-degree image capturing equipment and shooting techniques are being developed. A 360-degree image is synthesized from images taken by multiple cameras with wide-angle or fisheye lenses. A stitching process is required to generate a 360-degree or panoramic photo-realistic image from the images captured by such a plurality of cameras. Generally, 360-degree cameras are configured to shoot in all directions by arranging cameras with a narrow field of view (FoV) radially around a common point and then stitching the captured images offline. Image stitching requires estimating the correct alignment relating the various image pairs, choosing a final compositing surface onto which to warp the aligned images, and seamlessly cutting and blending the overlapped images, even in the presence of parallax, lens distortion, scene motion, and exposure differences.
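The alignment-estimation step mentioned above is commonly solved by fitting a homography to matched point correspondences, for example with the direct linear transform (DLT). The sketch below uses synthetic correspondences generated from a known homography; a real stitcher would obtain them from feature detection and matching.

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT estimate of H such that dst ~ H @ src (points as Nx2 arrays)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1,  0,  0,  0, u * x, u * y, u])
        A.append([ 0,  0,  0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A, found via SVD.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Synthetic check: correspondences generated by a known homography.
H_true = np.array([[1.1,  0.02,  5.0],
                   [0.01, 0.95, -3.0],
                   [1e-4, 2e-4,  1.0]])
src = np.array([[0, 0], [100, 0], [100, 80], [0, 80], [50, 40]], float)
dst_h = (H_true @ np.c_[src, np.ones(len(src))].T).T
dst = dst_h[:, :2] / dst_h[:, 2:]

H_est = estimate_homography(src, dst)
print(np.round(H_est, 3))
```

With noisy real correspondences, the DLT would typically be wrapped in a robust estimator such as RANSAC, and coordinate normalization would be applied for numerical stability; both are omitted here for brevity.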
First, video frame pairs to be stitched are divided into segments of different classes through semantic segmentation. Region-based stitching is then performed on matched segment pairs, under the assumption that segments of the same semantic class lie on the same plane. Second, to prevent degradation of stitching quality for plain or noisy videos, the homography for each matched segment pair is estimated using temporally consistent feature points. Finally, the stitched video frame is synthesized by stacking the stitched matched segment pairs and the foreground segments onto the reference frame plane in descending order of area. The performance of the proposed method is evaluated by comparing the subjective quality, geometric distortion, and pixel distortion of video sequences stitched using the proposed and conventional methods. The proposed method is shown to reduce parallax and misalignment distortion in segments with plain texture or large parallax, and to significantly reduce geometric and pixel distortion compared with conventional methods.
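The final stacking step can be illustrated with a toy sketch: painting segments onto the reference plane largest-first means smaller (typically foreground) segments are drawn last and remain visible. This is only an illustration of the compositing order, not the paper's implementation; the masks and labels are synthetic placeholders.

```python
import numpy as np

def composite_by_area(canvas, segments):
    """Paint (mask, label) segments onto the canvas in descending area order."""
    for mask, label in sorted(segments, key=lambda s: -s[0].sum()):
        canvas[mask] = label
    return canvas

canvas = np.zeros((6, 6), dtype=int)
big = np.zeros((6, 6), bool); big[:, :4] = True          # e.g. a "field" segment
small = np.zeros((6, 6), bool); small[2:4, 2:4] = True   # e.g. a "player" segment

# The larger segment is painted first, so the smaller one stays on top.
out = composite_by_area(canvas, [(small, 2), (big, 1)])
print(out)
```

Sorting by descending area before painting means overlap conflicts are always resolved in favor of the smaller segment, which matches the goal of keeping dynamic foreground objects intact over the background.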

In this paper, we propose a semantic segmentation-based static video stitching method to reduce parallax and misalignment distortion for sports stadium scenes with dynamic foreground objects.
