Open Source Computer Vision Library

OpenCV: Open Source Computer Vision Library

Resources

Contributing

Please read the contribution guidelines before starting work on a pull request.

Summary of the guidelines:

  • One pull request per issue;
  • Choose the right base branch;
  • Include tests and documentation;
  • Clean up "oops" commits before submitting;
  • Follow the coding style guide.
Comments
  • CUDA backend for the DNN module

    CUDA backend for the DNN module

    More up-to-date info available here (unofficial)


    How to build and use the CUDA backend?
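
    A minimal usage sketch, assuming an OpenCV build with the CUDA DNN backend enabled (the model and image paths below are placeholders):

    import cv2

    net = cv2.dnn.readNet("model.onnx")  # any model readable by cv2.dnn

    # Select the CUDA backend and target for inference
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

    blob = cv2.dnn.blobFromImage(cv2.imread("image.jpg"), size=(224, 224))
    net.setInput(blob)
    out = net.forward()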

    How to use multiple GPUs?

    There are many ways to make use of multiple GPUs. Here is one which I think is the safest and the least complex solution. It makes use of the fact that the CUDA runtime library maintains a separate CUDA context for each CPU thread.

    Suppose you have N devices.

    Create N threads.
    Assign a CUDA device to each thread by calling cudaSetDevice or cv::cuda::setDevice in that thread. Each thread is now associated with a device.
    You can create any number of cv::dnn::Net objects in any of those threads and the network will use the device associated with that thread for memory and computation.
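
    A minimal sketch of that scheme in Python, assuming an OpenCV build with CUDA enabled and N = 2 devices (the model path and inputs are placeholders); cv2.cuda.setDevice is the Python counterpart of cv::cuda::setDevice:

    import threading
    import cv2

    def worker(device_id, model_path, frames):
        # Bind this CPU thread to one CUDA device
        cv2.cuda.setDevice(device_id)

        # Nets created in this thread use the device selected above
        net = cv2.dnn.readNet(model_path)
        net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
        net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

        for frame in frames:
            blob = cv2.dnn.blobFromImage(frame, size=(224, 224))
            net.setInput(blob)
            net.forward()

    # One thread per device; each thread gets its own share of the work
    threads = [threading.Thread(target=worker, args=(i, "model.onnx", []))
               for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()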
    

    Benchmarks

    Demo Video: https://www.youtube.com/watch?v=ljCfluWYymM

    Project summary/benchmarks: https://gist.github.com/YashasSamaga/a84cf2826ab2dc755005321fe17cd15d

    Support Matrix for this PR

    Current Support Matrix (not updated):

    Blip | Meaning
    ---- | ---------
    ✔️ | supports all the configurations that are supported by all the existing backends (and might support more than what's currently supported)
    🔵 | partially supported (fallback to CPU for unsupported configurations)
    :x: | not supported (fallback to CPU)

    Layer | Status | Constraints | Notes
    ----- | ------ | ----------- | -----
    Activations | ✔️ | |
    Batch Normalization | ✔️ | |
    Blank Layer | ✔️ | |
    Concat Layer | ✔️ | |
    Const Layer | ✔️ | |
    Convolution 2d | ✔️ | | asymmetric padding is disabled in layer constructor but the backend supports it
    Convolution 3d | ✔️ | | asymmetric padding is disabled in layer constructor but the backend supports it
    Crop and resize | :x: | |
    Crop Layer | ✔️ | | forwarded to Slice Layer
    Detection Output Layer | :x: | |
    Deconvolution 2d | 🔵 | padding configuration should not lead to extra uneven padding |
    Deconvolution 3d | 🔵 | padding configuration should not lead to extra uneven padding |
    Elementwise Layers | ✔️ | |
    Eltwise Layer | ✔️ | |
    Flatten Layer | ✔️ | |
    Fully Connected Layer | ✔️ | |
    Input Layer | :x: | |
    Interp Layer | ✔️ | |
    Local Response Normalization | ✔️ | |
    Max Unpooling 2d | ✔️ | |
    Max Unpooling 3d | ✔️ | |
    MVN Layer | :x: | |
    Normalize Layer | 🔵 | Only L1 and L2 norm supported |
    Padding Layer | ✔️ | |
    Permute Layer | ✔️ | |
    Pooling 2d | 🔵 | Only max and average pooling supported | supports asymmetric padding
    Pooling 3d | 🔵 | Only max and average pooling supported | supports asymmetric padding
    Prior Box Layer | ✔️ | |
    Proposal Layer | :x: | |
    Region Layer | ✔️ | NMS performed using CPU |
    Reorg Layer | ✔️ | |
    Reshape Layer | ✔️ | |
    Resize Layer | ✔️ | |
    Scale Layer | ✔️ | |
    Shift Layer | ✔️ | | forwarded to Scale Layer
    Shuffle Channel Layer | ✔️ | |
    Slice Layer | ✔️ | |
    Softmax Layer | ✔️ | |
    Split Layer | ✔️ | |
    LSTM Layer | :x: | |

    Known issues:

    1. Tests for some of the SSD-based networks fail on Jetson Nano

    References: #14585

    Results:

    • https://github.com/opencv/opencv/pull/14827#issuecomment-522229894
    • https://github.com/opencv/opencv/pull/14827#issuecomment-523456312
    force_builders_only=Custom,linux,docs
    buildworker:Custom=linux-4
    docker_image:Custom=ubuntu-cuda:18.04
    
  • Issues with recognition whilst using IP Stream only

    Issues with recognition whilst using IP Stream only

    System information (version)
    • OpenCV => 3.1
    • Operating System / Platform => Linux
    Detailed description

    I will try asking here as I got no response on the forum. I have been working with OpenCV in an application since last year. In the first version I was capturing frames from a webcam and using Haar cascades, and it would recognise a face nearly every time without issue.

    I ran into some issues getting a stable web-based stream going; after trying multiple solutions I moved to a new approach. It still uses the exact same webcam, except that Linux Motion now accesses it and OpenCV connects to the MJPEG stream from Linux Motion through a secure Nginx server. The stream is on the same device as the OpenCV script.

    Since doing this the quality of the stream has increased massively, but it now hardly detects faces at all. I have compared screenshots of frames from when OpenCV was accessing the webcam with frames from when OpenCV is accessing the stream, and apart from the improved quality there really is no difference, yet OpenCV refuses to identify a face. I literally have to move the camera around and hold a position for it to identify a face, whereas before I could be walking past on the other side of the room and it would detect my face.

    After trying everything I could think of and find on Google, I went back to the webcam; instantly it was detecting my face in whatever position and whatever light. I have tried multiple other ways of streaming to the web, still without success, so I have moved back to the Motion stream to try to work this out.

    Can anyone shed any light on this? It does not make sense to me that an improvement in quality suddenly breaks face detection. I have tried playing with the frame settings, resolution, contrast, hue, and brightness, and nothing I do seems to work.

    Steps to reproduce
    self.OpenCVCapture.open('http://MOTION_STREAM_ADDRESS/stream.mjpg')
    self.OpenCVCapture.set(5, 30)    # property 5 is CAP_PROP_FPS
    self.OpenCVCapture.set(3, 1280)  # property 3 is CAP_PROP_FRAME_WIDTH
    self.OpenCVCapture.set(4, 720)   # property 4 is CAP_PROP_FRAME_HEIGHT
    self.OpenCVCapture.set(10, 1)    # property 10 is CAP_PROP_BRIGHTNESS
    

    Then run through Haarcascades for detection.
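
    For reference, the detection step on the stream is roughly the following (a sketch; the cascade file and stream URL are placeholders):

    import cv2

    cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture("http://MOTION_STREAM_ADDRESS/stream.mjpg")

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # scaleFactor/minNeighbors/minSize often need retuning when the
        # input resolution or quality changes
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                         minNeighbors=5, minSize=(30, 30))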

  • OpenCV 3.1.0 simple VideoCapture and waitKey crashes after a while on OS X 10.11.2

    OpenCV 3.1.0 simple VideoCapture and waitKey crashes after a while on OS X 10.11.2

    OpenCV 3.1.0 is installed through brew install opencv3 --with-contrib --with-qt5, and the following program crashes after a while:

    #include <opencv2/core.hpp>
    #include <opencv2/highgui.hpp>
    #include <opencv2/videoio.hpp>
    
    int main(int argc, const char * argv[]) {
        cv::VideoCapture cap(0);
        cv::Mat frame;
        while (cap.read(frame)) {
            imshow("Frame", frame);
            if (cv::waitKey(1) == 'q') {
                break;
            }
        }
        return 0;
    }
    

    The stack trace is the following:

    2015-12-24 09:54:22.297 basic-capture[86100:4481590] -[CaptureDelegate doFireTimer:]: unrecognized selector sent to instance 0x103600680
    2015-12-24 09:54:22.313 basic-capture[86100:4481590] An uncaught exception was raised
    2015-12-24 09:54:22.313 basic-capture[86100:4481590] -[CaptureDelegate doFireTimer:]: unrecognized selector sent to instance 0x103600680
    2015-12-24 09:54:22.313 basic-capture[86100:4481590] (
        0   CoreFoundation                      0x00007fff95766ae2 __exceptionPreprocess + 178
        1   libobjc.A.dylib                     0x00007fff90699f7e objc_exception_throw + 48
        2   CoreFoundation                      0x00007fff95769b9d -[NSObject(NSObject) doesNotRecognizeSelector:] + 205
        3   CoreFoundation                      0x00007fff956a2601 ___forwarding___ + 1009
        4   CoreFoundation                      0x00007fff956a2188 _CF_forwarding_prep_0 + 120
        5   Foundation                          0x00007fff9c7d385b __NSFireTimer + 95
        6   CoreFoundation                      0x00007fff956acbc4 __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__ + 20
        7   CoreFoundation                      0x00007fff956ac853 __CFRunLoopDoTimer + 1075
        8   CoreFoundation                      0x00007fff9572ae6a __CFRunLoopDoTimers + 298
        9   CoreFoundation                      0x00007fff95667cd1 __CFRunLoopRun + 1841
        10  CoreFoundation                      0x00007fff95667338 CFRunLoopRunSpecific + 296
        11  HIToolbox                           0x00007fff8f2f3935 RunCurrentEventLoopInMode + 235
        12  HIToolbox                           0x00007fff8f2f3677 ReceiveNextEventCommon + 184
        13  HIToolbox                           0x00007fff8f2f35af _BlockUntilNextEventMatchingListInModeWithFilter + 71
        14  AppKit                              0x00007fff967d10ee _DPSNextEvent + 1067
        15  AppKit                              0x00007fff96b9d943 -[NSApplication _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 454
        16  libqcocoa.dylib                     0x000000010555ae5a _ZN21QCocoaEventDispatcher13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE + 1034
        17  libopencv_highgui.3.1.dylib         0x000000010083c596 cvWaitKey + 178
        18  basic-capture                       0x0000000100001666 main + 246
        19  libdyld.dylib                       0x00007fff8a0335ad start + 1
        20  ???                                 0x0000000000000001 0x0 + 1
    )
    2015-12-24 09:54:22.314 basic-capture[86100:4481590] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[CaptureDelegate doFireTimer:]: unrecognized selector sent to instance 0x103600680'
    *** First throw call stack:
    (
        0   CoreFoundation                      0x00007fff95766ae2 __exceptionPreprocess + 178
        1   libobjc.A.dylib                     0x00007fff90699f7e objc_exception_throw + 48
        2   CoreFoundation                      0x00007fff95769b9d -[NSObject(NSObject) doesNotRecognizeSelector:] + 205
        3   CoreFoundation                      0x00007fff956a2601 ___forwarding___ + 1009
        4   CoreFoundation                      0x00007fff956a2188 _CF_forwarding_prep_0 + 120
        5   Foundation                          0x00007fff9c7d385b __NSFireTimer + 95
        6   CoreFoundation                      0x00007fff956acbc4 __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__ + 20
        7   CoreFoundation                      0x00007fff956ac853 __CFRunLoopDoTimer + 1075
        8   CoreFoundation                      0x00007fff9572ae6a __CFRunLoopDoTimers + 298
        9   CoreFoundation                      0x00007fff95667cd1 __CFRunLoopRun + 1841
        10  CoreFoundation                      0x00007fff95667338 CFRunLoopRunSpecific + 296
        11  HIToolbox                           0x00007fff8f2f3935 RunCurrentEventLoopInMode + 235
        12  HIToolbox                           0x00007fff8f2f3677 ReceiveNextEventCommon + 184
        13  HIToolbox                           0x00007fff8f2f35af _BlockUntilNextEventMatchingListInModeWithFilter + 71
        14  AppKit                              0x00007fff967d10ee _DPSNextEvent + 1067
        15  AppKit                              0x00007fff96b9d943 -[NSApplication _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 454
        16  libqcocoa.dylib                     0x000000010555ae5a _ZN21QCocoaEventDispatcher13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE + 1034
        17  libopencv_highgui.3.1.dylib         0x000000010083c596 cvWaitKey + 178
        18  basic-capture                       0x0000000100001666 main + 246
        19  libdyld.dylib                       0x00007fff8a0335ad start + 1
        20  ???                                 0x0000000000000001 0x0 + 1
    )
    libc++abi.dylib: terminating with uncaught exception of type NSException
    
  • C++ cv::VideoCapture.open(0) always return false on Android

    C++ cv::VideoCapture.open(0) always return false on Android

    System information (version)
    • OpenCV => 3.4.1
    • Operating System macOS High Sierra 10.13.6 / Platform => Android 7.1.1 API 25
    • Compiler => Android NDK
    Detailed description
    Context

    I'm writing a cross-platform, OpenCV-based C++ library. The consuming code is a React Native application, through a React Native native module.

    To be perfectly clear, there is no access from Java code to C++ OpenCV on Android. Events with the result of the OpenCV C++ code are sent to JavaScript through the React Native bridge.

    My native library is compiled on Android as a SHARED library. It is dynamically linked to the libopencv_world.so that is produced by the compilation of OpenCV C++ for Android.

    What it does

    Basically, it opens the device's default camera and takes snapshots.

    The outcome

    This code is then run on iOS and Android.

    It works perfectly well on iOS. It fails on Android.

    Here is the failing part of the C++ code on Android:

    Steps to reproduce
    // cap is a cv::VideoCapture object
    if (cap.open(0))
    {
        cap.set(cv::CAP_PROP_FRAME_WIDTH, CAM_WIDTH);
        cap.set(cv::CAP_PROP_FRAME_HEIGHT, CAM_HEIGHT);
    }
    else
    {
        reject("false", "cap.open(0) returned false");
    }
    
  • OpenCV3 python calls to FlannBasedMatcher::knnMatch fail with error

    OpenCV3 python calls to FlannBasedMatcher::knnMatch fail with error

    The following code returns an error.

        sift = x2d.SIFT_create(1000)
        features_l, des_l = sift.detectAndCompute(im_l, None)
        features_r, des_r = sift.detectAndCompute(im_r, None)
    
        FLANN_INDEX_KDTREE = 1
        index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
        search_params = dict(checks=50)
        flann = cv2.FlannBasedMatcher(index_params,search_params)
    

    The error returned:

    opencv/modules/python/src2/cv2.cpp:161: error: (-215) The data should normally be NULL! in function allocate
    

    I tried this with Python 2.7. FLANN_INDEX_KDTREE is set to 1 unlike here, since in modules/flann/include/opencv2/flann/defines.h I found it set to 1 on line 84.
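
    For completeness, a minimal sketch of the matching call that reportedly triggers the error (image paths are placeholders and x2d is bound to cv2.xfeatures2d, mirroring the snippet above):

    import cv2

    x2d = cv2.xfeatures2d
    im_l = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    im_r = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    sift = x2d.SIFT_create(1000)
    features_l, des_l = sift.detectAndCompute(im_l, None)
    features_r, des_r = sift.detectAndCompute(im_r, None)

    FLANN_INDEX_KDTREE = 1
    flann = cv2.FlannBasedMatcher(dict(algorithm=FLANN_INDEX_KDTREE, trees=5),
                                  dict(checks=50))

    # The error is reported when matching the descriptors
    matches = flann.knnMatch(des_l, des_r, k=2)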

  • fixing cap_pvpapi interface

    fixing cap_pvpapi interface

    This PR tries to fix the following 3 reported issues

    • http://code.opencv.org/issues/3946
    • http://code.opencv.org/issues/3947
    • http://code.opencv.org/issues/3948

    A complete remake of the PvAPI capture interface, fixing bugs but also

    • allowing the use of MANTA type cameras
    • allowing multiple AVT cameras at the same time
  • fixing waitKey commands to be universal

    fixing waitKey commands to be universal

    As a follow-up to the discussion held in PR #7098, and the discussion started in this topic, we decided to implement a waitChar function that always returns the correct value when the return value needs to be compared against the ASCII codes of char inputs.
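
    For context, the user-side workaround this addresses is masking waitKey's return value to its low 8 bits before comparing it with a character code, e.g. (a sketch):

    import cv2

    # On some backends waitKey returns the key code with extra state bits set,
    # so a direct comparison against ord('q') can fail without the mask
    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):
        pass  # handle quit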

  • AttributeError: 'module' object has no attribute 'face'

    AttributeError: 'module' object has no attribute 'face'

    I got an error when running OpenCV in Python on a Raspberry Pi.

    I tried to find a fix and apply it, but it did not work out. I also confirmed that the "face" module is in opencv_contrib-3.3.0, so I do not know the reason.

    error 1

    Traceback (most recent call last):
      File "training.py", line 13, in <module>
        recognizer = cv2.face.createLBPHFaceRecognizer()
    AttributeError: 'module' object has no attribute 'face'

    error 2

    Traceback (most recent call last):
      File "training.py", line 13, in <module>
        help(cv2.face)
    AttributeError: 'module' object has no attribute 'face'

    error 3

    Traceback (most recent call last):
      File "training.py", line 13, in <module>
        help(cv2.face.createLBPHFaceRecognizer)
    AttributeError: 'module' object has no attribute 'face'

    python : 3.5.3 opencv-3.3.0 opencv_contrib-3.3.0

    source code

    # Import OpenCV2 for image processing
    # Import os for file path
    import cv2, os

    # Import numpy for matrix calculation
    import numpy as np

    # Import Python Image Library (PIL)
    from PIL import Image

    # Create Local Binary Patterns Histograms for face recognition
    recognizer = cv2.face.createLBPHFaceRecognizer()

    # Use the prebuilt frontal face model for face detection
    detector = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

    # Method to get the images and label data
    def getImagesAndLabels(path):

        # Get all file paths
        imagePaths = [os.path.join(path, f) for f in os.listdir(path)]

        # Initialize empty face sample list
        faceSamples = []

        # Initialize empty id list
        ids = []

        # Loop over all file paths
        for imagePath in imagePaths:

            # Get the image and convert it to grayscale
            PIL_img = Image.open(imagePath).convert('L')

            # PIL image to numpy array
            img_numpy = np.array(PIL_img, 'uint8')

            # Get the image id
            id = int(os.path.split(imagePath)[-1].split(".")[1])
            print(id)

            # Get the faces from the training image
            faces = detector.detectMultiScale(img_numpy)

            # Loop over each face and append it to its respective ID
            for (x, y, w, h) in faces:

                # Add the face region to the face samples
                faceSamples.append(img_numpy[y:y+h, x:x+w])

                # Add the ID to the IDs
                ids.append(id)

        # Return the face array and IDs array
        return faceSamples, ids

    # Get the faces and IDs
    faces, ids = getImagesAndLabels('dataset')

    # Train the model using the faces and IDs
    recognizer.train(faces, np.array(ids))

    # Save the model into trainer.yml
    recognizer.save('trainer/trainer.yml')

  • [GSOC] New camera model for stitching pipeline

    [GSOC] New camera model for stitching pipeline

    Merge with extra: https://github.com/opencv/opencv_extra/pull/303

    This PR contains all work for New camera model for stitching pipeline GSoC 2016 project.

    GSoC Proposal

    The stitching pipeline is well-established code in OpenCV. It provides good results for creating panoramas from camera-captured images. The main limitation of the stitching pipeline is its expected camera model (perspective transformation). Although this model is fine for many applications working with camera-captured images, there are applications which aren't covered by the current stitching pipeline.

    New camera model

    Due to physical constraints, some applications can expect a much simpler transform with fewer degrees of freedom. Those are situations where the input data are not subject to a perspective transform; the transformation can be much simpler, such as an affine transformation. Datasets considered here include images captured by special hardware (such as book scanners[0] that try hard to eliminate perspective), maps from laser scanning (produced from different starting points), and preprocessed images (where perspective was compensated by other robust means, taking advantage of the physical situation; e.g. for book scanners we would use calibration data to compensate the remaining perspective). In all those situations we would like to obtain an image mosaic under an affine transformation.

    I'd like to introduce a new camera model based on affine transformation to the stitching pipeline. This would include:

    • New Matcher using affine transformation (cv::estimateRigidTransform) to estimate H
    • New Estimator aware of affine model.
    • Defining and documenting this new model for CameraParams (e.g. now translation is always expected to be zero, this might not be true for affine transformation)
    • Integration works in compositing part of pipeline (there might be changes necessary depending how we would decide to represent affine model in CameraParams)
    • New options for high-level API to be able to use affine model instead of current one simply
    • Producing new sample code for stitching pipeline
    • Improving current documentation (current documentation does not mention details about current camera model, this might need some clarification)

    I used an approach based on affine transformation to merge maps produced by multiple robots [1] for my robotics project, and it showed good results. However, as mentioned earlier, applications for this model are much broader than that.
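
    For illustration, this is the kind of high-level usage the affine model enables through the Stitcher API (a sketch assuming the SCANS mode exposed in the Python bindings; image paths are placeholders):

    import cv2

    # Placeholder scans of a flat document or map
    imgs = [cv2.imread(p) for p in ("scan1.png", "scan2.png")]

    # SCANS mode selects the affine camera model instead of the default
    # perspective (PANORAMA) model
    stitcher = cv2.Stitcher.create(cv2.Stitcher_SCANS)
    status, mosaic = stitcher.stitch(imgs)
    if status == cv2.Stitcher_OK:
        cv2.imwrite("mosaic.png", mosaic)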

    Parallelism for FeaturesFinder

    To make the stitching pipeline more convenient and performant for large numbers of images, I'd also like to improve FeaturesFinder to allow finding features in parallel. All camera models and other users of FeaturesFinder may benefit from that. The API could be similar to FeaturesMatcher::operator ()(features, pairwise_matches, mask).

    This could be done with TBB in a similar manner to the mentioned method in FeaturesMatcher, which is already used in the stitching pipeline, so there would be almost no additional overhead from starting new threads in typical scenarios, because those threads already exist for FeaturesMatcher. This change would be fully integrated into the high-level stitching interface.

    There might be some changes necessary in the finders to ensure thread-safety. Where thread-safety can't be ensured or it does not make sense (GPU finders), parallelization would be disabled and all images would be processed serially, so this method would always be safe to use regardless of the underlying finder. This approach is also similar to FeaturesMatcher.

    Benefits to OpenCV

    • New transform options for stitching pipeline
    • Performance improvements through parallel processing of image features

    implemented goals (all + extras)

    new camera model

    • [x] affine matcher
    • [x] affine estimator
    • [x] affine warper
    • [x] affine bundle adjusters
    • tests for affine stitching
      • [x] basic affine stitching integration test
      • [x] tests for affine matcher
      • [x] integration tests on real-word scans
      • [x] affine warper tests
      • [x] affine bundle adjusters tests
    • [x] integrating with high level API (Stitcher)
    • [x] stitching_detailed sample
    • [x] stitching simple sample

    parallel feature finding

    • [x] parallel API in feature finder
    • [x] tests (incl. perf tests)
    • [x] integrating with Stitcher

    implemented extras

    • [x] robust functions for affine transform estimation
      • [x] add support for least median robust method
      • [x] tests for LMEDS
      • [x] Levenberg–Marquardt algorithm-based refining for affine estimation functions
      • [x] tests and docs
      • [x] perf tests
    • [x] stitching tutorial for high level API
      • [x] add examples of running the samples on testdata in opencv_extra
    • [x] fix existing stitching tests with SURF (SURF was disabled)

    video

    short video presenting this project

    other work

    During this GSoC I have also coded some related work that is not going to be included (mostly because we chose a different approach or the work has been merged under this PR). It is listed here for completeness.

    PRs:

    • #6560
    • #6609
    • #6615
    • #6642

    commits:

    • eba30a89737d4ded755f07cff75fd861864cf09a
    • 150daa2dc57a258ba61a01e12901518b6b4d98e8
  • [GSoC] Add siamrpnpp.py

    [GSoC] Add siamrpnpp.py

    GSoC '20 : Real-time Single Object Tracking using Deep Learning (SiamRPN++)

    Overview

    Proposal: https://summerofcode.withgoogle.com/projects/#4979746967912448
    Mentors: Liubov Batanina @l-bat, Stefano Fabri @bhack, Ilya Elizarov @ieliz
    Student: Jin Yeob Chung @jinyup100

    Details of the Pull Request

    • Export of the torch implementation of the SiamRPN++ visual tracker to ONNX
      • Please refer to https://gist.github.com/jinyup100/7aa748686c5e234ed6780154141b4685 or the "Code to generate ONNX Models" section at the bottom of this PR description
    • Addition of siamrpnpp.py in the opencv/samples/dnn repository
      • The SiamRPN++ visual tracker can be run on a sample video input
      • Parsers include (see the example invocation after this list):
        • --input_video path to sample video input
        • --target_net path to target branch of the visual tracker
        • --search_net path to search branch of the visual tracker
        • --rpn_head path to head of the visual tracker
        • --backend selection of the computation backend
        • --target selection of the computation target device
    • Additional samples of the visual tracker performed on videos are available at:
      • https://drive.google.com/drive/folders/1k7Z_SHaBWK_4aEQPxJJCGm3P7y2IFCjY?usp=sharing
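
    A typical invocation of the sample, assuming the three ONNX files exported by the code at the bottom of this description and a local test clip (file names are placeholders), might look like:

    python siamrpnpp.py --input_video sample.mp4 --target_net target_net.onnx \
                        --search_net search_net.onnx --rpn_head rpn_head.onnx
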
    Examples

    Pull Request Readiness Checklist

    See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

    • [X] I agree to contribute to the project under OpenCV (BSD) License.
    • [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
    • [X] The PR is proposed to proper branch
    • [X] There is reference to original bug report and related work
    • [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name.
    • [X] The feature is well documented and sample code can be built with the project CMake
    Code to generate ONNX Models

    The code shown below to generate the ONNX models of siamrpn++ is also available from : https://gist.github.com/jinyup100/7aa748686c5e234ed6780154141b4685


    The final version of the pre-trained weights and the successfully converted ONNX models are available at:

    Pre-Trained Weights in pth Format https://drive.google.com/file/d/11bwgPFVkps9AH2NOD1zBDdpF_tQghAB-/view?usp=sharing

    Target Net : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1dw_Ne3UMcCnFsaD6xkZepwE4GEpqq7U_/view?usp=sharing

    Search Net : Import :heavy_check_mark: Export :heavy_check_mark: https://drive.google.com/file/d/1Lt4oE43ZSucJvze3Y-Z87CVDreO-Afwl/view?usp=sharing

    RPN_head : Import : :heavy_check_mark: Export :heavy_check_mark:
    https://drive.google.com/file/d/1zT1yu12mtj3JQEkkfKFJWiZ71fJ-dQTi/view?usp=sharing

    import os  # needed for os.getcwd()/os.path.join below

    import numpy as np
    import onnx
    import torch
    import torch.nn as nn
    
    # Class for the Building Blocks required for ResNet
    class Bottleneck(nn.Module):
        expansion = 4
    
        def __init__(self, inplanes, planes, stride=1,
                     downsample=None, dilation=1):
            super(Bottleneck, self).__init__()
            self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
            self.bn1 = nn.BatchNorm2d(planes)
            padding = 2 - stride
            if downsample is not None and dilation > 1:
                dilation = dilation // 2
                padding = dilation
    
            assert stride == 1 or dilation == 1, \
                "stride and dilation must have one equals to zero at least"
    
            if dilation > 1:
                padding = dilation
            self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                                   padding=padding, bias=False, dilation=dilation)
            self.bn2 = nn.BatchNorm2d(planes)
            self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
            self.bn3 = nn.BatchNorm2d(planes * 4)
            self.relu = nn.ReLU(inplace=True)
            self.downsample = downsample
            self.stride = stride
    
        def forward(self, x):
            residual = x
    
            out = self.conv1(x)
            out = self.bn1(out)
            out = self.relu(out)
    
            out = self.conv2(out)
            out = self.bn2(out)
            out = self.relu(out)
    
            out = self.conv3(out)
            out = self.bn3(out)
    
            if self.downsample is not None:
                residual = self.downsample(x)
    
            out += residual
    
            out = self.relu(out)
    
            return out
        
    # End of Building Blocks
    
    # Class for ResNet - the Backbone neural network
    
    class ResNet(nn.Module):
        "ResNET"
        def __init__(self, block, layers, used_layers):
            self.inplanes = 64
            super(ResNet, self).__init__()
            self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=0,  # 3
                                   bias=False)
            self.bn1 = nn.BatchNorm2d(64)
            self.relu = nn.ReLU(inplace=True)
            self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
            self.layer1 = self._make_layer(block, 64, layers[0])
            self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
    
            self.feature_size = 128 * block.expansion
            self.used_layers = used_layers
            layer3 = True if 3 in used_layers else False
            layer4 = True if 4 in used_layers else False
    
            if layer3:
                self.layer3 = self._make_layer(block, 256, layers[2],
                                               stride=1, dilation=2)  # 15x15, 7x7
                self.feature_size = (256 + 128) * block.expansion
            else:
                self.layer3 = lambda x: x  # identity
    
            if layer4:
                self.layer4 = self._make_layer(block, 512, layers[3],
                                               stride=1, dilation=4)  # 7x7, 3x3
                self.feature_size = 512 * block.expansion
            else:
                self.layer4 = lambda x: x  # identity
    
            for m in self.modules():
                if isinstance(m, nn.Conv2d):
                    n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                    m.weight.data.normal_(0, np.sqrt(2. / n))
                elif isinstance(m, nn.BatchNorm2d):
                    m.weight.data.fill_(1)
                    m.bias.data.zero_()
    
        def _make_layer(self, block, planes, blocks, stride=1, dilation=1):
            downsample = None
            dd = dilation
            if stride != 1 or self.inplanes != planes * block.expansion:
                if stride == 1 and dilation == 1:
                    downsample = nn.Sequential(
                        nn.Conv2d(self.inplanes, planes * block.expansion,
                                  kernel_size=1, stride=stride, bias=False),
                        nn.BatchNorm2d(planes * block.expansion),
                    )
                else:
                    if dilation > 1:
                        dd = dilation // 2
                        padding = dd
                    else:
                        dd = 1
                        padding = 0
                    downsample = nn.Sequential(
                        nn.Conv2d(self.inplanes, planes * block.expansion,
                                  kernel_size=3, stride=stride, bias=False,
                                  padding=padding, dilation=dd),
                        nn.BatchNorm2d(planes * block.expansion),
                    )
    
            layers = []
            layers.append(block(self.inplanes, planes, stride,
                                downsample, dilation=dilation))
            self.inplanes = planes * block.expansion
            for i in range(1, blocks):
                layers.append(block(self.inplanes, planes, dilation=dilation))
    
            return nn.Sequential(*layers)
    
        def forward(self, x):
            x = self.conv1(x)
            x = self.bn1(x)
            x_ = self.relu(x)
            x = self.maxpool(x_)
    
            p1 = self.layer1(x)
            p2 = self.layer2(p1)
            p3 = self.layer3(p2)
            p4 = self.layer4(p3)
            out = [x_, p1, p2, p3, p4]
            out = [out[i] for i in self.used_layers]
            if len(out) == 1:
                return out[0]
            else:
                return out
            
    # End of ResNet
    
    # Class for Adjusting the layers of the neural net
    
    class AdjustLayer_1(nn.Module):
        def __init__(self, in_channels, out_channels, center_size=7):
            super(AdjustLayer_1, self).__init__()
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(out_channels),
                )
            self.center_size = center_size
    
        def forward(self, x):
            x = self.downsample(x)
            l = 4
            r = 11
            x = x[:, :, l:r, l:r]
            return x
    
    class AdjustAllLayer_1(nn.Module):
        def __init__(self, in_channels, out_channels, center_size=7):
            super(AdjustAllLayer_1, self).__init__()
            self.num = len(out_channels)
            if self.num == 1:
                self.downsample = AdjustLayer_1(in_channels[0],
                                              out_channels[0],
                                              center_size)
            else:
                for i in range(self.num):
                    self.add_module('downsample'+str(i+2),
                                    AdjustLayer_1(in_channels[i],
                                                out_channels[i],
                                                center_size))
    
        def forward(self, features):
            if self.num == 1:
                return self.downsample(features)
            else:
                out = []
                for i in range(self.num):
                    adj_layer = getattr(self, 'downsample'+str(i+2))
                    out.append(adj_layer(features[i]))
                return out
            
    class AdjustLayer_2(nn.Module):
        def __init__(self, in_channels, out_channels, center_size=7):
            super(AdjustLayer_2, self).__init__()
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(out_channels),
                )
            self.center_size = center_size
    
        def forward(self, x):
            x = self.downsample(x)
            return x
    
    class AdjustAllLayer_2(nn.Module):
        def __init__(self, in_channels, out_channels, center_size=7):
            super(AdjustAllLayer_2, self).__init__()
            self.num = len(out_channels)
            if self.num == 1:
                self.downsample = AdjustLayer_2(in_channels[0],
                                              out_channels[0],
                                              center_size)
            else:
                for i in range(self.num):
                    self.add_module('downsample'+str(i+2),
                                    AdjustLayer_2(in_channels[i],
                                                out_channels[i],
                                                center_size))
    
        def forward(self, features):
            if self.num == 1:
                return self.downsample(features)
            else:
                out = []
                for i in range(self.num):
                    adj_layer = getattr(self, 'downsample'+str(i+2))
                    out.append(adj_layer(features[i]))
                return out
            
    # End of Class for Adjusting the layers of the neural net
    
    # Class for Region Proposal Neural Network
    
    class RPN(nn.Module):
        "Region Proposal Network"
        def __init__(self):
            super(RPN, self).__init__()
    
        def forward(self, z_f, x_f):
            raise NotImplementedError
            
    class DepthwiseXCorr(nn.Module):
        "Depthwise Correlation Layer"
        def __init__(self, in_channels, hidden, out_channels, kernel_size=3, hidden_kernel_size=5):
            super(DepthwiseXCorr, self).__init__()
            self.conv_kernel = nn.Sequential(
                    nn.Conv2d(in_channels, hidden, kernel_size=kernel_size, bias=False),
                    nn.BatchNorm2d(hidden),
                    nn.ReLU(inplace=True),
                    )
            self.conv_search = nn.Sequential(
                    nn.Conv2d(in_channels, hidden, kernel_size=kernel_size, bias=False),
                    nn.BatchNorm2d(hidden),
                    nn.ReLU(inplace=True),
                    )
            self.head = nn.Sequential(
                    nn.Conv2d(hidden, hidden, kernel_size=1, bias=False),
                    nn.BatchNorm2d(hidden),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(hidden, out_channels, kernel_size=1)
                    )
            
        def forward(self, kernel, search):    
            kernel = self.conv_kernel(kernel)
            search = self.conv_search(search)
            
            feature = xcorr_depthwise(search, kernel)
            
            out = self.head(feature)
            
            return out
    
    class DepthwiseRPN(RPN):
        def __init__(self, anchor_num=5, in_channels=256, out_channels=256):
            super(DepthwiseRPN, self).__init__()
            self.cls = DepthwiseXCorr(in_channels, out_channels, 2 * anchor_num)
            self.loc = DepthwiseXCorr(in_channels, out_channels, 4 * anchor_num)
    
        def forward(self, z_f, x_f):
            cls = self.cls(z_f, x_f)
            loc = self.loc(z_f, x_f)
            
            return cls, loc
    
    class MultiRPN(RPN):
        def __init__(self, anchor_num, in_channels):
            super(MultiRPN, self).__init__()
            for i in range(len(in_channels)):
                self.add_module('rpn'+str(i+2),
                        DepthwiseRPN(anchor_num, in_channels[i], in_channels[i]))
            self.weight_cls = nn.Parameter(torch.Tensor([0.38156851768108546, 0.4364767608115956,  0.18195472150731892]))
            self.weight_loc = nn.Parameter(torch.Tensor([0.17644893463361863, 0.16564198028417967, 0.6579090850822015]))
    
        def forward(self, z_fs, x_fs):
            cls = []
            loc = []
            
            rpn2 = self.rpn2
            z_f2 = z_fs[0]
            x_f2 = x_fs[0]
            c2,l2 = rpn2(z_f2, x_f2)
            
            cls.append(c2)
            loc.append(l2)
            
            rpn3 = self.rpn3
            z_f3 = z_fs[1]
            x_f3 = x_fs[1]
            c3,l3 = rpn3(z_f3, x_f3)
            
            cls.append(c3)
            loc.append(l3)
            
            rpn4 = self.rpn4
            z_f4 = z_fs[2]
            x_f4 = x_fs[2]
            c4,l4 = rpn4(z_f4, x_f4)
            
            cls.append(c4)
            loc.append(l4)
            
            def avg(lst):
                return sum(lst) / len(lst)
    
            def weighted_avg(lst, weight):
                s = 0
                fixed_len = 3
                for i in range(3):
                    s += lst[i] * weight[i]
                return s
    
            return weighted_avg(cls, self.weight_cls), weighted_avg(loc, self.weight_loc)
            
    # End of class for RPN
    
    def conv3x3(in_planes, out_planes, stride=1, dilation=1):
        "3x3 convolution with padding"
        return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                         padding=dilation, bias=False, dilation=dilation)
    
    def xcorr_depthwise(x, kernel):
        """
        Deptwise convolution for input and weights with different shapes
        """
        batch = kernel.size(0)
        channel = kernel.size(1)
        x = x.view(1, batch*channel, x.size(2), x.size(3))
        kernel = kernel.view(batch*channel, 1, kernel.size(2), kernel.size(3))
        conv = nn.Conv2d(batch*channel, batch*channel, kernel_size=(kernel.size(2), kernel.size(3)), bias=False, groups=batch*channel)
        conv.weight = nn.Parameter(kernel)
        out = conv(x) 
        out = out.view(batch, channel, out.size(2), out.size(3))
        out = out.detach()
        return out
    
    class TargetNetBuilder(nn.Module):
        def __init__(self):
            super(TargetNetBuilder, self).__init__()
            # Build Backbone Model
            self.backbone = ResNet(Bottleneck, [3,4,6,3], [2,3,4])
            # Build Neck Model
            self.neck = AdjustAllLayer_1([512,1024,2048], [256,256,256])
        
        def forward(self, frame):
            features = self.backbone(frame)
            output = self.neck(features)
            return output
    
    class SearchNetBuilder(nn.Module):
        def __init__(self):
            super(SearchNetBuilder, self).__init__()
            # Build Backbone Model
            self.backbone = ResNet(Bottleneck, [3,4,6,3], [2,3,4])
            # Build Neck Model
            self.neck = AdjustAllLayer_2([512,1024,2048], [256,256,256])
            
        def forward(self, frame):
            features = self.backbone(frame)
            output = self.neck(features)
            return output
     
    class RPNBuilder(nn.Module):
        def __init__(self):
            super(RPNBuilder, self).__init__()
    
            # Build Adjusted Layer Builder
            self.rpn_head = MultiRPN(anchor_num=5,in_channels=[256, 256, 256])
    
        def forward(self, zf, xf):
            # Get Feature
            cls, loc = self.rpn_head(zf, xf)
    
            return cls, loc
        
    """Load path should be the directory of the pre-trained siamrpn_r50_l234_dwxcorr.pth
     The download link to siamrpn_r50_l234_dwxcorr.pth is shown in the description"""
    
    current_path = os.getcwd()
    load_path = os.path.join(current_path, "siamrpn_r50_l234_dwxcorr.pth")
    pretrained_dict = torch.load(load_path,map_location=torch.device('cpu') )
    pretrained_dict_backbone = pretrained_dict
    pretrained_dict_neck_1 = pretrained_dict
    pretrained_dict_neck_2 = pretrained_dict
    pretrained_dict_head = pretrained_dict
    pretrained_dict_target = pretrained_dict
    pretrained_dict_search = pretrained_dict
    
    # The shape of the inputs to the Target Network and the Search Network
    target = torch.Tensor(np.random.rand(1,3,127,127))
    search = torch.Tensor(np.random.rand(1,3,125,125))
    
    # Build the torch backbone model
    target_net = TargetNetBuilder()
    target_net.eval()
    target_net.state_dict().keys()
    target_net_dict = target_net.state_dict()
    
    # Load the pre-trained weight to the torch target net model
    pretrained_dict_target = {k: v for k, v in pretrained_dict_target.items() if k in target_net_dict}
    target_net_dict.update(pretrained_dict_target)
    target_net.load_state_dict(target_net_dict)
    
    # Export the torch target net model to ONNX model
    torch.onnx.export(target_net, torch.Tensor(target), "target_net.onnx", export_params=True, opset_version=11,
                      do_constant_folding=True, input_names=['input'], output_names=['output_1', 'output_2', 'output_3'])
    
    # Load the saved torch target net model using ONNX
    onnx_target = onnx.load("target_net.onnx")
    
    # Check whether the ONNX target net model has been successfully imported
    onnx.checker.check_model(onnx_target)
    print(onnx.checker.check_model(onnx_target))
    onnx.helper.printable_graph(onnx_target.graph)
    print(onnx.helper.printable_graph(onnx_target.graph))
    
    # Build the torch backbone model
    search_net = SearchNetBuilder()
    search_net.eval()
    search_net.state_dict().keys()
    search_net_dict = search_net.state_dict()
    
    # Load the pre-trained weight to the torch target net model
    pretrained_dict_search = {k: v for k, v in pretrained_dict_search.items() if k in search_net_dict}
    search_net_dict.update(pretrained_dict_search)
    search_net.load_state_dict(search_net_dict)
    
    # Export the torch target net model to ONNX model
    torch.onnx.export(search_net, torch.Tensor(search), "search_net.onnx", export_params=True, opset_version=11,
                      do_constant_folding=True, input_names=['input'], output_names=['output_1', 'output_2', 'output_3'])
    
    # Load the saved torch target net model using ONNX
    onnx_search = onnx.load("search_net.onnx")
    
    # Check whether the ONNX target net model has been successfully imported
    onnx.checker.check_model(onnx_search)
    print(onnx.checker.check_model(onnx_search))
    onnx.helper.printable_graph(onnx_search.graph)
    print(onnx.helper.printable_graph(onnx_search.graph))
    
    # Outputs from the Target Net and Search Net
    zfs_1, zfs_2, zfs_3 = target_net(torch.Tensor(target))
    xfs_1, xfs_2, xfs_3 = search_net(torch.Tensor(search))
    
    # Adjustments to the outputs from each of the neck models to match to input shape of the torch rpn_head model
    zfs = np.stack([zfs_1.detach().numpy(), zfs_2.detach().numpy(), zfs_3.detach().numpy()])
    xfs = np.stack([xfs_1.detach().numpy(), xfs_2.detach().numpy(), xfs_3.detach().numpy()])
    
    # Build the torch rpn_head model
    rpn_head = RPNBuilder()
    rpn_head.eval()
    rpn_head.state_dict().keys()
    rpn_head_dict = rpn_head.state_dict()
    
    # Load the pre-trained weights to the rpn_head model
    pretrained_dict_head = {k: v for k, v in pretrained_dict_head.items() if k in rpn_head_dict}
    pretrained_dict_head.keys()
    rpn_head_dict.update(pretrained_dict_head)
    rpn_head.load_state_dict(rpn_head_dict)
    rpn_head.eval()
    
    # Export the torch rpn_head model to ONNX model
    torch.onnx.export(rpn_head, (torch.Tensor(np.random.rand(*zfs.shape)), torch.Tensor(np.random.rand(*xfs.shape))), "rpn_head.onnx", export_params=True, opset_version=11,
                      do_constant_folding=True, input_names = ['input_1', 'input_2'], output_names = ['output_1', 'output_2'])
    
    # Load the saved rpn_head model using ONNX
    onnx_rpn_head_model = onnx.load("rpn_head.onnx")
    
    # Check whether the rpn_head model has been successfully imported
    onnx.checker.check_model(onnx_rpn_head_model)
    print(onnx.checker.check_model(onnx_rpn_head_model))    
    onnx.helper.printable_graph(onnx_rpn_head_model.graph)
    print(onnx.helper.printable_graph(onnx_rpn_head_model.graph))
    
    
  • Refactor core module for type-safety

    Refactor core module for type-safety

    Merge with opencv/opencv_contrib#1768 and opencv/opencv_extra#518

    This pull request changes:

    • Provides a backward-compatible API based on enums for type-safety. Refer to #12288 for further details.

    Utilized options/preprocessors:

    • CV_TYPE_SAFE_API with CMake option ENABLE_TYPE_SAFE_API to enable the enum-based type-safe API
    • CV_TYPE_COMPATIBLE_API with CMake option ENABLE_COMPATIBLE_API to enable the overloaded int-based API. Although it is enabled by default and using it raises deprecation warnings, it is still recommended to disable it in the build farm to enforce good practices. CV_COMPATIBLE_API is only available when ENABLE_TYPE_SAFE_API is set.
    • CV_TRANSNATIONAL_API is utilized internally and should be removed after migration completes.
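
    For reference, a configure-time sketch using the option names listed above (the source directory path is a placeholder):

    cmake -DENABLE_TYPE_SAFE_API=ON -DENABLE_COMPATIBLE_API=OFF ../opencv
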
    force_builders=Custom,linux32,win32,windows_vc15,Android pack,ARMv7,ARMv8
    docker_image:Custom=ubuntu-cuda:16.04
    docker_image:Docs=docs-js
    
  • (4.x) Merge 3.4

    (4.x) Merge 3.4

    • #21847 from lamm45:imgproc-tform-doc
    • #21871 from xiongzhen:apply-predictor-to-lzw-only
    • #21882 from duanqn:improve-doc
    • #21916 from chenjunnn:patch-1
    • #21917 from asenyaev:asen/self_hosted_runner_linux_3.4
    • #21924 from fengyuentau:workflow_arm64_3.4
    • #21933 from Yulv-git:3.4-typos1
    • #21935 from Yulv-git:3.4-typos3
    • #21943 from vrabaud:3.4_proc
    • #21954 from Darkyenus:patch-1
    • #21963 from hellodoge:imwrite_fix
    • #21964 from Julian-Sz:patch-1
    • #21970 from asenyaev:asen/filtering_tests_3.4
    • #21974 from cxcorp:fix-js-test-globals
    • #21975 from asenyaev:asen/fix_terminating_windows_actions
    • #21977 from asenyaev:asen/win_contrib
    • #21980 from asenyaev:asen/move_variables_to_the_host

    Previous "Merge 3.4": #21936

    build_image:Docs=docs-js:18.04
    build_image:Custom=javascript
    buildworker:Custom=linux-f1
    
  • Dummy support of CV_32U/CV_64S/CV_64U

    Dummy support of CV_32U/CV_64S/CV_64U

    I understand that there are many reasons not to support CV algorithms for CV_32U/CV_64S/CV_64U. However, just wrapping data in cv::Mat is very handy. Would it be possible to support CV_32U/CV_64S/CV_64U for data wrapping only (supporting basic memory operations like copy/ROI/raw data access)?

  • Morello fixes

    Morello fixes

    Make OpenCV build and run self-tests on Morello.

    Pull Request Readiness Checklist

    See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

    • [ ] I agree to contribute to the project under Apache 2 License.
    • [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
    • [ ] The PR is proposed to the proper branch
    • [ ] There is a reference to the original bug report and related work
    • [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name.
    • [ ] The feature is well documented and sample code can be built with the project CMake
  • Build fails after WITH_OPENGL is added (3.4.17)

    Build fails after WITH_OPENGL is added (3.4.17)

    Rebuilding OpenCV 3.4.17, which already had WITH_OPENCL and WITH_CUDA enabled, fails after enabling WITH_OPENGL.

    • OpenCV => 3.4.17
    • Operating System / Platform =>Windows 10 x64
    • Compiler => Visual Studio 2019
    Detailed description

    The Debug build fails on:

    1. The opencv_python3 project's configuration is NOT pointing to python39_d.lib anymore but to python39.lib, even though the CMake configuration provides the path to python39_d.lib. Before adding WITH_OPENGL this was correct. An easy workaround is to correct this manually in the VS project property page (and it is a non-issue for non-debug builds).

    2. The opencv_world3417 build fails with an unresolved external symbol clGetGLContextInfoKHR; this is a showstopper with no workaround.

    Steps to reproduce

    Configure and Generate using these "WITH" options and build the Debug version via Visual Studio 2019.

    Issue submission checklist
    • [X] I report the issue, it's not a question
    • [X] I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
    • [X] I updated to the latest OpenCV version and the issue is still there
    • [ ] There is reproducer code and related data files: videos, images, onnx, etc
  • After setting the decoder to hevc_cuvid, VideoCapture reads video more slowly

    After setting the decoder to hevc_cuvid, VideoCapture reads video more slowly

    System information (version)
    • OpenCV => 4.5.4
    • Operating System / Platform => Ubuntu 20.04.3 LTS
    • Compiler => gcc version 9.4.0
    Detailed description

    I have code that tests how long VideoCapture takes to read a video.

    // Headers needed for this snippet
    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <opencv2/opencv.hpp>

    uint64_t get_timestamp() {
        using namespace std::chrono;
        return duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
    }
    
    int main(int argc, char** argv)
    {
        auto start = get_timestamp();
        cv::VideoCapture cap("base.ts", cv::CAP_FFMPEG
            // , { cv::CAP_PROP_HW_ACCELERATION, cv::VIDEO_ACCELERATION_ANY }
        );
        if (!cap.isOpened())
        {
            std::cerr << "Open file fails" << std::endl;
            return -1;
        }
        cv::Mat frame;
        while (true) {
            cap >> frame;
            if (frame.empty()) {
                break;
            }
        }
        auto end = get_timestamp();
        std::cout << (end - start) << std::endl;
    }
    

    The build information for my ffmpeg is as follows:

    ffmpeg version 4.2.4-1ubuntu0.1 Copyright (c) 2000-2020 the FFmpeg developers
    built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
    configuration: --prefix=/usr --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
    libavutil      56. 31.100 / 56. 31.100
    libavcodec     58. 54.100 / 58. 54.100
    libavformat    58. 29.100 / 58. 29.100
    libavdevice    58.  8.100 / 58.  8.100
    libavfilter     7. 57.100 /  7. 57.100
    libavresample   4.  0.  0 /  4.  0.  0
    libswscale      5.  5.100 /  5.  5.100
    libswresample   3.  5.100 /  3.  5.100
    libpostproc    55.  5.100 / 55.  5.100
    

    And ffmpeg has a hevc_cuvid decoder:

    ffmpeg -decoders|grep hevc
     VFS..D hevc                 HEVC (High Efficiency Video Coding)
     V..... hevc_v4l2m2m         V4L2 mem2mem HEVC decoder wrapper (codec hevc)
     V..... hevc_cuvid           Nvidia CUVID HEVC decoder (codec hevc)
    

    I use "base.ts"(same as mentioned in the cpp code ) to test ffmpeg's hevc_cuvid decoder:

    time ffmpeg -vcodec hevc_cuvid -i base.ts -f null /dev/null
    frame= 6132 fps=2191 q=-0.0 Lsize=N/A time=00:06:01.05 bitrate=N/A speed= 129x
    
    real	0m1.560s
    user	0m0.740s
    sys	0m0.216s
    
    --------------------------------------------------------------
    
    time ffmpeg -i base.ts -f null /dev/null
    frame= 6132 fps=2191 q=-0.0 Lsize=N/A time=00:06:01.05 bitrate=N/A speed= 129x
    
    real	0m2.840s
    user	0m10.606s
    sys	0m0.175s
    

    So it looks like ffmpeg's hardware decoding acceleration is working.

    I build OpenCV with the following configuration:

    cmake \
      -DCMAKE_BUILD_TYPE=Release \
      -DOPENCV_GENERATE_PKGCONFIG=ON \
      -DBUILD_TESTS=OFF \
      -DBUILD_PERF_TESTS=OFF \
      -DBUILD_EXAMPLES=OFF \
      -DWITH_CUDA=OFF \
      -DWITH_FFMPEG=ON \
      -DCMAKE_INSTALL_PREFIX=/usr/local \
      -DBUILD_opencv_python2=OFF \
      -DBUILD_opencv_python3=ON \
      ..
    

    First I export OPENCV_FFMPEG_CAPTURE_OPTIONS and then execute the C++ code; the execution duration is as follows:

    export OPENCV_FFMPEG_CAPTURE_OPTIONS="video_codec;hevc_cuvid"
    
    real    0m5.395s
    user    0m5.404s
    sys     0m0.220s
    

    nvidia-smi info:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
    |  0%   47C    P2    61W / 260W |    309MiB / 11264MiB |      5%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A      1100      G   /usr/lib/xorg/Xorg                 35MiB |
    |    0   N/A  N/A      1828      G   /usr/lib/xorg/Xorg                 80MiB |
    |    0   N/A  N/A      1955      G   /usr/bin/gnome-shell               20MiB |
    |    0   N/A  N/A     58130      C   ...enchmark/OpenCV_benchmark      157MiB |
    +-----------------------------------------------------------------------------+
    

    PID 58130 is my executable, so OpenCV is indeed using GPU resources.

    Then I unset OPENCV_FFMPEG_CAPTURE_OPTIONS and re-execute the code:

    unset OPENCV_FFMPEG_CAPTURE_OPTIONS
    
    real    0m3.035s
    user    0m13.068s
    sys     0m0.122s
    

    nvidia-smi info:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
    |  0%   43C    P8    12W / 260W |    149MiB / 11264MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A      1100      G   /usr/lib/xorg/Xorg                 35MiB |
    |    0   N/A  N/A      1828      G   /usr/lib/xorg/Xorg                 80MiB |
    |    0   N/A  N/A      1955      G   /usr/bin/gnome-shell               20MiB |
    +-----------------------------------------------------------------------------+
    

    My executable no longer appears in the process list, and CPU usage is higher than before.

    So my question is: when executing the C++ code, why is the hevc_cuvid decoder slower than plain CPU decoding? Thanks.
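
    For context, the C++ benchmark itself is not included above. A minimal sketch of the kind of decode loop it presumably uses is below; the file name, timing approach and error handling are assumptions, not the original code:

    #include <iostream>
    #include <opencv2/core.hpp>
    #include <opencv2/videoio.hpp>

    int main()
    {
        // With OPENCV_FFMPEG_CAPTURE_OPTIONS="video_codec;hevc_cuvid" exported,
        // the FFmpeg backend is asked to open the stream with the CUVID decoder;
        // without it, the default software decoder is used.
        cv::VideoCapture cap("base.ts", cv::CAP_FFMPEG);
        if (!cap.isOpened())
        {
            std::cerr << "Failed to open base.ts" << std::endl;
            return 1;
        }

        int64 start = cv::getTickCount();
        cv::Mat frame;
        int frames = 0;
        while (cap.read(frame))   // decode every frame, discard the pixels
            ++frames;

        double seconds = (cv::getTickCount() - start) / cv::getTickFrequency();
        std::cout << frames << " frames decoded in " << seconds << " s" << std::endl;
        return 0;
    }

    As a side note, OpenCV 4.5.2 and later also expose hardware decode selection through VideoCapture open parameters (cv::CAP_PROP_HW_ACCELERATION), which may be easier to control per capture than the environment variable.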

    Steps to reproduce
    Issue submission checklist
    • [ ] I report the issue, it's not a question
    • [x] I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
    • [x] I updated to the latest OpenCV version and the issue is still there
    • [x] There is reproducer code and related data files: videos, images, onnx, etc
  • Modify randomSampling in point cloud

    Modify randomSampling in point cloud

    Fix a bug: the output format should be determined by the number of channels of sampled_pts.
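
    As a hedged illustration of that behaviour (not the actual patch; the helper name below is hypothetical), the idea is that the sampled output should keep the same element layout as the point cloud it is drawn from, e.g. M x 3 / CV_32FC1 when the points are stored as a single-channel matrix, and M x 1 / CV_32FC3 when they are stored as a 3-channel matrix:

    #include <opencv2/core.hpp>

    // Hypothetical helper: allocate the sampled output with the layout implied by
    // the number of channels, instead of hard-coding one of the two formats.
    static cv::Mat makeSampledOutput(const cv::Mat &pts, int sampled_pts_size)
    {
        if (pts.channels() == 3)
            return cv::Mat(sampled_pts_size, 1, CV_32FC3);   // M x 1, 3-channel layout
        return cv::Mat(sampled_pts_size, 3, CV_32FC1);       // M x 3, single-channel layout
    }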

    Pull Request Readiness Checklist

    See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

    • [x] I agree to contribute to the project under Apache 2 License.
    • [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
    • [x] The PR is proposed to the proper branch
    • [ ] There is a reference to the original bug report and related work
    • [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable. Patch to opencv_extra has the same branch name.
    • [ ] The feature is well documented and sample code can be built with the project CMake