Caffe: a fast open framework for deep learning.

Caffe

Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berkeley Vision and Learning Center (BVLC) and community contributors.

Check out the project site for all the details and step-by-step examples.

Custom distributions

Community

Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.

Happy brewing!

License and Citation

Caffe is released under the BSD 2-Clause license. The BAIR/BVLC reference models are released for unrestricted use.

Please cite Caffe in your publications if it helps your research:

@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}
Comments
  • Caffe OpenCL support

    DISCONTINUED, now available as official Caffe branch here: https://github.com/BVLC/caffe/tree/opencl

    Technical Report

    Available on arXiv: http://arxiv.org/abs/1509.03371

  • OpenCL Backend

    About

    The proposed changes add OpenCL support to Caffe. All GPU functions can be executed using AMD GPUs with OpenCL 1.2 or 2.0, as well as NVIDIA GPUs with OpenCL 1.1.

    Build Instructions

    https://github.com/lunochod/caffe/wiki/OpenCL-Backend

    OpenCL Tests

    All GPU tests successfully complete using this OpenCL version of Caffe.

    Performance and Stability

    The main goal was to provide an OpenCL port to the Caffe community. As such it is not yet optimized for performance or stability.

    Help Wanted

    Let's make it better and faster together.

  • Multi-GPU

    Uses CUDA peer-to-peer for communication, and parts of #1148. SGD is now synchronous instead of asynchronous, as @longjon showed that bandwidth on one box is actually high enough. We haven't really benchmarked yet, but it seems to work great. It also gets rid of the momentum coordination problem.

    The synchronization code needs to hook into the solver, so it is a bit more invasive than before, but still pretty isolated. I refactored solver.cpp to separate the regularization and gradient compute phases so that they can be invoked at different times by the parallel solver.

    One thing still missing is a way to compute the actual number of iterations. For now each solver runs as if it were by itself, so the run takes as long as without parallelism. I guess we could adapt the solvers to run 1/N of the steps instead. The batch size should also be experimented with, as it is now effectively N times larger. On that note, would it be more convenient to track progress by the number of images instead of iterations, so that it is independent of batch size?
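    For intuition on why synchronous SGD over N solvers behaves like one big batch, here is a small numpy sketch (mine, not part of this PR): averaging the shard gradients of an evenly split batch reproduces the full-batch gradient exactly.

    ```python
    import numpy as np

    np.random.seed(0)
    w = np.random.randn(3)
    X = np.random.randn(8, 3)              # full batch of 8 samples
    y = X @ np.array([1.0, -2.0, 0.5])

    def grad(w, X, y):
        """Gradient of the mean squared error 0.5 * mean((Xw - y)^2)."""
        return X.T @ (X @ w - y) / len(y)

    # Split the batch across 2 "GPUs"; synchronous SGD averages the
    # shard gradients, which equals the gradient over the whole batch.
    g_full = grad(w, X, y)
    g_sync = 0.5 * (grad(w, X[:4], y[:4]) + grad(w, X[4:], y[4:]))
    assert np.allclose(g_full, g_sync)     # one sync step == one big-batch step
    ```

    This is why the effective batch size is N times larger with N solvers.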

    To try it, run the samples in example/parallel/

  • Unrolled recurrent layers (RNN, LSTM)

    (Replaces #1873)

    Based on #2032 (adds EmbedLayer -- not needed for, but often used with RNNs in practice, and is needed for my examples), which in turn is based on #1977.

    This adds an abstract class RecurrentLayer intended to support recurrent architectures (RNNs, LSTMs, etc.) using an internal network unrolled in time. RecurrentLayer implementations (here, just RNNLayer and LSTMLayer) specify the recurrent architecture by filling in a NetParameter with appropriate layers.

    RecurrentLayer requires 2 input (bottom) Blobs. The first -- the input data itself -- has shape T x N x ... and the second -- the "sequence continuation indicators" delta -- has shape T x N, each holding T timesteps of N independent "streams". delta_{t,n} should be a binary indicator (i.e., value in {0, 1}), where a value of 0 means that timestep t of stream n is the beginning of a new sequence, and a value of 1 means that timestep t of stream n is continuing the sequence from timestep t-1 of stream n. Under the hood, the previous timestep's hidden state is multiplied by these delta values. The fact that these indicators are specified on a per-timestep and per-stream basis allows for streams of arbitrary different lengths without any padding or truncation. At the beginning of the forward pass, the final hidden state from the previous forward pass (h_T) is copied into the initial hidden state for the new forward pass (h_0), allowing for exact inference across arbitrarily long sequences, even if T == 1. However, if any sequences cross batch boundaries, backpropagation through time is approximate -- it is truncated along the batch boundaries.

    Note that the T x N arrangement in memory, used for computational efficiency, is somewhat counterintuitive, as it requires one to "interleave" the data streams.
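    A small numpy sketch (mine, not from this PR) of how the continuation indicators might be laid out for two streams of different lengths:

    ```python
    import numpy as np

    # Hypothetical layout: T = 5 timesteps, N = 2 streams. Stream 0 carries
    # one 5-step sequence; stream 1 carries a 2-step sequence followed by a
    # 3-step sequence. No padding or truncation is needed.
    T, N = 5, 2
    delta = np.ones((T, N), dtype=np.float32)
    delta[0, :] = 0   # t = 0 begins a new sequence in every stream
    delta[2, 1] = 0   # stream 1 begins its second sequence at t = 2

    # Under the hood the previous hidden state is scaled by delta, so a 0
    # resets the recurrence: h_t = f(x_t, delta[t] * h_{t-1})
    ```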

    There is also an optional third input whose dimensions are simply N x ... (i.e. the first axis must have dimension N and the others can be anything) which is a "static" input to the LSTM. It's equivalent to (but more efficient than) copying the input across the T timesteps and concatenating it with the "dynamic" first input (I was using my TileLayer -- #2083 -- for this purpose at some point before adding the static input). It's used in my captioning experiments to input the image features as they don't change over time. For most problems there will be no such "static" input and you should simply ignore it and just specify the first two input blobs.
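    A small numpy sketch (mine, not the PR's code) of what the static input replaces: tiling the N x D blob across all T timesteps would just repeat the same values at every step, so the layer reuses the blob directly instead of copying it.

    ```python
    import numpy as np

    np.random.seed(0)
    T, N, D = 3, 2, 4
    static = np.random.randn(N, D)        # per-stream "static" input (N x D)

    # The naive alternative the static input avoids: materialize T copies
    # and concatenate them with the dynamic T x N x ... input.
    tiled = np.tile(static, (T, 1, 1))

    assert tiled.shape == (T, N, D)
    assert np.allclose(tiled[0], tiled[T - 1])   # identical at every timestep
    ```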

    I've added scripts to download COCO2014 (and splits), and prototxts for training a language model and LRCN captioning model on the data. From the Caffe root directory, you should be able to download and parse the data by doing:

    cd data/coco
    ./get_coco_aux.sh # download train/val/test splits
    ./download_tools.sh # download official COCO tool
    cd tools
    python setup.py install # follow instructions to install tools and download COCO data if needed
    cd ../../.. # back to caffe root
    ./examples/coco_caption/coco_to_hdf5_data.py
    

    Then, you can train a language model using ./examples/coco_caption/train_language_model.sh, or train LRCN for captioning using ./examples/coco_caption/train_lrcn.sh (assuming you have downloaded models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel).

    Still on the TODO list: upload a pretrained model to the zoo; add a tool to preview generated image captions and compute retrieval & generation scores.

  • Improved CMake scripts

    @shelhamer @jeffdonahue @baeuml @kloudkl @akosiorek @Yangqing @BlGene

    Hello all,

    I hope I referenced everyone who participated in the CMake development (at least I didn't find others).

    Following the discussion here https://github.com/BVLC/caffe/pull/1586, as I promised, I prepared slightly improved Caffe CMake scripts. The improvements were developed on Ubuntu 14.04 and tested on Yosemite (with libstdc++). I believe Windows support is now only as difficult as compiling all the dependencies. But I prefer to postpone testing on Windows until the current, very Linux-ish build scripts and behaviour are adapted for cross-platform use and some dependencies are made optional.

    Description of changes and new features added

    Added an OpenCV-like formatted configuration log https://travis-ci.org/BVLC/caffe/jobs/45700248

    Added CaffeConfig.cmake generation for both build and install cases. This allows you to connect Caffe to your application using CMake's find_package(Caffe). For a more detailed description see below.

    BUILD_SHARED_LIB=ON (default) or OFF builds Caffe as a shared or static library. In CMake it is not good practice to build both shared and static libraries simultaneously; that's why there is a switch.

    CPU_ONLY = OFF (default) or ON forces the build to exclude CUDA support. Caffe will also compile in CPU_ONLY mode if the CUDA Toolkit is not installed or is not found by CMake. Before building, please read the configuration log dumped by CMake to check for this case.

    USE_CUDNN = ON (default) If enabled and cuDNN is found, Caffe is built with it; otherwise it is built without.

    CUDA_ARCH_NAME=Auto (default), All, Fermi, Kepler, Maxwell, or Manual specifies the target GPU architecture. Selecting a concrete value reduces CUDA code compilation time (for instance, compiling for both sm_20 and sm_30 takes twice as long as compiling for just one of them). In case of Auto, CMake will attempt to detect the GPUs installed in your computer and compile only for them. In case of Manual, new CUDA_ARCH_BIN/PTX CMake variables are created, in which a space-separated list of architectures should be set. Example: CUDA_ARCH_BIN="20 21(20) 50"

    BUILD_docs = ON (default)

    • If doxygen is installed and found, the doc target is enabled. Use make docs to build and make jekyll to run a web server. HTML docs are built in <source folder>/doxygen, and a symlink is created at <source folder>/docs/doxygen. The functionality from scripts/gather_examples.sh is now implemented in cmake, but copy_notebook.py is still required.
    • The source folder is used for generation because .Doxyfile contains relative paths and I prefer not to modify it now, though I think generation in the binary folder would be better.

    BUILD_python = ON (default) Builds the Python interface if all required dependencies are found; otherwise it is automatically excluded from the build.

    BUILD_matlab = OFF (default) Enables building the MATLAB interface. Currently it supports both Octave and MATLAB. For Octave, set Octave_compiler to mkoctfile if it is not found automatically. For MATLAB, specify Matlab_DIR, or Matlab_mex and Matlab_mexext, if again they are not found automatically. If both are installed and found, select which one to use by setting Matlab_build_mex_using=Matlab (default) or Octave. Note that the MATLAB wrappers can only be built if BUILD_SHARED_LIB=ON. On macOS neither compiles.

    Proto files: the protobuf files ARE NOT copied to <caffe_root>/include/caffe/proto anymore. Instead they are generated into <build_dir>/include/caffe/proto. I know someone may still include the old headers, but this is the price of paying back the technical debt incurred by the incorrect original CMake script design. Also removed them from .gitignore.

    test.testbin

    • Now NO cmake_test_defines.hpp and sample_data_list.txt are configured by cmake into the source directory, NO -DCMAKE_BUILD definition is added, and all *.in templates were removed. This is because the make runtest command is executed in the source directory, so embedding absolute paths into the test cpp-files is not required! Consider configure_file() into the source folder an antipattern. However, one may restore such embedding by uncommenting a couple of lines in srcs/test/CMakeLists.txt.
    • All garbage targets (one per test file) were removed because they flood IDEs, while the compilation-time reduction they offer is debatable. I replaced them with the option BUILD_only_tests, which allows quickly including only selected tests. Example: cmake -DBUILD_only_tests="common,net,blob,im2col_kernel"

    Yosemite support: I was able to compile with CUDA support following the Caffe instructions with libstdc++ and patching OpenCV as here: https://github.com/Itseez/opencv/commit/32f6e1a554dea1849ee3a53fea171cbd5969ef41. Accelerate.framework support was added. The MATLAB interface failed to compile.

    Temporary changes

    • make symlink creates symlink [caffe_root]/build -> cmake_build_directory
    • Now all examples are built without the .bin suffix, and a symlink with the .bin suffix is created next to each, so that the tutorials keep working. Once naming is standardized, this should be removed.

    Including Caffe in your CMake project via find_package()

    git clone git@github.com:BVLC/caffe.git
    cd caffe && mkdir cmake_build && cd cmake_build
    cmake .. -DBUILD_SHARED_LIB=ON
    

    Verify that CMake found everything, and in the proper locations. After that you can run make -j 12 right away, or better, do this:

    cmake . -DCMAKE_BUILD_TYPE=Debug     # switch to debug
    make -j 12 && make install           # installs by default to build_dir/install
    cmake . -DCMAKE_BUILD_TYPE=Release   # switch to release
    make -j 12 && make install           # doesn’t overwrite debug install
    make symlink
    

    After the operations complete, the Caffe tutorials should work from the Caffe root directory. Let's now see how to connect Caffe to a C++ application that uses the Caffe API with CMake. Prepare the following script:

    cmake_minimum_required(VERSION 2.8.8)
    
    find_package(Caffe)
    include_directories(${Caffe_INCLUDE_DIRS})
    add_definitions(${Caffe_DEFINITIONS})    # ex. -DCPU_ONLY
    
    add_executable(caffeinated_application main.cpp)
    target_link_libraries(caffeinated_application ${Caffe_LIBRARIES})
    

    Run CMake to configure this application and generate build scripts or an IDE project. It will automatically find Caffe in its build directory and pick up all the necessary dependencies (includes, libraries, definitions), and the application will compile without any additional actions. Caffe's dependencies will also have been included. If you have several Caffe builds, or if for some reason CMake wasn't able to find Caffe automatically, you may specify Caffe_DIR=<path-to-caffe-build-dir> in CMake, which guarantees that everything will work.

    Setting Caffe_DIR to the build directory means you always use the build configuration (say, Release or Debug) that Caffe was last compiled for. If instead you set Caffe_DIR=<caffe-install-dir>/share/Caffe, where both configurations have been installed, the proper debug or release Caffe binaries will be selected depending on which configuration you compile your caffeinated_application for.

    Enjoy!!

    (Fixed typos in CUDA architectures - @Noiredd)

  • Provide a Caffe package in Debian

    Status

    Caffe packages are available for Debian/unstable.
    Caffe packages are failing to build for Ubuntu-devel and need to be patched.

    Last update: Dec.20 2016

    Draft guide

    Deploy Caffe with merely one command.

    Brief Guide for Debian/unstable users

    Only experienced Linux users are recommended to try Debian/unstable (Sid). To install caffe, first make sure you have something like the following in /etc/apt/sources.list (uncomment the second line if you want to re-compile caffe locally):

    deb http://ftp.cn.debian.org/debian sid main contrib non-free
    #deb-src http://ftp.cn.debian.org/debian sid main contrib non-free
    

    Then update the apt cache and install it. Note that you cannot install both the CPU version and the CUDA version.

    # apt update
    # apt install [ caffe-cpu | caffe-cuda ]
    # caffe
    

    It should work out of the box. I hope this work is helpful, since many people struggle with the Caffe compiling process.

    Here are some notes:

    • Please re-compile OpenBLAS locally with optimization flags for the sake of performance. This is highly recommended if you are writing a paper. The way to re-compile OpenBLAS from the Debian source is very similar to the next subsection.
    • If you are going to install caffe-cuda, it will automatically pull in the CUDA packages and the nvidia driver packages. The installation process may fail if any part of the caffe dependency chain gets into trouble. That is to say, please take care if you have manually installed or significantly modified the nvidia driver, the CUDA toolkit, protobuf, or any other related component.
    • If you encounter any problem when installing caffe-cuda on a clean Debian system, please report a bug to me via Debian's bug tracking system.
    • If you encounter any problem when installing caffe-cpu, please also report a bug to me via Debian's bug tracking system.
    • Both caffe-cpu and caffe-cuda contain a manpage (man caffe) and a bash completion script (caffe <TAB><TAB>, caffe train <TAB><TAB>). Neither has been merged into caffe master yet.
    • The Python interface is the Python 3 version: python3-caffe-{cpu,cuda}. There is no plan to support Python 2.

    Compiling your custom caffe package on Debian/unstable

    No promises for the content of this subsection. If you just want to re-compile from source without any changes, the following should work as expected. If you want to compile with, e.g., CUDNN support, you should at least be able to read and hack the file debian/rules under the source tree (it's a Makefile).

    First make sure you have a correct deb-src line in your apt source list file. Then we compile caffe with several simple commands.

    # apt update
    # apt install build-essential debhelper devscripts    # These are standard package building tools
    # apt build-dep [ caffe-cpu | caffe-cuda ]    # the most elegant way to pull caffe build dependencies
    # apt source [ caffe-cpu | caffe-cuda ]    # download the source tarball
    # cd caffe-XXXX    # now we enter into the source tree
    [ ... optional, make your custom changes at your own risk ... ]
    # debuild -B -j4    # build caffe with 4 parallel jobs (similar to make -j4)
    [ ... building ...]
    # debc    # optional, if you want to check the package contents
    # debi    # install the generated packages
    

    FAQ

    1. Where is caffe-cudnn?
      Due to legal reasons the cudnn library cannot be redistributed. I'll be happy to make this package when CUDNN becomes re-distributable. The workaround is to install cudnn yourself, and hack at least the debian/rules file if you really want caffe *.deb packages with CUDNN support.

    2. How do I report a bug via the Debian bug tracking system?
      See https://www.debian.org/Bugs/ .

    3. I installed the CPU version. What should I do if I want to switch to the CUDA version?
      sudo apt install caffe-cuda; apt's dependency resolver is smart enough for this.

    4. Where are the examples, the models, and other documentation?
      sudo apt install caffe-doc; dpkg -L caffe-doc

  • Caffe Opencl - ViennaCL - Could not find kernel `fill_float`

    Please use the caffe-users list for usage, installation, or modeling questions, or other requests for help. Do not post such requests to Issues. Doing so interferes with the development of Caffe.

    Please read the guidelines for contributing before submitting this issue.

    Issue summary

    Upon running my network, I get the following error:

    ViennaCL: FATAL ERROR: Could not find kernel 'fill_float' from program ''
    Number of kernels in program: 0
    
    Error:
    Kernel not found
    

    Steps to reproduce

    If you are having difficulty building Caffe or training a model, please ask the caffe-users mailing list. If you are reporting a build error that seems to be due to a bug in Caffe, please attach your build configuration (either Makefile.config or CMakeCache.txt) and the output of the make (or cmake) command.

    Your system configuration

    • Operating system: Ubuntu 16.04
    • Compiler: g++ 5.4
    • CUDA version (if applicable): 8
    • CUDNN version (if applicable): latest
    • BLAS: Titan Xp on OpenCL drivers
    • Python or MATLAB version (for pycaffe and matcaffe respectively):

  • Any simple example?

    Hi,

    I started with Caffe and the mnist example ran well. However, I cannot understand how I am supposed to use this for my own data for a classification task. What should the data format be? Where should I specify the files? How do I see the results for a test set? None of this is mentioned in the documentation. Any pointers will be appreciated, thanks.

  • Yet another batch normalization PR

    This PR squashes together #1965 and #3161 to make sure that proper credit is given. The final functionality is much more like #3161: we ultimately decided that the scale/shift could be implemented as a separate layer (and should hence get its own PR) and the data shuffling, if it gets merged, should also be done as a separate PR (I have not reviewed that code closely enough to say whether it is mergeable). This version includes the global stats computations, and fixes the issue where #3161 was using the biased variance estimate (took a little while to convince myself that this is indeed the correct estimator to use).
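    For reference, a minimal numpy sketch (mine, not the PR's code) of the estimator in question: the forward pass normalizes with the biased (ddof=0, i.e. 1/m) batch variance, which is exactly what makes the normalized output have unit variance over the batch.

    ```python
    import numpy as np

    def bn_forward(x, eps=1e-5):
        """Normalize each feature over the batch axis (axis 0) using the
        biased (1/m) variance estimate, as batch norm's forward pass does."""
        mean = x.mean(axis=0)
        var = x.var(axis=0)              # ddof=0: the biased estimator
        return (x - mean) / np.sqrt(var + eps)

    np.random.seed(0)
    x = np.random.randn(32, 4) * 3.0 + 1.0
    y = bn_forward(x)
    # per-feature mean ~ 0 and (biased) variance ~ 1 over the batch
    ```

    Using the unbiased (1/(m-1)) estimate here would leave the output with variance slightly below 1, which is why the biased estimate is the consistent choice for normalization.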

    It would be great if @ducha-aiki and @jeffdonahue could take a look at this.

  • Multi-GPU Data Parallelism (with Parallel Data Layers)

    This is my package of #2870 (and originally, #2114)

    Modification: Allow data layers (and also PythonLayer when used as a data layer) to be shared among the worker solvers' training nets, and also the test net, for future-proofing if one wants to do multi-GPU testing. Data layers are locked during forward to ensure sequential forwarding. Now all worker solvers fetch data from one single data layer.

    This ensures that single-GPU training is consistent with multi-GPU training, and allows the tests in #2870 to pass. Otherwise, as in #2870 (#2114), multiple data layers are created for the worker solvers, and these data layers are unaware of each other. This can be a serious issue if one uses deterministic data layers or turns off shuffling. In that case, since the data layers in each worker solver read the same data, one eventually gets the same gradient on each solver, so it is almost equivalent to multiplying the learning rate by the number of GPUs. This is definitely not the desired behavior of multi-GPU data parallelism, since one should train on different subsets of the dataset. Although #2114 provides a DataReader, it only applies to leveldb and lmdb, and is hardly extensible to other data layers.
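    A tiny numpy illustration (mine, not from this PR) of that failure mode: when every worker reads the same batch, each computes the identical gradient, so accumulating them is the same as multiplying one solver's gradient (equivalently, the learning rate) by the number of GPUs.

    ```python
    import numpy as np

    np.random.seed(0)
    g = np.random.randn(10)                  # gradient from one batch
    n_gpus = 4

    # Unshared, deterministic data layers: every worker reads the same
    # batch and therefore computes the same gradient.
    grads = [g.copy() for _ in range(n_gpus)]
    summed = np.sum(grads, axis=0)           # workers' gradients accumulated

    assert np.allclose(summed, n_gpus * g)   # same as scaling the lr by N
    ```

    With a single shared data layer, each worker instead gets a different batch, so the accumulated gradient genuinely covers N times more data.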

    DataReader is preserved in this PR and LMDB/LEVELDB DataLayer is not shared.

    TODOs

    • [x] Add ShareInParallel function to layer.hpp, data_layer.hpp and pythonlayer.hpp .
    • [x] Implement share layers during net construction, construct top blobs of shared layers.
    • [x] Add lock to forward in layer.hpp to lock layers.
    • [x] Share layers during workersolver construction.
    • [x] ~~Remove DataReader. Restore old behavior of DataLayer.~~ DataReader is kept.
    • [x] Test make runtest on multiple GPU machine.
    • [x] Test multi-gpu training on MNIST. (log: https://gist.github.com/ronghanghu/d66d63882c25b31b6148)
    • [x] Test multi-gpu training on ILSVRC.
    • [x] Fix NVCC warning on boost/thread.hpp to get Travis CI pass.

    Drawback

    Multi-GPU training is numerically non-deterministic on data layers except for the LMDB/LEVELDB DataLayer; see https://github.com/BVLC/caffe/pull/2903#issuecomment-130133266

  • ND convolution with im2col

    This PR extends convolution to N spatial axes, whereas Caffe's current convolution supports only 2D convolution (with 2 spatial axes: height and width). For 2D convolution, this implementation doesn't compare favorably with the existing one -- I haven't done much benchmarking, but I believe it's 25-75% slower on both CPU and GPU. So before this could be merged, I'd need to restore the existing implementation and use it as the default "engine" for 2D convolutions (but this more destructive version makes it easier to tell what I was thinking from looking at the diff). If anyone has any suggestions on improving the performance or thoughts on why it might be so much slower, I'd love to hear them.
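    For readers unfamiliar with the approach, here is a minimal numpy sketch (mine, not this PR's code) of the 2D im2col idea being generalized: patches are unrolled into columns so convolution becomes a single matrix multiply (stride 1, no padding, single channel).

    ```python
    import numpy as np

    def im2col(x, kh, kw):
        """Unroll kh x kw patches of a 2-D array into columns (stride 1, no pad)."""
        H, W = x.shape
        oh, ow = H - kh + 1, W - kw + 1
        cols = np.empty((kh * kw, oh * ow))
        for i in range(oh):
            for j in range(ow):
                cols[:, i * ow + j] = x[i:i + kh, j:j + kw].ravel()
        return cols

    x = np.arange(16, dtype=float).reshape(4, 4)
    k = np.ones((3, 3))
    # convolution (cross-correlation) as one matrix multiply over the columns
    y = (k.ravel() @ im2col(x, 3, 3)).reshape(2, 2)
    ```

    The ND generalization replaces the two spatial loops with a loop over N spatial axes, but the matrix-multiply step stays the same.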

    Edit: benchmarking this on alexnet, it's about 33% slower:

    @ master:

    I0305 21:07:25.042047 22060 caffe.cpp:271] Average Forward pass: 486.327 ms.
    I0305 21:07:25.042064 22060 caffe.cpp:273] Average Backward pass: 824.147 ms.
    I0305 21:07:25.042079 22060 caffe.cpp:275] Average Forward-Backward: 1310.68 ms.
    

    @ nd-convolution:

    I0305 21:02:03.827594 12909 caffe.cpp:271] Average Forward pass: 681.38 ms.
    I0305 21:02:03.827608 12909 caffe.cpp:273] Average Backward pass: 1068.98 ms.
    I0305 21:02:03.827623 12909 caffe.cpp:275] Average Forward-Backward: 1750.56 ms.
    
  • import error: segment fault when import caffe

    Issue summary

    I ran make all && make pycaffe successfully and I can run the examples, but when I import caffe I get a segmentation fault without any error message. I used lldb to debug the core file, but found nothing either. I then commented out "import_array1()" at the end of $CAFFE_ROOT/python/caffe/_caffe.cpp and rebuilt, and now I can import successfully. However, something is still wrong: running the py-faster-rcnn demo.py gives the error message "numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject". Does anybody know the reason?

    System configuration

    I use anaconda manage python env

    • Operating system: macOS 11.4
    • Compiler:
    • CUDA version (if applicable):
    • CUDNN version (if applicable):
    • BLAS: openblas 1.0
    • Python version (if using pycaffe): python 3.7.13
    • MATLAB version (if using matcaffe):
  • Makefile

    Important - read before submitting

    Please read the guidelines for contributing before submitting this issue!

    Please do not post installation, build, usage, or modeling questions, or other requests for help to Issues. Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe.

    Issue summary

    Steps to reproduce

    Tried solutions

    System configuration

    • Operating system:
    • Compiler:
    • CUDA version (if applicable):
    • CUDNN version (if applicable):
    • BLAS:
    • Python version (if using pycaffe):
    • MATLAB version (if using matcaffe):

    Issue checklist

    • [ ] read the guidelines and removed the first paragraph
    • [ ] written a short summary and detailed steps to reproduce
    • [ ] explained how solutions to related problems failed (tick if found none)
    • [ ] filled system configuration
    • [ ] attached relevant logs/config files (tick if not applicable)
  • caffe time -model -weights -gpu=0


    Issue summary

    Running caffe time -model=xxx -weights=xxx -gpu=0 gives the log:

    I0312 15:29:30.427956 2367 caffe.cpp:406] Average time per layer:
    I0312 15:29:30.427961 2367 caffe.cpp:409] data forward: 0.0018944 ms.
    I0312 15:29:30.427969 2367 caffe.cpp:412] data backward: 0.0018848 ms.
    I0312 15:29:30.427975 2367 caffe.cpp:409] conv1 forward: 0.10807 ms.
    I0312 15:29:30.427982 2367 caffe.cpp:412] conv1 backward: 0.182646 ms.
    I0312 15:29:30.427989 2367 caffe.cpp:409] relu1 forward: 0.0140288 ms.
    I0312 15:29:30.427994 2367 caffe.cpp:412] relu1 backward: 0.0018432 ms.
    I0312 15:29:30.428000 2367 caffe.cpp:409] norm1 forward: 0.0628864 ms.
    I0312 15:29:30.428007 2367 caffe.cpp:412] norm1 backward: 0.105226 ms.
    I0312 15:29:30.428014 2367 caffe.cpp:409] pool1 forward: 0.0158592 ms.
    I0312 15:29:30.428020 2367 caffe.cpp:412] pool1 backward: 0.0018784 ms.
    I0312 15:29:30.428027 2367 caffe.cpp:409] conv2 forward: 0.291235 ms.
    I0312 15:29:30.428033 2367 caffe.cpp:412] conv2 backward: 0.515402 ms.
    I0312 15:29:30.428040 2367 caffe.cpp:409] relu2 forward: 0.0101152 ms.
    I0312 15:29:30.428048 2367 caffe.cpp:412] relu2 backward: 0.0018592 ms.
    I0312 15:29:30.428056 2367 caffe.cpp:409] norm2 forward: 0.137219 ms.
    I0312 15:29:30.428066 2367 caffe.cpp:412] norm2 backward: 0.256826 ms.
    I0312 15:29:30.428073 2367 caffe.cpp:409] pool2 forward: 0.0133536 ms.
    I0312 15:29:30.428084 2367 caffe.cpp:412] pool2 backward: 0.0024768 ms.
    I0312 15:29:30.428092 2367 caffe.cpp:409] conv3 forward: 0.14239 ms.
    I0312 15:29:30.428098 2367 caffe.cpp:412] conv3 backward: 0.3532 ms.
    I0312 15:29:30.428107 2367 caffe.cpp:409] relu3 forward: 0.008976 ms.
    I0312 15:29:30.428114 2367 caffe.cpp:412] relu3 backward: 0.0020128 ms.
    I0312 15:29:30.428123 2367 caffe.cpp:409] conv4 forward: 0.117597 ms.
    I0312 15:29:30.428130 2367 caffe.cpp:412] conv4 backward: 0.292886 ms.
    I0312 15:29:30.428138 2367 caffe.cpp:409] relu4 forward: 0.0090048 ms.
    I0312 15:29:30.428145 2367 caffe.cpp:412] relu4 backward: 0.001872 ms.
    I0312 15:29:30.428153 2367 caffe.cpp:409] conv5 forward: 0.109824 ms.
    I0312 15:29:30.428160 2367 caffe.cpp:412] conv5 backward: 0.368051 ms.
    I0312 15:29:30.428165 2367 caffe.cpp:409] relu5 forward: 0.0088512 ms.
    I0312 15:29:30.428174 2367 caffe.cpp:412] relu5 backward: 0.0018848 ms.
    I0312 15:29:30.428182 2367 caffe.cpp:409] pool5 forward: 0.0117792 ms.
    I0312 15:29:30.428189 2367 caffe.cpp:412] pool5 backward: 0.00256 ms.
    I0312 15:29:30.428197 2367 caffe.cpp:409] fc6 forward: 0.417875 ms.
    I0312 15:29:30.428205 2367 caffe.cpp:412] fc6 backward: 3.15267 ms.
    I0312 15:29:30.428212 2367 caffe.cpp:409] relu6 forward: 0.0122656 ms.
    I0312 15:29:30.428264 2367 caffe.cpp:412] relu6 backward: 0.0018912 ms.
    I0312 15:29:30.428273 2367 caffe.cpp:409] drop6 forward: 0.0127136 ms.
    I0312 15:29:30.428282 2367 caffe.cpp:412] drop6 backward: 0.001856 ms.
    I0312 15:29:30.428292 2367 caffe.cpp:409] fc7 forward: 0.1988 ms.
    I0312 15:29:30.428300 2367 caffe.cpp:412] fc7 backward: 2.72682 ms.
    I0312 15:29:30.428308 2367 caffe.cpp:409] relu7 forward: 0.0122848 ms.
    I0312 15:29:30.428316 2367 caffe.cpp:412] relu7 backward: 0.0019136 ms.
    I0312 15:29:30.428328 2367 caffe.cpp:409] drop7 forward: 0.0126016 ms.
    I0312 15:29:30.428339 2367 caffe.cpp:412] drop7 backward: 0.0018944 ms.
    I0312 15:29:30.428347 2367 caffe.cpp:409] fc8 forward: 0.109283 ms.
    I0312 15:29:30.428378 2367 caffe.cpp:412] fc8 backward: 2.68584 ms.
    I0312 15:29:30.428388 2367 caffe.cpp:409] prob forward: 0.0146496 ms.
    I0312 15:29:30.428395 2367 caffe.cpp:412] prob backward: 0.0018528 ms.
    I0312 15:29:30.428421 2367 caffe.cpp:417] Average Forward pass: 55.8925 ms.
    I0312 15:29:30.428429 2367 caffe.cpp:419] Average Backward pass: 65.4428 ms.
    I0312 15:29:30.430272 2367 caffe.cpp:421] Average Forward-Backward: 127.954 ms.
    I0312 15:29:30.430285 2367 caffe.cpp:423] Total Time: 1279.54 ms.
    I0312 15:29:30.430291 2367 caffe.cpp:424] *** Benchmark ends ***

    The sum of the per-layer forward times is not equal to the average forward pass (2.01 ms < 55.89 ms). Please help me figure this out, thanks very much.

  • make runtest error: no CUDA-capable device is detected

    Issue summary

    Hi,

    I'm installing Caffe 1.0 on WSL2 Ubuntu 20.04. I already managed to get make all and make test to run without error.

    However, when I run make runtest, I get a bunch of errors.

    (base) b***@DESKTOP-****:/mnt/c/Users/bx/caffe-1.0$ make runtest
    .build_release/tools/caffe
    caffe: command line brew
    usage: caffe <command> <args>
    
    commands:
      train           train or finetune a model
      test            score a model
      device_query    show GPU diagnostic information
      time            benchmark model execution time
    
      Flags from tools/caffe.cpp:
        -gpu (Optional; run in GPU mode on given device IDs separated by ','.Use
          '-gpu all' to run on all available GPUs. The effective training batch
          size is multiplied by the number of devices.) type: string default: ""
        -iterations (The number of iterations to run.) type: int32 default: 50
        -level (Optional; network level.) type: int32 default: 0
        -model (The model definition protocol buffer text file.) type: string
          default: ""
        -phase (Optional; network phase (TRAIN or TEST). Only used for 'time'.)
          type: string default: ""
        -sighup_effect (Optional; action to take when a SIGHUP signal is received:
          snapshot, stop or none.) type: string default: "snapshot"
        -sigint_effect (Optional; action to take when a SIGINT signal is received:
          snapshot, stop or none.) type: string default: "stop"
        -snapshot (Optional; the snapshot solver state to resume training.)
          type: string default: ""
        -solver (The solver definition protocol buffer text file.) type: string
          default: ""
        -stage (Optional; network stages (not to be confused with phase), separated
          by ','.) type: string default: ""
        -weights (Optional; the pretrained weights to initialize finetuning,
          separated by ','. Cannot be set simultaneously with snapshot.)
          type: string default: ""
    .build_release/test/test_all.testbin 0 --gtest_shuffle
    Cuda number of devices: 0
    Setting to use device 0
    Current device id: 0
    Current device name:
    Note: Randomizing tests' orders with a seed of 55461 .
    [==========] Running 2101 tests from 277 test cases.
    [----------] Global test environment set-up.
    [----------] 5 tests from EmbedLayerTest/1, where TypeParam = caffe::CPUDevice<double>
    [ RUN      ] EmbedLayerTest/1.TestForwardWithBias
    E0307 22:13:32.392771  7483 common.cpp:114] Cannot create Cublas handle. Cublas won't be available.
    E0307 22:13:32.432719  7483 common.cpp:121] Cannot create Curand generator. Curand won't be available.
    [       OK ] EmbedLayerTest/1.TestForwardWithBias (114 ms)
    [ RUN      ] EmbedLayerTest/1.TestGradient
    E0307 22:13:32.469251  7483 common.cpp:141] Curand not available. Skipping setting the curand seed.
    [       OK ] EmbedLayerTest/1.TestGradient (7 ms)
    [ RUN      ] EmbedLayerTest/1.TestForward
    [       OK ] EmbedLayerTest/1.TestForward (0 ms)
    [ RUN      ] EmbedLayerTest/1.TestSetUp
    [       OK ] EmbedLayerTest/1.TestSetUp (0 ms)
    [ RUN      ] EmbedLayerTest/1.TestGradientWithBias
    [       OK ] EmbedLayerTest/1.TestGradientWithBias (11 ms)
    [----------] 5 tests from EmbedLayerTest/1 (132 ms total)
    
    [----------] 8 tests from SliceLayerTest/2, where TypeParam = caffe::GPUDevice<float>
    [ RUN      ] SliceLayerTest/2.TestGradientTrivial
    F0307 22:13:32.488232  7483 syncedmem.hpp:22] Check failed: error == cudaSuccess (100 vs. 0)  no CUDA-capable device is detected
    *** Check failure stack trace: ***
        @     0x7fc281c001c3  google::LogMessage::Fail()
        @     0x7fc281c0525b  google::LogMessage::SendToLog()
        @     0x7fc281bffebf  google::LogMessage::Flush()
        @     0x7fc281c006ef  google::LogMessageFatal::~LogMessageFatal()
        @     0x7fc280783103  caffe::SyncedMemory::mutable_cpu_data()
        @     0x7fc280600779  caffe::Blob<>::Reshape()
        @     0x7fc280600bce  caffe::Blob<>::Reshape()
        @     0x7fc280600c80  caffe::Blob<>::Blob()
        @     0x55ab7cfc2a6a  caffe::SliceLayerTest<>::SliceLayerTest()
        @     0x55ab7cfc2e20  testing::internal::TestFactoryImpl<>::CreateTest()
        @     0x55ab7d0633c1  testing::internal::HandleExceptionsInMethodIfSupported<>()
        @     0x55ab7d05b106  testing::TestInfo::Run()
        @     0x55ab7d05b265  testing::TestCase::Run()
        @     0x55ab7d05b78c  testing::internal::UnitTestImpl::RunAllTests()
        @     0x55ab7d05b857  testing::UnitTest::Run()
        @     0x55ab7cb36217  main
        @     0x7fc2801060b3  __libc_start_main
        @     0x55ab7cb3dd9e  _start
    make: *** [Makefile:534: runtest] Aborted
    

    NO CUDA issue

    It cannot find CUDA!! But I have CUDA and the driver installed, verified with nvcc --version and nvidia-smi:

    (base) b***@DESKTOP-****:/mnt/c/Users/bx/caffe-1.0$ nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Thu_Feb_10_18:23:41_PST_2022
    Cuda compilation tools, release 11.6, V11.6.112
    Build cuda_11.6.r11.6/compiler.30978841_0
    
    
    (base) b***@DESKTOP-****:/mnt/c/Users/bx/caffe-1.0$ nvidia-smi
    Mon Mar  7 22:25:20 2022
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 510.47.03    Driver Version: 511.79       CUDA Version: 11.6     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
    |  0%   51C    P8    11W / 120W |    431MiB /  3072MiB |      5%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
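    When Caffe reports "Cuda number of devices: 0" while nvidia-smi works, it can help to query the CUDA runtime directly, bypassing Caffe entirely. The sketch below is an assumption-laden diagnostic, not part of Caffe: it loads libcudart.so via ctypes (assuming it is on the loader path) and calls the standard runtime function cudaGetDeviceCount.

```python
import ctypes

def cuda_device_count():
    """Ask the CUDA runtime how many devices it sees, independent of Caffe.

    Returns an int on success, or a diagnostic string if the runtime
    library cannot be loaded or the call fails (e.g. error 100,
    "no CUDA-capable device is detected", as in the runtest log).
    """
    try:
        cudart = ctypes.CDLL("libcudart.so")  # assumes the CUDA runtime is on the loader path
    except OSError as e:
        return f"libcudart.so not loadable: {e}"
    count = ctypes.c_int(0)
    status = cudart.cudaGetDeviceCount(ctypes.byref(count))
    if status != 0:
        return f"cudaGetDeviceCount returned error {status}"
    return count.value

print(cuda_device_count())
```

    If this also reports zero devices or error 100, the problem is in the WSL2 driver plumbing rather than in Caffe's build: check that /dev/dxg exists inside the distro, and that the toolkit installed is the WSL-specific CUDA build (NVIDIA's WSL guidance is not to install a separate Linux display driver inside the distro). Caffe's own device_query command, listed in the help output above, is another quick check.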
    
    

    Makefile.config

    ## Refer to http://caffe.berkeleyvision.org/installation.html
    # Contributions simplifying and improving our build system are welcome!
    
    # cuDNN acceleration switch (uncomment to build with cuDNN).
    USE_CUDNN := 1
    
    # CPU-only switch (uncomment to build without GPU support).
    # CPU_ONLY := 1
    
    # uncomment to disable IO dependencies and corresponding data layers
    # USE_OPENCV := 0
    # USE_LEVELDB := 0
    # USE_LMDB := 0
    
    # uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
    #	You should not set this flag if you will be reading LMDBs with any
    #	possibility of simultaneous read and write
    # ALLOW_LMDB_NOLOCK := 1
    
    # Uncomment if you're using OpenCV 3
    OPENCV_VERSION := 3
    
    # To customize your choice of compiler, uncomment and set the following.
    # N.B. the default for Linux is g++ and the default for OSX is clang++
    # CUSTOM_CXX := g++
    
    # CUDA directory contains bin/ and lib/ directories that we need.
    CUDA_DIR := /usr/local/cuda
    # CUDA_DIR := /usr/local/cuda-11.6
    # On Ubuntu 14.04, if cuda tools are installed via
    # "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
    # CUDA_DIR := /usr
    
    # CUDA architecture setting: going with all of them.
    # For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
    # For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
    CUDA_ARCH := -gencode arch=compute_50,code=sm_50 \
    		#-gencode arch=compute_20,code=sm_20 \
    		#-gencode arch=compute_20,code=sm_21 \
    		#-gencode arch=compute_30,code=sm_30 \
    		#-gencode arch=compute_35,code=sm_35 \
    		#-gencode arch=compute_50,code=sm_50 \
    		-gencode arch=compute_52,code=sm_52 \
    		-gencode arch=compute_60,code=sm_60 \
    		-gencode arch=compute_61,code=sm_61 \
    		-gencode arch=compute_61,code=compute_61
    
    # BLAS choice:
    # atlas for ATLAS (default)
    # mkl for MKL
    # open for OpenBlas
    BLAS := atlas
    # Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
    # Leave commented to accept the defaults for your choice of BLAS
    # (which should work)!
    # BLAS_INCLUDE := /path/to/your/blas
    # BLAS_LIB := /path/to/your/blas
    
    # Homebrew puts openblas in a directory that is not on the standard search path
    # BLAS_INCLUDE := $(shell brew --prefix openblas)/include
    # BLAS_LIB := $(shell brew --prefix openblas)/lib
    
    # This is required only if you will compile the matlab interface.
    # MATLAB directory should contain the mex binary in /bin.
    # MATLAB_DIR := /usr/local
    # MATLAB_DIR := /Applications/MATLAB_R2012b.app
    
    # NOTE: this is required only if you will compile the python interface.
    # We need to be able to find Python.h and numpy/arrayobject.h.
    # PYTHON_INCLUDE := /usr/include/python2.7 \
    		# /usr/lib/python2.7/dist-packages/numpy/core/include
    
    # Anaconda Python distribution is quite popular. Include path:
    # Verify anaconda location, sometimes it's in root.
    # ANACONDA_HOME := $(HOME)/anaconda
    ANACONDA_HOME := /home/bear233/anaconda3
    # PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
    		# $(ANACONDA_HOME)/include/python3.9 \
    		# $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include
    
    # Uncomment to use Python 3 (default is Python 2)
     PYTHON_LIBRARIES := boost_python3 python3.8
     PYTHON_INCLUDE := /usr/include/python3.8 \
                     # /usr/lib/python3.8/dist-packages/numpy/core/include
    
    # We need to be able to find libpythonX.X.so or .dylib.
    PYTHON_LIB := /usr/lib
    # PYTHON_LIB := $(ANACONDA_HOME)/lib
    
    # Homebrew installs numpy in a non standard path (keg only)
    # PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
    # PYTHON_LIB += $(shell brew --prefix numpy)/lib
    
    # Uncomment to support layers written in Python (will link against Python libs)
    WITH_PYTHON_LAYER := 1
    
    # Whatever else you find you need goes here.
    INCLUDE_DIRS := $(PYTHON_INCLUDE)/usr/local/incllude  /usr/include/hdf5/serial/
    # LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
    LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
    
    # If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
    # INCLUDE_DIRS += $(shell brew --prefix)/include
    # LIBRARY_DIRS += $(shell brew --prefix)/lib
    
    # NCCL acceleration switch (uncomment to build with NCCL)
    # https://github.com/NVIDIA/nccl (last tested version: v1.2.3-1+cuda8.0)
    # USE_NCCL := 1
    
    # Uncomment to use `pkg-config` to specify OpenCV library paths.
    # (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
    # USE_PKG_CONFIG := 1
    
    # N.B. both build and distribute dirs are cleared on `make clean`
    BUILD_DIR := build
    DISTRIBUTE_DIR := distribute
    
    # Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
    # DEBUG := 1
    
    # The ID of the GPU that 'make runtest' will use to run unit tests.
    TEST_GPUID := 0
    
    # enable pretty build (comment to see full commands)
    Q ?= @
    
    

    System configuration

    • Operating system: Linux(WSL2)
    • CUDA version (if applicable): 11.6
    • CUDNN version (if applicable): 8.3.2
    • Python version (if using pycaffe): 3.9.7

    Could someone please help me with it? I have already tried most of the solutions I found on the internet, with no luck. Many thanks!

  • IMPORT ERROR

    IMPORT ERROR

    Traceback (most recent call last):
      File "demo.py", line 6, in <module>
        import caffe
      File "/home/dong/work/caffe/python/caffe/__init__.py", line 1, in <module>
        from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
      File "/home/dong/work/caffe/python/caffe/pycaffe.py", line 13, in <module>
        from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver,
    ImportError: /home/dong/work/caffe/build/lib/libcaffe.so.1.0.0: unexpected reloc type 0x03

  • make all error: undefined reference to cv::imread(std::__cxx11::basic_string

    make all error: undefined reference to cv::imread(std::__cxx11::basic_string

    Important - read before submitting

    Please read the guidelines for contributing before submitting this issue!

    Please do not post installation, build, usage, or modeling questions, or other requests for help to Issues. Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe.

    Issue summary

    Steps to reproduce

    Tried solutions

    System configuration

    • Operating system:
    • Compiler:
    • CUDA version (if applicable):
    • CUDNN version (if applicable):
    • BLAS:
    • Python version (if using pycaffe):
    • MATLAB version (if using matcaffe):

    Issue checklist

    • [ ] read the guidelines and removed the first paragraph
    • [ ] written a short summary and detailed steps to reproduce
    • [ ] explained how solutions to related problems failed (tick if found none)
    • [ ] filled system configuration
    • [ ] attached relevant logs/config files (tick if not applicable)