Lightweight, portable, and flexible distributed/mobile deep learning with a dynamic, mutation-aware dataflow dependency scheduler; for Python, R, Julia, Scala, Go, JavaScript and more



Apache MXNet (incubating) for Deep Learning


Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory-efficient. MXNet is portable and lightweight, and it scales to multiple GPUs and machines.
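For instance, in the Python API every NDArray operation is issued asynchronously to the dependency engine, which tracks read/write dependencies and runs independent work in parallel. A minimal sketch (the array shapes are arbitrary):

    import mxnet as mx

    a = mx.nd.ones((1000, 1000))
    b = mx.nd.ones((1000, 1000))
    c = a * 2         # enqueued to the engine; the call returns immediately
    d = b + 1         # independent of c, so the engine may run it in parallel
    e = c + d         # depends on both; scheduled only after c and d complete
    e.wait_to_read()  # explicit synchronization point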

MXNet is more than a deep learning project. It is a community on a mission to democratize AI. It is a collection of blueprints and guidelines for building deep learning systems, and a source of interesting insights into DL systems for hackers.

Licensed under an Apache-2.0 license.

| Branch | Build Status |
| ------ | ------------ |
| master | CentOS CPU, CentOS GPU, Clang, Edge, Miscellaneous, Sanity, Unix CPU, Unix GPU, Website, Windows CPU, Windows GPU, Documentation |
| v1.x | CentOS CPU, CentOS GPU, Clang, Edge, Miscellaneous, Sanity, Unix CPU, Unix GPU, Website, Windows CPU, Windows GPU, Documentation |

Features

  • NumPy-like programming interface, integrated with the new, easy-to-use Gluon 2.0 interface, so NumPy users can easily adopt MXNet and get started with deep learning.
  • Automatic hybridization provides imperative programming with the performance of traditional symbolic programming (see the sketch after this list).
  • Lightweight, memory-efficient, and portable to smart devices through native cross-compilation support on ARM, and through ecosystem projects such as TVM, TensorRT, and OpenVINO.
  • Scales to multiple GPUs and distributed settings with automatic parallelism through ps-lite, Horovod, and BytePS.
  • Extensible backend that supports full customization, allowing integration with custom accelerator libraries and in-house hardware without the need to maintain a fork.
  • Support for Python, Java, C++, R, Scala, Clojure, Go, JavaScript, Perl, and Julia.
  • Cloud-friendly and directly compatible with AWS and Azure.
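
As a quick illustration of hybridization, here is a minimal sketch using the Gluon API (the layer sizes and input shape are arbitrary placeholders):

    import mxnet as mx
    from mxnet.gluon import nn

    # Build a network imperatively.
    net = nn.HybridSequential()
    net.add(nn.Dense(128, activation='relu'),
            nn.Dense(10))
    net.initialize()

    # hybridize() caches a symbolic graph on the first call, so subsequent
    # calls keep the imperative style but run with symbolic-execution speed.
    net.hybridize()

    x = mx.nd.random.uniform(shape=(4, 784))
    print(net(x).shape)  # (4, 10)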


Stay Connected

| Channel | Purpose |
| ------- | ------- |
| Follow MXNet Development on Github | See what's going on in the MXNet project. |
| MXNet Confluence Wiki for Developers | MXNet developer wiki for information related to project development, maintained by contributors and developers. To request write access, send a request to the dev list. |
| dev@mxnet.apache.org mailing list | The "dev list". Discussions about the development of MXNet. To subscribe, send an email to dev-subscribe@mxnet.apache.org. |
| discuss.mxnet.io | Asking & answering MXNet usage questions. |
| Apache Slack #mxnet Channel | Connect with MXNet and other Apache developers. To join the MXNet Slack channel, send a request to the dev list. |
| Follow MXNet on Social Media | Get updates about new features and events. |

Social Media

Keep connected with the latest MXNet news and updates.

Apache MXNet on Twitter

Contributor and user blogs about MXNet

Discuss MXNet on Reddit at r/mxnet

Apache MXNet YouTube channel

Apache MXNet on LinkedIn

History

MXNet emerged from a collaboration among the authors of cxxnet, minerva, and purine2. The project reflects what we learned from those past projects: MXNet combines aspects of each of them to achieve flexibility, speed, and memory efficiency.

Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. In Neural Information Processing Systems, Workshop on Machine Learning Systems, 2015

Comments
  • Windows GPU accuracy extremely bad

    Hey, I'm quite new to mxnet. I followed the installation instructions and succeeded in installing it on Windows 8.1 64-bit. I then ran train_mnist.py --network lenet without a problem; it was quite slow, but the accuracy at the end is good at around 99.2%. But when I run it as --network lenet --gpus 0 to use my GPU, it's definitely a lot faster, but the accuracy never gets above 10%, which is terrible. There must be something wrong; theoretically it should be the same accuracy, right? I installed CUDA 7.5 and also extracted cuDNN v3 just as indicated, and everything runs without a problem except the accuracy is terrible. I'm running on a laptop with an NVIDIA 660M graphics card, which has compute capability 3.0.

    After running the file I get Train-accuracy=0.098825

  • [Discussion] Sharing Operators between DL Frameworks

    See this link for the discussion repo.

    This discussion started from https://github.com/dmlc/minpy/issues/129 with @soumith (THC is the tensor library that backs Torch). I opened this issue in the MXNet repo so more developers can see it.

    First of all, it is possible to reuse operator libraries between frameworks. For example:

    • Support for THC and the Torch Module was done in the Torch Plugin, interfacing with Torch's Lua library.
    • MXNet supports reusing operators from Caffe.

    It is always interesting to see interchangeability happen: for example, scheduling PyTorch operations in MXNet's async engine, or running MXNet's declarative API to directly share data with PyTorch's arrays.

    However, there are some engineering obstacles to doing so. I would like to explain what these obstacles are, in the hope that this motivates the community to move forward and make this easier.

    Coupled Operator Data Structure Components

    An operator can mean many things. Here are some basic components of what an operator involves:

    • A data structure that holds the shape of, and pointers to, the array
    • Possibly a memory allocator to handle run-time memory allocation
    • Resource handles, if external resources are needed
    • Scheduling-related objects, if the arrays support asynchronous execution

    Why does such coupling prevent reuse? There are two reasons:

    • Many systems have their own memory allocators and their own ways of handling resources.
    • While having a memory allocator enables runtime memory allocation, sometimes memory allocation is not desired at all (e.g. BLAS calls, where all memory is pre-allocated).

    To resolve this problem, an operator library should provide operators that accept user-managed memory and resources; when possible it should not introduce its own allocator or resource management, but instead give hints to the user (e.g. CuDNN's workspace requirement eliminates the need for an internal memory allocator).

    From this point of view, CuDNN and cuBLAS are good examples. THC is nice, but it still encapsulates a memory allocator (which is sometimes needed for dynamic operators).

    Lack of Unified Operator Interface

    The second obstacle is mainly the lack of a common operator interface. This is a problem with CuDNN and THC that prevents reuse. Take CuDNN for example: each CuDNN API is a C function with its own interface, so to adopt an operator there needs to be one (or multiple) adapting function per operator.

    Consider instead a unified operator interface (the following is a mock design), where each TBlob is a reference to the data fields and shape, and every function gets registered to the registry with its name:

    // Mock design: a single imperative compute signature shared by all operators.
    using FCompute = std::function<void(
        array_view<TBlob> ins, array_view<TBlob> outs,
        const std::map<std::string, std::string>& kwargs, Stream* stream)>;
    

    Then it only takes one function to extract and reuse all the operators, and to automatically expose them to the front end. In MXNet, this even directly generates the symbolic counterpart from the same imperative operator, if a gradient is provided.

    Problem of One Unified Operator Interface

    There is always a flip side to the coin. Assume we go with a unified operator interface; as a matter of fact, that is what MXNet, TensorFlow, and Caffe have done. The problem now becomes: what should the interface look like? One trap that framework designers always fall into is thinking we need one interface that rules them all.

    Since one interface rules them all, we want to support all possible operators. What about the ones that need runtime memory allocation? Maybe add a memory allocator to the interface. What about the ones that are asynchronous? In the end, the interface has to include the memory allocator and the scheduling module in some way, and that reintroduces the "coupled operator data structure components" problem. The operator interface becomes deeply coupled with the rest of the framework and is no longer reusable.

    A Better Solution: A Few Unified Interfaces

    Can we get the best of both worlds: as few data structures and interfaces as possible, while still avoiding coupling to the allocator and scheduler as much as possible? I think the answer is yes, and we need to step back from the ideal of one interface that rules all operators.

    I can roughly categorize the operators into three types:

    • type 1, basic operators: the ones that can do shape inference based on input shapes, can take memory pointers and a stream, and go.
    • type 2, basic+ operators: same as basic operators, but also need to declare some additional resources (workspace).
    • type 3, complicated operators: the ones that require a runtime memory allocator and whose output shape depends on the content of the data.

    If we design for a general operator interface, the answer will usually look like type 3. However, types 1 and 2 dominate 90%+ of the major operators we are using. If we design one operator interface for each type, this problem is solved, and frameworks can pull in and interact with each type in their own way. It is much easier to do things like static memory planning if types 1 and 2 are explicitly introduced. This is the additional layer of wrapping on top of THC and CuDNN that is lacking so far.

    A registry system like NNVM could come in very handy here, to easily register this information and have it pulled out by the libraries.
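
    As a rough illustration of the idea (a hypothetical Python sketch, not NNVM's actual API), a registry maps operator names to their interface type and metadata, and a framework pulls out the entries it knows how to handle:

        # Hypothetical operator registry keyed by interface type.
        OP_REGISTRY = {}

        def register_op(name, op_type, fcompute, infer_shape=None, workspace_bytes=0):
            """op_type: 1 = basic, 2 = basic + workspace, 3 = needs runtime allocator."""
            OP_REGISTRY[name] = dict(type=op_type, fcompute=fcompute,
                                     infer_shape=infer_shape,
                                     workspace=workspace_bytes)

        # A framework that does static memory planning can pull out just the
        # type-1 and type-2 operators and handle type-3 operators separately.
        static_ops = {k: v for k, v in OP_REGISTRY.items() if v["type"] in (1, 2)}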

    The Hope

    I have always hoped that there would be a minimum set of operator interface standards in C++ that can be shared across libraries. I think we have a good idea of what the solution looks like. While most systems tend to become opaque and coupled, I think this kind of transparency can help the community evolve in a healthy way. That being said, it always takes effort to make these things happen: an open discussion on what the interfaces should be, and commitment from framework builders. I would really love to see this happen, and that is why I spent more than an hour writing this.

    Unfortunately, most frameworks already have a kind of "sufficient collection of operators", so a unified operator interface will contribute little to each framework in terms of usability in the short term. Naturally, this would be given lower priority. That is why commitment is needed to bring this about for the longer-term benefit.

  • [Discussion] MXNet 2.0 Roadmap (was: APIs that might be a good idea to break in 2.0)

    Let's start a discussion here about the roadmap towards MXNet 2.0. We are looking for:

    • New features that are useful to your research and development.
    • Improvements and patches to existing features.
    • APIs that should be fixed.

    If you have any item that you'd like to propose for the roadmap, please do the following:

    • Create (or locate an existing) issue for the item, and note the issue number.
    • Comment in this issue: 1) the above issue number, 2) one sentence on what the item is about and why it's useful to you.
    • Indicate whether you'd be willing to help out on the item.

    Given that this would be a major release, we'd have the opportunity to make backward incompatible changes. This would allow us to visit some topics that require large changes such as dropping support for python2, transitioning fully to cmake, making the tensor library numpy-compatible, or even new programming models.
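
    For example, a NumPy-compatible tensor API could look like the following sketch (assuming an mx.np namespace mirroring NumPy, as proposed):

        from mxnet import np, npx
        npx.set_np()  # switch to NumPy-compatible semantics

        x = np.ones((2, 3))
        y = (x * 2).sum(axis=1)  # NumPy-style operators and reductions
        print(y)                 # array([6., 6.])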


    Now that we have decided to follow semantic versioning for releases, it would be a good idea to coordinate features and API changes to make the best use of the next major release. Thus, I propose that we use this issue to track the APIs we'd like to change in the next major version.

    The candidates I've collected so far:

    1. remove legacy ops such as batch-norm v1
    2. reorganizing namespace for utility functions such as download in #9671
    3. the transform argument in the constructors of the existing vision dataset APIs.

    Once more such requests come in, I will try to organize these API-breaking requests better.

  • [FEATURE] Enable dynamic linking with MKL and compiler based OpenMP

    OneMKL 2021.3 fixed OpenMP linking when using the SDL with MKL_THREADING_LAYER set to GNU.

    Description

    OneMKL 2021.3 fixes the issue described here. Thus, it enables linking against the MKL dynamic libraries without having multiple OpenMP runtimes in a single process. This is possible by linking MXNet with the oneMKL Single Dynamic Library (SDL) and then setting the appropriate threading layer at run time via the function mkl_set_threading_layer() (or through the environment variable MKL_THREADING_LAYER).
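
    As a usage sketch (an assumption about a typical deployment, not part of the original report), the threading layer can also be selected from Python through the environment variable, as long as it is set before MXNet loads oneMKL:

        import os
        os.environ["MKL_THREADING_LAYER"] = "GNU"  # or "INTEL", "SEQUENTIAL"

        import mxnet as mx  # oneMKL now resolves to the chosen OpenMP runtime
        print(mx.nd.ones((2, 2)) * 2)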

    Connected with: [#19610], [#18255] and [#17794].

    Changes

    1. Add oneMKL 2021.3 to the Ubuntu docker images.
    2. Enable MKL SDL (MKL_USE_SINGLE_DYNAMIC_LIBRARY) as the default linking mode when the MKL version is newer than 2021.2 and static linking is turned off (bug no. MKLD-11109; see the oneMKL release notes).
    3. Otherwise, the MKL static libraries are used to build the MXNet library.
    4. Add support for the new oneMKL file structure in the FindBLAS.cmake file (the fix comes from CMake 3.20: #6210).

    Comments

    Should using oneMKL 2021.3 as the recommended version be mentioned in the documentation?

  • [RFC] Build with MKL-DNN (or DNNL)

    From https://github.com/apache/incubator-mxnet/issues/19610:

    Intel MKL-DNN was renamed to DNNL in its v1.1 release. Since then, the MXNet community has been working on the transition to DNNL to leverage the latest features and optimizations from the library. That includes using the string "DNNL" or "dnnl" for future development and communication. We propose to promote the flag "USE_DNNL" starting from MXNet 2.0 and to start deprecating "USE_MKLDNN" at the same time.

    DNNL source code resides in the 3rdparty/mkldnn folder of the MXNet repository and is released and distributed along with MXNet source code. If one wants to build MXNet with DNNL to accelerate execution on Intel CPUs, she/he needs to enable -DUSE_DNNL=ON in CMake. However, this flag has been set to ON by default for all platforms except edge devices. Conversely, to disable the DNNL acceleration, one needs to set -DUSE_DNNL=OFF explicitly on the CMake command line or in the CMake configuration file.

    As both MXNet and DNNL are under quick development with different release cadences, we decided to link the DNNL library into MXNet statically to avoid mis-linking in the user's environment. Given this, we need to set DNNL_LIBRARY_TYPE to STATIC when building DNNL. Some additional flags for building DNNL:

    • DNNL_CPU_RUNTIME: needs to be set to SEQ explicitly when USE_OPENMP=OFF.
    • DNNL_ARCH_OPT_FLAGS: pass compiler options to this build flag as a string, e.g. -march or -mtune for GCC.
    • MKLDNN_BUILD_TESTS and MKLDNN_BUILD_EXAMPLES: we set these two flags to OFF to speed up compilation.

    One thing that needs to be taken care of is that the headers dnnl_config.h and dnnl_version.h are generated dynamically during compilation and copied to the installation destination when calling make install. That means these two headers are not distributed with the DNNL source code; downstream projects that include them need to find them in the installation path rather than in the source code path.

    I prepared three commits regarding this point of the main RFC:

    1. Changing the USE_MKLDNN flag name to USE_ONEDNN, to make it consistent with the actual library name. I believe this commit is complete and, if there is such a will, it can be merged into master.

    2. Changing the MXNET_USE_MKLDNN* flag names to MXNET_USE_ONEDNN*, also for consistency. This commit changes the inner MXNet flags so that they are consistent with the actual library name. To avoid creating an even bigger mixture of the mkldnn/dnnl/onednn acronyms, I believe it should be accompanied by another commit changing the acronyms used in MXNet function names and comments regarding this particular library to oneDNN.

    3. Changing the 3rdparty/mkldnn folder name to 3rdparty/onednn, for consistency.

  • [MXNET-500]Test cases improvement for MKLDNN on Gluon

    Description

    This PR is a "follow-up" of the previously merged #10764. In this PR, the following are covered:

    1. Refine the cases for nn.Conv2D and change the input shape to hit the MKLDNN code path;
    2. Add more test cases covering other Gluon layers, such as BN, Dense/FC, Pooling, Deconv, etc., from the "MKLDNN-specialty" perspective;
    3. Data coverage cases for some Gluon layers, such as Conv2D, BN, Concat, etc.

    Checklist

    Essentials

    Please feel free to remove inapplicable items for your PR.

    • [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
    • [x] Changes are complete (i.e. I finished coding on this PR)
    • [x] All changes have test coverage:
    • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
    • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
    • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
    • [ ] Code is well-documented:
    • For user-facing API changes, API doc string has been updated.
    • For new C++ functions in header files, their functionalities and arguments are documented.
    • For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
    • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
    • [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

    Changes

    All the changes are reflected in tests/python/mkl/test_mkldnn.py

    Comments

    1. For the correctness check on Gluon computation, it follows the design used by tests/python/unittest/test_gluon.py; therefore, the helper functions defined in tests/python/unittest/common.py are also used. A minimal sketch of this style of check follows.
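
    For reference, a minimal sketch of this style of check (the shapes below are hypothetical stand-ins; the real cases live in tests/python/mkl/test_mkldnn.py):

        import mxnet as mx
        from mxnet.gluon import nn

        # Run a Gluon Conv2D forward pass with an input shape intended to hit
        # the MKLDNN code path, then sanity-check the output.
        layer = nn.Conv2D(channels=32, kernel_size=3, padding=1)
        layer.initialize()

        x = mx.nd.random.uniform(shape=(8, 16, 32, 32))  # NCHW layout
        y = layer(x)
        y.wait_to_read()
        assert y.shape == (8, 32, 32, 32)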
  • import Julia binding

    I imported it via git subtree to keep all the git history. As for the CI, I added an entry in runtime_function.sh: unittest_ubuntu_cpu_julia06. Please check it in commit b7d9731.

    See also #8727.

    cc @marcoabreu, @pluskid, @vchuravy

    TODO

    • [x] add license header to .jl file: 63ffca39
    • [ ] add releasing instruction to wiki
    • [ ] Jenkins doc build
  • Port convolutions to cuDNN v8 API

    Description

    This change ports Convolution and Deconvolution operations to cuDNN v8 API. Legacy API support is dropped, as per this RFC: https://github.com/apache/incubator-mxnet/issues/20618.

    The change also includes some general cuDNN v8 API support code, to be re-used later when more operations are ported to the v8 API.

    Finally, auto-tuning functionality is moved from cuDNN into MXNet, hence some memory management changes were required.

    Checklist

    Essentials

    • [X] Changes are complete (i.e. I finished coding on this PR)
    • [X] All changes have test coverage
    • [X] Code is well-documented
  • OpenMP Error

    Description

    The compiled MXNet links in duplicate OpenMP runtimes, both libomp and libiomp.

    Error Message

    OMP: Error #15: Initializing libiomp5.so, but found libomp.so already initialized.
    OMP: Hint This means that multiple copies of the OpenMP runtime have been linked
    into the program. That is dangerous, since it can degrade performance or cause
    incorrect results. The best thing to do is to ensure that only a single OpenMP
    runtime is linked into the process, e.g. by avoiding static linking of the
    OpenMP runtime in any library. As an unsafe, unsupported, undocumented
    workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to
    allow the program to continue to execute, but that may cause crashes or silently
    produce incorrect results. For more information, please see
    http://www.intel.com/software/products/support/.
    

    To Reproduce

    I have both the Intel MKL and MKLDNN libraries installed on Ubuntu 18.04. Using the following config to compile MXNet leads to the error shown above.

    cmake -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -GNinja ..
    ninja -v
    

    What have you tried to solve it?

    After I deleted 3rdparty/openmp and recompiled MXNet, this error no longer occurred.
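
    For anyone who cannot rebuild immediately, the unsafe workaround named in the error message can also be applied from Python before MXNet is imported (debugging only, per Intel's warning quoted above):

        import os
        # Unsafe, unsupported workaround quoted in the OMP error message above:
        os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

        import mxnet as mx  # must be imported after the variable is set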

    Environment

    Ubuntu 18.04, with the Intel MKL and MKLDNN libraries installed.

  • [Discussion] 1.5.1 Patch Release

    Let's start a discussion here about the known issues with 1.5.0 to put into a patch release.

    • Create (or locate an existing) issue/pull request for the item, and note the issue/pull request number.
    • Comment in this issue: 1) the above number, 2) one sentence on what the item is about and why it's important.
    • Indicate whether you'd be willing to help out on the item.
    • Share the ETA if you're driving the item and have a guesstimate of when it will be done.

    cc @apache/mxnet-committers

  • [v0.9.3] Amalgamation for Android broken

    Amalgamation for Android is still broken in the recent release:

    from mxnet_predict0.cc:3:
    [...]/mxnet/mxnet/amalgamation/../dmlc-core/include/dmlc/logging.h:18:22: fatal error: execinfo.h: No such file or directory
     #include <execinfo.h>
                          ^
    compilation terminated.
    make: *** [mxnet_predict0.d] Error 1
    

    Commenting out that #include <execinfo.h> leads to a further error:

    In file included from mxnet_predict0.cc:4:0:
    [...]/mxnet/amalgamation/../src/ndarray/ndarray.cc:16:30: fatal error: opencv2/opencv.hpp: No such file or directory
     #include <opencv2/opencv.hpp>
                                  ^
    compilation terminated.
    make: *** [mxnet_predict0.d] Error 1
    

    It looks like the USE_OPENCV = 0 is being ignored?

  • Can't get mxnet function decorators

    Description

    I want to get MXNet function decorators in Python. I can get the decorators for TensorFlow as follows.

    Given that we have the following tensorflow API:

    tf.math.floor(2.5)
    

    When I run the following code, the function object is retrieved from inside the tensorflow module:

    import tensorflow as tf

    APIname = "tf.math.floor"
    api_split = APIname.split('.')
    func_name = api_split[-1]

    module_obj = tf
    if len(api_split) > 1:
        # walk the dotted path, skipping the root 'tf' module itself
        for module_name in api_split[1:-1]:
            module_obj = getattr(module_obj, module_name)
    myfunction = getattr(module_obj, func_name)
    

    And the output is:

    (screenshot: the retrieved TensorFlow function shows its decorators)

    As you can see, I have the decorators for the function. Now for MXNet, I have the following code snippet.

    Given that we have the following mxnet API:

    from mxnet import ndarray 
    x = ndarray.ones((2,3))
    

    When I run the following code, the function object is retrieved from inside the ndarray module:

    from mxnet import ndarray

    APIname = "ndarray.ones"
    api_split = APIname.split('.')
    func_name = api_split[-1]

    module_obj = ndarray
    if len(api_split) > 1:
        # walk the dotted path, skipping the root 'ndarray' module itself
        for module_name in api_split[1:-1]:
            module_obj = getattr(module_obj, module_name)
    myfunction = getattr(module_obj, func_name)
    

    The output is:

    (screenshot: the retrieved MXNet function shows no decorators)

    As you can see, there is no decorator for the function. Any ideas? Thanks.

    References

    https://fossies.org/linux/tensorflow/tensorflow/python/util/tf_decorator.py

  • Graduation Tasks

    The ASF board has approved a resolution to graduate MXNet into a full top level project. Thanks to everyone for your help to get to this point.

    To transition from the Apache Incubator to a new TLP, there are a few action items we need to complete. These are identified by https://incubator.apache.org/guides/transferring.html#life_after_graduation.

    If you can help with these tasks, please respond in this issue.

    • Update Incubator status records
      • [x] Update the incubator status file (in svn) - @josephevans
      • [x] Update podling status page - @josephevans
    • Source repo changes
      • [x] Create ASF Infra TLP request - @josephevans - https://issues.apache.org/jira/browse/INFRA-23856
      • [x] Git repos will be renamed to drop the "incubator-" prefix; notify developers to change their remotes
      • [x] Post announcement to dev list telling everyone the repo is about to be moved
      • [x] Post an announcement containing instructions for developers describing how to change their git remotes.
      • [x] Update website, Jenkins, wikis, pom.xml, and other resources to point to the new repository location - https://github.com/apache/incubator-mxnet/pull/21148
    • Websites
      • [x] If you have any fully qualified links to your podling.incubator.apache.org on your website change them - https://github.com/apache/incubator-mxnet/pull/21148
    • Mailing Lists
      • [ ] If you use podling.incubator.apache.org format email addresses, please start using podling.apache.org
      • [ ] Check project-private mailing list membership. Mentors should be allowed to remain if they wish to do so. The subscriber list should otherwise match that on the resolution. See this and the EZMLM "Moderator’s and Administrator’s Manual".
      • [ ] Update mail addresses including: issue tracking messages (see administration documentation)
      • [ ] Double-check that all of your lists have sufficient active moderators.
    • Issue Tracking
      • [ ] Ask infra to move the podling to its own top level category in JIRA, if using JIRA (I think we previously used JIRA)
    • Distribution mirrors
      • [ ] Create a new distribution area (dist.apache.org): release/${project} and dev/${project} folders can be created by PMC members, and new releases will go here. Need to decide what to do with previous releases under incubator (see the distribution mirrors section of the transfer guide).
  • Gluon RNN cannot perform deferred initialization when specifying sequence_length parameter in the forward process

    Description

    from mxnet import nd
    from mxnet.gluon import rnn
    
    # create a simple GRU network
    net = rnn.GRU(hidden_size=1, num_layers=2, bidirectional=True, layout="NTC", use_sequence_length=True)
    net.initialize()
    
    x = nd.random.uniform(shape=(2, 3, 2))
    valid_length = nd.array([1, 2])
    
    # pass the sequence_length parameter
    print(net(x, sequence_length=valid_length))
    

    Error Message

    ---------------------------------------------------------------------------
    DeferredInitializationError               Traceback (most recent call last)
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/gluon/block.py:1495, in HybridBlock.forward(self, x, *args)
       1494 try:
    -> 1495     params = {k: v.data(ctx) for k, v in self._reg_params.items()}
       1496 except DeferredInitializationError:
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/gluon/block.py:1495, in <dictcomp>(.0)
       1494 try:
    -> 1495     params = {k: v.data(ctx) for k, v in self._reg_params.items()}
       1496 except DeferredInitializationError:
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/gluon/parameter.py:574, in Parameter.data(self, ctx)
        571     raise RuntimeError("Cannot return a copy of Parameter '%s' on ctx %s via data() " \
        572                        "because its storage type is %s. Please use row_sparse_data() " \
        573                        "instead." % (self.name, str(ctx), self._stype))
    --> 574 return self._check_and_get(self._data, ctx)
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/gluon/parameter.py:230, in Parameter._check_and_get(self, arr_list, ctx)
        229 if self._deferred_init:
    --> 230     raise DeferredInitializationError(
        231         "Parameter '%s' has not been initialized yet because initialization was " \
        232         "deferred. Actual initialization happens during the first forward pass. " \
        233         "Please pass one batch of data through the network before accessing Parameters. " \
        234         "You can also avoid deferred initialization by specifying in_units, " \
        235         "num_features, etc., for network layers."%(self.name))
        236 raise RuntimeError(
        237     "Parameter '%s' has not been initialized. Note that " \
        238     "you should initialize parameters and create Trainer " \
        239     "with Block.collect_params() instead of Block.params " \
        240     "because the later does not include Parameters of " \
        241     "nested child Blocks"%(self.name))
    
    DeferredInitializationError: Parameter 'gru0_l0_i2h_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.
    
    During handling of the above exception, another exception occurred:
    
    MXNetError                                Traceback (most recent call last)
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/gluon/block.py:1192, in HybridBlock._deferred_infer_shape(self, *args)
       1191 try:
    -> 1192     self.infer_shape(*args)
       1193 except Exception as e:
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/gluon/block.py:1410, in HybridBlock.infer_shape(self, *args)
       1409 """Infers shape of Parameters from inputs."""
    -> 1410 self._infer_attrs('infer_shape', 'shape', *args)
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/gluon/block.py:1394, in HybridBlock._infer_attrs(self, infer_fn, attr, *args)
       1393 """Generic infer attributes."""
    -> 1394 inputs, out = self._get_graph(*args)
       1395 args, _ = _flatten(args, "input")
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/gluon/block.py:1060, in HybridBlock._get_graph(self, *args)
       1059 with self.name_scope():
    -> 1060     out = self.hybrid_forward(symbol, *grouped_inputs, **params)  # pylint: disable=no-value-for-parameter
       1061 out, self._out_format = _flatten(out, "output")
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/gluon/rnn/rnn_layer.py:254, in _RNNLayer.hybrid_forward(self, F, inputs, states, sequence_length, **kwargs)
        251             raise ValueError(
        252                 "Invalid recurrent state shape. Expecting %s, got %s."%(
        253                     str(info['shape']), str(state.shape)))
    --> 254 out = self._forward_kernel(F, inputs, states, sequence_length, **kwargs)
        256 # out is (output, state)
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/gluon/rnn/rnn_layer.py:288, in _RNNLayer._forward_kernel(self, F, inputs, states, sequence_length, **kwargs)
        287 rnn_fn = F.npx.rnn if is_np_array() else F.RNN
    --> 288 rnn = rnn_fn(inputs, params, *rnn_args, use_sequence_length=self._use_sequence_length,
        289              state_size=self._hidden_size, projection_size=self._projection_size,
        290              num_layers=self._num_layers, bidirectional=self._dir == 2,
        291              p=self._dropout, state_outputs=True, mode=self._mode,
        292              lstm_state_clip_min=self._lstm_state_clip_min,
        293              lstm_state_clip_max=self._lstm_state_clip_max,
        294              lstm_state_clip_nan=self._lstm_state_clip_nan)
        296 if self._mode == 'lstm':
    
    File <string>:199, in RNN(data, parameters, state, state_cell, sequence_length, state_size, num_layers, bidirectional, mode, p, state_outputs, projection_size, lstm_state_clip_min, lstm_state_clip_max, lstm_state_clip_nan, use_sequence_length, name, attr, out, **kwargs)
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/_ctypes/symbol.py:143, in _symbol_creator(handle, args, kwargs, keys, vals, name, is_np_op, output_is_list)
        142 elif kwargs:
    --> 143     s._compose(name=name, **kwargs)
        144 else:
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/symbol/symbol.py:507, in Symbol._compose(self, *args, **kwargs)
        506     args = c_handle_array(args)
    --> 507 check_call(_LIB.MXSymbolCompose(
        508     self.handle, name, num_args, keys, args))
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/base.py:246, in check_call(ret)
        245 if ret != 0:
    --> 246     raise get_last_ffi_error()
    
    MXNetError: Traceback (most recent call last):
      ... (92 Python interpreter frames, (99) through (8), elided) ...
      [bt] (7) /root/miniconda3/envs/autogluon/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(+0x13c95) [0x7fc8a64cbc95]
      [bt] (6) /root/miniconda3/envs/autogluon/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(_ctypes_callproc+0x319) [0x7fc8a64cb1e9]
      [bt] (5) /root/miniconda3/envs/autogluon/lib/python3.8/lib-dynload/../../libffi.so.7(+0x6067) [0x7fc8a64b2067]
      [bt] (4) /root/miniconda3/envs/autogluon/lib/python3.8/lib-dynload/../../libffi.so.7(+0x69dd) [0x7fc8a64b29dd]
      [bt] (3) /root/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/libmxnet.so(NNSymbolCompose+0x1c5) [0x7fc8589fa8c5]
      [bt] (2) /root/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/libmxnet.so(nnvm::Symbol::Compose(dmlc::array_view<nnvm::Symbol const*> const&, std::unordered_map<std::string, nnvm::Symbol const*, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, nnvm::Symbol const*> > > const&, std::string const&)+0x1ba5) [0x7fc858a0f875]
      [bt] (1) /root/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/libmxnet.so(nnvm::KeywordArgumentMismatch(char const*, std::vector<std::string, std::allocator<std::string> > const&, dmlc::array_view<std::string> const&)+0x1ff) [0x7fc858a128ff]
      [bt] (0) /root/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7fc8539b309f]
    [14:28:38] /work/mxnet/3rdparty/tvm/nnvm/src/core/symbolic.cc:90: Symbol.ComposeKeyword argument name state_cell not found.
    Candidate arguments:
            [0]data
            [1]parameters
            [2]state
            [3]sequence_length
    
    
    
    During handling of the above exception, another exception occurred:
    
    ValueError                                Traceback (most recent call last)
    Input In [1], in <cell line: 10>()
          7 x = nd.random.uniform(shape=(2, 3, 2))
          8 valid_length = nd.array([1, 2])
    ---> 10 print(net(x, sequence_length=nd.array([1, 2])))
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/gluon/rnn/rnn_layer.py:240, in _RNNLayer.__call__(self, inputs, states, sequence_length, **kwargs)
        237     states = [states]
        239 if self._use_sequence_length:
    --> 240     return super(_RNNLayer, self).__call__(inputs, states, sequence_length, **kwargs)
        241 else:
        242     return super(_RNNLayer, self).__call__(inputs, states, **kwargs)
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/gluon/block.py:825, in Block.__call__(self, *args)
        822 for hook in self._forward_pre_hooks.values():
        823     hook(self, args)
    --> 825 out = self.forward(*args)
        827 for hook in self._forward_hooks.values():
        828     hook(self, args, out)
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/gluon/block.py:1497, in HybridBlock.forward(self, x, *args)
       1495     params = {k: v.data(ctx) for k, v in self._reg_params.items()}
       1496 except DeferredInitializationError:
    -> 1497     self._deferred_infer_shape(x, *args)
       1498     for _, v in self.params.items():
       1499         v._finish_deferred_init()
    
    File ~/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/gluon/block.py:1196, in HybridBlock._deferred_infer_shape(self, *args)
       1193 except Exception as e:
       1194     error_msg = "Deferred initialization failed because shape"\
       1195                 " cannot be inferred. {}".format(e)
    -> 1196     raise ValueError(error_msg)
    
    ValueError: Deferred initialization failed because shape cannot be inferred. Traceback (most recent call last):
      ... (backtrace frames identical to the traceback above, elided) ...
    [14:28:38] /work/mxnet/3rdparty/tvm/nnvm/src/core/symbolic.cc:90: Symbol.ComposeKeyword argument name state_cell not found.
    Candidate arguments:
            [0]data
            [1]parameters
            [2]state
            [3]sequence_length
    

    To Reproduce

    Just run the code in the first section.

    What have you tried to solve it?

    from mxnet import nd
    from mxnet.gluon import rnn
    
    # create a simple GRU network
    net = rnn.GRU(hidden_size=1, num_layers=2, bidirectional=True, layout="NTC", use_sequence_length=True)
    net.initialize()
    
    x = nd.random.uniform(shape=(2, 3, 2))
    valid_length = nd.array([1, 2])
    
    # Trigger deferred initialization by calling the network once. The call itself
    # fails (we constructed the network with use_sequence_length=True but pass no
    # sequence_length here), yet deferred initialization still completes.
    try:
        net(x)
    except Exception:
        pass
    
    # This call now succeeds because the parameters were initialized above
    print(net(x, sequence_length=valid_length))
    

    Another way to avoid this bug is to pass input_size at construction time, so no deferred initialization is needed:

    from mxnet import nd
    from mxnet.gluon import rnn
    
    # create a simple GRU network, add input_size parameter to avoid deferred initialization
    net = rnn.GRU(hidden_size=1, num_layers=2, bidirectional=True, layout="NTC", use_sequence_length=True, input_size=2)
    net.initialize()
    
    x = nd.random.uniform(shape=(2, 3, 2))
    valid_length = nd.array([1, 2])
    
    print(net(x, sequence_length=valid_length))
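
    For completeness, a quick way to confirm that the failed forward call really did complete deferred initialization is to inspect the parameter shapes afterwards. The following is a minimal sketch for illustration (it relies only on Gluon's collect_params), not part of the original report:

    from mxnet import nd
    from mxnet.gluon import rnn
    
    net = rnn.GRU(hidden_size=1, num_layers=2, bidirectional=True,
                  layout="NTC", use_sequence_length=True)
    net.initialize()
    
    # This call fails (no sequence_length given), but shape inference runs first
    try:
        net(nd.random.uniform(shape=(2, 3, 2)))
    except Exception:
        pass
    
    # All parameter shapes are now concrete, confirming deferred init finished
    for name, param in net.collect_params().items():
        print(name, param.shape)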
    

    Environment

    We recommend using our script for collecting the diagnostic information with the following command curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3

    Environment Information

    MXNet 1.9.1, Python 3.8, macOS and Ubuntu

    # Paste the diagnose.py command output here
    

    (autogluon) /mnt/workspace> curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3
    ----------Python Info----------
    Version      : 3.8.13
    Compiler     : GCC 7.5.0
    Build        : ('default', 'Mar 28 2022 11:38:47')
    Arch         : ('64bit', 'ELF')
    ------------Pip Info-----------
    Version      : 22.1.2
    Directory    : /root/miniconda3/envs/autogluon/lib/python3.8/site-packages/pip
    ----------MXNet Info-----------
    Version      : 1.9.1
    Directory    : /root/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet
    Commit hash file "/root/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
    Library      : ['/root/miniconda3/envs/autogluon/lib/python3.8/site-packages/mxnet/libmxnet.so']
    Build features:
    ✔ CUDA
    ✔ CUDNN
    ✔ NCCL
    ✔ CUDA_RTC
    ✖ TENSORRT
    ✔ CPU_SSE
    ✔ CPU_SSE2
    ✔ CPU_SSE3
    ✖ CPU_SSE4_1
    ✖ CPU_SSE4_2
    ✖ CPU_SSE4A
    ✖ CPU_AVX
    ✖ CPU_AVX2
    ✔ OPENMP
    ✖ SSE
    ✖ F16C
    ✖ JEMALLOC
    ✔ BLAS_OPEN
    ✖ BLAS_ATLAS
    ✖ BLAS_MKL
    ✖ BLAS_APPLE
    ✔ LAPACK
    ✔ MKLDNN
    ✔ OPENCV
    ✖ CAFFE
    ✖ PROFILER
    ✔ DIST_KVSTORE
    ✖ CXX14
    ✖ INT64_TENSOR_SIZE
    ✔ SIGNAL_HANDLER
    ✖ DEBUG
    ✖ TVM_OP
    ----------System Info----------
    Platform     : Linux-4.19.91-009.ali4000.alios7.x86_64-x86_64-with-glibc2.17
    system       : Linux
    node         : dsw37054-6dbb9d6d5b-5tjcl
    release      : 4.19.91-009.ali4000.alios7.x86_64
    version      : #1 SMP Mon Jan 25 10:47:38 CST 2021
    ----------Hardware Info----------
    machine      : x86_64
    processor    : x86_64
    Architecture:        x86_64
    CPU op-mode(s):      32-bit, 64-bit
    Byte Order:          Little Endian
    CPU(s):              64
    On-line CPU(s) list: 15,17,19,21,23,25,27
    Off-line CPU(s) list: 0-14,16,18,20,22,24,26,28-63
    Thread(s) per core:  0
    Core(s) per socket:  32
    Socket(s):           1
    NUMA node(s):        1
    Vendor ID:           GenuineIntel
    CPU family:          6
    Model:               85
    Model name:          Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
    Stepping:            4
    CPU MHz:             2499.996
    BogoMIPS:            4999.99
    Hypervisor vendor:   KVM
    Virtualization type: full
    L1d cache:           32K
    L1i cache:           32K
    L2 cache:            1024K
    L3 cache:            33792K
    NUMA node0 CPU(s):   0-63
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat
    ----------Network Test----------
    Setting timeout: 10
    Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0033 sec, LOAD: 1.0720 sec.
    Error open Gluon Tutorial(en): http://gluon.mxnet.io, HTTP Error 404: Not Found, DNS finished in 0.4433908462524414 sec.
    Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)>, DNS finished in 0.7784233093261719 sec.
    Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.2974 sec, LOAD: 0.6419 sec.
    Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0341 sec, LOAD: 3.1754 sec.
    Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.5267374515533447 sec.
    ----------Environment----------
    KMP_DUPLICATE_LIB_OK="True"
    KMP_INIT_AT_FORK="FALSE"

  • USE_DIST_KVSTORE triggers "undefined reference to 'void mxnet::op::ElemwiseBinaryOp::DnsCsrDnsOp'" linker error

    USE_DIST_KVSTORE triggers "undefined reference to 'void mxnet::op::ElemwiseBinaryOp::DnsCsrDnsOp'" linker error

    Description

    After switching to Ubuntu 22.04 with the latest gcc/g++ v11 and CUDA 11.7 (NVIDIA driver 515.65.01, Tesla V100S GPUs), I tried to compile MXNet 1.9.1 for our new environment, since we also need the R package installed/updated there. Most of the MXNet build succeeds, but it stops when linking im2rec with the error message shown in the next section. Skipping the im2rec build leads to similar linker errors for other tools, for which I could not find a solution either. Applying the approaches from similar tickets such as https://github.com/apache/incubator-mxnet/pull/18761 and https://github.com/apache/incubator-mxnet/pull/18357 did not yield a fix for this issue.

    Any help in trying to solve this issue would be highly appreciated.

    Error Message

    $ cmake --build . --parallel 1
    Consolidate compiler generated dependencies of target objects
    [  8%] Built target objects
    [  8%] Built target libzmq-static
    Consolidate compiler generated dependencies of target dnnl_cpu_x64
    [ 27%] Built target dnnl_cpu_x64
    Consolidate compiler generated dependencies of target dnnl_common
    [ 32%] Built target dnnl_common
    Consolidate compiler generated dependencies of target dnnl_cpu
    [ 41%] Built target dnnl_cpu
    [ 41%] Built target dnnl
    Consolidate compiler generated dependencies of target intgemm
    [ 41%] Built target intgemm
    [ 42%] Built target libomp-needed-headers
    Consolidate compiler generated dependencies of target omp
    [ 45%] Built target omp
    Consolidate compiler generated dependencies of target dmlc
    [ 46%] Built target dmlc
    [ 46%] Built target proto_python
    Consolidate compiler generated dependencies of target pslite
    [ 47%] Built target pslite
    Consolidate compiler generated dependencies of target mxnet
    [ 94%] Built target mxnet
    Consolidate compiler generated dependencies of target customop_lib
    [ 94%] Built target customop_lib
    Consolidate compiler generated dependencies of target transposecsr_lib
    [ 94%] Built target transposecsr_lib
    Consolidate compiler generated dependencies of target transposerowsp_lib
    [ 95%] Built target transposerowsp_lib
    Consolidate compiler generated dependencies of target subgraph_lib
    [ 95%] Built target subgraph_lib
    Consolidate compiler generated dependencies of target pass_lib
    [ 95%] Built target pass_lib
    Consolidate compiler generated dependencies of target customop_gpu_lib
    [ 95%] Built target customop_gpu_lib
    Consolidate compiler generated dependencies of target im2rec
    [ 95%] Linking CXX executable im2rec
    /usr/bin/ld: libmxnet.so: undefined reference to `void mxnet::op::ElemwiseBinaryOp::DnsCsrDnsOp<mxnet::op::mshadow_op::plus>(mshadow::Stream<mshadow::gpu>*, nnvm::NodeAttrs const&, mxnet::OpContext const&, mxnet::NDArray const&, mxnet::NDArray const&, mxnet::OpReqType, mxnet::NDArray const&, bool)'
    /usr/bin/ld: libmxnet.so: undefined reference to `void mxnet::op::ElemwiseBinaryOp::DnsCsrDnsOp<mxnet::op::mshadow_op::minus>(mshadow::Stream<mshadow::gpu>*, nnvm::NodeAttrs const&, mxnet::OpContext const&, mxnet::NDArray const&, mxnet::NDArray const&, mxnet::OpReqType, mxnet::NDArray const&, bool)'
    collect2: error: ld returned 1 exit status
    gmake[2]: *** [CMakeFiles/im2rec.dir/build.make:130: im2rec] Error 1
    gmake[1]: *** [CMakeFiles/Makefile2:749: CMakeFiles/im2rec.dir/all] Error 2
    gmake: *** [Makefile:146: all] Error 2
    

    To Reproduce

    This is the config.cmake file we use to build MXNet 1.9.1 in our Ubuntu 22.04 environment:

    set(CMAKE_BUILD_TYPE "Distribution" CACHE STRING "Build type")
    set(CFLAGS "-mno-avx" CACHE STRING "CFLAGS")
    set(CXXFLAGS "-mno-avx" CACHE STRING "CXXFLAGS")
    set(USE_CUDA ON CACHE BOOL "Build with CUDA support")
    set(USE_CUDNN ON CACHE BOOL "Build with cuDNN support")
    set(USE_NCCL ON CACHE BOOL "Build with NCCL support")
    set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
    set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
    set(USE_MKL_IF_AVAILABLE OFF CACHE BOOL "Use Intel MKL if found")
    set(USE_MKLDNN ON CACHE BOOL "Build with MKL-DNN support")
    set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
    set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
    set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
    set(USE_F16C OFF CACHE BOOL "Build with x86 F16C instruction support")
    set(USE_LIBJPEG_TURBO ON CACHE BOOL "Build with libjpeg-turbo")
    set(USE_DIST_KVSTORE ON CACHE BOOL "Build with DIST_KVSTORE support")
    set(MXNET_CUDA_ARCH "5.0;6.0;7.0;8.0;8.6" CACHE STRING "Cuda architectures")
    set(CMAKE_CUDA_COMPILER "/usr/local/cuda-11.7/bin/nvcc" CACHE STRING "Cuda compiler")
    set(OPENMP_FILECHECK_EXECUTABLE "/usr/lib/llvm-14/bin/FileCheck")
    set(OPENMP_LLVM_LIT_EXECUTABLE "/usr/lib/llvm-14/build/utils/lit/lit.py")
    set(USE_CPP_PACKAGE ON CACHE BOOL "Build C++ Package")
    set(NCCL_ROOT "/usr/local/nccl" CACHE BOOL "NCCL install path. Supports autodetection.")
    

    Steps to reproduce


    1. cmake -DCMAKE_INSTALL_PREFIX=/usr/local/mxnet-1.9.1 ..
    2. cmake --build . --parallel 20

    What have you tried to solve it?

    1. Tried to apply fixes similar to those in https://github.com/apache/incubator-mxnet/pull/18357 and https://github.com/apache/incubator-mxnet/pull/18761, but to no avail.

    Environment

    n/a

  • Gradient calculation for recurrent operators is wrong

    Gradient calculation for recurrent operators is wrong

    This script creates an RNN operator and computes its input gradient 5 times, for sequence lengths 1, 2, 3, 4, 5. It then prints each gradient element at a fixed sequence position across all computed sequence lengths:

    import mxnet as mx
    from mxnet import autograd
    import numpy as np
    
    batch_size = 1
    data_len = 5
    input_size = 2
    output_size = 3
    
    param_shapes = {
    	'wx': [output_size, input_size], 
    	'ws': [output_size, output_size], 
    	'bx': [output_size],
    	'bs': [output_size]
    }
    fused_param_len = np.sum(
    	[np.prod(v) for v in param_shapes.values()]
    )
    
    shapes = {
    	'data': [data_len, batch_size, input_size], 
    	'par': [fused_param_len], 
    	's0': [1, batch_size, output_size]
    }
    
    sym = mx.symbol.RNN(
    	*[mx.symbol.Variable(name) for name in shapes.keys()],
    	state_size=output_size,
    	num_layers=1,
    	mode='rnn_tanh'
    )
    op = mx.ndarray.CachedOp(sym)
    
    args = [mx.np.random.uniform(size=shape, ctx=mx.cpu()) for shape in shapes.values()]
    
    def get_grad(seq_len):
    	input_data = args[0][:seq_len]
    	with autograd.record(train_mode=True):
    		input_data.attach_grad()
    		output = op(input_data, args[1], args[2], default_ctx=mx.cpu())
    	autograd.backward(output, head_grads=mx.np.ones([data_len, batch_size, output_size], ctx=mx.cpu()))
    	return input_data.grad
    
    results = []
    for i in range(1, 6):
    	print('**************')
    	print('Input gradient for sequence length = ' + str(i) + '\n')
    	results.append(get_grad(i))
    	print(results[-1])
    	print('\n')
    
    for i in range(4):
    	print('++++++++++++++')
    	print('Element #' + str(i) + ' of all input gradients')
    	for j in range(i, 5):
    		print('sequence length: ' + str(j+1) + ': ' + str(results[j][i]))
    	# [print('sequence length: ' + str(i+1) + ': ' + str(grad[i])) for grad in results[i:]]
    	print('\n')
    

    The output is:

    **************
    Input gradient for sequence length = 1
    
    [[[0.14385478 0.05408207]]]
    
    
    **************
    Input gradient for sequence length = 2
    
    [[[0.14385478 0.05408207]]
     [[0.01706791 0.00660894]]]
    
    
    **************
    Input gradient for sequence length = 3
    
    [[[0.14385478 0.05408207]]
     [[0.01706791 0.00660894]]
     [[0.0178871  0.00672178]]]
    
    
    **************
    Input gradient for sequence length = 4
    
    [[[0.14385478 0.05408207]]
     [[0.01706791 0.00660894]]
     [[0.0178871  0.00672178]]
     [[0.01958952 0.00729937]]]
    
    
    **************
    Input gradient for sequence length = 5
    
    [[[0.14385478 0.05408207]]
     [[0.01706791 0.00660894]]
     [[0.0178871  0.00672178]]
     [[0.01958952 0.00729937]]
     [[0.02612576 0.00999804]]]
    
    
    ++++++++++++++
    Element #0 of all input gradients
    sequence length: 1: [[0.14385478 0.05408207]]
    sequence length: 2: [[0.14385478 0.05408207]]
    sequence length: 3: [[0.14385478 0.05408207]]
    sequence length: 4: [[0.14385478 0.05408207]]
    sequence length: 5: [[0.14385478 0.05408207]]
    
    
    ++++++++++++++
    Element #1 of all input gradients
    sequence length: 2: [[0.01706791 0.00660894]]
    sequence length: 3: [[0.01706791 0.00660894]]
    sequence length: 4: [[0.01706791 0.00660894]]
    sequence length: 5: [[0.01706791 0.00660894]]
    
    
    ++++++++++++++
    Element #2 of all input gradients
    sequence length: 3: [[0.0178871  0.00672178]]
    sequence length: 4: [[0.0178871  0.00672178]]
    sequence length: 5: [[0.0178871  0.00672178]]
    
    
    ++++++++++++++
    Element #3 of all input gradients
    sequence length: 4: [[0.01958952 0.00729937]]
    sequence length: 5: [[0.01958952 0.00729937]]
    
    

    In the last 4 sections starting with ++++++++++++++, gradient elements at the same sequence position are identical across all 5 computations with sequence lengths 1 to 5, whenever the sequence is long enough to contain that element (e.g. the length-2 gradient obviously has no element #3). In other words, the RNN behaves as if the presence of later elements in the sequence had no effect on the gradient of earlier elements. This is clearly wrong: by the nature of recurrent computation, earlier elements in the sequence DO affect later ones, so gradient elements at the same sequence position should change when the sequence length differs. With a longer input sequence containing an additional element, the gradients of all earlier elements should receive an additional contribution from the new element, changing their values.

    This is not a direct comparison against a manually computed gradient, but the behavior above is enough to conclude that the gradients computed by this op are wrong. The same happens for every other setting of the operator's mode parameter, not only mode='rnn_tanh'.
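
    To make the argument concrete, the following is a minimal cross-check added for illustration (not part of the original report): unrolling a one-layer tanh RNN by hand with mxnet.autograd shows that the input gradient at the first time step does change as the sequence gets longer, unlike the output of the fused RNN operator above. The weight values here are arbitrary toy numbers.

    import mxnet as mx
    from mxnet import autograd
    
    # Toy weights; shapes follow the script above:
    # wx: [output_size, input_size], ws: [output_size, output_size]
    wx = mx.nd.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
    ws = mx.nd.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.3, 0.3, 0.3]])
    
    def grad_at_first_step(seq_len):
        x = mx.nd.ones((seq_len, 1, 2))
        x.attach_grad()
        with autograd.record():
            h = mx.nd.zeros((1, 3))
            total = mx.nd.zeros((1, 3))
            for t in range(seq_len):
                # One hand-written tanh RNN step: h_t = tanh(x_t * Wx^T + h_{t-1} * Ws^T)
                h = mx.nd.tanh(mx.nd.dot(x[t], wx.T) + mx.nd.dot(h, ws.T))
                total = total + h
            loss = total.sum()
        loss.backward()
        return x.grad[0]
    
    # The t=0 gradient picks up an extra contribution from every added element,
    # so these three printed values should all differ.
    for n in (1, 2, 3):
        print(n, grad_at_first_step(n))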

  • [BUGFIX] Reject duplicate input names in C Predict API

    [BUGFIX] Reject duplicate input names in C Predict API

    Description

    Avoid a potential uninitialized read when using the C Predict API with symbols where the same input name occurs on more than one node. Other parts of MXNet appear to treat this as an error, so this change makes the C API reject it with an error too.

    Since the C Predict API has been removed in MXNet v2, this PR targets the 1.x branch for merging.

    Checklist

    Essentials

    • [X] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
    • [X] Changes are complete (i.e. I finished coding on this PR)
    • [ ] All changes have test coverage
    • [ ] Code is well-documented

    I could not find any existing tests for the C Predict API. Please let me know if there's documentation that should be updated.

    Changes

    • [X] Reject symbols where a name occurs more than once in the C Predict API

    Comments

    It's possible to cause an uninitialized read in the C Predict API. If I construct a symbol and parameters file using the Python API as follows:

    import mxnet
    net = mxnet.symbol.Variable('a') + mxnet.symbol.Variable('a')
    mxnet.nd.save('foo-params', {})
    net.save('foo-symbol.json')
    

    Then substitute the contents of the generated files into this C++ code. This causes an uninitialized read and fails the final assertion.

    #include <mxnet/c_predict_api.h>
    #include <cassert>
    #include <cstddef>   // std::size_t
    #include <cstdint>   // uint32_t
    
    namespace {
    
    const char* SYMBOL_JSON = "{\"nodes\":[{\"op\":\"null\",\"name\":\"a\",\"inputs\":[]},{\"op\":\"null\",\"name\":\"a\",\"inputs\":[]},{\"op\":\"elemwise_add\",\"name\":\"_plus0\",\"inputs\":[[0,0,0],[1,0,0]]}],\"arg_nodes\":[0,1],\"node_row_ptr\":[0,1,2,3],\"heads\":[[2,0,0]],\"attrs\":{\"mxnet_version\":[\"int\",10901]}}"; 
    const char* PARAMS_BYTES = "\x12\x01\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0\x0";
    const std::size_t PARAMS_SIZE = 32;
    
    const char* INPUT_KEYS[] = { "a" };
    const std::size_t INPUT_KEYS_SIZE = 1;
    
    const uint32_t INPUT_SHAPE_INDPTR[] = { 0, 1 };
    const uint32_t INPUT_SHAPE_DATA[] = { 1 };
    
    }
    
    int main(int, char**) {
      PredictorHandle handle = nullptr;  
    
      int res = MXPredCreate(
        SYMBOL_JSON,
        static_cast<const void*>(PARAMS_BYTES),
        PARAMS_SIZE,
        1,
        0,
        INPUT_KEYS_SIZE,
        INPUT_KEYS,
        INPUT_SHAPE_INDPTR,
        INPUT_SHAPE_DATA,
        &handle);
      assert(res == 0);
    
      static const float INPUT_DATA[] = { 5. };
      res = MXPredSetInput(handle, "a", INPUT_DATA, 1);
      assert(res == 0);
    
      res = MXPredForward(handle);
      assert(res == 0);
    
      float output = 0.0;
      res = MXPredGetOutput(handle, 0, &output, 1);
      assert(res == 0);
    
      assert(output == 10.0);
      
      return 0;
    }
    

    If instead I generate the symbol as follows, the test passes:

    import mxnet
    var = mxnet.symbol.Variable('a')
    net =  var + var
    mxnet.nd.save('foo-params', {})
    net.save('foo-symbol.json')
    

    I believe the API should reject symbols where an input name appears more than once; other parts of the code already treat this as an error. For example, the following code fails with an assertion error:

    import mxnet
    net = mxnet.symbol.Variable('a') + mxnet.symbol.Variable('a')
    out = net.eval(ctx = mxnet.cpu(), a = mxnet.nd.array([[5.]]))
    

    This change makes the C Predict API reject such symbols. With this change the above C++ code fails on the first assertion.
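
    As a side note, callers can screen for this condition on the Python side before exporting a symbol. The following is a small defensive check, a sketch rather than part of this PR, built on Symbol.list_arguments:

    import collections
    import mxnet
    
    net = mxnet.symbol.Variable('a') + mxnet.symbol.Variable('a')
    
    # list_arguments returns one entry per input node, so a repeated
    # name shows up more than once (['a', 'a'] here)
    counts = collections.Counter(net.list_arguments())
    dupes = sorted(name for name, c in counts.items() if c > 1)
    if dupes:
        raise ValueError('duplicate input names: %s' % ', '.join(dupes))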
