SHARK - High Performance Machine Learning for CPUs, GPUs, Accelerators and Heterogeneous Clusters

SHARK

Communication Channels

Installation

Check out the code

git clone https://github.com/nod-ai/SHARK.git 

Set up your Python virtual environment and dependencies

# Setup venv and install necessary packages (torch-mlir, nodLabs/Shark, ...).
./setup_venv.sh
# Please activate the venv after installation.

Run a demo script

python -m shark.examples.resnet50_script --device="cpu" # Use gpu | vulkan

Run all tests on CPU/GPU/VULKAN

pytest

# If on Linux for quicker results:
pytest --workers auto

Shark Inference API

from shark_runner import SharkInference

shark_module = SharkInference(
    module,            # the torch.nn.Module to compile
    (input,),          # inputs to the model, as a tuple of torch tensors
    dynamic,           # bool: pass the input shapes as dynamic (True) or static (False)
    device,            # "cpu", "gpu", or "vulkan"
    tracing_required,  # bool: jit.trace the module with the given input,
                       # useful when jit.script doesn't work
)

result = shark_module.forward(inputs)
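
A minimal end-to-end sketch of the API above, assuming a torchvision ResNet-18 as the module (the torchvision model and weights are illustrative; keyword names follow the signature sketched above):

import torch
import torchvision.models as models
from shark_runner import SharkInference

model = models.resnet18(pretrained=True).eval()
input = torch.randn(1, 3, 224, 224)

# Compile for CPU with static input shapes, tracing the module with jit.trace.
shark_module = SharkInference(model, (input,), dynamic=False,
                              device="cpu", tracing_required=True)
result = shark_module.forward((input,))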

Model Tracking (Shark Inference)

| Hugging Face Models | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
| --- | --- | --- | --- | --- |
| BERT | ✔️ (JIT) | ✔️ | | |
| Albert | ✔️ (JIT) | ✔️ | | |
| BigBird | ✔️ (AOT) | | | |
| DistilBERT | ✔️ (AOT) | | | |
| GPT2 | (AOT) | | | |

| TORCHVISION Models | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
| --- | --- | --- | --- | --- |
| AlexNet | ✔️ (Script) | | | |
| DenseNet121 | ✔️ (Script) | | | |
| MNasNet1_0 | ✔️ (Script) | | | |
| MobileNetV2 | ✔️ (Script) | | | |
| MobileNetV3 | ✔️ (Script) | | | |
| Unet | (Script) | | | |
| Resnet18 | ✔️ (Script) | ✔️ | ✔️ | |
| Resnet50 | ✔️ (Script) | ✔️ | ✔️ | |
| Resnext50_32x4d | ✔️ (Script) | | | |
| ShuffleNet_v2 | (Script) | | | |
| SqueezeNet | ✔️ (Script) | ✔️ | ✔️ | |
| EfficientNet | ✔️ (Script) | | | |
| Regnet | ✔️ (Script) | | | |
| Resnest | (Script) | | | |
| Vision Transformer | ✔️ (Script) | | | |
| VGG 16 | ✔️ (Script) | | | |
| Wide Resnet | ✔️ (Script) | ✔️ | ✔️ | |
| RAFT | (JIT) | | | |

For more information, refer to the MODEL TRACKING SHEET.

Shark Trainer API

| Models | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
| --- | --- | --- | --- | --- |
| BERT | | | | |
| FullyConnected | ✔️ | ✔️ | | |

Related Project Channels

License

nod.ai SHARK is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.

Owner
nod.ai (High Performance Machine Learning)
Comments
  • minilm_jit example doesn't work

    (shark.venv) [email protected]:~/github/dshark$ python -m  shark.examples.minilm_jit
    /home/a/github/dshark/shark.venv/lib/python3.7/site-packages/torch/nn/modules/module.py:1403: UserWarning: positional arguments and argument "destination" are deprecated. nn.Module.state_dict will not accept them in the future. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
      " and ".join(warn_msg) + " are deprecated. nn.Module.state_dict will not accept them in the future. "
    Some weights of BertForSequenceClassification were not initialized from the model checkpoint at microsoft/MiniLM-L12-H384-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
    To disable this warning, you can either:
            - Avoid using `tokenizers` before the fork if possible
            - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
    Target triple found:x86_64-linux-gnu
    huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
    To disable this warning, you can either:
            - Avoid using `tokenizers` before the fork if possible
            - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
    (shark.venv) [email protected]:~/github/dshark$    
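
    Independent of the failure itself, the tokenizers fork warning can be silenced exactly as the log suggests, by setting the environment variable before running the example:

    export TOKENIZERS_PARALLELISM=false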
    
  • undefined symbol: _ZN3c1022getCustomClassTypeImplERKSt10type_index when running resnet50_script.py

    I just made a fresh Python venv and followed the readme instructions to run resnet50_script.py.

    curl -O https://raw.githubusercontent.com/nod-ai/SHARK/main/shark/examples/shark_inference/resnet50_script.py
    #Install deps for test script
    pip install pillow requests tqdm torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
    python ./resnet50_script.py --device="cpu"  #use cuda or vulkan or metal 
    

    I got this error:

    Traceback (most recent call last):
      File "./resnet50_script.py", line 7, in <module>
        from shark.shark_inference import SharkInference
      File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/shark_inference.py", line 12, in <module>
        from shark.torch_mlir_utils import get_torch_mlir_module, run_on_refbackend
      File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/torch_mlir_utils.py", line 22, in <module>
        from torch_mlir.dialects.torch.importer.jit_ir import (
      File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/__init__.py", line 13, in <module>
        from torch_mlir.dialects.torch.importer.jit_ir import ClassAnnotator, ModuleBuilder
      File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/dialects/torch/importer/jit_ir/__init__.py", line 14, in <module>
        from ....._mlir_libs._jit_ir_importer import *
    ImportError: /home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/_mlir_libs/_jit_ir_importer.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c1022getCustomClassTypeImplERKSt10type_index
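
    The missing symbol demangles to c10::getCustomClassTypeImpl(std::type_index const&), which lives in libtorch, so this likely indicates an ABI mismatch between the installed torch and torch-mlir wheels (an inference from the symbol name, not confirmed in this thread). Comparing the installed versions is a cheap first check:

    pip show torch torch-mlir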
    
  • Add the option to use tuned model in shark_runner

    • Add a flag to load a pre-tuned model config file in shark_runner; nothing changes in the inference API.
    • The example is made for the minilm model with the TensorFlow frontend on GPU. To run the example: python -m shark.examples.shark_inference.minilm_tf --device="gpu" --model_config_path=shark/examples/shark_inference/minilm_tf_gpu_config.json
    • The config file (example/shark-inference/.json) does not work for the Torch frontend directly: torch-mlir uses linalg.batch_matmul (instead of matmul) with the first dimension as 1, so one needs to generate a new config file that prepends 1 to tile_sizes (see the sketch after this list). Further testing is needed for model annotation with the Torch frontend.
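
    A minimal sketch of that tile_sizes rewrite, assuming the config is a JSON list of op entries that each carry a "tile_sizes" list (the schema and the output file name are assumptions):

    import json

    # Load the GPU-tuned config that was generated for the TF frontend.
    with open("shark/examples/shark_inference/minilm_tf_gpu_config.json") as f:
        config = json.load(f)

    # torch-mlir lowers these matmuls to linalg.batch_matmul with a leading
    # batch dimension of 1, so prepend 1 to every tile_sizes entry.
    for entry in config:
        if "tile_sizes" in entry:
            entry["tile_sizes"] = [1] + entry["tile_sizes"]

    # Write a Torch-frontend variant of the config.
    with open("minilm_torch_gpu_config.json", "w") as f:
        json.dump(config, f, indent=2)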
  • TorchMLIR eager mode with IREE backend

    Eager mode is hidden behind SharkMode (see shark/examples/eager_mode.py).

    TorchMLIRTensor([[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]], backend=EagerModeRefBackend)
    TorchMLIRTensor([[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]], backend=EagerModeRefBackend)
    

    Note also that you need https://github.com/makslevental/torch-mlir/tree/add_device as your activated torch-mlir version.
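
    A minimal sketch of that flow, assuming the eager-mode tensor wrapper is importable from torch_mlir's eager_mode package on that branch (the module path is an assumption):

    import torch
    # Assumed import path on the add_device branch referenced above.
    from torch_mlir.eager_mode.torch_mlir_tensor import TorchMLIRTensor

    x = TorchMLIRTensor(torch.ones(10, 10))
    y = TorchMLIRTensor(torch.ones(10, 10))

    # Each aten op dispatches through torch-mlir and runs on the configured
    # backend, printing tensors tagged with the backend as shown above.
    print(x * y)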

  • Add shark_importer tflite module and albert_shark_test example

    shark_importer takes a web link and outputs SharkInference-compiled artifacts. Internally, it automatically downloads the model, detects the model type, and runs SharkInference, and it can do so for multiple models. The first version works for tflite; support for other frontends (tf/pytorch/jax) will come later. Example usages (also sketched in Python below):

    shark_importer(hf/openai/clip-vit-base-patch32) → search TF, Torch, JAX
    shark_importer(hf/openai/clip-vit-base-patch32, TF) → import the TF model from HF
    shark_importer(hf/openai/clip-vit-base-patch32, Torch)
    shark_importer(hf/openai/clip-vit-base-patch32, JAX)
    shark_importer(hf/openai/clip-vit-base-patch32, TF, precompiled=true, cpu) → VMFB
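
    Read as Python, the call shapes above might look like this (the import path, keyword names, and signature are assumptions; only the behavior is described in this PR):

    from shark.shark_importer import shark_importer  # assumed import path

    shark_importer("hf/openai/clip-vit-base-patch32")                    # search TF, Torch, JAX
    shark_importer("hf/openai/clip-vit-base-patch32", frontend="TF")     # import the TF model from HF
    shark_importer("hf/openai/clip-vit-base-patch32", frontend="Torch")
    shark_importer("hf/openai/clip-vit-base-patch32", frontend="JAX")
    # Precompiled VMFB targeting CPU.
    shark_importer("hf/openai/clip-vit-base-patch32", frontend="TF",
                   precompiled=True, device="cpu")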

  • ORT-HF Benchmark Integration

    • Add an HF Benchmarker class.
    • Add a sample that benchmarks an HF model.

    Example:

        python -m benchmarks.hf_model_benchmark --num_iterations=10 --model_name="microsoft/MiniLM-L12-H384-uncased"
    
  • "is_zero" is undefined running resnet50 script.

    To reproduce...

    Cloned the repo and tried running the examples, both resnet and minilm.

    I keep getting

    RuntimeError: required keyword attribute 'is_zero' is undefined
    

    Seems to have something to do with ModuleBuilder -> mb.import_module(module._c, class_annotator). Environment: the Apple Silicon M1 snapshot version of torch-mlir, running on an M1 MacBook with Python 3.9.

    Attached a screenshot of both resnet50_script and minilm.

    Disclaimer: new to torch-mlir

  • Add Shark Benchmark

    • Introduce SharkBenchmark, which benchmarks models on regular torch, shark-py, and shark-c.
    • Integrate iree-benchmark-module into Shark.
    • Refactor SharkRunner to expose low-level API control.

  • Increase IREE LLVM stack alloc limit for dynamic cpu case on Language Models.

    Adds a flag to the shark parser that increases the LLVMCPU stack allocation size limit from 32 KB to 128 KB. This fixes issues with running the dynamic case for language models on CPU.

  • add option to --save_mlir to pytest runs

    Add support for saving the MLIR files when running the tests.

    (new_dylib_venv) 139 [email protected]:~/github/shark$ IREE_SAVE_TEMPS=iree_temps_bert_dynamic  pytest tank/pytorch/bert_test.py::BertModuleTest::test_module_dynamic_cpu --save_mlir
    ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
    pytest: error: unrecognized arguments: --save_mlir
      inifile: /home/anush/github/shark/pytest.ini
      rootdir: /home/anush/github/shark
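
    The "unrecognized arguments" error means the new flag isn't registered with pytest yet; a minimal sketch of the registration this change needs in conftest.py (pytest_addoption is standard pytest API; only the option name comes from the command above):

    # conftest.py
    def pytest_addoption(parser):
        parser.addoption(
            "--save_mlir",
            action="store_true",
            default=False,
            help="Save the MLIR files produced while running the tests.",
        )

    Tests can then read the flag via request.config.getoption("--save_mlir").
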
  • fix xlm-roberta lowering

    pytest benchmarks/tests/test_benchmark.py::test_bench_xlm_roberta[False-cpu]

    fails with:

    ====================== short test summary info ======================
    FAILED benchmarks/tests/test_benchmark.py::test_bench_xlm_roberta[False-cpu] - OSError: Can't load tokenizer for 'xlm-roberta-base'...
    ======================== 1 failed in 12.31s =========================
    
  • Add more tflite examples for shark_importer

    Add all the available tflite examples that use shark_importer. Some of them have bugs inherited from the original iree/samples; most of the bugs arise because iree-tflite-tools do not support dynamic inputs right now. The bug output is left at the end of those files.

  • Use TempFileSaver API in IREE

    Use TempFileSaver to save each PyTest output artifact and recreate it if the test fails.

    It looks like you can also write your own TempFileSaver to do whatever you want (see the section just below that), as in this test: https://github.com/google/iree/blob/427a94a09be70631c5d9f89d12616bb7f1954257/compiler/src/iree/compiler/API/python/test/tools/compiler_core_test.py#L176-L199

    https://discord.com/channels/689900678990135345/689900680009482386/985726147448950805

  • HF transformers 4.19.x is broken

    (new_dylib_venv) [email protected]:~/github/shark$ pytest tank/pytorch/tests/resnet101_test.py::Resnet101ModuleTest::test_module_static_cpu
    ================================================================================================= test session starts ==================================================================================================
    platform linux -- Python 3.10.4, pytest-7.1.2, pluggy-1.0.0 -- /home/anush/github/shark/new_dylib_venv/bin/python3
    cachedir: .pytest_cache
    rootdir: /home/anush/github/shark, configfile: pytest.ini
    plugins: forked-1.4.0, xdist-2.5.0, typeguard-2.13.3
    collecting ... Fatal Python error: Aborted
    
    Current thread 0x00007efd4103a1c0 (most recent call first):
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 304 in _constant_eager_impl
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 279 in _constant_impl
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 267 in constant
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 343 in _constant_tensor_conversion_function
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 1623 in convert_to_tensor
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/profiler/trace.py", line 183 in wrapped
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 264 in args_to_matching_eager
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/gen_stateful_random_ops.py", line 77 in non_deterministic_ints_eager_fallback
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/gen_stateful_random_ops.py", line 50 in non_deterministic_ints
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/stateful_random_ops.py", line 80 in non_deterministic_ints
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/stateful_random_ops.py", line 381 in from_non_deterministic_state
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/generation_tf_utils.py", line 349 in TFGenerationMixin
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/generation_tf_utils.py", line 344 in <module>
      File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
      File "<frozen importlib._bootstrap_external>", line 883 in exec_module
      File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
      File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 41 in <module>
      File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
      File "<frozen importlib._bootstrap_external>", line 883 in exec_module
      File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
      File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/models/bert/modeling_tf_bert.py", line 38 in <module>
      File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
      File "<frozen importlib._bootstrap_external>", line 883 in exec_module
      File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
      File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
      File "<frozen importlib._bootstrap>", line 1050 in _gcd_import
      File "/usr/lib/python3.10/importlib/__init__.py", line 126 in import_module
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 872 in _get_module
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 862 in __getattr__
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 863 in __getattr__
      File "<frozen importlib._bootstrap>", line 1075 in _handle_fromlist
      File "/home/anush/github/shark/tank/pytorch/tests/test_utils.py", line 7 in <module>
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168 in exec_module
      File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
      File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
      File "/home/anush/github/shark/tank/pytorch/tests/resnet101_test.py", line 3 in <module>
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168 in exec_module
      File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
      File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
      File "<frozen importlib._bootstrap>", line 1050 in _gcd_import
      File "/usr/lib/python3.10/importlib/__init__.py", line 126 in import_module
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/pathlib.py", line 533 in import_path
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 608 in _importtestmodule
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 519 in _getobj
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 301 in obj
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 536 in _inject_setup_module_fixture
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 522 in collect
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in <lambda>
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 338 in from_call
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in pytest_make_collect_report
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 537 in collect_one_node
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 768 in collect
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in <lambda>
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 338 in from_call
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in pytest_make_collect_report
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 537 in collect_one_node
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 643 in perform_collect
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 332 in pytest_collection
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 321 in _main
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 268 in wrap_session
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 164 in main
      File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 187 in console_main
      File "/home/anush/github/shark/new_dylib_venv/bin/pytest", line 8 in <module>
    
    Extension modules: torch._C, torch._C._fft, torch._C._linalg, torch._C._nn, torch._C._sparse, torch._C._special, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, PIL._imaging, PIL._imagingft, google.protobuf.pyext._message, tensorflow.python.framework.fast_tensor_util, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy._lib._ccallback_c, scipy.sparse._sparsetools, scipy.sparse._csparsetools, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.strptime, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pandas._libs.ops, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg.cython_lapack, scipy.linalg._decomp_update, scipy.ndimage._nd_image, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, _ni_label, scipy.ndimage._ni_label, sentencepiece._sentencepiece (total: 116)
    Aborted (core dumped)
    
    Pinning to 4.18 as a workaround
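
    The pin itself is plain pip syntax:

    pip install "transformers==4.18.*"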