SHARK - High Performance Machine Learning for CPUs, GPUs, Accelerators and Heterogeneous Clusters


Communication Channels

Installation

Check out the code

git clone https://github.com/nod-ai/SHARK.git 

Set up your Python Virtual Environment and Dependencies

# Setup venv and install necessary packages (torch-mlir, nodLabs/Shark, ...).
./setup_venv.sh
# Please activate the venv after installation.

Run a demo script

python -m shark.examples.resnet50_script --device="cpu"  # Use gpu | vulkan

Run all tests on CPU/GPU/VULKAN

pytest

# If on Linux for quicker results:
pytest --workers auto

Shark Inference API

from shark_runner import SharkInference

shark_module = SharkInference(
    module,                  # torch.nn.Module to compile
    (input,),                # inputs to the model (must be torch tensors)
    dynamic=False,           # treat input shapes as static (False) or dynamic (True)
    device="cpu",            # "cpu", "gpu", or "vulkan"
    tracing_required=False,  # jit-trace the module with the given input;
                             # useful when jit.script doesn't work
)

result = shark_module.forward(inputs)
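
For example, a minimal end-to-end run might look like this (a sketch following the signature above; torchvision here only supplies a model, and the exact call pattern is an assumption):

import torch
import torchvision.models as models
from shark_runner import SharkInference

model = models.resnet18(pretrained=True).eval()
input = torch.randn(1, 3, 224, 224)

# Compile for CPU with static shapes; resnet18 is jit-traceable.
shark_module = SharkInference(model, (input,), device="cpu",
                              dynamic=False, tracing_required=True)
result = shark_module.forward((input,))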

Model Tracking (Shark Inference)

Hugging Face Models   Torch-MLIR lowerable   SHARK-CPU   SHARK-CUDA   SHARK-METAL
BERT                  ✔️ (JIT)               ✔️
Albert                ✔️ (JIT)               ✔️
BigBird               ✔️ (AOT)
DistilBERT            ✔️ (AOT)
GPT2                  (AOT)
TORCHVISION Models    Torch-MLIR lowerable   SHARK-CPU   SHARK-CUDA   SHARK-METAL
AlexNet               ✔️ (Script)
DenseNet121           ✔️ (Script)
MNasNet1_0            ✔️ (Script)
MobileNetV2           ✔️ (Script)
MobileNetV3           ✔️ (Script)
Unet                  (Script)
Resnet18              ✔️ (Script)            ✔️          ✔️
Resnet50              ✔️ (Script)            ✔️          ✔️
Resnext50_32x4d       ✔️ (Script)
ShuffleNet_v2         (Script)
SqueezeNet            ✔️ (Script)            ✔️          ✔️
EfficientNet          ✔️ (Script)
Regnet                ✔️ (Script)
Resnest               (Script)
Vision Transformer    ✔️ (Script)
VGG 16                ✔️ (Script)
Wide Resnet           ✔️ (Script)            ✔️          ✔️
RAFT                  (JIT)

For more information, refer to the MODEL TRACKING SHEET.

Shark Trainer API

Models           Torch-MLIR lowerable   SHARK-CPU   SHARK-CUDA   SHARK-METAL
BERT
FullyConnected   ✔️                     ✔️

Related Project Channels

License

nod.ai SHARK is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.

Comments
  • minilm_jit example doesn't work


    (shark.venv) [email protected]:~/github/dshark$ python -m  shark.examples.minilm_jit
    /home/a/github/dshark/shark.venv/lib/python3.7/site-packages/torch/nn/modules/module.py:1403: UserWarning: positional arguments and argument "destination" are deprecated. nn.Module.state_dict will not accept them in the future. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
      " and ".join(warn_msg) + " are deprecated. nn.Module.state_dict will not accept them in the future. "
    Some weights of BertForSequenceClassification were not initialized from the model checkpoint at microsoft/MiniLM-L12-H384-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
    To disable this warning, you can either:
            - Avoid using `tokenizers` before the fork if possible
            - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
    Target triple found:x86_64-linux-gnu
    huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
    To disable this warning, you can either:
            - Avoid using `tokenizers` before the fork if possible
            - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
    (shark.venv) [email protected]:~/github/dshark$    
    
  • undefined symbol: _ZN3c1022getCustomClassTypeImplERKSt10type_index when running resnet50_script.py


    I just made a fresh Python venv and followed the readme instructions to run resnet50_script.py.

    curl -O https://raw.githubusercontent.com/nod-ai/SHARK/main/shark/examples/shark_inference/resnet50_script.py
    #Install deps for test script
    pip install pillow requests tqdm torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
    python ./resnet50_script.py --device="cpu"  #use cuda or vulkan or metal 
    

    I got this error:

    Traceback (most recent call last):
      File "./resnet50_script.py", line 7, in <module>
        from shark.shark_inference import SharkInference
      File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/shark_inference.py", line 12, in <module>
        from shark.torch_mlir_utils import get_torch_mlir_module, run_on_refbackend
      File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/torch_mlir_utils.py", line 22, in <module>
        from torch_mlir.dialects.torch.importer.jit_ir import (
      File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/__init__.py", line 13, in <module>
        from torch_mlir.dialects.torch.importer.jit_ir import ClassAnnotator, ModuleBuilder
      File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/dialects/torch/importer/jit_ir/__init__.py", line 14, in <module>
        from ....._mlir_libs._jit_ir_importer import *
    ImportError: /home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/_mlir_libs/_jit_ir_importer.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c1022getCustomClassTypeImplERKSt10type_index
    
  • Add SharkDownloader for end users


    I added a shark_downloader.py for end users. Users should provide the model name, model type, input type, input JSON, and a local shark tank dir. If the model is not found locally, the user will have to give a tank_url for download. Right now, it only supports tosa.mlir files converted from tflite models; other support will be added later. The albert_lite_base_tflite_mlir_test.py shows how to use it. Here is a quick reference:

        self.shark_downloader = SharkDownloader(
            model_name="albert_lite_base",
            tank_url="https://storage.googleapis.com/shark_tank/tflite"
            "/albert_lite_base/albert_lite_base_tosa.mlir",
            local_tank_dir="./../gen_shark_tank/tflite",
            model_type="tflite-tosa",
            input_json="input.json",
            input_type="int32",
        )
        tflite_tosa_model = self.shark_downloader.get_mlir_file()
        inputs = self.shark_downloader.get_inputs()
        self.shark_module = SharkInference(
            tflite_tosa_model,
            inputs,
            device=self.device,
            dynamic=self.dynamic,
            jit_trace=True,
        )
        self.shark_module.set_frontend("tflite-tosa")
        self.shark_module.compile()
        self.shark_module.forward(inputs)

  • Add the option to use tuned model in shark_runner


    • Add a flag to load a pre-tuned model config file in shark_runner; nothing changes in the inference API.
    • The example is made for the minilm model with the TensorFlow frontend on GPU. To run the example: python -m shark.examples.shark_inference.minilm_tf --device="gpu" --model_config_path=shark/examples/shark_inference/minilm_tf_gpu_config.json
    • The config file (example/shark-inference/.json) does not work for the Torch frontend directly: torch-mlir uses linalg.batch_matmul (instead of linalg.matmul) with the first dimension equal to 1, so one needs to generate a new config file and prepend 1 to the tile_sizes (see the sketch below). Further testing is needed for model annotation with the Torch frontend.
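
    For illustration, a rough sketch of that config rewrite (the JSON layout and the tile_sizes key are assumptions based on the description above, not the actual schema):

        import json

        # Assumption: each op entry carries a "tile_sizes" list tuned for
        # linalg.matmul. Prepend a 1 so the sizes line up with the leading
        # batch dimension of the linalg.batch_matmul ops emitted by the
        # Torch frontend.
        with open("minilm_tf_gpu_config.json") as f:
            config = json.load(f)

        for entry in config:
            entry["tile_sizes"] = [1] + entry["tile_sizes"]

        with open("minilm_torch_gpu_config.json", "w") as f:
            json.dump(config, f, indent=2)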
  • TorchMLIR eager mode with IREE backend


    Eager mode is hidden behind SharkMode (see shark/examples/eager_mode.py).

    TorchMLIRTensor([[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]], backend=EagerModeRefBackend)
    TorchMLIRTensor([[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]], backend=EagerModeRefBackend)
    

    Note also that you need https://github.com/makslevental/torch-mlir/tree/add_device as your activated torch-mlir version.
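
    A minimal sketch of how the example might be driven (the SharkMode import path and usage here are assumptions based on the description above, not the verified API; see shark/examples/eager_mode.py for the real entry point):

        import torch
        from shark.examples.eager_mode import SharkMode  # assumed location

        # With SharkMode active, plain torch ops are intercepted and executed
        # eagerly through Torch-MLIR on the IREE/ref backend, producing the
        # TorchMLIRTensor(..., backend=EagerModeRefBackend) results above.
        SharkMode()
        a = torch.ones(10, 10)
        b = torch.ones(10, 10)
        print(a + b)
        print(a @ b)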

  • DistilBert fails to lower through torch-mlir pass pipeline (illegal ops)


    Error output:

    error: failed to legalize operation 'torch.aten.view' that was explicitly marked illegal
    note: see current operation: %416 = "torch.aten.view"(%414, %415) : (!torch.vtensor<[?,?,768],f32>, !torch.list<int>) -> !torch.vtensor<[?,?,12,64],f32>                                                                                   
    Traceback (most recent call last):
      File "/home/ean/SHARK/generate_sharktank.py", line 180, in <module>
        save_torch_model(args.torch_model_csv)
      File "/home/ean/SHARK/generate_sharktank.py", line 68, in save_torch_model
        mlir_importer.import_debug(
      File "/home/ean/SHARK/shark/shark_importer.py", line 163, in import_debug
        imported_mlir = self.import_mlir(
      File "/home/ean/SHARK/shark/shark_importer.py", line 109, in import_mlir
        return self._torch_mlir(is_dynamic, tracing_required), func_name
      File "/home/ean/SHARK/shark/shark_importer.py", line 74, in _torch_mlir
        return get_torch_mlir_module(
      File "/home/ean/SHARK/shark/torch_mlir_utils.py", line 150, in get_torch_mlir_module
        pm.run(mb.module)
    RuntimeError: Failure while executing pass pipeline.
    

    Reproduce:

    • add distilbert-base-uncased,True,hf to tank/pytorch/torch_model_list.csv
    • run python generate_sharktank.py
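
    A standalone reproduction outside generate_sharktank.py might look like the following (a sketch assuming torch_mlir's compile/TensorPlaceholder API; generate_sharktank.py itself goes through shark_importer):

        import torch
        import torch_mlir
        from transformers import AutoModelForSequenceClassification

        # Wrap the HF model so tracing sees a single tensor output.
        class DistilBertWrapper(torch.nn.Module):
            def __init__(self):
                super().__init__()
                self.model = AutoModelForSequenceClassification.from_pretrained(
                    "distilbert-base-uncased")
                self.model.eval()

            def forward(self, input_ids):
                return self.model(input_ids).logits

        input_ids = torch.randint(0, 30522, (1, 128))
        # Dynamic batch/sequence dims, matching the True (dynamic) flag in the
        # CSV row; lowering should fail on the same torch.aten.view op.
        placeholder = torch_mlir.TensorPlaceholder.like(input_ids, dynamic_axes=[0, 1])
        torch_mlir.compile(DistilBertWrapper(), placeholder,
                           output_type="linalg-on-tensors", use_tracing=True)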
  • Refactor TF tests for importer/runner split


    Much more to do here but wanted to get some preliminary changes in ASAP.

    TODO:

    • fix repro paths for pytorch/TF tests
    • move tests out of frontend directories into tank//
    • benchmarks (API needs updating!)

  • Add shark_importer tflite module and albert_shark_test example


    shark_importer takes a web link and outputs SharkInference-compiled files. Internally, it automatically downloads the model, detects the model type, and runs SharkInference for multiple models. The first version works for tflite; support for other model types (tf/pytorch/jax) will come later. Example usages:

        shark_importer(hf/openai/clip-vit-base-patch32)        → search TF, Torch, JAX
        shark_importer(hf/openai/clip-vit-base-patch32, TF)    → import TF model from HF
        shark_importer(hf/openai/clip-vit-base-patch32, Torch)
        shark_importer(hf/openai/clip-vit-base-patch32, JAX)
        shark_importer(hf/openai/clip-vit-base-patch32, TF, precompiled=true, cpu) → VMFB

  • ORT-HF Benchmark Integration


    • Add HF Benchmarker class.
    • Add sample to benchmark HF model.

    Example:

        python -m benchmarks.hf_model_benchmark --num_iterations=10 --model_name="microsoft/MiniLM-L12-H384-uncased"
    
  • "is_zero" is undefined running resnet50 script.

    To reproduce...

    I cloned the repo and tried running both the resnet and minilm examples.

    I keep getting

    RuntimeError: required keyword attribute 'is_zero' is undefined
    

    It seems to have something to do with ModuleBuilder -> mb.import_module(module._c, class_annotator). Environment: the Apple Silicon M1 snapshot version of torch-mlir, running on an M1 MacBook with Python 3.9.

    Attached a screenshot of both resnet50_script and minilm.

    Disclaimer: new to torch-mlir

  • Update model annotation tool


    Usage:

        from shark.model_annotation import model_annotation

        with create_context() as ctx:
            module = model_annotation(ctx, input_contents=..., config_path=..., search_op=...)

    Example: annotate the minilm model with a GPU config file:

        python model_annotation.py /nodclouddata/vivian/minilm_model/model.mlir /nodclouddata/vivian/minilm_model/model_config.json

  • tm_tensor dialect support on vulkan device


  • tm_tensor dialect support in benchmark


  • "Could not install torch-mlir" error on regular CI runs

    The test_shark_model_suite job on IREE's CI is failing, sample: https://github.com/iree-org/iree/actions/runs/3144865881/jobs/5111621410

    https://github.com/iree-org/iree/commits/main

    Error logs include

    ERROR: Could not find a version that satisfies the requirement torch-mlir (from versions: none)
    ERROR: No matching distribution found for torch-mlir
    Could not install torch-mlir
    

    I suspect that https://github.com/nod-ai/SHARK/blob/9035a2eed3bee1e8afc5afb82690f5dcceadd775/setup_venv.sh#L74-L79 should be updated following https://github.com/llvm/torch-mlir/commit/1dfe5efe9ee8f58deabf9382b58894a00b2d440c / https://github.com/llvm/torch-mlir/commit/8d3ca887df5ac5126fa3fc2ec3546c6322a4d066
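
    Presumably the fix is to install torch-mlir from its package index instead of the removed wheel URL, along the lines of the following (the exact command is an assumption based on torch-mlir's install docs):

        pip install --pre torch-mlir -f https://llvm.github.io/torch-mlir/package-index/ --extra-index-url https://download.pytorch.org/whl/nightly/cpu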
