SHARK - High Performance Machine Learning for CPUs, GPUs, Accelerators and Heterogeneous Clusters

SHARK

Communication Channels

Installation

Check out the code

git clone https://github.com/nod-ai/SHARK.git 

Set up your Python virtual environment and dependencies

# Set up the venv and install the necessary packages (torch-mlir, nodLabs/Shark, ...).
./setup_venv.sh
# Activate the venv after installation:
source shark.venv/bin/activate

Run a demo script

python -m shark.examples.resnet50_script --device="cpu"  # or "gpu" / "vulkan"

Run all tests on CPU/GPU/Vulkan

pytest

# On Linux, for quicker results (runs the tests in parallel with pytest-xdist):
pytest -n auto

Shark Inference API

from shark_runner import SharkInference

shark_module = SharkInference(
    module,                  # a torch.nn.Module instance
    inputs,                  # tuple of model inputs, e.g. (input,); each must be a torch tensor
    dynamic=False,           # pass the input shapes as dynamic (True) or static (False)
    device="cpu",            # "cpu", "gpu", or "vulkan"
    tracing_required=False,  # jit.trace the module with the given inputs; useful when
                             # torch.jit.script doesn't work
)

result = shark_module.forward(inputs)

Model Tracking (Shark Inference)

Hugging Face Models Torch-MLIR lowerable SHARK-CPU SHARK-CUDA SHARK-METAL
BERT ✔️ (JIT) ✔️
Albert ✔️ (JIT) ✔️
BigBird ✔️ (AOT)
DistilBERT ✔️ (AOT)
GPT2 (AOT)
TORCHVISION Models Torch-MLIR lowerable SHARK-CPU SHARK-CUDA SHARK-METAL
AlexNet ✔️ (Script)
DenseNet121 ✔️ (Script)
MNasNet1_0 ✔️ (Script)
MobileNetV2 ✔️ (Script)
MobileNetV3 ✔️ (Script)
Unet (Script)
Resnet18 ✔️ (Script) ✔️ ✔️
Resnet50 ✔️ (Script) ✔️ ✔️
Resnext50_32x4d ✔️ (Script)
ShuffleNet_v2 (Script)
SqueezeNet ✔️ (Script) ✔️ ✔️
EfficientNet ✔️ (Script)
Regnet ✔️ (Script)
Resnest (Script)
Vision Transformer ✔️ (Script)
VGG 16 ✔️ (Script)
Wide Resnet ✔️ (Script) ✔️ ✔️
RAFT (JIT)

For more information, refer to the Model Tracking Sheet.

Shark Trainer API

Models Torch-MLIR lowerable SHARK-CPU SHARK-CUDA SHARK-METAL
BERT
FullyConnected ✔️ ✔️

Related Project Channels

License

nod.ai SHARK is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.

Owner
nod.ai
High Performance Machine Learning
Comments
  • GPT-2 torch to tosa

    The Python script to run:

    gpt2tosa.py

    Output TOSA file: https://storage.googleapis.com/shark_tank/chi-nod/gpt2/gpt2_torch_tosa.mlir
    TOSA file with large attributes elided: https://storage.googleapis.com/shark_tank/chi-nod/gpt2/gpt2_torch_tosa_elide.mlir
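
    To illustrate what "eliding large attributes" means, here is a minimal sketch that shortens long hex-encoded dense<"0x..."> constants in MLIR text so the file stays readable. It is only an illustration under assumptions: the regex, the max_hex_chars threshold and the __elided__ placeholder are hypothetical, and the elided file above was presumably produced by the MLIR printer itself (for example with its --mlir-elide-elementsattrs-if-larger option), not by this script.

```python
import re

# Assumed format: hex-encoded dense element attributes such as dense<"0x0000803F">.
_HEX_DENSE = re.compile(r'dense<"0x([0-9A-Fa-f]+)">')


def elide_large_constants(mlir_text, max_hex_chars=64):
    """Replace hex payloads longer than max_hex_chars with a hypothetical placeholder."""

    def _replace(match):
        payload = match.group(1)
        if len(payload) <= max_hex_chars:
            return match.group(0)  # small constants are kept verbatim
        return 'dense<"__elided__">'

    return _HEX_DENSE.sub(_replace, mlir_text)


if __name__ == "__main__":
    small = '%0 = "tosa.const"() {value = dense<"0x0000803F"> : tensor<1xf32>} : () -> tensor<1xf32>'
    # 400 hex characters = 200 bytes = 50 float32 values.
    large = ('%1 = "tosa.const"() {value = dense<"0x' + "00" * 200
             + '"> : tensor<50xf32>} : () -> tensor<50xf32>')
    print(elide_large_constants(small))
    print(elide_large_constants(large))
```

    Only the payload is replaced; the surrounding op and its tensor type are left intact, which is what keeps an elided file useful for reading the IR structure.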

  • minilm_jit example doesn't work

    (shark.venv) [email protected]:~/github/dshark$ python -m  shark.examples.minilm_jit
    /home/a/github/dshark/shark.venv/lib/python3.7/site-packages/torch/nn/modules/module.py:1403: UserWarning: positional arguments and argument "destination" are deprecated. nn.Module.state_dict will not accept them in the future. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
      " and ".join(warn_msg) + " are deprecated. nn.Module.state_dict will not accept them in the future. "
    Some weights of BertForSequenceClassification were not initialized from the model checkpoint at microsoft/MiniLM-L12-H384-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
    To disable this warning, you can either:
            - Avoid using `tokenizers` before the fork if possible
            - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
    Target triple found:x86_64-linux-gnu
    huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
    To disable this warning, you can either:
            - Avoid using `tokenizers` before the fork if possible
            - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
    (shark.venv) [email protected]:~/github/dshark$    
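
    The huggingface/tokenizers warning in this log states its own remedy: set TOKENIZERS_PARALLELISM explicitly. A minimal sketch, assuming you control the entry script, is to set it at the top of the script, before any tokenizer is used and before the process forks. This only silences the warning; it does not address whatever else keeps the example from completing.

```python
import os


def disable_tokenizers_parallelism():
    """Explicitly set TOKENIZERS_PARALLELISM, as the tokenizers warning suggests.

    Uses setdefault so a value the user already exported is respected;
    "false" turns tokenizer parallelism off.
    """
    os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")


if __name__ == "__main__":
    disable_tokenizers_parallelism()
    print(os.environ["TOKENIZERS_PARALLELISM"])
```

    Equivalently, export TOKENIZERS_PARALLELISM=false in the shell before running python -m shark.examples.minilm_jit.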
    
  • "Could not install torch-mlir" error on regular CI runs

    The test_shark_model_suite job on IREE's CI is failing; sample run: https://github.com/iree-org/iree/actions/runs/3144865881/jobs/5111621410

    https://github.com/iree-org/iree/commits/main

    The error logs include:

    ERROR: Could not find a version that satisfies the requirement torch-mlir (from versions: none)
    ERROR: No matching distribution found for torch-mlir
    Could not install torch-mlir
    

    I suspect that https://github.com/nod-ai/SHARK/blob/9035a2eed3bee1e8afc5afb82690f5dcceadd775/setup_venv.sh#L74-L79 needs to be updated to follow https://github.com/llvm/torch-mlir/commit/1dfe5efe9ee8f58deabf9382b58894a00b2d440c and https://github.com/llvm/torch-mlir/commit/8d3ca887df5ac5126fa3fc2ec3546c6322a4d066.
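
    For context, torch-mlir wheels are not installed from PyPI but from a find-links page (the setup_venv.sh log in the venv install issue shows pip "Looking in links: https://llvm.github.io/torch-mlir/package-index/"), so "from versions: none" is consistent with pip not being pointed at a page that currently hosts a matching wheel. The sketch below only builds the pip command; the --pre flag and the CPU nightly torch index (taken from the resnet50_script.py reproduction) are assumptions, and the URLs may move over time.

```python
import sys
from typing import List

# Find-links page seen in the setup_venv.sh log; it may change over time.
TORCH_MLIR_FIND_LINKS = "https://llvm.github.io/torch-mlir/package-index/"
# Nightly torch index used in the resnet50_script.py reproduction (CPU-only build assumed).
TORCH_NIGHTLY_INDEX = "https://download.pytorch.org/whl/nightly/cpu"


def torch_mlir_install_cmd(find_links: str = TORCH_MLIR_FIND_LINKS,
                           extra_index: str = TORCH_NIGHTLY_INDEX) -> List[str]:
    """Build (but do not run) a pip command that installs torch-mlir from its find-links page."""
    return [
        sys.executable, "-m", "pip", "install",
        "--pre",                           # assumption: allow the dev/nightly torch that torch-mlir pins
        "torch-mlir",
        "-f", find_links,                  # --find-links: also search this HTML page for wheels
        "--extra-index-url", extra_index,  # where the pinned torch nightly can be found
    ]


if __name__ == "__main__":
    # Print the command; pass it to subprocess.run(cmd, check=True) to actually install.
    print(" ".join(torch_mlir_install_cmd()))
```

    Keeping the URLs as parameters means a CI script only has to change one constant when the package index moves.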

  • undefined symbol: _ZN3c1022getCustomClassTypeImplERKSt10type_index when running resnet50_script.py

    I just made a fresh Python venv and followed the README instructions to run resnet50_script.py.

    curl -O https://raw.githubusercontent.com/nod-ai/SHARK/main/shark/examples/shark_inference/resnet50_script.py
    #Install deps for test script
    pip install pillow requests tqdm torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
    python ./resnet50_script.py --device="cpu"  #use cuda or vulkan or metal 
    

    I got this error:

    Traceback (most recent call last):
      File "./resnet50_script.py", line 7, in <module>
        from shark.shark_inference import SharkInference
      File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/shark_inference.py", line 12, in <module>
        from shark.torch_mlir_utils import get_torch_mlir_module, run_on_refbackend
      File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/torch_mlir_utils.py", line 22, in <module>
        from torch_mlir.dialects.torch.importer.jit_ir import (
      File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/__init__.py", line 13, in <module>
        from torch_mlir.dialects.torch.importer.jit_ir import ClassAnnotator, ModuleBuilder
      File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/dialects/torch/importer/jit_ir/__init__.py", line 14, in <module>
        from ....._mlir_libs._jit_ir_importer import *
    ImportError: /home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/_mlir_libs/_jit_ir_importer.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c1022getCustomClassTypeImplERKSt10type_index
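
    An undefined c10:: symbol raised from torch-mlir's _jit_ir_importer extension is a common symptom of the torch-mlir wheel having been built against a different torch build than the one installed; note that the reproduction installs torch separately from the nightly index. Under that assumption, a quick standard-library check compares the torch version pinned in torch-mlir's package metadata (the setup log in the venv install issue shows torch-mlir 20230101.705 requiring torch==2.0.0.dev20221231) with the installed torch. The helper names are hypothetical.

```python
from importlib import metadata
from typing import List, Optional


def pinned_torch_version(requirements: List[str]) -> Optional[str]:
    """Return the exact torch version pinned in a list of requirement strings, if any."""
    for req in requirements:
        # Drop environment markers and normalise 'torch (==X)' to 'torch==X'.
        spec = req.split(";", 1)[0].replace(" ", "").replace("(", "").replace(")", "")
        if spec.startswith("torch=="):
            return spec[len("torch=="):]
    return None


def strip_local_label(version: str) -> str:
    """'2.0.0.dev20221231+cpu' -> '2.0.0.dev20221231' (drop the PEP 440 local label)."""
    return version.split("+", 1)[0]


def torch_matches_torch_mlir(torch_mlir_requires: List[str], torch_version: str) -> bool:
    """True if the installed torch equals the version torch-mlir pins (or nothing is pinned)."""
    pinned = pinned_torch_version(torch_mlir_requires)
    return pinned is None or strip_local_label(pinned) == strip_local_label(torch_version)


if __name__ == "__main__":
    try:
        requires = metadata.requires("torch-mlir") or []
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError as err:
        print(f"package not installed: {err}")
    else:
        print("torch:", installed, "| torch-mlir pins:", pinned_torch_version(requires))
        print("match:", torch_matches_torch_mlir(requires, installed))
```

    If the versions differ, reinstalling torch-mlir so that pip also pulls the torch it pins is the usual way to bring the two back in sync.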
    
  • venv install issue

    The default SHARK venv setup script fails with a "No space left on device" error.

    ❯ ./setup_venv.sh
    Python: /usr/bin/python3
    Python version: 3.10
    Using pip venv.. Setting up venv dir: shark.venv
    Linux detected
    Requirement already satisfied: pip in ./shark.venv/lib/python3.10/site-packages (22.3.1)
    Requirement already satisfied: setuptools in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 1)) (65.6.3)
    Requirement already satisfied: wheel in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 2)) (0.38.4)
    Requirement already satisfied: tqdm in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 5)) (4.64.1)
    Requirement already satisfied: google-cloud-storage in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 8)) (2.7.0)
    Requirement already satisfied: pytest in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 11)) (7.2.0)
    Requirement already satisfied: pytest-xdist in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 12)) (3.1.0)
    Requirement already satisfied: Pillow in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 13)) (9.4.0)
    Requirement already satisfied: parameterized in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 14)) (0.8.1)
    Requirement already satisfied: transformers in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 17)) (4.25.1)
    Requirement already satisfied: diffusers in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 18)) (0.11.1)
    Requirement already satisfied: scipy in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 19)) (1.9.3)
    Requirement already satisfied: ftfy in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 20)) (6.1.1)
    Requirement already satisfied: gradio in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 21)) (3.15.0)
    Requirement already satisfied: altair in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 22)) (4.2.0)
    Requirement already satisfied: pyinstaller in ./shark.venv/lib/python3.10/site-packages (from -r /home/alpha/AI/SHARK/requirements.txt (line 25)) (5.7.0)
    Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in ./shark.venv/lib/python3.10/site-packages (from google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (2.28.1)
    Requirement already satisfied: google-cloud-core<3.0dev,>=2.3.0 in ./shark.venv/lib/python3.10/site-packages (from google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (2.3.2)
    Requirement already satisfied: google-resumable-media>=2.3.2 in ./shark.venv/lib/python3.10/site-packages (from google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (2.4.0)
    Requirement already satisfied: google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5 in ./shark.venv/lib/python3.10/site-packages (from google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (2.11.0)
    Requirement already satisfied: google-auth<3.0dev,>=1.25.0 in ./shark.venv/lib/python3.10/site-packages (from google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (2.15.0)
    Requirement already satisfied: tomli>=1.0.0 in ./shark.venv/lib/python3.10/site-packages (from pytest->-r /home/alpha/AI/SHARK/requirements.txt (line 11)) (2.0.1)
    Requirement already satisfied: packaging in ./shark.venv/lib/python3.10/site-packages (from pytest->-r /home/alpha/AI/SHARK/requirements.txt (line 11)) (22.0)
    Requirement already satisfied: iniconfig in ./shark.venv/lib/python3.10/site-packages (from pytest->-r /home/alpha/AI/SHARK/requirements.txt (line 11)) (1.1.1)
    Requirement already satisfied: attrs>=19.2.0 in ./shark.venv/lib/python3.10/site-packages (from pytest->-r /home/alpha/AI/SHARK/requirements.txt (line 11)) (22.2.0)
    Requirement already satisfied: exceptiongroup>=1.0.0rc8 in ./shark.venv/lib/python3.10/site-packages (from pytest->-r /home/alpha/AI/SHARK/requirements.txt (line 11)) (1.1.0)
    Requirement already satisfied: pluggy<2.0,>=0.12 in ./shark.venv/lib/python3.10/site-packages (from pytest->-r /home/alpha/AI/SHARK/requirements.txt (line 11)) (1.0.0)
    Requirement already satisfied: execnet>=1.1 in ./shark.venv/lib/python3.10/site-packages (from pytest-xdist->-r /home/alpha/AI/SHARK/requirements.txt (line 12)) (1.9.0)
    Requirement already satisfied: huggingface-hub<1.0,>=0.10.0 in ./shark.venv/lib/python3.10/site-packages (from transformers->-r /home/alpha/AI/SHARK/requirements.txt (line 17)) (0.11.1)
    Requirement already satisfied: numpy>=1.17 in ./shark.venv/lib/python3.10/site-packages (from transformers->-r /home/alpha/AI/SHARK/requirements.txt (line 17)) (1.24.1)
    Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in ./shark.venv/lib/python3.10/site-packages (from transformers->-r /home/alpha/AI/SHARK/requirements.txt (line 17)) (0.13.2)
    Requirement already satisfied: pyyaml>=5.1 in ./shark.venv/lib/python3.10/site-packages (from transformers->-r /home/alpha/AI/SHARK/requirements.txt (line 17)) (6.0)
    Requirement already satisfied: regex!=2019.12.17 in ./shark.venv/lib/python3.10/site-packages (from transformers->-r /home/alpha/AI/SHARK/requirements.txt (line 17)) (2022.10.31)
    Requirement already satisfied: filelock in ./shark.venv/lib/python3.10/site-packages (from transformers->-r /home/alpha/AI/SHARK/requirements.txt (line 17)) (3.9.0)
    Requirement already satisfied: importlib-metadata in ./shark.venv/lib/python3.10/site-packages (from diffusers->-r /home/alpha/AI/SHARK/requirements.txt (line 18)) (6.0.0)
    Requirement already satisfied: wcwidth>=0.2.5 in ./shark.venv/lib/python3.10/site-packages (from ftfy->-r /home/alpha/AI/SHARK/requirements.txt (line 20)) (0.2.5)
    Requirement already satisfied: pydantic in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (1.10.4)
    Requirement already satisfied: pandas in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (1.5.2)
    Requirement already satisfied: markdown-it-py[linkify,plugins] in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (2.1.0)
    Requirement already satisfied: fsspec in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (2022.11.0)
    Requirement already satisfied: pycryptodome in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (3.16.0)
    Requirement already satisfied: ffmpy in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (0.3.0)
    Requirement already satisfied: markupsafe in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (2.1.1)
    Requirement already satisfied: fastapi in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (0.88.0)
    Requirement already satisfied: httpx in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (0.23.2)
    Requirement already satisfied: uvicorn in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (0.20.0)
    Requirement already satisfied: matplotlib in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (3.6.2)
    Requirement already satisfied: python-multipart in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (0.0.5)
    Requirement already satisfied: aiohttp in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (3.8.3)
    Requirement already satisfied: jinja2 in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (3.1.2)
    Requirement already satisfied: pydub in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (0.25.1)
    Requirement already satisfied: websockets>=10.0 in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (10.4)
    Requirement already satisfied: orjson in ./shark.venv/lib/python3.10/site-packages (from gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (3.8.3)
    Requirement already satisfied: entrypoints in ./shark.venv/lib/python3.10/site-packages (from altair->-r /home/alpha/AI/SHARK/requirements.txt (line 22)) (0.4)
    Requirement already satisfied: toolz in ./shark.venv/lib/python3.10/site-packages (from altair->-r /home/alpha/AI/SHARK/requirements.txt (line 22)) (0.12.0)
    Requirement already satisfied: jsonschema>=3.0 in ./shark.venv/lib/python3.10/site-packages (from altair->-r /home/alpha/AI/SHARK/requirements.txt (line 22)) (4.17.3)
    Requirement already satisfied: pyinstaller-hooks-contrib>=2021.4 in ./shark.venv/lib/python3.10/site-packages (from pyinstaller->-r /home/alpha/AI/SHARK/requirements.txt (line 25)) (2022.14)
    Requirement already satisfied: altgraph in ./shark.venv/lib/python3.10/site-packages (from pyinstaller->-r /home/alpha/AI/SHARK/requirements.txt (line 25)) (0.17.3)
    Requirement already satisfied: protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5 in ./shark.venv/lib/python3.10/site-packages (from google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (4.21.12)
    Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.56.2 in ./shark.venv/lib/python3.10/site-packages (from google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (1.57.0)
    Requirement already satisfied: rsa<5,>=3.1.4 in ./shark.venv/lib/python3.10/site-packages (from google-auth<3.0dev,>=1.25.0->google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (4.9)
    Requirement already satisfied: six>=1.9.0 in ./shark.venv/lib/python3.10/site-packages (from google-auth<3.0dev,>=1.25.0->google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (1.16.0)
    Requirement already satisfied: cachetools<6.0,>=2.0.0 in ./shark.venv/lib/python3.10/site-packages (from google-auth<3.0dev,>=1.25.0->google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (5.2.0)
    Requirement already satisfied: pyasn1-modules>=0.2.1 in ./shark.venv/lib/python3.10/site-packages (from google-auth<3.0dev,>=1.25.0->google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (0.2.8)
    Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in ./shark.venv/lib/python3.10/site-packages (from google-resumable-media>=2.3.2->google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (1.5.0)
    Requirement already satisfied: typing-extensions>=3.7.4.3 in ./shark.venv/lib/python3.10/site-packages (from huggingface-hub<1.0,>=0.10.0->transformers->-r /home/alpha/AI/SHARK/requirements.txt (line 17)) (4.4.0)
    Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in ./shark.venv/lib/python3.10/site-packages (from jsonschema>=3.0->altair->-r /home/alpha/AI/SHARK/requirements.txt (line 22)) (0.19.3)
    Requirement already satisfied: python-dateutil>=2.8.1 in ./shark.venv/lib/python3.10/site-packages (from pandas->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (2.8.2)
    Requirement already satisfied: pytz>=2020.1 in ./shark.venv/lib/python3.10/site-packages (from pandas->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (2022.7)
    Requirement already satisfied: certifi>=2017.4.17 in ./shark.venv/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (2022.12.7)
    Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./shark.venv/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (1.26.13)
    Requirement already satisfied: charset-normalizer<3,>=2 in ./shark.venv/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (2.1.1)
    Requirement already satisfied: idna<4,>=2.5 in ./shark.venv/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (3.4)
    Requirement already satisfied: frozenlist>=1.1.1 in ./shark.venv/lib/python3.10/site-packages (from aiohttp->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (1.3.3)
    Requirement already satisfied: multidict<7.0,>=4.5 in ./shark.venv/lib/python3.10/site-packages (from aiohttp->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (6.0.4)
    Requirement already satisfied: yarl<2.0,>=1.0 in ./shark.venv/lib/python3.10/site-packages (from aiohttp->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (1.8.2)
    Requirement already satisfied: aiosignal>=1.1.2 in ./shark.venv/lib/python3.10/site-packages (from aiohttp->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (1.3.1)
    Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in ./shark.venv/lib/python3.10/site-packages (from aiohttp->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (4.0.2)
    Requirement already satisfied: starlette==0.22.0 in ./shark.venv/lib/python3.10/site-packages (from fastapi->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (0.22.0)
    Requirement already satisfied: anyio<5,>=3.4.0 in ./shark.venv/lib/python3.10/site-packages (from starlette==0.22.0->fastapi->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (3.6.2)
    Requirement already satisfied: httpcore<0.17.0,>=0.15.0 in ./shark.venv/lib/python3.10/site-packages (from httpx->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (0.16.3)
    Requirement already satisfied: sniffio in ./shark.venv/lib/python3.10/site-packages (from httpx->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (1.3.0)
    Requirement already satisfied: rfc3986[idna2008]<2,>=1.3 in ./shark.venv/lib/python3.10/site-packages (from httpx->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (1.5.0)
    Requirement already satisfied: zipp>=0.5 in ./shark.venv/lib/python3.10/site-packages (from importlib-metadata->diffusers->-r /home/alpha/AI/SHARK/requirements.txt (line 18)) (3.11.0)
    Requirement already satisfied: mdurl~=0.1 in ./shark.venv/lib/python3.10/site-packages (from markdown-it-py[linkify,plugins]->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (0.1.2)
    Requirement already satisfied: linkify-it-py~=1.0 in ./shark.venv/lib/python3.10/site-packages (from markdown-it-py[linkify,plugins]->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (1.0.3)
    Requirement already satisfied: mdit-py-plugins in ./shark.venv/lib/python3.10/site-packages (from markdown-it-py[linkify,plugins]->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (0.3.3)
    Requirement already satisfied: contourpy>=1.0.1 in ./shark.venv/lib/python3.10/site-packages (from matplotlib->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (1.0.6)
    Requirement already satisfied: fonttools>=4.22.0 in ./shark.venv/lib/python3.10/site-packages (from matplotlib->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (4.38.0)
    Requirement already satisfied: kiwisolver>=1.0.1 in ./shark.venv/lib/python3.10/site-packages (from matplotlib->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (1.4.4)
    Requirement already satisfied: cycler>=0.10 in ./shark.venv/lib/python3.10/site-packages (from matplotlib->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (0.11.0)
    Requirement already satisfied: pyparsing>=2.2.1 in ./shark.venv/lib/python3.10/site-packages (from matplotlib->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (3.0.9)
    Requirement already satisfied: click>=7.0 in ./shark.venv/lib/python3.10/site-packages (from uvicorn->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (8.1.3)
    Requirement already satisfied: h11>=0.8 in ./shark.venv/lib/python3.10/site-packages (from uvicorn->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (0.14.0)
    Requirement already satisfied: uc-micro-py in ./shark.venv/lib/python3.10/site-packages (from linkify-it-py~=1.0->markdown-it-py[linkify,plugins]->gradio->-r /home/alpha/AI/SHARK/requirements.txt (line 21)) (1.0.1)
    Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in ./shark.venv/lib/python3.10/site-packages (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=1.25.0->google-cloud-storage->-r /home/alpha/AI/SHARK/requirements.txt (line 8)) (0.4.8)
    Looking in links: https://llvm.github.io/torch-mlir/package-index/
    Requirement already satisfied: torch-mlir in ./shark.venv/lib/python3.10/site-packages (20230101.705)
    Requirement already satisfied: torch==2.0.0.dev20221231 in ./shark.venv/lib/python3.10/site-packages (from torch-mlir) (2.0.0.dev20221231+cpu)
    Requirement already satisfied: numpy in ./shark.venv/lib/python3.10/site-packages (from torch-mlir) (1.24.1)
    Requirement already satisfied: networkx in ./shark.venv/lib/python3.10/site-packages (from torch==2.0.0.dev20221231->torch-mlir) (3.0rc1)
    Requirement already satisfied: packaging in ./shark.venv/lib/python3.10/site-packages (from torch==2.0.0.dev20221231->torch-mlir) (22.0)
    Requirement already satisfied: typing-extensions in ./shark.venv/lib/python3.10/site-packages (from torch==2.0.0.dev20221231->torch-mlir) (4.4.0)
    Requirement already satisfied: sympy in ./shark.venv/lib/python3.10/site-packages (from torch==2.0.0.dev20221231->torch-mlir) (1.11.1)
    Requirement already satisfied: mpmath>=0.19 in ./shark.venv/lib/python3.10/site-packages (from sympy->torch==2.0.0.dev20221231->torch-mlir) (1.2.1)
    Successfully Installed torch-mlir
    rm: cannot remove '.use-iree': No such file or directory
    Installing https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html...
    Looking in links: https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html
    Requirement already satisfied: iree-compiler in ./shark.venv/lib/python3.10/site-packages (20230101.261)
    Requirement already satisfied: iree-runtime in ./shark.venv/lib/python3.10/site-packages (20230101.261)
    Requirement already satisfied: numpy in ./shark.venv/lib/python3.10/site-packages (from iree-compiler) (1.24.1)
    Requirement already satisfied: PyYAML in ./shark.venv/lib/python3.10/site-packages (from iree-compiler) (6.0)
    Looking in links: https://llvm.github.io/torch-mlir/package-index/, https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html, https://download.pytorch.org/whl/nightly/torch/
    Obtaining file:///home/alpha/AI/SHARK
      Installing build dependencies ... error
      error: subprocess-exited-with-error
    
      × pip subprocess to install build dependencies did not run successfully.
      │ exit code: 1
      ╰─> [33 lines of output]
          Looking in links: https://llvm.github.io/torch-mlir/package-index/, https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html, https://download.pytorch.org/whl/nightly/torch/
          Collecting setuptools>=42
            Using cached setuptools-65.6.3-py3-none-any.whl (1.2 MB)
          Collecting wheel
            Using cached wheel-0.38.4-py3-none-any.whl (36 kB)
          Collecting packaging
            Using cached packaging-22.0-py3-none-any.whl (42 kB)
          Collecting numpy>=1.22.4
            Using cached numpy-1.24.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
          Collecting torch-mlir>=20221021.633
            Downloading https://github.com/llvm/torch-mlir/releases/download/snapshot-20230101.705/torch_mlir-20230101.705-cp310-cp310-linux_x86_64.whl (40.0 MB)
               ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.0/40.0 MB 26.6 MB/s eta 0:00:00
          Collecting iree-compiler>=20221022.190
            Downloading https://github.com/nod-ai/SHARK-Runtime/releases/download/candidate-20230101.261/iree_compiler-20230101.261-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (55.5 MB)
               ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 55.5/55.5 MB 24.5 MB/s eta 0:00:00
          Collecting iree-runtime>=20221022.190
            Downloading https://github.com/nod-ai/SHARK-Runtime/releases/download/candidate-20230101.261/iree_runtime-20230101.261-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB)
               ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 20.4 MB/s eta 0:00:00
          Collecting torch==2.0.0.dev20221231
            Using cached https://download.pytorch.org/whl/nightly/rocm5.3/torch-2.0.0.dev20221231%2Brocm5.3-cp310-cp310-linux_x86_64.whl (1557.5 MB)
          Collecting sympy
            Using cached sympy-1.11.1-py3-none-any.whl (6.5 MB)
          Collecting networkx
            Using cached networkx-2.8.8-py3-none-any.whl (2.0 MB)
          Collecting typing-extensions
            Using cached typing_extensions-4.4.0-py3-none-any.whl (26 kB)
          Collecting PyYAML
            Using cached PyYAML-6.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (682 kB)
          Collecting mpmath>=0.19
            Using cached mpmath-1.2.1-py3-none-any.whl (532 kB)
          Installing collected packages: mpmath, wheel, typing-extensions, sympy, setuptools, PyYAML, packaging, numpy, networkx, torch, iree-runtime, iree-compiler, torch-mlir
          ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device
    
          [end of output]
    
      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: subprocess-exited-with-error
    
    × pip subprocess to install build dependencies did not run successfully.
    │ exit code: 1
    ╰─> See above for output.
    
    note: This error originates from a subprocess, and is likely not a problem with pip.
    Before running examples activate venv with:
      source shark.venv/bin/activate
    
    ~/AI/SHARK main 48s
    
    

    I think there is just not enough space in /tmp to work with (even though this device has 16GB RAM + 16GB swap).
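
    If /tmp is the bottleneck: pip creates its isolated build environment (the "Installing build dependencies" step that fails here, which pulls a 1557.5 MB torch wheel) under Python's temporary directory, and that directory honours the TMPDIR environment variable. The sketch below is an assumption-based workaround rather than a confirmed fix: it reports free space in the temp directory and builds an environment with TMPDIR pointed elsewhere. The ~/pip-tmp directory is hypothetical.

```python
import os
import shutil
import tempfile


def free_bytes(path: str) -> int:
    """Free space on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def env_with_tmpdir(tmpdir: str) -> dict:
    """Copy of the current environment with TMPDIR pointed at `tmpdir` (must already exist)."""
    env = dict(os.environ)
    env["TMPDIR"] = tmpdir
    return env


if __name__ == "__main__":
    tmp = tempfile.gettempdir()
    # Lower bound only: the torch wheel alone is 1557.5 MB in the log above.
    wheel_bytes = int(1557.5 * 1_000_000)
    print(f"{tmp}: {free_bytes(tmp) / 1024**3:.1f} GiB free")
    if free_bytes(tmp) < wheel_bytes:
        print("Likely too small; try: mkdir -p ~/pip-tmp && TMPDIR=~/pip-tmp ./setup_venv.sh")
```

    env_with_tmpdir returns a copy rather than mutating os.environ, so it can be passed as env= to subprocess.run without affecting the calling process.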

  • VK_ERROR_OUT_OF_DEVICE_MEMORY generating in Web UI, on Windows, Radeon VII 16GB

    I get the attached errors when starting a generation in the web UI, on a clean clone with HEAD at 136021424c3223f10b5ad2fb12f23aa46e32a0fd.

    The card is an AMD Radeon VII 16GB on Windows 10, and I have already deleted ~.local\shark-tank etc.

    I also have an older clone, with my own experiments on a branch off the 20221215.386 tag at 6508e3fcc9c1f0c35cc56cc60207e8c36eb41a58, with its own --local-shark-tank set and previously generated .vmfb files; that clone still works.

    nod-ai-shark-web-VK_OUT_OF_DEVICE_MEMORY.log

  • [SD] Update all the utilities to make web and CLI codebase closer

    With this change, all the utilities of the SD web and CLI versions are exactly the same. web/models/stable_diffusion/main.py can also be executed directly.

    Signed-Off-by: Gaurav Shukla [email protected]

  • Add SharkDownloader for end users

    I added shark_downloader.py for end users. Users provide the model name, model type, input type, input JSON, and local shark tank directory. If the model is not found locally, the user must also provide tank_url so it can be downloaded. Right now, it only supports tosa.mlir files converted from TFLite models; other support will be added later. albert_lite_base_tflite_mlir_test.py shows how to use it. Here is a quick reference:

        # Fetch the model from the local tank (or tank_url) and load its inputs.
        self.shark_downloader = SharkDownloader(
            model_name="albert_lite_base",
            tank_url="https://storage.googleapis.com/shark_tank/tflite"
            "/albert_lite_base/albert_lite_base_tosa.mlir",
            local_tank_dir="./../gen_shark_tank/tflite",
            model_type="tflite-tosa",
            input_json="input.json",
            input_type="int32",
        )
        tflite_tosa_model = self.shark_downloader.get_mlir_file()
        inputs = self.shark_downloader.get_inputs()

        # Compile the TOSA module and run inference on the loaded inputs.
        self.shark_module = SharkInference(
            tflite_tosa_model,
            inputs,
            device=self.device,
            dynamic=self.dynamic,
            jit_trace=True,
        )
        self.shark_module.set_frontend("tflite-tosa")
        self.shark_module.compile()
        self.shark_module.forward(inputs)
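
    The lookup order described in this PR (use the file from the local shark tank if it is there, otherwise download it from tank_url) can be sketched with the standard library. This is not SharkDownloader's actual code: the function name resolve_mlir_file and the <local_tank_dir>/<model_name>/<file_name> layout are hypothetical.

```python
import os
import tempfile
import urllib.request
from typing import Optional


def resolve_mlir_file(local_tank_dir: str, model_name: str, file_name: str,
                      tank_url: Optional[str] = None) -> str:
    """Return a local path to the model's MLIR file, downloading it only if needed.

    Hypothetical layout: <local_tank_dir>/<model_name>/<file_name>.
    """
    local_path = os.path.join(local_tank_dir, model_name, file_name)
    if os.path.isfile(local_path):
        return local_path  # found in the local shark tank; no network access
    if tank_url is None:
        raise FileNotFoundError(
            f"{local_path} not found locally and no tank_url was given to download it"
        )
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    urllib.request.urlretrieve(tank_url, local_path)  # download into the local tank
    return local_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tank:
        os.makedirs(os.path.join(tank, "albert_lite_base"))
        path = os.path.join(tank, "albert_lite_base", "albert_lite_base_tosa.mlir")
        open(path, "w").close()  # simulate a model already present in the local tank
        print(resolve_mlir_file(tank, "albert_lite_base", "albert_lite_base_tosa.mlir"))
```

    Saving the download into the same local path means the next call finds the file and skips the network entirely.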

  • Add the option to use tuned model in shark_runner

    • Add a flag to load a pre-tuned model config file in shark_runner; nothing is changed in the inference API.
    • The example targets the MiniLM model with the TensorFlow frontend on GPU. To run it: python -m shark.examples.shark_inference.minilm_tf --device="gpu" --model_config_path=shark/examples/shark_inference/minilm_tf_gpu_config.json
    • The config file (example/shark-inference/.json) does not work directly with the Torch frontend. torch-mlir emits linalg.batch_matmul (instead of matmul) with a leading batch dimension of 1, so one needs to generate a new config file with 1 added as the first dimension of tile_sizes. Further testing is needed for model annotation with the Torch frontend.
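
    A minimal sketch of the tile_sizes adjustment described in the last bullet. The config layout assumed here ({"options": [{"tile_sizes": [[...], ...]}]}) and the function name are hypothetical; check them against the actual minilm_tf_gpu_config.json before relying on this.

```python
import copy


def add_batch_dim_to_tile_sizes(config: dict) -> dict:
    """Return a copy of a matmul tuning config adapted for linalg.batch_matmul.

    Assumes a hypothetical layout {"options": [{"tile_sizes": [[...], ...]}]},
    where each inner list is one tiling level for (M, N, K). A leading 1 is
    added so each level covers (B, M, N, K) with batch size B = 1.
    """
    adapted = copy.deepcopy(config)  # Leave the caller's config untouched.
    for option in adapted.get("options", []):
        if "tile_sizes" in option:
            option["tile_sizes"] = [[1] + list(level) for level in option["tile_sizes"]]
    return adapted
```

    In practice, the TensorFlow config would be loaded with json.load, passed through this function, and written back out with json.dump for use with the Torch frontend.
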
  • TorchMLIR eager mode with IREE backend

    Eager mode is hidden behind SharkMode (see shark/examples/eager_mode.py).

    TorchMLIRTensor([[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]], backend=EagerModeRefBackend)
    TorchMLIRTensor([[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]], backend=EagerModeRefBackend)
    

    Note also that you need https://github.com/makslevental/torch-mlir/tree/add_device as your activated torch-mlir version.

  • DistilBert fails to lower through torch-mlir pass pipeline (illegal ops)

    Error output:

    error: failed to legalize operation 'torch.aten.view' that was explicitly marked illegal
    note: see current operation: %416 = "torch.aten.view"(%414, %415) : (!torch.vtensor<[?,?,768],f32>, !torch.list<int>) -> !torch.vtensor<[?,?,12,64],f32>                                                                                   
    Traceback (most recent call last):
      File "/home/ean/SHARK/generate_sharktank.py", line 180, in <module>
        save_torch_model(args.torch_model_csv)
      File "/home/ean/SHARK/generate_sharktank.py", line 68, in save_torch_model
        mlir_importer.import_debug(
      File "/home/ean/SHARK/shark/shark_importer.py", line 163, in import_debug
        imported_mlir = self.import_mlir(
      File "/home/ean/SHARK/shark/shark_importer.py", line 109, in import_mlir
        return self._torch_mlir(is_dynamic, tracing_required), func_name
      File "/home/ean/SHARK/shark/shark_importer.py", line 74, in _torch_mlir
        return get_torch_mlir_module(
      File "/home/ean/SHARK/shark/torch_mlir_utils.py", line 150, in get_torch_mlir_module
        pm.run(mb.module)
    RuntimeError: Failure while executing pass pipeline.
    

    To reproduce:

    • Add distilbert-base-uncased,True,hf to tank/pytorch/torch_model_list.csv
    • Run python generate_sharktank.py

  • textual inversion embeddings

    Textual inversion embeddings are small files that can be loaded in addition to the larger (Stable Diffusion 2.1) model. Each is less than 1 MB and adds an extra style (plus a keyword) to the text prompt. Several can be loaded into SD at once, and several can be used in a single text prompt.

    I found them in this video: https://www.youtube.com/watch?v=4E459tlwquU

    The person appears to use Automatic1111 (I'm not 100% sure) with SD 2.1, so in theory it should be possible to do the same with SHARK. The embeddings are interesting because many exist, and they can be used on top of the base model to create some unique art.

  • Allow pytests to be run with a specific precision.

    Ideally, we would be able to configure which precision to use when using the pytest framework. This may cross into importer and shark_tank generation, but there are a few ways to incorporate the feature regardless.

    We should have a set of importer config options that, when specified, will trigger a generation of the model artifacts locally if a matching configuration is not found in the cloud.

    This will likely need to be implemented after SHARK config is refactored into .json model cards.
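
    One possible shape for this feature, sketched as a conftest.py under stated assumptions: pytest's pytest_addoption hook registers a --precision flag, and a hypothetical resolve_artifact_source helper decides whether a matching artifact can be downloaded from the cloud tank or must be generated locally. The option name, the fp32/fp16 choices (taken from the artifact names seen elsewhere in this tracker) and the <model>_<precision> naming scheme are assumptions, not existing SHARK behavior.

```python
from typing import Iterable

# Hypothetical set of precisions; fp32/fp16 match artifact names seen in SHARK logs.
SUPPORTED_PRECISIONS = ("fp32", "fp16")


def pytest_addoption(parser):
    """conftest.py hook: register a --precision command-line option."""
    parser.addoption(
        "--precision",
        action="store",
        default="fp32",
        choices=SUPPORTED_PRECISIONS,
        help="Precision of the model artifacts to test against.",
    )


def artifact_key(model_name: str, precision: str) -> str:
    """Hypothetical naming scheme for a (model, precision) artifact."""
    return f"{model_name}_{precision}"


def resolve_artifact_source(
    model_name: str, precision: str, cloud_artifacts: Iterable[str]
) -> str:
    """Return "download" if the cloud tank has a matching artifact,
    otherwise "generate_locally" (i.e. run the importer on this machine)."""
    if precision not in SUPPORTED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision}")
    key = artifact_key(model_name, precision)
    return "download" if key in set(cloud_artifacts) else "generate_locally"
```

    A test could then read the flag with request.config.getoption("--precision") and pass it to resolve_artifact_source together with the list of artifacts published in the tank.
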

  • Add support for RX480

    This is failing as per this with the following message:

    Using target triple gcn4-480-linux
    Traceback (most recent call last):
      File "/mnt/storage/git/SHARK/shark/examples/shark_inference/stable_diffusion/main.py", line 104, in <module>
        unet = get_unet()
      File "/mnt/storage/git/SHARK/shark/examples/shark_inference/stable_diffusion/opt_params.py", line 72, in get_unet
        return get_shark_model(bucket, model_name, iree_flags)
      File "/mnt/storage/git/SHARK/shark/examples/shark_inference/stable_diffusion/utils.py", line 58, in get_shark_model
        return _compile_module(shark_module, model_name, extra_args)
      File "/mnt/storage/git/SHARK/shark/examples/shark_inference/stable_diffusion/utils.py", line 33, in _compile_module
        path = shark_module.save_module(
      File "/mnt/storage/git/SHARK/shark/shark_inference.py", line 188, in save_module
        return export_iree_module_to_vmfb(
      File "/mnt/storage/git/SHARK/shark/iree_utils/compile_utils.py", line 328, in export_iree_module_to_vmfb
        flatbuffer_blob = compile_module_to_flatbuffer(
      File "/mnt/storage/git/SHARK/shark/iree_utils/compile_utils.py", line 271, in compile_module_to_flatbuffer
        flatbuffer_blob = ireec.compile_str(
      File "/home/gap/git/SHARK-Runtime/build/compiler/bindings/python/iree/compiler/tools/core.py", line 278, in compile_str
        result = invoke_immediate(cl, immediate_input=input_bytes)
      File "/home/gap/git/SHARK-Runtime/build/compiler/bindings/python/iree/compiler/tools/binaries.py", line 196, in invoke_immediate
        raise CompilerToolError(process)
    iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool iree-compile
    Diagnostics:
    <stdin>:822:11: error: failed to legalize operation 'memref.load'
        %19 = linalg.generic {indexing_maps = [#map5, #map4, #map5], iterator_types = ["parallel", "parallel"]} ins(%18, %cst_2 : tensor<2x160xf32>, tensor<i64>) outs(%17 : tensor<2x160xf32>) {
              ^
    <stdin>:28:3: note: called from
      func.func @forward(%arg0: tensor<1x4x64x64xf16>, %arg1: tensor<1xf16>, %arg2: tensor<2x64x1024xf16>, %arg3: tensor<f32>) -> tensor<1x4x64x64xf16> {
      ^
    <stdin>:822:11: error: Failures have been detected while processing an MLIR pass pipeline
        %19 = linalg.generic {indexing_maps = [#map5, #map4, #map5], iterator_types = ["parallel", "parallel"]} ins(%18, %cst_2 : tensor<2x160xf32>, tensor<i64>) outs(%17 : tensor<2x160xf32>) {
              ^
    <stdin>:28:3: note: called from
      func.func @forward(%arg0: tensor<1x4x64x64xf16>, %arg1: tensor<1xf16>, %arg2: tensor<2x64x1024xf16>, %arg3: tensor<f32>) -> tensor<1x4x64x64xf16> {
      ^
    <stdin>:822:11: note: Pipeline failed while executing [`mlir::iree_compiler::IREE::HAL::TranslateExecutablesPass` on 'hal.executable' operation: @forward_dispatch_0, `mlir::iree_compiler::IREE::HAL::TranslateExecutablesPass` on 'hal.executable' operation: @forward_dispatch_2, `mlir::iree_compiler::IREE::HAL::TranslateExecutablesPass` on 'hal.executable' operation: @forward_dispatch_3, `mlir::iree_compiler::IREE::HAL::TranslateExecutablesPass` on 'hal.executable' operation: @forward_dispatch_1, `mlir::iree_compiler::IREE::HAL::TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `mlir::iree_compiler::IREE::HAL::TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `mlir::iree_compiler::IREE::HAL::TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `mlir::iree_compiler::IREE::HAL::TranslateTargetExecutableVariantsPass` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVLowerExecutableTarget` on 'hal.executable.variant' operation: @vulkan_spirv_fb, `SPIRVVectorize` on 'func.func' operation: @forward_dispatch_1_generic_320, `SPIRVVectorize` on 'func.func' operation: @forward_dispatch_2_generic_320, `ConvertToSPIRV` on 'builtin.module' operation, `ConvertToSPIRV` on 'builtin.module' operation]: reproducer generated at `iree-tmp/core-reproducer.mlir`
        %19 = linalg.generic {indexing_maps = [#map5, #map4, #map5], iterator_types = ["parallel", "parallel"]} ins(%18, %cst_2 : tensor<2x160xf32>, tensor<i64>) outs(%17 : tensor<2x160xf32>) {
              ^
    <stdin>:822:11: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Int64, Int8, StorageBuffer16BitAccess, StorageUniform16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, GroupNonUniform, GroupNonUniformVote, GroupNonUniformArithmetic, GroupNonUniformBallot, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformClustered, GroupNonUniformQuad, VariablePointers, VariablePointersStorageBuffer], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class, SPV_KHR_variable_pointers]>, api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<max_compute_shared_memory_size = 65536, max_compute_workgroup_invocations = 1024, max_compute_workgroup_size = [1024, 1024, 1024], subgroup_size = 64, min_subgroup_size = 32, max_subgroup_size = 64, cooperative_matrix_properties_nv = []>>}>
        %19 = linalg.generic {indexing_maps = [#map5, #map4, #map5], iterator_types = ["parallel", "parallel"]} ins(%18, %cst_2 : tensor<2x160xf32>, tensor<i64>) outs(%17 : tensor<2x160xf32>) {
              ^
    <stdin>:28:3: note: called from
      func.func @forward(%arg0: tensor<1x4x64x64xf16>, %arg1: tensor<1xf16>, %arg2: tensor<2x64x1024xf16>, %arg3: tensor<f32>) -> tensor<1x4x64x64xf16> {
      ^
    <stdin>:822:11: error: failed to serialize executables
        %19 = linalg.generic {indexing_maps = [#map5, #map4, #map5], iterator_types = ["parallel", "parallel"]} ins(%18, %cst_2 : tensor<2x160xf32>, tensor<i64>) outs(%17 : tensor<2x160xf32>) {
              ^
    <stdin>:28:3: note: called from
      func.func @forward(%arg0: tensor<1x4x64x64xf16>, %arg1: tensor<1xf16>, %arg2: tensor<2x64x1024xf16>, %arg3: tensor<f32>) -> tensor<1x4x64x64xf16> {
      ^
    
    
    Invoked with:
     iree-compile /home/gap/git/SHARK-Runtime/build/compiler/bindings/python/iree/compiler/tools/../_mlir_libs/iree-compile - --iree-input-type=none --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --iree-llvm-embedded-linker-path=/home/gap/git/SHARK-Runtime/build/compiler/bindings/python/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=iree-tmp/core-reproducer.mlir --iree-llvm-target-cpu-features=host --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs -iree-vulkan-target-triple=gcn4-480-linux --iree-flow-enable-padding-linalg-ops --iree-flow-linalg-ops-padding-size=32 --iree-flow-enable-conv-img2col-transform
    
  • vulkan and CPU not working. CUDA very slow.

    Windows 10 22H2, Xeon E5 2696v3, 32 GB RAM.

    NVIDIA RTX 3060 12 GB as the primary GPU, AMD RX 5600 XT 6 GB as the secondary GPU.

    The devices appear in the device list [image].

    • CUDA on the NVIDIA card works, but is VERY slow: 0.62 it/s, while the Automatic1111 web UI gives me 5-6 it/s with xformers.

    • Vulkan on the 5600 XT shows 2.62 it/s but fails with an error:

    Found device AMD Radeon RX 5600 XT. Using target triple rdna2-unknown-windows. Tuned models are currently not supported for this setting. Using cached models from c:\temp\... 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 17.7kB/s] 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 17.8kB/s] loading existing vmfb from: C:\Software\Stable_Diffusion\SHARK\web\euler_scale_model_input_fp16_vulkan-00000000-0500-0000-0000-000000000000.vmfb WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files. Using cached models from c:\temp\... 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 17.3kB/s] 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 16.7kB/s] loading existing vmfb from: C:\Software\Stable_Diffusion\SHARK\web\euler_step_fp16_vulkan-00000000-0500-0000-0000-000000000000.vmfb WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files. Using cached models from c:\temp\... 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 16.3kB/s] 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 16.7kB/s] loading existing vmfb from: C:\Software\Stable_Diffusion\SHARK\web\vae2base_19dec_fp16_vulkan-00000000-0500-0000-0000-000000000000.vmfb WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files. Using cached models from c:\temp\... 
100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 16.8kB/s] 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 16.8kB/s] loading existing vmfb from: C:\Software\Stable_Diffusion\SHARK\web\unet_19dec_v2p1base_fp16_64_vulkan-00000000-0500-0000-0000-000000000000.vmfb WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files. Using cached models from c:\temp\... 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 16.0kB/s] 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 15.2kB/s] loading existing vmfb from: C:\Software\Stable_Diffusion\SHARK\web\clip_19dec_v2p1base_fp32_64_vulkan-00000000-0500-0000-0000-000000000000.vmfb WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files. 
20it [00:07, 2.62it/s] Traceback (most recent call last): File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\gradio\routes.py", line 321, in run_predict output = await app.blocks.process_api( File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\gradio\blocks.py", line 1015, in process_api result = await self.call_function(fn_index, inputs, iterator, request) File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\gradio\blocks.py", line 856, in call_function prediction = await anyio.to_thread.run_sync( File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync return await get_asynclib().run_sync_in_worker_thread( File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread return await future File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run result = context.run(func, *args) File "C:\Software\Stable_Diffusion\SHARK\web\models\stable_diffusion\main.py", line 196, in stable_diff_inf images = vae.forward((latents_numpy,)) File "C:\Software\Stable_Diffusion\SHARK\shark\shark_inference.py", line 142, in forward return self.shark_runner.run(inputs, send_to_host) File "C:\Software\Stable_Diffusion\SHARK\shark\shark_runner.py", line 95, in run return get_results( File "C:\Software\Stable_Diffusion\SHARK\shark\iree_utils\compile_utils.py", line 362, in get_results result = compiled_vm(*device_inputs) File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\iree\runtime\function.py", line 130, in __call__ self._invoke(arg_list, ret_list) File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\iree\runtime\function.py", line 154, in _invoke self._vm_context.invoke(self._vm_function, arg_list, ret_list) RuntimeError: Error invoking function: 
D:\a\SHARK-Runtime\SHARK-Runtime\c\runtime\src\iree\hal\drivers\vulkan\vma_allocator.cc:316: RESOURCE_EXHAUSTED; VK_ERROR_OUT_OF_DEVICE_MEMORY; vmaCreateBuffer; while invoking native function hal.device.queue.alloca; while calling import; [ 1] native hal.device.queue.alloca:0 - [ 0] bytecode module.forward:3102 [ <stdin>:188:10 at <stdin>:21:3, <stdin>:198:10 at <stdin>:21:3, <stdin>:199:15 at <stdin>:21:3, <stdin>:208:10 at <stdin>:21:3, <stdin>:212:10 at <stdin>:21:3, <stdin>:230:11 at <stdin>:21:3, <stdin>:235:11 at <stdin>:21:3, <stdin>:245:11 at <stdin>:21:3, <stdin>:289:11 at <stdin>:21:3, <stdin>:324:11 at <stdin>:21:3, <stdin>:329:19 at <stdin>:21:3, <stdin>:337:11 at <stdin>:21:3, <stdin>:355:11 at <stdin>:21:3, <stdin>:360:11 at <stdin>:21:3, <stdin>:370:11 at <stdin>:21:3, <stdin>:412:11 at <stdin>:21:3, <stdin>:446:11 at <stdin>:21:3, <stdin>:459:11 at <stdin>:21:3, <stdin>:465:11 at <stdin>:21:3, <stdin>:488:11 at <stdin>:21:3, <stdin>:493:11 at <stdin>:21:3, <stdin>:503:11 at <stdin>:21:3, <stdin>:545:11 at <stdin>:21:3, <stdin>:573:11 at <stdin>:21:3, <stdin>:606:11 at <stdin>:21:3, <stdin>:629:11 at <stdin>:21:3, <stdin>:593:11 at <stdin>:21:3, <stdin>:663:12 at <stdin>:21:3, <stdin>:692:12 at <stdin>:21:3, <stdin>:716:12 at <stdin>:21:3, <stdin>:621:11 at <stdin>:21:3, <stdin>:721:12 at <stdin>:21:3, <stdin>:729:12 at <stdin>:21:3, <stdin>:746:12 at <stdin>:21:3, <stdin>:741:12 at <stdin>:21:3, <stdin>:773:12 at <stdin>:21:3, <stdin>:778:12 at <stdin>:21:3, <stdin>:788:12 at <stdin>:21:3, <stdin>:830:12 at <stdin>:21:3, <stdin>:864:12 at <stdin>:21:3, <stdin>:877:12 at <stdin>:21:3, <stdin>:895:12 at <stdin>:21:3, <stdin>:900:12 at <stdin>:21:3, <stdin>:910:12 at <stdin>:21:3, <stdin>:952:12 at <stdin>:21:3, <stdin>:986:12 at <stdin>:21:3, <stdin>:999:12 at <stdin>:21:3, <stdin>:1005:12 at <stdin>:21:3, <stdin>:1032:12 at <stdin>:21:3, <stdin>:1037:12 at <stdin>:21:3, <stdin>:1047:12 at <stdin>:21:3, <stdin>:1089:12 at <stdin>:21:3, <stdin>:1123:12 
at <stdin>:21:3, <stdin>:1136:12 at <stdin>:21:3, <stdin>:1154:12 at <stdin>:21:3, <stdin>:1159:12 at <stdin>:21:3, <stdin>:1169:12 at <stdin>:21:3, <stdin>:1211:12 at <stdin>:21:3, <stdin>:1245:12 at <stdin>:21:3, <stdin>:1258:12 at <stdin>:21:3, <stdin>:1264:12 at <stdin>:21:3, <stdin>:1291:12 at <stdin>:21:3, <stdin>:1296:12 at <stdin>:21:3, <stdin>:1306:12 at <stdin>:21:3, <stdin>:1348:12 at <stdin>:21:3, <stdin>:1382:12 at <stdin>:21:3, <stdin>:1395:12 at <stdin>:21:3, <stdin>:1413:12 at <stdin>:21:3, <stdin>:1418:12 at <stdin>:21:3, <stdin>:1428:12 at <stdin>:21:3, <stdin>:1470:12 at <stdin>:21:3, <stdin>:1504:12 at <stdin>:21:3, <stdin>:1517:12 at <stdin>:21:3, <stdin>:1523:12 at <stdin>:21:3, <stdin>:1550:12 at <stdin>:21:3, <stdin>:1555:12 at <stdin>:21:3, <stdin>:1565:12 at <stdin>:21:3, <stdin>:1607:12 at <stdin>:21:3, <stdin>:1641:12 at <stdin>:21:3, <stdin>:1654:12 at <stdin>:21:3, <stdin>:1672:12 at <stdin>:21:3, <stdin>:1677:12 at <stdin>:21:3, <stdin>:1687:12 at <stdin>:21:3, <stdin>:1729:12 at <stdin>:21:3, <stdin>:1763:12 at <stdin>:21:3, <stdin>:1776:12 at <stdin>:21:3, <stdin>:1777:12 at <stdin>:21:3, <stdin>:1800:19 at <stdin>:21:3, <stdin>:1789:12 at <stdin>:21:3, <stdin>:1808:12 at <stdin>:21:3, <stdin>:1812:12 at <stdin>:21:3, <stdin>:1828:12 at <stdin>:21:3, <stdin>:1833:12 at <stdin>:21:3, <stdin>:1843:12 at <stdin>:21:3, <stdin>:1885:12 at <stdin>:21:3, <stdin>:1920:12 at <stdin>:21:3, <stdin>:1933:12 at <stdin>:21:3, <stdin>:1951:12 at <stdin>:21:3, <stdin>:1956:12 at <stdin>:21:3, <stdin>:1966:12 at <stdin>:21:3, <stdin>:2008:12 at <stdin>:21:3, <stdin>:2042:12 at <stdin>:21:3, <stdin>:2055:12 at <stdin>:21:3, <stdin>:2061:12 at <stdin>:21:3, <stdin>:2084:12 at <stdin>:21:3, <stdin>:2089:12 at <stdin>:21:3, <stdin>:2099:12 at <stdin>:21:3, <stdin>:2141:12 at <stdin>:21:3, <stdin>:2175:12 at <stdin>:21:3, <stdin>:2188:12 at <stdin>:21:3, <stdin>:2206:12 at <stdin>:21:3, <stdin>:2211:12 at <stdin>:21:3, <stdin>:2221:12 at <stdin>:21:3, 
<stdin>:2263:12 at <stdin>:21:3, <stdin>:2297:12 at <stdin>:21:3, <stdin>:2310:12 at <stdin>:21:3, <stdin>:2316:12 at <stdin>:21:3, <stdin>:2339:12 at <stdin>:21:3, <stdin>:2344:12 at <stdin>:21:3, <stdin>:2354:12 at <stdin>:21:3, <stdin>:2396:12 at <stdin>:21:3, <stdin>:2430:12 at <stdin>:21:3, <stdin>:2443:12 at <stdin>:21:3, <stdin>:2461:12 at <stdin>:21:3, <stdin>:2466:12 at <stdin>:21:3, <stdin>:2476:12 at <stdin>:21:3, <stdin>:2518:12 at <stdin>:21:3, <stdin>:2552:12 at <stdin>:21:3, <stdin>:2565:12 at <stdin>:21:3, <stdin>:2566:12 at <stdin>:21:3, <stdin>:2589:19 at <stdin>:21:3, <stdin>:2578:12 at <stdin>:21:3, <stdin>:2597:12 at <stdin>:21:3, <stdin>:2601:12 at <stdin>:21:3, <stdin>:2617:12 at <stdin>:21:3, <stdin>:2622:12 at <stdin>:21:3, <stdin>:2632:12 at <stdin>:21:3, <stdin>:2674:12 at <stdin>:21:3, <stdin>:2709:12 at <stdin>:21:3, <stdin>:2723:12 at <stdin>:21:3, <stdin>:2727:12 at <stdin>:21:3, <stdin>:2743:12 at <stdin>:21:3, <stdin>:2748:12 at <stdin>:21:3, <stdin>:2758:12 at <stdin>:21:3, <stdin>:2800:12 at <stdin>:21:3, <stdin>:2835:12 at <stdin>:21:3, <stdin>:2840:19 at <stdin>:21:3, <stdin>:2848:12 at <stdin>:21:3, <stdin>:2859:12 at <stdin>:21:3, <stdin>:2853:12 at <stdin>:21:3, <stdin>:2882:12 at <stdin>:21:3, <stdin>:2887:12 at <stdin>:21:3, <stdin>:2897:12 at <stdin>:21:3, <stdin>:2939:12 at <stdin>:21:3, <stdin>:2973:12 at <stdin>:21:3, <stdin>:2986:12 at <stdin>:21:3, <stdin>:3004:12 at <stdin>:21:3, <stdin>:3009:12 at <stdin>:21:3, <stdin>:3019:12 at <stdin>:21:3, <stdin>:3061:12 at <stdin>:21:3, <stdin>:3095:12 at <stdin>:21:3, <stdin>:3108:12 at <stdin>:21:3, <stdin>:3114:12 at <stdin>:21:3, <stdin>:3137:12 at <stdin>:21:3, <stdin>:3142:12 at <stdin>:21:3, <stdin>:3152:12 at <stdin>:21:3, <stdin>:3194:12 at <stdin>:21:3, <stdin>:3228:12 at <stdin>:21:3, <stdin>:3241:12 at <stdin>:21:3, <stdin>:3259:12 at <stdin>:21:3, <stdin>:3264:12 at <stdin>:21:3, <stdin>:3274:12 at <stdin>:21:3, <stdin>:3316:12 at <stdin>:21:3, <stdin>:3350:12 at 
<stdin>:21:3, <stdin>:3363:12 at <stdin>:21:3, <stdin>:3364:12 at <stdin>:21:3, <stdin>:3387:19 at <stdin>:21:3, <stdin>:3376:12 at <stdin>:21:3, <stdin>:3395:12 at <stdin>:21:3, <stdin>:3399:12 at <stdin>:21:3, <stdin>:3415:12 at <stdin>:21:3, <stdin>:3420:12 at <stdin>:21:3, <stdin>:3430:12 at <stdin>:21:3, <stdin>:3472:12 at <stdin>:21:3, <stdin>:3507:12 at <stdin>:21:3, <stdin>:3521:12 at <stdin>:21:3, <stdin>:3541:12 at <stdin>:21:3, <stdin>:3546:12 at <stdin>:21:3, <stdin>:3556:12 at <stdin>:21:3, <stdin>:3598:12 at <stdin>:21:3, <stdin>:3633:12 at <stdin>:21:3, <stdin>:3638:19 at <stdin>:21:3, <stdin>:3646:12 at <stdin>:21:3, <stdin>:3657:12 at <stdin>:21:3, <stdin>:3651:12 at <stdin>:21:3, <stdin>:3680:12 at <stdin>:21:3, <stdin>:3685:12 at <stdin>:21:3, <stdin>:3695:12 at <stdin>:21:3, <stdin>:3737:12 at <stdin>:21:3, <stdin>:3771:12 at <stdin>:21:3, <stdin>:3784:12 at <stdin>:21:3, <stdin>:3802:12 at <stdin>:21:3, <stdin>:3807:12 at <stdin>:21:3, <stdin>:3817:12 at <stdin>:21:3, <stdin>:3859:12 at <stdin>:21:3, <stdin>:3893:12 at <stdin>:21:3, <stdin>:3906:12 at <stdin>:21:3, <stdin>:3912:12 at <stdin>:21:3, <stdin>:3935:12 at <stdin>:21:3, <stdin>:3940:12 at <stdin>:21:3, <stdin>:3950:12 at <stdin>:21:3, <stdin>:3992:12 at <stdin>:21:3, <stdin>:4026:12 at <stdin>:21:3, <stdin>:4039:12 at <stdin>:21:3, <stdin>:4057:12 at <stdin>:21:3, <stdin>:4062:12 at <stdin>:21:3, <stdin>:4072:12 at <stdin>:21:3, <stdin>:4114:12 at <stdin>:21:3, <stdin>:4148:12 at <stdin>:21:3, <stdin>:4161:12 at <stdin>:21:3, <stdin>:4162:12 at <stdin>:21:3, <stdin>:4190:12 at <stdin>:21:3, <stdin>:4195:12 at <stdin>:21:3, <stdin>:4205:12 at <stdin>:21:3, <stdin>:4247:12 at <stdin>:21:3, <stdin>:4281:12 at <stdin>:21:3, <stdin>:4295:12 at <stdin>:21:3, <stdin>:4286:19 at <stdin>:21:3 ]

    For Vulkan on the NVIDIA card, I get an error too:

    Found device NVIDIA GeForce RTX 3060. Using target triple rdna2-unknown-windows. Tuned models are currently not supported for this setting. Using cached models from c:\temp\... 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 17.6kB/s] 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 17.2kB/s] loading existing vmfb from: C:\Software\Stable_Diffusion\SHARK\web\euler_scale_model_input_fp16_vulkan-3290c55c-57a2-5e0d-b28b-58081544255c.vmfb WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files. Using cached models from c:\temp\... 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 15.4kB/s] 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 15.9kB/s] loading existing vmfb from: C:\Software\Stable_Diffusion\SHARK\web\euler_step_fp16_vulkan-3290c55c-57a2-5e0d-b28b-58081544255c.vmfb WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files. Using cached models from c:\temp\... 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 16.2kB/s] 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 16.3kB/s] loading existing vmfb from: C:\Software\Stable_Diffusion\SHARK\web\vae2base_19dec_fp16_vulkan-3290c55c-57a2-5e0d-b28b-58081544255c.vmfb WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files. Using cached models from c:\temp\... 
100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 17.2kB/s] 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 17.2kB/s] loading existing vmfb from: C:\Software\Stable_Diffusion\SHARK\web\unet_19dec_v2p1base_fp16_64_vulkan-3290c55c-57a2-5e0d-b28b-58081544255c.vmfb WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files. Using cached models from c:\temp\... 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 16.8kB/s] 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 16.7kB/s] loading existing vmfb from: C:\Software\Stable_Diffusion\SHARK\web\clip_19dec_v2p1base_fp32_64_vulkan-3290c55c-57a2-5e0d-b28b-58081544255c.vmfb WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files. 
Traceback (most recent call last): File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\gradio\routes.py", line 321, in run_predict output = await app.blocks.process_api( File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\gradio\blocks.py", line 1015, in process_api result = await self.call_function(fn_index, inputs, iterator, request) File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\gradio\blocks.py", line 856, in call_function prediction = await anyio.to_thread.run_sync( File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync return await get_asynclib().run_sync_in_worker_thread( File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread return await future File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run result = context.run(func, *args) File "C:\Software\Stable_Diffusion\SHARK\web\models\stable_diffusion\main.py", line 106, in stable_diff_inf model_cache.set_models(device_key) File "C:\Software\Stable_Diffusion\SHARK\web\models\stable_diffusion\cache_objects.py", line 98, in set_models self.clip = get_clip() File "C:\Software\Stable_Diffusion\SHARK\web\models\stable_diffusion\opt_params.py", line 99, in get_clip return get_shark_model(bucket, model_name, iree_flags) File "C:\Software\Stable_Diffusion\SHARK\web\models\stable_diffusion\utils.py", line 58, in get_shark_model return _compile_module(shark_module, model_name, extra_args) File "C:\Software\Stable_Diffusion\SHARK\web\models\stable_diffusion\utils.py", line 23, in _compile_module shark_module.load_module(vmfb_path, extra_args=extra_args) File "C:\Software\Stable_Diffusion\SHARK\shark\shark_inference.py", line 209, in load_module ) = load_flatbuffer( File "C:\Software\Stable_Diffusion\SHARK\shark\iree_utils\compile_utils.py", line 314, in 
load_flatbuffer return get_iree_module(flatbuffer_blob, device, func_name) File "C:\Software\Stable_Diffusion\SHARK\shark\iree_utils\compile_utils.py", line 283, in get_iree_module vm_module = ireert.VmModule.from_flatbuffer( ValueError: Error creating vm module from FlatBuffer: D:\a\SHARK-Runtime\SHARK-Runtime\c\runtime\src\iree\vm\bytecode_module.c:117: INVALID_ARGUMENT; FlatBuffer data is not present or less than 16 bytes (0 total)

    For CPU, I get an error too:

    Tuned models are currently not supported for this setting. Using cached models from c:\temp\... 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 16.3kB/s] 100%|███████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<00:00, 16.3kB/s] No vmfb found. Compiling and saving to C:\Software\Stable_Diffusion\SHARK\web\euler_scale_model_input_fp16_cpu.vmfb "uname" is not recognized as an internal or external command, operable program or batch file. Traceback (most recent call last): File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\gradio\routes.py", line 321, in run_predict output = await app.blocks.process_api( File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\gradio\blocks.py", line 1015, in process_api result = await self.call_function(fn_index, inputs, iterator, request) File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\gradio\blocks.py", line 856, in call_function prediction = await anyio.to_thread.run_sync( File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync return await get_asynclib().run_sync_in_worker_thread( File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread return await future File "C:\Software\Stable_Diffusion\SHARK\shark.venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run result = context.run(func, *args) File "C:\Software\Stable_Diffusion\SHARK\web\models\stable_diffusion\main.py", line 106, in stable_diff_inf model_cache.set_models(device_key) File "C:\Software\Stable_Diffusion\SHARK\web\models\stable_diffusion\cache_objects.py", line 94, in set_models self.schedulers = get_schedulers(args.version) File "C:\Software\Stable_Diffusion\SHARK\web\models\stable_diffusion\cache_objects.py", line 64,
in get_schedulers schedulers["SharkEulerDiscrete"].compile() File "C:\Software\Stable_Diffusion\SHARK\web\models\stable_diffusion\schedulers.py", line 99, in compile self.scaling_model = get_shark_model( File "C:\Software\Stable_Diffusion\SHARK\web\models\stable_diffusion\utils.py", line 58, in get_shark_model return _compile_module(shark_module, model_name, extra_args) File "C:\Software\Stable_Diffusion\SHARK\web\models\stable_diffusion\utils.py", line 33, in _compile_module path = shark_module.save_module( File "C:\Software\Stable_Diffusion\SHARK\shark\shark_inference.py", line 188, in save_module return export_iree_module_to_vmfb( File "C:\Software\Stable_Diffusion\SHARK\shark\iree_utils\compile_utils.py", line 328, in export_iree_module_to_vmfb flatbuffer_blob = compile_module_to_flatbuffer( File "C:\Software\Stable_Diffusion\SHARK\shark\iree_utils\compile_utils.py", line 245, in compile_module_to_flatbuffer args += get_iree_device_args(device, extra_args) File "C:\Software\Stable_Diffusion\SHARK\shark\iree_utils\compile_utils.py", line 37, in get_iree_device_args return get_iree_cpu_args() File "C:\Software\Stable_Diffusion\SHARK\shark\iree_utils\cpu_utils.py", line 34, in get_iree_cpu_args subprocess.run( File "C:\Users\F2\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 524, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command 'uname -s -m' returned non-zero exit status 1.

    So it won't work for me at all: it uses strange model files, and there is no speedup for CUDA.
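
The traceback above ends in `get_iree_cpu_args`, which runs `uname -s -m` through `subprocess`; a stock Windows shell has no `uname` command, which matches the "not recognized" message. One portable way to get similar information is Python's standard `platform` module, which needs no external command. The sketch below assumes the caller only needs the OS name and machine architecture that `uname -s -m` prints; the helper name `host_os_and_arch` is hypothetical and this is not SHARK's actual fix.

```python
import platform


def host_os_and_arch() -> tuple[str, str]:
    """Return (system, machine), roughly the information `uname -s -m` prints.

    Hypothetical helper: the standard-library `platform` module works on
    Windows, Linux and macOS without spawning a `uname` process.
    """
    system = platform.system()    # e.g. "Windows", "Linux", "Darwin"
    machine = platform.machine()  # e.g. "AMD64", "x86_64", "arm64"
    return system, machine


if __name__ == "__main__":
    print(*host_os_and_arch())
```

Note that the spellings differ from `uname`: on 64-bit Windows `platform.machine()` typically reports `AMD64` where `uname -m` would print `x86_64`, so code that parses the `uname` output string may need a small name mapping.
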

  • Multi-GPU to speed up generation?

    I have multiple Vega 64/56, RTX 3060 and RX 5600 XT GPUs and want to speed up Stable Diffusion. Is that possible with SHARK?

    I found that I can only select one device from the list; I can't use multiple GPUs for a single job.
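
Since only one device can be selected per job, a common workaround is data parallelism over independent jobs: each GPU holds its own copy of the model and works through its own share of the prompts, while the devices run concurrently. This does not split a single image across GPUs. The sketch below assumes each device can load the full pipeline on its own; `generate_on_device` is a hypothetical placeholder, not a SHARK API, and the device strings are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_round_robin(prompts, devices):
    """Pair each prompt with a device, cycling through the device list."""
    if not devices:
        raise ValueError("need at least one device")
    return [(devices[i % len(devices)], p) for i, p in enumerate(prompts)]


def generate_on_device(device, prompt):
    # Hypothetical placeholder: a real version would run a Stable Diffusion
    # pipeline compiled for `device` and return the generated image.
    return f"{prompt} @ {device}"


def generate_all(prompts, devices):
    """Run every prompt: one sequential queue per device, devices in parallel."""
    devices = list(dict.fromkeys(devices))  # drop duplicate device names
    jobs = assign_round_robin(prompts, devices)
    queues = {d: [] for d in devices}
    for index, (device, prompt) in enumerate(jobs):
        queues[device].append((index, prompt))

    def run_queue(device):
        # Jobs for one device run one after another, never concurrently.
        return [(i, generate_on_device(device, p)) for i, p in queues[device]]

    results = [None] * len(prompts)
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        for finished in pool.map(run_queue, devices):
            for index, output in finished:
                results[index] = output
    return results


if __name__ == "__main__":
    print(generate_all(["a cat", "a dog", "a fox"], ["gpu0", "gpu1"]))
```

Two caveats apply to this design. Threads only overlap GPU work if the per-device call releases Python's GIL while the device is busy; otherwise one process per device is the safer choice. Also, round-robin gives every card the same number of jobs, so with mixed hardware (Vega plus RTX) the faster card may sit idle; having one worker per device pull from a shared queue balances the load better.
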

Benchmark framework of 3D integrated CIM accelerators for popular DNN inference, supporting both monolithic and heterogeneous 3D integration

3D+NeuroSim V1.0 The DNN+NeuroSim framework was developed by Prof. Shimeng Yu's group (Georgia Institute of Technology). The model is made publicly av

Dec 15, 2022
ThunderSVM: A Fast SVM Library on GPUs and CPUs

What's new: We have recently released ThunderGBM, a fast GBDT and Random Forest library on GPUs. Added a scikit-learn interface. Overview The miss

Jan 5, 2023
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

This is the Vowpal Wabbit fast online learning code. Why Vowpal Wabbit? Vowpal Wabbit is a machine learning system which pushes the frontier of machin

Dec 30, 2022
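
Two of the techniques named in the Vowpal Wabbit description, online learning and hashing, combine naturally: each feature name is hashed into a fixed-size weight vector, and the weights are updated one example at a time. The sketch below is a generic illustration of that idea using logistic loss and plain SGD, not Vowpal Wabbit's implementation; the bucket count, learning rate and the use of `zlib.crc32` as the hash function are assumptions.

```python
import math
import zlib


class HashedOnlineLogReg:
    """Online logistic regression over hashed sparse features (illustrative)."""

    def __init__(self, bits=18, lr=0.1):
        self.size = 1 << bits          # number of hash buckets
        self.w = [0.0] * self.size     # one weight per bucket
        self.lr = lr

    def _index(self, feature):
        # Hashing trick: map any string feature to a bucket, no vocabulary needed.
        return zlib.crc32(feature.encode("utf-8")) % self.size

    def predict(self, features):
        """Return P(y=1) for a dict {feature_name: value}."""
        z = sum(self.w[self._index(f)] * v for f, v in features.items())
        if z >= 0:                     # numerically stable sigmoid
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, features, label):
        """One SGD step on a single example; label is 0 or 1."""
        error = self.predict(features) - label  # d(log loss)/dz
        for f, v in features.items():
            self.w[self._index(f)] -= self.lr * error * v


if __name__ == "__main__":
    model = HashedOnlineLogReg()
    for _ in range(500):  # a stream of examples, each seen once per pass
        model.update({"word=good": 1.0, "bias": 1.0}, 1)
        model.update({"word=bad": 1.0, "bias": 1.0}, 0)
    print(model.predict({"word=good": 1.0, "bias": 1.0}))
```

Memory stays fixed at `2**bits` weights no matter how many distinct features arrive; the price is that colliding features share a weight.
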
Forward - A library for high performance deep learning inference on NVIDIA GPUs

A library for high-performance deep learning inference on NVIDIA GPUs.

Mar 17, 2021
A library for high performance deep learning inference on NVIDIA GPUs.

Forward - A library for high performance deep learning inference on NVIDIA GPUs

Dec 17, 2022
The dgSPARSE Library (Deep Graph Sparse Library) is a high performance library for sparse kernel acceleration on GPUs based on CUDA.

dgSPARSE Library Introduction The dgSPARSE Library (Deep Graph Sparse Library) is a high performance library for sparse kernel acceleration on GPUs bas

Dec 5, 2022
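
A typical sparse kernel in graph deep learning is SpMM (sparse matrix times dense matrix), which aggregates neighbor features over a sparse adjacency matrix. The sketch below shows the CSR (compressed sparse row) layout and a plain-Python SpMM loop, purely to illustrate the computation a GPU kernel parallelizes; it is not dgSPARSE's API or CUDA code.

```python
def dense_to_csr(matrix):
    """Convert a dense list-of-lists matrix to CSR arrays (indptr, indices, data)."""
    indptr, indices, data = [0], [], []
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                indices.append(col)   # column of each nonzero
                data.append(value)    # value of each nonzero
        indptr.append(len(indices))   # row i spans indptr[i]:indptr[i+1]
    return indptr, indices, data


def csr_spmm(indptr, indices, data, dense, n_cols):
    """Compute A @ B where A is CSR (n_rows x k) and B is dense (k x n_cols)."""
    n_rows = len(indptr) - 1
    out = [[0.0] * n_cols for _ in range(n_rows)]
    # One common GPU strategy maps each output row to a thread or warp.
    for i in range(n_rows):
        for nz in range(indptr[i], indptr[i + 1]):
            a = data[nz]
            b_row = dense[indices[nz]]  # only rows of B hit by nonzeros are read
            for j in range(n_cols):
                out[i][j] += a * b_row[j]
    return out


if __name__ == "__main__":
    csr = dense_to_csr([[1, 0, 2], [0, 0, 3]])
    print(csr_spmm(*csr, [[1, 1], [2, 2], [3, 3]], 2))
```

The work is proportional to the number of nonzeros times `n_cols`, rather than to the full matrix size, which is the point of sparse kernels.
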
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin

Dec 23, 2022
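
The factorization machine (FM) model named in the xLearn description scores a feature vector x as y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, where each feature i has a k-dimensional latent vector v_i. The pairwise term can be rewritten as 0.5 * sum_f [(sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2], which costs O(k*n) instead of O(k*n^2). The sketch below evaluates both forms so they can be checked against each other; it is a generic FM illustration, not xLearn's code.

```python
def fm_score(x, w0, w, v):
    """FM prediction in O(k*n) via the squared-sum identity.

    x: feature values (length n), w0: bias, w: linear weights (length n),
    v: n x k latent factor matrix.
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(v[0]) if v else 0
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(len(x)))            # sum_i v_if x_i
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))  # sum_i (v_if x_i)^2
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_score_naive(x, w0, w, v):
    """Direct O(k*n^2) evaluation of the same model, for checking."""
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))  # <v_i, v_j>
            total += dot * x[i] * x[j]
    return total
```

For example, with x = [1, 1], zero linear terms and v = [[1], [2]], the only interaction is 1 * 2 * 1 * 1 = 2, and `fm_score` returns 2.0.
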
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

CatBoost is a machine learning method based on gradient boosting over decision tree

Dec 31, 2022
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Jan 5, 2023
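
CatBoost and LightGBM both build on gradient boosting over decision trees: each new tree is fit to the negative gradient of the loss for the current ensemble (for squared error, simply the residuals), and its output is added with a shrinkage factor. The sketch below shows that loop for a single numeric feature using depth-1 trees (stumps) and squared loss. None of either library's specific techniques (such as LightGBM's histogram-based split finding or CatBoost's ordered boosting and symmetric trees) is represented, and the learning rate and number of rounds are arbitrary.

```python
def fit_stump(xs, residuals):
    """Best single split (threshold, left_value, right_value) under squared error."""
    best = None
    points = sorted(zip(xs, residuals))
    for s in range(1, len(points)):
        if points[s - 1][0] == points[s][0]:
            continue  # cannot split between equal feature values
        threshold = (points[s - 1][0] + points[s][0]) / 2
        left = [r for _, r in points[:s]]
        right = [r for _, r in points[s:]]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    if best is None:  # all xs equal: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return float("inf"), mean, mean
    return best[1:]


def gradient_boost(xs, ys, rounds=50, lr=0.3):
    """Fit an additive ensemble of stumps; returns a predict(x) function."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        # Negative gradient of 0.5 * (y - p)^2 with respect to p is (y - p).
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + lr * (lv if x <= t else rv) for x, p in zip(xs, preds)]

    def predict(x):
        return base + sum(lr * (lv if x <= t else rv) for t, lv, rv in stumps)

    return predict


if __name__ == "__main__":
    model = gradient_boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
    print(model(2), model(5))
```

Shrinkage (`lr`) makes each tree correct only part of the remaining error, so more rounds are needed, but the ensemble usually generalizes better than taking full steps.
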
Edge ML Library - High-performance Compute Library for On-device Machine Learning Inference

Edge ML Library (EMLL) offers optimized basic routines like general matrix multiplications (GEMM) and quantizations, to speed up machine learning (ML) inference on ARM-based devices. EMLL supports fp32, fp16 and int8 data types. EMLL accelerates on-device NMT, ASR and OCR engines of Youdao, Inc.

Dec 20, 2022
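
EMLL is described as providing GEMM and quantization routines with int8 support. The general idea behind an int8 GEMM fits in a few lines: quantize each float matrix to 8-bit integers with a scale factor, multiply the integers with integer accumulation, then multiply the result by both scales. The sketch below assumes symmetric per-tensor scaling to the range [-127, 127]; it is a generic illustration, not EMLL's quantization scheme or API.

```python
def quantize_int8(matrix):
    """Symmetric per-tensor quantization: returns (ints in [-127, 127], scale)."""
    max_abs = max((abs(v) for row in matrix for v in row), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in matrix]
    return q, scale


def int8_gemm(a, b):
    """Approximate A @ B using int8 operands and integer accumulation."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    n, k, m = len(qa), len(qb), len(qb[0])
    out = []
    for i in range(n):
        row = []
        for j in range(m):
            # Integer dot product; real int8 kernels typically accumulate in int32.
            acc = sum(qa[i][p] * qb[p][j] for p in range(k))
            row.append(acc * sa * sb)  # dequantize back to float
        out.append(row)
    return out


if __name__ == "__main__":
    print(int8_gemm([[0.5, -1.0], [0.25, 0.75]], [[1.0, 0.0], [0.5, -0.5]]))
```

The result is close to, but not exactly equal to, the float product; the rounding error grows with the ratio between a tensor's largest value and its typical values, which is why finer-grained (per-row or per-channel) scales are often used in practice.
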
A flexible, high-performance serving system for machine learning models

XGBoost Serving This is a fork of TensorFlow Serving, extended with support for XGBoost, alphaFM and alphaFM_softmax frameworks. For more informat

Nov 18, 2022
An Open-Source Analytical Placer for Large Scale Heterogeneous FPGAs using Deep-Learning Toolkit

DREAMPlaceFPGA An Open-Source Analytical Placer for Large Scale Heterogeneous FPGAs using Deep-Learning Toolkit. This work leverages the open-source A

Dec 5, 2022
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

Mobile AI Compute Engine (or MACE for short) is a deep learning inference framework optimized for mobile heterogeneous computing on Android, iOS, Linux and Windows devices.

Jan 3, 2023
Benchmark framework of compute-in-memory based accelerators for deep neural network (inference engine focused)

DNN+NeuroSim V1.3 The DNN+NeuroSim framework was developed by Prof. Shimeng Yu's group (Georgia Institute of Technology). The model is made publicly a

Nov 24, 2022
A library for creating Artificial Neural Networks, for use in Machine Learning and Deep Learning algorithms.

iNeural A library for creating Artificial Neural Networks, for use in Machine Learning and Deep Learning algorithms. What is a Neural Network? Work on

Apr 5, 2022
Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon-developed library for building Deep Learning (DL) machine learning (ML) models

Amazon DSSTNE: Deep Scalable Sparse Tensor Network Engine DSSTNE (pronounced "Destiny") is an open source software library for training and deploying

Dec 30, 2022
Experimental and Comparative Performance Measurements of High Performance Computing Based on OpenMP and MPI

High-Performance-Computing-Experiments Experimental and Comparative Performance Measurements of High Performance Computing Based on OpenMP and MPI. Experimental results

Nov 27, 2021
PPLNN is a high-performance deep-learning inference engine for efficient AI inferencing.

PPLNN, which is short for "PPLNN is a Primitive Library for Neural Network", is a high-performance deep-learning inference engine for efficient AI inferencing.

Dec 29, 2022
DeepRTS is a high-performance Real-Time strategy game for Reinforcement Learning research written in C++

DeepRTS is a high-performance real-time strategy game for reinforcement learning research. It is written in C++ for performance, but provides a Python interface for easier integration with machine-learning toolkits. Deep RTS can process the game at over 6,000,000 steps per second, and 2,000,000 steps per second when rendering graphics. Compared to other solutions such as StarCraft, this is over 15,000% faster simulation time, measured on an Intel i7-8700K with an NVIDIA RTX 2080 Ti.

Dec 19, 2022