IBM Z Deep Neural Network Library (zDNN) provides an interface for applications that use the Neural Network Processing Assist Facility (NNPA).

zDNN API Reference

Contacts

Version

0.4.0

Table of Contents

  1. Overview

  2. Environment

  3. Common Data Types and Structs

  4. Runtime Environment Variables

  5. API Reference

  6. Usage Examples

Overview

zDNN, the deep learning library, is the software enablement technology provided by IBM to meet the following requirements:

  • Specialized-function-assist instructions are intended to provide performance improvements for specific operations used in software libraries, utilities, and operating system (OS) services. The facilities and instructions described as specialized-function-assist instructions may be replaced or removed in the future. As such, the IBM recommendation for these instructions is that a software library or operating system function be used instead of directly accessing the instructions. This is the function provided by zDNN.
  • zAIU has very complex data layout requirements; these requirements arrange the tensor to enhance the performance characteristics of the operations. zDNN will format the tensor appropriately on behalf of the caller, and it will do so using an optimized approach.
  • For deep learning operations, zAIU requires the use of an internal data type (DLFLOAT16). This is a 2-byte data type, similar in concept to Brain float (BFLOAT); that is, it is an AI optimized format that is used to speed up training and inference (from 4-byte formats) while minimizing the loss of accuracy at inference time.

The zDNN library provides a set of APIs that an exploiting application uses to drive the desired request. zDNN will be available on both z/OS and Linux on Z; including Linux on Z is particularly beneficial because it allows acceleration to be enabled in frameworks for z/OS through z/OS Container Extensions (zCX).


Environment

z/OS:

  • Problem state
  • AMODE64
  • XPLINK

Alignment requirements

AIU Op Limits

These AIU limits currently apply to zDNN as well.

  • For all ops:

    • Number of elements in any dimension must not exceed the value returned by zdnn_get_nnpa_max_dim_idx_size()
    • Total number of bytes required for storing a transformed tensor must not exceed the value returned by zdnn_get_nnpa_max_tensor_size()
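
The two limits above can be checked before a tensor is built. The following is a minimal sketch, not part of zDNN: shape_within_limits is a hypothetical helper, and it assumes the caller has already obtained the limit values from zdnn_get_nnpa_max_dim_idx_size() and zdnn_get_nnpa_max_tensor_size(), and the transformed size in bytes from zdnn_getsize_ztensor().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a zDNN API). max_dim and max_bytes are assumed to
 * hold the values returned by zdnn_get_nnpa_max_dim_idx_size() and
 * zdnn_get_nnpa_max_tensor_size(); transformed_bytes is assumed to hold the
 * value returned by zdnn_getsize_ztensor(). */
bool shape_within_limits(const uint32_t dims[4], uint32_t max_dim,
                         uint64_t transformed_bytes, uint64_t max_bytes) {
  for (size_t i = 0; i < 4; i++) {
    /* A zero dimension is rejected too; this reference lists it as a
     * ZDNN_INVALID_SHAPE condition for the allocation functions. */
    if (dims[i] == 0 || dims[i] > max_dim) {
      return false;
    }
  }
  return transformed_bytes <= max_bytes;
}
```

A caller would typically run such a check once per tensor shape and skip AIU acceleration for shapes that fail it.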

Application interfaces for zAIU Enterprise Neural Network Inference

zDNN General

The zDNN deep learning library provides the standard IBM Z software interface to the zAIU. This IBM-provided C library provides a set of functions that handle the data transformation requirements of the AIU and provide wrapper functions for the NNPA instruction primitives.

The zDNN functions use the following criteria to determine if zAIU can be used to accelerate a deep learning primitive:

  • Neural Network Processing Assist (NNPA) facility indicator in the system STFLE output.
  • Output of the NNPA-QAF (Query Available Functions) request.

Using zDNN

To use the IBM-provided zDNN C library for the NNPA instruction, follow these steps:

  1. Link or re-link applications to use the IBM-provided zDNN. The IBM-provided zDNN is a library file in the z/OS UNIX System Services file system and can be statically or dynamically linked into your applications. The paths for the zDNN archive file and the zDNN header files are:

z/OS (LE required): Path for 64-bit dynamic library files:

  • /lib/libzdnn.so
  • /lib/libzdnn.x

Path for the zDNN header files:

  • /usr/include/

The XL C/C++ compiler and the z/OS Language Environment provide various environment variables to control processing, in addition to the variables provided by the zDNN library itself.

  2. Use the environment variable _CEE_RUNOPTS to specify invocation Language Environment runtime options. For more information about using the environment variable _CEE_RUNOPTS and other C and LE variables, see z/OS XL C/C++ Programming Guide.

  3. For environment variables accepted by the zDNN library, see Runtime Environment Variables.

Linux on Z:

On Linux on Z we expect to ship source as well as a package-installable library and header. The library installation will conform to the standards of the packaging method chosen.


Common Types and Structs

Include Files: zdnn.h

Version Information

#define ZDNN_VERSION "0.4.0"
#define ZDNN_VERNUM 0x000400 // 0x[major][minor][patch]
#define ZDNN_VER_MAJOR 0
#define ZDNN_VER_MINOR 4
#define ZDNN_VER_PATCH 0
  1. zDNN major version (ZDNN_VER_MAJOR) will be incremented if any backward-incompatible changes are introduced to the API. Such a release may also include minor and patch level changes. Patch and minor versions will be reset to 0 when the major version is incremented.
  2. zDNN minor version (ZDNN_VER_MINOR) will be incremented if new, backward-compatible functionality is introduced to the API or if any API functionality is marked as deprecated. Such a release may also include patch level changes. Patch version will be reset to 0 when the minor version is incremented.
  3. zDNN patch version (ZDNN_VER_PATCH) will be incremented if only backward-compatible bug fixes are introduced. A bug fix is defined as an internal change that fixes incorrect behavior.

Functions for checking version incompatibility with the zDNN load library are provided and described in the Support Functions section.
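
The 0x[major][minor][patch] packing can be illustrated with a few helpers. This is a sketch only: ver_major, ver_minor, ver_patch and ver_pack are hypothetical names, not zDNN APIs, and the one-byte-per-component layout is inferred from the ZDNN_VERNUM definition above.

```c
#include <stdint.h>

/* Same value as ZDNN_VERNUM above, repeated here so the sketch stands alone. */
#define EXAMPLE_VERNUM 0x000400u /* 0x[major][minor][patch] */

/* Hypothetical helpers (not zDNN APIs). Assumes one byte per component. */
uint32_t ver_major(uint32_t v) { return (v >> 16) & 0xFFu; }
uint32_t ver_minor(uint32_t v) { return (v >> 8) & 0xFFu; }
uint32_t ver_patch(uint32_t v) { return v & 0xFFu; }

/* Packs the three components back into the 0x[major][minor][patch] form. */
uint32_t ver_pack(uint32_t major, uint32_t minor, uint32_t patch) {
  return ((major & 0xFFu) << 16) | ((minor & 0xFFu) << 8) | (patch & 0xFFu);
}
```

For version 0.4.0, ver_minor(EXAMPLE_VERNUM) yields 4 and the other two components yield 0.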

zDNN zTensor

typedef struct zdnn_ztensor {
  zdnn_tensor_desc
      *pre_transformed_desc; // tensor's shape information before transformation
  zdnn_tensor_desc *transformed_desc; // transformed tensor's shape information
  uint64_t buffer_size;               // tensor size in bytes
  void *buffer;                       // pointer to the tensor in memory
  bool is_transformed; // indicator if data in buffer has been transformed
  char reserved[31];   // not currently used, should contain zeros.
} zdnn_ztensor;

General zTensor Requirements

  • buffer requirements:
    • Calling zdnn_init_ztensor_with_malloc automatically allocates and sets a valid buffer for a tensor.
    • buffer field must point to allocated storage of sufficient size to contain the transformed tensor data described by its transformed_desc field.
    • The start of the buffer must be 4K-aligned.
  • reserved should contain zeros, otherwise the program may not operate compatibly in the future.
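
When the buffer is allocated by the caller rather than by zdnn_init_ztensor_with_malloc, the 4K alignment requirement has to be met by hand. The following is a minimal sketch under stated assumptions: alloc_4k_aligned and is_4k_aligned are hypothetical helpers, not zDNN APIs, and the platform is assumed to provide the C11 aligned_alloc function (platforms without it need a different allocator).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_4K 4096u

/* Hypothetical helper (not a zDNN API): allocates at least `size` bytes
 * starting on a 4K boundary. The size is rounded up to a multiple of the
 * alignment, which older C11 wording requires for aligned_alloc().
 * Returns NULL on failure; release the storage with free(). */
void *alloc_4k_aligned(size_t size) {
  size_t rounded = (size + PAGE_4K - 1) / PAGE_4K * PAGE_4K;
  if (rounded < size) {
    return NULL; /* size was too large to round up */
  }
  if (rounded == 0) {
    rounded = PAGE_4K;
  }
  return aligned_alloc(PAGE_4K, rounded);
}

/* True when the address lies on a 4K boundary. */
bool is_4k_aligned(const void *p) {
  return ((uintptr_t)p % PAGE_4K) == 0;
}
```

The size passed in would normally be the value returned by zdnn_getsize_ztensor for the tensor's transformed_desc.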

Concatenated zTensor Requirements

  • For use with weights/biases/hidden-weights/hidden-biases RNN-gates tensors.
  • You must use zdnn_generate_transformed_desc_concatenated with the appropriate concatenation info
    • Do not use zdnn_generate_transformed_desc with concatenated tensors
  • The pre-transformed shape dimensions should not include the concatenation.
    • Thus, the pre-transformed shape should be that of a single gate, not the shape of the combined gates
  • Afterward transform with zdnn_transform_ztensor as normal
  • Must follow general tensor requirements

zDNN Tensor Descriptors

typedef struct zdnn_tensor_desc {
  zdnn_data_layouts layout; // data layout
  zdnn_data_formats format; // internal use only
  zdnn_data_types type;     // data type
  uint32_t dim4;            // number of elements in outermost dimension
  uint32_t dim3;            // ... outer dimension
  uint32_t dim2;            // ... inner dimension
  uint32_t dim1;            // number of elements in innermost dimension
} zdnn_tensor_desc;

Programming Notes

zDNN Data Layouts

The following are layouts for zDNN ztensor descriptors. These indicate the number and order of dimensions to expect for the ztensor data.

typedef enum zdnn_data_layouts {
  ZDNN_1D,          // 1d tensor
  ZDNN_2D,          // 2d tensor
  ZDNN_2DS,         // represents special 2D tensors required by LSTM/GRU
  ZDNN_3D,          // 3d tensor
  ZDNN_3DS,         // represents special 3D tensors required by
                    // LSTM/GRU/Softmax/Matmul
  ZDNN_ZRH,         // represents (update, reset, hidden) used by GRU
  ZDNN_4D,          // 4d tensor
  ZDNN_4DS,         // represents special 4D tensors required by LSTM/GRU output
  ZDNN_NHWC,        // 4d feature tensor in NHWC
  ZDNN_NCHW,        // 4d feature tensor in NCHW
  ZDNN_FICO,        // represents (forget, input, cell, output) used by LSTM
  ZDNN_HWCK,        // 4d kernel CNN tensor
  ZDNN_BIDIR_ZRH,   // ZRH variant to work with bidirectional LSTM/GRU output
  ZDNN_BIDIR_FICO  // FICO variant to work with bidirectional LSTM/GRU output
} zdnn_data_layouts;

Some layouts also indicate special re-arrangement of the data during ztensor transformation.

  • ZDNN_2DS - The outermost dimension of the original shape is promoted to dim4 during transformation. For example, a shape of (a, b) becomes [a, 1, 1, b] (dim4, dim3, dim2, dim1) in the transformed_desc
  • ZDNN_3DS - The outermost dimension of the original shape is promoted to dim4 during transformation. For example, a shape of (a, b, c) becomes [a, 1, b, c] (dim4, dim3, dim2, dim1) in the transformed_desc
  • ZDNN_4DS - Arrangement for RNN output tensor
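
The ZDNN_2DS and ZDNN_3DS promotions described above can be written out as plain functions. This is an illustrative sketch: demo_dims, promote_2ds and promote_3ds are hypothetical names that mirror only the dim4..dim1 fields, and the code is not how zDNN itself generates a transformed_desc.

```c
#include <stdint.h>

/* Hypothetical stand-in for the dim4..dim1 fields of a transformed
 * descriptor; not the zdnn_tensor_desc struct itself. */
typedef struct {
  uint32_t dim4, dim3, dim2, dim1;
} demo_dims;

/* ZDNN_2DS: a shape of (a, b) becomes [a, 1, 1, b]. */
demo_dims promote_2ds(uint32_t a, uint32_t b) {
  demo_dims d = {a, 1, 1, b};
  return d;
}

/* ZDNN_3DS: a shape of (a, b, c) becomes [a, 1, b, c]. */
demo_dims promote_3ds(uint32_t a, uint32_t b, uint32_t c) {
  demo_dims d = {a, 1, b, c};
  return d;
}
```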

The following are set automatically in transformed_desc based on info when calling zdnn_generate_transformed_desc_concatenated():

  • ZDNN_ZRH/FICO - During transformation, the RNN input gates data are concatenated on the innermost dimension. Supported with pre_transformed_layout of ZDNN_2DS or ZDNN_3DS.
  • ZDNN_BIDIR_ZRH/FICO - Similar to ZDNN_ZRH/FICO, used when:
    1. transforming RNN input weight gate data, and
    2. the input tensor for the current RNN layer is a bidirectional RNN output from a previous RNN layer

zDNN Data Formats

typedef enum zdnn_data_formats {
  ZDNN_FORMAT_4DFEATURE, // tensor in AIU data layout format 0
  ZDNN_FORMAT_4DKERNEL, // tensor in AIU data layout format 1
} zdnn_data_formats;

zDNN Data Types

typedef enum zdnn_data_types {
  ZDNN_DLFLOAT16, // 16-bit deep learning format
  BFLOAT, // Brain floating point format
  FP16, // 16-bit IEEE-754 floating point format
  FP32, // 32-bit IEEE-754 floating point format
} zdnn_data_types;

zDNN Statuses

Mnemonic Constant | Value | Meaning
ZDNN_OK | 0x00000000 | Success.

Warning Statuses

Mnemonic Constant | Value | Meaning
ZDNN_ELEMENT_RANGE_VIOLATION | 0x00020001 | AIU operation resulted in data that was out of the normal range.

Note: ZDNN_ELEMENT_RANGE_VIOLATION indicates a range violation occurred for the AIU operation based on the data in the tensors. This usually indicates an overflow of the NNPA internal data type, but can also be associated with operation specific errors, such as "divide by zero". See the "z/Architecture Principles of Operation" for information about range violation on the operation that encountered the violation.

General Failing Statuses

Mnemonic Constant | Value | Meaning
ZDNN_INVALID_SHAPE* | 0x00040001 | Invalid shape information in one (or more) of the input/output tensor(s).
ZDNN_INVALID_LAYOUT | 0x00040002 | Invalid layout information in one (or more) of the input/output tensor(s).
ZDNN_INVALID_TYPE* | 0x00040003 | Invalid type information in one (or more) of the input/output tensor(s).
ZDNN_INVALID_FORMAT* | 0x00040004 | Invalid format information in one (or more) of the input/output tensor(s).
ZDNN_INVALID_DIRECTION | 0x00040005 | Invalid RNN direction.
ZDNN_INVALID_CONCAT_INFO | 0x00040006 | Invalid concatenation info.
ZDNN_INVALID_STRIDE_PADDING* | 0x00040007 | Invalid padding type parameter for current strides.
ZDNN_INVALID_STRIDES* | 0x00040008 | Invalid stride height or width parameter.
ZDNN_MISALIGNED_PARMBLOCK* | 0x00040009 | NNPA parameter block is not on double word boundary.
ZDNN_INVALID_CLIPPING_VALUE | 0x0004000A | Invalid clipping for the specified operation.
ZDNN_ALLOCATION_FAILURE | 0x00100001 | Can not allocate storage.
ZDNN_INVALID_BUFFER | 0x00100002 | Buffer address is NULL or not on 4K-byte boundary or insufficient buffer size.
ZDNN_CONVERT_FAILURE | 0x00100003 | Floating point data conversion failure.
ZDNN_INVALID_STATE | 0x00100004 | Invalid zTensor state.
ZDNN_UNSUPPORTED_AIU_EXCEPTION | 0x00100005 | AIU operation returned an unexpected exception.

Note: *In certain scenarios, these statuses are returned only if ZDNN_ENABLE_PRECHECK is enabled. When not enabled, these scenarios will lead to abnormal program termination.

Hardware Statuses

The following statuses indicate issues returned from the hardware.

Mnemonic Constant | Value | Meaning
ZDNN_UNSUPPORTED_PARMBLOCK | 0x000C0001 | NNPA parameter block format is not supported by the model.
ZDNN_UNAVAILABLE_FUNCTION | 0x000C0002 | Specified NNPA function is not defined or installed on the machine.
ZDNN_UNSUPPORTED_FORMAT | 0x000C0010 | Specified tensor data layout format is not supported.
ZDNN_UNSUPPORTED_TYPE | 0x000C0011 | Specified tensor data type is not supported.
ZDNN_EXCEEDS_MDIS | 0x000C0012 | Tensor dimension exceeds maximum dimension index size (MDIS).
ZDNN_EXCEEDS_MTS | 0x000C0013 | Total number of bytes in tensor exceeds maximum tensor size (MTS).
ZDNN_MISALIGNED_TENSOR | 0x000C0014 | Tensor address is not on 4K-byte boundary.
ZDNN_MISALIGNED_SAVEAREA | 0x000C0015 | Function specific save area address is not on 4K-byte boundary.

The meaning of the following hardware statuses varies based on operation. See the operation that returned the status for the specific meaning.

Mnemonic Constant | Value | Meaning
ZDNN_FUNC_RC_F000 | 0x000CF000 | Function specific response code (F000).
ZDNN_FUNC_RC_F001 | 0x000CF001 | Function specific response code (F001).
ZDNN_FUNC_RC_F002 | 0x000CF002 | Function specific response code (F002).
ZDNN_FUNC_RC_F003 | 0x000CF003 | Function specific response code (F003).
ZDNN_FUNC_RC_F004 | 0x000CF004 | Function specific response code (F004).
ZDNN_FUNC_RC_F005 | 0x000CF005 | Function specific response code (F005).
ZDNN_FUNC_RC_F006 | 0x000CF006 | Function specific response code (F006).
ZDNN_FUNC_RC_F007 | 0x000CF007 | Function specific response code (F007).
ZDNN_FUNC_RC_F008 | 0x000CF008 | Function specific response code (F008).
ZDNN_FUNC_RC_F009 | 0x000CF009 | Function specific response code (F009).
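
In the values listed above, the upper 16 bits of a status differ by group: 0x0002 for warnings, 0x0004 and 0x0010 for general failures, and 0x000C for hardware statuses. The following sketch groups a status value on that basis. The grouping is inferred from the listed values only and is an assumption; this reference does not state it as a guarantee. status_class and classify_status are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical grouping (not a zDNN type). */
typedef enum {
  STATUS_CLASS_OK,
  STATUS_CLASS_WARNING,
  STATUS_CLASS_FAILURE,
  STATUS_CLASS_HARDWARE,
  STATUS_CLASS_UNKNOWN
} status_class;

/* Classifies a status by its upper 16 bits, as observed in the values
 * listed above. Assumption: codes in the same group share those bits. */
status_class classify_status(uint32_t status) {
  if (status == 0x00000000u) {
    return STATUS_CLASS_OK; /* ZDNN_OK */
  }
  switch (status >> 16) {
  case 0x0002:
    return STATUS_CLASS_WARNING; /* e.g. ZDNN_ELEMENT_RANGE_VIOLATION */
  case 0x0004:
  case 0x0010:
    return STATUS_CLASS_FAILURE; /* e.g. ZDNN_INVALID_SHAPE */
  case 0x000C:
    return STATUS_CLASS_HARDWARE; /* e.g. ZDNN_EXCEEDS_MDIS */
  default:
    return STATUS_CLASS_UNKNOWN;
  }
}
```

Comparing against the named constants remains the safer choice wherever a specific status matters.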

Runtime Environment Variables

  • ZDNN_ENABLE_PRECHECK: true/false
    • If set to true, tensor integrity prechecks are run before issuing NNPA operations.
    • Enabling precheck may impact performance.
    • Enable to debug issues which cause hardware exceptions that otherwise would result in abnormal program termination.
  • ZDNN_STATUS_DIAG: nnnnnnnn (decimal) or 0xnnnnnnnn (hexadecimal)
    • Prints or produces diagnostic information whenever zDNN status code is equal to the specified value. Only one status value can be specified.
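
The two value formats accepted by ZDNN_STATUS_DIAG can be illustrated with a small parser. This is a sketch of the documented formats only: parse_status_value is a hypothetical function and is not the parser used inside zDNN, whose exact behavior on malformed input is not described here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser (not zDNN's own). Accepts "nnnnnnnn" as decimal or
 * "0xnnnnnnnn" as hexadecimal and stores the result in *out.
 * Returns false for empty, signed, malformed, or out-of-range text. */
bool parse_status_value(const char *text, uint32_t *out) {
  if (text == NULL || text[0] < '0' || text[0] > '9') {
    return false; /* must start with a digit: no sign, no whitespace */
  }
  int base = (text[0] == '0' && (text[1] == 'x' || text[1] == 'X')) ? 16 : 10;
  char *end = NULL;
  errno = 0;
  unsigned long value = strtoul(text, &end, base);
  if (errno != 0 || end == text || *end != '\0' || value > UINT32_MAX) {
    return false;
  }
  *out = (uint32_t)value;
  return true;
}
```

For example, both "0x00040001" and "262145" select ZDNN_INVALID_SHAPE.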

The following are only available when the zDNN library was built with ZDNN_CONFIG_DEBUG enabled.

  • ZDNN_LOGLEVEL: off/fatal/error/warn/info/debug/trace
    • Sets logging facility's output level
  • ZDNN_LOGMODULE: module name(s)
    • Produces log output only when the issuer's module name is in the list. You may specify multiple module names by separating them with either commas or spaces.

Programming Notes

  • Environment variable settings are checked during initial library load by zdnn_init.
  • To change environment variable settings afterward, zdnn_init must be called again manually.

API Reference

Support Functions

zdnn_init

Description

Initialize the zDNN library. This sends an NNPA_QAF to query the NNPA and loads the current environment variable settings.

This needs to be invoked at least once if the zDNN library is statically linked. It is invoked automatically if the zDNN library is dynamically loaded.

Format

void zdnn_init();

Parameters

None

Returns

None


zdnn_get_nnpa_max_dim_idx_size

Description

Retrieve the maximum dimension index size value currently supported by the AIU from zDNN's internal memory.

Format

uint32_t zdnn_get_nnpa_max_dim_idx_size();

Parameters

None

Returns

Maximum dimension index size supported by the AIU


zdnn_get_nnpa_max_tensor_size

Description

Retrieve the maximum tensor size value (number of bytes required for storing a transformed tensor) currently supported by the AIU from zDNN's internal memory.

Format

uint64_t zdnn_get_nnpa_max_tensor_size();

Parameters

None

Returns

Maximum tensor size supported by the AIU


zdnn_is_nnpa_installed

Description

Interrogates the hardware to determine if the NNPA and NNP-internal data type (DLFLOAT16) conversion instructions are installed.

Use this function during application initialization to determine whether the AIU hardware is available.

Format

bool zdnn_is_nnpa_installed();

Parameters

  • None.

Returns

true if NNPA and zdnn conversion instructions are installed, false otherwise.


zdnn_is_nnpa_function_installed

Description

Query, from zDNN internal memory, if requested NNPA functions are available.

Format

bool zdnn_is_nnpa_function_installed(int count, ...);

Parameters

  • int count

    • number of NNPA functions to check
  • ... (additional arguments)

    • Function names separated by commas, e.g., NNPA_MUL, NNPA_MIN
NNPA_QAF
NNPA_ADD
NNPA_SUB
NNPA_MUL
NNPA_DIV
NNPA_MIN
NNPA_MAX
NNPA_LOG
NNPA_EXP
NNPA_RELU
NNPA_TANH
NNPA_SIGMOID
NNPA_SOFTMAX
NNPA_BATCHNORMALIZATION
NNPA_MAXPOOL2D
NNPA_AVGPOOL2D
NNPA_LSTMACT
NNPA_GRUACT
NNPA_CONVOLUTION
NNPA_MATMUL_OP
NNPA_MATMUL_OP_BCAST23

Returns

true if all queried functions are installed or if count is zero, false otherwise.


zdnn_is_nnpa_parmblk_fmt_installed

Description

Query, from zDNN internal memory, if requested parameter block formats are installed.

Format

bool zdnn_is_nnpa_parmblk_fmt_installed(int count, ...);

Parameters

  • int count

    • number of NNPA parameter block formats to check
  • ... (additional arguments)

    • NNPA parameter block formats separated by commas
NNPA_PARMBLKFORMAT_0

Returns

true if all queried formats are installed or if count is zero, false otherwise.


zdnn_is_nnpa_datatype_installed

Description

Query, from zDNN internal memory, if requested NNPA data types are installed.

Format

bool zdnn_is_nnpa_datatype_installed(uint16_t types_bitmask);

Parameters

  • uint16_t types_bitmask

    • OR'd type bitmasks as defined in zdnn_query_datatypes enum
QUERY_DATATYPE_INTERNAL1

Returns

true if all queried data types are installed, false otherwise.


zdnn_is_nnpa_layout_fmt_installed

Description

Query, from zDNN internal memory, if requested NNPA data layout formats are installed.

Format

bool zdnn_is_nnpa_layout_fmt_installed(uint32_t layout_bitmask);

Parameters

  • uint32_t layout_bitmask

    • OR'd layout bitmasks as defined in zdnn_query_layoutfmts enum
QUERY_LAYOUTFMT_4DFEATURE
QUERY_LAYOUTFMT_4DKERNEL

Returns

true if all queried data layouts are installed, false otherwise.


zdnn_is_nnpa_conversion_installed

Description

Query, from zDNN internal memory, if requested NNPA data-type to/from BFP format conversions are installed.

Format

bool zdnn_is_nnpa_conversion_installed(nnpa_data_type type,
                                       uint16_t format_bitmask);

Parameters

  • nnpa_data_type type

    • NNPA data-type number as defined in nnpa_data_type enum
NNPA_DATATYPE_1
  • uint16_t format_bitmask

    • OR'd BFP format bitmasks as defined in zdnn_query_bfpfmts enum
QUERY_BFPFMT_TINY (FP16)
QUERY_BFPFMT_SHORT (FP32/BFLOAT)

Returns

true if all queried conversions are installed, false otherwise.


zdnn_get_library_version

Description

Retrieve library version number as a 32-bit hex value (0x00[major][minor][patch]).

Format

uint32_t zdnn_get_library_version();

Returns

Library version number in 0x00[major][minor][patch] format.


zdnn_get_library_version_str

Description

Retrieve the library version number and build information as a string.

Format

char *zdnn_get_library_version_str();

Returns

Library version number and build information as a string.


zdnn_refresh_nnpa_query_result

Description

Refresh zDNN in-memory query result from zAIU.

Format

zdnn_status zdnn_refresh_nnpa_query_result();

Parameters

None

Programming Notes

This is called automatically as a part of zdnn_init and should not need to be called directly. Manually refreshing query results before making other zdnn_query_* calls may noticeably impact performance.

Returns zdnn_status indications

  • ZDNN_OK
  • ZDNN_UNAVAILABLE_FUNCTION

zdnn_getsize_ztensor

Description

Used to determine the buffer size required for the transformed tensor (including concatenated) in zDNN transformed format. Requires tensor descriptor (zdnn_tensor_desc) with transformed shape information.

Format

uint64_t zdnn_getsize_ztensor(const zdnn_tensor_desc *tfrmd_desc);

Parameters

  • zdnn_tensor_desc *tfrmd_desc

    • Contains transformed information about the shape, layout and data type.

Returns

  • required buffer size in bytes

zdnn_init_pre_transformed_desc

Description

Initialize tensor descriptor (zdnn_tensor_desc) struct with pre-transformed (original) shape information.

Format

void zdnn_init_pre_transformed_desc(zdnn_data_layouts layout,
                                    zdnn_data_types type,
                                    zdnn_tensor_desc *pre_tfrmd_desc, ...);

Parameters

  • zdnn_data_layouts layout

    • data layout
  • zdnn_data_types type

    • data type
  • zdnn_tensor_desc *pre_tfrmd_desc

    • output zdnn_tensor_desc struct
  • ... (additional arguments)

    • Variadic: number of elements in each dimension in accordance with the layout, in outermost to innermost order

Returns

  • None

zdnn_generate_transformed_desc

Description

Generate transformed tensor descriptor information based on supplied pre-transformed tensor descriptor.

Format

zdnn_status zdnn_generate_transformed_desc(
    const zdnn_tensor_desc *pre_tfrmd_desc, zdnn_tensor_desc *tfrmd_desc);

Parameters

  • zdnn_tensor_desc *pre_tfrmd_desc

    • input tensor descriptor with pre-transformed shape information
  • zdnn_tensor_desc *tfrmd_desc

    • output zdnn_tensor_desc struct

Returns zdnn_status indications

  • ZDNN_OK
  • ZDNN_INVALID_LAYOUT - pre-transformed layout is not recognized or is a layout only used for concatenated tensors.

zdnn_generate_transformed_desc_concatenated

Description

Generate concatenated transformed tensor descriptor information for RNN input-gates tensors based on a supplied pre-transformed tensor descriptor.

Format

zdnn_status zdnn_generate_transformed_desc_concatenated(
    const zdnn_tensor_desc *pre_tfrmd_desc,
    zdnn_concat_info info, zdnn_tensor_desc *tfrmd_desc);

Parameters

  • zdnn_tensor_desc *pre_tfrmd_desc

    • input tensor descriptor with pre-transformed shape information
  • zdnn_concat_info info

    • Information about how the tensors will be concatenated, consists of the RNN_TYPE, PREV_LAYER and USAGE flags OR'd together:

      RNN_TYPE flags:

      • RNN_TYPE_LSTM - For LSTM
      • RNN_TYPE_GRU - For GRU

      PREV_LAYER flags:

      • PREV_LAYER_UNI - Previous RNN layer is uni-directional
      • PREV_LAYER_NONE - Previous layer is not a RNN layer
      • PREV_LAYER_BIDIR - Previous RNN layer is bi-directional

      USAGE flags:

      • USAGE_WEIGHTS - Concatenate as input weights
      • USAGE_HIDDEN_WEIGHTS - Concatenate as input hidden-weights
      • USAGE_BIASES - Concatenate as input biases
      • USAGE_HIDDEN_BIASES - Concatenate as input hidden-biases
  • zdnn_tensor_desc *tfrmd_desc

    • output zdnn_tensor_desc struct

Returns zdnn_status indications

  • ZDNN_OK
  • ZDNN_INVALID_LAYOUT - pre-transformed layout is not recognized or is not supported for concatenated tensors.
  • ZDNN_INVALID_CONCAT_INFO - invalid concatenation information.

zdnn_init_ztensor

Description

Initialize a zdnn_ztensor struct using the pre-transformed and transformed tensor shape information

Format

void zdnn_init_ztensor(zdnn_tensor_desc *pre_tfrmd_desc,
                       zdnn_tensor_desc *tfrmd_desc, zdnn_ztensor *output);

Parameters

  • zdnn_tensor_desc *pre_tfrmd_desc

    • input tensor descriptor with pre-transformed shape information
  • zdnn_tensor_desc *tfrmd_desc

    • input tensor descriptor with transformed shape information
  • zdnn_ztensor *output

    • The zdnn_ztensor struct being initialized.

Returns

  • None

zdnn_init_ztensor_with_malloc

Description

Same functionality as zdnn_init_ztensor, but also computes the size required for the tensor in the zDNN transformed format and allocates the storage for it. Sets the buffer and buffer_size fields within output.

Format

zdnn_status zdnn_init_ztensor_with_malloc(zdnn_tensor_desc *pre_tfrmd_desc,
                                          zdnn_tensor_desc *tfrmd_desc,
                                          zdnn_ztensor *output);

Parameters

  • zdnn_tensor_desc *pre_tfrmd_desc

    • input tensor descriptor with pre-transformed shape information
  • zdnn_tensor_desc *tfrmd_desc

    • input tensor descriptor with transformed shape information
  • zdnn_ztensor *output

    • The zdnn_ztensor struct being initialized.

Returns zdnn_status indications

  • ZDNN_OK
  • ZDNN_INVALID_FORMAT - tfrmd_desc->format is not recognized.
  • ZDNN_INVALID_TYPE - tfrmd_desc->type is not recognized or is a pre_tfrmd_desc type.
  • ZDNN_INVALID_SHAPE - (if any of the following are true)
    • One of tfrmd_desc->dim* dimensions is 0.
    • One of tfrmd_desc->dim* dimensions is greater than zdnn_get_nnpa_max_dim_idx_size.
      • Note: concatenation dimensions have a smaller maximum size. See LSTM or GRU.
    • The total number of tfrmd_desc elements is larger than zdnn_get_nnpa_max_tensor_size.
  • ZDNN_ALLOCATION_FAILURE - Unable to allocate required memory on a 4K boundary.

zdnn_reset_ztensor

Description

Reset a zdnn_ztensor struct for reuse.

Note this operation does not set or reset the buffer and buffer_size fields nor free the transformed area storage.

Format

void zdnn_reset_ztensor(zdnn_ztensor *ztensor);

Parameters

  • zdnn_ztensor *ztensor

    • The zdnn_ztensor struct being reset.

Returns

  • None

zdnn_allochelper_ztensor

Description

Calculate the size required for the tensor in the zDNN transformed format and allocate the needed storage, satisfying alignment requirements. Sets buffer and buffer_size fields within ztensor.

Note that the calling application assumes ownership of this storage and is responsible for freeing it.

Format

zdnn_status zdnn_allochelper_ztensor(zdnn_ztensor *ztensor);

Parameters

  • zdnn_ztensor *ztensor

    • A zdnn_ztensor struct that contains the transformed shape information in the transformed_desc field.

Returns zdnn_status indications

  • ZDNN_OK
  • ZDNN_INVALID_FORMAT - ztensor->transformed_desc->format is not recognized.
  • ZDNN_INVALID_TYPE - ztensor->transformed_desc->type is not recognized or is a pre_transformed_desc type.
  • ZDNN_INVALID_SHAPE - (if any of the following are true)
    • One of ztensor->transformed_desc->dim* dimensions is 0.
    • One of ztensor->transformed_desc->dim* dimensions is greater than zdnn_get_nnpa_max_dim_idx_size.
      • Note: concatenation dimensions have a smaller maximum size. See LSTM or GRU.
    • The total number of transformed_desc elements is larger than zdnn_get_nnpa_max_tensor_size.
  • ZDNN_ALLOCATION_FAILURE - Unable to allocate required memory on a 4K boundary.

zdnn_free_ztensor_buffer

Description

Given an input zdnn_ztensor, zdnn_free_ztensor_buffer will free the transformed area storage associated with it.

Note that the routine does not free the storage allocated for the zdnn_ztensor struct itself.

Format

zdnn_status zdnn_free_ztensor_buffer(const zdnn_ztensor *ztensor);

Parameters

  • zdnn_ztensor *ztensor

    • A zdnn_ztensor struct with field buffer pointing to storage allocated.

Returns zdnn_status indications

  • ZDNN_OK
  • ZDNN_INVALID_BUFFER - ztensor->buffer is NULL

zdnn_get_status_message

Description

Retrieve status message of the status code

Format

const char *zdnn_get_status_message(zdnn_status status);

Parameters

  • zdnn_status status

    • Status code

Returns

Pointer to the description string or "(Status string is not defined.)" if status is not defined.


zdnn_reshape_ztensor

Description

Reshape and copy buffer content from the source zTensor's buffer to the destination zTensor's buffer in accordance with the destination zTensor's shape.

The following conditions must be satisfied:

  • Both tensors' transformed_desc must be fully initialized
  • dest->buffer must be pre-allocated
  • src must be transformed
  • dest must not already be transformed
  • Both transformed_desc->layout must be the same and either NHWC or HWCK
  • Both zTensors must contain an equal number of elements
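
The element-count condition can be checked ahead of the call. The following is a minimal sketch: reshape_dims, element_count and same_element_count are hypothetical names that mirror only the dim4..dim1 fields of a transformed_desc, and each dimension is assumed to be within the maximum dimension index size already, so the 64-bit product is assumed not to overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the dim4..dim1 fields of a transformed_desc. */
typedef struct {
  uint32_t dim4, dim3, dim2, dim1;
} reshape_dims;

/* Total number of elements. Assumes every dimension is already within the
 * AIU's maximum dimension index size, so the product fits in 64 bits. */
uint64_t element_count(reshape_dims s) {
  return (uint64_t)s.dim4 * s.dim3 * s.dim2 * s.dim1;
}

/* True when source and destination hold the same number of elements. */
bool same_element_count(reshape_dims src, reshape_dims dest) {
  return element_count(src) == element_count(dest);
}
```

For example, [1, 2, 3, 4] and [1, 1, 6, 4] both hold 24 elements and pass this particular check; the other conditions above still apply.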

Format

zdnn_status zdnn_reshape_ztensor(const zdnn_ztensor *src, zdnn_ztensor *dest);

Parameters

  • src

    • Source zTensor to copy from
  • dest

    • Destination zTensor to copy to

Programming Notes

  • If src and dest have the same transformed_desc->dim1 dimension size, the transformed data is directly copied to the destination without untransformation.

  • If src and dest have different transformed_desc->dim1 dimension sizes, reshaping will internally un-transform the source and then re-transform the values into the destination.

Returns

  • ZDNN_OK
  • ZDNN_INVALID_SHAPE - (if any of the following are true)
    • src's and dest's transformed_desc->dim* total to different numbers of elements.
    • One of dest->transformed_desc->dim* dimensions is 0.
    • One of dest->transformed_desc->dim* dimensions is greater than zdnn_get_nnpa_max_dim_idx_size.
      • Note: concatenation dimensions have a smaller maximum size. See LSTM or GRU.
    • The total number of dest->transformed_desc->dim* elements is larger than zdnn_get_nnpa_max_tensor_size.
  • ZDNN_INVALID_LAYOUT - (if any of the following are true)
    • src's and dest's transformed_desc->layout are not the same.
    • transformed_desc->layout is not ZDNN_NHWC nor ZDNN_HWCK.
    • src->pre_transformed_desc->layout is not recognized or is not a valid pre_transformed_desc layout.
    • dest->pre_transformed_desc->layout is not recognized or is not a valid pre_transformed_desc layout.
  • ZDNN_INVALID_STATE - (if any of the following are true)
    • src is not already transformed.
    • dest is already transformed.
  • ZDNN_INVALID_FORMAT - src->transformed_desc->format is not ZDNN_FORMAT_4DFEATURE.
  • ZDNN_INVALID_TYPE (if any of the following are true)
    • src->pre_transformed_desc->type is not recognized or is a transformed_desc type.
    • dest->pre_transformed_desc->type is not recognized or is a transformed_desc type.
    • dest->transformed_desc->type is not recognized or is a pre_transformed_desc type.
  • ZDNN_INVALID_BUFFER (if any of the following are true)
    • src->buffer is NULL.
    • src->buffer is not on a 4K boundary.
    • dest->buffer is NULL.
    • dest->buffer is not on a 4K boundary.
    • dest->buffer_size is too small to hold transformed values.
  • ZDNN_CONVERT_FAILURE - Values failed to un-transform or transform.

zdnn_is_version_runnable

Description

Check if application built for zDNN version ver_num can be run on the current AIU hardware with the installed zDNN library

Format

bool zdnn_is_version_runnable(uint32_t ver_num);

Parameters

  • ver_num

    • zDNN version number from the application in 0x00[major][minor][patch] form. Typically this is ZDNN_VERNUM used to compile the application

Returns

  • true/false

zdnn_get_max_runnable_version

Description

Returns the maximum zDNN version number that the current hardware and installed zDNN library can run together. The returned value means the current runtime environment fully supports the zDNN API set of that major.minor version and below.

Format

uint32_t zdnn_get_max_runnable_version();

Parameters

  • None

Returns

  • A 32-bit zDNN version number in 0x00[major][minor]FF form.
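
Because the patch byte of the returned value is FF, a plain numeric comparison places every patch level of the reported major.minor version at or below it. The sketch below shows that comparison. version_at_or_below is a hypothetical helper, and treating a numeric comparison as the whole test is an assumption; zdnn_is_version_runnable remains the supported check and may apply further rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a zDNN API). app_ver is in 0x00[major][minor][patch]
 * form (typically ZDNN_VERNUM at compile time); max_runnable is assumed to be
 * the 0x00[major][minor]FF value from zdnn_get_max_runnable_version(). */
bool version_at_or_below(uint32_t app_ver, uint32_t max_runnable) {
  return app_ver <= max_runnable;
}
```

With a maximum of 0x0004FF, an application built for 0x000400 passes and one built for 0x000500 does not.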

Data Transformation

zAIU requires the tensor data to be arranged in a format that enhances the performance characteristics of the operations. In this documentation, it is referred to as "transformed format". In addition, data conversions are necessary from the common formats (FP32, FP16, BFLOAT) to the internal format (DLFLOAT16) supported by the AIU. Two functions are provided:

  • zdnn_transform_ztensor

    • zdnn_transform_ztensor will transform the input tensor and convert the input data to the format required by the AIU. The resulting transformed ztensor can be reused as many times as necessary.

    • See zdnn_transform_ztensor for details on transforming an input tensor to the internal format.

  • zdnn_transform_origtensor

    • zdnn_transform_origtensor transforms a ztensor (usually output from an operation or network) to the format and data types that are usable by the application.

    • See zdnn_transform_origtensor for details on transforming a ztensor back to its original format.


zdnn_transform_ztensor

Description

Converts the input tensor to the supported transformed format for execution by zDNN operations. If transformation is successful, the is_transformed field within ztensor is set to true; otherwise it is set to false. Transformation fails if is_transformed is already true.

Note that the tensor layout in memory, once in transformed format, is dependent on the content of the input tensor's descriptors (zdnn_tensor_desc fields). Once converted, a zdnn_ztensor should only be manipulated by zDNN API functions.

Format

zdnn_status zdnn_transform_ztensor(zdnn_ztensor *ztensor, ...);

Parameters

  • zdnn_ztensor *ztensor

    • The input zdnn_ztensor struct. pre_transformed_desc and transformed_desc must be set, is_transformed must be false. A 4k-aligned tensor storage must be pre-allocated by the caller (directly or by calling the zDNN allocation helper function) and field buffer must point to the storage.
  • ... (additional arguments)

    • Variadic: list of pointers for input data to be transformed:
      • Non-concatenated: 1 data pointer
      • LSTM concatenated: 4 data pointers, one for each input gate in Forget, Input, Cell, Output (FICO) order
      • GRU concatenated: 3 data pointers, one for each input gate in (Z)update, Reset, Hidden, (ZRH) gate order

Programming Notes

  • This function clears the per-thread floating-point exception flags at entry, and may set FE_UNDERFLOW / FE_INVALID / FE_INEXACT / FE_OVERFLOW when it encounters errors during data conversion.

Returns zdnn_status indications

  • ZDNN_OK
  • ZDNN_INVALID_FORMAT - zdnn_ztensor->transformed_desc->format is not recognized.
  • ZDNN_INVALID_LAYOUT - (if any of the following are true)
    • zdnn_ztensor->pre_transformed_desc->layout is not recognized or is not a valid pre_transformed_desc layout.
    • zdnn_ztensor->transformed_desc->layout is not recognized or is not a valid transformed_desc layout.
  • ZDNN_INVALID_TYPE - (if any of the following are true)
    • zdnn_ztensor->pre_transformed_desc->type is not recognized or is a transformed_desc type.
    • zdnn_ztensor->transformed_desc->type is not recognized or is a pre_transformed_desc type.
  • ZDNN_INVALID_BUFFER (if any of the following are true)
    • buffer is NULL.
    • buffer is not on a 4K boundary.
    • buffer_size is too small to hold transformed values.
  • ZDNN_INVALID_SHAPE - (if any of the following are true)
    • One of zdnn_ztensor->transformed_desc->dim* dimensions is 0.
    • One of zdnn_ztensor->transformed_desc->dim* dimensions is greater than zdnn_get_nnpa_max_dim_idx_size.
      • Note: concatenation dimensions have a smaller maximum size. See LSTM or GRU.
    • The total number of transformed_desc elements is larger than zdnn_get_nnpa_max_tensor_size.
  • ZDNN_INVALID_STATE - Tensor is already transformed.
  • ZDNN_CONVERT_FAILURE - Values failed to transform.
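
The 4K-boundary requirement behind ZDNN_INVALID_BUFFER can be checked before calling the transform. The following is a minimal sketch for callers that allocate the buffer directly instead of using the zDNN allocation helper; the function names are hypothetical, and it assumes a C11 environment where aligned_alloc is available.

```c
#include <stdint.h>
#include <stdlib.h>

#define BUFFER_ALIGNMENT 4096u /* 4K boundary required for tensor buffers */

/* Returns 1 if ptr is non-NULL and sits on a 4K boundary, otherwise 0. */
static int is_4k_aligned(const void *ptr) {
  return ptr != NULL && ((uintptr_t)ptr % BUFFER_ALIGNMENT) == 0;
}

/* Allocates at least `size` bytes on a 4K boundary. The size is rounded up
 * to a multiple of the alignment, as aligned_alloc expects. Returns NULL on
 * failure; release the result with free(). */
static void *alloc_4k_aligned(size_t size) {
  size_t rounded =
      (size + BUFFER_ALIGNMENT - 1) / BUFFER_ALIGNMENT * BUFFER_ALIGNMENT;
  if (rounded == 0) {
    rounded = BUFFER_ALIGNMENT;
  }
  return aligned_alloc(BUFFER_ALIGNMENT, rounded);
}
```

The required size itself still has to come from the library, since it depends on the transformed descriptor.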

zdnn_transform_origtensor

Description

Converts the input tensor from the zDNN transformed format back to a standard non-transformed layout. The is_transformed field within ztensor must be true.

All stick format tensors are supported, except:

  • Kernel tensors
  • Concatenated RNN input-gates tensors

Format

zdnn_status zdnn_transform_origtensor(const zdnn_ztensor *ztensor, void *out_buf);

Parameters

  • zdnn_ztensor *ztensor

    • The input zdnn_ztensor struct. pre_transformed_desc, transformed_desc and buffer must be set, is_transformed must be true.
  • void *out_buf

    • The buffer for storing the standard non-transformed tensor data. Must be pre-allocated by the caller.

Programming Notes

  • This function clears the per-thread floating-point exception flags at entry, and may set FE_UNDERFLOW / FE_INVALID / FE_INEXACT / FE_OVERFLOW when it encounters errors during data conversion.

Returns zdnn_status indications

  • ZDNN_OK
  • ZDNN_INVALID_FORMAT - ztensor->transformed_desc->format is not ZDNN_FORMAT_4DFEATURE.
  • ZDNN_INVALID_LAYOUT - (if any of the following are true)
    • ztensor->pre_transformed_desc->layout is not recognized or is not a valid pre_transformed_desc layout.
    • ztensor->transformed_desc->layout is not recognized or is not a valid transformed_desc layout required by this function.
  • ZDNN_INVALID_TYPE - (if any of the following are true)
    • ztensor->pre_transformed_desc->type is not recognized or is a transformed_desc type.
    • ztensor->transformed_desc->type is not recognized or is a pre_transformed_desc type.
  • ZDNN_INVALID_BUFFER (if any of the following are true)
    • ztensor->buffer is NULL.
    • ztensor->buffer is not on a 4K boundary.
  • ZDNN_INVALID_STATE - ztensor is not transformed.
  • ZDNN_CONVERT_FAILURE - Values failed to un-transform.

Operations

See Table of Contents for operations list


Element-wise Operations


zdnn_add

Description

Given two input tensors in zDNN transformed format, performs element-wise addition and stores the result into the provided output zDNN tensor.

Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape.
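
Because broadcasting is left to the caller, a smaller operand has to be expanded to the full shape before it is transformed. A minimal sketch of one common case, expanding a per-column vector across every row of a row-major matrix, is shown here; the function name is hypothetical and the code works on plain float arrays, not on ztensors.

```c
#include <stddef.h>

/* Expands a vector of length `cols` into a full rows x cols row-major
 * matrix so that it has the same shape as the other operand. */
static void broadcast_rows(const float *vec, size_t rows, size_t cols,
                           float *out) {
  for (size_t r = 0; r < rows; r++) {
    for (size_t c = 0; c < cols; c++) {
      out[r * cols + c] = vec[c];
    }
  }
}
```

The expanded array can then be described and transformed with the same shape as the other input tensor.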

Format

zdnn_status zdnn_add(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b,
                     zdnn_ztensor *output);

Parameters

Returns (see zDNN Statuses for descriptions)

Framework Examples

TensorFlow Addition

ONNX Addition


zdnn_sub

Description

Given two input tensors in zDNN transformed format, performs element-wise subtraction and stores the result into the provided output zDNN tensor.

Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape.

Format

zdnn_status zdnn_sub(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b,
                     zdnn_ztensor *output);

Parameters

Returns (see zDNN Statuses for descriptions)

Framework Examples

TensorFlow Subtraction

ONNX Subtraction


zdnn_mul

Description

Given two input tensors in zDNN transformed format, performs element-wise multiplication and stores the result into the provided output zDNN tensor.

Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape.

Format

zdnn_status zdnn_mul(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b,
                     zdnn_ztensor *output);

Parameters

Returns (see zDNN Statuses for descriptions)

Framework Examples

TensorFlow Multiplication

ONNX Multiplication


zdnn_div

Description

Given two input tensors in zDNN transformed format, performs element-wise division and stores the result into the provided output zDNN tensor.

Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape.

Format

zdnn_status zdnn_div(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b,
                     zdnn_ztensor *output);

Parameters

Returns (see zDNN Statuses for descriptions)

Framework Examples

TensorFlow Division

ONNX Division


zdnn_min

Description

Given two input tensors in zDNN transformed format, computes the element-wise minimum and stores the result into the provided output zDNN tensor.

Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape.

Format

zdnn_status zdnn_min(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b,
                     zdnn_ztensor *output);

Parameters

Returns (see zDNN Statuses for descriptions)

Framework Examples

TensorFlow Minimum

ONNX Minimum


zdnn_max

Description

Given two input tensors in zDNN transformed format, computes the element-wise maximum and stores the result into the provided output zDNN tensor.

Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape.

Format

zdnn_status zdnn_max(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b,
                     zdnn_ztensor *output);

Parameters

Returns (see zDNN Statuses for descriptions)

Framework Examples

TensorFlow Maximum

ONNX Maximum


zdnn_log

Description

Given an input tensor in zDNN transformed format, computes the natural logarithm element-wise and stores the result into the provided output zDNN tensor.

Format

zdnn_status zdnn_log(const zdnn_ztensor *input, zdnn_ztensor *output);

Parameters

Returns (see zDNN Statuses for descriptions)

Framework Examples

TensorFlow Natural Logarithm

ONNX Natural Logarithm


zdnn_exp

Description

Given an input tensor in zDNN transformed format, computes the exponential element-wise and stores the result into the provided output zDNN tensor.

Format

zdnn_status zdnn_exp(const zdnn_ztensor *input, zdnn_ztensor *output);

Parameters

Returns (see zDNN Statuses for descriptions)

Framework Examples

TensorFlow Exponential

ONNX Exponential


Activation Operations


zdnn_relu

Description

Given an input tensor in zDNN transformed format, produces an output tensor where the rectified linear function, y = max(0, x), is applied to the input element-wise. If an optional clipping_value is provided, clipping is performed against the intermediate output, where z = min(y, clipping_value).

Format

zdnn_status zdnn_relu(const zdnn_ztensor *input, const void *clipping_value,
                      zdnn_ztensor *output);

Parameters

  • zdnn_ztensor *input

  • void *clipping_value

    • A pointer to an FP32 value, used to clip input tensor's elements.
    • If set to NULL or 0, no clipping will occur.
    • Must not be a negative value.
  • zdnn_ztensor *output
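
The arithmetic described for zdnn_relu can be written as a plain C reference for checking results on untransformed data. This is a sketch, not the library's implementation: the function name is hypothetical and the clipping value is taken as a typed float pointer for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Reference (non-accelerated) ReLU with optional clipping. */
static void relu_reference(const float *in, size_t n,
                           const float *clipping_value, float *out) {
  /* NULL or 0 means "no clipping"; negative values are not allowed. */
  assert(clipping_value == NULL || *clipping_value >= 0.0f);
  int clip = (clipping_value != NULL && *clipping_value != 0.0f);
  for (size_t i = 0; i < n; i++) {
    float y = in[i] > 0.0f ? in[i] : 0.0f; /* y = max(0, x) */
    /* z = min(y, clipping_value) when clipping is active */
    out[i] = (clip && y > *clipping_value) ? *clipping_value : y;
  }
}
```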

Returns (see zDNN Statuses for descriptions)

Framework Examples

TensorFlow Rectified Linear

ONNX Rectified Linear


zdnn_tanh

Description

Given an input tensor in zDNN transformed format, produces an output tensor where the hyperbolic tangent is applied to the input element-wise.

Format

zdnn_status zdnn_tanh(const zdnn_ztensor *input, zdnn_ztensor *output);

Parameters

Returns (see zDNN Statuses for descriptions)

Framework Examples

TensorFlow Hyperbolic Tangent

ONNX Hyperbolic Tangent


zdnn_sigmoid

Description

Given an input tensor in zDNN transformed format, produces an output tensor where the sigmoid function is applied to the input element-wise.

Format

zdnn_status zdnn_sigmoid(const zdnn_ztensor *input, zdnn_ztensor *output);

Parameters

Returns (see zDNN Statuses for descriptions)

Framework Examples

TensorFlow Sigmoid

ONNX Sigmoid


zdnn_softmax

Description

Given an input tensor in zDNN transformed format, computes the softmax (normalized exponential) for each vector formed in dimension-1. If act_func is not SOFTMAX_ACT_NONE, the activation function is then applied to the results. Finally, the results are stored into the provided output zDNN tensor.

Note: Other parameters, such as axis, are not supported.

Format

zdnn_status zdnn_softmax(const zdnn_ztensor *input, void *save_area,
                         zdnn_softmax_act act_func, zdnn_ztensor *output);

Parameters

  • zdnn_ztensor *input

    • ZDNN_3DS tensor with pre-transformed shape [batch size, batch size, vector dimension size] or output from another operation that is of the correct shape.
    • Must follow general tensor requirements
  • void *save_area

    • A preallocated memory address to use for temporary storage during internal operation processing.
    • The preallocated memory must be at least 8K bytes in size, aligned on a 4k boundary.
    • If set to NULL, the operation will determine, allocate and free storage automatically.
  • zdnn_softmax_act act_func

    • Activation function to apply to the results.
    • SOFTMAX_ACT_NONE or SOFTMAX_ACT_LOG
  • zdnn_ztensor *output

Programming Notes

  • If all elements of a dimension 1 vector are the largest magnitude negative number possible for the transformed data type, accuracy may be reduced.

  • A ZDNN_3DS tensor is expected, where the transformed_desc dim1 describes the vector, and dim2 and dim4 are used to batch multiple vector requests together. Dim3 must always be 1. The zdnn_softmax operation is performed against the vector in dim1 repeating for each dim1 vector in the dim4 and dim2 dimensions.

  • Tensors that cannot be processed as vectors in dim1 or as batches of dim1 vectors must be coerced or reshaped by the caller.

    • When the entire tensor is to be processed by softmax, it can be coerced by simply creating an alternate descriptor prior to zDNN transformation. For example:
      • A 4D tensor with pre_transformed_desc dimensions 2x2x2x2 and a data array of 16 FP32 entries could have an alternate ZDNN_3DS layout pre_transformed_desc using dimensions 1x1x16 and use the same original data array prior to zdnn_transform_ztensor. After transformation, such a tensor would be valid for zdnn_softmax.
      • In another example, the 4D 2x2x2x2 tensor could be processed as a batch of 2 vectors of 8 elements each using a ZDNN_3DS layout pre_transformed_desc with dimensions 1x2x8.

Returns (see zDNN Statuses for descriptions)

  • ZDNN_OK
  • warning statuses
  • ZDNN_INVALID_SHAPE
  • ZDNN_INVALID_TYPE
  • ZDNN_INVALID_FORMAT
  • ZDNN_ALLOCATION_FAILURE - A preallocated save_area was not specified and internal allocation for the required memory failed.
  • hardware statuses
    • ZDNN_FUNC_RC_F000 - input tensor input->transformed_desc->dim3 was not 1.
    • ZDNN_FUNC_RC_F001 - Invalid act_func

Framework Examples

TensorFlow Softmax

ONNX Softmax


Normalization Operations


zdnn_meanreduce2d

Description

Given an input tensor in zDNN transformed format, produces a downsampled tensor reducing the middle dimensions to a size of 1 based on the mean of the original values and stores the result to the provided output zDNN tensor.

Format

zdnn_status zdnn_meanreduce2d(const zdnn_ztensor *input, zdnn_ztensor *output);

Parameters

  • zdnn_ztensor *input

    • Must be a ZDNN_NHWC tensor with pre_transformed shape [batch_Num, Height, Width, Channel].
    • Height and Width dimensions must each be less than or equal to 1024.
    • Must follow general tensor requirements
  • zdnn_ztensor *output

    • The result tensor which will hold the result of the pooling operation in its buffer.
    • Shape:
      • output dimensions batch_Num and Channel must be the same as the respective input dimensions.
      • output dimensions Height and Width must be 1.
    • Must follow general tensor requirements
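
The shape relationship between input and output can be made concrete with a plain C reference that averages over Height and Width for each batch entry and channel. The function name is hypothetical and the code operates on untransformed NHWC float arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Reference mean reduction over Height and Width for an NHWC array.
 * in has shape [n, h, w, c]; out has shape [n, 1, 1, c]. */
static void meanreduce2d_reference(const float *in, size_t n, size_t h,
                                   size_t w, size_t c, float *out) {
  assert(h > 0 && w > 0);
  for (size_t b = 0; b < n; b++) {
    for (size_t ch = 0; ch < c; ch++) {
      float sum = 0.0f;
      for (size_t y = 0; y < h; y++) {
        for (size_t x = 0; x < w; x++) {
          sum += in[((b * h + y) * w + x) * c + ch];
        }
      }
      out[b * c + ch] = sum / (float)(h * w);
    }
  }
}
```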

Returns (see zDNN Statuses for descriptions)

  • ZDNN_OK
  • ZDNN_INVALID_SHAPE - Shape of input or output tensor is invalid.
  • ZDNN_INVALID_TYPE
  • ZDNN_INVALID_FORMAT
  • hardware statuses
    • ZDNN_FUNC_RC_F001 - input tensor has a Height or Width dimension greater than allowed for zdnn_meanreduce2d.

Framework Examples

TensorFlow Reduce Mean with axis set for the Height and Width axes and keepdims set to True.

ONNX Reduce Mean


zdnn_batchnorm

Description

Given three input zDNN tensors input_a, input_b, and input_c, computes the batch-normalized result for each vector formed in dimension-1 as follows:

output = input_b * input_a + input_c

where input_b is a precomputed elementwise divide of the scale and variance tensors, and input_c is the precomputed elementwise product of (-1) * mean and input_b, plus the input bias tensor.
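
The precomputation of input_b and input_c can be sketched in plain C. The function names and the denom array are hypothetical. denom[i] stands for the divisor derived from the variance tensor; in the common batch normalization formula that would be sqrt(variance + epsilon), which is an assumption here because the description only mentions a divide of scale and variance.

```c
#include <stddef.h>

/* Precomputes per-element input_b and input_c on untransformed data.
 * denom[i] is the divisor applied to scale[i], assumed to have been
 * derived from the variance by the caller. */
static void batchnorm_precompute(const float *scale, const float *denom,
                                 const float *mean, const float *bias,
                                 size_t n, float *input_b, float *input_c) {
  for (size_t i = 0; i < n; i++) {
    input_b[i] = scale[i] / denom[i];
    input_c[i] = (-1.0f) * mean[i] * input_b[i] + bias[i];
  }
}

/* Applies output = input_b * input_a + input_c to one vector. */
static void batchnorm_apply(const float *input_a, const float *input_b,
                            const float *input_c, size_t n, float *output) {
  for (size_t i = 0; i < n; i++) {
    output[i] = input_b[i] * input_a[i] + input_c[i];
  }
}
```

Folding mean, scale and bias into two tensors ahead of time leaves a single multiply-add per element at inference time.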

Format

zdnn_status zdnn_batchnorm(const zdnn_ztensor *input_a,
                           const zdnn_ztensor *input_b,
                           const zdnn_ztensor *input_c, zdnn_ztensor *output);

Parameters

Returns (see zDNN Statuses for descriptions)

Framework Examples

TensorFlow Batchnorm

ONNX Batchnorm


zdnn_matmul_op

Description

Given three input zDNN tensors input_a, input_b, and input_c, determine the matrix multiplication of input_a * input_b then perform one of the following operations, using input_c against the dot product, storing the result into the specified output zDNN tensor:

  • Addition
  • Compare - If dot product is greater than element.
  • Compare - If dot product is greater than or equal to element.
  • Compare - If dot product is equal to element.
  • Compare - If dot product is not equal to element.
  • Compare - If dot product is less than or equal to element.
  • Compare - If dot product is less than element.

For an operation type of addition, input_c is added to the intermediate dot product. For operation types of comparison, the intermediate dot product is compared to input_c and if the comparison is true, the result is set to a value of 1; otherwise it is set to a value of 0.

The outermost dimension can optionally indicate that the inputs are stacks of matrices. The results for each matrix stack are independent of other stacks, but all stacks are calculated in a single call.

Format

zdnn_status zdnn_matmul_op(const zdnn_ztensor *input_a,
                           const zdnn_ztensor *input_b,
                           const zdnn_ztensor *input_c,
                           zdnn_matmul_ops op_type, zdnn_ztensor *output);

Input / Output matmul tensor requirements

  • See table in this section for pre_transformed_desc and shape requirements for each tensor.
  • All tensors must either be stacked or unstacked.
  • Must follow general tensor requirements

type      | input_a            | input_b            | input_c         | result
unstacked | ZDNN_2D (m, n)     | ZDNN_2D (n, p)     | ZDNN_1D (p)     | ZDNN_2D (m, p)
stacked   | ZDNN_3DS (s, m, n) | ZDNN_3DS (s, n, p) | ZDNN_2DS (s, p) | ZDNN_3DS (s, m, p)

Parameters

  • zdnn_ztensor *input_a

  • zdnn_ztensor *input_b

  • zdnn_ztensor *input_c

    • Input tensor that will have the requested operation performed against the intermediate dot product of input_a and input_b.
    • pre_transformed shape and layout must match matmul tensor requirements
  • zdnn_matmul_ops op_type

    • Operation to perform on dot product.
      • MATMUL_OP_ADDITION
      • MATMUL_OP_GREATER
      • MATMUL_OP_GREATER_EQUAL
      • MATMUL_OP_EQUAL
      • MATMUL_OP_NOT_EQUAL
      • MATMUL_OP_LESSER_EQUAL
      • MATMUL_OP_LESSER
  • zdnn_ztensor *output

    • The output tensor which will hold the result of the operation in its buffer.
    • pre_transformed shape and layout must match matmul tensor requirements
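
For the unstacked case, the combination of matrix multiplication and op_type can be expressed as a plain C reference. The enum below is a hypothetical stand-in that mirrors the listed operation names; it is not the zdnn_matmul_ops type, and the function works on untransformed row-major float arrays.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the operation types listed above. */
typedef enum {
  REF_OP_ADDITION,
  REF_OP_GREATER,
  REF_OP_GREATER_EQUAL,
  REF_OP_EQUAL,
  REF_OP_NOT_EQUAL,
  REF_OP_LESSER_EQUAL,
  REF_OP_LESSER
} ref_matmul_op;

/* a is (m, n), b is (n, p), c is (p), out is (m, p). */
static void matmul_op_reference(const float *a, const float *b,
                                const float *c, size_t m, size_t n, size_t p,
                                ref_matmul_op op, float *out) {
  for (size_t i = 0; i < m; i++) {
    for (size_t j = 0; j < p; j++) {
      float dot = 0.0f;
      for (size_t k = 0; k < n; k++) {
        dot += a[i * n + k] * b[k * p + j];
      }
      float r = 0.0f;
      switch (op) {
      case REF_OP_ADDITION:      r = dot + c[j]; break;
      case REF_OP_GREATER:       r = dot > c[j] ? 1.0f : 0.0f; break;
      case REF_OP_GREATER_EQUAL: r = dot >= c[j] ? 1.0f : 0.0f; break;
      case REF_OP_EQUAL:         r = dot == c[j] ? 1.0f : 0.0f; break;
      case REF_OP_NOT_EQUAL:     r = dot != c[j] ? 1.0f : 0.0f; break;
      case REF_OP_LESSER_EQUAL:  r = dot <= c[j] ? 1.0f : 0.0f; break;
      case REF_OP_LESSER:        r = dot < c[j] ? 1.0f : 0.0f; break;
      }
      out[i * p + j] = r;
    }
  }
}
```

Each element of input_c is paired with one output column, so the same c[j] is used for every row of the result.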

Programming Notes

  • Care must be exercised when comparing values for equality or inequality, since the order of operations and rounding may produce what appear to be slightly different values when they are essentially the same value.

Returns (see zDNN Statuses for descriptions)

  • ZDNN_OK
  • ZDNN_INVALID_SHAPE
  • ZDNN_INVALID_TYPE
  • ZDNN_INVALID_FORMAT
  • hardware statuses
    • ZDNN_FUNC_RC_F000 - Invalid op_type.

Framework Examples

TensorFlow MatMul

ONNX MatMul


zdnn_matmul_bcast_op

Description

Given three input zDNN tensors input_a, input_b, and input_c, determine the matrix multiplication of input_a * input_b, then perform one of the following operations, using input_c against the dot product, storing the result into the specified output zDNN tensor:

  • Addition

The outermost dimension for input_a can optionally indicate that the input is a stack of matrices. Each stack of input_a is then multiplied by the same input_b matrix and input_c which are broadcast over each stack of input_a. Results for each stack are returned in the corresponding stack index of output.

Format

zdnn_status zdnn_matmul_bcast_op(const zdnn_ztensor *input_a,
                                 const zdnn_ztensor *input_b,
                                 const zdnn_ztensor *input_c,
                                 zdnn_matmul_bcast_ops op_type, zdnn_ztensor *output);

Input / Output matmul broadcast tensor requirements

input_a            | input_b        | input_c     | result
ZDNN_3DS (s, m, n) | ZDNN_2D (n, p) | ZDNN_1D (p) | ZDNN_3DS (s, m, p)

Parameters

  • zdnn_ztensor *input_a

  • zdnn_ztensor *input_b

    • Input tensor with the second matrix for multiplication.
    • The same single input_b matrix is broadcast and used as the multiplier for each stack dimension of input_a
    • pre_transformed shape and layout must match matmul broadcast tensor requirements
  • zdnn_ztensor *input_c

    • Input tensor that will have the requested operation performed against the intermediate dot product for each "m" dimension in output.
    • pre_transformed shape and layout must match matmul broadcast tensor requirements
  • zdnn_matmul_bcast_ops op_type

    • Operation to perform on dot product.

      • MATMUL_BCAST_OP_ADDITION
  • zdnn_ztensor *output

Programming Notes

  • zdnn_matmul_bcast_ops only supports the MATMUL_BCAST_OP_ADDITION op_type. Any other op_type will be ignored and may not operate compatibly in the future.
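
The broadcast behavior, where one input_b matrix and one input_c vector are reused for every stack of input_a, can be sketched as a plain C reference. The function name is hypothetical, only the addition operation is modelled, and the arrays are untransformed row-major floats.

```c
#include <stddef.h>

/* a is (s, m, n); b is (n, p) and c is (p), both reused for every stack;
 * out is (s, m, p). */
static void matmul_bcast_add_reference(const float *a, const float *b,
                                       const float *c, size_t s, size_t m,
                                       size_t n, size_t p, float *out) {
  for (size_t st = 0; st < s; st++) {
    const float *a_st = a + st * m * n; /* current stack of input_a */
    float *out_st = out + st * m * p;   /* matching stack of output */
    for (size_t i = 0; i < m; i++) {
      for (size_t j = 0; j < p; j++) {
        float dot = 0.0f;
        for (size_t k = 0; k < n; k++) {
          dot += a_st[i * n + k] * b[k * p + j];
        }
        out_st[i * p + j] = dot + c[j];
      }
    }
  }
}
```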

Returns (see zDNN Statuses for descriptions)

Framework Examples

TensorFlow MatMul

ONNX MatMul


zdnn_lstm

Description

Implements a Long Short-Term Memory (LSTM) layer (Hochreiter 1997).

The following formula is computed for the input tensor input(t) for all time steps:

(Default: f=Sigmoid, g=Tanh, h=Tanh):

- it = f(Xt*(Wi^T) + Ht-1*(Ri^T) + Wbi + Rbi)

- ft = f(Xt*(Wf^T) + Ht-1*(Rf^T) + Wbf + Rbf)

- ct = g(Xt*(Wc^T) + Ht-1*(Rc^T) + Wbc + Rbc)

- Ct = ft (.) Ct-1 + it (.) ct

- ot = f(Xt*(Wo^T) + Ht-1*(Ro^T) + Wbo + Rbo)

- Ht = ot (.) h(Ct)

Format

zdnn_status zdnn_lstm(const zdnn_ztensor *input, const zdnn_ztensor *h0,
                      const zdnn_ztensor *c0, const zdnn_ztensor *weights,
                      const zdnn_ztensor *biases,
                      const zdnn_ztensor *hidden_weights,
                      const zdnn_ztensor *hidden_biases,
                      lstm_gru_direction direction, void *work_area,
                      zdnn_ztensor *hn_output, zdnn_ztensor *cf_output);

Also see an example in the usage example section.

LSTM Input / Output requirements

  • num_hidden dimensions:
    • Any num_hidden dimension must be less than or equal to 8192 elements.

Parameters

  • zdnn_ztensor *input

    • Input must be a tensor with the shape (num_timesteps, num_batches, num_features) prior to transformation with the zdnn_transform_ztensor API.
    • Expects pre_transformed_desc->layout to be ZDNN_3DS.
    • Must follow general tensor requirements
  • zdnn_ztensor *h0

    • Tensor containing the initial hidden state with shape (num_dirs, num_batches, num_hidden) prior to transformation with the zdnn_transform_ztensor API.
    • Expects pre_transformed_desc->layout to be ZDNN_3DS.
    • Must follow general tensor requirements
    • Must follow num_hidden requirements
  • zdnn_ztensor *c0

    • Tensor containing the initial cell state with shape (num_dirs, num_batches, num_hidden) prior to transformation with the zdnn_transform_ztensor API.
    • Expects pre_transformed_desc->layout to be ZDNN_3DS.
    • Must follow general tensor requirements
    • Must follow num_hidden requirements
  • zdnn_ztensor *weights

    • Tensor containing the concatenated input connection weights in Forget, Input, Cell, Output (FICO) order.
    • Prior to transformation, each gate needs to be transposed to shape (num_dirs, num_features, num_hidden) by the caller.
    • Expects pre_transformed_desc->layout to be ZDNN_3DS.
    • Expects zdnn_concat_info having the following flags turned on:
      • RNN_TYPE_LSTM
      • USAGE_WEIGHTS
      • Appropriate PREV_LAYER flag:
        • PREV_LAYER_NONE if input tensor is not from a previous RNN layer
        • PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
        • PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
    • Must follow concatenated tensor requirements
    • Must follow num_hidden requirements
  • zdnn_ztensor *biases

    • Tensor containing the concatenated input connection bias in Forget, Input, Cell, Output (FICO) order.
    • Prior to transformation, each gate must have shape (num_dirs, num_hidden).
    • Expects pre_transformed_desc->layout to be ZDNN_2DS.
    • Expects zdnn_concat_info having the following flags turned on:
      • RNN_TYPE_LSTM
      • USAGE_BIASES
      • Appropriate PREV_LAYER flag:
        • PREV_LAYER_NONE if input tensor is not from a previous RNN layer
        • PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
        • PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
    • Must follow concatenated tensor requirements
    • Must follow num_hidden requirements
  • zdnn_ztensor *hidden_weights

    • Tensor containing the concatenated hidden connection weights in Forget, Input, Cell, Output (FICO) order.
    • Prior to transformation, each gate needs to be transposed to shape (num_dirs, num_hidden, num_hidden) by the caller.
    • Expects pre_transformed_desc->layout to be ZDNN_3DS.
    • Expects zdnn_concat_info having the following flags turned on:
      • RNN_TYPE_LSTM
      • USAGE_HIDDEN_WEIGHTS
      • Appropriate PREV_LAYER flag:
        • PREV_LAYER_NONE if input tensor is not from a previous RNN layer
        • PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
        • PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
    • Must follow concatenated tensor requirements
    • Must follow num_hidden requirements
  • zdnn_ztensor *hidden_biases

    • Tensor containing the concatenated hidden connection bias in Forget, Input, Cell, Output (FICO) order.
    • Prior to transformation, each gate must have shape (num_dirs, num_hidden).
    • Expects pre_transformed_desc->layout to be ZDNN_2DS.
    • Expects zdnn_concat_info having the following flags turned on:
      • RNN_TYPE_LSTM
      • USAGE_HIDDEN_BIASES
      • Appropriate PREV_LAYER flag:
        • PREV_LAYER_NONE if input tensor is not from a previous RNN layer
        • PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
        • PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
    • Must follow concatenated tensor requirements
    • Must follow num_hidden requirements
  • lstm_gru_direction direction

    • Direction indicator of type lstm_gru_direction. Valid values:
      • FWD (forward)
      • BWD (backward)
      • BIDIR (bi-directional).
    • For input and output shapes, the num_dirs dimension should be:
      • 1 for unidirectional calls such as FWD or BWD
      • 2 for bidirectional calls such that:
        • dimension 0 contains FWD values.
        • dimension 1 contains BWD values.
  • void *work_area

    • A preallocated memory address to use for temporary storage during internal operation processing.

    • If set to NULL, the operation will determine, allocate and free storage automatically.

    • Amount of required storage can be determined given the LSTM timestep, batch, and num_hidden values.

      • The sample code below creates a ztensor descriptor whose size is equivalent to the required work_area. To use this sample code yourself, replace the num_timesteps, num_batches, and num_hidden variables with your own values.

          zdnn_tensor_desc desc;
          desc.dim4 = (4 * num_timesteps) + 6;
          desc.dim3 = 1;
          desc.dim2 = num_batches;
          desc.dim1 = num_hidden;
          uint64_t work_area_size = zdnn_getsize_ztensor(&desc);
        
    • For bidirectional, twice the amount of contiguous storage is required.

    • The start of the buffer must be 4k aligned.

  • zdnn_ztensor *hn_output

    • Output results of the hidden states

    • Expects pre_transformed_desc->layout to be ZDNN_4DS.

    • Must follow general tensor requirements

    • Must follow num_hidden requirements

    • Output pre-transformed shapes:

      • all timesteps: (num_timesteps, num_dirs, num_batches, num_hidden)
      • final timestep only: (1, num_dirs, num_batches, num_hidden)
    • For bidirectional (BIDIR) output:

      • Forward and backward results are concatenated on the innermost dimension.
      • Can be used directly as input for subsequent RNN layers without needing untransformation.
        • Cannot be used directly as input for other non-RNN zDNN ops.
      • Untransformation is supported.
    • Note that for BWD and the backward component of BIDIR directions, the output order matches the order of the input, not the processing order. For example, the first input timestep is the last to be processed and its result is the first timestep of the output.

  • zdnn_ztensor *cf_output

    • Output results of the cell state for the last processed timestep

    • Expects pre_transformed_desc->layout to be ZDNN_4DS.

    • Must follow general tensor requirements

    • Must follow num_hidden requirements

    • Output pre-transformed shapes:

      • (1, num_dirs, num_batches, num_hidden)
    • For bidirectional (BIDIR):

      • Forward and backward results are concatenated on the innermost dimension.
      • Cannot be used directly as input for other non-RNN zDNN ops.
      • Untransformation is supported.
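
For the weights and hidden_weights parameters, the caller is responsible for both the FICO gate order and the per-gate transposition. The sketch below shows one way to prepare the gates for a single direction, assuming the source framework stores them in Input, Output, Forget, Cell (IOFC) order with each gate shaped (num_hidden, num_features), which is how the ONNX LSTM operator is commonly described; verify the layout for your framework. All function names are hypothetical.

```c
#include <stddef.h>

/* Transposes one gate from (num_hidden, num_features) to
 * (num_features, num_hidden), both row-major. */
static void transpose_gate(const float *src, size_t num_hidden,
                           size_t num_features, float *dst) {
  for (size_t h = 0; h < num_hidden; h++) {
    for (size_t f = 0; f < num_features; f++) {
      dst[f * num_hidden + h] = src[h * num_features + f];
    }
  }
}

/* Splits an IOFC-ordered weight block for one direction into four
 * transposed gates in FICO order: fico[0]=F, [1]=I, [2]=C, [3]=O.
 * Each fico[k] must have room for num_features * num_hidden floats. */
static void iofc_to_fico(const float *w_iofc, size_t num_hidden,
                         size_t num_features, float *fico[4]) {
  /* Position of each FICO gate inside the IOFC source: F=2, I=0, C=3, O=1 */
  static const size_t src_index[4] = {2, 0, 3, 1};
  size_t gate_elems = num_hidden * num_features;
  for (size_t k = 0; k < 4; k++) {
    transpose_gate(w_iofc + src_index[k] * gate_elems, num_hidden,
                   num_features, fico[k]);
  }
}
```

The four resulting buffers correspond to the four data pointers that zdnn_transform_ztensor expects for an LSTM concatenated tensor.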

Summary

tensor         | pre-transformed layout | pre-transformed shape
input          | ZDNN_3DS               | (num_timesteps, num_batches, num_features)
h0             | ZDNN_3DS               | (num_dirs, num_batches, num_hidden)
c0             | ZDNN_3DS               | (num_dirs, num_batches, num_hidden)
weights        | ZDNN_3DS               | (num_dirs, num_features, num_hidden)
biases         | ZDNN_2DS               | (num_dirs, num_hidden)
hidden_weights | ZDNN_3DS               | (num_dirs, num_hidden, num_hidden)
hidden_biases  | ZDNN_2DS               | (num_dirs, num_hidden)
hn_output      | ZDNN_4DS               | (num_timesteps, num_dirs, num_batches, num_hidden) (last timestep only when num_timesteps = 1)
cf_output      | ZDNN_4DS               | (1, num_dirs, num_batches, num_hidden)

tensor         | create transformed descriptor via
input          | zdnn_generate_transformed_desc
h0             | zdnn_generate_transformed_desc
c0             | zdnn_generate_transformed_desc
weights        | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_LSTM + USAGE_WEIGHTS + one of PREV_LAYER_NONE / PREV_LAYER_UNI / PREV_LAYER_BIDIR
biases         | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_LSTM + USAGE_BIASES + one of PREV_LAYER_NONE / PREV_LAYER_UNI / PREV_LAYER_BIDIR
hidden_weights | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_LSTM + USAGE_HIDDEN_WEIGHTS + one of PREV_LAYER_NONE / PREV_LAYER_UNI / PREV_LAYER_BIDIR
hidden_biases  | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_LSTM + USAGE_HIDDEN_BIASES + one of PREV_LAYER_NONE / PREV_LAYER_UNI / PREV_LAYER_BIDIR
hn_output      | zdnn_generate_transformed_desc
cf_output      | zdnn_generate_transformed_desc

Returns (see zDNN Statuses for descriptions)

  • ZDNN_OK
  • ZDNN_INVALID_TYPE
  • ZDNN_INVALID_FORMAT
  • ZDNN_INVALID_SHAPE - (if any of the following are not true)
    • hn_output timesteps dimension must be 1 or the same size as input timestep dimension.
    • All tensors with a direction dimension have the same direction dimension size.
    • input timestep dimension must be greater than or equal to 1.
    • Other general shape violations (exceeds MDIS, etc.)
  • ZDNN_INVALID_DIRECTION - direction parameter was not a recognized lstm_gru_direction.
  • ZDNN_ALLOCATION_FAILURE - A preallocated work_area was not specified and internal allocation for the required memory failed.
  • hardware statuses

Framework Examples

TensorFlow LSTM

ONNX LSTM


zdnn_gru

Back to Table of Contents

Description

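Implements the Gated Recurrent Unit (Kyunghyun Cho et al., 2014). Only the "reset after linear" variant is supported: the reset gate rt is applied after the linear transformation of the previous hidden state, as in the ht formula below.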

The following formulas are computed for the input tensor input(t) for all time steps:

(Default: f=Sigmoid, g=Tanh):

- zt = f(Xt*(Wz^T) + Ht-1*(Rz^T) + Wbz + Rbz)

- rt = f(Xt*(Wr^T) + Ht-1*(Rr^T) + Wbr + Rbr)

- ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh)) + Wbh)

- Ht = (1 - zt) (.) ht + zt (.) Ht-1

Format

zdnn_status zdnn_gru(const zdnn_ztensor *input, const zdnn_ztensor *h0,
                     const zdnn_ztensor *weights, const zdnn_ztensor *biases,
                     const zdnn_ztensor *hidden_weights,
                     const zdnn_ztensor *hidden_biases,
                     lstm_gru_direction direction, void *work_area,
                     zdnn_ztensor *hn_output);

Also see an example in the usage example section.

GRU Input / Output requirements

  • num_hidden dimensions:
    • Any num_hidden dimension must be less than or equal to 10880 elements.

Parameters

  • zdnn_ztensor *input

    • Input must be a tensor with the shape (num_timesteps, num_batches, num_features) prior to transformation with the zdnn_transform_ztensor API.
    • Expects pre_transformed_desc->layout to be ZDNN_3DS.
    • Must follow general tensor requirements
  • zdnn_ztensor *h0

    • Tensor containing the initial hidden state with shape (num_dirs, num_batches, num_hidden) prior to transformation with the zdnn_transform_ztensor API.
    • Expects pre_transformed_desc->layout to be ZDNN_3DS.
    • Must follow general tensor requirements
    • Must follow num_hidden requirements
  • zdnn_ztensor *weights

    • Tensor containing the concatenated input connection weights in (Z)update, Reset, Hidden, (ZRH) order.
    • Prior to transformation, each gate needs to be transposed to shape (num_dirs, num_features, num_hidden) by the caller.
    • Expects pre_transformed_desc->layout to be ZDNN_3DS.
    • Expects zdnn_concat_info having the following flags turned on:
      • RNN_TYPE_GRU
      • USAGE_WEIGHTS
      • Appropriate PREV_LAYER flag:
        • PREV_LAYER_NONE if input tensor is not from a previous RNN layer
        • PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
        • PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
    • Must follow concatenated tensor requirements
    • Must follow num_hidden requirements
  • zdnn_ztensor *biases

    • Tensor containing the concatenated input connection bias in (Z)update, Reset, Hidden, (ZRH) order.
    • Prior to transformation, each gate must have shape (num_dirs, num_hidden).
    • Expects pre_transformed_desc->layout to be ZDNN_2DS.
    • Expects zdnn_concat_info having the following flags turned on:
      • RNN_TYPE_GRU
      • USAGE_BIASES
      • Appropriate PREV_LAYER flag:
        • PREV_LAYER_NONE if input tensor is not from a previous RNN layer
        • PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
        • PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
    • Must follow concatenated tensor requirements
    • Must follow num_hidden requirements
  • zdnn_ztensor *hidden_weights

    • Tensor containing the concatenated hidden connection weights in (Z)update, Reset, Hidden, (ZRH) order.
    • Prior to transformation, each gate needs to be transposed to shape (num_dirs, num_hidden, num_hidden) by the caller.
    • Expects pre_transformed_desc->layout to be ZDNN_3DS.
    • Expects zdnn_concat_info having the following flags turned on:
      • RNN_TYPE_GRU
      • USAGE_HIDDEN_WEIGHTS
      • Appropriate PREV_LAYER flag:
        • PREV_LAYER_NONE if input tensor is not from a previous RNN layer
        • PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
        • PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
    • Must follow concatenated tensor requirements
    • Must follow num_hidden requirements
  • zdnn_ztensor *hidden_biases

    • Tensor containing the concatenated hidden connection bias in (Z)update, Reset, Hidden, (ZRH) order.
    • Prior to transformation, each gate must have shape (num_dirs, num_hidden).
    • Expects pre_transformed_desc->layout to be ZDNN_2DS.
    • Expects zdnn_concat_info having the following flags turned on:
      • RNN_TYPE_GRU
      • USAGE_HIDDEN_BIASES
      • Appropriate PREV_LAYER flag:
        • PREV_LAYER_NONE if input tensor is not from a previous RNN layer
        • PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
        • PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
    • Must follow concatenated tensor requirements
    • Must follow num_hidden requirements
  • lstm_gru_direction direction

    • Direction indicator of type lstm_gru_direction. Valid values:
      • FWD (forward)
      • BWD (backward)
      • BIDIR (bi-directional).
    • For input shapes, the num_dirs dimension should be:
      • 1 for unidirectional calls such as FWD or BWD
      • 2 for bidirectional calls such that:
        • dimension 0 contains FWD values.
        • dimension 1 contains BWD values.
  • void *work_area

    • A preallocated memory address to use for temporary storage during internal operation processing.

    • If set to NULL, the operation will determine, allocate and free storage automatically.

    • Amount of required storage can be determined given the GRU timestep, batch, and num_hidden values.

      • The sample code below creates a ztensor descriptor that is an equivalent size of the required work_area. To use this sample code yourself, replace the num_timesteps, num_batches, and num_hidden variables with your own values.

          zdnn_tensor_desc desc;
          desc.dim4 = (3 * num_timesteps) + 5;
          desc.dim3 = 1;
          desc.dim2 = num_batches;
          desc.dim1 = num_hidden;
          uint64_t work_area_size = zdnn_getsize_ztensor(&desc);
        
    • For bidirectional, twice the amount of contiguous storage is required.

    • The start of the buffer must be 4k aligned.

  • zdnn_ztensor *hn_output

    • Output results of the hidden states

    • Expects pre_transformed_desc->layout to be ZDNN_4DS.

    • Must follow general tensor requirements

    • Must follow num_hidden requirements

    • Output pre-transformed shapes:

      • all timesteps: (num_timesteps, num_dirs, num_batches, num_hidden)
      • final timestep only: (1, num_dirs, num_batches, num_hidden)
    • For bidirectional (BIDIR) output:

      • Forward and backward results are concatenated on the innermost dimension.
      • Can be used directly as input for subsequent RNN layers without needing untransformation.
        • Cannot be used directly as input for non-RNN zDNN ops.
      • Untransformation is supported.
    • Note that for BWD and the backward component of BIDIR directions, the output order matches the order of the input, not the processing order. For example, the first input timestep is the last to be processed and its result is the first timestep of the output.
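
The work_area requirements listed above (required size, twice the storage for BIDIR, 4k-aligned start) could be met with a helper along the following lines. This is a sketch under assumptions: alloc_work_area and WORK_AREA_ALIGNMENT are hypothetical names, per_direction_size is assumed to be the value returned by zdnn_getsize_ztensor as in the sample code above, and C11 aligned_alloc is assumed to be available on the target platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define WORK_AREA_ALIGNMENT 4096u

// Hypothetical helper: returns a 4k-aligned buffer large enough for one
// direction (FWD or BWD) or for both directions (BIDIR), or NULL on failure.
// The size actually allocated is stored in *total_size.
// The caller releases the buffer with free().
static void *alloc_work_area(uint64_t per_direction_size, bool bidirectional,
                             uint64_t *total_size) {
  assert(per_direction_size > 0 && total_size != NULL);
  // BIDIR needs twice the amount of contiguous storage.
  uint64_t size = bidirectional ? 2 * per_direction_size : per_direction_size;
  // aligned_alloc requires the size to be a multiple of the alignment.
  size = (size + WORK_AREA_ALIGNMENT - 1) / WORK_AREA_ALIGNMENT *
         WORK_AREA_ALIGNMENT;
  *total_size = size;
  return aligned_alloc(WORK_AREA_ALIGNMENT, (size_t)size);
}
```

The returned pointer would then be passed as the work_area argument and freed by the caller once the call has completed. Passing NULL as work_area instead leaves allocation to the operation itself.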

Summary

               | pre-transformed layout | pre-transformed shape
input          | ZDNN_3DS | (num_timesteps, num_batches, num_features)
h0             | ZDNN_3DS | (num_dirs, num_batches, num_hidden)
weights        | ZDNN_3DS | (num_dirs, num_features, num_hidden)
biases         | ZDNN_2DS | (num_dirs, num_hidden)
hidden_weights | ZDNN_3DS | (num_dirs, num_hidden, num_hidden)
hidden_biases  | ZDNN_2DS | (num_dirs, num_hidden)
hn_output      | ZDNN_4DS | (num_timesteps, num_dirs, num_batches, num_hidden), or (1, num_dirs, num_batches, num_hidden) for the final timestep only

               | create transformed descriptor via
input          | zdnn_generate_transformed_desc
h0             | zdnn_generate_transformed_desc
weights        | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_GRU + USAGE_WEIGHTS + one of PREV_LAYER_NONE/PREV_LAYER_UNI/PREV_LAYER_BIDIR
biases         | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_GRU + USAGE_BIASES + one of PREV_LAYER_NONE/PREV_LAYER_UNI/PREV_LAYER_BIDIR
hidden_weights | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_GRU + USAGE_HIDDEN_WEIGHTS + one of PREV_LAYER_NONE/PREV_LAYER_UNI/PREV_LAYER_BIDIR
hidden_biases  | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_GRU + USAGE_HIDDEN_BIASES + one of PREV_LAYER_NONE/PREV_LAYER_UNI/PREV_LAYER_BIDIR
hn_output      | zdnn_generate_transformed_desc

Returns (see zDNN Statuses for descriptions)

  • ZDNN_OK
  • ZDNN_INVALID_TYPE
  • ZDNN_INVALID_FORMAT
  • ZDNN_INVALID_SHAPE - (if any of the following are not true)
    • hn_output timesteps dimension must be 1 or the same size as input timestep dimension.
    • All tensors with a direction dimension have the same direction dimension size.
    • input timestep dimension must be greater than or equal to 1.
    • Other general shape violations (exceeds MDIS, etc.)
  • ZDNN_INVALID_DIRECTION - direction parameter was not a recognized lstm_gru_direction.
  • ZDNN_ALLOCATION_FAILURE - A preallocated work_area was not specified and internal allocation for the required memory failed.
  • hardware statuses

Framework Examples

TensorFlow GRU

ONNX GRU


zdnn_avgpool2d

Back to Table of Contents

Description

Given an input tensor in zDNN transformed format, padding type, kernel size and kernel stride, produces a downsampled tensor reducing the middle dimensions based on the mean values within the kernel window at each step and stores the results into the provided output zDNN tensor.

Format

zdnn_status zdnn_avgpool2d(const zdnn_ztensor *input,
                           zdnn_pool_padding padding_type,
                           uint32_t kernel_height, uint32_t kernel_width,
                           uint32_t stride_height, uint32_t stride_width,
                           zdnn_ztensor *output);

Parameters

  • zdnn_ztensor *input

    • Tensor with original values to be downsampled in the output tensor.
    • Must be a ZDNN_NHWC tensor with pre_transformed shape [batch_Num, Height, Width, Channel].
    • See Parameter Restrictions below for information on the expected shape of the input tensor.
    • Must follow general tensor requirements
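  • padding_type

    • The type of padding to use for the pooling operation: SAME_PADDING or VALID_PADDING.
    • See Parameter Restrictions below for information on the expected value of padding_type.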
  • kernel_height

    • Size of the kernel window that passes over the input's height dimension.
    • See Parameter Restrictions below for information on the expected value of kernel_height.
  • kernel_width

    • Size of the kernel window that passes over the input's width dimension.
    • See Parameter Restrictions below for information on the expected value of kernel_width.
  • stride_height

    • Number of positions the kernel moves over input's height dimension at each step.
    • If stride_height is 0 then stride_width must also be 0.
    • If strides are greater than 0 then stride_height must be less than or equal to 30.
  • stride_width

    • Number of positions the kernel moves over the input's width dimension at each step.
    • If stride_height is 0 then stride_width must also be 0.
    • If strides are greater than 0 then stride_width must be less than or equal to 30.
  • zdnn_ztensor *output

    • The result tensor, which will hold the result of the pooling operation in its buffer.
    • Must be a ZDNN_NHWC tensor with pre_transformed shape [batch_Num, Height, Width, Channel].
    • See Parameter Restrictions below for information on the expected shape of the output tensor.
    • Must follow general tensor requirements

AvgPool2D Parameter Restrictions

Parameter restrictions may vary based on provided strides and padding_type.

  • Input tensor batch_Num and Channel dimensions must always match the output tensor's respective dimensions.

  • If strides are 0:

    • Both input tensor's Height dimension and the kernel_height must match and be less than or equal to 1024.
    • Both input tensor's Width dimension and the kernel_width must match and be less than or equal to 1024.
    • Output tensor's height and width dimensions must be 1.
    • padding_type must be VALID_PADDING.
  • If strides are greater than zero:

    • kernel_width and kernel_height must be less than or equal to 64.
    • Input tensor's height and width dimensions must not be greater than 1024.
    • If padding_type is SAME_PADDING:
      • Output tensor's height dimension must equal ceil((float)input's height / stride_height).
      • Output tensor's width dimension must equal ceil((float)input's width / stride_width).
    • If padding_type is VALID_PADDING:
      • Output tensor's height dimension must equal ceil((float)(input's height - kernel_height + 1) / stride_height).
      • Output tensor's width dimension must equal ceil((float)(input's width - kernel_width + 1) / stride_width).
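
The shape restrictions above can be checked before building the output tensor. The following is a minimal sketch for one spatial dimension (height or width), assuming the limits quoted in this section. pool_out_dim, ceil_div and the ex_pool_padding enum are hypothetical local names rather than zdnn.h definitions, and the check that a VALID window fits inside the input is an added assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Local stand-in for the padding type; not the definition from zdnn.h.
typedef enum { EX_VALID_PADDING, EX_SAME_PADDING } ex_pool_padding;

// Integer equivalent of ceil((float)a / b) for b > 0.
static uint32_t ceil_div(uint32_t a, uint32_t b) { return (a + b - 1) / b; }

// Computes the output size required along one spatial dimension.
// Returns false when the documented restrictions are not met.
static bool pool_out_dim(uint32_t in, uint32_t kernel, uint32_t stride,
                         ex_pool_padding pad, uint32_t *out) {
  assert(out != NULL);
  if (kernel == 0 || in == 0 || in > 1024)
    return false;
  if (stride == 0) {
    // Stride 0: the window must cover the whole dimension, VALID only.
    if (pad != EX_VALID_PADDING || in != kernel)
      return false;
    *out = 1;
    return true;
  }
  if (stride > 30 || kernel > 64)
    return false;
  if (pad == EX_SAME_PADDING) {
    *out = ceil_div(in, stride);
    return true;
  }
  // Assumption: with VALID padding the window must fit inside the input.
  if (kernel > in)
    return false;
  *out = ceil_div(in - kernel + 1, stride);
  return true;
}
```

The helper is called once for height and once for width. The rule that both strides must be zero or both non-zero has to be checked separately by the caller.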

Programming Notes

  • If the magnitude of difference between elements of input is large (greater than 10), accuracy may be reduced.

Returns (see zDNN Statuses for descriptions)

  • ZDNN_OK
  • ZDNN_INVALID_SHAPE
    • Shape of input or output tensor is invalid based on given kernel and stride parameters
    • Other general shape violations (exceeds MDIS, etc.)
  • ZDNN_INVALID_TYPE
  • ZDNN_INVALID_FORMAT
  • ZDNN_INVALID_STRIDE_PADDING
  • ZDNN_INVALID_STRIDES - One stride was non-zero, but not the other.
  • hardware statuses
    • ZDNN_EXCEEDS_MDIS will also occur if any of the following conditions occur:
      • stride_height is larger than zdnn_get_nnpa_max_dim_idx_size.
      • stride_width is larger than zdnn_get_nnpa_max_dim_idx_size.
      • kernel_height is 0 or is larger than zdnn_get_nnpa_max_dim_idx_size.
      • kernel_width is 0 or is larger than zdnn_get_nnpa_max_dim_idx_size.
    • ZDNN_FUNC_RC_F000 - Invalid padding_type
    • ZDNN_FUNC_RC_F001 - stride_height = 0 and stride_width = 0, but a kernel parameter is greater than allowed (see kernel_height or kernel_width above)
    • ZDNN_FUNC_RC_F002 - stride_height > 0 and stride_width > 0, but a kernel parameter is greater than allowed (see kernel_height or kernel_width above)
    • ZDNN_FUNC_RC_F003 - stride_height > 0 and stride_width > 0, but a stride parameter is greater than allowed (see stride_height or stride_width above)
    • ZDNN_FUNC_RC_F004 - stride_height > 0 and stride_width > 0, but either input tensor's height or width dimension is greater than 1024.

Framework Examples

TensorFlow AvgPool

ONNX AvgPool


zdnn_maxpool2d

Back to Table of Contents

Description

Given an input tensor in zDNN transformed format, padding type, kernel size and kernel stride, produces a downsampled tensor reducing the middle dimensions based on the maximum values within the kernel window at each step and stores the results into the provided output zDNN tensor.
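
The reduction described above can be illustrated with a scalar reference over a plain (untransformed) NHWC float array. This sketch covers only VALID padding with non-zero strides. maxpool2d_valid_ref is a hypothetical name and the code is not the zDNN implementation, which operates on transformed tensors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical scalar reference for max pooling (VALID padding, strides > 0).
// in:  NHWC array of n * h * w * c floats
// out: NHWC array of n * oh * ow * c floats, where
//      oh = (h - kh) / sh + 1 and ow = (w - kw) / sw + 1
static void maxpool2d_valid_ref(const float *in, size_t n, size_t h, size_t w,
                                size_t c, size_t kh, size_t kw, size_t sh,
                                size_t sw, float *out) {
  assert(in != NULL && out != NULL);
  assert(kh > 0 && kw > 0 && sh > 0 && sw > 0 && kh <= h && kw <= w);
  size_t oh = (h - kh) / sh + 1;
  size_t ow = (w - kw) / sw + 1;
  for (size_t b = 0; b < n; b++)
    for (size_t oy = 0; oy < oh; oy++)
      for (size_t ox = 0; ox < ow; ox++)
        for (size_t ch = 0; ch < c; ch++) {
          // Start from the top-left element of the kernel window.
          float best = in[((b * h + oy * sh) * w + ox * sw) * c + ch];
          for (size_t ky = 0; ky < kh; ky++)
            for (size_t kx = 0; kx < kw; kx++) {
              float v =
                  in[((b * h + oy * sh + ky) * w + ox * sw + kx) * c + ch];
              if (v > best)
                best = v;
            }
          out[((b * oh + oy) * ow + ox) * c + ch] = best;
        }
}
```

For a 1x4x4x1 input holding 0 to 15 in order, a 2x2 kernel with stride 2 yields the four values 5, 7, 13 and 15. Replacing the maximum with the mean of the window gives the corresponding average pooling reference.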

Format

zdnn_status zdnn_maxpool2d(const zdnn_ztensor *input,
                           zdnn_pool_padding padding_type,
                           uint32_t kernel_height, uint32_t kernel_width,
                           uint32_t stride_height, uint32_t stride_width,
                           zdnn_ztensor *output);

Parameters

  • zdnn_ztensor *input

    • Tensor with original values to be downsampled in the output tensor.
    • Must be a ZDNN_NHWC tensor with pre_transformed shape [batch_Num, Height, Width, Channel].
    • See Parameter Restrictions below for information on the expected shape of the input tensor.
    • Must follow general tensor requirements
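  • padding_type

    • The type of padding to use for the pooling operation: SAME_PADDING or VALID_PADDING.
    • See Parameter Restrictions below for information on the expected value of padding_type.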
  • kernel_height

    • Size of the kernel window that passes over the input's height dimension.
    • See Parameter Restrictions below for information on the expected value of kernel_height.
  • kernel_width

    • Size of the kernel window that passes over the input's width dimension.
    • See Parameter Restrictions below for information on the expected value of kernel_width.
  • stride_height

    • Number of positions the kernel moves over input's height dimension at each step.
    • If stride_height is 0 then stride_width must also be 0.
    • If strides are greater than 0 then stride_height must be less than or equal to 30.
  • stride_width

    • Number of positions the kernel moves over the input's width dimension at each step.
    • If stride_height is 0 then stride_width must also be 0.
    • If strides are greater than 0 then stride_width must be less than or equal to 30.
  • zdnn_ztensor *output

    • The result tensor, which will hold the result of the pooling operation in its buffer.
    • Must be a ZDNN_NHWC tensor with pre_transformed shape [batch_Num, Height, Width, Channel].
    • See Parameter Restrictions below for information on the expected shape of the output tensor.
    • Must follow general tensor requirements

MaxPool2D Parameter Restrictions

Parameter restrictions may vary based on provided strides and padding_type.

  • Input tensor batch_Num and Channel dimensions must always match the output tensor's respective dimensions.

  • If strides are 0:

    • Both input tensor's Height dimension and the kernel_height must match and be less than or equal to 1024.
    • Both input tensor's Width dimension and the kernel_width must match and be less than or equal to 1024.
    • Output tensor's height and width dimensions must be 1.
    • padding_type must be VALID_PADDING.
  • If strides are greater than zero:

    • kernel_width and kernel_height must be less than or equal to 64.
    • Input tensor's height and width dimensions must not be greater than 1024.
    • If padding_type is SAME_PADDING:
      • Output tensor's height dimension must equal ceil((float)input's height / stride_height).
      • Output tensor's width dimension must equal ceil((float)input's width / stride_width).
    • If padding_type is VALID_PADDING:
      • Output tensor's height dimension must equal ceil((float)(input's height - kernel_height + 1) / stride_height).
      • Output tensor's width dimension must equal ceil((float)(input's width - kernel_width + 1) / stride_width).

Programming Notes

  • If the magnitude of difference between elements of input is large (greater than 10), accuracy may be reduced.

Returns (see zDNN Statuses for descriptions)

  • ZDNN_OK
  • ZDNN_INVALID_SHAPE
    • Shape of input or output tensor is invalid based on given kernel and stride parameters
    • Other general shape violations (exceeds MDIS, etc.)
  • ZDNN_INVALID_TYPE
  • ZDNN_INVALID_FORMAT
  • ZDNN_INVALID_STRIDE_PADDING
  • ZDNN_INVALID_STRIDES - One stride was non-zero, but not the other.
  • hardware statuses
    • ZDNN_EXCEEDS_MDIS will also occur if any of the following conditions occur:
      • stride_height is larger than zdnn_get_nnpa_max_dim_idx_size.
      • stride_width is larger than zdnn_get_nnpa_max_dim_idx_size.
      • kernel_height is 0 or is larger than zdnn_get_nnpa_max_dim_idx_size.
      • kernel_width is 0 or is larger than zdnn_get_nnpa_max_dim_idx_size.
    • ZDNN_FUNC_RC_F000 - Invalid padding_type
    • ZDNN_FUNC_RC_F001 - stride_height = 0 and stride_width = 0, but a kernel parameter is greater than allowed (see kernel_height or kernel_width above)
    • ZDNN_FUNC_RC_F002 - stride_height > 0 and stride_width > 0, but a kernel parameter is greater than allowed (see kernel_height or kernel_width above)
    • ZDNN_FUNC_RC_F003 - stride_height > 0 and stride_width > 0, but a stride parameter is greater than allowed (see stride_height or stride_width above)
    • ZDNN_FUNC_RC_F004 - stride_height > 0 and stride_width > 0, but either input tensor's height or width dimension is greater than 1024.

Framework Examples

TensorFlow MaxPool

ONNX MaxPool


zdnn_conv2d

Back to Table of Contents

Description

Perform 2D convolution over an input tensor in zDNN transformed format.

First the input tensor is convolved with the kernel tensor. Then the bias tensor is added to the results. Then if act_func is not CONV2D_ACT_NONE, the activation function is applied to the results. Then if act_func is set to CONV2D_ACT_RELU, and clipping_value is not NULL or 0, clipping is performed against the intermediate result where z = min(intermediate_result, clipping_value). Finally the results are stored into the provided output zDNN tensor.
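
The steps that follow the convolution itself (bias add, optional activation, optional clipping) can be sketched per output element as below. This is an illustration only: conv2d_post_ref and the ex_conv2d_act enum are hypothetical local names, only the two activation cases named above are modeled, and the clipping value is assumed to be an FP32 value even though the API takes it as a const void pointer.

```c
#include <assert.h>
#include <stddef.h>

// Local stand-in for the activation selector; not the zdnn.h definition.
typedef enum { EX_ACT_NONE, EX_ACT_RELU } ex_conv2d_act;

// Applies bias, activation and clipping to one already-convolved value.
// clipping_value may be NULL; NULL or a value of 0 disables clipping.
static float conv2d_post_ref(float conv_result, float bias, ex_conv2d_act act,
                             const float *clipping_value) {
  float z = conv_result + bias;
  if (act == EX_ACT_RELU) {
    if (z < 0.0f)
      z = 0.0f;
    // Clipping applies only to ReLU: z = min(intermediate_result, clip).
    if (clipping_value != NULL && *clipping_value != 0.0f &&
        z > *clipping_value)
      z = *clipping_value;
  }
  return z;
}
```

For example, a convolved value of 5 with bias 2, ReLU and a clipping value of 6 gives 6, while the same inputs with a clipping value of 0 give 7 because clipping is skipped.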

Format

zdnn_status zdnn_conv2d(const zdnn_ztensor *input,
                        const zdnn_ztensor *kernel,
                        const zdnn_ztensor *bias,
                        zdnn_pool_padding padding_type,
                        uint32_t stride_height, uint32_t stride_width,
                        zdnn_conv2d_act act_func,
                        const void *clipping_value, zdnn_ztensor *output);

Parameters

Convolution 2D Requirements

  • Tensor shapes:
    • input: (num_batches, height_in, width_in, channels_in)
    • kernel: (kernel_height, kernel_width, channels_in, channels_out)
    • bias: (channels_out)
    • output: (num_batches, height_out, width_out, channels_out)
  • If both strides are > 0 and <= 13, with SAME padding:
    • kernel: both kernel_height and kernel_width must be <= 64.
    • output: height_out = ceil(height_in/stride_height) and width_out = ceil(width_in/stride_width).
  • If both strides are > 0 and <= 13, with VALID padding:
    • input: height_in must be > kernel_height and width_in must be > kernel_width.
    • kernel: both kernel_height and kernel_width must be <= 64.
    • output: height_out = ceil((height_in - kernel_height + 1)/stride_height) and width_out = ceil((width_in - kernel_width + 1)/stride_width).
  • If both strides are 0, with VALID padding:
    • input: height_in must be = kernel_height and width_in must be = kernel_width.
    • kernel: both kernel_height and kernel_width must be <= 448.
    • output: both height_out and width_out must be 1.

Returns (see zDNN Statuses for descriptions)

  • ZDNN_OK
  • warning statuses
  • ZDNN_INVALID_SHAPE
    • Shape of input or output tensor is invalid based on given kernel and stride parameters
    • Other general shape violations (exceeds MDIS, etc.)
  • ZDNN_INVALID_TYPE
  • ZDNN_INVALID_FORMAT
  • ZDNN_INVALID_STRIDE_PADDING
  • ZDNN_INVALID_STRIDES
  • ZDNN_INVALID_CLIPPING_VALUE
  • hardware statuses
    • ZDNN_FUNC_RC_F000 - Invalid padding_type
    • ZDNN_FUNC_RC_F001 - Invalid act_func
    • ZDNN_FUNC_RC_F002 - stride_height = 0 and stride_width = 0, but either kernel_height or kernel_width > 448
    • ZDNN_FUNC_RC_F003 - stride_height > 0 and stride_width > 0, but either kernel_height or kernel_width > 64
    • ZDNN_FUNC_RC_F004 - Either stride_height or stride_width > 13

Framework Examples

TensorFlow Conv2D

ONNX Conv2D

Convenience Functions

Back to Table of Contents

  • None

Usage Examples

Example flow of an application calling the zDNN APIs

Back to Table of Contents

#include <assert.h>
#include <inttypes.h> // PRIu64
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "zdnn.h"

// ***************************************************************************
// Sample:
//
// Create 2 zTensors a and b, and add them together via zdnn_add()
// ***************************************************************************
int main(int argc, char *argv[]) {
  zdnn_tensor_desc pre_tfrmd_desc, tfrmd_desc;
  zdnn_ztensor ztensor_a;
  zdnn_ztensor ztensor_b;
  zdnn_ztensor ztensor_out;
  zdnn_status status;

  uint32_t dim_n = 1, dim_h = 32, dim_w = 32, dim_c = 3;
  zdnn_data_types type = FP32;
  short element_size = 4; // size of each element in bytes
  uint64_t num_elements = dim_n * dim_h * dim_w * dim_c;

  // allocate tensor data storage
  void *data1 = malloc(num_elements * element_size);
  void *data2 = malloc(num_elements * element_size);
  void *data_out = malloc(num_elements * element_size);

  // read input_data

  // check status for AIU availability, supported ops, etc. here
  // status = zdnn_query(…);

  // set input tensor data to 0 to 127 sequentially and repeat
  for (uint64_t i = 0; i < num_elements; i++) {
    ((float *)data1)[i] = (float)(i & 0x7f);
    ((float *)data2)[i] = (float)(i & 0x7f);
  }

  zdnn_init_pre_transformed_desc(ZDNN_NHWC, type, &pre_tfrmd_desc, dim_n, dim_h,
                                 dim_w, dim_c);
  // generate transformed shape information
  status = zdnn_generate_transformed_desc(&pre_tfrmd_desc, &tfrmd_desc);
  assert(status == ZDNN_OK);

  // initialize zTensors and allocate 4k-aligned storage via helper function
  status =
      zdnn_init_ztensor_with_malloc(&pre_tfrmd_desc, &tfrmd_desc, &ztensor_a);
  assert(status == ZDNN_OK);
  status =
      zdnn_init_ztensor_with_malloc(&pre_tfrmd_desc, &tfrmd_desc, &ztensor_b);
  assert(status == ZDNN_OK);
  status =
      zdnn_init_ztensor_with_malloc(&pre_tfrmd_desc, &tfrmd_desc, &ztensor_out);
  assert(status == ZDNN_OK);

  // transform the feature tensor
  status = zdnn_transform_ztensor(&ztensor_a, data1);
  assert(status == ZDNN_OK);
  status = zdnn_transform_ztensor(&ztensor_b, data2);
  assert(status == ZDNN_OK);

  // perform element-wise add between the two input tensors
  status = zdnn_add(&ztensor_a, &ztensor_b, &ztensor_out);
  assert(status == ZDNN_OK);

  // transform resultant zTensor back to original data format
  status = zdnn_transform_origtensor(&ztensor_out, data_out);
  assert(status == ZDNN_OK);

  for (uint64_t i = 0; i < num_elements; i++) {
    printf("out element %" PRIu64 " %f\n", i, ((float *)data_out)[i]);
  }

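  // free the zTensor buffers allocated by zdnn_init_ztensor_with_malloc
  status = zdnn_free_ztensor_buffer(&ztensor_a);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&ztensor_b);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&ztensor_out);
  assert(status == ZDNN_OK);

  free(data1);
  free(data2);
  free(data_out);

  return 0;
}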

Example of an application calling the zdnn_lstm API (forward)

Back to Table of Contents

// SPDX-License-Identifier: Apache-2.0
/*
 * Copyright IBM Corp. 2021
 * 
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * 
 *     http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "zdnn.h"

// Sample: LSTM
int main(int argc, char *argv[]) {
  zdnn_status status;

#ifdef STATIC_LIB
  zdnn_init();
#endif

  /***********************************************************************
   *
   * LSTM (FWD/BWD):
   *
   * INPUTS --------------------------------------------------------------
   * input           |  ZDNN_3DS  | (num_timesteps, num_batches, num_features)
   * h0              |  ZDNN_3DS  | (1, num_batches, num_hidden)
   * c0              |  ZDNN_3DS  | (1, num_batches, num_hidden)
   * weights         |  ZDNN_3DS  | (1, num_features, num_hidden)
   * biases          |  ZDNN_2DS  | (1, num_hidden)
   * hidden_weights  |  ZDNN_3DS  | (1, num_hidden, num_hidden)
   * hidden_biases   |  ZDNN_2DS  | (1, num_hidden)
   *
   * OUTPUTS -------------------------------------------------------------
   * hn_output       |  ZDNN_4DS  | (num_timesteps, 1, num_batches, num_hidden)
   *                 |            | or (1, 1, num_batches, num_hidden)
   * cf_output       |  ZDNN_4DS  | (1, 1, num_batches, num_hidden)
   ***********************************************************************/

  /***********************************************************************
   * Create input zTensor
   ***********************************************************************/

  zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
  zdnn_ztensor input;

  uint32_t num_timesteps = 5;
  uint32_t num_batches = 3;
  uint32_t num_features = 32;
  uint32_t num_hidden = 5;

  zdnn_data_types type = FP32;
  short element_size = 4; // size of each element in bytes

  lstm_gru_direction dir = FWD;
  uint8_t num_dirs = 1;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &input_pre_tfrmd_desc,
                                 num_timesteps, num_batches, num_features);
  status =
      zdnn_generate_transformed_desc(&input_pre_tfrmd_desc, &input_tfrmd_desc);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&input_pre_tfrmd_desc,
                                         &input_tfrmd_desc, &input);
  assert(status == ZDNN_OK);

  uint64_t input_data_size =
      num_timesteps * num_batches * num_features * element_size;
  void *input_data = malloc(input_data_size);

  status = zdnn_transform_ztensor(&input, input_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create initial hidden and cell state zTensors
   ***********************************************************************/

  zdnn_tensor_desc h0c0_pre_tfrmd_desc, h0c0_tfrmd_desc;
  zdnn_ztensor h0, c0;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &h0c0_pre_tfrmd_desc, num_dirs,
                                 num_batches, num_hidden);
  status =
      zdnn_generate_transformed_desc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
                                         &h0);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
                                         &c0);
  assert(status == ZDNN_OK);

  uint64_t h0c0_data_size = num_batches * num_hidden * element_size;
  void *hidden_state_data = malloc(h0c0_data_size);
  void *cell_state_data = malloc(h0c0_data_size);

  status = zdnn_transform_ztensor(&h0, hidden_state_data);
  assert(status == ZDNN_OK);
  status = zdnn_transform_ztensor(&c0, cell_state_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create input weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
  zdnn_ztensor weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &weights_pre_tfrmd_desc,
                                 num_dirs, num_features, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &weights_pre_tfrmd_desc, RNN_TYPE_LSTM | USAGE_WEIGHTS | PREV_LAYER_NONE,
      &weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&weights_pre_tfrmd_desc,
                                         &weights_tfrmd_desc, &weights);
  assert(status == ZDNN_OK);

  uint64_t weights_data_size = num_features * num_hidden * element_size;
  void *weights_data_f = malloc(weights_data_size);
  void *weights_data_i = malloc(weights_data_size);
  void *weights_data_c = malloc(weights_data_size);
  void *weights_data_o = malloc(weights_data_size);

  status = zdnn_transform_ztensor(&weights, weights_data_f, weights_data_i,
                                  weights_data_c, weights_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create biases zTensors
   * Resultant zTensors are concatenated
   ***********************************************************************/

  zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
  zdnn_ztensor biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &biases_pre_tfrmd_desc, RNN_TYPE_LSTM | USAGE_BIASES | PREV_LAYER_NONE,
      &biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&biases_pre_tfrmd_desc,
                                         &biases_tfrmd_desc, &biases);
  assert(status == ZDNN_OK);

  uint64_t biases_data_size = num_hidden * element_size;
  void *biases_data_f = malloc(biases_data_size);
  void *biases_data_i = malloc(biases_data_size);
  void *biases_data_c = malloc(biases_data_size);
  void *biases_data_o = malloc(biases_data_size);

  status = zdnn_transform_ztensor(&biases, biases_data_f, biases_data_i,
                                  biases_data_c, biases_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc hidden_weights_pre_tfrmd_desc, hidden_weights_tfrmd_desc;
  zdnn_ztensor hidden_weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &hidden_weights_pre_tfrmd_desc,
                                 num_dirs, num_hidden, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_weights_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_HIDDEN_WEIGHTS | PREV_LAYER_NONE,
      &hidden_weights_tfrmd_desc);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&hidden_weights_pre_tfrmd_desc,
                                         &hidden_weights_tfrmd_desc,
                                         &hidden_weights);
  assert(status == ZDNN_OK);

  uint64_t hidden_weights_data_size = num_hidden * num_hidden * element_size;
  void *hidden_weights_data_f = malloc(hidden_weights_data_size);
  void *hidden_weights_data_i = malloc(hidden_weights_data_size);
  void *hidden_weights_data_c = malloc(hidden_weights_data_size);
  void *hidden_weights_data_o = malloc(hidden_weights_data_size);

  status = zdnn_transform_ztensor(&hidden_weights, hidden_weights_data_f,
                                  hidden_weights_data_i, hidden_weights_data_c,
                                  hidden_weights_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden biases zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc hidden_biases_pre_tfrmd_desc, hidden_biases_tfrmd_desc;
  zdnn_ztensor hidden_biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &hidden_biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_biases_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_HIDDEN_BIASES | PREV_LAYER_NONE,
      &hidden_biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(
      &hidden_biases_pre_tfrmd_desc, &hidden_biases_tfrmd_desc, &hidden_biases);
  assert(status == ZDNN_OK);

  uint64_t hidden_biases_data_size = num_hidden * element_size;

  void *hidden_biases_data_f = malloc(hidden_biases_data_size);
  void *hidden_biases_data_i = malloc(hidden_biases_data_size);
  void *hidden_biases_data_c = malloc(hidden_biases_data_size);
  void *hidden_biases_data_o = malloc(hidden_biases_data_size);

  status = zdnn_transform_ztensor(&hidden_biases, hidden_biases_data_f,
                                  hidden_biases_data_i, hidden_biases_data_c,
                                  hidden_biases_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create output zTensor
   ***********************************************************************/

  // get only the last timestep, thus hn and cf can share descriptor
  zdnn_tensor_desc hncf_pre_tfrmd_desc, hncf_tfrmd_desc;

  zdnn_ztensor hn_output_ztensor, cf_output_ztensor;

  zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &hncf_pre_tfrmd_desc, 1, 1,
                                 num_batches, num_hidden);
  status =
      zdnn_generate_transformed_desc(&hncf_pre_tfrmd_desc, &hncf_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&hncf_pre_tfrmd_desc, &hncf_tfrmd_desc,
                                         &hn_output_ztensor);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&hncf_pre_tfrmd_desc, &hncf_tfrmd_desc,
                                         &cf_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Call the AIU
   ***********************************************************************/

  void *work_area = NULL;

  status = zdnn_lstm(&input, &h0, &c0, &weights, &biases, &hidden_weights,
                     &hidden_biases, dir, work_area, &hn_output_ztensor,
                     &cf_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Output and Cleanup
   ***********************************************************************/

  uint64_t hncf_data_size = num_batches * num_hidden * element_size;
  void *hn_output_data = malloc(hncf_data_size);
  void *cf_output_data = malloc(hncf_data_size);

  status = zdnn_transform_origtensor(&hn_output_ztensor, hn_output_data);
  assert(status == ZDNN_OK);
  status = zdnn_transform_origtensor(&cf_output_ztensor, cf_output_data);
  assert(status == ZDNN_OK);

  status = zdnn_free_ztensor_buffer(&input);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&h0);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&c0);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hn_output_ztensor);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&cf_output_ztensor);
  assert(status == ZDNN_OK);

  free(input_data);
  free(hidden_state_data);
  free(cell_state_data);
  free(weights_data_f);
  free(weights_data_i);
  free(weights_data_c);
  free(weights_data_o);
  free(hidden_weights_data_f);
  free(hidden_weights_data_i);
  free(hidden_weights_data_c);
  free(hidden_weights_data_o);
  free(biases_data_f);
  free(biases_data_i);
  free(biases_data_c);
  free(biases_data_o);
  free(hidden_biases_data_f);
  free(hidden_biases_data_i);
  free(hidden_biases_data_c);
  free(hidden_biases_data_o);
  free(hn_output_data);
  free(cf_output_data);
}
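
The sample above passes freshly allocated buffers straight to zdnn_transform_ztensor() without checking the malloc() result or filling in any values, which keeps the listing short but is not something an application would do. A minimal sketch of a helper that allocates and initializes an FP32 buffer follows. The name alloc_fp32_filled is hypothetical and is not part of zDNN; it only uses the C standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper (not part of zDNN): allocate `count` FP32 elements and
// set every element to `value`. Returns NULL if the size would overflow or
// if the allocation fails, so the caller can handle the error.
static float *alloc_fp32_filled(uint64_t count, float value) {
  assert(count > 0);

  // guard against overflow when converting the element count to bytes
  if (count > SIZE_MAX / sizeof(float)) {
    return NULL;
  }

  float *buf = malloc((size_t)count * sizeof(float));
  if (buf == NULL) {
    return NULL;
  }

  for (uint64_t i = 0; i < count; i++) {
    buf[i] = value;
  }
  return buf;
}
```

Under these assumptions, a line such as `void *weights_data_f = malloc(weights_data_size);` could become `float *weights_data_f = alloc_fp32_filled((uint64_t)num_features * num_hidden, 0.0f);`, followed by a NULL check before the buffer is handed to zdnn_transform_ztensor().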



Example of an application calling the zdnn_lstm API (bi-directional)

Back to Table of Contents

// SPDX-License-Identifier: Apache-2.0
/*
 * Copyright IBM Corp. 2021
 * 
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * 
 *     http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "zdnn.h"

// Sample: LSTM BI-DIR
int main(int argc, char *argv[]) {
  zdnn_status status;

#ifdef STATIC_LIB
  zdnn_init();
#endif

  /***********************************************************************
   *
   * LSTM (BI-DIR):
   *
   * INPUTS --------------------------------------------------------------
   * input           |  ZDNN_3DS  | (num_timesteps, num_batches, num_features)
   * h0              |  ZDNN_3DS  | (2, num_batches, num_hidden)
   * c0              |  ZDNN_3DS  | (2, num_batches, num_hidden)
   * weights         |  ZDNN_3DS  | (2, num_features, num_hidden)
   * biases          |  ZDNN_2DS  | (2, num_hidden)
   * hidden_weights  |  ZDNN_3DS  | (2, num_hidden, num_hidden)
   * hidden_biases   |  ZDNN_2DS  | (2, num_hidden)
   *
   * OUTPUTS -------------------------------------------------------------
   * hn_output       |  ZDNN_4DS  | (num_timesteps, 2, num_batches, num_hidden)
   *                 |            | or (1, 2, num_batches, num_hidden)
   * cf_output       |  ZDNN_4DS  | (1, 2, num_batches, num_hidden)
   ***********************************************************************/

  /***********************************************************************
   * Create input zTensor
   ***********************************************************************/

  zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
  zdnn_ztensor input;

  uint32_t num_timesteps = 5;
  uint32_t num_batches = 3;
  uint32_t num_features = 32;
  uint32_t num_hidden = 5;

  zdnn_data_types type = FP32;
  short element_size = 4; // size of each element in bytes

  lstm_gru_direction dir = BIDIR;
  uint8_t num_dirs = 2;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &input_pre_tfrmd_desc,
                                 num_timesteps, num_batches, num_features);
  status =
      zdnn_generate_transformed_desc(&input_pre_tfrmd_desc, &input_tfrmd_desc);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&input_pre_tfrmd_desc,
                                         &input_tfrmd_desc, &input);
  assert(status == ZDNN_OK);

  uint64_t input_data_size =
      num_timesteps * num_batches * num_features * element_size;
  void *input_data = malloc(input_data_size);

  status = zdnn_transform_ztensor(&input, input_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create initial hidden and cell state zTensors
   ***********************************************************************/

  zdnn_tensor_desc h0c0_pre_tfrmd_desc, h0c0_tfrmd_desc;
  zdnn_ztensor h0, c0;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &h0c0_pre_tfrmd_desc, num_dirs,
                                 num_batches, num_hidden);
  status =
      zdnn_generate_transformed_desc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
                                         &h0);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
                                         &c0);
  assert(status == ZDNN_OK);

  // each buffer holds data for both directions: (2, num_batches, num_hidden)
  uint64_t h0c0_data_size =
      (uint64_t)num_dirs * num_batches * num_hidden * element_size;
  void *hidden_state_data = malloc(h0c0_data_size);
  void *cell_state_data = malloc(h0c0_data_size);

  status = zdnn_transform_ztensor(&h0, hidden_state_data);
  assert(status == ZDNN_OK);
  status = zdnn_transform_ztensor(&c0, cell_state_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create input weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
  zdnn_ztensor weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &weights_pre_tfrmd_desc,
                                 num_dirs, num_features, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &weights_pre_tfrmd_desc, RNN_TYPE_LSTM | USAGE_WEIGHTS | PREV_LAYER_NONE,
      &weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&weights_pre_tfrmd_desc,
                                         &weights_tfrmd_desc, &weights);
  assert(status == ZDNN_OK);

  // each gate buffer has the shape (2, num_features, num_hidden)
  uint64_t weights_data_size =
      (uint64_t)num_dirs * num_features * num_hidden * element_size;
  void *weights_data_f = malloc(weights_data_size);
  void *weights_data_i = malloc(weights_data_size);
  void *weights_data_c = malloc(weights_data_size);
  void *weights_data_o = malloc(weights_data_size);

  status = zdnn_transform_ztensor(&weights, weights_data_f, weights_data_i,
                                  weights_data_c, weights_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create biases zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
  zdnn_ztensor biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &biases_pre_tfrmd_desc, RNN_TYPE_LSTM | USAGE_BIASES | PREV_LAYER_NONE,
      &biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&biases_pre_tfrmd_desc,
                                         &biases_tfrmd_desc, &biases);
  assert(status == ZDNN_OK);

  uint64_t biases_data_size = (uint64_t)num_dirs * num_hidden * element_size;
  void *biases_data_f = malloc(biases_data_size);
  void *biases_data_i = malloc(biases_data_size);
  void *biases_data_c = malloc(biases_data_size);
  void *biases_data_o = malloc(biases_data_size);

  status = zdnn_transform_ztensor(&biases, biases_data_f, biases_data_i,
                                  biases_data_c, biases_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc hidden_weights_pre_tfrmd_desc, hidden_weights_tfrmd_desc;
  zdnn_ztensor hidden_weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &hidden_weights_pre_tfrmd_desc,
                                 num_dirs, num_hidden, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_weights_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_HIDDEN_WEIGHTS | PREV_LAYER_NONE,
      &hidden_weights_tfrmd_desc);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&hidden_weights_pre_tfrmd_desc,
                                         &hidden_weights_tfrmd_desc,
                                         &hidden_weights);
  assert(status == ZDNN_OK);

  uint64_t hidden_weights_data_size =
      (uint64_t)num_dirs * num_hidden * num_hidden * element_size;
  void *hidden_weights_data_f = malloc(hidden_weights_data_size);
  void *hidden_weights_data_i = malloc(hidden_weights_data_size);
  void *hidden_weights_data_c = malloc(hidden_weights_data_size);
  void *hidden_weights_data_o = malloc(hidden_weights_data_size);

  status = zdnn_transform_ztensor(&hidden_weights, hidden_weights_data_f,
                                  hidden_weights_data_i, hidden_weights_data_c,
                                  hidden_weights_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden biases zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc hidden_biases_pre_tfrmd_desc, hidden_biases_tfrmd_desc;
  zdnn_ztensor hidden_biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &hidden_biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_biases_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_HIDDEN_BIASES | PREV_LAYER_NONE,
      &hidden_biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(
      &hidden_biases_pre_tfrmd_desc, &hidden_biases_tfrmd_desc, &hidden_biases);
  assert(status == ZDNN_OK);

  uint64_t hidden_biases_data_size =
      (uint64_t)num_dirs * num_hidden * element_size;

  void *hidden_biases_data_f = malloc(hidden_biases_data_size);
  void *hidden_biases_data_i = malloc(hidden_biases_data_size);
  void *hidden_biases_data_c = malloc(hidden_biases_data_size);
  void *hidden_biases_data_o = malloc(hidden_biases_data_size);

  status = zdnn_transform_ztensor(&hidden_biases, hidden_biases_data_f,
                                  hidden_biases_data_i, hidden_biases_data_c,
                                  hidden_biases_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create output zTensors
   ***********************************************************************/

  zdnn_tensor_desc hn_pre_tfrmd_desc, hn_tfrmd_desc, cf_pre_tfrmd_desc,
      cf_tfrmd_desc;

  zdnn_ztensor hn_output_ztensor, cf_output_ztensor;

  zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &hn_pre_tfrmd_desc,
                                 num_timesteps, 2, num_batches, num_hidden);
  status = zdnn_generate_transformed_desc(&hn_pre_tfrmd_desc, &hn_tfrmd_desc);
  assert(status == ZDNN_OK);

  zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &cf_pre_tfrmd_desc, 1, 2,
                                 num_batches, num_hidden);
  status = zdnn_generate_transformed_desc(&cf_pre_tfrmd_desc, &cf_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&hn_pre_tfrmd_desc, &hn_tfrmd_desc,
                                         &hn_output_ztensor);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&cf_pre_tfrmd_desc, &cf_tfrmd_desc,
                                         &cf_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Call the AIU
   ***********************************************************************/

  void *work_area = NULL;

  status = zdnn_lstm(&input, &h0, &c0, &weights, &biases, &hidden_weights,
                     &hidden_biases, dir, work_area, &hn_output_ztensor,
                     &cf_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Output and Cleanup
   ***********************************************************************/

  uint64_t hn_data_size =
      num_timesteps * 2 * num_batches * num_hidden * element_size;
  uint64_t cf_data_size = 2 * num_batches * num_hidden * element_size;
  void *hn_output_data = malloc(hn_data_size);
  void *cf_output_data = malloc(cf_data_size);

  status = zdnn_transform_origtensor(&hn_output_ztensor, hn_output_data);
  assert(status == ZDNN_OK);
  status = zdnn_transform_origtensor(&cf_output_ztensor, cf_output_data);
  assert(status == ZDNN_OK);

  status = zdnn_free_ztensor_buffer(&input);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&h0);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&c0);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hn_output_ztensor);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&cf_output_ztensor);
  assert(status == ZDNN_OK);

  free(input_data);
  free(hidden_state_data);
  free(cell_state_data);
  free(weights_data_f);
  free(weights_data_i);
  free(weights_data_c);
  free(weights_data_o);
  free(hidden_weights_data_f);
  free(hidden_weights_data_i);
  free(hidden_weights_data_c);
  free(hidden_weights_data_o);
  free(biases_data_f);
  free(biases_data_i);
  free(biases_data_c);
  free(biases_data_o);
  free(hidden_biases_data_f);
  free(hidden_biases_data_i);
  free(hidden_biases_data_c);
  free(hidden_biases_data_o);
  free(hn_output_data);
  free(cf_output_data);
}




Example of an application calling the zdnn_lstm API (multi-layer bi-directional)

Back to Table of Contents

// SPDX-License-Identifier: Apache-2.0
/*
 * Copyright IBM Corp. 2021
 * 
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * 
 *     http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "zdnn.h"

void do_bidir_layer(zdnn_ztensor *input, uint32_t num_hidden,
                    zdnn_ztensor *hn_output, bool is_prev_layer_bidir) {

  zdnn_status status;

  uint32_t num_batches = input->pre_transformed_desc->dim2;

  // if input is bidir output from previous layer then number of features for
  // this layer is 2x of hidden-state size (dim1) of the previous layer
  uint32_t num_features =
      input->pre_transformed_desc->dim1 * (is_prev_layer_bidir ? 2 : 1);

  zdnn_data_types type = FP32;
  short element_size = 4; // size of each element in bytes

  lstm_gru_direction dir = BIDIR;
  uint8_t num_dirs = 2;

  /***********************************************************************
   * Create initial hidden and cell state zTensors
   ***********************************************************************/

  zdnn_tensor_desc h0c0_pre_tfrmd_desc, h0c0_tfrmd_desc;
  zdnn_ztensor h0, c0;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &h0c0_pre_tfrmd_desc, num_dirs,
                                 num_batches, num_hidden);
  status =
      zdnn_generate_transformed_desc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
                                         &h0);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
                                         &c0);
  assert(status == ZDNN_OK);

  // each buffer holds data for both directions: (2, num_batches, num_hidden)
  uint64_t h0c0_data_size =
      (uint64_t)num_dirs * num_batches * num_hidden * element_size;
  void *hidden_state_data = malloc(h0c0_data_size);
  void *cell_state_data = malloc(h0c0_data_size);

  status = zdnn_transform_ztensor(&h0, hidden_state_data);
  assert(status == ZDNN_OK);
  status = zdnn_transform_ztensor(&c0, cell_state_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create input weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
  zdnn_ztensor weights;

  // if using previous layer bidir output as input then number of features of
  // this layer is 2x the hidden-state size of the previous layer
  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &weights_pre_tfrmd_desc,
                                 num_dirs, num_features, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &weights_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_WEIGHTS |
          (is_prev_layer_bidir ? PREV_LAYER_BIDIR : PREV_LAYER_UNI),
      &weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&weights_pre_tfrmd_desc,
                                         &weights_tfrmd_desc, &weights);
  assert(status == ZDNN_OK);

  uint64_t weights_data_size =
      (uint64_t)num_dirs * num_features * num_hidden * element_size;
  void *weights_data_f = malloc(weights_data_size);
  void *weights_data_i = malloc(weights_data_size);
  void *weights_data_c = malloc(weights_data_size);
  void *weights_data_o = malloc(weights_data_size);

  status = zdnn_transform_ztensor(&weights, weights_data_f, weights_data_i,
                                  weights_data_c, weights_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create biases zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
  zdnn_ztensor biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &biases_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_BIASES |
          (is_prev_layer_bidir ? PREV_LAYER_BIDIR : PREV_LAYER_UNI),
      &biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&biases_pre_tfrmd_desc,
                                         &biases_tfrmd_desc, &biases);
  assert(status == ZDNN_OK);

  uint64_t biases_data_size = (uint64_t)num_dirs * num_hidden * element_size;
  void *biases_data_f = malloc(biases_data_size);
  void *biases_data_i = malloc(biases_data_size);
  void *biases_data_c = malloc(biases_data_size);
  void *biases_data_o = malloc(biases_data_size);

  status = zdnn_transform_ztensor(&biases, biases_data_f, biases_data_i,
                                  biases_data_c, biases_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc hidden_weights_pre_tfrmd_desc, hidden_weights_tfrmd_desc;
  zdnn_ztensor hidden_weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &hidden_weights_pre_tfrmd_desc,
                                 num_dirs, num_hidden, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_weights_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_HIDDEN_WEIGHTS |
          (is_prev_layer_bidir ? PREV_LAYER_BIDIR : PREV_LAYER_UNI),
      &hidden_weights_tfrmd_desc);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&hidden_weights_pre_tfrmd_desc,
                                         &hidden_weights_tfrmd_desc,
                                         &hidden_weights);
  assert(status == ZDNN_OK);

  uint64_t hidden_weights_data_size =
      (uint64_t)num_dirs * num_hidden * num_hidden * element_size;
  void *hidden_weights_data_f = malloc(hidden_weights_data_size);
  void *hidden_weights_data_i = malloc(hidden_weights_data_size);
  void *hidden_weights_data_c = malloc(hidden_weights_data_size);
  void *hidden_weights_data_o = malloc(hidden_weights_data_size);

  status = zdnn_transform_ztensor(&hidden_weights, hidden_weights_data_f,
                                  hidden_weights_data_i, hidden_weights_data_c,
                                  hidden_weights_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden biases zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc hidden_biases_pre_tfrmd_desc, hidden_biases_tfrmd_desc;
  zdnn_ztensor hidden_biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &hidden_biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_biases_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_HIDDEN_BIASES |
          (is_prev_layer_bidir ? PREV_LAYER_BIDIR : PREV_LAYER_UNI),
      &hidden_biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(
      &hidden_biases_pre_tfrmd_desc, &hidden_biases_tfrmd_desc, &hidden_biases);
  assert(status == ZDNN_OK);

  uint64_t hidden_biases_data_size =
      (uint64_t)num_dirs * num_hidden * element_size;

  void *hidden_biases_data_f = malloc(hidden_biases_data_size);
  void *hidden_biases_data_i = malloc(hidden_biases_data_size);
  void *hidden_biases_data_c = malloc(hidden_biases_data_size);
  void *hidden_biases_data_o = malloc(hidden_biases_data_size);

  status = zdnn_transform_ztensor(&hidden_biases, hidden_biases_data_f,
                                  hidden_biases_data_i, hidden_biases_data_c,
                                  hidden_biases_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create cf output zTensor
   ***********************************************************************/

  zdnn_tensor_desc cf_pre_tfrmd_desc, cf_tfrmd_desc;

  zdnn_ztensor cf_output_ztensor;

  zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &cf_pre_tfrmd_desc, 1, 2,
                                 num_batches, num_hidden);
  status = zdnn_generate_transformed_desc(&cf_pre_tfrmd_desc, &cf_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&cf_pre_tfrmd_desc, &cf_tfrmd_desc,
                                         &cf_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Call the AIU
   ***********************************************************************/

  void *work_area = NULL;

  status =
      zdnn_lstm(input, &h0, &c0, &weights, &biases, &hidden_weights,
                &hidden_biases, dir, work_area, hn_output, &cf_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Cleanup and Return
   ***********************************************************************/

  status = zdnn_free_ztensor_buffer(&h0);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&c0);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&cf_output_ztensor);
  assert(status == ZDNN_OK);

  free(hidden_state_data);
  free(cell_state_data);
  free(weights_data_f);
  free(weights_data_i);
  free(weights_data_c);
  free(weights_data_o);
  free(hidden_weights_data_f);
  free(hidden_weights_data_i);
  free(hidden_weights_data_c);
  free(hidden_weights_data_o);
  free(biases_data_f);
  free(biases_data_i);
  free(biases_data_c);
  free(biases_data_o);
  free(hidden_biases_data_f);
  free(hidden_biases_data_i);
  free(hidden_biases_data_c);
  free(hidden_biases_data_o);
}

// Sample: LSTM multi-layer BIDIR
int main(int argc, char *argv[]) {
  zdnn_status status;

#ifdef STATIC_LIB
  zdnn_init();
#endif

  uint32_t num_hidden[2] = {5, 4};

  /***********************************************************************
   * Create input zTensor
   ***********************************************************************/

  zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
  zdnn_ztensor input;

  uint32_t num_timesteps = 5;
  uint32_t num_batches = 3;
  uint32_t num_features = 32;

  zdnn_data_types type = FP32;
  short element_size = 4; // size of each element in bytes

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &input_pre_tfrmd_desc,
                                 num_timesteps, num_batches, num_features);
  status =
      zdnn_generate_transformed_desc(&input_pre_tfrmd_desc, &input_tfrmd_desc);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&input_pre_tfrmd_desc,
                                         &input_tfrmd_desc, &input);
  assert(status == ZDNN_OK);

  uint64_t input_data_size =
      num_timesteps * num_batches * num_features * element_size;
  void *input_data = malloc(input_data_size);

  status = zdnn_transform_ztensor(&input, input_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create 2 hn output zTensors
   ***********************************************************************/

  zdnn_tensor_desc hn_pre_tfrmd_desc[2], hn_tfrmd_desc[2];
  zdnn_ztensor hn_output[2];

  for (int i = 0; i < 2; i++) {
    zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &hn_pre_tfrmd_desc[i],
                                   num_timesteps, 2, num_batches,
                                   num_hidden[i]);
    status = zdnn_generate_transformed_desc(&hn_pre_tfrmd_desc[i],
                                            &hn_tfrmd_desc[i]);
    assert(status == ZDNN_OK);

    status = zdnn_init_ztensor_with_malloc(&hn_pre_tfrmd_desc[i],
                                           &hn_tfrmd_desc[i], &hn_output[i]);
    assert(status == ZDNN_OK);
  }

  /***********************************************************************
   * Do the layers
   ***********************************************************************/

  // call the first layer with input, previous layer bidir = false, output goes
  // to hn_output[0]
  do_bidir_layer(&input, num_hidden[0], &hn_output[0], false);

  // call the second layer with hn_output[0] from layer 1, previous layer bidir
  // = true, output goes to hn_output[1]
  do_bidir_layer(&hn_output[0], num_hidden[1], &hn_output[1], true);

  /***********************************************************************
   * Output and Cleanup
   ***********************************************************************/

  void *hn_output_data[2];

  for (int i = 0; i < 2; i++) {
    uint64_t hn_output_data_size = (uint64_t)num_timesteps * num_batches *
                                   num_hidden[i] * 2 * element_size;
    hn_output_data[i] = malloc(hn_output_data_size);

    status = zdnn_transform_origtensor(&hn_output[i], hn_output_data[i]);
    assert(status == ZDNN_OK);
  }

  status = zdnn_free_ztensor_buffer(&input);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hn_output[0]);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hn_output[1]);
  assert(status == ZDNN_OK);

  free(input_data);
  free(hn_output_data[0]);
  free(hn_output_data[1]);
}
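
The host buffer size used in the cleanup loop above follows from the bidirectional output shape (num_timesteps, 2, num_batches, num_hidden): both directions are returned, so the element count carries a factor of 2. A minimal sketch of that arithmetic as a standalone helper follows; `bidir_output_bytes` is a hypothetical name introduced here for illustration and is not part of the zDNN API.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper (not part of the zDNN API): number of bytes needed for
// the host buffer that receives a bidirectional hn_output through
// zdnn_transform_origtensor(), as in the sample above.
// The first operand is widened to 64 bits so the product is not computed in
// 32-bit arithmetic, where large shapes could wrap.
static uint64_t bidir_output_bytes(uint32_t num_timesteps, uint32_t num_batches,
                                   uint32_t num_hidden, uint32_t element_size) {
  assert(element_size > 0);
  return (uint64_t)num_timesteps * num_batches * num_hidden * 2 * element_size;
}
```

With the sample's values (5 timesteps, 3 batches, FP32), this gives 600 bytes for the first layer (num_hidden = 5) and 480 bytes for the second layer (num_hidden = 4).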




Example of an application calling the zdnn_gru API (forward)

Back to Table of Contents

// SPDX-License-Identifier: Apache-2.0
/*
 * Copyright IBM Corp. 2021
 * 
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * 
 *     http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "zdnn.h"

// Sample: GRU
int main(int argc, char *argv[]) {
  zdnn_status status;

#ifdef STATIC_LIB
  zdnn_init();
#endif

  /***********************************************************************
   *
   * GRU (FWD/BWD):
   *
   * INPUTS --------------------------------------------------------------
   * input           |  ZDNN_3DS  | (num_timesteps, num_batches, num_features)
   * h0              |  ZDNN_3DS  | (1, num_batches, num_hidden)
   * weights         |  ZDNN_3DS  | (1, num_features, num_hidden)
   * input_biases    |  ZDNN_2DS  | (1, num_hidden)
   * hidden_weights  |  ZDNN_3DS  | (1, num_hidden, num_hidden)
   * hidden_biases   |  ZDNN_2DS  | (1, num_hidden)
   *
   * OUTPUTS -------------------------------------------------------------
   * hn_output       |  ZDNN_4DS  | (num_timesteps, 1, num_batches, num_hidden)
   *                 |            | or (1, 1, num_batches, num_hidden)
   ***********************************************************************/

  /***********************************************************************
   * Create input zTensor
   ***********************************************************************/

  zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
  zdnn_ztensor input;

  uint32_t num_timesteps = 5;
  uint32_t num_batches = 3;
  uint32_t num_features = 32;
  uint32_t num_hidden = 5;

  zdnn_data_types type = FP32;
  short element_size = 4; // size of each element in bytes

  lstm_gru_direction dir = FWD;
  uint8_t num_dirs = 1;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &input_pre_tfrmd_desc,
                                 num_timesteps, num_batches, num_features);
  status =
      zdnn_generate_transformed_desc(&input_pre_tfrmd_desc, &input_tfrmd_desc);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&input_pre_tfrmd_desc,
                                         &input_tfrmd_desc, &input);
  assert(status == ZDNN_OK);

  uint64_t input_data_size =
      num_timesteps * num_batches * num_features * element_size;
  void *input_data = malloc(input_data_size);

  status = zdnn_transform_ztensor(&input, input_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create initial hidden zTensor
   ***********************************************************************/

  zdnn_tensor_desc h0_pre_tfrmd_desc, h0_tfrmd_desc;
  zdnn_ztensor h0;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &h0_pre_tfrmd_desc, num_dirs,
                                 num_batches, num_hidden);
  status = zdnn_generate_transformed_desc(&h0_pre_tfrmd_desc, &h0_tfrmd_desc);
  assert(status == ZDNN_OK);

  status =
      zdnn_init_ztensor_with_malloc(&h0_pre_tfrmd_desc, &h0_tfrmd_desc, &h0);
  assert(status == ZDNN_OK);

  uint64_t h0_data_size = num_batches * num_hidden * element_size;
  void *hidden_state_data = malloc(h0_data_size);

  status = zdnn_transform_ztensor(&h0, hidden_state_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create input weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
  zdnn_ztensor weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &weights_pre_tfrmd_desc,
                                 num_dirs, num_features, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &weights_pre_tfrmd_desc, RNN_TYPE_GRU | USAGE_WEIGHTS | PREV_LAYER_NONE,
      &weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&weights_pre_tfrmd_desc,
                                         &weights_tfrmd_desc, &weights);
  assert(status == ZDNN_OK);

  uint64_t weights_data_size = num_features * num_hidden * element_size;
  void *weights_data_z = malloc(weights_data_size);
  void *weights_data_r = malloc(weights_data_size);
  void *weights_data_h = malloc(weights_data_size);

  status = zdnn_transform_ztensor(&weights, weights_data_z, weights_data_r,
                                  weights_data_h);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create biases zTensors
   * Resultant zTensors are concatenated
   ***********************************************************************/

  zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
  zdnn_ztensor biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &biases_pre_tfrmd_desc, RNN_TYPE_GRU | USAGE_BIASES | PREV_LAYER_NONE,
      &biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&biases_pre_tfrmd_desc,
                                         &biases_tfrmd_desc, &biases);
  assert(status == ZDNN_OK);

  uint64_t biases_data_size = num_hidden * element_size;
  void *biases_data_z = malloc(biases_data_size);
  void *biases_data_r = malloc(biases_data_size);
  void *biases_data_h = malloc(biases_data_size);

  status = zdnn_transform_ztensor(&biases, biases_data_z, biases_data_r,
                                  biases_data_h);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc hidden_weights_pre_tfrmd_desc, hidden_weights_tfrmd_desc;
  zdnn_ztensor hidden_weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &hidden_weights_pre_tfrmd_desc,
                                 num_dirs, num_hidden, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_weights_pre_tfrmd_desc,
      RNN_TYPE_GRU | USAGE_HIDDEN_WEIGHTS | PREV_LAYER_NONE,
      &hidden_weights_tfrmd_desc);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&hidden_weights_pre_tfrmd_desc,
                                         &hidden_weights_tfrmd_desc,
                                         &hidden_weights);
  assert(status == ZDNN_OK);

  uint64_t hidden_weights_data_size = num_hidden * num_hidden * element_size;
  void *hidden_weights_data_z = malloc(hidden_weights_data_size);
  void *hidden_weights_data_r = malloc(hidden_weights_data_size);
  void *hidden_weights_data_h = malloc(hidden_weights_data_size);

  status = zdnn_transform_ztensor(&hidden_weights, hidden_weights_data_z,
                                  hidden_weights_data_r, hidden_weights_data_h);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden biases zTensors
   * Resultant zTensors are concatenated
   ***********************************************************************/

  zdnn_tensor_desc hidden_biases_pre_tfrmd_desc, hidden_biases_tfrmd_desc;
  zdnn_ztensor hidden_biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &hidden_biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_biases_pre_tfrmd_desc,
      RNN_TYPE_GRU | USAGE_HIDDEN_BIASES | PREV_LAYER_NONE,
      &hidden_biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(
      &hidden_biases_pre_tfrmd_desc, &hidden_biases_tfrmd_desc, &hidden_biases);
  assert(status == ZDNN_OK);

  uint64_t hidden_biases_data_size = num_hidden * element_size;
  void *hidden_biases_data_z = malloc(hidden_biases_data_size);
  void *hidden_biases_data_r = malloc(hidden_biases_data_size);
  void *hidden_biases_data_h = malloc(hidden_biases_data_size);

  status = zdnn_transform_ztensor(&hidden_biases, hidden_biases_data_z,
                                  hidden_biases_data_r, hidden_biases_data_h);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create output zTensor
   ***********************************************************************/

  // get only the last timestep
  zdnn_tensor_desc hn_pre_tfrmd_desc, hn_tfrmd_desc;

  zdnn_ztensor hn_output_ztensor;

  zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &hn_pre_tfrmd_desc, 1, 1,
                                 num_batches, num_hidden);
  status = zdnn_generate_transformed_desc(&hn_pre_tfrmd_desc, &hn_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&hn_pre_tfrmd_desc, &hn_tfrmd_desc,
                                         &hn_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Call the AIU
   ***********************************************************************/

  void *work_area = NULL;

  status = zdnn_gru(&input, &h0, &weights, &biases, &hidden_weights,
                    &hidden_biases, dir, work_area, &hn_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Output and Cleanup
   ***********************************************************************/

  uint64_t hn_data_size = num_batches * num_hidden * element_size;
  void *hn_output_data = malloc(hn_data_size);

  status = zdnn_transform_origtensor(&hn_output_ztensor, hn_output_data);
  assert(status == ZDNN_OK);

  status = zdnn_free_ztensor_buffer(&input);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&h0);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hn_output_ztensor);
  assert(status == ZDNN_OK);

  free(input_data);
  free(hidden_state_data);
  free(weights_data_z);
  free(weights_data_r);
  free(weights_data_h);
  free(hidden_weights_data_z);
  free(hidden_weights_data_r);
  free(hidden_weights_data_h);
  free(biases_data_z);
  free(biases_data_r);
  free(biases_data_h);
  free(hidden_biases_data_z);
  free(hidden_biases_data_r);
  free(hidden_biases_data_h);
  free(hn_output_data);
}
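
The samples pass freshly malloc'd buffers straight to zdnn_transform_ztensor(), so the buffer contents are indeterminate; an application would normally fill them with real data first. Below is a minimal sketch of a fill step that could run between malloc() and the transform call, assuming FP32 data as in the sample. `fill_fp32` is a hypothetical helper, not a zDNN function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not part of the zDNN API): set every element of an
// FP32 host buffer to `value` before it is handed to the library.
// `data` must point to at least num_elements floats.
static void fill_fp32(float *data, uint64_t num_elements, float value) {
  assert(data != NULL);
  for (uint64_t i = 0; i < num_elements; i++) {
    data[i] = value;
  }
}
```

In the GRU sample this could be called as `fill_fp32(input_data, (uint64_t)num_timesteps * num_batches * num_features, 0.0f)` right after the malloc(), and likewise for the per-gate weight and bias buffers.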



