A flexible, high-performance serving system for machine learning models


XGBoost Serving

This is a fork of TensorFlow Serving, extended with support for the XGBoost, alphaFM and alphaFM_softmax frameworks. For more information about TensorFlow Serving, switch to the master branch or visit the TensorFlow Serving website.


XGBoost Serving is a flexible, high-performance serving system for XGBoost && FM models, designed for production environments. It deals with the inference aspect of XGBoost && FM models, taking models after training, managing their lifetimes, and providing clients with versioned access via a high-performance, reference-counted lookup table. XGBoost Serving derives from TensorFlow Serving and is widely used inside iQIYI.

To note a few features:

  • Can serve multiple models, or multiple versions of the same model simultaneously
  • Exposes gRPC inference endpoints (a client sketch follows this list)
  • Allows deployment of new model versions without changing any client code
  • Supports canarying new versions and A/B testing experimental models
  • Adds minimal latency to inference time due to efficient, low-overhead implementation
  • Supports XGBoost servables, XGBoost && FM servables and XGBoost && alphaFM_softmax servables
  • Supports computation latency distribution statistics
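
Because XGBoost Serving derives from TensorFlow Serving, clients can presumably reach it through TensorFlow Serving's PredictionService gRPC API. The sketch below is a minimal client under that assumption; the model name, port and input tensor name are illustrative placeholders, and the exact request layout for XGBoost && FM models is covered in the documentation linked below.

```python
# A minimal client sketch, assuming the fork keeps TensorFlow Serving's
# PredictionService gRPC surface (tensorflow-serving-api package).
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Model name, port, and input name are hypothetical; substitute your deployment's values.
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_xgb_fm"
request.inputs["x"].CopyFrom(
    tf.make_tensor_proto([[0.1, 0.2, 0.3]], dtype=tf.float32)
)

# Blocking call; raises grpc.RpcError on failure.
response = stub.Predict(request, timeout=5.0)
print(response.outputs)
```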

Documentation

Set up

The easiest and most straightforward way of building and using XGBoost Serving is with Docker images. We highly recommend this route unless you have specific needs that are not addressed by running in a container.

Use

Export your XGBoost && FM model

To serve an XGBoost && FM model, simply export your XGBoost model, leaf mapping and FM model.

Please refer to Export XGBoost && FM model for details about the model's specification and how to export an XGBoost && FM model.
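
As a rough illustration of the pieces involved, the sketch below trains a toy XGBoost model, saves it, and computes the per-tree leaf indices from which the leaf mapping and the downstream FM model are built. The file name and layout are placeholders; the exact formats XGBoost Serving expects are specified in the document linked above.

```python
# A minimal sketch of the artifacts mentioned above: the XGBoost model and the
# leaf indices that feed the leaf mapping and the FM stage. File names and
# layout are placeholders, not the format XGBoost Serving expects.
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 10)
y = np.random.randint(0, 2, size=100)
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.train({"objective": "binary:logistic", "max_depth": 3},
                    dtrain, num_boost_round=20)
booster.save_model("xgb.model")  # the exported XGBoost model artifact

# Leaf index per (sample, tree): the raw material for the leaf mapping that
# turns each tree's leaf into a one-hot feature for the FM model.
leaves = booster.predict(dtrain, pred_leaf=True)  # shape: (100, 20)
print(leaves[:2])
```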

Configure and Use XGBoost Serving
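
Since the server derives from TensorFlow Serving, it presumably accepts a model config file in TensorFlow Serving's protobuf text format. The snippet below is a sketch under that assumption: the model name, base path and model_platform value are illustrative, and the authoritative options are in the Configure and Use XGBoost Serving document.

```
model_config_list {
  config {
    name: "my_xgb_fm"               # hypothetical model name
    base_path: "/models/my_xgb_fm"  # versioned subdirectories live here
    model_platform: "xgboost"       # assumed value for this fork
    model_version_policy {
      specific { versions: 1 versions: 2 }  # serve two versions, e.g. for canarying
    }
  }
}
```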

Extend

XGBoost Serving derives from TensorFlow Serving and benefits from TensorFlow Serving's highly modular architecture: you can use some parts individually and/or extend it to serve new use cases.

Contribute

If you'd like to contribute to XGBoost Serving, be sure to review the contribution guidelines.

Feedback and Getting involved

  • Report bugs, ask questions or give suggestions via GitHub Issues