DREAMPlaceFPGA

An Open-Source Analytical Placer for Large Scale Heterogeneous FPGAs using Deep-Learning Toolkit.

This work leverages DREAMPlace, an open-source ASIC placement framework, to build an open-source placement framework for FPGAs based on the elfPlace algorithm. On the ISPD'2016 benchmark suite, DREAMPlaceFPGA is 5.4× faster for global placement and 1.8× faster for overall placement than elfPlace (CPU), with comparable quality of results. In addition, DREAMPlaceFPGA outperforms elfPlace (GPU) by 19% for global placement. For more details, please refer to the paper.

Placement consists of three stages: global placement (GP), legalization (LG), and detailed placement (DP). DREAMPlaceFPGA accelerates only the global placement stage; the elfPlace (CPU) binary is used to run legalization and detailed placement. Currently, DREAMPlaceFPGA supports only the ISPD'2016 benchmarks, which employ the Xilinx UltraScale architecture. DREAMPlaceFPGA runs on both CPU and GPU; on a machine without a GPU, multi-threaded CPU execution is available.

  • DREAMPlaceFPGA Placement Flow

Developers

  • Rachel Selina Rajarathnam, UTDA, ECE Department, The University of Texas at Austin
  • Zixuan Jiang, UTDA, ECE Department, The University of Texas at Austin

External Dependencies

  • Python 2.7 or Python 3.5/3.6/3.7

  • CMake version 3.8.2 or later

  • Pytorch 1.0.0

    • Other versions around 1.0.0 may also work, but are untested
  • GCC

    • GCC 5.1 or later is recommended.
    • Other compilers may also work, but are untested.
  • cmdline

    • a command line parser for C++
  • Flex

    • lexical analyzer employed in the bookshelf parser
  • Bison

    • parser generator employed in the bookshelf parser
  • Boost

    • Must be installed and visible to the linker
  • Limbo

    • Integrated as a submodule: the bookshelf parser is modified for FPGAs.
  • Flute

    • Integrated as a submodule
  • CUB

    • Integrated as a git submodule
  • munkres-cpp

    • Integrated as a git submodule
  • CUDA 9.1 or later (Optional)

    • If installed and found, GPU acceleration will be enabled.
    • Otherwise, only CPU implementation is enabled.
  • GPU compute capability 6.0 or later (Optional)

    • Code has been tested on GPUs with compute capability 6.0, 7.0, and 7.5.
    • Please check the compute capability of your GPU devices.
    • The default compilation target is compute capability 6.0; this is the minimum requirement, and lower capabilities are not supported for the GPU feature.
    • For compute capability 7.0, set CMAKE_CUDA_FLAGS to -gencode=arch=compute_70,code=sm_70.
  • Cairo (Optional)

    • If installed and found, the plotting functions are faster, using a C/C++ implementation.
    • Otherwise, a Python implementation is used.
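To decide which CMAKE_CUDA_FLAGS value to use, the compute capability of an installed GPU can be queried directly; this sketch assumes a reasonably recent NVIDIA driver whose nvidia-smi supports the compute_cap query field:

```shell
# Print the compute capability (e.g. 7.0) of each installed GPU.
# The compute_cap query field requires a recent NVIDIA driver version.
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```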

Cloning the repository

To pull the git submodules, run in the root directory:

git submodule init
git submodule update

Alternatively, pull all the submodules while cloning the repository:

git clone --recursive https://github.com/rachelselinar/DREAMPlaceFPGA.git

Build Instructions

To install Python dependencies

At the root directory:

pip install -r requirements.txt 

To Build

At the root directory,

mkdir build 
cd build 
cmake .. -DCMAKE_INSTALL_PREFIX=your_install_path
make 
make install

Third-party submodules are built automatically, except for Boost.

To clean, remove the build directory at the root directory:

rm -r build

CMake Options

Here are the available options for CMake.

  • CMAKE_INSTALL_PREFIX: installation directory
    • Example cmake -DCMAKE_INSTALL_PREFIX=path/to/your/directory
  • CMAKE_CUDA_FLAGS: custom string for NVCC (default -gencode=arch=compute_60,code=sm_60)
    • Example cmake -DCMAKE_CUDA_FLAGS=-gencode=arch=compute_60,code=sm_60
  • CMAKE_CXX_ABI: 0|1 for the value of _GLIBCXX_USE_CXX11_ABI for C++ compiler, default is 0.
    • Example cmake -DCMAKE_CXX_ABI=0
    • It must be consistent with the _GLIBCXX_USE_CXX11_ABI used when compiling all the C++ dependencies, such as Boost and PyTorch.
    • By default, PyTorch is compiled with _GLIBCXX_USE_CXX11_ABI=0, but a customized PyTorch build might use _GLIBCXX_USE_CXX11_ABI=1.
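The ABI flag of a given PyTorch build can be queried from Python itself; this one-liner is a sketch and assumes torch is importable in the build environment:

```shell
# Prints 1 if PyTorch was built with the C++11 ABI, 0 otherwise;
# pass the printed value to cmake via -DCMAKE_CXX_ABI.
python -c "import torch; print(int(torch._C._GLIBCXX_USE_CXX11_ABI))"
```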

Sample Benchmarks

DREAMPlaceFPGA only supports designs in bookshelf format with fixed IOs, targeting the Xilinx UltraScale architecture. Refer to the ISPD'2016 contest for more information.

Four sample designs are included in the benchmarks directory.

Running DREAMPlaceFPGA

Before running, ensure that all dependent Python packages are installed. Go to the install directory and run the placer with a JSON configuration file:

python dreamplacefpga/Placer.py test/FPGA-example1.json

Unit tests for some of the PyTorch operators are provided. To run one:

python unitest/ops/hpwl_unitest.py

JSON Configurations

The most frequently used options in the JSON file are listed below. For the complete list of available options, please refer to paramsFPGA.json.

| JSON Parameter | Default | Description |
|---|---|---|
| aux_input | required | input .aux file (bookshelf format) |
| gpu | 1 | enable GPU acceleration (1) or run on CPU (0) |
| num_threads | 8 | number of CPU threads |
| num_bins_x | 512 | number of bins in the horizontal direction |
| num_bins_y | 512 | number of bins in the vertical direction |
| global_place_stages | required | global placement configuration of each stage, a dictionary of {"num_bins_x", "num_bins_y", "iteration", "learning_rate"}; learning_rate is relative to bin size |
| density_weight | 1.0 | initial weight of the density cost |
| gamma | 5.0 | initial coefficient for the log-sum-exp and weighted-average wirelength |
| random_seed | 1000 | random seed |
| scale_factor | 0.0 | scale factor to avoid numerical overflow; 0.0 means not set |
| result_dir | results | result directory for output |
| global_place_flag | 1 | whether to run global placement |
| legalize_and_detailed_place_flag | 1 | whether to run legalization and detailed placement using elfPlace |
| dtype | float32 | data type, float32 or float64 |
| plot_flag | 0 | whether to plot the solution (increases runtime) |
| deterministic_flag | 0 | ensures reproducible run-to-run results on GPU (may increase runtime) |
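For illustration, a minimal configuration might look like the following. The field names follow the table above, but the specific values (benchmark path, iteration count, learning rate) are hypothetical; check paramsFPGA.json and the provided test/*.json files for the exact expected shapes and defaults:

```json
{
  "aux_input": "benchmarks/FPGA-example1/design.aux",
  "gpu": 1,
  "num_threads": 8,
  "global_place_stages": [
    {"num_bins_x": 512, "num_bins_y": 512,
     "iteration": 1000, "learning_rate": 0.01}
  ],
  "result_dir": "results",
  "plot_flag": 0
}
```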

Bug Report

Please report bugs to rachelselina dot r at utexas dot edu.

Publication(s)

  • Rachel Selina Rajarathnam, Mohamed Baker Alawieh, Zixuan Jiang, Mahesh A. Iyer, and David Z. Pan, "DREAMPlaceFPGA: An Open-Source Analytical Placer for Large Scale Heterogeneous FPGAs using Deep-Learning Toolkit", IEEE/ACM Asian and South Pacific Design Automation Conference (ASP-DAC), Jan 17-20, 2022 (accepted)

Copyright

This software is released under the BSD 3-Clause "New" or "Revised" License. Please refer to LICENSE for details.
