As machine learning and deep learning take over more and more workloads, you need faster machines to crunch the huge volumes of data involved. While most software engineers can get away with a laptop, a dedicated workstation is a must if you want to build in-house AI capabilities.
Having one built for you by a service provider can cost considerably more than assembling it yourself, which is why we decided to take a deep dive into how to build an ML/DL workstation in 2019.
What the beast should hold
We are calling our workstation “the beast” because of its immense computation capabilities. Here is the configuration.
GPU- 4 X NVIDIA Tesla V100 Volta GPU Accelerator 32GB Graphics Card
RAM- 4 X Supermicro – 128 GB Registered DDR4-2666 Memory
Processor- Intel Xeon E5-2698 v4, 2.2 GHz with turbo boost up to 3.60 GHz (20 cores and 50 MB Smart Cache)
GPU Cooling unit- ARCTIC Accelero Xtreme+ II VGA Cooler
Power Supply- CORSAIR AX1600i, 1600 Watt, 80+ Titanium Certified, Fully Modular – Digital Power Supply
Motherboard- Supermicro – X10SRA ATX LGA2011-3 Motherboard
CPU cooler- ASUS ROG Ryujin 360 RGB AIO Liquid CPU Cooler 360mm Radiator (Three 120mm 4-pin Noctua iPPC PWM Fans)
Cabinet- Thermaltake Level 20 ATX Full Tower Case
Storage- Intel SSD DC P4510 SERIES (4.0TB, 2.5in PCIe 3.1 x4, 3D2, TLC)
Decisions while choosing the hardware
Several things were taken into account while choosing the hardware configuration of this system. We shall discuss them, one by one.
GPU
Let’s start with the most important component of the system and why we chose it. At the time of writing, the NVIDIA Tesla V100 is the most advanced data-centre GPU NVIDIA has built. Its 32GB of on-board memory lets data scientists and ML engineers fit larger models and batches, so each training iteration takes less time and more effort can go into improving the model and running it again. If you are crazy about the spec sheet, this card comes with 640 Tensor Cores that deliver up to 125 teraflops of deep learning performance. It is also worth noting that our recommended configuration of four V100s matches the GPU count of NVIDIA’s own workstation, the DGX Station, where the cards are linked with NVLink (SLI is a graphics technology and does not apply to Tesla cards).
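For the spec-sheet crowd, here is a rough sketch of where a figure like 125 teraflops can come from. The 640 Tensor Cores are from the spec quoted above; the 64 fused multiply-adds per core per clock and the 1.53 GHz boost clock are our assumptions based on NVIDIA's published Volta figures, not something measured on this build.

```python
# Back-of-the-envelope peak Tensor Core throughput for one V100.
TENSOR_CORES = 640            # from the spec sheet quoted above
FMA_PER_CORE_PER_CLOCK = 64   # assumption: one 4x4x4 matrix FMA per clock
FLOPS_PER_FMA = 2             # one multiply plus one add
BOOST_CLOCK_HZ = 1.53e9       # assumption: boost clock of the fastest variant


def peak_tensor_tflops(cores=TENSOR_CORES, clock_hz=BOOST_CLOCK_HZ):
    """Return the theoretical peak in teraflops (10**12 operations/second)."""
    flops = cores * FMA_PER_CORE_PER_CLOCK * FLOPS_PER_FMA * clock_hz
    return flops / 1e12


if __name__ == "__main__":
    print(f"Theoretical peak: {peak_tensor_tflops():.1f} TFLOPS")
```

This is a theoretical ceiling. A card running at a lower clock scales down proportionally, and real training jobs will land below it.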
CPU
We chose a single-CPU design because our computations will mainly run on the GPUs, and a 20-core, 40-thread Intel Xeon processor is enough for any work that remains CPU intensive. A second CPU does not make an individual task faster; it only helps workloads that can use even more cores at the same time. If you do need a dual-CPU setup, we recommend building two workstations instead. Workloads don’t always scale the way one might expect across two CPUs, and a single CPU with more cores is usually the better choice.
RAM
Since many ML/DL tasks work on images or videos, it is important to have enough memory to load such huge datasets. That is why we went with the highest configuration, 4 X 128GB. Depending on your needs and the type of datasets you handle, you could go for a 128GB or 256GB configuration instead. You could also leave a few memory slots empty, since upgrading RAM later is simple and cost-effective.
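To get a feel for how much RAM a dataset needs, a back-of-the-envelope estimate helps. The sketch below assumes the whole dataset is held in memory as uncompressed arrays. The image count and resolution are hypothetical numbers picked for illustration, and `dataset_size_gb` is our own helper name.

```python
# Rough in-memory size of an image dataset stored as uncompressed arrays.
BYTES_PER_GB = 1024 ** 3


def dataset_size_gb(num_images, height, width, channels=3, bytes_per_value=1):
    """Estimate RAM needed, in GB, to hold every image in memory at once."""
    total_bytes = num_images * height * width * channels * bytes_per_value
    return total_bytes / BYTES_PER_GB


if __name__ == "__main__":
    # Hypothetical dataset: one million 224x224 RGB images.
    as_uint8 = dataset_size_gb(1_000_000, 224, 224)
    as_float32 = dataset_size_gb(1_000_000, 224, 224, bytes_per_value=4)
    print(f"8-bit pixels:   {as_uint8:.0f} GB")
    print(f"32-bit floats:  {as_float32:.0f} GB")
```

Under these assumptions the 8-bit version needs roughly 140GB, and converting everything to 32-bit floats multiplies that by four, which is why large datasets are often loaded in batches even on a machine with plenty of RAM.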
Power Supply
While I recommend the Corsair AX1600i, you could go with any power supply unit rated for at least 1500W, since this beast of a workstation is power hungry and needs about 1500W at its peak.
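If you change the parts list, it is worth redoing the power sum yourself. The sketch below is a rough budget, not a measurement: the GPU and CPU wattages are the published TDP figures as we understand them, while the allowance for everything else and the 20% safety margin are our own assumptions.

```python
# Rough peak-power budget for the build described above.
COMPONENT_WATTS = {
    "gpu": 250,              # assumption: published TDP per V100 PCIe card
    "cpu": 135,              # assumption: published TDP of the Xeon E5-2698 v4
    "everything_else": 150,  # hypothetical allowance: RAM, SSD, pump, fans, board
}
GPU_COUNT = 4
HEADROOM = 0.20              # hypothetical safety margin on top of the sum


def peak_draw_watts(gpu_count=GPU_COUNT):
    """Sum the component estimates for a build with gpu_count GPUs."""
    return (gpu_count * COMPONENT_WATTS["gpu"]
            + COMPONENT_WATTS["cpu"]
            + COMPONENT_WATTS["everything_else"])


def recommended_psu_watts(gpu_count=GPU_COUNT, headroom=HEADROOM):
    """Add the safety margin to the estimated peak draw."""
    return peak_draw_watts(gpu_count) * (1 + headroom)


if __name__ == "__main__":
    print(f"Estimated peak draw: {peak_draw_watts()} W")
    print(f"Suggested PSU rating: {round(recommended_psu_watts())} W")
```

With these numbers the margin-adjusted figure lands a little above 1500W, which is in line with the recommendation above and within what a 1600W unit can supply.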
Motherboard
The motherboard was chosen for its support for-
a) The Intel Xeon processor.
b) A large amount of DDR4 RAM.
c) Multiple Tesla V100 GPUs.
Storage
Gone are the days of hard disks; SSDs are the new standard for storage. Hence we decided to go with a best-in-line Intel SSD with 4TB of storage. Our unit supports easy expansion, so you can add more drives as you need them.
Cooling units and cabinet
Although cooling may seem unimportant, running a 1500W machine brings its own problems, and you must install separate cooling units for the GPUs and the CPU so that they always stay at their optimum temperature. If you see temperature spikes, you can fit even better cooling units. The case was chosen because it is large enough to hold all of these components; you can go with a fancier case as long as it has room for the components and the four-GPU set.
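One way to spot temperature spikes is to poll the GPUs from a script. The sketch below is a minimal example that assumes the NVIDIA driver is installed and `nvidia-smi` is on the PATH; the 80 degree threshold and the function names are our own choices for illustration, not values from any vendor guideline.

```python
import shutil
import subprocess

TEMP_LIMIT_C = 80  # hypothetical alert threshold, adjust to your cards


def parse_temps(csv_text):
    """Turn lines like '0, 45' into a {gpu_index: temperature_c} dict."""
    temps = {}
    for line in csv_text.strip().splitlines():
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def read_temps():
    """Query nvidia-smi; return an empty dict if the tool is not installed."""
    if shutil.which("nvidia-smi") is None:
        return {}
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temps(result.stdout)


def too_hot(temps, limit=TEMP_LIMIT_C):
    """Return the sorted indices of GPUs at or above the limit."""
    return sorted(i for i, t in temps.items() if t >= limit)
```

Calling `too_hot(read_temps())` from a scheduled job would give you a list of GPU indices to look at; an empty list means either everything is below the threshold or no NVIDIA tooling was found.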
Advantages and disadvantages
There are two sides to every coin, and building your own workstation for AI projects has its own ups and downs too.
It costs comparatively less to buy the parts separately and assemble them yourself. A workstation built for you by a service provider can cost anywhere between 2 and 3 times what it would cost to put one together yourself.
With a vendor-built machine, you have to accept some software and hardware restrictions, whereas when you put one together yourself, you are completely free to build it as you see fit.
A workstation will almost always need upgrades at some point. If you get one built for you, you’ll pay a big price every time you need a change or modification.
If something suddenly goes wrong, you have to find out which part is faulty and get it repaired or replaced, depending on the warranty details. It is recommended that you always keep a backup copy of your data off-site in case of a part malfunction or accident.
Having an in-house, high-cost AI workstation means regular maintenance, and that is something you will have to undertake on your own.
All software and hardware updates will have to be done by your team, or you’ll need to hire a professional when the need arises.
Unless you have someone with past experience, it is a good idea to hire someone for the assembly, since putting everything together needs extra cables, thermal paste, and a few hacks to make sure that everything works well and heat is dissipated properly.
If you are going to train ML or DL models, we strongly recommend installing Ubuntu rather than Windows. Depending on the sort of projects you work on, you will also need to install Python, R, and libraries such as TensorFlow and scikit-learn to help with day-to-day work.
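Once the operating system and libraries are in place, a quick sanity check can confirm what is actually installed. The sketch below is one possible way to do it using only the Python standard library; the module list is our assumption of a typical setup (note that scikit-learn is imported under the name `sklearn`), and `missing_modules` is a hypothetical helper name.

```python
import importlib.util
import platform

# Assumed module names for a typical setup; edit to match your projects.
REQUIRED_MODULES = ("tensorflow", "sklearn")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("OS:", platform.system(), platform.release())
    print("Python:", platform.python_version())
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

The check only confirms that the packages can be located, not that they can see the GPUs, so treat it as a first step rather than a full validation.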
Cloud Services for ML/DL
While you are working on ML/DL models, you will need a lot of data to train models or to decide which algorithm to use. JobsPikr, DataStock, and Google Dataset Search are some cloud-based services that might come in handy. If you want to train your models on web data, you could even go for DaaS providers like PromptCloud.
As a final statement, I would say the cost savings far outweigh the disadvantages, and unless you are a big company needing multiple AI workstations with maintenance agreements, you should build your own AI workstation. Building and maintaining your own workstation will not only save you a large amount of money that you can use elsewhere in your business, but also bring you closer to the hardware you use, so that you gain a holistic understanding of how ML and DL algorithms use GPUs to run faster.