
A tour of the Indian Institute of Science, Bangalore


Last week I had the opportunity to visit the Indian Institute of Science in Bangalore for five days, and after the trip, I must say I absolutely regret flunking KVPY! For those of you who don't know, KVPY is an add-on test that students take during their amazing days of JEE preparation. The point is, I really loved the campus and the people I networked with. I was there to attend a workshop on High Performance Computing and related fields.


A big thank you to Swapnil Parekh, who recommended this course and invited me to it. At first, I was really hesitant to participate, since nothing in the course would be of direct use to me as a student, or as a developer with my current tech stack experience. Still, you never know what good might come of it; after all, it was something new to learn, and it could be a great experience. Yeah, I know the reasoning isn't very convincing, but I went for it anyway.




My review of Bangalore wouldn't really be negative. It's a fairly polite city, though it could use some more Hindi-speaking people. I spent far less time exploring Bangalore than I did exploring the IISc campus. I must say, it's a fairly large campus, a little bigger than my home campus, NIT Hamirpur. I had a lot of fun cycling through its many connected lanes!


Let's talk business

Obviously, most of my time was spent on the purpose of the visit: learning about High Performance Computing! I am planning to write a separate blog post on each topic in detail, so here's an overview.

Parallel Architectures

The course began with a great lecture on parallel architecture and the need for it. Since the workshop was open to people from all industries, the introduction was kept fairly general. I could follow most of it thanks to my Computer Organization course in the previous semester. We were taught mainly about Instruction Level Parallelism, Superscalar Parallelism, Memory Management, etc.
I think these topics would mainly interest my peers from IIIT and NIT, since they are still fresh in our minds. I saw a practical use case of Flynn's Classification in multiprocessor architecture. Multiprocessors do seem really fun and exciting until you realize that all the parallel processing goes for a toss when you have a lot of inter-related computations on the data. In the hands-on session, there were cases where I thought it would be better to spare ourselves the trouble and simply do it on one processor!
Anyway, AI/ML engineers would know the advantage of having so many processors and GPUs. With a lot of data comes a lot of GPU demand. It's not a big word: GPUs are processing units used to offload computations from the main CPU when there is a lot of data to process.
If you have m different processors and a loop that runs n times, you can parallelize it so that each processor ideally handles only n/m iterations of the work. The more processors you have, the less time the computation takes.

Now the question is: we can't get away with simply saying "parallelize the code". How do you actually do that?

Parallelization Principles

Once we had studied the architecture, it was time to learn how to structure our algorithms and code to maximise processor utilisation. The following days also included hands-on sessions where we could run our code on the clusters of the SERC department at IISc. When a computation always uses a predictable section of the data, it is indeed satisfying to watch all n computations execute separately in one go. But consider a more complex situation such as sorting, and things start to take an interesting turn.

OpenMP

This was the lecture where I learnt the most about processors and cores. What if we could run multiple threads on just one processor, using the multiple cores of our PCs? Most of the applications we use do take advantage of quad-core or octa-core processors, but our naively written programs usually run on just one core, on a single thread, unless we ask the compiler to graciously parallelize them using a library called OpenMP. A processor has a shared memory (unless otherwise specified) among its many cores, and multiple threads updating the same piece of data gives rise to a race condition. So we need to define a critical section. Yes, a topic borrowed from my Operating Systems course. This workshop was like a flashback test of my previous semesters on many levels.
I think I had the best time during the hands-on session using this library. 

MPI

Now comes the real deal. A processor, or a cluster's node, has a limited number of cores. But supercomputers (and clusters in general) have multiple processors. In these architectures, groups of processors or cores form a node, and each node has its own associated memory. If we want to use more processors, they will have separate memories. So if data is updated by one processor, another processor that needs the updated value cannot access it directly. We must send that data from one processor to the other over the network that connects them.
That is why we need the Message Passing Interface (MPI): to specify the data we want to send and the data we want to receive. It has a lot of interesting applications, and if you write code meant to run across multiple processors, this is the tool for you.

OpenACC

We can also use GPUs to accelerate our computations, and OpenACC is a directive-based standard for exactly that. With multiple cores using the same memory, it feels a lot like OpenMP, except that OpenACC directs the compiler to offload the computations to the attached GPU. I think this topic would be very important for AI/ML enthusiasts, whether training models or writing the internal code that trains them.

Introduction to Big Data

A brief introduction to Big Data and ML topics, and how all of the above can be used to get better results, spurred my curiosity towards this amazing field. Anyway, if you're reading this and have studied Machine Learning, there's not really anything here that you won't already know. And if you're new to the field, I still can't tell you much about it :)


Well, that concluded the workshop, and we finally left the beautiful city of Bangalore with a ton of knowledge and experience.

I would like to thank the Supercomputing Education and Research Centre (SERC) for conducting this workshop, and the professors and industry personnel who contributed their time to it. A special thanks to Professor Aditya, Professor Akhila, Professor Yogesh Simmhan and Professor Govindarajan for this amazing workshop.
We had a great time!

 



