A Machine Learning Engineer who needs to understand the distributions of features to build better models, or a Platform Engineer who monitors a platform for metrics like requests per minute, both need to draw and interpret graphs.

Knowing which graph works in which situation makes it easier to tell stories through graphs. With so many chart types out there today, selecting one can become an overwhelming task.

The goal of this article is to understand **how, based on a specific type of data, we can choose a specific type of graph, and what information we can infer from it…**

Matrix multiplication is one of the most fundamental operations that most machine learning algorithms rely on. Knowing how matrix multiplication works in a distributed system provides important insight into the cost of our algorithms. Google’s PageRank algorithm is also based on repeated multiplication of a matrix and a vector (big, sparse matrices) until convergence. **In this article we will understand how MapReduce is used to multiply matrices that are so big that they don’t even fit on a single machine**. The ideas used in this article also extend to the algorithms that…
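The map/shuffle/reduce pattern for matrix-vector multiplication can be sketched in plain Python. This is a minimal in-memory illustration, not the article's actual implementation: the matrix `M`, stored as hypothetical sparse `(row, col, value)` triples, and the vector `v` are toy inputs; on a real cluster each mapper would see only a shard of the triples and the grouping step would be the framework's shuffle.

```python
from collections import defaultdict

# Toy sparse matrix M as (row, col, value) triples and a dense vector v.
# (Hypothetical example data, not from the article.)
M = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
v = {0: 1.0, 1: 1.0}

# Map: for each matrix entry m_ij, emit the partial product (i, m_ij * v_j).
mapped = [(i, val * v[j]) for (i, j, val) in M]

# Shuffle: group the partial products by row index i
# (on a cluster, this is the framework moving data between machines).
groups = defaultdict(list)
for i, partial in mapped:
    groups[i].append(partial)

# Reduce: sum each row's partial products to obtain (M v)_i.
result = {i: sum(parts) for i, parts in groups.items()}
print(result)  # {0: 3.0, 1: 7.0}
```

For PageRank-style iteration, this map-shuffle-reduce cycle would simply be repeated, feeding each `result` back in as the next `v` until the vector stops changing.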

In How Map Reduce Let You Deal With PetaByte Scale With Ease, I introduced how MapReduce works and why it scales so easily. If you are not familiar with how MapReduce works, I recommend going through that article first.

MapReduce lets us process data at such a high scale, but the interesting thing is that we can also implement relational algebra operations using MapReduce, which makes it possible for these systems to give an abstraction to end users who are already familiar with SQL, so they can just…
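To make the idea concrete, here is a minimal sketch (my own illustration, with hypothetical example data) of one relational operation, grouped aggregation, expressed as map and reduce steps, roughly what a query like `SELECT dept, SUM(salary) GROUP BY dept` compiles down to:

```python
from collections import defaultdict

# Hypothetical rows of a table with (dept, salary) columns.
rows = [("eng", 100), ("sales", 80), ("eng", 120)]

# Map: emit (grouping key, value) pairs -- here (dept, salary).
pairs = [(dept, salary) for dept, salary in rows]

# Shuffle: the framework groups all values sharing a key onto one reducer.
groups = defaultdict(list)
for key, val in pairs:
    groups[key].append(val)

# Reduce: apply the aggregate (SUM) within each group.
totals = {key: sum(vals) for key, vals in groups.items()}
print(totals)  # {'eng': 220, 'sales': 80}
```

Selections and projections fit the same mold even more simply (a map step with no reduce), which is what lets SQL-on-MapReduce systems translate whole queries into chains of such jobs.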

MapReduce is the core idea behind the systems used in today’s world to analyse and manipulate petabyte-scale datasets (Spark, Hadoop). Knowing the core concept gives a better understanding of why these systems perform certain operations on data at certain stages, which in turn helps you design efficient processes when manipulating data with them. Why the data was repartitioned by user_id somewhere before a join. Why people are told to avoid reduce tasks as much as possible (yes, they’re expensive, but how much and why). …
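The repartition-before-join question above can be illustrated with a small reduce-side join sketch (my own hypothetical example, not code from the article): keying both tables by `user_id` forces all records for one user onto the same reducer, and that shuffle is exactly the expensive step.

```python
from collections import defaultdict

# Two hypothetical tables sharing a user_id column.
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (2, "mug")]

# Map: tag each record with its source table, keyed by user_id.
mapped = [(uid, ("users", name)) for uid, name in users]
mapped += [(uid, ("orders", item)) for uid, item in orders]

# Shuffle (the repartition by user_id): every record with the same
# user_id lands in the same group -- this data movement is the cost.
groups = defaultdict(list)
for uid, tagged in mapped:
    groups[uid].append(tagged)

# Reduce: within each user_id group, pair user rows with order rows.
joined = []
for uid, records in groups.items():
    names = [v for tag, v in records if tag == "users"]
    items = [v for tag, v in records if tag == "orders"]
    for name in names:
        for item in items:
            joined.append((uid, name, item))

print(sorted(joined))
# [(1, 'alice', 'book'), (1, 'alice', 'pen'), (2, 'bob', 'mug')]
```

Seen this way, "avoid reduce tasks" really means "avoid unnecessary shuffles": the map steps are embarrassingly parallel, while the grouping step moves data across the network.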

Machine Learning Engineer, Mad Street Den, https://kartikeyash.github.io/about/