What's the use of learning mathematics? What level of mathematics do you need? Is it important for machine learning? After all, we can easily use the widely available libraries in Python or R to build models!

I've heard this so many times from aspiring data scientists, and mathematics is often overlooked. This creates a false expectation: instead of understanding the root cause of a model's shortcomings, they reach for state-of-the-art models or try different changes until something works. While this can sometimes pay off, it's always better to know which changes will have the biggest impact on the model. If you have ever built a model for a real-life problem, you've likely experienced that familiarity with the details goes a long way once you want to move beyond baseline performance.

However, much of this knowledge sits behind layers of advanced mathematics. Understanding methods like stochastic gradient descent or back-propagation can seem daunting, since they are built on top of multivariable calculus and probability theory.

With the right foundation, however, many of these ideas feel natural. If you are just starting out and don't have a STEM background, putting together a curriculum can be difficult. My goal in this post is to present a road map that gives you a head start while keeping things simple; the aim is not to cover everything.

## Fundamentals

Most of machine learning is built upon four pillars, which also underpin the solutions to most real-world business problems. Many machine learning algorithms are written using these pillars. They are:

• Statistics
• Probability
• Calculus
• Linear Algebra

### Statistics

Statistics is a field of mathematics that is universally agreed to be a prerequisite for a deeper understanding of machine learning.

Although statistics is a large field with many esoteric theories and findings, the nitty-gritty and notation taken from the field are required for machine learning practitioners.

Statistics used in machine learning is broadly divided into two categories, based on the type of analysis performed on the data: Descriptive Statistics and Inferential Statistics.

a) Descriptive Statistics

• Concerned with describing and summarizing the characteristics of the dataset.
• It works on a small dataset.
• Descriptive statistics consists of two basic categories of measures: measures of central tendency and measures of variability (or spread).
• Measures of central tendency (mean, median, mode) describe the center of a data set.
• Measures of variability or spread (range, standard deviation, variance) describe the dispersion of data within the set.
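As a minimal sketch (with made-up data), the common descriptive measures can be computed with NumPy, plus Python's `statistics` module for the mode:

```python
import numpy as np
from statistics import mode

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Measures of central tendency
print("mean:  ", np.mean(data))        # 5.0
print("median:", np.median(data))      # 4.5
print("mode:  ", mode(data.tolist()))  # 4 (most frequent value)

# Measures of variability
print("range: ", np.ptp(data))  # 7 (max - min)
print("var:   ", np.var(data))  # 4.0 (population variance)
print("std:   ", np.std(data))  # 2.0 (population standard deviation)
```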

b) Inferential Statistics

• Methods for making decisions or predictions about a population based on sample information/data.
• Draws a representative sample from that population.
• Uses analyses that incorporate the sampling error.
• It works on a large dataset.
• Compares, tests, and predicts future outcomes.
• The end results are expressed as probability scores.
• The specialty of inferential statistics is that it draws conclusions about the population beyond the data available.
• Hypothesis tests, sampling distributions, analysis of variance (ANOVA), etc., are the tools used in inferential statistics.
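To make the idea of a hypothesis test concrete, here is a small sketch that computes a two-sample t-statistic by hand with NumPy (the group values are made up for illustration; equal variances are assumed):

```python
import numpy as np

# Hypothetical measurements from two groups (made-up data)
group_a = np.array([5.1, 4.9, 5.0, 5.2, 4.8])
group_b = np.array([6.0, 6.2, 5.9, 6.1, 5.8])

# Two-sample t-statistic with pooled variance
na, nb = len(group_a), len(group_b)
va, vb = group_a.var(ddof=1), group_b.var(ddof=1)     # sample variances
pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
t = (group_a.mean() - group_b.mean()) / np.sqrt(pooled * (1 / na + 1 / nb))

print("t-statistic:", t)  # close to -10 here: strong evidence the means differ
```

A large |t| (compared against the t-distribution) lets us infer that the two population means differ, going beyond the ten data points we actually observed.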

Statistics plays a crucial role in machine learning algorithms. The role of a data analyst in industry is to draw conclusions from the data, and for this they require, and depend on, statistics.

### Probability

Probability deals with predicting the likelihood of future events, while statistics involves the analysis of the frequency of past events.

Almost everyone has an intuitive understanding of degrees of probability, which is why we use words like "probably" and "unlikely" in our daily speech. Here we point out a way to make quantitative claims about those degrees.

In probability theory, an event is a set of outcomes of an experiment to which a probability is assigned. If E denotes an event, then P(E) denotes the probability that E will happen. A situation where E may occur (success) or may not occur (failure) is called a trial.

Some of the basic concepts required in probability are as follows:

• Joint Probability: The probability that events A and B both occur, denoted P(A ∩ B). When events A and B are independent of each other, it factorizes as P(A ∩ B) = P(A) · P(B).
• Conditional Probability: The probability of event A occurring given that event B has already occurred, denoted P(A|B), i.e., P(A|B) = P(A ∩ B) / P(B). When A and B are not independent, it is often useful to compute the conditional probability.
• Bayes' theorem: A relationship between the conditional probabilities of two events, applied to estimate unknown probabilities and to make decisions on the basis of new sample information. The theorem's popularity comes from its effectiveness in revising a set of old probabilities (the prior) with additional information, deriving a set of new probabilities (the posterior).
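Bayes' theorem is easiest to appreciate with numbers. Here is a sketch with a made-up diagnostic-test scenario: all the probabilities below are illustrative assumptions, not real medical data.

```python
# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)

p_disease = 0.01            # prior: 1% of people have the disease
p_pos_given_disease = 0.99  # test sensitivity
p_pos_given_healthy = 0.05  # false-positive rate

# Total probability of testing positive (law of total probability)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of disease given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.167
```

Even with a 99%-sensitive test, a positive result here implies only about a 17% chance of disease, because the prior is so low; this is exactly the "revising old probabilities with new information" the bullet describes.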

### Calculus

Calculus is a branch of mathematics that studies the rate of change of quantities (which can be interpreted as slopes of curves) and the length, area, and volume of objects. It is mainly focused on limits, derivatives, integrals, and functions, and is divided into two branches: Differential Calculus and Integral Calculus. It is used in the back-propagation algorithm to train deep neural networks.

Calculus is mainly used in optimizing machine learning and deep learning algorithms, helping develop fast and efficient solutions. The concepts of calculus appear in algorithms like Gradient Descent and Stochastic Gradient Descent (SGD), and in optimizers like Adam, RMSProp, Adadelta, etc.

Data scientists mainly use calculus when building deep learning and machine learning models: tweaking the details and optimizing both the model and the data to produce better results.
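To see the calculus at work, here is a toy sketch of gradient descent minimizing f(x) = (x − 3)², whose derivative is f′(x) = 2(x − 3). The learning rate and starting point are arbitrary choices for illustration:

```python
def grad(x):
    """Derivative of f(x) = (x - 3)^2."""
    return 2 * (x - 3)

x = 0.0    # arbitrary starting point
lr = 0.1   # learning rate (step size)

for _ in range(100):
    x -= lr * grad(x)  # step in the direction opposite the slope

print(x)  # converges very close to the minimum at x = 3
```

This one-variable loop is the same idea that SGD and optimizers like Adam apply, via back-propagation, to the millions of parameters of a neural network.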

### Linear Algebra

Linear algebra focuses mostly on computation. It plays a critical role in understanding the theory behind machine learning and is also used throughout deep learning. It gives us better insight into how algorithms really work and empowers us to make better choices. For the most part it deals with vectors and matrices.

• A scalar is a single number (magnitude only).
• A vector is an array of numbers arranged in order, represented as a row or a column, with a single index for access. It has magnitude as well as direction.
• A matrix is a 2D array of numbers, accessed with two indices (row and column).
• A tensor is an array of numbers with more than two dimensions, placed in a grid with a variable number of axes.

The NumPy package in Python is used for all the numerical operations on the dataset. This library carries out the fundamental operations on vectors and matrices, such as addition, subtraction, multiplication, and division. Data scientists and machine learning engineers often work with linear algebra when devising their own algorithms.
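A quick sketch of the four objects above, and a few of the fundamental NumPy operations on them (the values are arbitrary):

```python
import numpy as np

scalar = 5.0                        # a single number
vector = np.array([1.0, 2.0, 3.0])  # one index: shape (3,)
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])     # two indices: shape (2, 2)
tensor = np.zeros((2, 3, 4))        # three axes: shape (2, 3, 4)

print(vector + vector)         # elementwise addition   -> [2. 4. 6.]
print(scalar * vector)         # scalar multiplication  -> [ 5. 10. 15.]
print(matrix @ matrix)         # matrix multiplication  -> [[ 7. 10.] [15. 22.]]
print(np.dot(vector, vector))  # dot product            -> 14.0
```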

### Mathematical Notations

Also refer to this link for more notations.

I highly recommend watching this series of videos for a deeper understanding.

### Conclusion

The algorithms we use to build an AI model have mathematical functions hidden underneath, expressed as programming code (Python/R, etc.). These algorithms can be used to tackle a variety of problems, from the Boolean satisfiability problem to matrix problems like object detection and much more. The final stage is to find the algorithm that best suits the problem. This is where the mathematical functions in the programming language help us: they let us analyze which algorithm is best by comparing metrics like correlation, specificity, sensitivity, F1 score, etc. These functions also help us check whether the selected model is overfitting or underfitting on our data.
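As a small sketch of those comparison metrics, here is how sensitivity, specificity, and F1 score fall out of a binary classifier's confusion matrix (the counts below are made up):

```python
# Hypothetical confusion-matrix counts from a binary classifier (made-up numbers)
tp, fp, fn, tn = 40, 10, 5, 45

sensitivity = tp / (tp + fn)  # recall / true positive rate
specificity = tn / (tn + fp)  # true negative rate
precision = tp / (tp + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} f1={f1:.3f}")
```

Comparing these numbers on training versus validation data is one simple way to spot overfitting: a model that scores far better on the data it was trained on than on held-out data has memorized rather than generalized.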

For AI enthusiasts, mathematics is a vital area to focus on, and it is critical to build a solid foundation in it. Every single concept you learn in machine learning, and every small calculation you write or execute when solving a problem, has a direct or indirect connection to mathematics.

The math concepts implemented in AI build upon the fundamental mathematics we learn in the eleventh and twelfth grades. At that stage we acquire the theoretical knowledge; in machine learning we experience the usefulness of what we studied before. The best approach to getting comfortable with these ideas is to take a machine learning algorithm, find a use case, and work through and understand the math behind it.

An understanding of math is paramount if we want to come up with AI solutions to real-world problems. A thorough knowledge of mathematical ideas also helps sharpen our problem-solving skills.

Well, if you liked the post, consider subscribing to the blog to get instant updates by email.