Thinking in: Linear Algebra Part 1
Linear algebra is a branch of mathematics that is foundational to machine learning. Mastering it is the starting point for seeing data more beautifully.
Linear algebra is concerned with linear equations; the familiar ‘y = mx + c’ is the usual starting point of the study. It is also concerned with linear maps and their representations in the form of matrices and vectors. This series is an attempt to understand the fundamentals behind them and to help you think the way data scientists do when they use linear algebra, the language of data.
Part 1 introduces the basics of linear algebra and gives a head start before you dive deeper.
After covering Part 1, you will:
Gain deeper insight into scalars and vectors
Be able to compare the performance of NumPy arrays and Python lists
Following are the links to all the parts of the series:
Part 2 - Vector arithmetic and its interpretation
Part 3 - Advanced Numpy Operations and Optimization
Why this series?
If you are here, you are probably already familiar with linear algebra, or at least have an opinion about it. We are glad to tell you that most of those opinions are correct.
Our intention for this series is to build upon the knowledge you already possess and add a computer science perspective to it. We will try our best to show how a computer scientist might think about linear algebra. Apart from being a foundation of ML, AI and computer vision, linear algebra is at the heart of other areas of computer science, such as computer graphics and digital signal processing.
Without further ado, let’s start with linear algebra.
Getting Started with Scalars
In linear algebra, real numbers are called scalars. A ‘scalar quantity’ is a single quantitative value, e.g. height, weight or temperature; each such quantity is expressed as one number with a single unit (cm, kg and °C respectively).
The notation x ∈ ℝ states that x is a scalar belonging to the set of real numbers, ℝ.
Notations
ℕ represents the set of positive integers (1,2,3,…)
ℤ designates the integers, which combine positive, negative and zero values
ℚ represents the set of rational numbers that may be expressed as a fraction of two integers.
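As a loose illustration of how these sets nest inside one another, Python's own number types can stand in for them. This is only an analogy and not part of the notation itself: `int` plays the role of ℤ, `fractions.Fraction` the role of ℚ, and `float` is a finite-precision approximation of ℝ. The variable names are ours.

```python
from fractions import Fraction
import numbers

n = 3                  # a natural number (and therefore also an integer)
z = -3                 # an integer that is not a natural number
q = Fraction(3, 4)     # a rational number: a fraction of two integers
r = 2 ** 0.5           # a float approximating the irrational number sqrt(2)

# Every integer is also a rational number (z / 1).
print(Fraction(z) == z)                   # True

# Python's numeric tower mirrors the nesting of the sets.
print(isinstance(n, numbers.Rational))    # True: integers count as rationals
print(isinstance(q, numbers.Real))        # True: rationals count as reals
print(isinstance(r, numbers.Rational))    # False: a float is not treated as a rational
```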
import numpy as np

# Scalars & Arithmetic
a = 5
b = 7.5
print("a has value {} and is of type {}".format(a, type(a)))
print("b has value {} and is of type {}".format(b, type(b)))
print("\n a + b = {} \n a - b = {} \n a * b = {} \n a / b = {} ".format((a+b),(a-b),(a*b),(a/b)))
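The code above will give you the following output (under Python 3):
a has value 5 and is of type <class 'int'>
b has value 7.5 and is of type <class 'float'>

 a + b = 12.5
 a - b = -2.5
 a * b = 37.5
 a / b = 0.6666666666666666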
A few built-in scalar types in Python are int, float, complex, bytes and str (Unicode). A scalar can also be thought of as a matrix with only a single entry, which means a scalar is its own transpose (interchanging its row and column changes nothing).
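The "matrix with a single entry" view can be checked directly in NumPy. The following is a minimal sketch; the name `s` and the value 5.0 are arbitrary choices for illustration.

```python
import numpy as np

s = np.array([[5.0]])            # a 1x1 matrix holding the single entry 5.0

print(s.shape)                   # (1, 1)
print(np.array_equal(s, s.T))    # True: transposing a 1x1 matrix changes nothing

# NumPy still distinguishes a bare number from an array that wraps one.
print(np.isscalar(5.0))          # True
print(np.isscalar(s))            # False
```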
Vectors
Vectors are ordered arrays of numbers and are elements of a vector space. A simple way to think of a vector is as a list.
Imagine the X-Y plane, with the screen you are currently viewing lying on it so that the bottom-left corner of your screen is at (0,0), the origin. It would look something like the above diagram. Each corner point of the rectangle is represented by two numbers (co-ordinates), x and y. A list of these two numbers is a vector.
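To make the screen picture concrete, here is a small sketch that writes each corner of the rectangle as a two-element list. The width and height are made-up values in arbitrary units, and we assume the origin sits at the bottom-left corner of the screen.

```python
# Hypothetical screen size in arbitrary units (an assumption for illustration).
width, height = 16, 9

# Each corner is a vector: a list of two co-ordinates, x and y.
bottom_left = [0, 0]             # the origin
bottom_right = [width, 0]
top_right = [width, height]
top_left = [0, height]

corners = [bottom_left, bottom_right, top_right, top_left]
for x, y in corners:
    print(f"x = {x}, y = {y}")
```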
Two common ways to declare a vector in Python are:
Using the built-in list
x = [1, 2, 3]
Using NumPy
a = np.array([1, 2, 3])
import numpy as np

# Declaring Vectors

# Direct way aka Lists
x = [1, 2, 3]
y = [4, 5, 6]

# Using NumPy
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print("x has value {} and is of type {} and has shape {}".format(x, type(x), len(x)))
print("y has value {} and is of type {} and has shape {}".format(y, type(y), len(y)))
print("\na has value {} and is of type {} and has shape {}".format(a, type(a), a.shape))
print("b has value {} and is of type {} and has shape {}".format(b, type(b), b.shape))
A NumPy array is a multidimensional array of elements that all share the same type. It is an object that points to a block of memory and keeps track of the type of data stored there, the number of dimensions, and the size of each dimension.
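The bookkeeping described above is exposed as attributes on every array. A minimal sketch follows; the array `m` and its explicit `int64` dtype are our own choices so that the sizes are the same on every platform.

```python
import numpy as np

# A 2x3 array; the dtype is fixed explicitly so the byte counts are predictable.
m = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(m.dtype)       # int64: the single type shared by all elements
print(m.ndim)        # 2: the number of dimensions
print(m.shape)       # (2, 3): the size of each dimension
print(m.itemsize)    # 8: bytes used by one element
print(m.nbytes)      # 48: bytes in the whole block (6 elements * 8 bytes)
```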
NumPy arrays vs Python lists
Python’s lists are efficient general-purpose containers. They support (fairly) efficient insertion, deletion, appending, and concatenation, and Python’s list comprehensions make them easy to construct and manipulate. However, they have certain limitations: they don’t support “vectorized” operations like element-wise addition and multiplication. The fact that they can contain objects of differing types means that Python must store type information for every element and must execute type-dispatching code when operating on each element, which makes loops over them inefficient.
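The missing "vectorized" behaviour is easy to see side by side. In the sketch below (variable names are ours), `+` on two lists concatenates them, so element-wise addition needs an explicit loop or comprehension, whereas the same operator on NumPy arrays works element by element.

```python
import numpy as np

x = [1, 2, 3]
y = [4, 5, 6]
a = np.array(x)
b = np.array(y)

# With lists, + joins the two lists end to end.
list_plus = x + y
print(list_plus)                 # [1, 2, 3, 4, 5, 6]

# Element-wise addition on lists has to be spelled out.
list_sum = [xi + yi for xi, yi in zip(x, y)]
print(list_sum)                  # [5, 7, 9]

# With NumPy arrays, + and * operate on matching elements.
array_sum = a + b                # elements 5, 7, 9
array_prod = a * b               # elements 4, 10, 18
print(array_sum.tolist())        # [5, 7, 9]
print(array_prod.tolist())       # [4, 10, 18]
```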
from numpy import arange
from timeit import Timer

Nelements = 1000
Ntimeits = 1000

x = arange(Nelements)          # NumPy array
y = list(range(Nelements))     # Python list (a bare range() is not a list in Python 3)

t_numpy = Timer("x.sum()", "from __main__ import x")
t_list = Timer("sum(y)", "from __main__ import y")

# Time each approach once and reuse the averages below
numpy_time = t_numpy.timeit(Ntimeits) / Ntimeits
list_time = t_list.timeit(Ntimeits) / Ntimeits

print("numpy: %.3e" % numpy_time)
print("list: %.3e" % list_time)

if numpy_time < list_time:
    print("\nNumpy is faster than list implementation by {}".format(list_time - numpy_time))
else:
    print("List is faster than Numpy implementation by {}".format(numpy_time - list_time))
The code above prints the average time per call, in seconds, for each approach (the numbers will vary from run to run, but NumPy typically comes out ahead).
This, however, does NOT imply that lists are not useful. Their mutable and dynamic nature, together with the ability to hold heterogeneous data types (a single list can contain strings, integers and other objects), makes them very powerful for general-purpose programming.
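A short sketch of that flexibility, with made-up values: one list holds several types, grows in place, and lets elements be reassigned, while a NumPy array built from mixed numbers converts them all to a single common type.

```python
import numpy as np

mixed = [1, "two", 3.0]          # one list, three different types
mixed.append({"four": 4})        # lists grow in place
mixed[0] = "one"                 # and elements can be reassigned

print([type(item).__name__ for item in mixed])   # ['str', 'str', 'float', 'dict']

# A NumPy array instead converts its inputs to one common type.
arr = np.array([1, 2.5, 3])
print(arr.dtype)                 # float64: the integers were converted to floats
```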
NumPy is a Python extension module that provides efficient operations on arrays of homogeneous data. It allows Python to serve as a high-level language for manipulating numerical data. Hence, it is generally recommended to use NumPy arrays when working with mathematical objects.
We read about scalars and vectors, compared NumPy arrays with Python lists, and executed a few commands in NumPy. In Part 2, we will delve into vector calculations, the dot product and eigenvectors, and try to understand their significance in machine learning.
A great write-up. An effective description of scalars and vectors.
An aside: the print statements look much more elegant, and are easier to follow, when written as f-strings:
print("x has value {} and is of type {} and has shape {}".format(x,type(x),len(x)))
print(f"x has value {x} and is of type {type(x)} and has shape {len(x)}.")
I find the f-strings more intuitive to follow.
Pranay Pallav