{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Numpy & Scipy\n", "\n", "**Created and updated by** John C. S. Lui on August 14, 2020.\n", "\n", "**Important note:** *If you want to use and modify this notebook file, please acknowledge the author.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## NumPy & SciPy\n", "- NumPy and SciPy are open-source add-on modules to Python that provide common mathematical and numerical routines in pre-compiled, fast functions.\n", "- **NumPy** (Numeric Python) package provides basic routines for manipulating *large arrays and matrices of numeric data* \n", "- **SciPy** (Scientific Python) package extends the functionality of NumPy with a substantial collection of *useful algorithms*, like minimization, Fourier transformation, regression, and other applied mathematical techniques\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scientific Python building blocks\n", "- **Python**\n", "- **Jupyter**: An advanced interative web tool\n", "- **Numpy** : provides powerful numerical arrays objects, and routines to manipulate them. http://www.numpy.org/\n", "- **Scipy** : high-level data processing routines. Optimization, regression, interpolation, etc http://www.scipy.org/\n", "- **Matplotlib** : 2-D visualization, “publication-ready” plots http://matplotlib.org/\n", "- **Mayavi** : 3-D visualization http://code.enthought.com/projects/mayavi/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## NumPy basic\n", "\n", "- **NumPy’s** main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of positive integers. \n", "- In Numpy dimensions are called **axes**. The number of axes is **rank**.\n", "- E.g., the coordinates of a point in 3D space [1, 2, 1] is an array of rank 1, because it has one axis.\n", "- That axis has a length of 3.\n", "- Let say we have the following array: [[1.0, 1.0, 2.0], [0.0, 2.0, 1.0]]\n", " * It has rank of 2 (or 2 dimensions)\n", " * Its first dimension (or axis) has a length of 2\n", " * It second dimension (or axis) has a length of 3\n", " \n", "## Importing NumPy module\n", "- Several ways:\n", " * **import numpy**\n", " * **import numpy as np**\n", " * **from numpy import * **\n", " \n", "- The basic building block of Numpy is a special data type called **array**\n", "- Arrays are similar to lists in Python, the function array takes two arguments: \n", " * the list to be converted into the array and,\n", " * the type of each member of the list\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "a = np.array([1, 4, 5, 8], float) # a is not a list, but an array under NumPy.\n", "print(a)\n", "print (type(a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Slicing arrays\n", "\n", "Slicing in python means taking elements from one given index to another given index.\n", "\n", "We pass slice instead of index like this: [start:end].\n", "\n", "We can also define the step, like this: [start:end:step].\n", "\n", "If we don't pass start its considered 0\n", "\n", "If we don't pass end its considered length of array in that dimension\n", "\n", "If we don't pass step its considered 1\n", "\n", "Let consider some examples." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Slice elements from index 1 to index 5 from the following array:\n", "\n", "arr = np.array([1, 2, 3, 4, 5, 6, 7])\n", "print(arr[1:5])\n", "print('-----')\n", "print(arr[1::2])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Slice elements from index 4 to the end of the array:\n", "\n", "arr = np.array([1, 2, 3, 4, 5, 6, 7])\n", "print(arr[4:])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Slice elements from the beginning to index 4 (not included):\n", "\n", "arr = np.array([1, 2, 3, 4, 5, 6, 7])\n", "print(arr[:4])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Slice from the index 3 from the end to index 1 from the end:\n", "\n", "arr = np.array([1, 2, 3, 4, 5, 6, 7])\n", "print(arr[-3:-1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Return every other element from index 1 to index 5 using step size of 2\n", "\n", "arr = np.array([1, 2, 3, 4, 5, 6, 7])\n", "print(arr[1:5:2])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Return every other element from the entire array:\n", "\n", "arr = np.array([1, 2, 3, 4, 5, 6, 7])\n", "print(arr[::2])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 2D array: From the second element, slice elements from index 1 to index 4 (not included):\n", "\n", "arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])\n", "print(arr[1, 1:4])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 2D array: From both elements, return index 2:\n", "\n", "arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])\n", "print(arr[0:2, 2])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 2D array: From both elements, slice index 1 to index 4 (not included), this will return a 2-D array:\n", "\n", "arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])\n", "print(arr[0:2, 1:4])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can index array just like we have done to list in Python\n", "\n", "a = np.array([1, 4, 5, 8], float)\n", "print(a[:2])\n", "print(a[1:-1])\n", "print(a[-3:-1])\n", "print(a[3])\n", "a[3] = 100.0\n", "print(a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's try higher dimensional array and illustrate the indexes !!!\n", "\n", "a = np.array([[1,2,3], [4,5,6]], float) # define 2x3 array\n", "print ('a =', a)\n", "a[0,0] = 15.0 # we can re-assign elements in the array\n", "a[1,0] = 12.0\n", "print('a =',a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# we can even using slicing\n", "a = np.array([[1,2,3], [4,5,6]], float) # define 2x3 array\n", "print('a[1,:2]', a[1,:2])\n", "print('a[1,:]=', a[1,:])\n", "print('a[:,2]=', a[:,2])\n", "print('a[-1:,-2:] =', a[-1:, -2:])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Methods on Array\n", "\n", "Let's consider some useful **methods** which we can apply to array" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# To find out the \"dimension\" of an array\n", "a = np.array([[1,2,3], [4,5,6]], float) # define 2x3 array\n", "\n", "print(a.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# To find out the \"type\" of elements in an array\n", "print (a.dtype)\n", "print (type(a))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# To find out the \"ndimension\" of an array\n", "print(a.ndim)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# To find out the length of the first axis:\n", "print('length of the first axis =', len(a))\n", "print('length of the second axis =', len(a[0])) # length of the 2nd axis" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# To test whether an element is in the array\n", "a = np.array([[1,2,3], [4,5,6]], float) # define 2x3 array\n", "\n", "print (\"Is 2 in a? \", 2 in a) # test for membership\n", "print (\"Is 9 in a? \", 9 in a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# To generate an array of numbers using the function range()\n", "a = np.array(range(10), float) # generate float array with a single item\n", "print('float a =', a)\n", "\n", "a = np.array([range(3), range(3)], int) # generate integer array with 2 items\n", "print('integer a = ', a)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# use reshape() method to re-arrange an array\n", "a = np.array(range(10), float)\n", "print('Before reshape(), a =', a)\n", "\n", "# reshape array a\n", "a = a.reshape(2,5)\n", "print('After 1st reshape(), a =', a)\n", "\n", "# reshape array a\n", "a = a.reshape(5,2)\n", "print('After 2nd reshape(), a =', a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Array assignment is just a reference copy !!!!!!!\n", "\n", "Note that in NumPy, array assignment is just a **reference copy**." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array(range(5), float)\n", "b = a\n", "print ('a=', a, 'b=', b)\n", "\n", "a[0] = 10.0 # reassign an item in a\n", "print ('a =', a,'b =', b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## If we want to have another array, use `copy()` method" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array(range(5), float)\n", "b = a\n", "c = a.copy()\n", "print ('Before:', 'a=', a, 'b=', b, 'c=', c)\n", "\n", "a[0] = 10.0 # reassign an item in a\n", "print ('After: ', 'a=', a, 'b=', b, 'c=',c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define an array and fill it with some entries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array(range(5), float)\n", "print('a =', a)\n", "a.fill(0) # use fill method to initialize the array\n", "print('a =', a)\n", "a.fill(100.0)\n", "print('a =', a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([range(3), range(3)], int) # generate array with 2 items\n", "print('a =', a)\n", "a.fill(0) # use fill method to initialize the array\n", "print('a =', a)\n", "a.fill(100.0)\n", "print('a =', a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transpose method" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([range(5), range(5)], int) # generate array with 2 items\n", "print('a =', a)\n", "\n", "a = a.transpose()\n", "print('a =', a)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Concatenate method" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1,2], float) # float type\n", "b = np.array([3,4,5], float) # float type\n", "c = np.array([7,8,9,10], int) # integer type\n", "d = np.concatenate((a,b,c))\n", "print('a =',a)\n", "print('b =',b)\n", "print('c =',c)\n", "print('d =',d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# arange method\n", "\n", "arange method is similar to *range()* function except it returns **an array**." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array(range(5), float)\n", "print('a = ', a)\n", "print('type of a: ', type(a))\n", "\n", "b = np.arange(5, dtype=float)\n", "print('b = ', b)\n", "print('type of b: ', type(b))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## zeros and ones\n", "\n", "We use **zeros** and **ones** method to initialize an array\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.zeros(7,dtype=int)\n", "print('a=',a)\n", "\n", "b = np.zeros((2,3),dtype=int)\n", "print('b=',b)\n", "\n", "c = np.ones((3,2),dtype=float)\n", "print('c=',c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### We use `zeros_like()` and `ones_like()` functions to mimic and initialize arrays" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[1,3,3], [4,5,6]],int)\n", "b = np.zeros_like(a) # create a new array b with all zeros\n", "c = np.ones_like(a) # create a new array c with all ones\n", "print('a=',a)\n", "print('b=',b)\n", "print('c=',c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating identity matrix and diagonal matrix\n", "\n", "- use identity method\n", "- use eye method: which returns matrices with ones along the $k^{th}$ diagonal entries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.identity(5, dtype=float) # create a 5x5 identity matrix\n", "print('a=',a)\n", "\n", "a = np.eye(5, k=1, dtype=float) # create a 5x5 matrix with 1st diagonal being 1\n", "print('a=',a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array mathematics\n", "\n", "Of course we can do addition, subtraction, multiplication and division,...etc. Note that it is done in an **element-wise** manner." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1,2,3,4,5],int)\n", "b = np.array([5,4,3,2,1],float)\n", "print('a =', a, '; b =', b)\n", "print('a+b =', a+b)\n", "print('a-b =', a-b)\n", "print('a*b =', a*b)\n", "print('a/b =', a/b)\n", "print('a%b =', a%b)\n", "print('a**b =', a**b)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# for higher dimenion arrays, it remains to be an element-wise operation\n", "c = np.array([[1,2,3,4,5],[1,1,1,1,1]],int)\n", "d = np.array([[5,4,3,2,1],[2,2,2,2,2]],float)\n", "print('\\nc =', c)\n", "print('d =',d)\n", "print('c+d =', c+d)\n", "print('c-d =', c-d)\n", "print('c*d =', c*d)\n", "print('c/d =', c/d)\n", "print('c%d =', c%d)\n", "print('c**d =',c**d)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The \"broadcast\" feature of NumPy\n", "\n", "Arrays that do not match in the number of dimensions will be **broadcasted** (or adjusted to fit with appropriate dimension)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[1,2], [3,4], [5,6]], int)\n", "b = np.array([-1,3], float)\n", "print('a =', a)\n", "print('b =', b, end='\\n\\n')\n", "print('a+b =', a+b, end='\\n\\n')\n", "print('a*b =', a*b, end='\\n\\n')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Using broadcast to scale the matrix \"a\" by a constant \"b\"\n", "a = np.array([[1,2,3,4], [1,2,3,4], [1,2,3,4], [1,2,3,4]], float)\n", "b = np.array([0.125], float)\n", "print('a=', a, 'b=', b)\n", "print('a*b=', a*b)\n", "print('b*a=', b*a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use numpy array as \"index\" to another numpy array" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can also use numpy as \"index\" to another numpy array \n", "\n", "a = np.array([0,2,4,6,9,10])\n", "print('a =', a)\n", "k = a[np.array([2,3,4])] \n", "print('k =', k)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use elementwise logical comparion to extract elements of an array (IMPORTANT)!!!!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Elementwise logical comparison a>2\n", "\n", "a = np.array([0,2,4,6,9,10])\n", "print('Result of a>2:', a>2, '\\n')\n", "\n", "k = a[a>2] # assign those entries of a whcih satisfy the condition\n", "\n", "print('k =',k )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting minimum and maximum values for all entreis in an array" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Setting minimum and maximum in all elements in an numpy array\n", "\n", "a = np.array([0,2,4,6,9,10])\n", "print('a =', a)\n", "\n", "k = a.clip(1,4) \n", "print('k =',k)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Testing NaN in the array, and how to filter NaN out" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "c = np.array([1,2,3, np.NAN, 4, 5]) \n", "print('c =', c, '\\n')\n", "\n", "# Test whether each element is an NAAN\n", "print('NaN testing for c =', np.isnan(c), '\\n')\n", "\n", "# We can eliminate NaN from c\n", "c1=c[~np.isnan(c)] \n", "print('c1 =', c1, '\\n')\n", "\n", "# Now we can compute statistics in c, for example, the average\n", "print('Average of all entries in c is =', np.mean(c[~np.isnan(c)]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Math libraries in NumPy\n", "\n", "There are **many libraries** like *abs, sign, sqrt, log, log10, exp, sin, cos, tan, arcsin, arccos, arctan, sinh, cosh, tanh, arcsinh, arccosh, arctanh*,.... Make sure to read the documentation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1,4,9,16,25,36],float)\n", "\n", "print ('Square root of a =', np.sqrt(a)) # use sqaure root\n", "print ('Sign of a =', np.sign(a)) # use sign\n", "\n", "b = np.array([-1,1,-20, 20,0], int)\n", "print ('Sign of b =', np.sign(b)) # use sign\n", "\n", "print ('exp of a =', np.exp(a)) # use exponential\n", "\n", "print ('log of a =', np.log(a)) # use log\n", "\n", "print ('log10 of a =', np.log10(a)) # use log10" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1.1,4.3,9.8,16.5,25.6],float)\n", "\n", "print('a =', a)\n", "print ('floor of a =', np.floor(a)) # use floor\n", "print ('ceil of a =', np.ceil(a)) # use ceil\n", "print ('rint of a =', np.rint(a)) # use rint\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Various constants in NumPy" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Pi is = ', np.pi)\n", "print('e is = ', np.e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array iteration in NumPy" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1,4,5], dtype=int)\n", "\n", "for num in a:\n", " print('number = ', num)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[1,2],[3,4],[5,6]], dtype=int)\n", "\n", "for num in a:\n", " print('number = ', num)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[1,2],[3,4],[5,6]], dtype=int)\n", "\n", "for num1, num2 in a:\n", " print('num1 = ', num1, '; num2 = ', num2, '; num1*num2 = ', num1*num2)\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic Array operations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([2,4,3], dtype=float)\n", "b = np.array([[1,2],[3,4]], dtype=float)\n", "\n", "print('a =', a)\n", "print('b =', b, end='\\n\\n')\n", "print('element sum of a = ', a.sum()) # sum all elements\n", "print('element sum of b = ', b.sum()) # sum all elements\n", "print('element prduct of a = ', a.prod()) # multiply all elements\n", "print('element product of b = ', b.prod()) # multiply all elements\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Another way is to use NumPy method with array as argument\n", "print('element sum of a = ', np.sum(a)) # sum all elements\n", "print('element sum of b = ', np.sum(b)) # sum all elements\n", "print('element prduct of a = ', np.prod(a)) # multiply all elements\n", "print('element product of b = ', np.prod(b)) # multiply all elements" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# to sort all entries in an array\n", "a = np.array([6, 2, 5, -1, 0],float)\n", "print('a =', a)\n", "print('sorted form of a =', sorted(a)) # sort a but not to alter the array a\n", "print('clipped form of a =', a.clip(0,4.1)) # specify lower/upper bound" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# finding unique entries in an array\n", "a = np.array([1, 1, 2, 2, 3, 4, 4, 5, 5, 5],float)\n", "print('a =', a)\n", "print('unqiue entries of a :', np.unique(a))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# finding diagonal entries in an array\n", "a = np.array([[1,2,3],[4,5,6],[7,8,9]], int)\n", "print('a =', a)\n", "print('diagonal entries of a are :', a.diagonal())\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array comparison & testing" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1, 3, 0], float)\n", "b = np.array([0, 3, 2], float)\n", "print('a=',a, '; b=', b)\n", "\n", "print('Is a > b: ', a>b) # a>b returns an array of boolean\n", "print('Is a == b:', a==b)\n", "print('Is a <= b:', a<=b)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# You can use methods like any or all to test condition\n", "c = a > b # c is now an array of boolean\n", "print('c =', c)\n", "print('There is at least one \"True\" in c: ', any(c))\n", "print('all entries in c are \"True\" : ', all(c))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# use of logical_and, logical_or and logical_not in array\n", "a = np.array([1,3,0], float)\n", "print('a =', a)\n", "b = np.logical_and(a>0, a<3)\n", "print('Are entries in a > 0 AND a < 3: ', b)\n", "c = np.logical_not(b)\n", "print('Use of logical_not in a: ', c)\n", "print('Use of logical_or: ', np.logical_or(b,c))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use of method where() in NumPy\n", "\n", "*where* forms a new array from two arrays of equivalent size using a Boolean filter to choose between elements of the two. Its basic syntax is
\n", "\n", "         *np.where (boolarray, truearray, falsearray)* " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1, 3, 0], float)\n", "b = np.where(a > 0, a+1, a)\n", "c = np.where(a > 0, 5.0, -1.0)\n", "print('a =', a)\n", "print('b =', b)\n", "print('c =', c)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## It is also possible to test whether or not values are NaN (\"not a number\") or finite\n", "a = np.array([2, np.NaN, np.Inf], float)\n", "print('a =', a)\n", "print('Entry is not a number :', np.isnan(a))\n", "print('Entry is finite :', np.isfinite(a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finding statistics in an array" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We want to find the mean and variance of a series\n", "a = np.array([2, 1, 3, 10.0, 5.3, 18.2, 16.3],dtype=float)\n", "mean = a.sum()/len(a)\n", "print('mean of a =', mean)\n", "print('mean of a =', a.mean(), end='\\n\\n')\n", "print('variance of a =', a.var())\n", "print('my standard deviation of a =', np.sqrt(a.var()))\n", "print('standard deviation of a =', a.std())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We want to find the min or max or argmin or argmax in a series\n", "a = np.array([2, 1, 3, 10.0, 5.3, 18.2, 16.3],dtype=float)\n", "\n", "print('a =', a)\n", "print('minimum element in a =', a.min())\n", "print('minimum occurs in index:', a.argmin())\n", "print('maximum element in a =', a.max())\n", "print('maximum occurs in index:', a.argmax())\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can even control which axis to take the statistics\n", "a = np.array([[0,2],[3,-1],[3,5]], dtype=float)\n", "print('a =', a)\n", "\n", "print('mean in axis 0 = ', a.mean(axis=0))\n", "print('mean in axis 1 = ', a.mean(axis=1))\n", "print('min in axis 0 = ', a.min(axis=0))\n", "print('min in axis 1 = ', a.min(axis=1))\n", "print('max in axis 0 = ', a.max(axis=0))\n", "print('max in axis 1 = ', a.max(axis=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array item selection and manipulation (IMPORTANT !!!!!)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[6,4], [5,9]], float) # define a 2x2 array\n", "print('a=', a)\n", "\n", "b = a >= 6 # define a boolean array with the same dimension as 'a'\n", "c = a[b] # select entries in 'a' which are correspondinly true in 'b'\n", "print('b = ', b, '. Type of b:', type(b))\n", "print('c = ', c, '. Type of c:', type(c))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[6,4], [5,9]], float) # define a 2x2 array\n", "print('a=', a)\n", "\n", "b = (a >= 6)\n", "print('b = ', b, '. Type of b:', type(b))\n", "\n", "c = a[b]\n", "print('c = ', c, '. Type of c:', type(c))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[6,4], [5,9]], float) # define a 2x2 array\n", "print('a=', a)\n", "\n", "b = a[np.logical_and(a > 5, a <9)]\n", "print('b = ', b, '. Type of b:', type(b))\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([2, 4, 6, 8], float)\n", "b = np.array([0, 0, 1, 3, 2, 1], int) # it has to be an integer array\n", "print('a=', a, '; b=', b)\n", "c = a[b] # create array c\n", "print('c = ', c, '. Type of c:', type(c))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multidimensional array selection\n", "\n", "For multidimensional arrays, we have to use multiple one-dimensional integer array to the selection bracket, **one for each axis**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[1,4], [9,16]], float)\n", "b = np.array([0, 0, 1, 1, 0], int)\n", "c = np.array([0, 1, 1, 1, 1], int)\n", "d = a[b,c]\n", "\n", "print('a =', a)\n", "print('b =', b)\n", "print('c =', c)\n", "print('d =', d, '; type of d is:', type(d))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## take method\n", "The function **take** is available to perform selection with integer arrays" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([2, 4, 6, 8], float)\n", "b = np.array([0, 0, 1, 3, 2, 1], int)\n", "\n", "print('a =', a, '; b=', b)\n", "print('a.take(b) = ', a.take(b))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## put method\n", "The function **put** takes values from a soure array and place them at specified indices" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([0, 1, 2, 3, 4, 5], float)\n", "print('a=', a)\n", "b = np.array([9, 8, 7], float)\n", "\n", "a.put([0, 3, 5], b) # put entries of b in a according to a given indices\n", "print('a=', a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# for put method, it can repeat if necessary\n", "a = np.array([0, 1, 2, 3, 4, 5], float)\n", "print('a=',a)\n", "\n", "a.put([0,2,3], 5)\n", "print('a=', a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Vector and matrix mathematics\n", "\n", "We can do *dot* products and matrix multiplication,...etc" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1,2,3], float)\n", "b = np.array([0,1,1], float)\n", "\n", "print('a=', a, '; b=', b)\n", "print('a dot b =', np.dot(a,b))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[1,2], [3,4]], float)\n", "b = np.array([2,3], float)\n", "c = np.array([[1,1], [4,0]], float)\n", "print('a=', a, end='\\n\\n') \n", "print('b=', b, end='\\n\\n')\n", "print('c=', c, end='\\n\\n')\n", "print('b*a =', np.dot(b,a))\n", "print('a*b =', np.dot(a,b))\n", "print('a*c =', np.dot(a,c))\n", "print('c*a =', np.dot(c,a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## inner, outer and cross prodcuts of matrices and vectors" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1,4,0], float)\n", "b = np.array([2,2,1], float)\n", "print('a=', a, '; b=', b)\n", "print('np.outer(a,b)=', np.outer(a,b))\n", "print('np.inner(a,b)=', np.inner(a,b))\n", "print('np.cross(a,b)=', np.cross(a,b))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Determinants, inverse and eigenvalues\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[4, 2, 0], [9, 3, 7],[1, 2, 1]], float)\n", "print ('a =', a)\n", "print('determinant of a is: ', np.linalg.det(a)) # get determinant\n", "vals, vecs = np.linalg.eig(a) # get eigenvalues/engenvectors\n", "print('eigenalues: ', vals)\n", "print('eigenvectors: ', vecs)\n", "\n", "b = np.linalg.inv(a) # compute inverse of a\n", "print('inverse of a = ', b)\n", "\n", "print(\"let's check, a * b = \", np.dot(a,b))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Singular Value Decomposition (SVD)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[1,3,4], [5,2,3]], float)\n", "U, s, Vh = np.linalg.svd(a) # get SVD of a\n", "print ('U is:', U)\n", "print('s is:', s)\n", "print('Vh is:', Vh)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Polynomial mathematics\n", "\n", "Given a set of roots, it is possible to show the polynomial coefficient using the *poly* method in NumPy.\n", "\n", "Let say for the following polynomial: $x^4 - 11x^3 +9x^2 + 11x - 10$.\n", "If we know the roots are (-1, 1, 1, 10), we use *poly* method to find the \n", "coefficient.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Polynomial coefficients are: ', np.poly([-1, 1, 1, 10]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given a set of coefficients in a polynomail, we can find the roots also via *roots* method.\n", "Let say we have the following polynomail: $x^3 + 4x^2 -2 x + 3$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.roots([1, 4, -2, 3]) # get roots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do polynomail integration via the *polyint* method. For example, we have the following polynomial:
\n", " $x^3 + x^2 + x + 1$\n", "\n", "If we integrate it, we should have $x^4/4 + x^3/3 + x^2/2 + x + C$, where $C$ is a constant." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.polyint([1,1,1,1]) # perform integration on a polynomial (specify coefficients)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.polyder([1/4, 1/3, 1/2, 1, 0]) # perform derivative of polynomial (specify coefficients)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Note that there are other polynomail methods \n", "- polyadd\n", "- polysub\n", "- polymul\n", "- polydiv\n", "- polyval\n", "\n", "Read the documentation of Numpy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate polynomial at a given point\n", "\n", "Let's say we want to evaluate $x^3 - 2x^2 + 2$ at $x=4$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.polyval([1,-2,0, 2], 4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Curve fitting\n", "\n", "We can use *polyfit()* method, which fits a polynomial of specified order to a set of data using the *least-square method*." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = [1, 2, 3, 4, 5, 6, 7, 8] \n", "y = [0, 2, 1, 3, 7, 10, 11, 19]\n", "\n", "# use polyfit to find the least square fit using polynomail of degree 3\n", "np.polyfit(x,y,3) # it returns all coefficients" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Statistics\n", "\n", "We can use *mean, median, var, stad* methods to find the statistics of vectors or matrices" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1,4,3,8,9,2,3], float) # find median\n", "print(\"sorted a is: \", sorted(a))\n", "print('median of a: ', np.median(a))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# find correlation coefficient matrix (corrcoef)\n", "a = np.array([[1,2,1,3], [5,3,1,8]], float)\n", "c = np.corrcoef(a)\n", "print('Correlation coefficient matrix: \\n', c)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# find covariane of data (cov)\n", "print('Covariance of data: \\n', np.cov(a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use random number generators in NumPy\n", "\n", "We need random number generators in simulation, statistical analysis and machine learning tasks." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# uniformly generate 5 random number in [0.0, 1.0)\n", "np.random.rand(5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# random numbers for array, and use of *reshape()* method\n", "print('A 2x3 matrix with random # between [0, 1.0):\\n',\n", " np.random.rand(2,3))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or we can use reshape()\n", "print('A 2x3 matrix with random # between [0, 1.0):\\n',\n", " np.random.rand(6).reshape(2,3))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# generate a SINGLE random number between [0, 1)\n", "np.random.rand()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# generate a random INTEGER in the range [min, max]\n", "np.random.randint(5,10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate a random number from a Poisson distribution with a mean of 6.0\n", "np.random.poisson(6.0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate a random number from a Normal distribution with a mean and standard deviation\n", "np.random.normal(1.5, 4.0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate a random number from a standard normal distribution\n", "np.random.normal()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Draw multiple values using the *size* argument\n", "\n", "import numpy as np\n", "\n", "np.random.normal(size=5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Draw samples from a normal distribution with given parameters mean and standard deviation\n", "mu, sigma = 0, 1.0 # mean and standard deviation\n", "s = np.random.normal(mu, sigma, 10000)\n", "\n", "# s = np.random.randint(0, 10, 10000) # random integer between 0 and 10\n", "# s = np.random.uniform(0, 10, 10000) # uniform distribution between 0 and 10\n", "# s = np.random.poisson(1.0, 10000) # Poisson distribution with mean being 1.0\n", "\n", "print('(a) mean of s =', s.sum()/len(s))\n", "print('(b) mean of s =', s.mean())\n", "print('(c) mean of s =', np.mean(s))\n", "\n", "print('Standard deviatin of s =', np.std(s))\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's try to visualize the vector s\n", "\n", "import matplotlib.pyplot as plt\n", "count, bins, ignored = plt.hist(s, 30, density=True)\n", "plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) * \n", " np.exp( - (bins - mu)**2 / (2 * sigma**2) ), linewidth=2, color='r')\n", "\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# SciPy\n", "\n", "- Greatly extends the functionality of NumPy\n", "- We can use \"import scipy\" to import the module\n", "- SciPy has **many packages**\n", "- To explore, do **help(scipy)** \n", "- scipy.constants: many mathematical & physical constants\n", "- scipy.speical: many special functions like gamma, beta, bessel,..etc\n", "- scipy.integrate: perform numerical integration using methods like trapezoidal, Simpson's Romberg,..etc\n", "- scipy.optimize: minimization and maximization routines\n", "- scipy.linalg: linear algebra routines (broader than NumPy)\n", "- scipy.sparse: routines for working with large and sparce matrices\n", "- scipy.interpolate: interpolation routines\n", "- scipy.fftpack: Fast Fourier transform routines\n", "- scipy.signal: signal processing routines, e.g., convolution\n", "- scipy.stats: Huge library of various statistical distributions and functions\n", "- scipy.ndimage: n-dimensioanl image processing routines\n", "- scipy.cluster: Vector quantization and Kmeans\n", "- scipy.io: Data input and output\n", "- scipy.spatial: Spatial data strcutures and algorithms\n", "\n", "Let's illustrate some of them" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from scipy import stats\n", "from scipy import io as spio\n", "\n", "a = np.ones((3,3)) # generate a 3x3 matrix with all 1's\n", "\n", "# save a to a MatLab file \n", "spio.savemat('file.mat', {'a': a}) # savemat expects a dictionary\n", "\n", "# load the file to another variable\n", "data = spio.loadmat('file.mat', struct_as_record=True)\n", "print('The matrix a in data is: \\n', data['a'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SciPy: Linear Algebra (scipy.linalg)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Matrix determinant\n", "import numpy as np\n", "from scipy import linalg\n", "arr = np.array([[1,2], [3,4]], float)\n", "print('arr =\\n', arr)\n", "print(\"arr's determinant is: \", linalg.det(arr))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Matrix inverse\n", "arr_inv = linalg.inv(arr)\n", "print(\"arr's inverse is:\\n\", arr_inv)\n", "\n", "# Let's do a test\n", "print(\"\\narr * arr_inv is:\\n\", np.dot(arr, arr_inv))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's examine the optimizatoin and search routines in SciPy" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from scipy import optimize # import optimization library\n", "import matplotlib.pyplot as plt # import ploting routine\n", "\n", "# Define function\n", "def f(x):\n", " return x**2 + 10*np.sin(x)\n", "\n", "\n", "# plot it\n", "\n", "x = np.arange(-10, 10, 0.1)\n", "plt.plot(x, f(x))\n", "plt.show()\n", "\n", "\n", "# This function has a global minimum round -1.3\n", "# and a local minimum around 3.8\n", "\n", "# Conduct a gradient descent from zero to find minimum\n", "print ('find minima starting from 0:')\n", "optimize.fmin_bfgs(f, 0) # start from 0 and see it finds local min only\n", "\n", "# conduct gradient descent from 3 to find minimum\n", "print ('find minima starting from 3:')\n", "optimize.fmin_bfgs(f,3)\n", "\n", "\n", "# To find the global optimal, we use scipy.optimize.basinhopping()\n", "# which combines a local optimizer with stochastic sampling of\n", "# starting points for the local optimizer\n", "\n", "print ('find global minimum')\n", "minimizer_kwargs = {\"method\": \"BFGS\"}\n", "ret = optimize.basinhopping(f, 0.0, \\\n", " minimizer_kwargs= minimizer_kwargs, niter=200)\n", "print ('global minimum: x = ', ret.x, 'f(x0)= ', ret.fun)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Finding the roots of a scalar function\n", "import numpy as np\n", "from scipy import optimize # import optimization package\n", "\n", "# define function\n", "def f(x):\n", " return x**2 + 10*np.sin(x)\n", "\n", "root = optimize.fsolve(f,1) # our initial guess\n", "print(\"Guess from 1: \", root)\n", "\n", "root = optimize.fsolve(f, -2.5) # another quess\n", "print(\"guess from -2.5:\", root)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's see some optimization\n", "import numpy as np\n", "from scipy import optimize\n", "\n", "# define function\n", "def f(x):\n", " return x**2 + 10*np.sin(x)\n", "\n", "xdata = np.linspace(-10,10, num=20) # generate 20 pts of x\n", "ydata = f(xdata) + np.random.rand(xdata.size) # generate 20 points of y\n", "\n", "# define another function\n", "def f2(x, a, b):\n", " return a + x**2 + b*np.sin(x)\n", "\n", "# use scipy.optimize.curve_fit()\n", "guess = [2,2] # initial guess\n", "params, params_covariance = optimize.curve_fit(f2, xdata, ydata, guess)\n", "\n", "print(\"params =\", params)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 2 }