Python NumPy Exercises with Solutions by DZONE

Python NumPy Exercise

Question 1: How to import a dataset with both numbers and text, keeping the text intact, in Python NumPy?

Note: Import the iris dataset keeping the text intact.

    
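Solution
import numpy as np

url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
# dtype='object' keeps every field as-is, so the species text is not lost
iris = np.genfromtxt(url, delimiter=',', dtype='object')
# Column names for reference (not passed to genfromtxt here)
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')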

# Print the first 3 rows
iris[:3]
#> array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
#>        [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
#>        [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa']], dtype=object)
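The `names` tuple above is defined but not passed to `genfromtxt`. A hedged sketch of one way to use it, assuming a few inline rows read through `io.StringIO` instead of the URL: with `dtype=None`, `genfromtxt` infers a type per column and returns a 1D structured array, and an explicit `encoding` (here `'utf-8'`, an assumption) decodes the species text to `str` rather than `bytes`.

```python
import io

import numpy as np

# Hypothetical stand-in for the downloaded iris.data: three rows, same layout
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# dtype=None lets genfromtxt infer a type per column; encoding='utf-8'
# decodes the text column to str instead of leaving it as bytes
iris_named = np.genfromtxt(sample, delimiter=',', dtype=None,
                           names=names, encoding='utf-8')

print(iris_named['species'])      # text column kept intact
print(iris_named['sepallength'])  # numeric column parsed as float
```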

Question 2: How to extract a particular column from a 1D array of tuples?

Note: Extract the text column species from the 1D iris array (iris_1d) imported below.


Input:

url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)


Solution
import numpy as np

# Input:
url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
print(iris_1d.shape)

# Solution:
species = np.array([row[4] for row in iris_1d])
species[:5]
#> (150,)
#> array([b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
#>        b'Iris-setosa'],
#>       dtype='|S18')
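Because `dtype=None` makes `genfromtxt` return a structured array with default field names `f0` to `f4`, the species column can also be selected by field name, with no Python loop. A sketch assuming a hand-built three-row stand-in for `iris_1d` (holding `str` values rather than the `bytes` the URL import produces):

```python
import numpy as np

# Hypothetical stand-in for iris_1d: a 1D structured array using the default
# field names (f0..f4) that genfromtxt assigns when dtype=None
iris_1d = np.array(
    [(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
     (7.0, 3.2, 4.7, 1.4, 'Iris-versicolor'),
     (6.3, 3.3, 6.0, 2.5, 'Iris-virginica')],
    dtype=[('f0', 'f8'), ('f1', 'f8'), ('f2', 'f8'), ('f3', 'f8'), ('f4', 'U15')],
)

# Field access pulls out the whole species column at once
species = iris_1d['f4']

# Same values as the row-by-row list comprehension
species_loop = np.array([row[4] for row in iris_1d])
print(species)
print(np.array_equal(species, species_loop))
```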

Question 3: How to convert a 1D array of tuples to a 2D NumPy array?

Note: Convert the 1D iris_1d to a 2D array iris_2d by omitting the species text field.


Input:

url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)


Solution
import numpy as np

# Input:
url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)

# Output:
# Method 1: Convert each record to a tuple with tolist() and keep the first 4 items
iris_2d = np.array([row.tolist()[:4] for row in iris_1d])
iris_2d[:4]

# Alt Method 2: Import only the first 4 columns from source url
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[:4]
#> array([[ 5.1,  3.5,  1.4,  0.2],
#>        [ 4.9,  3. ,  1.4,  0.2],
#>        [ 4.7,  3.2,  1.3,  0.2],
#>        [ 4.6,  3.1,  1.5,  0.2]])
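Method 1 loops over all records in Python. A sketch that instead loops over only the four numeric field names, assuming the same structured layout (`f0` to `f4`) in a small hand-built stand-in for `iris_1d`, and joins the columns with `np.column_stack`:

```python
import numpy as np

# Hypothetical stand-in for iris_1d (structured, default field names f0..f4)
iris_1d = np.array(
    [(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
     (7.0, 3.2, 4.7, 1.4, 'Iris-versicolor'),
     (6.3, 3.3, 6.0, 2.5, 'Iris-virginica')],
    dtype=[('f0', 'f8'), ('f1', 'f8'), ('f2', 'f8'), ('f3', 'f8'), ('f4', 'U15')],
)

# Take the first four (numeric) fields and stack them side by side as columns
numeric_fields = iris_1d.dtype.names[:4]          # ('f0', 'f1', 'f2', 'f3')
iris_2d = np.column_stack([iris_1d[name] for name in numeric_fields])

print(iris_2d.shape)   # (3, 4): one row per record, one column per field
```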

Question 4: How to compute the mean, median, standard deviation of a numpy array?

Note: Find the mean, median, and standard deviation of iris's sepallength (1st column).


Input

url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

    
Solution
import numpy as np

# Input
url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

# Output
mu, med, sd = np.mean(sepallength), np.median(sepallength), np.std(sepallength)
print(mu, med, sd)
#> 5.84333333333 5.8 0.825301291785
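`np.std` divides by n (the population standard deviation, `ddof=0`) unless `ddof=1` is passed. A small sketch with hypothetical values that computes each statistic by hand, so it can be checked against NumPy:

```python
import numpy as np

# Hypothetical values standing in for the sepallength column
x = np.array([4.9, 5.1, 5.8, 6.3, 7.0])
n = x.size

mean = x.sum() / n
median = np.sort(x)[n // 2]                             # odd n: the middle sorted value
sd_pop = np.sqrt(((x - mean) ** 2).sum() / n)           # np.std default (ddof=0)
sd_sample = np.sqrt(((x - mean) ** 2).sum() / (n - 1))  # same as np.std(x, ddof=1)

print(mean, median, sd_pop, sd_sample)
```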

Question 5: How to normalize an array so the values range exactly between 0 and 1?

Note: Create a normalized form of iris's sepallength whose values range exactly between 0 and 1, so that the minimum has value 0 and the maximum has value 1.


Input

url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])


Solution
import numpy as np

# Input
url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

# Output
Smax, Smin = sepallength.max(), sepallength.min()
S = (sepallength - Smin)/(Smax - Smin)

print(S)
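The same formula, S = (x - min) / (max - min), wrapped in a reusable function as a sketch. `np.ptp` returns max - min directly; the guard for a constant array (where the denominator is zero) is an added assumption, not part of the exercise:

```python
import numpy as np

def min_max_normalize(x):
    """Linearly rescale x so that its minimum becomes 0 and its maximum 1."""
    x = np.asarray(x, dtype=float)
    span = np.ptp(x)  # max - min, the denominator of the formula
    if span == 0:
        # Constant input would divide by zero; returning zeros is an assumed choice
        return np.zeros_like(x)
    return (x - x.min()) / span

print(min_max_normalize([4.3, 5.8, 7.9]))
print(min_max_normalize([5.0, 5.0, 5.0]))
```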

Question 6: How to find the percentile scores of a numpy array?

Note: Find the 5th and 95th percentile of iris's sepallength.


Input:

url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

    
Solution
import numpy as np

# Input
url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

# Output
np.percentile(sepallength, q=[5, 95])
#> array([ 4.6  ,  7.255])
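`np.percentile` uses linear interpolation by default: the q-th percentile sits at fractional position (n - 1) * q / 100 in the sorted data, between its two neighboring values. A sketch of that rule on hypothetical values, for comparison with NumPy:

```python
import numpy as np

def percentile_linear(x, q):
    """q-th percentile using linear interpolation between sorted values."""
    s = np.sort(np.asarray(x, dtype=float))
    pos = (s.size - 1) * q / 100.0   # fractional index into the sorted data
    lo = int(np.floor(pos))
    hi = int(np.ceil(pos))
    frac = pos - lo
    return s[lo] + (s[hi] - s[lo]) * frac

# Hypothetical sample values
x = np.array([4.4, 4.9, 5.0, 5.4, 5.8, 6.1, 6.3, 6.7, 7.2, 7.7])
print(percentile_linear(x, 5), np.percentile(x, 5))
print(percentile_linear(x, 95), np.percentile(x, 95))
```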

Question 7: How to insert values at random positions in an array?

Note: Insert np.nan values at 20 random positions in the iris_2d dataset.


# Input
url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')


Solution
import numpy as np

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')

# Method 1
i, j = np.where(iris_2d)

# i, j contain the row and column indices of all 750 (150 x 5) non-empty elements of iris_2d
np.random.seed(100)
iris_2d[np.random.choice(i, 20), np.random.choice(j, 20)] = np.nan

# Method 2 (an alternative to Method 1; it only targets the 4 numeric columns)
np.random.seed(100)
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

# Print first 10 rows
print(iris_2d[:10])
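Both methods above pick rows and columns independently and with replacement, so the same cell can be drawn more than once and fewer than 20 distinct cells may end up as nan. A sketch that guarantees exactly 20 distinct positions, assuming a float stand-in array (integer arrays cannot store nan) and the newer `np.random.default_rng` generator in place of `np.random.seed`:

```python
import numpy as np

rng = np.random.default_rng(100)   # seed value is arbitrary, for reproducibility

# Hypothetical stand-in for the four numeric iris columns (float, so it can hold nan)
data = np.arange(600, dtype=float).reshape(150, 4)

# Draw 20 distinct flat positions, then convert them to (row, col) indices
flat = rng.choice(data.size, size=20, replace=False)
rows, cols = np.unravel_index(flat, data.shape)
data[rows, cols] = np.nan

print(np.isnan(data).sum())   # always 20: no position is drawn twice
```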

Question 8: How to find the position of missing values in numpy array?

Note: Find the number and position of missing values in iris_2d's sepallength (1st column).


# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan


Solution
import numpy as np

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# No seed is set, so the nan positions (and the output below) vary between runs
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

# Solution
print("Number of missing values: \n", np.isnan(iris_2d[:, 0]).sum())
print("Position of missing values: \n", np.where(np.isnan(iris_2d[:, 0])))
#> Number of missing values: 
#>  5
#> Position of missing values: 
#>  (array([ 39,  88,  99, 130, 147]),)
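The same idea extends to every column at once. A sketch on a small hypothetical array: summing the `np.isnan` mask along `axis=0` counts missing values per column, and `np.argwhere` lists each missing value as a (row, column) pair:

```python
import numpy as np

# Small hypothetical array with a few missing values
data = np.array([[5.1, np.nan, 1.4],
                 [np.nan, 3.0, 1.4],
                 [4.7, 3.2, np.nan]])

mask = np.isnan(data)                        # True where a value is missing
per_column = mask.sum(axis=0)                # count of nan in each column
positions = np.argwhere(mask)                # one (row, col) pair per nan
first_col_rows = np.flatnonzero(mask[:, 0])  # row indices of nan in column 0

print(per_column)
print(positions)
print(first_col_rows)
```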

Question 9: How to filter a numpy array based on two or more conditions?

Note: Filter the rows of iris_2d that have petallength (3rd column) > 1.5 and sepallength (1st column) < 5.0.


# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])


Solution
import numpy as np

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

# Output
condition = (iris_2d[:, 2] > 1.5) & (iris_2d[:, 0] < 5.0)
iris_2d[condition]
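Parentheses around each comparison are required because `&` binds more tightly than `>` and `<`, and Python's `and` does not work element-wise on arrays at all. A sketch on hypothetical rows showing the equivalent `np.logical_and` form and the error `and` raises:

```python
import numpy as np

# Hypothetical rows in the iris_2d column layout
data = np.array([[4.8, 3.4, 1.6, 0.2],
                 [5.0, 3.0, 1.6, 0.2],
                 [4.9, 2.5, 4.5, 1.7],
                 [4.4, 2.9, 1.4, 0.2]])

# Each comparison needs its own parentheses: & binds more tightly than > and <
condition = (data[:, 2] > 1.5) & (data[:, 0] < 5.0)
same = np.logical_and(data[:, 2] > 1.5, data[:, 0] < 5.0)
print(data[condition])

# Python's `and` asks for a single truth value, which a multi-element array lacks
try:
    (data[:, 2] > 1.5) and (data[:, 0] < 5.0)
except ValueError as err:
    print("ValueError:", err)
```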

Question 10: How to drop rows that contain a missing value from a NumPy array?

Note: Select the rows of iris_2d that do not have any nan value.


# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])


Solution
import numpy as np

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

# Output
# No direct numpy function for this.
# Method 1: flag the rows that contain no nan
no_nan_in_row = np.array([~np.any(np.isnan(row)) for row in iris_2d])
iris_2d[no_nan_in_row][:5]

# Method 2: (By Rong)
iris_2d[np.sum(np.isnan(iris_2d), axis=1) == 0][:5]
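A further option, not from the original: call `.any(axis=1)` on the `np.isnan` mask to flag rows containing at least one nan, then invert it with `~`. Sketched on a small hypothetical array and compared with the row-sum approach of Method 2:

```python
import numpy as np

# Hypothetical array with nan in some rows
data = np.array([[5.1, 3.5, 1.4, 0.2],
                 [4.9, np.nan, 1.4, 0.2],
                 [4.7, 3.2, 1.3, 0.2],
                 [np.nan, 3.1, np.nan, 0.2]])

# any(axis=1) is True for rows with at least one nan; ~ keeps the other rows
keep = ~np.isnan(data).any(axis=1)
clean = data[keep]

# Same selection as counting nan per row and keeping rows with a count of 0
print(np.array_equal(clean, data[np.sum(np.isnan(data), axis=1) == 0]))
print(clean)
```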