dzone python training institute

Python NumPy Exercise

Question 1: How to import a dataset with numbers and text, keeping the text intact, in Python NumPy?

Note: Import the iris dataset keeping the text intact.

    

Solution

import numpy as np
url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Print the first 3 rows
iris[:3]
#> array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
#>        [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
#>        [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa']], dtype=object)
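Another way to keep the text intact is to let genfromtxt infer a type per column with dtype=None, which returns a 1D structured array in which the numeric columns stay numeric. The following is a minimal sketch, assuming a three-row inline sample (`sample_csv`, a hypothetical stand-in with the same column layout as the iris file) so that it runs without a download:

```python
import io
import numpy as np

# Hypothetical inline sample standing in for the iris file (assumption: same layout).
sample_csv = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
"""
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# dtype=None infers one type per column; encoding='utf-8' yields str rather than bytes.
iris_struct = np.genfromtxt(io.StringIO(sample_csv), delimiter=',',
                            dtype=None, names=names, encoding='utf-8')

print(iris_struct.dtype.names)            # field names taken from `names`
print(iris_struct['species'])             # text column, still text
print(iris_struct['sepallength'].mean())  # numeric column, usable directly
```

Compared with dtype='object', this form lets you run arithmetic on the numeric fields without converting them first.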

Question 2: How to extract a particular column from a 1D array of tuples?

Note: Extract the text column species from the 1D iris imported in previous question.


Input:

url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)


Solution

# Input:
url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
print(iris_1d.shape)
#> (150,)
# Solution:
species = np.array([row[4] for row in iris_1d])
species[:5]
#> array([b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
#>        b'Iris-setosa'],
#>       dtype='|S18')
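Because genfromtxt with dtype=None returns a structured array, the same column can also be read by field name; when no names are supplied, the fields are called f0, f1, and so on. A small sketch, assuming a hand-built three-row array in place of the downloaded iris_1d:

```python
import numpy as np

# Hypothetical stand-in for iris_1d: a 1D structured array with default field names.
iris_1d = np.array(
    [(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
     (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
     (7.0, 3.2, 4.7, 1.4, 'Iris-versicolor')],
    dtype=[('f0', 'f8'), ('f1', 'f8'), ('f2', 'f8'), ('f3', 'f8'), ('f4', 'U18')])

# The list comprehension from the solution ...
by_loop = np.array([row[4] for row in iris_1d])
# ... and the equivalent field lookup, which avoids the Python-level loop.
by_field = iris_1d['f4']

print(np.array_equal(by_loop, by_field))
```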

Question 3: How to convert a 1d array of tuples to a 2d numpy array?

Note: Convert the 1D iris to 2D array iris_2d by omitting the species text field.


Input:

url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)


Solution

# Input:
url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
# Output:
# Method 1: Convert each row to a list and get the first 4 items
iris_2d = np.array([row.tolist()[:4] for row in iris_1d])
iris_2d[:4]
# Alt Method 2: Import only the first 4 columns from source url
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[:4]
#> array([[ 5.1, 3.5, 1.4, 0.2],
#>        [ 4.9, 3. , 1.4, 0.2],
#>        [ 4.7, 3.2, 1.3, 0.2],
#>        [ 4.6, 3.1, 1.5, 0.2]])
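A third option, not part of the original solution, is numpy.lib.recfunctions.structured_to_unstructured, which converts selected fields of a structured array into a plain 2D array. A minimal sketch, assuming a hand-built three-row structured array in place of the downloaded iris_1d:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical stand-in for iris_1d with default field names f0..f4.
iris_1d = np.array(
    [(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
     (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
     (7.0, 3.2, 4.7, 1.4, 'Iris-versicolor')],
    dtype=[('f0', 'f8'), ('f1', 'f8'), ('f2', 'f8'), ('f3', 'f8'), ('f4', 'U18')])

# Select only the numeric fields, then flatten them into an ordinary 2D float array.
numeric_fields = ['f0', 'f1', 'f2', 'f3']
iris_2d = rfn.structured_to_unstructured(iris_1d[numeric_fields])

print(iris_2d.shape)
print(iris_2d.dtype)
```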

Question 4: How to compute the mean, median, standard deviation of a numpy array?

Note: Find the mean, median, and standard deviation of iris's sepallength (1st column).


Input

url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

    
        

Solution

# Input
url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
# Output
mu, med, sd = np.mean(sepallength), np.median(sepallength), np.std(sepallength)
print(mu, med, sd)
#> 5.84333333333 5.8 0.825301291785
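Note that np.std computes the population standard deviation by default (ddof=0); passing ddof=1 gives the sample standard deviation. The difference is easiest to see on a tiny array, as in this sketch with made-up values:

```python
import numpy as np

# Made-up values, chosen only to keep the arithmetic easy to check by hand.
values = np.array([1.0, 2.0, 3.0, 4.0])

pop_sd = np.std(values)             # divides by n     (ddof=0, the default)
sample_sd = np.std(values, ddof=1)  # divides by n - 1 (sample estimate)

print(np.mean(values), np.median(values), pop_sd, sample_sd)
```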

Question 5: How to normalize an array so the values range exactly between 0 and 1?

Note: Create a normalized form of iris's sepallength whose values range exactly between 0 and 1 so that the minimum has value 0 and maximum has value 1.


Input

url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])


Solution

# Input
url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
# Output
Smax, Smin = sepallength.max(), sepallength.min()
S = (sepallength - Smin)/(Smax - Smin)
print(S)
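The same min-max formula can be wrapped in a small function. The sketch below uses a hypothetical helper name, `min_max_scale`, and adds a guard for constant arrays, where max - min is zero and the division would otherwise fail:

```python
import numpy as np

def min_max_scale(x):
    """Hypothetical helper: rescale x linearly so the minimum maps to 0 and the maximum to 1."""
    x = np.asarray(x, dtype=float)
    span = np.ptp(x)  # peak-to-peak, i.e. max - min
    if span == 0:
        raise ValueError("cannot rescale a constant array")
    return (x - x.min()) / span

# Made-up sample values standing in for sepallength.
scaled = min_max_scale([4.3, 5.8, 7.9])
print(scaled)
```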

Question 6: How to find the percentile scores of a numpy array?

Note: Find the 5th and 95th percentile of iris's sepallength.


Input:

url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

    

Solution

# Input
url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
# Output
np.percentile(sepallength, q=[5, 95])
#> array([ 4.6 , 7.255])
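By default np.percentile interpolates linearly between neighbouring sorted values, which is why a result such as 7.255 need not appear in the data. This sketch, using made-up values, reproduces the default by hand: the q-th percentile sits at fractional rank q/100 * (n - 1) in the sorted array.

```python
import numpy as np

# Made-up values; already sorted to keep the ranks easy to follow.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
q = np.array([5, 95])

builtin = np.percentile(values, q)

# Manual version: find the fractional rank, then interpolate between neighbours.
ranks = q / 100 * (len(values) - 1)
manual = np.interp(ranks, np.arange(len(values)), np.sort(values))

print(builtin, manual)
```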

Question 7: How to insert values at random positions in an array?

Note: Insert np.nan values at 20 random positions in the iris_2d dataset.


# Input
url = 'https://pythontraining.dzone.co.in/tutorial/exercises/numpy/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')


Solution

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')
# Method 1
# i, j hold the row and column indices of the truthy (non-empty) elements of iris_2d
i, j = np.where(iris_2d)
np.random.seed(100)
iris_2d[np.random.choice(i, 20), np.random.choice(j, 20)] = np.nan
# Method 2
np.random.seed(100)
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
# Print first 10 rows
print(iris_2d[:10])
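Both methods draw row and column indices independently and with replacement, so the same cell can be picked twice and fewer than 20 distinct positions may end up as nan. If exactly 20 distinct cells are required, one option is to sample flat indices without replacement and convert them back to (row, column) pairs. A minimal sketch, assuming a made-up 150 x 4 float array in place of iris_2d:

```python
import numpy as np

rng = np.random.default_rng(100)

# Hypothetical stand-in for the numeric part of iris_2d (150 rows, 4 columns).
data = np.arange(600, dtype=float).reshape(150, 4)

# Sample 20 distinct flat positions, then map them back to row/column indices.
flat_idx = rng.choice(data.size, size=20, replace=False)
rows, cols = np.unravel_index(flat_idx, data.shape)
data[rows, cols] = np.nan

print(np.isnan(data).sum())  # 20, because the positions are distinct
```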

Question 8: How to find the position of missing values in numpy array?

Note: Find the number and position of missing values in iris_2d's sepallength (1st column).


# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan


Solution

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
# Solution
# No seed is set here, so the count and positions vary between runs.
print("Number of missing values: \n", np.isnan(iris_2d[:, 0]).sum())
print("Position of missing values: \n", np.where(np.isnan(iris_2d[:, 0])))
#> Number of missing values:
#>  5
#> Position of missing values:
#>  (array([ 39, 88, 99, 130, 147]),)
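For a reproducible check of the same technique, it helps to use a small column whose missing positions are known in advance. The sketch below uses made-up values and np.flatnonzero, which returns the positions as a plain 1D array rather than the tuple that np.where gives:

```python
import numpy as np

# Made-up column with nan at positions 1 and 3.
col = np.array([5.1, np.nan, 4.7, np.nan, 5.0])

mask = np.isnan(col)              # True where the value is missing
count = int(mask.sum())           # number of missing values
positions = np.flatnonzero(mask)  # indices of the missing values

print(count, positions)
```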

Question 9: How to filter a numpy array based on two or more conditions?

Note: Filter the rows of iris_2d that have petallength (3rd column) > 1.5 and sepallength (1st column) < 5.0.


# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])


Solution

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# Output
condition = (iris_2d[:, 2] > 1.5) & (iris_2d[:, 0] < 5.0)
iris_2d[condition]
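The parentheses around each comparison are required because & binds more tightly than > and <. The same pattern extends to "or" conditions with |, and np.logical_and.reduce is one way to combine three or more masks. A sketch on a made-up four-row array (the third condition on column 4 is purely illustrative):

```python
import numpy as np

# Made-up rows in the iris column order: sepallength, sepalwidth, petallength, petalwidth.
data = np.array([[4.8, 3.0, 1.4, 0.1],
                 [4.9, 2.4, 3.3, 1.0],
                 [5.0, 3.5, 1.6, 0.6],
                 [4.9, 2.5, 4.5, 1.7]])

both = (data[:, 2] > 1.5) & (data[:, 0] < 5.0)    # every condition must hold
either = (data[:, 2] > 1.5) | (data[:, 0] < 5.0)  # at least one condition holds
# Three conditions at once; the petalwidth threshold is an illustrative assumption.
three = np.logical_and.reduce([data[:, 2] > 1.5,
                               data[:, 0] < 5.0,
                               data[:, 3] > 1.5])

print(data[both])
print(data[three])
```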

Question 10: How to drop rows that contain a missing value from a numpy array?

Note: Select the rows of iris_2d that do not have any nan value.


# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])


Solution

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
# Output
# No direct numpy function for this.
# Method 1: build a mask that is True for rows without any nan
no_nan_in_row = np.array([~np.any(np.isnan(row)) for row in iris_2d])
iris_2d[no_nan_in_row][:5]
# Method 2: (By Rong)
iris_2d[np.sum(np.isnan(iris_2d), axis=1) == 0][:5]
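A fully vectorized variant of Method 1 reduces the nan mask along each row with any(axis=1) and inverts it, which avoids the Python-level loop. A minimal sketch on a made-up four-row array:

```python
import numpy as np

# Made-up data: rows 1 and 2 each contain one nan.
data = np.array([[5.1, 3.5, 1.4, 0.2],
                 [np.nan, 3.0, 1.4, 0.2],
                 [4.7, 3.2, np.nan, 0.2],
                 [4.6, 3.1, 1.5, 0.2]])

# True for rows in which no element is nan.
complete_rows = ~np.isnan(data).any(axis=1)
clean = data[complete_rows]

print(clean)
```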