dzone python training institute
Data Science Interview Questions And Answers
Name a few Python libraries used for data analysis and scientific computing.
  • NumPy
  • SciPy
  • Pandas
  • Scikit-learn
  • Matplotlib
  • Seaborn
Which library would you prefer for plotting in Python: Seaborn or Matplotlib? Matplotlib is the core Python plotting library, but it needs a lot of fine-tuning to make plots look polished. Seaborn, by contrast, lets data scientists create statistically meaningful and aesthetically appealing plots with less effort. The choice depends on the plotting requirements.
How can you check if a data set or time series is random? Use a lag plot. If the lag plot for the given dataset shows no visible structure, the data can be treated as random.
Which plot will you use to assess the uncertainty of a statistic? A bootstrap plot.
Which Python library is built on top of Matplotlib and Pandas to ease data plotting? Seaborn.
What is pylab? A Matplotlib module that brings matplotlib.pyplot and NumPy into a single namespace. Its use is now discouraged in favor of explicit imports.
Write the code to sort an array in NumPy by the nth column. This can be done with the argsort() function. Given a 2-D array x, the code to sort its rows by the nth column is x[x[:, n-1].argsort()]
Which Python library would you prefer to use for data munging? Pandas.
What is selection bias? Selection bias occurs when the sample obtained is not representative of the population intended to be analysed.
What are the different kernel functions in SVM? Four kernels are commonly used in SVM:
  • Linear Kernel
  • Polynomial Kernel
  • Radial basis Kernel
  • Sigmoid Kernel
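The four kernels listed above can be sketched in plain Python as follows. This is only an illustration of their usual textbook forms, not a full SVM. The parameter names `gamma`, `coef0` and `degree` and their default values are assumptions chosen for this example.

```python
import math


def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    # K(u, v) = u . v
    return dot(u, v)


def polynomial_kernel(u, v, degree=3, gamma=1.0, coef0=1.0):
    # K(u, v) = (gamma * u . v + coef0) ** degree
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    # K(u, v) = exp(-gamma * ||u - v||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    # K(u, v) = tanh(gamma * u . v + coef0)
    return math.tanh(gamma * dot(u, v) + coef0)


# Illustrative vectors (made up for this sketch).
u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v))
print(polynomial_kernel(u, v, degree=2))
print(rbf_kernel(u, v))
print(sigmoid_kernel(u, v))
```

Each function returns a similarity score for a pair of points. The radial basis kernel returns 1.0 when both points are identical and decays towards 0 as they move apart.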
What is pruning in a Decision Tree? Removing the sub-nodes of a decision node is called pruning. It is the opposite of splitting.
What is logistic regression? State an example. Logistic regression is a technique that predicts a binary outcome from a linear combination of predictor variables.

For example, suppose you want to predict whether a particular candidate will win a quiz or not. In this case, the result of the prediction is binary, i.e. 1 or 0 (win/lose).
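A minimal sketch of the prediction step for the quiz example, assuming a model that has already been fitted. The two features (hours of preparation and a past quiz score), the weights and the bias are hypothetical values chosen by hand for illustration, not results from real data.

```python
import math


def sigmoid(z):
    # Maps any real number to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    # Linear combination of the predictors, passed through the sigmoid.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(features, weights, bias, threshold=0.5):
    # 1 = win, 0 = lose.
    return 1 if predict_probability(features, weights, bias) >= threshold else 0


# Hypothetical, hand-picked coefficients (assumption, not fitted).
weights = [0.8, 0.05]  # hours of preparation, past quiz score
bias = -6.0

strong_candidate = [5.0, 70.0]
weak_candidate = [1.0, 40.0]

print(predict_probability(strong_candidate, weights, bias))
print(predict_label(strong_candidate, weights, bias))
print(predict_label(weak_candidate, weights, bias))
```

In practice the weights and bias would be learned from labelled data, for example with Scikit-learn, rather than set by hand.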
What are the important skills to have in Python with regard to data analysis? Here are some of the important skills needed when performing data analysis using Python.
  • Knowledge of built-in data types, especially lists, dictionaries, tuples, and sets.
  • Mastery of N-dimensional NumPy arrays.
  • Understanding of Pandas dataframes.
  • Ability to perform element-wise vector and matrix operations on NumPy arrays.
  • Familiarity with the Anaconda distribution and the conda package manager.
  • Knowledge of Scikit-learn.
  • Ability to write small, clean functions (important for developers), preferably pure functions that don’t alter objects.
  • Knowing how to profile the performance of a Python script and how to optimize bottlenecks.

A solid grasp of the above points will help you tackle most problems in data analytics and machine learning.
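As an illustration of the profiling skill in the list above, here is a minimal sketch using the standard library's cProfile and pstats modules. The function `slow_sum_of_squares` and the helper `profile_call` are hypothetical names made up for this example.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Deliberately naive: builds a full list before summing it.
    squares = [i * i for i in range(n)]
    return sum(squares)


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    # Collect the report in a string instead of printing it directly.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # limit to 5 entries
    return result, buffer.getvalue()


result, report = profile_call(slow_sum_of_squares, 100_000)
print(report)
```

The report lists the calls by cumulative time, which shows where a script spends its time before any optimization is attempted.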

How are True Positive Rate and Recall related? Write the equation. True Positive Rate = Recall. They are the same quantity, given by the formula TP / (TP + FN).
Is it possible to capture the correlation between a continuous and a categorical variable? If yes, how? Yes, the ANCOVA (analysis of covariance) technique can capture the association between a continuous and a categorical variable.
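To make the True Positive Rate formula TP / (TP + FN) concrete, here is a small sketch that counts true positives and false negatives from two label lists. The labels are made-up illustrative data, and `recall` is a hypothetical helper written for this example.

```python
def recall(y_true, y_pred):
    # True positives: actual 1 predicted as 1.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # False negatives: actual 1 predicted as 0.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive examples")
    return tp / (tp + fn)


# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

print(recall(y_true, y_pred))  # 3 true positives, 1 false negative -> 0.75
```

False positives (here the fifth prediction) do not enter the formula, which is why recall alone says nothing about how many negatives were misclassified.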

 
