Python Practice Problems For Data Interview


Problem Statement: Write a Python function that computes the dot product of a matrix and a vector. The function should return a list representing the resulting vector, or -1 if the number of matrix columns does not match the length of the vector.

import numpy as np  # Import the NumPy library for numerical operations

def matrix_vector_dot(matrix, vector):
    """
    Computes the dot product of a matrix and a vector.

    Parameters:
    matrix (list of lists): The input matrix.
    vector (list): The input vector.

    Returns:
    list: The resulting vector after the dot product, or -1 if the
    number of matrix columns does not match the vector length.
    """
    if len(matrix[0]) != len(vector):
        return -1  # Incompatible dimensions
    return np.dot(matrix, vector).tolist()  # Compute the dot product and convert the result to a list
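
To make explicit what np.dot computes here, the same product can be sketched in plain Python. The name matrix_vector_dot_pure is hypothetical, and the sketch assumes a non-empty matrix whose rows all have the same length. It keeps the -1 convention for mismatched dimensions.

```python
def matrix_vector_dot_pure(matrix, vector):
    # Hypothetical NumPy-free variant, for illustration only.
    # Assumes matrix is a non-empty list of equal-length rows.
    if len(matrix[0]) != len(vector):
        return -1  # Same mismatch convention as the NumPy version
    # Each output entry is the dot product of one row with the vector
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


result = matrix_vector_dot_pure([[1, 2], [3, 4]], [5, 6])
print(result)  # [17, 39]
```

Each entry of the result is one row of the matrix multiplied element by element with the vector and summed: 1*5 + 2*6 = 17 and 3*5 + 4*6 = 39.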

Problem Statement: Write a Python function that multiplies a matrix by a scalar and returns the result.

import numpy as np  # Import the NumPy library for numerical operations

def scalar_multiply(matrix, scalar):
    """
    Multiplies each element of the matrix by a scalar.

    Parameters:
    matrix (list of lists): The input matrix.
    scalar (float): The scalar value to multiply with.

    Returns:
    list of lists: The resulting matrix after scalar multiplication.
    """
    return (np.array(matrix) * scalar).tolist()  # Convert matrix to NumPy array, multiply by scalar, and convert back to list
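
For comparison, a minimal sketch of the same operation without NumPy, using a nested list comprehension. The name scalar_multiply_pure is hypothetical and the input is assumed to be a list of lists of numbers.

```python
def scalar_multiply_pure(matrix, scalar):
    # Hypothetical NumPy-free variant, for illustration only.
    # Multiply every element of every row by the scalar.
    return [[element * scalar for element in row] for row in matrix]


doubled = scalar_multiply_pure([[1, 2], [3, 4]], 2)
print(doubled)  # [[2, 4], [6, 8]]
```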

Problem Statement: Write a Python function that uses the Jacobi method to solve a system of linear equations given by Ax = b. The function should iterate n times.

import numpy as np  # Import the NumPy library for numerical operations

def jacobi_method(A, b, n):
    """
    Solves the system of linear equations Ax = b using the Jacobi method.

    Parameters:
    A (list of lists): Coefficient matrix.
    b (list): Constant terms vector.
    n (int): Number of iterations.

    Returns:
    list: Approximate solution vector after n iterations.
    """
    A = np.asarray(A, dtype=float)  # Accept lists or arrays; work in floating point
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)  # Initialize the solution vector with zeros
    D = np.diag(A)  # Extract the diagonal elements of A
    R = A - np.diagflat(D)  # Compute the remainder matrix (A without the diagonal)
    for _ in range(n):
        x = (b - np.dot(R, x)) / D  # Update the solution vector
    return x.tolist()  # Convert the result to a list and return
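
One way to sanity-check a Jacobi implementation is to compare it with a direct solver on a small system. The sketch below is self-contained: jacobi_sketch is a hypothetical helper that mirrors the update rule x = (b - Rx) / D, and the 2x2 system is an arbitrary example chosen to be strictly diagonally dominant, a condition under which the Jacobi iteration is known to converge.

```python
import numpy as np


def jacobi_sketch(A, b, n):
    # Hypothetical helper for illustration; mirrors the update x = (b - R x) / D
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)        # Start from the zero vector
    D = np.diag(A)              # Diagonal entries of A (1-D array)
    R = A - np.diagflat(D)      # Off-diagonal part of A
    for _ in range(n):
        x = (b - R @ x) / D     # One Jacobi sweep
    return x


# Arbitrary example system; each diagonal entry dominates its row
A = [[2, 1], [5, 7]]
b = [11, 13]

approx = jacobi_sketch(A, b, 100)
exact = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
print(approx, exact)
```

If A is not diagonally dominant, the iterates may diverge, so a comparison like this is only meaningful on systems where convergence is expected.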

Problem Statement: Write a Python function that performs linear regression using gradient descent. The function should take NumPy arrays X (features with a column of ones for the intercept) and y (target values), a learning rate, and the number of iterations.

import numpy as np  # Import the NumPy library for numerical operations

def linear_regression(X, y, learning_rate, iterations):
    """
    Performs linear regression using gradient descent.

    Parameters:
    X (numpy.ndarray): Feature matrix with a column of ones for the intercept.
    y (numpy.ndarray): Target values vector.
    learning_rate (float): Learning rate for gradient descent.
    iterations (int): Number of iterations.

    Returns:
    numpy.ndarray: Coefficients vector after training.
    """
    m, n = X.shape  # Get the number of samples (m) and features (n)
    theta = np.zeros(n)  # Initialize the coefficients vector with zeros
    for _ in range(iterations):
        gradient = np.dot(X.T, np.dot(X, theta) - y) / m  # Compute the gradient
        theta -= learning_rate * gradient  # Update the coefficients
    return theta  # Return the trained coefficients
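
A quick way to check a gradient descent fit is to compare it with the closed-form least-squares solution. The sketch below is self-contained: linear_regression_sketch is a hypothetical helper using the same update rule, and the data are synthetic and noise-free (y = 1 + 2x), so both methods should recover an intercept near 1 and a slope near 2. The learning rate and iteration count are assumptions that happen to work for this data, not general recommendations.

```python
import numpy as np


def linear_regression_sketch(X, y, learning_rate, iterations):
    # Hypothetical helper for illustration; batch gradient descent on mean squared error
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / m   # Gradient of (1 / 2m) * ||X theta - y||^2
        theta -= learning_rate * gradient
    return theta


# Synthetic, noise-free data: y = 1 + 2x
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])      # Column of ones for the intercept
y = 1.0 + 2.0 * x

theta_gd = linear_regression_sketch(X, y, learning_rate=0.5, iterations=2000)
theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # Closed-form least squares
print(theta_gd, theta_ls)
```

A learning rate that is too large makes the updates diverge, while one that is too small needs many more iterations to reach the same accuracy.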

Problem Statement: Write a Python function that implements the k-Means clustering algorithm. The function should take the data points, the number of clusters k, and the number of iterations, and return a list of cluster assignments, one per data point.

import numpy as np  # Import the NumPy library for numerical operations

def k_means_clustering(data, k, iterations):
    """
    Performs k-Means clustering on the given data.

    Parameters:
    data (numpy.ndarray): The input data points.
    k (int): Number of clusters.
    iterations (int): Number of iterations.

    Returns:
    list: Cluster assignments for each data point.
    """
    data = np.asarray(data, dtype=float)  # Work in floating point so centroid means are not truncated
    centroids = data[np.random.choice(len(data), k, replace=False)]  # Initialize centroids randomly
    labels = np.zeros(len(data), dtype=int)  # Default assignment in case iterations is 0
    for _ in range(iterations):
        distances = np.linalg.norm(data[:, np.newaxis] - centroids, axis=2)  # Compute distances to centroids
        labels = np.argmin(distances, axis=1)  # Assign clusters based on closest centroid
        for i in range(k):
            points = data[labels == i]  # Get all points assigned to cluster i
            if points.size:
                centroids[i] = points.mean(axis=0)  # Update centroid to mean of assigned points
    return labels.tolist()  # Convert the result to a list and return
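
Because the initial centroids are chosen at random, the label numbers themselves are arbitrary; what can be checked is which points end up grouped together. The sketch below is self-contained: k_means_sketch is a hypothetical variant that takes a seed for reproducibility, and the six points are a made-up example with two well-separated groups.

```python
import numpy as np


def k_means_sketch(data, k, iterations, seed=0):
    # Hypothetical seeded variant, for illustration only
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Pick k distinct data points as the starting centroids
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.zeros(len(data), dtype=int)
    for _ in range(iterations):
        # Distance from every point to every centroid, shape (num_points, k)
        distances = np.linalg.norm(data[:, np.newaxis] - centroids, axis=2)
        labels = np.argmin(distances, axis=1)
        for i in range(k):
            points = data[labels == i]
            if points.size:  # Leave the centroid unchanged if its cluster is empty
                centroids[i] = points.mean(axis=0)
    return labels.tolist()


# Made-up example: three points near the origin, three near (10, 10)
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = k_means_sketch(points, k=2, iterations=10)
print(labels)
```

The first three points should share one label and the last three the other; which group receives label 0 depends on the random initialization.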
