113 data science interview questions to nail your onsite – 2023 update

Updated in 2023

Get all of the practice you need for your upcoming data science onsite interview so that you can turn that interview into an offer. We have helped 800+ people land great jobs in tech, so we wanted to share some of our inside information and data-backed tips to help you, too.

Check out our list of 113 data science interview questions from top tech companies so you can practice and go into your sessions with confidence.

Data science interview questions

Statistics

  1. Accenture question – What is linear regression?
  2. Google question – Find the width of the confidence interval
  3. What is your familiarity with statistical methods and past projects?
  4. Airbnb question – How would you report statistical results to non-statistician staff?
  5. Netflix question – When you split a population for A/B testing, what are some reasons you could see a significant difference in the control and variant groups?
  6. Apple question – What is the bias-variance tradeoff? How does XGBoost handle it?
  7. Explain a probability distribution that is not normal.
  8. Google question – If two predictors are highly correlated, what is the effect on the coefficients in the logistic regression? What are the confidence intervals of the coefficients?
  9. When using a Gaussian mixture model, how do you know it is applicable?
  10. IBM question – What are the relationships between the coefficient in the logistic regression and the odds ratio?
  11. What are different metrics to classify a dataset?
  12. How do you find an anomaly in a distribution? How do you investigate whether a certain trend in a distribution is due to an anomaly?
  13. Explain the concept of multicollinearity

Probability

  1. Facebook question – If you draw 2 cards from a shuffled 52-card deck, what is the probability that you’ll have a pair?
  2. Given an unfair coin with the probability of heads not equal to 0.5, what algorithm could you use to create a list of random 1s and 0s?
  3. Groupon question – You are on a number line and you can jump to one of the neighboring points with equal probability, with the exception of n=0 where you can’t go to negative numbers but have to come back to n=1. If you start at n=44, what is the expected number of steps to reach n=4444?
  4. CA Technologies question – How do you get an estimate of the answer using Taylor expansion?
  5. Microsoft question – Generate 7 integers with equal probability from a function which returns 1/0 with probability p and (1-p).
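
For the unfair-coin question above, one commonly cited approach is the von Neumann trick: flip the coin twice, keep the first flip if the two results differ, and discard the pair otherwise. Heads-then-tails and tails-then-heads are equally likely, so the kept bit is fair. The sketch below is a minimal illustration, not a definitive answer. The helper biased_flip is a hypothetical stand-in for the unfair coin, and the bias of 0.8 and the seed are assumptions chosen only for the demonstration.

```python
import random


def biased_flip(p_heads, rng):
    """Hypothetical unfair coin: returns 1 (heads) with probability p_heads."""
    return 1 if rng.random() < p_heads else 0


def fair_bit(p_heads, rng):
    """Von Neumann trick: flip twice and keep the first flip only if they differ."""
    while True:
        first = biased_flip(p_heads, rng)
        second = biased_flip(p_heads, rng)
        if first != second:
            # P(1 then 0) == P(0 then 1) == p * (1 - p), so this bit is unbiased.
            return first


def fair_bits(n, p_heads=0.8, seed=0):
    """Return a list of n fair bits built from the biased coin."""
    rng = random.Random(seed)
    return [fair_bit(p_heads, rng) for _ in range(n)]
```

In an interview, it is worth mentioning the cost: the more biased the coin, the more pairs get discarded before a bit is produced.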

Case study questions

  1. How would you measure the impact of introducing a new tool for partners?
  2. CA Technologies question – How do you design an algorithm for fraud detection?
  3. Twitter question – What features would you use to build a recommendation algorithm for users?
  4. Accenture question – Map an organization’s problem to data science. How will you solve it using data science and machine learning?
  5. Uber question – How much would it cost (initial and sustaining costs) to have a fleet of vehicles take Google Street View photos of every major city in the US every day?
  6. Airbnb question – Brainstorm potential causes of an anomaly in web traffic data.
  7. An important metric goes down. How would you dig into the causes?
  8. Amazon question – Estimate the cumulative sum of the top 10 most profitable products of the last 6 months for customers in Seattle.
  9. How do you deal with unbalanced data where the ratio of positive and negative is huge?
  10. Booking question – How can we automatically propose ‘good value deals’ to customers, including hotels that don’t have a rating yet?
  11. If you have a customer and want to decide whether they will “buy today” or “not buy today” and you know 1. where they live, 2. their income, 3. their gender, 4. their profession, how would you define a machine learning algorithm to figure this out?
  12. LinkedIn question – Come up with some of the factors that could be used to produce certain algorithms (‘people you may know,’ and an algorithm to discover when a person is starting to search for a new job).

More case study questions

  1. Booking question – Each hotel submits a short description. How do we figure out if it’s worth translating into another language?
  2. How can you optimize/increase the number of languages a customer service department is able to serve? The constraint is to maintain the same quality as before with the same budget and same number of customer representatives as before.
  3. Expedia question – Develop a solution for the revenue optimization team using a structured dataset that describes the historical bookings of their hotels, which had the following attributes: number of people, booking times, arrival times, departure times, hotel features, prices, whether booked or not, etc.
  4. Salesforce question – How would you build a classifier to predict the outcome of NFL games in real time?
  5. Imagine we see a lot of users filling out a form but not submitting it. Why would this be the case, and how would you use data to figure it out?
  6. Intel question – Given measurements of acceleration taken from a wristband, with second-by-second acceleration along the X, Y, and Z axes, how would you predict if the person wearing the band is sitting, walking, or just standing?
  7. LinkedIn question – How would you design an A/B test for the homepage?
  8. eBay question – eBay has to distinguish cameras from similar items like tripods, cables, and batteries. What is the approach? (Data is title, description of the product, price, image, etc.)
  9. How many lines do you think a user’s daily login table has?
  10. Netflix question – Given a month’s worth of login data from Netflix such as account_id, device_id, and metadata concerning payments, how would you detect fraud? (identity theft, payment fraud, etc.)
  11. SAP question – How would you design a recommendation system for customers, considering that a single customer may use many devices to log onto a single account?

Final case study questions

  1. Slack question – How would you prioritize which country to expand Slack to for furthering the international effort?
  2. LinkedIn question – What product metrics do you construct? How do you tell if your experiment is successful?
  3. Stripe question – How would you choose between the subscription and the marketplace-based options, i.e., evaluate which would be better for the business in the long run?
  4. Booking question – How would you tag a listing as value for money? How would you measure the “value”? What features would you select to explain the “value”?
  5. Intuit question – How would you design a ranking system?
  6. Apple question – How do you take millions of users with hundreds of transactions each, across tens of thousands of products, and group the users into meaningful segments?
  7. Facebook question – How many high schools that people have listed on their profiles are real? How do we find out, and deploy at scale, a way of finding invalid schools?
  8. Uber question – If you were rolling out Uber ride passes for the first time, how would you set the prices?
  9. We have a product that is getting used differently by two different groups. What is your hypothesis about why and how would you go about testing it?
  10. Uber question – Explain how network effects might influence your choice of how to assign experimental/control units and measure your main outcome metrics
  11. What trends in the data indicate that a given market is healthy? What does price tell you?  

SQL and databases

  1. Twitter question – How can you illustrate a tree-based system with a SQL query?
  2. Dell question – What is indexing in a database?
  3. Pinterest question – Write a SQL query to count the number of unique users per day who logged in from both an iPhone and the web, where iPhone logs and web logs are in distinct relations.
  4. Spotify question – Given a sample set of tables, write a SQL query to get a summary metric from those tables.
  5. Facebook question – Given a series of tables, write the SQL code you would need to count subpopulations through joins.
  6. If you have a table with a billion rows, how would you add a column inserting data from the original source without affecting the user experience?
  7. Facebook question – There is a table that tracks every time a user turns a feature on or off, with columns for user_id, action (“on” or “off”), date, and time. How many users turned the feature on today? How many users have ever turned the feature on? In a table that tracks the status of every user every day, how would you add today’s data to it?
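
For the feature on/off question above, the first two counts can be sketched with Python's built-in sqlite3 module. This is one possible reading of the question, not a model answer. The table name feature_log, the column types, and the use of ISO date strings are assumptions made for illustration; the question itself only names the columns.

```python
import sqlite3


def count_feature_on(rows, today):
    """Return (users who turned the feature on today, users who ever turned it on).

    rows is an iterable of (user_id, action, date, time) tuples.
    The table name and schema are assumptions for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature_log (user_id INTEGER, action TEXT, date TEXT, time TEXT)"
    )
    conn.executemany("INSERT INTO feature_log VALUES (?, ?, ?, ?)", rows)

    # DISTINCT matters: a user may toggle the feature on several times in one day.
    on_today = conn.execute(
        "SELECT COUNT(DISTINCT user_id) FROM feature_log "
        "WHERE action = 'on' AND date = ?",
        (today,),
    ).fetchone()[0]

    ever_on = conn.execute(
        "SELECT COUNT(DISTINCT user_id) FROM feature_log WHERE action = 'on'"
    ).fetchone()[0]

    conn.close()
    return on_today, ever_on
```

A good clarifying question here is whether "turned the feature on today" should count a user who turned it on and then off again the same day; the sketch counts them.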

Programming

  1. Check if an integer is a palindrome (do not convert the integer to string)
  2. Adobe question – What kind of coding language do you use when handling a large-scale dataset?
  3. How would you impute missing information?
  4. Amazon question – Write a Python function that displays the first n Fibonacci numbers.
  5. Write Python code to return the count of words in a string
  6. Cisco question – Merge 2 sorted linked lists
  7. Rakuten question – Write a function that finds the MST of a directed graph.
  8. Clone a graph
  9. eBay question – Given a function roll() that uniformly returns a double between 0 and 1 and an array/list of numbers of length N (no duplicates), create a function shuffle() that returns a permutation of equal probability.
  10. Given 2 sorted arrays of integers, code to find a number from each array such that their sum is closest to some integer K
  11. How would you create/design/implement a certain algorithm from start to end?
  12. LinkedIn question – Given a random generator that produces a number 1 to 5 uniformly, write a function that produces a number from 1 to 7 uniformly
  13. Generate a sorted vector from two sorted vectors.
  14. Uber question – Given a random Bernoulli trial generator, write a function to return a value sampled from a normal distribution.

More programming questions

  1. Groupon question – How do you write a sqrt function without using sqrt()?
  2. Given 2 sorted arrays, merge them into 1 array. If the first array has enough space to hold both, how do you merge the 2 without using extra space?
  3. Apple question – Find the index at which the sum of the elements to the left equals the sum of the elements to the right.
  4. HP question – What is polymorphism and encapsulation in OOP?
  5. Salesforce question – What is the computational complexity of finding the most frequent word in a document?
  6. How would you improve the complexity of a list merging algorithm from quadratic to linear?
  7. IBM question – Given a subset of daily sales and sellers, find the subset that identifies those with the highest daily sales average.
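
As an illustration of the kind of answer these questions look for, here is a minimal sketch for the palindrome question in the Programming list above. It reverses the digits arithmetically, so the integer is never converted to a string. Treating negative numbers as non-palindromes is an assumption; it is worth confirming with the interviewer.

```python
def is_palindrome(n: int) -> bool:
    """Check whether an integer reads the same forwards and backwards."""
    if n < 0:
        # Assumption: the minus sign means negatives are never palindromes.
        return False

    original = n
    reversed_n = 0
    while n > 0:
        # Peel off the last digit and append it to the reversed number.
        reversed_n = reversed_n * 10 + n % 10
        n //= 10

    return original == reversed_n
```

The loop runs once per digit, so the running time grows with the number of digits rather than with the value of the integer.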

Modeling

  1. Airbnb question – Does the practice of removing missing values cause bias? If so, what would you do?
  2. What are the degrees of freedom for lasso?
  3. What is cross validation?
  4. Amazon question – What types of regularization exist? Which one is simpler to use?
  5. What is a time series model and how do you do the calculation of ACF and PACF?
  6. Booking question – How would you create an attribution model?
  7. What would you do if the relation between outcome and features is not linear? How do you validate the model you built? Design and describe an experiment to confirm that the method you developed is a good one.
  8. Dell question – What is dimensionality reduction?
  9. IBM question – How do you validate a machine learning model?
  10. What is a propensity model and how are beta estimates calculated by MLE?
  11. Dropbox question – How would you set up a propensity model for the SMB team looking at companies between 5-200 employees?
  12. Adobe question – What is the difference between logit and probit models?
  13. eBay question – Suggest a modeling process for a binary classification task with skewed and unbalanced data.
  14. Build a model to identify customers interested in receiving ad emails.
  15. Google question – If the labels are known in the clustering project, how do you evaluate the performance of the model?
  16. How do you evaluate the performance of a regression prediction model as opposed to a classification prediction model?
  17. Microsoft question – How would you explain a deep learning model to customers?
  18. FICO question – What is a distribution you may use to model data whose range of input values is [0, N]?
  19. How do you measure and compare models? For example, the pros and cons of Random Forest vs. Logistic Regression?

More modeling questions

  1. Netflix question – How should we approach attribution modeling to measure marketing effectiveness?
  2. Oracle question – Describe random forest to your grandmother
  3. TripAdvisor question – How do you evaluate a classifier and how do you select features?
  4. ServiceNow question – What are the ways to transform a numeric predictor to a categorical one and vice versa?
  5. What’s the difference between supervised and unsupervised machine learning?
  6. Intuit question – How does boosting work?
  7. Amazon question – What are hyperparameters, how do you tune them, how do you test them, and how do you know if they worked for the particular problem?
  8. What is overfitting? How do you avoid it?
  9. Rakuten question – How could you contribute to the team with quantitative modeling? Present the answer with details.
  10. What is bagging?
  11. Expedia question – What is the difference between LSTM and RNN?
  12. How do you choose kernels for an SVM?
  13. Netflix question – How would you build and test a metric to compare two users’ ranked lists of movie/TV show preferences?
  14. Microsoft question – What is the ROC curve and the meaning of sensitivity, specificity, confusion matrix?

Our mentors at Pathrise work with smart and accomplished data scientists all of the time, so we know where the pitfalls are during interviews. Sometimes, these candidates let nerves get the best of them and struggle with the questions, even if they know how to solve the problems. So, here are some additional tips to help you once you get into the room.

Always start with clarifying questions

Sometimes, interviewers make a question intentionally vague as a way to test your problem-solving skills. Especially for case study questions, it’s important to clearly define the business use case and metric. For example, if a company asked you to investigate “why sign-up rates have declined,” you can ask questions such as:

1) “Over what time period did the decline happen and during which months?”

2) “How are we defining sign-up rate? What are the numerator and denominator?”

Proactively show positive signal

While you’re working, proactively offer 30-second “tidbits” of knowledge. This is a strong tactic because it both reduces the opportunities for negative signal and gives the interviewer a sense of your knowledge. Just make sure you are confident in what you mention so it doesn’t come back to bite you.

Make context statements

Context statements are the difference between simply doing something and explaining your reasoning before or as you do it. Adding context helps interviewers interpret your work. So, try to provide the rationale behind your actions so that your interviewer knows why you are making each choice, especially where the right choice is a matter of opinion.

Know how to get help

AKA – getting a hint. Some interviewers really hate the word “hint,” so a better approach is to say something like, “My assumptions are X and Y, and I’m thinking of doing Z, but I’m struggling with [specific problem].” You can also ask collaborative questions like:

  • I was wondering if you had any thoughts.
  • Do you think I’m going in the right direction?
  • Do you think my assumptions are incorrect?

Understand when to ask permission questions

Every interviewer will have different preferences. At key decision points where the interviewer may have a preference, ask for permission before assuming that an action is appropriate. These can be questions like, “Can I Google the syntax?” or “Is it okay if I write some thoughts down on paper?” It’s also better to ask closed questions such as “Should I use this solution or think of something more optimal?” rather than open ones like “What should I do next?”

With these questions & tips in your back pocket, you should be more than prepared for your next data science technical onsite interview. For more help with your data science job search, check out our guide to landing a data science job.

Pathrise is a career accelerator that works with students and professionals 1-on-1 so they can land their dream job in tech. With these tips and guidance, fellows have seen their interview performance scores double.

If you want to work with any of our mentors 1-on-1 to get help with your data science interviews or with any other aspect of the job search, become a Pathrise fellow.

Apply today.

Alex MacPherson

Hi, I'm Alex! Since graduating from UC Berkeley in 2019, I have worked on the growth team for Pathrise, helping job seekers hone their skills to land their dream role through curated content on interview prep, resume building, and more.
