Whenever and wherever the first data scientist job interview took place, it’s probably safe to say that both the employer and the candidates involved were largely winging it.
Unlike more traditional business roles in sales, marketing or finance, the data scientist role emerged from a need to answer questions organizations may never have asked before. These included what kind of unstructured or “big data” they had at their disposal, what insights it held about customer behavior and, most importantly, how they should act on it.
Like many other jobs born in the digital age, the data scientist role lacked a standardized description of key responsibilities, background qualifications or attitudes. Some aspects of the job depended on the unique needs of each business.
Jaydeep Chakraborty was among those early data scientists, starting out at consulting companies and, more recently, working in the financial services space. Now, as an advisor with the Toronto School of Management (TSoM), he works with many students through TSoM’s Diploma in Data Analytics Co-op, showing them how they can help organizations do more with their data.
“When I started, what a data scientist was supposed to do encompassed data management, predictive model building and communication with the stakeholders,” he said. “Today, the technologies have evolved so much that each of these aspects has become a role in itself. Data management is a piece in itself. Model building is a piece in itself. There is so much more work to do.”
Chakraborty has gone on to hire many data scientists himself. He suggested thinking of the role as “custodians of insight generation” – people who can not only use basic data visualization tools to illustrate high-level trends but also use coding skills to create machine learning algorithms that unearth deeper opportunities for business transformation.
As organizations mature in their use of data, the process of hiring data scientists has also become more consistent, Chakraborty said. He outlined several stages that those pursuing a data science career should prepare for, including some that go beyond the question-and-answer interview typical of hiring for more traditional roles.
Before The Data Scientist Job Interview, Part One: The Hackathon
Much like a talent agent might watch a musician perform on stage before signing them to a record label, companies often like to see prospective data scientists putting their skills into action first.
The most common form of this is a hackathon, where participants are given data science problems loosely representative of what they will tackle on the job.
A hackathon tells a hiring manager several things at once, Chakraborty said: how you can solve problems and what you can accomplish under considerable time pressure. Those participating in a hackathon might be given three to five problems to solve in a single hour, for instance.
Although it’s probably impossible to get through all of them, solving two would be good, he said. Solving three would be an even greater indicator of what a data scientist could accomplish.
In a sense, Chakraborty said organizations may treat hackathons as a sort of “elimination round” in the data scientist recruiting process, especially if they get a significant number of applications.
Before The Data Scientist Job Interview, Part Two: The Case Study
Training in areas such as programming, multivariable calculus and linear algebra may all provide foundational skills for data scientists to get through a hackathon. However, employers need to see more than technical ability to know if someone is right for a data scientist job.
Just as important, those seeking a career in data science need to show they can relate what they learn to others in the business. This is where case studies come in.
While case studies – where students look at real-world scenarios to test their knowledge – have been a staple of MBA programs for years, they look a little different for data scientists. Organizations offer not only a problem statement based on a business challenge they are currently facing or have confronted in the past, but also a sample data set.
Aspects of the data might be masked out of consideration for confidentiality and risk management, but otherwise candidates are armed with the same raw material as those currently in the field. After coming up with a solution to the problem statement, candidates are asked to present it.
“Here, the candidate is evaluated not only on their problem solving ability, but also the ability to relate to the business case and provide recommendations,” Chakraborty said.
“Now I want to think beyond that. I want to see, okay, is the guy able to relate things to the business?”
“Tell a story, rather than just come out with statistical jargon. The business doesn’t care about how good a model is. They want to know whether you’re telling them something that will guide them in taking action.”
Going through this stage will tell hiring managers a lot before they even begin with more standard data scientist job interview questions. It should also offer candidates a lot more context about the environment in which their potential employer is operating. You should use that in answering the questions that follow, showing you have identified some of the biggest priorities and objectives and how data science will contribute to addressing them.
Data Science Interview Questions
When companies are hiring for what Chakraborty called a “level one” data scientist, they know candidates might not walk in the door with a lot of experience under their belt. However, their training should give them enough technical background to answer common questions such as the following:
- What do you see as the key distinction between data science and application development?
- How would you define the difference between supervised and unsupervised learning?
- Why might you use a decision tree, and how would you create one?
- What are some key benefits of data sampling and how would you approach it?
- What are some best practices for data cleansing/wrangling before you apply a machine learning algorithm?
- What steps would you take to develop a logistic regression model?
- Where might you need to use linear regression instead?
- How do you distinguish between Type I and Type II errors?
- Why are true-positive and false-positive rates important, and where might you use them?
- How would you apply the concept of the 80/20 rule to model validation?
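Two of these questions, on error rates and the 80/20 rule, can be illustrated concretely. The sketch below is a minimal, stdlib-only Python example, assuming the 80/20 rule refers to holding out 20% of the data for testing and assuming binary labels coded as 1 (positive) and 0 (negative). The function names are hypothetical and for illustration only; in practice, libraries such as scikit-learn provide their own versions of both steps.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split_80_20(rows: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle a copy of the rows, keep 80% for training, 20% for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def error_rates(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: confusion-matrix counts and rates for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # Type I errors (false alarms)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # Type II errors (misses)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return {
        "type_i_errors": fp,
        "type_ii_errors": fn,
        # Share of actual positives the model caught.
        "true_positive_rate": tp / (tp + fn) if tp + fn else 0.0,
        # Share of actual negatives the model wrongly flagged.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


train, test = train_test_split_80_20(range(100))
print(len(train), len(test))  # 80 20

actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(error_rates(actual, predicted))
# {'type_i_errors': 1, 'type_ii_errors': 1, 'true_positive_rate': 0.75, 'false_positive_rate': 0.25}
```

A fixed seed keeps the holdout set stable between runs, so a model's test-set error rates can be compared fairly as it is revised.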
Some of the other initial questions might be about which data science courses you’ve taken (or are in the process of taking), or what projects you’re working on that are pushing the limits of your existing knowledge.
In other words, rather than focusing solely on what a potential data scientist already knows, employers want to get a better read on what you want to learn and how proactive you are in learning it.
This leads naturally into some of the other common questions. The following list is not exhaustive but provides a good starting point in preparing to seek data science career opportunities.
1. What is the most interesting problem you’ve solved?
At first glance, this may sound similar to a higher-level question asked in job interviews for other roles. It’s not. Hiring managers will want to hear a full overview of the business challenge, the data set involved and how you brought insights back to stakeholders, from beginning to end.
In some cases, the data set for a problem and even the code used to analyze it may have already been developed by someone else. That means you should focus on the specific techniques you used to tackle the data and deliver business value.
2. What would be your next step?
Business problems involving data science are rarely once-and-done. They need to be continuously developed as business conditions change, new information comes to light, or both.
Hiring managers may frame this as a question about a particular variable in the data set, to make sure you understand what it means. They could also get more specific, such as asking how you might extend the output of a gradient boosting model (gradient boosting is a machine learning technique used in regression and classification tasks).
“Whenever they explain a project, we ask them questions to see, does the person have a vision about a bigger solution?” Chakraborty said.
3. What kind of data would you ask for if you had enough time and other resources to use it?
There usually isn’t such a thing as “perfect data” in a business. Instead, data scientists searching for insights often have to go back to stakeholders across various functions to ask for more data to fill in gaps or address conflicting information.
Answer this one bearing in mind that data scientists should never be working in silos. Beyond their technical expertise, they need to be effective in managing partner relationships across the entire organization.
Hiring managers will likely probe deeper, asking why a particular data set would interest you and what you think it would mean for the business. Even if the business doesn’t have the data set you’re thinking about, this is a chance to show how you think about putting data in a business context.
4. How would you describe your approach to communicating?
Given the demand for data science talent, Chakraborty said some organizations may eschew multiple rounds of interviews and combine them into fewer sessions with more stakeholders.
The hiring manager might be joined by someone from HR, for example, but also a business analyst, a scrum master or others who might typically need to engage with data scientists.
Not surprisingly, most organizations are concerned with “fit.” They want to know that you won’t sit in a proverbial corner with a data set avoiding questions and then present an answer to a problem. They want to see that you can answer questions from a variety of team members.
“It’s easy to find a lot of people who are solo code jockeys,” Chakraborty said. “But data science, more often than not, is a team activity.”
In fact, many business problems might have three or more data scientists on a team investigating them. They might have to interact not only with other lines of business but also with other teams of data scientists. Because many business problems are related to one another, you might need to draw on another team’s analysis of a data set, or on lessons from its projects, to solve your own problem.
5. What do you think will be important skills for you to develop in order to excel?
Data scientists obviously learn a lot in school, and even more when they’re on the job. Yet the nature of data science is one of continuous change, which means taking a lead role in your professional development is arguably even more important than in more traditional jobs.
Don’t simply answer this in generalities. Instead, try to talk about times when you investigated data out of sheer interest. Chakraborty gave the example of the COVID-19 pandemic that emerged in early 2020. There are now many publicly available data sets about the virus, which an aspiring data scientist could use to create models that could forecast a similar public health threat in the future.
As ambitious as that might sound, it’s a way to show hiring managers two important character traits.
One is that you have the kind of curiosity that will lead you to explore data to its full potential, seeking out feedback along the way and changing your approach as you do so.
The other is that you are exposing yourself to problems that data science could help solve. Remember that a good employer is not simply interested in filling a seat. They will want to learn what kind of contribution you could make to the organization over the long term, and how they might manage your career progression in order to support you.
Data Scientist Hiring Process Timelines
A hackathon, a case study, a technical interview and then an HR interview sounds lengthy. In practice, however, Chakraborty said organizations may not always include each step. As mentioned above, they might combine the technical and HR interviews to accelerate the process so as not to lose a promising candidate to another firm.
Some organizations may also hire data scientists on a contractual basis. In that case the process could be even shorter, especially if the candidate has any specific experience or training in a tool the company is using to solve its business problems.
No matter how many interviews are scheduled, your approach should be the same. Knowing how to define technical terms is table stakes. You’ll be much more likely to impress hiring managers by telling stories where you:
- Clearly associate a dataset or data science problem with a business issue
- Identify the key business stakeholders who should be involved in solving a problem
- Highlight the biggest business benefits (such as increased revenue) that could come from solving the problem
- Explain how you engage with the business to continuously improve and optimize your approach to data science
Jaydeep Chakraborty is a data scientist, entrepreneur and faculty member at the Toronto School of Management (TSoM). He developed his data science expertise by working on Machine Learning and AI projects at Accenture, Wipro and Hansa Customer Equity before founding Simplify Analytics (rebranded Clevered). Jaydeep also works at Fido Investment, where he is