How to Become a Data Scientist
By: Shaohua Zhang
Last updated: June 29, 2022
Shaohua Zhang is the CEO of Toronto-based data and AI training company, WeCloudData. He is a former data science instructor at Toronto Metropolitan University (formerly Ryerson University), and has held positions as Senior Data Scientist, Head of Technology or Chief Data Officer at Rogers, Blackberry, Kik Interactive and Beam Data. Shaohua holds a Master of Engineering, Internetworking and Master of E-Commerce, Data Mining, both from Dalhousie University.
What do you think is the most valuable resource in the world? For companies that are adapting to a changing digital economy and undergoing deep digital transformations, data has become as good as gold.
In fact, tens of zettabytes of data are being produced every year. To put that into context, 1 zettabyte equals 1 trillion gigabytes. Even at a download speed of one gigabit per second, a single year’s worth of that data would take millions of years to download!
Despite companies generating so much data, there still seems to be a shortage of accessible, valuable and actionable information. This has led to demand for professionals who are able to help businesses store, organize and extract meaningful insights from their data.
And this is where data scientists come in.
Data science is an evolving and in-demand field with enormous growth potential. With its rapid growth, booming job prospects, solid compensation and cutting-edge technologies, the Harvard Business Review once declared it the “sexiest job of the 21st Century.” As more companies realize the benefits of employing data scientists, data science roles are becoming more specialized and abundant.
However, the path to becoming a data scientist is not always as straightforward as the path into more traditional roles. For example, you don’t necessarily need a degree nowadays. Employers are placing greater importance on skills and experience, so understanding how data science can be used to build better data-driven products will be key to landing a job in this field.
In this guide on how to become a data scientist, we will explore emerging career paths, training and certification opportunities, key data science skills, salaries and more to help aspiring data scientists like you find the straightest possible path to career success.
But first, let’s differentiate between two commonly misunderstood terms in data science.
Data Science vs. Data Analytics
Any search for data science job postings will also produce results for the closely related field of data analytics. While there is some overlap between these two fields, they have different missions, processes and responsibilities. Let’s examine those differences to determine which career best suits you.
Data science takes raw, unstructured data from myriad sources and creates algorithms, predictive modeling processes, and other custom analyses to shape raw data into understandable insight.
Data analytics uses existing, structured data to identify trends, generate practical insights and answer questions to drive better business strategies.
In other words, data science finds new ways to capture and analyze the data used by analysts. Data analytics, in contrast, makes sense out of existing information. (For more, visit CourseCompare’s career guide on how to become a data analyst.)
Data science is, it’s worth noting, the more technically complex field. Data analytics doesn’t require the same level of mathematical and programming fluency as data science. As a result, there are fewer barriers to entry into data analytics.
Training & Certification
The growth of data science has brought about various full-time and part-time training programs where people can either fully devote their time to learning, or have the flexibility to work at the same time.
Although it may seem like the data science field is saturated with talent, demand for skilled data scientists — especially those who are able to convert data into actionable insights — is still growing according to the Job Bank at the Government of Canada. This is the result of net new demand for data scientists combined with the need to replace people retiring from the profession.
To distinguish yourself in this field, it is important to be able to demonstrate your skills and experience to employers. A great way to get started is to take data science courses and get certified in data science. There are many options, each with its own strengths and weaknesses. Ultimately, you need to find a program that fits your interests, career goals and learning needs.
Massive Open Online Courses (MOOCs) on platforms such as Udemy, Coursera and DataCamp usually have great content for beginners that will help you get started. With these courses, you can learn at your own pace at a relatively low cost while you prepare for more serious training.
Their flexibility, along with large catalogs of courses taught by universities (e.g., on Coursera) and industry experts (e.g., on Udemy), makes online courses a great way to learn basic terms, tools and concepts and kick-start your data science career.
Online courses are great for upskilling but the curriculum is rarely structured into a cohesive program. There are so many options that learners sometimes can’t comprehend how all the things they learn work together in real life. Few MOOCs offer strong learning and career support that will help you get job ready.
A master’s program in data analytics or data science is one of the best options if you want to get a job in this field. These programs run longer than the other options, and the curriculum is developed to help students build a strong theoretical foundation. Employers often prefer candidates with an advanced degree in data science or computer science.
University programs are usually longer than short courses and therefore don’t work well for students and career switchers who want to get into this field quickly. The overall investment in time and tuition is usually much higher than for bootcamps and online courses.
A certificate or diploma from a university or college continuing education program is also a great way to kick-start your data science learning journey. The courses usually strike a good balance between theory and practice and are taught by experienced professors and industry professionals.
Class sizes are usually bigger than in a bootcamp, so students get less tailored support from the faculty team. Career support is also not as strong as at bootcamps.
Data science bootcamps
Bootcamps are usually job oriented and therefore focus on teaching the practical skills that employers demand. They offer very strong learning support because students go through the training intensively. Bootcamps also usually have industry connections and provide strong career support and job referrals.
Immersive bootcamps can be a bit pricey and require students to dedicate 30 to 40 or more hours a week, so they may not work well for students who have a full-time job. Because of the time limitation, bootcamp courses usually focus less on theory. Graduates will still need to spend additional time and effort learning theory and coding on their own.
A great way to build upon one’s technical and soft skills while getting job advice would be to do a co-op or work on client projects and join a mentorship program and community. Just as there is great value in insightful data, there’s also great value in receiving relevant support and information from a tight-knit community of mentors and learners.
At WeCloudData, our full-time Data Science program with client projects is designed to help our students succeed by getting hands-on experience and career mentorship. If you prefer to have a more flexible learning schedule, our part-time Data Science program can help you make the transition while keeping your other commitments.
The importance of data science portfolio projects
If you’re reading this article, you’re probably a new grad or career switcher. Companies hiring data scientists usually prefer candidates with experience. A study carried out by WeCloudData shows that, on average, a company would prefer to hire someone with 2-3 years of experience. That poses a challenge for junior data scientists.
Employers understand that math knowledge alone doesn’t make someone a great data scientist candidate, because data science is more than just mathematics and machine learning algorithms. It also requires coding experience, experience with database queries, and many other skills. That makes it very hard for employers to tell which candidates will actually perform better on the job. As a result, candidates need to demonstrate more through hands-on experience.
Because of the competition in the job market, it is very hard for a candidate without professional data science experience to get noticed by recruiters and hiring managers. Having a few portfolio projects can make a big difference.
My recommendation to anyone who would like to get into the data science field is to build a strong portfolio of data science projects and showcase them via GitHub, LinkedIn posts or Medium blog posts. Doing so demonstrates your hands-on experience with data problems, shows your passion and effort, and catches employers’ attention, which helps you stand out.
Skills, Knowledge and Attributes
No two companies structure data exactly the same way, and each faces unique data problems at different stages of its growth, even within the same industry. Data scientists are required to help tackle tough business challenges through data plumbing, understanding, visualization, predictive analytics, and prescriptive analysis, among others.
As such, data scientists need many technical skills, such as SQL, Python, visualization, machine learning, cloud computing, and big data. This can be overwhelming for beginners. But Rome wasn’t built in a day, and you certainly don’t need to master everything to start with a great job in this field. So, focusing on learning the most essential skills is critical.
Here, for example, is a general data science learning path proposed by WeCloudData’s data science faculty.
Let me break each one down for you in more detail:
Python and SQL are fundamental coding skills
Most data scientists spend a big chunk of their time on data wrangling, which includes data extraction, data querying, data manipulation, and data visualization.
Almost all data scientist interviews include a coding challenge (an employer-given test or assignment that lets you demonstrate your skills), and we’ve seen many job seekers with advanced skills fail these challenges and miss out on good opportunities.
There are almost too many resources available for learning SQL and Python. In general, we recommend that beginners start with a free Udemy or DataCamp course to learn the basics. Bootcamps like WeCloudData, Lighthouse Labs and BrainStation also provide free courses that accepted students can use to get up to speed during the pre-bootcamp period.
Being great at SQL queries and Python programming alone isn’t going to make you a good data scientist. With SQL skills you can become a data analyst or database professional. With Python skills you may become a web developer. A data scientist uses SQL and Python as tools to process, visualize and analyze data.
In general, data wrangling and manipulation are very important data skills a data scientist needs to master.
Recommended learning goals
- Data filtering & selection
- Data aggregation
- Data pivoting & transformation
- Data merging & concatenation
- Data cleaning
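Several of the learning goals above can be practiced with nothing more than Python’s built-in sqlite3 module. The following is a minimal sketch, not a recommended production setup: the table names (customers, orders), the columns and all values are made up for illustration. It touches on filtering, cleaning, merging and aggregation.

```python
import sqlite3

# Hypothetical in-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana', 'Toronto'), (2, 'Ben', 'Halifax'), (3, 'Cy', NULL);
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0), (4, 3, NULL);
""")

# Filtering and selection: keep only orders above a threshold.
large_orders = conn.execute(
    "SELECT id, amount FROM orders WHERE amount > 60 ORDER BY id"
).fetchall()

# Cleaning: fill in missing cities and drop orders that have no amount.
conn.execute("UPDATE customers SET city = 'Unknown' WHERE city IS NULL")
conn.execute("DELETE FROM orders WHERE amount IS NULL")

# Merging and aggregation: join the two tables, then total the spend per city.
spend_by_city = conn.execute("""
    SELECT c.city, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()

print(large_orders)
print(spend_by_city)
```

The same operations map closely onto DataFrame-style tools, so practicing them in SQL first tends to carry over to Python data libraries.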
As the old saying goes, “a picture is worth a thousand words.” Data visualization is an essential part of data science and analytics. Both data analysts and data scientists need to understand how to visualize data and tell data stories.
Data analysts are usually focused on building reports and dashboards. They work with tools such as Excel, Tableau and Power BI to share data insights with their business.
Data scientists need to know how to visualize data as well, especially when it comes to understanding the quality, shape, distribution, and correlation of features and attributes that will be used for machine learning.
Recommended learning goals
- Building visualization dashboards using Power BI or Tableau
- Visualizing the shape and distributions of data points
- Exploring the correlation or association between variables/attributes
- Discovering errors in data via visualization
- Visualizing map data for location analytics
- Visualizing unstructured data such as text and images for knowledge discovery
Mathematical foundations are essential for data science. They are used throughout the different stages of a machine learning project, and there are plenty of specific examples of this in practice:
- A company’s marketing and product teams typically use statistical testing to optimize product feature design and better understand customer behavior.
- Supply chain management may use regression analysis and time series models to forecast demand.
- Statisticians and data scientists may use math and statistics to understand macroeconomic trends.
- Data scientists usually use statistical methods to explore correlations between different variables.
- Data scientists and machine learning engineers often use machine learning techniques for predictive analytics, which rely heavily on linear algebra, optimization and statistical learning.
- Deep learning engineers also use non-linear deep neural networks for image classification, natural language processing and reinforcement learning.
While math is important in many ways, a beginner may get intimidated by it and find it a bit tedious to study when the theory is not well connected to practical use cases. At WeCloudData, we usually recommend students learn just enough math and start to apply machine learning methods and libraries to solve small challenges. It’s very important to build things slowly and gain confidence, then build on what you’ve learned.
Recommended learning goals
- Statistical distributions
- Hypothesis testing (chi-square test, t-test)
- The basics of probabilities
- The basics of linear algebra (vectors and matrices)
- Gradient descent optimization
- Regression analysis
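Two of the goals above, gradient descent and regression analysis, fit together in one small exercise. The sketch below fits a straight line by batch gradient descent on mean squared error using plain Python. The function name fit_line, the learning rate, the step count and the toy data are all assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, n_steps=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Residuals of the current line on every point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

On this noise-free data the fitted slope and intercept should land very close to 2 and 1. Trying a much larger learning rate is a quick way to see the updates diverge, which is a useful intuition to build early.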
Machine Learning (ML)
Machine learning is probably one of the most exciting parts of data science. After all, it’s one of the things that separates a data scientist from a data analyst. A data scientist will usually spend more time on building machine learning and statistical models to help the business solve applied problems such as forecasting, customer segmentation, churn prediction, marketing campaign response modeling, and more. The outcome leads to either increased revenue or cost saving and impacts the bottom line.
For beginners, machine learning (ML) can be a complex and challenging topic. It requires knowledge of math and statistics, the ability to manipulate data, and the skills to work with Python libraries as prerequisites.
The good news is that if you’re at this stage you’ve probably already invested in coding and learning a bit of math. Understanding the end-to-end machine learning (ML) process and its practical techniques is probably more important than math in most cases. Even though the theory behind the scenes might be complex, most of the time a data scientist just needs to work with existing open source (or proprietary) Python packages. What’s important is preparing the data that gets fed into the algorithms for training, testing and validating the results in an unbiased way, and explaining the results properly to the business.
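One way to picture unbiased testing is a simple hold-out split: anything the model learns comes from the training rows only, and the error is measured on rows it has never seen. The sketch below uses only the standard library. The helper names, the 25 per cent test fraction and the data are assumptions for illustration, and the “model” is a deliberately trivial baseline.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up (feature, target) pairs for illustration only.
rows = [(x, 3.0 * x + 2.0) for x in range(20)]
train, test = train_test_split(rows)

# A deliberately simple baseline "model": always predict the training mean.
# Whatever is learned (here, the mean) must come from the training rows only.
train_mean = sum(y for _, y in train) / len(train)
test_error = mean_absolute_error([y for _, y in test], [train_mean] * len(test))

print(len(train), len(test), round(test_error, 2))
```

A baseline like this also gives a reference point: a real model is only worth deploying if it beats the baseline on the held-out rows.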
Recommended learning goals
- Basic optimization methods for machine learning (e.g., gradient descent)
- Data preprocessing (feature selection and feature engineering)
- Linear regression
- Non-linear models (decision trees; ensemble methods; neural networks)
- Clustering analysis (k-means)
- Model interpretation
Common tools and packages for machine learning include pandas DataFrames for feature engineering and data manipulation; scikit-learn for building ML pipelines (a Swiss Army knife that supports many different algorithms); PyTorch, TensorFlow or Keras for deep learning; and the LIME and SHAP packages for model interpretation.
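In practice you would reach for a library for clustering, but writing k-means once by hand helps show what the library is doing. The following is a minimal pure-Python sketch of the standard assign-then-update loop (Lloyd’s algorithm) for 2-D points. The function name kmeans, the fixed starting centroids and the customer data are assumptions made up for this example.

```python
def kmeans(points, centroids, n_iters=10):
    """Lloyd's algorithm for 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(n_iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )
    return centroids, clusters


# Made-up customer data: (visits per month, average basket size).
points = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 9)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
```

The result depends on the starting centroids, which is why library implementations typically run several initializations and keep the best one.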
Big Data skills
Working with big data is an important skill for data scientists to have. Learning big data should come after you have learned coding and machine learning.
While not all companies have or need to use big data, this knowledge will help a job applicant stand out to employers in industries such as retail, banking, insurance, telecommunications, and of course, big tech.
Many of my students over the years have found big data harder to learn than machine learning. This is mainly because big data involves different tools and platforms and can feel a bit more engineering focused than data science.
When you work on big data projects, try to widen the scope and work on data collection, data ingestion, data analysis, and machine learning, as well as model deployment.
Career Paths
One great benefit of a data science career is that it doesn’t tie you to any particular industry or sector; data science is in demand across an enormous range of industries, and the skills are readily transferable.
It’s helpful for many of my students to first understand the different “career tracks” one can venture down in data science:
- Technical track
- Managerial track
If you’re passionate about technology and want to stay on the technical path, there are several options:
- Become a senior or lead data scientist and work in different industries
- Become more specialized in ML and turn into a machine learning engineer
- Become a data engineer or even software engineer
Here, more broadly, are some of the most common career paths and data science roles today:
Lead Data Scientist
Working as a lead data scientist doesn’t only require technical skills. You will be setting the project roadmap along with the leaders, carrying out larger data projects with broad strategic implications for your business, and you will be leading junior data scientists to tackle challenging problems. If you want to become a lead data scientist, be prepared to:
- Keep learning new techniques and develop at least one area of specialization
- Become a generalist, since large-scope projects require more than just machine learning
- Get comfortable working with different teams including product, software, engineering, and business
- Stay abreast of cutting-edge technologies and read more literature in the AI field
Depending on the company you work for, the path for a technical data scientist may look like this:
- Lead Data Scientist
- Principal Data Scientist
- Chief Data Scientist or Chief Scientist
Machine Learning Engineer
Machine learning engineering (MLE) is a specialized role. Going from a data scientist (DS) role to an MLE role requires stronger engineering skills. Machine learning engineers spend more effort on dealing with big data, engineering ML pipelines, and working with MLOps to deploy models into production. If you want to become an ML engineer, try to learn cloud, Docker, Kubernetes, as well as Spark. An understanding of REST APIs and system design is useful too.
We’ve seen many data scientists switching to data engineering in recent years. Some are going after potentially higher salaries while others discovered a stronger interest in the engineering side of data projects.
Data Engineering requires less statistics, math, and machine learning. The requirement for coding is higher, and data engineers need to be comfortable writing production-grade code. Data transformation functions need to be properly tested. Data Engineers also need to have an architect-level view of the entire data pipeline and make sure things run smoothly in production.
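To illustrate what “properly tested” can look like, here is a minimal sketch using Python’s built-in unittest module. The transformation function normalize_record and its field names are hypothetical, invented for this example. The point is only that each transformation gets small, explicit tests for both the normal case and the failure case.

```python
import unittest


def normalize_record(record):
    """Clean one raw record: trim the name, lowercase the email, parse the amount."""
    return {
        "name": record["name"].strip(),
        "email": record["email"].strip().lower(),
        "amount": round(float(record["amount"]), 2),
    }


class NormalizeRecordTest(unittest.TestCase):
    def test_cleans_whitespace_and_case(self):
        raw = {"name": "  Ana ", "email": " ANA@Example.com", "amount": "19.999"}
        self.assertEqual(
            normalize_record(raw),
            {"name": "Ana", "email": "ana@example.com", "amount": 20.0},
        )

    def test_rejects_non_numeric_amount(self):
        raw = {"name": "Ben", "email": "ben@example.com", "amount": "n/a"}
        with self.assertRaises(ValueError):
            normalize_record(raw)


# Run the tests programmatically so the example works inside a script or notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeRecordTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real pipeline, tests like these would typically live in their own files and run automatically before any change is deployed.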
Manager of Data Science
A data science manager’s role is similar to that of a lead data scientist. The difference is that the manager is a people manager, with responsibilities such as managing the team, setting goals and conducting performance reviews.
Becoming a manager might mean that you will become less technical because you will allocate more time to work with your team of data scientists. You will have regular one-on-one meetings to help them set goals, evaluate performance, as well as provide mentorship. That quickly eats up your time, but you’re playing a critical role in building a high-performance data team.
Once you go down the managerial path, your options may be:
- Senior Manager of Data Science
- Director of Data Science or Head of Data Science
- Chief Data Officer
Data Product Owner
Another interesting path to go down is the product manager route. Data-driven product managers are scarce resources. Startups building high-growth applications will want product managers who can work with a team of software engineers, data scientists, and data engineers. Among those roles, data scientists usually work closely with the business teams and therefore it’s natural for some people-oriented data scientists to consider a role in product management. You’re more likely to see data scientists switching to product management in FAANG companies.
With so many businesses going through digital transformations, the demand for data consultants is increasingly high. Many companies don’t have the budget to own a data science team or the experience to run one, but they still have interesting data problems. Companies that want to collect more data for advanced analytics also want expert help in setting their data strategies. This is where consultants step in and provide lots of value.
If you like to work with business leaders and enjoy gaining data experience in various industries, then consulting may be a great option for you.
Work Environment
Data scientists generally report above-average job satisfaction; unscientific surveys from sources including Glassdoor and CareerExplorer find data scientists rate their job satisfaction around 3.5 out of 5.
Among the benefits are the high demand and job security, great pay (more on that later), and versatility. As we covered above, data science skills are easily transferable, and you can work in a wide range of industries and disciplines, from automotive to zoology.
Some data scientists also cite the work itself as a plus; its interdisciplinary nature requires you to flex a lot of different muscles to solve complex problems, and the resulting challenges keep the job stimulating and rewarding.
On the other hand, some aspects of the job involve repetitive work, particularly in the preparation stage of the lifecycle. Surveys also find that data scientists spend the bulk of their time on mundane tasks like data cleaning, preparation and formatting, although evolving technologies and automation are gradually reversing this trend.
As you might expect with a heavily computer-based career, the work tends to be sedentary, and requires a lot of time at a desk.
Workplace dynamics can vary greatly for data scientists. Depending on the company, you may be on a small, or even one-person team with limited workplace interaction. Many larger companies employ more robust teams with greater opportunities for collaboration and brainstorming. Regardless of the team size, a data scientist must be able to work independently to perform tasks such as writing code and formatting data.
Schedules and working locations can vary as well. Data science tends to be a full-time job which sometimes requires evening and weekend work on larger or more time-sensitive projects. Remote working opportunities are growing, but it’s typically an office-based position.
Like many other STEM careers, data science has a pronounced gender gap. A 2020 report from global management consulting firm BCG found men make up approximately 80 per cent of the workforce, although some reports find the gender gap is slowly narrowing.
Overall, data scientists enjoy a safe working environment, with the most common concerns being stress, the repetitive parts of the job and ergonomic concerns from long hours seated in front of the computer.
A Day in the Life
A data scientist’s daily tasks will revolve around working with data and communicating data solutions with a wide range of stakeholders. For example, some data-related tasks include analyzing data and looking for patterns or trends that will help guide business decisions later on. Other responsibilities include:
- Developing and testing new ML models that will help the business simplify data problems and improve functionality.
- Making batch predictions, that is, generating predictions for a large amount of data at once (usually with a tool like Spark).
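The idea behind batch prediction can be sketched at small scale in plain Python: records are processed in fixed-size chunks rather than one at a time or all at once. Everything here is a stand-in. The helper chunked, the rule inside predict_batch and the field logins_last_30d are hypothetical, and at real scale a distributed tool such as Spark would do the chunking and scheduling for you.

```python
from itertools import islice


def chunked(iterable, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def predict_batch(rows):
    """Stand-in for a trained model: flags customers with low recent activity.

    The rule and the field name are hypothetical; a real model would be loaded
    from storage and applied here instead.
    """
    return [1 if row["logins_last_30d"] < 2 else 0 for row in rows]


# Made-up records; in production these would be streamed from a table or file.
records = [{"id": i, "logins_last_30d": i % 4} for i in range(10)]

predictions = []
for batch in chunked(records, batch_size=4):
    predictions.extend(predict_batch(batch))

print(predictions)
```

Processing in chunks keeps memory use bounded and makes it easy to retry a single failed batch instead of the whole job.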
During this process, there will always be communication between the data scientist and stakeholder to determine the scope of the problem. These requirements might also be determined between the business intelligence professionals and stakeholders, which are then shared with the rest of the data team.
A lot of this communication will involve people who might not be as familiar with the technical aspects of the job, so it is the data scientist’s job to ensure that clients understand the implications of different insights and decisions in terms that are plain and easy to understand.
Data Scientist Salaries
Salaries for data scientists in Canada range from $77,870 per year for entry-level positions to $137,025 per year, with an average salary of $95,219, according to Talent.com.
Salaries may range even higher depending on the company and your skill and seniority; Canadian banks prize experienced data scientists and offer salaries as high as $160,000, while blue-chip tech firms like Shopify, Microsoft and IBM offer as much as $190,000 for the most qualified candidates. Of course, salaries for data scientists in managerial positions can be even higher.
| Role | Average Salary in Canada |
|---|---|
| Business Intelligence Analyst | $77,529 |
| Machine Learning Engineer | $106,262 |
| Senior Data Scientist | $110,946 |
| Chief Data Scientist | $240,647 |
Government positions hover around the low-to-mid range of that salary scale, but often offer greater stability and pension and benefit packages.
Of course, there are also other things to consider aside from salary. Things like the meaningfulness of your work, work-life balance, benefits, stock options, profit-sharing programs and opportunities for growth will together determine how fulfilled you are in your data science career. A workplace that provides all of these can increase your job satisfaction and overall productivity, too, which may in turn help you advance faster in your career.
Shaohua Zhang is the CEO of Toronto-based data and AI training company, WeCloudData. He is a former data science instructor at Ryerson University, and has held positions as Senior Data Scientist, Head of Technology or Chief Data Officer at Rogers, Blackberry, Kik Interactive and Beam Data. Shaohua has a Master of Engineering, Internetworking and Master of E-Commerce, Data Mining, both from Dalhousie University. His research interests include scalable recommender systems, computational advertising, location intelligence, and Internet of Things. He has one patent pending in location analytics.