Channel: Andela
Viewing all articles
Browse latest Browse all 615

How to get your first job as a Data Scientist

Data Science is an increasingly hot topic in the tech industry. In this Writer’s Room blog, Andela community member Saimadhu Polamuri explores how you can land your first job as a data scientist.

It’s tough to get a data scientist job as a newcomer in the Data Science field. But if you follow a strategy to hone your craft and develop knowledge of the required skill sets, you can take the next step into your new career as a data scientist.

The learning path won’t be easy. You need to spend a lot of dedicated time learning all the required skill sets to land in the data science field.

Follow our six steps to becoming a data scientist!

Six-level strategy for getting an entry-level data scientist job

  1. The motivation for the Data Science field
  2. Data science concepts preparation
  3. How to apply the concepts to the real-world problems
  4. Portfolio building & networking
  5. Smart ways to search for data scientist jobs
  6. What next

Let’s walk through each level in the 6-level strategy.

The motivation for joining the Data Science field

We need a little motivation to get started, but in the process, we can lose motivation and give up on the thing or skill we started learning. 

This happens with Data Science or anything else you may want to learn. When the difficulty level skyrockets, our mind creates many reasons why we should walk away. When you finally realize the mistake you’ve made in walking away, you have lost a lot of time, which you never get back. 

So, we need high energy in the form of motivation while learning.

When your motivation dips, think of a Data Science application you admire, and imagine being part of the team that built it.

When you feel like giving up, remember why you started.

So let’s see some uses of data science.

Data Science applications

Many industries use and value Data Science, and a wide range of projects and applications have been built for them:

Healthcare

Data Science applications are helping to tackle and cure many diseases, while also reducing the time it takes to get new vaccines and treatments approved.

Some of the applications diagnose patients’ diseases more accurately than expert doctors. 

Take the COVID-19 pandemic: AI and Data Science models are helping a lot with vaccine development and with tracking the disease.

If you’re interested, ScienceDirect has published a great article titled Artificial Intelligence (AI) applications for the COVID-19 Pandemic. Below are some insights taken from it.

Applications of AI and Data Science during the COVID-19 Pandemic

  • Early detection
  • Monitoring treatment
  • Tracing for individual contacts
  • Projection of cases and mortality
  • Vaccines development
  • Reducing the workload of healthcare workers
  • Disease prevention

We know that Data Science models generally perform better with more data. Healthcare produces a lot of data that can be fed into Deep Learning or Machine Learning models.

The insights derived from these models can be ones that even experienced doctors would not have predicted.

Entertainment

In the entertainment sector, the uses for Data Science go beyond our wildest imaginations. We tend to think of Data Science in entertainment as just Netflix or YouTube recommendations.

But various applications are trying to learn more about customers to understand what kind of videos we like and what kind of music appeals to us. Streaming services use these customer insights to show users advertisements that match their tastes and styles.

Social networking

Our days mostly start with surfing social networking sites, and end with it too.

People are more hooked on social networking sites than ever. The key reason is that these sites know what you like and don’t like, so it’s easy for them to keep you engaged with suggested content.

Data Science applications on social networking sites also suggest people you may know as friends.

This is called recommendation. Every social networking site has its own recommendation engine to suggest people or content pages you might like, and to show you ads for products you might want to purchase.

Business growth

Every organization nowadays is moving towards Data Science to know their customers better. This helps organizations to provide more personalized products to users. 

Just imagine: a lot of Data Science and decision science techniques were applied to the T-shirt you are wearing now.

Irrespective of the field, every business organization is seeking skilled people to grow the Data Science team in their organization. This gives opportunities for us if we are proficient with the minimum required Data Science skills.

AI and Robotics

When it comes to Robotics, the importance of Data Science hardly needs explaining.

Even children know how important Data Science and Artificial Intelligence are in building robots that look and function like humans.

As in healthcare, a lot of research is happening in the Robotics field. Many robots already do a significant share of the work humans can do, ranging from cooking and waiting tables in restaurants to serving as soldiers in the defense field, and much more.

Space research

In space research, Data Science applications help simulate the situations rockets will face and estimate how the various components used to build them are likely to behave.

In short, before you give up, remember the diversity and scope of this field and the applications built with it, and get back on the learning track.

We have seen various Data Science applications; now let’s discuss the growth of the data science field. We will address whether this field will be sustained for a long time or saturate too early.

Growth of Data Science

The other thing to consider when stepping into any new field is the growth of that field. If you consider two decades back, we used to have many job roles that do not exist now. 

As technology advanced, those jobs lost their significance and vanished. So, when selecting any new domain or field, we need to know whether it is growing, and whether it will be sustained for an extended period or reach saturation.

According to global research reports, the need for skilled data scientists has increased continuously over the last 5 to 6 years, and this growth is expected to continue for the next ten years.

Now let’s discuss various roles available in the Data Science field. 

Possible roles in Data Science

Even though data scientist jobs are considered high-level for newcomers in the Data Science field, many other job roles fall under the Data Science field.

Let’s see different job roles in this field and understand what skill sets we need to get hired. 

When you decide to get a job in the Data Science field, you don’t need to limit yourself to the data scientist role. There are various job roles in this field.

  • Data Scientist
  • Data Engineer
  • Business Analyst
  • Data Analyst
  • Visualizer
  • Researcher

Let’s discuss the key skill sets for these roles.

Data Scientist

A Data Scientist’s role is to help build various Machine Learning or Deep Learning models to solve complex business problems. 

Before building these models, data scientists spend a lot of time preprocessing the data to make it useful for modeling.

They also spend time with the domain experts to learn more about the problem. If you join any big organization, the data science team will get all the required data from different sources, and this team will mainly focus on building accurate models.

To get a job in this area, you need to be good with different machine learning and deep learning algorithms, along with a good command of various databases.

You need good experience with data preprocessing frameworks like pandas. You need to know the mathematical foundations to understand the various algorithms. You also need to be very good at coding.
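To make "data preprocessing" concrete, here is a minimal sketch of two common steps: filling in missing values and rescaling a column. It uses only the Python standard library so it runs anywhere; in practice you would likely do the same with pandas. The function names and the sample `ages` column are made up for illustration.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw column with one missing entry.
ages = [22, None, 30, 26]
cleaned = fill_missing(ages)      # [22, 26, 30, 26]
scaled = min_max_scale(cleaned)   # [0.0, 0.5, 1.0, 0.5]
```

Mean imputation and min-max scaling are only two of many options; which steps you need depends on the dataset and the model.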

Data Engineer

The data engineer’s role is to collect the data from different sources and make it available to the data scientist team in a more structured form. 

This role requires excellent coding skills, including data structures, and you need to be very strong in SQL (for example, MySQL).

This role requires you to move data from one source or table to another. Sometimes you need to perform different data transformations on top of the data collected from various data sources.
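As a rough illustration of moving data from one table to another with a transformation in between, the sketch below uses Python's built-in `sqlite3` module and an in-memory database. The table names, columns, and sample rows are hypothetical; a real pipeline would read from and write to your organization's actual systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw table as it might arrive from a source system.
conn.execute(
    "CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1250, " ke "), (2, 900, "NG"), (3, None, "gh")],
)

# Target table in the structured form the data science team will use.
conn.execute("CREATE TABLE clean_orders (id INTEGER, amount REAL, country TEXT)")

# Transform while copying: convert cents to currency units, normalize
# country codes, and drop rows with a missing amount.
conn.execute(
    """
    INSERT INTO clean_orders (id, amount, country)
    SELECT id, amount_cents / 100.0, UPPER(TRIM(country))
    FROM raw_orders
    WHERE amount_cents IS NOT NULL
    """
)

rows = conn.execute(
    "SELECT id, amount, country FROM clean_orders ORDER BY id"
).fetchall()
# rows == [(1, 12.5, 'KE'), (2, 9.0, 'NG')]
```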

For this role, you need to learn different Big Data technologies like Hadoop, Spark, and Kafka, as well as data streaming services, microservices, etc.

Business Analyst

The business analyst’s job is to get quick insights from the data. They work on proof-of-concept (PoC) projects, analyzing small amounts of data to check the feasibility of a solution.

In some organizations, these business analysts will take care of data modeling too. They mainly interact with clients and stakeholders to collect business requirements.

Data Analyst

The data analyst role sits a level below the data scientist role. Data analysts work a lot on the data preprocessing stage; in many organizations, the junior data scientist role is considered a data analyst role.

Visualizer/ Tableau Team

We have a saying:

A picture is worth a million words.

In this field, we need to show the available data to clients as a story with great visuals.

That is the job of visualizers, who represent the data in a way that answers many business questions without building any Machine Learning models.

Also, when models are built, visualizers need to explain how those models impact the business with proper storyboards.

To land this role, you need to be very good at data visualization tools like Power BI and Tableau. You also need to be good at database queries, because visualizing the data means pulling it from different tables and sources.

Researcher

The data science domain offers many jobs in R&D too. Many big organizations have dedicated research teams, who build new modeling frameworks and optimize the model-building time of the frameworks currently in use.

To get this job, you need to hold a master’s or Ph.D. degree in mathematics, statistics, or Artificial Intelligence.

By now, we know about the field and the different roles we can target.

It’s up to you to know which role you want to go for. First, select the job role you want to pursue and learn the required skill for that role. Don’t spend time learning all the data science skills.

Now let’s look at the concepts you need to learn to get a job in the data science field as a data scientist.

Data Science concept Preparation

How to prepare for data scientist job

When it comes to learning a new skill, the common question we would have in mind is:

How much time do we need to spend on learning various data science skills?

Are you asking the same question in your mind?

Don’t worry; we can learn all the topics and acquire Data Science knowledge for free.

In the last section of this article, we have given all the free resources to become data scientists.

Before that, let’s look at the concepts and tools you need to learn and master.

Data science topics

At a high level, you need to learn these four categories.

Coding & database

To land a data scientist job, you need a decent amount of coding skill. You need to be very good at SQL (for example, MySQL) or NoSQL.

As with entry-level jobs in other engineering branches, learners need to focus on data structures and computer science algorithms, and on coding in particular.

As a personal suggestion, I would recommend spending more time coding. For every problem you solve, check how you could optimize the code further. In the long run, this makes you a better coder.

Preferred coding languages for data science field

In the data science field, there is a cold war over the best programming language. Some people say Python is the best for data science, whereas others say R is.

Both languages have their advantages and drawbacks. There are also other programming languages like Scala, Julia, Octave, etc.

If you have enough time, we suggest you learn both Python and R. In the end, it’s not about the programming language you select. It’s all about the best model you build.

If you are still confused about which programming language to select, follow the guide below.

  • If you are starting fresh, start with Python.
  • If you already know Python, go with Python.
  • If you know R, go with R.
  • If you know both, that’s great.

With this, we are clear about the programming language. Now let’s discuss databases.

When it comes to databases, you should be very strong in SQL or NoSQL.

If you learn SQL properly (for example, with MySQL), it will help you with cloud platform databases too. For example, Google Cloud has BigQuery, and AWS has Athena. Both are queried with SQL.
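To get a feel for the kind of query you will write again and again, here is a small sketch of a grouped aggregation. It runs against an in-memory SQLite database through Python's built-in `sqlite3` module, purely so you can try it without installing anything; the `sales` table and its rows are invented for the example, and each database has its own dialect details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table of sales, one row per transaction.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 5.0), ("east", 7.5)],
)

# Count the transactions and sum the amounts per region.
totals = conn.execute(
    """
    SELECT region, COUNT(*), SUM(amount)
    FROM sales
    GROUP BY region
    ORDER BY region
    """
).fetchall()
# totals == [('east', 2, 17.5), ('west', 1, 5.0)]
```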

In NoSQL, we have MongoDB, Cassandra, etc.

Statistics concepts

Many people underrate statistics and spend very little time learning statistical concepts. But don’t forget that statistical methods are the key building blocks of all machine learning and deep learning.

In statistics, focus mainly on the topics below.

  • Algebra Concepts
  • Probability Concepts
  • Descriptive statistics
  • Inferential statistics
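As a small taste of these topics, the sketch below computes a few descriptive statistics, an empirical probability, and a standard error (a first step toward inferential statistics) using Python's built-in `statistics` module. The `scores` list is made-up sample data.

```python
import math
import statistics

# Hypothetical sample of eight observations.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Descriptive statistics: center and spread of the data.
avg = statistics.mean(scores)        # 5
middle = statistics.median(scores)   # 4.5
spread = statistics.pstdev(scores)   # population standard deviation, 2.0

# Probability: empirical chance that an observation exceeds the mean.
p_above_avg = sum(s > avg for s in scores) / len(scores)  # 0.25

# Inferential statistics: the standard error estimates how much the
# sample mean would vary from one sample to another.
standard_error = statistics.stdev(scores) / math.sqrt(len(scores))
```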

Machine learning concepts

Once you have completed learning the statistical concepts, you can start learning machine learning concepts.

You can focus on learning the machine learning concepts in the below order.

As a fresher, you can aim to learn the supervised learning algorithms first. Whatever algorithm you learn, try to learn everything about it.

You need to learn the mathematical concepts behind it: why a particular optimization function was selected rather than another, where the algorithm will fail, and so on.
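To see what an optimization function does in practice, here is a minimal sketch of gradient descent fitting a one-parameter linear model `y = w * x` by minimizing mean squared error. The data, learning rate, and number of steps are arbitrary choices for illustration, not recommended defaults.

```python
def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Toy data generated by y = 2 * x, so the best w is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0
learning_rate = 0.05
for _ in range(200):
    # Step against the gradient to reduce the error.
    w -= learning_rate * gradient(w, xs, ys)
```

Try raising `learning_rate` to 0.2 and watch `w` diverge instead of settling near 2. Experiments like this are a good way to learn where an algorithm fails.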

Deep Learning concepts

Once you have a good theoretical and practical knowledge of the machine learning algorithms, you can start learning the structures of deep learning algorithms.

Deep learning algorithms use different neural network structures that are loosely modeled on the human brain. Like machine learning algorithms, deep learning algorithms are divided into two categories: supervised and unsupervised.

Deep Learning Supervised Algorithms

Deep Learning Unsupervised algorithms

  • Self-Organizing Maps
  • Boltzmann Machines
  • Autoencoders

These are the basic structures of the Neural Networks. You can implement most of the Natural Language Processing and Computer Vision tasks using Deep Learning Supervised Algorithms. 

CNN architectures are used for Computer Vision tasks, and RNNs are used for Natural Language Processing tasks. If you’re a beginner in deep learning, just focus on these algorithms first.

How to apply the concepts to the real-world problems

Solving data science challenges online

Once we have learned all the required data science concepts and tools, the next big step is to apply that knowledge to solve real-world problems.

Knowing a concept is one thing; applying it to solve a problem is another. Solving problems makes you a completely different person.

But

How to build the models to solve real-world problems then?

Don’t worry, I will give you a list of ways to test your knowledge.

The more you practice, the more you learn. It’s that simple.

To practice, you can focus on the 3 categories below.

  • Solving problems on online platforms
  • Participating in webinars/YouTube live coding channels
  • Working on an open-source project or your own project

Solve problems over online platforms

Unlike other software technologies such as web development or Android app development, in data science you can’t write code, see the result visually right away, and make the required changes while building models.

Machine learning model building is completely different, which may be one reason for the popularity of the data science field.

No need to worry. We have various platforms with problems close to real-world ones. In the next sections of this article, we will show you multiple platforms where you can apply data science skills, along with their difficulty levels.

You can register on the platforms below.

Kaggle

Kaggle is the most popular platform for trying out various algorithms and data preprocessing techniques. After Kaggle was acquired by Google, it went to the next level in providing everything data science aspirants need.

On Kaggle, you get the problem statement and the required datasets to build any model you want. You are free to use any tool. The goal is to come up with the best model results.

In the dataset section, Kaggle provides both train and test datasets. Once you build the model, you submit its predictions for the test data on the portal.
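A submission is typically a CSV file with one prediction per test row. The sketch below shows one way to write such a file with Python's built-in `csv` module. The column names `id` and `prediction` and the sample values are assumptions for illustration; each competition specifies its own required format, so always check its sample submission file.

```python
import csv
import io


def write_submission(ids, predictions, out):
    """Write one (id, prediction) row per test example to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "prediction"])  # hypothetical header
    for row_id, pred in zip(ids, predictions):
        writer.writerow([row_id, pred])


# Hypothetical test-set ids and the labels your model predicted for them.
test_ids = [101, 102, 103]
predictions = [0, 1, 1]

# An in-memory buffer keeps the example self-contained; for a real
# submission, pass open("submission.csv", "w", newline="") instead.
buffer = io.StringIO()
write_submission(test_ids, predictions, buffer)
submission_text = buffer.getvalue()
```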

The portal then shows your rank among all the participants in that competition. This rank is global, so if you get a decent rank, you can showcase it. Kaggle also awards medals. The cutoffs below apply to smaller competitions; larger competitions use stricter ones.

  • Gold medal
    • To get the gold medal, you need to rank in the top 10%
  • Silver medal
    • To get the silver medal, you need to rank in the top 20%
  • Bronze medal
    • To get the bronze medal, you need to rank in the top 40%

In Kaggle, you can become an expert in different categories. If you become an expert in one category, you are called a Kaggle expert. If you become an expert in multiple categories, you are called a 2X expert, 3X expert, etc. 

Once a challenge is opened to participants on the Kaggle platform, it stays open for months. Some problems allow years to come up with the best model.

MachineHack

The other popular platform is MachineHack. It is similar to Kaggle but comparatively less difficult, and fewer people participate in each competition. So it’s an ideal place for beginners to achieve a good rank and motivate themselves.

Generally, once a data science or machine learning challenge opens, you get 3 to 4 days or 1 week to build the model and rank on the leaderboard.

Analytics Vidhya hackathons

On this platform, the problems are a bit easier, and the time you get to solve them is moderate. You can also learn various data science concepts on their blog page.

HackerEarth

On the HackerEarth platform, various companies will provide Data Science, Deep Learning, and Machine Learning problems to solve, which can help you gain employment if you rank highly.

These problems are often complex. Problems are not posted on this platform very regularly, so keep an eye on it. On HackerEarth, you can also learn data structures and SQL.

Suggestion for beginners

At the beginner level, these problems are hard and challenging to solve, so first check the winners’ solutions. Always learn from the winning code. For that, you can check GitHub profiles. On Kaggle, you can find various notebooks to learn from.

Participating in webinars/YouTube live shows

To learn how to solve various Machine Learning problems, you can follow popular YouTube channels, where people explain how they solved a problem. Sometimes organizations host open webinars you can join, where they show how they used various algorithms to solve real-world problems.

In the free resources section of this article, we will give you a list of such channels to follow.

Working on a challenging project

Now comes the more challenging part. So far, we have applied our data science knowledge on various platforms, but those exercises are for learning purposes and are not projects to show on a resume.

You can, however, still showcase your leaderboard ranks on your resume. We will talk more about this in the smart job search section.

So, once you are comfortable with the workflow of solving Machine Learning or Deep Learning problems, it’s time to work on a challenging project.

You can search for an open-source project. If you are not getting any ideas, you can read research papers and try to improve on the accuracy reported in a paper by fine-tuning the model with your own approach.

You can meet like-minded and talented people on various platforms, form a group, and work on a single project.

Don’t limit yourself to accuracy or other model evaluation metrics. Try to create a complete end-to-end pipeline for the model, from collecting the data from different sources to deploying the model in a cloud or production environment.

Example: Suppose you are building an accurate image classifier. Everything from collecting the images to creating a simple web application that shows the classifier’s prediction for a given image makes up a complete end-to-end pipeline.
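To show the shape of such a pipeline, here is a deliberately tiny sketch with one function per stage: collect, train, predict, and serve. It swaps the image classifier for a nearest-centroid classifier on made-up two-number feature vectors so that it runs with the standard library alone. All names and data here are hypothetical stand-ins, and `serve` only imitates what a web endpoint would return.

```python
from statistics import mean


def collect():
    """Stand-in for data collection: (feature vector, label) pairs."""
    return [
        ([1.0, 1.0], "cat"),
        ([1.2, 0.8], "cat"),
        ([4.0, 4.2], "dog"),
        ([3.8, 4.0], "dog"),
    ]


def train(samples):
    """Nearest-centroid 'model': one mean feature vector per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


def serve(model, request):
    """Stand-in for the web endpoint that would wrap predict()."""
    return {"label": predict(model, request["features"])}


model = train(collect())
response = serve(model, {"features": [3.9, 3.9]})
# response == {'label': 'dog'}
```

In a real project, each stage would grow into its own module: a scraper or data loader, a training script, and a small web app.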

You can combine Machine Learning, Deep Learning, and natural language processing techniques in a challenging project.

Example: To use both Deep Learning and natural language processing techniques, you can build a model that writes a short story from an image.

In this project, you extract the features from the image and then convert them into text using natural language processing techniques.

Portfolio building and networking

How to build a Data Science portfolio

Portfolio building is the critical step to getting a Data Science job.

Unlike other fields, a resume is not enough to get the job. You also need a stunning online profile.

Let’s discuss that.

The advantage of having a good Data Science portfolio is that it will help you get job opportunities from unexpected sources. We will discuss this more in the next section. For now, let’s learn how to create a standout portfolio.

There are various ways to create a portfolio, but you can start with the ones below.

  • Sharing your projects online
  • Writing articles
  • Connecting with people online and offline

Sharing your projects online

The best way to create an online presence is by sharing your projects online. Don’t limit yourself to keeping your project code on GitHub. Always share the links on social media networks too. You can try out all the below-mentioned platforms.

Once you have completed a project, keep all the project-related code on GitHub. Don’t forget to write the readme file. In the readme file, you need to mention everything about the project.

You can cover the points below in the readme file.

  • The problem statement
  • Solution motivation
  • The input data sources
  • How to run the project
  • How you updated the code
  • New pipeline steps/features added to the current project pipeline
  • Research papers you read
  • Solution architecture graph/image
  • Motivation/reference GitHub links
  • The accuracy details
  • Further improvement details

Your readme file looks more promising and authentic if you cover the above. Don’t limit yourself to these. Include anything else that you feel is important to showcase.

Writing articles

To showcase your profile, you can write articles too.

You can write about Machine Learning, Deep Learning, or natural language concepts or algorithms you are strong at. You can also write about how you have solved various Data Science projects. If you solved any online hackathon problems, you could write about that too.

For writing articles, you can consider the below platforms.

Connecting with people online and offline

Your network (connection) is your net worth.

To build quality connections, use LinkedIn; its filters help you connect with the right people.

Use LinkedIn as your primary source for connections. Don’t send requests on Facebook.

Smart ways to search for data scientist jobs

Smart data scientist job search

Now comes the key and last level for getting a data scientist job: the smart job search.

The topics in this section will be helpful for all kinds of job searches.

In this section, we will discuss the topics below.

  • Resume preparation
  • Selecting companies
  • Salary expectation

Resume preparation

The first step is to create your resume template with all your essential details. While creating the resume format, remember that junior and experienced candidates should have different styles of resumes.

Data science newbies can follow this order in the resume:

  • Education details
  • Skillset
  • Certifications
  • Awards/Publications
  • Projects (with links)

Experienced candidates can follow this order in the resume:

  • Company wise experience
  • Skillset
  • Education details
  • Certifications
  • Awards/Publications
  • Projects (with links)

Don’t send the same resume to all companies. Instead, tailor your resume to the job description you are responding to.

Selecting companies

Let’s say the job description needs an NLP skill set. You can change the order of the projects and put the NLP project at the top of your resume; in the same way, list all your NLP skills in the skill set section.
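If you keep your projects in a simple list with skill tags, you can even automate the reordering. The sketch below ranks projects by how many of their tags match keywords from the job description. The project names, tags, and keywords are invented for illustration, and a plain keyword count is only a rough heuristic.

```python
def rank_projects(projects, job_keywords):
    """Order projects by how many of their tags match the job keywords."""
    wanted = {keyword.lower() for keyword in job_keywords}

    def score(project):
        return len(wanted & {tag.lower() for tag in project["tags"]})

    # sorted() is stable, so projects with equal scores keep their order.
    return sorted(projects, key=score, reverse=True)


# Hypothetical portfolio.
projects = [
    {"name": "Image classifier", "tags": ["CNN", "computer vision"]},
    {"name": "Review sentiment model", "tags": ["NLP", "RNN"]},
    {"name": "Sales dashboard", "tags": ["SQL", "Tableau"]},
]

# Keywords picked out of a hypothetical NLP job description.
ordered = rank_projects(projects, ["nlp", "rnn", "sql"])
top_project = ordered[0]["name"]  # "Review sentiment model"
```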

While starting your career, focus on startups, but don’t blindly apply to every startup. Before deciding to apply for the job, research the company. 

Remember that you don’t have to restrict yourself to companies within your city or country. Now that remote work is commonplace, many startups are using Andela to recruit talent from all over the world so you may consider remote overseas-based roles as well.

Based on your interest, you can create a sheet like the one below to keep all the information about each company.

Company Name | Company URL | Experience Level Required | Skillset | Email
Company X | www.company.com | 0 to 1 year | skill1, skill2 | hr@company.com

When ready, you can start sending your resume to the collected emails.

Salary expectation

When starting your career, you can expect a newcomer’s salary. Don’t worry at this stage. As you develop your skills, your salary will increase to reflect your experience. For now, focus more on learning. Also, research typical salaries before discussing your expectations with potential employers.

What next

Once you start work, keep updating your CV as you develop new skills. Try various online courses on Udemy, Coursera, edX, or Udacity.

One more important thing: it’s a positive step to help out other aspiring data scientists. Share your insights via articles or social media posts. If you are not interested in maintaining your own blog, you can join or write for other blogs (like Andela’s blog!).

Enjoy your career journey, and good luck!

Free resources to become a data scientist

Online Courses

Blogs

Free Books

Free Servers

Coding Resources

Database Learning Resources

Machine Learning | Deep Learning | Natural Language Processing Resources

Research Papers

Kaggle Popular Notebooks

Github Resources

Youtube Channels

Popular Youtube University Channels

Want to be part of a vibrant tech community?

Then join the Andela Talent Network!


If you found this blog useful, check out our other blog posts for more essential insights!

The post How to get your first job as a Data Scientist appeared first on Andela.

