What it really takes to become a professional data scientist

What does it take to become a data scientist?

Data scientists take the recommendations that business analysts make and carry out a variety of tasks, including the following:

Build the technical case. Data scientists apply advanced math and statistics to build the technical case around the hypotheses that business analysts propose, and they are tasked with building the models required to test those hypotheses. This hypothesis-driven approach is central to big data work. You start with a hypothesis. For example: if we change the branding colors on a product on a given day, publish the change on Twitter, and it is positively received, we can expect a 4 percent increase in sales. That is the hypothesis.

Create the mathematical models. These models define how positive sentiment is measured and determine which tests need to be run to find correlations between that sentiment and sales increases.

Discover patterns, trends, and correlations. Some tasks don’t start with a hypothesis at all. This is where the real power of big data comes in: you find patterns and trends you didn’t even know existed.
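The branding hypothesis above can be made concrete with a small test. The following is a minimal sketch in Python, not a prescribed method: the choice of a permutation test, the function names, and the daily sales figures are all assumptions invented for illustration. It measures the relative lift in average sales on campaign days and estimates how often a lift that large would appear if the campaign had no effect.

```python
import random
from statistics import mean


def relative_lift(treated, control):
    """Relative change in mean sales of treated days versus control days."""
    return mean(treated) / mean(control) - 1.0


def permutation_p_value(treated, control, n_permutations=10_000, seed=0):
    """One-sided p-value: the share of random relabelings of days whose
    lift is at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = relative_lift(treated, control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if relative_lift(pooled[:n_treated], pooled[n_treated:]) >= observed:
            hits += 1
    return hits / n_permutations


# Hypothetical daily sales figures, invented purely for illustration.
control_days = [100, 98, 103, 101, 99, 102, 97, 100]
campaign_days = [104, 103, 106, 104, 105, 102, 104, 104]

lift = relative_lift(campaign_days, control_days)
p_value = permutation_p_value(campaign_days, control_days)
print(f"observed lift: {lift:.1%}, one-sided p-value: {p_value:.4f}")
```

A small p-value suggests the lift is unlikely to be chance alone. It does not by itself show that the color change caused the increase.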

The skill required here is to take a business idea and model it with numbers and data. Data scientists take that data and turn it into information. There can be a fine line between what data scientists do and what computer scientists do. The two roles overlap, but some jobs differ significantly, namely those in scientific and academic research.
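As one illustration of modeling a business idea with numbers, the relationship between sentiment and sales mentioned earlier could be quantified roughly as follows. This is a sketch under stated assumptions: the definition of the sentiment score, the function names, and the daily figures are hypothetical and not taken from any real data set.

```python
from math import sqrt


def sentiment_score(positive, negative):
    """Net sentiment in [-1, 1]: (positive - negative) / total mentions.

    This definition is an assumption made for illustration only.
    """
    total = positive + negative
    if total == 0:
        return 0.0  # no mentions at all: treat the day as neutral
    return (positive - negative) / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical (positive, negative) mention counts and unit sales per day.
daily_mentions = [(40, 10), (25, 25), (10, 30), (35, 5), (20, 20), (45, 15)]
daily_sales = [120, 101, 90, 125, 99, 118]

scores = [sentiment_score(p, n) for p, n in daily_mentions]
r = pearson_r(scores, daily_sales)
print(f"correlation between sentiment and sales: {r:.2f}")
```

A coefficient near 1 means the two series rise and fall together. A real analysis would still need to rule out other explanations, such as seasonality or a concurrent promotion.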

Assessing your interest

As with business analysts, there is a set of questions you can ask yourself to see whether you’re a fit for this type of job. Consider each of the following carefully.

Are you naturally inquisitive?

Just as a business analyst needs to think in terms of building hypotheses, the data scientist needs aptitude in this area. Data scientists need to be able to construct models that can prove or disprove a given business hypothesis. Can you see beyond the surface issues and go deep? Do you know when a result has potential and needs further testing? Are you passionate about technology?

Can you focus for a long time?

The journey required to complete a PhD or advanced degree in the big data field can be a long one. You have to commit a significant amount of study to a specific area of research. Are there areas of math, statistics, or computer science that you have a passion for studying? Do you want to address big problems that may take years to solve? Do you like to write . . . a lot? Can you maintain intense focus on a few topics for many years — maybe for an entire career?

Are you self-motivated?

Data scientists need to be able to direct their own intellectual paths. Do you naturally follow a solution to its end? Do you have a knack for knowing where to find answers if you don’t know them?

Are you multidisciplined?

Data scientists need to be knowledgeable in multiple areas: math, statistics, and computer science. Can you pick up programming languages and computer science concepts easily? Does the idea of a new language excite you or intimidate you? Can you easily collaborate with others to learn new things?

Can you turn an idea into reality?

Data modeling requires the ability to take business concepts and ideas and model those within a world driven by numbers and data concepts. Do you have the aptitude or interest to build experiments that capture the business value?

Looking at a job posting

Let’s take a look at a job posting for a data scientist who would operate at a junior level.

Data Consultant – Recent College Grad

Are you a recent college graduate who loves big data? Are you passionate about cutting-edge technologies and solving challenges for Fortune 500 clients? As a consultant, you’ll be part of a team that develops and implements advanced algorithms and data pipelines that extract, classify, merge, and deliver new insights and business value out of structured and unstructured data sets. You’ll work on a team whose data science efforts range from exploration and investigation to design and development of analytic systems. You’ll have a chance to gain diverse experience across multiple technologies and create path-breaking solutions. You’ll be surrounded by, and learn from, the foremost thought leaders in the big data space.

This posting describes two paths: data engineering and data science.

Key responsibilities include:

Data engineering

  • Designing and developing code, scripts, and data pipelines that leverage structured and unstructured data integrated from multiple sources
  • Software installation and configuration
  • Participating in requirements and design workshops with our clients
  • Developing project deliverable documentation

Data science

  • Providing big data solutions for our clients, including analytical consulting, statistical modeling, and quantitative solutions
  • Mentoring sophisticated organizations on large-scale data and analytics and working closely with client teams to deliver results
  • Helping to translate business cases to clear research projects, be they exploratory or confirmatory, to help our clients utilize data to drive their businesses
  • Collaborating and communicating across geographically distributed teams and with external clients

Required skills/experience include:

Data engineering

  • BS or MS in Computer Science or equivalent work experience
  • Experience programming in Java, Python, SQL, or C/C++
  • Background that includes mathematics, statistics, machine learning, and data mining
  • Experience with SQL, NoSQL, relational database design, and methods for efficiently retrieving data
  • Prior work/research experience with unstructured data and data modeling
  • Strong analytical skills and creative problem solver
  • Excellent verbal and written communications skills
  • Strong team player capable of working in a demanding startup environment
  • Experience building complex and non-interactive systems (batch, distributed, and so on)

Data science

  • BS or MS in Computer Science, Math, or equivalent work experience
  • Coursework in mathematics, statistics, machine learning, and data mining
  • Proficiency in R or other math packages (Matlab, SAS, and so on)
  • Excellent programming skills in object-oriented languages
  • Adept at learning and applying new technologies
  • Excellent verbal and written communication skills
  • Strong team player capable of working in a demanding startup environment
  • Experience with Java and Python

You don’t have to have a PhD to be a data scientist. The first role, data engineer, requires the candidate to have a deep understanding of data modeling, programming, machine learning, and math. Although data engineers aren’t building complicated, research-oriented algorithms as in the second role, the job still requires a deep understanding of data and of how to structure data to extract value.

Manu Jeevan

Manu Jeevan is a professional blogger, content marketer, and big data enthusiast. You can connect with him on LinkedIn, or email him at manu@bigdataexaminer.com.