R, Python or SAS: Which one should you learn first?

Python, R and SAS are the three most popular languages for data analysis.

[Figure: KDnuggets 2014 poll on programming languages]

If you are new to the world of data science and aren’t experienced in any of these languages, it makes sense to be unsure whether to learn R, SAS or Python.

Don’t fret. By the time you’re done reading this article, you will have a much clearer idea of which language is the right one for you.

Overview

R – R is the lingua franca of statistics. It is a free and open source programming language used to perform advanced data analysis tasks.

Python – Python is a multi-purpose, free and open source programming language which has become very popular in data science due to its active community and data mining libraries.

SAS – SAS has been the undisputed market leader in the enterprise analytics space. It offers a huge array of statistical functions, has a good GUI for people to learn quickly and provides brilliant technical support.

If you are looking to start a career in data science, or to gain the skills to transition to this field in the future, then you are probably researching which of these three programming languages you should learn first to maximize your chances of landing your dream job. Should you focus on mastering R? Would it be better to make SAS a priority? Or should you learn Python?

Take a look at these 5 factors as a starting point to help you decide.

Industries where the tool is used

Burtch Works, an HR firm, asked over 1,000 quantitative professionals which language they preferred: SAS, R or Python. Here are the survey results:

[Figure: Burtch Works survey results, SAS vs. R vs. Python]

SAS is largely preferred by big corporations because of its highly regarded customer support. That is also why SAS has an advantage in financial services and marketing companies, where cost is not the primary concern when selecting a tool.

[Figure: tools used in the data science industry]

R and Python, on the other hand, are used by startups and mid-sized firms. Tech and telecom companies need to analyze huge volumes of unstructured data, so their data scientists rely on machine learning techniques, for which R and Python are better suited.

[Figure: data scientist vs. predictive analytics]

Learn how to use Python for data science from Edvancer. 

Cost and ease of learning

SAS is expensive commercial software and is mostly used by large corporations with huge budgets.

Python and R are free software that can be downloaded by anyone.

You don’t need prior programming knowledge to learn SAS, and its easy-to-use GUI makes it the easiest of the three to learn. Its ability to parse SQL code, combined with macros and other native packages, makes learning SAS child’s play for professionals with basic SQL knowledge.

To analyze data in Python, you will use data mining libraries like pandas, NumPy and SciPy. In other words, most of your analysis code will call these libraries rather than rely on core Python alone. The code you write with these libraries looks somewhat similar to the code you write in R, so Python for data science is easier to pick up when you are already familiar with R. Even so, if you already know R, you should learn the basics of the Python language before you move on to its data mining ecosystem.

So, don’t think that R is difficult, and Python is easy to learn!
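To make the point about libraries concrete, here is a minimal sketch that computes the same per-group average twice: once with core Python only, and once with pandas. The data and the column names (`region`, `revenue`) are made up for illustration, and the R one-liner in the comment is included only for comparison.

```python
import pandas as pd

# Hypothetical toy data, invented purely for this illustration.
rows = [
    ("north", 120.0),
    ("south", 80.0),
    ("north", 100.0),
    ("south", 60.0),
    ("east", 90.0),
]

# Core-Python version: accumulate totals and counts by hand.
totals, counts = {}, {}
for region, revenue in rows:
    totals[region] = totals.get(region, 0.0) + revenue
    counts[region] = counts.get(region, 0) + 1
manual_means = {region: totals[region] / counts[region] for region in totals}

# pandas version: a single expression, close in spirit to R's
#   aggregate(revenue ~ region, data = sales, FUN = mean)
sales = pd.DataFrame(rows, columns=["region", "revenue"])
mean_revenue = sales.groupby("region")["revenue"].mean()

print(manual_means)
print(mean_revenue)
```

Both versions give the same averages. The pandas line is shorter, but reading it comfortably still assumes you know what a DataFrame and a group-by are, which is part of why neither language is automatically the "easy" one.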

Data Science capabilities

SAS is extremely efficient at sequential data access, and database access through SQL is well integrated. The drag-and-drop interface makes it easy for you to create better statistical models quickly.  It has decent functional graphical capabilities, but it’s difficult to create complex graphical plots in SAS.

R is known for in-memory analytics and is mainly used when the data analysis tasks require a standalone server. R is an excellent tool for exploring data. Currently, R has more than 5,000 community-contributed packages on CRAN. The wide range of packages and modules available for statistics and data analysis makes it the most popular and powerful language in data science. Statistical models can be written in a few lines of code.

You can draw complicated graphs beautifully in R using packages like ggplot2, lattice, rCharts, etc.

Python libraries like pandas, NumPy, SciPy and scikit-learn make it the second most popular programming language in data science after R. You can also create beautiful charts and graphs using libraries like Matplotlib and Seaborn. Python is actively used by the machine learning community to scrape and analyze unstructured data from the web.

IPython Notebook, a web-based interactive environment, makes it easier to share your code with others.
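As a rough illustration of how compact a statistical model can be with the Python libraries mentioned above, the sketch below fits a straight line by ordinary least squares using NumPy's `polyfit`. The five data points are invented for the example; a real analysis would load its own data and would usually check the fit as well.

```python
import numpy as np

# Made-up observations that roughly follow y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Degree-1 polynomial fit, i.e. simple linear regression by least squares.
# polyfit returns coefficients from the highest power down: [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# Use the fitted line to predict y for a new x value.
predicted = slope * 5.0 + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction at x=5: {predicted:.2f}")
```

The same fit could be done with scikit-learn or SciPy; NumPy is used here only to keep the example short.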

Community Support

SAS has an active online community moderated by community managers. These communities have evolved from peer to peer forums to become publishing platforms for essential content. You can ask queries related to SAS, and the community will answer them. The official blog of SAS is also an essential resource to refer to when you need help with a particular problem.

R has 125 active user groups worldwide, and the number of user group meetings has increased significantly in the last year. Python has 1,657 user groups, but it has far fewer communities focused strictly on data than R does.

R and Python have huge online community support through Stack Overflow, mailing lists, user-contributed code and documentation.

SAS doesn’t have an active open source community at all.

Job Scenario

SAS has more than 80,000 customers around the globe, and most of them are corporations with huge budgets. Analysts in these organizations use SAS to quickly and efficiently execute a wide range of statistical models on data sets. That is why the title “analyst” is often mentioned in SAS job descriptions.

On the other hand, R and Python are used by startups and technology companies. R is more inclined towards tasks related to statistics and data analysis, which is why R-related jobs carry titles like “Data miner”, “Statistician”, “Data analytics manager”, etc.

Learn R programming from Edvancer.

Meanwhile, given the boom in big data — projected by Ovum to grow 50 percent by 2019 on an already large base — you can expect increasing numbers of business analysts and other nonprogrammers to arm themselves with the R language as well.

Python, by contrast, is used by programmers who want to delve into data analysis or apply statistical techniques, and by developers who turn to data science. Python-related jobs carry titles like “Machine learning engineer”, “Data engineer”, “Big data architect”, etc.

Conclusion

If your goal is to become a business analytics professional and you are planning to join a startup, then you should learn R first. On the other hand, if you want to join a bank or pharma company, you should start with SAS and then learn R once you are comfortable with SAS. If you are looking to become a big data professional, then you need to learn either R or Python, and the choice depends on your background. If you come from a statistics or mathematics background, you should learn R; if you have a programming background, you should learn Python.

That ought to clear it up!

Are you still confused about which tool you should choose? Let us know in the comments below!

Comments

  • sudip

    Hi, this is Sudip. I am a mechanical engineer with 10 years of experience in project management but no knowledge of programming. I am interested in business data analytics. Would you suggest making the switch, and if so, what would the learning path be?
    Please suggest.

  • Evelyn Münster

    R is great and very quick and simple, but still not so easy for production. If your analysis code is needed for production later, use Python.

  • George Soilis

    A lot of people say Python is easy to start with.
    I find it really messy, and with lots of idiosyncratic needs (like having to be very careful with typing loops and brackets, white space etc.).
    For me R was my “first love” (I started my data science “adventures” with R, at the JHNY data science specialisation in Coursera).
    And please note, I was totally new not only to statistics and analysis but also to mathematics and programming.
    I mean, what can be a better proof than a guy who never finished high school, and had no idea about mathematics, statistics or programming, taking up R and managing to finish courses that are targeting higher level learners.. :) :)

    R was for me easy to learn, easy to understand, and easy to find help for any problem, as the community is big, helpful and really creative.
    Libraries pop up every day, for almost anything you can dream of analysing.
    And yet it remains a simple, straightforward language (personal opinion, not the result of the scientific method).
    But in essence, if you get into the data thing, you will find eventually that you might need to know Python, even if it is only to be able to skim through someone else’s code to find help for a problem you’re trying to solve.

    All things can be done in both languages, and knowing both is a plus, no question about it.
    Actually, you will probably find that additionally learning C++, Java and maybe some other language will solve lots of higher dimension/quantity (as in Big Data) problems.
    But this is for a later day.
    For starters, I’d go with R.
    And no, I am not getting paid to promote R.. :) :)

  • Okeoghene Obedoma

    Thank you for the write-up, it is an eye opener for beginners.

  • Michael Zeller

    Pardon the contrarian question, but does it really matter which data mining tool you learn?

    What if you could simply use the tool that you prefer, or the one that fits a particular job the best? With the Predictive Model Markup Language (PMML) industry standard, we can leverage one common process and standard to operationalize models from R, SAS, Python, IBM SPSS, Dell Statistica, KNIME and many others.

    Check out this white paper by industry analyst James Taylor titled:

    Standards-based Deployment of Predictive Analytics
    http://zementis.com/knowledge-base/predictive-analytics-deployment/

  • Heidi Huber

    I’ve been programming for over 15 years and my experience with Python has only been with de-constructing one software program and replicating part of it to a .Net environment.

    Do employers think that programmers that know R, SAS or Python are more qualified than people who have a variety of other languages under their belt?

    In one respect, you want to show employers you have a wide variety of experience in industries, languages, databases, etc. On the other hand, a lot of resumes go through a filter looking for specific keywords before they are ever viewed by human eyes.

    Does experience trump the latest software or vice-versa?

    • Aatash Shah

      Heidi, R, SAS and Python are not really the latest software. They are just better suited to analytics and data science than other languages. We believe it’s a combined process where experience in a wide variety of languages also counts, but employers would of course give more weight to knowledge of the tools that are essential to the role.

  • Prof Ravi Vadlamani

    Excellent! You have very neatly analyzed the relative strengths of each of them. Finally, a very good conclusion is drawn, which will be useful for budding data scientists.

  • Rahul

    Great blog, very clear with what each offers.