What do you think being a data scientist is about?

A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician. A data scientist determines the questions their team should be asking and figures out how to answer those questions using data. A data scientist often develops predictive models for theorizing and forecasting.

What do you see as the major duties and/or knowledge areas?

Major duties:
Data scientists closely follow the “data science process”: data ingest, data transformation, exploratory data analysis, model selection, model evaluation, and data storytelling.
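
The steps of that process can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the toy data (hours studied vs. exam score), the function names, and the choice of a straight-line model against a mean-only baseline are all assumptions made for the example, not a prescribed workflow.

```python
import csv
import io
import statistics

# Hypothetical toy data (made up for illustration): hours studied vs. exam score.
RAW_CSV = """hours,score
1,52
2,55
3,61
4,64
5,70
6,73
7,79
8,82
"""

def ingest(text):
    """Data ingest: read rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Data transformation: convert string fields to numeric (x, y) pairs."""
    return [(float(r["hours"]), float(r["score"])) for r in rows]

def explore(pairs):
    """Exploratory data analysis: a few summary statistics."""
    ys = [y for _, y in pairs]
    return {"n": len(pairs), "mean_score": statistics.mean(ys),
            "stdev_score": statistics.stdev(ys)}

def fit_baseline(pairs):
    """Candidate model 1: always predict the mean score."""
    mean_y = statistics.mean(y for _, y in pairs)
    return lambda x: mean_y

def fit_line(pairs):
    """Candidate model 2: least-squares line y = intercept + slope * x."""
    xs = [x for x, _ in pairs]
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(y for _, y in pairs)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def mse(model, pairs):
    """Model evaluation: mean squared error on the given pairs."""
    return statistics.mean((model(x) - y) ** 2 for x, y in pairs)

pairs = transform(ingest(RAW_CSV))
train, test = pairs[:6], pairs[6:]   # simple hold-out split, for brevity
summary = explore(train)

# Model selection: keep the candidate with the lowest held-out error.
candidates = {"baseline": fit_baseline(train), "line": fit_line(train)}
scores = {name: mse(model, test) for name, model in candidates.items()}
best = min(scores, key=scores.get)

# Data storytelling: report the finding in plain language.
print(f"Trained on {summary['n']} rows; the '{best}' model had the "
      f"lowest error on held-out data.")
```

In practice each stage would be far more involved (and would likely use libraries such as pandas or scikit-learn), but the order of the stages is the same.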

Knowledge areas:
Programming languages: Data scientists can expect to spend time using programming languages to sort through, analyze, and otherwise manage large amounts of data. Popular programming languages for data science include Python, R, SQL, and SAS.

Data visualization: Being able to create charts and graphs is a significant part of being a data scientist. Familiarity with the following tools should prepare you to do the work: Tableau, Power BI, and Excel.

Machine learning: Incorporating machine learning and deep learning into your work as a data scientist means building models that learn patterns from the data you gather and can potentially predict outcomes for future datasets. A course in machine learning can get you started with the basics.

Big data: Some employers may want to see that you have some familiarity grappling with big data. Some of the software frameworks used to process big data include Hadoop and Apache Spark.

Communication: The most brilliant data scientists won’t be able to effect any change if they aren’t able to communicate their findings well. The ability to share ideas and results verbally and in writing is an often-sought skill in data scientists.

What differences/similarities do you see between data scientists and statisticians?

Differences: In current terms, the fields of data science and statistics differ in a number of ways. The fields differ in modeling processes, the size of data consumed, the types of problems studied, the academic background of the people in the field, and the terminology used.

Statisticians need to understand the modeling and structure of data, while data scientists need to understand applied statistics. Statisticians deal with abstract concepts like point estimates, margins of error, confidence intervals, standard errors, p-values, hypothesis testing, and the long-running debate between the “frequentists” and “Bayesians.” Data scientists, on the other hand, closely follow the “data science process,” which is more approachable. Although data scientists and statisticians tend to gather information for similar purposes, their means of data collection are quite different. Statisticians tend to focus more on quantifying uncertainty than data scientists do. The two fields also use somewhat different nomenclature to describe the same principles.
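
To make a few of those statistical terms concrete, here is a small sketch in Python that computes a point estimate, a standard error, a margin of error, and a 95% confidence interval for a mean. The sample values are made up for illustration, and the interval uses a normal approximation as a simplifying assumption; for a sample this small, a t-based critical value would be the more careful choice.

```python
import math
import statistics

# Hypothetical sample (made-up measurements, for illustration only).
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0, 5.1, 4.9]

n = len(sample)

# Point estimate: the sample mean as a single best guess of the population mean.
point_estimate = statistics.mean(sample)

# Standard error: how much the sample mean would vary across repeated samples.
standard_error = statistics.stdev(sample) / math.sqrt(n)

# Critical value for a two-sided 95% interval under a normal approximation
# (an assumption of this sketch; a t critical value would be wider).
z = statistics.NormalDist().inv_cdf(0.975)

# Margin of error and the resulting confidence interval.
margin_of_error = z * standard_error
ci = (point_estimate - margin_of_error, point_estimate + margin_of_error)

print(f"point estimate: {point_estimate:.3f}")
print(f"standard error: {standard_error:.3f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval is the statistician's way of quantifying uncertainty: rather than reporting only the mean, it reports a range of values that is plausible given the sample.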

Similarities: The fields are closely related in the sense that both data science and statistics aim to extract knowledge from data. Data scientists and statisticians also tend to gather information for similar purposes.

How do you view yourself in relation to these two areas?

As a graduate student of statistics, I can use a tool like R, Python, SAS, or even Excel to analyze data and find patterns in it. I think I will be well positioned to become a data scientist or statistician after finishing my courses.
