Data science is statistics

When physicists do mathematics, they don’t say they’re doing “number science”. They’re doing math.

If you’re analyzing data, you’re doing statistics. You can call it data science or informatics or analytics or whatever, but it’s still statistics.

If you say that one kind of data analysis is statistics and another kind is not, you’re not allowing innovation. We need to define the field broadly.

You may not like what some statisticians do. You may feel they don’t share your values. They may embarrass you. But that shouldn’t lead us to abandon the term “statistics”.

16 Responses to “Data science is statistics”

  1. Ken Butler Says:


    (Ken, statistician and proud of it.)

  2. Arnold Rosielle Says:


  3. hilaryparker Says:

    OK but… as you point out, physicists do mathematics, but still call themselves physicists because that’s not all they do.

    Whether or not you agree, data scientists would qualify themselves the same way — they do statistics, but also other things that statisticians generally do not do (product development, system administration, engineering, etc.).

    • Karl Broman Says:

      Yes, I’m not saying that a computer scientist should be called a statistician without his/her approval. (Though I might call a statistician who does only mathematics a mathematician.)

      But there should be no “data science” that is not statistics. “Statistics” should swallow it up.

      • hilaryparker Says:

        I dunno, I don’t think we get to appropriate the term without adjusting our curriculum a bit so that we teach at least some of the engineering/computer science/business skills needed for many data science jobs.

        Data science is also a completely catch-all term that is still very ill-defined. This was an interesting talk about it that I recently saw, sort of an empirical approach to defining data science:

        • Karl Broman Says:

          I was about to say, “Why not?”

          But then, I do totally agree that bio/statistics training needs to be modernized to include the skills needed for visualizing, managing, and analyzing high-dimensional data, and for writing better software. So some (software) engineering and CS skills should be included. (I’m not sure about “business”, but I’ll let that go.)

          If there’s a “data science”, it should be statistics. As needs change, the field should adapt.

          • hilaryparker Says:

            Yes agreed completely. The fact that this new term emerged is, I think, at least somewhat a reflection of our field not adapting fast enough to address current problems.

  4. Rafael S Calsaverini (@rcalsaverini) Says:

    “If you’re analyzing data, you’re doing statistics.”

    Can I suggest you add several exclamation points to this sentence?

    Disclaimer: I’m a physicist, and I work as a data scientist. Yes, I do some things that are not statistics, but they are also not directly related to data analysis – like installing statistical software, coding data interfaces to data sets stored in dozens of different ways, coding some new algorithm, etc. But all of these are just tools to execute my main purpose: analyzing data. And when I’m analyzing data, guess what? I’m doing statistics.

  5. Ed Kambour Says:

    Frankly, I think a more apt analogy is Health Science vs. Medicine.

    • Karl Broman Says:

      You make a good point, and I thank you for mentioning it, because I wouldn’t have thought about it otherwise. But I still think the scope of “statistics” should be expanded to encompass all of “data science.”

  6. isomorphismes Says:


