Are there 'serious' mathematical problems emerging from data science?












4












$begingroup$


Are there any proper research-level mathematics problems that have come out of the activity known as "data science"?



I put the quotes because it is not really clear to me what "data science" is; I am a data science skeptic, if you will.





















$endgroup$








  • 1




    $begingroup$
    What is the definition of 'Serious' mathematical problems?
    $endgroup$
    – Nilotpal Kanti Sinha
    Jan 29 at 4:40










  • $begingroup$
    A problem which a mathematician would say is a research level mathematics problem.
    $endgroup$
    – T_M
    Jan 29 at 4:43






  • 1




    $begingroup$
    Well then you could have just said 'research' level problem to make things simpler, why complicate :-)
    $endgroup$
    – Nilotpal Kanti Sinha
    Jan 29 at 4:46












  • $begingroup$
    In machine learning, the problem of designing efficient models is very hard to formalize, and away from trivial cases it seems almost impossible to solve. Amazingly, this is in complete contradiction with the fact that animal and human intelligence do solve it quite efficiently.
    $endgroup$
    – reuns
    Jan 29 at 5:14












  • $begingroup$
    Not convinced this is a well-posed question, because it is too open. Data science includes machine learning, which, you can bet, leads to plenty of "research level problems".
    $endgroup$
    – MASL
    Jan 29 at 5:25























soft-question data-analysis














edited Jan 29 at 13:14 by YuiTo Cheng

asked Jan 29 at 4:39 by T_M


















2 Answers

































































    3





    $begingroup$

    Without exaggeration, solving any fundamental problem of data science is a serious mathematical problem, because the solutions are put to practical use in industries where the stakes are on the order of millions or billions of dollars. The stakes are therefore much higher than in research done for purely academic purposes. Let's look at an example.



    Problem statement: An airport terminal has many stands, or gates, where an incoming aircraft can be parked. At a given time, which of these gates is the best one for an incoming aircraft?



    I will post the solution approach later (I am preparing it), but this simple-looking problem is actually an extremely complicated optimization problem in aviation. Major airlines have spent years and several million dollars on it and have achieved only partial solutions. The difficulty comes from the hundreds of practical factors in commercial flying that an airline has to consider, so that the number of candidate solutions to search for the best option exceeds the number of atoms in the known universe.
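
To make the size of that search space concrete, here is a minimal Python sketch of a toy version of the gate problem. Everything in it is an assumption made for illustration: the flight times, the per-gate costs, the single rule that two flights whose ground times overlap cannot share a gate, and the helper names. Real gate assignment involves many more constraints, and brute-force search like this is not claimed to be how airlines approach it.

```python
from itertools import product

# Hypothetical toy data: (arrival, departure) in minutes for each flight.
flights = {"F1": (0, 50), "F2": (30, 90), "F3": (60, 120), "F4": (100, 160)}
gates = ["A", "B"]
# Hypothetical cost of parking a flight at a gate (e.g. passenger walking distance).
cost = {
    "F1": {"A": 3, "B": 5},
    "F2": {"A": 4, "B": 2},
    "F3": {"A": 1, "B": 6},
    "F4": {"A": 7, "B": 2},
}


def overlaps(a, b):
    """True if two (start, end) ground-time intervals intersect."""
    return a[0] < b[1] and b[0] < a[1]


def feasible(assignment):
    """No two flights with overlapping ground times may share a gate."""
    names = list(assignment)
    for i, f in enumerate(names):
        for g in names[i + 1:]:
            if assignment[f] == assignment[g] and overlaps(flights[f], flights[g]):
                return False
    return True


def best_assignment():
    """Exhaustively try every flight-to-gate assignment and keep the cheapest feasible one."""
    best, best_cost = None, float("inf")
    for choice in product(gates, repeat=len(flights)):
        assignment = dict(zip(flights, choice))
        if not feasible(assignment):
            continue
        total = sum(cost[f][g] for f, g in assignment.items())
        if total < best_cost:
            best, best_cost = assignment, total
    return best, best_cost


def search_space_size(n_flights, n_gates):
    """Number of raw assignments before any constraint is applied."""
    return n_gates ** n_flights
```

With 2 gates and 4 flights there are only 16 raw assignments, so exhaustive search works. With, say, 100 gates and 41 flights the count is already 100^41 = 10^82, above the commonly quoted estimate of roughly 10^80 atoms in the observable universe, which is why exhaustive search is hopeless at realistic sizes.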

















    $endgroup$











    edited Jan 29 at 5:17

    answered Jan 29 at 5:01 by Nilotpal Kanti Sinha












    • $begingroup$
      OK, well what's an example of a fundamental problem of data science?
      $endgroup$
      – T_M
      Jan 29 at 5:59






    • 1




      $begingroup$
      There are four main fundamental problems: (1) classification: to which class or category does this object belong? (2) regression: what is the predicted numerical value of something? (3) optimization: what is the best combination to maximize or minimize something? (4) making sense of unstructured data: for example, looking at social media, internet data, or video to find the public's opinion about something. Almost all data science problems can eventually be broken down into these four fundamental problems.
      $endgroup$
      – Nilotpal Kanti Sinha
      Jan 29 at 6:48










    • $begingroup$
      But each of these four has hundreds of solution methods, and not every method works well with all data or in every scenario, so new algorithms based on math are invented all the time. For example, about 200 years ago Gauss developed the method of least squares, the basis of linear regression, and applied it to astronomers' observations to predict where and when Ceres would reappear. Today, methods such as XGBoost often give better accuracy.
      $endgroup$
      – Nilotpal Kanti Sinha
      Jan 29 at 6:48
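
As a small illustration of the regression problem mentioned in the comment above, here is a minimal sketch of an ordinary least-squares fit of a straight line. The function name `fit_line` and the sample data are hypothetical; the formulas are the standard closed-form solution for one predictor, not anything specific to Gauss's astronomical calculation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

The sketch assumes at least two distinct x values; otherwise `sxx` is zero and the slope is undefined.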





























    2












    $begingroup$

    It wasn't clear to me whether you were asking specifically for mathematical research problems that originate in so-called data science. The problem is that the term is relatively recent. Nevertheless, the reference below removes my doubts that Topological Data Analysis is a valid answer.



    You can find several review and research papers on this topic on the site http://cunygc.appliedtopology.nyc/pages/reading.html. A brief overview can be found on Wikipedia.



    To summarize my point, I'll quote from Gunnar Carlsson, "Topology and data", Bulletin (New Series) of the American Mathematical Society, Volume 46, Number 2, April 2009, Pages 255–308 (also linked from that page): http://www.ams.org/journals/bull/2009-46-02/S0273-0979-09-01249-X/S0273-0979-09-01249-X.pdf




    An important feature of modern science and engineering is that data of various kinds is being produced at an unprecedented rate. This is so in part because of new experimental methods, and in part because of the increase in the availability of high powered computing technology. It is also clear that the nature of the data we are obtaining is significantly different. For example, it is now often the case that we are given data in the form of very long vectors, where all but a few of the coordinates turn out to be irrelevant to the questions of interest, and further that we don’t necessarily know which coordinates are the interesting ones. A related fact is that the data is often very high-dimensional, which severely restricts our ability to visualize it. The data obtained is also often much noisier than in the past and has more missing information (missing data). This is particularly so in the case of biological data, particularly high throughput data from microarray or other sources. Our ability to analyze this data, both in terms of quantity and the nature of the data, is clearly not keeping pace with the data being produced. In this paper, we will discuss how geometry and topology can be applied to make useful contributions to the analysis of various kinds of data.



    This paper will deal with a number of methods for thinking about data using topologically inspired methods. We begin with a discussion of persistent homology, which is a mathematical formalism which permits us to infer topological information from a sample of a geometric object, and we show how it can be applied to particular data sets arising from natural image statistics and neuroscience. Next, we show that topological methods can produce a kind of imaging of data sets, not by embedding in Euclidean space but rather by producing a simplicial complex associated to certain initial information about the data set. We then demonstrate that persistence can be generalized in several different directions, providing more structure and information about the data sets in question. We then show that the philosophy of functoriality can be used to reason about the nature of clustering methods, and we conclude by speculating about theorems one might hope to prove and discussing how the subject might develop more generally.




    Obviously, this example doesn't exhaust the mathematical research topics related to, and especially those originating in, data science. For instance, searching for "statistical learning" should yield more examples.
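    To make the idea of persistent homology in the quote a bit more concrete, here is a minimal sketch of its simplest case: tracking connected components (degree-0 homology) of a point sample as the distance scale grows. This is my own illustration, not code from Carlsson's paper; the function name `zero_dim_persistence` and the toy point cloud are assumptions made for the example, and only the Python standard library is used.

```python
import itertools
import math


def zero_dim_persistence(points):
    """Scales at which connected components merge as the distance threshold grows.

    Every point starts as its own component at scale 0. Two components merge
    when the threshold reaches the length of the shortest edge joining them.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj        # merge the two components
            deaths.append(length)  # one component "dies" at this scale
    return deaths  # n - 1 finite values; the last component never dies


# Toy sample (hypothetical data): two tight, well-separated clusters.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
deaths = zero_dim_persistence(points)
print(deaths)
```

    In this toy sample four components die at a small scale and one survives until the two clusters finally merge at a much larger scale. That long-lived component is the "persistent" feature suggesting the sample comes from two pieces. Higher-degree features such as loops need simplicial complexes and are better left to a dedicated library.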















    $endgroup$















































        answered Jan 29 at 12:24









        MASL




































