Are there 'serious' mathematical problems emerging from data science?
Are there any proper research-level mathematics problems that have come out of the activity known as "data science"?
I put the quotes because it is really not clear to me what "data science" is; I am a data science skeptic, if you will.
soft-question data-analysis
edited Jan 29 at 13:14 by YuiTo Cheng
asked Jan 29 at 4:39 by T_M
What is the definition of a 'serious' mathematical problem?
– Nilotpal Kanti Sinha, Jan 29 at 4:40
A problem that a mathematician would call a research-level mathematics problem.
– T_M, Jan 29 at 4:43
Well then, you could have just said 'research-level' problem to keep things simple. Why complicate it? :-)
– Nilotpal Kanti Sinha, Jan 29 at 4:46
In machine learning, the problem of designing efficient models is very hard to formalize, and away from trivial cases it seems almost impossible to solve. Amazingly, this stands in complete contrast to the fact that animal and human intelligence do solve it quite efficiently.
– reuns, Jan 29 at 5:14
I'm not convinced this is a well-posed question, because it is too open. Data science includes machine learning, which, you can bet, leads to plenty of "research-level problems".
– MASL, Jan 29 at 5:25
2 Answers
Without exaggeration, solving any fundamental problem of data science is a serious mathematical problem, because the solutions are put to practical use in industries where the stakes run to millions or billions of dollars. In that sense the stakes are much higher than in research done for purely academic purposes. Let's look at an example from aviation.
Problem statement: An airport terminal has many stands, or gates, where an incoming aircraft can be parked after landing. At a given time, which of these gates is the best one at which to park an incoming aircraft?
I will post the solution approach later (I am preparing it), but I would like to point out that this simple-looking problem is actually an extremely complicated optimization problem in aviation. Major airlines have spent years and several million dollars on it and have achieved only partial solutions. The complication comes from the hundreds of practical factors related to commercial flying that an airline has to consider, because of which the number of candidate solutions to search for the best option exceeds the number of atoms in the known universe.
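To make the size of the search space concrete, here is a toy sketch. It is not the solution approach promised above: it assumes a hypothetical cost table `cost[f][g]` for parking flight `f` at gate `g`, allows at most one flight per gate, and simply tries every assignment. All names and numbers are invented for illustration, and `math.perm` needs Python 3.8 or later.

```python
from itertools import permutations
from math import perm


def best_assignment(cost):
    """Brute-force the cheapest one-flight-per-gate assignment.

    cost[f][g] is a hypothetical cost of parking flight f at gate g.
    Returns (total_cost, gates) where gates[f] is the gate of flight f.
    """
    n_flights, n_gates = len(cost), len(cost[0])
    best_total, best_gates = None, None
    # Try every way of giving the flights distinct gates.
    for gates in permutations(range(n_gates), n_flights):
        total = sum(cost[f][g] for f, g in enumerate(gates))
        if best_total is None or total < best_total:
            best_total, best_gates = total, gates
    return best_total, best_gates


# Invented costs: 3 flights (rows), 4 gates (columns).
cost = [
    [4, 2, 8, 5],
    [3, 7, 1, 6],
    [9, 4, 6, 2],
]
print(best_assignment(cost))

# Number of candidate assignments is gates! / (gates - flights)!.
print(perm(4, 3))              # 24 candidates in the toy case
print(perm(60, 40) > 10**60)   # 40 flights, 60 gates: beyond enumeration
```

Even this stripped-down version, with a single time slot and no operational constraints, cannot be enumerated at realistic sizes, which is why practical approaches rely on optimization methods rather than exhaustive search.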
edited Jan 29 at 5:17
answered Jan 29 at 5:01 by Nilotpal Kanti Sinha
OK, well, what's an example of a fundamental problem of data science?
– T_M, Jan 29 at 5:59
There are four main fundamental problems: (1) classification: to which class or category does this object belong? (2) regression: what is the predicted numerical value of something? (3) optimization: what is the best combination to maximize or minimize something? (4) making sense of unstructured data: look at social media, internet data, or video and find the public's opinion about something. Almost all data science problems can eventually be broken down into these four fundamental problems.
– Nilotpal Kanti Sinha, Jan 29 at 6:48
But each of these four has hundreds of solution methods, and not every method works well with all data or in every scenario, so new algorithms based on mathematics are invented all the time. E.g., 200 years ago Gauss invented linear regression and applied it to data collected by astronomers to predict the location and timing of Encke's comet. Today, we get better accuracy with something called XGBoost.
– Nilotpal Kanti Sinha, Jan 29 at 6:48
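For readers who want to see the regression problem from the comments in its simplest form, a least-squares line y = a + b·x has a closed-form solution. The following is a minimal sketch in plain Python; the function name and the data points are invented for illustration, and it makes no claim about how such a fit compares with XGBoost on real data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, using the closed form.

    Returns (intercept, slope).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
print(a, b)
```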
It wasn't clear to me whether you were asking specifically for mathematical research problems that have their origin in so-called data science. The difficulty is that this term is relatively recent. Nevertheless, the following reference removes my doubts about offering Topological Data Analysis as a valid answer.
You can find several review and research papers on this topic at http://cunygc.appliedtopology.nyc/pages/reading.html. A brief overview can be found on Wikipedia.
To summarize my point, I'll quote from the Bulletin (New Series) of the American Mathematical Society, Volume 46, Number 2, April 2009, Pages 255–308, by Gunnar Carlsson, also available from that page: http://www.ams.org/journals/bull/2009-46-02/S0273-0979-09-01249-X/S0273-0979-09-01249-X.pdf
An important feature of modern science and engineering is that data of various kinds is being produced at an unprecedented rate. This is so in part because of new experimental methods, and in part because of the increase in the availability of high powered computing technology. It is also clear that the nature of the data we are obtaining is significantly different. For example, it is now often the case that we are given data in the form of very long vectors, where all but a few of the coordinates turn out to be irrelevant to the questions of interest, and further that we don’t necessarily know which coordinates are the interesting ones. A related fact is that the data is often very high-dimensional, which severely restricts our ability to visualize it. The data obtained is also often much noisier than in the past and has more missing information (missing data). This is particularly so in the case of biological data, particularly high throughput data from microarray or other sources. Our ability to analyze this data, both in terms of quantity and the nature of the data, is clearly not keeping pace with the data being produced. In this paper, we will discuss how geometry and topology can be applied to make useful contributions to the analysis of various kinds of data.
This paper will deal with a number of methods for thinking about data using topologically inspired methods. We begin with a discussion of persistent homology, which is a mathematical formalism which permits us to infer topological information from a sample of a geometric object, and we show how it can be applied to particular data sets arising from natural image statistics and neuroscience. Next, we show that topological methods can produce a kind of imaging of data sets, not by embedding in Euclidean space but rather by producing a simplicial complex associated to certain initial information about the data set. We then demonstrate that persistence can be generalized in several different directions, providing more structure and information about the data sets in question. We then show that the philosophy of functoriality can be used to reason about the nature of clustering methods, and we conclude by speculating about theorems one might hope to prove and discussing how the subject might develop more generally.
Obviously, this example doesn't exhaust the mathematical research topics related to, and especially originating in, data science. For instance, searching for Statistical Learning should yield more examples.
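As a small illustration of the persistent homology mentioned in the quoted abstract, the sketch below computes only its simplest part: the scales at which connected components of a point cloud merge (dimension 0), with scale measured as pairwise distance. It is not taken from the paper; it assumes a union-find pass over sorted pairwise distances, and the function name and sample points are invented for illustration. `math.dist` needs Python 3.8 or later.

```python
from itertools import combinations
from math import dist


def h0_death_times(points):
    """Scales at which connected components merge (0-dim persistence).

    Every point starts as its own component at scale 0. Pairwise
    distances are processed in increasing order, and a component
    'dies' each time an edge joins two different components
    (the same pass as Kruskal's minimum spanning tree algorithm).
    """
    parent = list(range(len(points)))

    def find(i):
        # Find the representative of i's component (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths  # n - 1 values; the last surviving component never dies


# Invented sample: two well-separated clusters on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(h0_death_times(points))
```

In this sample three components die at scale 1.0 and one survives until 8.0; that single long-lived component is the signature of the two clusters, while the short-lived ones reflect sampling within each cluster.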
add a comment |
$begingroup$
It wasn't clear to me if you were asking specifically for mathematical research problems that have their origin in the so-called data science. The problem is, this term is relatively recent. Nevertheless, the following reference clears my doubts about my answer of Topological Data Analysis as a valid one.
You can find several review and research papers on this topic in the site http://cunygc.appliedtopology.nyc/pages/reading.html. A lean overview can be found on Wikipedia.
To summarize my point, here I'll quote from *BULLETIN (New Series) OF THE
AMERICAN MATHEMATICAL SOCIETY Volume 46, Number 2, April 2009, Pages 255–308, by Gunnar Carlson -available at that page as well: http://www.ams.org/journals/bull/2009-46-02/S0273-0979-09-01249-X/S0273-0979-09-01249-X.pdf
An important feature of modern science and engineering is that data of various kinds is being produced at an unprecedented rate. This is so in part because of new experimental methods, and in part because of the increase in the availability of high powered computing technology. It is also clear that the nature of the data we are obtaining is significantly different. For example, it is now often the case that we are given data in the form of very long vectors, where all but a few of the coordinates turn out to be irrelevant to the questions of interest, and further that we don’t necessarily know which coordinates are the interesting ones. A related fact is that the data is often very high-dimensional, which severely restricts our ability to visualize it. The data obtained is also often much noisier than in the past and has more missing information (missing data). This is particularly so in the case of biological data, particularly high throughput data from microarray or other sources. Our ability to analyze this data, both in terms of quantity and the nature of the data, is clearly not keeping pace with the data being produced. In this paper, we will discuss how geometry and topology can be applied to make useful contributions to the analysis of various kinds of data.
This paper will deal with a number of methods for thinking about data using topologically inspired methods. We begin with a discussion of persistent homology, which is a mathematical formalism which permits us to infer topological information from a sample of a geometric object, and we show how it can be applied to particular data sets arising from natural image statistics and neuroscience. Next, we show that topological methods can produce a kind of imaging of data sets, not by embedding in Euclidean space but rather by producing a simplicial complex associated to certain initial information about the data set. We then demonstrate that persistence can be generalized in several different directions, providing more structure and information about the data sets in question. We then show that the philosophy of functoriality can be used to reason about the nature of clustering methods, and we conclude by speculating about theorems one might hope to prove and discussing how the subject might develop more generally.
Obviously, this example doesn't exhaust the mathematical research topics related to, and especially originating in, data science. For instance, searching for "statistical learning" should yield more examples.
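To make the persistent homology mentioned in the quoted abstract a bit more concrete, here is a minimal sketch of its simplest case: tracking connected components (degree 0) of a point cloud as a distance threshold grows. This is only an illustration and is not taken from Carlsson's paper. The function name `zero_dim_persistence` is hypothetical, and I am assuming the convention that an edge enters the Vietoris-Rips filtration at a scale equal to the distance between its endpoints.

```python
import itertools
import math


def zero_dim_persistence(points):
    """Death scales of connected components as the distance threshold grows.

    Every point starts as its own component (born at scale 0). Two components
    merge, and one of them "dies", at the length of the shortest edge joining
    them. Assumed convention: an edge appears at scale = Euclidean distance.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, halving paths along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge: one component dies here
            deaths.append(length)
    return deaths  # n - 1 finite values; the last component never dies


# Two well-separated clusters of three points each.
cloud = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
deaths = zero_dim_persistence(cloud)
```

For this toy cloud, four components die at scale 1 and the fifth only at about 13.45, the gap between the two clusters. That one long-lived component is the kind of feature persistence is meant to single out from sampling noise.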
$endgroup$
answered Jan 29 at 12:24
MASL