Removing leap day from leap years in xarray dataset

I have a NetCDF file loaded into an xarray Dataset and I want to compute daily climatologies without the leap day, that is, without 29 February. I'm trying the Dataset.drop method, but the syntax is not intuitive to me. Here is the Dataset:

print(ds)
>><xarray.Dataset>
Dimensions:        (lat: 1, lev: 1, lon: 720, time: 27133)
Coordinates:
* lon            (lon) float32 -180.0 -179.5 -179.0 ... 178.5 179.0 179.5
* lev            (lev) float32 1.0
* time           (time) datetime64[ns] 2000-01-02T18:00:00 ... 2018-07-30
Dimensions without coordinates: lat
Data variables:
Var1              (time, lev, lon) float32 ...
Var2              (time, lat, lon) float64 ...
Var3              (time, lat, lon) float64 ...

I tried

ds_N_R.drop(['Var1', 'Var2', 'Var3'], time='2000-02-29')
>>TypeError: drop() got an unexpected keyword argument 'time'

## another approach
ds_N_R.sel(time='2000-02-29').drop(['Var1', 'Var2', 'Var3'])

## does not give the result I intended
<xarray.Dataset>
Dimensions:  (lev: 1, lon: 720, time: 4)
Coordinates:
* lon      (lon) float32 -180.0 -179.5 -179.0 -178.5 ... 178.5 179.0 179.5
* lev      (lev) float32 1.0
* time     (time) datetime64[ns] 2000-02-29 ... 2000-02-29T18:00:00
Data variables:
*empty*

How do I proceed here? It would be great to know whether there is a direct method for calculating daily climatologies that considers only 365 days of a year, but I would also like to know how to remove data from a particular time step when required.

asked Nov 19 '18 at 13:52

Light_B

python python-xarray


1 Answer
The right way to use drop() here would be:
ds_N_R.drop([np.datetime64('2000-02-29')], dim='time')

But I think this could actually be more cleanly done with an indexing operation, e.g.,
ds_N_R.sel(time=~((ds_N_R.time.dt.month == 2) & (ds_N_R.time.dt.day == 29)))
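As a rough, self-contained illustration of the indexing approach above, the sketch below builds a small synthetic 6-hourly dataset (the variable name `Var1`, the three longitudes, and the 2000-2001 date range are made up for the example, not taken from the asker's file) and removes every 29 February time step with a boolean mask:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 6-hourly data covering a leap year (2000) and a non-leap year (2001).
# The names and sizes here are illustrative assumptions only.
time = pd.date_range("2000-01-01", "2001-12-31 18:00", freq=pd.Timedelta(hours=6))
ds = xr.Dataset(
    {"Var1": (("time", "lon"), np.random.rand(time.size, 3))},
    coords={"time": time, "lon": [-180.0, 0.0, 179.5]},
)

# True for every time step that falls on 29 February, whatever the hour.
is_leap_day = (ds.time.dt.month == 2) & (ds.time.dt.day == 29)

# Keep everything that is NOT a leap-day time step.
ds_noleap = ds.sel(time=~is_leap_day)

print(ds.sizes["time"], ds_noleap.sizes["time"])  # 2924 2920
```

Because the mask is built from the month and day fields rather than from an exact timestamp, it removes all four 6-hourly steps of the leap day in every leap year at once.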

answered Nov 19 '18 at 16:23

shoyer


The drop method only removes the first time step from '2000-02-29' and leaves the other 3 time steps for that day. But the 'sel' method you suggested is brilliant. I couldn't have figured out myself to use 'time.dt.month' instead of 'time.month', as 'time' is a DataArray. What I find a bit frustrating is that it takes me many tries to get the correct syntax for new functions. I tried reading the source code of the function, but it seems it would take more time and effort on my part to get a good grasp of it.
– Light_B
Nov 20 '18 at 10:12
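The behaviour described in this comment follows from `np.datetime64('2000-02-29')` naming a single label (midnight), not a whole day. One possible way to drop every time step of a given day is to first collect all labels on that day via partial-string selection and then drop them. This is only a sketch on synthetic data; it assumes a reasonably recent xarray where `drop_sel` is available (in the 2018-era API the corresponding call would be `drop(labels, dim='time')`):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Three days of synthetic 6-hourly data around the leap day (illustrative only).
time = pd.date_range("2000-02-28", "2000-03-01 18:00", freq=pd.Timedelta(hours=6))
ds = xr.Dataset(
    {"Var1": ("time", np.arange(time.size, dtype=float))},
    coords={"time": time},
)

# Partial-string selection returns every time step on that calendar day.
labels = ds.time.sel(time="2000-02-29").values

# Drop all of those labels along the time dimension.
dropped = ds.drop_sel(time=labels)

print(labels.size, ds.sizes["time"], dropped.sizes["time"])  # 4 12 8
```

The same pattern works for any single day or timestamp, which covers the more general case of removing data from a particular time step.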

I can give an example of what I meant above by the syntax not coming intuitively to me. When I use groupby to calculate a climatology, for example, it works without 'time.dt': ds.groupby('time.day').mean(dim='time'). In fact, 'time.dt.day' gives an error there, but when using the 'sel' method, 'time.month' gives an error.
– Light_B
Nov 20 '18 at 10:57
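The two spellings mentioned in this comment can be put side by side. In the sketch below (synthetic daily data, names chosen for illustration), the string `'time.month'` is a shorthand that `groupby` resolves by name, while `da.time.dt.month` is an ordinary attribute access through the `.dt` accessor that returns a DataArray usable both for grouping and for building a mask:

```python
import numpy as np
import pandas as pd
import xarray as xr

# One leap year of synthetic daily data (illustrative only).
time = pd.date_range("2000-01-01", "2000-12-31", freq=pd.Timedelta(days=1))
da = xr.DataArray(
    np.arange(time.size, dtype=float),
    coords={"time": time},
    dims="time",
    name="Var1",
)

# String shorthand: groupby looks up the datetime component by name.
by_string = da.groupby("time.month").mean("time")

# Accessor form: an explicit DataArray of month numbers.
by_accessor = da.groupby(da.time.dt.month).mean("time")

# The accessor form is also what a boolean mask needs.
february = da.sel(time=da.time.dt.month == 2)
```

Both groupby calls should give the same monthly means here; the difference is only in how the month numbers are obtained.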

The 'sel' method you suggested above removes the time steps from 29 Feb, but when I calculate daily climatologies my time axis again has 366 values instead of 365. 'Var1_updated' has 20 fewer time steps than the main array, and my data covers 2000-2018. To calculate daily climatologies, I'm using daily_clim = Var1_updated.groupby('time.dayofyear').mean(dim='time'). It gives me <xarray.DataArray 'Var1' (dayofyear: 366, lat: 1, lon: 720)>. I then thought that the values at 'dayofyear' = 60 (leap day) should be a NaN array, but I'm surprised to see that this is not so.
– Light_B
Nov 20 '18 at 11:49


Note that the dayofyear attribute represents the "ordinal day" which in pandas is defined as "days since December 31st the preceding year." Therefore all years will contain a date that has an ordinal day of 60; in a non-leap year this date will be March 1st, while in a leap year this date will be February 29th. If I understand your intended use-case correctly (daily climatologies, i.e. grouping by "matching month and day number") I think you might be interested in the discussion in this GitHub issue.
– spencerkclark
Nov 21 '18 at 13:01
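To make the ordinal-day point concrete, here is a small sketch on synthetic daily data. It first shows that day 60 maps to different calendar dates, then builds a hypothetical 365-day index (called `noleap_doy` here, a name invented for this example) by shifting the days after 29 February back by one in leap years. This is one possible workaround, not the approach from the linked discussion:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Day 60 is 29 Feb in a leap year but 1 Mar in a non-leap year.
print(pd.Timestamp("2000-02-29").dayofyear, pd.Timestamp("2001-03-01").dayofyear)  # 60 60

# Four years of synthetic daily data, one of them (2000) a leap year.
time = pd.date_range("2000-01-01", "2003-12-31", freq=pd.Timedelta(days=1))
da = xr.DataArray(np.random.rand(time.size), coords={"time": time}, dims="time", name="Var1")

# Remove the leap day as in the answer above.
da = da.sel(time=~((da.time.dt.month == 2) & (da.time.dt.day == 29)))

# Grouping by dayofyear still yields 366 groups, because 31 Dec 2000 is day 366.
naive = da.groupby("time.dayofyear").mean("time")

# Hypothetical 365-day index: in leap years, shift days after 29 Feb back by one.
year = da.time.dt.year
doy = da.time.dt.dayofyear
is_leap = (year % 4 == 0) & ((year % 100 != 0) | (year % 400 == 0))
noleap_doy = xr.where(is_leap & (doy > 60), doy - 1, doy).rename("noleap_doy")

clim = da.groupby(noleap_doy).mean("time")
print(naive.sizes["dayofyear"], clim.sizes["noleap_doy"])  # 366 365
```

With this index, each group collects the same calendar date (for example 1 March) from every year, which is what a 365-day daily climatology needs.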


That solution in pandas is very nice. I think something like that would be made possible with the addition of multi-argument groupby in xarray (see some initial work toward that here). Progress on that has been delayed some due to a broader re-envisioning of MultiIndex support, but for sure it is on the radar.
– spencerkclark
Nov 22 '18 at 17:42


