Can an HTTP request be sent with the nginx location directive?
Maybe this is trivial, but I haven't found anything meaningful, or I didn't know where to look...
(How) is it possible to send a curl (or whatever) command as soon as a certain path is requested?
Something along these lines, but that would actually work:
location / {
    curl --data 'v=1&t=pageview&tid=UA-XXXXXXXX-X&cid=123&dp=hit' https://google-analytics.com/collect
}
http nginx curl measurement-protocol
asked Dec 30 '18 at 4:27 by Lucian Davidescu
I don't know if or how this can be done with "pure" nginx, but I can give you a recipe for doing it with OpenResty (or ngx_http_lua_module) if that is an option for you.
– Ivan Shatsky
Dec 30 '18 at 4:49
If it gets the job done, why not?
– Lucian Davidescu
Jan 1 at 5:14
4 Answers
As pointed out in the comments, ngx_http_lua_module can do it!
location / {
    access_by_lua_block {
        os.execute("/usr/bin/curl --data 'v=1&t=pageview&tid=UA-XXXXXXXX-X&cid=123&dp=hit' https://google-analytics.com/collect >/dev/null 2>/dev/null")
    }
}
Note that the execution halts the page load until curl has finished. To run curl in the background and continue the page load immediately, add a space and an & to the end, so it looks like >/dev/null 2>/dev/null &")
Yep, it's working! Had to install OpenResty, and there's no more HTTP/2 support for now - hopefully they'll release a version based on nginx >1.13.9 soon... Any way to pass existing headers as parameters into that?
– Lucian Davidescu
Jan 1 at 16:40
@LucianDavidescu I honestly can't believe this solution is even considered to be acceptable even for "testing" purposes, let alone any sort of production environment. Spawning a new process in the background, whilst immediately returning back to the client, makes it trivial for a single client using a single TCP connection to completely bring down your whole machine in a matter of seconds, yes, your whole machine, through a trivial exhaustion and overload of the process table. The solution proposed in this answer is hardly different from what would be a forkbomb!
– cnst
Jan 4 at 15:53
@LucianDavidescu, it could be anything. What if DNS is down, or Google decides to throttle you, or IPv6 gets configured but doesn't work? Each curl instance would persist for 3+ minutes, with more coming with each request. You wouldn't be able to log in to the system with a shell, because the process table is exhausted. Your best bet would be if you're using shared hosting, and/or fork is just slow (and they really are), and you're limited to 20 to 100 forks a second, which takes 2/3 of your CPU power, slowing down the rest of your site. I don't think you fully realise just how expensive forks are.
– cnst
Jan 4 at 22:42
@LucianDavidescu seems you can get an array of request headers (sent by the browser) by running local headers, err = ngx.req.get_headers(), and an array of response headers (sent by nginx) by using local headers, err = ngx.resp.get_headers() - but you should probably use log_by_lua_block instead of access_by_lua_block
– hanshenrik
Jan 6 at 8:21
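A minimal sketch of what that suggestion looks like (the header name here is just an example) — reading request headers in the log phase, which runs after the response has been sent:

location / {
    log_by_lua_block {
        -- the log phase runs after the response, so this cannot delay the client
        local headers, err = ngx.req.get_headers()
        if headers then
            ngx.log(ngx.INFO, "user-agent: ", headers["user-agent"] or "-")
        end
    }
}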
Found this - github.com/vorodevops/nginx-analytics-measurement-protocol/tree/… - it uses proxy_pass and works quite nicely so far.
– Lucian Davidescu
Jan 7 at 12:35
What you're trying to do — execute a new curl instance for Google Analytics on each URL request on your server — is the wrong approach to the problem:
Nginx itself is easily capable of servicing 10k+ concurrent connections at any given time as a lower limit, i.e., as a minimum, if you do things right; see https://en.wikipedia.org/wiki/C10k_problem.
On the other hand, the performance of fork, the underlying system call that creates a new process (which would be necessary if you want to run curl for each request), is very slow: on the order of 1k forks per second as an upper limit, i.e., if you do things right, that's the fastest it'll ever go; see Faster forking of large processes on Linux?.
What's the best alternative solution with better architecture?
My recommendation would be to perform this through batch processing. You're not really gaining anything by doing Google Analytics in real time, and a 5-minute delay in statistics should be more than adequate. You could write a simple script in a programming language of your choice to look through the relevant access_log (http://nginx.org/r/access_log), collect the data for the required time period, and make a single batch request (and/or multiple individual requests from within a single process) to Google Analytics with the requisite information about each visitor in the last 5 minutes. You can run this as a daemon process, or as a script from a cron job; see crontab(5) and crontab(1).
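As a rough illustration of that batch approach, a hypothetical Lua script (LuaSec is assumed for HTTPS; the log path, log pattern, and property ID are placeholders, and URL-encoding, the /batch endpoint, and deduplication between runs are omitted for brevity):

-- ga-batch.lua: read an access-log slice from stdin, send one Measurement
-- Protocol hit per line.
-- usage, e.g. every 5 minutes from cron:
--   tail -n 1000 /var/log/nginx/access.log | lua ga-batch.lua
local https = require("ssl.https")  -- LuaSec

for line in io.lines() do
    -- "combined" log format starts: $remote_addr - $remote_user [$time_local] "$request" ...
    local ip, path = line:match('^(%S+) %S+ %S+ %[.-%] "%u+ (%S+)')
    if ip and path then
        -- crude client id: the remote address itself
        local body = string.format(
            "v=1&t=pageview&tid=UA-XXXXXXXX-X&cid=%s&uip=%s&dp=%s", ip, ip, path)
        https.request("https://google-analytics.com/collect", body)  -- POSTs when a body is given
    end
end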
Alternatively, if you still want real-time processing for Google Analytics (which I don't recommend, because most of these services themselves are implemented on an eventual consistency basis, meaning, GA itself wouldn't necessarily guarantee accurate real-time statistics for the last XX seconds/minutes/hours/etc), then you might want to implement a daemon of some sort to handle statistics in real time:
- My suggestion would still be to utilise access_log in such a daemon, for example through a tail -f /var/www/logs/access_log equivalent in your favourite programming language, where you'd open the access_log file as a stream and process the data as it comes in.
- Alternatively, you could implement this daemon to have an HTTP request interface itself, and duplicate each incoming request to both your actual backend and this extra server. You could multiplex this through nginx with the help of the not-built-by-default auth_request or add_after_body to make a "free" subrequest for each request. This subrequest would go to your server, written, for example, in Go. The server would have at least two goroutines: one would process incoming requests into a queue (implemented through a buffered string channel), immediately issuing a reply to the client to make sure not to delay the nginx upstream; the other would receive the requests from the first through the chan string, processing them as it goes and sending the appropriate requests to Google Analytics.
Ultimately, whichever way you go, you'd probably still want to implement some level of batching and/or throttling, because I'd imagine that at some point Google Analytics itself would throttle you if you keep sending requests from the same IP address at a very excessive rate without any sort of batching. As per "What is the rate limit for direct use of the Google Analytics Measurement Protocol API?" as well as https://developers.google.com/analytics/devguides/collection/protocol/v1/limits-quotas, it would appear that most libraries implement internal limits on how many requests per second they send to Google.
Indeed, that seems to be the scalable long-term solution. However, on the one hand the simultaneous connections and rate limits are quite high for most use cases anyway (unless there are also performance issues), while on the other hand I think a "quick and dirty" approach may come in handy, at least for testing purposes.
– Lucian Davidescu
Jan 4 at 7:57
btw I wrote some code to parse nginx access logs in PHP, see line 58 here: github.com/divinity76/http_log_parser/blob/master/… (but that code is from 2015 and unmaintained, idk if there have been any changes since 2015)
– hanshenrik
Jan 4 at 8:03
If everything you need is to submit a hit to Google Analytics, then it can be accomplished more easily: Nginx can modify page HTML on the fly, embedding the GA code before the closing </body> tag:
sub_filter_once on;
sub_filter '</body>' "<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-XXXXXXXX-X', 'auto');
ga('send', 'pageview');
</script></body>";
location / {
}
This Nginx module is called sub (ngx_http_sub_module).
What happens to non-HTML files that happen to contain the phrase </body>? For example, XML files?
– hanshenrik
Jan 7 at 17:15
@hanshenrik it's configurable. The default configuration is to replace only in files with the text/html MIME type, and it's possible to permit other types.
– Alexander Azarov
Jan 7 at 17:47
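A minimal sketch of that configuration (the extra MIME type is just an example); sub_filter_types extends the replacement beyond the text/html default:

location / {
    # text/html is always processed; list any additional MIME types explicitly
    sub_filter_types text/xml;
    # ... sub_filter directives as in the answer above ...
}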
It's not as if I don't have access to the site to put the JavaScript there in the first place, if that's what I wanted to do...
– Lucian Davidescu
Jan 7 at 18:14
Here's how I did it eventually - proxy_pass instead of curl - based on this: https://github.com/vorodevops/nginx-analytics-measurement-protocol/tree/master/lua. The code assumes OpenResty, or nginx with the Lua module installed. Note that the comments inside the access_by_lua_block use Lua's -- syntax, while the surrounding nginx configuration uses #.
# pick your location
location /example {
    # invite lua to the party
    access_by_lua_block {
        -- set request parameters
        local request = {
            v = 1,
            t = "pageview",
            -- don't forget to put your own property here
            tid = "UA-XXXXXXX-Y",
            -- a "unique" user id based on a hash of IP and user agent; not too
            -- reliable, but possibly the best one can reasonably do without cookies
            cid = ngx.md5(ngx.var.remote_addr .. ngx.var.http_user_agent),
            uip = ngx.var.remote_addr,
            dp = ngx.var.request_uri,
            dr = ngx.var.http_referer,
            ua = ngx.var.http_user_agent,
            -- truncate the language string to make it compatible with the JavaScript
            -- format: either the first two characters like here (e.g. en) or the
            -- first five (e.g. en_US) with ...1, 5
            ul = string.sub(ngx.var.http_accept_language, 1, 2)
        }
        -- use ngx.location.capture to send everything to the proxy location below
        local res = ngx.location.capture("/gamp", {
            method = ngx.HTTP_POST,
            body = ngx.encode_args(request)
        })
    }
}
# make a separate location block to proxy the request away
location = /gamp {
    internal;
    expires epoch;
    access_log off;
    proxy_pass_request_headers off;
    proxy_pass_request_body on;
    proxy_pass https://google-analytics.com/collect;
}