Can an HTTP request be sent with the nginx location directive?
Maybe this is trivial, but I haven't found anything meaningful, or I didn't know where to look...



(How) is it possible to send a curl / whatever command as soon as a certain path is requested?



Something along these lines, but that would actually work:



location / {
    curl --data 'v=1&t=pageview&tid=UA-XXXXXXXX-X&cid=123&dp=hit' https://google-analytics.com/collect
}


http nginx curl measurement-protocol

asked Dec 30 '18 at 4:27 by Lucian Davidescu

  • I don't know if (or how) this can be done with "pure" nginx, but I can give you a recipe for doing it with OpenResty (or ngx_http_lua_module), if that is an option for you.

    – Ivan Shatsky
    Dec 30 '18 at 4:49

  • If it gets the job done, why not?

    – Lucian Davidescu
    Jan 1 at 5:14
4 Answers

Answer by hanshenrik (score 3, +100 bounty), answered Jan 1 at 9:38

As pointed out in the comments, ngx_http_lua_module can do it!



location / {
    access_by_lua_block {
        os.execute("/usr/bin/curl --data 'v=1&t=pageview&tid=UA-XXXXXXXX-X&cid=123&dp=hit' https://google-analytics.com/collect >/dev/null 2>/dev/null")
    }
}


Note that the execution halts the page load until curl has finished. To run curl in the background and continue the page load immediately, add a space and an & before the closing quote, so the end looks like:



>/dev/null 2>/dev/null &")
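
Putting the two fragments together, a sketch of the complete backgrounded variant (same curl path and parameters as the snippet above):

location / {
    access_by_lua_block {
        -- the trailing & backgrounds curl so the request is not held up;
        -- see the comments below for why this is dangerous under load
        os.execute("/usr/bin/curl --data 'v=1&t=pageview&tid=UA-XXXXXXXX-X&cid=123&dp=hit' https://google-analytics.com/collect >/dev/null 2>/dev/null &")
    }
}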
  • Yep, it's working! Had to install OpenResty, and no more HTTP/2 support for now - hopefully they'll release a version based on nginx >1.13.9 soon... Any way to pass existing headers as parameters into that?

    – Lucian Davidescu
    Jan 1 at 16:40

  • @LucianDavidescu I honestly can't believe this solution is considered acceptable even for "testing" purposes, let alone any sort of production environment. Spawning a new process in the background, whilst immediately returning to the client, makes it trivial for a single client using a single TCP connection to completely bring down your whole machine in a matter of seconds (yes, your whole machine), through trivial exhaustion and overload of the process table. The solution proposed in this answer is hardly different from a forkbomb!

    – cnst
    Jan 4 at 15:53

  • @LucianDavidescu, it could be anything. What if DNS is down, or Google decides to throttle you, or IPv6 gets configured but doesn't work? Each curl instance would persist for 3+ minutes, with more coming with each request. You wouldn't be able to log in to the system with a shell, because the process table would be exhausted. Your best bet would be if you're on shared hosting, and/or fork is just slow (and forks really are slow), and you're limited to 20 to 100 forks a second, which takes two-thirds of your CPU power, slowing down the rest of your site. I don't think you fully realise just how expensive forks are.

    – cnst
    Jan 4 at 22:42

  • @LucianDavidescu seems you can get an array of request headers (sent by the browser) by running local headers, err = ngx.req.get_headers();, and an array of response headers (sent by nginx) by using local headers, err = ngx.resp.get_headers() - but you should probably use log_by_lua_block instead of access_by_lua_block

    – hanshenrik
    Jan 6 at 8:21
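
A minimal sketch of that suggestion, assuming OpenResty (the header picked out is purely illustrative):

location / {
    log_by_lua_block {
        -- the log phase runs after the response has been sent,
        -- so nothing here delays the client
        local req_headers = ngx.req.get_headers()
        ngx.log(ngx.INFO, "referer: ", req_headers["referer"] or "-")
    }
}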

  • Found this - github.com/vorodevops/nginx-analytics-measurement-protocol/tree/… - it uses proxy_pass; works quite nicely so far.

    – Lucian Davidescu
    Jan 7 at 12:35


Answer by cnst (score 3), answered Jan 3 at 19:48

What you're trying to do — execute a new curl instance for Google Analytics on each URL request on your server — is the wrong approach to the problem:




  1. Nginx itself is easily capable of servicing 10k+ concurrent connections at any given time as a lower limit, i.e., as a minimum, if you do things right, see https://en.wikipedia.org/wiki/C10k_problem.


  2. On the other hand, the performance of fork, the underlying system call that creates a new process (which would be necessary if you want to run curl for each request), is very slow: on the order of 1k forks per second as an upper limit, i.e., even if you do things right, that's the fastest it'll ever go; see Faster forking of large processes on Linux?.





What's the best alternative solution with better architecture?




  • My recommendation would be to perform this through batch processing. You're not really gaining anything by doing Google Analytics in real time, and a 5-minute delay in statistics should be more than adequate. You could write a simple script in a programming language of your choice to look through the relevant access_log (http://nginx.org/r/access_log), collect the data for the required time period, and make a single batch request (and/or multiple individual requests from within a single process) to Google Analytics with the requisite information about each visitor in the last 5 minutes. You can run this as a daemon process, or as a script from a cron job; see crontab(5) and crontab(1). (A concrete sketch follows at the end of this answer.)



  • Alternatively, if you still want real-time processing for Google Analytics (which I don't recommend, because most of these services themselves are implemented on an eventual consistency basis, meaning GA itself wouldn't necessarily guarantee accurate real-time statistics for the last XX seconds/minutes/hours/etc), then you might want to implement a daemon of some sort to handle statistics in real time:




      • My suggestion would still be to utilise access_log in such a daemon, for example, through a tail -f /var/www/logs/access_log equivalent in your favourite programming language, where you'd open the access_log file as a stream and process data as and when it comes.


      • Alternatively, you could implement this daemon to have an HTTP request interface itself, and duplicate each incoming request to both your actual backend and this extra server.
        You could multiplex this through nginx with the help of the not-built-by-default auth_request or add_after_body modules to make a "free" subrequest for each request (see the sketch after this list). This subrequest would go to your server, written, for example, in Go. The server would have at least two goroutines: one would process incoming requests into a queue (implemented through a buffered string channel), immediately issuing a reply to the client so as not to delay the nginx upstream; the other would receive requests from the first through the chan string, processing them as it goes and sending the appropriate requests to Google Analytics.
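
On the nginx side, the subrequest multiplexing described in the last bullet could look something like this minimal sketch, assuming nginx was built with the auth_request module and a hypothetical stats daemon on 127.0.0.1:8080 that always answers 2xx quickly (otherwise it would hold up, or even fail, the main request):

location / {
    auth_request /stats;
    # ... normal content or proxy_pass configuration ...
}

location = /stats {
    internal;
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    proxy_set_header X-Original-URI $request_uri;
    proxy_pass http://127.0.0.1:8080/collect;
}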





Ultimately, whichever way you go, you'd probably still want to implement some level of batching and/or throttling, because I'd imagine that at some point Google Analytics itself would likely throttle you if you keep sending it requests from the same IP address on a very excessive basis without any sort of batch implementation in place. As per What is the rate limit for direct use of the Google Analytics Measurement Protocol API? as well as https://developers.google.com/analytics/devguides/collection/protocol/v1/limits-quotas, it would appear that most libraries implement internal limits on how many requests per second they send to Google.
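
To make the batch idea concrete, here is a rough sketch of a cron-driven batcher, written in Lua since that is the language used elsewhere on this page. The log path, the "combined" log format, and using the client IP as cid are all assumptions, and a real version would also have to remember how far into the log it had read between runs:

-- ga_batch.lua (sketch)
local hits = {}

local function flush()
    if #hits == 0 then return end
    -- one curl fork per batch of 20 hits instead of one per pageview;
    -- the Measurement Protocol batch endpoint accepts up to 20
    -- newline-separated hits per request
    os.execute("curl -s --data-binary '" .. table.concat(hits, "\n") ..
               "' https://www.google-analytics.com/batch")
    hits = {}
end

for line in io.lines("/var/log/nginx/access.log") do
    -- combined format: $remote_addr - $remote_user [$time_local] "$request" ...
    local ip, path = line:match('^(%S+) .- "%u+ (%S+) HTTP')
    if ip and path then
        hits[#hits + 1] = string.format(
            "v=1&t=pageview&tid=UA-XXXXXXXX-X&cid=%s&uip=%s&dp=%s", ip, ip, path)
        if #hits == 20 then flush() end
    end
end
flush()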
  • Indeed, that seems to be the scalable long-term solution. However, on the one hand the simultaneous connections and rate limits are quite high for most use cases anyway (unless there are also performance issues), while on the other hand I think a "quick and dirty" approach may come in handy at least for testing purposes.

    – Lucian Davidescu
    Jan 4 at 7:57

  • By the way, I wrote some code to parse nginx access logs in PHP; see line 58 here: github.com/divinity76/http_log_parser/blob/master/… (but that code is from 2015 and unmaintained; I don't know if there have been any changes since 2015).

    – hanshenrik
    Jan 4 at 8:03


Answer by Alexander Azarov (score 2), answered Jan 5 at 14:20

If everything you need is to submit a hit to Google Analytics, then it can be accomplished more easily: Nginx can modify page HTML on the fly, embedding GA code before the closing </body> tag:



sub_filter_once on;

sub_filter '</body>' "<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-XXXXXXXX-X', 'auto');
ga('send', 'pageview');
</script></body>";

location / {
}


This Nginx module is called sub.
  • What happens to non-HTML files that happen to contain the string </body>? For example, XML files?

    – hanshenrik
    Jan 7 at 17:15

  • @hanshenrik It's configurable. The default configuration is to replace only in files with the text/html MIME type, and it's possible to permit others.

    – Alexander Azarov
    Jan 7 at 17:47
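
For reference, a minimal sketch of permitting an additional MIME type (text/html is the module's default; the extra type shown is illustrative):

# also rewrite XHTML responses, in addition to the default text/html
sub_filter_types application/xhtml+xml;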

  • It's not that I don't have access to the site to put the JavaScript there in the first place, if that's what I wanted to do...

    – Lucian Davidescu
    Jan 7 at 18:14


Answer by Lucian Davidescu, the asker (score 1)

Here's how I did it eventually - proxy_pass instead of curl - based on this: https://github.com/vorodevops/nginx-analytics-measurement-protocol/tree/master/lua. The code assumes OpenResty, or nginx with the Lua module installed. (The comments use # syntax in the nginx configuration and -- syntax inside the Lua block, so they can be kept as-is.)



# pick your location
location /example {

    # invite lua to the party
    access_by_lua_block {

        -- set request parameters
        local request = {
            v = 1,
            t = "pageview",

            -- don't forget to put your own property here
            tid = "UA-XXXXXXX-Y",

            -- a "unique" user id based on a hash of ip and user agent; not too
            -- reliable, but possibly the best one can reasonably do without
            -- cookies (the `or ""` guard keeps ngx.md5 from erroring when the
            -- User-Agent header is absent)
            cid = ngx.md5(ngx.var.remote_addr .. (ngx.var.http_user_agent or "")),
            uip = ngx.var.remote_addr,
            dp = ngx.var.request_uri,
            dr = ngx.var.http_referer,
            ua = ngx.var.http_user_agent,

            -- truncate the language string to make it compatible with the
            -- javascript format - either the first two characters like here
            -- (e.g. en) or the first five (e.g. en_US) with ... 1, 5
            ul = string.sub(ngx.var.http_accept_language or "", 1, 2)
        }

        -- use ngx.location.capture to send everything to an internal proxy location
        local res = ngx.location.capture("/gamp", {
            method = ngx.HTTP_POST,
            body = ngx.encode_args(request)
        })
    }
}

# make a separate location block to proxy the request away
location = /gamp {
    internal;
    expires epoch;
    access_log off;
    proxy_pass_request_headers off;
    proxy_pass_request_body on;
    proxy_pass https://google-analytics.com/collect;
}
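
As a testing aid (not part of the original configuration): the Measurement Protocol has a validation endpoint that reports whether a hit is well-formed, and proxy_pass can be pointed at it temporarily while wiring this up:

# temporary, for debugging only: validates hits instead of recording them
proxy_pass https://www.google-analytics.com/debug/collect;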





share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53975285%2fcan-a-http-request-be-sent-with-the-nginx-location-directive%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    3





    +100









    (as pointed out in the comments), ngx_http_lua_module can do it!



    location / {
    access_by_lua_block {
    os.execute("/usr/bin/curl --data 'v=1&t=pageview&tid=UA-XXXXXXXX-X&cid=123&dp=hit' https://google-analytics.com/collect >/dev/null 2>/dev/null")
    }
    }


    note that the execution halts the pageload until curl has finished. to run curl in the background and continue the pageload immediately, add a space and an & to the end so it looks like



    >/dev/null 2>/dev/null &")





    share|improve this answer
























    • Yep, it's working! Had to install openresty and no more http/2 support for now - hopefully they'll release a version based on nginx >1.13.9 soon... Any way to pass existing headers as parameters into that?

      – Lucian Davidescu
      Jan 1 at 16:40






    • 1





      @LucianDavidescu I honestly can't believe this solution is even considered to be acceptable even for "testing" purposes, let alone any sort of production environment. Spawning a new process in the background, whilst immediately returning back to the client, makes it trivial for a single client using a single TCP connection to completely bring down your whole machine in a matter of seconds, yes, your whole machine, through a trivial exhaustion and overload of the process table. The solution proposed in this answer is hardly different from what would be a forkbomb!

      – cnst
      Jan 4 at 15:53








    • 1





      @LucianDavidescu, it could be anything. What if DNS is down, or Google decides to throttle you, or IPv6 gets configured, but doesn't work? Each curl instance would persist for 3+ minutes, with more coming each request. You wouldn't be able to login into a system with shell, because process table is exhausted. Your best bet would be if you're using shared hosting, and/or fork is just slow (and they really are), and are limited to 20 to 100 forks a second, which takes 2/3rd of your CPU power, slowing down the rest of your site. I don't think you fully realise just how expensive forks are.

      – cnst
      Jan 4 at 22:42






    • 1





      @LucianDavidescu seems you can get an array of request headers (sent by the browser) by running local headers, err = ngx.resp.get_headers();, and get an array of response headers (sent by nginx) by using local headers, err = ngx.req.get_headers() - but you should probably use log_by_lua_block instead of access_by_lua_block

      – hanshenrik
      Jan 6 at 8:21






    • 1





      Found this - github.com/vorodevops/nginx-analytics-measurement-protocol/tree/… it uses proxy_pass, works quite nice so far.

      – Lucian Davidescu
      Jan 7 at 12:35
















    3





    +100









    (as pointed out in the comments), ngx_http_lua_module can do it!



    location / {
    access_by_lua_block {
    os.execute("/usr/bin/curl --data 'v=1&t=pageview&tid=UA-XXXXXXXX-X&cid=123&dp=hit' https://google-analytics.com/collect >/dev/null 2>/dev/null")
    }
    }


    note that the execution halts the pageload until curl has finished. to run curl in the background and continue the pageload immediately, add a space and an & to the end so it looks like



    >/dev/null 2>/dev/null &")





    share|improve this answer
























    • Yep, it's working! Had to install openresty and no more http/2 support for now - hopefully they'll release a version based on nginx >1.13.9 soon... Any way to pass existing headers as parameters into that?

      – Lucian Davidescu
      Jan 1 at 16:40






    • 1





      @LucianDavidescu I honestly can't believe this solution is even considered to be acceptable even for "testing" purposes, let alone any sort of production environment. Spawning a new process in the background, whilst immediately returning back to the client, makes it trivial for a single client using a single TCP connection to completely bring down your whole machine in a matter of seconds, yes, your whole machine, through a trivial exhaustion and overload of the process table. The solution proposed in this answer is hardly different from what would be a forkbomb!

      – cnst
      Jan 4 at 15:53








    • 1





      @LucianDavidescu, it could be anything. What if DNS is down, or Google decides to throttle you, or IPv6 gets configured, but doesn't work? Each curl instance would persist for 3+ minutes, with more coming each request. You wouldn't be able to login into a system with shell, because process table is exhausted. Your best bet would be if you're using shared hosting, and/or fork is just slow (and they really are), and are limited to 20 to 100 forks a second, which takes 2/3rd of your CPU power, slowing down the rest of your site. I don't think you fully realise just how expensive forks are.

      – cnst
      Jan 4 at 22:42






    • 1





      @LucianDavidescu seems you can get an array of request headers (sent by the browser) by running local headers, err = ngx.resp.get_headers();, and get an array of response headers (sent by nginx) by using local headers, err = ngx.req.get_headers() - but you should probably use log_by_lua_block instead of access_by_lua_block

      – hanshenrik
      Jan 6 at 8:21






    • 1





      Found this - github.com/vorodevops/nginx-analytics-measurement-protocol/tree/… it uses proxy_pass, works quite nice so far.

      – Lucian Davidescu
      Jan 7 at 12:35














    3





    +100







    3





    +100



    3




    +100





    (as pointed out in the comments), ngx_http_lua_module can do it!



    location / {
    access_by_lua_block {
    os.execute("/usr/bin/curl --data 'v=1&t=pageview&tid=UA-XXXXXXXX-X&cid=123&dp=hit' https://google-analytics.com/collect >/dev/null 2>/dev/null")
    }
    }


    note that the execution halts the pageload until curl has finished. to run curl in the background and continue the pageload immediately, add a space and an & to the end so it looks like



    >/dev/null 2>/dev/null &")





    share|improve this answer













    (as pointed out in the comments), ngx_http_lua_module can do it!



    location / {
    access_by_lua_block {
    os.execute("/usr/bin/curl --data 'v=1&t=pageview&tid=UA-XXXXXXXX-X&cid=123&dp=hit' https://google-analytics.com/collect >/dev/null 2>/dev/null")
    }
    }


    note that the execution halts the pageload until curl has finished. to run curl in the background and continue the pageload immediately, add a space and an & to the end so it looks like



    >/dev/null 2>/dev/null &")






    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Jan 1 at 9:38









    hanshenrikhanshenrik

    10.4k21840




    10.4k21840













    • Yep, it's working! Had to install openresty and no more http/2 support for now - hopefully they'll release a version based on nginx >1.13.9 soon... Any way to pass existing headers as parameters into that?

      – Lucian Davidescu
      Jan 1 at 16:40






    • 1





      @LucianDavidescu I honestly can't believe this solution is even considered to be acceptable even for "testing" purposes, let alone any sort of production environment. Spawning a new process in the background, whilst immediately returning back to the client, makes it trivial for a single client using a single TCP connection to completely bring down your whole machine in a matter of seconds, yes, your whole machine, through a trivial exhaustion and overload of the process table. The solution proposed in this answer is hardly different from what would be a forkbomb!

      – cnst
      Jan 4 at 15:53








    • 1





      @LucianDavidescu, it could be anything. What if DNS is down, or Google decides to throttle you, or IPv6 gets configured, but doesn't work? Each curl instance would persist for 3+ minutes, with more coming each request. You wouldn't be able to login into a system with shell, because process table is exhausted. Your best bet would be if you're using shared hosting, and/or fork is just slow (and they really are), and are limited to 20 to 100 forks a second, which takes 2/3rd of your CPU power, slowing down the rest of your site. I don't think you fully realise just how expensive forks are.

      – cnst
      Jan 4 at 22:42






    • 1





      @LucianDavidescu seems you can get an array of request headers (sent by the browser) by running local headers, err = ngx.resp.get_headers();, and get an array of response headers (sent by nginx) by using local headers, err = ngx.req.get_headers() - but you should probably use log_by_lua_block instead of access_by_lua_block

      – hanshenrik
      Jan 6 at 8:21






    • 1





      Found this - github.com/vorodevops/nginx-analytics-measurement-protocol/tree/… it uses proxy_pass, works quite nice so far.

      – Lucian Davidescu
      Jan 7 at 12:35



















    • Yep, it's working! Had to install openresty and no more http/2 support for now - hopefully they'll release a version based on nginx >1.13.9 soon... Any way to pass existing headers as parameters into that?

      – Lucian Davidescu
      Jan 1 at 16:40






    • 1





      @LucianDavidescu I honestly can't believe this solution is even considered to be acceptable even for "testing" purposes, let alone any sort of production environment. Spawning a new process in the background, whilst immediately returning back to the client, makes it trivial for a single client using a single TCP connection to completely bring down your whole machine in a matter of seconds, yes, your whole machine, through a trivial exhaustion and overload of the process table. The solution proposed in this answer is hardly different from what would be a forkbomb!

      – cnst
      Jan 4 at 15:53








    • 1





      @LucianDavidescu, it could be anything. What if DNS is down, or Google decides to throttle you, or IPv6 gets configured, but doesn't work? Each curl instance would persist for 3+ minutes, with more coming each request. You wouldn't be able to login into a system with shell, because process table is exhausted. Your best bet would be if you're using shared hosting, and/or fork is just slow (and they really are), and are limited to 20 to 100 forks a second, which takes 2/3rd of your CPU power, slowing down the rest of your site. I don't think you fully realise just how expensive forks are.

      – cnst
      Jan 4 at 22:42






    • 1





      @LucianDavidescu seems you can get an array of request headers (sent by the browser) by running local headers, err = ngx.resp.get_headers();, and get an array of response headers (sent by nginx) by using local headers, err = ngx.req.get_headers() - but you should probably use log_by_lua_block instead of access_by_lua_block

      – hanshenrik
      Jan 6 at 8:21






    • 1





      Found this - github.com/vorodevops/nginx-analytics-measurement-protocol/tree/… it uses proxy_pass, works quite nice so far.

      – Lucian Davidescu
      Jan 7 at 12:35

















    Yep, it's working! Had to install openresty and no more http/2 support for now - hopefully they'll release a version based on nginx >1.13.9 soon... Any way to pass existing headers as parameters into that?

    – Lucian Davidescu
    Jan 1 at 16:40





    Yep, it's working! Had to install openresty and no more http/2 support for now - hopefully they'll release a version based on nginx >1.13.9 soon... Any way to pass existing headers as parameters into that?

    – Lucian Davidescu
    Jan 1 at 16:40




    1




    1





    @LucianDavidescu I honestly can't believe this solution is even considered to be acceptable even for "testing" purposes, let alone any sort of production environment. Spawning a new process in the background, whilst immediately returning back to the client, makes it trivial for a single client using a single TCP connection to completely bring down your whole machine in a matter of seconds, yes, your whole machine, through a trivial exhaustion and overload of the process table. The solution proposed in this answer is hardly different from what would be a forkbomb!

    – cnst
    Jan 4 at 15:53







    @LucianDavidescu I honestly can't believe this solution is even considered to be acceptable even for "testing" purposes, let alone any sort of production environment. Spawning a new process in the background, whilst immediately returning back to the client, makes it trivial for a single client using a single TCP connection to completely bring down your whole machine in a matter of seconds, yes, your whole machine, through a trivial exhaustion and overload of the process table. The solution proposed in this answer is hardly different from what would be a forkbomb!

    – cnst
    Jan 4 at 15:53






    1




    1





    @LucianDavidescu, it could be anything. What if DNS is down, or Google decides to throttle you, or IPv6 gets configured, but doesn't work? Each curl instance would persist for 3+ minutes, with more coming each request. You wouldn't be able to login into a system with shell, because process table is exhausted. Your best bet would be if you're using shared hosting, and/or fork is just slow (and they really are), and are limited to 20 to 100 forks a second, which takes 2/3rd of your CPU power, slowing down the rest of your site. I don't think you fully realise just how expensive forks are.

    – cnst
    Jan 4 at 22:42





    @LucianDavidescu, it could be anything. What if DNS is down, or Google decides to throttle you, or IPv6 gets configured, but doesn't work? Each curl instance would persist for 3+ minutes, with more coming each request. You wouldn't be able to login into a system with shell, because process table is exhausted. Your best bet would be if you're using shared hosting, and/or fork is just slow (and they really are), and are limited to 20 to 100 forks a second, which takes 2/3rd of your CPU power, slowing down the rest of your site. I don't think you fully realise just how expensive forks are.

    – cnst
    Jan 4 at 22:42




    1




    1





    @LucianDavidescu seems you can get an array of request headers (sent by the browser) by running local headers, err = ngx.resp.get_headers();, and get an array of response headers (sent by nginx) by using local headers, err = ngx.req.get_headers() - but you should probably use log_by_lua_block instead of access_by_lua_block

    – hanshenrik
    Jan 6 at 8:21





    @LucianDavidescu seems you can get an array of request headers (sent by the browser) by running local headers, err = ngx.resp.get_headers();, and get an array of response headers (sent by nginx) by using local headers, err = ngx.req.get_headers() - but you should probably use log_by_lua_block instead of access_by_lua_block

    – hanshenrik
    Jan 6 at 8:21




    1




    1





    Found this - github.com/vorodevops/nginx-analytics-measurement-protocol/tree/… it uses proxy_pass, works quite nice so far.

    – Lucian Davidescu
    Jan 7 at 12:35





    Found this - github.com/vorodevops/nginx-analytics-measurement-protocol/tree/… it uses proxy_pass, works quite nice so far.

    – Lucian Davidescu
    Jan 7 at 12:35













    3














    What you're trying to do — execute a new curl instance for Google Analytics on each URL request on your server — is a wrong approach to the problem:




    1. Nginx itself is easily capable of servicing 10k+ concurrent connections at any given time as a lower limit, i.e., as a minimum, if you do things right, see https://en.wikipedia.org/wiki/C10k_problem.


    2. On the other hand, the performance of fork, the underlying system call that creates a new process, which would be necessary if you want to run curl for each request, is very slow, on the order 1k forks per second as an upper limit, e.g., if you do things right, that's the fastest it'll ever go, see Faster forking of large processes on Linux?.





    What's the best alternative solution with better architecture?




    • My recommendation would be to perform this through batch processing. You're not really gaining anything by doing Google Analytics in real time, and a 5 minute delay in statistics should be more than adequate. You could write a simple script in a programming language of your choice to look through relevant http://nginx.org/r/access_log, collect the data for the required time period, and make a single batch request (and/or multiple individual requests from within a single process) to Google Analytics with the requisite information about each visitor in the last 5 minutes. You can run this as a daemon process, or as a script from a cron job, see crontab(5) and crontab(1).



    • Alternatively, if you still want real-time processing for Google Analytics (which I don't recommend, because most of these services themselves are implemented on an eventual consistency basis, meaning, GA itself wouldn't necessarily guarantee accurate real-time statistics for the last XX seconds/minutes/hours/etc), then you might want to implement a daemon of some sort to handle statistics in real time:




      • My suggestion would still be to utilise access_log in such daemon, for example, through a tail -f /var/www/logs/access_log equivalent in your favourite programming language, where you'd be opening the access_log file as a stream, and processing data as it comes and when it comes.


      • Alternatively, you could implement this daemon to have an HTTP request interface itself, and duplicate each incoming request to both your actual backend, as well as this extra server.
        You could multiplex this through nginx with the help of the not-built-by-default auth_request or add_after_body to make a "free" subrequest for each request. This subrequest would go to your server, for example, written in Go. The server would have at least two goroutines: one would process incoming requests into a queue (implemented through a buffered string channel), immediately issuing a reply to the client, to make sure to not delay nginx upstream; another one would receive the requests from the first one through the chan string from the first, processing them as it goes and sending appropriate requests to Google Analytics.





    Ultimately, whichever way you'd go, you'd probably still want to implement some level of batching and/or throttling, because I'd imagine at one point, Google Analytics itself would likely have throttling if you keep sending it requests from the same IP address on a very excessive basis without any sort of a batch implementation at stake. As per What is the rate limit for direct use of the Google Analytics Measurement Protocol API? as well as https://developers.google.com/analytics/devguides/collection/protocol/v1/limits-quotas, it would appear that most libraries implement internal limits to how many requests per second they'd be sending to Google.






    share|improve this answer
























    • Indeed, that seems to be the scalable long-term solution. However, on the one hand the simultaneous connections and rate-limits are quite high for most use cases anyway (unless there are also performance issues) while on the other hand i think that a "quick and dirty" approach may come handy useful at least for testing purposes.

      – Lucian Davidescu
      Jan 4 at 7:57






    • 1





      btw i wrote some code to parse nginx access logs in PHP, see line 58 here github.com/divinity76/http_log_parser/blob/master/… (but that code is from 2015 and unmaintained, idk if there's been any changes since 2015)

      – hanshenrik
      Jan 4 at 8:03


















    3














    What you're trying to do — execute a new curl instance for Google Analytics on each URL request on your server — is a wrong approach to the problem:




    1. Nginx itself is easily capable of servicing 10k+ concurrent connections at any given time as a lower limit, i.e., as a minimum, if you do things right, see https://en.wikipedia.org/wiki/C10k_problem.


    2. On the other hand, the performance of fork, the underlying system call that creates a new process, which would be necessary if you want to run curl for each request, is very slow, on the order 1k forks per second as an upper limit, e.g., if you do things right, that's the fastest it'll ever go, see Faster forking of large processes on Linux?.





    What's the best alternative solution with better architecture?




    • My recommendation would be to perform this through batch processing. You're not really gaining anything by doing Google Analytics in real time, and a 5 minute delay in statistics should be more than adequate. You could write a simple script in a programming language of your choice to look through relevant http://nginx.org/r/access_log, collect the data for the required time period, and make a single batch request (and/or multiple individual requests from within a single process) to Google Analytics with the requisite information about each visitor in the last 5 minutes. You can run this as a daemon process, or as a script from a cron job, see crontab(5) and crontab(1).



    • Alternatively, if you still want real-time processing for Google Analytics (which I don't recommend, because most of these services themselves are implemented on an eventual consistency basis, meaning, GA itself wouldn't necessarily guarantee accurate real-time statistics for the last XX seconds/minutes/hours/etc), then you might want to implement a daemon of some sort to handle statistics in real time:




      • My suggestion would still be to utilise access_log in such daemon, for example, through a tail -f /var/www/logs/access_log equivalent in your favourite programming language, where you'd be opening the access_log file as a stream, and processing data as it comes and when it comes.


      • Alternatively, you could implement this daemon to have an HTTP request interface itself, and duplicate each incoming request to both your actual backend, as well as this extra server.
        You could multiplex this through nginx with the help of the not-built-by-default auth_request or add_after_body to make a "free" subrequest for each request. This subrequest would go to your server, for example, written in Go. The server would have at least two goroutines: one would process incoming requests into a queue (implemented through a buffered string channel), immediately issuing a reply to the client, to make sure to not delay nginx upstream; another one would receive the requests from the first one through the chan string from the first, processing them as it goes and sending appropriate requests to Google Analytics.





    Ultimately, whichever way you'd go, you'd probably still want to implement some level of batching and/or throttling, because I'd imagine at one point, Google Analytics itself would likely have throttling if you keep sending it requests from the same IP address on a very excessive basis without any sort of a batch implementation at stake. As per What is the rate limit for direct use of the Google Analytics Measurement Protocol API? as well as https://developers.google.com/analytics/devguides/collection/protocol/v1/limits-quotas, it would appear that most libraries implement internal limits to how many requests per second they'd be sending to Google.






    share|improve this answer
























    • Indeed, that seems to be the scalable long-term solution. However, on the one hand the simultaneous connections and rate-limits are quite high for most use cases anyway (unless there are also performance issues) while on the other hand i think that a "quick and dirty" approach may come handy useful at least for testing purposes.

      – Lucian Davidescu
      Jan 4 at 7:57






    • 1





      btw i wrote some code to parse nginx access logs in PHP, see line 58 here github.com/divinity76/http_log_parser/blob/master/… (but that code is from 2015 and unmaintained, idk if there's been any changes since 2015)

      – hanshenrik
      Jan 4 at 8:03
















    3












    3








    3







    What you're trying to do — execute a new curl instance for Google Analytics on each URL request on your server — is a wrong approach to the problem:




    1. Nginx itself is easily capable of servicing 10k+ concurrent connections at any given time as a lower limit, i.e., as a minimum, if you do things right, see https://en.wikipedia.org/wiki/C10k_problem.


    2. On the other hand, the performance of fork, the underlying system call that creates a new process, which would be necessary if you want to run curl for each request, is very slow, on the order 1k forks per second as an upper limit, e.g., if you do things right, that's the fastest it'll ever go, see Faster forking of large processes on Linux?.





    What's the best alternative solution with better architecture?




    • My recommendation would be to perform this through batch processing. You're not really gaining anything by doing Google Analytics in real time, and a 5 minute delay in statistics should be more than adequate. You could write a simple script in a programming language of your choice to look through relevant http://nginx.org/r/access_log, collect the data for the required time period, and make a single batch request (and/or multiple individual requests from within a single process) to Google Analytics with the requisite information about each visitor in the last 5 minutes. You can run this as a daemon process, or as a script from a cron job, see crontab(5) and crontab(1).



    • Alternatively, if you still want real-time processing for Google Analytics (which I don't recommend, because most of these services themselves are implemented on an eventual consistency basis, meaning, GA itself wouldn't necessarily guarantee accurate real-time statistics for the last XX seconds/minutes/hours/etc), then you might want to implement a daemon of some sort to handle statistics in real time:




      • My suggestion would still be to utilise access_log in such daemon, for example, through a tail -f /var/www/logs/access_log equivalent in your favourite programming language, where you'd be opening the access_log file as a stream, and processing data as it comes and when it comes.


      • Alternatively, you could implement this daemon to have an HTTP request interface itself, and duplicate each incoming request to both your actual backend, as well as this extra server.
        You could multiplex this through nginx with the help of the not-built-by-default auth_request or add_after_body to make a "free" subrequest for each request. This subrequest would go to your server, for example, written in Go. The server would have at least two goroutines: one would process incoming requests into a queue (implemented through a buffered string channel), immediately issuing a reply to the client, to make sure to not delay nginx upstream; another one would receive the requests from the first one through the chan string from the first, processing them as it goes and sending appropriate requests to Google Analytics.





    Ultimately, whichever way you'd go, you'd probably still want to implement some level of batching and/or throttling, because I'd imagine at one point, Google Analytics itself would likely have throttling if you keep sending it requests from the same IP address on a very excessive basis without any sort of a batch implementation at stake. As per What is the rate limit for direct use of the Google Analytics Measurement Protocol API? as well as https://developers.google.com/analytics/devguides/collection/protocol/v1/limits-quotas, it would appear that most libraries implement internal limits to how many requests per second they'd be sending to Google.






    share|improve this answer













    What you're trying to do — execute a new curl instance for Google Analytics on each URL request on your server — is a wrong approach to the problem:




    1. Nginx itself is easily capable of servicing 10k+ concurrent connections at any given time as a lower limit, i.e., as a minimum, if you do things right, see https://en.wikipedia.org/wiki/C10k_problem.


    2. On the other hand, the performance of fork, the underlying system call that creates a new process, which would be necessary if you want to run curl for each request, is very slow, on the order 1k forks per second as an upper limit, e.g., if you do things right, that's the fastest it'll ever go, see Faster forking of large processes on Linux?.





    What's the best alternative solution with better architecture?




    • My recommendation would be to perform this through batch processing. You're not really gaining anything by doing Google Analytics in real time, and a 5 minute delay in statistics should be more than adequate. You could write a simple script in a programming language of your choice to look through relevant http://nginx.org/r/access_log, collect the data for the required time period, and make a single batch request (and/or multiple individual requests from within a single process) to Google Analytics with the requisite information about each visitor in the last 5 minutes. You can run this as a daemon process, or as a script from a cron job, see crontab(5) and crontab(1).



    • Alternatively, if you still want real-time processing for Google Analytics (which I don't recommend, because most of these services themselves are implemented on an eventual consistency basis, meaning, GA itself wouldn't necessarily guarantee accurate real-time statistics for the last XX seconds/minutes/hours/etc), then you might want to implement a daemon of some sort to handle statistics in real time:




      • My suggestion would still be to utilise access_log in such daemon, for example, through a tail -f /var/www/logs/access_log equivalent in your favourite programming language, where you'd be opening the access_log file as a stream, and processing data as it comes and when it comes.


      • Alternatively, you could implement this daemon to have an HTTP request interface itself, and duplicate each incoming request to both your actual backend, as well as this extra server.
        You could multiplex this through nginx with the help of the not-built-by-default auth_request or add_after_body to make a "free" subrequest for each request. This subrequest would go to your server, for example, written in Go. The server would have at least two goroutines: one would process incoming requests into a queue (implemented through a buffered string channel), immediately issuing a reply to the client, to make sure to not delay nginx upstream; another one would receive the requests from the first one through the chan string from the first, processing them as it goes and sending appropriate requests to Google Analytics.





    Ultimately, whichever way you'd go, you'd probably still want to implement some level of batching and/or throttling, because I'd imagine at one point, Google Analytics itself would likely have throttling if you keep sending it requests from the same IP address on a very excessive basis without any sort of a batch implementation at stake. As per What is the rate limit for direct use of the Google Analytics Measurement Protocol API? as well as https://developers.google.com/analytics/devguides/collection/protocol/v1/limits-quotas, it would appear that most libraries implement internal limits to how many requests per second they'd be sending to Google.







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Jan 3 at 19:48









    cnstcnst

    14.3k25184




    14.3k25184













    • Indeed, that seems to be the scalable long-term solution. However, on the one hand the simultaneous connections and rate-limits are quite high for most use cases anyway (unless there are also performance issues) while on the other hand i think that a "quick and dirty" approach may come handy useful at least for testing purposes.

      – Lucian Davidescu
      Jan 4 at 7:57






    • 1





      btw i wrote some code to parse nginx access logs in PHP, see line 58 here github.com/divinity76/http_log_parser/blob/master/… (but that code is from 2015 and unmaintained, idk if there's been any changes since 2015)

      – hanshenrik
      Jan 4 at 8:03





















    • Indeed, that seems to be the scalable long-term solution. However, on the one hand the simultaneous connections and rate-limits are quite high for most use cases anyway (unless there are also performance issues) while on the other hand i think that a "quick and dirty" approach may come handy useful at least for testing purposes.

      – Lucian Davidescu
      Jan 4 at 7:57






    • 1





      btw i wrote some code to parse nginx access logs in PHP, see line 58 here github.com/divinity76/http_log_parser/blob/master/… (but that code is from 2015 and unmaintained, idk if there's been any changes since 2015)

      – hanshenrik
      Jan 4 at 8:03



















    Indeed, that seems to be the scalable long-term solution. However, on the one hand the simultaneous connections and rate-limits are quite high for most use cases anyway (unless there are also performance issues) while on the other hand i think that a "quick and dirty" approach may come handy useful at least for testing purposes.

    – Lucian Davidescu
    Jan 4 at 7:57





    Indeed, that seems to be the scalable long-term solution. However, on the one hand the simultaneous connections and rate-limits are quite high for most use cases anyway (unless there are also performance issues) while on the other hand i think that a "quick and dirty" approach may come handy useful at least for testing purposes.

    – Lucian Davidescu
    Jan 4 at 7:57




    1




    1





    btw i wrote some code to parse nginx access logs in PHP, see line 58 here github.com/divinity76/http_log_parser/blob/master/… (but that code is from 2015 and unmaintained, idk if there's been any changes since 2015)

    – hanshenrik
    Jan 4 at 8:03







    btw i wrote some code to parse nginx access logs in PHP, see line 58 here github.com/divinity76/http_log_parser/blob/master/… (but that code is from 2015 and unmaintained, idk if there's been any changes since 2015)

    – hanshenrik
    Jan 4 at 8:03













    2














    If everything you need is to submit a hit to Google Analytics, then it can be accomplished easier: Nginx can modify page HTML on the fly, embedding GA code before the closing </body> tag:



    sub_filter_once on;

    sub_filter '</body>' "<script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-XXXXXXXX-X', 'auto');
    ga('send', 'pageview');
    </script></body>";

    location / {
    }


    This Nginx module is called sub.






    share|improve this answer
























    • what happens to non-html files that happen to contain the phrase </body> ? for example XML files?

      – hanshenrik
      Jan 7 at 17:15






    • 1





      @hanshenrik it's configurable. The default configuration is to replace in the files with text/html MIME type and it's possible to permit in others.

      – Alexander Azarov
      Jan 7 at 17:47











    • It's not that I don't have access to the site to put the javascript there in the first place if that's what I wanted to do...

      – Lucian Davidescu
      Jan 7 at 18:14
















    2














    If everything you need is to submit a hit to Google Analytics, then it can be accomplished easier: Nginx can modify page HTML on the fly, embedding GA code before the closing </body> tag:



    sub_filter_once on;

    sub_filter '</body>' "<script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-XXXXXXXX-X', 'auto');
    ga('send', 'pageview');
    </script></body>";

    location / {
    }


    This Nginx module is called sub.






    share|improve this answer
























    • what happens to non-html files that happen to contain the phrase </body> ? for example XML files?

      – hanshenrik
      Jan 7 at 17:15






    • 1





      @hanshenrik it's configurable. The default configuration is to replace in the files with text/html MIME type and it's possible to permit in others.

      – Alexander Azarov
      Jan 7 at 17:47











    • It's not that I don't have access to the site to put the javascript there in the first place if that's what I wanted to do...

      – Lucian Davidescu
      Jan 7 at 18:14














    2












    2








    2







    If everything you need is to submit a hit to Google Analytics, then it can be accomplished easier: Nginx can modify page HTML on the fly, embedding GA code before the closing </body> tag:



    sub_filter_once on;

    sub_filter '</body>' "<script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-XXXXXXXX-X', 'auto');
    ga('send', 'pageview');
    </script></body>";

    location / {
    }


    This Nginx module is called sub.






    share|improve this answer













    If everything you need is to submit a hit to Google Analytics, then it can be accomplished easier: Nginx can modify page HTML on the fly, embedding GA code before the closing </body> tag:



    sub_filter_once on;

    sub_filter '</body>' "<script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-XXXXXXXX-X', 'auto');
    ga('send', 'pageview');
    </script></body>";

    location / {
    }


    This Nginx module is called sub.







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Jan 5 at 14:20









    Alexander AzarovAlexander Azarov

    10.4k23945




    10.4k23945













    • what happens to non-html files that happen to contain the phrase </body>? for example XML files?

      – hanshenrik
      Jan 7 at 17:15

    • 1

      @hanshenrik it's configurable. The default configuration is to replace only in files with the text/html MIME type, and it's possible to permit other types.

      – Alexander Azarov
      Jan 7 at 17:47

    • It's not that I don't have access to the site to put the javascript there in the first place, if that's what I wanted to do...

      – Lucian Davidescu
      Jan 7 at 18:14
    1














    Here's how I did it eventually - proxy_pass instead of curl - based on this: https://github.com/vorodevops/nginx-analytics-measurement-protocol/tree/master/lua. The code assumes OpenResty or plain Nginx with the Lua module installed. Comments inside the access_by_lua_block use Lua syntax (--), while the surrounding Nginx config uses #.



    # pick your location
    location /example {

        # invite lua to the party
        access_by_lua_block {

            -- set request parameters
            local request = {
                v = "1",
                t = "pageview",

                -- don't forget to put your own property here
                tid = "UA-XXXXXXX-Y",

                -- a "unique" user id based on a hash of ip and user agent: not too
                -- reliable, but possibly the best one can reasonably do without
                -- cookies (the `or ""` guards against a missing User-Agent header)
                cid = ngx.md5(ngx.var.remote_addr .. (ngx.var.http_user_agent or "")),
                uip = ngx.var.remote_addr,
                dp = ngx.var.request_uri,
                dr = ngx.var.http_referer,
                ua = ngx.var.http_user_agent,

                -- truncate the language string to make it compatible with the
                -- javascript format: either the first two characters like here
                -- (e.g. en) or the first five (e.g. en_US) with ..., 1, 5; the
                -- `or "en"` guards against a missing Accept-Language header, where
                -- string.sub(nil, ...) would raise an error
                ul = string.sub(ngx.var.http_accept_language or "en", 1, 2)
            }

            -- use ngx.location.capture to hand everything to an internal proxy location
            local res = ngx.location.capture("/gamp", {
                method = ngx.HTTP_POST,
                body = ngx.encode_args(request)
            })
        }
    }

    # a separate location block to proxy the request away
    location = /gamp {
        internal;
        expires epoch;
        access_log off;
        proxy_pass_request_headers off;
        proxy_pass_request_body on;
        proxy_pass https://google-analytics.com/collect;
    }
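
    Not part of the original answer, but a quick way to sanity-check the payload before wiring it into Nginx is to post the same parameters with curl against Google's Measurement Protocol validation endpoint (/debug/collect), which reports whether the hit would be accepted without actually recording it:

    # hypothetical smoke test; substitute your own tid and parameters
    curl --data 'v=1&t=pageview&tid=UA-XXXXXXX-Y&cid=123&dp=%2Fexample' \
        https://www.google-analytics.com/debug/collect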





    answered Jan 8 at 9:07

    Lucian Davidescu