Should you guard against unexpected values from external APIs?












50















Let's say you are coding a function that takes input from an external API, MyAPI.



That external API MyAPI has a contract that states it will return a string or a number.



Is it recommended to guard against things like null, undefined, boolean, etc., even though they are not part of MyAPI's contract? In particular, since you have no control over that API, you cannot enforce the guarantee through something like static type analysis, so is it better to be safe than sorry?



I'm thinking in relation to the Robustness Principle.






























  • 16





    What are the impacts of not handling those unexpected values if they are returned? Can you live with these impacts? Is it worth the complexity to handle those unexpected values to prevent having to deal with the impacts?

    – Vincent Savard
    Jan 14 at 18:13








  • 55





    If you're expecting them, then by definition they're not unexpected.

    – Mason Wheeler
    Jan 14 at 19:33






  • 28





    Remember the API isn't obligated to only give you valid JSON back (I'm assuming this is JSON). You could also get a reply like <!doctype html><html><head><title>504 Gateway Timeout</title></head><body>The server was unable to process your request. Make sure you have typed the address correctly. If the problem persists, please try again later.</body></html>

    – immibis
    Jan 14 at 22:18






  • 5





    What does "external API" mean? Is it still under your control?

    – Deduplicator
    Jan 14 at 22:22






  • 10





    "A good programmer is someone who looks both ways before crossing a one-way street."

    – jeroen_de_schutter
    Jan 16 at 9:17
















design api api-design web-services functions














edited Jan 16 at 3:49 by jpmc26

asked Jan 14 at 18:12 by Adam Thompson


















9 Answers


















102














You should never trust the inputs to your software, regardless of source. Validating types is important, but so is validating input ranges and business logic. Per a comment, this is well described by OWASP.



Failing to do so will at best leave you with garbage data that you have to later clean up, but at worst you'll leave an opportunity for malicious exploits if that upstream service gets compromised in some fashion (q.v. the Target hack). The range of problems in between includes getting your application in an unrecoverable state.





From the comments I can see that perhaps my answer could use a bit of expansion.



By "never trust the inputs", I simply mean that you can't assume that you'll always receive valid and trustworthy information from upstream or downstream systems, and therefore you should always sanitize that input to the best of your ability, or reject it.



One argument surfaced in the comments I'll address by way of example. While yes, you have to trust your OS to some degree, it's not unreasonable to, for example, reject the results of a random number generator if you ask it for a number between 1 and 10 and it responds with "bob".



Similarly, in the case of the OP, you should definitely ensure your application is only accepting valid input from the upstream service. What you do when it's not OK is up to you, and depends a great deal on the actual business function that you're trying to accomplish, but minimally you'd log it for later debugging and otherwise ensure that your application doesn't go into an unrecoverable or insecure state.



While you can never know every possible input someone/something might give you, you certainly can limit what's allowable based on the business requirements and do some form of input whitelisting based on that.
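As a rough illustration of the whitelisting idea, the sketch below accepts only what the question's contract promises (a string or a number) and rejects everything else. The function name `parseMyApiValue` and the 1,000-character limit are assumptions made up for this example; they are not part of any real MyAPI.

```typescript
// Assumed business limit on string length; not part of any real MyAPI contract.
const MAX_ACCEPTED_LENGTH = 1000;

// Hypothetical guard: accept only what the contract promises (a string or a
// finite number) and reject everything else, including null, undefined,
// booleans, objects, and NaN.
function parseMyApiValue(raw: unknown): string | number {
  if (typeof raw === "string") {
    if (raw.length > MAX_ACCEPTED_LENGTH) {
      throw new Error("MyAPI returned a string longer than the accepted maximum");
    }
    return raw;
  }
  if (typeof raw === "number" && isFinite(raw)) {
    return raw;
  }
  // What to do here is a business decision; throwing keeps bad data out.
  const kind = raw === null ? "null" : typeof raw;
  throw new Error(`MyAPI returned an unexpected value of type ${kind}`);
}
```

Whether to throw, log, or substitute a default at the rejection point depends on the business function; the point of the sketch is only that the set of accepted values is stated explicitly.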



























  • 20





    What does q.v. stand for?

    – JonH
    Jan 14 at 20:54






  • 15





    @JonH basically "see also"... the Target hack is an example that he is referencing en.oxforddictionaries.com/definition/q.v.

    – andrewtweber
    Jan 14 at 21:43






  • 8





    This answer, as it stands, just doesn't make sense. It's infeasible to anticipate each and every way a third-party library might misbehave. If a library function's documentation explicitly assures that the result will always have some properties, then you should be able to rely on the designers having ensured that these properties actually hold. It's their responsibility to have a test suite that checks this kind of thing, and to submit a bug fix if a situation is encountered where it doesn't. Checking these properties in your own code violates the DRY principle.

    – leftaroundabout
    Jan 14 at 23:58








  • 23





    @leftaroundabout no, but you should be able to predict all valid things your application can accept and reject the rest.

    – Paul
    Jan 15 at 2:15






  • 10





    @leftaroundabout It's not about distrusting everything, it's about distrusting external untrusted sources. This is all about threat modelling. If you haven't done that your software isn't secure (how can it be, if you never even thought about against what kind of actors and threats you want to secure your application?). For a run of the mill business software it's a reasonable default to assume that callers could be malicious, while it's rarely sensible to assume your OS is a threat.

    – Voo
    Jan 15 at 9:07



















32














Yes, of course. But what makes you think the answer could be different?



You surely don't want your program to behave unpredictably when the API does not return what the contract says, do you? So at the very least you have to deal with such behaviour somehow. A minimal form of error handling is always worth the (very minimal!) effort, and there is absolutely no excuse for not implementing something like this.



However, how much effort you should invest to deal with such a case is heavily case dependent and can only be answered in context of your system. Often, a short log entry and letting the application end gracefully can be enough. Sometimes, you will be better off to implement some detailed exception handling, dealing with different forms of "wrong" return values, and maybe have to implement some fallback strategy.
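One possible shape for the "short log entry plus fallback" end of that spectrum is sketched below. It also covers the case where the response body is not valid JSON at all. The names `parseBody` and `valueOrFallback`, and the `log` callback, are hypothetical stand-ins for whatever the application actually uses.

```typescript
// Result of trying to interpret a raw response body from the (hypothetical) MyAPI.
type ParseResult =
  | { kind: "ok"; value: string | number }
  | { kind: "error"; reason: string };

// Parse a raw body defensively: the body may not be JSON at all
// (for example, an HTML error page from a gateway).
function parseBody(body: string): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch (e) {
    return { kind: "error", reason: "body is not valid JSON" };
  }
  if (typeof parsed === "string" || typeof parsed === "number") {
    return { kind: "ok", value: parsed };
  }
  return { kind: "error", reason: "value is neither a string nor a number" };
}

// Minimal handling: record the problem and continue with a fallback value.
// The `log` callback stands in for whatever logging the application has.
function valueOrFallback(
  body: string,
  fallback: string | number,
  log: (message: string) => void
): string | number {
  const result = parseBody(body);
  if (result.kind === "ok") {
    return result.value;
  }
  log(`MyAPI contract violation: ${result.reason}`);
  return fallback;
}
```

An HTML error page and a JSON `null` both end up in the fallback branch with a log entry. A system where a wrong value is dangerous would replace the fallback with a more deliberate strategy.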



But it makes a hell of a difference if you are writing just some inhouse spreadsheet formatting application, to be used by less than 10 people and where the financial impact of an application crash is quite low, or if you are creating a new autonomous car driving system, where an application crash may cost lives.



So there is no shortcut around reflecting on what you are doing; using your common sense is always mandatory.
































  • What to do is another decision. You may have a fail over solution. Anything asynchronous could be retried before creating an exception log (or dead letter). An active alert to the vendor or provider may be an option if the issue persists.

    – mckenzm
    Jan 15 at 2:20













  • @mckenzm: the fact the OP asks a question where the literal answer can obviously be only "yes" is IMHO a sign they may not just be interested in a literal answer. It looks they are asking "is it necessary to guard against different forms of unexpected values from an API and deal with them differently"?

    – Doc Brown
    Jan 15 at 6:39








  • 1





    hmm, the crap/carp/die approach. Is it our fault for passing bad (but legal) requests? is the response possible, but not usable for us in particular? or is the response corrupt? Different scenarios, Now it does sound like homework.

    – mckenzm
    Jan 15 at 21:56



















19














The Robustness Principle--specifically, the "be liberal in what you accept" half of it--is a very bad idea in software. It was originally developed in the context of hardware, where physical constraints make engineering tolerances very important, but in software, when someone sends you malformed or otherwise improper input, you have two choices. You can either reject it, (preferably with an explanation as to what went wrong,) or you can try to figure out what it was supposed to mean.




EDIT: Turns out I was mistaken in the above statement. The Robustness Principle doesn't come from the world of hardware, but from Internet architecture, specifically RFC 1958. It states:




3.9 Be strict when sending and tolerant when receiving. Implementations must follow specifications precisely when sending to the network, and tolerate faulty input from the network. When in doubt, discard faulty input silently, without returning an error message unless this is required by the specification.




This is, plainly speaking, simply wrong from start to finish. It is difficult to conceive of a more wrongheaded notion of error handling than "discard faulty input silently without returning an error message," for the reasons given in this post.



See also the IETF paper The Harmful Consequences of the Robustness Principle for further elaboration on this point.




Never, never, never choose that second option unless you have resources equivalent to Google's Search team to throw at your project, because that's what it takes to come up with a computer program that does anything close to a decent job at that particular problem domain. (And even then, Google's suggestions feel like they're coming straight out of left field about half the time.) If you try to do so, what you'll end up with is a massive headache where your program will frequently try to interpret bad input as X, when what the sender really meant was Y.



This is bad for two reasons. The obvious one is because then you have bad data in your system. The less obvious one is that in many cases, neither you nor the sender will realize that anything went wrong until much later down the road when something blows up in your face, and then suddenly you have a big, expensive mess to fix and no idea what went wrong because the noticeable effect is so far removed from the root cause.



This is why the Fail Fast principle exists; save everyone involved the headache by applying it to your APIs.
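To make the contrast concrete, here is a small sketch of the two options for a field that the contract says is a number. Both function names are invented for this example.

```typescript
// Option 2 from the answer: try to guess what the sender meant.
// Anything that is not already a number is coerced, so bad input silently
// turns into plausible-looking data.
function guessQuantity(raw: unknown): number {
  const n = Number(raw);
  return isNaN(n) ? 0 : n;
}

// Option 1: fail fast. Reject anything outside the contract, with an explanation.
function requireQuantity(raw: unknown): number {
  if (typeof raw !== "number" || !isFinite(raw)) {
    const kind = raw === null ? "null" : typeof raw;
    throw new Error(`expected a finite number, got a value of type ${kind}`);
  }
  return raw;
}
```

With the guessing version, `true` becomes 1 and both `null` and `"bob"` become 0, so the contract violation is invisible downstream. The fail-fast version raises the error at the boundary, close to the root cause.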



























  • 7





    While I agree with the principle of what you're saying, I think you're mistaken WRT the intent of the Robustness Principle. I've never seen it intended to mean, "accept bad data", only, "don't be excessively fiddly about good data". For example, if the input is a CSV file, the Robustness Principle wouldn't be a valid argument for trying to parse out dates in an unexpected format, but would support an argument that inferring column order from a header row would be a good idea.

    – Morgen
    Jan 14 at 21:22






  • 9





    @Morgen: The robustness principle was used to suggest that browsers should accept rather sloppy HTML, and led to deployed web sites being much sloppier than they would have been if browsers had demanded proper HTML. A big part of the problem there, though, was the use of a common format for human-generated and machine-generated content, as opposed to the use of separate human-editable and machine-parsable formats along with utilities to convert between them.

    – supercat
    Jan 14 at 21:33






  • 9





    @supercat: nevertheless - or just hence - HTML and the WWW was extremely successful ;-)

    – Doc Brown
    Jan 14 at 21:43








  • 11





    @DocBrown: A lot of really horrible things have become standards simply because they were the first approach that happened to be available when someone with a lot of clout needed to adopt something that met certain minimal criteria, and by the time they gained traction it was too late to select something better.

    – supercat
    Jan 14 at 22:05






  • 5





    @supercat Exactly. JavaScript immediately comes to mind, for example...

    – Mason Wheeler
    Jan 14 at 22:11



















13














In general, code should be constructed to uphold at least the following constraints whenever practical:




  1. When given correct input, produce correct output.


  2. When given valid input (that may or may not be correct), produce valid output (likewise).


  3. When given invalid input, process it without any side-effects beyond those caused by normal input or those which are defined as signalling an error.
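A minimal sketch of the three constraints, using an invented `Ledger` class that keeps a running total. The class and its method names are assumptions for illustration only.

```typescript
// Hypothetical running total that upholds the three constraints above.
class Ledger {
  private total = 0;
  private rejectedCount = 0;

  // Returns true when the amount was applied, false when it was rejected.
  add(amount: unknown): boolean {
    if (typeof amount !== "number" || !isFinite(amount)) {
      // Constraint 3: the only side effects of invalid input are the defined
      // error signals (the counter and the false return value).
      this.rejectedCount += 1;
      return false;
    }
    // Constraints 1 and 2: valid input changes state in the normal way.
    this.total += amount;
    return true;
  }

  getTotal(): number {
    return this.total;
  }

  getRejectedCount(): number {
    return this.rejectedCount;
  }
}
```

The check here is deliberately cheap: it does not decide whether an amount is correct, only whether it is safe to add.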



In many situations, programs will essentially pass through various chunks of data without particularly caring about whether they are valid. If such chunks happen to contain invalid data, the program's output would likely contain invalid data as a consequence. Unless a program is specifically designed to validate all data, and guarantee that it will not produce invalid output even when given invalid input, programs that process its output should allow for the possibility of invalid data within it.



While validating data early on is often desirable, it's not always particularly practical. Among other things, if the validity of one chunk of data depends upon the contents of other chunks, and if the majority of the data fed into some sequence of steps will get filtered out along the way, limiting validation to data which makes it through all stages may yield much better performance than trying to validate everything.



Further, even if a program is only expected to be given pre-validated data, it's often good to have it uphold the above constraints anyway whenever practical. Repeating full validation at every processing step would often be a major performance drain, but the limited amount of validation needed to uphold the above constraints may be much cheaper.






























  • Then it all comes down to deciding whether the result of an API call is an "input".

    – mastov
    Jan 16 at 17:43











  • @mastov: The answers to many questions will depend upon how one defines "inputs" and "observable behaviors"/"outputs". If a program's purpose is to process numbers stored in a file, its input could be defined as the sequence of numbers (in which case things that aren't numbers aren't possible inputs), or as a file (in which case anything that could appear in a file would be a possible input).

    – supercat
    Jan 16 at 18:01



















3














Let's compare the two scenarios and try to come to a conclusion.



Scenario 1
Our application assumes the external API will behave as per the agreement.



Scenario 2
Our application assumes the external API can misbehave, hence add precautions.



In general, any API or software may violate its agreements, whether because of a bug or because of unexpected conditions. The API might even be having issues in its internal systems that lead to unexpected results.

If our program is written on the assumption that the external API will adhere to the agreement, and we add no precautions, who will be the party facing the issues? It will be us, the ones who have written the integration code.

Take the null values you mentioned as an example. Say the API agreement states that the response will contain non-null values; if that is suddenly violated, our program will fail with NPEs.



So, I believe it will be better to make sure your application has some additional code to address unexpected scenarios.





































    1














    You should always validate incoming data -- user-entered or otherwise -- so you should have a process in place to handle when the data retrieved from this external API is invalid.



    Generally speaking, any seam where extra-organizational systems meet should require authentication, authorization (if not defined simply by authentication), and validation.





































      1














      In general, yes, you must always guard against flawed inputs, but depending on the kind of API, "guard" means different things.



      For an external API to a server, you do not want to accidentally create a command that crashes or compromises the state of the server, so you must guard against that.



      For an API like e.g. a container class (list, vector, etc), throwing exceptions is a perfectly fine outcome, compromising the state of the class instance may be acceptable to some extent (e.g. a sorted container provided with a faulty comparison operator will not be sorted), even crashing the application may be acceptable, but compromising the state of the application - e.g. writing to random memory locations unrelated to the class instance - is most likely not.





































        0














        To give a slightly differing opinion:
        I think it can be acceptable to just work with the data you are given, even if it violates its contract. This depends on the usage: is it something that MUST be a string for you, or is it something you are just displaying or not using at all? In the latter case, simply accept it.
        I have an API which needs just 1% of the data delivered by another API. I could not care less what kind of data is in the other 99%, so I will never check it.



        There has to be a balance between "having errors because I do not check my inputs enough" and "I reject valid data because I am too strict".
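A sketch of that trade-off, assuming a hypothetical weather payload in which only one field matters to the caller. The function name and the field name `currentTemperature` are invented for this example.

```typescript
// Reads only the field this code needs from a larger payload.
// Everything else in the payload is deliberately left unchecked.
function readCurrentTemperature(payload: unknown): number {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("payload is not an object");
  }
  const temperature = (payload as Record<string, unknown>)["currentTemperature"];
  if (typeof temperature !== "number" || !isFinite(temperature)) {
    throw new Error("currentTemperature is missing or not a finite number");
  }
  return temperature;
}
```

A payload whose other fields violate the contract still yields a result here, while a bad `currentTemperature` is rejected, because only that field can trip up the calling code.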

























        • 2





          "I have an API which just needs 1% of the data delivered by another api." This then opens up the question why your API expects a 100 times more data than it actually needs. If you need to store opaque data to pass on, you don't really have to be specific as to what it is and don't have to declare it in any specific format, in which case the caller wouldn't be violating your contract.

          – Voo
          Jan 15 at 19:21






        • 1





          @Voo - My suspicion is that they are calling some external API (like "get weather details for city X") and then cherry-picking the data they need ("current temperature") and ignoring the rest of the returned data ("rainfall", "wind", "forecast temperature", "wind chill", etc...)

          – Stobor
          Jan 16 at 2:24













        • @ChristianSauer - I think you are not that far from what the wider consensus is - the 1% of the data that you use makes sense to check, but the 99% which you don't does not necessarily need to be checked. You only need to check the things which could trip your code up.

          – Stobor
          Jan 16 at 2:25



















        0














        My take on this is to always, always check each and every input to my system. That means every parameter returned from an API should be checked, even if my program does not use it. I also tend to check every parameter I send to an API for correctness. There are only two exceptions to this rule; see below.



        The reason for testing is that if for some reason the API / input is incorrect, my program cannot rely on anything. Maybe my program was linked to an old version of the API that does something different from what I believe? Maybe my program stumbled on a bug in the external program that has never happened before. Or even worse, one that happens all the time but no one cares! Maybe the external program is being fooled by a hacker into returning stuff that can hurt my program or the system?



        The two exceptions to testing everything in my world are:





        1. Performance after careful measurement of performance:




          • never optimize before you have measured. Testing all input / returned data most often takes very little time compared to the actual call, so removing it often saves little or nothing. I would still keep the error detection code but disable it, perhaps with a macro or by simply commenting it out.




        2. When you have no clue what to do with an error




          • there are times, not often, when your design simply does not allow handling of the kind of error you would find. Maybe what you ought to do is log an error, but there is no error logging in the system. It is almost always possible to find some way to "remember" the error, allowing at least you as a developer to check for it later. Error counters are a good thing to have in a system, even if you elect not to have logging.
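An error counter of the kind mentioned in the second exception could be as small as the sketch below. The class name `ErrorCounters` and the category strings are assumptions for illustration.

```typescript
// Counts validation failures per category so they can be inspected later,
// even in a system that has no logging.
class ErrorCounters {
  // A prototype-less object, so category names such as "constructor"
  // cannot collide with inherited properties.
  private counts: { [category: string]: number } = Object.create(null);

  increment(category: string): void {
    this.counts[category] = (this.counts[category] || 0) + 1;
  }

  get(category: string): number {
    return this.counts[category] || 0;
  }
}
```

A validation routine would call `increment` with a short category name each time it rejects a value, and a developer could read the counts from a debugger or a diagnostics endpoint.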




        Exactly how carefully to check inputs / return values is an important question. As an example, if the API is said to return a string, I would check that:


        • the data type actually is a string


        • the length is between min and max values. Always check strings against the max size that my program can expect to handle (returning overly large strings is a classic security problem in networked systems).


        • the string contains no "illegal" characters or content, when that is relevant. If your program might later send the string to, say, a database, it is a good idea to check for database attacks (search for SQL injection). These tests are best done at the borders of my system, where I can pinpoint where the attack came from and fail early. Doing a full SQL injection test might be difficult when strings are later combined, so that test should be done before calling the database, but if you can find some problems early it can be useful.
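The three checks in the list above can be sketched roughly as follows. The limits and the set of allowed characters are assumptions picked for the example; real values depend on what the program can handle.

```typescript
// Assumed limits; the real values depend on what the program can handle.
const MIN_STRING_LENGTH = 1;
const MAX_STRING_LENGTH = 2000;
// Assumed whitelist: letters, digits, space, and a few punctuation marks.
const ALLOWED_CHARACTERS = /^[A-Za-z0-9 .,_-]*$/;

// Hypothetical boundary check for a value the API says is a string.
function checkApiString(raw: unknown): string {
  if (typeof raw !== "string") {
    throw new Error("value is not a string");
  }
  if (raw.length < MIN_STRING_LENGTH || raw.length > MAX_STRING_LENGTH) {
    throw new Error("string length is out of range");
  }
  if (!ALLOWED_CHARACTERS.test(raw)) {
    throw new Error("string contains characters outside the allowed set");
  }
  return raw;
}
```

A character whitelist like this only catches some problems early. It is not a substitute for parameterized queries at the point where the database is actually called.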



        The reason for testing parameters I send to the API is to be sure that I get a correct result back. Again, doing these tests before calling an API might seem unnecessary, but it costs very little performance and may catch errors in my program. Hence the tests are most valuable when developing a system (but nowadays every system seems to be in continuous development). Depending on the parameters, the tests can be more or less thorough, but I tend to find that you can set allowable min and max values on most parameters that my program could create. Perhaps a string should always have at least 2 characters and be at most 2000 characters long? The min and max should be inside what the API allows, as I know that my program will never use the full range of some parameters.





























          Your Answer








          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "131"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          9 Answers

          102














          You should never trust the inputs to your software, regardless of source. Validating the types is important, but so is validating the ranges of input and the business logic. Per a comment, this is well described by OWASP.

          Failing to do so will at best leave you with garbage data that you have to clean up later, and at worst leave an opportunity for malicious exploits if that upstream service gets compromised in some fashion (q.v. the Target hack). The range of problems in between includes getting your application into an unrecoverable state.





          From the comments I can see that perhaps my answer could use a bit of expansion.



          By "never trust the inputs", I simply mean that you can't assume that you'll always receive valid and trustworthy information from upstream or downstream systems, and therefore you should always sanitize that input to the best of your ability, or reject it.



          One argument surfaced in the comments I'll address by way of example. While yes, you have to trust your OS to some degree, it's not unreasonable to, for example, reject the results of a random number generator if you ask it for a number between 1 and 10 and it responds with "bob".



          Similarly, in the case of the OP, you should definitely ensure your application is only accepting valid input from the upstream service. What you do when it's not OK is up to you, and depends a great deal on the actual business function that you're trying to accomplish, but minimally you'd log it for later debugging and otherwise ensure that your application doesn't go into an unrecoverable or insecure state.



          While you can never know every possible input someone/something might give you, you certainly can limit what's allowable based on the business requirements and do some form of input whitelisting based on that.
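As a rough illustration of that kind of whitelisting, here is a minimal sketch for the OP's "string or number" contract. The names `accept_api_value` and `UnexpectedApiValue`, the logger name, and the 256-character limit are all assumptions made up for this example; the real limits would come from your own business requirements.

```python
import logging
import math

logger = logging.getLogger("myapi_client")  # hypothetical logger name


class UnexpectedApiValue(ValueError):
    """Raised when the upstream value falls outside what we accept."""


def accept_api_value(value, *, max_length=256):
    """Return value unchanged if it is an allowed string or number, else raise."""
    # bool is a subclass of int in Python, so it must be rejected explicitly.
    if isinstance(value, bool):
        problem = "boolean"
    elif isinstance(value, str):
        if len(value) <= max_length:
            return value
        problem = f"string longer than {max_length} characters"
    elif isinstance(value, int):
        return value
    elif isinstance(value, float):
        # NaN and infinities are floats too, but rarely valid business data.
        if math.isfinite(value):
            return value
        problem = "non-finite number"
    else:
        problem = f"type {type(value).__name__}"
    # Log for later debugging, then refuse to continue with the bad value.
    logger.warning("Rejected value from MyAPI: %s", problem)
    raise UnexpectedApiValue(problem)
```

Everything not explicitly allowed is rejected, so you never have to enumerate the ways the upstream service might misbehave.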






          edited Jan 17 at 21:33

























          answered Jan 14 at 18:27









          Paul








          • 20





            What does q.v. stand for?

            – JonH
            Jan 14 at 20:54






          • 15





            @JonH basically "see also"... the Target hack is an example that he is referencing en.oxforddictionaries.com/definition/q.v.

            – andrewtweber
            Jan 14 at 21:43






          • 8





            This answer as it stands just doesn't make sense. It's infeasible to anticipate each and every way a third-party library might misbehave. If a library function's documentation explicitly assures that the result will always have some properties, then you should be able to rely on the designers having ensured that these properties actually hold. It's their responsibility to have a test suite that checks this kind of thing, and to submit a bug fix if a situation is encountered where it doesn't. Checking these properties in your own code violates the DRY principle.

            – leftaroundabout
            Jan 14 at 23:58








          • 23





            @leftaroundabout no, but you should be able to predict all valid things your application can accept and reject the rest.

            – Paul
            Jan 15 at 2:15






          • 10





            @leftaroundabout It's not about distrusting everything, it's about distrusting external untrusted sources. This is all about threat modelling. If you haven't done that, your software isn't secure (how can it be, if you never even thought about which actors and threats you want to secure your application against?). For run-of-the-mill business software it's a reasonable default to assume that callers could be malicious, while it's rarely sensible to assume your OS is a threat.

            – Voo
            Jan 15 at 9:07



























          32














          Yes, of course. But what makes you think the answer could be different?



          You surely don't want to let your program behave in some unpredictable manner in case the API does not return what the contract says, do you? So at the very least you have to deal with such behaviour somehow. A minimal form of error handling is always worth the (very minimal!) effort, and there is absolutely no excuse for not implementing something like this.

          However, how much effort you should invest in dealing with such a case depends heavily on the situation and can only be answered in the context of your system. Often, a short log entry and letting the application end gracefully can be enough. Sometimes, you will be better off implementing some detailed exception handling, dealing with different forms of "wrong" return values, and maybe implementing some fallback strategy.

          But it makes a hell of a difference whether you are writing just some in-house spreadsheet formatting application, to be used by fewer than 10 people and where the financial impact of an application crash is quite low, or whether you are creating a new autonomous car driving system, where an application crash may cost lives.

          So there is no shortcut around reflecting on what you are doing; using your common sense is always mandatory.
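To make the two ends of that spectrum concrete, here is a small sketch, not a definitive implementation. `read_value`, `ContractViolation` and the `fetch` callable standing in for the call to MyAPI are hypothetical names invented for this example. Without a fallback it logs and stops in a controlled way; with a fallback it logs and carries on with a known-safe default.

```python
import logging

logger = logging.getLogger("myapi_client")  # hypothetical logger name

_NO_FALLBACK = object()  # sentinel meaning "no fallback configured"


class ContractViolation(RuntimeError):
    """The API returned something its contract does not allow."""


def read_value(fetch, fallback=_NO_FALLBACK):
    """Call fetch() and return its result if it is a string or a number."""
    value = fetch()
    # bool is a subclass of int in Python, so exclude it from "number".
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if isinstance(value, str) or is_number:
        return value
    # Minimal handling: always leave a trace for later debugging.
    logger.error("MyAPI broke its contract: got %r", value)
    if fallback is _NO_FALLBACK:
        # Low-stakes tool: fail in a controlled way instead of limping on.
        raise ContractViolation(
            f"unexpected value of type {type(value).__name__}"
        )
    # Higher-stakes system: continue with a known-safe default.
    return fallback
```

Which branch is appropriate is exactly the judgment call described above: `read_value(call_api)` may be fine for the spreadsheet tool, while a system that must keep running would pass an explicit `fallback`.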






          edited Jan 14 at 18:45

























          answered Jan 14 at 18:39









          Doc Brown













          • What to do is another decision. You may have a failover solution. Anything asynchronous could be retried before creating an exception log (or dead letter). An active alert to the vendor or provider may be an option if the issue persists.

            – mckenzm
            Jan 15 at 2:20













          • @mckenzm: the fact that the OP asks a question where the literal answer can obviously only be "yes" is IMHO a sign they may not just be interested in a literal answer. It looks like they are asking "is it necessary to guard against different forms of unexpected values from an API and deal with them differently"?

            – Doc Brown
            Jan 15 at 6:39








          • 1





            Hmm, the crap/carp/die approach. Is it our fault for passing bad (but legal) requests? Is the response possible, but not usable for us in particular? Or is the response corrupt? Different scenarios. Now it does sound like homework.

            – mckenzm
            Jan 15 at 21:56






























          19














          The Robustness Principle, specifically the "be liberal in what you accept" half of it, is a very bad idea in software. It was originally developed in the context of hardware, where physical constraints make engineering tolerances very important, but in software, when someone sends you malformed or otherwise improper input, you have two choices. You can either reject it (preferably with an explanation of what went wrong), or you can try to figure out what it was supposed to mean.




          EDIT: Turns out I was mistaken in the above statement. The Robustness Principle doesn't come from the world of hardware, but from Internet architecture, specifically RFC 1958. It states:




          3.9 Be strict when sending and tolerant when receiving. Implementations must follow specifications precisely when sending to the network, and tolerate faulty input from the network. When in doubt, discard faulty input silently, without returning an error message unless this is required by the specification.




          This is, plainly speaking, simply wrong from start to finish. It is difficult to conceive of a more wrongheaded notion of error handling than "discard faulty input silently without returning an error message," for the reasons given in this post.



          See also the IETF paper The Harmful Consequences of the Robustness Principle for further elaboration on this point.




          Never, never, never choose that second option (trying to guess what the sender meant) unless you have resources equivalent to Google's Search team to throw at your project, because that's what it takes to come up with a computer program that does anything close to a decent job in that particular problem domain. (And even then, Google's suggestions feel like they're coming straight out of left field about half the time.) If you try to do so, what you'll end up with is a massive headache where your program will frequently interpret bad input as X, when what the sender really meant was Y.



          This is bad for two reasons. The obvious one is that you then have bad data in your system. The less obvious one is that in many cases, neither you nor the sender will realize that anything went wrong until much later, when something blows up in your face. Then you suddenly have a big, expensive mess to fix and no idea what went wrong, because the noticeable effect is so far removed from the root cause.



          This is why the Fail Fast principle exists; save everyone involved the headache by applying it to your APIs.
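As a minimal sketch of failing fast at the boundary, assuming the contract from the question (MyAPI returns a string or a number): the function name `parseMyApiValue` and the decision to reject `NaN` and infinite numbers are illustrative assumptions, not part of any real API.

```typescript
// Hypothetical type for the value MyAPI promises to return.
type MyApiValue = string | number;

// Hypothetical boundary check: accept only what the contract allows and
// reject everything else immediately, with an explanation of what went wrong.
function parseMyApiValue(raw: unknown): MyApiValue {
  if (typeof raw === "string") {
    return raw;
  }
  // Assumption: NaN and +/-Infinity are treated as contract violations,
  // even though their typeof is "number".
  if (typeof raw === "number" && isFinite(raw)) {
    return raw;
  }
  // typeof null is "object", so name it explicitly in the message.
  const kind = raw === null ? "null" : typeof raw;
  throw new TypeError(
    `MyAPI contract violated: expected string or number, got ${kind}`
  );
}
```

Code past this point can rely on the value being a `string | number`; nothing downstream tries to guess what a `null` or a boolean was supposed to mean.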



























          • 7





            While I agree with the principle of what you're saying, I think you're mistaken WRT the intent of the Robustness Principle. I've never seen it intended to mean, "accept bad data", only, "don't be excessively fiddly about good data". For example, if the input is a CSV file, the Robustness Principle wouldn't be a valid argument for trying to parse out dates in an unexpected format, but would support an argument that inferring column order from a header row would be a good idea.

            – Morgen
            Jan 14 at 21:22






          • 9





            @Morgen: The robustness principle was used to suggest that browsers should accept rather sloppy HTML, and led to deployed web sites being much sloppier than they would have been if browsers had demanded proper HTML. A big part of the problem there, though, was the use of a common format for human-generated and machine-generated content, as opposed to the use of separate human-editable and machine-parsable formats along with utilities to convert between them.

            – supercat
            Jan 14 at 21:33






          • 9





            @supercat: nevertheless - or just hence - HTML and the WWW were extremely successful ;-)

            – Doc Brown
            Jan 14 at 21:43








          • 11





            @DocBrown: A lot of really horrible things have become standards simply because they were the first approach that happened to be available when someone with a lot of clout needed to adopt something that met certain minimal criteria, and by the time they gained traction it was too late to select something better.

            – supercat
            Jan 14 at 22:05






          • 5





            @supercat Exactly. JavaScript immediately comes to mind, for example...

            – Mason Wheeler
            Jan 14 at 22:11
























          edited Jan 15 at 20:29

























          answered Jan 14 at 19:42









          Mason Wheeler




















          13














          In general, code should be constructed to uphold at least the following constraints whenever practical:




          1. When given correct input, produce correct output.


          2. When given valid input (that may or may not be correct), produce valid output (likewise).


          3. When given invalid input, process it without any side-effects beyond those caused by normal input or those which are defined as signalling an error.



          In many situations, programs will essentially pass through various chunks of data without particularly caring about whether they are valid. If such chunks happen to contain invalid data, the program's output would likely contain invalid data as a consequence. Unless a program is specifically designed to validate all data, and guarantee that it will not produce invalid output even when given invalid input, programs that process its output should allow for the possibility of invalid data within it.



          While validating data early on is often desirable, it's not always particularly practical. Among other things, if the validity of one chunk of data depends upon the contents of other chunks, and if the majority of the data fed into some sequence of steps will get filtered out along the way, limiting validation to data which makes it through all stages may yield much better performance than trying to validate everything.



          Further, even if a program is only expected to be given pre-validated data, it's often good to have it uphold the above constraints anyway whenever practical. Repeating full validation at every processing step would often be a major performance drain, but the limited amount of validation needed to uphold the above constraints may be much cheaper.
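The third constraint can be illustrated with a small sketch. Everything here is hypothetical (the names `addReading` and `sumReadings`, and the idea that the step sums numeric readings); it only shows a cheap check that keeps invalid input from having any effect other than a defined error signal.

```typescript
// Outcome of one processing step: either a new total, or a defined error signal.
type StepResult =
  | { kind: "ok"; total: number }
  | { kind: "rejected"; reason: string };

// Hypothetical step: add one reading to a running total.
// The check is deliberately limited (a typeof test), not full validation.
function addReading(total: number, raw: unknown): StepResult {
  if (typeof raw !== "number" || !isFinite(raw)) {
    return { kind: "rejected", reason: `not a finite number: ${String(raw)}` };
  }
  return { kind: "ok", total: total + raw };
}

// Invalid readings never touch the total; they only show up in `rejected`.
function sumReadings(readings: unknown[]): { total: number; rejected: string[] } {
  let total = 0;
  const rejected: string[] = [];
  for (const raw of readings) {
    const result = addReading(total, raw);
    if (result.kind === "ok") {
      total = result.total;
    } else {
      rejected.push(result.reason);
    }
  }
  return { total, rejected };
}
```

For example, `sumReadings([1, 2, "3", null, 4])` leaves the string and the `null` out of the sum and reports both in `rejected`, so the caller can decide what to do with them.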






























          • Then it all comes down to deciding whether the result of an API call is an "input".

            – mastov
            Jan 16 at 17:43











          • @mastov: The answers to many questions will depend upon how one defines "inputs" and "observable behaviors"/"outputs". If a program's purpose is to process numbers stored in a file, its input could be defined as the sequence of numbers (in which case things that aren't numbers aren't possible inputs), or as a file (in which case anything that could appear in a file would be a possible input).

            – supercat
            Jan 16 at 18:01


























          answered Jan 14 at 21:54









          supercat

























          3














          Let's compare the two scenarios and try to come to a conclusion.



          Scenario 1
          Our application assumes the external API will behave as per the agreement.



          Scenario 2
          Our application assumes the external API can misbehave, and hence adds precautions.



          In general, any API or piece of software may violate its agreements, whether because of a bug or because of unexpected conditions. An API may also have problems in its internal systems that lead to unexpected results.



          If our program is written on the assumption that the external API will adhere to its agreements, and so takes no precautions, who will be the party facing the issues? It will be us, the ones who wrote the integration code.



          Take, for example, the null values that you mentioned. Say the API agreement states that the response will contain only non-null values; if that is suddenly violated, our program will fail with null pointer exceptions (NPEs).



          So, I believe it will be better to make sure your application has some additional code to address unexpected scenarios.
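A minimal sketch of such a precaution in Python, assuming the contract promises a string or a number. `parse_my_api_value` and `ContractViolation` are hypothetical names, not part of any real MyAPI client.

```python
from typing import Union


class ContractViolation(ValueError):
    """Raised when the external API returns something outside its contract."""


def parse_my_api_value(value: object) -> Union[str, int, float]:
    """Return the value unchanged if it is a string or a number, else fail loudly."""
    # bool is a subclass of int in Python, so it has to be rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, (str, int, float)):
        raise ContractViolation(
            f"expected str or number, got {type(value).__name__}: {value!r}"
        )
    return value


for candidate in ("abc", 42, None, True):
    try:
        print("ok:", parse_my_api_value(candidate))
    except ContractViolation as exc:
        print("rejected:", exc)
```

Failing at the boundary with a dedicated exception type keeps the problem close to its source, instead of letting a stray `None` surface much later in unrelated code.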
















              answered Jan 14 at 18:24









              lkamal























                  1














                  You should always validate incoming data -- user-entered or otherwise -- so you should have a process in place to handle the case where the data retrieved from this external API is invalid.



                  Generally speaking, any seam where extra-organizational systems meet should require authentication, authorization (if not defined simply by authentication), and validation.
















                      answered Jan 14 at 18:35









                      StarTrekRedneck























                          1














                          In general, yes, you must always guard against flawed inputs, but depending on the kind of API, "guard" means different things.



                          For an external API to a server, you do not want to accidentally create a command that crashes or compromises the state of the server, so you must guard against that.



                          For an API such as a container class (list, vector, etc.), throwing exceptions is a perfectly fine outcome. Compromising the state of the class instance may be acceptable to some extent (e.g. a sorted container given a faulty comparison operator will not be sorted), and even crashing the application may be acceptable. Compromising the state of the application, e.g. by writing to random memory locations unrelated to the class instance, is most likely not.
















                              answered Jan 16 at 11:24









                              Peter























                                  0














                                  To give a slightly differing opinion:
                                  I think it can be acceptable to just work with the data you are given, even if it violates its contract. This depends on the usage: is it something that MUST be a string for you, or is it something you are just displaying or do not use at all? In the latter case, simply accept it.
                                  I have an API which just needs 1% of the data delivered by another API. I could not care less what kind of data is in the other 99%, so I will never check it.



                                  There has to be a balance between "having errors because I do not check my inputs enough" and "I reject valid data because I am too strict".
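One way to strike that balance is to validate only the fields you actually read and let the rest pass through unchecked. The sketch below assumes a hypothetical weather-style response; the field name `current_temperature` and the function `extract_temperature` are invented for illustration.

```python
def extract_temperature(response: dict) -> float:
    """Check only the one field this program depends on; ignore everything else."""
    value = response.get("current_temperature")
    # Reject missing values, booleans, and anything that is not a number.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"current_temperature is not a number: {value!r}")
    return float(value)


response = {
    "current_temperature": 21.5,
    "rainfall": None,          # violates the imagined contract, but is never used
    "forecast": "<garbage>",   # likewise ignored
}
print(extract_temperature(response))
```

The unused fields can be as malformed as they like without affecting the caller, while a bad value in the one field that matters is still caught.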

























                                  • 2





                                    "I have an API which just needs 1% of the data delivered by another api." This then opens up the question why your API expects a 100 times more data than it actually needs. If you need to store opaque data to pass on, you don't really have to be specific as to what it is and don't have to declare it in any specific format, in which case the caller wouldn't be violating your contract.

                                    – Voo
                                    Jan 15 at 19:21






                                  • 1





                                    @Voo - My suspicion is that they are calling some external API (like "get weather details for city X") and then cherry-picking the data they need ("current temperature") and ignoring the rest of the returned data ("rainfall", "wind", "forecast temperature", "wind chill", etc...)

                                    – Stobor
                                    Jan 16 at 2:24













                                  • @ChristianSauer - I think you are not that far from what the wider consensus is - the 1% of the data that you use makes sense to check, but the 99% which you don't does not necessarily need to be checked. You only need to check the things which could trip your code up.

                                    – Stobor
                                    Jan 16 at 2:25


























                                  answered Jan 15 at 12:32









                                  Christian Sauer



















                                  0














                                  My take on this is to always, always check each and every input to my system. That means every parameter returned from an API should be checked, even if my program does not use it. I also tend to check every parameter I send to an API for correctness. There are only two exceptions to this rule; see below.



                                  The reason for testing is that if for some reason the API / input is incorrect, my program cannot rely on anything. Maybe my program was linked to an old version of the API that does something different from what I believe? Maybe my program stumbled on a bug in the external program that has never happened before. Or even worse, one that happens all the time but no one cares! Maybe the external program is being fooled by a hacker into returning stuff that can hurt my program or the system?



                                  The two exceptions to testing everything in my world are:





                                  1. Performance, but only after careful measurement:




                                    • Never optimize before you have measured. Testing all input / returned data most often takes very little time compared to the actual call, so removing it often saves little or nothing. I would still keep the error detection code but disable it, perhaps via a macro or simply by commenting it out.




                                  2. When you have no clue what to do with an error




                                    • There are times, not often, when your design simply does not allow handling of the kind of error you would find. Maybe what you ought to do is log an error, but there is no error logging in the system. It is almost always possible to find some way to "remember" the error so that you as a developer can at least check for it later. Error counters are a good thing to have in a system, even if you elect not to have logging.
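Where no logging exists, a simple in-memory counter is one way to "remember" such errors. This is only a sketch; `error_counts`, `record_error` and `safe_length` are hypothetical names.

```python
from collections import Counter

# Counts how often each kind of problem has been seen since startup.
error_counts: Counter = Counter()


def record_error(kind: str) -> None:
    """Remember that an error of the given kind occurred, without logging it."""
    error_counts[kind] += 1


def safe_length(value: object) -> int:
    """Return the length of a string, or 0 (and count the error) otherwise."""
    if not isinstance(value, str):
        record_error("not_a_string")
        return 0
    return len(value)


safe_length("ok")
safe_length(None)
safe_length(12)
print(error_counts["not_a_string"])
```

A developer can inspect the counter later (in a debugger, a status page, or a periodic dump) to see whether the supposedly impossible case has ever occurred.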




                                  Exactly how carefully to check inputs / return values is an important question. As an example, if the API is said to return a string, I would check that:




                                  • the data type actually is a string


                                  • the length is between min and max values. Always check strings against the maximum size that my program can expect to handle (returning overly large strings is a classic security problem in networked systems).


                                  • Some strings should be checked for "illegal" characters or content when that is relevant. If your program might later send the string to, say, a database, it is a good idea to check for database attacks (search for SQL injection). These tests are best done at the borders of my system, where I can pinpoint where the attack came from and fail early. Doing a full SQL injection test might be difficult when strings are later combined, so that test should be done before calling the database, but finding some problems early can still be useful.
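The checks listed above might look like the following in Python. The limits `MIN_LEN` and `MAX_LEN` and the allowed-character pattern are arbitrary assumptions chosen for illustration; real limits depend on what your program can handle.

```python
import re

MIN_LEN = 1       # assumed lower bound
MAX_LEN = 2000    # assumed upper bound on what this program will handle
ALLOWED = re.compile(r"[A-Za-z0-9 _.,-]*")  # assumed set of "legal" characters


def check_api_string(value: object) -> str:
    """Validate a value that the API is documented to return as a string."""
    if not isinstance(value, str):
        raise TypeError(f"expected str, got {type(value).__name__}")
    if not MIN_LEN <= len(value) <= MAX_LEN:
        raise ValueError(f"length {len(value)} outside [{MIN_LEN}, {MAX_LEN}]")
    if ALLOWED.fullmatch(value) is None:
        raise ValueError("string contains characters outside the allowed set")
    return value


print(check_api_string("Berlin, Germany"))
```

The order of the checks matters: the type check comes first so that the length and character checks can safely assume they are working with a string.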



                                  The reason for testing parameters I send to the API is to be sure that I get a correct result back. Again, doing these tests before calling an API might seem unnecessary, but it costs very little performance and may catch errors in my program. Hence the tests are most valuable when developing a system (but nowadays every system seems to be in continuous development). Depending on the parameters, the tests can be more or less thorough, but I find that you can often set allowable min and max values on most parameters that my program could create. Perhaps a string should always have at least 2 characters and be at most 2000 characters long? The min and max should lie inside what the API allows, as I know that my program will never use the full range of some parameters.
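A sketch of such an outgoing check, using the 2 and 2000 character bounds mentioned above. `call_api` is a hypothetical stand-in for the real client call, and `send_query` is an invented wrapper name.

```python
MIN_QUERY_LEN = 2      # tighter than the API's own limits, by assumption
MAX_QUERY_LEN = 2000


def call_api(query: str) -> str:
    """Hypothetical stand-in for the real external call."""
    return f"result for {query}"


def send_query(query: str) -> str:
    """Check the outgoing parameter before handing it to the external API."""
    if not isinstance(query, str):
        raise TypeError(f"query must be str, got {type(query).__name__}")
    if not MIN_QUERY_LEN <= len(query) <= MAX_QUERY_LEN:
        raise ValueError(
            f"query length {len(query)} outside "
            f"[{MIN_QUERY_LEN}, {MAX_QUERY_LEN}]"
        )
    return call_api(query)


print(send_query("weather in Oslo"))
```

A failure here points at a bug in the calling program rather than in the API, which is exactly the kind of error these checks are meant to surface during development.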






                                  share|improve this answer




























                                    0














                                    My take on this is to always, always check each and every input to my system. That means every parameter returned from an API should be checked, even if my program does not use it. I tend to as well check every parameter I send to an API for correctness. There are only two exceptions to this rule, see below.



                                    The reason for testing is that if for some reason the API / input is incorrect my program cannot rely on anything. Maybe my program was linked to an old version of the API that does something different from what I believe? Maybe my program stumbled on a bug in the external program that has never before happened. Or even worse, happens all the time but no one cares! Maybe the external program is beeing fooled by a hacker to return stuff that can hurt my program or the system?



                                    The two exceptions to testing everything in my world are:





                                    1. Performance after careful measurement of performance:




                                      • never optimize before you have measured. Testing all input / returned data most often takes a very small time compared to the actual call so removing it often saves little or nothing. I would still keep the error detection code, but comment it out, perhaps by a macro or simply commenting it away.




                                    2. When you have no clue what to do with an error




                                      • there are times, not often, when your design simply does not allow handling of the kind of error you would find. Maybe what you ought to do is log an error, but there is no error logging in the system. It is almost always possible to find some way to "remember" the error allowing at least you as a developer to later check for it. Error counters is one good thing to have in a system, even if you elect to not have logging.




                                    Exactly how carefully to check inputs / return values is an important question. As example, if the API is said to return a string, I would check that:




• the data type actually is a string


• the length is between min and max values. Always check strings against the max size that my program can expect to handle (returning overly large strings is a classic security problem in networked systems).


• Some strings should be checked for "illegal" characters or content when that is relevant. If your program might later send the string to, say, a database, it is a good idea to check for database attacks (search for SQL injection). These tests are best done at the borders of my system, where I can pinpoint where the attack came from and fail early. A full SQL injection test can be difficult at the border when strings are later combined, so that test should be done just before calling the database, but finding some problems early can still be useful.
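The three checks above could look roughly like this in Python. This is only a sketch: the function name, the limits, and the choice to treat Unicode control characters as the "illegal" content are all assumptions made for the example, not part of any real API.

```python
import unicodedata

# Assumed limits for illustration; pick values that fit your own program.
MIN_LEN = 1
MAX_LEN = 2000


def validate_api_string(value, min_len=MIN_LEN, max_len=MAX_LEN):
    """Return value unchanged if it passes all checks, otherwise raise."""
    # 1. The data type actually is a string.
    if not isinstance(value, str):
        raise TypeError(f"expected str, got {type(value).__name__}")
    # 2. The length is within what this program can handle.
    if not min_len <= len(value) <= max_len:
        raise ValueError(f"length {len(value)} outside [{min_len}, {max_len}]")
    # 3. No "illegal" characters; here that means Unicode control characters
    #    (category "Cc"), which also rejects tabs and newlines.
    if any(unicodedata.category(ch) == "Cc" for ch in value):
        raise ValueError("control character in string")
    return value
```

Calling this right where the reply enters the system makes it possible to fail early. It does not replace the database-specific test that still has to happen just before the database call.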



The reason for testing the parameters I send to the API is to be sure that I get a correct result back. Again, doing these tests before calling an API might seem unnecessary, but it costs very little performance and may catch errors in my program. Hence the tests are most valuable while developing a system (but nowadays every system seems to be in continuous development). Depending on the parameters, the tests can be more or less thorough, but I find that you can often set allowable min and max values on most parameters that my program could create. Perhaps a string should always have at least 2 characters and at most 2000 characters? The min and max should lie inside what the API allows, as I know that my program will never use the full range of some parameters.
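A sketch of such an outgoing check in Python, using the 2 and 2000 character bounds from the paragraph above. The API limit of 10,000 characters and the names `check_outgoing_text` and `call_api_checked` are invented for the example.

```python
# Assumed for illustration: the limit the external API documents.
API_MAX_LEN = 10_000

# The narrower range this program is ever expected to produce.
OWN_MIN_LEN = 2
OWN_MAX_LEN = 2000

# The program's own range must lie inside what the API allows.
assert OWN_MAX_LEN <= API_MAX_LEN


def check_outgoing_text(text):
    """Return text if this program could legitimately have produced it, else raise."""
    if not isinstance(text, str):
        raise TypeError(f"expected str, got {type(text).__name__}")
    if not OWN_MIN_LEN <= len(text) <= OWN_MAX_LEN:
        raise ValueError(f"length {len(text)} outside [{OWN_MIN_LEN}, {OWN_MAX_LEN}]")
    return text


def call_api_checked(send, text):
    """Validate the parameter, then pass it to send, a stand-in for the real API call."""
    return send(check_outgoing_text(text))
```

A failure here points at a bug in my own program rather than in the API, which is why the check pays off most during development.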




































                                      answered Jan 15 at 18:20









ghellquist






























