Inconsistent JSON arrays using PHP’s json_encode

14th September, 2010 - Posted by david

I’d be amazed if anyone has ever come across this problem before, but it’s one that stumped me recently and I feel it would make a good first ‘real’ post for this new blog.

Firstly, a bit of background: at work, we have an API that uses the XML-based SOAP protocol over HTTP. You can request data either using a PHP SOAP plug-in, or by passing your parameters as encoded JSON over an HTTPS connection. The data you get back is usually an array of properties, with various fields set. When data is requested in JSON, it’s returned in JSON. On the main server, we use memcached, which gets checked for a given query before we hit the main database. If there’s nothing in the cache, we hit the database and then cache the result for 5 minutes, to reduce the load on our primary database; pretty standard stuff for a big website.
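The check-the-cache-first flow described above can be sketched roughly as follows. This is a minimal illustration, not the API's actual code: `ArrayCache` is a hypothetical in-memory stand-in for memcached so the example runs on its own, and `fetchProperties`, the `api:` key prefix and the `$runQuery` callback are invented names.

```php
<?php
// Hypothetical stand-in for memcached so the sketch is self-contained.
// A real setup would talk to a memcached client instead.
final class ArrayCache
{
    private array $items = [];

    // Returns the cached value, or false on a miss or an expired entry.
    public function get(string $key)
    {
        if (!isset($this->items[$key])) {
            return false;
        }
        [$value, $expiresAt] = $this->items[$key];
        return time() < $expiresAt ? $value : false;
    }

    public function set(string $key, $value, int $ttlSeconds): void
    {
        $this->items[$key] = [$value, time() + $ttlSeconds];
    }
}

// Check the cache first; on a miss, run the query and cache the result
// for 5 minutes (300 seconds), as described in the post.
function fetchProperties(ArrayCache $cache, string $query, callable $runQuery): array
{
    $key = 'api:' . md5($query); // assumed key scheme
    $cached = $cache->get($key);
    if ($cached !== false) {
        return $cached;
    }
    $result = $runQuery($query);
    $cache->set($key, $result, 300);
    return $result;
}
```

With this shape, only the first request in any 5-minute window reaches the database; every other request in that window is served from the cached copy.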

The problem I was experiencing was this:

  • you’d request data in JSON and get back an array of properties, each with its fields
  • one of the fields itself was an array (in this instance it was an array of “features” of the property)
  • the value of ‘features’ in a given property would vary from "features":["feature 1", "feature 2"], as you’d expect, to "features":["Array"], which is pretty useless!

So, you’d do the query, get the ["feature 1", "feature 2"] text as you’d expect, then reload and watch it change to ["Array"]. Subsequent reloads would yield ["Array"], then after a while you’d get the value you expected, only for it to revert to ["Array"] again. I knew the features were always being built correctly as an array, as the same code is used elsewhere on the site, so that wasn’t the problem.
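For context, the literal text Array is what PHP produces whenever an array is converted to a string. The sketch below only shows where that text comes from; it is not a reproduction of the bug, and the post does not pin down where such a conversion happened. The variable names are illustrative.

```php
<?php
$features = ['feature 1', 'feature 2'];

// Casting an array to a string yields the literal text "Array".
// PHP also raises an "Array to string conversion" notice or warning,
// which @ suppresses for the purposes of this sketch.
$flattened = @(string) $features;

// The broken response shape: the list has collapsed into one string.
echo json_encode(['features' => [$flattened]]), "\n";

// The expected response shape: the list survives intact.
echo json_encode(['features' => $features]), "\n";
```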

After leaving the problem and coming back to it a few times, I realised the time difference between correct ("features":["feature 1", "feature 2"]) and incorrect ("features":["Array"]) values was about 5 minutes, i.e. you’d make a request, get the correct value, reload, get an incorrect value, keep reloading the page, then about 5 minutes later you’d have a correct one, only for it to revert to incorrect on the next reload. So, I figured it must be something to do with the caching of the data. When you were getting data from the database it was correct; when you were pulling it from the cache it was incorrect.

Next up was to try and figure out why only this array was affected; various other API functions also return arrays and there’s never been any report of such inconsistencies. So, looking again at the code that generated the features array I saw that, for some legacy reason, the array was 1-indexed, as opposed to the standard 0-indexing. I never thought this could cause an issue, but decided to temporarily switch it to 0-indexing, just to see if it made a difference: sure enough it did! Such relief.
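One way to see that a 1-indexed array really is treated differently: json_encode only emits a JSON array for a PHP array whose keys run 0, 1, 2, and so on; anything else becomes a JSON object. The sketch below does not reproduce the ["Array"] symptom, whose exact mechanism in the caching path isn't identified here; it just shows the difference in handling, plus array_values() as one possible way to re-index. The variable names are illustrative.

```php
<?php
$zeroIndexed = ['feature 1', 'feature 2'];
$oneIndexed  = [1 => 'feature 1', 2 => 'feature 2'];

// Keys 0..n-1: encoded as a JSON array.
echo json_encode($zeroIndexed), "\n"; // ["feature 1","feature 2"]

// Keys starting at 1: encoded as a JSON object instead.
echo json_encode($oneIndexed), "\n";  // {"1":"feature 1","2":"feature 2"}

// array_values() discards the keys and re-indexes from 0.
$reindexed = array_values($oneIndexed);
echo json_encode($reindexed), "\n";   // ["feature 1","feature 2"]
```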

So, the whole lesson here is this: when trying to solve a non-obvious problem, look for any related patterns you do know (e.g. the 5 minutes of reloading matching the 5-minute cache time) and anything odd about the data you’re dealing with (e.g. the 1-indexed array). Hopefully these will point you in the right direction. One other thing that I read recently in the book Coders at Work, which is kind of related to my point: if a problem rarely happens and you’ve some code that’s rarely executed, that code block is a good place to start.

Tags: api bug json memcached mysql php
