How to encode and decode JSON in PHP?

11 years ago by David Grudl  

Let's create a simple OOP wrapper for encoding and decoding JSON in PHP:

class Json
{
	public static function encode($value)
	{
		$json = json_encode($value);
		if (json_last_error()) {
			throw new JsonException;
		}
		return $json;
	}

	public static function decode($json)
	{
		$value = json_decode($json);
		if (json_last_error()) {
			throw new JsonException;
		}
		return $value;
	}
}

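// note: PHP 7.3+ already defines a global JsonException; on those versions declare this class inside a namespace
class JsonException extends Exception
{
}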

// usage:
$json = Json::encode($arg);

Simple.

But it is very naive. PHP has a ton of bugs in this area (sometimes labeled “not a bug”) that need workarounds.

  1. json_encode() is (nearly) the only function in the whole of PHP whose behavior is affected by the display_errors directive. Yes, JSON encoding is affected by a directive that controls error display. If you want to detect the Invalid UTF-8 sequence error, you must disable this directive. (#52397, #54109, #63004, not fixed).
  2. json_last_error() returns the last error (if any) that occurred during the last JSON encoding/decoding. Sometimes! In the case of the Recursion detected error, it returns 0. You must install your own error handler to catch this error. (Fixed after years in PHP 5.5.0)
  3. json_last_error() sometimes returns not the last error but the last-but-one. E.g. json_decode('') with an empty string doesn't clear the last error flag, so you cannot rely on the error code. (Fixed in PHP 5.3.7)
  4. json_decode() returns null if the JSON cannot be decoded or if the encoded data is deeper than the recursion limit. OK, but json_decode('null') returns null too. So we have the same return value for success and failure. Great!
  5. json_decode() is unable to detect an Invalid UTF-8 sequence in PHP < 5.3.3 or when the PECL implementation is used. You must check for it yourself.
  6. json_last_error() exists since PHP 5.3.0, so the minimum required version for our wrapper is PHP 5.3.
  7. json_last_error() returns only a numeric code. If you'd like to throw an exception, you must create your own table of messages (json_last_error_msg() was added in PHP 5.5.0).
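
Point 4 is easy to reproduce. Here is a minimal sketch, assuming any PHP version that has json_last_error(); the variable names are made up for illustration:

```php
<?php
// Valid JSON whose decoded value happens to be null.
$valid = json_decode('null');
$validError = json_last_error();

// Malformed JSON: decoding fails, and the return value is null as well.
$broken = json_decode('{"a":');
$brokenError = json_last_error();

// The return values are indistinguishable; only the error code differs.
var_dump($valid === $broken);               // bool(true)
var_dump($validError === JSON_ERROR_NONE);  // bool(true)
var_dump($brokenError === JSON_ERROR_NONE); // bool(false)
```

So the return value alone can never tell you whether decoding failed; you have to consult the error code, and (because of point 3) only when the input is not one of the known special cases.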

So the simple class wrapper for encoding and decoding JSON now looks like this:

class Json
{
	private static $messages = array(
		JSON_ERROR_DEPTH => 'The maximum stack depth has been exceeded',
		JSON_ERROR_STATE_MISMATCH => 'Syntax error, malformed JSON',
		JSON_ERROR_CTRL_CHAR => 'Unexpected control character found',
		JSON_ERROR_SYNTAX => 'Syntax error, malformed JSON',
		5 /*JSON_ERROR_UTF8*/ => 'Invalid UTF-8 sequence',
		6 /*JSON_ERROR_RECURSION*/ => 'Recursion detected',
		7 /*JSON_ERROR_INF_OR_NAN*/ => 'Inf and NaN cannot be JSON encoded',
		8 /*JSON_ERROR_UNSUPPORTED_TYPE*/ => 'Type is not supported',
	);


	public static function encode($value)
	{
		// needed to receive 'Invalid UTF-8 sequence' error; PHP bugs #52397, #54109, #63004
		if (function_exists('ini_set')) { // ini_set is disabled on some hosts :-(
			$old = ini_set('display_errors', 0);
		}

		// needed to receive 'recursion detected' error
		set_error_handler(function($severity, $message) {
			restore_error_handler();
			throw new JsonException($message);
		});

		$json = json_encode($value);

		restore_error_handler();
		if (isset($old)) {
			ini_set('display_errors', $old);
		}
		if ($error = json_last_error()) {
			$message = isset(static::$messages[$error]) ? static::$messages[$error] : 'Unknown error';
			throw new JsonException($message, $error);
		}
		return $json;
	}


	public static function decode($json)
	{
		if (!preg_match('##u', $json)) { // workaround for PHP < 5.3.3 & PECL JSON-C
			throw new JsonException('Invalid UTF-8 sequence', 5);
		}

		$value = json_decode($json);

		if ($value === null
			&& $json !== ''  // it doesn't clean json_last_error flag until 5.3.7
			&& $json !== 'null' // in this case null is not failure
		) {
			$error = json_last_error();
			$message = isset(static::$messages[$error]) ? static::$messages[$error] : 'Unknown error';
			throw new JsonException($message, $error);
		}
		return $value;
	}
}
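
The preg_match('##u', ...) line in decode() relies on PCRE refusing to run a pattern with the u modifier against a subject that is not valid UTF-8. Here is a small sketch of that check in isolation; the helper name isValidUtf8 is hypothetical and not part of the class above:

```php
<?php
// Hypothetical helper: true when $s is valid UTF-8.
// With the "u" modifier, preg_match() returns false (rather than 0 or 1)
// when the subject is not valid UTF-8; the empty pattern matches any valid string.
function isValidUtf8($s)
{
	return preg_match('##u', $s) === 1;
}

var_dump(isValidUtf8("\"\xC5\xBElu\"")); // bool(true): 0xC5 0xBE is a well-formed two-byte sequence
var_dump(isValidUtf8("\"\xC3\x28\""));   // bool(false): 0xC3 must be followed by a continuation byte
var_dump(isValidUtf8(''));               // bool(true)
```

The empty pattern is used only because it is the cheapest thing that forces PCRE to validate the whole subject.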

The Json class above is used in Nette Framework. Nette also contains a workaround for another bug, this time in JSON itself. In fact, JSON is not a subset of JavaScript because of the characters \u2028 and \u2029. They must not appear unescaped in JavaScript, so they must be escaped too.
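
One possible way to handle these two characters is to escape them after encoding. This is only a sketch and not necessarily how Nette does it; the helper name escapeLineSeparators is hypothetical:

```php
<?php
// Hypothetical helper: make a JSON string safe to embed in JavaScript source
// by escaping raw U+2028 and U+2029 (written here as UTF-8 byte sequences).
function escapeLineSeparators($json)
{
	return str_replace(
		array("\xE2\x80\xA8", "\xE2\x80\xA9"),
		array('\u2028', '\u2029'),
		$json
	);
}

// A JSON string literal containing a raw U+2028 between "a" and "b".
$raw = "\"a\xE2\x80\xA8b\"";
$safe = escapeLineSeparators($raw);

var_dump($safe === '"a\u2028b"');                   // bool(true)
var_dump(json_decode($safe) === json_decode($raw)); // bool(true)
```

A blind str_replace() is acceptable here because in valid JSON these characters can occur only inside string literals, where the \uXXXX escape means exactly the same thing.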

(In PHP, detection of errors in JSON encoding/decoding is hell, but it is nothing compared to detection of errors in PCRE functions.)