The Unterminated String

Embedded Things and Software Stuff

Parsing with RapidJSON

Posted at — Feb 28, 2023

RapidJSON is one of many available options for parsing JSON with C++. The official documentation offers a fairly good tutorial.

The following provides some additional notes and examples which build on top of the tutorial, to act as a reference for myself, if no one else. All the examples are taken from the same file which can be found here.

Note that the focus of this post is on reading JSON rather than creating or manipulating JSON.

Parse Errors

First things first: how to read in a JSON document and confirm that it is legitimate JSON. The following code tries to parse a JSON string which contains a syntax error. It uses the RapidJSON functions to check for, and describe, any errors encountered while parsing.

When run, this will produce the following descriptive text:

Parse Error at offset 9. Missing a comma or '}' after an object member.

#include <cassert>
#include <cstring>
#include <iostream>
#include <rapidjson/document.h>
#include <rapidjson/error/en.h>

void json_has_parse_error()
{
    char const *const json = R"({"bad":{})";

    rapidjson::Document d;
    d.Parse(json);

    if (d.HasParseError())
    {
        std::cout << "Parse Error at offset " << d.GetErrorOffset()
                  << ". "
                  << rapidjson::GetParseError_En(d.GetParseError())
                  << '\n';
    }
}

Parsing a Valid JSON Document

The following JSON will be used throughout the remaining examples:

rapidjson::Document parse_test_document()
{
    char const *const json = R"(
    {
        "array_with_elements": ["one", "two", "three"],
        "object_with_elements" : {
            "a" : 1,
            "b" : 2
        },
        "string": "hello world",
        "int": 42,
        "float": 42.42,
        "boolean": false
    }
    )";

    rapidjson::Document d;
    d.Parse(json);

    assert(d.HasParseError() == false);
    assert(d.GetParseError() == rapidjson::kParseErrorNone);

    return d;
}

Notes on Types and Type Specific Functions and Helpers

A rapidjson::Document “is a” rapidjson::Value. Many of the examples here fall back to using the base class.

A rapidjson::Value exposes the member functions for every JSON type at once. This includes functions that are type specific, so care must be taken to not call them from the wrong context.

The first thing to do then is determine the JSON type which the rapidjson::Value holds. One approach is to call the member function GetType() which returns an enumeration value from:

enum Type {
    kNullType = 0,      //!< null
    kFalseType = 1,     //!< false
    kTrueType = 2,      //!< true
    kObjectType = 3,    //!< object
    kArrayType = 4,     //!< array
    kStringType = 5,    //!< string
    kNumberType = 6     //!< number
};

However, it's likely you will have expectations of the node’s type based on its key, in which case it may be clearer to use the IsXXX() member functions (IsObject(), IsArray(), IsString() and so on) to confirm the type. There are also several functions which expose additional information about the underlying representation of numbers, such as IsInt(), IsUint(), IsInt64(), IsUint64() and IsDouble().

When it comes to Arrays and Objects, it's pretty easy to accidentally use the wrong rapidjson::Value function. In particular, the names of the Array specific functions are quite generic and could easily be mistaken to apply to Object types.

The following provides an overview of some of the Array and Object specific functions:

| Function      | Operates On |
|---------------|-------------|
| Empty()       | Array       |
| Size()        | Array       |
| Begin()       | Array       |
| End()         | Array       |
| ObjectEmpty() | Object      |
| MemberCount() | Object      |
| HasMember()   | Object      |
| FindMember()  | Object      |
| MemberBegin() | Object      |
| MemberEnd()   | Object      |

RapidJSON provides type specific classes which may help prevent accidentally calling the wrong function. They only contain the functions that are appropriate to the type they represent. These are “views” - they hold a reference to the initial rapidjson::Value.

| rapidjson::Value Function | Type Returned                             |
|---------------------------|-------------------------------------------|
| GetObject()               | rapidjson::Value::Object (or ConstObject) |
| GetArray()                | rapidjson::Value::Array (or ConstArray)   |

For example, when given the document created in the above example, we can hold it as a Value or an Object:

void get_doc_as_object(rapidjson::Document const &d)
{
    assert(d.IsObject());
    assert(d.GetType() == rapidjson::kObjectType);
    assert(d.MemberCount() == 6);

    rapidjson::Value const &v(d);
    assert(v.GetType() == rapidjson::kObjectType);
    assert(v.IsObject());
    assert(v.MemberCount() == 6);

    rapidjson::Value::ConstObject const o = d.GetObject();
    assert(o.MemberCount() == 6);
}

Access Members of an Object

You can use operator[] to access a known JSON member. However, if you are not 100% certain the key will exist, you should first check that it does. HasMember() can be used for this purpose.

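For anything in production I would recommend erring on the side of caution and checking both of the following before accessing the value: that the member exists (HasMember() or FindMember()), and that its value is of the expected type (the IsXXX() functions).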

Not doing so risks undefined behaviour when parsing JSON which doesn’t meet your code’s expectations. See “Assertions” below.

void access_object_with_elements(rapidjson::Value const &j)
{
    const char *const node_name = "object_with_elements";
    assert(j.HasMember(node_name));

    assert(j[node_name].IsObject());
    assert(j[node_name].ObjectEmpty() == false);
    assert(j[node_name].MemberCount() == 2);

    assert(j[node_name]["a"].IsNumber());
    assert(j[node_name]["a"] == 1);

    assert(j[node_name]["b"].IsNumber());
    assert(j[node_name]["b"] == 2);
}

Each use of [] in the above example incurs a lookup cost. So even if you limit yourself to a single check with HasMember() and then take a reference using [], you incur one more lookup than is strictly necessary.

The tutorial recommends obtaining an iterator to the member being inspected to reduce this performance cost. The following function demonstrates using iterators in this fashion, and should be considered a performance improvement over the example in access_object_with_elements() in terms of lookups. (No attempt at profiling this saving was made).

void access_object_using_iterator(rapidjson::Value const &d)
{
    rapidjson::Value::ConstMemberIterator const owe_itr =
        d.FindMember("object_with_elements");

    assert(owe_itr != d.MemberEnd() && owe_itr->value.IsObject());

    rapidjson::Value const &owe = owe_itr->value;
    assert(owe.MemberCount() == 2);

    rapidjson::Value::ConstMemberIterator member_itr;

    member_itr = owe.FindMember("a");
    assert(member_itr != owe.MemberEnd());
    assert(member_itr->value.IsNumber());
    assert(member_itr->value == 1);

    member_itr = owe.FindMember("b");
    assert(member_itr != owe.MemberEnd());
    assert(member_itr->value.IsNumber());
    assert(member_itr->value == 2);
}

Access Elements of an Array

The following demonstrates two different approaches to inspecting the elements of an array:

void access_array_with_elements(rapidjson::Value const &j)
{
    rapidjson::Value::ConstMemberIterator const awe_itr =
        j.FindMember("array_with_elements");

    assert(awe_itr != j.MemberEnd());
    assert(awe_itr->value.IsArray());
    assert(awe_itr->value.Empty() == false);
    assert(awe_itr->value.Size() == 3);

    rapidjson::Value::ConstArray const a = awe_itr->value.GetArray();

    // By index
    for (rapidjson::SizeType i = 0; i < a.Size(); ++i)
    {
        assert(a[i].IsString());
    }

    // By iterator
    for (rapidjson::Value::ConstValueIterator element_itr = a.Begin();
         element_itr != a.End();
         ++element_itr)
    {
        assert(element_itr->IsString());
    }
}

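You will notice from the above example that there are two different iterators in RapidJSON: member iterators (MemberIterator and ConstMemberIterator), which walk the members of an Object, and value iterators (ValueIterator and ConstValueIterator), which walk the elements of an Array.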

Access A String

The following function demonstrates how to access a string:

void access_string(rapidjson::Value const &j)
{
    rapidjson::Value::ConstMemberIterator const s_itr =
        j.FindMember("string");

    assert(s_itr != j.MemberEnd());
    assert(s_itr->value.IsString());

    assert(std::strcmp(s_itr->value.GetString(), "hello world") == 0);
}

Access A Boolean

The following function demonstrates how to access a Boolean:

void access_boolean(rapidjson::Value const &j)
{
    rapidjson::Value::ConstMemberIterator const f_itr =
        j.FindMember("boolean");

    assert(f_itr != j.MemberEnd());
    assert(f_itr->value.IsBool());
    assert(f_itr->value.IsTrue() == false);
    assert(f_itr->value.IsFalse());
    assert(f_itr->value == false);
}

Assertions

The following function incorrectly attempts to obtain a JSON string as an object:

void incorrectly_access_string_as_object(rapidjson::Value const &d)
{
    const char *const node_name = "string";

    assert(d[node_name].IsString());

#if 1 // Invalid - assertion will trigger
    d[node_name].GetObject();
#endif
}

If this invalid function call were made on a debug build (NDEBUG not defined) you would see an assertion failure similar to the following logged:

Assertion `IsObject()' failed.

From the RapidJSON tutorial:

For example, if a value is a string, it is invalid to call GetInt(). In debug mode it will fail on assertion. In release mode, the behavior is undefined.

Note that it is possible to define your own assertion handling by defining some macros. See the documentation for macros such as RAPIDJSON_ASSERT.

Version

This post was written while using RapidJSON at v1.1.0-725-g012be852.