Parsing Nested JSON String to CSV in Java


This is my first blog post, and I want to share what I learned, along with some observations, tips, and challenges I faced while diving into this topic.

What is JSON?

JSON (JavaScript Object Notation) is a standard text-based format for storing and transporting data, usually from a server to a web page.

It represents structured data in two forms:

  1. Object: an unordered collection of zero or more key-value pairs, separated by commas and enclosed in { } braces

  2. Array: an ordered collection of zero or more values, separated by commas and enclosed in [ ] brackets

Parsing JSON text:

Parsing JSON text deserializes the JSON string into objects, such as a map (or dictionary) or an instance of a custom class.
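As a minimal sketch of the "map" case, assuming jackson-databind is on the classpath (ParseToMap and parse are hypothetical names), JSON can be read without any model class. Jackson then returns Maps for objects, Lists for arrays, and boxed Java types for numbers, booleans, and strings:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;

public class ParseToMap {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Untyped parsing: { } becomes a Map, [ ] becomes a List
    static Map<String, Object> parse(String json) {
        try {
            @SuppressWarnings("unchecked")
            Map<String, Object> map = MAPPER.readValue(json, Map.class);
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> map = parse(
                "{\"success\":true,\"users\":[\"john\",\"celia\"],\"context\":{\"postId\":\"c8f0ef7\"}}");
        Map<?, ?> context = (Map<?, ?>) map.get("context");
        List<?> users = (List<?>) map.get("users");
        System.out.println(map.get("success"));     // true
        System.out.println(context.get("postId"));  // c8f0ef7
        System.out.println(users.get(1));           // celia
    }
}
```

Every nested access needs a cast, which is why the rest of this post binds the JSON to custom classes instead.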

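POJO (Plain Old Java Object)

It is a plain Java class that does not depend on any framework. The model classes used for JSON binding are typically POJOs with private fields and public getter/setter methods.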
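For example, a minimal POJO for the "siteId" object in the sample JSON later in this post could look like this. SiteInfo is a hypothetical name; by default, Jackson uses the public no-argument constructor to create the object before filling its fields:

```java
// A minimal POJO: private fields, a public no-arg constructor,
// and public getter/setter pairs. "SiteInfo" is a hypothetical name.
public class SiteInfo {
    private int pageHits;
    private String name;

    // No-arg constructor: Jackson uses this by default to create the object
    public SiteInfo() {}

    public int getPageHits() { return pageHits; }
    public void setPageHits(int pageHits) { this.pageHits = pageHits; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public static void main(String[] args) {
        SiteInfo site = new SiteInfo();
        site.setPageHits(123);
        site.setName("abc.com");
        System.out.println(site.getName() + " " + site.getPageHits()); // prints "abc.com 123"
    }
}
```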

How to parse JSON in Java?

Several libraries are available to parse JSON in Java; some of the most popular are:

1. org.json

JSON-java is a library that converts a JSON string into an object using the org.json.JSONObject class.

Pros:

Simple to use when you just need to read an attribute

Cons:

It is closer to a proof of concept, with an inefficient implementation, and it offers no object deserialization (no POJO binding)
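A minimal sketch of reading single attributes with org.json, assuming the org.json `json` artifact is on the classpath; OrgJsonExample and the trimmed JSON string are illustrative. Nested values are reached by walking one JSONObject at a time:

```java
import org.json.JSONObject;

public class OrgJsonExample {
    // Reads a nested attribute by walking the JSONObjects manually
    static int pageHits(String json) {
        JSONObject root = new JSONObject(json);
        return root.getJSONObject("context")
                   .getJSONObject("siteId")
                   .getInt("page-hits");
    }

    public static void main(String[] args) {
        String json = "{\"post/time\":152,\"context\":{\"postId\":\"c8f0ef7\","
                + "\"siteId\":{\"page-hits\":123,\"name\":\"abc.com\"}}}";
        JSONObject root = new JSONObject(json);
        System.out.println(root.getInt("post/time"));                          // 152
        System.out.println(root.getJSONObject("context").getString("postId")); // c8f0ef7
        System.out.println(pageHits(json));                                     // 123
    }
}
```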

2. Gson

Gson is a Google library that converts JSON strings to Java objects and vice versa, using either the JsonParser class or the Gson class.

Pros:

Converts JSON to POJOs and vice versa. You create a model class with the same structure as the JSON and Gson fills it automatically; custom field names can be mapped with @SerializedName.

Cons:

Does not support dynamic filtering of fields at levels other than the root
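A minimal Gson sketch, assuming the `com.google.code.gson:gson` artifact is on the classpath; the model is a hypothetical, trimmed version of the Post class used later in this post. Keys that are not valid Java identifiers, such as "post/time", are mapped with @SerializedName, and keys with no matching field are skipped:

```java
import com.google.gson.Gson;
import com.google.gson.annotations.SerializedName;

public class GsonExample {
    // Trimmed model of the top-level JSON object
    static class Post {
        @SerializedName("post/time")
        int postDuration;
        @SerializedName("post/bytes")
        int postBytes;
        boolean success;
    }

    static Post parse(String json) {
        return new Gson().fromJson(json, Post.class);
    }

    public static void main(String[] args) {
        Post post = parse("{\"post/time\":152,\"post/bytes\":6,\"success\":true,\"extra\":1}");
        System.out.println(post.postDuration + " " + post.postBytes + " " + post.success);
        // prints "152 6 true"
    }
}
```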

3. Jackson

Jackson supports a tree representation of JSON called JsonNode. Like Gson, it converts JSON to POJOs and vice versa: you create a model class with the same structure as the JSON and Jackson fills it automatically; custom field names are mapped with @JsonProperty.

The ObjectMapper class does this binding. It can also read JSON into a com.fasterxml.jackson.databind.JsonNode, an object representation of the JSON string that is used to extract individual values.
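A minimal sketch of the JsonNode tree model, assuming jackson-databind is on the classpath; JsonNodeExample and its methods are hypothetical names. path() returns a "missing" node instead of null for an absent key, so a chain of path() calls does not throw a NullPointerException:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;

public class JsonNodeExample {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Parses the string into a JsonNode tree (IOException wrapped for brevity)
    static JsonNode read(String json) {
        try {
            return MAPPER.readTree(json);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Walks context -> siteId -> page-hits; asInt() yields 0
    // if any key on the way is missing
    static int pageHits(String json) {
        return read(json).path("context").path("siteId").path("page-hits").asInt();
    }

    public static void main(String[] args) {
        String json = "{\"post/time\":152,\"context\":{\"postId\":\"c8f0ef7\","
                + "\"siteId\":{\"page-hits\":123,\"name\":\"abc.com\"}},"
                + "\"users\":[\"john\",\"celia\"]}";
        JsonNode root = read(json);
        System.out.println(root.path("post/time").asInt());               // 152
        System.out.println(root.path("context").path("postId").asText()); // c8f0ef7
        System.out.println(root.path("users").path(1).asText());          // celia
        System.out.println(pageHits(json));                                // 123
    }
}
```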

Overall, based on the benchmark (linked in the tips at the end), Jackson is about 3-4 times faster than org.json and about twice as fast as Gson.

Sample JSON to parse

Let's call it sample.log


  {
     "post/time" : 152,
     "post/bytes" : 6,
     "success" : true,
     "context" : {
         "postId" : "c8f0ef7",
         "nativePostIds" : "d1ebc",
         "siteId" : {
              "page-hits" : 123,
              "name" : "abc.com"
         }
     },
     "users" : ["john","celia"]
  }

Code

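import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.util.List;

public class ExtractFromJson {

    // Maps the root { } object
    public static class Post {
        // Keys that are not valid Java identifiers are mapped with @JsonProperty
        @JsonProperty("post/time")
        private int postDuration;
        @JsonProperty("post/bytes")
        private int postBytes;
        private boolean success;
        private Id context;
        private List<String> users;

        public Post() {}

        public int getPostDuration() { return postDuration; }
        public int getPostBytes() { return postBytes; }
        public boolean getSuccess() { return success; }
        public Id getContext() { return context; }
        public List<String> getUsers() { return users; }
    }

    // Maps the nested "context" object ("nativePostIds" is not modelled and is ignored)
    public static class Id {
        private String postId;
        @JsonProperty("siteId")
        private Site site;

        public Id() {}

        public String getPostId() { return postId; }
        public Site getSite() { return site; }
    }

    // Maps the nested "siteId" object
    public static class Site {
        @JsonProperty("page-hits")   // "page-hits" is not a valid Java identifier
        private int pageHits;
        private String name;

        public Site() {}

        public int getPageHits() { return pageHits; }
        public String getName() { return name; }
    }

    public static void main(String[] args) throws IOException {

        String json = "{\"post/time\":152,\"post/bytes\":6,\"success\":true,"
                + "\"context\":{\"postId\":\"c8f0ef7\",\"nativePostIds\":\"d1ebc\","
                + "\"siteId\":{\"page-hits\":123,\"name\":\"abc.com\"}},"
                + "\"users\":[\"john\",\"celia\"]}";

        // Skip JSON fields that have no matching Java field (e.g. "nativePostIds")
        ObjectMapper mapper =
            new ObjectMapper().configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

        Post mappedValue = mapper.readValue(json, Post.class);
        System.out.println(mappedValue.getPostDuration());  // 152

        // This is how we can handle nested fields
        System.out.println(mappedValue.getContext().getSite().getPageHits());  // 123
    }
}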

The Jackson library is used here. I used IntelliJ with the following Maven dependency. ObjectMapper lives in jackson-databind, which also pulls in jackson-core and jackson-annotations:

       <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.5.3</version>
            <scope>compile</scope>
        </dependency>

In case we want to write the results to CSV, add the following to main():

        // needs imports: java.io.FileWriter, java.util.ArrayList, java.util.List
        // (the "output" directory must already exist)
        List<String> outputData = new ArrayList<>();
        outputData.add(String.valueOf(mappedValue.getPostDuration()));
        outputData.add(String.valueOf(mappedValue.getPostBytes()));
        outputData.add(mappedValue.getContext().getPostId());
        outputData.add(String.valueOf(mappedValue.getSuccess()));
        outputData.add(String.valueOf(mappedValue.getContext().getSite().getPageHits()));

        // try-with-resources closes the writer even if a write fails
        try (FileWriter writer = new FileWriter("output/output.csv")) {
            writer.write("PostDuration,PostBytes,PostId,Success,Page-Hits\n");
            writer.write(String.join(",", outputData));
            writer.write("\n");
        }
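Joining with "," breaks the row as soon as a value itself contains a comma, a double quote, or a line break (for example, if the users list were written into one cell as "john, celia"). A minimal stdlib sketch of the usual RFC 4180 quoting rule; CsvRow, escape and toLine are hypothetical names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvRow {
    // Quotes a field if it contains a comma, double quote, CR or LF,
    // doubling any embedded double quotes; null becomes an empty field
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Joins values into one CSV line (without the trailing newline)
    static String toLine(List<String> values) {
        return values.stream().map(CsvRow::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toLine(Arrays.asList("152", "6", "c8f0ef7", "true", "123")));
        // 152,6,c8f0ef7,true,123
        System.out.println(toLine(Arrays.asList("abc.com", "john, celia", "say \"hi\"")));
        // abc.com,"john, celia","say ""hi"""
    }
}
```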

Output CSV structure (one row per parsed record)

      PostDuration,PostBytes,PostId,Success,Page-Hits
      152,6,c8f0ef7,true,123
      43,23,sdrinttq,true,98
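The CSV code above writes a single row. If sample.log held one JSON object per line (an assumption; the post does not show the file layout), each line could become one CSV row. A minimal sketch using Jackson's tree model, so no model classes are needed; JsonLinesToCsv and its methods are hypothetical names:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

public class JsonLinesToCsv {
    private static final ObjectMapper MAPPER = new ObjectMapper();
    static final String HEADER = "PostDuration,PostBytes,PostId,Success,Page-Hits";

    // Converts one JSON object (one log line) into one CSV row
    static String toRow(String jsonLine) {
        try {
            JsonNode root = MAPPER.readTree(jsonLine);
            JsonNode context = root.path("context");
            return String.join(",",
                    root.path("post/time").asText(),
                    root.path("post/bytes").asText(),
                    context.path("postId").asText(),
                    root.path("success").asText(),
                    context.path("siteId").path("page-hits").asText());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Header line followed by one row per non-blank input line
    static String toCsv(List<String> jsonLines) {
        StringBuilder sb = new StringBuilder(HEADER).append('\n');
        for (String line : jsonLines) {
            if (!line.trim().isEmpty()) {
                sb.append(toRow(line)).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "{\"post/time\":152,\"post/bytes\":6,\"success\":true,"
                        + "\"context\":{\"postId\":\"c8f0ef7\",\"siteId\":{\"page-hits\":123}}}",
                "{\"post/time\":43,\"post/bytes\":23,\"success\":true,"
                        + "\"context\":{\"postId\":\"sdrinttq\",\"siteId\":{\"page-hits\":98}}}");
        System.out.print(toCsv(lines));
        // PostDuration,PostBytes,PostId,Success,Page-Hits
        // 152,6,c8f0ef7,true,123
        // 43,23,sdrinttq,true,98
    }
}
```

With a real file, the input list could come from Files.readAllLines(Paths.get("sample.log")).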

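Exceptions faced

  • An exception during the Maven build

    Solution: configure the maven-surefire-plugin so that failing tests do not fail the build. Note that this only lets the build continue; it does not fix the failing tests.

```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>2.12.4</version>
            <configuration>
                <testFailureIgnore>true</testFailureIgnore>
            </configuration>
        </plugin>
    </plugins>
</build>
```

  • Exception in thread "main" com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException:
    Unrecognized field "field_name" (class <class_name>), not marked as ignorable (one known property: "context"])

    Solution: define the ObjectMapper like this, so that you do not have to declare every JSON field in the model classes (unknown fields are skipped):

    ObjectMapper mapper =
        new ObjectMapper().configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

  • IntelliJ - Error:java: release version 5 not supported

    Solution: this link is helpful: IntellijSolution. The usual fix is to compile for a newer Java version, for example by setting the maven.compiler.source and maven.compiler.target properties in pom.xml, or by changing the language level in IntelliJ's Project Structure settings.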
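Instead of switching off FAIL_ON_UNKNOWN_PROPERTIES for the whole ObjectMapper, Jackson can also ignore unknown fields for a single class with @JsonIgnoreProperties(ignoreUnknown = true). A minimal sketch, assuming jackson-databind is on the classpath; IgnoreUnknownExample and PostSummary are hypothetical names:

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;

public class IgnoreUnknownExample {
    // Unknown keys (e.g. "success", "context") are skipped for this class only;
    // the default ObjectMapper stays strict for every other class
    @JsonIgnoreProperties(ignoreUnknown = true)
    public static class PostSummary {
        @JsonProperty("post/time")
        private int postDuration;

        public int getPostDuration() { return postDuration; }
    }

    static PostSummary parse(String json) {
        try {
            return new ObjectMapper().readValue(json, PostSummary.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PostSummary s = parse("{\"post/time\":152,\"success\":true,\"context\":{\"postId\":\"c8f0ef7\"}}");
        System.out.println(s.getPostDuration()); // 152
    }
}
```

Keeping the mapper strict elsewhere means a misspelled field name in another model class still fails loudly instead of being silently dropped.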

Tips when creating model classes!!!

  1. If you write the model classes as nested classes, declare them static: a non-static inner class needs an instance of the enclosing class, which Jackson cannot supply when deserializing
  2. Create a class for every { } object in the JSON structure, as in the code above
  3. When choosing a library, consider performance (this is the benchmark used to compare deserializer performance: https://github.com/fabienrenaud/java-json-benchmark), ease of use, and the dependencies involved
  4. JSONParser parser = new JSONParser() has been deprecated
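Tip 2 gives one class per { } object. When the structure varies, or you only need flat CSV columns, a different approach (not the one used in this post) is to flatten the tree into dotted column names. A minimal stdlib sketch: it works on the Map/List structure that Jackson returns for mapper.readValue(json, Map.class), which main builds by hand here. FlattenJson is a hypothetical name:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenJson {
    // Recursively flattens nested maps/lists into "parent.child" keys;
    // list elements use their index as the key segment, e.g. "users.0"
    static Map<String, Object> flatten(Map<String, ?> tree) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", tree, out);
        return out;
    }

    private static void flattenInto(String prefix, Object value, Map<String, Object> out) {
        if (value instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                flattenInto(join(prefix, String.valueOf(e.getKey())), e.getValue(), out);
            }
        } else if (value instanceof List) {
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                flattenInto(join(prefix, String.valueOf(i)), list.get(i), out);
            }
        } else {
            out.put(prefix, value); // scalar (or null) leaf
        }
    }

    private static String join(String prefix, String key) {
        return prefix.isEmpty() ? key : prefix + "." + key;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the Map Jackson would return for the sample JSON
        Map<String, Object> site = new LinkedHashMap<>();
        site.put("page-hits", 123);
        site.put("name", "abc.com");
        Map<String, Object> context = new LinkedHashMap<>();
        context.put("postId", "c8f0ef7");
        context.put("siteId", site);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("post/time", 152);
        root.put("context", context);
        root.put("users", Arrays.asList("john", "celia"));

        System.out.println(String.join(",", flatten(root).keySet()));
        // post/time,context.postId,context.siteId.page-hits,context.siteId.name,users.0,users.1
    }
}
```

Keys that already contain a dot, and empty objects or arrays (which produce no columns), would need extra handling.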

Thanks for reading!!!