Parsing Nested JSON String to CSV in Java
This is my first blog post. I want to share what I learned, along with some observations, tips, and challenges I ran into while working on this topic.
What is JSON?
JSON (JavaScript Object Notation) is a standard text-based format for storing and transporting data, typically from a server to a web page.
It represents structured data in two forms:
Object: an unordered collection of 0 or more key-value pairs separated by commas and represented inside { } braces
Array: an ordered collection of 0 or more values and represented inside [ ] brackets
Parsing JSON text:
Parsing JSON text deserializes the JSON string into objects such as a map, a dictionary, or an instance of a custom class.
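To make the idea of deserialization concrete, here is a minimal sketch of the shape a parser hands back when it maps JSON onto generic Java collections: an object becomes a Map and an array becomes a List. The structure is built by hand (no parser library is involved), and the class name ParsedShape is only an illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParsedShape {
    // Hand-built equivalent of:
    // {"success": true, "context": {"postId": "c8f0ef7", "users": ["john", "celia"]}}
    static Map<String, Object> sample() {
        Map<String, Object> context = new LinkedHashMap<>();
        context.put("postId", "c8f0ef7");                 // JSON string  -> String
        context.put("users", List.of("john", "celia"));   // JSON array   -> List

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("success", true);                        // JSON boolean -> Boolean
        root.put("context", context);                     // nested JSON object -> nested Map
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> root = sample();
        @SuppressWarnings("unchecked")
        Map<String, Object> context = (Map<String, Object>) root.get("context");
        System.out.println(context.get("users")); // prints [john, celia]
    }
}
```

Reading a nested value from this generic form needs a cast at every level, which is one reason a typed model class is often more comfortable to work with.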
POJO (Plain Old Java Object)
It is a plain class that keeps its data in private fields and exposes them through public getter/setter methods.
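As a small illustration of that definition, a POJO for a site record could look like the sketch below. The class name SitePojo and its two fields are assumptions chosen for the example, not a required layout.

```java
final class SitePojo {
    // State lives in private fields
    private int pageHits;
    private String name;

    // A no-argument constructor lets binding libraries create an empty instance first
    public SitePojo() {}

    // Public getters and setters are the only way in or out
    public int getPageHits() { return pageHits; }
    public void setPageHits(int pageHits) { this.pageHits = pageHits; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public static void main(String[] args) {
        SitePojo site = new SitePojo();
        site.setPageHits(123);
        site.setName("abc.com");
        System.out.println(site.getName() + " " + site.getPageHits()); // prints abc.com 123
    }
}
```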
How to parse in Java?
Several libraries are available for parsing JSON. Some of the most popular are:
1. org.json
It is the JSON-java library, which converts a JSON string to an object using the org.json.JSONObject class.
Pros:
Handy when we just need to read an attribute
Cons:
It is a proof-of-concept with an inefficient implementation
No object deserialization available
2. Gson
It is Google's library for converting a JSON string to a Java object and vice versa, using either the JsonParser class or the Gson class.
Pros:
Converts JSON to POJO and vice versa
We need to create a model with the same structure as the JSON, and it is filled automatically; custom field names can be defined with @SerializedName
Cons:
Doesn't support dynamic filtering of fields at levels other than the root
3. Jackson
It is a library that supports a JSON tree representation called JsonNode.
It converts JSON to POJO and vice versa.
We need to create a model with the same structure as the JSON, and it is filled automatically; custom field names can be defined with @JsonProperty.
The ObjectMapper class does this binding. It can bind JSON to a POJO or to a com.fasterxml.jackson.databind.JsonNode, which is an object representation of the JSON string and is used to extract values.
Overall, based on the benchmark, Jackson is 3-4 times faster than org.json and about twice as fast as Gson.
Sample JSON to parse
Let's call it sample.log
{
    "post/time" : 152,
    "post/bytes" : 6,
    "success" : true,
    "context" : {
        "postId" : "c8f0ef7",
        "nativePostIds" : "d1ebc",
        "siteId" : {
            "page-hits" : 123,
            "name" : "abc.com"
        },
        "users" : ["john","celia"]
    }
}
Code
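import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.util.List;

public class ExtractFromJson {

    // Maps the root object of the JSON
    public static class Post {
        @JsonProperty("post/time")
        private int postDuration;
        @JsonProperty("post/bytes")
        private int postBytes;
        private boolean success;
        private Id context;

        public Post() {}

        // getter methods
        public int getPostDuration() { return postDuration; }
        public int getPostBytes() { return postBytes; }
        public boolean getSuccess() { return success; }
        public Id getContext() { return context; }
    }

    // Maps the nested "context" object
    public static class Id {
        private String postId;
        @JsonProperty("siteId")
        private Site site;
        private List<String> users;

        public Id() {}

        // getter methods
        public String getPostId() { return postId; }
        public Site getSite() { return site; }
        public List<String> getUsers() { return users; }
    }

    // Maps the nested "siteId" object
    public static class Site {
        // "page-hits" is not a valid Java identifier, so the JSON name is mapped explicitly
        @JsonProperty("page-hits")
        private int pageHits;
        private String name;

        public Site() {}

        // getter methods
        public int getPageHits() { return pageHits; }
        public String getName() { return name; }
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"post/time\":152,\"post/bytes\":6,\"success\":true,\"context\":{\"postId\":\"c8f0ef7\",\"nativePostIds\":\"d1ebc\",\"siteId\":{\"page-hits\":123,\"name\":\"abc.com\"},\"users\":[\"john\",\"celia\"]}}";
        // Ignore JSON fields (such as nativePostIds) that have no matching field in the model
        ObjectMapper mapper =
                new ObjectMapper().configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        Post mappedValue = mapper.readValue(json, Post.class);
        System.out.println(mappedValue.getPostDuration());
        // This is how we can handle nested fields
        System.out.println(mappedValue.getContext().getSite().getPageHits());
    }
}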
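The Jackson library is used here.
I used IntelliJ and added Jackson as a Maven dependency. ObjectMapper lives in the jackson-databind artifact, which pulls in jackson-core and jackson-annotations transitively:
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.5.3</version>
    <scope>compile</scope>
</dependency>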
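If we want to write the results to CSV, the following goes inside main, after the mapping:
import java.io.FileWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// try-with-resources closes the writer even if a write fails; the output/ directory must already exist
try (FileWriter writer = new FileWriter("output/output.csv")) {
    writer.write("PostDuration,PostBytes,PostId,Success,Page-Hits\n");
    List<String> outputData = new ArrayList<>();
    // numeric and boolean values are converted to String before joining
    outputData.add(String.valueOf(mappedValue.getPostDuration()));
    outputData.add(String.valueOf(mappedValue.getPostBytes()));
    outputData.add(mappedValue.getContext().getPostId());
    outputData.add(String.valueOf(mappedValue.getSuccess()));
    outputData.add(String.valueOf(mappedValue.getContext().getSite().getPageHits()));
    String collect = outputData.stream().collect(Collectors.joining(","));
    writer.write(collect);
    writer.write("\n");
}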
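The join above works as long as no value contains a comma, a double quote, or a line break. A minimal sketch of quoting such values in the usual CSV style (wrap the field in double quotes and double any embedded quote) follows. The class CsvRow and its methods are hypothetical helpers, not part of the original code.

```java
import java.util.List;
import java.util.stream.Collectors;

final class CsvRow {
    // Quote a field only when it contains a comma, a double quote, CR, or LF
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Embedded quotes are doubled, then the whole field is wrapped in quotes
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Escape every field, then join with commas to form one CSV line (without the line break)
    static String join(List<String> fields) {
        return fields.stream().map(CsvRow::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // prints 152,6,c8f0ef7,true,"abc.com, main site"
        System.out.println(join(List.of("152", "6", "c8f0ef7", "true", "abc.com, main site")));
    }
}
```

For anything beyond simple exports, a dedicated CSV library is usually the safer choice; this sketch only covers the basic quoting rule.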
Output CSV structure
PostDuration,PostBytes,PostId,Success,Page-Hits
152,6,c8f0ef7,true,123
43,23,sdrinttq,true,98
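Each row above is a flat view of one nested record. When no model classes are available, the same flattening can be sketched over a generic nested Map, joining nested keys with a dot to form column names. This is an alternative illustration using only the JDK, not the approach used in this post; the class Flattener and the dotted naming are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class Flattener {
    // Copy every leaf value into a single-level map, e.g. context.siteId.page-hits
    static Map<String, Object> flatten(Map<String, Object> nested) {
        Map<String, Object> out = new LinkedHashMap<>();
        walk("", nested, out);
        return out;
    }

    private static void walk(String prefix, Map<?, ?> node, Map<String, Object> out) {
        for (Map.Entry<?, ?> e : node.entrySet()) {
            String key = prefix.isEmpty() ? String.valueOf(e.getKey()) : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                walk(key, (Map<?, ?>) e.getValue(), out);   // descend into nested object
            } else {
                out.put(key, e.getValue());                 // leaf value becomes a column
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> site = new LinkedHashMap<>();
        site.put("page-hits", 123);
        Map<String, Object> context = new LinkedHashMap<>();
        context.put("postId", "c8f0ef7");
        context.put("siteId", site);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("post/time", 152);
        root.put("context", context);
        // prints {post/time=152, context.postId=c8f0ef7, context.siteId.page-hits=123}
        System.out.println(flatten(root));
    }
}
```

Lists are treated as leaf values here; how to spread an array such as users across columns is a separate design decision.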
Exceptions faced
- Cannot resolve plugin org.apache.maven.plugins:maven-surefire-plu.. name
Note: this error shows up at build time.
Solution:
```
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>2.12.4</version>
            <configuration>
                <testFailureIgnore>true</testFailureIgnore>
            </configuration>
        </plugin>
    </plugins>
</build>
```
- Exception in thread "main" com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "field_name" (class <class_name>), not marked as ignorable (one known property: "context"])
Solution:
Define the ObjectMapper like this, so that JSON fields we don't need do not have to be declared in the model:
```
ObjectMapper mapper =
        new ObjectMapper().configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
```
- IntelliJ - Error:java: release version 5 not supported
Solution: this link is helpful: IntellijSolution
Tips when creating model classes
- Declare the model classes static if you write them as inner classes (Jackson cannot instantiate a non-static inner class on its own)
- When you see a JSON structure, create a class for every { } structure, as shown in the code above
- When choosing a library, consider performance (this benchmark compares deserializer performance: https://github.com/fabienrenaud/java-json-benchmark), ease of use, and the dependencies involved
- JSON parser = new JSONParser() has been deprecated
Thanks for reading!