Loretta C. Duckworth Scholars Studio


Processing JSON Data With jq

Posted on September 21, 2017 (updated December 1, 2017) by Luling Huang

By Luling Huang

When we collect data from the Web through an API (e.g., Twitter’s), we usually receive the data in JSON (JavaScript Object Notation). How do we sift out unwanted information and transform JSON into a tabular format that many data analysis programs recognize, such as CSV (Comma-Separated Values)?

This post introduces jq, a powerful command-line tool for processing JSON data. It works on multiple platforms (installation instructions are on the jq website). The primary reason to use jq is that we do not need to write a script: although Python’s “json” package is easy to work with, jq can often get what we want with a single command.

The data structure of JSON is similar to Python’s dictionary: both store key-value pairs, which JSON calls name-value pairs. This is why Python’s “json” package is very handy if we have already stored some data in a Python dictionary. But let’s assume no Python is involved and we simply get a JSON file from somewhere.

In OS X’s Terminal (it should work similarly on Windows), we can display the structure of a JSON file called “test.json” with this command (remember to navigate to the directory that stores the file first):

$ jq '.' test.json

 

“jq” invokes the program. “.” is the most basic jq filter, the identity filter: it passes its input through unchanged, so here the JSON file is simply reproduced (pretty-printed). Below is a small portion of one tweet’s information from a Twitter data set (collected through twarc). We can interpret the data structure from a CSV perspective: each tweet is one row, the 6 names (“contributors,” “truncated,” “is_quote_status,” etc.) are the column names, and the 6 corresponding values (“null,” “false,” “false,” etc.) fill that tweet’s row.

[Image: JSON data example]
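To try the identity filter without a real Twitter data set, here is a minimal sketch. It assumes jq is installed; the file name “sample.json” and the two tweets in it are made up for illustration and contain far fewer fields than real twarc output.

```shell
# Hypothetical sample: two made-up tweets, one JSON object per line.
cat > sample.json <<'EOF'
{"id": 1, "truncated": false, "full_text": "first tweet"}
{"id": 2, "truncated": false, "full_text": "second tweet"}
EOF

# '.' is the identity filter: each object is printed back unchanged,
# pretty-printed over several lines.
jq '.' sample.json

# -c (compact output) prints each object on a single line instead.
compact=$(jq -c '.' sample.json)
printf '%s\n' "$compact"
```

The compact form is handy when we only want to check how many objects a file holds or skim them quickly.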

Because the Twitter API gives us a lot of metadata that may not be relevant, our next task is to find the information we want to use for data analysis. Let’s say “id” is what we want to keep.

$ jq '.id' test.json

 

The above command prints the value stored under the name “id” for every tweet, one per line. To see how many tweets we have collected, we can do this:

$ jq '.id' test.json | wc -l

 

“wc” stands for word count, and “-l” tells it to count the lines of its input instead (wc is a standard command-line utility available in Terminal, not part of jq). We use “|” to “pipe” the output of the jq filter into wc. Because jq prints one id per line, the line count equals the number of tweets.
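The two steps above can be tried on a small made-up file. This is only a sketch: “sample.json” and its three tweets are hypothetical, and the last command shows an alternative way to count that stays inside jq, using its -s (slurp) option and the length function.

```shell
# Hypothetical sample: three made-up tweets, one JSON object per line.
cat > sample.json <<'EOF'
{"id": 101, "full_text": "first tweet"}
{"id": 102, "full_text": "second tweet"}
{"id": 103, "full_text": "third tweet"}
EOF

# .id looks up the value stored under the name "id" in each object.
ids=$(jq '.id' sample.json)
printf '%s\n' "$ids"

# One id per output line, so counting lines counts tweets.
count=$(jq '.id' sample.json | wc -l)
echo $count   # → 3

# Alternative: -s reads all objects into one array, and length counts its items.
count_jq=$(jq -s 'length' sample.json)
echo "$count_jq"   # → 3
```

One caveat worth checking on real data: tweet IDs are very large integers, and older jq releases may round them. If the IDs in the output look altered, the tweet’s “id_str” field, which holds the same ID as a string, is likely the safer choice.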

From inspecting the data, we know that the tweet content is stored under the name “full_text.” Let’s extract this information together with the id, and put the result in a JSON array, enclosed by [ ], for each tweet.

Note that in the original JSON file, each tweet is enclosed by { } (i.e., a JSON object with name-value pairs). Because we want to save the output to CSV, name-value pairs are no longer needed. Instead, we need arrays to represent rows in CSV. In other words, the hierarchical JSON structure is flattened.

The procedure can be illustrated with a simple example. This is what we begin with (three tweets, each of which is a JSON object with two name-value pairs):

[Image: a simple example of three JSON objects]

Then we extract the information from “id” and “full_text” and put it into three arrays, one for each tweet:

[Image: a simple example of three JSON arrays]
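The same transformation can be reproduced on the command line. In this sketch the file “sample.json” and its three tweets are invented stand-ins for the example above; -c is used only so that each array is printed on one line.

```shell
# Hypothetical sample: three tweets, each a JSON object with two name-value pairs.
cat > sample.json <<'EOF'
{"id": 101, "full_text": "first tweet"}
{"id": 102, "full_text": "second tweet"}
{"id": 103, "full_text": "third tweet"}
EOF

# Build one array per tweet from the two values; the names are dropped.
arrays=$(jq -c '[.id, .full_text]' sample.json)
printf '%s\n' "$arrays"
# → [101,"first tweet"]
# → [102,"second tweet"]
# → [103,"third tweet"]
```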

If we write each array as one row, then we have the CSV output (if necessary, column names can be added easily later):

By using jq, we can achieve this with:

$ jq -r '[.id, .full_text] | @csv' test.json > test.csv

 

“[.id, .full_text]” creates a new JSON array for each tweet and puts the two extracted values in it. We then pipe the array to “@csv,” which formats it as one CSV row. “-r” is for “raw output”: it tells jq to write the CSV row as plain text rather than as a quoted JSON string. “> test.csv” redirects the output to the file “test.csv.”
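To see what “-r” changes, we can run the filter with and without it. This is a sketch on made-up data (“sample.json” is hypothetical); the second tweet deliberately contains quotation marks and a comma to show how “@csv” protects them.

```shell
# Hypothetical sample: the second tweet contains quotes and a comma.
cat > sample.json <<'EOF'
{"id": 101, "full_text": "plain tweet"}
{"id": 102, "full_text": "tweet with a \"quote\", and a comma"}
EOF

# Without -r, each row is printed as a JSON string:
# wrapped in quotes, with inner quotes backslash-escaped.
jq '[.id, .full_text] | @csv' sample.json

# With -r, each row is written as plain CSV text.
rows=$(jq -r '[.id, .full_text] | @csv' sample.json)
printf '%s\n' "$rows"
# → 101,"plain tweet"
# → 102,"tweet with a ""quote"", and a comma"
```

Note that “@csv” wraps string values in double quotes and doubles any quotation marks inside them, so commas in the tweet text do not break the columns.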

Now we have turned a JSON file into a two-column CSV file that is ready to be loaded into data analysis programs.
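If column names are needed, one possible way to add them is to write a header row first and then append the data rows. This is a sketch rather than the only approach: the file names “sample.json” and “sample.csv” are made up, and the header is produced with jq’s -n option, which runs a filter without reading any input.

```shell
# Hypothetical sample: two made-up tweets.
cat > sample.json <<'EOF'
{"id": 101, "full_text": "first tweet"}
{"id": 102, "full_text": "second tweet"}
EOF

{
  # Header row: -n means "no input", so the array is built from literals only.
  jq -rn '["id", "full_text"] | @csv'
  # Data rows: one CSV row per tweet.
  jq -r '[.id, .full_text] | @csv' sample.json
} > sample.csv

cat sample.csv
# → "id","full_text"
# → 101,"first tweet"
# → 102,"second tweet"
```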

For more information on JSON syntax, here is the official introduction page. Also, there are many other features in jq and here is the manual.

 

