By jay winkler and Rebecca Y. Bayeck, Ph.D
One of Wikidata’s greatest strengths is the large range of properties that can be applied to Wikidata items. Our project examined what information was available in Wikidata items for Black artists, especially those from Philadelphia. When designing our initial query, we experimented with different properties (Wikidata’s term for the relationships described in its RDF triples), adding variables such as the artists’ birthplaces, educational institutions, sexual orientation, and more to our query to see what returned useful data. Given the many options, it’s no surprise that when it came time to create our final visualizations, we went in very different directions.
In our previous blogs, we discussed the initial project design, looked at Wikidata edits, and explained how to query Wikidata. All of these steps helped to develop research questions that could be used to create visualizations.
We took the same basic approach here: we each used SPARQL to write a query that would give us the initial data. We then used the Pandas Python library to create tables and plots that would display the data and allow us to manipulate it. Using the DataFrame, the structure Pandas uses to create and display tables, made it easier to see the data and track how it was being manipulated.
The areas where our approaches diverge show the versatility of Wikidata. Rebecca used what others (and the project team) have contributed to Wikidata to track the locations of artists’ exhibitions and see what those locations tell us about the artists. By focusing on changes in the use of the ethnic group property (P172) over time, jay examined the biases underlying Wikidata itself, including its user habits and policies.
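To make the SPARQL-to-DataFrame step concrete, here is a minimal sketch of how query results can be loaded into Pandas. It is an illustration rather than the code from our notebooks: the function names, the sample query, and the User-Agent string are all assumptions, and the identifiers P106 (occupation), Q483501 (artist), and Q49085 (African Americans) should be checked against Wikidata before reuse.

```python
import json
import urllib.parse
import urllib.request

import pandas as pd

ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query (an assumption, not the project's query):
# artists whose ethnic group (P172) is African Americans (Q49085).
QUERY = """
SELECT ?artist ?artistLabel WHERE {
  ?artist wdt:P172 wd:Q49085 ;
          wdt:P106/wdt:P279* wd:Q483501 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""


def run_query(query, endpoint=ENDPOINT):
    """Send a SPARQL query and return the parsed JSON (needs network access)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Hypothetical User-Agent; replace it with one that identifies your project.
    request = urllib.request.Request(
        url, headers={"User-Agent": "wikidata-blog-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def bindings_to_frame(results):
    """Flatten SPARQL JSON bindings into one DataFrame row per result."""
    rows = [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]
    # Using the query's variable list keeps columns that are empty in every row.
    return pd.DataFrame(rows, columns=results["head"]["vars"])
```

Calling `bindings_to_frame(run_query(QUERY))` would give a table with one row per artist. Variables that a given result leaves unbound show up as missing values, which makes gaps in the data easy to spot.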
Some questions arose as we completed our analysis: what are the goals of the ethnic group property, and how do its use standards complicate our data gathering? What are we really measuring when we measure the use of African American as a tag? We used Google Colab notebooks to gather our data and create our visualizations. The notebooks are available on GitHub as well. Other researchers are welcome to build on the foundations of our code for their own use.
Philadelphia Black Artist Exhibition Cities
As previously stated, our queries led us to explore Philadelphia Black art exhibitions. In the process of enhancing Black artists’ records in Wikidata (see Designing a Wikidata Project), we became interested in these artists’ nationwide reputations. In other words, we wanted to know the cities where these artists exhibited; in our analysis, exhibiting in other cities implied that an artist was known beyond their city of residence.
To address this question, we ran several queries focused on the locations of the artists’ exhibitions. In the arts, exhibitions are a means for artists to show their work, and they can be an indicator of an artist’s popularity, especially when these exhibitions take place in various cities. Our query yielded 33 names of artists with exhibition histories.
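A query of this kind can be sketched as follows. This is a hedged illustration, not the query we ran: it assumes P551 (residence), P608 (exhibition history), and P276 (location) are the relevant properties and that Q1345 is Philadelphia, and the helper names and row format are hypothetical.

```python
from collections import defaultdict

# Illustrative query text; the property and item identifiers are assumptions
# that should be verified on Wikidata before use.
EXHIBITION_QUERY = """
SELECT ?artist ?artistLabel ?exhibition ?placeLabel WHERE {
  ?artist wdt:P172 wd:Q49085 ;      # ethnic group: African Americans
          wdt:P551 wd:Q1345 .       # residence: Philadelphia
  OPTIONAL {
    ?artist wdt:P608 ?exhibition .              # exhibition history
    OPTIONAL { ?exhibition wdt:P276 ?place . }  # location of the exhibition
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def cities_by_artist(rows):
    """Map each artist to the set of places where an exhibition is recorded.

    `rows` is assumed to be a list of dicts with "artist" and "place" keys,
    where "place" may be None when no exhibition location is recorded.
    """
    cities = defaultdict(set)
    for row in rows:
        places = cities[row["artist"]]  # keeps artists with no exhibitions
        if row.get("place"):
            places.add(row["place"])
    return dict(cities)


def exhibited_elsewhere(cities, home="Philadelphia"):
    """Return artists with at least one recorded exhibition outside `home`."""
    return sorted(artist for artist, places in cities.items() if places - {home})
```

Keeping artists with an empty set of places is a deliberate choice: it separates "no exhibition recorded in Wikidata" from "exhibited only at home", a distinction that matters when the data is sparse.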
The data examined show that Black artists residing in Philadelphia exhibited only in their city of residence. This finding may indicate that Black artists from Philadelphia have difficulty showing their art in other cities. Yet the limited data (33 artists, and only 10 with a recorded exhibition history in Wikidata) may instead point to a lack of data on these artists in Wikidata. This finding is also a call to be intentional about Wikidata editing. Our project, as discussed below, is a step toward increasing BIPOC representation in Wikidata.
When running our initial SPARQL queries that helped us form our research questions, we typically restricted our searches to African American artists. However, we were also interested in comparing African American artists to Wikidata’s total population of artists. We removed that parameter and ran a query that would give information about all artists. It was then that we noticed that the vast majority of artists simply did not have any sort of ethnicity tagging.
African American was by far the most common ethnic group tag in our original Philadelphia sample, and the same was true when we ran the query again for a number of other American cities. What was most interesting was that the vast majority of artists in every city were untagged. Further investigation showed that this is a result of how Wikidata’s requirements for the ethnic group property work: the property is meant to have a high bar for use, with a source required.
We were interested in whether there were any noticeable changes in the way the ethnic group property was used over time. One result of the high bar for usage is that whiteness is treated as a default on the platform, so it became relevant to track whether the representation of BIPOC artists increased or decreased relative to untagged artists. We wanted to sort all the artists by the date they were added to Wikidata and compute, as each artist was added, the running percentage of the Wikidata collection that each ethnic group tag represented. Writing a query that gathered the necessary biographical information for every artist with a United States birthplace was straightforward, but gathering the creation dates of the various artists’ Wikidata items turned out to be much more difficult.
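The running percentage described here can be sketched in Pandas as follows. This is a minimal illustration under stated assumptions, not the notebook code: the column names `created` and `tag` are hypothetical, and `created` is assumed to hold values that sort chronologically (for example ISO 8601 timestamps).

```python
import pandas as pd


def running_share(frame):
    """Cumulative percentage of items carrying each tag, ordered by creation date.

    `frame` is assumed to have one row per artist with two columns:
    "created" (sortable creation date) and "tag" (ethnic group category).
    """
    ordered = frame.sort_values("created", kind="stable").reset_index(drop=True)
    # One indicator column per tag: 1 where the row carries that tag, else 0.
    indicators = pd.get_dummies(ordered["tag"]).astype(int)
    # Number of items added so far, aligned with the sorted rows.
    totals = pd.Series(range(1, len(ordered) + 1), index=ordered.index)
    # Cumulative count per tag divided by the cumulative number of items.
    shares = indicators.cumsum().div(totals, axis=0) * 100
    return pd.concat([ordered[["created"]], shares], axis=1)
```

Each row of the result answers the question "at the moment this item was created, what percentage of the items so far carried each tag?", so plotting the tag columns against `created` gives one line per tag.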
The Wikidata Query Service will not simply return the creation date of an item, which means we needed to turn to Wikidata’s API. We started by using SPARQL to gather a list of all artists with American birthplaces. We then iterated through that list using the Wikidata API to gather the creation date of each item. Finally, for each item, we calculated the percentage of Wikidata’s collection represented by each ethnic group at the time the given item was added. There were many intermediate steps of data cleaning and wrangling, which can be reviewed in the Google Colab notebook.
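One way to look up a creation date is to ask the MediaWiki API for the oldest revision of the item’s page and treat its timestamp as the creation date. The sketch below takes that approach; it is an assumption about how such a lookup can be done, not a copy of our notebook code, and the function names and User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def first_revision_params(qid):
    """Query parameters asking for the oldest revision of one item page."""
    return {
        "action": "query",
        "prop": "revisions",
        "titles": qid,
        "rvlimit": 1,
        "rvdir": "newer",       # list revisions oldest first
        "rvprop": "timestamp",
        "format": "json",
        "formatversion": 2,     # pages come back as a list, not a dict
    }


def parse_creation_date(payload):
    """Return the first revision's timestamp from an API response, or None."""
    pages = payload.get("query", {}).get("pages", [])
    if not pages or "revisions" not in pages[0]:
        return None  # missing page or no revision data
    return pages[0]["revisions"][0]["timestamp"]


def fetch_creation_date(qid):
    """Look up when an item was created (needs network access)."""
    url = API_URL + "?" + urllib.parse.urlencode(first_revision_params(qid))
    # Hypothetical User-Agent; replace it with one that identifies your project.
    request = urllib.request.Request(
        url, headers={"User-Agent": "wikidata-blog-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return parse_creation_date(json.load(response))
```

Looping `fetch_creation_date` over a long list of item IDs means one request per item, so in practice it is worth pausing between requests and saving results as you go.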
While our research was focused on the contrast between tagging of Black artists and untagged entries, other ethnic groups were present in the dataset. In the visualizations, other ethnic group tags are combined in a single “Other” category. While most non-African American ethnic groups had too few entries to provide useful data, aggregating the data provides insight on whether or not the ethnic group property is being used at all.
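The grouping into chart categories can be sketched in a few lines. The category names and the `focus` label are assumptions made for illustration; the label a query actually returns for the ethnic group item may differ.

```python
from collections import Counter


def collapse_tag(label, focus="African Americans"):
    """Reduce an ethnic group label to one of three chart categories."""
    if not label:
        return "Untagged"  # no ethnic group statement on the item
    return "African American" if label == focus else "Other"


def category_counts(labels):
    """Count how many items fall into each chart category."""
    return Counter(collapse_tag(label) for label in labels)
```

Folding every other tag into a single category loses detail about individual groups, but it keeps the comparison that matters here: tagged versus untagged.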
While we hoped that the data would show change over time, it remained remarkably consistent after some early fluctuation. By 2014, about 6% of all artists on Wikidata were tagged using the African American ethnic group statement. That number stayed within a percentage point of 6% from that point on. For context, around 13% of Americans are African American. Around 89% of all artists had no ethnicity statement assigned. Each individual non-African American ethnic group had extremely small numbers, with most below 10 total uses.
The city-level data shows slightly more variation, but most cities we explored individually had numbers fairly close to the national data. Individual city-level charts can be seen by running the Google Colab notebook, but this chart gives a sense of how closely the individual city-level data tracks the national data. All cities are within a few percentage points of each other, and we can see little change over time, with one notable exception.
The outlier was Philadelphia. Philadelphia always had unusually high representation of African American artists on Wikidata, with the number steadily rising throughout the graph. A small spike is visible when our project begins: the share rises from 12% to 13% in a little under a month.
Conclusions
While an easy conclusion would be that African Americans or any other given ethnic group are underrepresented in Wikidata, this data is not conclusive enough to prove that. What the data does show is the difficulty of capturing ethnic information in Wikidata at all. The ethnic group property has very stringent standards for use, and editors frequently remove unsourced statements. Finding a definitive source for a person’s ethnic group is often quite difficult, and we can see that difficulty in how infrequently the property is used.
Further study would be required to determine if users are assuming that those without an ethnic group statement are white, but I believe the data shows that users do not feel white subjects require an ethnic group statement. To fix the problem, it is not enough to merely tag more people as white (which, as previously mentioned, is often difficult to source). One potential start is a proposed property for nationality, which, when used in tandem with the ethnic group property, could allow us to capture more specific information that is currently lost when tagging someone as white or African American. Adding more qualifiers to the ethnic group property may also allow for capturing more complicated information. Further research is required on how other linked data projects handle this issue.
Addressing the implicit bias of Wikidata’s editors that leads to underrepresentation is a parallel project to any sort of policy change, and projects like ours are a start. There is a body of literature on underrepresentation on Wikipedia, but we could find no equivalent studies on Wikidata. We don’t think solutions will come easily here, but Wikidata’s structure will continue to allow us to track how editor usage of this property changes while discussions advance.
Our project is an attempt to address the representation of Black artists in Wikidata, yet it also highlights the need to research and provide complete information on Black artists, as well as all artists. The scant information about these artists’ exhibitions and the default to whiteness for any other artists demonstrate the need for a revision of Wikidata policies and editing practices.