Category Archives: In the News

Faculty Support of Open Data: An Interview with Sergei Pond

Headshot of Sergei Pond

This week is Open Access Week, a yearly international celebration that aims to increase awareness about open access. Most academic work is locked up behind a paywall, available only to those who are affiliated with a college or university. Open access scholarship is completely free to read and reuse.

Professor of Evolutionary Genomics Sergei Pond is one of the many Temple faculty members who support open research practices. Pond recently spoke with Librarian Sarah Jones to discuss his work and his thoughts on how open data can help with the current reproducibility crisis.

Tell us about your recent research on COVID-19. What role did open data play in your work? 

My group at Temple (Institute for Genomics and Evolutionary Medicine) is a computational biology group. We use sequence data to watch what the virus is doing and evaluate how certain intervention efforts are going. Sequence data have never been generated at a faster pace than during the COVID-19 outbreak. As of today, around 130,000 SARS-CoV-2 genomes are available. In March there were 500. The rate of accumulation is really remarkable.

We’ve done a lot of collaborative work to look at the early evolution of SARS-CoV-2. Viral genomes change all the time; the trick is to figure out which changes matter and which ones don’t. At this moment, there are some changes but none that appear to be particularly important. Once we start giving it something to work against, like large scale drugs and vaccines, then we’ll watch it.

Something the public may not appreciate is that you have to do a lot of tedious work to make sure that the data you’re analyzing makes sense. You have to clean it up and make sure that your tools run fast enough. One of the issues everyone has run across is the volume of data–typically you’re talking about hundreds or maybe thousands of sequences, but tens or hundreds of thousands brings it up to a different scale. One of the issues with these large datasets is that they’re so big and the techniques that you use tend to be fairly complicated, so it turns into this hard-to-interpret black box. We’re trying to design something that’s easy to understand.

You recently published an article on the lack of data sharing in COVID-19 research. What problems do you see this causing? Which open tools and practices would you like to see adopted? 

Ideally, what you would like to be able to access are the original files that came off the sequencer. Typically what you see is the final genome; it’s a product of many steps that translate these data from raw sequencing data to genomes. It’s been the bane of computational biology that it’s not very common to share the original data. More importantly, it is next to impossible to find sufficient detail about how people went about processing these data to generate the genome. So basically you receive a genome but you’re missing how it was assembled. This is what creates the crisis of reproducibility. You have to be able to trust the data that you’re putting into your analyses.

There’s absolutely no excuse with modern tool availability not to publish the entire chain. If you’re an experimental scientist and you don’t publish your lab protocol, nobody will believe it; it has to be re-creatable. But in computational analysis there’s no standard like this. There’s no expectation that you will release the data and the tools that you used to analyse these data.

Tell us about your work with the Galaxy Project. How does this platform encourage open practices? 

Galaxy is a computational framework for open data and democratizing data analysis. Every step from extracting raw data to doing comparative analysis can be done in Galaxy. I think the strongest aspect of it is the longstanding focus on the reproducibility and shareability of research. When you develop a process for doing something, you can publish it and share it. It will record which tools you used, which settings, and how they were connected to each other. Each step you can store and share, so when you publish your work you instantly release your entire workflow.
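Editor's illustration: the kind of record described above (which tools, which settings, and how the steps connect) can be sketched in a few lines of Python. This is not Galaxy's API or file format. The tool names, versions, and parameters below are hypothetical, and the byte strings stand in for real sequencer files.

```python
import hashlib
import json


def checksum(data: bytes) -> str:
    """SHA-256 digest identifying the exact bytes a step read or wrote."""
    return hashlib.sha256(data).hexdigest()


def record_step(workflow, tool, version, params, input_data, output_data):
    """Append one analysis step: the tool, its settings, and what went in and out."""
    workflow.append({
        "step": len(workflow) + 1,
        "tool": tool,
        "version": version,
        "params": params,
        "input_sha256": checksum(input_data),
        "output_sha256": checksum(output_data),
    })


workflow = []

# Stand-ins for real files; all names and settings are invented for illustration.
raw_reads = b"@read1\nACGT\n+\nIIII\n"
trimmed = b"@read1\nACG\n+\nIII\n"
genome = b">consensus\nACG\n"

record_step(workflow, "hypothetical-trimmer", "1.0", {"min_quality": 20}, raw_reads, trimmed)
record_step(workflow, "hypothetical-assembler", "2.3", {"min_depth": 10}, trimmed, genome)

# The output checksum of one step matches the input checksum of the next,
# which is what lets a reader see how the steps were connected.
print(json.dumps(workflow, indent=2))
```

Publishing a record like this alongside the final genome would let another group check that they are starting from the same raw data and running the same settings.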

Are there any misconceptions that you would like to address regarding open data? What do you wish people knew about it? 

There are a few things that tend to slow down or prevent people from doing open data and sharing. One, the logistics of it: will you find the time to annotate and format everything correctly and submit it? That excuse is becoming harder to use because there are large entities that have databases and tools that allow you to do this as easily as possible.

The other issue is data ownership. If you release open data it will be good for science, it will be good for discovery, and it will enable other people to extract more information from it. But as a data producer, how do you get proper credit for it? As a scientist you get rewarded for publishing papers and bringing in grants, but for being a good citizen of the open community, it’s not there.

I want to mention the idea of privacy. Human genomic data is personal health information, which needs to be guarded and protected. Viral data are a little different; you can’t track them down to specific individuals. But nonetheless, that could be a concern. It definitely is a concern in the area of HIV because in many jurisdictions in the United States HIV transmission is still a felony. That’s changing, but it’s still there. You don’t want to have a potential disclosure of an infection route.

Is there anything else you wanted to share? 

I want to emphasize that SARS-CoV-2 genomics has been a unique effort when it comes to collaboration and open science. It’s not ideal and we can improve on it, but compared to previous outbreaks this is probably the most open environment that we’ve had. It’s obviously a necessity, considering how much damage this pandemic has already caused. A truly international, truly open effort is necessary.

Fortunately, a lot of it was set up prior to SARS-CoV-2. There are people that took the time and strategically thought about how they could accelerate all of the necessary steps when the next big pathogen came out, and we’re reaping the benefits now. That really is important and worth emphasizing. That would not have happened without planning.

Thank you, Dr. Pond!

All About Impact Factors

“Impact” by Dru! is licensed under CC BY-NC 2.0.

This week, Clarivate Analytics released its annual Journal Citation Report, which includes new and updated Journal Impact Factors (JIF) for almost 12,000 academic journals. In case you’re not familiar, the JIF is based on the average number of times a journal’s articles were cited over a two-year period.
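For readers who want to see the arithmetic, here is a minimal Python sketch of the standard two-year calculation: citations received in a given year to items the journal published in the two preceding years, divided by the number of citable items it published in those two years. The function name and the journal figures are invented for illustration; they are not Clarivate data.

```python
def two_year_impact_factor(citations_by_pub_year, items_by_pub_year, jcr_year):
    """Citations received in `jcr_year` to items from the two prior years,
    divided by the number of citable items published in those two years."""
    prior_years = (jcr_year - 1, jcr_year - 2)
    citations = sum(citations_by_pub_year.get(y, 0) for y in prior_years)
    items = sum(items_by_pub_year.get(y, 0) for y in prior_years)
    if items == 0:
        raise ValueError("no citable items in the two-year window")
    return citations / items


# Hypothetical journal: citations made during 2016, keyed by the
# publication year of the article being cited.
citations_in_2016 = {2015: 300, 2014: 450}
citable_items = {2015: 120, 2014: 130}

jif = two_year_impact_factor(citations_in_2016, citable_items, 2016)
print(round(jif, 2))  # 3.0
```

In this made-up case, 750 citations to 250 articles gives an impact factor of 3.0. The number describes the journal as a whole, not any single article in it.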

Impact factors are a relatively recent phenomenon. The idea came about in the 1960s, when University of Pennsylvania linguist Eugene Garfield started compiling his Science Citation Index (now known as the Web of Science), and needed to decide which journals to include. He eventually published the numbers he had collected in a separate publication, called the Journal Citation Report (JCR), as a way for librarians to compare journals (JCR is now owned by Clarivate Analytics). Now, impact factors are so important that it is very difficult for new journals to attract submissions before they have one. And the number is being used not just to compare journals, but to assess scholars. JIF is the most prominent impact factor, but it is not the only one. In 2016, Elsevier launched CiteScore, which is based on citations from the past three years.

Academics have long taken issue with how impact factors are used to evaluate scholarship. They argue that administrators and even scholars themselves incorrectly believe that the higher the impact factor, the better the research. Many point out that publishing in a journal with a high impact factor does not mean that one’s own work will be highly cited. One recent study, for example, showed that 75% of articles receive fewer citations than the journal’s average number.
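The gap between a journal’s average and a typical article comes from skewed citation counts: a few heavily cited papers pull the average up. The Python sketch below uses ten invented citation counts (not data from the study mentioned above) to show how most articles can fall below the mean.

```python
def share_below_mean(citation_counts):
    """Fraction of articles cited fewer times than the journal's mean."""
    average = sum(citation_counts) / len(citation_counts)
    below = sum(1 for count in citation_counts if count < average)
    return below / len(citation_counts)


# Hypothetical journal with ten articles; one paper collects most of the citations.
counts = [0, 1, 1, 2, 2, 3, 4, 5, 12, 70]

print(sum(counts) / len(counts))   # 10.0
print(share_below_mean(counts))    # 0.8
```

Here the mean is 10 citations per article, yet eight of the ten articles were cited fewer than 10 times.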

Critics also note that impact factors can be manipulated. Indeed, every year, Clarivate Analytics suspends journals that have tried to game the system. This year they suppressed the impact factors for 20 journals, including journals that cited themselves too often and journals that engaged in citation stacking. With citation stacking, authors are asked to cite papers from cooperating journals (which band together to form “citation cartels”). The 20 journals come from a number of different publishers, including major companies such as Elsevier and Taylor & Francis.

As a result of these criticisms, some journals and publishers have also started to emphasize article-level metrics or alternative metrics instead. Others, such as the open access publisher eLife, openly state on their website that they do not support the impact factor. eLife is one of thousands of organizations and individuals who have signed the San Francisco Declaration on Research Assessment (DORA), which advocates for research assessment measures that do not include impact factors. Another recent project, HuMetricsHSS, is trying to get academic departments, particularly those in the humanities and social sciences, to measure scholars by how much they embody five core values: collegiality, quality, equity, openness, and community. While these developments are promising, it seems unlikely that the journal impact factor will go away anytime soon.

What do you think about the use of impact factors to measure academic performance? Let us know in the comments.

What to Know About “Predatory” Publishers

“Little roar” by Becker1999 is licensed under CC BY 2.0.

UPDATE: Since this post originally appeared, Beall’s List has been taken down.

Recently, the term “predatory” publisher has become a buzzword among many in academia. “Predatory” publishers run online, open access (OA) journals that will accept almost any paper submitted. They offer little in terms of copy editing or peer review. Journal websites may include false information about impact factors, editorial board members, and other affiliations. “Predatory” publishers often spam authors via e-mail to encourage them to submit their work.

These publishers profit from this scheme by charging authors various publication fees. Authors are willing to pay such fees because of the “publish or perish” culture of academia. They are usually unaware that they are dealing with a “predatory” publisher, or may not become aware until their article has been published.

The term “predatory” publisher was coined by controversial librarian Jeffrey Beall in 2010. Beall currently maintains a list of suspected “predatory” publishers on his website. Because not all “predatory” publishers on Beall’s list are alike (and in fact, some may not be predatory at all), many scholarly communications experts prefer to use the terms “questionable” or “low-quality.”

In addition to Beall’s List, a number of high-profile stings have tried to expose the questionable practices of these publishers and their journals by submitting nonsense or significantly flawed papers. One Harvard medical researcher, for example, submitted an article to 37 questionable journals entitled “Cuckoo for Cocoa Puffs?: The Surgical and Neoplastic Role of Cacao Extract in Breakfast Cereals.” The actual text of the article was randomly generated. 17 journals accepted the paper, promising to publish it if he would pay the $500 fee. Of course, it’s important to note that this is not a problem limited to OA journals–traditional subscription journals have also been known to publish faked work. To see a few examples, check out the blog Retraction Watch, which monitors all of the retractions in scientific journals.

Whatever you want to call them, hundreds of “predatory” publishers do exist, and according to a 2015 study, the number is growing rapidly. Last Friday, the Federal Trade Commission (FTC) made it clear that they are paying attention to this phenomenon: they filed a complaint against well-known “predatory” publisher OMICS Group. OMICS Group publishes over 700 open-access journals in a wide variety of disciplines, from business and management to chemistry to political science. According to the FTC, OMICS Group is not upfront with scholars about the publication fees its journals charge. In addition, OMICS Group journals do not allow authors to withdraw their articles. The FTC also pointed out that a subsidiary of OMICS Group runs scam conferences where they advertise the appearance of academics who never agreed to participate.

So exactly how concerned should scholars be about this phenomenon? In general, “predatory” publishers are not a huge threat to most scholars, especially if you do your research before submitting your article to a journal or agreeing to serve on a journal editorial board. Asking your colleagues if they have heard of the journal before is a good first step. Be aware, however, that many OA journals are just starting out, so they may not have the same name recognition as top journals in the field that have been around for decades.

Second, check out the journal’s website. Do you recognize any of the scholars on the editorial board? If so, do they list their work for the journal on their own faculty profile page? Are any author fees clearly stated somewhere (if you are in the humanities, know that most OA humanities journals do not charge any publishing fees)? Remember: charging a fee does not, by itself, make a journal predatory. Many reputable OA journals rely on article processing charges (APCs) to recoup their costs.

Finally, check out the Open Access Scholarly Publishers Association (OASPA) and the Directory of Open Access Journals (DOAJ) to see if your publisher or journal is listed. In order to be included in the DOAJ, applications are reviewed by four different people. And in May of this year, the DOAJ announced it was taking additional steps to make sure that the directory is a trustworthy source of information.

Still not sure if the journal you are interested in publishing in passes muster? Contact the Libraries for help.

Project to Watch: SocArXiv

In a recent post, we argued that preprints are having a moment. Here’s further proof: this week, the Center for Open Science and the University of Maryland launched a new repository for social science research, called SocArXiv (the name comes in part from the well-known preprint repository arXiv). Currently, there is a temporary home for the repository here, with a more robust platform coming in the near future. In addition to preprints, SocArXiv also accepts conference papers, working papers, datasets and code. The project is being led by Philip N. Cohen, a Professor of Sociology at the University of Maryland. The steering committee includes scholars, librarians, and open access advocates.

Interested in submitting? Just e-mail socarxiv-Preprint@osf.io from your primary e-mail address. Put the title of your work in the subject line, and the abstract in the body of your e-mail. Then attach the work as a PDF or Word file. Finally, hit send. Your scholarship should appear on the site shortly and you should be automatically registered for an Open Science Framework account. Use this account to go into the page for your work on the site and add any relevant tags. Just make sure that you have the rights to anything you post. If you’re not sure, check your publication agreement or search SHERPA/RoMEO, a database of publisher copyright and self-archiving policies. And remember: this method of submission is only temporary. Once the permanent SocArXiv platform is up and running we will update this post.

Some researchers may wonder why they should post their work to SocArXiv, when there are so many other options, including another open access repository, the Social Science Research Network (SSRN). SSRN was founded in 1994 by Wayne Marr, a professor of finance, and Michael Jensen, a professor of business administration. It includes scholarship from a range of disciplines, from accounting to economics to political science. The business model of SSRN has always been different from that of most other open access repositories. Unlike arXiv, which is based at Cornell University and funded by grants and library support, SSRN is a privately held corporation. While all deposited papers are free for users to read, SSRN also offers paid content to users through its partnerships with other publishers (such as Wiley-Blackwell). In May of this year, a major change came to SSRN when the platform was bought by Elsevier, a large Dutch company that publishes some of the world’s top journals. Elsevier also owns the reference manager Mendeley. SSRN’s management claims that all the scholarship on the site will remain free. They also argue that Elsevier’s ownership will only make SSRN better, providing them with the resources they need to make much-needed improvements in the design and functionality of the site. Many scholars, librarians, and other experts, however, are worried. They wonder what Elsevier will do with all the scholarly data it now owns, and how the company will try to monetize that data. Similar concerns have been raised about other popular scholarly sharing platforms, including Academia.edu and ResearchGate. Kevin Smith, the Dean of Libraries at the University of Kansas, has called this trend “the commodification of the professoriate.” SocArXiv, then, offers a non-commercial alternative that puts scholars’ interests first.

The Past, Present, and Future of Preprints

Preprints seem to be having a moment. Last week, the registration agency CrossRef announced that they will soon allow members to assign DOIs (digital object identifiers) to preprints, just as they do for published articles. In making this change, CrossRef is acknowledging that preprints are an important part of the scholarly publishing ecosystem. In addition, back in March, a group of biologists made it into the New York Times for advocating for the use of preprints in their own discipline. At the same time, many academics still don’t know much about preprints or why they matter.

In general, a preprint is a piece of scholarship that has not yet been peer reviewed (and thus, hasn’t been published in a scholarly journal). It is related to a postprint, which has been peer reviewed, but has not been properly formatted by the publisher. Confusingly, the term preprint is sometimes also used to describe a postprint. Preprints have a long history, but people have been trying to collect and distribute them in a more formal way since the 1940s. The first online archive for preprints, arXiv, was launched in 1991 by Paul A. Ginsparg, a physicist at Los Alamos National Laboratory (Ginsparg is now a professor at Cornell University). Ginsparg hoped that arXiv (originally called xxx.lanl.gov) would help “level the research playing field,” by granting anyone with an internet connection access to the latest scholarship in high-energy physics, for free. He also knew it would help researchers get their work out into the world faster than ever before. Almost twenty-five years later, arXiv hosts over 1 million preprints from disciplines including mathematics, computer science and statistics. As New York University Professor of Physics David Hogg noted in a recent Wired article, “When I give seminars, I give the arXiv numbers for my papers. Why? Because I know that my arXiv papers are available to any audience member, no matter what their institutional affiliation or library support.” Thanks in part to the success of arXiv, scholars in other disciplines are now considering making drafts of their work public, including those in the humanities. In CORE, the Modern Language Association’s new digital repository, 25% of the articles are preprints or postprints.

So, why should academics, particularly those outside of the sciences, care about preprints? These days, more and more scholars are sharing copies of their work online (see our recent post on Academia.edu). Since most scholars do not own the copyright to their work, however, they may not have permission from the publisher to do so. One way to get around this is by sharing a preprint. While the vast majority of publishers will not allow a scholar to make the final version of their article (also known as the publisher’s version/PDF) freely available, they often allow the sharing of a preprint and/or postprint through an institutional repository or a personal website. According to SHERPA/RoMEO, a database of journal policies, 79% of publishers formally allow for some kind of self-archiving.

It’s important to point out that not everyone in the academy agrees that the posting of preprints is a good idea. Some scholars worry that if they share their ideas too early, they might get stolen. Others correctly note that a preprint is not a substitute for a peer-reviewed journal article (which remains the gold standard for getting tenure). Finally, there are more general concerns about sharing work before it has been thoroughly vetted or revised. However, one recent study compared over 9,000 preprints from arXiv to their final published versions. The authors ultimately found that there were very few differences between the two versions.

Have you shared a preprint of your work online before? Why or why not?

 

Google Books and Fair Use

“Google Scanning @ AAEL” by Dave Carter is licensed under CC BY-NC-SA 2.0.

The big news in scholarly communication last week came from the Supreme Court, which declined to hear an appeal from the Authors Guild over Google Books. The Supreme Court’s decision puts an end to a legal battle that has been going on since 2005. This is great news for scholars, students, and the public, all of whom have come to rely on Google Books for their research. But the case is also significant because it reaffirms the doctrine of fair use. Fair use is an established part of U.S. copyright law; however, it’s not always clear what does or does not constitute fair use. Generally, courts rely on four factors to determine fair use: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use upon the potential market for or value of the copyrighted work. The Authors Guild tried to argue that Google’s actions are not protected under fair use, because Google is a for-profit company (as opposed to an educational institution), and because Google is scanning entire books (not just parts of books). However, the courts ultimately ruled that Google has to copy books in their entirety for the project to be useful. In addition, they ruled that Google’s use of the texts is transformative (another key indicator for fair use).

Google’s book scanning project began back in 2004. The company partnered with several major libraries, including Stanford University, the University of Michigan, the University of California, Harvard, Oxford University, Columbia University, Princeton University, and the New York Public Library. The libraries selected books from their collections (focusing primarily on books published before 1923), and then Google did the digitization work for free. Google did not ask permission from any authors before they started scanning the library books. In order to obtain new and recently published books, Google also signed contracts with publishers which allowed them to scan and show parts of in-copyright books. All books (whether from libraries or publishers) are made available through Google’s search engine, which enables users to look for a particular word or phrase across texts. For most books that are still under copyright, however, users can only see a “snippet” of text based on their search term.

In 2005, a group of publishers, along with the Authors Guild and several individual authors, sued Google for copyright infringement. Both publishers and authors worried that they would not be properly compensated for the use of their content. They also worried that hackers would be able to get digital copies of the books and share them widely. In the years that followed, these groups tried to work out a compromise with Google. But in 2011, a federal judge rejected a proposed settlement. Then, in 2013, a district court ruled that Google’s treatment of the books was transformative and that its actions constituted fair use. The Authors Guild then appealed this decision. In 2015, the Second Circuit Court of Appeals agreed that Google Books was legal. The appeal to the Supreme Court was the Authors Guild’s last chance.

According to one source, Google has now scanned around 30 million volumes. In addition, they currently have over 40 Library Partners. Although the scans are not always perfect, Google Books remains a rich, free resource for people around the world. It is just another example of why fair use is so important.

Paying for Peer Review?

“Peer Review” by AJCann is licensed under CC BY-SA 2.0.

A recent article in Times Higher Education reports that a new British publisher, Veruscript, plans to pay scholars for their peer review work. Although paying for peer review is not a new idea, it is only recently that scholarly journals have begun to experiment with the practice.

Perhaps the most prominent mega-journal that pays for peer review is Collabra, which is published open access by the University of California Press. Launched in 2015, Collabra publishes scholarship on life and biomedical sciences, ecology and environmental sciences, and social and behavioral sciences. To cover the costs of making articles freely available, Collabra charges authors an article processing charge (or APC) of $875. Collabra then takes a portion of that money ($250) and places it in a fund to pay editors and reviewers. Reviewers are offered money regardless of whether they accept or reject a manuscript. Reviewers can choose to take the money outright, donate it to the Collabra APC waiver fund, or donate it to their own institution’s open access publishing fund. The amount is low enough that scholars are not motivated to review just because of the money, yet it’s a small way to reward the academic labor that goes into reviewing articles.

Still, the practice of paying for peer review remains controversial within the academy. Many academics feel that peer review is a community service that should not be monetized. Serving as a peer reviewer, they argue, is simply part of one’s job as a scholar. Others point out that there are ways to reward peer reviewers without actually paying them. The Molecular Ecologist, for example, publishes a yearly list of its best reviewers. The open access mega-journal PeerJ offers another kind of incentive for peer reviewers. Under their economic model, individuals pay a flat fee for membership. Membership allows authors to submit papers and preprints. However, in order for authors to keep their membership from lapsing, they must submit one “review” a year. This can be an informal review, such as a comment on an article, or a formally requested peer review (which is by invitation only). Finally, the for-profit company Publons showcases the work of peer reviewers by making it possible for scholars to create Publons-verified profiles in which they list all the journals they have reviewed for. Publons claims that their model will help scholars get credit for what is usually invisible work, as well as give them another way to demonstrate their subject expertise.

What do you think? Should reviewers be compensated for their work?

How Much Does it Cost to Produce a Scholarly Monograph?

“Cambridge University Press” by Lezan is licensed under CC BY 2.0.

University presses have long played a crucial role in disseminating scholarship. Over time, however, sales of scholarly monographs have declined, while the cost of producing them has not. This has led to what many people refer to as a “crisis” in scholarly publishing.

While this “crisis” has been around for decades, only now, thanks in large part to the digital revolution, are we seeing university presses start to experiment with new business models. The University of California Press, for example, recently launched an imprint called Luminos where authors, rather than readers, help the press cover its costs. For authors who publish with Luminos, the UC Press charges a baseline fee of $15,000. UC Press and its library partners absorb some of that cost, but the author is expected to pay between $5,000 and $7,500. The thought is that the author will not necessarily pay this money out of pocket, but that they will be able to find financial support from their department, provost, or library. Indeed, over 50 university libraries in the United States now have some kind of Open Access Publishing Fund, which is designed to support authors publishing open access articles or books. A Luminos book goes through the same editorial process (including peer review) that all other UC Press books go through. Unlike a traditional monograph, however, once published, the Luminos book is made available open access, so that anyone can read or download a copy for free. A print version is also available for sale (which helps UC Press recoup more of its costs).

It is important to point out that although UC Press has set $15,000 as their fee, this does not mean that $15,000 is the full cost of publishing a book. The actual cost could be higher or lower. One difficulty presses are facing when it comes to changing their business models is that it’s hard to say just how much it costs to produce a scholarly monograph.

Recently, the research group Ithaka S+R tried to find out. They interviewed 20 university presses from across the United States. These presses ranged in size from small (annual revenue under $1.5 million, with only about 10 employees) to large (annual revenue over $6 million, with almost 80 employees). Altogether, they analyzed 382 titles (all published in 2014). Ithaka S+R estimated costs at every stage of a book’s production, from acquisition to editorial to copyediting to design to marketing. What they found was that the cost of publishing scholarly monographs ranged widely, from $15,140 to $129,909. The overall average full cost of a book was $39,892. Staff time was, unsurprisingly, the biggest cost associated with producing a book. In addition, acquisitions work was the most expensive activity. Ithaka S+R also sought to understand what makes certain books more expensive to produce. They found that longer books, as well as books with illustrations, do cost more. An author’s first book, however, was not more expensive than later books.

Despite Ithaka S+R’s well-researched report, more work still needs to be done on this important issue, particularly into why the costs vary so widely. In addition, presses are just beginning to publish digital projects, and the costs of producing this type of scholarship are largely unknown.