Can automated knowledge summaries aid scholarly communication?
Emma Warren-Jones
Cactus Communications assessed Scholarcy’s knowledge extraction and summarisation capabilities in a recent pilot.
The product and strategy team at Cactus Communications wanted to analyze the accuracy and comprehensiveness of Scholarcy’s Highlights and Keywords, in the context of the full article, to see how useful these features could be in promoting published work and increasing reader engagement with individual papers.
At the end of 2019, Cactus processed over 125 academic papers using the Scholarcy Highlights API across each of the following domains: Engineering, Humanities, Life Sciences, Medicine & Surgery, Physical Sciences. A team of subject matter experts (SMEs) then assessed Scholarcy’s output for each of the papers and scored the technology on the relevance, salience and consistency of its AI-generated highlights, summaries, keywords and headlines in relation to the original article.
Example output of Scholarcy Highlights API
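Batch-processing papers through an API of this kind typically looks something like the following sketch. Note that the endpoint URL, JSON field names and response shape below are assumptions for illustration only, not Scholarcy’s documented interface:

```python
import json
import urllib.request

def build_highlights_request(api_url, paper_text):
    """Build a POST request sending one paper to a highlights endpoint.
    The URL and the "text" field name are hypothetical; consult the
    actual API documentation before real use."""
    payload = json.dumps({"text": paper_text}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def fetch_highlights(api_url, paper_text):
    """Send the request and decode the JSON response."""
    req = build_highlights_request(api_url, paper_text)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In a pilot like the one described here, a short loop over the corpus would call `fetch_highlights` once per paper and store the output alongside the original manuscript for the SMEs to score.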
Our expectation was that Scholarcy would work best on STEM papers, for three reasons:
- STEM papers typically have a clearly defined ‘IMRaD’ structure and writing style which makes them more conducive to automated processing.
- Papers in this domain tend to discuss more clearly delineated facts and findings – which extractive models can be trained to identify – whereas discursive, rhetorical texts require a different approach.
- Scholarcy’s models were fine-tuned on a corpus of around 10,000 documents that consisted of STEM and computer science papers/book chapters, plus government reports. The corpus was weighted heavily towards open-access STEM articles.
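The extractive approach mentioned above can be illustrated with a toy frequency-based sentence scorer. This is a minimal sketch of the general technique, not Scholarcy’s actual model:

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Toy extractive summariser: score each sentence by the
    document-level frequency of its words, then return the
    top-scoring sentences in their original order."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'[a-z]+', text.lower()))
    scored = [
        (sum(freq[w] for w in re.findall(r'[a-z]+', s.lower())), i, s)
        for i, s in enumerate(sentences)
    ]
    top = sorted(scored, reverse=True)[:num_sentences]
    return [s for _, i, s in sorted(top, key=lambda t: t[1])]
```

Because the model can only select sentences that already state a finding explicitly, it favours the clearly delineated results typical of STEM papers over discursive, rhetorical texts.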
Cactus SMEs were sent the highlights, summaries and keywords generated by Scholarcy for each of the test papers in their subject area. Using this data, they were asked to score Scholarcy between 1 and 10 (with 1 being very poor and 10 being very strong) on each of the following:
- How accurate do you think Scholarcy’s single sentence summary was?
- How relevant did you find Scholarcy’s extracted keywords to the paper overall?
- How well did Scholarcy’s Highlights capture the essence of the paper?
- Were the highlights consistent with the detail of the manuscript?
- Were the individual sentences of the highlights well written and grammatical?
- Did the phrases and sentences in the highlights fit together and make sense collectively?
- How well did the longer structured summary capture the key sections of the manuscript?
- Did the phrases and sentences in the longer summary fit together and make sense collectively?
From a single-blind selection of 125 papers across five domains, Scholarcy scored an average of 8/10 across all features (highlights, summaries, keywords and headlines) for Medical and Physical Science articles.
Comments on the quality of some of Scholarcy’s structured summaries in comparison to the original papers included:
“The [Scholarcy] summary captured all of the major sections and only omitted a few key details that were included in the abstract.”
“The author did not provide specific findings in the abstract that were included in the structured summary.”
“The structured summary captured many of the important sections of the manuscript.”
“The structured summary contained nearly all the relevant information and key results. The paper’s original abstract seemed longer and contained unnecessary information.”
In relation to Scholarcy’s highlights, SMEs said that these were ‘well written’ and ‘offered a pretty good summary of the paper’.
As expected, humanities papers were rated lower as their content is generally not as amenable to extractive summarisation. Initially, the low scores for life sciences papers were a surprise, as we would have expected them to score similarly to the medicine and surgery papers. This may be related to the distinction between basic and applied research: life science papers tend to focus on exploratory, experimental methods, which may result in a number of findings, whereas applied science papers tend to test a single hypothesis.
The lower scores for engineering papers may reflect their heavier mathematical content, whose results are harder to summarise automatically.
Inter-rater agreement was more consistent for the life science, medicine/surgery and physical science papers. For engineering and humanities, there was a wider range of scores for how well the Scholarcy headline (3.8 – 7.5), highlights (4.9 – 7.2) and keywords (5.8 – 8.2) captured the essence of the paper.
To improve performance for humanities and life sciences, we will be building new summarisation models. Summarisation for humanities may require an entirely different approach that learns to attend to rhetorical structure, protagonists, and narrative event chains.
Extracted keywords scored highest
Across all subject domains, Scholarcy’s keywords were considered by editors to be the most valuable output from the manuscripts, scoring an average of 7/10.
In the highest scoring subjects, the aspects that Scholarcy performed best at, according to Cactus’ Subject Matter Experts, were:
- Consistency of Scholarcy Highlights with details in the manuscript.
- Individual sentences in Scholarcy Highlights were considered to be well written and grammatical.
- Phrases and sentences in Scholarcy’s longer summary fit together and made sense collectively.
Scholarcy output from two individual papers
These are two examples of headline summaries and keyword listings generated by Scholarcy from manuscripts ranging in length from 20-120 pages.
Paper 1: A study of the renewable energy sector in Asia
“This study examines how the scale of investment differs by applying a variety of technologies for each alternative and renewable energy source in the global market and how the objective can be achieved more cost-effectively.”
Capital Expenditures, global market, municipal solid waste, State Oil Fund of Azerbaijan, annual growth rate, renewable energy sources, clean development mechanism, International Energy Agency, concentrated solar power, installed capacity
Paper 2: A study of extramammary Paget disease
“The present study demonstrated that HER2 protein overexpression and gene amplification were detected in 38% and 19% of samples, respectively, in the metastatic sites of Extramammary Paget disease with regional lymph node metastasis”.
The pilot with Cactus has reinforced feedback we’ve received from researchers over the past 18 months that Scholarcy is particularly effective at parsing, summarising and extracting information from STEM articles. It also confirmed that a valuable feature for editors reading and assessing research is a list of the most relevant keywords associated with the paper, along with a set of key highlights summarising the main findings of studies. As a result of this exercise, we’ve enhanced Scholarcy’s keyword algorithm, adding relevance ranking scores to keywords in our API. This means that we now show only the top 10 most relevant keywords for any published article or manuscript.
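The top-10 filtering step can be sketched as follows. The relevance scores here are hypothetical inputs, and this is a sketch of the filtering step only, not the ranking algorithm that produces the scores:

```python
def top_keywords(scored_keywords, k=10):
    """Given a mapping of keyword -> relevance score, keep only the
    k highest-scoring keywords, ordered by descending relevance."""
    ranked = sorted(scored_keywords.items(), key=lambda kv: kv[1], reverse=True)
    return [kw for kw, score in ranked[:k]]

# Hypothetical example: scores as produced by some keyword-ranking model.
example_scores = {
    "renewable energy sources": 0.92,
    "installed capacity": 0.81,
    "global market": 0.63,
    "annual growth rate": 0.41,
}
```

Exposing the score alongside each keyword also lets API consumers apply their own threshold instead of a fixed cut-off.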
The evaluation has also triggered further research into ways we can improve Scholarcy’s summarisation engine to work more effectively with research in Arts & Humanities and Social Sciences.
Cactus has recently integrated Scholarcy’s highlights technology with its new COVID-19 research platform, to help users to quickly assess the significance of papers for their work.