Thursday, June 15, 2023

Doing literature reviews the smart way

Despite literature surveys being a key component of research, strategies for reviewing the scientific literature and identifying promising avenues of research are rarely included in graduate student coursework. This means that students may be unaware of more powerful tools that are available.

It is useful to have a tiered search strategy, starting with resources aimed at a broad audience, for example technical magazines such as Optics & Photonics News, to identify interesting or promising directions to study in more detail. While Wikipedia is a popular first choice, peer-reviewed alternatives such as Scholarpedia provide more reliable articles written by known experts.

Google Scholar is perhaps the most popular scholarly search engine, but its limitations mean it is most useful for exploring papers on highly specific lines of research, mainly by following citation trails and highly-cited papers. Subscription-based search engines such as Web of Science are usually available under university subscriptions and give much more powerful tools for exploring a research area and seeing the bigger picture. Examples include the ability to filter search results by journal or author affiliation, and citation reports that visualise how publication trends are evolving over time.

Thanks to COVID, many academic talks can now be viewed online. These are a great alternative to reading the papers themselves, particularly because the speaker may reveal insights that didn't end up in the journal article. One should keep in mind the differences between workshops and larger conferences: the target audience, the breadth and depth of individual talks and of the programme as a whole, and sometimes the candour of the speakers, particularly if the talk will be made available online. This means that in-person conference attendance is still highly valuable, because speakers may be more willing to share unpublished work and future research ideas during smaller, more informal discussions. Talking to the right person can save hours of time figuring out what the key references are!

The volume of publications in an area may shape your research strategy. If a given keyword has hundreds or thousands of articles coming out each year, it's usually a sign that you need to narrow your focus to find a niche in which you can shine. Publications often follow a hype cycle, that is, an initial surge of interest leading to a transient peak in activity, followed by a more stable plateau as the field matures. Sometimes a line ends up being infeasible, leading to interest dying off before such a plateau can form.
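As a rough illustration of the paragraph above, here is a minimal Python sketch that summarises a yearly publication-count series of the kind a citation report might export. The function name `summarize_trend`, the `recent_years` window, and the counts themselves are all made up for illustration; they are not output from any real search engine. The ratio it reports is only a crude indicator: a value close to 1 suggests activity is still near its peak, while a much smaller value suggests interest has dropped off, whether towards a plateau or towards nothing.

```python
from statistics import mean


def summarize_trend(counts_by_year, recent_years=3):
    """Summarise a yearly publication-count series.

    counts_by_year: dict mapping year -> number of publications.
    Returns the peak year, the peak count, and the ratio of the mean
    count over the last `recent_years` years to the peak count.
    """
    years = sorted(counts_by_year)
    # Year with the most publications (earliest year wins a tie).
    peak_year = max(years, key=lambda y: counts_by_year[y])
    peak_count = counts_by_year[peak_year]
    # Average activity over the most recent years in the series.
    recent = mean(counts_by_year[y] for y in years[-recent_years:])
    ratio = recent / peak_count if peak_count else 0.0
    return {
        "peak_year": peak_year,
        "peak_count": peak_count,
        "recent_to_peak": ratio,
    }


# Hypothetical counts for a single keyword (illustrative only).
counts = {2015: 5, 2016: 20, 2017: 60, 2018: 100,
          2019: 70, 2020: 52, 2021: 50, 2022: 48}
summary = summarize_trend(counts)
print(summary)
```

With these made-up numbers the peak falls in 2018 and recent activity sits at about half the peak, which is the kind of shape one might read as a surge followed by a plateau.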

It is important to emphasize that the number of publications in an area should not be used to judge whether a field is worthwhile to study. For example, one researcher might see a booming field and be put off, desiring to work in a smaller area with a better potential for growth. A short peak of activity followed by little interest may suggest a research line has a difficult problem that nobody knows how to solve, offering an opportunity for you to make your mark.

Does artificial intelligence have a place in reviewing the literature and deciding on promising lines of research? Yes and no. Artificial intelligence is more than just large language models and chatbots, encompassing a variety of other machine learning-based tools for enhancing productivity, for example by helping to analyse and visualise citation networks. Some experimental examples of these network analysis tools are available on arXiv through arXivLabs and are worth a try. Even if their capabilities are limited or inaccessible today (e.g. requiring a subscription), in the coming years the best ones will become more widely available via university-wide subscriptions, similar to the growth of collaborative paper-writing tools such as Overleaf.
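To give a flavour of what analysing a citation network means at its simplest, here is a small Python sketch that counts how often each paper is cited within a hand-collected set of reference lists. It is not how any arXivLabs tool works; the function name `most_cited`, the paper labels, and the reference lists are hypothetical, and real tools use far richer data and methods.

```python
from collections import Counter


def most_cited(references, top_n=3):
    """Rank papers by how many papers in the set cite them.

    references: dict mapping a paper id -> list of paper ids it cites.
    Returns up to `top_n` (paper id, citation count) pairs, highest
    count first, with ties broken alphabetically by paper id.
    """
    counts = Counter()
    for paper, cited in references.items():
        # Count each cited paper once per citing paper; skip self-citations.
        counts.update(set(cited) - {paper})
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical reference lists for five papers labelled A to E.
refs = {
    "A": ["C", "D"],
    "B": ["C"],
    "C": ["D"],
    "E": ["C", "D", "A"],
}
ranking = most_cited(refs)
print(ranking)
```

Papers that many others in your reading list point to are natural candidates for the key references of an area, although a count like this says nothing about why they are cited.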

And what about large language models? In my opinion, it's best to avoid them when carrying out literature reviews. Language models are trained to favour fluency over accuracy, so rather than generating new knowledge they are better used for performing tasks where the end-user can verify the output. Even when a model is asked to analyze specific papers, you can't be sure that it hasn't missed or misunderstood an important point, for example when jargon used within a research area differs from the commonly-understood meaning of a word. And even if (or when) these issues are solved by new and improved models, at the end of the day large language models are designed to spit out probable-sounding sequences of tokens. On the other hand, scientific breakthroughs often come about through the pursuit of unlikely or unexpected avenues of investigation.

Finally, one should not read too much. Too much time spent reading what other people have done not only takes time away from your own research, but can also sap your creativity and your ability to pursue directions away from the groupthink. Richard Hamming explained this eloquently in his famous lecture "You and Your Research" he gave at Bell Labs, available both as a text transcript and a video recording. I highly recommend reading or watching!

In summary:

  1. Use a variety of sources, search engines, and media types
  2. Remember every source and search engine has a bias
  3. Aggregated statistics are just as important as individual papers
  4. Try emerging AI-powered search & visualization tools
  5. Don't read too much!
