
Tuesday, July 30, 2024

Mistakes to avoid when writing the introduction to your paper

As a journal editor I read a lot of manuscripts. The introduction is often the hardest part of a paper to write, particularly for high-impact journals, where one must tread a fine line between shameless self-promotion and clearly explaining the importance of one's work in a manner appreciated by both specialists and non-specialists. Two mistakes crop up time and again:

Mass citations

"Extremely niche topic x has become a hot topic due to its potential applications [1-26]. Many novel effects have been reported [27-48]. These works have been extended to unprecedented directions [49-68], paving the way to..."
Yes, it is important to acknowledge relevant prior work on your topic. But when you cite papers en masse, it gives the impression that you don't understand which of the works you cite are really important to your paper!

One-sentence citations

"Smith et al. explored applications of extremely niche hot topic x [1]. Brown et al. innovatively demonstrated a novel effect [2]. Newton et al. paved the way to... [3]."

The opposite extreme, explaining each reference individually (but in a single sentence each, otherwise the introduction becomes too long), has the same effect: it suggests you have merely skimmed the works you cite without really understanding how they fit together or what the bigger picture is.

 

Don't do this! Cite one or two review articles instead, along with the specific works you are building on. Don't make the reader do a literature review just to tell whether your paper might be worth reading!

Wednesday, January 17, 2024

Talks-to-papers with Whisper

Last year I wrote about a neat and lightweight implementation of the Whisper speech-to-text model. One of the potential applications I mentioned was converting recorded presentations (seminars, lectures, etc.) into written notes. A few weeks ago a review article I wrote using this approach was published in AAPPS Bulletin. Here's how I did it:

1. Identify source material. In this case, I had an online conference talk that had been recorded and uploaded to YouTube.

2. Download the raw audio using a tool such as yt-dlp.

3. Convert the audio to a text transcript. I used whisper.cpp (which can run on a CPU). The base and small model sizes already do pretty well in terms of accuracy and run quickly. (A rough sketch of steps 2 and 3 follows this list.)

4. Transcript editing. Whisper won't have perfect accuracy, especially when attempting to transcribe scientific jargon. So it's necessary to carefully review the generated text.

5. Figure conversion. In this case, since it was my own talk, I had access to high-resolution versions of the figures I wanted to include in the paper; only minor reformatting was required.

6. Add references. While I cited papers in the slides, the citations need to be converted to a .bib file or other reference manager format. It would be helpful to have an AI assistant that could do this automatically.
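
For anyone wanting to try the same workflow, here is a minimal Python sketch of steps 2 and 3. A caveat: I actually ran whisper.cpp from the command line, so the openai-whisper package below is just a convenient stand-in, and the URL and filenames are placeholders.

    # Steps 2-3: download a talk's audio, then transcribe it.
    # Assumes: pip install yt-dlp openai-whisper, with ffmpeg on the PATH.
    from yt_dlp import YoutubeDL
    import whisper

    # Step 2: grab the best available audio track and convert it to WAV.
    ydl_opts = {
        "format": "bestaudio/best",
        "outtmpl": "talk.%(ext)s",
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "wav"},
        ],
    }
    with YoutubeDL(ydl_opts) as ydl:
        ydl.download(["https://www.youtube.com/watch?v=PLACEHOLDER"])

    # Step 3: transcribe with a small model; "base" or "small" is usually
    # accurate enough for a first pass and runs quickly, even on CPU.
    model = whisper.load_model("base")
    result = model.transcribe("talk.wav")
    with open("transcript.txt", "w") as f:
        f.write(result["text"])

The resulting transcript.txt then becomes the starting point for the editing in step 4.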

And with that I had a first draft completed! Very nice, since the first draft is usually the hardest to write. I did spend some more time polishing the text, adding some details that didn't make it into the original talk, and making the language more formal in parts, but it ended up being a lot easier than writing the whole text from scratch!

Friday, April 21, 2023

Large language models for everyone

ChatGPT's release late last year attracted a surge of interest -- and investment -- in anticipation of the numerous monetization opportunities offered by the new and improved large language model. At the time there were no serious competitors -- everyone had to use OpenAI's service, which is now pay-to-play.

As I wrote last month, competing models such as LLaMA have been released with downloadable weights, allowing end-users to run them locally (on high-end GPUs, or even on CPUs after quantization).

Researchers from Stanford University have released Alpaca, a fine-tuned version of LLaMA, showing how fine-tuning of language models for more specialized applications can be carried out relatively inexpensively, provided one has access to a sufficiently powerful foundation model.

However, LLaMA (and therefore its derivatives) was released under a restrictive license that in principle limits it to non-commercial research purposes. Nevertheless, students have been free to use illegally leaked copies of LLaMA to write their essays and do their homework.

This week, Stability AI released StableLM, a language model with a similar number of parameters to LLaMA, under a Creative Commons license that allows free re-use, even for commercial purposes.

Barriers to widespread adoption of large language models are dropping fast!

Monday, March 20, 2023

Speech-to-text with Whisper

Whisper is another neat productivity tool that has been ported to a high-performance implementation that can run without specialized hardware -- even on your phone!

The speed and accuracy are remarkable - it takes only a few minutes to create a transcript of an hour-long seminar. While these capabilities have been around for some time (e.g. subtitle options in YouTube and video conferencing programs), it is great that there are now fast, open-source tools that can be run locally, without an internet connection or the privacy risks of sending your data to some untrusted server in the cloud.

Some potential applications in research:

  • Brainstorming - discussions can be transcribed to a text format that can be more easily reviewed later (e.g. searching for keywords; see the sketch after this list).
  • Paper drafting - when writing or typing we often fall into the habit of producing long, convoluted sentences that need heavy editing to make them readable and digestible. Dictating parts of a paper might be a better way to come up with clear and concise text.
  • Converting recordings of conference / workshop talks into full-length paper drafts or conference proceedings. I am trying this one out on one of the online talks I gave during covid.
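
As a minimal sketch of the first point: Whisper returns timestamped segments, so a transcript can be searched for a keyword and traced straight back to the relevant moment in the recording. This uses the openai-whisper Python package (rather than a local whisper.cpp build), with a made-up filename and keyword.

    # Keyword search over a timestamped seminar transcript.
    # Assumes: pip install openai-whisper; "seminar.wav" and the
    # keyword "soliton" are hypothetical.
    import whisper

    model = whisper.load_model("base")
    result = model.transcribe("seminar.wav")

    # Each segment carries start/end times in seconds, so a keyword hit
    # points straight back to the right part of the recording.
    for seg in result["segments"]:
        if "soliton" in seg["text"].lower():
            print(f"[{seg['start']:6.1f}s] {seg['text'].strip()}")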

The ability to quickly and accurately convert research between different formats (text, audio, visual, different languages, etc.) will ultimately improve the accessibility of research, ensuring that it is open for all to use and build on. Further reading on this important initiative can be found on the arXiv blog.