I switched from Notion to Obsidian.md to keep track of meeting notes, to store highlights from scientific papers (using a Zotero plugin), and to maintain a knowledge base for my PhD research.
Obsidian and llm
To summarise or rephrase parts of notes and paper highlights, I often use OpenAI models through llm, a CLI utility that provides an easy way to interact with the OpenAI API. Traditionally, my workflow consisted of piping Markdown files from my Obsidian vault to llm, combining them with a system prompt that described the requested task.
cat papers/paper.md | llm -m 'gpt-4o' -s "Summarise the methodology described in the provided paper notes"
I then copied the model output from the terminal and pasted it into a custom callout that I defined specifically for LLM-generated content (to keep it separate from my own writing), giving me a rough first draft that I could edit later.
.callout[data-callout="llm"] {
  --callout-color: 0, 100, 100;
  --callout-icon: bot;
}
The main disadvantage, however, was that I kept switching between Obsidian and the terminal to copy and paste model output. Another disadvantage was that, when piping the full contents of a file to llm, I had no control over which lines of that file were sent to OpenAI.1
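For reference, sending only part of a file is possible from the terminal as long as the line numbers are known. A minimal sketch, assuming a POSIX shell and sed; the helper name print_lines and the line range in the usage comment are hypothetical, and the llm call simply mirrors the command above:

```shell
# print_lines FILE START END: print only lines START..END of FILE.
# (Hypothetical helper name; sed -n 'M,Np' is standard sed usage.)
print_lines() {
  sed -n "${2},${3}p" "$1"
}

# Usage (not run here, it needs the llm CLI and an API key):
#   print_lines papers/paper.md 12 30 | llm -m 'gpt-4o' -s "Summarise the methodology described in the provided paper notes"
```

This still means looking up line numbers for every request, which is exactly the friction I wanted to avoid.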
Obsidian and Espanso
Outside of tagging LLM output, I also use custom Obsidian callouts to highlight definitions, examples, and quotes from papers and textbooks. For this, I started using Espanso — a text expander that automatically replaces specific keywords with predefined text. This helps me avoid repeatedly typing callout boilerplate:
- trigger: ":def"
  replace: "> [!Definition] ⋯\n> ⋯"
- trigger: ":ex"
  replace: "> [!Example]- ⋯ \n> ⋯"
Every time I type :def, inside or outside of Obsidian, Espanso replaces this keyword with the boilerplate needed for a Definition callout in Obsidian. Other custom triggers include arrows (←, →), dots (⋯, ⋮), and dashes (—) that I use in notes, and abbreviations for commonly used words (all preceded by the : character, which I have reserved for Espanso keywords only).
Regex Triggers, Arguments, and Shell Commands
Regex triggers are an advanced alternative to regular triggers in Espanso. Instead of the static triggers shown above, these dynamic triggers provide flexibility through regex syntax and the use of named groups to pass arguments to expansions. The following two examples are taken from the Espanso documentation:
- regex: ":greet\\((?P<person>.*)\\)"
  replace: "Hi {{person}}!"
We can also execute scripts and shell commands through Espanso:
- regex: "=sum\\((?P<num1>.*?),(?P<num2>.*?)\\)"
  replace: "{{result}}"
  vars:
    - name: result
      type: shell
      params:
        cmd: "expr $ESPANSO_NUM1 + $ESPANSO_NUM2"
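To make the argument passing concrete: as the example above suggests, each named group reaches the shell command as an environment variable, written in upper case and prefixed with ESPANSO_. The sketch below imitates that environment by hand for the input =sum(2,3). It assumes a POSIX shell, and the function name sum_like_espanso is hypothetical.

```shell
# Imitate the environment Espanso provides for "=sum(2,3)":
# the named groups num1 and num2 arrive as ESPANSO_NUM1 and ESPANSO_NUM2.
sum_like_espanso() {
  ESPANSO_NUM1="$1" ESPANSO_NUM2="$2" \
    sh -c 'expr "$ESPANSO_NUM1" + "$ESPANSO_NUM2"'
}

sum_like_espanso 2 3   # prints 5
```

Running a command this way is a handy check before putting it into a match file, since a typo in the variable name only shows up as an empty expansion otherwise.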
Expanding LLM Output
My initial goal was to create a new trigger, :llm(⋯), that allows for the following workflow: copy the text that we want the LLM to work with to the system's clipboard (so that we don't have to pipe a full file), type the trigger with the system prompt in the parentheses, and expand this trigger to the custom LLM callout in Obsidian (with the system prompt as the title and the output of the LLM as the body of the callout). The trigger :llm("Summarise the provided text.") would expand to:
> [!llm]- "Summarise the provided text." 💬
⋯ (→ Summary of text currently in clipboard)
Limitation of Regex Triggers
Regex triggers have a limitation: the maximum length of a regex match, including captured named groups, is restricted to 30 characters.2 This constraint is intended to improve performance, but it makes it harder to write an expressive system prompt.
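A quick way to see how tight this is: count the characters of a few candidate triggers. The sketch below assumes a POSIX shell and ASCII input; the function names trigger_length and fits_limit are hypothetical, and treating exactly 30 characters as still allowed is my assumption.

```shell
# Length in characters of a typed trigger (ASCII input assumed).
trigger_length() {
  printf '%s' "${#1}"
}

# Succeeds when the whole trigger fits in the 30-character window.
# (Whether exactly 30 is allowed is an assumption.)
fits_limit() {
  [ "${#1}" -le 30 ]
}

trigger_length ':llm("Summarise the provided text.")'; echo   # 36: too long
trigger_length ':llm(explain to 5yo)'; echo                   # 20: fits
```

Even a modest one-sentence prompt overshoots the limit, while a couple of keywords fit comfortably.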
To circumvent this issue, we can use an LLM to first write a full-sized prompt based on one or two provided keywords (keeping our trigger under 30 characters), which we can then feed to our main LLM.
We could do this in one call, asking the model to expand the prompt and then use this prompt to perform the task at hand, but this would make it harder to extract the expanded prompt for the title of our custom callout. Another approach would be to provide the callout boilerplate to llm, asking it to return the filled-in callout; but as API calls are cheap, we'll go with the most straightforward approach, two separate calls:
- regex: ":llm\\((?P<userprompt>.*?)\\)"
  replace: "> [!llm]- \"{{prompt}}\" 💬\n⋯\n\n"
  vars:
    - name: prompt
      type: shell
      params:
        cmd: "echo $ESPANSO_USERPROMPT | llm -s \
          'Expand provided keywords to a short but effective, \
          imperative one-sentence prompt in UK English. \
          The final output when using this prompt should be \
          short and to the point.'"
For now, if we type :llm(explain to 5yo), we get the following in return:
> [!llm]- "Explain the provided text in simple words that a 5-year-old can easily understand." 💬
⋯
Passing Full-Sized Prompt
Now that we can circumvent the character limit and have access to our full-sized prompt, we just have to provide it to another call of llm. We can access the full-sized prompt by using the variable name in all caps, preceded by $ESPANSO_. I pass the output through sed, inserting the > character after every newline, to make sure that everything sticks together in the final callout.
- regex: ":llm\\((?P<userprompt>.*?)\\)"
  replace: "> [!llm]- \"{{prompt}}\" 💬\n{{output}}\n\n"
  vars:
    - name: prompt
      type: shell
      params:
        cmd: "echo $ESPANSO_USERPROMPT | llm -m 'gpt-4o' -s \
          'Expand provided keywords to a short but effective, \
          imperative one-sentence prompt in UK English. \
          The final output when using this prompt should be \
          short and to the point.'"
    - name: output
      type: shell
      params:
        cmd: "llm -m 'gpt-4o' -s \
          \"$ESPANSO_PROMPT.\" | sed -z \"s/\\n/\\n> /g\""
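To see what the sed step does on its own, here is a small sketch. It assumes GNU sed, since -z is a GNU extension and may not be available in the BSD sed that ships with macOS; the function name quote_continuation is hypothetical.

```shell
# With -z, GNU sed treats the whole input as a single record, so the
# newline characters themselves are visible to the s command.
quote_continuation() {
  sed -z 's/\n/\n> /g'
}

printf 'first line\nsecond line\n' | quote_continuation
# Every newline is now followed by "> ", including the final one;
# the first line itself gets no prefix.
```

Because the first line is left untouched, the placement of {{output}} in the replace string decides how that line is rendered inside the callout.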
Final Espanso Trigger
The last thing we have to do is access our clipboard. We do this by creating a new variable inside our trigger,
- name: clipboard
  type: clipboard
and piping it to our final call to llm. The final Espanso trigger looks as follows:
- regex: ":llm\\((?P<userprompt>.*?)\\)"
  replace: "> [!llm]- \"{{prompt}}\" 💬\n{{output}}\n\n"
  vars:
    - name: clipboard
      type: clipboard
    - name: prompt
      type: shell
      params:
        cmd: "echo $ESPANSO_USERPROMPT | llm -m 'gpt-4o' -s \
          'Expand provided keywords to a short but effective, \
          imperative one-sentence prompt in UK English. \
          The final output when using this prompt should be \
          short and to the point.'"
    - name: output
      type: shell
      params:
        cmd: "echo $ESPANSO_CLIPBOARD | llm -m 'gpt-4o' -s \
          \"$ESPANSO_PROMPT.\" | sed -z \"s/\\n/\\n> /g\""
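When something goes wrong, it can help to run the same two-step chain directly in a terminal before debugging it inside Espanso. The sketch below is a rough stand-in for the trigger, not part of my Espanso setup: it assumes a POSIX shell, the llm CLI on the PATH, and GNU sed. The function name llm_callout is hypothetical, and its two arguments stand in for the captured keywords and the clipboard contents.

```shell
# llm_callout KEYWORDS TEXT
# Step 1: expand the keywords into a full-sized prompt.
# Step 2: apply that prompt to TEXT and print the result as a callout.
llm_callout() {
  prompt=$(printf '%s\n' "$1" | llm -m 'gpt-4o' -s \
    'Expand provided keywords to a short but effective, imperative one-sentence prompt in UK English. The final output when using this prompt should be short and to the point.')
  output=$(printf '%s\n' "$2" | llm -m 'gpt-4o' -s "$prompt." \
    | sed -z 's/\n/\n> /g')
  printf '> [!llm]- "%s" 💬\n%s\n\n' "$prompt" "$output"
}

# Usage (makes two API calls):
#   llm_callout 'explain to 5yo' "$(cat papers/paper.md)"
```

Unlike the Espanso version, this sketch quotes its inputs, so whitespace and characters such as * in the text are passed to llm unchanged.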
Demonstration
1. I am sure that there are plenty of ways to cat only specific lines of a file, but I assume that this requires providing line numbers, which makes the process a bit more complex.
2. https://espanso.org/docs/matches/regex-triggers/#limitations