Untitled

Audio is streamed directly from the publisher (media.museapp.com) as published in their RSS feed. Play Podcasts does not host this file. Rights-holders can request removal through the copyright & takedown page.

Original episode page

Show Notes

Discuss this episode in the Muse community

Follow @MuseAppHQ on Twitter

Show notes

00:00:00 - Speaker 1: There’s been very little innovation and research more generally into what is a good interface for inputting equations. So I think most people are probably familiar with Microsoft Word or Excel have these equation editors where you basically open this palette and there is a preview and there is a button for every possible mathematical symbol or operator you can imagine.

00:00:28 - Speaker 2: Hello and welcome to Meta Muse. Muse is a tool for thought on iPad and Mac. This podcast isn’t about Muse the product, it’s about Muse the company and the small team behind it. I’m Adam Wiggins here today with my colleague Mark McGranaghan. Hey, Adam. And joined by our guest Sarah Lim, who goes by Slim. Hello, hello, and Slim, you’ve got various interesting affiliations including UC Berkeley, Notion, Inc and Switch, but what I’m interested in right now is the lessons you’ve learned from playing classic video games. Tell me about that.

00:01:01 - Speaker 1: So this arose when I was deciding whether to get the 14 inch or 16 inch M1 MacBook Pro and a critical question of our age, let’s be

00:01:10 - Speaker 1: honest. Exactly, exactly. I couldn’t decide. I posted a request for comments on Twitter, and then I had this realization that when I was 6 years old playing Organ Trail 5, which is a remake of Organ Trail 2, which is itself a remake of the original. I was in the initial outfitting stage, and you have 3 choices for your farm wagon. You can get the small farm wagon, the large farm wagon, and the Conestoga wagon. I actually don’t know if I’m pronouncing that correctly, but let’s assume I am. So I just naively chose the Conestoga wagon because as a 6 year old, I figured that bigger must be better and being able to store more supplies for your expedition would make it more successful. I eventually learned that the fact that the wagon is much larger and can store a lot more weight means that it’s a lot easier to overload it. Among other things, this requires constantly abandoning supplies to cut weight. It makes the roover forwarding minigame much more perilous. It’s a lot harder to control the wagon. And yeah, I never chose that wagon again on subsequent playthroughs, and I decided to get the 14-inch laptop.

00:02:12 - Speaker 2: Makes perfect sense to me and and what a great lesson for a six year old trade-offs, I feel like it’s one of the most important kind of fundamental concepts to understand as a human in this world, and I think many folks struggle with that well into adulthood. At least I feel like I’ve often been in certainly business conversations where trying to explain trade-offs is met with confusion.

00:02:35 - Speaker 1: They should just play Organ Trail.

00:02:37 - Speaker 2: Clearly that’s the solution. And tell us a little bit about your background.

00:02:42 - Speaker 1: Yeah, so I’ve been interested in basically all permutations really of user interfaces and programming languages for a really long time, so this includes the very different programming languages as user interfaces and programming languages for user interfaces, and then, you know, the combination of the two. So right now I’m doing a PhD in programming languages, interested in more of like the theoretical perspective, but in the past, I’ve worked on I guess, end user computing, which is really the broader vision of notion, I was at Khan Academy for a while on the long term research team.

00:03:18 - Speaker 2: Yeah, and there I think you worked with Andy Matuschek, who’s a good friend of ours and uh previous guest on the podcast.

00:03:24 - Speaker 1: Yes, definitely. That was the first time I worked with Andy in real depth, and I still really enjoy talking to him and occasionally collaborating with him today.

So, I guess, prior to that, I was doing a lot of research at the intersection of HCI or human computer interaction and programming tools, programming systems, I guess. So, one of the big projects that I worked on as an undergrad was focused on inspecting.

CSS on a webpage or more generally trying to understand what are the properties of like the code that influence how the page looks or a visual outcome of interest, and there I was really motivated by the fact that you have these software tools have their own Mental model, I guess, or just model of how code works and how different parts of the program interact to produce some output and then you have the user who has often this entirely different intuitive model of what matters, what’s important.

So they don’t care if this line of code is or isn’t evaluated, they care whether it actually has a visible effect on the output. So trying to reconcile those two paradigms, I think is a recurring theme in a lot of my work.

00:04:30 - Speaker 2: And I remember seeing a little demo maybe of some of the, I don’t know if it was a prototype or a full open source tool, but essentially a visualizer that helps you better understand which CSS rules are being applied. Am I remembering that right?

00:04:43 - Speaker 1: Yeah, so that was both part of the prototype and the eventual implementation in Firefox, but the idea there is The syntax of CSS really elides the complexity, I think, because syntactically it looks like you have all of these independent properties like color, red, you know, font size, 16 pixels, and they seem to be all equally important and at the same level of nesting, I guess, and what that really hides is the fact that there are a lot of dependencies between properties, so a certain property like Z index, you know, the perennial favorite Z index 9999999. Doesn’t take effect unless the element has like position relative, for example, and it’s not at all apparent if you’re writing those two properties that there is a dependency between them.

So I was working on visualizing kind of what those dependencies were. This actually arose because I wrote to Burt, who is one of the co-creators of CSS and was like, Hi, I’m interested in building a tool that visualizes these dependencies. Where can I find the computer readable list of all such dependencies? And he was like, oh, we don’t have one, you know, we have this SVG that tries to map out the dependencies between CSS 2.1 modules, and even there you can see all these circular dependencies, but we don’t have anything like what you’re looking for. That to me was totally bananas because it was the basic blocker to most people being able to go from writing really trivial CSS to more complicated layouts. So I was like, well, I guess this thing doesn’t exist, so I’d better go invent it.

00:06:12 - Speaker 2: Perfect way to find good research problems. Now, more recently, two projects I wanted to make sure we reference because they connect to what we’ll talk about today, which is recently worked on the equation editor at Ocean, and then you worked on a rich text CRDT called Paratext at In and Switch. Uh, would love to hear a little bit about those projects.

00:06:34 - Speaker 1: Yeah, definitely. So I guess the Paroex project, which was the most recent one was collaboration with Jeffrey Litt, Martin Klutman and Peter Van Harperberg, and that one was really exciting because we were trying to build a CRDT that could handle rich text formatting and traditionally, you have all of these CRDTs that are designed for fairly bespoke applications. They’re things like a counter data type or a set data type that has certain behavior when you combine two sets, and we’re still at the stage of CRDT development where aside from things like JSON CRDTs like automerge, we don’t really have a one size fits all CRDT framework or solution. You still mostly have to hand design and implement the CRDT for a given application.

And it turns out that in the case of something like rich text, it’s a lot harder than just saying, oh, you know, we’ll store annotations in an array and call it a day, because the semantics for how you want different types of formatting to combine when people split and rejoin sessions and things like that are all very complex and it turns out that we have a lot of learned behaviors that arise, even from just like, Design decisions in Microsoft Word, where you expect certain annotations to be able to extend, certain annotations to not extend, things like that. Capturing all of the nuance in that behavior turns out to be really difficult and requires a lot of domain specific thinking.

But we think we have an approach that works and I would really encourage everyone to read the essay that we published and try to poke holes in it too. This was like the 5th version of the. algorithm, right? So like months ago, we were like, all right, let’s start writing and then Martin, who has just an incredible talent for these things is like, hey, everyone, you know, I found some issues with the approach and, you know, oh no, 00, and sort of we fix those, we’re like, all right, you know, this one’s good and just repeat this like week after week. So I really have to give him a ton of credit for both coming up with a lot of these problems and also figuring out ways to work around it.

00:08:33 - Speaker 2: We talked with Peter a little bit recently, Peter van Hardenberg, about the pencils down element of the lab, but also just research generally, which is there’s always more to solve, you know, it’s the classic XKCD, more research needed is always the end of every paper ever written, which is indeed the pursuit of the unknown. That’s part of what makes science and Seeking new knowledge, exciting and interesting, but at some point you do have to say we have a new quantum of knowledge and it’s worth publishing that. But then I think if it’s just straight up wrong or you see major problems that you feel embarrassed by, then if you want to invest more.

00:09:09 - Speaker 1: Right, exactly. I think in this case. There was a distinction between, there’s always more we can tack on versus we wanted to get it right, you know, and in particular, the history of both operational transforms or OT and CRDT for rich text, just text in general is such that it’s this minefield of I guess to use kind of a gruesome visual metaphor, just dead bodies everywhere.

You’re like, oh, you know, such and such algorithm was published and it’s such and such time and it was new hotness for a while and then we realized, oh, it was actually wrong and this new paper came out which proved like 4 of the algorithms were wrong and so on.

And so with correctness being such an important part of any algorithm, of course, but also kind of this white whale in the rich text field, we thought it was important to at least make a credible effort to having a correct algorithm.

00:09:57 - Speaker 2: Yeah, makes sense. Yeah, I can highly recommend the Paroex essay.

One of the things I found interesting about it, maybe just for anyone who’s listening, whose head is spinning from all the specialized jargon here, CRDTs are a data structure for doing collaborative software, collaborative documents, and then, yeah, rich text, the Microsoft Word is the canonical example there.

You can bold things, you can italic things, you can make things bigger and smaller.

Well, part of what I enjoyed about this paper was actually that I felt, even if you have no interest in CRDTs, it has these lovely visualizations that show kind of the data representation of a sentence like the quick brown fox, and then if you bold quick, and then later someone else bolds fox, you know, how do those things merge together.

But even aside from the merging and the collaborative aspect, which obviously is the research, the novel research here. I felt it gave me a greater understanding of just how rich text editing works under the hood, which I guess I had a vague idea of, but hadn’t thought about it so deeply. So, highly recommend that paper. Just give them the figures, even if you don’t want to read the thousands of words.

00:11:05 - Speaker 1: I’m glad you like the figures. They were a real labor of sigma.

00:11:08 - Speaker 2: Perfect, yeah, so.

00:11:10 - Speaker 1: The one thing I would add is that CRDTs are a technology for collaboration, but the way they differ from operational transforms or OTs is that a CRDT is basically designed to operate in a decentralized setting, so you don’t need a persistent network connection to all the parts. you don’t need a centralized server. The idea is you can fluidly recover from network partitions by merging all of the data and operations that happened while you were offline, and this turns out to be really important to our vision of how collaborative editing should work because we think it’s really important for people to be able to do things like not always be editing in the same document at the same time as everyone. Maybe I want to take some space for myself to write in private and then have my changes sync up with everyone else thereafter. Maybe I’m, you know, self-conscious about other people editing. are seeing my work in progress, but I think that it would be interesting and helpful to look at what the main document looks like and how that’s evolving while I’m working in private, and you can have that kind of one way visibility with something like a CRDT versus with something like Google Docs, where it’s just sort of always online or always not editing in your own personal editor. Conversely, maybe I’m OK with everyone else seeing the work that I’m doing in progress, but I just find it really visually jarring to have all these cursors and different colors jumping around and People inserting text, bumping my paragraphs down the page. I’ve definitely been there. I’m not particularly precious about people seeing my work in progress, but I just cannot focus on writing when the page is just changing all around me. So in that situation, maybe I would want to allow other people to see my work in progress, so that we don’t duplicate effort or something like that, but I just have like a focus mode where incoming changes don’t disrupt my writing environment and these kinds of fork join one way window. Microgit style branching paradigms are really only enabled by a technology like CRDTs where you have the flexibility to separate and then come back together.

00:13:12 - Speaker 2: And I’m incredibly excited by the design research that needs to go into that.

Now at this point, I think we’re still on the technology level, you know, one way to think of it is Google Docs came along, I don’t know, 15, it’s almost 20 years ago now, I can’t even remember, let’s say 15 years ago, and this novel idea that We could both have a shared document or several people could have a shared document, all see the up to-date version and type into it and get, you know, a reasonable response or have that be coherent was an amazing breakthrough at the time and has since been kind of widely copied notion, Figma, many others.

But now maybe we can go beyond that, much more granularity, like you said, maybe borrowing from the developer version control workflows a little bit in a lightweight way, giving a lot more control and flexibility, and giving us a lot more choices about how we want to work most effectively.

But before we can even get onto those design decisions and how do we present all these different things to the user, what are the different options? We need this like fundamental underlying merge technology, hence the endless fascination that we have the lab and increasingly the technology industry generally has with CRDTs because it has the potential to enable all that.

00:14:23 - Speaker 1: Yeah, when we were working on the Paratax project, Peter was pushing really hard for, don’t make this just a technology project.

It’s a socio-technical endeavor and we need to invest a lot of time in the design component, also just doing user interviews, identifying how people interact with and.

How people collaborate in the status quo on text and Jeffrey and I actually did do a bunch of user interviews with people from all kinds of backgrounds. We’ve talked to people who write plays, people who produce a dramatic podcast kind of in this style of Night Vale.

I love Night Vale. Yeah, people who are in the writer’s room kind of working together with their collaborators on that, people who write lessons, video lessons for educational platforms. And there was a ton of really interesting Insights into user behavior around collaborative text.

We ended up just torn because we had this 12 week project and we were like, how should we best spend our time? Clearly, this is not just a technical area and we need to invest a lot in getting the design right, understanding what the design space even looks like since it hasn’t really been explored.

I really want to avoid, and this is a recurring theme in my work, I really want to avoid publishing or shipping something. And having it be this like, very broad, very shallow exploration into all the things that are possible. I think that this kind of work plays an important role, and there are a lot of people who do this well, just fermenting the space of possibilities and getting these ideas in a lot of people’s heads, who can then go on and do really cool things with them.

My personal style, I never want to feel like something is half baked, I guess, I would much rather ship this cohesive contribution like, here is an algorithm for building rich text. We think that this is a technical prerequisite to all of these interesting design choices, but the alternative with a 12 week period, and in fact, you know, this, the correctness and revision phase extended way over that. So thanks a lot to Martin and Jeffrey for leading during that part.

But it’s just already so hard to get it correct that trying to tack on a really substantive design exploration that does the area justice on top of that, I was just really worried it would stretched too thin.

So absolutely lots of room for future work in this particular. project. It’s very much a challenge in any area where you have simultaneously this rich design space that’s just asking to be explored with tons of prototypes and things like that, and then also to even realize the most simple of those prototypes, you require fundamentally new technology.

00:16:53 - Speaker 2: Yeah, I’ve been down that same path on many research projects as well, and often it’s that I’m excited for what the technology will enable, but also that in many cases it’s a combination, you know, some kind of peer to peer networking thing, but with that will enable us to provide a certain benefit to the user and I want to explore both of those things, but then that’s too much and then the whole thing is half baked exactly as you said. I’ve never found a perfect or even a good. Way to really manage that tradeoff. You just kind of pick your battles and hope for the best. Yeah, definitely. Well, I do want to hear about the equation editor project, but first I feel I should introduce our topic here, which I think folks could probably have gleaned is going to be rich text and rich text editing, and maybe we could just step back a moment and define that a little bit.

I think we know that texts, you know, symbolic representation of language is a pretty key thing, writing and the printing press and all that sort of thing. We wrote about that a little bit in our text blocks memo, which I’ll like in the show notes. But typically, I think computers for a lot of their early time and even now with something like computer code is typically plain text, that’s the dot TXT file is kind of almost the native style of text that you have and then rich text typically layers something on top of that. I don’t know, so maybe you could better define rich text for us to have a more concrete discussion about it.

00:18:21 - Speaker 1: Yeah, I think rich text for most people basically evokes things like bold, italic, underline, the ability to augment plain text with annotations that are useful in formatting, actually, I think.

Notepad to word pad is the archetypal jump in software, if you’re thinking about it from the old Windows perspective.

In the past few years, I think we’ve started to see a real expansion of what rich text can look like. So, of course, we started out with something like Markdown, which is, of course, a plain text representation. But it’s designed to be able to capture more nuance in plain text and be rendered to something like HTML which very much supports rich text.

So in Markdown, you have not only these kinds of inline formatting elements like bold and italic and hyperlinks as well.

You also have support for images, which you could think of as more block level rich text elements, I guess, and I don’t think there’s a real clear consensus across editors on how block level rich text elements should be displayed.

Of course, in between you have things like bulleted lists and those tend to be handled in a fairly standard manner with nested lists and so on, but it quickly becomes like a question of taste. Which kinds of annotations you support.

So in editors like Coto or Notion, you have all these different block types where the block is really the atom of collaboration and editing, and then you can have things like, you know, file embeds or even database views, things like that.

So I think we’re at a point now where both block-based editors, I’m using block based editors in like the text or writing sense, not the structured editors for programming sense, although I have other things to say about that, but we’re at a point where you’re starting to see these block-based editors appear and I think that there are a lot of really interesting patterns that this permits that the paragraphs via linear sequence of characters, including new lines and whitespace does not permit, or at least doesn’t allow you to build as structured tooling around.

00:20:30 - Speaker 2: I’m trying to think what is actually the core of the difference between a block-based editor, that’s a notion, a RO uses working on its own block text implementation and a flow of characters, so that’s Microsoft Word, Google Docs, maybe even text editors. I guess it’s sort of like paragraphs are separated by like these sort of nested. Elements or have a parent to the document versus like two new lines embedded in the stream of characters, but I don’t know, that seems too unsophisticated, maybe have a better definition for us.

00:21:03 - Speaker 1: So, I actually think about this very similarly to in the like programming languages and editor tools space.

There is a distinction between structured editors and regular plain text editors for programs. The idea is that you might have a text-based programming language and you can write that perfectly fine in any buffer that allows you to put sequential characters, often AI is sufficient for some languages, and then on the other hand, These programs might have a lot of inherent structure. A simple example is with lisps which are built out of these parenthesis S expressions, everything is, you know, an S expression. You can think about like the structure of the tree formed by, I guess a forest, formed by having like these S expressions with subelements and stuff. that, and then you can do manipulations directly on the structure in a way that allows you to always have a syntactically correct program or at least a partial syntactically correct program by doing things like I’m just going to take this subtree, which is a sub-expression and move it somewhere else where there’s room for another subexpression. So, I think of block-based editors as capturing a very similar zeitgeist to structured editors for code, because instead of just having this linear buffer of characters that can have, you know, formatting or things like that, you can have new lines, you actually have more of a forest structure where you have lots of like individual blocks, and then you can have blocks that are children of other blocks and so on, and that allows you to Do things like move an entire subtree representing an outline to another position in the document without selecting all of the characters, you know, cut them and then paste them somewhere else. So things like reparenting becomes a lot easier, things like setting the background of an entire subtree becomes a lot easier. Just in general, you have more structure and there’s more things you can do with that structure, I guess is how I would phrase it. One of my favorite things that you can do with this model in notion is you can change the type of a block very easily. So let’s say I have a bullet list item, and then I hit enter and enter these like subnote or something like that as children of the initial bullet list item. I can turn the bullet list item into a page, and then all of a sudden it’s just a subpage in the document, and the sub bullets that were there before are just like top level bullets in that page. And this is particularly important for my workflow because I care a lot about starting out with something like really rough and sketchy and then progressively improving it or moving up and down the ladder of like fidelity into something more polished. So you might, for instance, start off with just an outline list or even a one dimensional list of to do blocks when you’re trying to do project planning or something. And then later on, let’s say I want to put these into like a tasks database with support for like a conbond view or something like that. I don’t actually want to sit there and like recreate all of these tasks in Jira. I’ve been there, you know, I’ve been the person making all the tasks in Jira after the meeting and then assigning them to people. What the workflow that I think notion is poised to enable and can certainly do a better job in this regard, but already offers some benefits on is like, can I just highlight all of these blocks because everything is a block, move them into some existing database and have them match the schema. That kind of like allowing people to do fast and loose prototyping with very unstructured primitives and then promote them into something more structured like in a relational database setting or similar, I think is the sweet spot, structured editing provides the sweet spot between like just completely unstructured text and these very high fidelity, high effort interfaces that allows you to kind of move between them.

00:24:47 - Speaker 3: Yeah, I really like that direction and framing, and if I can extend it a little bit, I think we can also look at a continuum of richness in terms of the content itself.

So you have plain text, what you might classically call rich text with links and bold and underlying. And then you maybe start to throw a few images in, and then what if you can put it in videos and what if you have a whole table, and that table is actually a database query, and you can nest the figment document, and this way you can see that there’s sort of continuum on the richness of the document. One reason I think Notion has been so successful, they’ve been pushing along that continuum while maintaining a sort of foundation of rich textness, which is very familiar and the important basic use case for a lot of people.

A related idea is that I think we’re seeing a lot of the classic document types converge. So if you look at a rich text like a Microsoft Word and a PowerPoint and increasingly spreadsheets, those all used to be 3 distinct Microsoft Office applications, and we’re seeing the value of them being in or being the same document.

This is actually one of the motivating ideas behind Muse and a lot of the research we’ve done in the lab, and the kind of something Slim was saying, you want to take your idea continuously through different media and different modalities and different degrees of fidelity, and you don’t want to jump between different applications do that. You want to be able to do it on the same canvas. That’s by the way, one of the reasons I like Canvas. It’s not only because it’s a free multimedia surface, but also it evokes this idea of like flexibility and potentiality, and I think that’s one of the things that’s really excited about these mixed media documents.

00:26:16 - Speaker 2: And I know if Jeffrey were here, he might jump in and say that one downside to our current application silo world is that the only way to have this deeply rich text where it’s images, video, a table, a database query, something like that, is to have the Uber application, to have the everything app, and certainly notion has probably gotten pretty far on that, but others kind of in In some ways are forced to do that, like we have to do some of that in Muse as well. People come in and ask for all these different types here as well, and there’s more of like an open doc inspired or Unix inspired future that maybe Jeffrey and others, including me, would hope for, which would be more that applications could be these individual data types and you could put them all together through some kind of more operating system connection.

But that is so completely reversed from kind of how all our computing devices work today. It’s hard to see how we might get to that.

00:27:14 - Speaker 3: Yeah, I’m certainly sympathetic to that concern, although I suspect the way out is through, and you get platforms from working killer apps.

And so the way we got the whole unit ecosystem was they wanted to build a computer for, you know, writing and running programs and then eventually got all this generalized text processing stuff, but it’s not like they started in like, oh, I’m gonna make a generalized text processing machine.

I don’t think that was really the way that they approached it and developed a success. So, I’m still hopeful we could do this, but I think you got to extract it from something that’s already working as an app, but it always helps to have an eye towards that, and I think we’ve done some of that with Muse.

00:27:46 - Speaker 1: I was just going to say that it’s not me talking about texts, unless I bring up my favorite piece of software of all time, which is Pandok.

And I think that Pandok actually is very relevant to this discussion. So for those who aren’t as familiar with it, Panok brands itself as this Swiss Army knife for document formats, and it’s sort of headline contribution is that it allows you to convert between all kinds of documents.

For instance, I can take a Word document and convert it to a PDF Word documents to something like, I don’t know, IPython notebook, Jupiter notebook, back and forth across this incredible bipartite graph of formats, but I think that the subtler contribution that Pandokc makes, which is extremely significant, is that Pandok has this form of markdown called Pandok markdown that essentially aligns and supersedes all of the different fragments of markdown that we’ve seen before.

So the problem with markdown basically is that the original specification is sort of ill-defined. There are several cases in which the behavior is not super clear and then on top of that, it’s not very expressive.

There aren’t very many constructs. So things like fenced code blocks, which many people associate very closely with Markdown today, that was only added by GitHubb flavored markdown, which is certainly widely used among the programming community, but not everyone is on GitHub, of course. And then you have things like table formatting or even like strike through really strike Through wasn’t defined in the original markdown specification either. And so you have markdown and then you have like GitHub flavored markdown, common mark is sort of this unifying effort remark down all these different is the markdown cinematic universe. I tried to make a joke about this. I had this joke ready for the markdown Cinematic universe when the last Marvel. Movie came out. But then like, it didn’t get nearly the traction in my timeline as the Dune did, perhaps understandably. So really, I’m just going to have to wait till the next movie comes out. It’s a real, real tragedy. No, but like, I guess you have this real pluralism of forms and it becomes very difficult to use markdown truly as a portable format because the way it renders in one editor or even parses can very much differ from editor to editor. So, Pandoc provides this format that essentially serves as an IR or intermediate representation between all these kinds of documents using a markdown supersets that somehow magically encapsulates everything.

00:30:18 - Speaker 2: And that includes not just markdown, but also like PDFs or Microsoft Word, that seems.

00:30:24 - Speaker 1: Well, so the way it works is it’s this compilation pipeline, I guess, that allows you to go from a markdown document.

It compiles it to PDF using PDF Lawtech or something. It outputs Lawtech, it outputs HTML various things, and you can think of it as being this intermediate representation because you start with this like Word document, you can turn that into markdown and you can go from that markdown format into any of these output formats, which turns out to be like really powerful because the main issue with these kinds of conversions is that it’s often lossy, there are features that are supported by Law tech, for instance, that aren’t supported by the web natively, there are features that are part of like Word documents that aren’t necessarily supported by HTML and so on and so forth.

So Pandok serves this role of like basically saying, OK, what is an intermediate language that can encapsulate all the different implementations of the same concept across different input and output formats.

And what I think is so remarkable about it is that oftentimes when you are using an AP. of software and you’re like, oh darn, you know, now I need to support this other thing too. You quickly end up in a situation where you have the snowball and things start to feel tacked on.

So you’re like, Oh man, it’s very clear that they just glommed on this additional syntax for this feature. And with Pandok, everything feels like very principled in its inclusion. And at the same time, whenever I’m using Pandok and I’m like, darn, I really wish there was a construct that I could use to express this. particular thing, I look up in the documentation and it’s always supported. So, as one of my favorite examples, one of the output formats that Handok supports is various slideshow frameworks. So Beamer for people who use Lawtech and Reveal JS for people who use HTML and CSS and these slideshow frameworks basically allow you to replace something like PowerPoint, Keynote, Google slides with essentially like a text-based format. I really like doing slideshows in Pandock markdown. There are a few reasons for that. The first reason is that it’s really useful to be able to reuse some of the same content from like my blog post or essay even in the slideshow. There are some really minor and almost petty, but really significant reasons. Like, I like to have equations or code blocks with syntax highlighting in my slideshows, and there’s not really a good solution to putting like a syntax highlighted code block in Keynote right now.

00:32:39 - Speaker 2: Last I remembered, the gold standard at the Ruby conferences I used to frequent was to take screenshot of Textmate and paste that in.

00:32:47 - Speaker 1: Yeah, it’s awful. I don’t want to see your like monochai editor with like the weird background that contrasts weirdly with the slide background. I just, ah, and it doesn’t scale on a huge conference display anyway, I digress, but The other reason why I really like doing my slideshows in text is actually that there is often a hierarchical structure to my presentations, right? I’ll have like these main top level sections and then I’ll have subsections, and then I’ll have like sub subsections and all of these manifest and slides. But in the gooey thumbnail view of most of these existing Slideshow editors like PowerPoint or Google slides, it reduces it all to like this linear list. It’s like, here are all of your thumbnails in order. And it makes it very hard, as soon as I have like an hour-long conference talk, how do I like jump to this subsection that I know exists, aside from like scrolling past like 117 thumbnails and trying to find the right one, right? And moreover, let’s say I want to Reorder a certain part of the talk because I think it better fits the narrative structure. Now I have to like figure out which thumbnails I need to drag to which other place or worse, go into the individual slide, select the text from that, move that somewhere else, and it’s just way, way clunkier actually than reordering some text in like a bullet list outline in my editor.

And then the other part is that I was talking about how Pandok has really great support, expressive support for idioms of different formats, and one thing you often have in slideshows is that I have some element on the screen and then I press, you know, the next button again and then another element will appear.

So in Pandoc you can denote this with just like an ellipsis basically so like dot dot dot and then if I have a slide where I have a paragraph and then the dot dot dot and then another paragraph, it will render with just the first paragraph visible and then I press next and then like the subsequent paragraph comes in.

And that’s like just a very lightweight way to handle these stepped. Animations compared to going to the animation pane and then clicking the element that I want to animate in and so on and so forth.

So it started off with me being like, I’ll just prototype in this format, but then it ended up supporting columns, it supports all these things that you actually want. And I was like, this is in many ways a more ergonomic way to handle long technical slideshows. Anyway, I have to chill for Pandok anytime I talk about rich text, I’m contractually obligated to do so.

00:35:08 - Speaker 2: Yeah, it’s a great piece of software, use it here and there. I think I was doing some Asky doc kind of manuals many years ago and yeah, just in general, it’s also worth looking at the homepage that you mentioned the plot they have where it shows all the different formats that can convert between is quite fun. You click on that, you can zoom in.

00:35:26 - Speaker 1: Yeah, I had this really elaborate plan when I decided to go to Berkeley, that I was going to print out a door-sized poster of like that graph that shows all the formats they convert between and then show up at John McFarlane’s door and ask him to sign it. But then the pandemic interfered with some of those plans. Nonetheless, it remains on my list.

00:35:48 - Speaker 2: Good bucket list item, pretty unique one at that.

00:35:51 - Speaker 1: Also, I found my tweet, or I found the draft of my tweet, which is about eternals, and I said, directed by Chloe Zhao, the latest entry in the Markdown Cinematic Universe features an ensemble cast of multi markdown, GitHubb flavored markdown, PHP Markdown Extra, R Markdown, and Common Mark as they joined forces in battle against mankind’s ancient enemy, Doc X. Nice.

00:36:12 - Speaker 2: Wow. You would have gotten the like from me.

00:36:16 - Speaker 1: Yeah, we’ll see if it ever sees the light of Twitter.com.

00:36:20 - Speaker 2: You briefly mentioned there equations and La tech, and maybe that’s a good chance to talk about the equation project you did for notion. And part of what I thought was so interesting or what I think in general is interesting about equations is that they are obviously an extremely important symbolic format, but in many ways extremely different from the pros we’ve been talking about.

So English or other languages, even languages that are right to left or something like that, they all have the same kind of basic flow and the way that we represent sound. So with these little squiggly symbols, even though the symbols themselves and sounds vary and how we put them together into words across languages, that’s a common thing. If you go to the mathematical realm, you have symbolic representation, but equations are the whole own beast, and I think one that has gotten a lot less attention from kind of the software and editing world. So tell us about that rabbit hole.

00:37:16 - Speaker 1: Yeah, so just as context for people, notion and many other applications actually have long supported block equations, an equation that basically takes up, you know, most of the page horizontally.

What is much more uncommon in editors is support for inline equations and so this can be something as simple as saying, You want to type let X be a variable, and X should be formatted or stylized mathematically.

Being able to refer to elements of a block level equation in inline text is a prerequisite for being able to do any kind of serious mathematical writing, yet because this is kind of this niche area that has historically been the purview of Overleaf and other law tech editors, it’s really not implemented.

In most editors.

So I pushed really hard to add inline equations and inline math to notion, because I was like, there’s a huge opportunity for people to write scientific or mathematical documents that take advantage of all of notion’s other features like being able to embed FIMA or embed illustrations and things like that, right? So, it turns out that it’s kind of difficult, exactly as you’re describing to do this equation format.

There’s been very little innovation and research more generally into what is like a good interface for inputting equations. So I think most people Probably familiar with Microsoft Word or Excel have these equation editors, or even like operating system level sometimes where you basically like open this palette, and there is a preview and there is a button for every possible mathematical symbol or operator you can imagine. And then for composite symbols like the fraction bar or integral or something like that, you find the button for that, you click it, and then you click into like the little subboxes and then you find whatever symbol you want and you put those there too. So it’s kind of a structured editor, but like in an unimaginably cumbersome interface. This is what I used to do my lab reports in high school, for example. And then at the other end of the spectrum, you have things like law tech. Law tech is basically how everyone in at least in computer science and mathematics chooses to typeset their work, typesets complex mathematics. One of the real selling points of law tech, I think is that It turns out that operator spacing is really important, and there’s a big difference between, say, a dash that’s used like a hyphen or a dash character that’s used in text, and a hyphen or a dash character that’s used as a minus sign in an equation, the spacing is subtly different.

And one of the big things that Lawtech does is it basically allows you to declare certain operations in certain contexts as like a math operator versus just a symbol versus just like a tagged group of characters, and it correctly handles the spacing depending on what kinds of characters are around the operator in question. And so Lawtech basically produces really nice looking mathematics at the cost of this markdown which looks like I kind of smashed my keyboard that only had like 3 characters. It’s the exact opposite of the equation editors instead of having a button for every imaginable character, you only have 3 buttons. The buttons are backslash, open curly brace, and closed curly brace, and somehow like permuting those characters is supposed to get you like any possible mathematical outfit. There’s just two ends of the spectrum.

00:40:41 - Speaker 3: Yeah, I used to do my analysis homework in college in law tech, and I remember when I first looked up how you would input in law tech these formulas, like, that can’t be right. This is not the best way in the world to do this. In fact, that’s it, that’s the one and only way.

00:40:53 - Speaker 1: It really is, it’s terrifying. It’s the one and only way and the wild part is there are people who are like super, super good at law tech. They can like live tech their lecture notes. I was never nearly like that fast, but some people can do it usually with extensive use of macros, which macros are another selling point of law tech as you can define these kind of custom shorthand for operators you use a lot. But anyway, yeah, so you have a lot of tech sort of at the other end of the spectrum, like really quite unreadable, oftentimes, like, it’s like a right only format, many times.

00:41:23 - Speaker 2: And of a regular expressions come to mind on that as well, yeah.

00:41:26 - Speaker 1: It’s exactly the same zeitgeist, I think. It turns out that figuring out how to have like a combination, gooey, plain text interface that allows you to be like in a rich text editor like notion, then. into an inline equation field to have like an inline symbol and then go back into the GUI editor was like just very unexplored territory.

And it kind of makes sense that lots of people don’t prioritize this because many people that notion rightfully had the question like, oh, is this something we should be working on? But first of all, it turned out that if you actually tallied up like our user requests, inline math was like near the top.

Of editor feature based requests. And then more generally, it turns out that because this is like a prerequisite for many researchers and for students, you can get a lot of people on your platform who rely on it, you know, as a student to take notes and something like that, because there’s literally no alternative. And then they are able to stick around and use the platform for all kinds of other things.

So this is just kind of a plug that more editors should implement this.

But Yeah, I thought that this project was really interesting because in the interaction paradigm, you want to capture a lot of the things that are very fluid about editing regular text. So for instance, we knew it was important that you should be able to use the arrow keys to move left and right, kind of straight through a token without editing it if you wanted, or if you wanted to be able to go. Into a token and edit it using the arrow keys, you shouldn’t have to like use the mouse to click, although, of course, you should also be able to use the mouse to click. And when you have this formatted equation, we made the decision that the rendered equation would be represented as this atomic token. So if you were highlighting text to copy and paste and move around, it would be like highlighting a single character that would just be like the whole equation. But of course, you could go in and edit the equation. Any way you want it in kind of this pop up text editing interface.

I think another thing that’s the subtle interface challenge here is that like Mark was saying, there is often a Uh, disproportionately large number of characters used to represent the equivalent of like one character with a formatted output. And so that’s something you don’t really take into account. The output is like X with a hat in San Sara font, and then there’s like 25 characters of markup that goes into that, and you just need to like scale the interface appropriately to take that into account.

But I think that it’s really interesting because It shows the power of combining different input and output formats in like the same atom, right? So you have like a single line of text, and you want to have rich text that’s formatted and stylized and so on, hyperlinks, and then also equations or whatever inline rendered output of another input format that you have. I think that that’s really where GUI editors and whizzy wig editors can shine

← All episodes of Metamuse