Issue #37 – How to build a better web browser
By Scott Berkun, December 2004
Web browsers are funny things. On the one hand, they’re supposed to be lightweight little programs that just let you view websites, and on the other, they carry the same burdens as operating systems and application suites, trying to provide everything to everyone. Here in this little essay I explain what I know about designing browsers. I’m in the lucky minority of people that have actually designed successful browsers, or parts of them, for any length of time, and with Firefox and Opera in the headlines, and the art of browser design becoming important again, I thought I’d write down some of what I know. It’s been years since I was a program manager on the Internet Explorer project, but I’ve maintained interests in the design of navigation and searching systems of all kinds: what follows is a rough summary of what I’ve learned.
Understanding what people do with browsers
For all kinds of application design, the most common mistake is to jump right in and start adding features to whatever it is you already have (or have copied from someone else). For various reasons this doesn’t work: Big piles of hit and miss features, chosen thoughtlessly, are less desirable than small piles of good features, chosen carefully. Consider this: what makes a good feature? It’s not an abstract quality: goodness means a problem is solved for a user. If you don’t spend some time considering who these people are and what they’re doing, odds are slim you’ll find features that matter much: unless of course you’re building the browser exclusively for people that look, smell and think like you do.
So the first thing to do is step back and build a model for how browsers are used. User research and design analysis allow for complex models, which, depending on the kinds of questions you want answers to, may be entirely worthwhile. However, for the sake of this essay I’m providing a super simple model to use. In this little model everything people do with browsers falls into three categories:
Reading. It’s the ability to read information off the screen, through the browser, that is the largest percentage of human activity with browsers. Think about it: the reason for all of that searching, clicking and bookmarking is to find things worth reading (or perhaps skimming, sending to friends, or printing out). If you were to take a stopwatch and measure how much time most people spend doing various things with their browser, reading or skimming pages would be at the top of the list. There is a reason that the page itself is the largest part of the UI of any web browser. Scroll bars get more usage than most other browser features combined.
Navigating. Since reading doesn’t require much interaction from the browser, most browsers build their structure around moving between pages. Hypertext systems are based on the ability to navigate, both through clicking links, and using other browser tools (Bookmarks, searches). A good browser makes it easy not only to move from one page or site to another, but also makes it easy to return to things the user has already seen (e.g. History) and things they have deemed as important or interesting (Bookmarks).
Interacting. Any time the user has to put information into the browser (URLs, passwords, credit card numbers, naked pictures of their neighbors), they are interacting with the browser or a website. Within the browser itself, this could be setting preferences, or adding/managing bookmarks. With websites it could be creating email, chatting with friends, or doing any of the other amazing things websites do.
Now in this basic model, it’s safe to assume that people with different levels of experience will read, navigate and interact with a browser in different ways. More advanced users (which probably includes you since you looked for an essay about designing browsers) combine these different activities in all kinds of complex and near-ADD ways. Less savvy users, or people on low bandwidth connections, will tend to work on one thing at a time, and their pace of activity will be much slower. But in most cases the model still holds: even for RSS and blog junkies who consume and produce volumes of information hourly, all their behavior can be broken down into reading, navigating, and interacting. So while I’d love to write more about models of web usage, I need to get on with the rest of the essay.
Understanding how browsers are used
Building on our little model, there are at least two piles of research that have direct implications for making better browsers. One is David Abrams’ study on bookmarks, the other is Linda Tauscher’s study on history and navigation. There are many other studies I’ve seen in the same research space, but I always find myself wandering back to these two. Although somewhat dated (late 90s), they point out many important observations about how browsers are used to navigate through the web.
The basic gleanings from the research:
- People have repeating patterns of web usage. Although the web is huge, most people visit very small parts of it more than once, and even fewer parts of it on a regular (daily/weekly) basis. This means browser usage is semi-deterministic: looking at a user’s history can be highly predictive about future behavior. A smart browser should take advantage of this, and make it easier to go to places users return to often.
- The rate at which people change the pile of sites they visit is slow. The accumulation of large numbers of bookmarks/favorites takes most people a long time. Many people have fairly small numbers of favorites, between 10 and 50 (this may have changed since research was performed).
- Bookmarks serve different purposes, despite how indifferent most bookmark UIs are to them. Some bookmarks are references for a problem to be solved later. Some mark interesting or curious things. Some bookmarks are much more important than others, have different probabilities of reuse, and will be used in different ways.
- People use different triggers to get back to different websites. Sometimes it might be the name of the site. Sometimes it might be the title of the essay they were reading. Other times it might be the time of day they were looking for it, what question they were trying to answer, or simply the location of the link in a search result page for a particular query. Additionally, people have varying levels of ability to predict which of the things they’ve bookmarked will be used in an hour, a week, or ever.
So by applying our model (Reading, Navigating, Interacting) and our short list of gleanings, there are all sorts of things that can be done to browsers to make them better. The rest of the essay is about those things, so if you were wondering where the list of features is, it starts now.
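To make the "repeating patterns" point concrete, here is a minimal sketch of how a browser might measure it from its own history. The function names, the per-day log layout, and the example hosts are all assumptions made up for illustration; neither study is being reproduced here.

```python
def revisit_share(visits):
    """Fraction of visits that go to a URL already seen earlier in the log."""
    seen = set()
    revisits = 0
    for url in visits:
        if url in seen:
            revisits += 1
        seen.add(url)
    return revisits / len(visits) if visits else 0.0


def regulars(days, min_days):
    """URLs visited on at least min_days distinct days, sorted by name."""
    day_counts = {}
    for day in days:
        for url in set(day):  # count each URL at most once per day
            day_counts[url] = day_counts.get(url, 0) + 1
    return sorted(url for url, count in day_counts.items() if count >= min_days)


# Hypothetical history: one list of visited hosts per day, oldest first.
days = [
    ["news.example", "mail.example", "news.example"],
    ["mail.example", "blog.example"],
    ["mail.example", "news.example", "shop.example"],
]
all_visits = [url for day in days for url in day]

print(revisit_share(all_visits))  # share of visits that were repeat visits
print(regulars(days, min_days=3))  # the small core visited every day
```

A high revisit share and a short list of regulars are the signals a browser could use to decide which places deserve easier access.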
All of that research implies that bookmarks have tons of un-harvested value for users. So it’s not surprising that one of my favorite things about Firefox is that it allows searching of bookmarks. (Fun tangent: searching favorites was one of several features related to favorites that I failed to get into IE4 and IE5, and I’m happy to see someone else got it done. We did get searching history into IE4, based on the bet that history had more valuable data than favorites (aka bookmarks) did. The bet was that since favorites were all of the places people thought they went often, but history was the list of places people actually went to, the data in history would tend to be more valuable. Sadly, I never collected on the bet: when I left the IE5 team to work on Windows I also gave up the time to research these things (or to beg others to research these things).)
But even with searching, favorites are dumb as rocks. I think there was a Netscape feature called smart bookmarks eons ago, but it wasn’t that smart either (I believe it was part of their push functionality). What I’m talking about is this: The browser should use several easy to collect kinds of data to improve the user’s ability to get value out of bookmarks. So maybe it’s not smart bookmarks, it’s value-added bookmarks. Here are some examples:
- Frequency of visitation. Sounds like part of a prison sentence, doesn’t it? What I mean is that every time I go to a url (regardless of how I get there) the system should add to a counter for that bookmark. This allows me to sort favorites by frequency of use. It also allows the system to know when I’m returning to a particularly important place, and perhaps behave slightly differently while I’m there (think better/smarter caching, or other non-invasive smartness).
- Basic meta data. Searching bookmarks today only hits on the page titles. There is decent metadata on web pages that should be recorded as part of the favorite: Titles, keywords, descriptions, etc. Or I should have the option to add my own commentary about the page at the moment I’m creating a bookmark for it (e.g. an optional part of the add bookmark dialog box). Then I can put in my own meta data about the URL, including any trigger words or thoughts about why I, the strange and ridiculous human being that I am, am choosing to bookmark the definition of the word eudemonic, or some other obscure thing I’m likely to forget the reason for later on. Then any searches or later interaction with the bookmark now benefit from my own context.
- Personal usage and trend analysis. There are many tools for webmasters that allow analysis of how people use their site. There’s no reason the opposite kind of intelligence can’t be built into the browsers themselves: analysis of how Fred uses his own web browser. The history functionality of most browsers logs most of their users’ behavior: so why not mine that data for the user’s advantage? Are there certain patterns of usage I have, that if I recognized them, would be of use to me? Do I start 80% of browser sessions by typing in www.google.com, even though my home page is something else? Are there certain pages I hit every hour? Every day? Every Friday at 2pm? I’m not suggesting that users get hit over the head with analysis tools: instead I’m suggesting that the browser can do analysis in the background on my behalf, and provide me reports or suggestions for optimizations if I desire them. Right now that data stream of user behavior is entirely ignored. Privacy risks aside, I bet I could find ways to optimize my own browser usage if I could see a medium level detail analysis of a week’s worth of browsing.
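The first two ideas in the list above (a visit counter and richer metadata) are small enough to sketch. What follows is only an illustration under assumptions: the `Bookmark` and `BookmarkStore` names, their fields, and the example URLs are hypothetical, not how any shipping browser stores favorites.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    url: str
    title: str
    keywords: list = field(default_factory=list)  # taken from the page's metadata
    note: str = ""   # the user's own commentary, typed when bookmarking
    visits: int = 0  # bumped on every visit, however the user got there


class BookmarkStore:
    def __init__(self):
        self._by_url = {}

    def add(self, bookmark):
        self._by_url[bookmark.url] = bookmark

    def record_visit(self, url):
        """Call on every navigation; only bookmarked URLs are counted."""
        bookmark = self._by_url.get(url)
        if bookmark is not None:
            bookmark.visits += 1

    def by_frequency(self):
        """Bookmarks sorted from most visited to least visited."""
        return sorted(self._by_url.values(), key=lambda b: b.visits, reverse=True)

    def search(self, query):
        """Match the query against title, keywords, and the user's note."""
        q = query.lower()
        return [
            b for b in self._by_url.values()
            if q in b.title.lower()
            or q in b.note.lower()
            or any(q in k.lower() for k in b.keywords)
        ]


store = BookmarkStore()
store.add(Bookmark("https://dictionary.example/eudemonic", "eudemonic - definition",
                   note="word for the happiness essay"))
store.add(Bookmark("https://news.example/", "Example News", keywords=["headlines"]))

for _ in range(3):
    store.record_visit("https://news.example/")
store.record_visit("https://dictionary.example/eudemonic")
store.record_visit("https://not-bookmarked.example/")  # ignored by the counter

print([b.title for b in store.by_frequency()])
print([b.url for b in store.search("happiness")])  # found via the note, not the title
```

The point of the note field is the last line: a search for "happiness" finds the dictionary page even though that word appears nowhere in its title.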
Intelligent bookmark management
Since the days of Mosaic and Spyglass, the management of bookmarks has been a dead end of UI design. Every browser I’ve ever seen either sees bookmark management as a file system type task, and gives you a file system view, or as a hierarchical sorting task, giving you a hierarchical view. Both are limited for the same reason: manually reorganizing bookmarks is not fun. There has never been an “organize your bookmarks” party or drinking game. People don’t want to do it, and unless it’s easy to do, most people won’t.
The smart way to manage bookmarks is to have the system do it for me. Three pieces of data can be used to make decisions about bookmarks: How often I visit things, which links are broken or have moved, and how many categories I have that have the same items in them. By scanning bookmarks and capturing these 3 piles of information, a whole bunch of smart decisions can be made by the browser on my behalf, helping me to organize them. Call it a wizard, call it a “fix my bookmarks” menu command, but the functionality is the same, and people would like it. In some ways this mimics my desire for a “take out the trash” or “do my laundry” command: it should be possible for the system to take over mundane, tedious things for me.
For example, the 5 or 10 URLs I visit most often, whether they’re in my bookmarks or not, should be pushed into my quick link bar. I should never have to manually configure that thing at all if I don’t want to. Second, any URLs that are dead should be deleted, or moved to a folder of dead links that I can try to revive. Lastly, for duplicate bookmarks, I should get the choice for how to merge them together into folders, or eliminate the duplication (Yes, this would require some heinous looking UI: duplication confirmation UI, like diff tools, always does). It wouldn’t work for all cases, but in some common situations folders with the same or similar titles can be combined together, and duplicates removed.
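The three steps of a “fix my bookmarks” command can be sketched roughly as follows. Everything here is an assumption for illustration: bookmarks are modeled as a dict of folder name to URL list, and `is_alive` is a hypothetical caller-supplied check standing in for the network request (with redirect and timeout handling) a real browser would need.

```python
from collections import Counter, defaultdict


def quick_links(visit_log, n=5):
    """The n most-visited URLs, bookmarked or not, for the quick link bar."""
    return [url for url, _ in Counter(visit_log).most_common(n)]


def split_dead(bookmarks, is_alive):
    """Separate live bookmarks from dead ones rather than deleting outright."""
    live, dead = {}, []
    for folder, urls in bookmarks.items():
        live[folder] = []
        for url in urls:
            if is_alive(url):
                live[folder].append(url)
            else:
                dead.append(url)  # candidates for a "dead links" folder
    return live, dead


def duplicates(bookmarks):
    """Map each URL filed in more than one folder to the folders holding it."""
    folders = defaultdict(list)
    for folder, urls in bookmarks.items():
        for url in urls:
            folders[url].append(folder)
    return {url: names for url, names in folders.items() if len(names) > 1}


# Hypothetical data.
bookmarks = {
    "Travel": ["https://hotels.example/", "https://gone.example/"],
    "Vacation": ["https://hotels.example/", "https://maps.example/"],
}
visit_log = (["https://news.example/"] * 3
             + ["https://hotels.example/"] * 2
             + ["https://maps.example/"])

top = quick_links(visit_log, n=2)
live, dead = split_dead(bookmarks, is_alive=lambda url: "gone" not in url)
dupes = duplicates(bookmarks)
```

The duplicate report only finds the overlap; deciding whether “Travel” and “Vacation” should be merged is still a question for the user, which is where the ugly confirmation UI comes in.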
Good side-bars and bad side-bars
When I worked on the design of the explorer bars, the side bar thingies in IE4 & IE5, there was a big theory that I used (a theory well seeded by Steve Capps and Walter Smith) at work for why these things were worth building, and why they were better for some tasks than others. The idea was to capture the hub and spoke movement between websites that often occurred, where you’d be viewing a list of links that you needed to go through sequentially. Search results, being as lousy as they were then, were perfect candidates for this kind of user behavior. So the first bar we prototyped and experimented with was for search, and it worked well (That is, for users. The search engines hated it: when your business is based on advertising, the last thing you want is for your screen real estate to drop by 80%). These days, search results are so much better, so the search sidebar might be used less for hub and spoke behavior, yet the fundamental theory is still sound. Whenever you have a list of ten links you want to view one at a time, a separate list of them that isn’t lost when you navigate makes sense.
The sidebar, explorer-bar, or whatever you want to call things carved out of the sides of browsers, earn their space in two ways: either the user needs to jump between several links from a list of links sufficiently often to dedicate space to it, or the information shown in the sidebar is important enough to reduce the space available to the page being viewed. That said, there are tons of different side bars built for IE5. In fact there are even tools that let you build your own, making use of the APIs IE5 provides. Most bars I’ve seen are mediocre, mostly because they don’t earn their space, and weren’t designed with the theory in mind (or were designed in spite of it, since these browser add-ons are often seen by ISVs as a promotion vehicle for their existing software, more than a smart way to enhance the user’s browsing experience).
For advanced users, sidebars should be user createable. When I hit a meaty page of links I want to view in hub/spoke fashion, I should be able to convert the page into a sidebar unit, with the browser chucking out all of the formatting, and just giving me a list of links I can navigate from (Opera does this today, as show links. It’s a powerful little tool). For less savvy users, the sidebar is unlikely to be worth the overhead of managing it (turning on, turning off, resizing, etc.). History and Favorites are both good choices for default sidebars since they justify the use of space, and given the research mentioned earlier, the lists they provide are places people will, in theory, return to often.
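Turning a page into a plain list of links is mostly a matter of throwing away everything except the anchors. Here is a minimal sketch using Python’s standard `html.parser`; the `LinkLister` class, the `page_to_sidebar` function, and the sample markup are hypothetical, and this is not a description of how Opera implements its feature.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkLister(HTMLParser):
    """Collect (label, absolute URL) pairs for every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # URL of the link currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace; fall back to the URL for image-only links.
            label = " ".join("".join(self._text).split()) or self._href
            self.links.append((label, self._href))
            self._href = None


def page_to_sidebar(html, base_url):
    """Strip all formatting and return just the page's links, in order."""
    parser = LinkLister(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical search results page.
html = (
    '<h1>Results</h1><ul>'
    '<li><a href="/a"><b>First</b> result</a></li>'
    '<li><a href="https://other.example/b">Second</a></li>'
    '<li><a name="x">anchor only</a></li>'
    '</ul>'
)
links = page_to_sidebar(html, "https://search.example/q")
```

Relative links are resolved against the page’s own address so the list keeps working after the main pane has navigated somewhere else, which is the whole point of the hub and spoke sidebar.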
Supporting specialized tasks: Research & Annotations
One of the challenges of browser design is that the browser has to work like a Swiss Army knife: it is used in many different ways to do many different kinds of things. Building one piece of software that does them all equally well is impossible, and given that the market goal for major browsers is market share, browsers tend to have general purpose UI designs, and not highly specialized or optimized designs for specific kinds of tasks.
So the challenge in making a better mass market browser is to find the pockets of common tasks that many people perform and find ways to support those tasks without messing up the basic/core usage of the product.
One of these common high level tasks is doing research: either for a product to purchase, a place to go on vacation, or various kinds of information gathering for work related projects. This kind of research often involves switching between two or more web sites, and compiling notes along the way. Even with tabbed browsing, doing research effectively requires using a word processor or some other app to write things down. Worse, there’s no way to preserve the browser state between sessions: so a complex arrangement of different sources can’t be easily preserved.
To support this kind of research, browsers should provide two things: basic project and annotation functionality. It should be possible to save the state of the browser, tabs, pages and all, and return to it at will (Opera does this reasonably well with its save/open session commands). Then a student writing a paper, or a programmer planning a vacation, can stop their research whenever they want, and return to it in exactly the same state. The only UI necessary for this is a menu command to save the project state, and another one to recover it. Anyone that doesn’t work this way is largely unaffected.
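The save and recover commands amount to writing the browser state to a file and reading it back. A minimal sketch, assuming the state can be described as a list of windows, each with tabs, URLs, and scroll positions; the JSON layout, the field names, and the function names are all made up for illustration and do not reflect Opera’s session format.

```python
import json
import os
import tempfile


def save_session(windows, path):
    """Write the open windows, their tabs, and scroll positions to a file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"version": 1, "windows": windows}, f, indent=2)


def load_session(path):
    """Read a saved session back into the same structure."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["windows"]


# Hypothetical state for someone planning a vacation.
windows = [
    {
        "active_tab": 1,
        "tabs": [
            {"url": "https://hotels.example/seattle", "scroll_y": 0},
            {"url": "https://flights.example/search?to=SEA", "scroll_y": 420},
        ],
    }
]

path = os.path.join(tempfile.mkdtemp(), "vacation.session.json")
save_session(windows, path)
restored = load_session(path)
```

Keeping the scroll position and active tab alongside each URL is what separates “exactly the same state” from a plain list of bookmarks.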
The second part of this is some basic way to write commentary about the pages or websites in a project. This is a more difficult proposition: Finding the right place to make notes and creating the interaction model for it is messy. As mentioned above, the minimal (and probably crappy) way to do this would be to allow annotations of bookmarks. Then a user could type in their notes about a hotel or research paper into the bookmark, and work with a set of bookmarks for a given project. This would quickly become cumbersome since you want to see all of your comments at once: the world shouldn’t revolve around the bookmark system, it should revolve around doing research.
The next simplest answer is to provide a single place for users to write down whatever they want. That text is saved with the project file, and can be copied and pasted out for other purposes. Opera uses a sidebar to allow the creation of notes, and it’s a good start. But since this text box doesn’t handle links or basic html, it fills quickly, and feels clunky. As much as it sucks to flip between a browser and a word processor, that’s what I do whenever I’m doing researchy tasks (such as, say, writing an essay) regardless of what browser I’m using.
Credit cards, passwords and zip codes
I worked on a big database project years ago, and one of our big rules for the system was that people should only type things in once. If they typed in an address or an ISBN number, they should never be required to put it in again: ever. Typing sucks, and keeping track of long strings of random digits sucks worse. But sadly today, I know of many smart tech-head people who keep little text files on their desktops containing various passwords, credit card #s and other account information. Either they don’t trust their browsers, or their browsers don’t provide a convenient way to manage that information for them.
Firefox, Opera and IE all cover the basics for passwords. They detect form fields for username/pw combinations, and offer to remember them. Firefox provides better UI for reviewing the list of stored passwords (you can actually view them, unlike IE) but none of these browsers provides any support for credit cards. Way back during the IE4 betas there was a feature called Microsoft Wallet, that offered websites a payment API: sites could ask the browser for payment, and the user would be prompted to select a credit card from their “wallet”. For a bunch of reasons this feature was pulled from IE5, but the basic idea is sound: provide a standard secure way for me to pay for things on the web, and don’t ever require me to type in that 16 digit # again.
Lastly, the probable leader in the “things typed in most often” category is people’s zip codes. There’s no technical reason that a browser can’t send a user’s zip code as part of the HTTP header, informing any website of where the user currently is. It would allow weather and news sites to provide more useful information without the user having to do anything. The challenge is privacy and anonymity: to which the easy answer is to let users opt in: either assume it’s ok to give zip codes to sites I have an account with (since the browser knows that from the password list) or give me the global choice to broadcast that information all the time.
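The opt-in rule is simple enough to write down. In this sketch the header name `X-Zip-Code` is invented for illustration (no such standard header exists), and the function name and its parameters are likewise assumptions.

```python
from urllib.parse import urlsplit

ZIP_HEADER = "X-Zip-Code"  # hypothetical header name, not a standard


def location_headers(url, zip_code, share_always, sites_with_accounts):
    """Return the extra request headers allowed by the user's opt-in choice.

    The zip code is sent only if the user chose to broadcast it everywhere,
    or if the browser has stored a password for the site being requested.
    """
    host = urlsplit(url).hostname
    if zip_code and (share_always or host in sites_with_accounts):
        return {ZIP_HEADER: zip_code}
    return {}


accounts = {"weather.example"}  # hosts taken from the saved password list

known = location_headers("https://weather.example/today", "98101", False, accounts)
unknown = location_headers("https://ads.example/banner", "98101", False, accounts)
broadcast = location_headers("https://ads.example/banner", "98101", True, accounts)
```

The default here is to send nothing, so a site learns the zip code only after the user has made one of the two choices described above.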
Red herrings and over-rated concepts
Over the years there have been a bunch of ideas for browsers that come up every couple of years as the new sliced bread. Since for whatever reason we’ve put so much emphasis on these things, I’m going to explain why they have limited value towards making better browsers.
Security and Stability
Something is wrong if competition in any product line continually focuses on security and stability. These design attributes are basic requirements, not advanced features. You won’t see advertisements for toaster ovens that say “Now, it explodes less often!” So while viruses, hacks, and crashes are still a popular topic of discussion for software products, better browsers involve getting past these basic requirements. I fully admit this is easier said than done: writing software is different from building a fortress, and things like pop-up ads and spyware clearly establish that there are competitive economic forces working against consumers. However the goal of browser design should be to minimize the impact of these things, and move on to investing as much energy as possible towards actually improving people’s ability to use the web. I doubt it will happen any time soon, but I’d like to think browsers can reach the same safety/reliability standards as automobiles: advances in car safety/security/reliability do happen, but the baseline standard is high.
Applications and Platforms
The running joke about web browsers is that they require all of the investment of an OS, without any of the revenue OS’s provide. It’s a funny thing: any web programmer sees any web browser as a programming platform, not an app. But at the same time the rest of the planet sees the web browser, and most web sites, as just another kind of application. The conflict makes browser design tough: it’s impossible to invest in the end-user experience and the developer experience to everyone’s satisfaction (a burden consumer OS developers have). Hell, even if you were only trying to do one of those two things, you still wouldn’t be able to do it to everyone’s satisfaction.
It’s hubris to assume that any single platform development is going to revolutionize everything. Anyone who’s been paying attention for more than a year knows that change is much slower than we like to believe. It’s a tiny pocket of early adopters that dominate most insiders’ perception of the world, and the rate of technology migration out to even the mainstream tech sector, much less the mainstream business sector, isn’t that fast. So support for new protocols and standards may give programmers more power, but the rate at which they’ll be able to adopt it (despite what they say), and use it to help customers (despite what they say) will be slow and tenuous, if it happens at all. I’m all for progress in DOMs, improved HTTP type protocols, and better standards support: I just don’t see a dramatic reduction in crappy websites as a result of it. Websites have become complex bits of engineering, and the rate at which new platform enhancements will be adopted will continue to slow over time.
RSS / Push
The first time I hit a website that offered an RSS feed, and I looked at it, I was really confused. It was almost identical to any of the many Push technology formats of the late 1990s. Netscape and Microsoft had competing versions of XML files with name/value pairs, that were meant to store frequently updated meta data about web sites and web pages (See CDF, Channel Definition Format). Push technology imploded not because the technology didn’t work: in fact much of it worked ok. Instead it was that there just isn’t that much good content out there that most people care to be notified of every ten minutes. Worse, even when there was, there were only so many different ways to shove it into people’s faces. The screen only has so much real estate, and people only have so much attention to spend. Push technology and all of its trappings was the grand red herring of the entire browser war.
The rise of RSS, Atom and related formats (and the return of buzz about pushy web stuff) is directly tied to the rise of blogs. There are now enough people writing enough stuff within communities of shared interest that all of the push stuff makes more sense to a tiny bit more of the population (Any guesses at the percentage of web browser users on the planet that use RSS or Atom feeds in any way? 5%?). I know many people who seem happy with all of the blogs they can now digest through RSS in their own way: I’m very happy for them. But coming back around to our framework: reading, navigating, interacting. Push technology is a boost to navigation, if it succeeds in making it easier for me to find things I want to read. But if I subscribe to too many feeds, I have a new kind of navigation problem to deal with, that can’t be solved by a file format. More so, RSS and other push type systems don’t improve my ability to read the content, improve the quality of the content, or do anything to successfully automate the process of finding and abandoning feeds. Pointcast, the first popular push application, at least attempted to seek out smart things for me based on some set of preferences and smart filtering, but most of the feed readers I’ve seen haven’t even gotten that far yet. The real goal of push should be to improve the quality of things I read, not the volume.
Interest in content filtering comes in waves. It was huge when I worked on it for IE3, since the Communications Decency Act (1996) had everyone terrified. A couple of years later, no one knew what parental controls and filtering were. It’s come back a few times, and then faded a few times. I think it’s good that parents are interested in what their kids are doing, and that technology exists to help parents make and execute their choices. The problem is that parental controls aren’t really a technological problem: they’re a kind of security problem, where the weakest link is what tells the real story. Any 12 year old that can’t access porn at his home computer is going to do the simplest thing: hang out at the house of a friend whose parents don’t use parental controls. That’s what me and my friends did to look at Playboy magazines, and I doubt 12 year olds today are much different. Few technologies will ever stand up to the will of adolescents trying to do things they’re told they’re not allowed to do (See Peacefire.org).
There are many CS and HCI laboratories that research ways to represent information in two or three dimensions (3D). They make some of the best and coolest prototypes and demos you will ever see come out of university research. The problem is these projects rarely clarify, or hypothesize, about what kinds of human activities benefit from 3D, or even from complex 2D representations (E.g. would a VR word processor be anything but frustrating?). Most of the time these projects work the other way: find things that are interesting to represent in 3D or complex 2D, find data sets that can be represented and interacted with in 2D and 3D, and then work backwards to see what those systems are good for. This isn’t to criticize researchers: in some sense this is exactly how exploration works: you point in a direction and see how far you can go. But when it comes to human interaction, someone has to start with careful design thinking about normal human behavior with existing designs, and hypothesize about how 3D or new 2D representations will help people (other than by impressing them with how neat and interesting their browser could look). So the problem I’ve found with most visualization research is that it fails most of the basic constraints of browser design.
For example, where does the visualization go? Browsers are confined to framing the page, since as we noted earlier, most of the user’s time is spent reading or using the web page itself. People do not spend most of their time navigating around (Despite how most visualization research projects are designed). This means that any visualization must be one or more of: compact, transparent, interstitial, or easily movable: all things that work dead against 90% of the visualization and navigation prototypes I’ve ever seen. So, apologies. I’m a reformed optimist (and a sad skeptic) on this one. I spent so much time exploring, prototyping and playing with different visualization techniques, that as much as I’d like to see something that pays off, I think there are too many constraints in real browser usage for these things to make a big difference. The compact simplicity of a back button and a decent history list is just very hard to beat (though even the perennial “back button algorithm optimization debate” always seems worth exploring).
An annoying list of things that were almost covered in this essay
My notes for this essay outlasted me: there are lots of good ideas and problems to solve that didn’t make it into this essay. My list included such diverse elements as: ways to improve online reading, back/forward systems, collaboration, real annotations, questioning thumbnails, web searching revisited, cookie management, improving printing, and more. I list them here not specifically to annoy you, but in the hopes I’ll come back to this essay one day and finish it off, or maybe this list will spark someone else to one up this essay.
References and other stuff you’ll like if you liked this essay
Multimedia and Hypertext, Nielsen. For hypertext history buffs, this is a fantastic compilation of various hypertext systems through the 80s and early 90s. Many of the things seen in today’s browsers have a heritage to these early projects. More so, there are plenty of interesting ideas from these older projects that have yet to be explored in mainstream projects.
Literary Machines, Ted Nelson. If you are a browser builder, this book will blow your mind. Written well before the Internet, it expresses a vision for how hypertext can change the world. It’s a sometimes frustrating read (as hypertext people tend to think everything should be explained in hypertext), but has just enough wonderfully crazy ideas in it to justify the frustration.
Stuff I’ve seen, Microsoft Research: A research prototype that bets everything on recurrence and reuse. It provides one interface for searching email, web and other stores of things you’ve already looked at. The MSR group has done several interesting navigation based research projects, including data mountain. See the HCI and Visualization listings on the MSR site. Lots of screenshots, prototypes and papers.
The magic of reading, By Bill Hill. Provides a framework for thinking about reading, both online and off. You need to install Microsoft Reader to read this (Note: I couldn’t find any other version of this very fine research paper. This is so diametrically opposed to the higher level idea of making things easy to read that I won’t even make fun of it).