
GoHyper: A new kind of web research tool (original German title: "GoHyper: Ein neuartiges Recherchewerkzeug") [Bachelor thesis]

The code for this thesis is on GitHub, and a rough project plan can be found here.


GoHyper is a plug-in for Firefox that lets you easily create and show annotated hyperlinks between arbitrary existing HTML web pages. Such a mechanism is a powerful tool for researching and understanding complex topics on the web. (This page is in English to make it easier to provide English documentation for GoHyper later.)


User Manual

GoHyper Overview

GoHyper is a tool that allows you (as a reader of other people's web pages) to add hyperlinks between arbitrary sections of text on arbitrary web pages with minimal effort and then navigate the resulting personal web of findings. This capability acts as a powerful "brain extender" when you research a complex topic on the web: it lets you record your findings very quickly without getting overwhelmed by the number of small insights you are having.

It works like this:
  • You find a relevant statement (called a "quote", e.g. two sentences) on a web page A, mark it with the mouse, add it to your GoHyper Zettel, and perhaps add tags and a comment.
    (Think of the Zettel as a notepad or large clipboard)
  • Once there are quotes on your Zettel that you want to connect, open the Zettel, select the quotes, select the relationship kind (there are several), and perhaps add tags or a comment.
  • Click "Connect" to create a new, personal hyperlink between the quotes.
  • When you now view web page A, your quote will be highlighted and an annotation will appear at its end. Through this annotation you can see the tags and comment or jump to the linked quote.
  • You can also directly browse through your personal GoHyper database.

Installation in the web browser

Firefox:
  • Open the menu, select "Add-ons" ("Tools-->Add-ons" in older versions of Firefox)
  • Search for "GoHyper" via the search box at the top
  • Find GoHyper in the results list
  • Click "Install"

Browsers other than Firefox are currently not supported. We may provide an implementation for Chrome later. We will not provide an implementation for Internet Explorer.

Collecting quotes on the Zettel

The Zettel (German for slip of paper or chit of paper) is your main work area when you collect quotes during research. The Zettel is where quotes live that you have found but not yet connected to other quotes.

  • Find a relevant quote on a web page A
  • Mark it with the mouse (text selection)
  • Open the context menu and select "Add to GoHyper Zettel..." and perhaps add tags and a comment.
    (Think of the Zettel as a multi-item clipboard. Tags are keywords that help you find things once you have a large collection.)
  • Click "OK". The web address A and the quote are now stored on your Zettel along with a time stamp, your tags, and your comment.
  • Alternatively, select "Quick-add to GoHyper Zettel" on the context menu if you do not intend to add tags or a comment: The "Add to Zettel" dialog will not appear.
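
To make the stored data concrete, here is a minimal sketch of what a Zettel entry and the two ways of adding it might look like. The names `ZettelQuote`, `Zettel`, `add`, `quickAdd`, and `list` are hypothetical and not taken from the GoHyper code base; only the stored fields (web address, quote, time stamp, tags, comment) come from the description above.

```typescript
// Hypothetical shape of one Zettel entry; all names are assumptions.
interface ZettelQuote {
  url: string;      // address of the web page the quote was found on
  text: string;     // the marked text itself
  addedAt: Date;    // time stamp of when the quote was put on the Zettel
  tags: string[];   // optional keywords
  comment: string;  // optional free-text comment
}

class Zettel {
  private quotes: ZettelQuote[] = [];

  // Corresponds to "Add to GoHyper Zettel...": store the quote with tags and a comment.
  add(url: string, text: string, tags: string[] = [], comment: string = "",
      now: Date = new Date()): ZettelQuote {
    const quote: ZettelQuote = { url, text, addedAt: now, tags: tags.slice(), comment };
    this.quotes.push(quote);
    return quote;
  }

  // Corresponds to "Quick-add to GoHyper Zettel": no tags, no comment, no dialog.
  quickAdd(url: string, text: string, now: Date = new Date()): ZettelQuote {
    return this.add(url, text, [], "", now);
  }

  // Returns a copy of the entries, most recently added first.
  list(): ZettelQuote[] {
    return this.quotes.slice().sort((a, b) => b.addedAt.getTime() - a.addedAt.getTime());
  }
}
```

The explicit `now` parameter is only there to make the sketch easy to test; a real add-on would presumably just use the current time.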

Creating links from the Zettel

Once you have quotes on your Zettel, you can use the Zettel to create links:

  • Open the Zettel view via the Alt-F11 hotkey or the GoHyper icon on the toolbar.
  • Items (quotes) on the Zettel are sorted in chronological order, newest first.
  • Select one quote. This quote becomes the FROM quote (the origin of the link(s) to be created) and will be marked accordingly (in red).
  • Select a second quote. This quote becomes a TO quote (the target of the link to be created) and will be marked accordingly (in green).
  • If you want to create multiple links at once, you can now select additional quotes. These all become TO quotes as well and will be linked from the same FROM quote.
  • Select a link type from the list of types in the selection box at the bottom [!!!this list is preliminary and should probably be improved!!!]:
    • see-also: no particular meaning (a "normal" hyperlink)
    • is-similar-to: the FROM and the TO quote have similar content
    • is-different-from: the FROM and the TO quote contradict each other
    • is-detail-for: FROM elaborates on something mentioned in TO
    • is-background-for: FROM explains something that is useful when trying to understand TO [!!!is-background-for is difficult to discriminate from is-detail-for. Should we use only one of these?!!!]
    • group: FROM and all TOs have something in common that turns them all into a group
  • Of these types, 'group' is special because it does not distinguish between FROM and TO: If you have marked multiple TO quotes, the same type of link will be created not only from FROM to each TO but also from any TO to any other TO. [!!!but how can we add to the group later? It needs a name, probably via a tag. Too complicated?!!!]
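
The linking rules above could be sketched roughly as follows. This is an illustration only: the function `createLinks`, the `Link` record, and the use of plain string IDs for quotes are assumptions. Since 'group' does not distinguish direction, the sketch creates one link per unordered pair of quotes; whether GoHyper should store one or two links per pair is left open by the text.

```typescript
// The link types listed above (the list itself is marked as preliminary).
type LinkType =
  | "see-also" | "is-similar-to" | "is-different-from"
  | "is-detail-for" | "is-background-for" | "group";

// Hypothetical link record; quotes are referred to by assumed string IDs.
interface Link {
  from: string;
  to: string;
  type: LinkType;
}

// Creates one link from the FROM quote to each TO quote.
// For 'group', every pair of TO quotes is linked as well.
function createLinks(from: string, tos: string[], type: LinkType): Link[] {
  const links: Link[] = tos.map((to) => ({ from, to, type }));
  if (type === "group") {
    tos.forEach((a, i) => {
      tos.slice(i + 1).forEach((b) => {
        links.push({ from: a, to: b, type });
      });
    });
  }
  return links;
}
```

For example, connecting one FROM quote with three TO quotes yields three links for 'see-also' but six links for 'group' (three from FROM plus three among the TO quotes).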

Using the GoHyper-augmented personal web

As long as GoHyper is active, it will consult its internal links database each time you visit a web page. If the URL of that page is in the database, GoHyper will locate the respective quote (or quotes) on it, highlight each one, and add a hyperlink annotation at the end of each.

The highlighting consists simply of a different background color: Same hue, but 15% darker for light backgrounds or 15% lighter for dark ones.
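
A minimal sketch of such a color computation, assuming colors are handled in HSL and that "15%" means 15 percentage points of HSL lightness (the text does not say whether the change is absolute or relative). The names `Hsl`, `highlightColor`, and `toCss`, as well as the 50% threshold between "light" and "dark", are hypothetical.

```typescript
// Color in HSL: h in degrees, s and l in percent (0..100).
interface Hsl {
  h: number;
  s: number;
  l: number;
}

// Keeps hue and saturation; shifts lightness by 15 points away from the extreme.
// Assumption: backgrounds with lightness >= 50 count as "light".
function highlightColor(background: Hsl): Hsl {
  const shift = background.l >= 50 ? -15 : 15;
  const l = Math.min(100, Math.max(0, background.l + shift));
  return { h: background.h, s: background.s, l };
}

// Formats the color as a CSS hsl() value.
function toCss(color: Hsl): string {
  return `hsl(${color.h}, ${color.s}%, ${color.l}%)`;
}
```

With these assumptions, a white background (`l = 100`) gets a highlight of `hsl(0, 0%, 85%)`, and a dark background with `l = 10` gets `l = 25`.
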
The hyperlink annotation is
  • a compact summary of the (possibly many) hyperlinks that are attached to this particular quote
  • a way to look at the quote's metadata (tags, comment)
  • a way to look at the hyperlinks' metadata (target URL, link type, tags, comment)
  • a way to use one of the hyperlinks

Here is an example of what this might look like:

This is what could be some initial text on the page in question (a page containing at least one quote). This sentence is what could be the first part (sentence) of the quote. And this sentence might represent the quote's remainder, long enough to make the quote cover more than one line.{3● 1≈ 2≉ 5⊡ 6▨} And this is finally the subsequent text on the page in question.

The above annotation means the quote has
  • 3 incoming or outgoing see-also links. Incoming means this quote is the TO end; outgoing means this quote is the FROM end, in which case the link is a plain, normal hyperlink. The difference is not shown in the annotation, only in the subsequent details view.
  • 1 is-similar-to link (for these, incoming and outgoing make no difference anyway)
  • 2 is-different-from links (for these, incoming and outgoing make no difference either)
  • 5 incoming or outgoing is-detail-for links
  • 6 incoming or outgoing is-background-for links
and is of course somewhat unrealistic; real annotations will usually be much simpler.
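
As an illustration of how such a compact summary could be derived from the links attached to a quote, here is a small sketch. The function name `annotationSummary` and the table `SYMBOLS` are hypothetical; the symbols are the ones used in the example above. The example shows no symbol for 'group' links, so this sketch simply leaves out any link type it has no symbol for.

```typescript
// Symbols per link type, in the order used in the example annotation above.
const SYMBOLS: ReadonlyArray<[string, string]> = [
  ["see-also", "●"],
  ["is-similar-to", "≈"],
  ["is-different-from", "≉"],
  ["is-detail-for", "⊡"],
  ["is-background-for", "▨"],
];

// Takes the types of all links attached to one quote (incoming or outgoing)
// and returns a summary such as "{3● 1≈}". Types without a symbol are skipped.
function annotationSummary(linkTypes: string[]): string {
  const counts: Record<string, number> = {};
  for (const t of linkTypes) {
    counts[t] = (counts[t] ?? 0) + 1;
  }
  const parts: string[] = [];
  for (const [type, symbol] of SYMBOLS) {
    const n = counts[type];
    if (n !== undefined) {
      parts.push(`${n}${symbol}`);
    }
  }
  return `{${parts.join(" ")}}`;
}
```

A quote with two see-also links and one is-similar-to link would be summarized as `{2● 1≈}`.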

If the annotation shows only a single link,
  • hovering over it will display a tool tip with the respective metadata and
  • clicking on it will follow the link, open the target web page, and scroll to the target quote on that web page.
If the annotation shows multiple links,
  • hovering over any one of the annotations will display a tool tip with abbreviated metadata of the links of that link type and
  • clicking on any part of the annotation will open a details view showing all information about the quote (tags, comment) and each link (link type, link direction, creation date, URL, quote, tags, comment) in an appropriate form.
  • You can navigate to any target quote by clicking in the details view.

What if the content of a page changes?

In order to present you with the augmented view of a web page, GoHyper needs to locate your quote(s) on the page. This is not always easy. Some web pages contain the same text for ages, and your quotes on such pages will be in the same spot with the same wording for a long time. Other pages change gradually (e.g. Wikipedia pages) and may change the wording of your quote. Still other pages change abruptly and often (e.g. front pages of web portals), and your whole quote may have disappeared the next day.

GoHyper uses two mechanisms to cope with these problems:
  1. If it cannot find the whole quote on the page, it will search for ever-shorter prefixes and suffixes of your quote, i.e. only the first few words and the last few words, to locate a quote that has been modified.
  2. In some cases, it will not actually store the URL of the page on which you have marked a quote but rather the URL of a different page that is known to contain the same quote (and enough text before and after it) for a longer time. This applies to the front pages of several types of blogs and of some web portals for which GoHyper knows the URL rules and is able to derive the address of a stable, long-term page to replace the original URL.
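
The first mechanism could be sketched as follows. This is one possible reading of the description, not the actual GoHyper algorithm: the function `locateQuote`, the `Span` type, and the lower limit of three words (`minWords`) are assumptions, and the sketch assumes that page text and quote use single spaces between words. It looks for the whole quote first, then for the longest prefix that still occurs on the page, and then for the longest suffix that occurs at or after that position.

```typescript
// Character offsets into the page text; end is exclusive.
interface Span {
  start: number;
  end: number;
}

// Locates a quote on a page, tolerating changes in the middle of the quote.
// Returns null if neither the quote nor a usable prefix/suffix pair is found.
function locateQuote(pageText: string, quote: string, minWords: number = 3): Span | null {
  const exact = pageText.indexOf(quote);
  if (exact >= 0) {
    return { start: exact, end: exact + quote.length };
  }
  const words = quote.split(/\s+/).filter((w) => w.length > 0);

  // Try ever-shorter prefixes to find where the quote begins.
  let start = -1;
  for (let n = words.length - 1; n >= minWords && start < 0; n--) {
    start = pageText.indexOf(words.slice(0, n).join(" "));
  }
  if (start < 0) {
    return null;
  }

  // Try ever-shorter suffixes, searching from the start position onward.
  for (let n = words.length - 1; n >= minWords; n--) {
    const suffix = words.slice(words.length - n).join(" ");
    const pos = pageText.indexOf(suffix, start);
    if (pos >= 0) {
      return { start, end: pos + suffix.length };
    }
  }
  return null;
}
```

If, for instance, a single word in the middle of a stored quote has been changed on the page, the sketch still returns the span from the unchanged beginning to the unchanged end, so the modified quote can be highlighted as a whole.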

Where is the data stored?

All data you collect with GoHyper is currently stored in your browser's local profile (which in turn is stored in your user profile on your computer). That means it will not be available on your other computers or in other people's accounts on the same computer. If you collect substantial amounts of data, make backups regularly!

We hope to add mechanisms for network/cloud synchronization to later versions of GoHyper.


Implementation considerations

How does one write a Firefox add-on anyway?

There is plenty of information on this on the Mozilla Add-On Developer Hub.
In particular the XUL School is highly recommended.
(Being familiar with CSS and JavaScript is a big plus, but otherwise it's not rocket science.)

Usability focus

Of course people have scanned sources, extracted quotes, and created semantic networks from them for a long time.
The key new point of GoHyper is the ease with which such networks can be managed and used. But making something easy with software is usually difficult, so make sure to create solutions that are likely to have good usability -- shy away from simple-but-half-baked solutions.

Internationalization

The initial implementation should support English and German locales. (We use the term "Zettel" in all languages.)

Annotation markup

For marked-up quotes on web pages, it is preferable to use CSS styles and characters only for the markup, not images, in order to keep display times low.

Here is a list of Unicode bracket characters for the markup. Many other Unicode characters may be helpful as well, for instance mathematical operators, geometric shapes, arrows, or more arrows.
Beware: far from all fonts can display all of those characters!

This is why Franz Zieris recommended using MetroUI or Font Awesome instead. This may indeed be the better choice, and a prettier one as well.

Database migration

GoHyper is a tool for creating data. Data is valuable. Make sure that a user's data is automatically, invisibly, and reliably(!) migrated when s/he switches from one version of GoHyper to a later one (which will undoubtedly often use a different database schema). Such a switch may span multiple steps at once, say, from version 3 to version 7. It is OK if such a migration takes a little longer because it internally goes through database schema versions 4, 5, and 6 first. Backward migration need not be supported.

A backup function for the database contents would be great, though. (And of course importing a backup must immediately trigger the migration.)
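
The stepwise migration described above might be organized like this. It is only a sketch under assumptions: the `Database` shape, the `MIGRATIONS` table, the constant `CURRENT_VERSION`, and the two example steps are all hypothetical. The point is merely that each step upgrades by exactly one schema version and that the steps are chained until the current version is reached.

```typescript
// Hypothetical database shape: a schema version plus arbitrary contents.
interface Database {
  schemaVersion: number;
  data: Record<string, unknown>;
}

type Migration = (db: Database) => Database;

// MIGRATIONS[v] upgrades a database from schema version v to v + 1.
// The concrete steps below are made up for illustration.
const MIGRATIONS: Record<number, Migration> = {
  1: (db) => ({ schemaVersion: 2, data: { ...db.data, tags: [] } }),
  2: (db) => ({ schemaVersion: 3, data: { ...db.data, comments: [] } }),
};

const CURRENT_VERSION = 3;

// Applies all missing steps in order, e.g. 1 -> 2 -> 3.
function migrate(db: Database): Database {
  if (db.schemaVersion > CURRENT_VERSION) {
    throw new Error("Backward migration is not supported");
  }
  let current = db;
  while (current.schemaVersion < CURRENT_VERSION) {
    const step = MIGRATIONS[current.schemaVersion];
    if (step === undefined) {
      throw new Error(`No migration from schema version ${current.schemaVersion}`);
    }
    current = step(current);
  }
  return current;
}
```

Under this design, the same `migrate` function could be called both at add-on start-up and right after importing a backup, so that an old backup goes through the same chain of steps.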

Open questions

The above user manual sketch leaves open a number of issues that need a solution. For instance:
  • How to use the same quote for linking multiple separate times? (This could be done by a Zettel content history)
  • How to put the same quote on the Zettel again long after linking?
  • How to remove links (or whole quotes with all their links)?
  • etc.

Future functionality

Things we will want to add later:
  • Search in quotes, URLs, tags, comments.
  • Smarter quote search: Find quotes by similar sets of words rather than prefixes and suffixes.
  • Tag management: rename a tag, unite two tags, delete a tag.
  • Link management: delete all links with a certain tag.
  • Quote management: delete all quotes (and their links) with a certain tag.
  • Cloud sync for the database
  • The GoHyper social network: I can make the quotes and links of certain other users dynamically visible in my browser. This requires a server and will create a huge scalability problem. (Merely syncing fractions of other people's databases would be cheaper, but also much less useful and appealing.)


Interested?

If you are interested in perhaps building GoHyper in your bachelor thesis, talk to Lutz Prechelt.

I expect you to create a clean open-source code base that can easily be developed further (including documentation with screenshots) and to perform some user trials in order to achieve intuitive usability.

Origin

GoHyper is based on an idea of Hardyna Vedder (and she is working with us to shape it into a concrete tool).
Topic revision: r9 - 24 Nov 2015, jennifergeb
 