Linking to part of a resource

We had a question posted by sylvain:

Last week I happened to drive for nearly 22 hours, so I had time to let my brain go crazy. One question I asked myself was how to quote a passage of a resource directly. Let me explain.

Say you write a blog article which refers to another article or comment. Today you would only link to the URL pointing to that article or comment, but you cannot state exactly which sentence you refer to. The classical way to do so is to add a footnote giving more details about the actual quote without polluting the main article. This works well in a book, but I assume we should be able to do it differently in the Web context, shouldn’t we?

Our Comments:


This is actually a problem I started working on about a year and a half ago. The solution is actually pretty easy, but it requires one of two things. Either the publisher includes id attributes on every element they want to allow references to, or a naming convention is developed and agreed upon for autogenerating values, so that any compatible processor can apply it to the same XML/XHTML document and come up with the same result for each node each and every time. For example, using position within a well-formed XML/XHTML document, you could combine the values of the id attributes of each preceding ancestor and come up with a unique value that is the same each and every time that document is processed by a client processor.
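To make the autogenerated-address idea concrete, here is a minimal sketch in Python. It assumes a well-formed document and a made-up convention (dotted child positions such as 1.1.2); the function name positional_ids and the address format are illustrative only, not an agreed standard.

```python
import xml.etree.ElementTree as ET

def positional_ids(root):
    """Map a position-based address to every element in the tree.

    The address is the chain of 1-based child positions from the root,
    so any processor walking the same document gets the same values.
    """
    ids = {}

    def walk(elem, path):
        ids[path] = elem
        for index, child in enumerate(elem, start=1):
            walk(child, f"{path}.{index}")

    walk(root, "1")
    return ids

doc = ET.fromstring(
    "<html><body><p>First paragraph.</p><p>Second paragraph.</p></body></html>"
)
addresses = positional_ids(doc)
second = addresses["1.1.2"]  # html -> body -> second <p>
print(second.text)
```

Because the address depends only on document structure, two independent clients processing the same bytes arrive at the same value for the same node, which is the property the convention needs.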

The problem with this comes when a document is updated: the old id values could either shift or no longer be present. The solution would be to combine the id values with a date stamp matching the time at which the referenced version of the document was created. That is not always an easy task, and the result is often inaccurate because of it. Of course, the publishers would then need to maintain multiple versions of a document’s state, and if they’re willing to go to all that work, something tells me they would be just as willing (actually, more so) to provide id attributes for each element that never change once they’re originally generated. This obviously doesn’t help if those elements and their content are deleted at a later date. But it’s halfway there, and definitely better than the previous scenario.

Of course, web servers backed by SVN or another SCC/RCS system could easily allow access to the state of any given document at any given point in its history. In this situation, combining the revision number (or name, depending on the system you use) with the document you are referencing would solve this problem quite nicely.
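A small sketch of why pinning the revision helps, again in Python. The REVISIONS dictionary stands in for a real revision store such as SVN, and resolve is a hypothetical helper; both are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical revision store: revision number -> document at that revision.
REVISIONS = {
    1: "<body><p>Original claim.</p></body>",
    2: "<body><p>New introduction.</p><p>Original claim.</p></body>",
}

def resolve(revision, path):
    """Return the text of the element at `path` in the given revision."""
    root = ET.fromstring(REVISIONS[revision])
    element = root.find(path)
    return None if element is None else element.text

quoted = resolve(1, "p[1]")   # the passage as it was when it was quoted
drifted = resolve(2, "p[1]")  # same address, later revision, different text
print(quoted)
print(drifted)
```

The same positional address points at different content once a paragraph is inserted above it, so a reference made of (revision, address) stays stable where the address alone does not.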

The final piece of the problem is solved using Tidy, TagSoup, or something similar. As long as an agreement can be reached on which XHTML version should be output and what the non-conformant tags map to in the XHTML world, and that standardized mapping is published, the result would let you use simple XPath patterns to determine the internal “address” of the content you are pointing at. The lookup itself can be done with XSLT 1.0. Obviously XSLT/XPath 1.0 (as well as 2.0, for that matter) doesn’t allow for dynamic XPath evaluation, but that’s easily fixed by injecting the XPath value into the compiled DOM object using the standard DOM API.
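As a rough illustration of an XPath-style internal address, the Python sketch below computes a positional path for a chosen element and then resolves it again. It assumes the markup has already been normalized to well-formed XML, uses no namespaces (real XHTML would need namespace handling), and relies on the limited XPath subset in Python's standard library rather than XSLT; xpath_of is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def xpath_of(root, target):
    """Return a positional path from root to target, or None if absent."""
    def search(elem, path):
        if elem is target:
            return path
        counts = {}  # how many children with each tag name seen so far
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            step = f"{path}/{child.tag}[{counts[child.tag]}]"
            found = search(child, step)
            if found is not None:
                return found
        return None

    return search(root, ".")

doc = ET.fromstring(
    "<html><body><div><p>One.</p><p>Two.</p></div><p>Three.</p></body></html>"
)
target = doc.find("body/div")[1]  # the <p>Two.</p> element
address = xpath_of(doc, target)
print(address)                    # ./body[1]/div[1]/p[2]
print(doc.find(address).text)     # Two.
```

The path is only as stable as the normalization step: two processors have to agree on the cleaned-up XHTML before they can agree on the address.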

In the end, a lot of the above would be difficult to rely on being implemented in any broad capacity as server-side features. Even so, if a small group of folks with an interest in this type of referencing/indexing system were to write a solid specification to act as a guideline for writing transformation files (as well as for what to output when converting HTML to XHTML), and then use it as an experiment to determine what works and what doesn’t, it might be possible to build further momentum from there.

But who knows if that’s even possible either. I guess it’s a “try and find out” sort of thing.

One last thing… Ultimately a request to the server would return only the requested XML/XHTML fragment, not the whole document. This too is easy: you simply have to URL-encode the XPath on the client and decode it on the server to ensure conformance to the URI/IRI/HTTP specs, so that no matter how many hops (and as such “hands”) touch it, it is reasonable to expect that it will arrive at its destination intact. I’m not sure how much of a concern this really would be and how much would be pushing the overkill button a little too much, but conforming to the spec certainly isn’t going to hurt, and more than likely will help quite a bit to ensure a certain level of expectations can be assumed.
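The encode/decode round trip can be sketched like this in Python. The xpath query parameter, the example.org URL, and the two function names are assumptions for illustration, and the path is written in relative form because the standard library's ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, quote, urlsplit

def build_fragment_url(base_url, xpath):
    """Client side: percent-encode the XPath so it survives as a query value."""
    return f"{base_url}?xpath={quote(xpath, safe='')}"

def serve_fragment(document_xml, url):
    """Server side: decode the XPath and return only the matching fragment."""
    xpath = parse_qs(urlsplit(url).query)["xpath"][0]
    element = ET.fromstring(document_xml).find(xpath)
    if element is None:
        return None
    return ET.tostring(element, encoding="unicode")

DOCUMENT = "<html><body><p>First.</p><p>Second.</p></body></html>"
url = build_fragment_url("http://example.org/article", "./body/p[2]")
fragment = serve_fragment(DOCUMENT, url)
print(url)       # http://example.org/article?xpath=.%2Fbody%2Fp%5B2%5D
print(fragment)  # <p>Second.</p>
```

Slashes and square brackets are the characters most likely to be mangled or reinterpreted along the way, which is why the sketch encodes everything instead of leaving "/" as a safe character.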

BTW… things can get even more interesting when you add a client-side XQuery DB cache, as you can then use XQuery to access any set of data you have an interest in, which can easily be cached in the background. You could dynamically add content that matches certain criteria based on searches to the various engines across the net, or cache only content that you specifically subscribe to via data feeds, etc. Either way, storing this in an easily accessible XQuery-enabled client-side database and using XSLT 1.0 or 2.0 to transform the returned result allows for a lot of REALLY interesting and exciting possibilities.

Oh, and by the way… all of the above has been proven to work. Not publicly yet, but I can share a few things with you that will push you in the right direction if needs be 😀

Enjoy your day!