Semantic Starry Sky using Binary RDF (HDT) Streaming over WebSockets

The image is a screenshot of a star field centered on our Sun. It was created with the WebGL Haylyn client from an RDF Turtle representation of the Hipparcos Star Catalog, which contains 118,218 stars.  The RDF visual data was generated on a Haylyn server and was originally streamed as RDF Turtle over WebSockets.  Haylyn abandoned HTTP polling in 2011 because of poor performance and switched to WebSockets.  For Hipparcos, however, the visualization data exceeds 500,000 triples, and the load time from server to an actual visualization was approximately 55 seconds, which does not make for an enjoyable user experience.

Text-based serializations of RDF (Turtle, N-Triples, RDF/XML, or the latest W3C recommendation, JSON-LD) are very handy since both human and machine alike can read them.  But this human readability has a cost.  To send server-based RDF to the Haylyn WebGL client, the binary memory structures on the server must be serialized to Turtle (or, originally, N-Triples) and sent to the client, where they have to be parsed into client-side memory structures before they are usable.  Text-based serializations are bulky. They can be compressed to a more compact size, but at the cost of computation time for compression on the server and decompression on the client.

Enter binary RDF.  Because no JavaScript implementation of binary RDF (HDT, a W3C member submission) was available, one was created and retrofitted into the Haylyn WebGL client.  Java libraries for HDT are available on the HDT web site and were added to the Haylyn server.  The RDF Turtle/WebSockets path in Haylyn has been kept, and the server and client can switch between text Turtle and binary HDT during any particular session.  Further comparisons will be made.  The initial integration of HDT into Haylyn is not as efficient as it could be. That said, the initial timings for handling 2,034,211 triples are as follows:

Format                           | Time to generate (milliseconds) | Size in RAM (bytes)
Turtle                           | 5,681                           | 84,106,871
Binary HDT                       | 7,935                           | 14,154,034
Turtle with GZIP compression     | 7,733                           | 13,367,910
Binary HDT with GZIP compression | 8,168                           | 7,977,923
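
To put the sizes in the table above in relative terms, here is a small sketch that derives each format's size as a percentage of plain Turtle, its bytes per triple, and its extra generation time over plain Turtle. The numbers are copied from the table; the `summarize` function and its field names are only illustrative and are not part of Haylyn.

```python
# Figures copied from the table above; nothing here is newly measured.
TRIPLES = 2_034_211

# format name -> (time to generate in ms, size in RAM in bytes)
RESULTS = {
    "Turtle": (5_681, 84_106_871),
    "Binary HDT": (7_935, 14_154_034),
    "Turtle with GZIP compression": (7_733, 13_367_910),
    "Binary HDT with GZIP compression": (8_168, 7_977_923),
}


def summarize(results, triples, baseline="Turtle"):
    """Express every row relative to the plain Turtle baseline."""
    base_ms, base_bytes = results[baseline]
    summary = {}
    for name, (millis, size) in results.items():
        summary[name] = {
            "percent_of_turtle": round(100 * size / base_bytes, 1),
            "bytes_per_triple": round(size / triples, 1),
            "extra_ms": millis - base_ms,
        }
    return summary


SUMMARY = summarize(RESULTS, TRIPLES)
for name, row in SUMMARY.items():
    print(name, row)
```

On these figures, binary HDT comes to about 16.8% of the Turtle size, gzipped Turtle to about 15.9%, and gzipped HDT to about 9.5%, at a cost of roughly 2.1 to 2.5 additional seconds of generation time.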

These timings DO NOT include the transfer time to the client or the time the client needs to digest the data.  Compression is performed with the Apache Compression library.  The major advantage of HDT is that it does not need to be interpreted on the client: it is a usable, indexed memory structure that can be queried directly.  Further optimization of the JavaScript HDT implementation and the server-side Java interfacing will be done to improve performance.
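
To illustrate why an indexed binary structure can be queried without a text-parsing step, here is a deliberately simplified sketch of dictionary encoding: each term is mapped to an integer ID once, and triples are kept as sorted ID tuples that can be binary-searched. This is a toy, not the actual HDT layout (which is considerably more compact) and not code from Haylyn; the `TinyTripleStore` class and the `ex:` terms are hypothetical placeholders.

```python
from bisect import bisect_left


class TinyTripleStore:
    """Toy dictionary-encoded triple store; NOT the real HDT format."""

    def __init__(self, triples):
        # Dictionary: every distinct term gets a small integer ID.
        self.id_to_term = sorted({term for triple in triples for term in triple})
        self.term_to_id = {term: i for i, term in enumerate(self.id_to_term)}
        # Triples: stored as sorted, de-duplicated ID tuples.
        self.ids = sorted(
            {tuple(self.term_to_id[term] for term in triple) for triple in triples}
        )

    def by_subject(self, subject):
        """Return all triples with the given subject via binary search."""
        sid = self.term_to_id.get(subject)
        if sid is None:
            return []
        # (sid,) sorts before every (sid, p, o); (sid + 1,) sorts after them.
        lo = bisect_left(self.ids, (sid,))
        hi = bisect_left(self.ids, (sid + 1,))
        return [tuple(self.id_to_term[i] for i in row) for row in self.ids[lo:hi]]


# Hypothetical placeholder data, not taken from the Hipparcos catalog.
store = TinyTripleStore([
    ("ex:star1", "ex:magnitude", "4.5"),
    ("ex:star1", "ex:name", "Alpha"),
    ("ex:star2", "ex:magnitude", "1.2"),
])
print(store.by_subject("ex:star1"))
```

The point of the design is that the lookup works on integers in an already-sorted array, so no Turtle text has to be tokenized or parsed before the first query can run.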

The above is not quite the visual Xanadu that is the Google Chrome 100,000 Stars WebGL experiment, which uses the Astronomy Nexus HYG dataset, a combination of catalogs that includes Hipparcos.  However, the author of the 100,000 Stars experiment lamented, "I feel like I've gotten to the point where my data was mixing too much with my code."  Haylyn is data-driven, with nothing in its code that is specific to stars.  Additional datasets are planned to augment the above visualization and will include exoplanet data, constellation data, planetary data, and publication data.

If you cannot wait, please try out these excellent viewers for this type of data: Stellarium and/or the iPhone-based Exoplanet by Hanno Rein.