Thursday, August 28, 2008

New Age Tagging = SharedTags(sm) | TagFont(sm) | TagSort(sm)


.... Tag Queries for a Thursday Early Afternoon ....

I. Are there implementations/technologies that can display the degree of association / co-occurrence of Tags within a corpus (and enable one to navigate it in one way or another (e.g., tag cloud and/or other visualization))?
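To make "degree of association" concrete, here is a minimal sketch, not a reference to any existing product. It assumes each item in the corpus is simply a list of tags, and it uses the Jaccard index (items carrying both tags divided by items carrying either tag) as one possible association measure. The function name `tag_association` and the sample corpus are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def tag_association(tagged_items):
    """Return a Jaccard association score for every pair of co-occurring tags."""
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in tagged_items:
        # De-duplicate and sort so each unordered pair has one canonical form.
        unique = sorted(set(tags))
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    # Jaccard index: items with both tags / items with at least one of them.
    return {
        (a, b): both / (tag_counts[a] + tag_counts[b] - both)
        for (a, b), both in pair_counts.items()
    }


# Hypothetical corpus: one list of tags per item.
corpus = [
    ["tagging", "folksonomy", "flickr"],
    ["tagging", "folksonomy"],
    ["tagging", "chemistry"],
]
for pair, score in sorted(tag_association(corpus).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

The resulting scores could feed a tag cloud or a graph view in which strongly associated tags sit closer together; that display layer is left out here.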

II. Are there implementations/technologies that allow one to designate the relative importance of a tag for a section and/or corpus of text? Major / Minor Importance Would Be A Good Beginning (e.g. Bold vs. Non-Bold)

BTW: Certainly The Ability To Indicate Relative Importance by Font Size / Style Would Be More Interesting [:-)

[And Let's Not Forget About Color]
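As a rough illustration of the bold / font-size idea, the sketch below maps a numeric importance weight to a pixel size by linear interpolation and bolds any tag at or above a "major" threshold. The weights, the 12 to 32 px range, and the names `font_size` and `render_tag_cloud` are all assumptions chosen for the example, not features of a particular tagging system.

```python
from html import escape


def font_size(weight, min_weight, max_weight, min_px=12, max_px=32):
    """Linearly map a tag weight onto a font size in pixels."""
    if max_weight == min_weight:
        # All tags equally important: use the midpoint size.
        return (min_px + max_px) // 2
    scale = (weight - min_weight) / (max_weight - min_weight)
    return round(min_px + scale * (max_px - min_px))


def render_tag_cloud(weights, major_threshold):
    """Render tags as HTML spans; 'major' tags are additionally set in bold."""
    lo, hi = min(weights.values()), max(weights.values())
    spans = []
    for tag, weight in sorted(weights.items()):
        style = f"font-size:{font_size(weight, lo, hi)}px"
        if weight >= major_threshold:
            style += ";font-weight:bold"
        spans.append(f'<span style="{style}">{escape(tag)}</span>')
    return " ".join(spans)


# Hypothetical importance weights assigned by whoever tagged the section.
print(render_tag_cloud({"tagging": 5, "flickr": 1, "folksonomy": 3}, major_threshold=3))
```

Color could be added the same way through another style property, although it would need an alternative cue for readers who cannot distinguish the chosen colors.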

III. Are there implementations/technologies that allow one to sort sub-document tagged text by select criteria? For example: Have the ability to (re)sort tagged text sections (e.g. paragraphs) such that the most relevant sections are displayed before those of less relevance (e.g. sections with tags that are more 'associated' or co-occurring).
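One simple reading of "most relevant first" can be sketched as follows, assuming each section carries its own list of tags and that relevance is just the number of tags a section shares with a set of tags of interest. The data layout and the name `rank_sections` are hypothetical; a real system might weight tags by importance or association instead.

```python
def rank_sections(sections, query_tags):
    """Return sections ordered from most to least relevant to query_tags."""
    query = set(query_tags)

    def relevance(section):
        # Relevance here is simply the number of tags shared with the query.
        return len(query & set(section["tags"]))

    # sorted() is stable, so equally relevant sections keep their original order.
    return sorted(sections, key=relevance, reverse=True)


# Hypothetical tagged paragraphs from a single document.
paragraphs = [
    {"text": "Intro to photo sharing.", "tags": ["flickr"]},
    {"text": "Tag clouds and folksonomies.", "tags": ["tagging", "tag cloud", "folksonomy"]},
    {"text": "Tagging video.", "tags": ["tagging", "youtube"]},
]
for section in rank_sections(paragraphs, ["tagging", "folksonomy"]):
    print(section["text"])
```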

BTW: Tagging Is Not Limited To 'Text' (Can Also Apply To Photos (Flickr), Videos (YouTube), Other Media).

Please Post Responses / Thoughts As (A) Comment (s) On This Blog Entry




John Fudrow said...

Because of the nature of flat hierarchy tagging, most systems don’t have the structure for creating relative relationships between tags. I haven’t been researching tagging as of late but from what I have seen:

I. Creating associations for user created tags would be a meta-tag for your tags. Therefore, the majority of systems leave the tagging at the individual item level.
II. Importance is a subjective term when dealing with user-generated tags. One user may deem TagA as being more important than TagB, while another couldn't care less about TagA and only want to keep TagB. This concept requires at minimum two levels of descriptive relationships to be formed.
a. Font size and style are very important tools for emphasis. Ask any graphic or web designer. Color can be a two-edged sword, as there are cultural differences and visual impairments that limit the usefulness of color-based features.
III. This concept relies on the faceted tagging of not only documents but also document elements. While this may sound highly interesting, the logistics of having separate sets of tags for container and contents would require a system to organize and relate multiple sets of tags, which have a differing value as to their scope and applicability.
a. You are correct to highlight that tagging is applied to not only text but to visual and audio materials. This in itself can cause headaches for ascertaining the semantics of what a tag means.
IV. I had once suggested the creation of a tagging facet system that would allow advanced users, hopefully librarians, to associate tags in a closed system. This would allow relationships to be joined in many ways and with scaled importance. My dream would be to see the “facets” shared amongst interested groups and expanded much like a wiki article.

ChemSpiderMan said...

I gave a presentation at ACS Philly recently on our Proof of Concept document markup according to the National Library of Medicine DTD but also extended by us to facilitate chemistry specific markup.

The presentation here outlines the general concept:

The basic concept I presented is as follows, with a focus on Chemistry Articles. A lot of effort is being expended in "text-mining" publications, post-publication, to index these articles and make them searchable not only by text but by the specific language of chemistry, chemical structures. We are specifically asking the question "why extract chemical structures from articles using chemical name conversion approaches and chemical image conversion tools when the structures in the article were ORIGINALLY machine readable?"

We are considering a system whereby authors are asked to contribute to the availability of a free online service for performing structure and substructure-based searches of chemistry articles. While the submission of journal articles is already a lot of work (I know from experience of authoring/co-authoring about 10 a year), we hope that authors will support a service whereby they can upload their own articles to a "validation and mark-up service". The upload capabilities will support upload of the primary document, chemical structures in standard formats and supplementary information of various types (to be defined).

This system will perform the following services:

1) semi-automated markup of a document - title, author(s), abstract and additional dictionary-based terms plus the ability to use the NLM-DTD markup
2) identification of chemical names and conversion to structures in an automated fashion
3) conversion of structure IMAGES to connection tables using optical structure recognition software (either commercial or open source)
4) ask authors to confirm whether the converted structures are appropriate
5) provide a structure validation service for submitted molecules checking for "accurate representation"
6) Deposit all structures associated with an article onto ChemSpider but under embargo. Associate the article Title, authors and "abstract snippet" with all structures.
7) Issue a set of ChemSpider IDs for the author to submit to the publisher with the article
8) When a publication has passed through review the author can release the structures from embargo using a DOI or an article URL (more common for Open Access articles)
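
The embargo part of this workflow (steps 6 to 8) can be pictured with a small in-memory sketch. This is purely illustrative: the class and method names below are hypothetical and do not correspond to any actual ChemSpider API, the structures are placeholder strings, and the DOI is a dummy value.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Deposit:
    """One deposited structure, tied to the article it came from."""
    structure_id: int
    structure: str
    article_title: str
    embargoed: bool = True
    article_ref: Optional[str] = None  # DOI or article URL, set on release


class EmbargoRegistry:
    """Hypothetical stand-in for a deposit service that holds structures under embargo."""

    def __init__(self):
        self._next_id = count(1)
        self._deposits = {}

    def deposit(self, article_title, structures):
        """Step 6 and 7: store structures under embargo and return their IDs."""
        ids = []
        for structure in structures:
            sid = next(self._next_id)
            self._deposits[sid] = Deposit(sid, structure, article_title)
            ids.append(sid)
        return ids

    def release(self, ids, article_ref):
        """Step 8: lift the embargo once the article has a DOI or URL."""
        for sid in ids:
            record = self._deposits[sid]
            record.embargoed = False
            record.article_ref = article_ref

    def searchable_ids(self):
        """Only released structures are visible to searches."""
        return [sid for sid, d in self._deposits.items() if not d.embargoed]


registry = EmbargoRegistry()
ids = registry.deposit("Example article title", ["structure-A", "structure-B"])
registry.release(ids, "doi:10.0000/placeholder")  # dummy DOI for illustration
print(registry.searchable_ids())
```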

The result of this project will be a way for publishers to link their articles directly to a free access chemistry database and use a series of web services to enable other capabilities (to be defined). It will also allow articles in Open Access and non-Open Access publications to be searchable by the "language of chemistry".

This is only a slice of the overall project but I think it may be of interest relative to the comments you have made below.

Parts of this were shown last week at Drexel University and this snippet is available online here:

We are also going to provide a Microsoft Word add-on which will allow users to prepare articles for publishing using similar technologies.

Once articles are marked up and tagged with some common dictionaries some of your suggested approaches will be facilitated.