Musings from Mars » Taking a Snapshot of the Semantic Web: Mighty Big, But Still Kinda Blurry

Taking a Snapshot of the Semantic Web:
Mighty Big, But Still Kinda Blurry

Published February 21st, 2009

It's still somewhat difficult to get a handle on exactly what is meant by the "Semantic Web," and whether today's technologies are truly able to realize the vision of Tim Berners-Lee, who first articulated it back in 1999. From what I've read, I think there's general agreement that we aren't even close to being "there" yet, but that many of the ongoing Semantic Web activities, technologies, development platforms, and new applications are a big leap beyond the unstructured web that still dominates today.

There is a huge, seemingly endless amount of work being done by thousands of groups, all trying to contribute to making the Semantic Web a reality. After a few weeks of research, I still feel as though I've just dipped my toe into that vast lake of semantic experimentation. Partly as a result of the many disparate projects, it becomes rather difficult to see the entire forest for all the tiny trees. That said, these thousands of groups do appear to be working more or less together on the basis of consensus-based open standards, and they have set up mechanisms to keep everyone abreast of new ideas, solutions, and projects, under the general leadership of the World Wide Web Consortium (W3C)'s Semantic Web Activity.

Semantic Web Stack As Envisioned by Berners-Lee

As a starting point for exploration into this topic, the Wikipedia article that describes the Semantic Web Stack is quite good. Along with a good overview and many useful links, the article includes the original conception of the Stack as designed by Berners-Lee. Besides cataloguing the sheer number of projects tackling different aspects of building a Semantic Web, it's important to distinguish ongoing projects from those that expired years ago, a distinction that's not always readily apparent to those peering in from the outside. Even excluding the expired ones, there are far too many projects to read up on in a few weeks, so this snapshot is necessarily incomplete. But after having the content reviewed by some Semantic Web experts, I'm confident it includes all the most significant threads of this new web, which, as Berners-Lee envisioned it:

I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.

In my tour of the Semantic Web as it exists today, it's interesting to note that most of the projects are geared not toward machine-to-machine interaction, but rather to the traditional human-to-machine. Humans being by nature anthropocentric, the first steps being taken toward Berners-Lee's vision are to build systems that are semantically neutral with respect to human-to-human communication. Once we can reliably discuss topics without drifting off into semantic misunderstandings, then perhaps we can start teaching machines "what we mean by" ... This paper is an attempt to assess the current state of today's steps, while compiling a list of resources that would prove useful to someone thinking about building a Semantic Web application in 2009.

Challenges to Building Semantic Web Applications

The process of applying concepts from the Semantic Web to build richer, more knowledge-oriented applications presents developers with several somewhat challenging prerequisites:

Taxonomies for the content being published,
Ontologies for the content, based on the developed taxonomies,
Content tagged using the developed ontologies,
Database tools for storing and serving RDF and/or OWL ontologies,
Database tools for connecting ontologies with the content they describe,
Application server specializing in querying and formatting semantic content,
User interface tools to present semantic content in optimum, not necessarily traditional, ways.

Ontology standards

One of the base specifications for ontologies, RDF (Resource Description Framework), is a well-established standard based on XML and URIs. An RDF-based format, RSS 1.0, is one of the formats behind the news feeds one can subscribe to today. DAML (DARPA Agent Markup Language) is one of the early ontology standards, built as an extension to XML and RDF. Still widely used, DAML is also the precursor to OWL. OWL (Web Ontology Language) is a sophisticated framework built on top of RDF and is perhaps the best known and most widely adopted of such ontology languages. OWL is the standard adopted by the W3C (the official standards body for web specifications). At the moment, there are several different flavors of OWL, which makes adopting OWL more challenging than using RDF.
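To make the RDF idea a little more concrete: an RDF statement is a triple of subject, predicate, and object, with URIs naming the things being described. The following is a minimal Python sketch that uses plain tuples rather than any RDF library. The example.org URIs are hypothetical, chosen only for illustration; the Dublin Core namespace is a real vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace, for illustration only.
EX = "http://example.org/"
# Dublin Core element set namespace.
DC = "http://purl.org/dc/elements/1.1/"

graph = {
    Triple(EX + "report", DC + "title", "Taking a Snapshot of the Semantic Web"),
    Triple(EX + "report", DC + "date", "2009-02-21"),
    Triple(EX + "report", DC + "subject", EX + "SemanticWeb"),
}


def objects(graph, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return sorted(
        t.obj for t in graph if t.subject == subject and t.predicate == predicate
    )


print(objects(graph, EX + "report", DC + "title"))
```

Because every statement has the same three-part shape, graphs from different sources can be merged by simply taking the union of their triples, which is much of RDF's appeal for linking data.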

Each of these requirements presents a fairly steep learning curve to developers who have not previously worked with the technologies to build Semantic Web applications. Solutions for aiding with some of the requirements exist, but it's not clear how effective they are at this stage. For example, I have listed some tools that assist in extracting semantics from unstructured documents, and others that do something similar with content stored in relational databases. On the other hand, the process of tagging unstructured content appears to have no good automated solutions.

The process of building an ontology can be quite time-consuming, unless there happens to be an existing ontology you can reuse. There are several extensive online libraries of ontologies that can help. One fly in this bibliographical ointment, however, is the difficulty one may face in choosing among different, perhaps conflicting, ontologies on the same topic. It's important to emphasize that one of the first steps in building an ontology is to build a taxonomy. Although ontologies are not taxonomies, they use taxonomies as their jumping-off point. Therefore, one has to be ready and able to build a taxonomy for a subject before one can build an ontology.

Many of the tools and projects included here are designed to assist with building and browsing a web of Linked Data, rather than true semantic data. Some of the demo browsers for linked data don't strike me as being particularly relevant to most end-user requirements for knowledge management. However, linked data is quite useful in integrating content from across the web, and projects built around it typically make heavy use of RDF, SPARQL, and other related specifications. Websites that make use of linked data represent the vanguard of the application of Semantic Web concepts, and their number appears to be exploding at the moment. Well-known examples of the use of linked data are websites built as "mashups," such as those built on Google Maps. Use of Microformats and RDF Triples is also a typical component of websites that expose their content as linked data.

More powerful tools exist in the form of integrated application server suites, such as the OpenLink Virtuoso server, Cyc Knowledge Server, and Intelligent Topic Manager. The first two have open source versions that developers can use to "dip their feet" into the task of building Semantic Web applications using sample data. Of the two, I was most impressed with the breadth of tools in the OpenLink project, as well as with the range and vibrancy of the Virtuoso developer community. Virtuoso also comes with a rich set of user interface "widgets" that can be of great assistance in presenting semantic information appropriately.

A Possible Approach

To sum up, the landscape of the Semantic Web is still quite fuzzy and volatile, with many mountains of activity building up rapidly and eroding with nearly equal speed. Which landforms will remain once the evolution is complete is impossible to say here in 2008. However, the landscape is exciting to watch and flush with tantalizing experiments that will undoubtedly inspire more experimentation in the years ahead.

Obviously, given all of the preceding caveats, the decision to engage in a Semantic Web experiment cannot be made lightly. One must have a clear idea of the knowledge management/presentation problem that such an experiment is designed to solve, and an understanding of the resources that will need to be devoted to the project. Although the tools, standards, and processes for such a project are still immature, it would definitely be in the interests of an organization with suitable candidate data and sufficient resources (including time) to begin an experiment of its own as a learning exercise.

What is an ontology?

An ontology is a systematic description of concepts, in such detail and thoroughness that a machine encountering a concept could "understand" it. In this respect, ontology development is closely related to research into artificial intelligence. In the past, humans have relied on complex taxonomies to describe the way abstract ideas and concrete individuals relate to one another. Ontologies differ from taxonomies in the complexity and thoroughness with which they describe the relationships between the elements of a taxonomy.

A typical taxonomy is a tree structure that arrays terms as categories and subcategories. However, no subcategory has any notion of its relationship to its siblings, nor to any other categories elsewhere in the tree. An ontology can describe these relationships, thereby enriching one's understanding of what a given category means. Further, each category (or "concept", or "class") in the tree can have its own distinct properties. Properties describe the relationships between and among individuals in the ontology. Individuals are the specific instances of each class that the ontology needs to be able to describe. Properties are characteristics of a class that help distinguish one group of individuals from another.

For example, if we have a class "job" in our ontology, with a subclass "administrative" and a further subclass "computer specialist," we could distinguish all the individuals who are computer specialists by defining the job's characteristics (properties). A computer specialist "writes software programs," "performs desktop support," "manages databases," "builds web applications," and so on. With an ontology, we could very richly define a group of individuals using such properties. This is a simplistic overview of properties… OWL provides a vast array of ways to describe properties and of the types of properties one can describe.
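The job example can be sketched in a few lines of Python. This is only an illustration of the taxonomy-plus-properties idea, not how OWL or any reasoner works: it assumes that having all of a class's listed properties is enough to belong to that class, and the individuals "alice" and "bob" are made up.

```python
# Taxonomy: each class points to its parent (a plain tree).
SUBCLASS_OF = {
    "computer specialist": "administrative",
    "administrative": "job",
}

# What the ontology adds: properties attached to a class.
# The property strings come from the example in the text.
CLASS_PROPERTIES = {
    "computer specialist": {
        "writes software programs",
        "performs desktop support",
        "manages databases",
        "builds web applications",
    },
}

# Hypothetical individuals and what is asserted about them.
INDIVIDUALS = {
    "alice": {
        "writes software programs",
        "performs desktop support",
        "manages databases",
        "builds web applications",
    },
    "bob": {"processes invoices"},
}


def ancestors(cls):
    """Walk up the taxonomy from cls to the root."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def classify(individual):
    """Classes whose properties the individual fully satisfies, plus their ancestors."""
    asserted = INDIVIDUALS[individual]
    found = []
    for cls, required in CLASS_PROPERTIES.items():
        if required <= asserted:  # subset test: every required property is present
            found.append(cls)
            found.extend(ancestors(cls))
    return found


print(classify("alice"))
print(classify("bob"))
```

The taxonomy alone could only say that a computer specialist is a kind of administrative job; the properties are what let a program decide which individuals belong there.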

I would advise against a major expenditure for such an experiment, however. As noted, given the state of the technology, it strikes me as being unwise to invest a large sum in any commercial product to use as an application platform. Most of the tools that exist for building Semantic Web applications have open source licenses, so it makes sense to restrict experimentation to such tools for now.

The data store chosen for such an experiment should ideally be one that currently suffers from being both fragmented and unstructured, existing in incompatible file formats and stored in different locations within the organization's intranet: all factors that make it difficult for users to locate specific information. Given the uncertainties surrounding such an experiment, the data store chosen should also be one that is not so volatile that time pressures can cause discontinuities in content over the course of the project.

Whoever undertakes such a Semantic Web experiment needs to be prepared to conclude that the effort required to bring their experiment to fruition is too great to justify the added value. Even if this were to prove true in 2009, I'm confident that the impressive swirl of activity taking place now will coalesce into truly usable techniques and tools within a few years. The standards on which the Semantic Web will be built are still evolving, but they are much more mature than the methods developers have built to turn those standards into working applications. Therefore, having gotten one's feet wet in the state of things this year will undoubtedly provide a solid foundation for building Semantic Web applications in coming years.

The bulk of this report consists of a compilation of resources on various aspects of the Semantic Web and developing Semantic Web applications. The resources are divided into the following categories:

Ontology Development Tools
Application Development Tools
Database Tools
Application Servers
Semantic Application Demos
Semantic Website Enhancements
Other Resources

Ontology Development Tools

Protege

Comes in two "flavors": Version 3.4 handles both OWL and RDF ontologies, while 4.0 is geared toward the latest OWL standards only.
Impressive software for creating OWL ontologies.
User interface is well organized, given the complexity of the objects and properties you're dealing with. The interface also must handle multiple views of the information, and it does so quite well.
Numerous plugins for Protege make specific task work easier. There are many more plugins for Protege 3.4 than for 4.0 at this time.
One plugin enables database connections, with which you can import entire databases or tables, including their contents. Tables typically become OWL objects, and columns become object properties. Impressively, this tool also creates a complete form with which you can enter new instance information. Each form field can also be customized after creation.
Protege can also export ontologies to "OWL Document" format, which is a browsable HTML representation of the ontology.
Stanford is developing a web-based version of Protege. The beta URL is at Web Protege.
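The database import described above (tables becoming ontology objects, columns becoming properties) can be illustrated with a rough Python sketch. It is not the Protege plugin's actual algorithm: it assumes each table maps to a class, each column to a property, and each row to an individual. The `example.org` namespace and the `grant_award` table are hypothetical.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical namespace


def table_to_triples(conn, table):
    """Map one table to triples: table -> class, column -> property, row -> individual.

    The table name is interpolated into SQL, so pass trusted names only.
    """
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    triples = [(BASE + table, "rdf:type", "owl:Class")]
    for row in conn.execute(f"SELECT rowid, * FROM {table}"):
        individual = f"{BASE}{table}/{row[0]}"
        triples.append((individual, "rdf:type", BASE + table))
        for column, value in zip(columns, row[1:]):
            triples.append((individual, f"{BASE}{table}#{column}", value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grant_award (title TEXT, amount INTEGER)")
conn.execute("INSERT INTO grant_award VALUES ('Ontology tools', 50000)")

triples = table_to_triples(conn, "grant_award")
for triple in triples:
    print(triple)
```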

Protege Plug-Ins

OntoLT. The OntoLT approach aims at a more direct connection between ontology engineering and linguistic analysis. Used with Protege, OntoLT can automatically extract concepts (Protégé classes) and relations (Protégé slots) from linguistically annotated text collections. It provides mapping rules, defined using a precondition language, that allow for a mapping between linguistic entities in text and class/slot candidates in Protégé. (This plug-in is only available for Protege 3.2.)
There is a wide array of plug-ins for Protege 3.2, and a much smaller set for 4.0. This page from the "old" Protege wiki has good links to the full library of Protege plug-ins.

Ontowiki

Ontowiki is a tool providing support for agile, distributed knowledge engineering scenarios. It facilitates the visual presentation of a knowledge base as an information map, with different views on instance data. It enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents. Ontowiki is built on the Powl platform. I have downloaded and installed an instance of Ontowiki on my home computer; the installation and configuration were quite simple.
- An online demo of Ontowiki is available.

Application Development Tools

The list in this section is just a small subset of the tools now available for building Semantic Web applications. There are several complete, continuously updated lists on the web, including those at SemWebCentral and the Semantic Web Company.

Developer Resources

SemWebCentral is an Open Source development web site for the Semantic Web. It was established in January, 2004 to support the Semantic Web community by providing a free, centralized place for Open Source developers to manage Semantic Web software and content development. Another purpose is to provide resources for developers or other interested parties to learn about the Semantic Web and how to begin developing Semantic Web content and software. SemWebCentral has the following major portals:
Web Tools by category, a list of 148 projects organized by topic and a wide variety of other attributes.

Code snippets, an archive of code snippets, scripts, and functions developers have shared with the open source software community.
Learn About the Semantic Web, a collection of overviews, tutorials, and papers covering Semantic Web topics.
Programming With RDF is part of the RDF Schemas website. It has links to repositories of programmer resources by programming language, showing the kind of documentation, code, and tutorials covered by the repository.
Semantic Web Tools is a comprehensive list of over 700 developer tools now available for semantic-web-related projects. There are several such lists on the web, but this one is particularly good since it breaks the list down by category and language, making it much easier to narrow down the list you're interested in. This site is hosted by the Semantic Web Company.
Developers Guide to Semantic Web Toolkits collects links to Semantic Web toolkits for different programming languages and gives an overview about the features of each toolkit, the strength of the development effort and the toolkit's user community.

Frameworks

Sesame

Aduna Software

Sesame also has a large ecosystem of add-ons and related toolsets. The following are the main links to these.
- Extensions and Plugins
- Rio, a set of parsers and writers for RDF that has been designed with speed and standards-compliance as the main concerns. Currently it supports reading and writing of RDF/XML and N-Triples, and writing of N3. Rio is part of Sesame, but can also be downloaded and used separately.
- Elmo is a toolkit for developing Semantic Web applications using Sesame. Elmo wraps Sesame, providing a dedicated API for a number of well known web ontologies including Dublin Core, RSS and FOAF. The dedicated API makes it easier to work with RDF data for the supported ontologies. Elmo also offers a set of tools related to the supported ontologies, including an RDF crawler, a FOAF smusher and a FOAF validator.
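For readers who haven't seen N-Triples, one of the formats Rio reads and writes, it is the simplest RDF serialization: one triple per line, ending with a period. The Python sketch below writes and re-reads a small subset of the format to show its shape. It is not Rio's API (Rio is a Java library), it handles only URIs and plain string literals with `\"` and `\\` escapes, and the example.org URIs are hypothetical.

```python
import re


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples to a subset of N-Triples.

    Objects that start with http:// or https:// are written as URIs,
    everything else as a plain string literal.
    """
    lines = []
    for s, p, o in triples:
        if o.startswith(("http://", "https://")):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# <subject> <predicate> then either <uri> or "literal", then a closing period.
LINE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"((?:[^"\\]|\\.)*)")\s*\.$')


def from_ntriples(text):
    """Parse the subset written by to_ntriples back into tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        s, p, uri, literal = match.groups()
        # Undo the backslash escapes for literals; URIs are used as-is.
        obj = uri if uri is not None else re.sub(r"\\(.)", r"\1", literal)
        triples.append((s, p, obj))
    return triples


sample = [
    ("http://example.org/report", "http://purl.org/dc/elements/1.1/title",
     'A "blurry" snapshot'),
    ("http://example.org/report", "http://purl.org/dc/elements/1.1/subject",
     "http://example.org/SemanticWeb"),
]
text = to_ntriples(sample)
print(text)
```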

Jess

Java

Sandia National Laboratories

A Jess Plugin for Protege is available, integrating Jess development with your ontology.

Jena

- ARQ, which is a query engine for Jena. ARQ supports multiple query languages (SPARQL, RDQL, and ARQ, the engine's own language), and besides Jena it can be used with general purpose engines and remote access engines. ARQ can also rewrite queries to SQL.
- Joseki, an HTTP server-based system that supports SPARQL queries. Joseki features a WebAPI for the remote query and update of RDF models, including both a client component and an RDF server. The Joseki server can run embedded in an application, as a standalone program, or as a web application inside a suitable application server (such as Tomcat). It provides the operations of query and update on models it hosts.
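At the heart of a SPARQL query is a set of triple patterns with variables that must all match the data at once. The following Python sketch shows that matching step on an in-memory list of triples. It is a toy illustration, not ARQ or Joseki, and the `ex:` and `foaf:` terms are written as plain prefixed strings with made-up data.

```python
def match_pattern(pattern, triple, bindings):
    """Try to extend bindings so that pattern matches triple; return None on conflict."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its existing binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(graph, patterns):
    """Evaluate triple patterns conjunctively, in the style of a SPARQL basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in graph
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions


graph = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "foaf:name", "Carol"),
]

# Roughly: SELECT ?name WHERE { ex:alice foaf:knows ?friend . ?friend foaf:name ?name }
patterns = [
    ("ex:alice", "foaf:knows", "?friend"),
    ("?friend", "foaf:name", "?name"),
]
print(query(graph, patterns))
```

Real engines add joins optimized by indexes, filters, optional patterns, and much more, but the variable-binding idea is the same.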

SPARQL

HP Labs Semantic Web Programme

The OWL API

- RDF/XML parser and writer
- OWL/XML parser and writer
- OWL Functional Syntax parser and writer
- Turtle parser and writer
- KRSS parser
- OBO Flat file format parser
- Support for integration with reasoners such as Pellet and FaCT++

OWL API

OWL 2

Co-Ode

Powl

Ontowiki

Visualization and Query Tools

Jambalaya

Computer Human Interaction & Software Engineering Lab

Shrimp

OntoVista

University of Georgia

SWRL (Semantic Web Rule Language)

SWRL

Pellet

OWL DL

Clark & Parsia LLC

reasoning services

Pronto is an extension of Pellet that enables probabilistic knowledge representation and reasoning in OWL ontologies. Pronto is distributed as a Java library equipped with a command line tool for demonstrating its basic capabilities. It is currently in development stage—more robust and mature than a mere prototype, but less mature than a production-level system like Pellet.

Pronto offers core OWL reasoning services for knowledge bases containing uncertain knowledge; that is, it processes statements like “Bird is a subclass-of Flying Object with probability greater than 90%” or “Tweety is-a Flying Object with probability less than 5%”. The use cases for Pronto include ontology and data alignment, as well as reasoning about uncertain domain knowledge generally; for example, risk factors associated with medical conditions like breast cancer.

OWL Ontology Validator

- This is an online demo of the Validator.

This online tool

Seamark Navigator

Siderean

Unstructured Content Mining Tools

Calais

- This online demo takes a URL as input and returns a set of metadata extracted from the page, which developers can use to help develop taxonomies for their content.

Cortex Competitiva Platform

- A small online demo shows sample text and entities and actions extracted from it.

IdentiFinder Text Suite

BBN Technologies

DL-Learner

AKSW

Wikipedia entry for ILP

Transformation Tools

GRDDL

transformations

RDFizers

Simile

Database Tools

Query Languages and Tools

SPARQL Query Language for RDF

SPARQL

Owlgres

PostgreSQL

D2RQ

Jena

Sesame

SPARQL Protocol

Linked Data

Conversion/Transformation Tools

OntoSynt

SQL Fairy

Relational.OWL

Triplify

Described elsewhere in this report.

SQL Fairy

Application Servers

OpenLink Virtuoso Universal Server

Virtuoso, developed by OpenLink Software, is a complex product that appears to be a total solution for hosting Semantic Web applications, among other uses. In the company's words, from a recent release: "Virtuoso enables end users, systems architects, systems integrators, and developers to interact with data at the conceptual as opposed to the traditional logical level. Data about customers, suppliers, invoices, and orders, stored in existing ODBC- or JDBC-accessible database systems such as Oracle, Informix, Ingres, SQL Server, Sybase, Progress, and MySQL, can be presented in RDF form for use in Semantic Web applications."
Virtuoso is also available in an Open Source Edition, a very active project that includes a large number of modules for use with various content management systems. The main difference between the open source and commercial editions of Virtuoso is the Virtual Database Engine, which essentially enables an application to incorporate multiple data servers in its queries.
Also available as open source from OpenLink is its OpenLink Ajax Toolkit (OAT), which comes with a wide range of user interface and data widgets, as well as complete applications for building data queries, designing databases, and designing web forms. The OpenLink Data Explorer is one of these standalone OAT applications. Widgets that are part of OAT include:
- Charts
- Tables
- Pivot Tables
- Tree controls
- Docks
- Sidebars
- Timelines
- RDF Visualizer
- Edit-in-place
- Buttons and Sliders
- Windows
- Tag clouds
- Mashups (e.g., data with Google Maps)
- Data modeling
The standalone applications running on the Open-Source Edition all incorporate widgets from the OAT to create quite robust, desktop-application-like tools (the username/password for all of these is demo/demo):
OpenLink also provides OpenLink Data Spaces (ODS), which run on the Virtuoso server, in either the commercial or the open-source edition. ODS enables developers to create a presence in the Semantic Web via Data Spaces derived from Weblogs, Wikis, Feed Aggregators, Photo Galleries, Shared Bookmarks, Discussion Forums and more. Data Spaces thus provide a foundation for the creation, processing and dissemination of knowledge for the emerging Semantic Web. ODS is pre-installed as part of the demonstration database bundled with the Virtuoso Open-Source Edition. Existing ODS modules include:
- Blogs
- Wikis
- Briefcase (file-sharing)
- Feed Manager
- Calendar
- Bookmark Manager
- Community (small-group spaces)
- Mail

Cyc Knowledge Server

The Cyc Knowledge Server is a very large, multi-contextual knowledge base and inference engine developed by Cycorp. The Cyc technology includes the following components:
Cycorp also offers an open-source version of Cyc called OpenCyc. OpenCyc contains the full set of (non-proprietary) Cyc terms. The portal for the OpenCyc project, where developers can download the software and learn about ongoing projects and documentation, is OpenCyc.org.

Intelligent Topic Manager

Intelligent Topic Manager (ITM) is a commercial semantic software platform that enables a wide range of applications in enterprise information systems. ITM is designed to help organizations leverage, organize and model content and knowledge, to manage business reference models and taxonomies, to categorize and classify content, and to empower search. The platform consists of the following components and functionalities:

Oracle Semantic Technologies

Oracle Spatial 11g is an open, scalable RDF management platform. Based on a graph data model, RDF triples are persisted, indexed and queried, similar to other object-relational data types. Application developers can use the Oracle server to design and develop a wide range of semantic-enhanced business applications.

Asio Tool Suite

Available from BBN, the Asio Tool Suite is focused primarily on building Semantic Web applications by integrating an enterprise's existing databases and systems without the need for complete reengineering. Designed to address the volume, variety, and exponential increase in enterprise data, the Asio Tool Suite supports information discovery via Semantic Web standards and provides for data accessibility via queries posed in a user’s own ontology. The suite further enables integration of systems by building bridges in semantic meaning from one system to another. The suite consists of the following components:

Parliament

Jena

SPARQL

SWRL

Cartographer

Snoggle

Scout

Asio Scout provides semantic bridges to relational databases and web services that let an organization keep its existing systems in place for as long as necessary to, for example, support ongoing operations. Scout's semantic bridges act like any passive data consumer, but unlike other counterparts, their functionality, in concert with Asio Semantic Query Distribution's high-level perspective, enables consolidated knowledge discovery that wasn't previously conceivable. Scout can be used for web portals, standalone desktop applications, or web-enabled applications.

Semantic Application Demos

Browsers and Search Portals

Disco - Hyperdata Browser is a simple browser for navigating the Semantic Web as an unbound set of data sources. The browser renders in HTML all information that it can find on the Semantic Web about a specific resource. This resource description contains hyperlinks that allow you to navigate between resources. While you move from one resource to another, the browser dynamically retrieves information by dereferencing HTTP URIs and by following rdfs:seeAlso links.
- Here is an online demo of Disco's presentation of DBPedia's Semantic Web database on the concept "Sociobiology."
Umbel Subject Concepts Explorer is a lightweight ontology structure for relating Web content and data to a standard set of subject concepts. Its purpose is to provide a fixed set of reference points in a global knowledge space. These subject concepts have defined relationships between them, and can act as binding or attachment points for any Web content or data.
- Here is an online demo of Umbel's presentation of the concept "Field of Study."
OpenLink Data Explorer is one product developed from the open-source version of the Virtuoso Universal Server product. This is the platform used by the DBPedia project, including the demos on the DBPedia page. The demo below shows the XHTML view option of a Data Viewer ontology query.
- Here is an online demo of the OpenLink Data Explorer presentation of the concept "speed of light."
Zitgist DataViewer lets users browse linked data on the web, starting from an RDF or OWL ontology URL.
- Here is an online demo of the Zitgist viewer browsing an ontology on the concept of "music genre."
The Sindice Semantic Web Index monitors and harvests existing web data published as RDF and Microformats and makes it available under a coherent umbrella of functionalities and services. Its index of data is presented as a search portal much like Google. Sindice was created at DERI, the world’s largest institute for Semantic Web research. It is based on DERI’s unique cluster technology, which indexes and operates over terascale semantic data sets (trillions of statements) while also providing very high query throughput per cluster size. Leveraging this cluster technology, Sindice performs sophisticated reasoning that dramatically enhances data reusability, search precision, and recall. It obtains data through focused crawling methods that detect and concentrate on metadata-rich internet sources.
- Here is an online demo of the Sindice search engine.
The RKB Explorer is an application built using awards data from the National Science Foundation (NSF). It has used this data to build ontologies around NSF grants, and users can search and browse the data through the Explorer. All URIs on this domain are resolvable, and search results deliver HTML or RDF, depending on the content. The browse interface provides viewing and navigating using RDF triples, and the query interface provides access using SPARQL. I discovered this useful application through a search on "NSF funding" using Sindice.
- Here is an online demo for searching NSF Awards using RKB Explorer.
Marbles Linked Data Browser is a server-side application that formats Semantic Web content for XHTML clients using Fresnel lenses and formats. Colored dots are used to correlate the origin of displayed data with a list of data sources, hence the name. Marbles provides display and database capabilities for DBpedia Mobile.
- Here is an online demo of the Marbles browser viewer displaying linked data for the National Science Foundation.
The Cyc Foundation Concept Browser lets users search and browse the content of the OpenCyc knowledge base.

Brownsauce is a Semantic Web browser that lets users browse RDF files on the web. It runs as a local Java client and has a built-in Jetty web server. Brownsauce uses the Jena Semantic Web framework.
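The browsing behavior described for Disco above (dereference a URI, collect what it says about a resource, then follow rdfs:seeAlso links for more) can be sketched as a small breadth-first crawl. The Python below is an illustration only, not Disco's code: the "web" is a hard-coded dictionary standing in for HTTP requests, and every URI in it is hypothetical.

```python
from collections import deque

SEE_ALSO = "rdfs:seeAlso"

# Stand-in for the web: URI -> triples returned when that URI is dereferenced.
FAKE_WEB = {
    "http://example.org/sociobiology": [
        ("http://example.org/sociobiology", "rdfs:label", "Sociobiology"),
        ("http://example.org/sociobiology", SEE_ALSO, "http://example.org/more"),
    ],
    "http://example.org/more": [
        ("http://example.org/sociobiology", "dc:subject", "http://example.org/biology"),
        # A link back to the start, to show that cycles are handled.
        ("http://example.org/sociobiology", SEE_ALSO, "http://example.org/sociobiology"),
    ],
}


def dereference(uri):
    """Pretend to fetch a URI; a real browser would issue an HTTP GET here."""
    return FAKE_WEB.get(uri, [])


def describe(resource, max_documents=10):
    """Collect (predicate, object) facts about resource, following seeAlso links."""
    seen, queue, facts = set(), deque([resource]), []
    while queue and len(seen) < max_documents:
        uri = queue.popleft()
        if uri in seen:
            continue  # already fetched this document
        seen.add(uri)
        for s, p, o in dereference(uri):
            if s != resource:
                continue
            if p == SEE_ALSO:
                queue.append(o)
            else:
                facts.append((p, o))
    return facts


print(describe("http://example.org/sociobiology"))
```

The `max_documents` cap is a design choice for the sketch: on an open web of links, some limit is needed to keep a crawl from running indefinitely.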

Ontology Viewers and Query Tools

DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data. DBpedia is one of the projects developed/sponsored by AKSW. A wide variety of articles and publications about DBpedia have been published (see the Resources section of this report).
jSpace is a WebStart Java application that demonstrates how one might search and query a given ontological database. There are several example databases available to download for use with jSpace. jSpace's development was apparently inspired by mSpace. (mSpace was an innovative, but now defunct, project that attempted to merge the power of Google with the powerful interface of iTunes. Although the mSpace demo of a classical music explorer is not accessible now, it's well worth checking out the video demos of it.)
Owlsight is an innovative web application that uses the Google Web Toolkit and the Ext JavaScript library to let users navigate OWL ontologies, browsing the relationships between classes, properties, and instances. Owlsight uses the Pellet ontology reasoner.
OpenCyc for the Semantic Web is both a project and an OWL ontology browser. Using this tool, users can access the entire OpenCyc content as downloadable OWL ontologies as well as via Semantic Web endpoints (i.e., permanent URIs). These URIs return RDF representations of each Cyc concept as well as a human-readable version when accessed via a Web Browser.

Here is an online demo of the OpenCyc Semantic Web result for "National Science Foundation."

Knowledge/Content Management

The KiWi wiki project proposes a new approach to knowledge management that combines the wiki philosophy with the intelligence and methods of the Semantic Web. (KiWi stands for "Knowledge in a Wiki.")
DeepaMehta is a software platform for knowledge management. Knowledge is represented in a semantic network and is handled collaboratively. The DeepaMehta user interface is completely based on Mind Maps / Concept Maps. Instead of handling information through applications, windows and files, with DeepaMehta the user handles all kinds of information directly and individually.
- Here is an online demo of the DeepaMehta interface on the subject of the computer user interface.
Semantic MediaWiki and SMW+ are extensions to the MediaWiki platform, described elsewhere in this report.

Application Repositories

MIT's Simile project has been extremely creative and productive in applying concepts of linked data, RDF, and the Semantic Web generally to demonstration applications, all available as open source. (Simile is an acronym for "Semantic Interoperability of Metadata and Information in unLike Environments".) Some of its projects are included elsewhere in this report, but here is a list of some others relevant to the Semantic Web:
- Longwell, a server application that applies faceted browsing to the visualization of RDF stores.
- PiggyBank is a Firefox add-on that enables users to develop "mashups" of web data by using "screen scrapers." The software also allows users to tag information found and embed RDF into their content.
- RDFizers, described elsewhere in this report.
- Referee, a server application that creates browsable RDF files from web server logs.
- Welkin, an RDF visualizer built as a client-side Java application. (Note: I couldn't get it to run on my Mac, even though MIT makes a Mac OS X disk image available.)
- Fresnel, a vocabulary for displaying RDF.
- Banach, a collection of operators that work on RDF graphs to infer, extend, emerge or otherwise transform a graph into another.
- Data Collection, a project that aims to develop a collection of RDF data sets that are generally useful for the metadata research and tools community.
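To give a feel for what a tool like Referee does, here is a rough Python sketch that turns one line of a web server access log (in Common Log Format) into RDF-style triples. This is not Referee's code or its actual output: the `http://example.org/log#` vocabulary, the function name, and the sample log line are all invented for illustration.

```python
import re

# Common Log Format: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

LOG_NS = "http://example.org/log#"  # hypothetical vocabulary, not a real one

def log_line_to_triples(line, request_id):
    """Return (subject, predicate, object) tuples for one access-log line."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return []  # line is not in the expected format
    subject = f"{LOG_NS}request/{request_id}"
    return [
        (subject, LOG_NS + "clientHost", match["host"]),
        (subject, LOG_NS + "timestamp", match["time"]),
        (subject, LOG_NS + "method", match["method"]),
        (subject, LOG_NS + "path", match["path"]),
        (subject, LOG_NS + "status", match["status"]),
    ]

sample = '192.0.2.1 - - [21/Feb/2009:19:23:00 -0500] "GET /index.html HTTP/1.1" 200 5120'
triples = log_line_to_triples(sample, 1)
```

Each log entry becomes a small cluster of statements about one request, which is the shape of data an RDF browser can then facet by host, path, or status.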
DERI (Digital Enterprise Research Institute) International is the collection of bilateral agreements between like-minded institutes working on the Semantic Web and Web Science. Its mission is to exploit semantics for people, organizations, and systems to collaborate and interoperate on a global scale. DERI conducts and funds research in Semantic Web technologies, conducts projects that have led to numerous prototype applications, and develops ontologies. The following are a few interesting links from DERI's Irish branch in Galway:
- Research Clusters covering such topics as eLearning, Semantic Reality, Semantic Web Services, Industrial and Scientific Applications of Semantic Web Services, and Social Software. Each cluster has its own website and projects.
- Research Projects, a lengthy list of ongoing projects.
- Tools, a lengthy list of software tools available for download, typically from SourceForge.
University of Georgia's Large Scale Distributed Information Systems has a wide array of semantic applications available. The online repository has descriptions, downloads, and online demos. The applications cover such functions as visualization, ontology queries, ontology browsing, web services, and more.
10 Semantic Apps To Watch From the ReadWriteWeb site, this is an intriguing list of new semantic-web-related applications that are now available. The article first explains what they mean by a "Semantic Application," and then briefly describes each application's innovative use of this new technology. The ten applications listed are:

Semantic Website Enhancements

Semantic Web Crawling: A Sitemap Extension

This specification

Triplify

Linked Data

The Triplify project already has configurations for a variety of widely used content management systems, such as OpenConf, WordPress, Drupal, Joomla!, osCommerce, and phpBB. (The page that has links to these configurations also has a great list of other Semantic Web resources.) Triplify is one of the applications developed by AKSW. (I plan to download Triplify and integrate it in an instance of WordPress on my home computer.)

Microformats

lossless XHTML

code and tools

RDFa in HTML

Exhibit

There are several online demos of Exhibit presentations starting here.

Semantic MediaWiki

MediaWiki

SMW+

Halo

detailed list of features

Semantic Toolbar: Lets users create, inspect and alter semantic annotations in the wiki text without knowing the annotation syntax.
Advanced Annotation Mode: In this mode, wiki pages are displayed in the same way as they are displayed in the standard view mode. However, users can easily add annotations by simply highlighting the word or passage they want to annotate.
Ontology Browser: Allows easy navigation through the wiki's ontology without the need to access individual articles. It helps the user to understand the ontology and to keep an overview of it.
Question Formulation Interface: Normally, making queries against the semantic wiki involves knowing and using a complex syntax. The Question Formulation Interface provides a graphical interface that lets inexperienced users easily compose their own queries.

Auto completion: This tool greatly simplifies users' ability to generate annotations. With auto completion activated, users don't have to care about the correct spelling of an article's or property's name, because the tool extracts possible completions from the semantic context. For example, it checks what attribute values are possible for a particular attribute and shows only these to the user. This tool is used in the wiki text editor, the Semantic Toolbar, the query interface and the combined search.
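The idea behind this kind of context-aware completion can be sketched in a few lines of Python. This is only an illustration of the concept, not SMW+'s implementation: the `ALLOWED_VALUES` table, the property names, and the `complete` function are all hypothetical.

```python
# Hypothetical wiki schema: each property maps to the values already known for it.
ALLOWED_VALUES = {
    "capital of": ["France", "Finland", "Germany"],
    "population": [],
}

def complete(prop, typed, schema=ALLOWED_VALUES):
    """Return the known values of `prop` that start with what the user typed."""
    prefix = typed.casefold()  # compare case-insensitively
    return sorted(v for v in schema.get(prop, []) if v.casefold().startswith(prefix))

suggestions = complete("capital of", "f")
```

Because the candidates come from the values recorded for that one property, the user sees only completions that make sense in context, and a property with no known values simply yields no suggestions.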

ARC

microformats

ARC includes the following capabilities:

Parsers for RDF/XML, Turtle, SPARQL + SPOG, Legacy XML, HTML "tag soup," RSS 2.0, and others.
Serializers for N-Triples, RDF/JSON, RDF/XML, Turtle, SPOG dumps.
RDF Storage using MySQL with support for SPARQL queries
SemHTML RDF extractors for Dublin Core, eRDF, microformats, OpenID, RDFa
Use of remote stores, allowing the website to query remote SPARQL endpoints as if they were local stores (results are returned as native PHP arrays)
SPARQLScript, a SPARQL-based scripting language combined with output templating
Light-weight inferencing
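For readers wondering what querying a remote SPARQL endpoint involves at the HTTP level, here is a small Python sketch. ARC itself is a PHP library, so this is not ARC's API; it only shows the standard SPARQL protocol convention of sending the query text in a `query` parameter. The endpoint URL is a placeholder and the query is just an example.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; any SPARQL-protocol endpoint URL could be substituted.
ENDPOINT = "http://example.org/sparql"

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE { ?person foaf:name ?name } LIMIT 10
"""

def sparql_get_url(endpoint, query):
    """Build the GET URL for a SPARQL query: the query text goes in a 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = sparql_get_url(ENDPOINT, QUERY)
```

Fetching that URL from a live endpoint would return the result set, typically as XML or JSON depending on what the server supports; a library like ARC wraps this step and hands the results back as native data structures.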

ARC applications and websites. Of as much interest as ARC itself are the numerous applications and extensions that have already been built with it, many of which are useful for semantically enhancing websites on their own. The following are a few examples:

dooit - Simple to-do lists
irs - (i)nterlinking of (r)esources with (s)emantics
Life Science Identifier (LSID) Tester
OpenVocab - Community-maintained RDF vocabulary workspace
paggr - smart data + personalized portals
Scregg, an "Online Semantic Community Framework"
SIOC Importer for WordPress
SMOB - Semantic Microblogging
SPARQL Endpoint for Library of Congress Subject Headings (2,441,494 triples)
SPARQLBot - a tiny software agent that simplifies access to linked data and the general Semantic Web
SparqlPress - WordPress enhanced through use of linked data. Spoogle is a demo site for SparqlPress.
Talis Applications:

Trice - A Semantic Web framework (still in development).

Calais Marmoset

OpenCalais

SearchMonkey

Microformats

Other Resources

Ontology Libraries One of the best features of ontologies is their design for reuse. It's not clear to me what happens when you encounter a dozen ontologies for "person" or "job", etc., in the ontology libraries on the web, but it's certainly useful that you can search for existing ontologies and bring the objects you want to model into your own ontology. There are a few ontologies for commonly used objects that are nearly de facto standards now:

Friend-of-a-Friend (FOAF) for People and Organizations
Dublin Core for Publications
Simple Knowledge Organization System (SKOS) for thesauri
OWL-Time for time intervals
SIOC (Semantically-Interlinked Online Communities) is commonly used in conjunction with the FOAF vocabulary for expressing personal profile and social networking information.
Tags, Places, and other specific topics, a repository of ontologies developed by Richard Newman.
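As a small illustration of reusing one of these shared vocabularies, the following Python sketch describes a person with FOAF terms and writes the statements out in N-Triples form. The FOAF namespace and the `Person`, `name`, and `knows` terms are real; the two people and their `example.org` URIs are made up, and the helper functions are hand-rolled for the example rather than taken from any RDF library.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical people; the example.org URIs are placeholders.
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"

def uri(value):
    """Format a URI reference the way N-Triples expects it."""
    return f"<{value}>"

def literal(value):
    """Format a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialize (subject, predicate, object) term strings, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

triples = [
    (uri(ALICE), uri(RDF_TYPE), uri(FOAF + "Person")),
    (uri(ALICE), uri(FOAF + "name"), literal("Alice")),
    (uri(ALICE), uri(FOAF + "knows"), uri(BOB)),
]
document = to_ntriples(triples)
```

Because the predicates are FOAF's own URIs rather than ones invented locally, any tool that understands FOAF can interpret these statements without knowing anything else about the site that published them.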

The following is a list of other resources available for finding ontologies on specific topics:

Protege Ontology Library This library is part of the Protege Wiki.

Simile Ontologies This library includes those developed by MIT as part of the Simile project as well as a list of others that have been used by the project.

Swoogle Swoogle is a research project being carried out by the ebiquity research group in the Computer Science and Electrical Engineering Department at the University of Maryland, Baltimore County.

Google Google can restrict its search to files of type "owl", as this sample search shows.

OntoSelect Ontology Library This library has an ontology search system with several unique and innovative features, including use of Wikipedia topics as the basis for one type of search.

BioPortal BioPortal is a sophisticated web application for accessing and sharing biomedical ontologies. It features several advanced search and visualization tools, as well as tools for mapping concepts between different ontologies.

SchemaWeb This is a comprehensive directory of RDF schemas which, in addition to typical browse-and-search interfaces, also provides an extensive set of web services to be used by software agents for processing RDF data.

Watson This link points to Watson's terrific web interface, which is one of the best for searching out ontologies that match your topics of interest. Watson also has a Protege plugin, but I haven't been able to make it work. The plugin, when working, would let a developer search and add classes to their ontology directly from within Protege.

TONES Ontology Repository This repository is primarily designed to be a central location for ontologies that might be of use to ontology tools developers for testing purposes.

Ping the Semantic Web Developed as a free web service by Zitgist, a company "incubated" by OpenLink, PingtheSemanticWeb (PTSW) is an archive of recently created or updated RDF documents on the web. When an author creates or updates such a document, they can notify PTSW by pinging the service with the document's URL. Crawlers and other software agents use PTSW to learn when and where the latest RDF documents can be found. This dynamically updated library displays the 25 most recently updated ontologies in real time. Using PTSW's data store, you can retrieve data on all RDF files by namespace or by class, with the option to download the files.

Papers, Projects and Documentation

W3C Semantic Web Activity This portal can be thought of as the Semantic Web's "Home Page." It brings together a vast amount of primary source documentation of the Semantic Web's languages and other standard specifications, including OWL, RDF, RDFa in XHTML, and SPARQL. In addition, this portal gathers all the major ongoing projects involving the Semantic Web and the groups conducting them. The page also lists a large number of publications and presentations on Semantic Web topics.

Rich Tags This paper describes a proposal/project for developing a system that uses semantic tags for enhancing the searchability of web pages. (The proposal sounds similar to the W3C specification for RDFa in XHTML.)

Projects That Use Protege This page on the "old" Protege wiki has an extensive list of applications built with Protege.

Building A Semantic Website This article is a little old (2001), but has a good overview of the steps and components of building a web application using RDF ontologies.

Ontology Extraction from Text Based on Linguistic Analysis This paper describes the concepts and technical approaches behind the OntoLT Protege plug-in.

Extracting Ontologies from Relational Databases A detailed, highly technical paper describing the approach adopted and the actual extraction algorithm used by the tool OntoSynt.

TONES TONES is a European Union research project into the design and use of Thinking ONtologiES. Begun in 2005, it was scheduled to complete its work in 2008. The TONES website has links to all of the outputs of the project, including software tools and research papers. This PDF contains a 2006 presentation overview of the TONES project.

RapidOWL This methodology for developing OWL ontologies is based on the idea of iterative refinement, annotation and structuring of a knowledge base. A central paradigm for the RapidOWL methodology is the concentration on the smallest possible information chunks. The collaborative aspect comes into play when those information chunks can be selectively added, removed, or annotated with comments or ratings. The design rationales for the RapidOWL methodology are to be lightweight, to be easy to implement, and to support spatially distributed and highly collaborative scenarios. This methodology is implemented in the OntoWiki software project.

Agile Knowledge Engineering and Semantic Web (AKSW) AKSW has been very prolific in providing the Semantic Web community with eye-opening research projects, which have led to several complete applications, including: Powl and OntoWiki, DBpedia, Triplify, and R2D2. Their work has also spawned numerous other public interfaces to the Semantic Web. In addition, the AKSW website publishes a large number of presentations and research papers describing the work leading to their various Semantic Web applications.

DBpedia information This useful page collects blog posts about DBpedia, publications about the project and related websites.

Linked Data Comes of Age This very useful article clearly explains what is meant by linked data based on RDF and how it fits into the overarching vision of the Semantic Web.

Zitgist's Papers and Reports This is a useful list of resources on subjects relevant to Semantic Web research. The Zitgist Lab site also has a good page of documents on Best Practices for RDF.

RDF Schemas This site has a clear explanation of the various "vocabularies" used to develop ontologies: RDF, RDFS, OWL, and Dublin Core. The site also has a terrific list of resources for programmers.

Nodalities Magazine Sponsored by Talis, this free, bimonthly online magazine (released in PDF format) tries to bridge the divide between those building the Semantic Web and those interested in applying it to their business requirements. The magazine is supported by the Nodalities blog, podcasts, and Semantic Web development work.

DERI Papers and Reports This site contains a large collection of research papers and technical reports produced by DERI International.

Business Resources This list includes companies I've encountered that appear to have substantial expertise in applying Semantic Web technologies to practical business requirements.

BBN Technologies

DARPA

DAML

OWL

Asio Tool Suite

Cycorp

Clark & Parsia

commercial support

Pellet

Semantic Arts

Zitgist

Semantic Web Company

Talis

Talis Platform

applications

Semsol

ARC

Cortex

Competitiva

This article was posted on Saturday, February 21st, 2009 at 7:23 pm and is filed under Enterprise Software, Internet, Knowledge Management, Open Standards, Semantic Web. Tags: Articles.