An ontology for web development concepts

Abstract

The WADO project aims to construct and manage an ontology of web development knowledge. The information included in the ontology ranges from programming languages and IDEs to specific algorithms and architecture patterns. The ontology is designed to be modular, so that new knowledge in different formats can easily extend it. The project also includes an interface that lets a user search for information about web development using natural language. The main architectural purpose is to link information from different sources and to infer new data from that information.

Table of contents

1. Introduction
2. Objectives
3. Architecture
3.1. Ontology Architecture
3.2. GitHub GraphQL
3.3. Natural language like Input
3.4. Ontology schema
4. Development
4.1. Build Ontology Microservice
4.2. REST API
4.3. Client Application
5. Conclusion
6. Bibliography

1. Introduction

The Semantic Web represents a new step in the evolution of the web and in the use of the huge amount of information on the World Wide Web. Ontologies, with the help of reasoners, can store information together with the semantic links between its pieces. Many domains are represented by ontologies that grow over time and offer new ways of understanding a domain and its links with others. Computer science is a domain that deserves a comprehensive ontology.

2. Objectives

In order to create an ontology that stores semantic data about computer science domains, a few questions must be considered:

what is the ideal architectural design for creating a maintainable, information-rich and easily extensible ontology
which concepts such an ontology must store through its classes, properties and instances
how it can take advantage of reasoner inference to find new links between pieces of information, especially between data that come from different sources and in different formats
what sources of data can be used to create such an ontology
how a user can retrieve information from such an ontology in the easiest way

3. Architecture

The architecture of the ontology must allow different sources of data (such as a REST API, a GraphQL API or even an existing ontology) in different formats (such as JSON, Turtle etc.). To achieve modularity, a set of design patterns specific to ontologies will be used. Not only the ontology must be modular, but the entire project as well. For this reason, an architecture based on microservices will be used.

3.1 Ontology's architecture

The ontology uses data from the GitHub GraphQL API, linked under WADO ontology classes. To ensure correctness and a modular design, Alignment Ontology Design Patterns (ODPs) will be used.

The alignment of data sources under the WADO ontology follows the Composition over Equivalence principle. The problem in modular ontology design is that neither the number of subconcepts nor their names are known in advance. The common approach is to create a class for each new concept and declare it equivalent to the existing ones, but as the number of classes grows it becomes difficult to add a new class, because every existing class must be modified to be equivalent to it. For this reason, composition over equivalence is chosen as the main principle for linking data: classes of the WADO ontology are defined to cover the different representations of the same concept. For example, the WADO class wado:ProgrammingLanguage will represent the GitHub language concept, whose instances are created from the result of a GraphQL query. The client works only with WADO classes, so searching across different data sources is a transparent process. Properties from FOAF and RDFS will also be used to represent relations between classes from different ontologies and to discover new relations by merging classes that share the same URI.

Fig 1.1 Example of data alignment through composition.
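The composition idea can be illustrated with a minimal Python sketch. It is not the project's implementation: the registry, the second source ("other-source"), the property wado:representedIn and the URI scheme are all assumptions made for illustration. Only wado:ProgrammingLanguage and the GitHub Language type come from the text above.

```python
# Hypothetical alignment registry: (source, source type) -> WADO class.
# Supporting a new source means adding entries here; no existing class changes.
ALIGNMENT = {
    ("github", "Language"): "wado:ProgrammingLanguage",
    ("other-source", "ProgLang"): "wado:ProgrammingLanguage",  # made-up second source
}


def slug(name):
    """Normalize a name so the same concept gets the same URI from every source."""
    return name.strip().lower().replace(" ", "-")


def align(source, source_type, name):
    """Return triples that describe one source record using WADO classes only."""
    wado_class = ALIGNMENT[(source, source_type)]
    subject = f"wado:{slug(name)}"
    return {
        (subject, "rdf:type", wado_class),
        # wado:representedIn is a hypothetical property name.
        (subject, "wado:representedIn", f"{source}/{source_type}"),
    }


# Two sources describe the same language; the shared URI merges the records.
graph = align("github", "Language", "Python") | align("other-source", "ProgLang", "python ")
for triple in sorted(graph):
    print(triple)
```

Because both records resolve to the same subject, the type triple is stored once and each source only contributes its own link, which is the effect the principle aims for without any equivalence axioms between source classes.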

Another alignment design pattern that will be used is the Normalization ODP. Following this principle, several axes (layers) of the ontology will be defined. Every layer will have only one parent class. The relations between layers will be defined using restrictions rather than inheritance, because this keeps the layers independent. Using restrictions instead of inheritance to relate classes also increases the amount of inference and passes maintenance to the reasoner. For example, to define the relation between wado:Paradigm and wado:FunctionalParadigm and wado:ObjectOrientedParadigm, a universal restriction will be used:


                    Paradigm has only Functional or ObjectOriented or RuleBased

An example of using a restriction to define relations between classes
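One possible way to write such a universal restriction in OWL is sketched below as a small Python helper that emits Turtle. The property name wado:hasKind is hypothetical, since the text does not name the property, and wado:RuleBasedParadigm is assumed by analogy with the other two class names. The usual @prefix declarations for wado, rdfs and owl are assumed to exist in the target file.

```python
def universal_restriction(cls, prop, fillers):
    """Build Turtle stating that every `prop` value of `cls` is one of `fillers`."""
    union = " ".join(fillers)
    return (
        f"{cls} rdfs:subClassOf [\n"
        f"    a owl:Restriction ;\n"
        f"    owl:onProperty {prop} ;\n"
        f"    owl:allValuesFrom [ a owl:Class ; owl:unionOf ( {union} ) ]\n"
        f"] ."
    )


turtle = universal_restriction(
    "wado:Paradigm",
    "wado:hasKind",  # hypothetical property name
    ["wado:FunctionalParadigm", "wado:ObjectOrientedParadigm", "wado:RuleBasedParadigm"],
)
print(turtle)
```

The restriction uses owl:allValuesFrom over a union class, so the layers stay independent classes and the reasoner, not the class hierarchy, carries the relation.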

3.2 Ontology's data sources. GitHub GraphQL API

One of the most important data sources for the WADO ontology is GitHub, which offers GraphQL API v4. The section dedicated to development defines a two-step process: creation of the ontology and interrogation. In the creation step, a module populates the ontology with instances that correspond to objects in the GitHub GraphQL API. The main types from the GitHub API that will be used in the ontology are:

Organization
Repository
Pull Requests
Tags
Language
Topic
License
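As an illustration of how some of these types could be fetched, the following Python sketch prepares a request for the GitHub GraphQL endpoint. The choice of fields and the helper name build_request are assumptions, not the project's actual query; the request is only built here, not sent, and a personal access token would be needed to execute it.

```python
import json

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Illustrative query: one repository with its language, license and topics.
REPOSITORY_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    name
    owner { login }
    primaryLanguage { name }
    licenseInfo { spdxId }
    repositoryTopics(first: 10) { nodes { topic { name } } }
  }
}
"""


def build_request(owner, name, token):
    """Return (url, headers, body) for a POST to the GitHub GraphQL API."""
    body = json.dumps(
        {"query": REPOSITORY_QUERY, "variables": {"owner": owner, "name": name}}
    )
    headers = {
        "Authorization": f"bearer {token}",
        "Content-Type": "application/json",
    }
    return GITHUB_GRAPHQL_URL, headers, body


# "<token>" is a placeholder for a real access token.
url, headers, body = build_request("apache", "jena", "<token>")
print(url)
print(body)
```

Passing the owner and name as GraphQL variables keeps the query text constant, so the same query can be reused for every repository the creation step visits.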

3.5 Ontology schema

Vocabularies:

RDFS is used to represent taxonomy relations
FOAF is used to represent relations between concepts and also between people. In this way the ontology can suggest data related to a client's search
OWL is used to represent complex relations between concepts. It also defines restrictions that help a reasoner infer more data.
WADO is used to represent specific relations between WADO ontology classes: wado:hasParadigm, wado:runsOn etc.

Also, to ensure the consistency and correctness of the data in the ontology, a tool for defining constraints will be used. SHACL is a W3C Recommendation and a powerful language for writing constraints.

4. Development

To achieve all objectives, the development of the WADO project can be split into two parts:

ontology creation
ontology interrogation

Not only the ontology but the whole project must be modular. For this reason WADO uses a microservices design and defines a set of modules that are independent of each other, each specialized in a certain task. The entire project is built on the following microservices:

All of the microservices mentioned will be used to build the Client Application module, which is an IntelliJ plug-in.

Inter-service communication will follow the REST paradigm. Other tools that will be used:

Protege
Jena
Fuseki

4.1 Build Ontology Microservice

This microservice covers the first part of project development, ontology creation.

Scope: aggregate information from sources, independent of format or type, and convert it into WADO classes/predicates
Input: GitHub GraphQL API
Output: WADO classes/predicates
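The conversion step can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the namespace http://example.org/wado#, the flattened record shape, and the names wado:Repository, wado:Topic, wado:hasLanguage and wado:hasTopic are hypothetical. The output is serialized as N-Triples so that it could be loaded by an RDF store.

```python
from urllib.parse import quote

WADO = "http://example.org/wado#"  # hypothetical namespace, not the project's real one
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def wado(local_name):
    """Wrap a local name in the assumed WADO namespace as an N-Triples IRI."""
    return f"<{WADO}{quote(local_name)}>"


def repository_to_triples(repo):
    """Convert one simplified repository record into WADO triples."""
    subject = wado(f"repository/{repo['owner']}/{repo['name']}")
    triples = [(subject, RDF_TYPE, wado("Repository"))]
    language = repo.get("language")
    if language:
        language_iri = wado(f"language/{language}")
        triples.append((language_iri, RDF_TYPE, wado("ProgrammingLanguage")))
        triples.append((subject, wado("hasLanguage"), language_iri))
    for topic in repo.get("topics", []):
        topic_iri = wado(f"topic/{topic}")
        triples.append((topic_iri, RDF_TYPE, wado("Topic")))
        triples.append((subject, wado("hasTopic"), topic_iri))
    return triples


def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI tuples as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hand-written sample record standing in for a parsed API response.
sample = {"owner": "apache", "name": "jena", "language": "Java", "topics": ["rdf", "sparql"]}
print(to_ntriples(repository_to_triples(sample)))
```

Percent-encoding the local names keeps IRIs valid for language names such as "C#" that contain reserved characters.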

Fig 1.2 Build Ontology Microservice Architecture

4.2 REST API

This microservice is the main node between a client request and the WADO ontology.

Scope: query the ontology based on the request (filters, pagination, suggestions)
Input: a route string and filters expressed with the help of ontology classes
Output: a JSON response based on the search through the ontology

For the REST API to be able to interrogate the WADO ontology, a Fuseki server is used. Fuseki creates an endpoint for the WADO ontology that anyone can use over HTTP. An example of a query that is done by the REST API is the following:
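A minimal sketch of how such a query could be built and addressed to Fuseki is given below, in Python. The host, port and dataset name in FUSEKI_ENDPOINT, the namespace, and the filter-by-paradigm example are assumptions; only wado:ProgrammingLanguage, wado:hasParadigm and wado:FunctionalParadigm are names taken from this document. The request follows the standard SPARQL protocol (a GET with a `query` parameter) and is only constructed here, not sent.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request

FUSEKI_ENDPOINT = "http://localhost:3030/wado/sparql"  # assumed host and dataset name
WADO_PREFIX = "http://example.org/wado#"  # hypothetical namespace


def build_query(paradigm, limit=10, offset=0):
    """Build a SPARQL query listing languages that have the given paradigm."""
    # Accept only simple class names so user input cannot alter the query.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", paradigm):
        raise ValueError(f"invalid class name: {paradigm!r}")
    return (
        f"PREFIX wado: <{WADO_PREFIX}>\n"
        "SELECT ?language WHERE {\n"
        "  ?language a wado:ProgrammingLanguage ;\n"
        f"            wado:hasParadigm wado:{paradigm} .\n"
        "}\n"
        f"LIMIT {int(limit)} OFFSET {int(offset)}"
    )


def build_request(paradigm, limit=10, offset=0):
    """Prepare a GET request asking the endpoint for JSON results."""
    params = urlencode({"query": build_query(paradigm, limit, offset)})
    return Request(
        f"{FUSEKI_ENDPOINT}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


request = build_request("FunctionalParadigm", limit=5)
print(request.full_url)
```

With a running Fuseki instance, passing the request to urllib.request.urlopen would return the result bindings as JSON, which the REST API could then reshape into its own response. The LIMIT and OFFSET clauses are one simple way to map the pagination mentioned in the scope onto SPARQL.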

4.3 Client Application

The role of the client application is to link the functionalities together. It is an IntelliJ plug-in that can find GitHub repositories using different filters such as programming language, editors, paradigms etc.

5. Conclusion

In conclusion, an ontology with a strong impact on computer science development must adopt an architecture based on design patterns in order to remain open to extension and easy to maintain and debug. It is also important to capture the important aspects of the domain and the relations between them, and to define classes as restrictions over well-defined properties. This makes it possible to infer data that is rich in both quality and quantity and that can, in turn, be interrogated easily in a natural-language manner.

WADO is a project created to help developers find resources they can use in their daily activity. Because GitHub is the main source of data, users will have access to a large number of repositories that contain only relevant data, including names of frameworks, programming languages, code snippets and even full implementations of some functionalities.