SwetoDblp Ontology of Computer Science Publications

Description: SwetoDblp is a large ontology of bibliographic data on Computer Science publications, with DBLP as its main data source. SwetoDblp is created primarily from a large XML document available at DBLP's website and secondarily from other datasets that add relationships to other entities such as Publishers, Companies and Universities. The schema-vocabulary part of the ontology uses concepts and relationships from other vocabularies such as FOAF and Dublin Core. (Old SwetoDblp website)

Download Latest Version:

SwetoDblp Version April 2008

When referring to SwetoDblp, please cite/link the following:

Applications/Use/Citations of SWETO:

SwetoDblp goes beyond one-to-one mapping of XML elements to RDF data

  • Every person in the original data becomes an entity with its own URI, which points to that person's DBLP entry page on the web. For example, a data value such as <author>Prabhakar Raghavan</author> from the original XML data becomes an RDF entity with a URI that is more likely to be (re-)used elsewhere.

  • Whenever the homepage of a person is known in the original dataset, that relationship is kept in the resulting RDF by using a widely used vocabulary.

  • In some cases, the 'affiliation' of a person is automatically extracted from his/her homepage by looking at the actual URL.

  • The affiliation information is extracted automatically with the help of one of the additional data sources, namely the Universities dataset or the Organizations dataset. The Universities dataset consists of two parts. The first is a list of universities obtained from a web source. The following is an example of an instance of the Universities dataset.

    Affiliation is also extracted from note elements in some XML elements for authors' homepages, such as <note>University of Waterloo</note>. In these cases, a lookup operation can provide the affiliation relation by matching the name or an 'alternative' name of a university. Thus, the second part of the Universities dataset is a (much smaller) manually created list of universities containing synonyms and alternative spellings. It also includes universities not listed in the web source mentioned above. The Universities and Organizations datasets are encoded in RDF.

  • DBLP has done a great job dealing with ambiguous names and name changes. Whenever the original data from DBLP indicates that a person can be referred to by more than one name, the corresponding entities in SwetoDblp are explicitly related with an owl:sameAs relationship.

  • Publisher's information is converted to relationships to 'publisher' entities in RDF by using a data source of Publishers (encoded in RDF).

  • Series' information such as Lecture Notes in Computer Science, CEUR Workshops, etc. is converted to relationships to 'series' entities in RDF by using a data source of Series (encoded in RDF).
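The two affiliation lookups described in the list above (by homepage URL and by the text of a note element) can be sketched roughly as follows. This is only an illustration in Python, not the project's Java code: the entries in UNIVERSITIES, their URIs, alternative names and domains, and the function names are all hypothetical stand-ins for the RDF-encoded Universities dataset.

```python
from urllib.parse import urlparse

# Hypothetical miniature of the Universities dataset. The URIs, alternative
# names, and domains are illustrative values, not taken from SwetoDblp.
UNIVERSITIES = [
    {
        "uri": "http://example.org/universities#waterloo",
        "name": "University of Waterloo",
        "alt_names": ["Univ. of Waterloo", "UWaterloo"],
        "domain": "uwaterloo.ca",
    },
    {
        "uri": "http://example.org/universities#georgia",
        "name": "University of Georgia",
        "alt_names": ["UGA"],
        "domain": "uga.edu",
    },
]


def _normalize(name):
    """Lowercase and collapse whitespace so that spelling variants compare equal."""
    return " ".join(name.lower().split())


# Index every primary and alternative name to the URI of its university.
NAME_INDEX = {}
for _entry in UNIVERSITIES:
    for _name in [_entry["name"], *_entry["alt_names"]]:
        NAME_INDEX[_normalize(_name)] = _entry["uri"]


def affiliation_from_homepage(url):
    """Guess the affiliation from the host part of a homepage URL, or return None."""
    host = urlparse(url).hostname or ""
    for entry in UNIVERSITIES:
        domain = entry["domain"]
        if host == domain or host.endswith("." + domain):
            return entry["uri"]
    return None


def affiliation_from_note(note_text):
    """Look up a <note> value by primary or alternative name, or return None."""
    return NAME_INDEX.get(_normalize(note_text))
```

For instance, affiliation_from_homepage("http://www.cs.uwaterloo.ca/~someone/") and affiliation_from_note("University of Waterloo") both resolve to the same placeholder URI, while an unknown name yields None, in which case only the literal value could be kept.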

Schema Vocabulary
The schema vocabulary of the ontology reuses existing vocabulary whenever possible (e.g., FOAF, DC). In addition, statements are included to indicate equivalence of classes or properties with respect to other (similar) schemas for describing publications/researchers. In particular, we use owl:equivalentClass and owl:equivalentProperty (where applicable) to relate our schema to those of: MarcOnt Initiative, KnowledgeWeb Portal, SWRC Ontology, AKT Portal Ontology, SWPortal Ontology, and a BibTeX Ontology. This screenshot illustrates the OWL equivalent class triples for the SwetoDblp schema.
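As a rough illustration of what such equivalence statements look like, the Python sketch below prints them as N-Triples lines. Only the OWL namespace is real; the two schema namespaces and the class and property names are placeholders chosen for the example and are not the actual SwetoDblp or third-party terms.

```python
OWL = "http://www.w3.org/2002/07/owl#"

# Placeholder namespaces (assumptions for illustration only).
OURS = "http://example.org/swetodblp#"
OTHER = "http://example.org/other-schema#"


def equivalence_triples(class_pairs, property_pairs):
    """Return N-Triples lines stating class and property equivalences."""
    lines = []
    for ours, theirs in class_pairs:
        lines.append(f"<{OURS}{ours}> <{OWL}equivalentClass> <{OTHER}{theirs}> .")
    for ours, theirs in property_pairs:
        lines.append(f"<{OURS}{ours}> <{OWL}equivalentProperty> <{OTHER}{theirs}> .")
    return lines


if __name__ == "__main__":
    # Hypothetical term names on both sides.
    for line in equivalence_triples([("Article", "Article")], [("year", "year")]):
        print(line)
```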

XML Datatypes
We did not include XML datatypes for literals that are of type string or for which no direct mapping is available, as is the case for 'pages', whose values can contain dashes or letters. We included XML datatypes for the following datatype properties: chapter (xsd:integer), mdate (xsd:date), month (xsd:gMonth), and year (xsd:gYear). For backwards compatibility we also kept the original value of month in opus:month; we did not produce gMonth values for the few cases that had values such as January/February.
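The datatype rules above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that month values in the source are English month names; the gMonth lexical form "--MM" follows XML Schema.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Datatypes named in the text above; any other property stays a plain literal.
DATATYPES = {
    "chapter": XSD + "integer",
    "mdate": XSD + "date",
    "year": XSD + "gYear",
}

MONTH_NUMBERS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}


def typed_literal(prop, value):
    """Return (lexical_value, datatype_uri_or_None) for a datatype property."""
    return value, DATATYPES.get(prop)


def month_values(raw):
    """Return (original_value, gmonth_lexical_or_None).

    The original value is always kept. A gMonth value is produced only when
    the input is a single recognizable month name (assumed to be English).
    """
    number = MONTH_NUMBERS.get(raw.strip().lower())
    gmonth = "--{:02d}".format(number) if number else None
    return raw, gmonth
```

Under these assumptions, typed_literal("pages", "12-34a") leaves the value untyped, and month_values("January/February") keeps the original string without producing a gMonth value.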

SwetoDblp is created by a SAX-parser process that reads dblp.xml (available at the DBLP website). The code includes a number of domain-dependent mappings for producing the RDF. This process reads data files of Organizations, Universities, Publishers, and Series (available above) and uses them to look up values in order to establish relationships to the entities within them (instead of keeping just the literal values). These data files are encoded in RDF (which facilitates the representation of synonyms) and are read using the SemDis API. Hence, the code needs a few jar files from several sources; we indicate below which ones and where to get them. The code is organized as an ant project (the file dblp.xml should be placed in the data directory; the file dblp.dtd should be placed in the working directory).
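The general shape of such a SAX-based conversion can be sketched with Python's standard xml.sax module. This is not the project's Java code: the sample record, the base URIs, the person_uri helper, and the short property names are all hypothetical, and the sketch handles only flat author, title, and year elements. The real dblp.xml also relies on dblp.dtd for its entity definitions, which the small inline sample does not need.

```python
import xml.sax
from urllib.parse import quote

# Illustrative input in the style of dblp.xml (fictional record and names).
SAMPLE = b"""<dblp>
<article key="journals/example/Doe08" mdate="2008-04-01">
<author>Jane Doe</author>
<author>John Roe</author>
<title>An Example Title</title>
<year>2008</year>
</article>
</dblp>"""

PUB_BASE = "http://example.org/pub/"        # placeholder base URI
PERSON_BASE = "http://example.org/person/"  # placeholder base URI
RECORD_TAGS = {"article", "inproceedings", "book"}
FIELD_TAGS = {"author", "title", "year"}


def person_uri(name):
    """Mint a URI for a person name (hypothetical scheme, not DBLP's)."""
    return PERSON_BASE + quote(name.replace(" ", "_"))


class DblpHandler(xml.sax.ContentHandler):
    """Collect (subject, property, value) triples while streaming the XML."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._record = None  # URI of the publication currently being read
        self._field = None   # name of the field element currently open
        self._text = []

    def startElement(self, name, attrs):
        if name in RECORD_TAGS:
            self._record = PUB_BASE + attrs.get("key", "")
        elif self._record and name in FIELD_TAGS:
            self._field = name
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._field:
            self._text.append(content)

    def endElement(self, name):
        if self._field and name == self._field:
            value = "".join(self._text).strip()
            if name == "author":
                # Authors become entities with URIs instead of literals.
                value = person_uri(value)
            self.triples.append((self._record, name, value))
            self._field = None
        elif name in RECORD_TAGS:
            self._record = None


def parse_dblp(xml_bytes):
    """Parse DBLP-style XML bytes and return the collected triples."""
    handler = DblpHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.triples
```

Calling parse_dblp(SAMPLE) returns one triple per author, title, and year of the sample record. A streaming parser suits this task because dblp.xml is a single large document that does not need to be held in memory as a tree.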
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

  • Code: code-swetodblp-april2007.zip
  • commons-lang-2.1.jar - from apache's commons-lang
  • commons-logging.jar - from apache
  • icu4j_3_0.jar - ICU4J v3.0 from IBM
  • jena_v2_3.jar - Jena's jena.jar version 2.3 (we renamed it to avoid confusion on the version number)
  • semdisAPI_v0_3.jar - from SemDis API
  • semdisImpl_v0_6.jar - from SemDis API (version 0.5 also ok)
  • xercesImpl.jar - from xerces

Old Versions (Archive): August 2006, September 2006, October 2006, November 2006, January 2007, February 2007, March 2007, April 2007, May 2007, June 2007, July 2007, August 2007, September 2007, October 2007, November 2007, December 2007, January 2008, February 2008, March 2008

Contact Person:
Information on this page is created and maintained by Boanerges Aleman-Meza.

This material is based upon work supported by the National Science Foundation under Grant No. IIS-0325464 titled "SemDis: Discovering Complex Relationships in Semantic Web". Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.