Schema.org should have mappings to Wikidata terms where possible #280

Open
danbri opened this Issue Jan 23, 2015 · 68 comments
@danbri
danbri commented Jan 23, 2015 edited

From Lydia Pintscher in https://twitter.com/nightrose/status/558549091844886528

@danbri any issue to track progress on http://schema.org  mapping to Wikidata? 
Maybe even get people to help out?

Update 2016-01-26 - since the original post there have been some improvements at both Wikidata and Schema.org:

  • Wikidata: mappings (exact, super/sub) from properties and (perhaps to a lesser extent in that the notion isn't so built-in) types to schema.org can be expressed within Wikidata.
  • Wikidata now has a SPARQL endpoint at https://query.wikidata.org which is the most natural way of retrieving data; other explorations such as JSON dumps below are less important now.
  • Schema.org has updated its extension mechanism and is encouraging both hosted and external extensions.
  • D3-compatible RDFS JSON-LD is published from schema.org and can be used for visualization; this would also be a good model for getting an overview of Wikidata. See http://bl.ocks.org/danbri/1c121ea8bd2189cf411c for example visualization.
  • Various notes towards using Wikidata as an extension language for Schema.org are explored towards the end of this issue, as are SPARQL queries for extracting Wikidata's structure and property metadata for use in mappings.

# Nearby

@danbri danbri self-assigned this Jan 23, 2015
@danbri danbri added this to the 2015 Q1 milestone Jan 23, 2015
@danbri

Notes from IRC,

@lydiapintscher

Here is how mapping can be done on the Wikidata side for example: https://www.wikidata.org/wiki/Property:P31

The JSON dumps are the best dumps.

@elf-pavlik

Happy to help here a little! I had a chance to meet a few people from the Wikidata crew during 31C3 and remember that serving Turtle also needs some fixing... but it already uses schema.org quite a lot!

$ curl http://www.wikidata.org/entity/Q80 -iL -H "Accept: text/turtle"
@danbri

I went looking for the code that generates this. For those without turtle, an excerpt from running

curl http://www.wikidata.org/entity/Q42 -iL -H "Accept: text/turtle"

(full response is at https://gist.github.com/danbri/66616096d42e595376f6 )

[update]Hmm actually you can get it all in the browser without using content negotiation, just via suffixes:

( edit! I have moved a big chunk of text to https://gist.github.com/danbri/181ff7763f479c397e10 - apologies to those who got accidental notifications due to the '@' symbol.)

This is great but also unfortunately "the easy part" in that these are fixed built-in properties that each Wikidata entry will always carry.

Looking around for relevant source code,

It would be interesting to see how addEntityMetaData might be amended to exploit equivalentProperty information in Wikidata, as @lydiapintscher mentioned re https://www.wikidata.org/wiki/Property:P31

@ppKrauss

I agree, "Schema.org should have mappings to Wikidata terms where possible". How do we vote? Or how can we collaborate and/or check work in progress? Is there a link about the work on this issue?

@elf-pavlik

@danbri please remember to fence code snippets with three backticks, which can also include a clue for syntax highlighting

```ttl
  code goes here @bg @dr @mr
  @prefix data: http://www.wikidata.org/wiki/Special:EntityData/ .
  @prefix schema: http://schema.org/ .
  no mentions using @foo

```

also see code tab in Examples of github markdown https://guides.github.com/features/mastering-markdown/#examples

@elf-pavlik

@ppKrauss I think people would appreciate more machine readable mappings using owl:equivalentProperty etc.
e.g. https://github.com/schemaorg/schemaorg/blob/d370e33a97654746e696973c7966b84b501a59dc/data/schema.rdfa#L5706

IMO we could consider everything from subset of OWL used by RDFa Vocabulary Entailment
http://www.w3.org/TR/rdfa-syntax/#s_vocab_expansion

@ppKrauss

@elf-pavlik thanks (!), so the issue now is only to add something like
<link property="owl:equivalentProperty" href="http://WikiDataURL"/>
in each rdf:Property and each rdfs:Class ... is that right?

New suggestion: we could collaborate through an online interface or (initially) through a spreadsheet (e.g. Excel) on GitHub, with the columns wikidataID and Property, or wikidataID and Class.

@lydiapintscher

Why not add it directly in Wikidata?

@ppKrauss

@lydiapintscher , perhaps I am not understanding your point, sorry... The objective in this issue is to map the Schema.org definitions onto the Wikidata.org concept definitions, not the inverse.

@lydiapintscher

Both should happen, no? ;-)

@ppKrauss

@lydiapintscher , I think it is a matter of scope. You can imagine Wikidata as an (external and closed) dictionary, like Webster, not as an open project like Wikipedia.

@lydiapintscher

Wikidata is just as open as Wikipedia.

@nemobis
@elf-pavlik

@lydiapintscher once schema.org URIs have mappings to wikidata URIs added, do you see a way to add them to wikidata programmatically? IMO it doesn't make sense to do it manually via the web UI... maybe the wikidata team could just import them from schema.rdfa?

BTW I'll stay most of march ~Berlin and could meet IRL with you and anyone else from wikidata interested in this issue... Whenever in Berlin I go anyways to #OKLab / CodeForBerlin on every monday evening at Wikimedia HQ 😄 (we can discuss details over pm - just see my gh profile)

@ppKrauss

I am trying (with bad English) to consolidate this issue into a draft proposal; can you help?

A next step will be to create a Readme.md so that everybody can edit this text, perhaps with the #352 mechanism, and (phase1) implement some examples "by hand" in schema.rdfa.


Foundations collected from comments posted in this discussion:

  1. @danbri and Lydia Pintscher summary, "schema.org mapping to Wikidata".

  2. Technical suggestion: "schema.org property marked as equiv to another: schema:description ", @danbri.

  3. @danbri and @elf-pavlik looking for some automation ... or "how addEntityMetaData might be amended to exploit equivalentProperty information in Wikidata".

  4. ...

  5. @elf-pavlik suggestion to add the tag <link property="owl:equivalentProperty" href="http://WikiDataURL"/> to each rdfs:Class and each rdf:Property resource definition.
    The equivalentProperty is the same as shown in the Property:P31 example of @lydiapintscher.

  6. Proposal of @ppkrauss to start at Schema.org with human work and no automation (for testing and getting started).

  7. Suggestion of @lydiapintscher to think also about the Wikidata mapping to Schema.org...

PROPOSAL OF THE ISSUE #280

Proposal to enhance the schema.rdfa definition descriptors (rdfs:comment) and semantics by mapping each vocabulary item to a Wikidata item.

A sibling project at Wikidata will be the Wikidata.org-to-Schema.org mapping.

PART 1 - SchemaOrg mapping to Wikidata

Actions: add <link property="{$OWL}" href="{$WikiDataURL}"/> with the correct $WikiDataURL.

  • At each rdfs:Class add the <link> tag with $OWL="owl:equivalentClass" or, when not possible, use $OWL="rdfs:subClassOf".

  • At each rdf:Property add the <link> tag with $OWL="owl:equivalentProperty" or, when not possible, use $OWL="rdfs:subPropertyOf".
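
A minimal Python sketch of the action above, not part of the schema.org tooling. The helper name `wikidata_link` and the `http://www.wikidata.org/entity/` URL prefix are assumptions made for illustration; the class/property test relies on the schema.org naming convention (types start with an upper-case letter, properties with a lower-case one).

```python
from html import escape

# Assumed URL form for $WikiDataURL; the proposal itself leaves it open.
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def wikidata_link(term: str, wikidata_id: str, exact: bool = True) -> str:
    """Return the <link> tag for one schema.org term (hypothetical helper)."""
    is_class = term[:1].isupper()  # schema.org types are capitalized
    if is_class:
        owl = "owl:equivalentClass" if exact else "rdfs:subClassOf"
    else:
        owl = "owl:equivalentProperty" if exact else "rdfs:subPropertyOf"
    href = escape(WIKIDATA_ENTITY + wikidata_id, quote=True)
    return f'<link property="{owl}" href="{href}"/>'
```

For example, `wikidata_link("Organization", "Q43229")` yields an `owl:equivalentClass` link, while `wikidata_link("author", "P50", exact=False)` falls back to `rdfs:subPropertyOf`.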

Actions in the testing phase: do some with no automation. Example: start with the classes Person and Organization, and their properties.

Examples


PART 2 - Wikidata mapping to SchemaOrg

... under construction... see similar mappings at schema.rdfs.org/mappings.html... Wikidata also has a lot of initiatives mapping Wikidata to external vocabularies (e.g. there is a map from Wikidata to the BNCF Thesaurus)...

@ppKrauss

@lydiapintscher , Sorry again... I did not see that there is also a proposal for a "sibling project at Wikidata" (!)... Can you please check whether my "draft of this proposal" text is now on the rails? I am trying to "translate" and consolidate all comments into one document... so that everyone starts with the same scope, objective, etc.

@ppKrauss

@danbri , @elf-pavlik , and others, I do not understand whether there is a "formal procedure for creating proposals" here...

Can you please check if my "draft of this proposal" text is now on the rails? I need your help to "translate" and consolidate it.


About automation, I still do not understand well: do you want to automate?
My opinion: I think we can start with non-automated procedures, which will be useful for checking automated ones if they are introduced later... or for checking the "size" of the non-automated task (~1000 items!). I think that a reliable mapping needs human control.

@elf-pavlik

@ppKrauss thanks for trying to summarize this thread into a proposal!

http://schema.org/Organization is owl:equivalentProperty to Q43229

please don't confuse owl:equivalentClass with owl:equivalentProperty

if you look at schema.rdf we need accordingly

  • typeof="rdfs:Class" needs owl:equivalentClass or rdfs:subClassOf
  • typeof="rdf:Property" needs owl:equivalentProperty or rdfs:subPropertyOf

for the automation, once we map one way schema.org -> wikidata (however we manage to do it) then we can automate importing most of that mapping into wikidata so no one needs to click and copy&paste...

Last but not least, schema.org only recently started using GitHub and also seems to be going through various other processes; I would encourage you to stay patient and give people time to reply 😄

@danbri
@ppKrauss

@elf-pavlik thanks (!), I edited with your correction (and am now also copying it to my issue280 "ahead of work" :-)


@danbri OK, I sent it to this Google Doc and updated my #352 with the tool that generates the spreadsheet.


@elf-pavlik and @danbri , no urgency (!). As a novice here, I am experimenting with/testing the collaboration possibilities and studying schemaOrg as a project... Now that I have a better "schema.org big picture", I see good work (!) by the moderators and a vibrant community. My only help/clue about "better GitHub use" is at #352, and it is perhaps still a little messy.

Returning to the spreadsheet: there are ~1500 items (!)... A good starting point is the classes Person and Organization; the "vCard semantics" are the most used on the Web,

http://webdatacommons.org/structureddata/index.html#toc2

so I am starting to work with them (Person and Organization)... Is that OK, a good starting point?

@danbri
@jimkont

Wikidata provides RDF dumps here: http://tools.wmflabs.org/wikidata-exports/rdf/exports/20150126/

It is easy to get the classes from the wikidata-taxonomy dump, but it needs to be joined with the wikidata-terms dump to get the labels. For properties you can use the wikidata-properties dump.

If you want something more fine-grained you can try the WKDT toolkit
https://github.com/Wikidata/Wikidata-Toolkit

Or create a DBpedia extractor, we have experimental support for wikidata in this branch:
https://github.com/alismayilov/extraction-framework/tree/wikidataAllCommits

RDF dumps can be loaded directly into a SPARQL endpoint, or easily manipulated in CLI/code and loaded into any store.

@ppKrauss ppKrauss added a commit that referenced this issue Feb 24, 2015
@ppKrauss ppKrauss prop of #280, phase1 is ok! 2a19907
@ppKrauss

OK, phase1 completed! In this phase we can only use "by hand" procedures... My basic work was:

  • develop a basic tool (with 10-year-old PHP+DOM technology) to transform RDFa to CSV and to get the RDFa back from the updated CSV.
  • the GoogleDoc with spreadsheet. I edited only a dozen items; see the class Person (search by Q215627) and 3 items of Organization (line 85).
  • after the Google export, my tool generates a new version of schema.rdfa.htm
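
To make the RDFa-to-CSV step concrete, here is a minimal Python sketch. It is not the PHP+DOM tool described above, and it covers only the export direction. It assumes each definition is a `div` carrying `typeof` and `resource` attributes, with no nested `div` inside, and that mappings appear as `link` or `a` elements with `property` and `href`. The sample markup and the names `DefinitionLinks` and `rdfa_to_csv` are illustrative assumptions.

```python
import csv
import io
from html.parser import HTMLParser


class DefinitionLinks(HTMLParser):
    """Collect (resource, typeof, property, href) rows from RDFa definitions."""

    def __init__(self):
        super().__init__()
        self.current = None  # (resource, typeof) of the enclosing definition div
        self.rows = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "div" and "resource" in a and "typeof" in a:
            self.current = (a["resource"], a["typeof"])
        elif tag in ("link", "a") and self.current and "property" in a and "href" in a:
            self.rows.append((*self.current, a["property"], a["href"]))

    def handle_endtag(self, tag):
        if tag == "div":
            self.current = None  # assumes definition divs contain no nested div


def rdfa_to_csv(html: str) -> str:
    """Return one CSV row per typed link found inside a definition div."""
    parser = DefinitionLinks()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["resource", "typeof", "property", "href"])
    writer.writerows(parser.rows)
    return out.getvalue()


# Hand-written sample in the assumed layout, for illustration only.
SAMPLE = """<div typeof="rdfs:Class" resource="http://schema.org/Person">
  <span property="rdfs:label">Person</span>
  <span>Subclass of: <a property="rdfs:subClassOf" href="http://schema.org/Thing">Thing</a></span>
  <link property="owl:equivalentClass" href="http://xmlns.com/foaf/0.1/Person"/>
</div>"""
```

Calling `rdfa_to_csv(SAMPLE)` gives a header line plus two rows, one for the `<a>` inside the span template and one for the `<link>` tag, so both styles land in the same spreadsheet columns.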

... for more details (while the corresponding fork is pending) here.


I finished my first test with the report/edit/rewrite "by hand" process... and some new (minor) problems became evident, a kind of normalization demand:

  1. HTML-source-code normalization problems: reported as #360 and #359.

  2. "<link> vs <span><a>" also seems to be a normalization problem. My suggestion is to show all the links transparently to the crowd, so format each link with the span template.

About item 2, countings:

  • all ~670 "subClassOf" were in the template <span>Subclass of: ...</span>
  • all ~60 "equivalentClass" , "subPropertyOf" and "equivalentProperty" were tagged with <link>

Question (perhaps for @elf-pavlik, but no urgency!): can I adopt the span templates instead of the simple link tag, and also convert all the residual <link ..../> to span?

@ppKrauss

Starting phase2: let's discuss and check the automation possibilities!

( while anybody can enhance the volume of Wikidata links at the GoogleDoc with spreadsheet of the phase1).


The first step here is to discuss the current reality, which is summarized by the "schema_org_rdfa profile" (see #361).

gozer release profile (countings):

  • number of div tags (nDivs): 1521
  • number of definitions by classes+properties (nDefs): 1478
  • number of rdfs-classes (nClass): 620
  • number of rdf-Properties (nProp): 858
  • number of schema-supersededBy (nSupBy): 33
  • number of duplicated rdfs-labels (nDup): 2
  • number of defs with link tag (nLinks): 105
  • total number of link tags over defs (nLinksTot): 112
  • tag link countings:
    • links with property='owl:equivalentClass': 10
    • links with property='rdfs:subPropertyOf': 53
    • links with property='http://schema.org/inverseOf': 16
    • links with property='http://schema.org/supersededBy': 1
    • links with property='owl:equivalentProperty': 8
    • links with property='dc:source': 24
  • tag span a countings:

AUTOMATION OPPORTUNITIES:

  1. Propagating as a semantic subset: this is valid for specific items, e.g. "addressLocality is a semantic subset of PostalAddress", where we can propagate the WikidataID (e.g. as rdfs:subPropertyOf), but not for broader items such as Thing. There are 663 (!), so we can expect some automation here... The first step is to indicate (we can add a column in the spreadsheet) which items are "broader" (and so cannot be used as semantic super-classes for a WikidataID).

    1.1. Inheriting semantics: every Property inherits the semantics of its parent Class, so it is also a kind of "semantic subset" (and we again need to exclude the "broader cases")... Are there other indirect situations in the graph? We must mark all the cases selected for exclusion, so that they can be excluded (later) from the spreadsheet.

  2. Getting the WikidataID from an external-equivalent item: I do not see many; there are only ~70 links relating semantic definitions to external vocabularies, see nLinks and the countings for 'owl:equivalentClass', 'rdfs:subPropertyOf' and 'owl:equivalentProperty'. Perhaps 'dc:source' too, but it adds only 24 more.
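
As a quick cross-check of the figures quoted in the profile, the short Python tally below uses only the numbers listed in this comment. The variable names are illustrative, and grouping the three properties as "external" follows the wording of item 2 rather than any formal definition.

```python
# Figures copied from the "gozer release profile" above.
profile = {"nDefs": 1478, "nClass": 620, "nProp": 858, "nLinksTot": 112}

link_counts = {
    "owl:equivalentClass": 10,
    "rdfs:subPropertyOf": 53,
    "http://schema.org/inverseOf": 16,
    "http://schema.org/supersededBy": 1,
    "owl:equivalentProperty": 8,
    "dc:source": 24,
}

# Properties that item 2 treats as pointing at external semantic definitions.
external = ("owl:equivalentClass", "rdfs:subPropertyOf", "owl:equivalentProperty")

classes_plus_props = profile["nClass"] + profile["nProp"]    # should equal nDefs
total_links = sum(link_counts.values())                      # should equal nLinksTot
external_links = sum(link_counts[p] for p in external)       # the "~70" in item 2
with_dc_source = external_links + link_counts["dc:source"]   # upper bound with dc:source
```

The per-property link counts add up to the reported total of 112, classes plus properties add up to the 1478 definitions, and the three "external" properties together account for 71 links (95 if `dc:source` is included).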

@danbri danbri modified the milestone: 2015 Q2, 2015 Q1 Apr 17, 2015
@boanuge

The mapping from Schema.org types to Wikidata conceptual items seems very interesting.
How is it going? I can see there has been no comment for a while.
If applicable, I would like to join the effort of mapping between these two. :)

ps. I found it hard to get the meaning of Wikidata class concepts (schema-level, not instance-level) as they use Qxxxxxx (not intuitive) terms for their conceptual items.
Is there any tip to figure out what Qxxxxxxes mean in usual words?

@ppKrauss

Hello @boanuge, welcome to this initiative! It is not abandoned... Would you like to collaborate here, with the GoogleDoc with spreadsheet of phase1?

Perhaps we (you and I at this moment) need to show "more and good results" to restart this proposal... So you can also help here by extending the spreadsheet... Then, later, when we have a "critical mass" of results, we will return here.

About your PS: no, Qxxxxxx is a Wikidata project decision; an opaque identifier has some advantages.

@JanZerebecki

For a human the label and description should define the meaning in words so far as it needs to be disambiguated from other concepts. Just look the Qxxxxxx up on wikidata.org or use its API to get the label and description in your favorite language.
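
A rough Python sketch of the API route mentioned above, assuming the `wbgetentities` action of the Wikidata API. The helper names are hypothetical, the network call is left out so the sketch stays offline, and the sample response is hand-written in the API's response shape rather than fetched.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def label_request_url(ids, lang="en"):
    """Build a wbgetentities URL asking for labels and descriptions only."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),
        "props": "labels|descriptions",
        "languages": lang,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def labels_from_response(data, lang="en"):
    """Map each entity ID to a (label, description) pair; missing values are None."""
    result = {}
    for entity_id, entity in data.get("entities", {}).items():
        label = entity.get("labels", {}).get(lang, {}).get("value")
        description = entity.get("descriptions", {}).get(lang, {}).get("value")
        result[entity_id] = (label, description)
    return result


# Hand-written sample; the description text is a placeholder.
SAMPLE_RESPONSE = {
    "entities": {
        "Q42": {
            "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
            "descriptions": {"en": {"language": "en", "value": "placeholder description"}},
        }
    }
}
```

Several IDs can be passed at once (`label_request_url(["Q42", "Q43229"])`), which helps when resolving a whole spreadsheet column of Qxxxxxx values rather than looking them up one by one.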

@boanuge

@ppKrauss Thank you very much. I will see what I can do. :)
@JanZerebecki Thank you for the comment.
I had hoped there would be a nice one-page view for each Qxxxxxx term with label and description, such as schema.rdfa, instead of looking them up one by one (there are too many Qxxxxxx to go through). :)
Any comments about how Wikidata generates their items are appreciated.

@JanZerebecki

There are too many items (item = Qxxxxxx) to list them all (currently more than 13 million). These are edited manually and in automated ways, see https://www.wikidata.org/wiki/Wikidata:Introduction for more information. Note that there are also properties. There is a list of all properties: https://www.wikidata.org/wiki/Wikidata:List_of_properties/all .

Example: https://www.wikidata.org/wiki/Q25169#P50 tells us that the author (property P50) of "The Hitchhiker's Guide to the Galaxy" (item Q25169) is Douglas Adams (item Q42).
https://www.wikidata.org/wiki/Property:P50#P1629 tells us that the property author (P50) is for the subject (P1629) item author (Q482980).

Maybe it is more useful to map to Wikidata properties instead of Wikidata items. https://schema.org/author would map to https://www.wikidata.org/wiki/Property:P50 .

Note that people already use Wikidata.org itself to do this mapping, as is done on https://www.wikidata.org/wiki/Property:P18#P1628 which says the Wikidata property image (P18) is equivalent to http://schema.org/image . These could be exported and added to schema.org, which would ensure that the mapping is actually symmetric.

@danbri

Yes the idea is purely to map the descriptive vocabulary (hundreds or low thousands of mainly types/properties), not millions of items.

@thadguidry

@danbri then update the issue title.. instead of wikidata terms ... wikidata properties

@ppKrauss

@JanZerebecki, as stated by Dan, "the idea is purely to map the descriptive vocabulary": it is a SchemaOrg-to-Wikidata map, and SchemaOrg has at most ~1500 items, see the countings above...

The main objective is to complement the poor/imprecise descriptions (rdfs:comment) of SchemaOrg.

(also @thadguidry) About properties like P50, in my opinion, they are like "internal database descriptors" of Wikidata, while Qxxxxxx are the entries for Wikipedia concepts.
So the "author" concept is not P50, it is Q482980... The Qxxxxxx concepts are more stable and complete.

PS: the properties can generate cyclic references for SchemaOrg.

@westurner

https://en.wikipedia.org/wiki/Ontology_alignment

So there are entity (class, property) resolutions (and disambiguation trees (w/ information gain))?

@ppKrauss ppKrauss referenced this issue in JATS4R/JATS4R-Participant-Hub Aug 5, 2015
Open

There are a "semantic mapping" initiative here? #97

@twamarc

Mapping to wiki is a plus, among others. The health extension is now experimenting with mapping concepts to defined concepts in healthcare standards and terminologies like SNOMED CT, but also to RxNorm, LOINC, and ICD.

@westurner
@danbri

Quick update to make sure everyone is aware that Wikidata has a SPARQL endpoint now; linked from https://www.wikidata.org/wiki/Wikidata:Data_access#SPARQL_endpoints

@ppKrauss ppKrauss referenced this issue in OpenGovLD/skos Dec 29, 2015
Open

Relate to Wikidata #10

@danbri

I've been looking into how wikidata could look at an external schema.org extension. Perhaps something like this (don't worry about the big header, eventually it would be hidden behind a simple URL). It would be good if the corresponding triples were as close as possible to those in the Wikidata SPARQL endpoint.

<script type="application/ld+json">
{
"@context": {
  "@vocab": "http://schema.org/",
   "wd_lnbIdentifier": {"@id": "https://www.wikidata.org/entity/P1368" },
   "wd_countryOfCitizenship": {"@id": "https://www.wikidata.org/entity/P27" , "@type": "@id"},
   "wd_religion": {"@id": "https://www.wikidata.org/entity/P140", "@type": "@id"},
   "wd_nativeLanguage": {"@id": "https://www.wikidata.org/entity/P103", "@type": "@id"}
 },
  "@type": "Person",
  "@id": "https://www.wikidata.org/entity/Q42",
  "name": "Douglas Adams",
  "wd_lnbIdentifier": "000057405",
  "wd_countryOfCitizenship":
    {
      "@type": "Country",
      "@id": "https://www.wikidata.org/entity/Q145",
      "name": "United Kingdom"
    },
  "wd_religion": {
    "@id": "https://www.wikidata.org/entity/Q7066",
    "name": "atheism"
  },
  "wd_nativeLanguage": {
     "@type": "Language",
     "@id": "https://www.wikidata.org/entity/Q7979",
     "name": "British English"
  }
 }

</script>
@danbri

@vrandezo and I have been exploring this some more.

For now, just a SPARQL query to try at query.wikidata.org

PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?property ?ptype ?label ?extsuper ?extsub ?extequiv
WHERE {?property a wikibase:Property; rdfs:label ?label; wikibase:propertyType ?ptype .
OPTIONAL { ?property wdt:P2235 ?extsuper . }
OPTIONAL { ?property wdt:P2236 ?extsub . }
OPTIONAL { ?property wdt:P1628 ?extequiv . }
FILTER( REGEX(STR(?extequiv), "schema.org") ||
  REGEX(STR(?extsub), "schema.org") ||
  REGEX(STR(?extsuper), "schema.org") )
FILTER(LANG(?label) = "en")}

... this shows that Wikidata itself can be used as a registry of mappings to/from schema.org terms :)
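
For anyone scripting this rather than pasting into the query UI, here is a small Python sketch. It assumes the endpoint accepts `query` and `format=json` as GET parameters at `https://query.wikidata.org/sparql` and returns the W3C SPARQL JSON results layout. The helper names are hypothetical, the HTTP fetch is deliberately omitted, and the sample result is hand-written (it mirrors the P18 / schema.org image mapping mentioned earlier in the thread).

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"


def sparql_url(query):
    """Build a GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def rows_from_results(results):
    """Flatten SPARQL JSON results into plain dicts.

    Variables left unbound by an OPTIONAL clause are simply absent from a row.
    """
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


# Hand-written sample in the SPARQL JSON results layout.
SAMPLE_RESULTS = {
    "head": {"vars": ["property", "label", "extequiv"]},
    "results": {
        "bindings": [
            {
                "property": {"type": "uri", "value": "http://www.wikidata.org/entity/P18"},
                "label": {"type": "literal", "xml:lang": "en", "value": "image"},
                "extequiv": {"type": "uri", "value": "http://schema.org/image"},
            }
        ]
    },
}
```

Each row then carries only the mapping columns that were actually bound, so a caller can check `"extequiv" in row` to tell an exact mapping from a super/sub one.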

@Dataliberate
@danbri

Here is another one (thanks to Denny):

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# which properties are most commonly found on things that are 
# 'instance of' (P31) the 'Cat' type (Q146)?
SELECT ?prop (count(?prop) as ?count) WHERE {
  ?i  wdt:P31 wd:Q146 .
  ?i ?prop ?val .
  FILTER(STRSTARTS(STR(?prop), "http://www.wikidata.org/prop/direct/"))
} group by ?prop order by desc(?count)

# TODO: 
# - figure out how to get the rdfs:label of these 
# - figure out how to handle v common types like human (Q5), can we sample e.g. 1000 items only?

... any help with the last parts gratefully received :)
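
One possible direction for the two TODO items, sketched as a Python helper that builds the query text; it has not been run against the live endpoint. It assumes that each property entity is linked to its `wdt:` predicate through `wikibase:directClaim`, which gives access to the property's `rdfs:label`, and it uses a subquery with `LIMIT` to cap the number of instances. The cap picks an arbitrary subset, not a random sample. The function and template names are hypothetical.

```python
QUERY_TEMPLATE = """\
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?prop ?label (COUNT(*) AS ?count) WHERE {
  # Cap the number of instances first, so very common types stay tractable.
  { SELECT ?i WHERE { ?i wdt:P31 wd:%(qid)s . } LIMIT %(n)d }
  ?i ?prop ?val .
  # Joining on directClaim keeps only wdt: predicates and exposes their labels.
  ?p wikibase:directClaim ?prop ;
     rdfs:label ?label .
  FILTER(LANG(?label) = "en")
} GROUP BY ?prop ?label ORDER BY DESC(?count)
"""


def common_properties_query(class_qid, sample_size=1000):
    """Return SPARQL text listing labelled properties used on a capped set of instances."""
    if not (class_qid[:1] == "Q" and class_qid[1:].isdigit()):
        raise ValueError("expected a Wikidata item ID such as 'Q146'")
    if sample_size < 1:
        raise ValueError("sample_size must be positive")
    return QUERY_TEMPLATE % {"qid": class_qid, "n": sample_size}
```

`common_properties_query("Q146")` reproduces the Cat case with labels added, and `common_properties_query("Q5", 1000)` is the capped variant for human.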

@danbri

Here is the complementary query, which finds most common properties whose value is something 'instance of' 'Cat'.

The query is written more compactly here, and has the same issues/problems as noted above:

SELECT ?prop (count(?prop) as ?count) WHERE {

  # some thing with some property that is some item, where that ...
  ?x ?prop ?i . 

  # item instanceOf Cat.
  ?i  <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q146> . 

  FILTER(STRSTARTS(STR(?prop), "http://www.wikidata.org/prop/direct/"))
} group by ?prop order by desc(?count)

This corresponds loosely to the notion of properties whose http://schema.org/rangeIncludes is the type "Cat":

To compare, here are top results for the earlier query, i.e. properties
whose domainIncludes the type "Cat". In other words, properties commonly
found on items that are cats. Here is the earlier query in more compact form:

SELECT ?prop (count(?prop) as ?count) WHERE {

  ?i ?prop ?x . # some item has some property whose value x is an item, where that ...

  # item instanceOf Cat.
  ?i  <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q146> . 

  FILTER(STRSTARTS(STR(?prop), "http://www.wikidata.org/prop/direct/"))
} group by ?prop order by desc(?count)

Currently this gives 45 results, the most common properties (from 68 cats in wikidata) being:

Both Wikidata and schema.org vocabularies have a relatively loose, flexible and evolving association between types and properties; Wikidata even more so. While schema.org lists a current set of incoming and outgoing properties on each type, often adjusting these over time, Wikidata does not formally do this at all. There are currently some non-machine-readable notes on the relevant talk pages but nothing exposed via RDF/SPARQL. Consequently we need to mine this information from actual descriptions (such as the 68 cat descriptions in Wikidata) to get a sense of the emergent structure. This process also gives a feel for the "long tail" of property definitions that exists in Wikidata and which we can now re-use within schema.org descriptions across the Web.

@danbri

We can use this to explore the data. For example, we see that one of the most common ways in which Wikidata references the Cat type is using property P161, 'cast member'. Who are these famous acting cats?

SELECT * WHERE {
 ?x <http://www.wikidata.org/prop/direct/P161> ?i ; 
    # ?x with a 'cast member' that is some thing ?i...
    <http://www.w3.org/2000/01/rdf-schema#label> ?label .                                       

    # where ?i is 'instance of' the 'Cat' type:
 ?i <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q146>;
    <http://www.w3.org/2000/01/rdf-schema#label> ?catname ;
    FILTER(LANG(?label) = "en")
    FILTER(LANG(?catname) = "en")
}

From this we learn, amongst other things, of a famous cat actor, Orangey (http://www.wikidata.org/entity/Q677525) that starred in several works including versions of Breakfast at Tiffany's, The Diary of Anne Frank, Village of the Giants. The creature has an IMDB page, if you are curious: http://www.imdb.com/name/nm1248838/ . If you scan that page for [embedded schema.org](https://developers.google.com/structured-data/testing-tool/?url=http://www.imdb.com/name/nm1248838/) you can find out more about Orangey expressed as schema.org, including an image, a jobTitle of "Actor", and a description ("Orangey the Cat is the only feline double-winner of the Patsy Award, the animal kingdom's equivalent of the Oscar. "...).

For completeness, let's look at outgoing properties of Cat too. Let's see well known cat ownership relationships. Try this in http://query.wikidata.org:

SELECT * WHERE {
  # ?c 'owned by' ?o, where ?c is a Cat:
  ?c <http://www.wikidata.org/prop/direct/P127> ?o .
  ?c <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q146> .

  ?c <http://www.w3.org/2000/01/rdf-schema#label> ?catname .
  ?o <http://www.w3.org/2000/01/rdf-schema#label> ?ownername.
  FILTER(LANG(?ownername) = "en")
  FILTER(LANG(?catname) = "en")
}

... you'll find Socks and Bill Clinton; India, owned by George W. and Laura Bush; Humphrey owned by the Cabinet Office etc.

Having got this far, there are a few things yet to investigate:

  • for each Wikidata property, which ones take Things vs Literal text as their values. This can be SPARQL'd and used to auto-generate context information for use in JSON-LD.
  • we should export the type hierarchy and commonly associated properties using improvements to these queries.
  • set up a demo site with batch-converted JSON-LD context file declaring short human readable names for each property. If we make dated snapshots too this gives some measure of protection against semantic drift within Wikidata, since instance data records could cite a particular snapshot instead of a volatile general URL.
  • More examples: what would cat ownership, cast membership etc properties look like in JSON-LD (with or without use of schema.org as a base)?
  • Do all Wikidata types have natural supertypes within schema.org? Should they? Are Cats Person etc etc?
  • Performance: this worked for Cat but currently fails on Person due to inefficient SPARQL. Help needed there.
  • Can we refine http://bl.ocks.org/danbri/1c121ea8bd2189cf411c and use it for wikidata viz too, to get a better common overview of both hierarchies?
  • Show this with W3C CSVW tabular data mappings e.g. see https://github.com/w3c/csvw/tree/gh-pages/experiments/quick-python-demo /cc @gkellogg
  • ... more stuff I've forgotten.
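
A minimal Python sketch of the first and third bullets: generating a JSON-LD context from property metadata. It follows the `wd_` prefix and `https://www.wikidata.org/entity/` IRIs used in the hand-written example earlier, and treats a `wikibase:propertyType` of `WikibaseItem` as "takes Things". The helper names and the input labels are assumptions for illustration.

```python
import re

# propertyType IRI for item-valued properties in the Wikidata RDF export.
ITEM_TYPE = "http://wikiba.se/ontology#WikibaseItem"


def term_name(label):
    """Turn an English label into a short camelCase term with a wd_ prefix."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError("label has no usable characters")
    camel = words[0].lower() + "".join(w.capitalize() for w in words[1:])
    return "wd_" + camel


def build_context(properties):
    """properties: iterable of (property_id, english_label, property_type_iri)."""
    context = {"@vocab": "http://schema.org/"}
    for pid, label, ptype in properties:
        entry = {"@id": "https://www.wikidata.org/entity/" + pid}
        if ptype == ITEM_TYPE:
            entry["@type"] = "@id"  # values are things, not literal text
        context[term_name(label)] = entry
    return {"@context": context}
```

Feeding it the rows returned by a property-metadata query (ID, English label, `wikibase:propertyType`) would give a context file that could be dumped with `json.dumps` and published as a dated snapshot.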
@gkellogg

@danbri, interesting direction to go:

  1. Figure out an RDFS schema for WikiData based on these queries.
  2. Given that, a tool such as the [Ruby JSON-LD Context Generator](https://github.com/ruby-rdf/json-ld/blob/develop/script/gen_context) can be used to construct a JSON-LD context based on the range of properties described within that schema (see, for example, the D3 compatible schema.org context constructed from a version of the schema.org RDFa definition, which also includes the full vocabulary definition in JSON-LD).

As you suggest, vocabulary range information can also be used to create CSVW datatype definitions for mapping CSV tables to JSON or RDF with appropriate datatype fidelity.

Note that what's most useful for both CSVW and the JSON-LD context is the property ranges, but inferring a WikiData RDFS definition from SPARQL queries seems pretty useful.

@ppKrauss

Thanks @danbri and @gkellogg , good work and results restarting this in 2016!

I imagine that you are looking for a SPARQL algorithm that can automatically recognize each Wikidata item equivalent to a SchemaOrg item... Well, we can start with a sample of consensual item pairs, to check and/or discuss the behaviour of the proposed algorithms. Examples:

  1. Q82799 and schema/name: equivalent property, is ok?

  2. Q211198 and schema/audience: equivalent property, is ok?

  3. (Q482980 or P50) and schema/author: what to use?

  4. Q43229 and schema/Organization: equivalent class, is ok?

  5. ... more handmade examples here ...

Are these correspondences (1-4) consensual? Is each really a semantic equivalence relationship? How will the algorithm obtain these pairs?

Perhaps we can adapt this basic map algorithm as a first tool for query.wikidata.


PS: Wikidata "could look at an external schema.org extension" as shown here and here, but we can also conclude that Wikidata is a good replacement for SchemaOrg :-) I will keep using Schema.

@nemobis
@ppKrauss

Hi @nemobis, thanks, can you express your idea as a query at Wikidata? A query that demonstrates to us that an external concept (e.g. SchemaOrg's author) is equivalent to a Wikidata item (Q482980 in this example)... The problem is how to define each external concept in a generic SPARQL query.

The equivalence operator has two meanings here, in this issue (#280):

  • express the Wikidata links in the SchemaOrg formal specification. See elf-pavlik's comments (owl:equivalentClass or rdfs:subClassOf for classes and owl:equivalentProperty or rdfs:subPropertyOf for properties);

  • express queries in SPARQL, algorithms for the "concept matching automation". See danbri's query (extequiv or extsuper or extsub), and your suggestion to use P1709.

My use of the term "consensual" is not about the equivalence operator; it is about our "human understanding" and a community agreement (consensus) on the understanding of each of us (me, you and anyone else discussing here). Do you agree with my concept matching in the pairs listed at 1-4?


PS: the more Wikidata items there are in the handmade sample set, the greater the difficulty of checking or reaching consensus... So we need good consensus before closing the sample set. An approved sample set is fundamental for testing and discussing any kind of algorithm here.


NOTE about SPARQL algorithm aims

Oops, it is important to distinguish some types of algorithms (approaches)...

  • Audit algorithms (simple reports). Audit registered equivalences. The SPARQL schemaOrg-restriction tested here, FILTER( REGEX(STR(?extequiv), "schema.org") || REGEX(STR(?extsub), "schema.org") || REGEX(STR(?extsuper), "schema.org") ), makes use of Wikidata's existing associations. We can check and/or audit previous human work registering pairs... As @danbri commented, "Wikidata itself can be used as a registry of mappings to/from schema.org terms".

  • Terminological algorithms. List potential candidates from direct relationships of SchemaOrg registered concepts, and match new SchemaOrg candidates by the Wikidata English labels. It is a heuristic based on coincidence of terminology.

  • Complex inference algorithms. List candidates from other relationships: do some "magic"!
    This solves the previously mentioned "problem of defining each external concept in a generic SPARQL query".
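
The "terminological" approach in the second bullet can be sketched in a few lines of Python. This is only an exact-match heuristic under stated assumptions: schema.org names are split on camelCase boundaries, Wikidata English labels are split on non-alphanumerics, and two terms match when the resulting word sequences are identical. The function names are hypothetical, and the labels in the usage note are the ones discussed in this thread.

```python
import re


def words(term):
    """Split camelCase names and free-text labels into a lower-case word tuple."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", term)
    return tuple(re.findall(r"[a-z0-9]+", spaced.lower()))


def label_candidates(schema_terms, wikidata_labels):
    """Return, per schema.org term, the Wikidata IDs whose English label matches.

    wikidata_labels maps a Wikidata ID (Q... or P...) to its English label.
    """
    by_words = {}
    for wikidata_id, label in wikidata_labels.items():
        by_words.setdefault(words(label), []).append(wikidata_id)
    return {term: by_words.get(words(term), []) for term in schema_terms}
```

With labels `{"P50": "author", "Q482980": "author", "Q43229": "organization"}`, the term `author` gets both P50 and Q482980 as candidates, which is exactly the "what to use?" question raised for pair 3 above, so a human still has to pick. A term such as `birthDate` would not match a label worded "date of birth", which shows the limits of pure label coincidence.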

@nemobis
@danbri

I've opened #1186 for the specific investigation into using Wikidata terms as a vocabulary decoupled from Wikidata-the-dataset, and will try to be a bit more focussed there. We should refocus this issue on the specific question of mappings.

@thadguidry
thadguidry commented Jun 17, 2016 edited

@danbri @lydiapintscher
I and others have started on this task of filling it all in within Wikidata itself (check back in 1-2 more weeks and then we can close this issue). I will make sure to maintain the mappings over the life of Schema.org :) You're welcome.

Schema.org Types will be mapped using Equivalent Class https://www.wikidata.org/wiki/Property:P1709
Schema.org Properties will be mapped using equivalent property https://www.wikidata.org/wiki/Property:P1628
When there's no equivalent but there is a superproperty available, then it will be mapped using external superproperty https://www.wikidata.org/wiki/Property:P2235

Here are a few where the work has already started:
https://www.wikidata.org/wiki/Q33999 <- shows http://schema.org/actor
https://www.wikidata.org/wiki/Q1656682 <- shows http://schema.org/Event
https://www.wikidata.org/wiki/Q11424 <- shows http://schema.org/Movie
https://www.wikidata.org/wiki/Q1983062 <- shows http://schema.org/Episode
https://www.wikidata.org/wiki/Q5398426 <- shows http://schema.org/TVSeries

@ppKrauss

@thadguidry
Good initiative to have "started on this task of filling it all in within Wikidata itself"!

Our first "hands-on mapping initiative" was "from SchemaOrg to Wikidata", in this spreadsheet (matching the title of this issue, "mapping to Wikidata terms where possible"). Now, better, your initiative, using @lydiapintscher's suggestion, is to map "from Wikidata to SchemaOrg". Let's go! ... But is there software (e.g. a Wikidata query with an RDFa report) to put the information from Wikidata back into schema.rdfa?


Using this spreadsheet to express/audit some relationships, we obtain (please check whether it makes sense):

For properties:


A kind of "external suberproperty" is necessary:

@westurner
@danbri

@westurner @thadguidry @ppKrauss - any chance one of you could figure out the SPARQL for query.wikidata.org that would extract these mappings? We could then add them into the schema.org workflow too...

@thadguidry

@danbri I would advise you to just ask Magnus or Stas on the Wikidata mailing list. They're the masters of that, and even I would probably have to ask them for that particular one. :)

@ppKrauss

@danbri and @thadguidry

Hi, I can test and try this at the weekend... What if we consolidate the query how-tos and discussions on this project's wiki page?

@thadguidry

@ppKrauss yeah, sure, that wiki page is good as a place to store the queries that we find and can explore with.

@thadguidry

Ran across an issue with "external class about this class/subject", which I didn't see as an available statement in Wikidata to assert to help with mappings. So I posed the question to the Property Creator group on the Wikidata mailing list. Awaiting a few replies.

So far so good with mapping the first 40% of Schema.org Types.

@ppKrauss

About querying Wikidata: the "wanted universe" is provided by a simple query, which perhaps works fine for a local Wikidata user (on the query.wikidata.org server, without timeout restrictions). It looks like:

SELECT * WHERE {
  ?x ?eqv ?s . 
  FILTER (?eqv = wdt:P1709 || ?eqv = wdt:P1628 || ?eqv = wdt:P2235) .
  FILTER (?s = schema:Person)
}

Instead of FILTER (?s = schema:Person), we need a kind of prefixed wildcard (imagine schema:*)... Using a regex, for example FILTER( REGEX(STR(?s), "schema.org") ), produces an error, "Query deadline is expired", even with a LIMIT 1 clause.


@danbri, do you know somebody at Wikidata who could run the generic query and send us a CSV.zip?

@thadguidry

@ppKrauss @danbri This query works for now, which is ... to find the 4 known properties that are being mapped and filled in now, and filter all of them against "schema.org"

@lydiapintscher no one seems to be answering our questions to WD team on your mailing list about Properties. :(

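# One UNION branch per mapping property; each branch binds a different variable.
SELECT * WHERE {
  { ?property wdt:P2235 ?extsuper. }                 # external superproperty
  UNION { ?property wdt:P2236 ?extsub. }             # external subproperty
  UNION { ?property wdt:P1628 ?extequiv. }           # equivalent property
  UNION { ?property wdt:P1709 ?_equivalent_class. }  # equivalent class
  # Keep only the rows whose bound value mentions schema.org.
  FILTER( REGEX(STR(?extequiv), "schema.org") ||
    REGEX(STR(?extsub), "schema.org") ||
    REGEX(STR(?extsuper), "schema.org") ||
    REGEX(STR(?_equivalent_class), "schema.org")
  )
}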
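
To feed the results into a workflow, a query like the one above could be generated and run from a script. The following is a minimal sketch, assuming the standard SPARQL 1.1 JSON results format and the https://query.wikidata.org/sparql endpoint. The function names, the `?reltype` labels and the User-Agent string are made up for this example, and CONTAINS is swapped in for REGEX as a plain substring test. Whether this variant stays within the endpoint's time limit is not verified here.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Wikidata properties named in this thread; the short relation names are ours.
MAPPING_PROPS = {
    "P1709": "equivclass",
    "P1628": "equivprop",
    "P2235": "super",
    "P2236": "sub",
}


def build_query(needle="schema.org"):
    """Build one UNION branch per mapping property, tagging each row with its relation."""
    branches = "\n  UNION ".join(
        f'{{ ?item wdt:{pid} ?ext . BIND("{name}" AS ?reltype) }}'
        for pid, name in MAPPING_PROPS.items()
    )
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?item ?reltype ?ext WHERE {\n  "
        + branches
        + f'\n  FILTER(CONTAINS(STR(?ext), "{needle}"))\n}}'
    )


def parse_bindings(payload):
    """Flatten SPARQL JSON results into (item, reltype, ext) tuples."""
    return [
        (b["item"]["value"], b["reltype"]["value"], b["ext"]["value"])
        for b in payload["results"]["bindings"]
    ]


def fetch(query):
    """Run the query against the endpoint (network access needed; not called here)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url, headers={"User-Agent": "schemaorg-mapping-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_bindings(json.load(response))


query = build_query()
print(query)
```

Calling `fetch(build_query())` would return the rows as tuples, which could then be written to CSV or compared against the schema.org term list.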
@ppKrauss

Hi @thadguidry, good solution!
Now we can check the real size of the work... I am organizing this on the wiki page about queries. Some preliminary results:

  • repeated Wikidata labels: 12 cases (8%), location (repeated 13 times), image (3), audience (2), author (2), brand (2), ..., Uniform Resource Locator (2).

  • counting relationship types:

reltype     n
equivclass  68
equivprop   66
sub         12
super       10

(total 156)
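
For anyone who wants to reproduce this kind of tally from a query export, here is a minimal sketch, assuming the rows are available as (label, reltype) pairs. The sample rows are invented for the example. The last lines only re-check the arithmetic of the figures reported above.

```python
from collections import Counter


def summarize(rows):
    """rows: iterable of (label, reltype) pairs.
    Returns (count per reltype, labels that occur more than once)."""
    rows = list(rows)
    by_type = Counter(reltype for _, reltype in rows)
    label_counts = Counter(label for label, _ in rows)
    repeated = {label: n for label, n in label_counts.items() if n > 1}
    return by_type, repeated


# Invented sample rows, for illustration only.
sample = [
    ("location", "equivprop"),
    ("location", "sub"),
    ("image", "equivprop"),
    ("Movie", "equivclass"),
]
by_type, repeated = summarize(sample)
print(dict(by_type), repeated)

# Consistency check on the reported figures: the four counts sum to the
# stated total, and 12 repeated labels out of that total is about 8%.
reported = {"equivclass": 68, "equivprop": 66, "sub": 12, "super": 10}
total = sum(reported.values())
share_repeated = round(100 * 12 / total)
print(total, share_repeated)
```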


PS: there is some ambiguity and confusion about which to prefer, "entity or property" (Wikidata Q or P). We need a "SchemaOrg-Wikidata relationship attribution GUIDE".

@thadguidry

@ppKrauss Yes, that's the issue/problem....what to properly do for all the other cases besides equivalent_class. And that is where even I am struggling. I feel that if Wikidata had an "external subclass" then it would make things a bit better, perhaps it has one, dunno, I have asked on the mailing list and I am still awaiting a reply.

As to the location issue...that's a bias error on my part that will be fixed.
I needed to explore and see how the relationships would be affected with those "external property" statements. (it is not good, so I need to clean those up eventually with "external subclass"). But I still can't find "external subclass" or something close in Wikidata. And we don't really want to go down the transitive property rabbit hole of "has part" https://www.wikidata.org/wiki/Property:P527 or even "has parts of the class" https://www.wikidata.org/wiki/Property:P2670

@ppKrauss ppKrauss referenced this issue in dbpedia/mappings-tracker Jun 24, 2016
Open

Help in the Wikidata-to-SchemaOrg initiative #86

@ppKrauss

Hi @thadguidry ,

  • About "guidelines" and our "equiv-rules", the problem with "other equivalent_class" is not very clear to me... Is this the mailing list thread you are still waiting on?

  • If the problem is a decision (or guidelines for best practices), let's adopt the "most popular" option, using an objective criterion: nw(x), the number of "watches" and "watchers who visited". Since nw(P527)>nw(P2670), use P527. See P527 info and P2670 info.

  • About has part and similar transitives, let's edit amphibious automobile and egg yolk to show (and discuss with Wikidata community) the problem in a real case.

@thadguidry
thadguidry commented Jun 24, 2016 edited

@ppKrauss It's the later thread, Help from Property Creators for external vocabularies

I don't think that using statistics for watching popularity makes a case for best practice. That's more a case of "someone has an interest, and might be waiting on a best practice from someone or some organization". Which I think starts with us at Schema.org.

I'm ok with editing those 2 wikidata items, but has part is not really the issue we need to solve. Instead it's finding external superclass and external subclass or a wikidata equivalent for us to populate with. For example, taking https://schema.org/Car and making it an external subclass under Wikidata's Automobile https://www.wikidata.org/wiki/Q1420

SQID views as well as Discussion (Talk) views are also very nice to see and help correlate classes, etc. https://tools.wmflabs.org/sqid/#/view?id=Q752870&lang=en
https://www.wikidata.org/wiki/Talk:Q1420

@ppKrauss

Hi all. About the progress of this project (the "work expected" of this issue)... We can move forward with what we have here (and here and at #1186 and perhaps with this spreadsheet), and, in parallel, collaborate to enhance Wikidata's guidelines for external relationships.
