Do you know which part of the world is producing the most tennis players?

First of all, I have to admit that I was inspired by the article “How to Extract Knowledge from Wikipedia, Data Science Style” by Michael Li. Thank you, Michael Li.

I wanted to do a similar analysis, but in the area of sports, specifically for my favorite sport, tennis. I wanted to find out which part of the world is producing the most tennis players.

List all players

I launched the Wikidata Query Service and ran the following SPARQL query. Property P106 is the occupation. Item Q10833314 is the tennis player. Basically, we are asking the Wikidata Query Service to give us all persons whose occupation is tennis player.

SELECT ?person
WHERE {
?person wdt:P106 wd:Q10833314.
}

For the complete list of properties, you may refer to this page: https://www.wikidata.org/wiki/Wikidata:List_of_properties
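If you would rather run the query from a script than from the web page, here is a minimal Python sketch. It assumes the public endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON results format. The `LIMIT 10`, the function names, and the User-Agent string are my own additions for illustration, not part of the query above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public Wikidata SPARQL endpoint (assumed to be reachable from your machine).
ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above; LIMIT 10 is added only to keep a test run small.
QUERY = """
SELECT ?person
WHERE {
  ?person wdt:P106 wd:Q10833314.
}
LIMIT 10
"""


def build_url(query):
    """Return the GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def run_query(query):
    """Send the query and return the result rows (requires network access)."""
    # The User-Agent value is a hypothetical placeholder; use your own.
    request = Request(build_url(query),
                      headers={"User-Agent": "tennis-players-example/0.1"})
    with urlopen(request, timeout=60) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]


# Building the URL needs no network access, so it is safe to try first.
url = build_url(QUERY)
```

Calling `run_query(QUERY)` should then return one dictionary per result row, each keyed by the variable name `person`.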

The query returned a list, but it didn’t include the players’ names. So, I changed the query to include each player’s name.

SELECT ?person ?playerName
WHERE {
?person wdt:P106 wd:Q10833314.
?person rdfs:label ?playerName
}

Here is the output now.

Get a list of unique players

As you might have noticed in the output above, each player’s name is repeated, once per language. We can keep only the entries in a specific language by using the FILTER clause in SPARQL. I filtered on French (“fr”).

Here is my query now.

SELECT ?person ?playerName
WHERE {
?person wdt:P106 wd:Q10833314.
?person rdfs:label ?playerName.
FILTER ( LANGMATCHES ( LANG ( ?playerName ), "fr" ) )
}

And, here is the output.

As you can see, the list is clean and has unique names.
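To make it clearer what the FILTER does, here is a small Python sketch that applies the same kind of language matching to rows already downloaded in the SPARQL JSON results format, where each label carries an `xml:lang` tag. The helper names and the sample rows (entity IDs and labels) are made up for illustration. The matching covers only the basic prefix rule (for example, "fr" also matches "fr-CA").

```python
def lang_matches(tag, lang_range):
    """Mimic the basic LANGMATCHES rule: 'fr' matches 'fr' and 'fr-CA'."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""  # "*" matches any non-empty language tag
    return tag == lang_range or tag.startswith(lang_range + "-")


def keep_language(bindings, variable, lang_range):
    """Keep only the rows whose label for `variable` is in the wanted language."""
    return [
        row for row in bindings
        if lang_matches(row[variable].get("xml:lang", ""), lang_range)
    ]


def sample_row(entity_id, label, lang):
    """Build one illustrative row in the SPARQL JSON results shape."""
    return {
        "person": {"type": "uri",
                   "value": "http://www.wikidata.org/entity/" + entity_id},
        "playerName": {"type": "literal", "xml:lang": lang, "value": label},
    }


# Hypothetical data: two players, each labelled in English and French.
rows = [
    sample_row("Q0000001", "Example Player One", "en"),
    sample_row("Q0000001", "Joueur Exemple Un", "fr"),
    sample_row("Q0000002", "Example Player Two", "en"),
    sample_row("Q0000002", "Joueur Exemple Deux", "fr"),
]

french_rows = keep_language(rows, "playerName", "fr")
french_names = [row["playerName"]["value"] for row in french_rows]
```

After filtering, each player appears once, which is the same effect the FILTER clause has inside the query.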

Identify the players’ countries

Now I want to identify the players’ countries. The property P1532 (country for sport) identifies the country each tennis player represents. Here is my query.

SELECT ?person ?playerName ?playerCountryName
WHERE {
?person wdt:P106 wd:Q10833314.
?person rdfs:label ?playerName.
FILTER ( LANGMATCHES ( LANG ( ?playerName ), "fr" ) )

?person wdt:P1532 ?playerCountry.
?playerCountry rdfs:label ?playerCountryName.
FILTER ( LANGMATCHES ( LANG ( ?playerCountryName ), "fr" ) )
}

And, here is the output.
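Once each row carries a country, a quick tally gives a first numeric answer to the question. This is a sketch of my own, not part of the original analysis. It assumes rows in the SPARQL JSON results format with the variable names used in the query above, and the sample rows are hypothetical.

```python
from collections import Counter


def players_per_country(bindings):
    """Count distinct players for each country label."""
    seen = set()
    counts = Counter()
    for row in bindings:
        key = (row["person"]["value"], row["playerCountryName"]["value"])
        if key not in seen:  # skip duplicate rows for the same player/country
            seen.add(key)
            counts[key[1]] += 1
    return counts


def sample_row(entity_id, country):
    """Build one illustrative row in the SPARQL JSON results shape."""
    return {
        "person": {"type": "uri",
                   "value": "http://www.wikidata.org/entity/" + entity_id},
        "playerCountryName": {"type": "literal", "xml:lang": "fr",
                              "value": country},
    }


# Hypothetical data: three players, with one row repeated on purpose.
rows = [
    sample_row("Q0000001", "France"),
    sample_row("Q0000002", "France"),
    sample_row("Q0000002", "France"),
    sample_row("Q0000003", "Suisse"),
]

counts = players_per_country(rows)
top_country = counts.most_common(1)
```

Counting on the pair of player and country keeps a player who has represented two countries in both tallies, while ignoring rows that are exact repeats.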

Add the players’ images

Wouldn’t it be nice to see each player’s photo? The property P18 holds each player’s image. I put it inside an OPTIONAL block so that players without a photo still appear. Let’s run the following query.

#defaultView:ImageGrid

SELECT ?person ?playerName ?playerCountryName ?image
WHERE {
?person wdt:P106 wd:Q10833314.
?person rdfs:label ?playerName.
FILTER ( LANGMATCHES ( LANG ( ?playerName ), "fr" ) )

?person wdt:P1532 ?playerCountry.
?playerCountry rdfs:label ?playerCountryName.
FILTER ( LANGMATCHES ( LANG ( ?playerCountryName ), "fr" ) )

OPTIONAL {
?person wdt:P18 ?image
}
}

This is what I got as the output.

Answer to our question

Let’s get back to what we want to achieve, i.e. identifying the part of the world that is producing the most tennis players. The property P19 identifies the place of birth, while P625 gives the coordinates of that place.

#defaultView:ImageGrid

SELECT ?person ?playerName ?playerCountryName ?image ?birthPlaceLabel ?cood
WHERE {
?person wdt:P106 wd:Q10833314.
?person rdfs:label ?playerName.
FILTER ( LANGMATCHES ( LANG ( ?playerName ), "fr" ) )

?person wdt:P1532 ?playerCountry.
?playerCountry rdfs:label ?playerCountryName.
FILTER ( LANGMATCHES ( LANG ( ?playerCountryName ), "fr" ) )

OPTIONAL {
?person wdt:P18 ?image
}

OPTIONAL {
?person wdt:P19 ?birthPlace .
?birthPlace rdfs:label ?birthPlaceLabel .
?birthPlace wdt:P625 ?cood .
FILTER ( LANGMATCHES ( LANG ( ?birthPlaceLabel ), "fr" ) )
}
}

And, here is the output.

I can change the view from Image Grid to Table by selecting the “Table” option in the dropdown circled in red.

To answer our question, we must change the view to “Map”. I did so, and here is the output. As you can see, Europe is completely covered with red dots, and each dot marks a tennis player’s birthplace.
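The map gives a visual answer. If you want rough numbers behind the dots, here is a sketch that assumes the coordinates come back as text literals of the form "Point(longitude latitude)", which is how the query service returns P625 values. The 10-degree grid, the function names, and the sample points (approximate coordinates of Paris, Basel and Sydney) are my own choices for illustration.

```python
import math
from collections import Counter


def parse_point(wkt):
    """Return (latitude, longitude) from a 'Point(lon lat)' literal."""
    text = wkt.strip()
    if not (text.upper().startswith("POINT(") and text.endswith(")")):
        raise ValueError("not a point literal: " + repr(wkt))
    # Longitude comes first in the literal, latitude second.
    lon_text, lat_text = text[len("Point("):-1].split()
    return float(lat_text), float(lon_text)


def grid_cell(lat, lon, size=10):
    """Return the south-west corner of the size-degree cell holding the point."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def players_per_cell(points, size=10):
    """Count how many birthplace points fall into each grid cell."""
    return Counter(grid_cell(*parse_point(p), size=size) for p in points)


# Illustrative values for the ?cood variable, one per player.
sample_points = [
    "Point(2.35 48.85)",     # approximately Paris
    "Point(7.59 47.56)",     # approximately Basel
    "Point(151.21 -33.87)",  # approximately Sydney
]

cells = players_per_cell(sample_points)
busiest = cells.most_common(1)
```

The cell with the highest count is the densest area on the map. A finer or coarser `size` changes how the dots are grouped, so treat the result as a rough summary rather than an exact ranking.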

I hope you enjoyed reading this article. You may want to try this analysis, or a similar one, for your favorite sport.

Sr. Azure Data/Solution Architect, Data Science Enthusiast