The source data is about 130K text/html pages, randomly selected from the list provided by dmoz.org and downloaded about a year ago. I then looked for all HTML comments (as found by an HTML5 parser) containing an embedded RDF block. 132 pages contained such a comment. (Only 16 pages contained an uncommented RDF element seen by the HTML5 parser; I didn't include those pages in this analysis.)

Ignoring version numbers and country codes, the number of uses of each license is:

  13  http://creativecommons.org/licenses/by/
   5  http://creativecommons.org/licenses/by-nc/
  39  http://creativecommons.org/licenses/by-nc-nd/
  47  http://creativecommons.org/licenses/by-nc-sa/
   4  http://creativecommons.org/licenses/by-nd/
   6  http://creativecommons.org/licenses/by-nd-nc/
  13  http://creativecommons.org/licenses/by-sa/
   1  http://creativecommons.org/licenses/nc-sa/
   3  http://creativecommons.org/licenses/publicdomain/

8 pages had a license declaration but no permits/requires/prohibits elements inside it.

Two pages declare licensing requirements that are inconsistent with their license URI. (They look as if somebody modified the permits/requires lines without modifying the license URI.)

http://www.ashrita.com/ says that by-nc-nd permits DerivativeWorks (work title: "The Most Records").

http://www.tecniferio.com/ doesn't say that by-sa requires ShareAlike or that it permits DerivativeWorks, but does (non-standardly) say that it permits CommercialUse (work metadata: "Tecniferio 2004"; "Spanish-language journal on culture and society"; Juan Díez Blanco).

One page's RDF is not well-formed XML (it has random HTML tags inserted): http://www.icannchannel.de
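The tally and the consistency check above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the code used for the survey: the helper names (license_code, normalize_license, inconsistencies) are made up, and it assumes the permits/requires/prohibits values have already been pulled out of the RDF and reduced to their local names (e.g. DerivativeWorks, with the namespace stripped). It also assumes the usual reading of the Creative Commons code letters: "nd" means no derivative works, "sa" means ShareAlike is required, "nc" means commercial use is prohibited.

```python
import re
from collections import Counter

# Matches a Creative Commons license URI and captures the license code
# (e.g. "by-nc-nd"); any version number and country code after it are ignored.
CC_LICENSE_RE = re.compile(r"^https?://creativecommons\.org/licenses/([a-z-]+)/")


def license_code(uri):
    """Return the license code from a CC license URI, or None."""
    match = CC_LICENSE_RE.match(uri.strip())
    return match.group(1) if match else None


def normalize_license(uri):
    """Drop version and country code, e.g. .../by-sa/2.5/es/ -> .../by-sa/."""
    code = license_code(uri)
    if code is None:
        return None
    return f"http://creativecommons.org/licenses/{code}/"


def inconsistencies(uri, permits, requires, prohibits):
    """Compare declared terms (local names) against what the URI implies.

    Assumed reading of the code letters: "nd" means DerivativeWorks is not
    permitted, "sa" means ShareAlike is required, "nc" means CommercialUse
    is prohibited. Extra, non-standard terms are not checked.
    """
    code = license_code(uri)
    if code is None:
        return ["not a recognised CC license URI"]
    parts = set(code.split("-"))
    problems = []
    # DerivativeWorks should be permitted exactly when "nd" is absent.
    if ("nd" in parts) == ("DerivativeWorks" in permits):
        problems.append(f"{code}: DerivativeWorks permission does not match URI")
    if ("sa" in parts) != ("ShareAlike" in requires):
        problems.append(f"{code}: ShareAlike requirement does not match URI")
    if ("nc" in parts) != ("CommercialUse" in prohibits):
        problems.append(f"{code}: CommercialUse prohibition does not match URI")
    return problems


# Counting normalized URIs merges the versions and jurisdictions of a license.
tally = Counter(
    normalize_license(u)
    for u in [
        "http://creativecommons.org/licenses/by/2.0/",
        "http://creativecommons.org/licenses/by/3.0/de/",
        "http://creativecommons.org/licenses/by-sa/2.5/",
    ]
)
```

Given terms like those described for ashrita.com (by-nc-nd plus a permits DerivativeWorks line), inconsistencies() reports one DerivativeWorks mismatch; for terms like tecniferio.com's it reports the missing DerivativeWorks permission and ShareAlike requirement, and says nothing about the extra CommercialUse permission, since the sketch only checks prohibitions.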

One page has whitespace (from line-wrapping) inside some rdf:resource URIs, which I don't think is permitted (though I'm not certain): http://karlrichtermunich.blogspot.com/
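The last two checks (well-formedness, and whitespace in rdf:resource values) might look roughly like this. It is a hypothetical sketch, not the survey's code: it uses Python's standard html.parser, which is not an HTML5 parser (the survey used one), it assumes the commented blocks can be recognised by the string rdf:RDF, and the class and function names are made up for illustration. It relies on XML attribute-value normalization, which turns a line break inside an attribute value into a space.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_RESOURCE = f"{{{RDF_NS}}}resource"  # ElementTree's name for rdf:resource


class RdfCommentCollector(HTMLParser):
    """Collect the text of every HTML comment that mentions rdf:RDF."""

    def __init__(self):
        super().__init__()
        self.rdf_comments = []

    def handle_comment(self, data):
        # Assumption: commented license blocks are recognised by this marker.
        if "rdf:RDF" in data:
            self.rdf_comments.append(data)


def check_rdf_comment(data):
    """Return a list of problems found in one commented RDF block."""
    try:
        root = ET.fromstring(data.strip())
    except ET.ParseError as exc:
        # Covers stray HTML tags and undeclared namespace prefixes alike.
        return [f"not well-formed XML: {exc}"]
    problems = []
    for elem in root.iter():
        uri = elem.get(RDF_RESOURCE)
        # A line break inside the attribute value reaches us as a space,
        # so a line-wrapped URI shows up here containing whitespace.
        if uri is not None and any(ch.isspace() for ch in uri):
            problems.append(f"whitespace in rdf:resource URI: {uri!r}")
    return problems


def check_page(html_text):
    """Check every commented RDF block found in one page."""
    collector = RdfCommentCollector()
    collector.feed(html_text)
    collector.close()
    return [check_rdf_comment(c) for c in collector.rdf_comments]
```

check_page() returns one list of problems per commented block, so a page with a single clean block yields [[]].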