He got sued for sharing public YouTube videos; nightmare ended in settlement

Nobody expects to get sued for re-posting a YouTube video on social media by using the β€œshare” button, but librarian Ian Linkletter spent the past five years embroiled in a copyright fight after doing just that.

Now that a settlement has been reached, Linkletter told Ars why he thinks his 2020 tweets sharing public YouTube videos put a target on his back.

Linkletter’s legal nightmare started in 2020 after an education technology company, Proctorio, began monitoring student backlash on Reddit over its AI tool used to remotely scan rooms, identify students, and prevent cheating on exams. On Reddit, students echoed serious concerns raised by researchers, warning of privacy issues, racist and sexist biases, and barriers to students with disabilities.


What C2PA Provides

Last month I released my big bulleted list of C2PA problems. Any one of these issues should make potential adopters think twice. But 27 pages? They should be running away!

Since then, my list has been discussed at the biweekly Provenance and Authenticity Standards Assessment Working Group (PASAWG). The PASAWG is working on an independent evaluation of C2PA. The other attendees and I are only there as resources: we answer questions and discuss issues, but we are not doing the researchers' work for them. (We've had some intense discussions among the attendees.) The PASAWG researchers have not yet disclosed their findings to the group, and as far as I can tell, they do not agree with me on every topic. (Good! It means my opinion is not biasing them!)

Full disclosure: The PASAWG meetings fall under the Chatham House Rule. That means I can mention the topics discussed, but I cannot attribute the information to any specific person without permission. (I give myself permission to talk about my own comments.)

Clarifications

For the last few meetings, we have been going over topics related to my bulleted list and the associated issues, and clarifying what the C2PA specification really provides. Having said that, I have found nothing that makes me feel any need to update my big bulleted list, except maybe to add more issues to it. There are no inaccuracies or items needing correction. The 27 pages of issues are serious problems.

However, I do want to make a few clarifications.

First, I often refer to C2PA and CAI as "All Adobe All The Time". One of my big criticisms is that both C2PA and CAI seem to be Adobe-driven efforts with very little difference between Adobe, C2PA, and CAI. I still have that impression. However:
  • The C2PA organization appears to be a weak coalition of large tech companies, with Adobe as the primary driving force. To my knowledge, every C2PA working group has an Adobe employee as the chair or co-chair. The only exception is the top-level C2PA organization -- it's chaired by a Microsoft employee who is surrounded by Adobe employees.

  • The Content Authenticity Initiative (CAI) doesn't just look like Adobe; it is Adobe. It is owned, managed, and operated by Adobe, and all of its code is developed by Adobe employees as part of the Adobe corporation. When you visit the CAI's web site or the Content Credentials web service, that's 100% Adobe.

It's the significant overlap of Adobe employees who are part of both C2PA and CAI that causes a lot of confusion. Even some of the Adobe employees mix up the attribution, but they are often corrected by other Adobe employees.

The second clarification comes from the roles of C2PA and CAI. C2PA only provides the specification; there is no implementation or code. CAI provides an implementation. My interpretation is that this is a blame game: if something is wrong, then C2PA can blame the implementation for not following the specs, while the implementers can blame the specification for being unclear or for any oversights. (If something is broken, then both sides can readily blame the other rather than getting the problem fixed.)

The specifications dictate how the implementation should function. Unless there is a bug in the code, any problem is a specification issue and not a programming problem. For example, my sidecar swap exploit, which permits undetectable alterations of the visual image in a signed file, is made possible by the specification and not the implementation. The same goes for the new C2PA certificate conformance (which is defined but not implemented yet); choosing to use a new and untested CA management system is a risk from the spec and not the implementation.

The third clarification comes from "defined but not implemented yet". Because C2PA does not release any code, everything about their specification is theoretical. Moreover, the specification is usually 1-2 revisions ahead of any implementations. This makes it easy for C2PA to claim that something works since there are no implementation examples to the contrary. By the time there are implementations that demonstrate the issues, C2PA has moved on to newer requirements and seems to disregard previous findings. However, some of the specification's assumptions are grossly incorrect, such as relying on technologies that do not exist today. (More on that in a moment.)

New Changes: v2.2

The current specification, v2.2, came out a few months ago. Although my bulleted list was written based on v2.1 and earlier, the PASAWG review has focused on v2.2. When I asked who supports v2.2, the only answer was "OpenAI". Great -- they're a signer. But there are no tools that can fully validate v2.2 yet; there is only some partial support.

I've recently noticed that the Adobe/CAI Content Credentials web site no longer displays the embedded user attribution. For example, my Shmoocon forgery used to prominently display the forged name of a Microsoft employee. However, last month they stopped displaying that. In fact, for any picture (not just my forgeries) that includes a user ownership attribution, the attribution is no longer shown. This is because the Content Credentials web site is beginning to include some of the C2PA v2.2 specification's features. The feature? Users' names are no longer trusted, so they are no longer displayed.

That's right: all of those people who previously used C2PA to sign their names will no longer have their names displayed because their names are untrusted. (While the C2PA and Adobe/CAI organizations haven't said this, I think this is in direct response to some of my sample forgeries that included names.)

If you dive into the specifications, there's been a big change: C2PA v2.0 introduced the concepts of "gathered assertions" and "created assertions", but these concepts were not clearly defined. By v2.2, they became a core requirement. Unfortunately, trying to figure out the purpose and definitions from the specs is as clear as mud. Fortunately, the differences were clarified at the PASAWG meetings. The risks, and what can be trusted, basically break down into four areas: gathered assertions, created assertions, trusted certificates, and reused certificates.
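
To make the distinction concrete, here is a minimal Python sketch of how a validator might bucket assertion references by trust level. It assumes a claim that has already been decoded from CBOR into a dict containing the created_assertions and gathered_assertions arrays described in the v2.x claim; the entry fields and labels are simplified, hypothetical examples.

```python
# A minimal sketch, not a real validator. Assumes the C2PA claim has already
# been decoded from CBOR into a dict with the v2.x "created_assertions" and
# "gathered_assertions" arrays; entry fields and labels are hypothetical.

def bucket_assertions(claim: dict) -> dict:
    """Split assertion references into trust buckets per the v2.x claim layout."""
    buckets = {"untrusted (gathered)": [], "hardware-attested (created)": []}
    for ref in claim.get("gathered_assertions", []):
        # Gathered assertions are unvetted user/device input: names, copyright,
        # EXIF-style camera settings. Treat them like any other editable metadata.
        buckets["untrusted (gathered)"].append(ref.get("url", "?"))
    for ref in claim.get("created_assertions", []):
        # Created assertions are *supposed* to come from vetted, trusted hardware.
        # Since no such consumer hardware exists today, this bucket is aspirational.
        buckets["hardware-attested (created)"].append(ref.get("url", "?"))
    return buckets

# Hypothetical decoded claim, for illustration only.
claim = {
    "gathered_assertions": [{"url": "self#jumbf=c2pa.assertions/stds.schema-org.CreativeWork"}],
    "created_assertions": [{"url": "self#jumbf=c2pa.assertions/c2pa.actions.v2"}],
}

for bucket, labels in bucket_assertions(claim).items():
    print(bucket, "->", labels)
```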

Risk #1: Gathered assertions
Gathered assertions cover any metadata or attribution that comes from an unvetted source, such as a user entering their name, copyright information, or even camera settings from unvetted devices. Because the information is unverified, it is explicitly untrusted.

When you see any information under a gathered assertion, it should be viewed skeptically. In effect, it's as reliable as existing standard metadata fields, like EXIF, IPTC, and XMP. (But if it's just as reliable as existing standards, then why do we need yet-another new way to store the same information?)
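
To illustrate just how weak that reliability is, here's a short Python sketch (using the third-party piexif library) that writes an arbitrary photographer name and copyright string into a JPEG's EXIF data. The file name and values are hypothetical; the point is that anything a signer later "gathers" from these fields is just as forgeable.

```python
# Forging the same kind of unvetted metadata that gathered assertions carry.
# Requires: pip install piexif. File name and values are hypothetical.
import piexif

FILENAME = "vacation.jpg"  # any existing JPEG

exif_dict = piexif.load(FILENAME)   # read whatever EXIF is already there
zeroth = exif_dict["0th"]

# Claim someone else's name and copyright -- no vetting, no verification.
zeroth[piexif.ImageIFD.Artist] = b"Ansel Adams"
zeroth[piexif.ImageIFD.Copyright] = b"Copyright 2025, Not The Real Owner"

# Write the forged metadata back into the file in place.
piexif.insert(piexif.dump(exif_dict), FILENAME)
print("EXIF attribution rewritten; any 'gathered' assertion built from it is just as fake.")
```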

Risk #2: Created assertions
Created assertions are supposed to come from known-trusted and vetted hardware. (See the "C2PA Generator Product Security Requirements", section 6.1.2.) However, there is currently no such thing as trusted hardware. (There's one spec for some auto parts that describes a trusted camera sensor for the auto industry, but the specs are not publicly accessible. I can find no independent experts who have evaluated these trusted component specs, no devices use the specs right now, and it's definitely not available to general consumers. Until it's officially released, it's vaporware.) Since the GPS, time, camera sensor, etc. can all be forged or injected, none of these created assertions can be trusted.

This disparity between the specification's theoretical "created assertions" and reality creates a big gap in any C2PA implementation. The specs define the use of created assertions based on trusted hardware, but the reality is that there are no trusted hardware technologies available right now. Just consider the GPS sensor. Regardless of the device, it's going to connect to the board over I2C, UART, or some other publicly-known communication protocol. That means it's a straightforward hardware modification to provide false GPS information over the wire. But it can be easier than that! Apps can provide false GPS information to the C2PA signing app, while external devices can provide false GPS signals to the GPS receiver. Forging GPS information isn't even theoretical; the web site GPSwise shows real-time information (mostly in Europe) where GPS spoofing is occurring right now.
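
As a concrete illustration of "false GPS information over the wire," here's a small Python sketch that builds a syntactically valid NMEA GGA sentence for any coordinates you like. Anything that can write this string to the serial interface a GPS module normally uses (or feed it to a mock-location API) presents the signing app with a plausible, completely fabricated position. The coordinates are hypothetical.

```python
# Build a forged-but-valid NMEA "GGA" fix sentence for arbitrary coordinates.
# A GPS receiver speaks plain text like this over UART/serial, so anything that
# can write to that interface can lie about position. Coordinates are hypothetical.
from datetime import datetime, timezone

def nmea_checksum(body: str) -> str:
    """XOR of every character between '$' and '*', as two hex digits."""
    csum = 0
    for ch in body:
        csum ^= ord(ch)
    return f"{csum:02X}"

def fake_gga(lat_deg: float, lon_deg: float, altitude_m: float = 10.0) -> str:
    now = datetime.now(timezone.utc).strftime("%H%M%S.00")
    # NMEA wants ddmm.mmmm / dddmm.mmmm with hemisphere letters.
    lat_hemi = "N" if lat_deg >= 0 else "S"
    lon_hemi = "E" if lon_deg >= 0 else "W"
    lat, lon = abs(lat_deg), abs(lon_deg)
    lat_str = f"{int(lat):02d}{(lat - int(lat)) * 60:07.4f}"
    lon_str = f"{int(lon):03d}{(lon - int(lon)) * 60:07.4f}"
    body = (f"GPGGA,{now},{lat_str},{lat_hemi},{lon_str},{lon_hemi},"
            f"1,08,0.9,{altitude_m:.1f},M,0.0,M,,")
    return f"${body}*{nmea_checksum(body)}"

# "Prove" the photo was taken at the Eiffel Tower.
print(fake_gga(48.8584, 2.2945))
```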



And that's just the GPS sensor. The same goes for the time on the device and the camera's sensor. A determined attacker with direct hardware access can always open the device, replace components (or splice traces), and forge the "trusted sensor" information. This means that the "created assertions" that denote what was photographed, when, and where can never be explicitly trusted.



Remember: Even if you trust your hardware, that doesn't help someone who receives the signed media. A C2PA implementation cannot verify that the hardware hasn't been tampered with, and the recipient cannot validate that trusted hardware was used.

Requiring hardware modifications does increase the level of technical difficulty needed to create a forgery. While your typical user cannot do this, it's not a deterrent for organized crime groups (insurance and medical fraud are billion-dollar-per-year industries), political influencers, propaganda generators, nation-states, or even determined individuals. A signed cat video on Tick Tack or Facegram may come from a legitimate source. However, if there is a legal outcome, political influence, money, or reputation on the line, then the signature should not be explicitly trusted even if it says that it used "trusted hardware".

Risk #3: Trusted Certificates
The C2PA specification uses a chain of X.509 certificates. Each certificate in the chain has two components: the cryptography (I have no issues with the cryptography) and the attribution about who owns each certificate. This attribution is a point of contention among the PASAWG attendees:
  • Some attendees believe that, as long as the root is trusted and we trust that every link in the chain follows the defined procedure of validating users before issuing certificates, then we can trust the name in the certificate. This optimistic view assumes that everyone associated with every node in the chain was trustworthy. Having well-defined policies, transparency, and auditing can help increase this trust and mitigate any risks. In effect, you can trust the name in the cert.

  • Other attendees, including myself, believe that trust attenuates as each new node in the chain is issued. In this pessimistic view, you can trust a chain of length "1" because it's the authoritative root. (We're assuming that the root certs are trusted. If that assumption is wrong, then nothing in C2PA works.) You can trust a length of "2" because the trusted root issued the first link. But every link in the chain beyond that cannot be fully trusted.

This pessimistic view even impacts web certificates. HTTPS gets around this trust attenuation by linking the last node in the chain back to the domain for validation. However, C2PA's certificates do not link back to anywhere. This means that we must trust that nobody in the chain made a mistake and that any mistakes are addressed quickly. ("Quickly" is a relative term. When WoSign and StartCom were found to be issuing unauthorized HTTPS certificates, it took years for them to be delisted as trusted CA services.)

In either case, you -- as the end user -- have no means to automatically validate the name in the signing certificate. You have to trust the signing chain.
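
To see what "trusting the chain" actually gives you, here's a minimal Python sketch (using the pyca/cryptography library) that walks a PEM file containing a signing chain and prints each certificate's subject and issuer. The file name is hypothetical; the point is that the subject is whatever text the issuer chose to write, and the recipient has nothing to automatically cross-check it against.

```python
# Walk a PEM-encoded certificate chain and print who-signed-whom.
# Requires: pip install "cryptography>=39". The chain file name is hypothetical
# (e.g., certificates exported from a signed C2PA manifest).
from cryptography import x509
from cryptography.hazmat.primitives import hashes

with open("signing_chain.pem", "rb") as f:
    certs = x509.load_pem_x509_certificates(f.read())

for depth, cert in enumerate(certs):
    print(f"[{depth}] subject : {cert.subject.rfc4514_string()}")
    print(f"    issuer  : {cert.issuer.rfc4514_string()}")
    print(f"    sha256  : {cert.fingerprint(hashes.SHA256()).hex()}")
    # The subject is free-form text chosen by the issuer. The crypto proves the
    # chain links together; it does not prove the names describe reality.
```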

As an explicit example, consider the HTTPS certificate used by TruePic's web site. (TruePic is a C2PA steering committee member). When you access their web site, their HTTPS connection currently uses a chain of three X.509 certificates:



  1. The root certificate is attributed to the Internet Security Research Group (ISRG Root X1). I trust this top level root certificate because it's in the CCADB list that is included in every web browser. (To be in the CCADB, they had to go through a digital colonoscopy and come out clean.)

  2. The second certificate is from Let's Encrypt. Specifically, ISRG Root X1 issued a certificate to Let's Encrypt's "R11" group. It's named in the cert. Since I trust Root X1, I assume that Root X1 did a thorough audit of Let's Encrypt before issuing the cert, so I trust Let's Encrypt's cert.

  3. Let's Encrypt then issued a cert to "www.truepic.com". However, their vetting process is really not very sophisticated: if you can show control over the host's DNS entry or web server, then you get a cert. In this case, the certificate's common name (CN) doesn't even name the company -- it just includes the hostname. (This is because Let's Encrypt never asked for the actual company name.) There is also no company address, organization, or even a contact person. The certificate has minimal vetting and no reliable attribution. If we just stopped here, then I wouldn't trust it.

    However, there's an extra field in the certificate (the subject alternative name) that specifies the DNS name the cert is meant to be served from. Since this field matches the hostname where I received the cert (www.truepic.com), I know it belongs there. That's the essential cross-validation and is the only reason the cert should be trusted. We can't trust the validation process because, really, there wasn't much validation. And we can't trust the attribution because it was set by the second-level issuer and contains whatever information they wanted to include.

With web-based X.509 certificates, there is that link back to the domain that provides the final validation step. In contrast, C2PA uses a different kind of X.509 certificate that lacks this final validation step. If the C2PA signing certificate chain is longer than two certificates, then the pessimistic view calls the certificate's attribution and vetting process into question. The basic question becomes: How much should you trust that attribution?
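
The cross-validation step that the web gets for free is easy to demonstrate. This Python sketch (standard library only) opens a TLS connection using the default verification settings, which both validate the chain against the system's trusted roots and check that the certificate's DNS names match the host that was requested. C2PA signing certificates have no equivalent of that final matching step.

```python
# Demonstrate the hostname cross-check that web X.509 certificates get
# and C2PA signing certificates lack. Standard library only.
import socket
import ssl

HOST = "www.truepic.com"

context = ssl.create_default_context()  # trusted roots + hostname checking enabled

with socket.create_connection((HOST, 443), timeout=10) as sock:
    # wrap_socket() verifies the chain AND that the cert's DNS names match HOST;
    # a mismatch raises ssl.SSLCertVerificationError before any data is exchanged.
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        cert = tls.getpeercert()

print("subject        :", cert["subject"])
print("issuer         :", cert["issuer"])
print("subjectAltName :", cert.get("subjectAltName"))
# The subjectAltName entries are what tie this cert back to the domain.
# A C2PA signing cert has nothing comparable for the recipient to check.
```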

Risk #4: Reused Certificates
Most services do not have user-specific signing certificates. For example, every picture signed today by Adobe Firefly uses the same Adobe certificate. The same goes for Microsoft Designer (a Microsoft certificate), OpenAI (a certificate issued by TruePic), and every other system that currently uses C2PA.

The attribution in the signature identifies the product that was used, but not the user who created the media. It's like having "Nike" on your shoes or "Levi's" on your jeans -- it names the brand but doesn't identify the individual. Unless you pay to have your own personalized signing certificate, the signature is not distinct to you. This means that it doesn't help artists protect their works. (Saying that the painter used Golden acrylic paint with a brush by Winsor & Newton doesn't identify the artist.)
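
One quick way to see the "brand label" effect is to compare the signing certificates extracted from two different people's images. In this sketch, the two PEM files are hypothetical exports of the leaf signing certificate from two unrelated users' images; if the service reuses one product certificate, the fingerprints come out identical.

```python
# Compare the signing certificates from two different users' images.
# Requires: pip install cryptography. The PEM file names are hypothetical
# exports of the leaf signing certificate from each image's manifest.
from cryptography import x509
from cryptography.hazmat.primitives import hashes

def fingerprint(pem_path: str) -> str:
    with open(pem_path, "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())
    return cert.fingerprint(hashes.SHA256()).hex()

alice = fingerprint("alice_firefly_signing_cert.pem")
bob = fingerprint("bob_firefly_signing_cert.pem")

print("alice:", alice)
print("bob  :", bob)
# Identical fingerprints mean one shared product certificate: the signature
# identifies the tool (the "brand"), not the person who made the picture.
print("same certificate?", alice == bob)
```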

As an aside, a personalized signing certificate can cost $50-$300 per year. Given all of C2PA's problems, you're better off using the US Copyright Office. They offer group registration for photographers: $55 for 750 photos per year, and the protection lasts for 70 years beyond the creator's lifetime. This seems like a more cost-effective and reliable option than C2PA.

Missing Goals

Each of these risks with C2PA poses serious concerns. And this is before we get into manifest/sidecar manipulations to alter the visual content without detection, inserting false provenance information, competing valid signatures, reissuing signatures without mentioning changes, applying legitimate signatures to false media, etc. Each of these exploits is independent of the implementation and is due to the specification.

The C2PA documentation makes many false statements regarding what C2PA provides, including:
  • Section 3, Core Principles: "Content Credentials provides a way to establish provenance of content."

  • Section 5.1: "Helping consumers check the provenance of the media they are consuming."

  • Section 5.2: "Enhancing clarity around provenance and edits for journalistic work."

  • Section 5.3: "Offering publishers opportunities to improve their brand value." (Except that the end consumer cannot prove that it came from the publishers.)

  • Section 5.4: "Providing quality data for indexer / platform content decisions."

This is not the entire list of goals. (I'm literally going section by section through their document.) Unfortunately, you cannot have reliable provenance without validation. C2PA lacks attribution validation, so it cannot meet any of these goals. C2PA does not mitigate the risk of someone signing content as you, replacing your attribution with a competing claim, or associating your valid media with false information (which is a great way to call your own legitimate attribution into question).

What Does C2PA Provide?

An independent report came out of the Netherlands last month that reviews C2PA and whether it can help "combat disinformation by ensuring the authenticity of reporting through digital certificates." (Basically, it's to see if C2PA is appropriate for use by media outlets.) This report was commissioned by NPO Innovatie (NPO), Media Campus NL, and Beeld & Geluid. The report is written in Dutch (Google Translate works well on it) and includes a summary in English. Their key findings (which they included with italics and bold):
C2PA is a representation of authenticity and provenance, but offers no guarantee of the truth or objectivity of the content itself, nor of the factual accuracy of the provenance claims within the manifest.

(Full disclosure: They interviewed many people for this report, including me. However, my opinions are not the dominant view in this report.)

C2PA does not provide trusted attribution information and it provides no means for the end recipient to automatically validate the attribution in the signing certificate. Moreover, the specifications depend on trusted hardware, even though there is no such thing as trusted hardware. This brings up a critical question: If you cannot rely on the information signed using C2PA, then what does C2PA provide?

My colleague, Shawn Masters, likens C2PA's signature to an "endorsement". Like in those political ads, "My name is <name>, and I approve this message." You, as the person watching the commercial, have no means to automatically validate that the entity mentioned in the promotion actually approved the message. (An example of this false attribution happened in New Hampshire in 2024, where a deep fake robocall pretended to be Joe Biden.) Moreover, the endorsement is based on a belief that the information is accurate, backed by the reputation of the endorser.

The same endorsement concept applies to C2PA: as the recipient of signed media, you have no means to automatically validate that the name in the signing cert actually represents whoever did the signing. The only things you know are: (1) C2PA didn't validate the content, (2) C2PA didn't validate any gathered assertions, and (3) the signer believes the unverifiable created assertions are truthful. When it comes to authenticating media and determining provenance, we need a solution that provides more than "trust", "belief", and endorsements. What we need are verifiable facts, validation, provable attribution, and confirmation.

β€˜Shadow Libraries’ Are Moving Their Pirated Books to The Dark Web After Fed Crackdowns

Library Genesis (LibGen), the largest pirate repository of academic papers, doesn’t seem to be doing so hot.

Three years ago, LibGen had, on average, five different HTTP mirror websites backing up every upload to ensure that the repository couldn't be easily taken down. But as Reddit users pointed out this week, that number now looks more like two. Following the recent takedown of another pirate site, the downturn has caused concern among "shadow archivists," the term for volunteer digital librarians who maintain online repositories like LibGen and Z-Library, which host massive collections of pirated books, research papers, and other text-based materials.

Earlier this month, the head librarians of Z-Library were arrested and charged in federal court with criminal copyright infringement, wire fraud, and money laundering. After the FBI seized several websites associated with Z-Library, shadow archivists rushed to create mirrors of the site to continue enabling user access to more than 11 million books and over 80 million articles.

For many students and researchers strapped for cash, LibGen is to scholarly journal articles what Z-Library is to books.Β 

β€œIt's truly important work, and so sad that such a repository could be lost or locked away due to greed, selfishness, and pursuit of power,” one Reddit user commented on r/DataHoarder. β€œWe are at a point in time where humanity could do so very much with the resources and knowledge that we have if it were only organized and accessible to all instead of kept under lock and key and only allowed access by a tiny percentage of the 8 billion people on this planet.”

There isn’t one clear explanation for what’s happening with LibGen’s HTTP mirrors. However, we do know that maintaining a shadow library is time-consuming and often isolating for the librarian or archivist. It makes perfect sense that a shadow librarian involved in this work for years might throw in the towel. This could also be the seed of a recruitment effort, much like the one we saw several years ago when archivists mounted a rescue mission to save Sci-Hub from disrepair.

When news circulated that Z-Library was seized by the feds, some supporters stepped in with monetary donations to restore the repository. Members of the Z-Library team also expressed sadness about the arrests and thanked supporters in an official response, as reported by Torrent Freak.

β€œThank you for each donation you make. You are the ones who making the existence of the Z-Library possible,” the Z-Library members wrote in the statement, which was posted to a site on the anonymized Tor network. β€œWe believe the knowledge and cultural heritage of mankind should be accessible to all people around the world, regardless of their wealth, social status, nationality, citizenship, etc. This is the only purpose Z-Library is made for.” 

The usage of the anonymized network follows the movement of shadow libraries to more resilient hosting systems like the Interplanetary File System (IPFS), BitTorrent, and Tor. While there might be fewer HTTP mirrors of shadow libraries like LibGen, there are likely more mirrors on alternative networks that are slightly harder to access.

It’s unclear if LibGen will regain the authority it once had in the shadow library ecosystem, but as long as shadow librarians and archivists disagree with current copyright and institutional knowledge preservation practices, there will be shadow information specialists.

β€œShadow library volunteers come and go, but the important part is that the content (books, papers, etc) is public, and mirrored far and wide,” Anna, the pseudonymous creator of Anna’s Archive, a site that lets users search shadow archives and β€œaims to catalog every book in existence,” told Motherboard in a statement. β€œAs long as the content is widely available, new people can come in and keep the flame burning, and even innovate and improveβ€”without needing anyone's permission.”

Anna says the job of shadow librarians closely follows the ethos β€œinformation wants to be free,” which was famously put into practice by information activists like Aaron Swartz.Β 

β€œOnce the content is out there, it's hard to put the genie back in the bottle,” she added. β€œAt a minimum, we have to make sure that the content stays mirrored, because if that flame dies, it's gone. But that is relatively easy to do.” 
