
Life After End-of-Life

3 December 2025 at 01:06
I've been battling with an operating system problem for the last few months. The problem? The operating system on some of my servers is screaming toward "end of life". That means they need to be updated.

Previously, I'd updated each server separately and taken notes as kind of an installation script. Of course, those scripts are great for notes but ended up not working well in practice. But at least I knew what needed to be installed.

This time I had the idea of actually scripting everything. This is particularly important since I'll be updating three servers, each with a handful of virtual machines -- and they all need to be updated. (Well, a few don't need to, but for consistency, I want to make them all the same.) The scripts should allow the migration to be more rapid and consistent, without depending on my memory or lots of manual steps.

There are a lot of steps to this process, but each step is pretty straightforward:
  1. Choose a new operating system. (I decided on Ubuntu 24.04 LTS for now.)

  2. Install the base operating system as a minimal server. Customize it to my liking. (E.g., I have some shell aliases and scripts that I use often and they need to be on every server. I also need to harden the basic OS and add in my custom server monitoring code.)

  3. Install a hypervisor. In virtual machine terminology, the hypervisor is "dom0" or the "host". It runs one or more virtual machines (VMs). Each VM is often called a "guest" or "domu". I have 3 production servers and a 4th "hot backup" in case of hardware failures or for staging migrations, so I'll be installing 4 dom0 systems and a bunch of domu on each dom0.

  4. Create a template virtual machine (template domu) and configure it with my defaults.

  5. I'll be updating the servers, one at a time. For each virtual machine (VM) on the old server:
    1. Copy the template on the staging server to a new VM.
    2. Transfer files from the old VM on the old server to the new VM on the staging server.
    3. Make sure it all works.

  6. When the staging server has everything running and the old server is no longer in use:
    1. Reinstall the old server using the installation scripts.
    2. Transfer each new VM from the staging server to the production server.
    3. Make sure it all works.

  7. When everything has been transferred and is running on the production server, remove it all from the staging server and then start the same process for the next old server.
It's a lot of steps, but it's really straightforward. My installation scripts have names like:
install-step00-base-os.sh
install-step01-user.sh
install-step02-harden.sh
install-step03-network.sh
install-step04-ufw.sh
install-step05-create-dom0.sh
install-step06-system-monitor.sh
install-step10-domu.sh
install-step11-domu-user.sh
install-step12-domu-harden.sh
install-step13-domu-network.sh
install-step14-domu-ufw.sh
install-step20-migration-prep.sh
install-step21-make-clone.sh
install-step22-copy-old2clone.sh
install-step23-validate.sh
install-step24-guest-move.sh
install-step25-guest-cleanup.sh
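For illustration, a hypothetical wrapper (not one of the scripts listed above) could run the dom0 installation steps in numeric order and stop at the first failure:
#!/bin/bash
# Hypothetical driver script: run install-step00 through install-step06 in
# numeric order on a fresh host, stopping if any step fails.
set -euo pipefail
for step in install-step0[0-6]-*.sh; do
    echo "=== Running $step ==="
    sudo bash "./$step"
done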
I expected this entire process to take about a month. In reality, I've been battling with every step of the process for nearly 3 months.

The first problems

I really thought choosing an operating system and a hypervisor was going to be the easiest choice. I had previously been using Xen. Unfortunately, Xen is not well-supported under Ubuntu 24.04. (Ubuntu with Xen refused to boot. I'm not the only person with this problem.)

Since Ubuntu 24.04 has been out for over a year, I'm not going to hold my breath for a quick fix. I decided to switch to KVM -- it's what the Debian and Ubuntu developers use. KVM has a lot of really nice features that Xen is missing, like an easy(?) way to move existing VMs between servers.

However, I absolutely could not get IPv6 working under KVM. My ISP doesn't sell fixed IPv6 ranges. Instead, everyone uses DHCPv6 with "sticky" addresses (you get an address once and then keep it).

I should have known that DHCPv6 would be a problem with Ubuntu 24.04: during the base OS install, the installer itself failed to acquire an IPv6 address. IPv4 works fine using DHCP, but IPv6 does not. Part of the problem seems to be with the OS installer.

However, I'm sure part of the problem is also with my ISP. You see, with IPv4, there's one way to get a dynamic address. However, IPv6 never solidified around a single method. For example:
  • DHCPv6 vs SLAAC: DHCPv6 provides stateful configuration, while SLAAC is stateless. The ISP may even use a combination of the two. For example, you may use DHCPv6 for the address, but SLAAC for the routes.

  • Addressing: There are options for acquiring a temporary address, prefix delegation, and more. (And if you request a prefix delegation but provide the wrong mask size, then it may not work.)

  • Routing: Even if you have the address assigned, you may not have a route until the ISP transmits an IPv6 router advertisement (RA). How often those appear depends on the ISP. My ISP transmits one RA every 10-20 minutes. So even if you think everything is working, you might need to wait 10-20 minutes to confirm that it works.
After a week of fighting with the IPv6 configuration, I managed to get DHCPv6 working with my ISP for a /128 on dom0, but I could never get the /56 or virtual servers to work.

While debugging the dom0 issues, I found a problem with KVM's internal bridging. I have a network interface (let's call it "wan"). All of the VMs access it over a bridge (called "br-wan").
  • Under Xen, each VM uses a dynamically allocated tap that interfaces with the bridge. The tap relays the VM's MAC address. As a result, the ISP's DHCPv6 server sees a request coming from the virtual system's MAC address and allocates an address associated with the MAC. This allowed IPv6 to work under Xen.

  • KVM also has a virtual tap that accesses the bridge, but the tap has a different MAC address than the VM. (This isn't a bug; it's just a different architectural decision that the KVM developers made.) As a result, the DHCPv6 server sees a request for an address coming from the tap's MAC, but the confirmation comes from the VM's MAC address. Since the address changed, the confirmation fails and the machine never gets an IPv6 address. (I could not find a workaround for this.)
I spent over a month battling with the IPv6 configuration. I am finally convinced that none of the KVM developers use IPv6 for their VMs, or they use an ISP with a less hardened DHCPv6 configuration. Since I'm up against a hard deadline, none of my new servers will have IPv6 enabled. (I'm still using CloudFlare for my front-end, and they support IPv6. But the connection from CloudFlare to me will be IPv4-only.)
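If you want to see the MAC mismatch for yourself, it's easy to compare the two addresses from dom0. A quick check (the guest name "myguest" is a placeholder):
# The MAC address the guest presents on its virtual NIC:
sudo virsh domiflist myguest
# The MAC addresses of the tap devices actually attached to the bridge:
ip -br link show master br-wan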

How far did I get with IPv6?
  • dom0 can consistently get a /128 (that's ONE IPv6 address) if I clone the NIC's hardware address to the bridge. (A netplan sketch of this setup follows this list.)

  • With a modified configuration, the dhclient process on dom0 can request and receive a /56 from the ISP, but then the ISP refuses to confirm the allocation so dhclient never accepts it (because the MAC changes).

  • Switching from the default bridge to a macvtap makes no difference.

  • Flushing old leases, changing how the DUID is generated, creating my own post-dhcpv6 script to accept the allocation... these all fail.

  • While dom0 could partially work, my VM guest systems never worked. They were able to request and receive an allocation, but the confirmation was never accepted.
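For reference, a minimal netplan sketch of the setup that produced the /128 on dom0. This is not my exact configuration; "wan" and "br-wan" follow the naming used above, and the MAC below is a placeholder that has to be copied from the physical NIC by hand:
sudo tee /etc/netplan/60-br-wan.yaml >/dev/null <<'EOF'
network:
  version: 2
  ethernets:
    wan:
      dhcp4: false
      dhcp6: false
  bridges:
    br-wan:
      interfaces: [wan]
      dhcp4: true
      dhcp6: true          # request an address (the /128) via DHCPv6
      accept-ra: true      # install the default route from the ISP's RA
      macaddress: "aa:bb:cc:dd:ee:ff"   # the physical NIC's MAC, cloned onto the bridge
EOF
sudo netplan apply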
The lesson? If it doesn't work straight out of the box, then it doesn't work.

As one of my friends put it: "IPv6 is the way of the future, and it always will be." It's really no wonder that IPv6 hasn't had wider adoption over the last 30 years. It's just too complicated and there are too many incompatible configurations. (Given all of the problems I've encountered with virtual machines, I now understand why many cloud providers do not support IPv6.)

Cloning Templates

Once IPv6 was off the table, I turned my attention toward creating a reproducible VM template. Configuring a KVM domu/guest virtual server was relatively painless.

The idea is that I can clone the template and configure it for any new server. The cloning command seems relatively painless, unless you want to do something a little different.

For me, the template uses a QCOW2 disk image. However, the running configured guest servers all use Logical Volume Management (LVM). I allocate the logical volume (lvcreate) and then clone the template into the new LVM disk. (sudo qemu-img convert -p -f qcow2 -O raw "$SOURCE_QCOW2" "$LV_PATH")

The good news is that this works. The bad news is that the template is only a 25G disk, but the logical volume is allocated as 200G -- because the final server needs more disk space. If I boot the cloned system, then it only sees a 25G hard drive. Expanding the cloned image to the full disk size is not documented anywhere that I could find, and it is definitely complicated. Here are the steps that I finally found that work:
# Create the new server's disk space
sudo lvcreate -L "$GuestVGsize" -n "$GuestName" "$GuestVG"
LV_PATH="/dev/$GuestVG/$GuestName"

# Find the template's disk. My template domu is called "template".
SOURCE_QCOW2=$(virsh dumpxml template | grep 'source file' | awk -F\' '{print $2}')

# Do the actual cloning (with -p to show progress)
sudo qemu-img convert -p -f qcow2 -O raw "$SOURCE_QCOW2" "$LV_PATH"

# LV_PATH is a single volume that contains multiple partitions.
# Partition 1 is the bootloader.
# Partition 2 is the file system.
# Get the file system's partition path
LV_CONTAINER_PARTITION_NAME=$(sudo kpartx -l "$LV_PATH" | tail -n 1 | awk '{print $1}')
LV_CONTAINER_PARTITION="/dev/mapper/${LV_CONTAINER_PARTITION_NAME}"

# Get the starting sector for resizing the 2nd (data) partition.
START_SECTOR=$(sudo gdisk -l "$LV_PATH" | grep '^ *2' | awk '{print $2}')
sudo kpartx -d "$LV_PATH"
sleep 2 # wait for it to finish

# Edit the partition table to expand the disk
sudo sgdisk "$LV_PATH" -d 2
sudo sgdisk "$LV_PATH" -n 2:"$START_SECTOR":0 -c 2:"Linux Filesystem" -t 2:8300
sudo sgdisk "$LV_PATH" -w
# Inform the operating system of this change
sudo partprobe "$LV_PATH"

# Re-create the device-mapper links to the partitions so they can be checked and resized
sudo kpartx -a "$LV_PATH"
# Check the file system
sudo e2fsck -f "$LV_CONTAINER_PARTITION"
# Resize it
sudo resize2fs "$LV_CONTAINER_PARTITION"

# If you want: mount $LV_CONTAINER_PARTITION and edit it before the first boot.
# Be sure to umount it when you are done.

# Done! Remove the partition links
sudo kpartx -d "$LV_PATH"
This took me a few days to figure out, but now the cloned guest has the correct disk size. I can now easily clone the template and customize it for specific server configurations. (I skipped over the steps related to editing the KVM's xml for the new server or using virsh to activate the new cloned image -- because that would be an entire blog post all by itself.)

Copying Files

Okay, assume you have a working KVM server (dom0) and an allocated new server with the right disk size. Now I want to copy the files from the old server to the new server. This is mainly copying /home, /etc/postfix, /etc/nginx, and a few other directories. Copying the contents should be easy, right?

'rsync' would be a great option. However, I'm copying from a production server to the pre-deployment environment. Some of the files that need to be transferred are owned by other users, so the rsync would need to run as root on both the sender and recipient systems. However, my servers do not permit root logins. This means that I can't run a root-level rsync from one server to the other.

'tar' is another great option. In theory, I could ssh into the remote system, tar up the files and transfer them to the new guest server. However, to get the files, tar needs to run as root on the production server. (We permit 'sudo', but not direct root logins.) An ideal solution would look something like this:
ssh prodserver "cd / ; sudo tar -cf - home" | (cd / ; sudo tar -xvf - )

Unfortunately, this approach has a few problems:
  • sudo requires a terminal to get the password. That means using "ssh -t" and not just "ssh".

  • The terminal receives sudo's password prompt as text. That text gets mixed into the stream and fed into the receiving tar command. The decoder says "That's not a tar stream!" and aborts.
I finally worked out a solution using netcat:
On the receiving server:
cd / ; nc -l 12345 | tar -xvf -
This waits for a connection on port 12345 and sends the data to tar to extract. netcat will terminate when the stream ends. (Depending on your netcat variant, the listener syntax is "nc -l 12345" or "nc -l -p 12345".)

To get to the production server:
ssh -o LogLevel=error -t "$OldServer" "cd / ; sudo bash -c 'echo Running as sudo' ; sudo tar -cf - $GetPathsReal | nc newserver 12345"

This is a little more complicated:
  • "-o LogLevel=error" My production ssh server displays a banner upon connection. I need to hide that banner so it doesn't confuse tar.

  • "-t" opens a terminal, so sudo will prompt for a password.

  • "sudo bash -c 'echo Running as sudo'" Get the sudo prompt out of the way. It must be done before the tar command. This way, the next sudo call won't prompt for a password.

  • "sudo tar -cf - $GetPathsReal | nc newserver 12345" This tars up the files that need to be transferred and sends them through a netcat tunnel.
The rsync solution would be simple and elegant. In contrast, using tar and netcat is a really ugly workaround -- but it works. Keep in mind, the netcat tunnel is not encrypted. However, I'm not worried about someone in my internal network sniffing the traffic. If you have that concern, then you need to establish an encrypted tunnel. The catch here is that ssh does not transfer the tar stream -- the tar stream comes over a parallel connection.
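If you do need encryption, one approach that keeps the same tar and netcat plumbing is an ssh reverse port forward, so the stream never crosses the network in the clear. A sketch, using the same placeholders as above (and assuming the production sshd permits remote port forwarding):
# On the new server (two separate shells: one for the listener, one for the tunnel):
cd / ; nc -l 12345 | tar -xvf -
ssh -N -R 12345:localhost:12345 "$OldServer"

# On the production server, send the tar stream into the local end of the tunnel:
sudo tar -cf - $GetPathsReal | nc localhost 12345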

Current Status

These are far from all of the problems that I had to resolve. After nearly 3 months, I finally have the first three VMs from one dom0 migrated to the new OS. Moreover, my solution is 95% scripted. (The remaining 5% is a human entering prompts and validating the updated server.) Assuming no more big problems (HA!), it will probably take me one day per VM and a half-day per server.

The big lessons here?
  • With major OS migrations, expect things to break. Even common tasks or well-known processes are likely to change just enough to cause failures.

  • Automating this process is definitely worth it. By scripting every step, I ensure consistency and the ability to create new services as needed. Moreover, some of the lessons (like battling with IPv6, fighting with file transfers, and working out how to deal with logical volumes) were needed anyway; those took months to figure out and the automated scripts document the process. Now I don't have to work it out each time as a special case.

  • After nearly 30 years, IPv6 is still much less standardized across real-world environments than people assume.

  • KVM, templates, and logical volumes require more knowledge than typical cloud workflows lead you to expect.
This process took far longer than I anticipated, but the scripting investment was worth it. I now have a reproducible, reliable, and mostly automated path for upgrading old systems and creating new ones.

Most of my running services will be moved over and nobody will notice. Downtimes will be measured in minutes. (Who cares if my mail server is offline for an hour? Mail will just queue up and be delivered when it comes back up.) Hintfo and RootAbout will probably be offline for about 5 minutes some evening. The big issue will be FotoForensics and Hacker Factor (this blog). Just the file transfers will probably take over an hour. I'm going to try to do this during the Christmas break (Dec 24-26) -- that's when there is historically very little traffic on these sites. Wish me luck!

Airtight SEAL

11 November 2025 at 20:39
Over the summer and fall, SEAL saw a lot of development. All of the core SEAL requirements are now implemented, and the promised functionality is finally available!



SEAL? What's SEAL?

I've written a lot of blog entries criticizing different aspects of C2PA. In December 2023, I was in a call with representatives from C2PA and CAI (all from Adobe) about the problems I was seeing. That's when their leadership repeatedly asked questions like, "What is the alternative?" and "Do you have a better solution?"

It took me about two weeks to decide on the initial requirements and architect the framework. Then I started writing the specs and building the implementation. The result was initially announced as "VIDA: Verifiable Identity using Distributed Authentication". But due to a naming conflict with a similar project, we renamed it to "SEAL: Secure Evidence Attribution Label".

C2PA tries to do a lot of things, but ends up doing none of it really well. In contrast, SEAL focuses on just one facet, and it does it incredibly well.

Think of SEAL like a digital notary. It verifies that a file hasn't changed since it was signed, and that the signer is who they say they are. Here's what that means in practice:
  • Authentication: You know who signed it. The signer can be found by name or using an anonymized identifier. In either case, the signature is tied to a domain name. Just as your email address is a unique name at a domain, the SEAL signer is unique to a domain.

  • No impersonations: Nobody else can sign as you. You can only sign as yourself. Of course, there are a few caveats here. For example, if someone compromises your computer and steals your signing key, then they are you. (SEAL includes revocation options, so this potential impact can be readily mitigated.) And nothing stops a visually similar name (e.g., "Neal" vs "Nea1" -- spelled with the number "1"), but "similar" is not the same.

  • Tamper proof: After signing, any change to the file or signature will invalidate the signature. (This is a significantly stronger claim than C2PA's weaker "tamper evident" assertion, which doesn't detect all forms of tampering.)

  • No central authority: Everything about SEAL is distributed. You authenticate your signature, and it's easy for a validator to find you.

  • Privacy: Because SEAL's authentication information is stored in DNS, the signer doesn't know who is trying to validate any signature. DNS uses a store-and-forward request approach with caching. Even if I had the ability to watch my own authoritative DNS server, I wouldn't know who requested the authentication, why they were contacting my server (DNS is used for more things than validation), or how many files they were validating. (This is different from C2PA's X.509 and OCSP system, where the certificate owner definitely knows your IP address and when you tried to authenticate the certificate.)

  • Free: Having a domain name is part of doing business on the internet. With SEAL, there is no added cost beyond having a domain name. Moreover, if you don't have a domain name, then you can use a third-party signing service. I currently provide signmydata.com as a free third-party signer. However, anyone can create their own third-party signer. (This is different from C2PA, where acquiring an X.509 signing certificate can cost hundreds of dollars per year.)

One Label, Every Format

SEAL is based on a proven and battle-tested concept: DKIM. Virtually every email sent today uses DKIM to ensure that the subject, date, sender, recipients, and contents are not altered after pressing "send". (The only emails I see without DKIM are from spammers, and spam filters rapidly reject emails without DKIM.)

Since DKIM is good enough for protecting email, why not extend it to any file format? Today, SEAL supports:
  • Images: JPEG, PNG, WebP, HEIC, AVIF, GIF, TIFF, SVG, DICOM (for medical imaging files), and even portable pixel maps like PPM, PGM, and PNM.

  • Audio: AAC, AVIF, M4A, MKA, MP3, MPEG, and WAV. (Other than 'raw', this covers practically every audio format you will encounter.)

  • Videos: MP4, 3GP, AVI, AVIF, HEIF, HEVC, DIVX, MKV, MOV (Quicktime), MPEG, and WebM. (Again, this covers almost every video format you will encounter.)

  • Documents: PDF, XML, HTML, plain text, OpenDocument (docx, odt, pptx, etc.), and epub.

  • Package Formats: Java Archive (JAR), Android Application Package (APK), iOS Application Archive (IPA), Mozilla Extension (XPI), Zip, Zip64, and others.

  • Metadata Formats: EXIF, XMP, RIFF, ISO-BMFF, and Matroska.
If you're keeping count, then this is way more formats than what C2PA supports. Moreover, it includes some formats that the C2PA and CAI developers have said that they will not support.

What's New?

The newest SEAL release brings major functional improvements. These updates expand how SEAL can sign, reference, and verify media, making it more flexible for real-world workflows. The big changes to SEAL? Sidecars, Zip support, source referencing, and inline public keys.

New: Sidecars!

Typically, the SEAL signature is embedded into the file that is being signed. However, sometimes you cannot (or must not) alter the file. A sidecar stores the signature into a separate file. For verifying the media, you need to have the read-only file that is being checked and the sidecar file.

When are sidecars useful?
  • Read-only media: Whether it's a CD-ROM, DVD, or a write-blocker, sometimes the media cannot be altered. A sidecar can be used to sign the read-only media by storing the signature in a separate file.

  • Unsupported formats: SEAL supports a huge number of file formats, but we don't support everything. You can always use a sidecar to sign a file, even if it's an otherwise unsupported file format. (To the SEAL sidecar, what you are signing is just "data".)

  • Legal evidence: Legal evidence is often tracked with a cryptographic checksum, like SHA256, SHA1, or MD5. (Yes, legal often uses MD5. Ugh. Then again, they still think FAX is a secure transmission method.) If you change the file, then the checksum should fail to match. (I say "should" because of MD5. Without MD5, it becomes "will fail to match".) If it fails to match, then you have a broken chain of custody. A sidecar permits signing evidence without altering the digital media.

New: Zip!

The most recent addition to SEAL is support for Zip and Zip64. This makes SEAL compatible with the myriad of zip-based file types without introducing weird side effects. (OpenDocuments and all of the package formats are really just zip files containing a bunch of internal files.)

Deciding where to add the signature to Zip was the hardest part. I checked with the developers at libzip for the best options. Here are the choices we had and why we went with the approach we use:
  • Option 1: Sidecar. Include a "seal.sig" file (like a sidecar) in the zip archive.
    • Pro: Easy to implement.
    • Con: Users will see an unexpected "seal.sig" file when they open the archive.
    Since we don't want to surprise anyone with an unexpected file, we ruled out this option.

  • Option 2: Archive comment. Stuff the SEAL record in the zip archive's comment field.
    • Pro: Easy to implement.
    • Meh: Limited to 65K. (Unlikely to be a problem.)
    • Con: Repurposes the comment for something other than a comment.
    • Con: Someone using zipinfo or other tools to read the comment will see the SEAL record as a random text string.
    (Although there are more 'cons', none are really that bad.)

  • Option 3: Per-file attribute. Zip permits per-file extra attributes. We can stuff the SEAL in any of these and have it cover the entire archive.
    • Pro: Easy to implement.
    • Con: Repurposes the per-file attribute to span the entire archive. This conflicts with the basic concept of Zip, where each file is stored independently.

  • Option 4: Custom tag. Zip uses a bunch of 4-byte tags to denote different segments. SEAL could define its own unique 4-byte tag.
    • Pro: Flexible.
    • Con: Non-standard. It won't cause problems, but it also won't be retained.
    If this could be standardized, then this would be an ideal solution.

  • Option 5: Custom encryption field. Have the Zip folks add in a place for storing this. For example, they already have a place for storing X.509 certs, but that is very specific to Zip-based encryption.
    • Pro: Could be used by a wide range of Zip-signing technologies.
    • Con: We don't want to repurpose the specific X.509 area because that could cause compatibility problems.
    • Con: There are some numeric codes where you can store data. However, they are not standardized.
    The folks at libzip discouraged this approach.
After chatting with the libzip developers, we agreed that options 1 and 3 are not great, and options 4 and 5 would take years to become standardized. They recommended Option 2, noting that today, almost nobody uses zip archive comments.

For signing a zip file, we just stuff the text-based SEAL signature in the zip archive's comment field. *grin* The signature signs the zip file and all of its contents.
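You can see (or set) an archive comment with standard tools. This isn't how sealtool is invoked; it's just a reminder that the comment field is an ordinary, inspectable part of the zip format ("archive.zip" is a placeholder):
# Print the archive comment; a SEAL-signed zip would show the SEAL record here.
unzip -z archive.zip
# Replace the archive comment (zip -z reads the new comment from stdin when piped).
printf 'example archive comment\n' | zip archive.zip -z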

The funny thing about zip files is that they can be embedded into other file formats. (For those computer security "capture the flag" contests, the game makers often stuff zip files in JPEG, MP3, and other file formats.) The sealtool decoder scans the file for any embedded zip files and checks them for SEAL signatures.

New: Source Referencing!

This feature was requested by some CDN providers. Here's the problem: most content delivery networks resize, scale, and re-encode media in order to optimize the last-mile delivery. Any of these changes would invalidate the signer's signature.

With SEAL, you can now specify a source URL (src) for the validator to follow. It basically says "I got this content from here." The signer attests to the accuracy of the remote resource. (And they can typically do this by adding less than 200 bytes to the optimized file.)

Along with the source URL, there can also be a cryptographic checksum. This way, if the URL's contents change at a later date (which happens with web content), then you can determine if the URL still contains the source information. In effect, SEAL would tell you "it came from there, but it's not there anymore." This is similar to how bibliography formats, like APA, MLA, or Chicago, require "accessed on" dates for online citations. But SEAL can include a cryptographic checksum that ensures any content at the location matches the cited reference. (As an example, see the Harvard Referencing Guide. Page 42 shows how to cite social media sources, like this blog, when used as a source.)

As an example, your favorite news site may show a picture along with an article. The picture can be SEAL-signed by the news outlet and contain a link to the uncropped, full-size picture -- in case someone wants to fact-check them.

Source referencing provides a very rudimentary type of provenance. It says "The signer attests that this file came from here." It may not be there at a later date, but it was there at one time.
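The validator-side check for that checksum boils down to re-fetching the URL and comparing digests. A rough sketch (the URL and digest are placeholders; the actual field names and hash choices are defined by the SEAL specification):
# Digest recorded in the SEAL record at signing time (placeholder value).
expected="<digest from the SEAL record>"
# Re-fetch the cited source and hash it.
actual=$(curl -sL "https://example.com/original-full-size.jpg" | sha256sum | awk '{print $1}')
if [ "$actual" = "$expected" ]; then
  echo "The cited URL still serves the signed content."
else
  echo "It came from there, but it's not there anymore."
fi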

New: Inline Public Keys!

While Zip impacts the most file formats, inline public keys make the cryptography more flexible and future-proof.

With a typical SEAL signature, the public key is located in DNS. The association with the DNS record authenticates the signer, while the public key validates the cryptography. If the cryptography is invalid, then you cannot authenticate the signer.

With inline public keys, we split the functionality. The public key is stored inside the SEAL signature. This permits validating the cryptography at any time and without network access. You can readily detect post-signing tampering.

To authenticate the signer, we refer to DNS. The DNS record can either store the same public key, or it can store a smaller digest of the public key. If the cryptography is valid and the public key (either the whole key or the digest) exists in the DNS record, then SEAL authenticates the signer.

When should inline public keys be used?
  • Offline validation. Whether you're in airplane mode or sitting in a high security ("air gap") environment, you can still sign and validate media. However, you cannot authenticate the signature until you confirm the public key with the DNS record.

  • Future cryptography. Current cryptographic approaches (e.g., RSA and EC) use public keys that are small enough to fit in a DNS TXT field. However, post-quantum cryptography can have extremely long keys -- too long for DNS. In that case, you can store the public key in the SEAL field and the shorter public key digest in the DNS record.

  • Archivists. Let's face it, companies come and go and domain names may expire or change owners. Data that is verifiable today may not be verifiable if the DNS changes hands. With inline public keys, you can always validate the cryptography, even when the DNS has changed and you can no longer authenticate the signer. For archiving, you can combine the archive with a sidecar that uses an inline public key. This way, you can say that this web archive file (WARC) was accurate at the time it was created, even if the source is no longer online.
Basically, inline public keys introduce a flexibility that the original SEAL solution was lacking.

Next Up

All of these new additions are fully backwards-compatible with the initial SEAL release. Things that were signed last year can still be validated with this newer code.

While the command-line signer and validator are complete, SEAL still needs more usability -- like an easy-access web front-end. Not for signing, but for validating. A place where you can load a web page and select the file for validating -- entirely in your web browser and without uploading content to a server. For example, another SEAL developer had created a proof-of-concept SEAL validator using TypeScript/JavaScript. I think the next step is to put more effort in this direction.

I'm also going to start incorporating SEAL into FotoForensics. Right now, every analysis image from FotoForensics is tagged with a source media reference. I think it would be great to replace that with a SEAL signature that includes a source reference. Over the years, I've seen a few people present fake FotoForensics analysis images as part of disinformation campaigns. (It's bound to become a bigger problem in the future.) Using SEAL will make that practice detectable.

While I started this effort, SEAL has definitely been a group project. I especially want to thank Shawn, The Boss, Bill not Bob, Bob not Bill, Dave (master of the dead piano), Dodo, BeamMeUp8, bgon, the folks at PASAWG for their initial feedback, and everyone else who has provided assistance, reviews, and criticisms. It's one thing to have a system that claims to provide authentication, provenance, and tamper detection, but it's another to have one that actually works -- reliably, transparently, at scale, and for free.

Tricks, Treats, and Terabits

30 October 2025 at 14:24
Scary stories come in all forms. For system administrators, late night outages and network attacks can cause nightmares.

It's been a year since my last big distributed denial-of-service (DDoS) attack. I had been holding off on blogging about this for a few reasons. First, I wanted to make sure it was over. Second, I didn't want to tip off the attackers about what I learned about them. (I had been quietly telling other people about the attack details. It's even helped some other victims of the same attack.) And finally, I wanted to see if they would come back and trigger any of my traps. (Nope!)

The First Wave

On Wednesday, 23-Oct-2024 at 3pm local time (21:00 GMT), my servers came under a massive distributed denial of service attack.

My servers are physically located in a machine room, about 20 feet away from my office. When my servers come under any kind of load, their fans rev up. Even though they are a room away, I can hear the fans pick up. (It sounds like a jet engine.) When the attack started, I heard two servers ramp up.

My first thought was that one of my customers was probably analyzing videos. That always causes a higher load, but it usually lasts a minute. When the sound continued, I checked the servers themselves. None of the virtual machines had any high-load processes running. In fact, the loads were all hovering around 0.1 (virtually no use). It took me a few moments to find the cause: my server was rejecting a huge number of packets. It was definitely a DDoS attack.

I don't know the exact volume of the attack. My servers were logging a sustained 300Mbps and over 150,000 packets per second. (The logging and packet rejections were enough to cause the fans to ramp up.) However, I'm sure the volume was more than that -- because the upstream router was failing. I later learned that it was even larger: the upstream router to the upstream router was failing. The 300Mbps was just the fraction that was getting through to me. The attacker wasn't just knocking my service offline; they took down a good portion of Northern Colorado. (I grabbed some sample packet captures for later analysis.)

I first confirmed that my servers had not been compromised. (Whew!) Then I called my ISP. They had already noticed since the attack was taking down a few hundred businesses that used the same router.

My ISP did the right thing: they issued a black-hole for my impacted IP address. This dropped the traffic long before it reached my server or even the impacted routers.

The First Mitigation

Since the attackers were only going after one IP address, my ISP and I thought I could get away with changing my DNS and moving my impacted services to a different address. On that single IP address, I had a handful of services.
  • I first moved FotoForensics. No problem. Usually DNS records are cached for a few hours. Long before any of this happened, I had configured my DNS to only cache for 5 minutes (the minimum time). Five minutes after changing my DNS record, the service came back up and users were able to access it.

  • I then moved some of my minor services. Again, after 5 minutes, they were back up and running.

  • I could have moved all of my services at once, but I wanted to know which one was being attacked. The last service I moved was this blog. After 5 minutes, the DDoS returned, hitting the new address.
This told me a couple of things:
  1. The attack started at precisely 3:00pm and it lasted exactly 12 hours. This appeared to be a scheduled attack.

  2. They were explicitly targeting my blog and Hacker Factor web service. (Why? What did I do this time? Or maybe, what did I write recently?)

  3. They were repeatedly checking DNS to see if I moved. They knew this was a logical step and they were watching for it. That's a level of sophistication that your typical script kiddie doesn't think about. Moreover, it appeared to be an automated check. (Automated? I might be able to use that for a counter attack.)
Looking over the network logs, I saw the packets that were doing the attack:
  • It was a flood over UDP. With UDP, you can just shoot out packets (including with fake sender IP addresses) and overwhelm the recipient. This attack alternated between targeting port 123/udp (network time protocol) and 699/udp (an unknown port). Neither of these services existed on my server. It wasn't about taking down my servers; it was about taking down the routers that lead to my servers.

  • Every UDP packet has a time-to-live (TTL) value that gets decremented with each router hop. The TTL values from the attack packets didn't match the sender's address in the UDP packets. That tells me that the sender IP addresses were forged. I run a bunch of honeypots that benchmark attacks year round. The packet TTLs and timings were consistent with traffic coming from Europe and Asia. I then tracked the attack to AS4134 (China).

  • They were only attacking over IPv4, not IPv6. That's typical for most bulletproof hosting providers. (These are the types of companies with high bandwidth and no concerns about their customers causing massive network attacks.)

  • When the network address was blocked (black hole), the DDoS stopped shortly afterwards. When my DNS changed, the attack restarted. This tells me that they were monitoring my address in order to see when it went down.

  • After I changed IP address, I noticed something. Buried in the logs was a single IP address at a university (not in China). It was continually polling to see if my server was up. Blocking that one IP address caused the DDoS against the new IP address to turn off. The attackers appeared to be using this as a way to decide when to disable the attack. (Finding this needle in the 150,000 packets-per-second haystack was the hard part.)
All of this tells me the how, but not the who or why.
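For what it's worth, spotting the TTL mismatch doesn't require anything exotic. A capture sketch (the interface and packet count are arbitrary):
# -v prints each packet's IP TTL. Compare the observed TTL against the hop count
# you would expect for the claimed source address; forged senders with the wrong
# TTL stand out quickly.
sudo tcpdump -nn -v -c 200 -i any 'udp and (dst port 123 or dst port 699)'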

Who and Why?

I turned the IP addresses, packet captures, and logs over to some of my, uh, friends. I do not know the details of their methods, but they are very effective.
  • They tracked the bulk of the DDoS attack to servers often associated with attacks from North Korea.

  • They found that the university system was in a specific university lab. The lab members mostly had Korean names. We suspect that either (A) at least one of the students was North Korean posing as a South Korean, or (B) one of the students had downloaded or clicked something that allowed North Korea to compromise the system.
Then I looked back at my blog. Eight days before the attack, I had blogged about C2PA and used AI-generated pictures of North Korea's leader, Kim Jong Un, as my example. Here's the opening of the blog:
There's nothing worse than a depressed, drunk man who has his finger on the nuclear button.

It appears that this was enough to upset the North Korean government and make me a target for a massive network attack.

Hiding For Safety

Since I'm not taking down my blog, I decided to take additional steps in case the DDoS started up again.

There are some online services that provide DDoS protection. I looked into them and decided to switch to CloudFlare. What they provide:
  • Domain fronting. When you connect to hackerfactor.com or fotoforensics.com, you actually connect to one of CloudFlare's servers. They forward the request back to my services. If there is a network attack, then it will hit CloudFlare and not me.

  • DDoS protection. I kind of felt bad for setting up CloudFlare for an attack. However, this is one of the things they explicitly offer: DDoS protection, even at the free account level.

  • Content caching. By default, they will cache web content. This way, if a hundred people all ask for my blog, I only have to provide it once to CloudFlare. This cuts down on the network volume.

  • Filtering rules. Even at the free tier, you can create filtering rules to stop bots, AI-scrapers, block bullet-proof hosting providers, etc. (I'm using their paid tier for some of my domains because I wanted more filter options.)
Setting up an account and moving my main domains took hours -- not days or months.

The downside of using CloudFlare is that I like to monitor my network attacks. Since CloudFlare gets these attacks instead of me, I don't have that insight. However, I still run some honeypots outside of CloudFlare so I still have baseline attack metrics.

The Second Wave

Even though my servers had been hit by a massive attack, I decided to slowly move them to the new service. (I'd rather be slow and cautious and get everything right, than to rush it and make a different problem.)

On 28-Oct-2024 (five days after the first attack) at almost exactly 1:00 AM, the attack started again. Although I had moved my servers behind CloudFlare, they appeared to be directly attacking my previously-known location.

Unfortunately, they guessed correctly. Even though CloudFlare was protecting me from incoming attacks, CloudFlare was forwarding valid requests back to my servers. And my servers were still at the old IP addresses. By attacking the old addresses, the DDoS managed to take down my service again.

I called my ISP's emergency 24/7 support number to report the problem, but nobody answered so I left a message. I repeatedly called back every 30-60 minutes until I was able to reach a person -- at 7:20am. (I spoke to the head of my ISP's IT department. They will make sure the 24/7 support will actually be manned next time.) They issued another IP address black hole to stop the attack, and it stopped 20 minutes later.

At this point, I decided to switch around network addresses and bridge in a second ISP. If one ISP goes down, the other one should kick in.

The Third Wave

On 30-Oct-2024, the third wave happened. This one was kind of funny. While my servers were dual homed and on different IP addresses, I still had some equipment using the old addresses. I was working late at night and heard the server fans start up again...

It took me a moment to check all of my diagnostics and determine that, yes, it was the DDoS again. It only took a minute for me to look up the ISP's 24/7 support number. However, as I picked up the phone, I heard the fans rev down. (Odd.) A few seconds later, a different server began revving up. After a minute, it spun down and a third server revved up.

That's when I realized what the attacker was doing. I had a sequential block of IP addresses. They were DDoS'ing one address and checking if my server went offline. After a minute, they moved the DDoS to the next IP address, then the next one. Here's the problems they were facing:
  • I had moved my main services to different addresses. This meant that the attacker couldn't find me.

  • My services were behind CloudFlare and they cache content. Even if the attacker did find me, their polling to see if I was down would see cached content and think I was still up.
Later that day, CloudFlare posted about a massive DDoS that they had prevented.
Cloudflare
@cloudflare@noc.social

We recently thwarted a massive UDP Flood attack from 8-9K IPs targeting ~50 IP addresses of a Magic Transit customer. This was part of a larger campaign we covered in our Q3 2024 report. Check out the full details here: https://blog.cloudflare.com/ddos-threa...
5.6 terabits per second. Wow. When I wrote to CloudFlare asking if this was related to me, I received no reply. I'm certainly not saying that "this was due to me", but I kind of suspect that this might have been due to me. (Huge thanks to CloudFlare for offering free DDoS protection!)

Keep in mind, CloudFlare says that they can handle 296 terabits per second, so 5.6Tbps isn't going to negatively impact them. But I can totally understand why my (now former) ISP couldn't handle the volume.

Tricks, Treats, and Terabits

I did lay out a couple of detectors and devised a few ways to automatically redirect this attack toward other targets. However, it hasn't resurfaced in a year. (I really wanted to redirect North Korea's high-volume DDoS attack against Russian targets. Now that I've had time to prepare a proper response, I'm sure I can do the redirection with no impact to my local network. I mean, they watch my DNS, so I'd just need to change my DNS to point to Russia. I wonder if this redirected attack would cause an international incident?)

Halloween stories usually end when the monster is vanquished. The lights come back on, the hero breathes a sigh of relief. But for system administrators, the monsters don't die; they adapt. They change IPs, morph signatures, and wait for a moment of weakness.

Some people fear ghosts or ghouls. I fear the faint whine of server fans spinning up in the middle of the night. A sound that means something, somewhere, has found me again. The next time the server fans ramp up, it might not be an innocent workload. It might be the North Korea bot army.

C2PA in a Court of Law

20 October 2025 at 09:16
Everyone with a new project or new technology wants rapid adoption. The bigger the customer base, the more successful the project can hope to be. However, in the race to be the first one out the door, these developers often overlook the customers' needs. (Just because you can do it, does that mean the customer wants it? And does it solve a problem that the customer has?)

Many of my friends, colleagues, and clients work in security-related industries. That explicitly means that they are risk-averse and very slow to adopt new technologies. Even for existing technologies, some of my clients have multi-month release cycles. Between when I give them a code drop and when they deploy it for their own use, a few months might pass. During that time, the code is scanned for malware, tested in an isolated testing environment (regression testing), then tested in a second testing environment (more aggressive testing), maybe a third testing environment (real-world simulation), and then finally deployed to production. It's rare for large companies to just download the code drop and start using it in production.

There's a big reason for being risk-averse. If you work in the financial, medical, or insurance fields, then you have legal liability. You're not going to adopt something that puts you or your company at risk. If you can't trust that a tool's results will be consistent or trustworthy, then you're not going to use it.

In this blog, I explore whether C2PA-signed media, such as photos from the Google Pixel 10, should be accepted as reliable evidence in a court of law. My findings show serious inconsistencies in timestamps, metadata protection, and AI processing. These issues call its forensic reliability into question.

Legally Acceptable

Forensics effectively means "for use in a court of law". For any type of analysis tool, the biggest concern is whether the tool or the results are admissible in court. This means that it needs to comply with the Daubert or Frye standards and the Federal Rules of Evidence. These are the primary requirements needed to ensure that tools and their results can be admissible in a US courtroom.

Daubert and Frye refer to two different guidelines for accepting any tools in a court of law. As a non-attorney, my non-legal understanding is that they are mostly the same thing. Both require the tools, techniques, and methods to be relevant to the case, scientifically sound, and provide reliable interpretations. The main differences:
  • Daubert is used in federal and some state courts. Frye is used in the states that don't rely on Daubert. However, you might hear arguments related to both the Daubert and Frye interpretations in a single courtroom.

  • Frye is based on a "general acceptance" criterion: Is the underlying scientific principle or method "generally accepted" as reliable within the relevant scientific community?

  • Daubert requires the judge to act as the final "gatekeeper". The judge uses specific criteria to evaluate the principles and methodology (not the conclusion generated) before determining if the approach is acceptable. Because Daubert considers multiple factors (such as the error rate and methods used), it is often considered to be a stricter standard.

  • In both cases, judges often rely on precedent. If your tool is accepted in court one time, then it's more likely to be accepted the next time.
Along with Daubert and Frye, the Federal Rules of Evidence (FRE) define criteria for acceptability of both the evidence and anyone testifying about the evidence. These include guidance about relevancy (FRE Rules 401 and 403), expert testimony (FRE Rule 702), and scientific acceptability (FRE Rules 901 and 902). I'm not an attorney, and I'm sure that legal experts can identify additional requirements.

For my FotoForensics service:
  • All analyzers are based on solid (logical) theory. The outputs are deterministic and repeatable. Moreover, other people have been able to implement variations of my analyzers and can generate similar results. (This goes toward reproducibility.)

  • I explicitly avoid using any kind of deep learning AI. (I don't use AI to detect AI, which is the current fad.) This goes toward provability. In fact, every outcome on the commercial FotoForensics service is easily explainable -- it's a white box system, not a black box. When the expert gets up on the stand, they can explain how the software reached any results.

  • FotoForensics has been peer reviewed and referenced in academic publications. The back-end analysis tools also passed a functional review by the Department of Defense Cyber Crime Center (DC3). (The DC3 only gives a pass/fail rating. It passed, with the note "steep learning curve". FotoForensics was created as a front-end to simplify the learning curve.)

  • FotoForensics has been used in a court of law and deemed acceptable. (Technically, the tool itself was never questioned. The experts using the tool were found to be acceptable under FRE Rule 702.)
In contrast to FotoForensics, we have cameras adopting AI content creation and integrating with C2PA. With the Google Pixel 10, we have them combined. Each poses a problem and when combined, they make the problems worse.

Vetting the Pixel 10

Let's say you have a picture from Google's new Pixel 10 smartphone and you want to use it as evidence in a court of law. Is the picture reliable? Can the C2PA signature be used to authenticate the content?

Keep in mind, if you are law enforcement then you are a trained observer and sworn to uphold the law. In that case, saying "Yes, that picture represents the situation that I observed" is good enough. But if you're anyone else, then we need to do a deep analysis of the image -- in case you are trying to submit false media as evidence. This is where we run into problems.

To illustrate these issues, I'm going to use three pictures that "named router" (that's what he calls himself) captured. He literally walked into a store, asked to test a Pixel 10 camera, and took three photos in succession. Click click click. (He gave me permission to use them in this blog.) Here are the pictures with links to the analyzers at FotoForensics, Hintfo, and C2PA's official Content Credentials website, as well as the EXIF timestamp and the trusted timestamp (TsT) from the notary that are found in the picture's metadata:

Image 1 (Analyzers: FotoForensics, Hintfo, Content Credentials)
  • EXIF: 2025-10-14 17:19:25 +02:00
  • Notarized: 2025-10-14 15:19:26 GMT
  • Difference: +1 second
  • Metadata consistency: Consistent metadata and expected behavior.

Image 2 (Analyzers: FotoForensics, Hintfo, Content Credentials)
  • EXIF: 2025-10-14 17:19:32 +02:00
  • Notarized: 2025-10-14 15:19:32 GMT
  • Difference: 0 seconds
  • Metadata consistency: Includes a video; an unexplained variation.

Image 3 (Analyzers: FotoForensics, Hintfo, Content Credentials)
  • EXIF: 2025-10-14 17:19:36 +02:00
  • Notarized: 2025-10-14 15:19:35 GMT
  • Difference: -1 second
  • Metadata consistency: Metadata is consistent with the first image. However, the timestamps are inconsistent: the notary predates the EXIF.

If we accept the EXIF and XMP metadata at face value, then:
  • The pictures were taken seconds apart. (Three pictures in 11 seconds.)

  • The EXIF metadata identifies each picture as coming from a Google Pixel 10 camera. Each includes lens settings and device information that seem rational. The EXIF data matches the expectations.

  • There is one problem in the EXIF metadata: Every picture created by the Pixel 10 has EXIF metadata saying that it is a composite image: Composite Image Captured While Shooting. (Technically, this is EXIF tag 0xa460 with value 3. "3" means it is a composite image that was captured while shooting.) This tag means that every picture was adjusted by the camera. Moreover, it was more than the usual auto-exposure and auto-focus that other cameras provide. We do not know if the alterations significantly altered the meaning of the picture. Given that every picture from a Pixel 10 includes this label, it's expected but problematic.

  • Comparing the files, there is a bigger problem. These files were taken in rapid succession. That means the user didn't have time to intentionally change any camera settings. (I confirmed this with the photographer.) The first and third pictures have the same metadata fields in the same order, and both include a Gain Map. (Consistency is good.) However, the second picture includes an additional video attachment. Inconsistency is bad because it can appear to be tampering. (Talking with the photographer, there was nothing special done. We have no idea why the camera decided to randomly attach a video to the JPEG image.)
Many cameras change metadata based on orientation, zoom, or other settings. However, I don't know of any other cameras that arbitrarily change metadata for no reason.

Metadata, by itself, can be maliciously altered, but that usually isn't the default assumption. Instead, the typical heuristic is: trust the metadata until you see an indication of alteration. At the first sign of any inconsistency, stop trusting the metadata. The same goes for the content: trust the content until you notice one inconsistency. The instant you notice something wrong, you need to call everything into question.
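Checking the fields discussed above is straightforward with exiftool (the file name is a placeholder):
# Pull the capture time, time zone offset, sub-second precision, and the
# composite-image flag (EXIF tag 0xa460) mentioned earlier.
exiftool -a -G1 -s -EXIF:DateTimeOriginal -EXIF:OffsetTimeOriginal \
         -EXIF:SubSecTimeOriginal -EXIF:CompositeImage pixel10-photo.jpg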

Add in C2PA

Given that every Pixel 10 picture natively includes AI and can randomly change the metadata, this can be very problematic for an examiner. However, it gets worse, because the Pixel 10 includes C2PA metadata.

As mentioned in my earlier blog entry, the Pixel 10's C2PA metadata (manifest) explicitly excludes the EXIF and XMP metadata from the cryptographic signature. Those values can be easily changed without invalidating the cryptographic signature. Again, this doesn't mean that the metadata was altered, but it does mean that the metadata could be altered without detection.

The C2PA metadata includes a timestamp from a trusted source. This is not when the picture was generated. This is when the picture was notarized. It says that the trusted source saw that the data existed at the time it was signed.

What I didn't know at the time I wrote my previous blog entry was that the Pixel 10's Trusted Timestamp Authority (TSA) is built into the camera! They appear to maintain two separate clocks in the Pixel 10. One clock is for the untrusted EXIF timestamps (when the file was created), while the other is for the trusted timestamp (when the notary signed the file). This is where the problem comes in: if both clocks exist on the same device, then I'd expect drift between the clocks to be negligible and times to be consistent.
  • In the first sample picture, the notary's timestamp is one second after the EXIF timestamp. This doesn't bother me since the actual difference could be a fraction of a second. For example, the EXIF data says the picture was captured at 17:19:25.659 seconds (in GMT+02:00). If the signing took 0.35 seconds, then the trusted timestamp would be at 26.xxx seconds -- with integer rounding, this would be 15:19:26 GMT, which matches the timestamp from the trusted notary.

  • The second picture was captured at 17:19:32 +02:00 and notarized within the same second: 15:19:32 GMT. This is consistent.

  • The third picture was captured at 17:19:36 +02:00 and notarized one second earlier! At 15:19:35 GMT. This is a significant inconsistency. Since it's all on the same device, the notary should be at or after the EXIF is generated; never before. Moreover, since the trusted timestamp shows that the file was notarized before the EXIF data was generated, we don't know what information was actually notarized.
Without C2PA, I would have had no reason to distrust the EXIF timestamps. But with C2PA, I now have a strong reason to suspect that a timestamp was altered. Moreover, we have two inconsistencies among these three pictures: one with different metadata and one with the notary generating a signed timestamp before the data was created.

As an analyst with no information about the ground truth, it's not just that I wouldn't trust the third picture; I wouldn't trust the entire set! It looks no different than if someone modified the EXIF timestamps on the third picture, and likely altered the first two pictures since the alteration would be undetectable. Moreover, the metadata difference in the second picture could be from someone inserting a different image into the sequence. (That's common with forgeries.)

Unfortunately, knowing the ground truth doesn't make this any better.

With other Google Pixel 10 images, I've seen the unaltered EXIF times and trusted timestamp differ by almost 10 seconds. For example, this haystack image came from DP Review:



The metadata says that the photo was captured on 2025-08-27 01:02:43 GMT and notarized a few seconds later, at 2025-08-27 01:02:46 GMT. The problem is, there's also a depth map attached to the image, and it also contains EXIF metadata. The main image's EXIF data is part of the C2PA exclusion list, but the depth map (and its EXIF data) is protected by a C2PA's cryptographic signature. The depth map's EXIF says it was created at "2025-08-27 01:02:47 GMT", which is one second after the picture was notarized.
  • The entire picture took four seconds to be created.
  • The notary signed the trusted timestamp before the camera finished creating the file.
This doesn't just call into question the trusted timestamp in the initial three pictures or the haystack photo. This calls into question every trusted timestamp from every C2PA-enabled application and device.

Time's Up

Back in 2024, I gave a presentation for the IPTC organization titled "C2PA from the Attacker's Perspective". During the talk, I demonstrated some of the problems that I had previously (privately) reported to the C2PA. Specifically, I showed how anyone could alter the trusted timestamp without detection. (Some of C2PA's steering committee members had previously heard me say it was possible, but they had not seen a working demonstration of the exploit.) There were two core problems that made this alteration possible:
  1. The C2PA implementation by Adobe/CAI was never checking if the trusted timestamp was altered. This meant that I could alter it without detection. (It was pretty entertaining when I demonstrated this live, and people were commenting in the real-time chat with remarks like "I just confirmed it.")

  2. As far as I could tell, the trusted timestamp was not protecting anything else. The trusted timestamp is supposed to cryptographically encompass some other data:

    1. You generate a hash of the data you want to sign.
    2. You send the hash to the Trusted Timestamp Authority (TSA).
    3. The TSA returns a certificate with a signed timestamp.

    Although my IPTC presentation criticized an earlier version of the C2PA specification, the most recent version (C2PA v2.2) retains the same problem. Unfortunately, C2PA's implementation doesn't appear to sign anything of importance. This appears to be a problem that stems from the C2PA specification itself. Specifically, section 10.3.2.5 mentions signing only a minor portion of the manifest: (my bold for emphasis)
    10.3.2.5.2. Choosing the Payload

    A previous version of this specification used the same value for the payload field in the time-stamp as was used in the Sig_signature as described in Section 10.3.2.4, β€œSigning a Claim”. This payload is henceforth referred to as a "v1 payload" in a "v1 time-stamp" and is considered deprecated. A claim generator shall not create one, but a validator shall process one if present.

    The "v2 payload", of the "v2 time-stamp", is the value of the signature field of the COSE_Sign1_Tagged structure created as part of Section 10.3.2.4, β€œSigning a Claim”. A "v2 payload" shall be used by claim generators performing a time-stamping operation.


    10.3.2.4. Signing a Claim

    Producing the signature is specified in Section 13.2, β€œDigital Signatures”.

    For both types of manifests, standard and update, the payload field of Sig_structure shall be the serialized CBOR of the claim document, and shall use detached content mode.

    The serialized COSE_Sign1_Tagged structure resulting from the digital signature procedure is written into the C2PA Claim Signature box.

    ...
    14.2.2. Use of COSE

    Payloads can either be present inside a COSE signature, or transported separately ("detached content" as described in RFC 8152 section 4.1). In "detached content" mode, the signed data is stored externally to the COSE_Sign1_Tagged structure, and the payload field of the COSE_Sign1_Tagged structure is always nil.
    It appears that other parts of the manifest (anything outside of the "v2 payload") can be altered after the trusted timestamp's signature is generated. This gives the appearance of claiming that the timestamp authenticates the content or manifest, even though the manifest can be changed after the trusted timestamp is created. (A minimal sketch of what the TSA actually signs follows this list.)
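To make the scope of the "v2 payload" concrete, here is a minimal sketch of what a trusted timestamp request actually covers. The file name is a placeholder (it assumes the COSE signature bytes have already been extracted from the manifest), and the RFC 3161 request/response plumbing is omitted:

import hashlib

# The "v2 payload" is the raw signature field of the COSE_Sign1_Tagged structure.
cose_signature = open("claim_signature.bin", "rb").read()  # placeholder: pre-extracted bytes

digest = hashlib.sha256(cose_signature).digest()
print("Digest sent to the TSA:", digest.hex())

# Per RFC 3161, the TSA signs a token built around this digest and nothing else.
# Assertions, metadata, or hash exclusions added to the manifest afterwards leave this
# digest (and therefore the returned trusted timestamp) completely unchanged.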
After my live demo, the Adobe/CAI developers fixed the first bug. You can no longer manually alter the trusted timestamp without it being flagged. If you use the command-line c2patool, any attempt to alter the trusted timestamp returns a "timeStamp.mismatch", like:
"informational": [
{
"code": "timeStamp.mismatch",
"url": "self#jumbf=/c2pa/urn:c2pa:2c33ba25-6343-d662-8a8c-4c638f4a3e68",
"explanation": "timestamp did not match signed data"
}
However, as far as I can tell, they never fixed the second problem: the trusted timestamp isn't protecting anything important. (Well, other than itself.) The manifest can be changed after the trusted timestamp is generated, as long as the change doesn't touch the few bytes that the C2PA specification says to use with the trusted timestamp.

With C2PA, the trusted timestamp is supposed to work as a notary. As implemented, it's effectively a notary signing a blank piece of paper and letting someone fill in everything around the signature. The trusted timestamp appears to be generated before the file is fully created, and the exclusion list permits the EXIF and XMP metadata to be altered without detection. In any legal context, these fatal flaws would immediately void the notary's certification.

This isn't just a problem with every Google Pixel photo. This seems to be a problem with every signed C2PA image that includes a TSA's trusted timestamp. Since the Google Pixel 10 is on C2PA's conforming product list, this is a problem with the C2PA specification and not the individual implementations.

Core Arguments

At this point, we have a photo from a Pixel 10 with a C2PA signature. What can we trust from it?
  • The Pixel 10 uses AI to alter the image prior to the initial saving. As demonstrated in the previous blog entry, the alterations are extreme enough to be confused with AI-generated images. There appears to be no way to disable these alterations. As a result of the AI manipulation, it becomes questionable whether the alterations change the media's meaning and intent.

  • The Pixel 10 does not protect the EXIF or XMP metadata from tampering. (Those metadata blocks are in the C2PA manifest's exclusion list.) An analyst should completely ignore the C2PA metadata and evaluate the EXIF and XMP independently.

  • The trusted timestamp appears to be mostly independent of the picture's creation time. (I say "mostly" because the camera app appears to call the TSA "sometime" around when the picture's C2PA manifest was being created.)
If the C2PA cryptographic signatures are valid, it only means that the parts of the file that were not excluded from the signature have not been altered since the camera signed them; it says nothing about the validity of the content or the accuracy of the metadata. It does not mean that the visual content is a truthful representation of reality (due to AI) or that the metadata is pristine (due to the exclusions).

But what if the C2PA signatures are invalid, or say that the media was altered? Can you trust that conclusion? Unfortunately, the answer is still "no". Just consider the case of the Holocaust deniers. (I blogged about them back in 2016.) Some Holocaust deniers were manufacturing fake Holocaust pictures and getting them inserted into otherwise-legitimate photo collections. Later, they would point out their forgeries (without admitting that they created them) in order to call the entire collection into question.

By the same means, if someone attaches an invalid C2PA manifest to an otherwise-legitimate picture, it will appear to be tampered with. If people believe that C2PA works, then they will assume that the visual content was altered, even though only the manifest was touched. (This is the "Liar's Dividend" effect.)

Regardless of whether the C2PA's cryptographic signatures are valid or invalid, the results from the C2PA manifest cannot be trusted.

Summary Judgment

All of this takes us back to whether pictures from a Google Pixel 10, and C2PA in general, should be accepted for use in a court of law. (To reiterate, I'm not a legal scholar and this reflects my non-attorney understanding of how evidence and forensic tools are supposed to be presented in court.)
  • FRE Rule 901(b)(9) covers "Evidence About a Process or System". The proponent must show the "process or system produces an accurate result." In this blog entry, I have shown that the "Pixel 10/C2PA system" fails this on several counts:

    • Lack of "Accurate Result": The AI/Computational Photography from the Pixel 10 means the image is not a simple, accurate record of light, but an interpretation.

    • System Inconsistency: The random inclusion of a video attachment and the random, unexplained time shifts demonstrate that the "system" (the camera software) is not operating consistently or predictably, thus failing to show that it "produces an accurate result" in a reliably repeatable manner.

    • Lack of Relevancy: The trusted timestamp does not appear to protect anything of importance. The manifest and contents can be written after the Trusted Timestamp Authority generates a signed timestamp.

  • FRE Rule 902 goes toward self-authentication. It is my understanding that this rule exists to reduce the burden of authentication for certain reliable records (like certified business records or electronic process results). By failing to cryptographically protect the foundational metadata (EXIF/XMP), the C2PA system prevents the record from even being considered for self-authentication under modern amendments like FRE Rule 902(13) or 902(14) (certified electronic records and data copies), forcing it back to the more difficult FRE Rule 901 requirements.

  • FRE Rule 702 (Expert Testimony) and black boxes. FRE Rule 702 requires an expert's testimony to be based on "reliable principles and methods" which the expert has "reliably applied to the facts of the case." With AI-driven "computational capture", the expert cannot explain why the image looks the way it does, because the specific AI algorithm used to generate the final image is a proprietary black box. This is a significant Daubert and FRE Rule 702 hurdle that the Pixel 10 creates for any expert trying to authenticate the image.

  • The Kumho Tire, Joiner, and Rule 702 Precedent. Even if the C2PA specification is found to be generally valid (Daubert/Frye), the Google Pixel 10's specific implementation may fail the Kumho Tire test because of its unreliable application (the timestamp reversal, the random video attachment, and the exclusion list). An attorney could argue that the manufacturer's implementation is a sloppy or unpredictable application of the C2PA method.
These failures mean that any evidence created by this system raises fundamental challenges. They undermine the reliability, reproducibility, and scientific soundness of any media created by the Pixel 10, and of C2PA in general. Based on these inconsistencies, I believe it would likely fail the Daubert, Frye, and Rule 901 standards for admissibility.

The flaws that I've shown, including inconsistent timestamps and partial signing, are not edge cases or vendor oversights. They are baked into the specification's logic. Even a flawless implementation of C2PA version 2.2 would still produce unverifiable provenance because the standard itself allows essential data to remain outside the trusted boundaries.

Good intent doesn't make a flawed protocol trustworthy. In forensic work, evidence is either verifiable or it isn't. If a standard cannot guarantee integrity, then it doesn't matter how many companies endorse it since it fails to be trustworthy.

On Tuesday Oct 21 (tomorrow), the C2PA/CAI is holding a talk titled, "Beyond a Reasonable Doubt: Authenticity in Courtroom Evidence". (Here's the announcement link that includes the invite.) I've tried to reach out to both the speaker and Adobe about this, but nobody responded to me. I certainly hope people attend and ask the hard questions about whether C2PA, in its current form, should be admissible in a court of law. (It would be great if some knowledgeable attorneys showed up and asked questions related to Daubert, Frye, Kumho Tire, Joiner, Rule 901, Rule 902, Rule 702, etc.) My concern is that having an Assistant District Attorney present in this forum may lend false credibility to C2PA. While I have no reason to doubt that the Assistant District Attorney is a credible source, I believe that the technology being reviewed, and endorsed by association, is not credible.

Photographic Revision vs Reality

6 October 2025 at 09:38
Last month, one of my PTSD (People Tech Support Duties) requests led me down a deep path related to AI-alterations inside images. It began with a plea to help photograph a minor skin irritation. But this merged with another request concerning automated AI alterations, provenance, and detection. Honestly, it looks like this rush to embrace "AI in everything" is resulting in some really bad manufacturer decisions.

What started with a request to photograph a minor skin rash ended up spiraling into a month-long investigation into how AI quietly rewrites what we see.

Cameras Causing Minor Irritations

The initial query came from a friend. Her kid had a recurring rash on one arm. These days, doctor visits are cheaper and significantly faster when done online or even asynchronously over email. In this case, the doctor sent a private message over the hospital's online system. He wanted a photo of the rash.

This sounds simple enough. Take out the camera, hold out the arm, take a photo, and then upload it to the doctor. Right?

Here's the problem: Her camera kept automatically applying filters to make the picture look better. Visually, the arm clearly had a rash. But through the camera, the picture just showed regular skin. It was like one of those haunted house scenes where the mirror shows something different. The camera wasn't capturing reality.



These days, smart cameras often automatically soften wrinkles and remove skin blemishes -- because who wants a picture of a smiling face with wrinkles and acne? But in this case, she really did want a photo showing the skin blemishes. No matter what she did, her camera wouldn't capture the rash. Keep in mind, to a human seeing it in real life, it was obvious: a red and pink spotted rash over light skin tone. We tried a couple of things:
  • Turn off all filters. (There are some hidden menus on both Android and iOS devices that can enable filters.) On the Android, we selected the "Original" filter option. (Some Androids call this "None".) Nope, it was still smoothing the skin and automatically removing the rash.

  • Try different orientations. On some devices (both Android and iOS), landscape and portrait modes apply different filters. Nope, the problem was still present.

  • Try different lighting. While bright daylight bulbs (4500K) helped a little, the camera was still mitigating most of it.

  • Try a different camera. My friend had both an Android phone and an Apple tablet; neither was more than 3 years old. Both applied similar filtering.
We finally did find a few ways to get good pictures of the rash:
  • Use a really old digital camera. We had a 10+ year old Sony camera (not a phone; a real standalone camera). With new batteries, we could photograph the rash.

  • On my older iPhone 12 mini, I was able to increase the exposure to force the rash's red tint to stand out. I also needed bright lighting to make this work. While the colors were far from natural, they did allow the doctor to see the rash's pattern and color differential.

  • My laptop has a built-in camera that has almost no intelligence. (After peeling off the tape that I used to cover the camera...) We tried a picture and it worked well. Almost any desktop computer's standalone webcam, where all enhancements are expected to be performed by the application, should be able to take an unaltered image.
I'm glad my friend's kid found this entire experimentation process fascinating. But if this had been a more time-sensitive issue, I honestly don't know what a typical user with a newer device could have done.

This irritating experience only scratched the surface of a much larger issue that kept recurring over the month: how modern cameras' AI processing is quietly rewriting reality.

AI Photos

Since the start of digital photography, nearly all cameras have included some form of algorithmic automation. Normally it is something minor, like auto-focus or auto-contrast. We usually don't think of these as being "AI", but they are definitely a type of AI. However, it wasn't until 2021 that the first camera-enabled devices with smart-erase became available. (The Google Pixel 6, Samsung Galaxy S21, and a few others. Apple didn't introduce its "Clean Up" smart erase feature until 2024.)

Following the rash problem, I had multiple customer requests asking whether their pictures were real or AI. Each case concerned the same camera: The new Google Pixel 10. This is the problem that I predicted at the beginning of last month. Specifically, every picture from the new Google Pixel 10 is tagged by Google as being processed by AI. This is not something that can be turned off. Even if you do nothing more than bring up the camera app and take a photo, the picture is tagged with the label:
Digital Source Type: http://cv.iptc.org/newscodes/digitalsourcetype/computationalCapture
According to IPTC, this means:
The media is the result of capturing multiple frames from a real-life source using a digital camera or digital recording device, then automatically merging them into a single frame using digital signal processing techniques and/or non-generative AI. Includes High Dynamic Range (HDR) processing common in smartphone camera apps.

In other words, this is a composite image. And while it may not be created using a generative AI system ("and/or"), it was definitely combined using some kind of AI-based system.
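If you need to triage a batch of images for this label, you don't need a full XMP or C2PA parser. The IPTC URI is stored as a literal text string inside the file (in the XMP and/or the CBOR manifest), so a simple substring search is usually enough. A quick-and-dirty sketch; the file name is a placeholder:

# Flag files that carry IPTC's "computationalCapture" digital source type.
marker = b"digitalsourcetype/computationalCapture"
with open("photo.jpg", "rb") as handle:   # placeholder file name
    data = handle.read()
print("computational-capture label present" if marker in data else "label not found")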

In industries that are sensitive to fraud, including banking, insurance, know-your-customer (KYC), fact checking, legal evidence, and photojournalism, seeing any kind of media that is explicitly labeled as using AI is an immediate red flag. What's worse is that analysis tools that are designed to detect AI alterations, including my tools and products from other developers, are flagging Pixel 10 photos as being AI. Keep in mind: Google isn't lying -- every image is modified using AI and is properly labeled. The problem is that you can't turn it off.

One picture (that I'm not allowed to share) was part of an insurance claim. If taken at face value, it looked like the person's car had gone from 60-to-zero in 0.5 seconds (but the tree only sustained minor injuries). However, the backstory was suspicious and the photos, from a Google Pixel 10, had inconsistencies. Adding to these problems, the pictures were being flagged as being partially or entirely AI-generated.

We can see this same problem with a sample "original" Pixel 10 image that I previously used.



At FotoForensics, the Error Level Analysis (ELA) permits visualizing compression artifacts. All edges should look similar to other edges, surfaces should look like surfaces, and similar textures should look similar. With this image, we can see a horizontal split in the background, where the upper third of the picture is mostly black, while the lower two thirds shows a dark bluish tinge. The blue is due to a chrominance separation, which is usually associated with alterations. Visually, the background looks the same above and below (it's the same colors above and below), so there should not be a compression difference. The unexpected compression difference denotes an alteration.
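For readers unfamiliar with the technique, the basic idea behind ELA is simple: resave the image at a known JPEG quality and amplify the per-pixel difference between the original and the resave; regions that respond differently from their surroundings were compressed differently at some point. This is only the textbook concept and not the FotoForensics implementation; a minimal sketch using Pillow, with arbitrary file names, quality, and scale factor:

from PIL import Image, ImageChops

original = Image.open("sample.jpg").convert("RGB")
original.save("resaved.jpg", "JPEG", quality=90)        # resave at a known quality level
resaved = Image.open("resaved.jpg")

ela = ImageChops.difference(original, resaved)          # per-pixel absolute difference
ela = ela.point(lambda value: min(255, value * 20))     # amplify so the differences are visible
ela.save("ela.png")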



The public FotoForensics service has limited analyzers. The commercial version also detects:
  • A halo around the light fixture, indicating that either the background was softened or the chandelier was added or altered. (Or all of the above.)

  • The chevrons in the stained glass were digitally altered. (The Pixel 10 boosted the colors.)

  • The chandelier has very strong artifacts that are associated with content from deep-learning AI systems.
None of these were intentional alterations. (Jeff just opened the camera app and took a picture. Nothing fancy by the human.) These are all AI-alterations by the Google Pixel 10 and they cannot be disabled.

In my previous blog entry, I showed that Google labels all photos as AI and that the metadata can be altered without detection. But with these automatic alterations baked into the image, we can no longer distinguish reality from revision.

Were the pictures real? With the car photos (that I cannot include here), my professional opinion was that, ignoring the AI and visual content, the photos were being misrepresented. (But doesn't the Pixel 10 use C2PA and sign every photo? Yes it does, but it doesn't help here because the C2PA signatures don't protect the metadata.) If I ignored the metadata, I'd see the alterations and AI fingerprints, and I'd be hard-pressed to determine if the detected artifacts were human initiated (intentional) or automated (unintentional). This isn't the desired AI promise, where AI generates content that looks like it came from a human. This is the opposite: AI forcing content from a human to look like AI.

Other Tools

After examining how these AI-enabled systems alter photos, the next question becomes: how well can our current tools even recognize these changes?

My analysis tools rely on deterministic algorithms. (That's why I call the service "FotoForensics" -- "Forensics" as in, evidence suitable for a court of law.) However, there are other online services that use AI to detect AI. Keep in mind, we don't know how well these AI systems were trained, what they actually learned, what biases they have, etc. This evaluation is not a recommendation to use any of these tools.

This inconsistency between different AI-based detection tools is one of the big reasons I don't view any of them as serious analyzers. For the Pixel 10 images, my clients had tried some of these systems and saw conflicting results. For example, using the same "original" Pixel 10 baseline image:
  • Hive Moderation trained their system to detect a wide range of specific AI systems. They claim a 0% chance that this Pixel 10 photo contains AI, because it doesn't look like any of the systems they had trained on. Since the Pixel 10 uses a different AI system, they didn't detect it.


  • Undetectable AI gives no information about what they detect. They claim this picture is "99% REAL". (Does that mean it's 1% fake?)


  • SightEngine decided that it was "3%" AI, with a little generative AI detected.


  • Illuminarty determined that it was "14.9%" AI-generated. I don't know if that refers to 14.9% of the image, or if that is the overall confidence level.


  • At the other extreme, Was It AI determined that this Google Pixel 10 picture was definitely AI. It concluded: "We are quite confident that this image, or significant part of it, was created by AI."

The ground truth is that the Pixel 10 always uses AI to auto-enhance the picture. If you work in a field that forbids any AI enhancement, then the Pixel 10 is a serious problem. (You can't just tell your client that they need to go back to the site of the accident and take pictures with a different camera.)

Fear the Future

Once upon a time, "taking a picture" meant pressing a button and capturing something that looked like reality. Today, it's more like negotiating with an algorithm about what version of reality it's willing to show you. The irony is that the more "intelligent" cameras become, the less their output can be trusted. When even a simple snapshot passes through layers of algorithmic enhancement, metadata rewriting, and AI tagging, the concept of an "original" photo starts to vanish.

People use AI for lots of tasks these days. This includes helping with research, editing text, or even assisting with diagnostics. However, each of these uses still leaves the human with the final decision about what to accept, reject, or cross-validate. In contrast, the human photographer has no option to reject the AI's alterations to these digital photos.

From medical photos and insurance claims to legal evidence, the line between "photo" and "AI-enhanced composite" has blurred. For fields that rely on authenticity, that's not a minor inconvenience; it's a systemic problem. Until manufacturers return real control to the photographer, sometimes the most reliable camera is the old one in the junk drawer -- like a decade-old Sony camera with no Wi-Fi, no filters, and no agenda.

P.S. Brain Dead Frogs turned this blog entry into a song for an upcoming album. Enjoy!

Vulnerability Reporting

19 September 2025 at 13:01
What do you do when you find a flaw in a piece of computer software or hardware? Depending on the bug, a legitimate researcher might:
  • Report it to the vendor. This is the most desirable solution. It should be easy to find a contact point and report the problem.

  • Publicly tell others. Full disclosure and public disclosure, especially with a history showing that you already tried to contact the vendor, helps everyone. Even if there is no patch currently available, it still helps other people know about the problem and work on mitigation options. (Even if you can't patch the system, you may be able to restrict how the vulnerable part is accessed.)

  • Privately tell contacts. This keeps a new vulnerability from being exploited publicly. Often, a vendor may not have a direct method of reporting, but you might know a friend of a friend who can report it to the vendor through other means.

  • Privately sell it. Keeping vulnerabilities quiet also permits making money by selling bugs privately to other interested parties. Of course, you don't know how the others are going to use the new exploit... But that's why you should try to report it to the vendor first. If the vendor isn't interested, then all bets are off. (You can get a premium if you can demonstrate a working exploit and show that the vendor is not interested in fixing it.)

  • Keep it to yourself. While there is a risk of someone else finding the same problem, sometimes it's better to keep the bug handy in case you need it in the future. (This is especially true if compromising systems is part of your job description, such as professional penetration testers or hired guns.)

  • Do nothing. This option is unfortunately common when the reporting method is unidentified or overly complicated. (I'm trying to report a bug, not build a desk from Ikea. I don't want pages of instructions and an Allen wrench.)
Reporting to the vendor, or at least trying to report to the vendor before resorting to some other option, falls under the well-known best practices of responsible disclosure and full disclosure.

Incentives

Some vendors offer incentives in order to receive bugs. These bug bounty programs often include financial rewards in exchange for informing the vendor and working with them to resolve the issue.

In the old days, bug bounties worked pretty well. (I've even made some money that way.) However, over the years, some companies have perverted the incentives. Rather than paying for the information so they can fix it, they pay for the information and require an agreement to legal terms. For example, some companies have attached stipulations to the payments, such as "by agreeing to this transaction, you will not make it public without coordinating the disclosure with us." (And every time you ask, they will say that they have prioritized it or are still working on it, even years later.) More than a few vendors use bug bounties as a way to bury the vulnerability by paying for silence.

I have personally had too many bad experiences with bug bounties and vendors paying for the privilege of being non-responsive. I don't think bounty programs are worth the effort anymore. Additionally, I won't do bug bounty programs if they require enrolling into any service or are associated with some kind of legal agreement. (If they want me to agree to legal terms, then they need to pay for my attorney to review the terms before I sign it.)

In contrast to bounties, some companies send very nice "thank you" responses to people who took the effort to report a bug. Often, these are swag rather than financial rewards. I've received t-shirts, hats, mugs, stickers, and very nice thank-you letters. Unlike bounties, I've found that each time a vendor sends me an unsolicited "thank you" (even if it's just a nice personalized email), they are responsive and actively fix the bug.

While some people report bugs for the money or swag, I have a different incentive. If I found a bug and it impacts me or my clients, then I want it fixed. The best place to fix it is with the vendor, so I try to report the problem. This is very much a self-serving purpose: I want their code to work better for my needs. Then again, many vendors are incredibly responsive because they want to provide the best solution to their customers. My bug reports help them and me.

Jump Through Hoops

When it comes to bug reporting, the one thing I don't want to do is jump through hoops. The most common hoops include closed lists, scavenger hunts, strict reporting requirements, and forced legal terms.
  • Hoop #1: Closed Lists
    Many open source projects want bugs reported to their mailing list, Discord channel, IRC server, private forum, or some other service that requires signing up before reporting. For example, Gentoo Linux wants you to sign up with their Bugzilla service in order to submit a bug, and to keep any discussions in their "forums, IRC or mailing lists". Their various discussion lists also require signing up.

    For me, this is the same as a vendor who is unreachable. When I want to report a bug, I don't want to join any lists or forums; I just want to report a bug.

  • Hoop #2: Scavenger Hunts
    Some organizations have turned "where to report" into a scavenger hunt. The Tor Project used to be a good example of this. Prior to 2019, they wanted you to search through all of their lists and try to find the correct contact point. If you contacted the wrong person, or gave up because you couldn't find the right person, then that's your fault.

    However, after years of complaining, the Tor Project finally simplified the reporting process. Now you can easily find multiple ways to report issues to them. (I still think they usually don't do anything when you report to them, but it's a step in the right direction.)

  • Hoop #3: Strict Requirements
    Some companies and organizations have strict requirements for reporting. While I'm fine with providing my name and contact information (in case they have questions or need more details), some forms require you to provide the device's serial number, software version, and other information that the reporter may not have.

    For example, many years ago I found a problem with a standalone kiosk. (I'm not naming names here.) The vendor only had one way to report problems: use their online form. Unfortunately, the reporting form absolutely would not let me report the problem without providing a valid serial number and software version for the device. The problem is, it's not my kiosk. I don't know the serial number, software version, or even the alphanumeric location code. Moreover, the exploit appeared to work on all of their kiosks. But due to their strict requirements, I was unable to report the problem. (I ended up finding a friend of a friend who had a contact in the company.)

    Some software providers only want reports via their Bugzilla service. Bugzilla often has fields for additional information. (Other bug tracking services have similar features.) Unfortunately, I've had some software groups (again, not naming names) eagerly close out the bug report because I didn't have all of their required information. (It's not that I intentionally didn't supply it; I just didn't have that information.) Beyond an automated message telling me that the bug was closed, they never contacted me. As far as I can tell, they never even looked at the bug because the form wasn't complete enough for them. Keep in mind: my bug reports include detailed step-by-step instructions showing how to do the exploit. (In a few of these cases, I ended up selling the exploits privately to interested parties since the vendors were non-responsive.)

  • Hoop #4: Mandatory Terms
    Some companies, like Google, don't have a means to report vulnerabilities without agreeing to their terms. Years ago, Google would accept vulnerability reports with no strings attached. But today, they direct everything to their "Bug Hunters" page. Before reporting a bug, you must agree to their terms.

    For me, any kind of non-negotiable mandatory agreement is a showstopper. Even though their terms seem simple enough, I cannot agree to them. For example, they expect "ethical conduct". I expect to act ethically, but since it's their terms, they get to determine what is and is not ethical conduct.
I can understand signing up if you want to be paid a bounty. (Google promises some very large payouts for rare conditions.) But often, I'm not looking for a bounty -- I just want to report a bug.

Interestingly, language barriers have never been a problem in my experience. Most companies auto-translate or accept reports in English. This makes the remaining hoops (legal terms, closed lists, etc.) stand out, because they are entirely self-imposed.

The Bug Bounty Scam

As a service provider, I also receive a lot of inquiries from people posing as potential bug reporters. Here's a sample email:
From: EMX Access Control Sensors <no-reply@[redacted]>
To: [redacted]@[redacted].com
Subject: Paid Bug Bounty Program Inquiry enquiry

Name:
Email: yaseen17money@hotmail.com
Message: Hi Team, I’m Ghulam Yaseen, a security researcher. I wanted to ask if you offer a paid bug bounty program or any rewards for responsibly disclosed vulnerabilities. If so, could you please share the details.
Yaseen writes to me often, each time from a different (compromised) domain. (I'm not worried about redacting his email address since dozens of forums already list this email in its entirety.) This isn't a real inquiry/enquiry. A legitimate query would come from the sender's own mail server and use their own name as the sender. I think this is nothing more than a probe to see if the mail server is vulnerable.

In contrast, here's another query I received, and it seems to be from real people (names redacted):
Date: Thu, 21 Aug 2025 05:47:47 +0530
From: [Redacted1] <[redated1]@gmail.com>
Cc: [redacted2]@gmail.com, [Redacted1] <redacted1@gmail.com>
Subject: Inquiry Regarding Responsible Disclosure / Bug Bounty Program

Hello Team,

We are [Redacted1] and [Redacted2], independent security researchers specializing in vulnerability discovery and organizational security assessments.

- [Redacted2] – Microsoft MSRC MVR 2023, with prior vulnerability reports to Microsoft, SAP, BBC, and government entities in India and the UK.
- [Redacted1] – Experienced in bug bounty programs, penetration testing, and large-scale reconnaissance, with findings reported to multiple high-profile organizations.

We came across your organization’s security program information and are interested in contributing by identifying and reporting potential vulnerabilities in a responsible manner.Given our combined expertise in deep reconnaissance, application security, and infrastructure assessments, we believe we could contribute to uncovering critical security issues, including hidden or overlooked vulnerabilities.

Before conducting any form of testing, we would like to confirm the following:

1. Does your organization have an active responsible disclosure or bug bounty program?
2. Could you please define the exact scope of assets that are permitted for testing?
3. Are vulnerabilities discovered on assets outside the listed scopes but still belonging to your organization eligible for rewards?
4. Any rules of engagement, limitations, or legal requirements we should be aware of?
5. Bounty reward structure (if applicable).

We follow strict ethical guidelines and ensure all reports include clear technical detail, reproduction steps, and remediation recommendations.

Looking forward to your guidance and confirmation before initiating any further testing.

Best regards,
[Redacted1] & [Redacted2]

There's no way I'm going to respond to them.
  • If I'm going to have someone audit my servers, it will be someone I contact and not who contacts me out of the blue. As covered by Krebs's 3 Basic Rules for Online Safety, "If you didn’t go looking for it, don’t install it!" The same applies to unsolicited offers for help. I didn't go looking for this, so I'm certainly not going to allow them to audit my services.

  • Responding positively to any of their questions effectively gives them permission to attack your site.
These are not the only examples. I receive this kind of query at least weekly.
  • Some of these inquiries mention that my server has a known vulnerability. However, they don't want to tell me what it is until after they confirm that I have a bug bounty program. If I don't respond, or respond by saying that I don't offer any bounties, then they never tell me the bug. Assuming they actually found something, this feels like extortion. (Pay them or they won't tell me.)

  • A few people do point out my vulnerabilities. So far, every single case has either been from one of my honeypots or due to my fake /etc/passwd file. (They asked for a password file, so I gave them one. What's the problem?)
The best thing to do if you receive an unsolicited contact like this? Don't respond. Of course, if they do list a vulnerability, then definitely investigate it. Real vulnerabilities should receive real replies.

Sample Pre-reporting Requirements: Google

In my previous blog entry, I wrote about some significant failures in the Google Pixel 10's C2PA implementation. Before writing about it publicly, I tried multiple ways to report the issue:
  1. Direct contact: I repeatedly disclosed my concerns about C2PA to a Google representative. Unfortunately, I believe my concerns were disregarded. Eventually, the Google contact directed me to report issues through Google's Bug Hunters system. (That's right: the direct contact didn't want to hear about bugs in his own system unless they came through Google's Bug Bounty service.)

  2. Google Bug Hunters: Google's Bug Hunters system requires agreeing to Google's vulnerability reporting terms before I could even submit the bug. I wasn't looking for a bounty; I simply wanted to report a serious problem. For me, being forced to accept legal terms before reporting is a showstopper.

  3. Private outreach: After I confirmed the flaws in Pixel 10's C2PA functionality, I reached out to my trusted security contacts. In the past, this network has connected me directly to security teams at Amazon, Facebook, and other major vendors. Since Google's C2PA team was non-responsive, I wanted to contact someone in Google's Android security team or legal department; I suspected that they had not independently reviewed the C2PA implementation for its security, trust, and liability implications. (If they had, I doubt this would have shipped in its current form.) Unfortunately, no one had a contact who could receive a report outside of Google's Bug Hunters. (It's really weird how Google directs everything through Bug Hunters.)
At that point, I had exhausted my options. For me, a reporting process that requires accepting legal terms just to submit a vulnerability -- especially when I am already in direct contact with the team responsible -- is a hoop too many. This is why I posted my blog about Pixel 10's C2PA flaws. (Don't be surprised if I eventually post about more Pixel 10 problems without telling Google first. And if someone inside Google reaches out to me, I'd be happy to discuss this directly, without agreeing to any terms.)

Sample Reporting Hoops: Nikon

Around the same time the Pixel 10 was released, Nikon rolled out C2PA support for the Nikon Z6 III camera. Within days, researcher Horshack discovered that he could get any file signed by the camera, which is a serious flaw in the authenticity system. He even released a signed forgery as a demonstration:



If you upload this picture to Adobe's Content Credentials service, it reports that it is a genuine picture from a "Nikon Z6 3" camera, with no indication that it was altered or forged.

To Nikon's credit, the day after DP Review published an article about this, Nikon temporarily suspended their C2PA signing service, saying they had "identified an issue" and would "work diligently" to resolve it. That's a strong response.



Two weeks after disabling the signing service, Nikon announced that they were revoking their signing certificate. (As of this writing, 2025-09-19, the C2PA and CAI have not removed Nikon's certificate from their list of trusted certificates. Right now, every Content Credentials service still reports that pictures signed with the revoked certificate are valid and trusted.)

It's unclear if Horshack ever tried to report directly to Nikon. When I searched for a security contact point, Nikon only listed HackerOne as their reporting mechanism. HackerOne is a bug bounty system that requires enrollment, personal information, and banking details. If you aren’t seeking a bounty, then this is a major hoop that discourages reporting.

The community response to Horshack's public disclosure was mostly positive, with many people alarmed and grateful that the issue came to light. However, a few commenters criticized the public release, suggesting it might hurt Nikon's reputation. While lawsuits are always a theoretical risk, I would argue that a vendor that only accepts reports through gated programs effectively pushes researchers toward public disclosure as the only viable reporting path.

In this case, Nikon acted quickly once the problem went public. This demonstrates that they can respond, but the process could have been much smoother if they provided a simple, open reporting channel.

When one problem is reported, it's not unusual to see other people identify related problems. In the comments to the original reporting, other people detailed additional issues. For example:
  • patrol_taking_9j noted that "NX Studio is completely unable to export JPEG's at all for any RAW or RAW+JPEG NEF files shot with C2PA enabled."

  • Horshack replied to his own posting, noting that pictures appear to be signed hours after capture.

  • Pierre Lagarde remarked that "The only thing I can say is C2PA still looks like a problem by itself, not that much like a solution to anything. At least, I think the inclusion of this feature at this stage seems premature." (I fully agree.)

  • To further demonstrate the problem, Horshack created a second signed forgery:



    As with his first forgery, the Content Credentials service reports that it is a photo from a "Nikon Z6 3" camera.
These additional problems show the power of public disclosure. Had Horshack not made the initial problem public, other people may not have looked as closely and these related concerns might never have come to light.

Lower Barriers, Better Security

Bug reporting should not feel like running an obstacle course. Every extra hurdle, whether it's mandatory legal terms, scavenger hunts, closed lists, or bounty-program enrollment, increases the likelihood that a researcher will give up, go public, or sell the exploit privately.

The Google and Nikon cases highlight the same lesson: if you make responsible reporting difficult, then you drive researchers toward public disclosure. That might still result in a fix, but it also increases the window of exposure for everyone who uses the product.

The vendors who handle vulnerability reporting the best are the ones who make it simple: a plain-text email address, a short web form, or even a contact page that doesn't require more than a description and a way to follow up. Many of these vendors don't pay bounties, yet they respond quickly and even say "thank you", which is often enough to keep security researchers engaged.

The industry doesn't need more hurdles. It needs frictionless reporting, fast acknowledgment, and a clear path from discovery to resolution. Good security starts with being willing to listen: make it as easy as possible for the next person who finds a flaw to tell you about it.

Google Pixel 10 and Massive C2PA Failures

5 September 2025 at 12:41
Google recently released their latest-greatest Android phone: the Google Pixel 10. The device has been met with mostly-positive reviews, with the main criticisms around the over-abundance of AI in the device.

However, I've been more interested in one specific feature: the built-in support for C2PA's Content Credentials. For the folks who are new to my blog, I've spent years pointing out problem after problem with C2PA's architecture and implementation. Moreover, I've included working demonstrations of these issues; these problems are not theoretical. C2PA is supposed to provide "provenance" and "authenticity" (the P and A in C2PA), but it's really just snake oil. Having a cryptographically verifiable signature doesn't prove anything about whether the file is trustworthy or how it was created.

A Flawed Premise

A great movie script usually results in a great movie, regardless of how bad the actors are. (In my opinion, The Matrix is an incredible movie despite Keanu Reeves' lackluster performance.) In contrast, a bad script will result in a bad movie, regardless of how many exceptional actors appear in the film, like Cloud Atlas or Movie 43. The same observation applies to computer software: a great architecture usually results in a great implementation, regardless of who implements it, while a bad design will result in a bad implementation despite the best developers.

C2PA starts from a bad architecture design: it makes assumptions based on vaporware, depends on hardware that doesn't exist today, and uses the wrong signing technology.

Google Pixel 10

I first heard that the Google Pixel 10 was going to have built-in C2PA support from Google's C2PA Product Lead, Sherif Hanna. As he posted on LinkedIn:
It's official β€” the Google Pixel 10 is the first smartphone to integrate C2PA Content Credentials in the native Pixel Camera app. This is not just for AI: *every photo* will get Content Credentials at capture, and so will every edit in Google Photosβ€”with or without AI.

Best of all, both Pixel Camera and Google Photos are *conformant Generator Products*, having passed through the C2PA Conformance Program.

If you didn't know better, this sounds like a great announcement! However, when I heard this, I knew it would be bad. But honestly, I didn't expect it to be this bad.

Sample Original Photo

One of my associates (Jeff) received the Google Pixel 10 shortly after it became available. He took a sample photo with C2PA enabled (the default configuration) and sent it to me. Here's the unaltered original picture (click to view it at FotoForensics):



If we evaluate the file:
  • Adobe (a C2PA steering committee member) provides the official "Content Credentials" web service for validating C2PA metadata. According to them, all digital signatures are valid. The site reports that this came from the "Google C2PA SDK for Android" and the signature was issued by "Google LLC" on "Aug 28, 2025 at 8:10 PM MDT" (they show the time relative to your own time zone). According to them, the image is legitimate.

  • Truepic (another C2PA steering committee member) runs a different "Content Credentials" web service. According to them, "Content credentials are invalid because this file was signed by an untrusted source."



    If we ignore that Truepic hasn't updated their trusted certificate list in quite some time, then they claim that the manifest was signed by this signer and that it indicates no AI:
    detected_attributes: {
      is_ai_generated: false,
      is_ai_edited: false,
      contains_ai: false,
      is_camera_captured: false,
      is_visual_edit: false
    }
    Both authoritative sites should authenticate the same content the same way. This contradiction will definitely lead to user confusion.

  • My FotoForensics and Hintfo services display the metadata inside the file. This picture includes a rich set of EXIF, XMP, and MPF metadata, which is typical for a camera-original photo. The EXIF identifies the make and model (Google Pixel 10 Pro), capture timestamp (2025-08-28 22:10:17), and more. (Jeff didn't include GPS information or anything personal.)

  • There's also a C2PA manifest for the "Content Credentials". (It's in the JUMBF metadata block.) FotoForensics shows the basic JUMBF contents, but it's not easy to read. (FotoForensics doesn't try to format the data into something readable because all C2PA information is unreliable. Displaying it will confuse users by giving the C2PA information false credibility.) My Hintfo service shows the parsed data structure:

    • The manifest says it was created using "Google C2PA SDK for Android" and "Created by Pixel Camera".

    • There is a cryptographically signed timestamp that says "2025-08-29T02:10:21+00:00". This is not when the picture was created; this is when the file was notarized by Google's online timestamp service. This timestamp is four seconds after the EXIF data says the picture was captured. This is because it required a network request to Google in order to sign the media.

    • The manifest includes a chain of X.509 certificates for the signing. The signer's name is "Google LLC" and "Pixel Camera". If you trust the name in this certificate, then you can trust the certificate. However, it's just a name. End-users cannot validate that the certificate actually belongs to Google. Moreover, this does not include any unique identifiers for the device or user. Seeing this name is more "branding" than authentication. It's like having "Levi's" stamped on the butt of your jeans.

    Notice that the C2PA manifest does not list the camera's make, model, photo capture time, lens settings, or anything else. That information is only found in the EXIF metadata.

  • Inside the C2PA actions is a notation about the content:
    "digitalSourceType": "http://cv.iptc.org/newscodes/digitalsourcetype/computationalCapture"
    According to IPTC, this means:
    The media is the result of capturing multiple frames from a real-life source using a digital camera or digital recording device, then automatically merging them into a single frame using digital signal processing techniques and/or non-generative AI. Includes High Dynamic Range (HDR) processing common in smartphone camera apps.

    In other words, this is a composite image. And while it may not be created using a generative AI system ("and/or"), it was definitely combined using some kind of AI-based system.

    (Truepic's results are wrong when they say that no AI was used. They are also wrong when they say that it is not from a camera capture. Of course, someone might point out that Truepic only supports C2PA v2.1 and this picture uses C2PA v2.2. However, there is no C2PA version number in the metadata.)

    As an aside, Jeff assures me that he just took a photo; he didn't do anything special. But the metadata clearly states that it is a composite: "capturing multiple frames" and "automatically merging them". This same tag is seen with other Pixel 10 pictures. It appears that Google's Pixel 10 is taking the same route as the iPhone: they cannot stop altering your pictures and are incapable of taking an unaltered photo.

  • The most disturbing aspect comes from the manifest's exclusion list:
    "assertion_store":  {
    "c2pa.hash.data": {
    "exclusions": {
    [
    {
    "start": "6",
    "length": "11572"
    }
    ],
    [
    {
    "start": "11596",
    "length": "4924"
    }
    ],
    [
    {
    "start": "17126",
    "length": "1158"
    }
    ],
    [
    {
    "start": "18288",
    "length": "65458"
    }
    ],
    [
    {
    "start": "83750",
    "length": "7742"
    }
    ]
    },
    When computing the digital signature, it explicitly ignores:

    • 11,572 bytes beginning at byte 6 in the file. That's the EXIF data. None of the EXIF data is protected by this signature. Unfortunately, that's the only part that defines the make, model, settings, and when the photo was taken.

    • 4,924 bytes starting at position 11,596. That's the JUMBF C2PA manifest. This is the only component that's typically skipped when generating a C2PA record because most of it is protected by different C2PA digital signatures.

    • 1,158 bytes beginning at position 17,126 is the XMP data.

    • 65,458 bytes beginning at position 18,288 is the extended XMP metadata that includes Google's Makernotes.

    • 7,742 bytes beginning at position 83,750 is the continuation of the extended XMP metadata record.

    That's right: everything that identifies when, where, and how this image was created is unprotected by the C2PA signature. The C2PA cryptographic signature only covers the manifest itself and the visual content. It doesn't cover how the content was created. (A simplified sketch of this exclusion-style hashing follows this list.)
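To illustrate how an exclusion list behaves, here is a simplified sketch of hashing a file while skipping the excluded byte ranges. This is not the actual c2pa.hash.data algorithm or serialization; it's only meant to show why bytes inside the excluded ranges can change freely. The file name is a placeholder and the ranges are copied from the manifest above:

import hashlib

# Excluded byte ranges from the manifest: (start, length).
exclusions = [(6, 11572), (11596, 4924), (17126, 1158), (18288, 65458), (83750, 7742)]

data = open("pixel10_sample.jpg", "rb").read()   # placeholder file name
digest = hashlib.sha256()
position = 0
for start, length in sorted(exclusions):
    digest.update(data[position:start])          # hash the bytes before the excluded range
    position = start + length                    # skip the excluded range entirely
digest.update(data[position:])                   # hash the remainder of the file
print(digest.hexdigest())

# Any change inside the excluded ranges (EXIF, XMP, extended XMP) produces the exact
# same digest, which is why those blocks can be edited without detection.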
Without C2PA, anyone can alter the EXIF or XMP metadata. (It's a very common forgery approach.)

With the Google Pixel's C2PA implementation, anyone can still alter the EXIF or XMP metadata. But now there's a digital signature, even if it doesn't identify any alterations.

The problem is that nothing on either of the "Content Credentials" web services reports the exclusion range. If you're a typical user, then you haven't read through the C2PA specifications and will likely assume that the file is trustworthy with tamper-evident protection since the cryptographic signature is valid.

Forgery Time!

Knowing what I can and cannot edit in the file, I altered the image to create a forgery. Here's my forgery:



  • If you use the official Adobe/CAI Content Credentials validation tool, you will see that the entire file is still cryptographically sound and shows the same authoritative information. There is no indication of alteration or tampering. (The results at Truepic's validation service also haven't changed.)

  • The metadata displayed by FotoForensics and Hintfo shows some of the differences:

    • The device model is "Pixel 11 Pro" instead of "Pixel 10 Pro". I changed the model number.

    • The EXIF software version was "HDR+ 1.0.790960477zd". Now it is "HDR+ 3.14156926536zd". (Really, I can change it to anything.)

    • The EXIF create and modify date has been backdated to 2025-07-20 12:10:17. (One month, 8 days, and 12 hours earlier than the original.)

    Although this is all of the EXIF data that I changed for this example, I could literally change everything.

  • Hintfo shows the decoded JUMBF data that contains the C2PA manifest. I changed the manifest's UUID from "urn:c2pa:486cba89-a3cc-4076-5d91-4557a68e7347" to "urn:neal:neal-wuz-here-neal-wuz-here-neal-wuz". Although the signatures are supposed to protect the manifest, they don't. (This is not the only part of the manifest that can be altered without detection.)
While I cannot change the visual content without generating a new signature, I can change everything in the metadata that describes how the visual content came to exist.

Consistently Inconsistent

Forgeries often stand out due to inconsistencies. However, the Pixel 10's camera has been observed making inconsistent metadata without any malicious intervention. For example:



According to Digital Photography Review, this photo of a truck is an out-of-the-camera original picture from a Pixel 10 using 2x zoom. The EXIF metadata records the subject distance. In this case, the distance claims to be "4,294,967,295 meters", or about 11 times the distance from the Earth to the Moon. (That's one hell of a digital zoom!) Of course, programmers will recognize that as uint32(-1). This shows that the Pixel 10 can naturally record invalid values in the metadata fields.
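For the non-programmers: 4,294,967,295 is exactly what you get when a value of -1 is stored in an unsigned 32-bit field; it's a classic "no data" sentinel rather than a measured distance. A one-liner shows the conversion:

import ctypes
print(ctypes.c_uint32(-1).value)   # prints 4294967295, i.e., 0xFFFFFFFF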

As another example:



DP Review describes this graffiti picture as another out-of-the-camera original using 2x zoom. It has the same "4,294,967,295 meters" problem, but it also has inconsistent timestamps. Specifically:
  • The EXIF metadata has a creation date of "2025-08-25 19:45:28". The time zone is "-06:00", so this is 2025-08-26 01:45:28 GMT.

  • The C2PA-compliant external trusted timestamp authority operated by Google says it notarized the file at 2025-08-26 01:45:30 GMT. This means it took about 2 seconds for the signing request to go over the network.

  • This picture has a few attached parasites. (A parasite refers to non-standard data appended after the end of the main JPEG image.) The XMP metadata identifies these extra JPEG images as the GainMap, Depth, and Confidence maps. Each of these images has its own EXIF data.

    ExifTool only displays the EXIF data for the main image, not the parasites. However, using the Strings analyzer at FotoForensics, you can see the parasites' EXIF dates. (Scroll to the bottom of the strings listing, then page-up about 3 times.) The data looks like:
    0x0030ed57: 2025:08:25 19:45:31
    0x0030ed8d: 0232
    0x0030ee35: 0100
    0x0030ef09: 2025:08:25 19:45:31
    0x0030ef1d: 2025:08:25 19:45:31
    0x0030ef31: -06:00
    0x0030ef39: -06:00
    0x0030ef41: -06:00
    This data says that the parasites were created at 2025-08-25 19:45:31 -06:00 (that's 2025-08-26 01:45:31 GMT). That is one second after the file was notarized. Moreover, while the C2PA's manifest excludes the main image's EXIF data, it includes these parasites and their EXIF data! This indicates that the parasites were created after the file was notarized by Google.
With photos, it's possible for the times to vary by a second. This is because the timestamps usually don't track fractions of a second. For example, if the picture was taken at 28.99 seconds and the file took 0.01 seconds to write, then the created and modified times might be truncated to 28 and 29 seconds. However, there is no explanation for the parasites' timestamps to be 3 seconds after the file was created, or any time after the file was notarized by the trusted timestamp provider.

Remember: this is not one of my forgeries. This is native to the camera, and I have no explanation for how Google managed to either post-date the parasites before notarizing, or generated the manifest after having the file notarized. This inconsistent metadata undermines the whole point of C2PA. When genuine Pixel 10 files look forged, investigators will conclude "tampering", even if the file is not manually altered.

With the Pixel 10's C2PA implementation, either the timestamps are untrustworthy, or the C2PA signatures are untrustworthy. But in either case, the recipient of the file cannot trust the data.

However, the problems don't stop there. Both of these sample pictures also include an MPF metadata field. The MPF data typically includes pointers to parasitic images at different resolutions. In the lamp picture, the MPF properly points to the Gain Map (a JPEG attached as a parasite). However, in these truck and graffiti examples, the MPF doesn't point to a JPEG. Typically, applications fail to update the MPF pointers after an alteration, which permits tamper detection. With these examples, we have clear indications of tampering: inconsistent metadata, inconsistent timestamps, evidence of post-dating or an untrusted signature, and a broken MPF. Yet, these are due to the camera app and Google's flawed implementation; they are not caused by a malicious user. Unfortunately, a forensic investigator cannot distinguish an altered Pixel 10 image from an unaltered photo.

Google Pixel 10: Now with Fraud Enabled by Default!

There's a very common insurance fraud scheme where someone will purchase their new insurance policy right after their valuable item is damaged or stolen. They will alter the date on their pre- and post-damage photos so that it appears to be damaged after the policy becomes active.
  • Without C2PA, the insurance investigator will need to carefully evaluate the metadata in order to detect signs of alterations.

  • With C2PA in a Google Pixel 10, the investigator still needs to evaluate the metadata, but now also needs to prove that the C2PA cryptographic signature from Google is meaningless.
Typical users might think that the cryptographic signature provides some assurance that the information is legitimate. However, the Pixel 10's implementation with C2PA is grossly flawed. (Both due to the Pixel 10 and due to C2PA.) There are no trustworthy assurances here.

Privacy Concerns

Beyond the inadequate implementation of the flawed C2PA technology, the Google Pixel 10 introduces serious privacy issues. Specifically, the camera queries Google each time a picture needs to be digitally signed by a trusted signing authority. Moreover, every picture taken on the Google Pixel 10 gets signed.

What can Google know about you?
  • The C2PA signing process generates a digest of the image and sends that digest to the remote trusted timestamp service for signing. Because your device contacted Google to sign the image, Google knows which signature they provided to which IP address and when. The IP address can be used for a rough location estimate. Google may not have a copy of the picture, but they do have a copy of the signature. (A minimal sketch of this digest exchange follows this list.)

  • Since the Pixel 10 queries Google each time a photo is captured, Google knows how often you take pictures and how many pictures you take.

  • While the C2PA metadata can be easily removed, the Pixel 10 reportedly also uses an invisible digital watermark called "SynthID". Of course, the details are kept proprietary because, as Google describes it, "Each watermarking configuration you use should be stored securely and privately, otherwise your watermark may be trivially replicable by others." This means that the only way to validate the watermark is to contact Google and send them a copy of the media for evaluation.
All of this enables user and content tracking. As I understand it, there is no option to disable any of it. If Google's web crawler, email, messaging system, etc. ever sees that signature again, then they know who originated the image, when and where it was created, who received a copy of the media and, depending on how Google acquired the data, when it was received.
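
To make the first point concrete, here's a rough Python sketch of the digest exchange. The filename and the timestamp-service URL are placeholders (the real service speaks an RFC 3161-style protocol and is certainly more involved); the point is that only a hash leaves the device, yet the request itself still reveals who asked, from where, and when:

import hashlib

# Compute the digest that would be submitted for trusted timestamping.
with open("PXL_example.jpg", "rb") as f:          # placeholder filename
    digest = hashlib.sha256(f.read()).hexdigest()
print(digest)

# Submitting it would look roughly like (placeholder endpoint):
#   requests.post("https://tsa.example.com/sign", data=digest)
# The service never sees the pixels, but it does see the digest, the
# requesting IP address, and the time of the request.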

With any other company, you might question the data collection: "While they could collect this, we don't know if they are collecting it." However, Google is widely known to collect as much user information as possible. While I have no proof, I have very little doubt that Google is collecting all of this information (and probably much more).

The inclusion of C2PA into the Pixel 10 appears to be more about user data collection and tracking than authenticity or provenance.

Security and Conformance

C2PA recently introduced their new conformance program. This includes two assurance levels. Level 1 has minimal security requirements, while Level 2 is supposed to be much more difficult to achieve and provides greater confidence in the information within the file.

There is currently only one device on the conforming products list that has achieved assurance level 2: The Google Pixel Camera. That's right, the same one that I just used to create an undetectable forgery and that normally generates inconsistent metadata.

The Provenance and Authenticity Standards Assessment Working Group (PASAWG) is performing a formal evaluation of C2PA. Some folks in the group posed an interesting theory: perhaps the Google Pixel Camera really is compliant with assurance level 2. Since Google explicitly excludes everything about the hardware, they are technically conforming by omitting that information. Think of this like intentionally not attaching your bicycle lock to the entire bike. Sure, the bike can get stolen, but the lock didn't fail!



What if they fixed it?

You're probably wondering how something like this could happen at Google. I mean, regardless of whether you like the company, Google is usually known for cutting edge technology, high quality, and above-average security.
  • Maybe this is just an implementation bug. Maybe nobody at Google did any kind of quality assurance testing on this functionality and it slipped past quality control.

  • Maybe they were so focused on getting that "we use C2PA" and "Assurance Level 2" checkbox for marketing that they didn't mind that it didn't protect any of the metadata.

  • Maybe nobody in Google's security group evaluated C2PA. This would certainly explain how they could put their corporate reputation on this flawed solution.

  • Maybe nobody in Google's legal department was consulted about Google's liability regarding authenticating a forgery that could be used for financial fraud, harassment, or propaganda.
You might be thinking that Google could fix this if they didn't exclude the EXIF and XMP metadata from the cryptographic protection. (That would certainly be a step in the right direction.) Or maybe they could put some device metadata in the manifest for protection? However, you'd still be wrong. The C2PA implementation is still vulnerable to file system and hardware exploits.

These are not the only problems I've found with Google Pixel 10's C2PA implementation. For example:
  • In the last few days, FotoForensics has received a handful of these pictures, including multiple pictures from the same physical device. As far as I can tell, Google uses the exact same four root certificates on every camera:

    • Google LLC, s/n 4B06ED7C78A80AFEB7193539E42F8418336D2F27
    • Google LLC, s/n 4FCA31F82632E6E6B03D6B83AB98B9D61B453722
    • Google LLC, s/n 5EF6120CF4D31EBAEAF13FB9288800D8446676BA
    • Google LLC, s/n 744428E3A7477CEDFDE9BD4D164607A9B95F5730

    I don't know why Google uses multiple root certs. It doesn't seem to be tied to the selected camera or photo options.

    While there are a limited number of root certs, every picture seems to use a different signing certificate, even if it comes from the same camera. It appears that Google may be generating a new signing certificate per picture. What this means: if a device is compromised and used for fraud, they cannot revoke the certificate for that device. Either they have to revoke a root cert that is on every device (revoking everyone's pictures), or they have to issue revocations on a per-photo basis (that doesn't scale).

  • My associates and I have already identified easy ways to alter the timestamps, GPS information, and more. This includes ways that require no technical knowledge. The C2PA proponents will probably claim something like "The C2PA manifest doesn't protect that information!" Yeah, but tell that to the typical user who doesn't understand the technical details. They see a valid signature and assume the picture is valid.

  • There's a physical dismantling (teardown) video on YouTube. At 12:27 - 12:47, you can see the cable for the front-facing camera. At 14:18 - 14:35 and 15:30 - 16:20, you can see how to replace the back-facing cameras. Both options provide a straightforward way for hardware hackers to feed in a false image signal for signing. With this device, the C2PA cryptographic signature excludes the metadata but covers the visual content. Unfortunately, you cannot inherently trust the signed image.

  • Even if you assume that the hardware hasn't been modified, every picture has been tagged by Google as a composite image. That will impact insurance claims, legal evidence, and photo journalism. In fields where a composite image is not permitted, the Google Pixel 10 should not be used.
With Google's current implementation, their C2PA cryptographic signature is as reliable as signing a blank piece of paper. It doesn't protect the important information. But even if they fix their exclusion list, they are still vulnerable to C2PA's fundamental limitations. C2PA gives the appearance of authentication and provenance without providing any validation, and Google's flawed implementation just makes it worse. It's a snake oil solution that provides no meaningful information and no reliable assurances.

A lot of people are excited about the new Google Pixel 10. If you want a device that takes a pretty picture, then the Pixel 10 works. However, if you want to prove that you took the picture, value privacy, or plan to use the photos for proof or evidence, then absolutely avoid the Pixel 10. The cryptographic "proof" provided by the Pixel 10 is worse than having a device without a cryptographic signature. Every picture requires contacting Google, the unaltered metadata is inconsistent, the visual content is labeled as an AI-generated composite, the signed data may be post-dated, and there is no difference between an altered picture and an unaltered photo. I have honestly never encountered a device as untrustworthy as the Pixel 10.

Solar Project Update

1 September 2025 at 11:50
A few months ago I wrote about my experimentation this year with solar power. I thought I would give a couple of updates.

The basic architecture hasn't changed, but some of the components have:



Given that I've never done this before, I expected to have some problems. However, I didn't expect every problem to be related to the power inverter. The inverter converts the 12V DC battery's power to 120V AC for the servers to use. Due to technical issues (none of which were my fault), I'm currently on my fourth power inverter.

Inverter Problem #1: "I'm Bond, N-G Bond"

The first inverter that I purchased was a Renogy 2000W Pure Sine Wave Inverter.



This inverter worked fine when I was only using the battery. However, if I plugged it into the automated transfer switch (ATS), it immediately tripped the wall outlet's circuit breaker. The problem was an undocumented grounding loop. Specifically, the three-prong outlets used in the United States are "hot", "neutral", and "ground". For safety, the neutral and ground should be tied together at one location; it's called a neutral-ground bond, or N-G bond. (For building wiring, the N-G bond is in your home or office breaker box.) Every outlet should only have one N-G bond. If you have two N-G bonds, then you have a grounding loop and an electrocution hazard. (A circuit breaker should detect this and trip immediately.)

The opposite of an N-G bond is a "floating neutral". Only use a floating neutral if some other part of the circuit has the N-G bond. In my case, the automated transfer switch (ATS) connects to the inverter and the utility/wall outlet. The wall outlet connects to the breaker box where the N-G bond is located.

What wasn't mentioned anywhere on the Amazon product page or Renogy web site is that this inverter has a built-in N-G bond. It will work great if you only use it with a battery, but it cannot be used with an ATS or utility/shore power.

There are some YouTube videos that show people opening the inverter, disabling the N-G bond, and disabling the "unsafe alarm". I'm not linking to any of those videos because overriding a safety mechanism for high voltage is incredibly stoopid.

Instead, I spoke to Renogy's customer support. They recommended a different inverter that has an N-G bond switch: you can choose to safely enable or disable the N-G bond. I contacted Amazon since it was just past the 30-day return period. Amazon allowed the return with the condition that I also ordered the correct one. No problem.

The big lesson here: Before buying an inverter, ask if it has an N-G bond, a floating neutral, or a way to toggle between them. Most inverters don't make this detail easy to find. (If you can't find it, then don't buy the inverter.) Make sure the configuration is correct for your environment.
  • If you ever plan to connect the inverter to an ATS that switches between the inverter and wall/utility/shore power, then you need an inverter that supports a floating neutral.

  • If you only plan to connect the inverter to a DC power source, like a battery or generator, then you need an inverter that has a built-in N-G bond.

Inverter Problem #2: It's Wrong Because It Hertz

The second inverter had a built-in switch to enable and disable the N-G bond. The good news is that, with the N-G bond disabled, it worked correctly through the ATS. To toggle the ATS, I put a Shelly Plug smart outlet between the utility/wall outlet and the ATS.



I built my own controller and it tracks the battery charge level. When the battery is charged enough, the controller tells the inverter to turn on and then remotely tells the Shelly Plug to turn off the wall outlet. That causes the ATS to switch over to the inverter.

Keep in mind, the inverter has its own built-in transfer switch. However, the documentation doesn't mention that it is "utility/shore priority". That is, when the wall outlet has power, the inverter will use the utility power instead of the battery. It has no option to be plugged into a working outlet and to use the battery power instead of the outlet's power. So, I didn't use their built-in transfer switch.

This configuration worked great for about two weeks. That's when I heard a lot of beeping coming from the computer rack. The inverter was on and the wall outlet was off (good), but the Tripp Lite UPS feeding the equipment was screaming about a generic "bad power" problem. I manually toggled the inverter off and on. It came up again and the UPS was happy. (Very odd.)

I started to see this "bad power" issue about 25% of the time when the inverter turned on. I ended up installing the Renogy app to monitor the inverter over the built-in Bluetooth. That's when I saw the problem. The inverter has a frequency switch: 50Hz or 60Hz. The switch was in the 60Hz setting, but sometimes the inverter was starting up at 50Hz. This is bad, like, "fire hazard" bad, and I'm glad that the UPS detected and prevented the problem. Some of my screenshots from the app even showed it starting up low, like at 53-58 Hz, and then falling back to 50Hz a few seconds later.


(In this screenshot, the inverter started up at 53.9Hz. After about 15 seconds, it dropped down to 50Hz.)

I eventually added Bluetooth support to my homemade controller so that I could monitor and log the inverter's output voltage and frequency. The controller would start up the inverter and wait for the built-in Bluetooth to come online. Then it would read the status and make sure it was at 60Hz (+/- 0.5Hz) and 120V (+/- 6V) before turning off the utility and transferring the load to the inverter. If it came up at the wrong Hz, the controller would shut down the inverter for a minute before trying again.
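
For the curious, the control loop is conceptually simple. Here's a simplified Python sketch of the logic, not my actual M5StampS3 code; the Bluetooth read, the relay call, and the Shelly Plug call are stand-ins for whatever interfaces you actually have:

import time

FREQ_TARGET, FREQ_TOL = 60.0, 0.5    # Hz
VOLT_TARGET, VOLT_TOL = 120.0, 6.0   # VAC
RETRY_DELAY = 60                     # seconds to wait after a bad startup

def read_inverter():                 # stand-in: read volts/Hz over Bluetooth
    raise NotImplementedError
def set_inverter(on: bool): ...      # stand-in: drive the remote-switch relay
def set_wall_outlet(on: bool): ...   # stand-in: toggle the Shelly Plug

def switch_to_battery():
    while True:
        set_inverter(True)
        time.sleep(10)               # wait for the inverter and Bluetooth to come up
        volts, hertz = read_inverter()
        if (abs(hertz - FREQ_TARGET) <= FREQ_TOL and
                abs(volts - VOLT_TARGET) <= VOLT_TOL):
            set_wall_outlet(False)   # the ATS now transfers the load to the inverter
            return
        set_inverter(False)          # bad startup (e.g., 50Hz): shut down and retry
        time.sleep(RETRY_DELAY)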

It took some back-and-forth discussions with the Renogy technical support before they decided that it was a defect. They offered me a warranty-exchange. It took about two weeks for the inverter to be exchanged (one week there, one week back). The entire discussion and replacement took a month.

The replacement inverter was the same make and model. It worked great for the first two weeks, then developed the exact same problem! But rather than happening 25% of the time, it was happening about 10% of the time. To me, this looks like either a design flaw or a faulty component that impacts the entire product line. The folks at Renogy provided me with a warranty return and full refund.

If you read the Amazon reviews for the 2000W and 3000W models, they have a lot of 1-star reviews with comments about various defects. Other forums mention that items plugged into the inverter melted and motors burned out. Melting and burned out motors are problems that can happen if the inverter is running at 50Hz instead of 60Hz.

The Fourth Inverter

For the fourth inverter, I went with a completely different brand: a Landerpow 1500W inverter. Besides having what I needed, it also had a few unexpectedly nice benefits compared to the Renogy:
  • I had wanted a 2000W inverter, but a 1500W inverter is good enough. Honestly, my servers are drawing about 1.5 - 2.5 amps, so this is still plenty of overkill for my needs. The inverter says it can also handle surges of up to 3000W, so it can easily handle a server booting (which draws much more power than post-boot usage).

  • The documentation clearly specifies that the Landerpow does not have an N-G bond. That's perfect for my own needs.

  • As for dimensions, it's easily half the size of the Renogy 2000W inverter. The Landerpow also weighs much less. (When the box first arrived, I thought it might be empty because it was so lightweight.)

  • The Renogy has a built-in Bluetooth interface. In contrast, the Landerpow doesn't have built-in Bluetooth. That's not an issue for me. In fact, I consider Renogy's built-in Bluetooth to be a security risk since it didn't require a login and would connect to anyone running the app within 50 feet of the inverter.

  • The Landerpow has a quiet beep when it turns on and off, nothing like Renogy's incredibly loud beep. (Renogy's inverter beep could be heard outside the machine room and across the building.) I view Landerpow's quiet beep as a positive feature.

  • With a fully charged battery and with no solar charging, my math said that I should get about 5 hours of use out of the inverter:

    • The 12V, 100Ah LiFePO4 battery should provide 10Ah at 120V. (That's 10 hours of power if you're using 1 amp.)

    • The DC-to-AC conversion efficiency is around 90% (about a 10% loss), so that's 9Ah under ideal circumstances.

    • You shouldn't use the battery below 20% or 12V. That leaves 7.2Ah usable.

    • I'm consuming power at a rate of about 1.3A at 120V. That optimistically leaves about 5.5 hours of usable power. (A quick worked version of this math appears just after this list.)

    With the same test setup, none of the Renogy inverters gave me more than 3 hours. The Landerpow gave me over 5 hours. The same battery appears to last over 60% longer with the Landerpow. I don't know what the Renogy inverter is doing, but it's consuming much more battery power than the Landerpow.

  • Overnight, when there is no charging, the battery equalizes, so the voltage may appear to change. Additionally, the MPPT and the controller both run off the battery all night. (The controller is an embedded system that requires 5VDC and the MPPT requires 9VDC; combined, it's less than 400mA.) On top of this, we have the inverter connected to the battery. The Landerpow doesn't appear to cause any additional drain when powered off. ("Off" means off.) In contrast, the Renogy inverter (all of them) caused the battery to drain by an additional 1Ah-2Ah overnight. Even though nothing on the Renogy inverter appears to be functioning, "off" doesn't appear to be off.

  • The Renogy inverter required a huge surge when first starting up. My battery monitor would see it go from 100% to 80% during startup, and then settle at around 90%-95%. Part of this is the inverter charging the internal electronics, but part is testing the fans at the maximum rating. In contrast, the Landerpow has no noticeable startup surge. (If it starts when the battery is at 100% capacity and 13.5V, then it will still be at 100% capacity and 13.5V after startup.) Additionally, the Landerpow is really quiet; it doesn't run the fans when it first turns on.
The Renogy inverter cost over $300. The Landerpow is about $100. Smaller, lighter, quieter, works properly, consumes less power, and less expensive? This is just icing on the cake.
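
If you want to rerun the runtime math from the list above, here it is as a few lines of Python. The numbers are mine; substitute your own battery size, efficiency, and load:

BATTERY_WH  = 12 * 100       # 12V x 100Ah LiFePO4 = 1200Wh
EFFICIENCY  = 0.90           # ~90% DC-to-AC conversion efficiency
MIN_RESERVE = 0.20           # don't discharge below 20%
LOAD_WATTS  = 1.3 * 120      # ~1.3A measured at 120VAC

usable_wh = BATTERY_WH * EFFICIENCY * (1 - MIN_RESERVE)   # 864 Wh
hours = usable_wh / LOAD_WATTS                            # about 5.5 hours
print(f"{usable_wh:.0f} Wh usable, roughly {hours:.1f} hours of runtime")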

Enabling Automation

My controller determines when the inverter should turn on/off. With the Renogy, there's an RJ-11 plug for a wired remote switch. The plug has 4 wires (using telephone coloring, that's black, red, green, and yellow). The middle two wires (red and green) are a switch. If they are connected, then the inverter turns on; disconnected turns it off.

The Landerpow also has a four-wire RJ-11 connector for the remote. I couldn't find the pinout, but I reverse-engineered the switch in minutes.

The remote contains a display that shows voltage, frequency, load, etc. That information has to come over a protocol like one-wire, I2C (two wire), UART (one or two wire), or a three wire serial connection like RS232 or RS485. However, when the inverter is turned off, there are no electronics running. That means it cannot be a communication protocol to turn it on. I connected my multimeter to the controller and quickly found that the physical on/off switch was connected to the green-yellow wires. I wired that up to my controller's on/off relay and it worked perfectly on the first try.

I still haven't worked out the communication protocol. (I'll save that for another day, unless someone else can provide the answer.) At minimum, the wires need to provide ground, +5VDC power for the display, and a data line. I wouldn't be surprised if they were using a one-wire protocol, or using the switch wires for part of a serial communication like UART or RS485. (I suspect the four wires are part of a UART communication protocol: black=ground, red=+5VDC, green=data return, and yellow=TX/RX, with green/yellow also acting as a simple on/off switch for the inverter.)
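
If anyone wants to take a crack at the protocol, the obvious first step is to tap the suspected data line with a logic analyzer or a USB-UART adapter and sweep the common baud rates. Here's a minimal Python sketch using pyserial; the port name, wiring, and baud rates are all assumptions on my part:

import serial   # pyserial

# Sweep common baud rates on the suspected data line and dump whatever
# arrives; readable framing suggests you've found the right rate.
for baud in (4800, 9600, 19200, 38400, 115200):
    with serial.Serial("/dev/ttyUSB0", baud, timeout=1) as port:   # assumed port
        data = port.read(64)
        print(baud, data.hex(" "))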

Pictures!

I've mounted everything to a board for easy maintenance. Here's the previous configuration board with the Renogy inverter:



And here's the current configuration board with the Landerpow inverter:



You can see that the new inverter is significantly smaller. I've also added in a manual shutoff switch to the solar panels. (The shutoff is completely mounted to the board; it's the weird camera angle that makes it look like it's hanging off the side.) Any work on the battery requires turning off the power. The MPPT will try to run off solar-only, but the manual warns about running from solar-only without a battery attached. The shutoff allows me to turn off the solar panels before working on the battery.

Next on the to-do list:
  • Add my own voltmeter so the controller can monitor the battery's voltage directly. Reading the voltage from the MPPT seems to be a little inaccurate.

  • Reverse-engineer the communication to the inverter over the remote interface. Ideally, I want my own M5StampS3 controller to read the inverter's status directly from the inverter.
As components go, the Renogy solar panels seem very good. The Renogy MPPT is good, but maybe not the best option. Avoid Renogy inverters and consider the Landerpow inverter instead. I'm also a huge fan of Shelly Plugs for smart outlets and the M5StampS3 for the DIY controller.

Efficiency

Due to all of the inverter problems, I haven't had a solid month of use from the solar panels yet. We've also had a lot of overcast and rainy days. However, I have had some good weeks. A typical overcast day saves about 400Wh per day. (That translates to about 12kWh/month in the worst case.) I've only had one clear-sky day with the new inverter, and I logged 1.2kWh of power in that single day. (A month of sunny days would be over 30kWh in the best case.) Even with partial usage and overcast skies, my last two utility bills were around 20kWh lower than expected, matching my logs -- so this solar powered system is doing its job!

I've also noticed something that I probably should have realized earlier. My solar panels are installed as awnings on the side of the building. At the start of the summer, the solar panels received direct sunlight just after sunrise. The direct light ended abruptly at noon as the sun passed over the building and no longer hit the awnings. For the rest of the day, the panels generated less than 2A through ambient sunlight.

However, we're nearing the end of summer and the sun's path through the sky has shifted. These days, the panels don't receive direct light until about 9am and it continues until nearly 2pm. By the time winter rolls around, they should receive direct light from mid-morning until a few hours before sunset. The panels should be generating more power during the winter due to their location on the building and the sun's trajectory across the sky. With the current "overcast with afternoon rain" weather, I'm getting about 4.5 hours a day out of the battery+solar configuration. (The panels generate a maximum of 200W, and are currently averaging around 180W during direct sunlight with partially-cloudy skies.)

I originally allocated $1,000 for this project. With the less expensive inverter, I'm now hovering around $800 in expenses. The panels are saving me a few dollars per month. At this rate, they will probably never pay off this investment. However, it has been a great way to learn about solar power and DIY control systems. Even with the inverter frustrations, it's been a fun summer project.

PTSD and the News

21 August 2025 at 09:31
Usually my parent/pal/people tech support duty (PTSD) blogs are about friends asking me for help. But sometimes I come across people who don't know enough to ask, or don't know who to ask. (Note: I'm not soliciting people to ask me for help.)

I watch the local news every evening. Most people watch it over cable TV, but I don't have cable. (I haven't subscribed to a cable TV service in over 25 years.) They do have broadcast news, but when everything moved from analog to digital in 2009, I stopped receiving that. So, I watch the news over their streaming service. They offer a live stream on their web site and through their Roku app.

I'm a huge fan of our local news service, 9News in Denver. They cover local things around the state, as well as the major national and international topics. Moreover, they report on positive, entertaining, and inspirational topics, not just the negative. (They are not like other news channels. I've seen some news stations that are nothing more than a list of deaths and body counts around the city, state, country, and world.) 9News is really good at sifting through everything and giving accurate fact-based reporting while not causing viewers to become steadily depressed.

9News has even made national news, such as when Kyle Clark moderated a 2024 political debate. The Columbia Journalism Review remarked, "That’s how you run a debate!", noting: "[Kyle Clark] refused to allow the candidates to evade his direct questions with waffling, rambly answers, instead repeatedly cutting them off: “You didn’t make any attempt to answer the actual question,” he said at one point." (Kyle also became internet-famous in 2013 for ranting about snow-covered patio photos.)

Keep in mind, it's not just Kyle Clark. Kim Christiansen, Jennifer Meckles, Jeremy Jojola, and other staffers have each earned several awards for news reporting. I don't mean to slight the others through omission; it's a long list of reporters and investigators. There are no slackers on their staff, and they are all held to a very high standard. Just by being on 9News, I trust the quality of their reporting. (And for my regular blog followers, you know that I don't typically have blind trust.)

Technical Difficulties

A few months ago, their newsroom was doing some upgrades that were causing technical problems. One day they had no video. Another day there was no audio. I mean, seriously, the news anchor used a paper flipboard to write out the news!



(Instead of calling it "9News", we jokingly called it "Mime News". Kyle Clark was the guy trapped in the TV box and couldn't make a sound.)

The Next! Problem

Fortunately, the audio problem only lasted for one broadcast. Unfortunately, it was followed by another problem: the live stream broke. For a few days, it wouldn't play at all. Then it started up in 4x fast-forward mode for a few seconds (without sound) before freezing completely. Meanwhile, I was writing in almost daily, complaining that their Roku and live streaming services were not working. (I want to watch their news!)

After more than a week of this, the problem changed. I could see the video, but whenever it switched to or from a commercial break, the audio would drop. It wouldn't recover without restarting the stream. (You could either reload the web page or close and restart the Roku app. In either case, you'd miss the first 20-30 seconds of each news segment.)

My inquiries eventually got me in touch with their IT staff. Yes, they knew there was a problem. Unfortunately, it was inconsistent and not impacting everyone. Moreover, they were having trouble tracking down the cause. (As a techie, I can totally understand how hard it is to track down an inconsistent problem, especially when you cannot reproduce it locally.)

Well, this sounded like a job for "PTSD Man!" and his People Tech Support Duties!



Debugging a Stream

The first thing I did was check with a few friends who watch the same news using the same Roku app. One friend had the same "it's broken" problem. The other friend had no problem playing the newscast. (At least I could duplicate the "doesn't impact everyone" issue reported by the IT staff.)

Debugging the live stream wasn't easy. While some video streams are unencrypted, the news was being streamed over HTTPS. What this meant: I couldn't just use Wireshark to sniff the stream and I couldn't directly detect the problem's cause.

I tried a different approach. If I couldn't see the streaming data directly, perhaps I could download fragments and identify any issues.

Chrome and Firefox have a developer panel that shows the web-based network requests. Unfortunately, it doesn't show the raw media streams. However, live streams typically have a series of web requests that contain URLs to the raw stream segments. I could see those requests in the developer panel, along with the lists of URLs they returned. A typical reply might look like:
#EXTM3U
#EXT-X-VERSION:6
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:7800004
#EXT-X-DISCONTINUITY-SEQUENCE:0
#EXT-X-PROGRAM-DATE-TIME:2025-08-20T18:52:06.336Z
#EXTINF:6.006,
https://playback.tegnaone.com/kusa/live/index_3_7800004.ts?m=1716401672
#EXT-X-PROGRAM-DATE-TIME:2025-08-20T18:52:12.342Z
#EXTINF:6.006,
https://playback.tegnaone.com/kusa/live/index_3_7800005.ts?m=1716401672
#EXT-X-PROGRAM-DATE-TIME:2025-08-20T18:52:18.348Z
#EXTINF:6.006,
https://playback.tegnaone.com/kusa/live/index_3_7800006.ts?m=1716401672
#EXT-X-PROGRAM-DATE-TIME:2025-08-20T18:52:24.354Z
...

These URLs to the raw MPEG stream data can be easily requested with wget. I grabbed a few samples during the newscast and during commercials. Poof -- I found the problem.
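
If you want to reproduce this kind of spot-check, here's roughly what I did, sketched in Python: fetch a couple of segment URLs (one from the show, one from a commercial) and compare their audio parameters with ffprobe, which ships with FFmpeg. The filenames below are placeholders for whatever your playlist returns:

import subprocess

# Placeholder segment files -- substitute URLs/files from your own playlist.
segments = ["show_segment.ts", "commercial_segment.ts"]

for seg in segments:
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0",
         "-show_entries", "stream=codec_name,sample_rate,channels",
         "-of", "default=noprint_wrappers=1", seg],
        capture_output=True, text=True)
    print(seg)
    print(result.stdout)
# If the sample_rate (or codec) differs between the show and commercial
# segments, a strict player can treat the change as corruption.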

There are many different ways to encode a video stream. There's not just one compression setting; there are lots of choices. A video may use a constant bitrate (CBR) or variable bitrate (VBR). The frame rate can also be constant or variable (CFR or VFR). At least, that's the theory.

In practice, there are some things that should never change and some things always change. For example:
  • With video (the visual portion), the aspect ratio should never change. (This doesn't mean it doesn't, but it shouldn't.)

  • VBR is very common with most audio codecs. Some, like the Advanced Audio Coding (AAC) method (which is extremely common), almost always use VBR. MPEG supports VBR, but CBR is common.

  • VFR for video is often used if the video has long segments that don't change. ("Long" could be a fraction of a second or minutes.) This way, they only encode the one frame and leave it on the screen, rather than re-encoding and transmitting the same data over and over.

  • VFR for audio is very uncommon because it can cause synchronization errors. Moreover, audio VFR can result in audio misalignment if you try to fast-forward or rewind the audio stream. (If you've ever fast-forwarded or rewound a video and the audio was out of sync for a few seconds, that could be VFR or just bad alignment.)

  • While the bitrate and frame rate may change in the stream, the sample rate is usually constant. For audio, MPEG-1 uses a fixed sample rate and does not support changing the rate within a single stream. WAV and PCM only define the rate once, so it cannot change. AAC does support a variable rate, but it's uncommon; a fixed sample rate is typical. Moreover, some AAC profiles (like the kind typically used for streaming broadcasts) do not support a variable sample rate.
The FFmpeg media library doesn't always work well with a variable sample rate. Depending on the codec, the library might detect a change in the rate and decide the stream is corrupted. Worse: if the stream is associated with the video, then the corruption may cause unexpected results, like playing at the wrong speed or hanging. If the sample rate changes in the audio stream, then the player may just assume the audio is corrupted and stop playing sound.

That's what I was seeing with the live newsfeed. They were changing rates between the shows and the commercials. The change was detected as a corruption and the stream would drop sound.

Keep in mind, not all media players do this. It depends on the player and library version. (And as the user, you probably don't know what you're using.) Some libraries see the change, flush the buffer, and can safely recover from the corruption. However, other libraries see the corruption and give up. This makes the problem inconsistent between different people and different media players.

News to You

I reported my findings to the news channel's IT staff. They went running off and had the problem fixed in under 30 minutes. It's worked flawlessly since. (However, if you use Wireshark, you can see a ton of out-of-order TCP packets and retries, so I think they still have a networking problem. But that's probably due to the CDN and not the IT staff.) Today, I can watch the news again via Roku or in my web browser. (Huge thanks to the IT staff for listening to a rando spouting technical details over email, and for being incredibly responsive as soon as the problem was explained.)

On various online "can anyone help" forums, there are a lot of people reporting similar streaming problems with other online streaming services. I suspect they are all the same problem: the stream providers are changing the sample rate incorrectly, changing the aspect ratio (never change the aspect ratio in a video stream!), or otherwise failing to normalize the media between different segments. This is causing the media library to detect a corruption and the stream fails.

Now for the Bad News

I'm thrilled to be able to watch the news again via the streaming services. Unfortunately, earlier this week it was announced that the local Denver NBC affiliate's parent company, TEGNA, is being sold to Nexstar. Nexstar owns the local FOX station.

Personally, I equate FOX with fiction, conspiracies, and propaganda. This goes along with FOX repeatedly being involved in defamation and false reporting lawsuits, such as paying $787.5M to settle with Dominion Voting over FOX's false reporting, being sued for $2.7 Billion by Smartmatic, and most recently (June 2025) being sued by California's Governor Newsom for alleged false reporting. (Friends don't let friends watch FOX.)

In contrast to FOX, I think our local NBC 9News provides fair and balanced reporting. To me, they epitomize journalistic integrity. I don't know if they will continue that way after the merger and restructuring, or if we will have one less reliable news source in the area and the world.

Even though I don't know them personally, their newscasts come into my home every evening. They're so regular that they feel like part of my extended family. (And like my extended family, I'm glad they don't regularly visit in person.) I typically reserve tech support for family and friends, which is why the folks at 9News became my newest cause of PTSD. If the staff at 9News end up jumping ship to another station, I'm certain to follow them.

Detecting AI Music

10 August 2025 at 09:05
I've spent the last few months evaluating different AI sound generation systems. My original question was whether I could detect AI speech. (Yup.) However, this exploration took me from AI voices to AI videos and eventually to AI music. The music aspect strikes me as really interesting because of the different required components that are combined. These include:
  • Score: The written music that conveys the tune, melody, harmony, instruments, etc. This is the documentation so that other musicians can try to replay the same music. But even with unwritten music, something has to come up with the melody, harmony, composition, etc.

  • Instrumentation: The score doesn't contain everything. For example, if the music says it is written for a piano, there are still lots of different types of pianos and they all have different sounds. The instrumentation is the selection of instruments and when they play.

  • Lyrics: The written words that are sung.

  • Vocals: The actual singing.
This is far from everything. I'm not a musician; real musicians have their own terminologies and additional breakdowns of these components.

Each of these components has its own gray levels between AI and human. For example, the vocals can be:
  • AI: Completely generated using AI.

  • Real: Completely human provided.

  • Augmented: A human voice that is adjusted, such as with autotune or other synthetic modulators. (Like Cher singing "Believe", or almost anything from Daft Punk.)

  • Synthetic: From a synthesizer -- artificially created, but not AI. This could be a drum machine, a full synthesizer, like Moog or Roland, or even a Yamaha hybrid piano with built-in background rhythm player like Rowlf the Dog uses. (The Eurythmics is a good example of a music group whose earlier works were heavily dependent on synthesizers.)

  • Human edited: Regardless of the source, a human may edit the vocals during post-production. For example, Imogen Heap's "Hide and Seek" loops the human's voice at different pitches to create a layered vocal harmony. And Billy Joel's "The Longest Time" features multiple singing voices that are all Billy Joel. He recorded himself singing each part, then combined them for the full song.

  • AI edited: Regardless of the source, an AI system may edit the vocals during post-production. Tools that can do this include Audimee and SoundID VoiceAI. (Not an endorsement of either product.) Both can generate harmonies from single voice recordings.
The same goes for the score, arrangement, and even the lyrics. There isn't a clear line between "human" and artificial creations. Unless it's a completely live performance (acoustic and unplugged), most music is a combination.

Detecting AI (Just Beat It!)

Detecting whether something is AI-generated, synthetic, or human -- and the degree of each combination -- can be really difficult. Currently, each AI system has its own quirks and detectable 'tells'. However, each generation changes the detectable artifacts and there may be a new generation every few months. In effect, detection is a moving target.

Because this is a rapidly changing field, I'm not too concerned about giving away anything by disclosing detection methods. Any artifacts that I can detect today are likely to change during the next iteration.

Having said that, it seems relatively easy to differentiate between AI, synthetic, and human these days. Just consider the music. An easy heuristic relies on the "beat":
  • No matter how good a human musician is, there are always micro-variations in the beat. The overall song may be at 140 bpm (beats per minute), but the tempo at any given moment may be +/- 5 bpm (or more).

    For example, I graphed the beats over time for the song "Istanbul (Not Constantinople)" from They Might Be Giants. This comes from their live album: Severe Tire Damage:


    The red lines at the bottom identify each detected beat, while the blue line shows a ten-second running average to determine the beats per minute. This version of the song starts with a trumpet solo and then they bring in the drums. The trumpet appears as a very unsteady rhythm, while the drums are steadier, but show a continual waver due to the human element.

    As another example, here's Tears for Fears singing "Everybody Wants to Rule the World":


    Even though they use a lot of synthetic music elements, Tears for Fears combines the synthetic components with humans playing instruments. The human contribution causes the beats per minute to vary.

  • Synthetic music, or music on a loop, is incredibly consistent. The only times it changes is when a human changes up the loop or melody. For example, this is Rayelle singing "Good Thing Going":


    Even though the vocals are from a human, the music appears synthetic due to the consistency. There's a slight change-up around 2 minutes into the song, where the music switches to a different loop.

    Another incredible example is "Too High" by Neon Vines:


    The music is completely synthetic. You can see her creating the song in the YouTube video. Initially, she manually taps the beat, but then it loops for a consistent 109 bpm. At different points in the song, she manually adds to the rhythm (like at the 1 minute mark), adding in a human variation to the tempo.

    Many professional musicians record to an electronic "click track" or digital metronome that helps lock in the beat, which also makes the music's rhythm extremely consistent. The song "Bad Things" by Jace Everett is a perfect example:



    If you listen to the music, you can clearly hear the electronic percussion keeping time at 132 bpm. There may have also been some post-production alignment for the actual drums. (No offense to the drummer.)

  • AI music systems, like Suno or Udio, have a slow variation to the beat. It varies too much to be synthetic, but not enough to be human. For example, "MAYBE?!" by Around Once uses Suno's AI-generated music:


    The beat has a slow variation that doesn't change with the different parts of the song (verse, bridge, chorus, etc.). This is typical for AI generated music (not just Suno).
The distinction between "synthetic" and "AI" music is not always clear. Modern synthesizers often use embedded AI models for sound design or generative sequencing, which blurs the boundaries. Moreover, post-processing for beat alignment against a click-track can make a real performance appear as synthetic or AI.

By evaluating the beat over time, the music can be initially classified into human, synthetic, or AI-generated. (And now that I've pointed that out, I expect the next version of these AI systems to make it more difficult to detect.)
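
For anyone who wants to experiment, the graphs above boil down to: find the beats, then compute a running beats-per-minute over a ten-second window. Here's a minimal Python sketch using librosa's onset detector as a rough stand-in for my beat detector; "song.mp3" is a placeholder for any local audio file:

import numpy as np
import librosa

y, sr = librosa.load("song.mp3", mono=True)                      # placeholder file
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")    # onset times (seconds)

WINDOW = 10.0   # running average window, in seconds
for t in np.arange(WINDOW, onsets[-1], 1.0):
    count = np.sum((onsets > t - WINDOW) & (onsets <= t))
    print(f"{t:6.1f}s  {count * 60.0 / WINDOW:5.1f} bpm")
# Humans waver by several bpm, loops are rock-steady, and the AI systems
# tend to drift slowly without tracking the song's structure.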

Lyrics and AI (Word Up!)

Lyrics are equally interesting because of how the different approaches combine words. For example:
  • ChatGPT can compose lyrics. However, the model I tested usually drops "g" from "ing" words (e.g., "drivin'" instead of "driving") and uses lots of m-dashes for pauses. (Real writers use m-dashes sparingly.) When mentioning women, it regularly uses "baby" or "honey". (Because those are the only two terms of endearment in the English language, right?) It also seems incapable of repeating a verse without changing words. As far as intricacy goes, ChatGPT is great at including subtle elements such as emotion or innuendo. (Note: I tested against ChatGPT 4. Version 5 came out a few days ago.)

  • Suno has two lyric modes: the user can provide their own lyrics, or it can generate lyrics from a prompt. When using the prompt-based system, the songs are usually short (six structures, like verses, bridge, and chorus). As for linguistics, it seems to prefer night/evening language over day/morning and uses the specific word "chaos" (or "chaotic") far too often.

    Unlike ChatGPT, Suno is great at repeating verses, but it sometimes ends the song with partial repetition of previous verses, weird sounds, or completely different musical compositions. The AI-generated score, instrumentation, and vocal components don't seem to know when to end the song, and may not follow the written lyrics to the letter. The free Suno version 3.5 does this often; the paid version 4.5 does it much less often, but still does it.

  • Microsoft Copilot? Just don't. It writes really primitive short songs with bad wording and inconsistent meter.

  • Gemini is a slight improvement over Copilot. The lyrics are often longer and have better rhyming structure, but lack any subtlety or nuance.
In every case, the AI-generated lyrics often include poor word choices. This could be because the AI doesn't understand the full meaning of the word in context, or because it chose the word's sound over the meaning for a stronger rhyme. (If you're going to use AI to write a song, then I strongly suggest having a human act as a copy editor and fix up the lyrics.)

If you need a fast song, then try using ChatGPT to write the initial draft. However, be sure to use a human to edit the wordings and maybe replace the chorus. Then use Suno to put it to music and generate the AI vocals.

During my testing, I also evaluated songs from many, many popular artists. From what I can tell, some recent pop songs appear to do just that: they sound like ChatGPT followed by human edits. A few songs also seem to use some AI music to create the first draft of the song, then use humans and/or synthesizers to recreate the AI music so it becomes "human made". Basically, there are some composition choices that AI systems like to make that differ from human-scored music. (I'm not going to name names because the music industry is litigious.)

Ethics and AI (Should I Stay or Should I Go?)

I'm not sure how I feel about AI-generated music. On one hand, a lot of AI-generated music is really just bad. (Not 'bad' like 'I don't like that genre' or song, but 'bad' as in 'Turn it off', 'My ears are bleeding', and 'The lyrics make no sense'.) Even with AI assistance, it's hard to make good music.

On the other hand, not everyone is a musician or can afford a recording studio. Even if you have a good song in your head, you may not have the means to turn it into a recording. AI-generated music offers the ability for less-talented people (like me) to make songs.

The use of AI to make creative arts is very controversial, and strong concerns are found in the AI-generated artwork field. However, copying an artist's style (e.g., "in the artistic style of Larry Elmore") is not the same as saying "in the musical genre of 80s Hair Metal". One impersonates a known artist and infringes on their artistic rights, while the other is a generalized artistic style. (In copyright and trademark law, there have been "sound-alike" disputes with style emulation. So it's not as simple as saying that it's fine to stick to a genre. A possible defense is to show that everyone copies everyone and then find a similar riff from a much older recording that is out of copyright.)

AI-generated music is also different from AI-books. I'm not seeing a flood of AI-created crap albums at online sellers. In contrast, some book sellers are drowning in AI-written junk novels. The difference is that you often can't tell the quality of a book's contents without reading it, while online music sellers often include a preview option, making it relatively easy to spot the bad music before you buy it.

Unlike artwork or books, music often has cover performances, where one band plays a song that was originally written by someone else. For example:
  • They Might Be Giants' version of "Istanbul (Not Constantinople)" (1990) was a cover of a song originally made famous in 1953 by Jimmy Kennedy, but dates back to 1928's Paul Whiteman and His Orchestra.

  • Elvis Presley's "Hound Dog" (1956) is based on a 1952 version by Big Mama Thornton. (I prefer Big Mama's version.)

  • And let's not forget Hayseed Dixie's A Hillbilly Tribute to AC/DC. They recreated some of AC/DC's rock classics as country music -- same lyrics, similar melody, but different pace and instrumentation. (No offense to AC/DC, but I like the hillbilly version of "You Shook Me All Night Long" much more than the original.)
I don't see AI-music as "creating more competition for real musicians". Between cover bands and the wide range of current alterations permitted in music (autotune, beat machines, synthesizers, mashups, sampling, etc.), having completely AI-generated music and vocals seems like a logical next step. Rather, I view AI-music as a way for amateurs, and the less musically talented, to create something that could pass for real music.

Bad Poetry and AI (Where Did My Rhythm Go?)

I was told by a friend who is a musician that songs often start with the melody and then they write lyrics to fit the music. However, many AI-music systems flip that around. They start with the lyrics and then fit the melody to the tempo of the lyrics. It's an interesting approach and sometimes works really well. This means: if you can write, then you can write a song. (Not necessarily a good song, but you can write a song.)

I'm definitely not a musician. I joke that I play a mean kazoo. (A kazoo doesn't have to be on key or play well with others.) As for lyrics, well, I tell people that I'm an award-winning professional poet. While that's all true, it's a little misleading:
  • I've entered two limerick contests and won both of them. (Thus, "award winning.")

  • One of my winning limericks was published in the Saturday Evening Post. (Thus, published.)

  • And the published poem paid me $25! (Paid for my poem? That makes me a professional!)
With these AI systems, I quickly learned that a formal poetic structure really doesn't sound good when put to music. The uniformity of syllables per line makes for a boring song. For a good song, you really need to introduce rhythm variations and half-rhymes. (In my opinion, a good song is really just a bad poem put to music.)

During my testing, I listened to lots of real music (from professional musicians) as well as AI-generated music (from amateurs). I also created a ton of test examples, using a variety of lyric generation methods and genres. (Yes, the beat detector holds up regardless of the genre, singing voice, etc.) Let me emphasize: most of my test songs were crap. But among them were a few lucky gems. (Well, at least I enjoyed them.) These include songs with some of my longer poems put to music. (I wrote the poems while creating controlled test samples for human vs AI lyric detection.)

Having said that, I put some of my favorite test songs together into two virtual albums: Thinning the Herd and Memory Leaks. One of my friends suggested that I needed a band name ("Don't do this as Hacker Factor"), so "Brain Dead Frogs" was born.



I have two albums on the web site. They span a variety of genres and techniques. All of them use AI-generated music and AI voices (because I can't play music and I can't sing on key); only the lyrics vary: some are completely human written, some are based on AI creations but human edited, a few use AI-editing a human's writings, and one is completely written by AI. If you listen to them, can you tell which is which?

Hint: When editing, even if I changed all of the lyrics, I tried to keep the original AI artifacts. I found it was easiest to use AI to create the first draft of the song -- because that creates the pacing and tempo. Then I'd rewrite the lyrics while retaining the same syllables per line, which kept the pacing and tempo.

Outro (Thank You For The Music)

As a virtual musician and amateur songwriter, I'd like to thank everyone who helped make these songs possible. That includes Suno, ChatGPT, Gemini, Audacity, the online Merriam-Webster dictionary (for help with rhyming and synonyms), and Madcat for temporarily suspending her devotion to Taylor Swift long enough to be my first fan and website developer.

I definitely must thank my critical reviewers, including Bob ("I had low expectations, so this was better than I thought."), Richard ("Have you thought about therapy?"), The Boss ("At least it's not another C2PA blog"), and Dave for his insights about music and pizza. I'd also like to thank my muses, including Todd (nerdcore genre) and Zach Weinersmith for his cartoon about the fourth little pig (genre: traditional ska; a calypso and jazz precursor to reggae).

Finally, I'd like to issue a preemptive apology. For my attempt at K-pop ("Jelly Button"), I thought it needed some Korean in the bridge, because you can't have K-pop without some Korean lyrics. I don't speak Korean, and I suspect that my translator (ChatGPT) and singer (Suno) also don't. So I'll apologize in advance if the words are wrong. (But if the words are funny or appropriate, I'll unapologize and claim it was intentional.)

What C2PA Provides

1 August 2025 at 10:00
Last month I released my big bulleted list of C2PA problems. Any one of these issues should make potential adopters think twice. But 27 pages? They should be running away!

Since then, my list has been discussed at the biweekly Provenance and Authenticity Standards Assessment Working Group (PASAWG). The PASAWG is working on an independent evaluation of C2PA. The other attendees and I are only there as resources. As resources, we're answering questions and discussing issues, but not doing their research. (We've had some intense discussions between the different attendees.) The PASAWG researchers have not yet disclosed their findings to the group, and as far as I can tell, they do not agree with me on every topic. (Good! It means my opinion is not biasing them!)

Full disclosure: The PASAWG meetings fall under the Chatham House Rule. That means I can mention the topics discussed, but not attribute the information to any specific person without permission. (I give myself permission to talk about my own comments.)

Clarifications

For the last few meetings, we have been going over topics related to my bulleted list and the associated issues, and clarifying what the C2PA specification really provides. Having said that, I have found nothing that makes me feel any need to update my big bulleted list, except maybe to add more issues to it. There are no inaccuracies or items needing correction. The 27 pages of issues are serious problems.

However, I do want to make a few clarifications.

First, I often refer to C2PA and CAI as "All Adobe All The Time". One of my big criticisms is that both C2PA and CAI seem to be Adobe-driven efforts with very little difference between Adobe, C2PA, and CAI. I still have that impression. However:
  • The C2PA organization appears to be a weak coalition of large tech companies. To my knowledge, every C2PA working group has an Adobe employee as chair or co-chair. The only exception is the top-level C2PA organization -- it's chaired by a Microsoft employee who is surrounded by Adobe employees. I call it a "weak coalition" because Adobe appears to be the primary driving force.

  • The Content Authenticity Initiative (CAI) doesn't just look like Adobe. No, it is Adobe: owned, managed, and operated by Adobe, with all of the code developed by Adobe employees as part of the Adobe corporation. When you visit the CAI's web site or Content Credentials web service, that's 100% Adobe.
It's the significant overlap of Adobe employees who are part of both C2PA and CAI that causes a lot of confusion. Even some of the Adobe employees mix up the attribution, but they are often corrected by other Adobe employees.

The second clarification comes from the roles of C2PA and CAI. C2PA only provides the specification; there is no implementation or code. CAI provides an implementation. My interpretation is that this is a blame game: if something is wrong, then the C2PA can blame the implementation for not following the specs, while the implementers can blame the specification for not being precise or for any oversights. (If something is broken, then both sides can readily blame the other rather than getting the problem fixed.)
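(As an aside for the curious: the CAI side does publish open-source code. Their c2patool utility will dump the manifest store from a signed file, which is handy for seeing exactly what was asserted and signed. The filename below is just a placeholder; check the tool's current documentation for options.)

# Display the C2PA manifest store embedded in a signed file (CAI's open-source c2patool).
c2patool signed-image.jpg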

The specifications dictate how the implementation should function. Unless there is a bug in the code, it's a specification issue and not a programming problem. For example, my sidecar swap exploit, which permits undetectable alterations of the visual image in a signed file, is made possible by the specification and not the implementation. The same goes for the new C2PA certificate conformance (which is defined but not implemented yet); choosing to use a new and untested CA management system is a risk from the spec and not the implementation.

The third clarification comes from "defined but not implemented yet". Because C2PA does not release any code, everything about their specification is theoretical. Moreover, the specification is usually 1-2 revisions ahead of any implementations. This makes it easy for C2PA to claim that something works since there are no implementation examples to the contrary. By the time there are implementations that demonstrate the issues, C2PA has moved on to newer requirements and seems to disregard previous findings. However, some of the specification's assumptions are grossly incorrect, such as relying on technologies that do not exist today. (More on that in a moment.)

New Changes: v2.2

The current specification, v2.2, came out a few months ago. Although my bulleted list was written based on v2.1 and earlier, the review was focused on v2.2. When I asked who supports v2.2, the only answer was "OpenAI". Great -- they're a signer. There are no tools that can fully validate v2.2 yet. But there is some partial support.

I've recently noticed that the Adobe/CAI Content Credentials web site no longer displays the embedded user attribution. For example, my Shmoocon forgery used to prominently display the forged name of a Microsoft employee. However, last month they stopped displaying that. In fact, any pictures (not just my forgeries) that include a user ownership attribution are no longer displayed. This is because the Content Credentials web site is beginning to include some of the C2PA v2.2 specification's features. The feature? Users' names are no longer trusted and therefore no longer displayed.

That's right: all of those people who previously used C2PA to sign their names will no longer have their names displayed because their names are untrusted. (While the C2PA and Adobe/CAI organizations haven't said this, I think this is in direct response to some of my sample forgeries that included names.)

If you dive into the specifications, there's been a big change: C2PA v2.0 introduced the concepts of "gathered assertions" and "created assertions". However, these concepts were not clearly defined. By v2.2, they became a core requirement. Unfortunately, trying to figure out the purpose and definitions from the specs is as clear as mud. Fortunately, the differences were clarified at the PASAWG meetings. The risks, and what can be trusted, basically break down into gathered assertions, created assertions, trusted certificates, and reused certificates.

Risk #1: Gathered assertions
Gathered assertions cover any metadata or attribution that comes from an unvetted source, such as a user entering their name, copyright information, or even camera settings from unvetted devices. Because the information is unverified, it is explicitly untrusted.

When you see any information under a gathered assertion, it should be viewed skeptically. In effect, it's as reliable as existing standard metadata fields, like EXIF, IPTC, and XMP. (But if it's just as reliable as existing standards, then why do we need yet-another new way to store the same information?)

Risk #2: Created assertions
Created assertions are supposed to come from known-trusted and vetted hardware. (See the "C2PA Generator Product Security Requirements", section 6.1.2.) However, there is currently no such thing as trusted hardware. (There's one spec for some auto parts that describes a trusted camera sensor for the auto industry, but the specs are not publicly accessible. I can find no independent experts who have evaluated these trusted component specs, no devices use the specs right now, and it's definitely not available to general consumers. Until it's officially released, it's vaporware.) Since the GPS, time, camera sensor, etc. can all be forged or injected, none of these created assertions can be trusted.

This disparity between the specification's theoretical "created assertions" and reality creates a big gap in any C2PA implementation. The specs define the use of created assertions based on trusted hardware, but the reality is that there are no trusted hardware technologies available right now. Just consider the GPS sensor. Regardless of the device, it's going to connect to the board over I2C, UART, or some other publicly-known communication protocol. That means it's a straightforward hardware modification to provide false GPS information over the wire. But it can be easier than that! Apps can provide false GPS information to the C2PA signing app, while external devices can provide false GPS signals to the GPS receiver. Forging GPS information isn't even theoretical; the web site GPSwise shows real-time information (mostly in Europe) where GPS spoofing is occurring right now.



And that's just the GPS sensor. The same goes for the time on the device and the camera's sensor. A determined attacker with direct hardware access can always open the device, replace components (or splice traces), and forge the "trusted sensor" information. This means that the "created assertions" that denote what was photographed, when, and where can never be explicitly trusted.



Remember: Even if you trust your hardware, that doesn't help someone who receives the signed media. A C2PA implementation cannot verify that the hardware hasn't been tampered with, and the recipient cannot validate that trusted hardware was used.

Requiring hardware modifications does increase the level of technical difficulty needed to create a forgery. While your typical user cannot do this, it's not a deterrent for organized crime groups (insurance and medical fraud are billion-dollar-per-year industries), political influencers, propaganda generators, nation-states, or even determined individuals. A signed cat video on Tick Tack or Facegram may come from a legitimate source. However, if there is a legal outcome, political influence, money, or reputation on the line, then the signature should not be explicitly trusted even if it says that it used "trusted hardware".

Risk #3: Trusted Certificates
The C2PA specification uses a chain of X.509 certificates. Each certificate in the chain has two components: the cryptography (I have no issues with the cryptography) and the attribution about who owns each certificate. This attribution is a point of contention among the PASAWG attendees:
  • Some attendees believe that, as long as the root is trusted and we trust that every link in the chain follows the defined procedure of validating users before issuing certificates, then we can trust the name in the certificate. This optimistic view assumes that everyone associated with every node in the chain was trustworthy. Having well-defined policies, transparency, and auditing can help increase this trust and mitigate any risks. In effect, you can trust the name in the cert.

  • Other attendees, including myself, believe that trust attenuates as each new node in the chain is issued. In this pessimistic view, you can trust a chain of length "1" because it's the authoritative root. (We're assuming that the root certs are trusted. If that assumption is wrong, then nothing in C2PA works.) You can trust a length of "2" because the trusted root issued the first link. But every link in the chain beyond that cannot be fully trusted.
This pessimistic view even impacts web certificates. HTTPS gets around this trust attenuation by linking the last node in the chain back to the domain for validation. However, C2PA's certificates do not link back to anywhere. This means that we must trust that nobody in the chain made a mistake and that any mistakes are addressed quickly. ("Quickly" is a relative term. When WoSign and StartCom were found to be issuing unauthorized HTTPS certificates, it took years for them to be delisted as trusted CA services.)

In either case, you -- as the end user -- have no means to automatically validate the name in the signing certificate. You have to trust the signing chain.

As an explicit example, consider the HTTPS certificate used by TruePic's web site. (TruePic is a C2PA steering committee member). When you access their web site, their HTTPS connection currently uses a chain of three X.509 certificates:



  1. The root certificate is attributed to the Internet Security Research Group (ISRG Root X1). I trust this top level root certificate because it's in the CCADB list that is included in every web browser. (To be in the CCADB, they had to go through a digital colonoscopy and come out clean.)

  2. The second certificate is from Let's Encrypt. Specifically, ISRG Root X1 issued a certificate to Let's Encrypt's "R11" group. It's named in the cert. Since I trust Root X1, I assume that Root X1 did a thorough audit of Let's Encrypt before issuing the cert, so I trust Let's Encrypt's cert.

  3. Let's Encrypt then issued a cert to "www.truepic.com". However, their vetting process is really not very sophisticated: if you can show control over the host's DNS entry or web server, then you get a cert. In this case, the certificate's common name (CN) doesn't even name the company -- it just includes the hostname. (This is because Let's Encrypt never asked for the actual company name.) There is also no company address, organization, or even a contact person. The certificate has minimum vetting and no reliable attribution. If we just stop here, then I wouldn't trust it.

    However, there's an extra field in the certificate that specifies the DNS name where the cert should come from. Since this field matches the hostname where I received the cert (www.truepic.com), I know it belongs there. That's the essential cross-validation and is the only reason the cert should be trusted. We can't trust the validation process because, really, there wasn't much validation. And we can't trust the attribution because it was set by the second-level issuer and contains whatever information they wanted to include.
With web-based X.509 certificates, there is that link back to the domain that provides the final validation step. In contrast, C2PA uses a different kind of X.509 certificate that lacks this final validation step. If the C2PA signing certificate chain is longer than two certificates, then the pessimistic view calls the certificate's attribution and vetting process into question. The basic question becomes: How much should you trust that attribution?
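If you want to poke at that chain yourself, openssl will show it. (A quick sketch; the exact output varies a little between openssl versions, and cert.pem is a placeholder for whichever certificate from the chain you save to disk.)
# Fetch and display the certificate chain presented by the web server.
openssl s_client -connect www.truepic.com:443 -servername www.truepic.com -showcerts </dev/null

# Show the subject (the attribution) and issuer of a saved certificate.
openssl x509 -in cert.pem -noout -subject -issuer
Notice that the subject of the leaf certificate only names the host, not the company -- exactly the attribution gap described above.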

Risk #4: Reused Certificates
Most services do not have user-specific signing certificates. For example, every picture signed today by Adobe Firefly uses the same Adobe certificate. The same goes for Microsoft Designer (a Microsoft certificate), OpenAI (a certificate issued by TruePic), and every other system that currently uses C2PA.

The attribution in the signature identifies the product that was used, but not the user who created the media. It's like having "Nike" on your shoes or "Levi's" on your jeans -- it names the brand but doesn't identify the individual. Unless you pay to have your own personalized signing certificate, the signature is not distinct to you. This means that it doesn't help artists protect their works. (Saying that the painter used Golden acrylic paint with a brush by Winsor & Newton doesn't identify the artist.)

As an aside, a personalized signing certificate can cost $50-$300 per year. Given all of C2PA's problems, you're better off using the US Copyright Office. They offer group registration for photographers: $55 for 750 photos per year, and the protection lasts for 70 years beyond the creator's lifetime. This seems like a more cost-effective and reliable option than C2PA.

Missing Goals

Each of these risks with C2PA poses a serious concern. And this is before we get into manifest/sidecar manipulations to alter the visual content without detection, inserting false provenance information, competing valid signatures, reissuing signatures without mentioning changes, applying legitimate signatures to false media, etc. Each of these exploits is independent of the implementation; they are all due to the specifications.

The C2PA documentation makes many false statements regarding what C2PA provides, including:
  • Section 3, Core Principles: "Content Credentials provides a way to establish provenance of content."

  • Section 5.1: "Helping consumers check the provenance of the media they are consuming."

  • Section 5.2: "Enhancing clarity around provenance and edits for journalistic work."

  • Section 5.3: "Offering publishers opportunities to improve their brand value." (Except that the end consumer cannot prove that it came from the publishers.)

  • Section 5.4: "Providing quality data for indexer / platform content decisions."
This is not the entire list of goals. (I'm literally going section by section through their document.) Unfortunately, you cannot have reliable provenance without validation. C2PA lacks attribution validation so it cannot meet any of these goals. C2PA does not mitigate the risk from someone signing content as you, replacing your own attribution with a competing claim, or associating your valid media with false information (which is a great way to call your own legitimate attribution into question).

What Does C2PA Provide?

An independent report came out of the Netherlands last month that reviews C2PA and whether it can help "combat disinformation by ensuring the authenticity of reporting through digital certificates." (Basically, it's to see if C2PA is appropriate for use by media outlets.) This report was commissioned by NPO Innovatie (NPO), Media Campus NL, and Beeld & Geluid. The report is written in Dutch (Google Translate works well on it) and includes a summary in English. Their key findings (which they included with italics and bold):
C2PA is a representation of authenticity and provenance, but offers no guarantee of the truth or objectivity of the content itself, nor of the factual accuracy of the provenance claims within the manifest.
(Full disclosure: They interviewed many people for this report, including me. However, my opinions are not the dominant view in this report.)

C2PA does not provide trusted attribution information and it provides no means for the end recipient to automatically validate the attribution in the signing certificate. Moreover, the specifications depend on trusted hardware, even though there is no such thing as trusted hardware. This brings up a critical question: If you cannot rely on the information signed using C2PA, then what does C2PA provide?

My colleague, Shawn Masters, likens C2PA's signature to an "endorsement". Like in those political ads, "My name is <name>, and I approve this message." You, as the person watching the commercial, have no means to automatically validate that the entity mentioned in the promotion actually approved the message. (An example of this false attribution happened in New Hampshire in 2024, where a deep fake robocall pretended to be Joe Biden.) Moreover, the endorsement is based on a belief that the information is accurate, backed by the reputation of the endorser.

The same endorsement concept applies to C2PA: As the recipient of signed media, you have no means to automatically validate that the name in the signing cert actually represents the signer. The only things you know: (1) C2PA didn't validate the content, (2) C2PA didn't validate any gathered assertions, and (3) the signer believes the unverifiable created assertions are truthful. When it comes to authenticating media and determining provenance, we need a solution that provides more than "trust", "belief", and endorsements. What we need are verifiable facts, validation, provable attribution, and confirmation.

Breaking Windows and Linux Customizations

18 July 2025 at 17:28
I like small laptops. Years ago I got a 10-inch Asus EeePC with an Atom processor. It wasn't very powerful, but it ran Linux. Well, mostly. The audio drivers sometimes had problems and I never got Bluetooth to work. Battery storage capacity degrades over time. The EeePC battery originally lasted over 12 hours per charge, but after nearly a decade, it would get about 2 hours. I couldn't find a replacement battery, so five years ago I decided to get a new laptop.

The replacement laptop was a little larger (13-inch), but all of the components were supposed to be compatible with Linux. It also came with Windows 10 installed. I always intended to put Linux on it, but never got around to it. Since I only used it for web browsing and remote logins (using PuTTY), upgrading was never urgent and Win10 was good enough.

However, over the years the laptop began to develop problems:
  • The network sometimes wouldn't auto-connect after waking from suspend mode. I'd have to toggle the network on and off a few times before it would work. Other people had similar problems with Win10. Their solutions didn't work for me, but it wasn't annoying enough to replace the operating system.

  • The mousepad's button began losing sensitivity. Just as I had worn through the keyboard on my desktop computer, I was wearing out the trackpad's button. But it wasn't bad enough to replace the entire laptop.

  • With Win10 heading toward end-of-support (EoS is October 14, 2025), I knew I needed to upgrade the operating system sooner than later.
The final straw was the most recent Patch Tuesday. The laptop downloaded the updates, rebooted, and just sat at "Restarting". I couldn't figure out how to get past this. I'm sure there are instructions online somewhere, but I decided that it would be easier to install Linux.

(While I couldn't get back into the Windows system, I wasn't worried about backing up any files. This laptop is only used for remote access to the web and servers, and for giving presentations. All personal files already existed on my other systems.)

Intentional Procrastination

There's one reason I kept putting off installing Linux. It's not as simple as downloading the OS and installing it. (If that's all it took, I'd have done it years ago.) Rather, it usually takes a few days to customize it just the way I like it.

This time, I installed Ubuntu 24.04.2 (Noble Numbat). The hardest part was figuring out how to unlock the drive (UEFI secure boot). Otherwise, the installation was painless.

On the up side:
  • The laptop is noticeably faster. (I had forgotten how much of a resource hog Win10 is.)

  • The hard drive has a lot more room. (Win10 is a serious disk hog.)

  • The network wakes up immediately from suspend. That was a Windows bug, and Linux handles it correctly.

  • This is an older laptop. The battery originally lasted 8-9 hours under Windows, but had aged to lasting 4-6 hours from a full charge. With Linux, the same laptop and same old battery is getting closer to 10-12 hours, and that's while doing heavy computations and compiling code.

  • Unexpected: The trackpad's buttons work fine under Linux. I thought I had worn out the physical contacts. Turns out, it was Win10.
On the downside, it's yet another Linux desktop, and that means learning new ways to customize it. (Linux is made by developers for developers, so the UI really lacks usability.)

Disabling Updates

My first customization was to disable updates. I know, this sounds completely backwards. However, I use my laptop when I'm traveling or giving presentations. I do not want anything updating on the laptop while I'm out of the office. I want the laptop to be as absolutely stable and reliable as possible. (I've seen way too many conference presentations that begin with the speaker apologizing for his computer deciding to update or failing to boot due to an auto-update.)

In the old days, there was just one process for doing updates. But today? There are lots of them, including apt, snap, and individual browsers.
  • Snap: Snap accesses a remote repository and updates at least four times a day. (Seriously!) On my desktop computers, I've changed snap to update weekly. On my production servers and laptops, I completely disabled snap updates. Here are the commands to check and alter snap updates:

    • To see when it last ran and will next run: snap refresh --time --abs-time

    • To disable snap auto-updates: sudo snap refresh --hold

    • To restart auto-updating: sudo snap refresh --unhold

    • To manually check for updates: sudo snap refresh

    Now the laptop only updates snap applications when I want to do the update.

  • Apt: In older versions of Linux, apt used cron to update. Today, it uses system timers. To see the current timers, use:
    systemctl list-timers --all
    Leave the housekeeping timers (anacron, e2scrub, etc.), but remove the auto-update timers. This requires using 'stop' to stop the current timer, 'disable' to prevent it from starting after the next boot, and optionally 'mask' to prevent anything else from turning it back on. For example:
    # Turn off apt's daily update.
    sudo systemctl stop apt-daily-upgrade.timer
    sudo systemctl disable apt-daily-upgrade.timer
    sudo systemctl stop apt-daily.timer
    sudo systemctl disable apt-daily.timer

    # turn off motd; I don't use it.
    sudo systemctl stop motd-news.timer
    sudo systemctl disable motd-news.timer
    But wait! There's more! You also need to disable and remove some packages and settings:

    • Remove unintended upgrades: sudo apt remove unattended-upgrades

    • Edit /etc/apt/apt.conf.d/20auto-upgrades and set APT::Periodic::Update-Package-Lists and APT::Periodic::Unattended-Upgrade to "0". (See the example file contents after this list.)

    • And be sure to really disable it: sudo systemctl disable --now unattended-upgrades

    If you don't do all of these steps, then the system will still try to update daily.

  • Ubuntu Advantage: Brian Krebs has his "3 Basic Rules for Online Safety". His third rule is "If you no longer need it, remove it." I have a more generalized corollary: "If you don't use it, remove it." (This is why I always try to remove bloatware from my devices.) Canonical provides Ubuntu Advantage as their commercial support, but I never use it. Following this rule for online safety, I disabled and removed it:
    sudo systemctl stop ua-messaging.timer
    sudo systemctl stop ua-messaging.service
    sudo systemctl stop ua-timer.timer
    sudo systemctl mask ua-messaging.timer
    sudo systemctl mask ua-messaging.service
    sudo systemctl mask ua-timer.timer
    sudo rm /etc/apt/apt.conf.d/20apt-esm-hook.conf
    sudo apt remove ubuntu-advantage-tools
    sudo apt autoremove

  • Browsers: I use both Firefox and Chrome (Chromium). The problem is, both browsers often check for updates and install them immediately. Again, if I'm traveling or giving a presentation, then I do not want any updates.

    • I installed Chrome using snap. Disabling snap's auto-update fixed that problem. Now Chrome updates when I refresh snap.

    • Firefox was installed using apt. Disabling the browser's auto-update requires going into about:config. Search for "app.update.auto" and set it to "false". At any time, I can go to the browser's menu bar and select Help->About to manually trigger an update check.
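As mentioned in the apt item above, the unattended-upgrade settings live in /etc/apt/apt.conf.d/20auto-upgrades. After the edit, the relevant lines should look like this:
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";
Setting either value back to "1" re-enables that daily job.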
While I turned off auto-updates, I set a calendar event to periodically remind me to manually perform updates on all of my computers. (I may not have the latest patch within hours of it being posted, but I do update more often than Windows' monthly Patch Tuesday.) To update the system, either when the calendar reminds me or before going on a trip, I use:
sudo apt update ; sudo apt upgrade ; sudo snap refresh

Phone Home

I've configured my laptop, cellphone, and every other remote device to "phone home" each time they go online, change network addresses, or have a status update. One of my servers has a simple web service that listens for status updates and records them. This way, I know which device checked in, when, and from where (IP address). I also have the option to send back remote commands to the device. (Like "Beep like crazy because I misplaced you!") It's basically the poor-man's version of Apple's "Find My" service.

Figuring out where to put the phone-home script was the hard part. With Ubuntu 24.04, it goes in: /etc/network/if-up.d/phonehome. My basic script looks like this:
#!/bin/sh
curl 'https://myserver/my_url?status=Online' >/dev/null 2>&1

(Make sure to make it executable.) This way, whenever the laptop goes online, it pings my server. (My actual script is a little more complicated, because it also runs commands depending on the server's response.)
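As a rough sketch of that idea (the endpoint is the same placeholder as above, and the "beep" response and its handler are hypothetical examples, not my actual protocol):
#!/bin/sh
# Report status, then act on whatever one-word command the server sends back.
RESPONSE=$(curl -s 'https://myserver/my_url?status=Online')
case "$RESPONSE" in
  beep)
    # Hypothetical server command: help me find a misplaced laptop.
    for i in 1 2 3 4 5; do printf '\a'; sleep 1; done
    ;;
  *)
    # No command (or an unknown one); do nothing.
    ;;
esac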

Desktop Background

I like a simple desktop. Few or no icons, a small task bar, and a plain dark-colored background. Unfortunately, Ubuntu has migrated away from having solid color backgrounds. Instead, Ubuntu 24.04 only has an option to use a picture. Fortunately, there are two commands that can disable the background picture and specify a solid color. (I like a dark blue.)
gsettings set org.gnome.desktop.background picture-uri none
gsettings set org.gnome.desktop.background primary-color '#236'

These changes take effect immediately.
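Note: Newer GNOME releases also have a separate picture-uri-dark key for the dark style. If the background picture comes back when the dark theme is active, that key probably needs the same treatment:
gsettings set org.gnome.desktop.background picture-uri-dark none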

Terminal Colors

With such a small laptop screen, I don't want large title bars or borders around windows. However, the Ubuntu developers seem to have taken this to an extreme. I spend a lot of time using the command line with lots of terminal windows open. The default terminal has a dark purple background (a good, solid color) and no visible border around the window. But that's a problem: If I have three terminal windows open, then there is almost no visual cue about where one terminal window ends and the next begins.



I quickly found myself constantly fiddling with title bars to figure out which terminal window was on top and wiggling the window's position to figure out where the borders were located. Even with tabbed terminal windows, there is very little visual distinction telling me which tab is active or letting me know when I've switched tabs.

After a few days of this, I came up with a workaround: I give every terminal window a different background color. Now there's a clear visual cue telling me which window and tab is active.



The default shell uses bash, which means it runs $HOME/.bash_aliases each time a new window is opened. Here's the code I added to the end of my .bash_aliases file:
##### Set terminal background color based on terminal number
# get terminal name, like: /dev/pts/0
termnum=$(tty)
# reduce the name to the number: /dev/pts/1 becomes 1
termnum=${termnum##*/}
# I have 10 unique colors; if more than 10 terminals, then repeat colors
((termnum=$termnum % 10))
# set the color based on the terminal number, using escape codes.
case $termnum in
0) echo -n -e "\e]11;#002\e\\" ;;
1) echo -n -e "\e]11;#010\e\\" ;;
2) echo -n -e "\e]11;#200\e\\" ;;
3) echo -n -e "\e]11;#202\e\\" ;;
4) echo -n -e "\e]11;#111\e\\" ;;
5) echo -n -e "\e]11;#220\e\\" ;;
6) echo -n -e "\e]11;#212\e\\" ;;
7) echo -n -e "\e]11;#321\e\\" ;;
8) echo -n -e "\e]11;#231\e\\" ;;
9) echo -n -e "\e]11;#123\e\\" ;;
esac

Now I can have five open terminals, each with a different background color. Each terminal is easy to distinguish from any adjacent or overlapping windows.

Almost Done

It took about two hours to set up the laptop. (That includes downloading Ubuntu, copying it to a thumb drive, and installing it on the laptop.) Many of my customizations, such as setting up my remote server access and setting my preferred shell preferences, were straightforward.

Switching from Windows to Ubuntu gave this older laptop a lot of new life. But with any new system, there are always little things that can be improved based on your own preferences. Each time I use the laptop, I watch for the next annoyance and try to address it. I suspect that I'll stop fiddling with configurations after a month. Until then, this is a great exercise for real-time problem solving, while forcing me to dive deeper into this new Ubuntu version.

Feel the Burn

3 July 2025 at 09:37
I recently traveled to see some extended family and friends. Of course, that means PTSD (People Tech Support Duties). I did everything from hardware (installing a grab bar in the shower) to software (showing how to use AirDrop to transfer a picture -- because there's no way I'm plugging my personal device into their equipment).

Having said that, I did receive one really odd question, but it needs some context. An older friend constantly watches FOX News, spends way too much online time in far-right extremist forums on Facebook, and isn't very tech savvy. His question: How does he get a burner phone?

Huh? What? Why are they talking about burner phones in those far-right forums? And why does he think he needs a burner phone? Keep in mind, he currently doesn't have a cellphone at all. Perhaps it's time to set the story straight about burner phones and how they, by themselves, really don't help with anonymity.

Go for the Burn

Let's start at the beginning: a "burner phone" is not a brand or style of phone. Any cellphone can be a burner phone. "Burner" just means that there's no impact to you if it is lost or discarded.

Keep in mind, some people don't use a cellphone and only want it when traveling. (This is my friend's situation.) If you just want a short-term phone with no long-term contract, then you don't need a burner. Instead:
  • Get a pre-paid phone. When you need it, you can add minutes and use the phone. And if you don't need it for months at a time, then just store it with your other travel supplies.

  • Use a service like Consumer Cellular, Cricket, Visible, Tracfone, or T-Mobile's Metro. (Not an endorsement; just examples.) They don't have long-term contracts and you can cancel or suspend the service when you want. This way, you don't have to pay for a monthly service if you don't need it that month.
In either case, you can pay one time for a cheap phone, then put on minutes or enable the service as needed. This isn't a burner phone; this is a travel phone or a no-contract phone.

Burner Level 1: Phone Number

Cellphones have a bunch of identifiers used by phone companies to track the service. However, they can also be used by law enforcement (or "hackers") to track people. This is why it's important to identify why you want a burner and what you consider to be your threat.

At the most superficial level is your phone number. People with this number can contact you. A burner number is a phone number that you can easily change whenever you want. This is often useful if you have an ex-partner or ex-employer who you no longer want to talk to. You can change your number, inform your friends, and exclude the people you want to avoid.

Of course, this can be a hardship if you use your phone number for work or have hundreds of friends. There are also some risks associated with changing your phone number:
  • If you have lots of friends, one of them could leak your new number to the person you're trying to avoid. (This is a real problem when your crazy ex-partner has a lot of common friends.)

  • If you get a new phone number, then you inherit all of the crazy people who were ghosted by the former owner of that same number. Remember: who changes phone numbers? People with crazy stalkers and debt collectors.

  • Many online services use your phone number for two-factor authentication. If you lose your number, you lose your 2FA. Be sure to disable 2FA on every service before getting rid of the phone, and enable 2FA on your new phone number. However, some services require 2FA. In that case, get the new number and transfer all of your 2FA registrations before ghosting your old number. (This is a painful process.)

  • If you stay with the same carrier, then changing numbers may include a "call forwarding" service. Be sure to disable that, otherwise your crazy stalkers will just follow you to your new number.
There are some better alternatives to changing phone numbers. For example, I never give out my direct cell number. Instead, I use Google Voice (GV). GV will ring my office, cellphone, and any other numbers I use. I can easily change my cellphone number without anyone knowing. GV also provides filtering services, like spam call blocking and "state your name" before forwarding the call to me. This is a better approach for filtering out people you no longer want to talk to.

As another option, some carriers permit you to change numbers a few times each year. This is useful if you regularly use burner numbers.

Burner phone numbers are for ghosting known people who you no longer want to communicate with. However, they won't stop law enforcement from tracking you. A simple warrant to the carrier is often enough to get your old and new numbers.

Burner Level 2: SIM

The Subscriber Identity Module (SIM) is a small smartcard that links the phone to your account at the phone carrier. It contains the International Mobile Subscriber Identity (IMSI), which is a unique number for tracking the account, an authentication key (to deter impersonating someone else's account), your phone's PIN code, mobile country code and mobile network code (MCC and MNC), and a few other pieces of unique information. The SIM also includes a little storage area (like 8KB or 256KB), but you're not going to store hundreds of contacts or camera photos in the SIM.



You can change phone numbers without changing the SIM, because the carrier assigns that. You can also change phones without changing phone numbers by moving the SIM into the new device. (You might also need to tell your carrier that you are using a new device, since the carrier needs both the SIM and device information. However, some carriers auto-detect the new device when they see the IMSI associated with a new phone.)

Since the SIM is tied to your account, a burner account would use a different SIM that isn't associated with your normal billing account at the carrier. Some carriers offer pre-paid phone services, so the SIM is relatively anonymized -- it doesn't need to be tied to a person since the account has funds already paid.

When you get rid of your phone, be sure to remove the SIM. Either move it to the new phone, or destroy it. (Use scissors to cut it into tiny pieces; be sure to crack the chip inside it.)

Burner Level 3: IMEI

Every cellphone has a unique International Mobile Equipment Identity (IMEI). Let's say you give your phone to a friend as a hand-me-down. You remove your SIM and your friend inserts their SIM. Somewhere among the phone carriers is a record that the physical device (IMEI) had been associated with your IMSI and phone number, and now is associated with your friend's IMSI and phone number.

A burner phone means you want a new device with a new IMEI. However: If you moved your SIM to the new phone, then the IMEI changes but the IMSI stays the same. This means that someone with a warrant can track you. If you want a burner phone, then use a new device (IMEI), new SIM (IMSI), and new phone number. Everything needs to be new.

If you want to get rid of your old phone, then:
  1. Copy off anything you want to keep. (Photos, contacts, etc.)

  2. Remove the SIM and any additional storage media (SDcard).

  3. Perform a factory reset. (How to do this depends on the phone.) This will remove your call logs, photos, contacts, and everything else.

  4. If you're super paranoid like me, then destroy the phone. (Open it up. If it looks like a memory chip, then drill through it.)
In TV shows and movies, you often see someone discard a burner phone by stomping on it. However, while it makes for good cinema, you're probably just going to crack the screen. The memory, SDcard, and SIM are probably still intact. (Inside the phone is a lithium battery. If you don't step hard enough to crack the battery, then the memory is probably recoverable. And if you do crack the battery, then you have seconds to get clear before it catches fire and possibly explodes.)



The better way to dispose of a burner phone: Perform a factory reset to remove the call logs and contacts, then "accidentally" leave the phone in a crowded area. Someone else is almost certain to find it and walk off with it. This way, anyone who is tracking the IMEI and IMSI will start following the wrong person who's heading in a different direction.

Burner Level 4: Anonymity

Just getting a burner phone is not enough to be anonymous. You also need to consider operational security (OPSEC).

Consider this: Who are you going to call with your burner phone? If you call anyone who you previously called with the old phone, then there's an immediate trail from that known number back to your new phone.

By the same means, where are you going to go online? You might have a new IMEI, new IMSI, and new phone number, but if you log into your existing Facebook account, then you just linked everything back to the new no-longer-anonymous phone. Even if you create a new Facebook account, if you reach out to any of your previous contacts or forums, then you just linked back to yourself.

But it's actually worse than that. Often in movies, you'll see a gang of thieves or spies distribute new burner phones before starting a caper. They all receive their new phones and then immediately check that the phones work. The problem is, the cellular network can triangulate phone locations. (Nearby towers constantly track each phone's signal strength so the phone can be handed to the tower with the strongest signal.) If a bunch of new phones turn up at the same physical location, then they can immediately be linked together. If law enforcement recovers one phone, then they can trace the cell tower records back to a location where there were other new phones. Then the feds can guestimate how many people are in the group (assume one per phone) and track where they are now.

To really be anonymous, you need new phones with new SIMS and new phone numbers that only call other new numbers and use new accounts that were never linked back to any old identities. And you need to do this without getting any of the new devices into the same vicinity as any other new device. (This is much harder than it sounds.) Like the T-800 said to Sarah Connor in Terminator: Dark Fate: "If you're going to keep your phone in a bag of potato chips, then keep your phone in a bag of potato chips." It just takes one mistake to link any tracking back to you and all of your other co-conspirators.

Burner Level 5: Naked Man

At the far extreme end is a spy technique that is informally called the "naked man" (not literally naked). The idea is that an operative arrives in a new location with nothing: no phone, no laptop, no technology. When they need a phone or laptop, they buy it there (using cash). If they need an online account, they make one as they need it. This creates a new online identity with no links back to their previous identity.

There's just one problem: everyone leaves a digital trail. Having someone suddenly appear on the grid without any previous trail is a red flag and stands out like a sore thumb. Consider this:
  • How many people go through an airport with zero electronics? You stand out as a naked man. (Well, unless you're really old or traveling from a third-world nation. But between developed nations? Tech is the norm.)

  • Okay, so to look like everyone else, you enter the airport with a burner and you ditch it when you arrive. Except that now there's a record of the previous phone going offline and a new entity with zero digital trail coming online. The authorities can often rule out everyone who has an established digital trail and then identify you.

  • You go to buy new tech using cash. Due to theft and robbery, nearly all stores have cameras at the registers. Since the cameras capture your face, you're no longer anonymous, and since you're paying with cash, you stand out. In addition, some stores won't accept lots of cash for a sale; having too much cash at the register makes them a big target for armed robbery. (There's a known scam that hits retailers: buy something expensive with lots of cash, then have your friend rob the store to recover the cash.) Big transactions usually require a credit card, but a credit card will link the transaction back to you. (Even pre-paid credit cards are not anonymous, but that's a topic for some other blog entry.)

  • And we get back to the same burner problem: You have your newly purchased tech. How are you going to contact your remote associates? The instant that you reach out to anyone, you create a digital link back to yourself.
Practicing good OPSEC and tradecraft isn't something you can learn from a blog. This takes serious training, consideration, and support infrastructure. The dark conspiracy groups on Facebook who are talking about using burner phones? Those are the rank amateurs who are certain to get caught.

From Sunny Skies to the Solar System

23 June 2025 at 17:54
I'm continuing to look for ways to lower my energy bill, even if only by a few dollars. One of my ideas was to use solar panels. However, the roof on the office building isn't ideal for solar.
  • The optimal direction is East to South-East for morning and South-West to West for afternoon. Unfortunately, the southern facing parts of the roof have lots of small sections, so there's no place to mount a lot of solar panels. But I do have space for a few panels on the roof; probably enough to power the server rack.

  • All of the professional solar installation companies either don't want to install panels if it's less than 100% of your energy needs, or they want to charge so much that it won't be worth the installation costs. This rules out the "few solar panels" option from a professional installer.
Last year, I decided that it would be a good learning experience to make my own solar panel Energy Storage System (ESS). My goal was not to power the entire office or sell power back to the electric company. Rather, I wanted an off-grid solution to just power the server rack for a few hours each day. If it worked, it should save me somewhere between 20kWh and 40kWh per month. That's less than 10% of my utility bill, but it's better than nothing. And assuming I ran the numbers correctly, it should pay itself off in about 5 years. (I hoped to keep the costs significantly lower by doing the installation by myself.)

In the worst case, it may never earn enough to pay itself off. But at least I'll learn something about solar panels, energy storage systems, and high voltage.

Having said that, developing it myself was certainly full of unexpected surprises and learning curves. Each time I thought I had everything I needed, I ended up finding another problem. (Now I know why professional installers charge tens of thousands of dollars. I don't even want to think about how much of my own labor went into this.)

The Basic Idea

I started this project with a basic concept. For the rest of the details, I decided that I'd figure it out as I went along.
  1. Goal: I want an off-grid solar powered system for my server rack. It is not intended to run 24 hours a day, cover all of my energy needs, or sell excess power back to the utilities. I only want to reduce my power usage and related costs by a little. (When I consulted with professional solar installers, this is a concept that they could not comprehend.)

  2. Low budget: A professional installation can cost over $20,000. I want to keep it under $1,000. For me, I wanted this to be a learning experience that included solar power and embedded controllers.

  3. Roof: The original plan was to put some panels on the roof. Since I don't have much roof space, I was only going to have two panels that, under ideal conditions, could generate about 100 watts of power each, forming a 200W solar system. This won't power the entire office, but it should power the server rack for a few hours each day. (I ended up not going with a roof solution, but I'll cover that in a moment.)
The entire architecture is kind of overwhelming. Here's a drawing that shows what I'm doing:



And here's the final system:



Note: I'm naming a lot of brands to denote what I finally went with. This is not an endorsement or sponsorship; this is what (eventually) worked for me. I'm sure there are other alternatives, and I didn't necessarily choose the least expensive route. (This was a learning experience.)
  • Solar charger: The solar panels connect to a battery charger, or Maximum Power Point Tracking (MPPT) system. The MPPT receives power from the solar cells and optimally charges the battery. Make sure to get an MPPT that can handle all of the power from your panels! My MPPT is a Renogy Rover 20, a 20-amp charger that can handle a wide range of batteries. The two black wires coming out the bottom go to the battery. There's also a thin black line that monitors the battery's temperature, preventing overcharging and heat-related problems. Coming off the left side are two additional black lines that connect to the solar panels. (The vendor only included black cables. I marked one with red electrical tape so I could track which one carried the positive charge.) There's also a 10-amp fuse (not pictured) from the solar panels to the MPPT.

  • Battery: The MPPT receives power from the panels and charges up a moderately large battery: a 12V 100Ah LiFePO4 deep cycle battery. (Not pictured; it's in the cabinet.) When fully charged, the battery should be able to keep the servers running for about 30 minutes.

  • Inverter: On the right is a Renogy 2000W power inverter. It converts the 12V DC battery into 120V 60Hz AC power. It has two thick cables that go to the battery, with red going through a 20-amp fuse. (Always put fuses on the red/positive lines.)

  • Automatic Transfer Switch (ATS): At the top (yellow box) is the automatic transfer switch (ATS) that toggles between utility/wall power and the inverter's power. It has a 30ms transfer speed. I had been using this box for years to manually switch between power sources without interruption. The three cables coming out of it go to the two inputs: primary is the inverter and fallback is the wall outlet. The output line goes to the UPS in the server rack. The UPS ensures that there isn't an outage during the power transfer. It also includes a power smoother to resolve any potential power spikes or phase issues.

  • Output power: The ATS's output AC power (from grid or inverter) goes into a smart outlet (not pictured in the line drawing, but visible in the photo below as a white box plugged into the yellow connector at the top). This allows me to measure how much power the server rack consumes. There's a second smart outlet (not pictured) between the wall outlet and the ATS, allowing me to measure the power consumption from the utility. When I'm running off grid power, both smart outlets report the same power consumption (+/- a few milliamps). But when I'm running off the inverter, the grid usage drops to zero.

  • Controller: In the middle (with the pretty lights) is my DIY embedded controller. It reads the battery level and charging state from the MPPT and has a line that can remotely turn on and off the inverter. It decides when the inverter runs based on the battery charge level and available voltage. It also has a web interface so I can query the current status, adjust parameters, and manually override when it runs.

  • Ground: Not seen in the picture, there's grounding wire from the inverter's external ground screw to the server rack. The server rack is tied to the building's "earth ground". Proper grounding is essential for safety.
Everything is mounted vertically to a board that is hung from the side of the server rack. This allows me to easily take it down for any maintenance issues. (And when doing the initial testing, I could carry the entire thing outside.)

Even though I knew I'd be starting this project around March of this year, I started ordering supplies five months earlier (last November). This included solar panels, a solar charger, battery, and an inverter. I ordered other components as I realized I needed them. Why did I start this so early? I believed Trump when he said he would be imposing stiff tariffs, making everything more expensive. (In hindsight, this was a great decision. If I started ordering everything today, some items would cost nearly twice as much.)

Measuring Power

Before starting this project, I needed to understand how much power I'd require and how much it might save me on my utility bill.

As a software (not hardware) person, I'm definitely not an electrical engineer. For you non-electricians, there are three parts of electricity that need to be tracked:
  • Voltage (V). This is the electrical pressure pushing current through the wires. Think of it like the pressure in a water pipe.

  • Amps (A). This is the amount of current available. Think of this like the size of the water pipe. A typical desktop computer may require a few amps of power. Your refrigerator probably uses around 20 amps when the compressor is running, while an IoT embedded device usually uses 200mA (milliamps, or 0.2A, that's flea power).

  • Watts (W). This is the amount of work available. W=AΓ—V.
These measurements are often compared to flowing water. Volts identify how fast a river is flowing (the water pressure). Amps identify how large the river is, and watts measure the total power delivered by the river. A wide but slow moving river has high amps but low voltage. A narrow but fast flowing river has a low current (low amps) but a high voltage. Because of the relationship between W, A, and V, the electronics can adjust the A and V distribution while maintaining the same W.

W, A, and V are instantaneous values. To measure over time, you typically see Watt-hours (Wh) and Amp-hours (Ah). Your utility bill usually specifies how many Wh you used (or kilowatt-hours, kWh, for each 1000 Wh), while your battery will identify the amount of energy it can store in terms of Ah at a given V. If you draw fewer amps, then the battery will last longer.

Keep in mind, mixing up these units can really screw up the power calculations. For example, my 12V 100Ah DC battery is being converted to 120V AC power. If the AC uses a 1-amp load (like one server in the rack), then that's not 100 hours of battery; that's 10 hours. Why? 12V at 100Ah is 1200Wh. 1200Wh ÷ 120V = 10Ah at 120V, which is 10 hours at a 1-amp load. (And with the inverter's overhead and conversion loss, it's actually less.)

Parts and Parts

While I work with computers daily, I'm really a "software" specialist. Besides a few embedded systems, I don't do much with hardware. Moreover, the computer components that I deal with are typically low voltage DC (3.3V, 5V, or 12V and milliamps of power; it's hard to kill yourself if you briefly short out a 9V battery).

When it comes to high voltage, my electrical engineering friends all had the same advice:
  1. Don't kill yourself.

  2. Assume that all wires have enough power to kill you. Even when turned off.

  3. When possible, over-spec the components. If you need 5 amps, get something that can handle 10 amps. If you need 12 gauge wire (12awg), then use 8awg (a thicker wire). If you need 2 hours of power, get something that can provide 4 hours of power. You can never go wrong by over-spec'ing the components. (Not exactly true, but it's a really good heuristic.)
For the last year, I've been using some Shelly plugs to monitor the energy consumption of my server rack. Every hour I take a reading and store it in a database. I also wrote a web interface that can display the real-time information and graph the hourly usage. (Every vertical bar is an hour, and every color is one day.)



The lower part of the rack hosts FotoForensics, Hacker Factor, and my other primary services. It usually consumes about 230W of power, or 2A. (It can fluctuate up to almost 300W during a reboot or high load, but those don't last long.) The upper rack is for the development systems, and uses around 180W. (180W at 120V is 1.5A.) That's right, the entire rack usually draws less than 4 amps at any given time.
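Collecting those hourly readings is simple because the Shelly plugs expose a local HTTP API. Here's a minimal sketch of the idea (the IP address and log path are placeholders; this assumes a first-generation plug with the /status endpoint, and newer Shelly models use a different RPC-style API):
# Read the current power draw (watts) from a Shelly plug and append it to an hourly log.
WATTS=$(curl -s http://192.168.1.50/status | jq '.meters[0].power')
echo "$(date -Is) $WATTS" >> /var/log/rack-power.log
My actual setup stores the readings in a database and graphs them, but the polling boils down to a call like this.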

For this solar experiment, I decided to initially only power the upper rack with solar. (If it turns out to be really successful, then I might add in the lower rack's power needs.)

The Bad Experiences

I had a few bad experiences while getting this to work. I chalk all of them up to the learning curve.

Problem #1: The Battery
Setting up the MPPT, inverter, and ATS was easy. The battery, on the other hand, was problematic. There are lots of batteries available and the prices range wildly. I went with a LiFePO4 "deep cycle" battery because they last longer than typical lead-acid and lithium-ion batteries and are designed to be repeatedly charged and drained. LiFePO4 also doesn't have the "toxic fumes" or "runaway heat" problems that the other batteries often have.

I found a LiFePO4 battery on Amazon that said it was UL-1973 certified. (That's the safety certification for batteries used in stationary energy storage, like a solar project.) However, when it arrived, it didn't say "UL 1973" anywhere on the battery or manuals. I then checked the Underwriters Laboratories (UL) web site. The battery was not listed. The model was not listed. The brand was listed, but none of their products had UL certifications. This is a knock-off forgery of a battery. If they lied about their certification, then I'm not going to trust the battery.

Amazon said that the vendor handles returns directly. My first request to the vendor was answered quickly with an unrelated response. I wrote to them: "I'd like to return the battery since it is not UL certified, as stated on your product description page." The reply? "The bluetooth battery needs to be charged before you can use it." (This battery doesn't even have bluetooth!)

My second request to the vendor received no response at all.

I told my credit card company. They stopped payment, sent an inquiry to the vendor, and gave them 15 days to respond. Two weeks later, with no response, I was refunded the costs. The day after the credit card issued the refund, the vendor reached out to me. After a short exchange, they paid to have the battery returned to them.

The second battery that I ordered, from a different vendor, had all of the certificates that they claimed.

Problem #2: The Inverter
The first inverter that I got looked right. However, when I connected it to the ATS, the wall outlet's circuit breaker immediately tripped. Okay, that's really bad. (But also, really good that the circuit breaker did its job and I didn't die.) It turns out, inverters above a certain wattage are required to have a "neutral-ground bond". The typical American three-prong outlet has a hot, neutral, and ground wire. The N-G bond means that neutral and ground are tied together. This is a required safety feature. Every home and office circuit has exactly one N-G bond. (It's in the home or building's circuit breaker panel.)

The four-pole (4P) ATS ties all grounds together while it switches the hot and neutral lines. The problem: If the inverter and wall outlet both have a N-G bond, then it creates a ground loop. (That's bad and immediately trips the circuit breaker.) For most inverters, this functionality is either not documented or poorly documented. Some inverters have a built-in N-G bond, some have a floating neutral (no bond) and are expected to be used with an ATS, and some have a switch to enable/disable the N-G bond.

My first inverter didn't mention the N-G bond and it couldn't be disabled. Fortunately, I was able to replace it with one that has a switch. With the N-G bond safely disabled, I can use it with the ATS without tripping the circuit breaker.

Keep this in mind when looking for an inverter. Most of the ones I looked at don't mention how they are bonded (or unbonded).

Problem #3: The ATS
I spent days tracking down this problem. The ATS output goes to a big UPS. This way, any transfer delays or phase issues are cleaned up before reaching the computers. When the inverter turned on, I would see a variety of different problems:
  • Sometimes the UPS would run fine.

  • Sometimes the UPS would scream about an input problem, but still run off the input power.

  • Sometimes the UPS would not scream, and would slowly drain its internal battery while also using the inverter's power.

  • Sometimes the UPS would scream and refuse to use the input power, preferring to run off the UPS battery.
The problem was incredibly inconsistent.

If I removed the ATS, then the UPS had no problem running off utility power. If I moved the electrical plug manually to the inverter (with the N-G bond enabled), it also ran without any problems.

Long story short: Most automatic transfer switches have a "direction". If primary is utility and backup is the generator (or solar), then it demands to be installed in that direction. For my configuration, I want a battery-priority ATS, but most ATSs (including mine) are utility-priority. You cannot just swap the inputs to the ATS and have it work. If, like me, you swap them, then the results become incredibly inconsistent and will lead you down the wrong debugging path.

My solution? Someday I'll purchase a smart switch that is battery-priority. In the meantime, I have a Shelly smart plug controlling the utility power. My DIY smart controller tells the Shelly plug to turn the utility power on or off. When it turns off, the ATS immediately switches over to using the inverter's power. And when my DIY controller sees that the solar battery is getting low, it turns the utility power back on.

The added benefit of my method of switching the utility power is that I can control the transfer delay. The inverter takes a few seconds to start up. I have a 15-second timer between turning on the inverter (letting it power up and normalize) and turning off the utility power. This seems to help the UPS accept the transfer faster.
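The switching logic itself is simple. Here's a rough sketch, assuming Gen1-style Shelly relay endpoints (/relay/0?turn=on|off); the addresses are placeholders, and how you switch the inverter itself on and off depends on your hardware.

  # Sketch of the DIY transfer logic: bring the inverter up, let it stabilize,
  # then cut utility power so the ATS fails over to the inverter. Assumes
  # Gen1-style Shelly relays; addresses and the inverter relay are placeholders.
  import time, urllib.request

  UTILITY_PLUG = "192.168.1.51"    # Shelly plug feeding utility power to the ATS
  INVERTER_RELAY = "192.168.1.52"  # whatever relay enables the inverter

  def relay(ip, state):
      urllib.request.urlopen(f"http://{ip}/relay/0?turn={state}", timeout=10)

  def switch_to_solar():
      relay(INVERTER_RELAY, "on")
      time.sleep(15)               # let the inverter power up and normalize
      relay(UTILITY_PLUG, "off")   # ATS sees dead utility and fails over

  def switch_to_utility():
      relay(UTILITY_PLUG, "on")    # ATS switches back to utility
      time.sleep(15)
      relay(INVERTER_RELAY, "off")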

Problem #4: Over-spec'd Inverter
Remember that advice I got? Always over-spec the equipment? Well, that's not always a good idea. As it turns out, a bigger inverter requires more energy just to run (an idle draw of about 18W for a 2000W inverter vs 12W for a 1000W inverter). It also has worse conversion efficiency under a light load. (The inverter claims to be >92% efficient, meaning the 100Ah battery should deliver about 92Ah of usable capacity. But with a light load, it may be closer to 80%.)

I'll stick with the inverter that I got, but I could probably have used the next smaller model.

Problem #5: The Roof
I wanted to put the solar panels on the roof. I really thought this was going to be the easiest part. Boy, was I wrong.

There are federal, municipal, and local building requirements, and that means getting a permit. The city requires a formal report from a licensed structural engineer to testify that the roof can hold the solar panels. Keep in mind, I'm talking about two panels that weigh 14lbs (6kg) each. The inspector who goes up on the roof weighs more. If a big bird lands on my roof (we have huge Canadian geese), it weighs more. We get snow in the winter and the snow weighs more!

Unfortunately, the city made it clear that there is no waiver. I had earmarked $1000 for everything, from the panels to the battery, inverter, wires, fuses, mounting brackets, etc. I got quotes from multiple structural engineers -- they all wanted around $500. (There goes my budget!) And that's before paying for the permit. In effect, the project was no longer financially viable.

Fortunately, I found a workaround: awnings. The city says that you don't need a permit for awnings if they (1) are attached to an exterior wall, (2) require no external support, and (3) stick out less than 54 inches. My solar panels are mounted at an angle and act as an awning that sticks out 16 inches. (The mounts are so sturdy that I think they can hold my body weight.) No permit needed.

The awnings turned out to be great! They receive direct sunlight starting an hour after sunrise and it lasts until about 1pm in the summer. (It should get even more in the winter.) They continue generating power from ambient lighting until an hour before sundown. This is as good as having them on the roof!

The Scariest Part

High voltage scares me. (That's probably a healthy fear.) Connecting cables to the powered-off system doesn't bother me. But connecting wires to the big battery is dangerous.

Using rubber-gripped tools, I attached one cable. However, when I tried to connect the other cable, there was a big spark. It's a 100Ah battery, so that's expected. But it still scared the donuts out of me! I stopped all work and ordered some rubber electrical gloves. (Get rubber or nitrile, and make sure they are class 00 or higher.)

Along with the gloves, I ordered a huge on/off switch. This isn't your typical light switch. This monster can handle 24V at 275 amps! (It's good to over-spec.)

I connected the MPPT and inverter to one side of the switch. An 8awg cable that can handle 50 amps connects to the battery's negative pole. (Since the MPPT and inverter are both limited to 20 amps, the 50 amp cable shouldn't be a problem.)

With the gloves on, the switch powered off, and rubber-gripped tools in hand, I connected the switch to the battery. No zap or spark at all. Turning the switch on is easy and there is no spark or pop. This is the right way to do it.

Expected Savings

Without the solar project (just using the utility power), the server rack costs me about $35 per month in electricity to run.

I've been running some tests on the solar project's performance, and am very happy with the results.

Under theoretically ideal conditions, two 100W panels should generate a maximum of 200W. However, between power conversion loss, loss from cabling, and other factors, this theoretical maximum never happens. I was told to be happy if it generated a maximum of 150W. Well, I'm very happy because I've measured daily maximums between 170W and 180W received at the MPPT.

Fort Collins gets over 300 sunny days a year, so clear skies are the norm. With clear skies, the battery starts charging about 30 minutes after sunrise and gets direct (optimal) sunlight between 9am and 1pm. It generates an incredible amount of power -- the inverter drains the battery slower than the panels can charge it. For the rest of the afternoon, it slowly charges up the battery through indirect ambient light. The net result? It can run the upper half of the rack for over 10 hours.

This kind of makes sense:
  • The battery usually starts the day at about 50% capacity. It charges to 90% in under 2 hours of direct daylight.

  • In theory, I run the battery from 90% down to 20%. In practice, the battery usually hits 100% charged during the morning because it charges faster than it drains. It doesn't start dropping below 100% until the afternoon. (That's wasted power! I need to turn on more computers!)

  • I'm only using the upper rack right now (1.5A, 180W). The inverter consumes another 18W, so call it 200W of load. Assume a fully-charged 1200Wh battery with 920Wh available, draining at a rate of 200W. It should last about 4.5 hours. If I power the entire rack, it will be closer to 2 hours. And in either case, that countdown only starts when it's running off of battery in the late afternoon. (The sketch below runs the numbers.)
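Here's the back-of-the-envelope math, using the figures above. (Real-world runtime varies with load, temperature, and inverter efficiency.)

  # Rough runtime estimate from the numbers in the text above.
  usable_wh  = 920         # fully charged 1200Wh battery, ~920Wh actually available
  upper_rack = 180 + 18    # upper rack load plus the inverter's own draw (watts)
  full_rack  = 230 + 180 + 18

  print(f"Upper rack only: {usable_wh / upper_rack:.1f} hours")  # ~4.6 hours
  print(f"Entire rack:     {usable_wh / full_rack:.1f} hours")   # ~2.1 hours
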
We had one dark and stormy day, and one very overcast day so far. In both instances, it took most of the morning to charge the battery, but it still managed to run the upper rack for a few hours. Fort Collins has "surge pricing" for electricity. In the summer, that means 2pm to 7pm has the most expensive power (about 3x more than non-surge times). Fortunately, the batteries keep the rack running during much of that expensive period.

I'm aiming to use the battery during the surge pricing period. If I ran the numbers correctly, the solar setup might reduce my $35/month cost by $20-$25/month. At that rate, it will pay off the $1000 investment in under 4.5 years. If we have a lot of bad weather, then it might end up being 5 years. The batteries and panels will need to be replaced in 8-10 years, so as long as it pays off before then, I'll be in the profit range.

As self-paced learning goes, I don't recommend high voltage as an introductory project. Having said that, I really feel like I've learned a lot from this experiment. And who knows? Maybe next time I'll try wind power. Fort Collins has lots of windy days!

The Big Bulleted List

10 June 2025 at 12:36
Today's online environment permits the easy manipulation and distribution of digital media, including images, videos, and audio files, which can lead to the spread of misinformation and erode trust. The media authentication problem refers to the challenge of verifying the authenticity and integrity of digital media.

For four years, I have been reviewing the solution offered by the Coalition for Content Provenance and Authenticity (C2PA). In each blog entry, I either focused on different types of exploits, exposed new vulnerabilities, or debunked demonstrations. You would think that, after nearly 30 blog entries, I would run out of new problems with it. And yet...

I'm not the only person evaluating C2PA. I know of three other organizations, four government groups, and a half dozen companies that are doing their own independent evaluations. I learned about these groups the easy way: they start with an online search, looking for issues with C2PA's solution, and Google returns my blog entries in the first few results. They reach out to me, we chat, and then they go on with their own investigations. (With my blog entries, it's not just me voicing an opinion. I include technical explanations and step-by-step demonstrations. It's hard to argue with a working exploit.) In every case, they already had their own concerns and suspicions, but nothing concrete. My working examples and detailed descriptions helped them solidify their concerns.

"I've got a little list."

Near the beginning of this year, a couple of different groups asked me if I had a list of the known issues with C2PA. At the time, I didn't. I have my own private notes and my various blog entries, but no formal list. I ended up going through everything and making a bulleted document of concerns, issues, and vulnerabilities. The initial list was 15 PAGES! That's 15 pages of bulleted items, with each bullet describing a different problem.

I had the paper reviewed by some technical peers. Based on their feedback, I changed some statements, elaborated on some details, added more exploits, and included lots of references. It grew to 22 pages.

I noticed that Google Docs defaulted to really wide margins. (I'm not sure why.) I shrank the margins to something reasonable. The reformatting made it a 20-page paper. Then I added more items that had been overlooked. Today, the paper is 27 pages long, including a one-page introduction, one-page conclusion, and some screenshots. (The bulleted list alone is about 24 pages.)

With blogs, you might not realize how much information is included over time. It wasn't until I saw over 20 pages of bulleted concerns -- strictly from public information in my blogs -- that I realized why I was so passionate about these concerns. It's not just one thing. It's issue after issue after issue!

The current paper is almost entirely a list of items previously mentioned in my blog. There are only a few bullets that are from my "future blog topics" list. (After four years, you'd think I'd run out of new concerns. Yet, I'm still not done. At what point should people realize that C2PA is Adobe's Edsel?)

Last week, C2PA, CAI, and IPTC held their "Content Authenticity Summit". Following this closed-door conference, I was contacted by some folks and asked if they could get a copy of my "C2PA issues" list. I decided that, if it's going to get a wider distribution, then I might as well make my paper public. Having said that, here's a link to the current paper:

C2PA Issues and Evaluation (public release, 2025-06-09)

If you have questions, comments, thoughts, think there is something that should be added, or see any errors, please let me know! I'm very open to updating the document based on feedback.

Organizing Content

There are many ways to organize this data, such as by risk level, functionality, technical vs. usability, etc. I decided to organize the document by root cause:
  • Fundamental problems: Issues that cannot be fixed without a significant redesign.

  • Poor design decisions: Issues that could be fixed if C2PA changed their approach. (However, C2PA has been very clear that they do not want to change these decisions.)

  • Implementation issues: Bugs that could be fixed. (Many haven’t been fixed in years, but they are fixable with little or no undesirable side-effects.)
There are also topics that I am intentionally not including in the paper. These include:
  • Severity. While I don't think there are many minor issues, I don't try to classify the severity. Why? Well, one company might think an issue is "critical" while another may argue that it is "severe" or "medium". Severity is subjective, based on your requirements. I think the paper provides enough detail for readers to make their own judgment call.

  • Simple bugs. The existing code base is far from perfect, but I really wanted to focus only on the bugs due to the specifications or design decisions. Problems due to the specifications or policies impact every implementation and not one specific code base. However, the paper does include some complicated bugs, where addressing them will require a significant effort. (For example, c2patool has over 500 dependency packages, many of which are unvetted, include known vulnerabilities, and have not been patched in years.)

  • Solutions. With few exceptions, I do not recommend possible solutions since I am in no position to fix their problems. There may be an undisclosed reason why C2PA, CAI, or Adobe does not want to implement a particular solution. The paper does include some solution discussions, when every solution I could think of just introduces more problems; sometimes developers work themselves into a corner, where there is no good solution.

  • Alternatives. Except as limited examples, I intentionally avoid discussing alternative technologies. (And yes, there are some alternatives with their own trade-offs.) I want this paper to only be focused on C2PA.

  • Wider ramifications. If C2PA is deployed in its current state, it can lead to incredibly serious problems. It would become the start of something with a massive domino effect. Rather than focusing on theoretical outcomes, the paper is directed toward immediate problems and direct effects.
Simply seeing a random list of problems can be overwhelming. I hope this type of organization makes it easier to absorb the information.

The Most Important Issue

While the first question I usually receive is "Do you have a list?", the second question is almost always "What is the most important issue?" Looking at the bulleted list, you might think it would be the misuse of X.509 for certificate management. I mean, lots of problems in the paper fall under that general topic. However, I think the bigger issue is the lack of validation. C2PA is completely based on 'trust':
  • You trust that the name in the certificate represents the signer.

  • You trust that the information in the file (visual content, metadata, timestamps, etc.) is legitimate and not misrepresented by the signer.

  • You trust that the signer didn't alter any information related to the dependencies.

  • You trust that the timestamps are accurate.
And the list goes on. C2PA provides authentication without validation and assumes that the signer is not intentionally malicious. However, if I trust the source of the media, then I don't need C2PA to tell me that I can trust it. And if I don't trust the signer, then nothing in C2PA helps increase the trustworthiness.

Alternatives

Following some of my initial critiques on my blog, representatives from C2PA and CAI (all Adobe employees) asked me for a video chat. (This was on December 13, 2023.) During the call, C2PA's chief architect became frustrated with my criticisms. He asked me if I could do better. A few weeks later, I created VIDA, which was later renamed to SEAL: the Secure Evidence Attribution Label. SEAL is still being actively developed, with some great additions coming soon. The additions include support for derivation references (simple provenance), support for offline cryptographic validation, and maybe even support for folks who don't have their own domains.

SEAL is a much simpler solution compared to C2PA. While C2PA tries to do a lot, it fails to do any of it properly. In contrast, SEAL focuses on one thing, and it does it incredibly well.

Just as I've been critical of C2PA, I've been looking at SEAL with the same critical view. (I want to know the problems!) I've had my technical peers also review SEAL. I instructed them to be brutal and hyper-critical. The result? Two pages of bullets (it's a 3-page PDF with an introduction). Moreover, almost every bullet stems from the general problem of relying on DNS and networked time servers. (Everyone with a domain or time-based signing service has these problems; it's not just SEAL. If/when someone solves these problems, they will be solved for everyone, including SEAL.)

While I don't think the C2PA paper has many "minor" issues, I think all of SEAL's issues appear to be minor. For example, problems like domain squatting apply to anyone with a domain. It doesn't directly impact your own domain name, but could fool users who don't look too closely.

Here's the current draft of the SEAL Issues (2025-06-09 draft). Again, if you see anything wrong, missing, or just have questions or concerns, please let me know!

Independent Reviews

Both of these documents represent my own research and findings, with a few contributions from peers and associates. Other groups are doing their own research. I've shared earlier drafts of these lists with some of those other groups; most seem to use these lists as starting points for their own research. Having said that, I hope that these documents will help raise awareness of the risks associated with adopting new technologies without proper vetting.

I can't help but hum "I've Got a Little List" from The Mikado whenever I work on these bullet points.

My New Old Keyboard

5 June 2025 at 09:46
It should come as no surprise that I type a lot. And I really mean a lot. Almost every keyboard I own has worn off the letters on the keys. Then again, I type so much that I'm really a touch-typist; I rarely look at the keyboard. (Even before I took a typing class on a typewriter in elementary school, I was already typing as fast as I could talk. Do schools still teach typing?)

There are lots of different types of keyboards out there: high profile, low profile, feather sensitivity or heavy hitters, curved keys, uniform height, etc. Personally, I like the loud, heavy keyboards with high profiles (keys stand up) and no gap between the keys. This way, I can feel the keys without looking down and tell the difference between brushing a finger over a key and actually typing a letter. (If you can't hear my typing during a video call, then the keyboard isn't loud enough.)

Most of my keyboards have had black or gray keys. A few years ago (2022), I bought a "large print keyboard" for the fun of it. It had big keys, a high profile, and a loud click. The selling points for me were the huge letters and the bright yellow color.



Unfortunately, it didn't last very long. Within a few months, the black paint on the letters began to vanish. 'A', 'S', and 'D' were the first to go, followed by 'X', 'C', and 'V' (for cut, copy, and paste). Fast forward to today (3 years later):



That dark mark on the right shift key isn't a black scratch; it's where I've literally worn through the yellow plastic. It's not that I don't use the letters Q, P, H, or U; they just seem to have lasted longer. (I joked with my colleagues that the backspace and delete keys are in pristine condition -- because I don't make mistakes.)

The New Problems

When a keyboard gets worn down this much, I typically go out and buy a new cheap keyboard. Given that I wear through keyboards every few years, I have trouble justifying $100 for a fancy replacement. Give me a $10 cheap-plastic keyboard every few years and I'll be happy. (Seriously, I splurged $23 on the yellow keyboard. It lasted 3 years, so that's less than $8 a year. Before the yellow keyboard, I had a cheap $12 one that also lasted 3 years, so it cost $4 per year to use.)

Over the last 40+ years, I've seen the quality degrade as vendors cut costs by using cheaper materials. The old heavy IBM PC keyboards were built like tanks -- they never broke down, even if the letters might fade a little. The PS/2 keyboards (circa 1987-1997) had more plastic and occasionally the key switches would degrade before the print on the keys wore off. (I have one old PS/2 keyboard that types "jn" every time you press "n". Beneath each key is a switch. This problem might be a dirty contact, but I don't think I can open the keyboard up without breaking it.) Today's USB keyboards are extremely lightweight but also cheaply constructed; letters fade fast and the plastic on the keys might wear out. Today's keyboards are not built to last.

Making matters worse, most keyboards are made overseas. Between the (insane) tariffs and shipping delays, I don't want to wait. And with the current economic instability, I'd rather not spend the money, even on a new cheap keyboard, if I absolutely don't have to.

What's Old is New

Fortunately, I have a huge box of old keyboards in the storage area. It includes everything from modern USB to old PS/2 and the super old 5-pin DIN connectors. (I think the oldest keyboard in the box is from the early 1980s.) Some computer manufacturers would bundle a keyboard with every new computer. Other times I'd pick up a keyboard in a box of auction junk. (Often, I'd want something else at the auction, but the box being sold also contained keyboards.) Any keyboard I don't like, don't need, don't use, or is broken for some reason gets put in the big box of keyboards.

Today I went digging through the box, looking for something with the right profile and feel.
  • The first good one was a 105-key keyboard with a PS/2 connector. (Most US keyboards have 101 keys.) My computer doesn't have a PS/2 port, but in the "big box of old keyboards" was an old PS/2-to-USB adapter! That's the nice thing about keyboards -- they all use the same communication protocol. As long as you have the right adapter to plug it in, the computer will recognize it and it will just work.

    This new old keyboard was manufactured in 1992 by a company that no longer exists. (I looked them up. Today, there's a company with the same name, but they were founded in 2001.) And yet, the keyboard still works fine. Well, sort of. All of the standard "101" keys still work fine, but the custom "power", "sleep", "wake", and "Fn" buttons don't register when I press them. (Maybe I need to tweak the keyboard mapping? Probably not worth the effort.) Since it's not perfect, I went back to the box of keyboards.

  • The next keyboard had a bunch of sticky keys that push down but pop up slowly. (From an auction, someone probably spilled a drink on the keyboard a few decades ago.)

  • The original "Sun" keyboard looks like a PS/2 but doesn't work; it's probably not really communicating with PS/2. (When possible, stay away from proprietary connectors.)

  • I found one of my old keyboards that I used with my OS/2 workstation. After plugging it in, I remembered why I replaced it: the space bar was broken. Many space bars have a metal wire that ensures that the key goes down evenly. The wire fits into some plastic clips underneath. After years of use, those clips had broken off.
I finally settled on an old HP keyboard that was buried in the box. It's a C1405B #ABA, manufactured in 1992, back when HP actually made keyboards. (OMG, sellers on Etsy and eBay call it "Vintage"!) It's a heavy monster and yellowed from age, but no wear on the letters. It has a good feel and every key seems to work.

There's just one problem. It predates the appearance of the "Super" key ("Windows" or "Command" key on keyboards, next to the shift buttons). On my desk are two computers that share the same keyboard and mouse: a Linux box and a Mac. I use some software called 'Synergy' to link the desktops. As the mouse goes off the side of one monitor, it appears on the next computer's screen. Linux doesn't use the Windows/Command key, but Macs do. This missing key is going to be a problem... Fortunately, Synergy permits me to remap 'alt' to the Mac 'Command' key. (Problem solved.)

Macros

On my desktop computer, I have a few macros mapped to certain keys. For example:
  • I almost never use the function keys. I've mapped "F9" to toggle my mouse size. If I press it, then the cursor becomes larger -- which is great for video chats and sharing my screen -- the big icons help people to see my mouse. If I press F9 again, then the mouse returns to the normal small size.

  • I've remapped the "Pause/Break" button. (In 40+ years, I've never used that button to pause/break anything.) Instead, it turns on/off the audio recorder on my telephone. With the push of a button, I can record any call to an MP3. (I use it to record spam phone calls; I wrote about the script back in 2014.) If the phone rings from an unknown caller, I press the button to record and then answer the phone. (And yes, recording calls is legal in Colorado.)

  • The lower-right corner of most 101-key keyboards has a "Menu" button. I've remapped that to mute/unmute my speakers. (Sometimes I can't immediately find the app that is making sounds and I just want the computer to shut up while I take a call. Tap one key for mute.) However, this HP keyboard predates the "Windows" and "Menu" buttons, so I'll need to remap the mute/unmute to a different key. (Maybe F8; I never use that key!)
Unsurprisingly, the macros work with this new old keyboard. While the manufacturing quality has evolved over time, the keyboard communication protocol and key codes haven't changed.

Old Learning Curve

I think the biggest hurdle for this new old keyboard will be my own adjusting to the physical key spacing. Cheap keyboards and older keyboards often use different key sizes. With this keyboard, the spacing is a little wider than the yellow keyboard. It also has a different sensitivity. (Not bad, just different.) Then again, if I decide I don't like it, then I can always go back to digging through my big box of old keyboards.

Eleven Years of FotoForensics

9 February 2023 at 17:03
Today, FotoForensics turns 11 years old! When I first introduced FotoForensics, I didn't know if it would be used by anyone or even if the implementation would have problems with the load. (As I originally wrote, "I wonder how it will scale?") Today, it has received over 6,050,000 unique pictures (with over 800,000 in the last year) and it's in the top 80,000 of internet destinations (the exact position changes every few minutes, but it's around 80,000 right now). As far as scaling is concerned, it seems to be holding up well.

Science!

Even though the site is popular, there are always some people who wonder if it is "scientific" or if it really works. A quick search on Google Scholar turns up lots of scientific journal articles that discuss FotoForensics and Error Level Analysis. They all conclude that it does, in fact, work as advertised. Google Scholar returns over 400 results. Here is a random selection of examples:
  • K., P. B. M., Singh, K., Pandey, S. S., & O'Kennedy, R. (2019). Identification of the forged images using image forensic tools. In Communication and computing systems: Proceedings of the 2nd International Conference on Communication and Computing Systems (ICCCS 2018), December 1-2, 2018, Gurgaon, India. essay, CRC Press.
    Abstract
    The contents of the digital images can be easily manipulated with image editing software like Adobe Photoshop, Pixelmator, Inkscape, Fireworks, etc. In real life applications, it is indispensable to check the authenticity of the digital images because forged images could deliver misleading information and messages to our community. Different tools have been developed to detect the forged images. In literature, there is no study which presents an insight into image forensic tools and their evaluation on the basis of different criteria. Therefore, to address this issue, we present an insight into digital image forensic tools; and evaluate it on the basis of 15 different parameters like "error level analysis", "metadata analysis", "JPEG luminance and chrominance data", etc. For our experimental work, we choose "FotoForensics" tool to show the forged region in digital images; and JPEGsnoop tool has been used to extract the metadata of the images.

  • Kageyama, K., Kumaki, T., Ogura, T., & Fujino, T. (2015). Digital image forensics using morphological pattern spectrum. Journal of Signal Processing, 19(4), 159–162. https://www.jstage.jst.go.jp/article/jsp/19/4/19_159/_article/-char/ja/

  • Scheidt, N., Adda, M., Chateau, L., & Kutlu, Y. E. (2021). Forensic tools for IOT device investigations in regards to human trafficking. 2021 IEEE International Conference on Smart Internet of Things (SmartIoT). https://doi.org/10.1109/smartiot52359.2021.00010

  • Almalki, S., Almalki, H., & Almansour, A. (2018, November). Detecting Deceptive Images in Online Content. In 2018 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS) (pp. 380-386). IEEE. https://ieeexplore.ieee.org/abstract/document/8706216
This is nowhere near the complete list. I'm seeing dozens of journal articles every year. Some evaluate FotoForensics, some use it to support conclusions, and others treat it as a baseline for evaluating new techniques. Moreover, those are just the articles that talk about "FotoForensics". The number of journal articles is even higher if I search for "Error Level Analysis".

Legal Use

"Forensics" means "for use in a court of law." When it comes to understanding forensic tools, the courts use a few criteria to determine if the tool or expert witness is qualified. In the United States, the criteria varies by state, but it's usually either the Daubert standard (from Daubert v. Merrell Dow Pharmaceuticals Inc., 509 U.S. 579 (1993)) or the Frye standard (from Frye v. United States, 293 F. 1013 (D.C. Cir. 1923)). In either case, there are five criteria for determining if evidence and expert testimony should be considered or accepted by the court. I think FotoForensics addresses each of them to the extreme:
  1. Has the theory or technique in question been tested?
    In the case of FotoForensics, every algorithm and technique has been tested. Both by myself and by other experts in the field. The public FotoForensics service has a commercial counterpart. Every single one of the commercial customers has relied on their own independent tests before regularly using the tools.

  2. Has it been subjected to peer review and publication?
    This is a definite yes. It has both formally and informally been repeatedly subjected to peer review. While the original algorithms were published in a conference white paper, subsequent publications include this blog, the training material at FotoForensics, and the more than 400 third-party book and journal publications. (It's not just me writing about it.)

  3. Does it have known or potential error rates?
    The question of "error rate" has always been difficult to answer. Confidence intervals are part of a statistical hypothesis. The cryptographic hashes from the Digest analyzer are good examples here. We can compute the SHA1 hash of two pictures and determine the likelihood of a mismatch. With cryptographic hashes, different hash values means that there were different input data sets. The likelihood of a false-negative match, where two byte-per-byte identical files are marked as being different, is zero (0); it doesn't happen. However, two different files could generate the same SHA1 hash value. The computed odds are about 1 in 2160 (a huge number). It drops to 280 if we incorporate the Birthday Paradox.

    (Not all cryptographic hashes are the same. MD5 is considered 'weak'. A collision can be forced in around 2^18 tries, or about 1 in 262,144.)

    In contrast, ELA, Hidden Pixels, Metadata, and the other data extraction systems do not use a statistical hypothesis. These tools work like a microscope. What are the false-positive and false-negative rates for a microscope? It's a trick question; a microscope does not have them. As with other non-statistical systems, a microscope only identifies artifacts. The tests are deterministic and repeatable. It is up to a human to identify possible scenarios that are consistent with the observations. The documentation at FotoForensics identifies the various caveats and issues, but the tools never draw a conclusion. It's up to the human expert to evaluate the results and draw a conclusion.

    Since the various caveats and corner-case conditions are identified, it meets this requirement.

  4. Are there existing and maintained standards controlling its operation?
    Yes. Most of the algorithms are documented and fixed (have not changed in a decade). If there is an implementation error, then we perform updates (maintenance). And some of the dependent applications, like ExifTool for metadata extraction, are regularly updated for detecting more information. This meets the criteria.

  5. Has it attracted widespread acceptance within a relevant scientific community?
    Absolutely yes. Both the public and commercial versions are regularly used across a wide range of communities: mass media, financial, legal, insurance, UFO photo analysis (don't laugh), sales verification (receipt fraud is a big problem), contest validation, and more.
The courts also like to see historical precedence. The tools used at FotoForensics have been repeatedly used in legal cases. Everything from child custody battles and human trafficking to insurance and banking fraud. (I'm not the only expert using these tools.)

One Oversight

In my original (February 2012) announcement, I voiced some concerns about making tools publicly available. I was primarily concerned about possible misuse and the risks from educating criminals.

As for the tool being misused: I addressed this by releasing tutorials and challenges. Based on my web logs, these are some of the most popular documents I've ever written. Shortly after FotoForensics went live, a few trollish people posted bogus analysis on a wide range of topics to social media sites (Reddit, Twitter, Facebook, etc.). Each claimed that FotoForensics supported their arguments. I knew I did something right when those bogus claims would immediately be corrected by people who saw and understood the tutorials. (I don't have to police the internet; the community is doing that all by themselves.)

With regards to criminal behavior, I went so far as to write:
From an ethical viewpoint, I don't think this site violates concerns about educating criminals since (1) I don't distribute code, (2) bad guys generally don't like to submit their content to remote servers for evaluation, and (3) with the tutorial, people have the option to learn how to use the tool and are not left with a push-button solution.
Boy, was I wrong. Bad guys do like to submit their content to remote systems for evaluation! The public FotoForensics service regularly sees people developing new fraud techniques. Because new techniques stand out, I can often identify their tools and methods before they have a chance to deploy (weaponize) them for widespread fraud. Often, I can develop automated detectors before they distribute their forgery software. Over the years, I've written about everything from fraudulent scientific publications and government-sponsored techniques to widespread passport forgeries and commercially-sponsored fraud from Bayer and AvtoVaz.

FotoForensics is hardly a deploy-once-and-done service. I'm constantly learning new things and regularly improving it. I'm very thankful to my friends, partners, various collaborators, and the public for over a decade of helpful feedback, assistance, and insights. This year, I especially want to thank my mental support group (including Bill, Bob, and Dave), my totally technical support group (Marc, Jim, Richard, Wendy, Troy, and everyone else), Joe, Joe, Joe, AXT, the Masters and their wandering slaves, Evil Neal, Loris, and The Boss. Their advice, support, assistance, and feedback has been invaluable. And most importantly, I want to thank the literally millions of people who have used FotoForensics and helped make it what it is today.

An Itty Midi Mystery

27 January 2023 at 15:46
My online metadata viewer service, Hintfo, has been watching for any file formats that have little or no metadata support. It recently sent me an alert, informing me that the MIDI file format fits this problem space.

The Musical Instrument Digital Interface (MIDI) file format is really old, like from 1983. (Editor's note: Way to reach out to GenZ!) To put this into perspective, GIF came out four years later (1987), the first consumer digital camera appeared in 1990, JPEG was standardized in 1992, and PNG followed in 1996. This means that MIDI doesn't have metadata in any of the standardized formats (EXIF, XMP, IPTC, etc.) because it predates all of them.

There are plenty of really old and obsolete formats. For example, I doubt that anyone still regularly uses ZSoft's "PCX" for images or WordPerfect's WPD documents. However, while MIDI is old, it isn't obsolete. If you're a musician, then you are almost certainly familiar with MIDI files and electronic instruments with built-in MIDI support. I've also seen MIDI files used for ring tones and karaoke music.

What does a MIDI file contain? It stores music. But while MP3 and WAV encode sound waves (combined frequencies from each audible element), MIDI stores the actual music notes. The internal data structure has tracks, instruments, and the start and end times of each note.


(Screenshot of GarageBand)

When trying to conceptualize the format, think of MIDI like a digital format for a player piano, Guitar Hero, or a hand-crank music box.


(This is the "Happy Birthday" music box by Kikkerland.)

To play sound, there is a fixed set of music notes for pressing or plucking. As the music template plays, it identifies which note to play at a specific time. MIDI stores the same thing, but can be extended for a large ensemble. As the file is processed, it identifies which notes on which instruments are played at which time.

Inside a MIDI File

As far as metadata goes, this file format doesn't store very much:
  • It can store multiple tracks, with up to 15 simultaneous channels per track. Each track may contain multiple notes and even instrument changes.

  • It can store a few text fields, like "Text" (arbitrary text, usually contains the song's title), "Copyright", "Track Name", "Instrument Name", and "Lyric". (While harvesting MIDI files for testing, I've only seen files that use the Text, Track Name, and Instrument Name fields.)

  • It has a list of hard-coded values that denote different instruments. For example, "0" is an Acoustic Grand Piano, an Electric Grand Piano is "2", and a Glockenspiel is "9". There are 127 defined instruments that range from string to woodwind to percussion.
As an old file format, it's not very extendable. Uncommon or newer instruments are not listed as options. For example, I grew up in a household with an upright piano. However, "upright piano" isn't one of the 127 known instruments. (I guess I could choose a "Bright Acoustic Piano", but it was really more of a "Warm Acoustic Piano", which also isn't any of the known MIDI instruments.)

A MIDI-enabled instrument can be used to record music as you play, or play back music automatically. Even if the instrument is not known by name, you can still hook it up to a MIDI player and you can assign any identification number to any instrument. This is how the Floppotron works. There isn't a MIDI instrument for "document scanner" or "floppy drive", so he just assigned a number to each instrument. However, the instrument's hard-coded name won't match the real name. This means that a software MIDI player (for previewing on your computer without the instrument) will generate the wrong sound for your new instrument.

The MIDI file format is designed for playing specific notes at specific times. For example, it can be told to play a Grand Piano's middle C at 2.12 seconds into the song and hold it for 0.2 seconds. However, the MIDI file format can't easily represent instruments that don't have fixed note positions, like a theremin.

A Small Mystery

I didn't expect MIDI to have very much metadata. In fact, just seeing that it has text fields for copyright and lyrics came as a surprise to me.

However, the really unexpected thing is the lack of metadata support from other applications. There are almost no metadata viewers for MIDI files, and none appear to be consistent.
  • ExifTool is typically the go-to program for metadata extraction. It knows how to parse hundreds of different file formats, including many that are old and obsolete. However, ExifTool doesn't recognize MIDI.
    ---- ExifTool ----
    ExifTool Version Number : 12.45
    Error : Unknown file type
    ---- File ----
    File Name : miditest.mid
    Directory : .
    File Size : 24 kB
    File Modification Date/Time : 2023:01:26 13:13:52-07:00
    File Access Date/Time : 2023:01:26 13:13:52-07:00
    File Inode Change Date/Time : 2023:01:26 13:13:52-07:00
    File Permissions : -rw-rw-r--

  • MediaInfo is good for most media files (pictures, audio, and video). It identifies my test file as being a MIDI file, but that's it. Zero additional information.
    General
    Complete name : miditest.mid
    Format : MIDI
    Format/Info : RIFF Musical Instrument Digital Interface
    File size : 23.4 KiB

    Audio
    Format : MIDI
    Format/Info : RIFF Musical Instrument Digital Interface

  • FFmpeg supports tons of audio and video codecs. While it can display a lot of information about the formats it supports, FFmpeg doesn't recognize MIDI.
In the old days, every personal computer (OS/2, Windows 3.x, MacOS) natively supported MIDI. Old browsers also supported MIDI. (Some browsers supported MIDI through the old QuickTime media plugin.) These days, that isn't the case. On my Linux box, I had to install 'timidity' to just play MIDI files. Firefox, Chrome, Safari, and Edge do not support the format. Even Apple's current QuickTime player doesn't recognize MIDI. (But Apple's GarageBand does!)

While looking for tools that can display MIDI metadata, I found a bunch of MIDI applications for composing or editing. Others, like musescore3, can convert a MIDI file to sheet music! (This is page 1 of 29 for a short jazz song called "Mean Woman"):

Update: My friend, Bob, pointed out that this music looks really complicated because it's a transcription based on a human playing music. Humans are inaccurate; they might not hit a quarter note at exactly the quarter time and they might not hold it for exactly one quarter beat. The MIDI data recorded exactly what the human played, and the complex transcription tries to match the human's timing.

The closest thing I could find for actually displaying the metadata was a program called "lilymidi", which is part of the lilypond suite. While it lists text fields, track information, notes, and timings (using the "--pretty" parameter), it doesn't list instrument names. If you want instrument names, then you need to use musescore3 with XML output, but that omits the Track names and other text fields.

I did find a few Python scripts that claim to parse MIDI files, but they were really hit-or-miss on my test suite. They might display everything for one file, but skip a lot of data for other files. Just to understand the file format, I ended up spending a day building my own MIDI parser. (This is how I know that lilymidi, musescore3, and many of the Python scripts were missing information.)
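If you just want the text fields and instrument assignments without writing a parser, the third-party mido library can pull them. This is only an illustration (it is not my parser, and like the other Python options, your mileage may vary on unusual files):

  # Sketch: dump a MIDI file's text-style meta events and program changes
  # using the third-party mido library. Illustration only.
  import mido

  TEXT_META = {"text", "copyright", "track_name", "instrument_name", "lyrics", "marker"}

  mid = mido.MidiFile("miditest.mid")
  print(f"Format {mid.type}, {len(mid.tracks)} track(s), {mid.ticks_per_beat} ticks/beat")
  for i, track in enumerate(mid.tracks):
      for msg in track:
          if msg.is_meta and msg.type in TEXT_META:
              print(f"Track {i}: {msg.type} = {msg.text!r}")
          elif msg.type == "program_change":
              print(f"Track {i}: channel {msg.channel} -> program {msg.program}")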

A Hint about Information

My online metadata service, Hintfo, lets people upload all kinds of files in order to see the associated metadata. Since going live, Hintfo has mostly received JPEG, WebP, PNG, and PDF files. Less common (but still supported) are video, audio, and executable file formats. Hintfo applies different metadata analyzers depending on the type of file. Why? Because 'ExifTool' isn't always the best tool for evaluating metadata.

Hintfo only keeps the type of file (mime type) that is uploaded (or samples of files that cause problems, like crashes). The reason I'm watching for the file type is that I want to identify file formats that have weak metadata support or that appear to be hostile. So far, I have only seen a few "application/octet-stream" files, indicating an unknown file format. The logs suggest that these were corrupted files.

Recently, I had a few people upload some MIDI files, but they didn't see much due to lack of metadata support. I've since updated Hintfo to use lilymidi for metadata extraction. I strip out all of the notes and timing information since that's really overkill for a metadata viewer. I had wanted to convert the MIDI data to sheet music, but musescore3 uses QT for rendering, and that means having a desktop and display. (My production server is 'headless' and lacks any kind of display. Also, spawning xvfb with musescore3 takes too much time for a real-time web service.) I'm still looking for a better MIDI metadata extractor, but this is a really good start.

Six Million Pictures

19 January 2023 at 15:36
Last Saturday we hit a milestone at FotoForensics: 6 million unique pictures! I was really hoping that this achievement wouldn't be marred by porn so I could do a deep dive into it. (SPOILER ALERT: Not porn! Woo hoo!)

Here's the picture! It arrived on 2023-01-14 at 11:50:55 GMT:


I'm not big on following sports, celebrities, or pop culture, so I approached this picture with zero knowledge. The picture shows two women and a guy at some kind of club or restaurant. However, I don't know the people or the situation. This sounds like a great opportunity to do some image sleuthing. (Click on the picture to view it at FotoForensics.)

Side note: I'm writing this as a streaming flow of consciousness. I didn't gather the pictures or complete this investigation before I started writing.

Where to start? Metadata!

When evaluating a picture, it's always good to check the metadata. A camera-original picture will often include date, time, camera settings, and other information that can help track down the source. For example, an embedded time zone or region-specific device can provide a good guess about where the photo was taken. Similarly, many photo editors leave details in the metadata.

On the downside, many applications re-encode the image and strip out the source metadata. If the metadata was stripped, then there may be no camera or location information.

Unfortunately with this picture, there is no informative metadata. At minimum, this means that the picture has been resaved from some other photo.

The only interesting thing in the metadata is the ICC Profile. This specific profile is from Google and indicates that the picture was processed by an app -- either through an Android application or a Google service.

Hidden Pixels and Quality

JPEG encodes pixels using an 8x8 grid. If the image doesn't align with the grid, then there are hidden pixels along the right and bottom edges that pad out the image. This image is 940x788 -- neither dimension is divisible by 8, so there are 4x4 hidden pixels. (940+4 = 944, which is divisible by 8. Similarly, 788+4 = 792, which is also divisible by 8.) The encoded image is 944x792 pixels, but automatically cropped to 940x788 before being displayed.
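The math is simple enough to sketch:

  # Round each dimension up to a multiple of 8 to find the JPEG padding.
  def jpeg_padding(width, height):
      pad_w = (8 - width % 8) % 8
      pad_h = (8 - height % 8) % 8
      return pad_w, pad_h

  print(jpeg_padding(940, 788))   # (4, 4): encoded as 944x792, displayed as 940x788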

Different applications use different approaches for filling the JPEG padding. Adobe uses a mirrored pattern that often produces a butterfly-wing shape on high-contrast curves. In contrast, libjpeg just repeats the last pixel value, creating a stretched effect. However, a lossless crop often leaves the original uncropped pixels. With this picture, there is a stretched pattern used for the padding. That's consistent with libjpeg and not an Adobe product.


Similarly, different applications use different encoding tables. The 'JPEG %' analyzer shows that this image was encoded as a JPEG at 92% using the JPEG Standard.

While this doesn't tell us who these people are, the results from the metadata, hidden pixels, and JPEG % are consistent: this was re-encoded using a standard JPEG library. (Google uses standard libraries.) This was not last saved using an Adobe product.

The final quality test is the error level analysis (ELA). ELA evaluates the compression quality. Bright colors indicate the areas that will change more during a JPEG re-encoding. You should compare similar surfaces, similar textures, and similar edges. Any inconsistency, such as a flat surface that is at a different intensity from other flat surfaces, denotes an alteration.
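To make the concept concrete, here is a bare-bones sketch of the general idea behind ELA: resave the image at a known quality and amplify the per-pixel difference. This is only an illustration of the concept, not the FotoForensics implementation, and the quality and scaling values here are arbitrary.

  # Bare-bones error level analysis using Pillow: resave at a fixed JPEG
  # quality, then brighten the difference so it's visible. Illustration only;
  # this is not the FotoForensics implementation.
  from io import BytesIO
  from PIL import Image, ImageChops, ImageEnhance

  def ela(path, quality=92, scale=20):
      original = Image.open(path).convert("RGB")
      buf = BytesIO()
      original.save(buf, "JPEG", quality=quality)   # re-encode at a known quality
      buf.seek(0)
      resaved = Image.open(buf).convert("RGB")
      diff = ImageChops.difference(original, resaved)       # per-pixel |a - b|
      return ImageEnhance.Brightness(diff).enhance(scale)   # amplify for viewing

  ela("picture.jpg").save("picture-ela.png")   # "picture.jpg" is a placeholder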


With this picture, there are a couple of things that stand out:
  • All of the flat, smooth surfaces are equally dark. The dark clothing, dark ceiling, and even the smooth skin. (No comment about any potential plastic surgery to remove wrinkles.) An image that is this dark -- and yet last encoded at a high quality like 92% -- means that it has been re-encoded multiple times.

  • The areas with fine details (high frequencies), such as the lace, hair, and jewelry, are very high quality. This could be due to someone dramatically scaling the picture smaller, but it also could be due to selective editing. Someone likely touched up the faces and hair. In addition, Adobe products can boost high frequency regions. While this was not last processed by an Adobe product, the second-to-last processing could have been with an Adobe product.
If we can find the original picture, then I'd expect the people to not be as brightly lit or crisp; they appear to be selectively touched up. I also would expect to find an Adobe application, like Photoshop or Lightroom.

External Sources

Back in 2016, I wrote about different search-by-picture systems. FotoForensics includes quick links for sending the pictures to TinEye, Google Image Search, and Bing Image Search. These might find different web sites that host similar pictures. If they find any, then it can provide context.

Google's image search has undergone many changes. Prior to 2015, it was really good at finding variations of the same picture. Then they changed it to a system that uses AI to identify the content and shows you similar content. (In my 2016 example, I used a photo of Brad Pitt. Google's AI identified 'Brad Pitt' as the key term and returned lots of different photos of Brad Pitt, but none of the same photo.) Last year, Google replaced their system with Google Lens. According to Google Lens, this photo visually matches "Boys Tails Tuxedo with Cummerbund" from Walmart. (It's not even the same tux! And he doesn't have a cummerbund!)


At the top of the image in Google Lens is a button that says "Find image source". This does the type of "find similar picture" search that I want. Google associated the picture with the name "Lisa Marie Presley" and found news articles that included variations of the picture. For example, People Magazine has an article from last week titled, "Lisa Marie Presley, Daughter of Elvis and Priscilla, Dead at 54: 'The Most Strong and Loving Woman'". (Oddly, People put this in the "Entertainment" category. Do they think people's deaths are entertainment?) People's article included this picture:


The metadata includes a caption: "Priscilla Presley, Austin Butler and Lisa Marie Presley at the Golden Globes on Jan. 10, 2023. Shutterstock for HFPA". Now we know the who, where, and when. We can also see that this picture is vertically taller and contains more content. However, the image's URL shows that it was also post-processed by People's web site:
https://people.com/thmb/K08A8Ur6jWci4DJwdFzNT-vlzxg=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc():focal(924x19:926x21):format(webp)/Lisa-Marie-Presley-Hospitalized-after-Cardiac-Arrest-011223-5512728ae3084977bd9eb9e0001c3411.jpg

In order to serve this picture, their web server:
  • Stripped out any ICC Profile information. (The "strip_icc()" parameter.)

  • Selected a focal point. ("focal(924x19:926x21)")

  • Converted the file format to webp (dropping all JPEG metadata; "format(webp)").

  • Used variable compression to ensure the file size is no larger than 150,000 bytes ("max_bytes(150000)"). The resulting webp is 136,214 bytes.
However, these alterations imply that there is another source image out there somewhere that isn't altered.

Bing Image Search worked similarly to Google Lens. However, instead of identifying clothing, it identified the people. Oddly, when I first ran this test last night, it only identified Austin Butler and Priscilla Presley. Today (as I proofread my writing), it also identifies Lisa Marie Presley.

TinEye was more interesting. It didn't just find the picture, it found an expanded version of the picture at The Daily Mail! If you scroll past all of the disturbing paparazzi photos, you'll eventually find this image:


The picture is annotated with credits at the bottom and scaled very small; there's no original metadata. The only informative metadata says "Copyright Shutterstock 2023;121266844;5372;4000;1673420322096;Wed, 11 Jan 2023 06:58:42 GMT;0". However, this version is wider, showing another man in the photo! Who's he? The movie Elvis won an award at the Golden Globes. Priscilla and Lisa Marie are the real Elvis's wife/widow and daughter. The tall man in the middle is Austin Butler, who won Best Actor for his role as Elvis in the movie. The man who was cropped out is the movie's director, Mark Anthony "Baz" Luhrmann, who didn't win his nomination. (They cropped him out! Oh, the burn!)

You might also notice that the faces and hair are not as bright as the 6 millionth image. This version of the picture is darker. (The photo was likely taken in a room with bad lighting.)

Bigger Version?

I found another version of the picture at US Magazine.


This is a large image distributed by Shutterstock. It's not original, but it's much closer than my starting point.
  • The metadata says it was processed by Adobe Photoshop 2022 on a Mac.

  • There are still hidden pixels (not the original dimensions) and they show padding that is consistent with Adobe's butterfly pattern.

  • The JPEG quantization tables (JPEG %) are consistent with Adobe Save-for-Web quality 100 (equivalent to 99%).

  • ELA shows that the faces and hair of Austin, Lisa Marie, and Baz were selectively touched up. Priscilla's eyes appear touched up, but not her face.
Interestingly, even though this picture was touched up, the faces are visually darker and not as digitally sharpened compared to the previous versions. This picture shows edits, while the previous versions are edits on top of edits.

The Shutterstock ID "13707319l" finds the source picture's sale page: https://www.shutterstock.com/editorial/image-editorial/13707319l. (They list it as "Editorial" and not "Entertainment".) According to them, the largest size should be 5372x4000 pixels.

Much Closer!

I ended up finding the 5372x4000 picture at Closer Weekly. The URL is https://www.closerweekly.com/wp-content/uploads/2023/01/Lisa-Marie-Presley-Then-and-Now-Elvis-Daughter-Over-the-Years-.jpg. However, depending on your web browser, their web server may return a JPEG or WebP file. My Firefox web browser could only download the WebP version, but FotoForensics was able to retrieve the JPEG. The WebP lacks any informative metadata, but the JPEG has everything that was provided by Shutterstock.


The metadata still doesn't identify the type of camera. The annotated metadata was added using ExifTool 10.80. (ExifTool 10.80 is a production release that came out on 2018-02-22. Shutterstock really should update to get the latest patches.) The embedded information still identifies the people, but also includes the location!
Mandatory Credit: Photo by Shutterstock for HFPA (13707319l)..Priscilla Presley, Austin Butler, Lisa Marie Presley and Baz Luhrmann..80th Annual Golden Globe Awards, Inside, Beverly Hilton, Los Angeles, USA - 10 Jan 2023

(I find it interesting that none of the other photos include this "mandatory" credit.)

The ELA is also interesting -- it's almost entirely dark. That indicates a resave but no significant alterations.


With this version, there is no indication of selective editing to the faces. Visually, the faces are even darker (bad lighting) than the previous version. If you look at the full size picture, you can see that everyone has acne, freckles, pores, and other human features that were removed by the selective edits.

Now we know the history of this 6 millionth image:
  1. The Golden Globe Awards were held on 10 Jan 2023 at the Beverly Hilton in Los Angeles. (Technically, it's in a suburb called Beverly Hills). Priscilla Presley, Austin Butler, Lisa Marie Presley, and Baz Luhrmann posed for a photo at around 8:24pm (local time, according to the metadata). That same evening, Austin Butler won Best Actor for his role in the movie Elvis.

  2. A photo was taken of the ensemble. The metadata does not identify the photographer or the type of camera.

  3. The photo was sent to Shutterstock, where it was re-encoded (resaved) and metadata was added using a five-year-old version of ExifTool.

  4. The Shutterstock image went to a media outlet (like US Magazine), where the faces were selectively touched up using an Adobe application.

  5. The touched-up version was then cropped on the right to remove Baz (maybe because he didn't win). The remaining faces were further brightened and digitally smoothed out.

  6. The cropped version was further cropped (bottom, left, right, and top) with some kind of Google application. The cropping focused the content on the three people.

  7. Then the picture was uploaded to FotoForensics.
I'm certain that the source image used by the media came from Shutterstock (or a related company owned by Shutterstock). However, I don't know if the picture went from Shutterstock to Closer Weekly to US Magazine to The Daily Mail to People to somewhere else before ending up at FotoForensics, or whether it took some alternate path. In addition, different media outlets may have applied similar brightness and sharpening edits; these may be branches of variations and not a linear chain of edits. However, given the similarities in cropping, nested edits, and handling artifacts, I don't think the final version took a much shorter path.

The picture originally had limited circulation since it was only associated with the Golden Globes. However, two days later, Lisa Marie Presley was hospitalized and then died. This picture received a resurgence in reporting and viral dissemination because it was taken shortly before her death.

When I first started FotoForensics (back in 2012), I was thrilled to see it receiving a few hundred pictures per day. These days, it receives over a thousand a day (and some days with over 10,000). Excluding two network outages, the last time it received fewer than 1000 pictures in a single day was 2016-12-31. (Dec 31 is always a very slow day, weekends are usually slower, and Dec 31 on a weekend? Only 818 uploads.) Still, six million pictures is quite a milestone. And every one of those pictures has some kind of story behind it.

No Apps!

12 January 2023 at 18:12
I recently received requests from two different people who wanted help with their FotoForensics apps. It seems that their apps stopped working.

I do not provide a "FotoForensics app". Over the last decade, I have identified over a half-dozen knock-off apps that claimed to be "FotoForensics". These fake apps fall into 3 basic categories:
  • Malware. A few of the knock-offs install viruses on the user's device. They just reuse a good product's name in order to lure the victim into installing the software. If the application is not from the official vendor, then assume it is malicious.

  • Ads. Some of these knock-offs just wrapped my web service in their own application. This way, they can show ads and collect revenue from views and clicks. (I never see a penny of it, but my servers do all of the work.) My sites do not have ads. If you ever see an ad when viewing FotoForensics, this blog, RootAbout, Hintfo, or any of my other services, then your device is likely infected with adware or some other kind of unwanted application. I don't use ads.

  • Theft. One knock-off just wanted to use my service's name for their app. Basically, he wanted to hijack the name recognition. (Apple gave them 24 hours to change their app's name or be kicked out of Apple's app store.)
When I learn of these knock-offs, I have them pulled from the Apple and Android stores. I also look for their application signatures on my site. If I detect an unauthorized app, I block access and return some kind of nasty notice.

Last month, I blocked another unofficial app. These users had likely installed something with adware and malware.

Why no app?

For other projects (including some research-only testing), I've made everything from Apple-specific apps using Swift to cross-platform apps using Flutter and Progressive Web Apps (PWA). (Personally, I found PWAs to be the easiest to build.) It isn't that I don't know how to build an app. Rather, it's that I understand the limitations. Some things just don't work well in apps and FotoForensics is one of them.

With any kind of forensic analysis, you want consistent results. If you run a test on your computer and your friend runs the same test on his computer, then both computers should show the exact same results. If everyone runs the same software, then we all get the same result. However, different libraries sometimes do different things. For example, in 2014 I mentioned that I use the old libjpeg6b (with patches) for image analysis. This is because the newer libjpeg8 does something "more than JPEG". A forensic test run with libjpeg8 won't give the same results, and the differences it introduces are not part of the JPEG Standard.

For my tools, I make no assumptions about which image library you use. FotoForensics does the processing on the server side (using consistent libraries) and then shows the results in the web browser. The results are always PNG files, so I don't have to worry about the client's JPEG library version. I also remove any ICC color profile information in order to mitigate any color shifting by the web client.
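
As a rough illustration of that last step (and only an illustration; this is not FotoForensics' code, and the file names are placeholders), here's how a result can be written as a PNG with no ICC profile using Pillow:

from PIL import Image

result = Image.open("analysis-result-raw.png")    # placeholder: an analysis result
result.info.pop("icc_profile", None)              # drop any embedded ICC profile
result.save("analysis-result.png", format="PNG")  # always deliver a PNG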

Inconsistent JPEG Libraries

Whenever someone talks to me about creating an app for FotoForensics, I channel my inner Edna Mode: "No apps!"

Error Level Analysis (ELA) might seem like a really simple algorithm. You load a JPEG, save it at a known quality level (e.g., 75%), and then see how much it changed. However, ELA is very dependent on the JPEG library. Different JPEG libraries implement things just a little differently. libjpeg6b is different from libjpeg8, libjpeg-turbo, Microsoft's JPEG library, Apple's JPEG library, etc.
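
To make that concrete, here's a minimal ELA sketch in Python using Pillow. It only illustrates the general algorithm described above, not FotoForensics' implementation, and (as this section explains) its output depends on whichever JPEG library Pillow was built against:

import io
from PIL import Image, ImageChops

def ela(path, quality=75, scale=20):
    original = Image.open(path).convert("RGB")

    # Resave the image at a known JPEG quality level.
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer)

    # The per-pixel difference shows how much each region changed;
    # amplify it so the (usually tiny) differences are visible.
    difference = ImageChops.difference(original, resaved)
    return difference.point(lambda value: min(255, value * scale))

ela("test.jpg").save("test-ela.png")  # placeholder file names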

Many years ago, I created a small "corrupt JPEG" test file that demonstrates these rendering differences. While this JPEG is intentionally corrupted, it's not bad enough to stop most JPEG libraries from rendering it. The thing is, every JPEG library renders it differently.

Here's the test image. The different color blobs that you see will depend on your web browser:


I've been using this image for years to profile the underlying JPEG libraries. Depending on what is rendered, I can determine exactly which JPEG library and version some application is running.

For example:
  • libjpeg6b (follows the JPEG Standard).

  • libjpeg8: it may start the same as libjpeg6b, but the bottom half is very different.

  • libjpeg-turbo (2.1.3 or earlier): Microsoft Edge and older Chrome and Firefox browsers.

  • libjpeg-turbo (2.1.4 or later): current Chrome and Firefox browsers. It might look similar to the previous libjpeg-turbo, but there is a small difference.

  • libjpeg-turbo 2.1.4, but using Firefox on a Mac: the colors are a little different.

  • Apple's library, used by Safari (desktop) and iOS browsers (Mobile Safari, Mobile Chrome, and Mobile Firefox on an iPhone or iPad), is the only one that just gave up. (Of all of the libraries, this is probably the correct solution when encountering a corruption.)

  • Mastodon: I need to dig into what it uses for re-encoding images; it looks like the default Windows 10 library.

This isn't the entire list. Older Androids used a different library than current Androids. Windows 7 is different from Windows 8 is different from Windows 10, etc.
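
One way to automate this kind of profiling is to decode the test image and hash the resulting pixels: each decoder produces its own stable hash. The sketch below is only illustrative; the file name and the hash-to-library table are placeholders that you would populate by decoding the test file on systems with known libraries.

import hashlib
from PIL import Image

# Placeholder table mapping pixel hashes to known decoders.
KNOWN_RENDERINGS = {
    "0000000000000000": "libjpeg6b (example entry)",
}

def decoder_fingerprint(path="corrupt-test.jpg"):  # placeholder file name
    # Different JPEG libraries render the corruption differently, so a
    # hash of the decoded pixels acts as a decoder fingerprint.
    pixels = Image.open(path).convert("RGB").tobytes()
    return hashlib.sha256(pixels).hexdigest()[:16]

print(KNOWN_RENDERINGS.get(decoder_fingerprint(), "unknown decoder"))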

It's not good science if the results cannot be reproduced by someone else. You don't want to do media forensics on any device where the results can vary based on some unspecified library version. This is why I don't have an app. If I provided an app, it would need to be massive in order to include all of the known, trusted, and vetted libraries. Instead, I use a web interface and have the server perform the evaluation. The FotoForensics web site provides consistent results regardless of your web browser.

Unnecessary Apps

There are many different types of apps that you should never install. These include redundant functionality apps (flashlights, keyboards, etc.) that duplicate existing functionality while requiring invasive access to your device, customer loyalty apps that are really nothing more than a front for deep user-tracking, and ineffective apps. For example, most anti-virus apps are ineffective due to localized sandboxing and may actually be trojan malware.

In addition, there are some apps that really should make you wonder: why does it need to be an app? For example, do you really need an app for your coffee maker or washing machine? Then again, I'm opposed to apps that disable the house alarm and unlock the front door. (Lose your phone, lose your home.) Beyond the physical, there are some impressive apps for making fake photos. Fake images for fun is one thing, but some of these apps seem clearly designed for people interested in committing insurance and banking fraud. Do we really need to make crime easier?

Finally, there are some things that should not be apps for various technical reasons. Kim Rust wrote a great list of technical reasons why you shouldn't build an app. The list includes insufficient resources (time, money, engineering effort) to support the app and apps that provide minimal functionality. Her reason #4 really matches my concerns: "Don't Build an App When it provides no improvement upon your mobile website".