How to copy XMP metadata between Images
Copying XMP metadata between images isn't straightforward. Read how it's done correctly.
27 October 2021
Introduction
This article is obsolete. Please read the new article on XMP
XMP (Extensible Metadata Platform) is a metadata format introduced by Adobe Systems Inc. Sometimes we need to alter an image (resize it or change pixels) but want to keep all of its metadata. This turns out to be less straightforward than expected when dealing with XMP metadata. There are generally two ways to copy XMP data in Python using a dedicated framework:
- py3exiv2, a Python binding to Exiv2, which is a tool for managing image metadata, and
- Python XMP Toolkit, which works with XMP image data.
The main problem with those frameworks is that after copying the XMP data to another image, the size of the XMP metadata changes.
For the following experiments we will use this test image, which we will refer to as original.jpg.
XMP hidden in the JPEG
Let us look more closely at this using ImageMagick to check the size:
> identify -verbose original.jpg
Profile-xmp: 28401 bytes
Another way to check is with the ExifTool by Phil Harvey:
> exiftool -v original.jpg
JPEG APP1 (28430 bytes):
+ [XMP directory, 28401 bytes]
Vanilla copying
The destination image that will receive the XMP content will be created on the fly. With the Python XMP Toolkit we can copy the data the following way.
from PIL import Image
from libxmp import XMPFiles
source = 'original.jpg'
dest = 'new.jpg'
# Create the destination image on the fly
new_image = Image.new(mode="RGB", size=(200, 200))
new_image.save(dest)
# The source is only read; the destination has to be opened for update
xmpfile = XMPFiles(file_path=source)
xmpfile2 = XMPFiles(file_path=dest, open_forupdate=True)
xmp = xmpfile.get_xmp()
# Check that the XMP can be written before writing it
if xmpfile2.can_put_xmp(xmp):
    xmpfile2.put_xmp(xmp)
xmpfile.close_file()
xmpfile2.close_file()
The other method involves using the py3exiv2 library:
import pyexiv2
from PIL import Image
# Create the destination image on the fly
new_image = Image.new(mode="RGB", size=(200, 200))
new_image.save("new.jpg")
# Read the metadata of the source image
metadata_1 = pyexiv2.ImageMetadata('original.jpg')
metadata_1.read()
metadata_1.modified = True
# Read the (still empty) metadata of the destination image
metadata_2 = pyexiv2.ImageMetadata('new.jpg')
metadata_2.read()
# Copy the metadata over and write it to new.jpg
metadata_1.copy(metadata_2, xmp=True)
metadata_2.write()
Let us now check the size of the XMP section in the image we just created.
> identify -verbose new.jpg
Profile-xmp: 19658 bytes
Let's check with exiftool as well, just to be sure that ImageMagick is not misreading the metadata.
> exiftool -v new.jpg
JPEG APP1 (19687 bytes):
+ [XMP directory, 19658 bytes]
As you can see, there is a discrepancy of roughly 8,700 bytes (28401 versus 19658), caused by a different XMPToolkit tag and some reformatting of the bytestring. There might also be some custom XMP data that is not copied. Thus, to be completely sure that no information is lost, changed or reformatted, we can directly copy the whole XMP part of the bytestring to the new image.
How XMP starts...
We run the following piece of code to see what the bytestring looks like.
filename = 'original.jpg'
# Read the whole file as raw bytes
with open(filename, 'rb') as file:
    contents = file.read()
print(contents)
The output looks quite overwhelming. According to the XMP specification, the following line indicates the beginning of the XMP section.
\xff\xe1o\x10http://ns.adobe.com/xap/1.0/\x00<?xpacket begin=\'\xef\xbb\xbf\' id=\'W5M0MpCehiHzreSzNTczkc9d\'?>\n<x:xmpmeta xmlns:x=\'adobe:ns:meta/\' x:xmptk=\'Image::ExifTool 10.96\'>
\xff\xe1 is the APP1 marker, which is 2 bytes long. It is followed by 2 bytes that hold the size of the section and 29 bytes for the namespace. The rest of the bytestring is the XMP packet itself. In our image the size field is o\x10 (the byte 6F is displayed as the ASCII character o, so the value is 6F10 in hexadecimal), which corresponds to 28432 bytes. This value counts the 2 bytes of the size field itself, so the content of the section is 28430 bytes long, which is exactly what exiftool reports above. Subtracting the 29 bytes of the namespace leaves 28401 bytes for the XMP packet, the size that both ImageMagick and exiftool show.
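The arithmetic above can be checked with a few lines of Python. This is only a small sketch that works on the header bytes quoted earlier (the XMP packet itself is left out); the variable names are chosen for illustration.

```python
import struct

# First bytes of the APP1 section quoted above (the XMP packet body is omitted).
header = b"\xff\xe1o\x10http://ns.adobe.com/xap/1.0/\x00"

marker = header[:2]                                    # APP1 marker
(segment_length,) = struct.unpack(">H", header[2:4])   # big-endian 16-bit size field
namespace = header[4:]                                 # namespace incl. trailing NUL

payload_length = segment_length - 2                # the size field counts itself
packet_length = payload_length - len(namespace)    # the remainder is the XMP packet

print(segment_length, payload_length, len(namespace), packet_length)
# prints: 28432 28430 29 28401
```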
... and how it ends
The XMP section ends where the next section starts, which is again indicated by a marker. According to the exiv2 documentation, that marker is \xff\xdb, which introduces the DQT (Define Quantization Table) section.
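Looking for \xff\xdb assumes that the DQT section directly follows the XMP section and that these two bytes do not occur inside the XMP data. As an alternative sketch (not the method used in this article), the size field can be used to find the end of the section. The helper name find_xmp_segment is hypothetical, and the code assumes a well-formed JPEG in which every section before the image data carries a size field.

```python
import struct

XMP_NAMESPACE = b"http://ns.adobe.com/xap/1.0/\x00"

def find_xmp_segment(data: bytes):
    """Return (start, end) offsets of the APP1 section holding XMP, or None."""
    if data[:2] != b"\xff\xd8":              # a JPEG starts with the SOI marker
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None                      # not positioned on a marker: give up
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):           # SOS or EOI: no more metadata sections
            return None
        # The size field counts itself but not the 2-byte marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker == 0xE1 and data[pos + 4:end].startswith(XMP_NAMESPACE):
            return pos, end
        pos = end                            # jump to the next section
    return None
```

With the returned offsets, data[start:end] is the complete section including the marker and the size field, regardless of which section comes next.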
Working script
Now we just need to copy the data between the start marker \xff\xe1o\x10http://ns.adobe.com/xap/1.0/\x00 and the end marker \xff\xdb, ending up with the following script:
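def add_xmp(source: str, dest: str):
    # Read the source image; it is not modified
    with open(source, 'rb') as file_1:
        o_img = file_1.read()
    xmp_start = o_img.find(b'http://ns.adobe.com/xap/1.0/\0')
    if xmp_start == -1:
        return  # the source contains no XMP section
    xmp_end = o_img.find(b'\xff\xdb', xmp_start)
    if xmp_end == -1:
        return  # no DQT marker after the XMP section
    # Step back 4 bytes to include the APP1 marker and the size field
    xmp_str = o_img[xmp_start - 4:xmp_end]
    with open(dest, 'r+b') as file_2:
        d_img = file_2.read()
        # Insert the XMP section right before the first DQT marker
        insert_at = d_img.find(b'\xff\xdb')
        if insert_at == -1:
            return  # no DQT marker in the destination
        new_str = d_img[:insert_at] + xmp_str + d_img[insert_at:]
        file_2.seek(0)
        file_2.truncate()
        file_2.write(new_str)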
Now the XMP size should be the same in the new image. I hope that was helpful and clarified the process of copying XMP data.
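The size can also be compared directly in Python instead of running identify or exiftool again. The following is a minimal sketch; the helper names xmp_segment_length and same_xmp_size are hypothetical, and the code assumes that the first occurrence of the namespace string belongs to the XMP section.

```python
import struct

XMP_NAMESPACE = b"http://ns.adobe.com/xap/1.0/\x00"

def xmp_segment_length(data: bytes):
    """Return the value of the size field in front of the XMP namespace, or None."""
    ns_at = data.find(XMP_NAMESPACE)
    if ns_at < 4:                    # not found, or no room for marker and size field
        return None
    (length,) = struct.unpack(">H", data[ns_at - 2:ns_at])
    return length

def same_xmp_size(path_a: str, path_b: str) -> bool:
    """True if both files contain an XMP section with the same size field."""
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        size_a = xmp_segment_length(file_a.read())
        size_b = xmp_segment_length(file_b.read())
    return size_a is not None and size_a == size_b
```

For the images in this article, the check would be called as same_xmp_size('original.jpg', 'new.jpg').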