MozillaZine

Maildir format and number of files downloaded

jrrp
 
Posts: 2
Joined: May 24th, 2018, 7:51 am

Post Posted May 24th, 2018, 1:21 pm

Hello!

My setup:
- several IMAP accounts
- TB configured to use the Maildir format
- TB configured to download a copy of all mails

I assume that in the Maildir format, each single mail is stored in a single file. So if I go into the file folder, I should find as many files as mails in the corresponding mail folder. Unfortunately, TB downloads the mails many times.

Example 1:
Mail folder contains 4057 mails (almost no attachments, 70 MB).
Corresponding file folder on hard disk contains 440000 files.
However, if I sort the files by size, I see that it is not the case that most mails have been downloaded many times; rather, a few mails have been downloaded literally thousands of times.
If I choose Repair Folder in Properties, the 440000 files are deleted and only 4057 files are left (as expected). But 30 minutes later, there are already 4080 files, one hour later even more, and the next day many more files.

Example 2:
Mail folder contains 21752 mails (almost no attachments, 344 MB).
Corresponding file folder on hard disk contains 227564 files.
Using the file sizes and the message IDs, I have analyzed which mails have been downloaded how many times. It turns out that out of 21752 mails, 10 mails have been downloaded exactly 17145 times each, accounting for 171450 files. That still leaves 56114 files, so some other mails have been downloaded more than once as well. As a remark, the 17145 files corresponding to the same mail do not all have the same checksum; instead, they fall into groups of 6 or 7 files that share a checksum.
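An analysis like this can be scripted. The following is only a minimal Python sketch, not the exact method used above: it assumes that every file in the given directory is one raw message, and the function name `summarize_folder` is made up for illustration. For each Message-ID it reports how many files carry it and how many distinct checksums those files have.

```python
import hashlib
from collections import defaultdict
from email.parser import BytesHeaderParser
from pathlib import Path


def summarize_folder(folder):
    """Map each Message-ID to (number of files, number of distinct checksums).

    Assumption: every regular file in `folder` holds exactly one raw message.
    """
    digests_by_id = defaultdict(list)
    parser = BytesHeaderParser()  # parses the header block only, not the body
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        data = path.read_bytes()
        headers = parser.parsebytes(data)
        # Header lookup is case-insensitive; files without the header
        # are grouped under the empty string.
        message_id = str(headers.get("Message-ID") or "").strip()
        digests_by_id[message_id].append(hashlib.sha256(data).hexdigest())
    return {
        message_id: (len(digests), len(set(digests)))
        for message_id, digests in digests_by_id.items()
    }
```

An entry such as `(17145, 2500)` would mean 17145 files share that Message-ID but only 2500 distinct file contents exist among them (the second number is a made-up illustration, not a value from this thread).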

TB creating more files than mails happens with two different mail accounts from different providers, neither of which is Gmail.

I have searched here and elsewhere, but I haven't even found a description of this issue. I have no clue what is happening here. Am I doing something wrong? Is it a wrong TB setting? Can TB be set to use the message ID to identify IMAP mails?

Thank you very much in advance for any suggestion!

Best,
jrrp

tanstaafl
Moderator

Posts: 44461
Joined: July 30th, 2003, 5:06 pm

Post Posted May 26th, 2018, 10:58 pm

What version of Thunderbird are you using?
Does this problem occur with each IMAP account?
If "Local Folders" uses maildir does it have the same problem?
Have you always had this problem? If not, did it start shortly after you upgraded versions?

Are you using https://addons.mozilla.org/en-US/thunde ... -messages/ to remove duplicates or doing it with a utility? According to https://www.dovecot.org/list/dovecot/20 ... 47552.html that add-on works with maildir.

I haven't found anybody else in these forums or the SUMO forums having a similar problem. Usually messages being repeatedly downloaded is due to a problem with the popstate.dat file used by the POP account to keep track of what's been downloaded (not relevant in your case), or a corrupted global-messages-db.sqlite used by global search. The workaround for the latter is to delete global-messages-db.sqlite and global-messages-db.journal and let it rebuild the search index. I suggest you temporarily disable global search/index using tools -> options -> advanced -> general to simplify things. There are other ways to search.

You said "10 mails have been downloaded exactly 17145 times each". Do the duplicates have the same Message-Id: header as the original but a different file name? Are all of the filenames unique?

Do all of the files in each directory seem to start with the same numeric prefix? For example in my gmail inbox\cur directory the first file is 1522598109434000, the last file is 1527198699426000 and all of the files in that directory start with 152.

I seem to have a handful of messages with the same filename except for a -1 suffix. i.e. 1527198282287000 and 1527198282287000-1 . In each case they have the same date/time stamp, have different Message-Ids, and a different message body. So they're not duplicate messages. Do any of your messages have a numeric suffix like that?
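The numeric file names look like timestamps in microseconds since the Unix epoch. That is an inference from the values quoted above, not something confirmed in this thread. Under that assumption, a minimal Python sketch (the helper name `parse_maildir_name` is hypothetical) splits a file name into its timestamp and its optional `-N` suffix:

```python
import re
from datetime import datetime, timedelta, timezone

# Digits, optionally followed by "-N" (e.g. "1527198282287000-1").
NAME_RE = re.compile(r"^(\d+)(?:-(\d+))?$")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_maildir_name(name):
    """Return (UTC datetime, suffix number) or None if the name does not match.

    Assumption: the leading digits are microseconds since the Unix epoch.
    A name without a "-N" suffix gets suffix number 0.
    """
    match = NAME_RE.match(name)
    if match is None:
        return None
    when = EPOCH + timedelta(microseconds=int(match.group(1)))
    suffix = int(match.group(2)) if match.group(2) else 0
    return when, suffix
```

For instance, `parse_maildir_name("1527198699426000")` yields a time on May 24th, 2018 (UTC), which is consistent with the dates of these posts.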

I've never had to use repair folder with a maildir based account. With mbox based accounts I've never noticed it affect the content of the mbox file; it only affects the *.msf file (cache of the folder listing).

I had used a "Daily" build of Thunderbird to convert some mbox accounts to maildir and the accounts seem to work okay with Thunderbird 52.8.0 but the specified local directory doesn't have the maildir files. My gmail account which had been created using maildir looks okay. Let me create a new profile and then see if I can duplicate your problem.

I've found several non-Thunderbird posts/blogs that talk about problems removing duplicate files in maildir. For example https://serverfault.com/questions/25566 ... om-maildir . Unfortunately none of them talk about why the duplicates were created or how often this problem occurs.

https://wiki.mozilla.org/Thunderbird/Maildir
https://wiki2.dovecot.org/MailboxFormat/Maildir

jrrp
 
Posts: 2
Joined: May 24th, 2018, 7:51 am

Post Posted May 27th, 2018, 6:03 am

Dear tanstaafl,

thank you so much for your long and helpful reply.

tanstaafl wrote:What version of Thunderbird are you using?

I am using the latest version of Thunderbird in the release channel, presently 52.8.0 32-bit under Windows 10.

tanstaafl wrote:Does this problem occur with each IMAP account?

I have 7 accounts in Thunderbird, all of them with IMAP and with Maildir. However, only two of them are heavy accounts with thousands of messages. As far as I can see, it happens only with one of the heavy use accounts. However, it does not happen in all mail folders in this account. Since it is only a few mails being downloaded many times, I suspect that the problem is directly caused by those mails. In any case, I will delete all local files from this account, disable global search (see below) and see what happens when Thunderbird recreates and downloads all messages.

tanstaafl wrote:If "Local Folders" uses maildir does it have the same problem?
Have you always had this problem? If not, did it start shortly after you upgraded versions?

I do not have any local folders. I have had this problem since I switched to using Maildir in June 2017. Then I created a brand new Thunderbird profile.

tanstaafl wrote:Are you using https://addons.mozilla.org/en-US/thunde ... -messages/ to remove duplicates or doing it with a utility? According to https://www.dovecot.org/list/dovecot/20 ... 47552.html that add-on works with maildir.

I am not using the Remove Duplicate Messages extension. I believe that this extension removes duplicate messages. But my messages are not duplicated; they appear just once. It is only the files that are downloaded many times.

tanstaafl wrote:I haven't found anybody else in these forums or the SUMO forums having a similar problem. Usually messages being repeatedly downloaded is due to a problem with the popstate.dat file used by the POP account to keep track of what's been downloaded (not relevant in your case), or a corrupted global-messages-db.sqlite used by global search. The workaround for the latter is to delete global-messages-db.sqlite and global-messages-db.journal and let it rebuild the search index. I suggest you temporarily disable global search/index using tools -> options -> advanced -> general to simplify things. There are other ways to search.

Thank you for this suggestion. My global-messages-db.sqlite is more than 800 MB in size. I have disabled global search and I will repair (recreate) the mail folders to check whether the problem disappears.

tanstaafl wrote:You said "10 mails have been downloaded exactly 17145 times each". Do the duplicates have the same Message-Id: header as the original but a different file name? Are all of the filenames unique?

Yes, the 17145 files have exactly the same Message-ID but a different and unique file name.

tanstaafl wrote:Do all of the files in each directory seem to start with the same numeric prefix? For example in my gmail inbox\cur directory the first file is 1522598109434000, the last file is 1527198699426000 and all of the files in that directory start with 152.

Yes, same here. In one of the problematic folders, the first file is 1527171250763000, the last one 1527408775703000.

tanstaafl wrote:I seem to have a handful of messages with the same filename except for a -1 suffix. i.e. 1527198282287000 and 1527198282287000-1 . In each case they have the same date/time stamp, have different Message-Ids, and a different message body. So they're not duplicate messages. Do any of your messages have a numeric suffix like that?

Yes, same here. Some files have a -1, -2, and some even have a -3 suffix. As in your case, they have different Message-Ids, different file sizes etc. They are not duplicate messages and not duplicate files.

tanstaafl wrote:I had used a "Daily" build of Thunderbird to convert some mbox accounts to maildir and the accounts seem to work okay with Thunderbird 52.8.0 but the specified local directory doesn't have the maildir files. My gmail account which had been created using maildir looks okay. Let me create a new profile and then see if I can duplicate your problem.

Thank you so much, but in the meantime, I have disabled global search and have had Thunderbird recreate and download all messages from the affected account. The problem is gone! The number of files now matches exactly the number of mails, plus a few .msf files. The file global-messages-db.sqlite is less than 2 MB. It seems that the problem is solved.

Many thanks again for all your efforts.

Best,
jrrp
