Re: How Genetic studies reveal new relationships, species
Posted: Fri Dec 13, 2024 1:01 pm
Some comments and insights came up on another thread concerning mtDNA COI, trees showing purported taxonomic relationships, and sample sizes. They are copied below because they apply to the topic of genetics and help in understanding what these trees mean in recent publications.
*************************
Many of Huang's friends didn't like the idea that there could be several species within what looked to them like a single species. A number of Huang's illustrated mandarinus group specimens were misidentified: his so-called 'hybrids' were actually 1st generation G. confucius, and several other specimens were wrongly named as well.
**************************
To demonstrate the ambiguity in COI: a soon-to-be-published paper has both COI 3’ and 5’ tree diagrams for P. rutulus and P. eurymedon. One of the trees strongly suggests that one is a ssp. of the other; the other tree slightly suggests it might be the other way around.
Further, only some specimens of said ssp. fall under the other species; other specimens group oddly, kinda all over the place, including one level up. I believe these inferences do not reflect the real situation; however, it has happened for other taxa that “one species” has been split, some elevated, etc.
The lesson is that COI trees for 3’ and 5’ don’t always agree, and any tree built with few specimens, or ONE, is highly suspect to the point of being useless at species level and below. That’s where SNP and other analyses shed more light.
To be clear, COI is useful, and more so with more specimens. If a series of specimens all group together, you’re probably on to something. But when they don’t differentiate well with COI, as is the case with the group in the original question, other tools are required, and those tools are more costly and time-consuming.
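One rough way to put a number on how much a 3’ tree and a 5’ tree disagree is to cut each tree into groups and check, for every pair of specimens, whether both trees make the same call (together or apart). This is the Rand index. The sketch below is only an illustration: the function name, the specimen IDs and the group labels are all made up, and it assumes you have already decided where to cut each tree.

```python
from itertools import combinations

def rand_index(groups_a, groups_b):
    """Share of specimen pairs on which two groupings agree.

    groups_a, groups_b: dict mapping specimen ID -> group label,
    one dict per tree (hypothetical input format).
    Only specimens present in both dicts are compared.
    """
    shared = sorted(set(groups_a) & set(groups_b))
    pairs = list(combinations(shared, 2))
    if not pairs:
        raise ValueError("need at least two specimens present in both trees")
    # A pair "agrees" when both trees put it together, or both put it apart.
    agree = sum(
        (groups_a[x] == groups_a[y]) == (groups_b[x] == groups_b[y])
        for x, y in pairs
    )
    return agree / len(pairs)

# Made-up example: four specimens grouped differently by the two trees.
tree_3prime = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
tree_5prime = {"s1": "X", "s2": "X", "s3": "Y", "s4": "X"}
print(rand_index(tree_3prime, tree_5prime))  # 0.5
```

A value near 1.0 means the two trees tell the same story; a low value is a hint to bring in more specimens or other markers before drawing conclusions. With only a handful of specimens the number itself is not worth much, for the same reason the trees are not.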
*****************************
There is yet another explanation: provisional misidentification of specimens.
The ID you see for a specimen in a tree must be assigned by a human. If that ID is wholly wrong, it will show up there anyway, placed according to its mtDNA. The tree is generated automatically from COI sequences, not names. If someone IDs a Graphium mullah as T. rex, then it will look like T. rex groups with those butterflies.
For example, in the Tiger Swallowtail trees you will see soon, there is an outlier: my nemesis. Amongst all the glaucus is one Papilio alexiares ssp. garcia. This is simply because some moron misidentified a glaucus as the Mexican Papilio alexiares ssp. garcia, and it was uploaded to BOLD with that ID. And since most studies that use BOLD COI also use that specimen's ID, it shows up in the tree that way.
When a tree is generated for two or more easily confused taxa, it will show two or more distinct groups, with the named specimens jumbled up across both or all groups. It just means that many were misidentified.
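A quick way to spot candidates like that lone outlier is to look at each group in the tree and flag any specimen whose name disagrees with a clear majority of its group-mates. The sketch below is just that, a sketch: the function name, the input format and the 80% threshold are assumptions for illustration, and a flagged specimen is only a prompt to go look at the actual specimen, not proof of a misidentification.

```python
from collections import Counter

def flag_label_outliers(clusters, min_share=0.8):
    """Flag specimens whose assigned name differs from their group's majority.

    clusters: dict mapping group ID -> list of (specimen_id, assigned_name).
    min_share: how dominant the majority name must be before we trust it
               (0.8 is an arbitrary illustrative value).
    Returns a list of (specimen_id, assigned_name, majority_name).
    """
    flagged = []
    for members in clusters.values():
        counts = Counter(name for _, name in members)
        majority_name, majority_count = counts.most_common(1)[0]
        if majority_count / len(members) < min_share:
            # Names are jumbled in this group; no majority worth trusting.
            continue
        for specimen_id, name in members:
            if name != majority_name:
                flagged.append((specimen_id, name, majority_name))
    return flagged

# Made-up example: nine glaucus and one odd name in the same group.
group = [(f"spec{i}", "glaucus") for i in range(9)]
group.append(("spec9", "alexiares garcia"))
print(flag_label_outliers({"group1": group}))
# [('spec9', 'alexiares garcia', 'glaucus')]
```

When names are thoroughly jumbled in every group, nothing gets flagged at all, which is itself a sign that the IDs need a second look across the board.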
When I assign an identifier to a specimen, it includes location and date. Not name. In a tree you can see what groups together, and aha moments can arise if the lab test ID carries this data. As in, all the AZ P. rutulus group together, apart from the main rutulus. When the specimen ID is “Jim vacation” it’s useless; you have to research every damned specimen. When a provisional ID is just wrong, it will be misleading in the tree. A bit of common sense goes far.
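For anyone who wants to build that kind of identifier consistently, here is a minimal sketch. The format (state, locality, date, serial number) and the function name are my own invention for illustration, not any lab's or BOLD's required format; adapt it to whatever your lab accepts.

```python
import re
from datetime import date

def make_specimen_id(state, locality, collected, serial):
    """Build a name-free specimen ID from where and when it was collected.

    state:     short region code, e.g. "AZ"
    locality:  free-text locality, squeezed down to letters and digits
    collected: datetime.date of collection
    serial:    running number to keep same-day specimens apart
    """
    # Strip spaces and punctuation so the ID survives spreadsheets and labels.
    slug = re.sub(r"[^A-Za-z0-9]+", "", locality.title())
    return f"{state.upper()}_{slug}_{collected:%Y%m%d}_{serial:03d}"

# Made-up example locality and date.
print(make_specimen_id("az", "Cochise Co", date(2023, 6, 14), 3))
# AZ_CochiseCo_20230614_003
```

Because the ID carries no species name, a wrong provisional determination never rides along into the tree, and geographic patterns are visible at a glance.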
******************************
[misidentification of specimens] happens very often indeed. Having sampled hundreds of Delias legs myself and put them in little vials... I can tell you that if someone talks to you or phones you in the middle of the process... you lose your place and you hesitate to start all over again.
In a study I took part in, the source specimens from the museums were misidentified... by the museums. As a result, conclusions of synonymy or differentiation were erroneous. If the scientists put a label on the sampled specimens or took photos of them, the initial error can be confirmed. Otherwise, it's impossible to know... and even then, it has to be verified.
This is why any DNA analysis must be carried out on SEVERAL specimens per taxon. But this is obviously not possible for rare species in collections. If a DNA study concludes something really strange... the first reflex must be: “Can we see photos of the sampled specimens?”
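Checking sample size before reading anything into a tree takes only a few lines. The sketch below counts sampled specimens per taxon and lists the ones that fall short; the function name and the minimum of 3 are assumptions chosen for illustration, not an established standard.

```python
from collections import Counter

def undersampled_taxa(sampled_names, min_specimens=3):
    """Return {taxon: count} for taxa with fewer than min_specimens samples.

    sampled_names: one taxon name per sampled specimen.
    min_specimens: illustrative cut-off; pick your own.
    """
    counts = Counter(sampled_names)
    return {name: n for name, n in sorted(counts.items()) if n < min_specimens}

# Made-up example: taxon B rests on a single specimen.
samples = ["taxon A"] * 4 + ["taxon B"] + ["taxon C"] * 3
print(undersampled_taxa(samples))  # {'taxon B': 1}
```

Any conclusion that hangs on a taxon in that list is exactly where asking for photos of the sampled specimens matters most.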