The problem with stylebook comma rules is that they look at the type of node rather than the tree structure.
What in the thundering infernal blazes do I mean by that?
Well, bear with me. It will take a bunch of words to explain, but it's really very simple. It's also easier to explain with drawings than in words alone, and I might put some up later.
Some among us may remember 'diagramming' sentences as a way to show the grammatical structure graphically. The point of diagramming is that our grammar is (a) hierarchical (tree-structured) and (b) recursive--you can turn entire clauses into modifiers (using relative pronouns and subordinating conjunctions) or nouns (using relative pronouns), bury nouns in modifiers (prepositional phrases) and turn predicates into nouns and adjectives (gerunds and participles). This means that structure repeats within structure. Mathematics has a tool to describe this: the tree. This tree, and the theory around it, is a limited kind of 'graph'. It has another tool: Formal Language Theory, which makes heavy use of graphs.
There's all sorts of neat theory here, and especially a lot of neat algorithms, but we won't need to wrangle any of that. We just need the tree as a picture.
A graph, in graph theory, is a set of points (called nodes) connected by lines (or curves) called arcs. A tree is a graph with a single 'top' node called the root (for us, the whole sentence) and without cycles. No cycles means that from one node to another there's only one path, and a node has at most one immediate 'ancestor' (its parent) leading toward the root. Nodes that have descendants are called 'inner' (or 'non-terminal') nodes. Nodes at the bottom of the tree, called 'leaf' or 'terminal' nodes, have no descendants. Leaf nodes represent unmodified individual words. The inner nodes represent constructions (words with modifiers, phrases, clauses, conjunction-joinings, appositives and parentheticals, &c &c &c & &c).
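For anyone who would rather see the tree as data than as a drawing, here is a minimal sketch in Python. The Node class, the leaves and depth helpers, and the particular shape I give to 'Jesus wept' are my own illustration, not a standard grammar or any library's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # For an inner node, the name of the construction; for a leaf, the word itself.
    label: str
    children: list = field(default_factory=list)

    def is_leaf(self):
        # A leaf (terminal) node has no descendants.
        return not self.children


def leaves(node):
    """Read the words off the leaves, left to right: the linear string."""
    if node.is_leaf():
        return [node.label]
    words = []
    for child in node.children:
        words.extend(leaves(child))
    return words


def depth(node):
    """Number of levels between this node and its deepest leaf."""
    if node.is_leaf():
        return 0
    return 1 + max(depth(child) for child in node.children)


# One possible tree for 'Jesus wept' (an assumed shape, for illustration only).
sentence = Node("sentence", [
    Node("clause", [
        Node("subject", [Node("Jesus")]),
        Node("predicate", [Node("wept")]),
    ]),
])

print(" ".join(leaves(sentence)))  # Jesus wept
print(depth(sentence))             # 3
```

Reading the leaves left to right gives back the string of words. Going the other way, from the string to the tree, is the hard direction.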
Why is this more important than the cost of a good cup of coffee?
We speak, write, hear, and read linear strings of words. When we speak or write, we need to turn the tree of meaning (in our minds) into that string of words. When we hear or read, we need to turn the linear string of words back into that tree of meaning.
'Grammar' describes the allowable structure of the tree, and how it is converted to and from the linear sequence of words. Converting the string into the tree is called 'parsing' and it is a harder problem than generating the string. Strunk and White present the example of an elided 'that'. I'll modify the example:
'He felt * his nose, which was over an inch long, made him look ridiculous.'
There's an implied 'that' at the asterisk, but the reader can't realize it until somewhere after the word 'made'. This case, the elided 'that', is peculiar to English, but the general parsing problem is inherently more difficult than generating the string. You can explore the drowning depths of the question in the WikiP articles on Backus-Naur Form and Formal Language Theory. But the essence is fairly simple.
When you are constructing the hierarchical sentence in your mind, you are attaching the words, one by one, to that notional parse tree. You need to know where to attach each word. Does it go on the previous word or construct, or does it go on a more remote node?
'red train' => 'train' inserted above 'red'
'Jesus wept' => 'Jesus' the subject below 'wept', 'wept' the verb below predicate, predicate below clause, clause below sentence.
Now consider "He felt that his big nose, which was over an inch long, made him look ridiculous." This is an easy sentence for an adult reader of English to understand, but it is frightfully complex viewed as grammar. "that his big nose ...." is a noun clause (a content clause) serving as the direct object of 'felt'. (Yes, there are other ways to describe it, but they won't change my basic point.) Let's look at that clause.
His big nose which was over an inch long made him look ridiculous.
I've omitted all commas so we can ask "Why and where do we put commas?" (Have patience. We're getting close to my point.)
Read this sentence aloud (if you can, otherwise aloud in your mind). Where do you pause slightly? I think you'll find it's before 'which' and after 'long'. Why do you delay there? Is it because you've been taught where to put commas? Or is it because the pauses are natural in the sentence structure?
Whether you pause because of the mental processes involved in constructing the sentence, or because you want to communicate that structure, the listener will discern the pauses and infer from them the structure they indicate.
I want to focus now on that second pause: "... over an inch long, made him look ridiculous." What does this indicate in the tree-structured grammar hierarchy? It indicates that the word 'long' ends a branch of the tree, and the next word attaches somewhere above that branch. The next word, 'made', ties back to 'nose'. 'made' is the verb (of the predicate) of which 'nose' is the subject. Everything between was a modifier on 'nose', a node below 'nose' in the grammar tree. But 'made' belongs to the predicate, which is actually above 'nose' in the grammar tree.
This break upward in the parse is what the natural pause indicates. This is what I hold the comma ought to indicate.
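The size of that upward break can be counted. What follows is a rough sketch in Python, under my own assumptions: the tree is written as nested tuples whose first element is a label, the bracketing of the 'nose' clause is one plausible reading rather than the only one, and the function names leaf_paths and upward_breaks are invented for this example. For each pair of neighboring words it counts how many constructions close between them.

```python
def leaf_paths(tree, path=()):
    """Yield (word, path) pairs; path is the tuple of child indices from the root."""
    if isinstance(tree, str):           # a bare string is a leaf, i.e. a word
        yield tree, path
        return
    for i, child in enumerate(tree[1:]):  # tree[0] is the node's label
        yield from leaf_paths(child, path + (i,))


def upward_breaks(tree):
    """For each adjacent pair of words, count the constructions closed between them."""
    words = list(leaf_paths(tree))
    result = []
    for (w1, p1), (w2, p2) in zip(words, words[1:]):
        shared = 0                      # depth of the lowest node containing both words
        while shared < min(len(p1), len(p2)) and p1[shared] == p2[shared]:
            shared += 1
        # Siblings close nothing, so subtract the one step from the word to its parent.
        result.append((w1, w2, len(p1) - 1 - shared))
    return result


# An assumed bracketing of the clause, for illustration only.
clause = ("clause",
          ("subject", "His", "big", "nose",
           ("relative clause", "which",
            ("predicate", "was",
             ("adjective phrase", ("measure", "over", "an", "inch"), "long")))),
          ("predicate", "made", "him", ("complement", "look", "ridiculous")))

for left, right, closed in upward_breaks(clause):
    print(f"{left} | {right}: {closed}")

print(max(upward_breaks(clause), key=lambda t: t[2]))  # ('long', 'made', 4)
```

With this bracketing, every pair scores 0 except 'inch | long' (1) and 'long | made' (4). Note that the sketch only measures the climb back up the tree; it says nothing about the descent into the 'which' clause.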
Now we can examine why the stylebook rules cannot, in general, be right.
They call for commas to be placed before or after certain types of clauses, phrases, or other constructions. For appositives and parenthetical phrases, the comma use is part of basic English punctuation, universal and not limited to a stylebook. Likewise, commas used for series are pretty much universal (excluding the Oxford comma). I don't think any stylebook will say not to use them.
But the prescriptions to place commas before or after certain clauses and phrases, or between this and that, are based on the types of the nodes in the tree, not on the need to indicate a change from adding at the current place in the tree to attaching at a much higher place in the tree. What the reader needs to know is where the parse breaks from a lower-level structure and moves back up the tree. If the comma is to help the reader, it must tell the reader what the reader needs to know. It must tell the reader when the parse breaks back up the parse tree.
Rules based on the type of the grammar nodes can only be right in some cases, not all. Nor can they deal with sentences that might require, according to their rules, many commas at many levels of the grammar tree. A sentence festooned with one comma for every five or six words will most often be hard to read. The greater the break in levels, the greater the need for the comma. The mind can easily connect a break of a level or two, but when the break is the end of a clause nested in a clause or phrase, or a combination of clause and conjunction, the comma is a great help to the reader. Thus, where there is a question of where to put the comma(s), the comma(s) should be placed at the largest breaks, that is, the breaks across the greatest number of levels of the parse.
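The 'largest breaks first' idea can be put as a small procedure. This is only a sketch in Python of how I would state it, not a tested punctuation rule: the function place_commas, its max_commas budget, and its min_break threshold are all names and numbers I made up for the illustration, and the break sizes are counted by hand from one assumed parse of the 'nose' clause.

```python
def place_commas(words, breaks, max_commas=1, min_break=2):
    """Put commas after the words that are followed by the largest upward breaks.

    words  -- the words of the sentence, in order
    breaks -- breaks[i] is the number of levels the parse climbs after words[i]
              (one entry fewer than words)
    """
    # Only breaks of at least min_break levels are candidates for a comma.
    candidates = [i for i, size in enumerate(breaks) if size >= min_break]
    # Largest breaks win; ties go to the earlier position.
    ranked = sorted(candidates, key=lambda i: (-breaks[i], i))
    chosen = set(ranked[:max_commas])
    marked = [w + "," if i in chosen else w for i, w in enumerate(words)]
    return " ".join(marked)


words = ["His", "big", "nose", "which", "was", "over", "an", "inch",
         "long", "made", "him", "look", "ridiculous."]
# Hand-counted break sizes (an assumption): one level after 'inch', four after 'long'.
breaks = [0, 0, 0, 0, 0, 0, 0, 1, 4, 0, 0, 0]

print(place_commas(words, breaks))
# His big nose which was over an inch long, made him look ridiculous.
```

Raising max_commas or lowering min_break lets smaller breaks earn a comma too, which is exactly the festooning I am warning against.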
The stylebooks' use of the grammar node type is an attempt to spare their users the need to fully understand their sentences' structures. The consequences are not good.
So I hold.
So I declare.
So I proclaim.