I prefer simple
I can be called lots of things ( :-) ), but in terms of what I do, I'm a compiler writer. Ignoring my first efforts in a compiler course at University and another project I was hired and paid to do, all of my compilers have been straightforward monolithic programs. No passes, no phases. Read source code and emit optimized object files. (My current project is all one program, but does create a detailed internal representation of the program.)
Makes for fast execution and minimal I/O - very important back in the days of floppy disks.
Also makes for a minimum of total code involved, hence reducing the opportunity for bugs.
Sure, if you have multiple phases/passes you can in theory manually examine the intermediate stuff, but do you really want to? It's a whole other language and set of conventions you now have to understand. Simple, specific test code can nearly always trigger any bugs you are looking for, and following the path of what's going on is far simpler in a monolithic compiler.
As for some of the other points folks have raised...
Yes, you can implement a language with more checking (array bounds, arithmetic overflows, ...) by emitting C source code. The checks are simply there in the source you emit. A good C compiler can likely optimize those into something closely resembling what a monolithic compiler would generate.
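To make that concrete, here is a minimal sketch of a translator helper that writes the checks straight into the C it emits. Everything named here is an assumption for illustration: `emit_checked_load`, `emit_checked_add`, and the runtime helpers `rt_bounds_fail` and `rt_overflow_fail` are hypothetical, and the overflow check leans on `__builtin_add_overflow`, which is a GCC/Clang builtin rather than standard C.

```python
def emit_checked_load(dest, array, index, length, tmp="_i"):
    """Return C statements that load array[index] into dest with a bounds check.

    The index is evaluated once into a temporary. Casting to size_t turns a
    negative signed index into a huge value, so one unsigned compare covers
    both the lower and the upper bound.
    rt_bounds_fail is a hypothetical runtime helper that does not return.
    """
    return [
        f"size_t {tmp} = (size_t)({index});",
        f"if ({tmp} >= (size_t)({length})) rt_bounds_fail(__FILE__, __LINE__);",
        f"{dest} = {array}[{tmp}];",
    ]


def emit_checked_add(dest, lhs, rhs):
    """Return a C statement that computes dest = lhs + rhs and traps on overflow.

    __builtin_add_overflow is specific to GCC and Clang; rt_overflow_fail is
    a hypothetical runtime helper.
    """
    return (f"if (__builtin_add_overflow({lhs}, {rhs}, &{dest})) "
            f"rt_overflow_fail(__FILE__, __LINE__);")


if __name__ == "__main__":
    for line in emit_checked_load("x", "a", "i + 1", "n"):
        print(line)
    print(emit_checked_add("s", "p", "q"))
```

The emitted checks are ordinary C, so the C compiler's usual optimizations (hoisting, range analysis) get a chance to remove the ones it can prove are unnecessary.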
Sorry to hear that both gcc and llvm are hard to work with. I vaguely recall friends trying to send me in the direction of using one of them, but I think now I'm thankful that I didn't.
Early compilers were often split up simply because of memory constraints. With the K & R compiler, one of the things they didn't have to deal with in the compiler proper was branch shortening. Most CPUs have branch instructions with both long and short offsets. You want to use the short-offset forms wherever possible, but it's hard to know where the target of a forward branch is, because you haven't generated code for all the stuff between the branch and the target yet. The Unix PDP-11 assembler handled that for them. It also handled the details of emitting valid object files. It wasn't all about not dealing with the target instruction set.
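For anyone who hasn't run into branch shortening, here is a rough sketch of one common way to do it. This is not the PDP-11 assembler's actual method, just an illustration. It starts with every branch short, lays the code out, grows any branch whose target is out of reach, and repeats until nothing changes. The sizes and ranges are assumptions (2-byte short form with a signed 8-bit displacement, 5-byte long form, displacement measured from the end of the branch), and the item format is made up for the example.

```python
SHORT_SIZE, LONG_SIZE = 2, 5      # assumed encoded sizes in bytes
SHORT_MIN, SHORT_MAX = -128, 127  # assumed reach of the short form


def shorten_branches(items):
    """Decide which branches need the long form.

    items is a list of ("code", nbytes), ("label", name) or ("branch", name).
    Returns a dict mapping the index of each branch item to True (long)
    or False (short).
    """
    is_long = {i: False for i, it in enumerate(items) if it[0] == "branch"}
    while True:
        # Lay everything out using the current size of each branch.
        offset = 0
        labels = {}
        branch_end = {}
        for i, (kind, arg) in enumerate(items):
            if kind == "code":
                offset += arg
            elif kind == "label":
                labels[arg] = offset
            else:
                offset += LONG_SIZE if is_long[i] else SHORT_SIZE
                branch_end[i] = offset
        # Grow any short branch that cannot reach its target.
        changed = False
        for i, end in branch_end.items():
            if not is_long[i]:
                disp = labels[items[i][1]] - end
                if not SHORT_MIN <= disp <= SHORT_MAX:
                    is_long[i] = True
                    changed = True
        if not changed:
            return is_long


if __name__ == "__main__":
    program = [
        ("branch", "end"), ("code", 100),
        ("branch", "end"), ("code", 100),
        ("label", "end"),
    ]
    print(shorten_branches(program))  # {0: True, 2: False}
```

Branches only ever grow, so the loop has to stop. The awkward part is that growing one branch can push another one out of range, which is why a single pass isn't enough.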
The code generator for x86-64 in my current project is still under 8000 lines of code. There are a few things left - currently I'm doing my 'bits' types. The CPU has single-bit extract/insert, but nothing more, so I'll have to special-case single-bit fields to use those instructions. I badly need at least a move/move optimizer, and doing that for this CPU will need an abstract representation of the instructions, unfortunately - the binary format is just too complex to work on directly. The AMD CPU manuals are large and complete, but as in any such technical endeavour, you have to keep in mind stuff you read a few hundred pages back when figuring out exactly what can happen in encodings, semantics, etc. Sigh.
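As a rough idea of what I mean by a move/move optimizer over an abstract representation, here is a minimal sketch. It is not my actual code generator. The `Instr` record, the `REGS` set and `peephole_moves` are all made up for illustration, and it assumes every operand it touches is a full 64-bit register. That assumption matters: a 32-bit `mov eax, eax` zero-extends into the upper half of the register and is not a no-op, and memory operands bring their own problems, so the sketch leaves anything that isn't a plain register alone.

```python
from dataclasses import dataclass

# Assumed subset of 64-bit general-purpose registers, for illustration only.
REGS = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi"}


@dataclass(frozen=True)
class Instr:
    """Hypothetical abstract instruction: opcode plus two operand names."""
    op: str
    dst: str = ""
    src: str = ""


def peephole_moves(code):
    """Drop redundant register-to-register moves from a list of Instr.

    Removes `mov r, r`, and removes the second instruction of the pair
    `mov a, b` / `mov b, a` when the two are adjacent. A label or any
    other instruction between them blocks the match.
    """
    out = []
    for ins in code:
        if ins.op == "mov" and ins.dst in REGS and ins.src in REGS:
            if ins.dst == ins.src:
                continue  # full-width mov r, r changes nothing
            prev = out[-1] if out else None
            if (prev is not None and prev.op == "mov"
                    and prev.dst == ins.src and prev.src == ins.dst):
                continue  # b already holds the same value as a
        out.append(ins)
    return out


if __name__ == "__main__":
    code = [
        Instr("mov", "rax", "rbx"),
        Instr("mov", "rbx", "rax"),  # redundant: moves the value straight back
        Instr("mov", "rcx", "rcx"),  # redundant: no-op at full width
        Instr("add", "rax", "rcx"),
    ]
    for ins in peephole_moves(code):
        print(ins.op, ins.dst, ins.src)
```

Doing the same thing on encoded bytes would mean decoding prefixes and operand bytes just to find out which registers a move touches, which is the reason for keeping the abstract form around until the end.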