AD Replication Troubleshooting steps (DNS)
Quick Checks:
-Check and confirm that network browsing is working and that Windows has
not switched the network profile, and with it the firewall profile, to Public.
-Use Event Viewer to get more detail. It is the easiest quick
resource, but note the timeline of the errors. Many times you will see errors while
the server is booting up and initializing. Ignore those and pay attention to
the logs after boot completes (i.e. don’t chase your tail).
-Most alerts are generic in their details/description. Make
sure you are addressing the ones that matter for your current AD forest and domain
level. If in doubt, escalate to someone who knows. You can melt down your
domain by following bad online advice.
-Check the DNS settings on the network interface and point your
DC to another DC first, then to itself second. There are different views on this, but
I do so to avoid DNS isolation.
-Know your FSMO role holders. If you don’t know what FSMO
role holders are, you should not be learning on production. The quickest way to check
them is:
netdom query fsmo
-If you need to transfer FSMO roles, see my FSMO posts.
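The DNS ordering check from the list above can be sketched in a few lines of Python. This is only an illustration of the rule I follow (another DC first, the DC itself later), not a tool that reads the settings from Windows. The function name and the example addresses are hypothetical, and the loopback address is treated as "the DC itself" by assumption.

```python
LOOPBACK = "127.0.0.1"


def dns_order_warnings(own_ip, dns_servers):
    """Return warnings about the order of a DC's DNS client settings.

    own_ip      -- the DC's own IP address (hypothetical input)
    dns_servers -- the DNS servers on its interface, in configured order
    """
    own = {own_ip, LOOPBACK}  # assumption: loopback counts as "itself"
    if not dns_servers:
        return ["no DNS servers configured"]
    warnings = []
    # Rule 1: the first entry should be another DC, not this one.
    if dns_servers[0] in own:
        warnings.append("first DNS server is the DC itself; list another DC first")
    # Rule 2: the DC should still list itself somewhere after the first entry.
    if not any(server in own for server in dns_servers[1:]):
        warnings.append("DC does not list itself after the first DNS server")
    return warnings
```

For example, a DC at 192.168.1.10 configured with 192.168.1.11 first and 192.168.1.10 second produces no warnings, while the reverse order produces both.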
Open a Command Prompt as an admin:
repadmin.exe /showreps
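- This shows the DC's inbound replication partners and the status of the last replication attempt with each (/showreps is the older name for /showrepl)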
repadmin.exe /replsummary
- This will give you a DC replication summary
On DCs run “dcdiag” for a quick searchable summary
- The following command exports a detailed diagnostics report
to the path specified:
DCDIAG /e /v /c /f:C:\DCDiagReport.txt
-You can then review the report for errors and more detail. The report
takes a few minutes to run, longer with more DCs and/or errors/retries.
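Since the report can get long, a small script can pull out the lines worth reading first. The sketch below assumes that failing tests show up on lines containing the text "failed test"; that marker, the function names, and the file encoding are assumptions, so adjust them to match what your report actually contains.

```python
from pathlib import Path


def find_failed_tests(report_text, marker="failed test"):
    """Return the report lines that contain the marker (case-insensitive).

    The default marker is an assumption about the report wording.
    """
    needle = marker.lower()
    return [
        line.strip()
        for line in report_text.splitlines()
        if needle in line.lower()
    ]


def scan_report(path, marker="failed test", encoding="utf-8"):
    """Read a saved report file and return its matching lines.

    The encoding is an assumption; undecodable bytes are replaced
    rather than raising an error.
    """
    text = Path(path).read_text(encoding=encoding, errors="replace")
    return find_failed_tests(text, marker)
```

Calling scan_report(r"C:\DCDiagReport.txt") returns a list of matching lines that you can print or search. An empty list only means the marker was not found, not that the DCs are healthy.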
The following is a recent real AD replication issue I troubleshot:
I was getting error 1722 (The RPC server is unavailable) from repadmin.exe
/replsummary.
This is a generic error that can point to many different communication
issues.
In this case, I confirmed that the RPC services were running
and set to Automatic, as expected.
Based on the errors I saw in DCDiag, I ran a quick test on DNS:
DCDIAG /TEST:DNS
-Be patient with this test; it takes a few minutes.
On one DC it reported no errors. On the other server it took
longer and the output was filled with errors. One in particular caught my attention
because the address it referenced, 192.168.1.22, is
NOT a current DC or DNS server.
I then confirmed that the troubled server's network interface had the wrong DNS address configured.
That was causing it to send its DNS queries to nowhere, which broke
replication.
After changing it to a correct DNS server, all errors
cleared and replication started working perfectly.
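The check that found this problem is easy to express in code: compare what each server has configured against the DNS servers you know are valid. This is a minimal sketch in Python with the data typed in by hand (for example, copied from ipconfig /all). The function name, the server names, and every address except 192.168.1.22 are hypothetical.

```python
def stale_dns_entries(configured, valid_dns_servers):
    """Return {server: [addresses]} for DNS entries that are not valid.

    configured        -- dict of server name -> list of configured DNS addresses
    valid_dns_servers -- addresses of the current DNS servers / DCs
    """
    valid = set(valid_dns_servers)
    stale = {}
    for server, addresses in configured.items():
        # Keep only the addresses that do not belong to a known DNS server.
        bad = [address for address in addresses if address not in valid]
        if bad:
            stale[server] = bad
    return stale


if __name__ == "__main__":
    # Hypothetical inventory; 192.168.1.22 is the stale address from my case.
    configured = {
        "DC1": ["192.168.1.11", "192.168.1.10"],
        "DC2": ["192.168.1.22", "192.168.1.11"],
    }
    valid = ["192.168.1.10", "192.168.1.11"]
    print(stale_dns_entries(configured, valid))
```

If a DC points to itself through 127.0.0.1, add that address to the valid list or it will be reported as stale.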
Final Note to remember:
IT'S DNS! Even when it can't be DNS, IT'S DNS