Azure Stack: Fixing Admin and Tenant Portals Login Times

Around the time that we updated to 1808 on our multimode Azure Stack, we started seeing extremely long login times for both our Admin Portal and Tenant Portals. At first, it wasn’t an important issue to look into until we started having other users start mentioning the long login times to log in to their tenant portals. There wasn’t anything mentioned from others I followed so I did what most people will do on something that isn’t critical, I sat on it. Yes, I admit, I prioritized this issue as low impact and didn’t really dig into it until I started to get annoyed by the login times. After all, everything is about ourselves when it comes to prioritizations, right? Just kidding or am I? I looked around and didn’t see anything out of the ordinary with our Stack. Everything looked healthy, all the roles were healthy and nothing really caught my eye. I talked to a few people at Ignite and found a small number that had the same issue. So I finally opened a ticket with Microsoft Support just to get the process started. Still, at this time I didn’t consider it a high priority and went on focusing on other issues I was working on.

After a few weeks, we were told that there was a possible solution. It turns out that the issue causing the slow login time was related to the time being off on the WAS and WASP infrastructure roles which host the two portals. So how do you fix this or verify that this was the issue? You can do a few things but overall we had to “break glass” to verify why the instances were not getting the correct time from the assigned NTP server.

How to Troubleshoot Time Sync on Azure Stack without Breaking Glass

If you haven’t broken glass yet there is a command you can run within the ERCS that will show you the status of the internal domain controllers and the other infrastructure roles. Get-AzsTimeService will query DC01 and DC02 and report basic time server information like where the node is pulling time from, the type of time server node, etc. It was one of the first things we used in order to start troubleshooting our issue.

[10.99.0.224]: PS C:\> Get-AzSTimeSource
VERBOSE: Getting time configuration and status of node AZS-DC02,AZS-DC01
VERBOSE: Begin ‘Get’ for node AZS-DC02
VERBOSE: Attempting to create a PSSession to node AZS-DC02
VERBOSE: Getting domain role of node AZS-DC02
VERBOSE: Domain role of node AZS-DC02 is ‘Domain Controller’
VERBOSE: Getting W32Time service status of node AZS-DC02
VERBOSE: W32Time service status of node AZS-DC02 is ‘Running’
VERBOSE: Getting time configuration of node AZS-DC02
VERBOSE: Type of node AZS-DC02 is ‘NT5DS’
VERBOSE: Getting time status of node AZS-DC02
VERBOSE: Source of node AZS-DC02 is ‘AzS-DC01.azurestack.local ‘
VERBOSE: Last successful sync time of node AZS-DC02 is ’10/5/2018 6:17:08 PM’
VERBOSE: Removing PSSession to node AZS-DC02
VERBOSE: End ‘Get’ for node AZS-DC02
VERBOSE: Begin ‘Get’ for node AZS-DC01
VERBOSE: Attempting to create a PSSession to node AZS-DC01
VERBOSE: Getting domain role of node AZS-DC01
VERBOSE: Domain role of node AZS-DC01 is ‘Domain Controller’
VERBOSE: Getting W32Time service status of node AZS-DC01
VERBOSE: W32Time service status of node AZS-DC01 is ‘Running’
VERBOSE: Getting time configuration of node AZS-DC01
VERBOSE: Type of node AZS-DC01 is ‘NTP’
VERBOSE: Getting time status of node AZS-DC01
VERBOSE: NtpServer of node AZS-DC01 is ‘10.91.232.117’
VERBOSE: Source of node AZS-DC01 is ‘Local CMOS Clock’
VERBOSE: Last successful sync time of node AZS-DC01 is ’10/5/2018 5:45:21 PM’
VERBOSE: Removing PSSession to node AZS-DC01
VERBOSE: End ‘Get’ for node AZS-DC01

ComputerName : AZS-DC02
DomainRole : Domain Controller
Service : Running
Type : NT5DS
NtpServer :
Source : AzS-DC01.azurestack.local
LastSuccessfulSyncTime : 10/5/2018 6:17:08 PM

ComputerName : AZS-DC01
DomainRole : Domain Controller
Service : Running
Type : NTP
NtpServer : {10.91.232.117}
Source : Local CMOS Clock
LastSuccessfulSyncTime : 10/5/2018 5:45:21 PM

[10.99.0.224]: PS C:\>

As you can see the data returned shows what NTP server and the source the two domain controllers are using. This also showed us something, the actual source for DC01 is set at Local CMOS Clock. This isn’t a good thing. More than likely the NTP time server we have configured for isn’t reachable. We tried to reconfigure the time server using the Set-AzsTimeSource –TimeServer 10.91.232.117

[10.99.0.224]: PS C:\> Set-AzSTimeSource -TimeServer 10.91.232.117
VERBOSE: Time server 10.91.232.117 is a valid IP address
VERBOSE: Testing time server 10.91.232.117
VERBOSE: Time server 10.91.232.117 reported time 20:08:31, -22.8533228s
VERBOSE: Getting time configuration and status of node AZS-DC02,AZS-DC01
VERBOSE: Begin ‘Get’ for node AZS-DC02
VERBOSE: Attempting to create a PSSession to node AZS-DC02
VERBOSE: Getting domain role of node AZS-DC02
VERBOSE: Domain role of node AZS-DC02 is ‘Domain Controller’
VERBOSE: Getting W32Time service status of node AZS-DC02
VERBOSE: W32Time service status of node AZS-DC02 is ‘Running’
VERBOSE: Getting time configuration of node AZS-DC02
VERBOSE: Type of node AZS-DC02 is ‘NT5DS’
VERBOSE: Getting time status of node AZS-DC02
VERBOSE: Source of node AZS-DC02 is ‘AzS-DC01.azurestack.local ‘
VERBOSE: Last successful sync time of node AZS-DC02 is ’10/10/2018 8:03:40 PM’
VERBOSE: Removing PSSession to node AZS-DC02
VERBOSE: End ‘Get’ for node AZS-DC02
VERBOSE: Begin ‘Get’ for node AZS-DC01
VERBOSE: Attempting to create a PSSession to node AZS-DC01
VERBOSE: Getting domain role of node AZS-DC01
VERBOSE: Domain role of node AZS-DC01 is ‘Domain Controller’
VERBOSE: Getting W32Time service status of node AZS-DC01
VERBOSE: W32Time service status of node AZS-DC01 is ‘Running’
VERBOSE: Getting time configuration of node AZS-DC01
VERBOSE: Type of node AZS-DC01 is ‘NTP’
VERBOSE: Getting time status of node AZS-DC01
VERBOSE: NtpServer of node AZS-DC01 is ‘10.91.232.118’
VERBOSE: Source of node AZS-DC01 is ‘Local CMOS Clock’
VERBOSE: Last successful sync time of node AZS-DC01 is ’10/10/2018 8:07:13 PM’
VERBOSE: Removing PSSession to node AZS-DC01
VERBOSE: End ‘Get’ for node AZS-DC01
VERBOSE: Domain Controller for external time sync is AZS-DC01
VERBOSE: Begin ‘Set’ for time server
VERBOSE: Attempting to create a PSSession to node AZS-DC01
VERBOSE: Setting time server on AZS-DC01 to 10.91.232.117
VERBOSE: Resync time on AZS-DC01 with wait
VERBOSE: Removing PSSession to node AZS-DC01
VERBOSE: Getting time configuration and status of node AZS-DC01
VERBOSE: Begin ‘Get’ for node AZS-DC01
VERBOSE: Attempting to create a PSSession to node AZS-DC01
VERBOSE: Getting domain role of node AZS-DC01
VERBOSE: Domain role of node AZS-DC01 is ‘Domain Controller’
VERBOSE: Getting W32Time service status of node AZS-DC01
VERBOSE: W32Time service status of node AZS-DC01 is ‘Running’
VERBOSE: Getting time configuration of node AZS-DC01
VERBOSE: Type of node AZS-DC01 is ‘NTP’
VERBOSE: Getting time status of node AZS-DC01
VERBOSE: NtpServer of node AZS-DC01 is ‘10.91.232.117’
VERBOSE: Source of node AZS-DC01 is ‘Local CMOS Clock’
VERBOSE: Last successful sync time of node AZS-DC01 is ’10/10/2018 8:08:32 PM’
VERBOSE: Removing PSSession to node AZS-DC01
VERBOSE: End ‘Get’ for node AZS-DC01
Time server was updated to 10.91.232.117 but AZS-DC01 time was sourced from Local CMOS Clock
+ CategoryInfo : OperationStopped: (Time server was…ocal CMOS Clock:String) [], RuntimeException
+ FullyQualifiedErrorId : Time server was updated to 10.91.232.117 but AZS-DC01 time was sourced from Local CMOS C
lock

[10.99.0.224]: PS C:\>

As you can see DC01 was configured again to use the correct NTP server we have setup internally. However, something went wrong and that time server source was again set for Local CMOS Clock. Again more than likely something is not routed correctly between that DC01 and our NTP server internally. So in order to troubleshoot more at this point, we will need to break glass. Breaking glass just allows us to do more commands on our ERCS but we have to get a support token from Microsoft in order to do this.

Troubleshooting Time Sync issues after breaking glass

So we now have unlocked our ERCS session (Broke glass). Now we have access to load modules, run more advanced PowerShell scripts, and get down and dirty within Azure Stack. I love breaking glass and I really love being able to dig deep within the Azure Stack infrastructure that we normally don’t get to see.

So we just started off doing basic troubleshooting PowerShell commands again. To verify configurations and time, etc.

Write-Host “$($StampInfo.Prefix)-DC01” -ForegroundColor green
$StampInfo=get-azurestackstampinformation
Invoke-command -Computername “$($StampInfo.Prefix)-DC01” -ScriptBlock {w32tm /query /source}

This just confirmed that the source was still using Local CMOS Clock on DC01.

Write-Host “$($StampInfo.Prefix)-DC01” -ForegroundColor green
$StampInfo=get-azurestackstampinformation
Invoke-command -Computername “$($StampInfo.Prefix)-DC01” -ScriptBlock {Get-Date}

This just confirmed that our time was still off by 11 minutes and causing our issues. We then tried to force a resync which didn’t work since DC01 was still trying to use the source Local CMOS Clock. At this time we needed to verify that DC01 could communicate to the NTP server over UDP port 123. This had worked in the past since this was required for our deployment of Azure Stack. So I had assumed the ports were still open. I know for sure that any external NTP source would be blocked.

We then ran the following Script to test connectivity to our NTP servers. We did one for an external address just to verify that those ports were not open and then we did a test to our internal NTP server.

Write-Host “$($StampInfo.Prefix)-DC01” -ForegroundColor green
$StampInfo=get-azurestackstampinformation
Invoke-command -Computername “$($StampInfo.Prefix)-DC01” -ScriptBlock {w32tm /stripchart /Computer:time.windows.com /dataonly /samples:5}

Write-Host “$($StampInfo.Prefix)-DC01” -ForegroundColor green
$StampInfo=get-azurestackstampinformation
Invoke-command -Computername “$($StampInfo.Prefix)-DC01” -ScriptBlock {w32tm /stripchart /Computer:10.91.232.117 /dataonly /samples:5}

So our results were a bit confusing. We could get a response from our external time source but our internal time source we couldn’t get anything back.

[10.99.0.224]: PS C:\> Write-Host “$($StampInfo.Prefix)-DC01” -ForegroundColor green
AzS-DC01
[10.99.0.224]: PS C:\> $StampInfo=get-azurestackstampinformation
WARNING: The names of some imported commands from the module ‘ECEClient’ include unapproved verbs that might make them
less discoverable. To find the commands with unapproved verbs, run the Import-Module command again with the Verbose
parameter. For a list of approved verbs, type Get-Verb.
[10.99.0.224]: PS C:\> Invoke-command -Computername “$($StampInfo.Prefix)-DC01” -ScriptBlock {w32tm /stripchart /Compute
r:time.windows.com /dataonly /samples:5}
Tracking time.windows.com [13.65.88.161:123].
Collecting 5 samples.
The current time is 10/10/2018 8:28:14 PM.
20:28:14, -22.7968339s
20:28:16, -22.7966784s
20:28:18, -22.7978192s
20:28:20, -22.7955695s
20:28:22, -22.8008482s
[10.99.0.224]: PS C:\>
[10.99.0.224]: PS C:\> Write-Host “$($StampInfo.Prefix)-DC01” -ForegroundColor green
AzS-DC01
[10.99.0.224]: PS C:\> $StampInfo=get-azurestackstampinformation
WARNING: The names of some imported commands from the module ‘ECEClient’ include unapproved verbs that might make them
less discoverable. To find the commands with unapproved verbs, run the Import-Module command again with the Verbose
parameter. For a list of approved verbs, type Get-Verb.
[10.99.0.224]: PS C:\> Invoke-command -Computername “$($StampInfo.Prefix)-DC01” -ScriptBlock {w32tm /stripchart /Compute
r:10.91.232.117 /dataonly /samples:5}
Tracking 10.91.232.117 [10.91.232.117:123].
Collecting 5 samples.
The current time is 10/10/2018 8:28:59 PM.
20:28:59, error: 0x800705B4
20:29:02, error: 0x800705B4
20:29:05, error: 0x800705B4
20:29:08, error: 0x800705B4
20:29:11, error: 0x800705B4
[10.99.0.224]: PS C:\>

However, if we try to change the time server to the external source we get the same error that it cannot communicate to it that we are getting for the internal NTP server. If we try to use the Set-AzsTimeSource –TimeServer time.windows.com it will fail because it can’t communicate.

[10.99.0.224]: PS C:\> Set-AzSTimeSource -TimeServer 13.65.245.138
VERBOSE: Time server 13.65.245.138 is a valid IP address
VERBOSE: Testing time server 13.65.245.138
Time server 13.65.245.138 did not respond with time
+ CategoryInfo : OperationStopped: (Time server 13….spond with time:String) [], RuntimeException
+ FullyQualifiedErrorId : Time server 13.65.245.138 did not respond with time
[10.99.0.224]: PS C:\>

We then ran a network trace from DC01.

# FIRST – start tracing
Invoke-command-Computername “$($StampInfo.Prefix)-DC01”-ScriptBlock {mkdir c:\temp}
Invoke-command-Computername “$($StampInfo.Prefix)-DC01”-ScriptBlock {netsh trace start capture=yes tracefile=c:\temp\timetest.etl}
# SECOND – repro issue & stop trace
Invoke-command-Computername “$($StampInfo.Prefix)-DC01”-ScriptBlock {w32tm /stripchart /computer:10.91.232.117}
Invoke-command-Computername “$($StampInfo.Prefix)-DC01”-ScriptBlock {netsh trace stop}
# THIRD – copy file off of DC01 to ERCS
copy-item-path \\”$($StampInfo.Prefix)-DC01″\c$\temp\timetest.etl -Destination c:\temp
# Then copy off the file to a share on a jumpbox to get the file off the stamp
# FOURTH – clean up DC01
Invoke-command-Computername “$($StampInfo.Prefix)-DC01”-ScriptBlock {remove-item c:\temp\timetest.*-Recurse}

The trace did show that traffic was not going through on UPD port 123 to our NTP server. So at this point, we know for sure that something is blocking UDP port 123 to our NTP Time server internally and to any NTP server externally. Just to be transparent, I have not gotten the routing fixed as of this blog.  However, I feel very confident once that UDP 123 to that NTP server is open again, our DC will be able to sync time. I did manually update the time on those nodes and roles while we had the unlocked support session.  The admin portal and tenant portal which was taking 3 minutes and 38 seconds to log in after entering credentials now only takes a few seconds.

Final Thoughts

I really love working with this technology. I also really enjoy working with all the guys at Microsoft when we have issues. I wish we had a way to break glass on our own but also understand why they limit this ability. There are a lot of things that one can do with an unlocked session. So, if you are experiencing long log in times, check your time on your Stack to verify that it is correct. I forced the clock changes on all my nodes and on the infrastructure roles and my login time is now seconds compared to the 3 minutes and 38 secs it was taking. I still am trying to work out the network access to our NTP servers but at least I know the fix works.

Tagged with: , ,
Posted in Azure Stack

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Follow Kristopher Jon Turner on WordPress.com
Categories
Archives
Follow me on Twitter
%d bloggers like this: