Scott Olswold

SNMP-based Functions (Device Status, Device Discovery, Collection) Aren't Working

Blog Post created by Scott Olswold on Sep 6, 2019

Products

The following products support an SNMP (Simple Network Management Protocol) function, and may be affected by operational failure due to the cause described in this article:

 

  • Pharos Systems Device Scout
  • Pharos Systems Blueprint
  • Pharos Systems Uniprint
  • Pharos Systems Site Monitor
  • Pharos Systems Print Center

 

Symptoms

  • Printing devices are not found during discovery.
  • Discovered printing devices have no, or incomplete, associated data.
  • Released print jobs do not print out.
  • An ERROR entry like the following appears in the operational logs:
    [2019/09/04 09:54:54.318 PDA4 D001 T02E e AgentDiscovery] [Messenger] (Get) Failed to get or interpret response from '10.50.0.40:161'. wrong response sequence: expected 212442588, received 212442385
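To find out which devices are hitting this error, a log file can be scanned for the message. The following is a minimal Python sketch, not a Pharos tool: the pattern is inferred only from the single sample line above, so other log versions may be worded differently, and the name `find_sequence_mismatches` is made up for illustration.

```python
import re

# Pattern inferred from the one sample log line in this article (an assumption);
# adjust it if your logs are worded differently.
MISMATCH = re.compile(
    r"Failed to get or interpret response from '(?P<host>[^':]+):(?P<port>\d+)'\. "
    r"wrong response sequence: expected (?P<expected>\d+), received (?P<received>\d+)"
)

def find_sequence_mismatches(lines):
    """Return (host, expected, received) for every line that matches the pattern."""
    hits = []
    for line in lines:
        m = MISMATCH.search(line)
        if m:
            hits.append((m["host"], int(m["expected"]), int(m["received"])))
    return hits

sample = (
    "[2019/09/04 09:54:54.318 PDA4 D001 T02E e AgentDiscovery] [Messenger] (Get) "
    "Failed to get or interpret response from '10.50.0.40:161'. "
    "wrong response sequence: expected 212442588, received 212442385"
)
print(find_sequence_mismatches([sample]))
# prints [('10.50.0.40', 212442588, 212442385)]
```

Devices that show up repeatedly in the result are the ones most likely to benefit from a longer timeout.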

 

Cause

A previous SNMP request was cancelled because it timed out, but the response to it still arrived afterward, while the software was already waiting on a newer request.

 

Background

The Simple Network Management Protocol (SNMP) is a connectionless protocol, normally carried over UDP (port 161, as seen in the error above). This means that a request sent over SNMP is not guaranteed to be answered, and an answer may take time to arrive. This is fine in many situations, but when a printer is being added to a managed pool, updated meter information is being sought, or a print job is being released, a long wait (greater than 30 seconds) may be unreasonable.

For those operations where a Pharos product uses SNMP (printer discovery, printer data collection, or a printer status check before print job release), timers have been implemented around those communications so that a "sorry, this took too long, so we ended it" message can be fed back to either the application or the user. The last symptom, above, is one such message.

 

So, What Happens?

In most cases, a timeout happens because the printer on the other side of the communication really is offline (turned off; moved, so it has a different IP address; network connection unplugged; etc.), so the timeout message is the last message found in a log file or on the user interface.

Sometimes, however, a device or the network is just slow. The "traffic lane" that SNMP uses in a lot of networks is the slow lane: bandwidth priority is given to Voice over IP (VoIP), email, web traffic, file sharing, and other needs, so "non-essential" network apps get the leftovers. It could be that the printer is slow to wake up and does not respond to the SNMP message fast enough, or that the network is experiencing high utilization and some latency ("lag") is creeping in. When something like that happens, the Pharos software cancels the initial data request and sends another. Each data request gets an identifying sequence number so that the application can match responses to requests.

In the error message above, the initial request, with sequence number 212442385, was cancelled when the timeout was reached, and the software followed up with a new request carrying sequence number 212442588. While it was waiting for a response to the new request, the response to the original (cancelled) request came back. Because the initial request had already been closed, that response was rejected, which is what "expected 212442588, received 212442385" in the log records.
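The sequence of events can be reproduced with a toy model. The Python sketch below is only an illustration of the matching logic described above, not Pharos code: the `Requester` class, its method names, and the way identifiers are incremented by one are all assumptions made for the example.

```python
class Requester:
    """Toy client that tracks a single outstanding request identifier."""

    def __init__(self, first_id):
        self.next_id = first_id
        self.pending_id = None

    def send(self):
        # Each new request gets a fresh identifier; sending again after a
        # timeout abandons whatever identifier was pending before.
        self.pending_id = self.next_id
        self.next_id += 1
        return self.pending_id

    def receive(self, response_id):
        # Accept a response only if it answers the request currently pending.
        if response_id != self.pending_id:
            return (f"wrong response sequence: expected {self.pending_id}, "
                    f"received {response_id}")
        self.pending_id = None
        return "ok"

client = Requester(first_id=100)
first = client.send()    # request 100 goes out
second = client.send()   # timeout reached: 100 is abandoned, 101 goes out
print(client.receive(first))   # the late answer to request 100 arrives
print(client.receive(second))  # the answer to request 101 arrives
# prints:
# wrong response sequence: expected 101, received 100
# ok
```

The late answer is perfectly valid data; it is rejected only because the request it belongs to is no longer the one being waited on.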

Subsequent attempts against this device will likely run into the same delay that stalled the first request, so the device won't be added, updated, or sent the print job.

 

Resolution

The resolution is to increase the timeout value. The product documentation for your specific application (found in Documentation & Downloads) will tell you how to increase the timeout. Keep in mind that a longer timeout can lengthen the time required to perform a network scan (Device Scout or Site Monitor), update device data (Device Scout or Site Monitor), or get the user's print job out (Blueprint and Uniprint's Secure Release Here), so start small and lengthen the timeout in increments as necessary.
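To get a feel for the trade-off, the worst case can be estimated with simple arithmetic. The Python sketch below assumes a scan in which every address fails to answer; the function `worst_case_scan_seconds` and the `attempts` and `parallel` parameters are hypothetical and are not settings taken from any Pharos product, so check your product documentation for how scanning is actually scheduled.

```python
import math

def worst_case_scan_seconds(address_count, timeout_s, attempts=1, parallel=1):
    """Upper bound on scan time if no address ever answers.

    attempts: requests sent per address before giving up (assumed).
    parallel: addresses probed at the same time (assumed).
    """
    batches = math.ceil(address_count / parallel)
    return batches * timeout_s * attempts

# Compare a few timeout values for a 254-address subnet.
for timeout_s in (2, 5, 10):
    total = worst_case_scan_seconds(254, timeout_s, attempts=2, parallel=10)
    print(f"timeout {timeout_s:>2}s -> up to {total / 60:.1f} min")
```

Under these assumptions the worst-case scan time grows in direct proportion to the timeout, which is why small, incremental increases are the safer approach.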
