Date

Attendees

Goals

  • Understand what's required in the field

Discussion items

Item | Notes
Benchmarking
  1. Stabilizing the network takes a lot of effort.
  2. Typical debugging path: IOR shows poor performance -> disk analysis -> check the network between server and client -> check each server one by one to determine the problematic server.
  3. Many clients to one server.
  4. Network bandwidth problems: where is the bottleneck?
  5. Network testing is done one pair at a time (1:1 client to server). With N:N clients to servers it is more difficult to find what the problem is.
  6. Measuring the bandwidth of each server and each client separately can be helpful; it helps pinpoint the problem path.
  7. Traffic is sometimes not distributed well across the different ports of the switch. Although this is below LNet, there is usually a lot of back and forth with the network provider to get someone to own the problem. It would be useful to have tools that provide evidence of where the problem is.
  8. Expose performance information per connection.
  9. Tools to figure out more general networking issues.
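
As an illustration of items 5 and 6, one possible way to turn a set of 1:1 client-to-server measurements into a shortlist of suspect endpoints is sketched below. This is only a sketch under stated assumptions: the function name, the input layout, the 0.5 threshold, and the sample numbers are all hypothetical, and the measurements themselves would have to come from whatever 1:1 test tool is in use.

```python
from statistics import median


def find_slow_endpoints(bandwidth, threshold=0.5):
    """Flag clients and servers whose median 1:1 bandwidth is well below the overall median.

    bandwidth: dict mapping (client, server) -> measured MB/s (hypothetical layout).
    threshold: fraction of the overall median below which an endpoint is flagged
               (0.5 is an arbitrary illustrative default).
    """
    overall = median(bandwidth.values())
    per_client, per_server = {}, {}
    for (client, server), mbps in bandwidth.items():
        per_client.setdefault(client, []).append(mbps)
        per_server.setdefault(server, []).append(mbps)
    limit = threshold * overall
    slow_clients = sorted(c for c, v in per_client.items() if median(v) < limit)
    slow_servers = sorted(s for s, v in per_server.items() if median(v) < limit)
    return slow_clients, slow_servers


# Made-up sample: every pair reaches 1000 MB/s except pairs involving server "s2".
sample = {
    (c, s): (200.0 if s == "s2" else 1000.0)
    for c in ("c1", "c2", "c3")
    for s in ("s1", "s2", "s3")
}
slow_clients, slow_servers = find_slow_endpoints(sample)
print("suspect clients:", slow_clients)
print("suspect servers:", slow_servers)
```

Grouping by endpoint is what makes the 1:1 results useful: a single slow pair points at a path, while an endpoint that is slow against every peer points at that node or its link.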
After Production
  1. Unexpected client eviction -> timeout between server and client.
  2. Collect more information for that specific link when an eviction occurs.
Features
  1. An LNet "top" utility to show time and rate performance.
  2. Ability to capture traffic on a specific connection.
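
To make the "rate performance" part of the first feature concrete, here is a minimal sketch of how a top-style view could derive per-connection rates from two snapshots of cumulative byte counters. Everything here is an assumption for illustration: the snapshot format, the connection labels, and the function name are hypothetical and do not correspond to any existing LNet interface.

```python
def connection_rates(prev, curr):
    """Compute bytes/second per connection from two counter snapshots.

    prev, curr: (timestamp_seconds, {connection_label: cumulative_bytes}).
    Only connections present in both snapshots are reported.
    Returns a list of (connection_label, rate) sorted by rate, highest first.
    """
    t0, counters0 = prev
    t1, counters1 = curr
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("snapshots must be taken in increasing time order")
    rates = {
        conn: (counters1[conn] - counters0[conn]) / dt
        for conn in counters1
        if conn in counters0
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Made-up snapshots taken two seconds apart.
before = (0.0, {"client1->server1": 0, "client2->server1": 1000})
after = (2.0, {"client1->server1": 4000, "client2->server1": 2000})
for conn, rate in connection_rates(before, after):
    print(f"{conn}: {rate:.1f} B/s")
```

Sorting by rate gives the busiest connections first, which is the ordering a top-style display would typically want.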

Action items

  •