Bug 2084186 - [RFE] Improve the management for multiple network adapters in oVirt
Summary: [RFE] Improve the management for multiple network adapters in oVirt
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Network
Version: 4.4.10.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: eraviv
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-05-11 15:43 UTC by Yury.Panchenko
Modified: 2022-10-11 10:32 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-11 10:32:51 UTC
oVirt Team: Network
Embargoed:


Attachments (Terms of Use)
nic config + ping (73.77 KB, application/zip)
2022-06-01 14:27 UTC, Yury.Panchenko
no flags Details
network setup (239.16 KB, application/zip)
2022-06-02 18:28 UTC, Yury.Panchenko
no flags Details
vdsm log (604.85 KB, application/x-7z-compressed)
2022-06-09 12:57 UTC, Yury.Panchenko
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-46012 0 None None None 2022-05-11 16:07:01 UTC

Description Yury.Panchenko 2022-05-11 15:43:12 UTC
I'd like to offer a feature request.
Is it possible to improve the management of multiple network adapters on RHV servers?

Using different network adapters for VM traffic and for SAN traffic is a best practice now. In addition, most modern servers use network adapters with a minimum of two ports.

I managed to create this type of RHV cluster in my lab. There are two nodes and one storage server; the SAN and the VM network have different subnets and work via different network ports on each server.

I faced some problems during this process.
The main problem was that I couldn't configure the network from the web UI; instead, I had to configure the proper network routes for each port over SSH (each network has a different subnet).

Comment 1 eraviv 2022-05-16 11:51:09 UTC
Hi  Yury.Panchenko,

Regarding separating the traffic to storage and VM networks - this is supported by defining several networks and assigning different roles to them using 'network roles' in Clusters|Logical Networks|Manage Networks.
Regarding the number of ports per NIC - please explain exactly what your use case is so we can evaluate it better.

Thanks,

Eitan
RHV - networking

Comment 2 Yury.Panchenko 2022-05-20 11:07:00 UTC
Hello Eitan,
> Regarding separating the traffic to storage and VM networks - this is supported by defining several networks and assigning different roles to them using 'network roles' in Clusters|Logical Networks|Manage Networks.
This doesn't work at the core level.

For example, I have two networks: NetA in VLAN a and NetB in VLAN b. They have different subnets and gateways, and they both have access to the Internet.
There is an RHV node server with a two-port 10G adapter; port 1 is connected to NetA and port 2 is connected to NetB.
In this setup one of the ports doesn't work: it has a configuration, but all traffic goes through only one port. To resolve that I must connect to a terminal and manually define routes for every network port.
Then I deploy oVirt Engine and connect the storage domain via NetB. When I define the second network via 'network roles' in Clusters|Logical Networks|Manage Networks, oVirt drops all my network routes from the server and the second port stops working again.
So I must connect to the server again and define these routes manually.

Based on my experience with other hypervisors, I'd like to get 'out of the box' functionality in this area, because the setup is very basic and many customers will run into the same problems.

Comment 3 Michal Skrivanek 2022-05-25 11:47:08 UTC
I still do not follow what you are trying to do. Please describe the exact steps. You are supposed to configure host networking in Host:Network Interfaces:Setup Host Networks.

Comment 4 Yury.Panchenko 2022-05-26 18:32:41 UTC
> You are supposed to configure host networking in Host:Network Interfaces:Setup Host Networks
Yes, I use this to configure two networks, but one of the networks never works in this case.
It is connected, but no network traffic comes through. The source of the problem is the incorrect network routes created by this utility: the host tries to pass all network traffic via one adapter.
You must connect to the host and manually define routes for each port in a terminal to make them work together.
And when you reconfigure these NICs in oVirt, you have to repeat this trick with the routes again.

Comment 5 Michael Burman 2022-06-01 11:01:59 UTC
Hi Yury,

I think we (DEV + QE) still don't clearly understand what you are trying to achieve.

Let's try to understand. 

First of all, the 'ovirtmgmt' network is the default route network by default. It is possible to change this and assign another network to be the default route network of the host.

From your comments, it seems like you want to have a default route per port/network, but only one network can be the default route of the host.
Also, it is possible to use VLANs and assign VLAN networks on the same port or on different ports, separating traffic this way.
One network can be used for the storage connection and one for management. The VLANs must be properly configured on the switch side, of course.

Roles:
default route - by default this is ovirtmgmt.
All traffic goes via ovirtmgmt unless specified otherwise. It is possible to change this role and assign any other network, as long as it has a boot protocol configured.
You can set one VLAN network to be the default route of the host
and the other network for other usage, like storage.
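For illustration only, a quick way to check on the host which network currently holds the default route role is to look at the main routing table (a rough sketch based on this setup, where ovirtmgmt holds the role; addresses are examples):

# ip route show default
default via 172.25.16.1 dev ovirtmgmt proto static metric 1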

Can you share with us a screenshot of the Setup Host Networks UI, so we can see what you are trying to do?

Thanks,

Comment 6 Yury.Panchenko 2022-06-01 14:27:12 UTC
Created attachment 1885809 [details]
nic config + ping

Comment 7 Yury.Panchenko 2022-06-01 14:27:29 UTC
Hello Michael,
I uploaded a few screenshots to describe my current setup.
You can also see that ping works only via the ovirtmgmt interface.

Comment 8 eraviv 2022-06-02 13:08:17 UTC
Hi Yury,

From the attachment OvirtNetwork.png I see that you have two networks with two separate subnets on two separate NICs - this is a standard use case that is well supported by the engine. We are not aware of any problems or missing functionality around it.
In the attachment ping.txt, the existing ovirtmgmt bridge and NICs reflect the setup visible in the engine.


What is not clear to me:
1. I don't see any VLAN usage, which is inconsistent with what you wrote in comment 2.
2. The SAN network is marked as out of sync, which means that whatever you configured in the engine is not consistent with what is configured on the host. This in itself might be an indicator of a misconfiguration.
3. Both ovirtmgmt and SAN have quite a few dropped packets, which might also indicate a misconfiguration on the switch this host is connected to or on the switch-host connection.


In the attachment ping.txt, the ping results signal to me that:
1. ovirtmgmt is the default gateway on the host - this is the default configuration by the engine.
2. another route/gateway for the second subnet is missing on the host, or
3. maybe the host-to-switch connection or the switch configuration has a problem?
You have not included the routing table on the host or the NIC configuration in the engine, so I cannot ascertain this.

Did you configure the boot protocol of the enp94s0f0 NIC for the SAN network?
By default it is 'none', in which case there cannot be any L3 communication on that network.
If you set it to static, you can specify the gateway for that subnet in the Setup Host Networks dialog in the engine by clicking the pencil icon.
If it is configured to DHCP, the gateway is acquired automatically.
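For illustration only, a quick way to verify on the host whether the SAN NIC actually received an L3 configuration is to check its address (a rough sketch; the interface name and address are taken from this setup and may differ):

# ip addr show enp94s0f0 | grep 'inet '
    inet 172.24.175.28/24 brd 172.24.175.255 scope global dynamic enp94s0f0

If no 'inet' line shows up, the boot protocol is effectively 'none' and no L3 traffic is possible on that network.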

Can you check the above and provide more details? 
Also, engine.log and vdsm.log might shed some more light on the situation.

Thanks,
Eitan

Comment 9 Yury.Panchenko 2022-06-02 18:28:16 UTC
Created attachment 1886152 [details]
network setup

Comment 10 Yury.Panchenko 2022-06-02 18:38:50 UTC
Hello Eitan,
I resolved the out-of-sync problem and uploaded the network setup, but I still can't ping the SAN network via the SAN interface; I can only do it via the ovirtmgmt NIC.
> 1. I don't see any VLAN usage, which is inconsistent with what you wrote in comment 2.
I use native tagging on the switch ports; this part is done by the switch. The two NICs are in different VLANs.

> 2. The SAN network is marked as out of sync which means that whatever you configured on engine is not consistent with what is configured on the host. This in itself might be an indicator of a mis-configuration.
fixed

> 1. ovirtmgmt is the default gateway on the host - this is the default configuration by engine.
That's OK for me. I'd like to use the SAN NIC only for the SAN subnet 172.24.175.x.

> 2. another route\gateway for the second subnet is missing on the host. or,
I expected that the engine or RHV would create this route automatically.

> 3. maybe the host to switch or switch configuration has a problem? 
No, there are many other servers on the same switch and networks, and they don't have any problems.

> You have not included the routing table on the host or the nic configuration in engine so I cannot ascertain this.
Here it is - I didn't change anything in the routes:
[root@PDCQA189 /]# ip route
default via 172.25.16.1 dev ovirtmgmt proto static metric 1
172.25.16.0/22 dev ovirtmgmt proto kernel scope link src 172.25.16.61 metric 425

> Did you configure the boot protocol of enp94s0f0 nic for the SAN network? 
no

> Also, engine.log and vdsm.log might shed some more light on the situation.
I didn't see any problems in the logs. I will upload them if needed.

Comment 11 Yury.Panchenko 2022-06-06 12:06:03 UTC
Hello Eitan,
to make the second NIC work, I must add these routes on the RHV node:
[root@PDCQA189 ~]# ip route add 172.24.175.1 dev enp94s0f0
[root@PDCQA189 ~]# ip route add 172.24.175.0/24 via 172.24.175.1 dev enp94s0f0
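For reference only, a sketch of a source-based alternative that mirrors the per-network routing table scheme VDSM itself uses (the table number 100 is arbitrary, the addresses are the ones from this host, and the commands are not persistent across reboots or a VDSM re-apply):

# ip route add 172.24.175.0/24 dev enp94s0f0 src 172.24.175.28 table 100
# ip route add default via 172.24.175.1 dev enp94s0f0 table 100
# ip rule add from all to 172.24.175.0/24 lookup 100 priority 3200
# ip rule add from 172.24.175.0/24 lookup 100 priority 3200

This keeps SAN-destined and SAN-sourced traffic on enp94s0f0 without touching the default route in the main table.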

Comment 12 eraviv 2022-06-08 08:53:27 UTC
Hi Yuri,

In comment #10 you mentioned that you did not configure the boot protocol of enp94s0f0, but the attachment ovirt1.png shows that a DHCP configuration has been set up. With this configuration RHV should have supported communication on the SAN network.

So in order to understand what's wrong could you please: 
1. Remove all the manual changes you made to the host
2. RHV > webadmin > Hosts > your_host > Management > Refresh Capabilities
3. RHV > webadmin > Hosts > your_host > Network Interfaces > make sure all interfaces are synced and if not run setup networks with sync for each
4. Refresh capabilities again and make sure all interfaces are in sync

This will ensure the RHV side configuration has been applied to the host interfaces.
Next, could you please provide the following output:
1. RHV > webadmin > Hosts > your_host > General > Software > VDSM version
2. /var/log/vdsm/vdsm.log from the host during the interval when steps 2,3,4 above were performed
3. On the host shell:
   3.1 `ip route show table all`
   3.2 `ip rule show all`

Thanks,
Eitan

Comment 13 Yury.Panchenko 2022-06-09 12:56:29 UTC
Hello Eitan,
I've done all the steps
VDSM version is vdsm-4.50.0.13-1.el8ev

[root@PDCQA189 ~]# ip route show table all
default via 172.24.175.1 dev enp94s0f0 table 264766186 proto dhcp metric 100
172.24.175.0/24 dev enp94s0f0 table 264766186 proto kernel scope link src 172.24.175.28 metric 100
default via 172.25.16.1 dev ovirtmgmt table 329647082 proto static metric 425
172.25.16.0/22 via 172.25.16.61 dev ovirtmgmt table 329647082 proto static metric 425
172.25.16.1 dev ovirtmgmt table 329647082 proto static scope link metric 425
default via 172.25.16.1 dev ovirtmgmt proto static metric 1
172.25.16.0/22 dev ovirtmgmt proto kernel scope link src 172.25.16.61 metric 425
broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1
broadcast 172.24.175.0 dev enp94s0f0 table local proto kernel scope link src 172.24.175.28
local 172.24.175.28 dev enp94s0f0 table local proto kernel scope host src 172.24.175.28
broadcast 172.24.175.255 dev enp94s0f0 table local proto kernel scope link src 172.24.175.28
broadcast 172.25.16.0 dev ovirtmgmt table local proto kernel scope link src 172.25.16.61
local 172.25.16.61 dev ovirtmgmt table local proto kernel scope host src 172.25.16.61
broadcast 172.25.19.255 dev ovirtmgmt table local proto kernel scope link src 172.25.16.61
::1 dev lo proto kernel metric 256 pref medium
fe80::/64 dev vnet1 proto kernel metric 256 pref medium
fe80::/64 dev vnet3 proto kernel metric 256 pref medium
fe80::/64 dev vnet5 proto kernel metric 256 pref medium
fe80::/64 dev vnet11 proto kernel metric 256 pref medium
fe80::/64 dev vnet12 proto kernel metric 256 pref medium
fe80::/64 dev vnet13 proto kernel metric 256 pref medium
fe80::/64 dev vnet14 proto kernel metric 256 pref medium
fe80::/64 dev vnet15 proto kernel metric 256 pref medium
fe80::/64 dev vnet16 proto kernel metric 256 pref medium
fe80::/64 dev vnet18 proto kernel metric 256 pref medium
fe80::/64 dev vnet19 proto kernel metric 256 pref medium
local ::1 dev lo table local proto kernel metric 0 pref medium
local fe80::fc16:3eff:fe32:6d7f dev vnet13 table local proto kernel metric 0 pref medium
local fe80::fc6f:38ff:feed:2 dev vnet3 table local proto kernel metric 0 pref medium
local fe80::fc6f:38ff:feed:10 dev vnet19 table local proto kernel metric 0 pref medium
local fe80::fc6f:38ff:feed:4b dev vnet1 table local proto kernel metric 0 pref medium
local fe80::fc6f:38ff:feed:d7 dev vnet18 table local proto kernel metric 0 pref medium
local fe80::fc6f:38ff:feed:d9 dev vnet14 table local proto kernel metric 0 pref medium
local fe80::fc6f:38ff:feed:e9 dev vnet5 table local proto kernel metric 0 pref medium
local fe80::fc6f:38ff:feed:123 dev vnet15 table local proto kernel metric 0 pref medium
local fe80::fc6f:38ff:feed:126 dev vnet16 table local proto kernel metric 0 pref medium
local fe80::fc6f:67ff:fe42:ae dev vnet11 table local proto kernel metric 0 pref medium
local fe80::fc6f:67ff:fe42:b0 dev vnet12 table local proto kernel metric 0 pref medium
multicast ff00::/8 dev eno1 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev br-int table local proto kernel metric 256 pref medium
multicast ff00::/8 dev enp94s0f1 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vnet1 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vnet3 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vnet5 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vnet11 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vnet12 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vnet13 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vnet14 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vnet15 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vnet16 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vnet18 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vnet19 table local proto kernel metric 256 pref medium



[root@PDCQA189 ~]# ip route
default via 172.25.16.1 dev ovirtmgmt proto static metric 1
172.25.16.0/22 dev ovirtmgmt proto kernel scope link src 172.25.16.61 metric 425

[root@PDCQA189 ~]# ip rule show all
0:      from all lookup local
3200:   from all to 172.25.16.0/22 lookup 329647082
3200:   from 172.25.16.0/22 lookup 329647082
32766:  from all lookup main
32767:  from all lookup default

Comment 14 Yury.Panchenko 2022-06-09 12:57:22 UTC
Created attachment 1888351 [details]
vdsm log

Comment 15 eraviv 2022-06-16 08:23:00 UTC
Hi Yuri,

Apologies for the delayed reply...
It seems that the rules table you posted is not complete. A rule for subnet 172.24.175.0/24 is missing. This rule should have been created by RHV when you attached the SAN network to interface enp94s0f0 in the webadmin. I cannot say why this happened because I don't have the logs from that moment.

So let's try to fix this by detaching the SAN network and then attaching it back via the webadmin, with the intention that when it is attached again the routing rules will be created correctly. This is assuming that there are no leftover manual configurations that might interfere.

1. RHV > webadmin > Hosts > your_host > Network Interfaces > setup networks: detach SAN
2. wait for a confirmation event on the events tab that the network has been detached
3. print the output of `ip route show table all` and `ip rule show all` just to make sure we get the expected result
4. RHV > webadmin > Hosts > your_host > Network Interfaces > setup networks: attach SAN to enp94s0f0
5. wait for a confirmation event on the events tab that the network has been attached
6. print the output of `ip route show table all` and `ip rule show all`
7. try the ping... :)
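For illustration, this is roughly what we expect to see after step 6 if the attach succeeded, assuming VDSM's per-network policy routing (the table number is arbitrary and the addresses come from your setup):

# ip rule show all
...
3200:   from all to 172.24.175.0/24 lookup 264766186
3200:   from 172.24.175.0/24 lookup 264766186
...
# ip route show table 264766186
default via 172.24.175.1 dev enp94s0f0 proto dhcp metric 100
172.24.175.0/24 dev enp94s0f0 proto kernel scope link src 172.24.175.28 metric 100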

In case we have the same failure again, it would be very helpful if you can post the vdsm.log and supervdsm.log logging the above flow.

Thanks,
Eitan

Comment 16 Yury.Panchenko 2022-06-16 13:39:35 UTC
Hello Eitan,
> A rule for subnet 172.24.175.0/24 is missing. This rule should have been created by RHV when you attached the SAN network to interface enp94s0f0 in the webadmin
Yes, that is my point, and I created it manually.

> So let's try to fix by detaching the SAN network and then attaching back via the webadmin
I've tried this a few times; it didn't fix the problem.

> try the ping... :)
It works when the routes are configured manually, but the RHV node still uses ovirtmgmt.

I fixed the problem, but in a radical way:
I disabled the gateways on the storage SAN NICs; then the ovirtmgmt NICs can't reach the storage and the RHV nodes use the SAN NICs instead.

Now the problem is clearer to me:
1) You must have two networks (let's name them NetA for ovirtmgmt and NetB for SAN).
2) Both networks must have gateways to a common external network.
3) ovirtmgmt uses NetA as the primary network.
4) The node has a default route via GatewayA and two routes, one for SubnetA and one for SubnetB.
5) To access the SAN network the node should use NetB, but because it is able to reach it from NetA via GatewayA, it uses NetA.
6) If we block external network access for the SAN network on the storage side, it works normally.
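For illustration only, this is roughly how point 5 shows up on the host when the per-subnet route/rule for NetB is missing: a route lookup for a SAN address falls through to the default route and picks the ovirtmgmt NIC (the addresses below are the ones from my setup):

# ip route get 172.24.175.10
172.24.175.10 via 172.25.16.1 dev ovirtmgmt src 172.25.16.61 uid 0
    cache

Once a route for 172.24.175.0/24 (or the matching policy rule) exists, the same lookup selects enp94s0f0 instead.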

Comment 18 eraviv 2022-06-27 08:21:23 UTC
Hi Yury,

Comparing the output of `ip rule show all` and `ip route show table all`, it occurs to me that although there is a missing rule in the rules output, the corresponding routing does appear in the route tables. So I suspect that something has been corrupted which RHV cannot fix just by detaching and re-attaching the network, which, as you commented, does not help. Since your flow is fully supported by RHV and reproducible as working on our environments, please try to recreate it on a separate vanilla host using only RHV, preferably in a way that does not use the existing networks/switches between the 'bad' host and RHV.

Thanks,
Eitan

Comment 19 Martin Perina 2022-07-25 11:37:34 UTC
Hi Yury, have you had time to take a look?

Comment 20 Yury.Panchenko 2022-07-25 13:40:55 UTC
Hello Martin,
I'm working on this case.
I can provide some results next week.
thanks.

Comment 21 Yury.Panchenko 2022-08-03 17:26:02 UTC
Hello Martin,
I've done a new setup with a new host, but I have the same problem:

# ip rule show all
0:      from all lookup local
100:    from all to 192.168.222.1/24 lookup main
100:    from all to 192.168.1.1/24 lookup main
101:    from 192.168.222.1/24 lookup main
101:    from 192.168.1.1/24 lookup main
32766:  from all lookup main
32767:  from all lookup default
[root@psrh451 ~]# ip route show table all
default via 172.24.144.1 dev Net2 table 59048282 proto dhcp src 172.24.153.64 metric 426
172.24.144.0/20 dev Net2 table 59048282 proto kernel scope link src 172.24.153.64 metric 426
default via 172.25.16.1 dev ovirtmgmt proto dhcp src 172.25.16.88 metric 425
172.25.16.0/22 dev ovirtmgmt proto kernel scope link src 172.25.16.88 metric 425
broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1
broadcast 172.24.144.0 dev Net2 table local proto kernel scope link src 172.24.153.64
local 172.24.153.64 dev Net2 table local proto kernel scope host src 172.24.153.64
broadcast 172.24.159.255 dev Net2 table local proto kernel scope link src 172.24.153.64
broadcast 172.25.16.0 dev ovirtmgmt table local proto kernel scope link src 172.25.16.88
local 172.25.16.88 dev ovirtmgmt table local proto kernel scope host src 172.25.16.88
broadcast 172.25.19.255 dev ovirtmgmt table local proto kernel scope link src 172.25.16.88
::1 dev lo proto kernel metric 256 pref medium
fe80::/64 dev vnet5 proto kernel metric 256 pref medium
fe80::/64 dev vnet6 proto kernel metric 256 pref medium
local ::1 dev lo table local proto kernel metric 0 pref medium
anycast fe80:: dev vnet5 table local proto kernel metric 0 pref medium
anycast fe80:: dev vnet6 table local proto kernel metric 0 pref medium
local fe80::fc16:3eff:fe7e:570e dev vnet5 table local proto kernel metric 0 pref medium
local fe80::fc6f:b7ff:fe52:0 dev vnet6 table local proto kernel metric 0 pref medium
multicast ff00::/8 dev br-int table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vnet5 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev vnet6 table local proto kernel metric 256 pref medium
multicast ff00::/8 dev ens224 table local proto kernel metric 256 pref medium

Comment 22 Martin Perina 2022-09-06 07:19:32 UTC
We are not able to reproduce the issue you raised; if we perform the setup on new hosts using the steps from Comment 15, everything works for us. Could you please recheck that you really followed the steps from Comment 15?

Comment 23 Casper (RHV QE bot) 2022-09-06 07:30:59 UTC
This bug has low overall severity and is not going to be further verified by QE. If you believe special care is required, feel free to properly align the relevant severity, flags and keywords to raise PM_Score, or use one of the Bumps ('PrioBumpField', 'PrioBumpGSS', 'PrioBumpPM', 'PrioBumpQA') in Keywords to raise its PM_Score above the verification threshold (1000).

Comment 24 Yury.Panchenko 2022-09-23 12:10:10 UTC
This bug is always reproducible in my labs, but I don't understand what additional information I can provide here. In my environment I use a workaround which helps me.
So, let's wait for real customer cases.
Thanks.

Comment 25 Martin Perina 2022-10-11 10:32:51 UTC
We are not able to reproduce the issue despite our best efforts; using the steps from Comment 15 always gets us to a working state, so there must be something else in the customer's environment which causes the issue. But as we are out of ideas about what to try, and we didn't get reports from other users about this issue, we need to close this bug as WORKSFORME.

