PCIE example issues

I am using the PCIE example as described in UG-20234. Because I want to connect another application behind the hard-IP EP, I am modifying the testbench and APPS to allow transfers larger than the 4 bytes for which the example is written. The idea is to verify in simulation before mapping onto the FPGA, since debugging a mapped design is far more time-consuming than debugging in simulation. The design is configured for 64 bit / 250 MHz with an Avalon-ST interface between the Hard IP and APPS.

I managed to send 8 bytes to APPS, which writes those 8 bytes to memory and returns them again. I then add 1 to each byte, to be sure I do not mix them up with the sent information. The comparator in the testbench detects the differences and carries on (in the original example it stops when send != receive).

I adapted the header for the CplD TLP from

| 77 RX | MWr | 0004 | 60000001_0000000F_00000001_00010000 |
| 77 RX | MRd (00) | 0004 | 20000001_0000000F_00000001_00010000 |
| 77 TX | CplD (00) | 0004 | 4A000001_01080004_00000000 |

to (for 8 bytes)

| 77 RX | MWr | 0008 | 60000002_000000FF_00000001_00010000 |
| 77 RX | MRd (00) | 0008 | 20000002_000000FF_00000001_00010000 |
| 77 TX | CplD (00) | 0008 | 4A000002_01080008_00000000 |

The header for 8 bytes has the following TLP fields, which are to my knowledge correct:

For MWr:
DW0 60000002: Length = 2, Fmt = 2'b11, Type = 5'b00000 => strange: Fmt should be 2'b10 for MWr!? (this is made by the INTEL IP)
DW1 000000FF: 1stBE = F, Last_BE = F, Tag = 0, ReqID = 0 (made by the INTEL IP)
DW2 00000001: why is DW1[0] = R = 1? This should be 0! Address = 0 (made by the INTEL IP). In reality it is 0000_0000! (bug)
DW3 00010000: ? (should this be data?) But the data arrives correctly in the memory (made by the INTEL IP). In reality it is data[31:0].
DW4 (not depicted in this table): data[63:32], works correctly.

For MRd:
DW0 20000002: Length = 2, Type = 5'b00000, Fmt = 2'b01 => strange: Fmt should be 2'b00 for MRd (made by the INTEL IP)
DW1 000000FF: 1stBE = F, Last_BE = F, Tag = 0, ReqID = 0 (made by the INTEL IP)
DW2 00000000: why is DW1[0] = R = 1? This should be 0! Address = 0 (made by the INTEL IP)

For CplD:
DW0 4A000002: Length = 2, Type = 5'b01010 (correct), Fmt = 2'b10 (correct) (made by me)
DW1 000000FF: 1stBE = F, Last_BE = F, Tag = 0, ReqID = 0 (made by me)
DW2 00000001: why is DW1[0] = R = 1? This should be 0! Address = 0 (made by the INTEL IP); in reality it is 0000_0000
DW3 data[31:0]
DW4 data[63:32]

For some reason this does not work: the DUT PCIE Hard IP does not produce clean, continuous tx0..tx3 signals; they are interrupted with 'x' values. In the testbench, the signal is routed via rx0..rx3 to altpcietb_bfm_rpvar_64b_x8_pipen1b. The output of rpvar:

rx_data0 0000000000000000
rx_desc0 044a000002010800080000000078561011
rx_be f4
rx_dv 0
rx_dfr 0
rx_ack 0
rx_abort 0
rx_retry 0
rx_mask 0
rx_ws 0

Where the last 8 bytes are correct = data[31:0] sent!

My questions: Why does this not work? Options:
The header for MRd is not correct, but this is generated by the INTEL testbench?
The header for CplD is not correct (this is generated by my modifications)?
The Hard IP in the DUT has a configuration that only allows 4 bytes?
The rpvar mimic can only handle 4 bytes?
The rpvar needs a modification of parameters?

And how do I make it work?

Thanks in advance!
Pieter

Replies: Re: PCIE example issues

Hi KhaiY,

I solved the testbench and APPS issues. The testbench now runs in simulation with any length >= 4, as long as it is smaller than the max payload. The code is not confidential, but it cost me a lot of time, and I am willing to share it with INTEL if INTEL will give me compensation for the effort.
Regards, Pieter

Replies: Re: PCIE example issues

Hi Pieter,

Thank you for the valuable feedback. I am now transitioning this thread to community support. If you have a new question, feel free to open a new thread to get support from Intel experts. Otherwise, the community users will continue to help you on this thread. Thank you.

Best regards, KhaiY

Replies: Re: PCIE example issues

Hi KhaiY,

Thanks for this clarification. I do have a strong request: please adapt the manual accordingly, this is really very misleading! Consider the case closed, thanks for your support.

Regards, Pieter

Replies: Re: PCIE example issues

Hi Pieter,

"And yes, indeed, if you run the software test as in figure 12, then the application accepts traffic load up to MAX.PAYLOAD. BUT NOT THE TESTBENCH / APPS in simulation!"

Yes. As in my previous reply, the testbench provides a simple method to do basic testing, and it does not cover all traffic profile stimuli. If a user wants to simulate something that is not covered by the testbench, the user has to modify the testbench based on their requirements.

"So, I think I am on the edge of improving the testbench and APPS to payload = max.payload. My question: if I would share this code with you, would INTEL be willing to compensate me for this effort? (I think INTEL does not deliver what it says it does as described in the manual)"

I apologize for the miscommunication if there is any. The reason I asked earlier whether you are willing to share the modification is that some hobbyists share information, what they have created, or methods to solve problems in this public forum. Please do not share if the content is confidential or not publicly accessible. Thanks for your understanding.

Do let me know if you have any questions or concerns.
Best regards, KhaiY

Replies: Re: PCIE example issues

Hi KhaiY,

I still feel we are somehow missing what we are saying to each other.

You say: the TLP can handle any traffic size, as long as it is less than the max payload (which can be 128, 256, 512, 1024, 2048 bytes). For special corner cases, commercial verification IP is recommended.

I say: the testbench and APPS can ONLY handle traffic that is LESS than or equal to 4 bytes.

Document UG-01145 says ($16, page 157): "It can only handle received read requests that are less than or equal to the currently set Maximum payload size option specified under PCI Express/PCI Capabilities heading under the Device tab using the parameter editor. Many systems are capable of handling larger read requests that are then returned in multiple completions."

THIS IS NOT TRUE!!!! See figure 10 on page 21, where you clearly see the 4 bytes reported. IT IS NOT POSSIBLE TO MAKE THIS AS BIG AS MAX.PAYLOAD. I would not call anything larger than 4 bytes a corner case.

And yes, indeed, if you run the software test as in figure 12, then the application accepts traffic load up to MAX.PAYLOAD. BUT NOT THE TESTBENCH / APPS in simulation!

So, I think I am on the edge of improving the testbench and APPS to payload = max.payload. My question: if I would share this code with you, would INTEL be willing to compensate me for this effort? (I think INTEL does not deliver what it says it does as described in the manual)

regards, Pieter

Replies: Re: PCIE example issues

Hi Pieter,

I apologize for the miscommunication if there is any. As I explained earlier, the IP can accept TLPs up to max payload, and the testbench provides a simple method to do basic testing of the Application Layer logic that interfaces to the variation. Corner cases and certain traffic profile stimuli are not covered. This is stated in the UG https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug_a10_pcie_avst.pdf (Page 156).
Please do not share any information that is confidential or not publicly accessible. Otherwise, you are always welcome to share in this public forum. Thanks

Best regards, KhaiY

Replies: Re: PCIE example issues

Hi KhaiY,

Please do not close this thread. I did not see a clear end / conclusion in your last reply, other than:

please buy commercially available verification tools: it is not my intention to challenge INTEL's PCIE block. I assume it works correctly.
INTEL will not update the PCIE example to handle more than 4 bytes; if you have a solution, please share it with INTEL.
there is no example PCIE / LL MAC / Ethernet PHY; if you have one, please share it with INTEL.

I had already managed to send 8 bytes (that has worked for some months) and to receive 8 bytes (that has worked for about 4 weeks), and last Friday I finally managed to SEND 8 bytes and RECEIVE them back again over the full chain in INTEL's APPS application: TLP parser, write controller, memory controller, read controller, TLP parser. So that is a good start to extend from 8 bytes to many bytes.

My question: as this has been hard work, and as INTEL is not willing to spend its resources on upgrading the testbench to the standard described in the manual (sending and receiving up to the max payload), how much would INTEL be willing to pay me for this effort?

regards, Pieter

Replies: Re: PCIE example issues

Hi,

We have not received any response from you to the previous question/reply/answer that I provided. This thread will be transitioned to community support. If you have a new question, feel free to open a new thread to get support from Intel experts. Otherwise, the community users will continue to help you on this thread. Thank you

Best regards, KhaiY

Replies: Re: PCIE example issues

Hi,

May I know if you have any updates? Thanks

Best regards, KhaiY

Replies: Re: PCIE example issues

Hi Pieter,

Yes. It does accept TLPs up to max payload.
The current testbench and Root Port BFM provide a simple method to do basic testing of the Application Layer logic that interfaces to the variation. This BFM allows you to create and run simple task stimuli with configurable parameters to exercise basic functionality of the Intel example design. The testbench and Root Port BFM are not intended to be a substitute for a full verification environment. Corner cases and certain traffic profile stimuli are not covered. To ensure the best verification coverage possible, Intel suggests strongly that you obtain commercially available PCI Express verification IP and tools, or do your own extensive hardware testing or both.

Could you share the modification that you have made? I believe it would be beneficial for other users or customers. Thanks

Best regards, KhaiY

Replies: Re: PCIE example issues

Dear KhaiY,

As I told you, we already worked with the example in UG-20162. I understand you don't have an example design with the two connected.

What I don't understand is what you write in the manual for the PCIE example. The UG-01145 document says (page 157): "It can only handle received read requests that are less than or equal to the currently set Maximum payload size option specified under PCI Express/PCI Capabilities heading under the Device tab using the parameter editor. Many systems are capable of handling larger read requests that are then returned in multiple completions."

Meanwhile, I managed to send 8 bytes, but it is a hard struggle. In fact, I am disappointed that this testbench / example cannot handle anything but those 4 bytes. I reluctantly seem to have to accept that INTEL does not intend to repair this shortcoming, although the manual says it does accept TLPs up to max payload.
Regards, Pieter

Replies: Re: PCIE example issues

Hi Pieter,

I checked with the team. We have a standalone 10G Ethernet example design that can be generated from the 10G MAC IP GUI, but we do not have a PCIe + 10G Ethernet design; the customer needs to integrate the two themselves. You may find the steps to generate the example design here: https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug-20162.pdf

Thanks

Best regards, KhaiY

Replies: Re: PCIE example issues

Hi KhaiY,

Thanks for the response, but I don't understand it. I thought I explained my goals (see earlier posts):

What I really want is to connect my Ethernet PHY, via the INTEL Low Latency MAC, to the PCIE Hard IP (gen2 x 4) and map it onto the cyclone10GX development board. (In the INTEL design example for the Low Latency MAC, I have replaced the INTEL PHY with my own PHY, and it runs in simulation; I still have some difficulties in getting the transceiver properly mapped onto the cyclone10GX.) If you do have an example of this (but then of course with the INTEL PHY), that would be most welcome!

For the PCIE testbench I am using the example of UG-01145_avst. The difficulty is not in the testbench but in the APPS component. In fact, I believe it is in the TLP parser, which is not built to send back anything but ONE DWORD of 4 bytes. I tried to modify the TLP parser to make it suitable for 2 DWORDs of data, but somehow the Hard_IP PCIE block makes a mess of it.

The UG-01145 document says (page 157): "It can only handle received read requests that are less than or equal to the currently set Maximum payload size option specified under PCI Express/PCI Capabilities heading under the Device tab using the parameter editor. Many systems are capable of handling larger read requests that are then returned in multiple completions."

That would be good enough for me, but I don't get it working! It only seems to work for 4 bytes!
And if you look at the code (driver_downstream.v), lines 270 - 289, the length is cut off at 4 bytes, and there is a remark at line 271: //TODO extend to more than 1 DW. So, it seems there is some work to be done?!

So, you can help me in several ways:

1/ provide me with a TLP parser that is able to reply correctly with packets of many bytes. I think I can manage to integrate that into the APPS entity and the MEM entity, and modify the testbench such that it maybe does not compare, but at least I can verify manually that received = sent
OR:
2/ provide me with an example where your LL_MAC_10GBASE_R example is already integrated into a PCIE gen2x4 example
OR
3/ explain to me how, without using the testbench, to directly build the above (PCIE + LLMAC_10GBASER), synthesise/place/route it onto the Cyclone10GX development board, and run a software test similar to the one described in AN 855: PCI Express* High Performance Reference Design for Intel® Cyclone® 10 GX.

Thanks for your feedback!

regards, Pieter

Replies: Re: PCIE example issues

Hi Pieter,

I am sorry for the delay in response. Thank you for your patience. I discussed this with the team: the testbench does not have the flexibility and does not cover certain traffic profile stimuli. It would be a time-consuming process to edit and debug. Is there an ultimate goal that you would like to achieve from this? Thanks

Best regards, KhaiY

Replies: Re: PCIE example issues

Hi KhaiY,

Thanks for updating me. It is really not an easy piece of code. What I could not figure out, or do not understand, is this: if you load the board with the example code and connect it to the other PC, as described in document an-855 paragraph 1.7 Hardware Installation, does this also use the very same configuration? Because it says ($1.8): Set the Transfer length to 100,000 bytes and the Sequence to Write only,....transfer data from the FPGA to the system memory in chunks of 100,000 bytes.
So, you would expect that the very same application (APPS and the others) really does accept and send back packets of length 512 / 1024 / 2048 packed into larger chunks... If so, then why not in the testbench? (We have this software test working.)

Looking forward to your progress / feedback

Regards, Pieter

Replies: Re: PCIE example issues

Hi Pieter,

I would like to inform you that I am still working on this. Please allow me some time on this. Thanks

Best regards, KhaiY

Replies: Re: PCIE example issues

Hi KhaiY,

Thanks for your response. Something in our communication seems not to be working. The design example you suggest is the one I am working with: Cyclone 10 GX PCIe Gen2 x4 Avl-ST

The design example I started with, and for which I ask your advice, is the one described in UG-01145_avst | 2020.06.02 (newer versions may exist), which you can download from the IP Catalog in Quartus:

PCI Express / Intel Arria 10 / Cyclone 10 Hard IP for PCI Express => Platform Designer => Parameter setting:
System Settings: Standalone, Avalon-ST, Gen2x4, 64 bit @ 250 MHz, Native Endpoint, Balanced
RX Buffer: Header: 112 Data: 440
All other settings: default
Example Design: PIO
Development Kit: Arria 10 GX Development Kit (I have the Cyclone 10GX development kit, and I understand this should not be a problem)

I asked you: is it possible to change the testbench and application such that transfers with payloads of many bytes are possible? The design example only allows for 4 bytes. I changed the testbench and the TLP parser such that, in my opinion, it should transfer 8 bytes, but so far without success. My assumption is: if I manage 8 bytes, I can also change to many bytes, as long as it stays within the boundaries of the maximum payload.

What I really want: connect INTEL's Low Latency MAC 10-GBASER design example to the hard IP PCIE of the PCIE example design as described above. Instead of reinventing the wheel myself, an example design doing exactly this would be most appreciated.
The list of design examples you sent me does not contain such an example (unless I missed something, which you can never exclude). So, I really hope somebody did this exercise (connect LL MAC to PCIE) and is willing to share this code with me. Otherwise, I have to do it myself, and a step towards this is to make the PCIE design example suitable for transfers > 4 bytes, for which I need your help.

I hope this clarifies my requests...

Regards, Pieter

Replies: Re: PCIE example issues

Hi Pieter,

I find this reference design relevant to your request.

Cyclone 10 GX PCIe Gen2 x4 Avl-ST
https://fpgacloud.intel.com/devstore/platform/18.0.0/Pro/cyclone-10-gx-pcie-gen2-x4-avl-st/

If it is not, you may find some other design examples in the Intel FPGA Design Store.
https://fpgacloud.intel.com/devstore/?page=4&search=pci

Do let me know if the above design is helpful. Thanks

Best regards, KhaiY

Replies: Re: PCIE example issues

Now I see: in your folder customer/ip/pcie_id/ there is only a small subset of files. Even more strange is that in my design those files do not exist at all!? There should be the set of files as shown in the pci_202_mod/pcie_a10_hip_blabla/ip/pcie_ed/ directory.

My assumption was that you would load the design example from the INTEL website, and then replace the changed files with my files (quartus 20.2). Apparently the design example is not properly loaded onto your machine, hence it will definitely not run.

The problem is: the fully compressed design is 0.7 GB (with 762 files), which does not fit in the drag & drop field below, so I cannot send this design to you, unless via a direct link to your email.

So, my suggestion is: please install the full design from the INTEL website, and make sure it runs. I am afraid you loaded a different design, given those clock_in and reset_in files that do not exist in my design. Then replace the files from the INTEL example with the files in the attachments that I sent you earlier.
But maybe another approach is also possible: what I really want is to connect my Ethernet PHY, via the INTEL Low Latency MAC, to the PCIE Hard IP (gen2 x 4) and map it onto the cyclone10GX development board. (In the INTEL design example for the Low Latency MAC, I have replaced the INTEL PHY with my own PHY, and it runs in simulation; I still have difficulties in getting the transceiver properly mapped onto the cyclone10GX.) If you do have an example of this (but then of course with the INTEL PHY), that would be most welcome!

Regards, Pieter

Replies: Re: PCIE example issues

Hi Pieter,

Please find attached. Thanks

Best regards, KhaiY

Replies: Re: PCIE example issues

Attach file

Replies: Re: PCIE example issues

Dear KhaiY,

Yes, of course I tried without modification. It runs fine.

Take care: I modified the data being sent, and being sent back, by 1/ making them different per loop, and 2/ adding 1 to each byte. Of course the comparator does not like that, so I changed it so that it reports a difference but does not stop as a result (unlike the design example, where it would stop).

For more input/comments, check the attached Word file, and check the transcript from line 3640 to the end for the output of the comparator when Byte = 4. I also attached a transscript_08.txt to show the output for Bytes = 8.

regards, Pieter

Replies: Re: PCIE example issues

Hi Pieter,

I plan to generate an example design using the same PCIe settings as the pcie_ed.qsys that you have provided, but I see the warning messages below when I open the pcie_ed.qsys file.
Component Instantiation Warning: pcie_ed.APPS: File not found: ip/pcie_ed/pcie_ed_APPS.ip
Component Instantiation Warning: pcie_ed.DK: File not found: ip/pcie_ed/pcie_ed_DK.ip
Component Instantiation Warning: pcie_ed.DUT: File not found: ip/pcie_ed/pcie_ed_DUT.ip
Component Instantiation Warning: pcie_ed.MEM: File not found: ip/pcie_ed/pcie_ed_MEM.ip

MWr:
DW0 60000002: length = 2, Fmt = 2'b11, Type = 5'b00000 => strange: Fmt should be 2'b10 for MWr!!!??? (this is made by INTEL IP)
>> Both 3'b010 and 3'b011 are valid for MWr. 3'b010 is for a 3DW header with data and 3'b011 is for a 4DW header with data. You are using 4DW, so 3'b011 is correct.

DW3 00010000 = ???? (should be data???!!!). But the data arrives correctly in the memory (Made by Intel IP) In reality it is data[31:0]
>> TLP header DW3 and DW4 are for address but not data.

MRd:
DW0 20000002: Length = 2, Type = 5'b0000, Fmt = 2'b01 => strange: Fmt should be 2'b00 for Mrd (Made by Intel IP)
>> Both 3'b000 and 3'b001 are valid for MRd. 3'b000 is for a 3DW header with no data and 3'b001 is for a 4DW header with no data.

Have you tried using the example design without modification? Do you see any unexpected behavior in the example design without modification? Thanks

Best regards, KhaiY

Replies: Re: PCIE example issues

Dear KhaiChein_Y,

Can you give me an update on where you stand, or how I can help you, if at all, with getting the simulation running?

regards, Pieter (....I am trying not to sound impatient.....)

Replies: Re: PCIE example issues

Hi KhaiY,

Glad to hear you are still on it.... Of course, I understand, it is a F**ng difficult testbench / apps / bfm. But I was prompted by INTEL to say whether the answer provided was sufficient to close the call. (I thought that with all the smart AI of today, the bot would have seen there is just a question (yours) and an answer (mine).....)

Regards, Pieter

Replies: Re: PCIE example issues

Hi Pieter,

I am still working on this. Please allow me some time.
Best regards, KhaiY

Replies: Re: PCIE example issues

Dear KhaiChein_Y,

I have not received a reaction from you yet. Please respond.

Regards, Pieter

Replies: Re: PCIE example issues

Dear KhaiChein_Y,

Thanks for reaching out. I assume the names of the zip files speak for their content. Main changes:

split altpcietb_bv_rp_gen2_x8 into two files, altpcie_bfm_rpvar and altpcietb_bv_rp_gen2_x8_ex_rpvar, to make editing the latter easier
added many $fdisplay statements to track bitstreams more easily (I always get lost in the wave screen)
changed a lot in tlp_parser to generate headers for the 8 byte CplD. The result of that can e.g. be seen in outparsout.txt
added 8 byte wide write to and read from memory (interconnect_low and ~high, and mem_low and ~high) in pcie_ed
changed a bit in downstream_driver to generate 8 bytes
in pcie_ed, lines 546 and 566, 1 is added to each read byte

In the downstream_drive file, line 465, Length = 4, as written by INTEL. If you run the testbench, it fails of course because of the adding of 1 to each byte. But downstream_drive is modified such that it does not stop; it just reports a difference.

Change this line 465 to 8, and the tb sends 8 bytes; the apps writes them to MEM in pcie_ed, reads them back, adapts the TLP header fields (of which you can see the result in pcie_ed_tx.txt), and eventually sends seemingly correct data to the Hard IP. On receipt in the TB, in rpvar, rx_data remains zero, but rx_desc0 has some recognizable values. Also, there are 'x' values on the serial wires tx0..3 from the DUT to rx0...3 of rpvar. This suggests that the Hard IP of PCIE is not controlled correctly by the TLP headers? Or a configuration setting?

I hope this clarifies; please do not hesitate to ask for clarification if needed.

Regards, Pieter

Replies: Re: PCIE example issues

Hi,

Could you share both the settings/.qsys file you used to create the example design and the files after modification? Thanks.

Best regards, KhaiY - 2021-02-23
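
Editorial note: the header words quoted in this thread can be checked mechanically. The following is a minimal Python sketch, assuming the standard PCI Express TLP header layout (Fmt in DW0[31:29], Type in DW0[28:24], Length in DW0[9:0]; byte enables in DW1 of a request; byte count in DW1 of a completion). The function names are hypothetical and are not part of the Intel testbench or BFM.

```python
# Illustrative sketch only: decodes DW0/DW1 of the TLP headers quoted in the thread.
# Function names are made up for this example; they are not Intel testbench APIs.

FMT_NAMES = {
    0b000: "3DW header, no data",
    0b001: "4DW header, no data",
    0b010: "3DW header, with data",
    0b011: "4DW header, with data",
}


def decode_dw0(dw0: int) -> dict:
    """Split header DW0 into Fmt, Type and Length (in DWs)."""
    fmt = (dw0 >> 29) & 0x7        # DW0[31:29]
    tlp_type = (dw0 >> 24) & 0x1F  # DW0[28:24]
    length = dw0 & 0x3FF           # DW0[9:0]; an encoded 0 means 1024 DWs
    return {"fmt": fmt, "type": tlp_type, "length_dw": length or 1024}


def classify(dw0: int) -> str:
    """Name the TLP kinds that appear in this thread; anything else is 'other'."""
    fields = decode_dw0(dw0)
    if fields["fmt"] not in FMT_NAMES:
        return "other"
    has_data = bool(fields["fmt"] & 0b010)
    if fields["type"] == 0b00000:
        return "MWr" if has_data else "MRd"
    if fields["type"] == 0b01010:
        return "CplD" if has_data else "Cpl"
    return "other"


def decode_request_dw1(dw1: int) -> dict:
    """DW1 of a memory request: Requester ID, Tag, Last/First DW byte enables."""
    return {
        "requester_id": (dw1 >> 16) & 0xFFFF,
        "tag": (dw1 >> 8) & 0xFF,
        "last_be": (dw1 >> 4) & 0xF,
        "first_be": dw1 & 0xF,
    }


def decode_completion_dw1(dw1: int) -> dict:
    """DW1 of a completion: Completer ID, Status, BCM, Byte Count."""
    return {
        "completer_id": (dw1 >> 16) & 0xFFFF,
        "status": (dw1 >> 13) & 0x7,
        "bcm": (dw1 >> 12) & 0x1,
        "byte_count": dw1 & 0xFFF,
    }


if __name__ == "__main__":
    # DW0 values of the 8-byte headers posted at the top of the thread.
    for dw0 in (0x60000002, 0x20000002, 0x4A000002):
        fields = decode_dw0(dw0)
        print(f"{dw0:08X}: {classify(dw0)}, {FMT_NAMES[fields['fmt']]}, "
              f"length = {fields['length_dw']} DW")
```

Under these assumptions, the 8-byte headers decode as MWr with a 4DW header, MRd with a 4DW header, and CplD with a 3DW header, which agrees with KhaiY's explanation of the Fmt field. The DW1 value 01080008 of the CplD decodes to Completer ID 0108 and a byte count of 8.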
