Aros/Developer/Docs/HIDD/Nouveau

Introduction

The NouveauBitMap class contains an object of type nouveau_bo. This object represents a buffer in memory accessible from the GPU (either VRAM or GART) and is used in functions like copy and fill.
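
A hedged sketch of the relationship described above. Only the nouveau_bo pointer comes from the text; the remaining field names are illustrative, not the driver's actual layout:

#include <exec/types.h>

struct nouveau_bo;                   /* GPU buffer object from the drm layer */

struct HIDDNouveauBitMapData
{
    struct nouveau_bo *bo;           /* buffer accessible from the GPU:
                                        either VRAM or GART */
    ULONG width, height;             /* illustrative geometry fields */
    ULONG pitch, bytesperpixel;
};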

Nouveau requires a minimum pixel block size of 64x64 to use accelerated transfers.

UAE P96: up to now the Picasso96 emulation always used RGBFF_CHUNKY, which maps to PIXFMT_BGR15 in cybergraphics, and then copied the contents to the display window via WritePixelArray. When WritePixelArray is used, a conversion between modes is made; I made this conversion based on other drivers, so there should not be a difference here. Now it tries to get the pixel format via GetCyberIDAttr(CYBRIDATTR_PIXFMT, modeID), so no format conversions should be necessary. The conversion from PIXFMT_BGR15 to 24 bit should not be that slow anyway, and WritePixelArray is now accelerated, so it should be very fast.
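
A hedged sketch of the cybergraphics.library calls mentioned above; the helper function and its variables are illustrative:

#include <cybergraphx/cybergraphics.h>
#include <intuition/intuition.h>
#include <proto/cybergraphics.h>

/* Query the native pixel format of a mode, so the emulation can render
   directly in that format and skip any per-pixel conversion on the blit. */
VOID BlitNative(struct Window *win, ULONG modeID,
                APTR src, UWORD srcmod, UWORD w, UWORD h)
{
    ULONG pixfmt = GetCyberIDAttr(CYBRIDATTR_PIXFMT, modeID);

    /* ... render into src using pixfmt ... */

    /* RECTFMT_RAW tells CGX the source already matches the destination
       bitmap's format, so no conversion is needed. */
    WritePixelArray(src, 0, 0, srcmod, win->RPort,
                    win->BorderLeft, win->BorderTop, w, h, RECTFMT_RAW);
}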

OpenGL

When the 3D driver wants to blit the render buffer onto a BitMap, it needs to get hold of the nouveau_bo behind that BitMap and do a GPU copy from its render buffer onto this nouveau_bo object. The render buffer is always in the same pixel format as the BitMap, because it is created based on a BitMap passed in when creating the GL context.

I think the Amiga 3D API is Warp3D, but I don't know anything more. OpenGL in this context is an alien API.

Yes, mesa.library is AROS-specific. I have never seen or used the libraries you mentioned, but I think it is possible that the AROSMesa API is somewhat similar to the StormMESA API. I based the AROSMesa API on Kalamatee's work, which seemed to be based on StormMESA.

From the client's perspective, using mesa.library is quite simple:

  • Create a window
  • Call AROSMesaCreateContext, passing it the window and some other parameters -> you get a rendering context in return
  • Call AROSMesaMakeCurrent, passing it the context, so that AROSMesa knows what to render on
  • Render some stuff using glXXX functions
  • Call AROSMesaSwapBuffers to have the content of the render buffer painted onto your window
  • Loop back to the glXXX functions

If you are looking for an example, check /tests/mesa/mesasimplerendering.c and examine the AROSMesaCreateContext() code.
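
A minimal sketch of that flow, loosely modeled on the test above. The AMA_* tag names and the varargs AROSMesaCreateContextTags() call follow the StormMesa-style API and are assumptions here; check mesasimplerendering.c for the exact names:

#include <GL/arosmesa.h>
#include <GL/gl.h>

AROSMesaContext ctx = AROSMesaCreateContextTags(
    AMA_Window, (IPTR)win,            /* the window created in step 1 */
    AMA_Left,   win->BorderLeft,
    AMA_Top,    win->BorderTop,
    AMA_Width,  width,
    AMA_Height, height,
    TAG_DONE);

AROSMesaMakeCurrent(ctx);             /* AROSMesa now knows what to render on */

glClearColor(0.0f, 0.0f, 0.0f, 1.0f); /* render some stuff with glXXX calls */
glClear(GL_COLOR_BUFFER_BIT);

AROSMesaSwapBuffers(ctx);             /* paint the render buffer onto the window */
/* ... loop back to the glXXX calls ... */
AROSMesaDestroyContext(ctx);          /* clean up when done */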

This function is in fact responsible for setting up a "pipe screen" - the entity that is actually used for rendering. A pipe screen is managed using gallium.library, which in turn works with OOP objects.

Wondering what is done in SDL_Init() and SDL_Quit(); isn't a per-opener base needed anyway? Hmm. SDL_VideoInit manipulates the global variable "current_video", but MorphOS's PowerSDL seems to do the same thing.

Does SDL have a context that you pass to each function, or does it rely on the context being "global" (like GL does)? If it is the latter case, you need to check whether the context can be per opener (shared by more than one task) or whether it really needs to be per task.

It's the latter. I wonder if those problems are the reason why MorphOS's PowerSDL spawns a ThreadServer task.

Alternatively you can use the approach I took for Mesa. Mesa always accesses the global context via a macro, and in the case of AROS this macro actually uses a "library-side task-local storage" of my own very simple implementation (see AROS/workbench/libs/mesa/src/aros/tls.[ch]). This way there is no need to make any GL API changes.
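
A very small sketch of such a library-side task-local store, keyed on FindTask(NULL). The names (TLSNode, InsertIntoTLS, GetFromTLS) are illustrative; see tls.[ch] for the real implementation:

#include <exec/memory.h>
#include <exec/semaphores.h>
#include <exec/tasks.h>
#include <proto/exec.h>

struct TLSNode
{
    struct TLSNode *next;
    struct Task    *task;    /* owning task */
    APTR            value;   /* the per-task GL context */
};

static struct TLSNode *tlslist;
static struct SignalSemaphore tlssema; /* InitSemaphore() at library init */

VOID InsertIntoTLS(APTR value)
{
    struct TLSNode *n = AllocMem(sizeof(*n), MEMF_CLEAR);
    if (!n) return;
    n->task  = FindTask(NULL);
    n->value = value;
    ObtainSemaphore(&tlssema);
    n->next  = tlslist;
    tlslist  = n;
    ReleaseSemaphore(&tlssema);
}

APTR GetFromTLS(VOID)
{
    struct Task *me = FindTask(NULL);
    struct TLSNode *n;
    APTR value = NULL;
    ObtainSemaphore(&tlssema);
    for (n = tlslist; n; n = n->next)
        if (n->task == me) { value = n->value; break; }
    ReleaseSemaphore(&tlssema);
    return value;
}

The GL-facing macro can then expand to a GetFromTLS() call, so client code keeps using the unchanged global-context GL API.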

Backends

Currently gallium.library knows only two backends: hidd.gallium.nouveau and hidd.gallium.softpipe (a software implementation).

This list of backends is hardcoded.

Here I see two design flaws:

  • Hardcoding the list of backends. Adding a third backend (for ATI, for example) will require modifying gallium.library.
  • A backend object (representing a pipe screen) is created with no arguments. It is not associated with any display driver object.

Card Info

The Nouveau driver is designed in such a way that it stores card information in class static data (automatically ruling out the use of more than one card). This is a serious flaw.

In order to overcome these problems I would suggest doing something similar to bitmap management: add a method to the graphics driver class, something like moHidd_Gfx_NewPipeScreen. It would create an object of a hidd.gallium subclass; the actual subclass would depend on the driver on which the method is called.

If this method returns NULL, it means that the driver doesn't support hardware-accelerated 3D. In this case gallium.library would use hidd.gallium.softpipe, as it does currently.
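
A sketch of that fallback logic, assuming the proposed method exists and is invoked through a hypothetical HIDD_Gfx_NewPipeScreen() stub; gfxhidd stands for the display driver object:

/* Ask the display driver for a hardware pipe screen object */
OOP_Object *pipescreen = HIDD_Gfx_NewPipeScreen(gfxhidd);

if (pipescreen == NULL)
{
    /* Driver offers no hardware 3D: fall back to the software
       rasterizer backend, as gallium.library does today. */
    pipescreen = OOP_NewObject(NULL, "hidd.gallium.softpipe", NULL);
}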

gallium.library/CreatePipeScreen() then needs an argument, something like Screen *, or perhaps better ViewPort *. It will pick up the display driver object from the ViewPort's bitmap:

OOP_Object *drv;
/* The HIDD bitmap object is stashed in Planes[0] of the struct BitMap */
OOP_Object *bm = (OOP_Object *)vp->RasInfo->BitMap->Planes[0];

OOP_GetAttr(bm, aHidd_BitMap_GfxHidd, (IPTR *)&drv);

I don't think moHidd_Gfx_NewPipeScreen is the correct method though; it should be more like moHidd_Gfx_GalliumHiddObject. I'm also reluctant to make this change now, as I don't believe I know all of the use cases of gallium.library. Remember that mesa.library is just one of the possible clients, and you don't really need a Screen, a ViewPort, or a BitMap to use the GPU. You only need these things if you actually want to blit your rendering onto an AROS screen. That's why the 3D driver doesn't have any association with 2D (as is the case now).

Here comes an answer about implementing MM_Query3DSupport in monitorclass: we need to add moHidd_Gfx_QueryHardware3D with a pixelformat object as its argument. The driver will then be able to examine the given pixelformat and tell whether it supports hardware 3D.

In the case of Nouveau, the implementation of moHidd_Gfx_QueryHardware3D would be:

if ((card >= NV30) && (bitsperpixelofmode >= 16))
    return TRUE;
else
    return FALSE;

I assume that you will make the following implementation of MM_Query3DSupport:

if (moHidd_Gfx_QueryHardware3D(pixelformat))
    return MSQUERY3D_HWDRIVER;
else if (bitsperpixelofmode >= 16)
    return MSQUERY3D_SWDRIVER;
else
    return MSQUERY3D_NODRIVER;

I see softpipe.hidd works with any pixelformat, so we can safely assume that software 3D works with anything.

With anything that is >= 16 bit, that is - Mesa does not support 256-color rendering.

What do you say to this? If it's okay, I'll add these methods to the specification.

You can add moHidd_Gfx_QueryHardware3D, but I need time to think over the first method you mentioned. Also, the first method is not needed right now for your work on monitorclass.

Multiple cards

As with any hardware graphics driver at this point, Nouveau can only work with one card. It's an "old style" driver :) Actually, CreatePipeScreen takes struct TagItem * tags as a parameter. This is done on purpose, so that in the "future" what you describe becomes possible by adding new tags and their handling. I wanted to do something very similar to what you described - CreatePipeScreen would require passing a BitMap to create a proper pipe screen - I just did not get around to implementing it. A note also: this would not magically make Nouveau support more than one card; rather it is a change enabling future support.
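
A sketch of how such a tag could look. Only the struct TagItem * parameter comes from the text; the CPS_BitMap tag name, the bitmap variable, and the return type are assumptions:

#include <utility/tagitem.h>

struct TagItem cpstags[] =
{
    { CPS_BitMap, (IPTR)bitmap },  /* hypothetical: derive the card/driver
                                      from this bitmap's display driver */
    { TAG_DONE,   0            }
};

struct pipe_screen *screen = CreatePipeScreen(cpstags);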

The semaphore protection is done at the caller level - for example in CopyBox or FillRect. I also added explicit notices to the code:

AROS/workbench/hidds/hidd.nouveau/nouveau.conf
AROS/workbench/hidds/hidd.nouveau/nouveau_intern.h
AROS/workbench/hidds/hidd.nouveau/nouveaubitmapclass.c
AROS/workbench/hidds/hidd.nouveau/nouveauclass.c
AROS/workbench/hidds/hidd.nouveau/nouveaugalliumclass.c
AROS/workbench/hidds/hidd.nouveau/nv04_accel.c
AROS/workbench/hidds/hidd.nouveau/nv50_accel.c

/* NOTE: Assumes lock on bitmap is already made */
/* NOTE: Assumes buffer is not mapped */
BOOL HIDDNouveauNV04FillSolidRect(struct CardData * carddata,
    struct HIDDNouveauBitMapData * bmdata,
    ULONG minX, ULONG minY, ULONG maxX, ULONG maxY,
    ULONG drawmode, ULONG color)

A bitmap lock is not enough if things like carddata->chan are global (shared among bitmaps). One gfx function to bitmap A may happen at the same time as a gfx function to bitmap B.

The drm layer itself takes care of synchronizing access to the chan object. Also, command buffers are allocated and executed per caller. The mapping/unmapping is about getting a VRAM address for a buffer: a buffer is allocated somewhere in VRAM, but it can be moved by the drm layer to other locations when needed. In order to get an "access pointer" to the buffer, it needs to be mapped (nouveau_bo_map). While a buffer is mapped it cannot be moved, but accelerated functions cannot work on a mapped buffer either - thus for "pointer" access I need to map the buffer, while for accelerated access I need to unmap it.
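
A sketch of that map/unmap rule, assuming the old-style libdrm_nouveau API (nouveau_bo_map()/nouveau_bo_unmap() with NOUVEAU_BO_* access flags); exact signatures may differ in other versions:

/* CPU ("pointer") access: map the buffer, which also pins it in place */
if (nouveau_bo_map(bmdata->bo, NOUVEAU_BO_RD | NOUVEAU_BO_WR) == 0)
{
    UBYTE *pixels = (UBYTE *)bmdata->bo->map; /* valid only while mapped */

    /* ... read or write pixels directly ... */

    nouveau_bo_unmap(bmdata->bo);  /* unmap again: accelerated (GPU)
                                      functions cannot touch mapped buffers */
}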

In other places like Linux/X11 things are different, because the X11 rendering functions are executed by only a single task (the X server). Here, however, moving Wanderer windows while, for example, a movie plays in MPlayer or web pages reload in OWB would be causing command buffer corruption and a GPU hang, or corrupted graphics at best.

Note that most windows are simple refresh windows. Those don't have any offscreen bitmaps: rendering into visible areas of these windows goes directly into the screen bitmap, while rendering into hidden areas of these windows is ignored.

So most of the rendering goes into a single bitmap: the screen bitmap.

OWB probably mostly uses pixel buffers in RAM, but no bitmaps.

You would need a test program that does some RectFill()ing (or uses another function which the driver accelerates) in a smart refresh window; see the sketch below. Then run it multiple times and make some of the windows fully or partly hidden (rendering into visible parts of smart refresh windows goes directly into the screen bitmap, too; only rendering into hidden parts goes into offscreen bitmaps). Shell (CON:) windows are smart refresh windows at the moment (in theory, with charmap support, it would be better if they were simple refresh). Some things in layers are pretty unoptimized (causing more back and forth blitting than really necessary), though. Maybe you mean windows/layers with superbitmaps? Those are certainly broken, but also pretty useless.
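
A hedged sketch of such a test program; the window geometry and loop count are arbitrary, while the Intuition/graphics.library calls are standard:

#include <intuition/intuition.h>
#include <proto/graphics.h>
#include <proto/intuition.h>

struct Window *win = OpenWindowTags(NULL,
    WA_Width,        320,
    WA_Height,       200,
    WA_SmartRefresh, TRUE,   /* hidden parts get offscreen (layer) bitmaps */
    WA_DragBar,      TRUE,   /* so the windows can be moved over each other */
    WA_Title,        (IPTR)"RectFill test",
    TAG_DONE);

if (win)
{
    LONG i;
    for (i = 0; i < 100000; i++)
    {
        SetAPen(win->RPort, 1 + (i % 3));
        /* RectFill() should hit the driver's accelerated fill path */
        RectFill(win->RPort, 20, 20, 300, 180);
    }
    CloseWindow(win);
}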

Does it really matter whether that BM_OBJ|COLMAP|COLMOD swapping is always done or not? Back in the era before that (unneeded) "NoFrameBuffer" stuff, at least, it did not: if HIDD_Gfx_Show returned the same bitmap it got as input, the swapping would not really swap anything, as it put back the same value that was already there.

It seems to matter somehow. Nouveau is a NoFrameBuffer driver, and if I returned the received bitmap, the third resolution change somehow caused problems (a crash or a non-refreshed display). I was not able to trace the cause; just not doing the swap makes things work correctly.

Does anybody know why the procedures from these tests are not integrated with our HIDD system? They give quite a nice boost, especially now with GART-based transfers for Nouveau.

The reasoning was that if graphics.hidd is in "ROM" (whether this is the case or not seems to change every couple of years), it could increase the size quite a bit if one added really optimized versions of all conversions. Even if the ones in patchrgbconv give a speedup, they are not really optimized: optimized would mean things like loop unrolling, using SSE or similar where possible, using asm, and so on. Furthermore, for some conversions where the destination color format has more bits per RGB channel than the source format, one may want an option to choose between a fast conversion and a more accurate one (example "fast": R5G5B5 -> R8G8B8 == RRRRR000GGGGG000BBBBB000, i.e. pixels get darker because of the 0 padding) == even more conversion functions and taken-up space.

By allowing the conversion functions to be patched at runtime, it's also possible to write, compile and check new routines faster (patchrgbconv has an option to benchmark and verify that routines work as expected), and it can also be done by outside coders.
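
An illustration of the fast-versus-accurate tradeoff described above, for expanding one 5-bit channel value (0..31) to 8 bits:

#include <stdint.h>

/* Fast: shift and zero-pad. Pure white (31) becomes 0xF8, not 0xFF,
   so the whole image gets slightly darker. */
static inline uint8_t Expand5To8Fast(uint8_t v)
{
    return v << 3;                 /* RRRRR000 */
}

/* Accurate: replicate the top bits into the padding; 31 -> 0xFF.
   Costs one extra shift and OR per channel. */
static inline uint8_t Expand5To8Accurate(uint8_t v)
{
    return (v << 3) | (v >> 2);    /* good approximation of v * 255 / 31 */
}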

Examples
