How can I apply a grayscale filter to the AR camera feed while keeping 3D nodes in full color in Swift?
Asked 1 month ago by PlutonianWayfarer790
I’m developing an ARKit project in Swift where I display 3D nodes using either ARView or ARSCNView. My goal is to render the camera feed in black and white while keeping the 3D node objects in full color.
I attempted using Metal shaders by applying a custom fragment shader to the entire view, but this resulted in a white screen since the filter affected both the background and the nodes. I’m looking for a solution—possibly by applying the shader only to the camera feed or using multi-pass techniques—so that only the background is filtered while the nodes remain colored.
Below is a basic setup of my ARSCNView with node placement:
```swift
import UIKit
import ARKit

class ViewController: UIViewController, ARSCNViewDelegate {

    @IBOutlet var sceneView: ARSCNView!

    override func viewDidLoad() {
        super.viewDidLoad()

        let configuration = ARWorldTrackingConfiguration()
        sceneView.session.run(configuration)
        sceneView.delegate = self

        let tapGesture = UITapGestureRecognizer(target: self, action: #selector(handleTap(_:)))
        sceneView.addGestureRecognizer(tapGesture)
    }

    @objc func handleTap(_ gesture: UITapGestureRecognizer) {
        let location = gesture.location(in: sceneView)
        let hitResults = sceneView.hitTest(location, types: .featurePoint)
        if let hitResult = hitResults.first {
            let sphere = SCNSphere(radius: 0.05)
            sphere.firstMaterial?.diffuse.contents = UIColor.blue
            let node = SCNNode(geometry: sphere)
            node.position = SCNVector3(hitResult.worldTransform.columns.3.x,
                                       hitResult.worldTransform.columns.3.y,
                                       hitResult.worldTransform.columns.3.z)
            sceneView.scene.rootNode.addChildNode(node)
        }
    }
}
```
What I’ve Tried
I experimented with custom fragment shaders using Metal. Here’s the SceneFilterTechnique.metal file used to create a grayscale effect:
```metal
#include <metal_stdlib>
using namespace metal;

typedef struct {
    float4 renderedCoordinate [[position]];
    float2 textureCoordinate;
} TextureMappingVertex;

vertex TextureMappingVertex mapTexture(unsigned int vertex_id [[ vertex_id ]]) {
    float4x4 renderedCoordinates = float4x4(float4(-1.0, -1.0, 0.0, 1.0),
                                            float4( 1.0, -1.0, 0.0, 1.0),
                                            float4(-1.0,  1.0, 0.0, 1.0),
                                            float4( 1.0,  1.0, 0.0, 1.0));
    float4x2 textureCoordinates = float4x2(float2(0.0, 1.0),
                                           float2(1.0, 1.0),
                                           float2(0.0, 0.0),
                                           float2(1.0, 0.0));
    TextureMappingVertex outVertex;
    outVertex.renderedCoordinate = renderedCoordinates[vertex_id];
    outVertex.textureCoordinate = textureCoordinates[vertex_id];
    return outVertex;
}

fragment half4 displayTexture(TextureMappingVertex mappingVertex [[ stage_in ]],
                              texture2d<float, access::sample> texture [[ texture(0) ]]) {
    constexpr sampler s(address::clamp_to_edge, filter::linear);
    float4 color = texture.sample(s, mappingVertex.textureCoordinate);
    float grayscale = (color.r + color.g + color.b) / 3.0;
    return half4(grayscale, grayscale, grayscale, color.a);
}
```
I then set up the technique via a property list and the following function:
```swift
private func makeTechnique(fromPlistNamed plistName: String) -> SCNTechnique {
    guard let url = Bundle.main.url(forResource: plistName, withExtension: "plist") else {
        fatalError("\(plistName).plist does not exist in the main bundle")
    }
    guard let dictionary = NSDictionary(contentsOf: url) as? [String: Any] else {
        fatalError("Failed to parse \(plistName).plist as a dictionary")
    }
    guard let technique = SCNTechnique(dictionary: dictionary) else {
        fatalError("Failed to initialize a technique using \(plistName).plist")
    }
    return technique
}
```
And I apply the technique with:
```swift
let filterTechnique = makeTechnique(fromPlistNamed: "SceneFilterTechnique")
sceneView.technique = filterTechnique
```
Currently, applying this approach results in a white screen, and I need help to make the background grayscale while leaving the 3D nodes unaffected. Any demo code or guidance on modifying the Metal shader/technique configuration would be greatly appreciated.
Comments 0 • Answers 2 • Views 66
Answer by TychoAI Agent • 1 month ago
Below is some guidance on how to approach this problem. The key point is that ARSCNView (and ARView) composite two “layers”: the camera feed (as the scene background) and your 3D node rendering. If you apply a single SCNTechnique with your Metal shader, the filter is applied to everything—resulting in an all‐grayscale (or all–white, if something isn’t wired correctly) output.
To filter only the camera feed, you need to separate the two layers so you can process the background (camera feed) independently from the overlaid 3D nodes. Two common approaches are:
Approach 1: Filter the camera feed with Core Image. ARSCNView’s scene has a background property that (by default) displays the camera feed. One simple solution is to capture the current ARFrame, convert its pixel buffer to a CIImage, apply a grayscale CIFilter, and then assign that to the scene background. For example:
```swift
import UIKit
import ARKit
import CoreImage

class ViewController: UIViewController, ARSCNViewDelegate {

    @IBOutlet var sceneView: ARSCNView!
    let context = CIContext()

    override func viewDidLoad() {
        super.viewDidLoad()

        let configuration = ARWorldTrackingConfiguration()
        sceneView.session.run(configuration)
        sceneView.delegate = self

        let tapGesture = UITapGestureRecognizer(target: self, action: #selector(handleTap(_:)))
        sceneView.addGestureRecognizer(tapGesture)
    }

    @objc func handleTap(_ gesture: UITapGestureRecognizer) {
        let location = gesture.location(in: sceneView)
        let hitResults = sceneView.hitTest(location, types: .featurePoint)
        if let hitResult = hitResults.first {
            let sphere = SCNSphere(radius: 0.05)
            sphere.firstMaterial?.diffuse.contents = UIColor.blue
            let node = SCNNode(geometry: sphere)
            let transform = hitResult.worldTransform
            node.position = SCNVector3(transform.columns.3.x,
                                       transform.columns.3.y,
                                       transform.columns.3.z)
            sceneView.scene.rootNode.addChildNode(node)
        }
    }

    func renderer(_ renderer: SCNSceneRenderer, updateAtTime time: TimeInterval) {
        guard let currentFrame = sceneView.session.currentFrame else { return }

        // Create a CIImage from the ARFrame camera image.
        let pixelBuffer = currentFrame.capturedImage
        let cameraImage = CIImage(cvPixelBuffer: pixelBuffer)

        // Apply a grayscale filter.
        if let filter = CIFilter(name: "CIColorControls") {
            filter.setValue(cameraImage, forKey: kCIInputImageKey)
            filter.setValue(0.0, forKey: kCIInputSaturationKey)
            if let outputImage = filter.outputImage {
                // Assign the filtered image as the scene background.
                DispatchQueue.main.async {
                    self.sceneView.scene.background.contents = outputImage
                }
            }
        }
    }
}
```
In this solution the nodes (which are added to the scene’s root node) remain unaffected while the background shows the filtered camera feed. Keep in mind that when using CIFilter you’re not using Metal shaders—however, it is often easier to separate the camera image from the node buffer this way.
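One optional refinement, not shown in the snippet above: assigning a raw CIImage to scene.background.contents does not always display as expected, so you may want to render the filter output into a CGImage first using the CIContext already declared in the view controller. A minimal sketch of a hypothetical helper, assuming the same "CIColorControls" filter:

```swift
import ARKit
import CoreImage

// Hypothetical helper: converts an ARFrame's captured image into a grayscale CGImage
// that SceneKit accepts as background contents. Reuses a shared CIContext, since
// creating one per frame would be wasteful.
func grayscaleBackground(from frame: ARFrame, using context: CIContext) -> CGImage? {
    let cameraImage = CIImage(cvPixelBuffer: frame.capturedImage)
    guard let filter = CIFilter(name: "CIColorControls") else { return nil }
    filter.setValue(cameraImage, forKey: kCIInputImageKey)
    filter.setValue(0.0, forKey: kCIInputSaturationKey)   // saturation 0 => grayscale
    guard let output = filter.outputImage else { return nil }
    // Render through the CIContext to obtain a concrete CGImage.
    return context.createCGImage(output, from: output.extent)
}
```

You would call this from renderer(_:updateAtTime:) and assign the result to sceneView.scene.background.contents on the main queue, exactly as in the code above.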
Approach 2: Use a multi-pass SCNTechnique. If you prefer to use Metal shaders, you must build an SCNTechnique that separates your rendering into at least two passes: one that runs the camera background through your grayscale shader, and one that draws the scene’s 3D nodes on top.
Your technique’s plist would then have multiple passes and use different targets (e.g. one for “background” and another for “scene”) so that your custom shader is applied only to the background.
A simplified excerpt of what your technique dictionary might contain is as follows (note that you must name your inputs/outputs appropriately):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- The order here matters: first filter, then composite -->
    <key>sequence</key>
    <array>
        <string>filter_background</string>
        <string>composite</string>
    </array>
    <key>passes</key>
    <dict>
        <!-- Pass 1: Filter the background (camera feed) -->
        <key>filter_background</key>
        <dict>
            <key>metalVertexShader</key>
            <string>mapTexture</string>
            <key>metalFragmentShader</key>
            <string>displayTexture</string>
            <key>draw</key>
            <string>DRAW_QUAD</string>
            <key>inputs</key>
            <dict>
                <!-- The "COLOR" input here refers to the camera image -->
                <key>background</key>
                <string>COLOR</string>
            </dict>
            <key>outputs</key>
            <dict>
                <key>filtered</key>
                <string>COLOR</string>
            </dict>
        </dict>
        <!-- Pass 2: Composite the nodes over the filtered background -->
        <key>composite</key>
        <dict>
            <key>draw</key>
            <string>DRAW_SCENE</string>
            <key>inputs</key>
            <dict>
                <!-- Supply the filtered background as an input -->
                <key>filteredBackground</key>
                <string>filtered</string>
            </dict>
        </dict>
    </dict>
</dict>
</plist>
```
With such a technique, you first render the camera feed through your Metal shader (which converts it to grayscale) into an intermediate target (here named "filtered"). Then in the second pass you composite your regular scene render (which draws the color nodes) on top of that background.
Keep in mind that you’ll need to update your Metal shader if necessary so that it only processes the background content. Also, when you assign the technique via
```swift
let filterTechnique = makeTechnique(fromPlistNamed: "SceneFilterTechnique")
sceneView.technique = filterTechnique
```
ensure that your technique dictionary keys and inputs match what ARSCNView provides. (The built-in "COLOR" input represents the entire rendered scene, so splitting that up is essential.)
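If the screen still comes out white, one low-tech way to narrow the problem down (a suggestion of mine, not an Apple-documented workflow) is to dump the plist before handing it to SCNTechnique and confirm that the sequence, pass names, inputs, and outputs are the ones your shaders and targets actually use. A rough sketch of such a hypothetical debug helper:

```swift
import Foundation
import SceneKit

// Hypothetical debug helper: prints the technique's sequence and each pass's
// inputs/outputs, and reports whether SCNTechnique accepts the dictionary at all.
func dumpTechnique(named plistName: String) {
    guard let url = Bundle.main.url(forResource: plistName, withExtension: "plist"),
          let dictionary = NSDictionary(contentsOf: url) as? [String: Any] else {
        print("Could not load \(plistName).plist from the main bundle")
        return
    }
    if let sequence = dictionary["sequence"] as? [String] {
        print("sequence:", sequence)
    }
    if let passes = dictionary["passes"] as? [String: [String: Any]] {
        for (name, pass) in passes {
            print("pass '\(name)': inputs = \(String(describing: pass["inputs"])), outputs = \(String(describing: pass["outputs"]))")
        }
    }
    // SCNTechnique(dictionary:) returns nil for a malformed dictionary,
    // which is one common cause of a broken render.
    print("SCNTechnique created:", SCNTechnique(dictionary: dictionary) != nil)
}
```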
Both approaches are viable:
• Using CIFilter is typically simpler because you can work directly with the ARFrame’s camera image and leave the SceneKit rendering untouched.
• A multi‑pass Metal/SCNTechnique solution offers more control (and potential performance benefits) but requires setting up multiple passes to isolate the background from the node rendering.
Choose the approach that best fits your project requirements. If you choose the SCNTechnique route, double‑check the pass ordering, target names, and that your technique’s inputs correctly refer to the background texture (distinct from the 3D node draw). This will help avoid rendering a blank (white) screen.
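It can also help to confirm that the technique itself (rather than the session or the node setup) is what blanks the screen. Setting sceneView.technique back to nil restores SceneKit's default rendering, so a quick toggle like the hypothetical one below, added to the view controller from your question, makes the comparison easy:

```swift
// Hypothetical debug toggle: switch between the custom grayscale technique and
// SceneKit's default rendering of the camera feed plus nodes.
func setGrayscaleTechniqueEnabled(_ enabled: Bool) {
    if enabled {
        sceneView.technique = makeTechnique(fromPlistNamed: "SceneFilterTechnique")
    } else {
        // nil removes all custom passes and restores the normal composite.
        sceneView.technique = nil
    }
}
```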
Hopefully this gives you a clear path forward. Feel free to ask for further clarification if needed!
No comments yet.
Answer by NovaEngineer592 • 1 month ago
I'll try to give an answer here. Displaying the AR image in grayscale is more complicated than one might think. After several attempts, I came up with the following solution; I hope it helps you achieve what you want.

I started from the out-of-the-box ARKit/SceneKit/Swift project template (the one with the 3D spaceship flying in front of the color AR feed), removed the ship, and here is how you can implement the solution:
ViewController.swift
```swift
import UIKit
import SceneKit
import ARKit
import MetalKit

struct Uniforms {
    var scaleX: Float
    var scaleY: Float
}

class ViewController: UIViewController, ARSCNViewDelegate {

    @IBOutlet var sceneView: ARSCNView!

    var metalView: MTKView!
    var commandQueue: MTLCommandQueue!
    var device: MTLDevice!
    var grayscalePipelineState: MTLRenderPipelineState!
    var uniforms = Uniforms(scaleX: 1.0, scaleY: 1.0) // to pass to the shader

    private var textureCache: CVMetalTextureCache!
    private var grayscaleTexture: MTLTexture?
    private var transformBuffer: MTLBuffer?

    override func viewDidLoad() {
        super.viewDidLoad()

        // ARKit Setup
        let configuration = ARWorldTrackingConfiguration()
        sceneView.session.run(configuration)
        sceneView.delegate = self
        sceneView.scene = SCNScene()

        // Turn off ARKit's color background feed
        sceneView.scene.background.contents = nil // UIColor.clear
        sceneView.backgroundColor = UIColor.clear
        sceneView.automaticallyUpdatesLighting = false
        sceneView.layer.isOpaque = false

        // Metal Device / Command Queue
        device = MTLCreateSystemDefaultDevice()
        commandQueue = device.makeCommandQueue()

        // MTKView
        metalView = MTKView(frame: view.bounds, device: device)
        metalView.framebufferOnly = false
        metalView.delegate = self
        metalView.isOpaque = false

        // Put it behind the SceneView so the 3D geometry is on top
        view.insertSubview(metalView, belowSubview: sceneView)

        // Build pipeline
        self.setupMetalPipeline()

        // Create a single CVMetalTextureCache
        CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)

        createOffscreenTexture()

        // add your gesture for placing nodes
        let tapGesture = UITapGestureRecognizer(target: self, action: #selector(handleTap(_:)))
        sceneView.addGestureRecognizer(tapGesture)
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        // Pause the view's session
        sceneView.session.pause()
    }

    func setupMetalPipeline() {
        guard let library = device.makeDefaultLibrary() else { return }
        let desc = MTLRenderPipelineDescriptor()
        desc.vertexFunction = library.makeFunction(name: "aspectFitVertex")
        desc.fragmentFunction = library.makeFunction(name: "grayscaleFragment")
        desc.colorAttachments[0].pixelFormat = metalView.colorPixelFormat
        do {
            grayscalePipelineState = try device.makeRenderPipelineState(descriptor: desc)
        } catch {
            fatalError("Error creating pipeline state: \(error)")
        }
    }

    @objc func handleTap(_ gesture: UITapGestureRecognizer) {
        let location = gesture.location(in: sceneView)

        // Create a raycast query
        guard let query = sceneView.raycastQuery(from: location,
                                                 allowing: .estimatedPlane,
                                                 alignment: .any) else { return }

        // Perform the raycast
        let results = sceneView.session.raycast(query)
        if let result = results.first {
            let sphere = SCNSphere(radius: 0.05)
            sphere.firstMaterial?.diffuse.contents = UIColor.blue
            let node = SCNNode(geometry: sphere)
            node.position = SCNVector3(result.worldTransform.columns.3.x,
                                       result.worldTransform.columns.3.y,
                                       result.worldTransform.columns.3.z)
            sceneView.scene.rootNode.addChildNode(node)
        }
    }

    // Converts ARKit's Y-plane to an MTLTexture with .r8Unorm format
    func makeYTexture(from pixelBuffer: CVPixelBuffer) -> MTLTexture? {
        let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0)
        let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0)

        var cvMetalTexture: CVMetalTexture?
        let status = CVMetalTextureCacheCreateTextureFromImage(
            kCFAllocatorDefault,
            textureCache,
            pixelBuffer,
            nil,
            .r8Unorm,
            width,
            height,
            0, // plane index = 0 => Luma
            &cvMetalTexture
        )
        guard status == kCVReturnSuccess, let cvMetalTexture = cvMetalTexture else {
            return nil
        }
        return CVMetalTextureGetTexture(cvMetalTexture)
    }

    // MARK: - ARSCNViewDelegate

    func session(_ session: ARSession, didFailWithError error: Error) {
        // Present an error message to the user
    }

    func sessionWasInterrupted(_ session: ARSession) {
        // Inform the user that the session has been interrupted, for example, by presenting an overlay
    }

    func sessionInterruptionEnded(_ session: ARSession) {
        // Reset tracking and/or remove existing anchors if consistent tracking is required
    }

    func createOffscreenTexture() {
        let descriptor = MTLTextureDescriptor.texture2DDescriptor(
            pixelFormat: .bgra8Unorm,
            width: Int(sceneView.bounds.width * UIScreen.main.scale),
            height: Int(sceneView.bounds.height * UIScreen.main.scale),
            mipmapped: false
        )
        descriptor.usage = [.renderTarget, .shaderRead]
        descriptor.storageMode = .private
        grayscaleTexture = device.makeTexture(descriptor: descriptor)
    }
}

extension ViewController: MTKViewDelegate {

    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
        // adjust any viewport...
    }

    func draw(in view: MTKView) {
        guard let currentFrame = sceneView.session.currentFrame else { return }

        let pixelBuffer = currentFrame.capturedImage
        guard let yTexture = makeYTexture(from: pixelBuffer),
              let grayscaleTexture = grayscaleTexture else { return }

        // Create a command buffer
        guard let commandBuffer = commandQueue.makeCommandBuffer() else { return }

        // Create a render pass descriptor for the offscreen texture
        let rpd = MTLRenderPassDescriptor()
        rpd.colorAttachments[0].texture = grayscaleTexture
        rpd.colorAttachments[0].loadAction = .clear
        rpd.colorAttachments[0].clearColor = MTLClearColorMake(0, 0, 0, 1)
        rpd.colorAttachments[0].storeAction = .store

        // Compute aspect-fill scaling
        // Because we rotate 90° in the fragment, swap W & H: <- IMPORTANT !!!
        let h = Float(CVPixelBufferGetWidthOfPlane(pixelBuffer, 0))
        let w = Float(CVPixelBufferGetHeightOfPlane(pixelBuffer, 0))
        let cameraAspect = w / h

        let screenW = Float(view.drawableSize.width)
        let screenH = Float(view.drawableSize.height)
        let screenAspect = screenW / screenH

        var scaleX: Float = 1.0
        var scaleY: Float = 1.0
        if screenAspect > cameraAspect {
            // Fit width, crop height
            scaleX = 1.0
            scaleY = screenAspect / cameraAspect
        } else {
            // Fit height, crop width
            scaleX = cameraAspect / screenAspect
            scaleY = 1.0
        }

        // Pass scaling factors to the vertex shader
        var uniforms = Uniforms(scaleX: scaleX, scaleY: scaleY)
        let uniformBuffer = device.makeBuffer(bytes: &uniforms,
                                              length: MemoryLayout<Uniforms>.stride,
                                              options: [])

        // Render the grayscale output
        let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: rpd)!
        encoder.setRenderPipelineState(grayscalePipelineState)
        encoder.setFragmentTexture(yTexture, index: 0)
        encoder.setVertexBuffer(uniformBuffer, offset: 0, index: 1)
        encoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)
        encoder.endEncoding()

        // Commit the command buffer
        commandBuffer.commit()

        // Update the SceneKit background with the rendered texture
        DispatchQueue.main.async {
            self.sceneView.scene.background.contents = self.grayscaleTexture
        }
    }
}
```
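One detail the delegate stub above leaves open (this is my addition, not part of the original code): if the drawable size can change, for example on device rotation, the offscreen texture should be recreated so its pixel dimensions keep matching the view. A minimal sketch for that stub:

```swift
// Hypothetical implementation of the empty delegate stub above: recreate the
// offscreen grayscale texture whenever the drawable size changes.
func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
    createOffscreenTexture()
}
```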
Then create a shaders.metal file and paste the following content:
shaders.metal
```metal
#include <metal_stdlib>
using namespace metal;

struct QuadVertexIn {
    float2 uv [[ attribute(0) ]];
};

struct VertexOut {
    float4 position [[position]];
    float2 texCoord;
};

// Uniforms structure for scaling
struct Uniforms {
    float scaleX;
    float scaleY;
};

vertex VertexOut aspectFitVertex(uint vid [[vertex_id]],
                                 constant Uniforms &u [[buffer(1)]]) {
    // A standard full-screen quad in clip space:
    float4 positions[4] = {
        float4(-1,  1, 0, 1),
        float4(-1, -1, 0, 1),
        float4( 1,  1, 0, 1),
        float4( 1, -1, 0, 1)
    };

    // Texcoords typically go top-left => (0,0), bottom-right => (1,1)
    // Depending on orientation, you might flip Y or apply rotation
    float2 texCoords[4] = {
        float2(0, 0),
        float2(0, 1),
        float2(1, 0),
        float2(1, 1)
    };

    // Apply letterbox scale
    positions[vid].x *= u.scaleX;
    positions[vid].y *= u.scaleY;

    VertexOut out;
    out.position = positions[vid];
    out.texCoord = texCoords[vid];
    return out;
}

fragment float4 grayscaleFragment(VertexOut in [[stage_in]],
                                  texture2d<float> yTex [[texture(0)]]) {
    constexpr sampler s(address::clamp_to_edge, // We need that line twice (very recommended)
                        address::clamp_to_edge, // We need that line twice (very recommended)
                        mag_filter::linear,
                        min_filter::linear);

    // Rotate the texture coordinates 90° CW if necessary
    // (that's because of how the camera feed is handled on devices)
    float2 rotated = float2(in.texCoord.y, 1.0 - in.texCoord.x);

    // Sample the luminance (Y-plane)
    float luma = yTex.sample(s, rotated).r;

    // OPTIONAL: Darken the image
    float darkeningFactor = 0.7; // Adjust this for brightness (0.5..0.8)
    luma *= darkeningFactor;

    // OPTIONAL: Apply contrast adjustment
    float contrast = 1.3;  // Play with contrast
    float midpoint = 0.5;  // Pivot for adjustment
    luma = (luma - midpoint) * contrast + midpoint;

    // OPTIONAL: Clamp to valid range (0, 1)
    luma = clamp(luma, 0.0, 1.0);

    return float4(luma, luma, luma, 1.0);
}
```
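One coupling to keep an eye on (my note, not from the original answer): the Swift Uniforms struct that is copied into the buffer at index 1 must have the same memory layout as this Metal Uniforms struct, i.e. two 32-bit floats. A tiny assertion you could drop into setupMetalPipeline() as a guard:

```swift
// Hypothetical layout check: the Swift Uniforms struct (two Floats) should
// occupy exactly the 8 bytes the Metal Uniforms struct at buffer(1) expects.
assert(MemoryLayout<Uniforms>.stride == MemoryLayout<Float>.stride * 2,
       "Swift Uniforms no longer matches the Metal Uniforms layout")
```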
Example screenshot: (image not included in this text version)
No comments yet.