Have you ever been on the phone with customer support and struggled to describe the issue, or had the support person fail to clearly describe the solution or not understand what/where you should be looking?
Most remote assistance today is done through audio or text-based chat. These solutions can be frustrating for users who may have a hard time describing their issues or understanding the new concepts and terminology that come with troubleshooting whatever they need help with.
Thankfully, technology has reached a point where this problem can be solved using video chat and augmented reality. In this guide, we’ll walk through all the steps you need to build an iOS app that leverages ARKit and video chat to create an interactive remote-assistance experience.
Prerequisites
- A basic to intermediate understanding of Swift and iOS
- Basic understanding of ARKit and Augmented Reality concepts
- Agora.io Developer Account
- Understanding of how to use CocoaPods
- Hardware: a Mac with Xcode and 2 iOS devices
- iPhone: 6S or newer
- iPad: 5th Generation or newer
Please Note: While this guide doesn’t require advanced Swift/iOS knowledge, certain basic Swift/ARKit concepts won’t be explained along the way.
Overview
The app we are going to build is meant to be used by two users who are in separate physical locations. One user will input a channel name and CREATE the channel. This will launch a back-facing AR-enabled camera. The second user will input the same channel name as the first user and JOIN the channel.
Once both users are in the channel, the user that created the channel will broadcast their rear camera into the channel. The second user has the ability to draw on their local screen, and have the touch input displayed in augmented reality in the first user’s world.
Let’s take a moment to review all the steps that we’ll be going through:
- Download and build starter project
- Project structure overview
- Add video chat functionality
- Capture and normalize touch data
- Add data transmission
- Display touch data in augmented reality
- Add “Undo” functionality
Getting started with the starter project
I have created a starter project for this tutorial that includes the initial UI elements and buttons, including the bare-bones AR and remote user views.
Let’s start by downloading the starter project repo. Once all the files have finished downloading, open a Terminal window in the project’s directory and run pod install to install all the dependencies. Once the dependencies have finished installing, open AR Remote Support.xcworkspace in Xcode.
Once the project is open in Xcode, let’s build and run the project using the iOS simulator. The project should build and launch without issue.
build the starter project before starting development
Add a channel name, then click the Join and Create buttons to preview the UIs that we will be working with.
Project Structure Overview
Before we start coding, let’s walk through the starter project files to understand how everything is setup. We’ll start with the dependencies, then go over the required files, and lastly we’ll take a look at the custom classes that we’ll be working with.
Within the Podfile, there are two third-party dependencies: Agora.io’s Real-Time Communications SDK, which facilitates building the video chat functionality, and ARVideoKit’s open-source renderer, which lets us use the rendered AR view as a video source. The reason we need an off-screen renderer is that ARKit obfuscates the rendered view, so we need a framework to handle the task of exposing the rendered pixel buffer.
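For reference, the Podfile in the starter project looks roughly like the sketch below; the target name and any version pins in the actual repo may differ.
# sketch of the starter project's Podfile -- target name and versions are assumptions
platform :ios, '12.0'

target 'AR Remote Support' do
  use_frameworks!

  pod 'AgoraRtcEngine_iOS' # Agora.io Real-Time Communications SDK
  pod 'ARVideoKit'         # off-screen renderer for the AR scene
end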
As we move into the project files, AppDelegate.swift has the standard setup with one minor update: the ARVideoKit library is imported, and there’s an added delegate function for UIInterfaceOrientationMask that returns ARVideoKit’s orientation. Within the info.plist, the required permissions for Camera and Microphone access are included. These permissions are required by ARKit, the Agora.io Video SDK, and ARVideoKit.
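That orientation delegate looks something like the snippet below (ARVideoKit exposes its recommended mask through ViewAR.orientation); check the starter project for the exact implementation.
import ARVideoKit

// in AppDelegate.swift -- return ARVideoKit's recommended orientation mask
func application(_ application: UIApplication, supportedInterfaceOrientationsFor window: UIWindow?) -> UIInterfaceOrientationMask {
    return ViewAR.orientation
}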
Before we jump into the custom ViewControllers, let’s take a look at some of the supporting files/classes that we’ll be using. GetValueFromFile.swift allows us to store any sensitive API credentials in keys.plist so we don’t have to hard-code them into the project. SCNVector3+Extensions.swift contains some extensions and functions for the SCNVector3 class that make mathematical calculations simpler. The last helper file is ARVideoSource.swift, which contains the implementation of the AgoraVideoSourceProtocol that we’ll use to pass our rendered AR scene as the video source for one of the users in the video chat.
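To give a sense of how the credential lookup works, here is a minimal sketch of the kind of helper GetValueFromFile.swift provides; the actual implementation in the starter project may differ, this just illustrates reading a string out of a bundled plist.
import Foundation

// minimal sketch -- reads a String value for `key` from <fileName>.plist in the app bundle
func getValue(withKey key: String, within fileName: String) -> String? {
    guard let path = Bundle.main.path(forResource: fileName, ofType: "plist"),
          let contents = NSDictionary(contentsOfFile: path) else { return nil }
    return contents[key] as? String
}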
ViewController.swift is a simple entry point for the app. It allows users to input a channel name and then choose whether they want to CREATE the channel (and receive remote assistance) or JOIN the channel (and provide remote assistance).
ARSupportBroadcasterViewController.swift handles the functionality for the user who is receiving remote assistance. This ViewController broadcasts the rendered AR scene to the other user, so it implements the ARSCNViewDelegate, ARSessionDelegate, RenderARDelegate, and AgoraRtcEngineDelegate protocols.
ARSupportAudienceViewController.swift handles the functionality for the user who is providing remote assistance. This ViewController broadcasts the user’s front-facing camera and allows the user to draw on their screen and have the touch information displayed in the remote user’s augmented reality scene, so it implements the UIGestureRecognizerDelegate and AgoraRtcEngineDelegate protocols.
For simplicity, let’s refer to ARSupportBroadcasterViewController as BroadcasterVC and ARSupportAudienceViewController as AudienceVC.
Adding Video Chat Functionality
We’ll start by adding our AppID into the keys.plist file. Take a moment to log into your Agora Developer Account, select your project, copy your App ID, and paste the hex string into the value for AppID within keys.plist.
As an example, your keys.plist would look something like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>AppID</key>
<string>69d5fd34f*******************5a1d</string>
</dict>
</plist>
Now that we have our AppID set, we will use it to initialize the Agora engine within the loadView function for both BroadcasterVC and AudienceVC.
There are slight differences in how we set up the video configurations for each class. In the BroadcasterVC we are using an external video source, so we can set up both the video configuration and the source within loadView.
override func loadView() {
    super.loadView()
    createUI() // init and add the UI elements to the view
    self.view.backgroundColor = UIColor.black // set the background color
    // Agora setup
    guard let appID = getValue(withKey: "AppID", within: "keys") else { return } // get the AppID from keys.plist
    let agoraKit = AgoraRtcEngineKit.sharedEngine(withAppId: appID, delegate: self) // - init engine
    agoraKit.setChannelProfile(.communication) // - set channel profile
    let videoConfig = AgoraVideoEncoderConfiguration(size: AgoraVideoDimension1280x720, frameRate: .fps60, bitrate: AgoraVideoBitrateStandard, orientationMode: .fixedPortrait)
    agoraKit.setVideoEncoderConfiguration(videoConfig) // - set video encoding configuration (dimensions, frame-rate, bitrate, orientation)
    agoraKit.enableVideo() // - enable video
    agoraKit.setVideoSource(self.arVideoSource) // - set the video source to the custom AR source
    agoraKit.enableExternalAudioSource(withSampleRate: 44100, channelsPerFrame: 1) // - enable external audio source (since video and audio come from separate sources)
    self.agoraKit = agoraKit // set a reference to the Agora engine
}
Within the AudienceVC we will init the engine and set the Channel Profile in loadView, but we will wait to configure the video settings until viewDidLoad.
override func loadView() {
    super.loadView()
    createUI() // init and add the UI elements to the view
    // TODO: setup touch gestures
    // Add Agora setup
    guard let appID = getValue(withKey: "AppID", within: "keys") else { return } // get the AppID from keys.plist
    self.agoraKit = AgoraRtcEngineKit.sharedEngine(withAppId: appID, delegate: self) // - init engine
    self.agoraKit.setChannelProfile(.communication) // - set channel profile
}
Note: We’ll add in the touch gestures functionality later on in this tutorial.
Let’s also set up the video configuration within the AudienceVC. Within viewDidLoad, call the setupLocalVideo function.
override func viewDidLoad() {
    super.viewDidLoad()
    ...
    // Agora implementation
    setupLocalVideo() // - set video configuration
    // - join the channel
    ...
}
Add the code below to the setupLocalVideo function.
func setupLocalVideo() {
    guard let localVideoView = self.localVideoView else { return } // get a reference to the localVideo UI element
    // enable the local video stream
    self.agoraKit.enableVideo()
    // Set video encoding configuration (dimensions, frame-rate, bitrate, orientation)
    let videoConfig = AgoraVideoEncoderConfiguration(size: AgoraVideoDimension360x360, frameRate: .fps15, bitrate: AgoraVideoBitrateStandard, orientationMode: .fixedPortrait)
    self.agoraKit.setVideoEncoderConfiguration(videoConfig)
    // Set up local video view
    let videoCanvas = AgoraRtcVideoCanvas()
    videoCanvas.uid = 0
    videoCanvas.view = localVideoView
    videoCanvas.renderMode = .hidden
    // Set the local video view.
    self.agoraKit.setupLocalVideo(videoCanvas)
    // stylin - round the corners for the view
    guard let videoView = localVideoView.subviews.first else { return }
    videoView.layer.cornerRadius = 25
}
Next we’ll join the channels from viewDidLoad. Both ViewControllers use the same function to join the channel. In both BroadcasterVC and AudienceVC, call the joinChannel function within viewDidLoad.
override func viewDidLoad() {
    super.viewDidLoad()
    ...
    joinChannel() // Agora - join the channel
}
Add the code below to the joinChannel function.
func joinChannel() {
    // Set audio route to speaker
    self.agoraKit.setDefaultAudioRouteToSpeakerphone(true)
    // get the token - returns nil if no value is set
    let token = getValue(withKey: "token", within: "keys")
    // Join the channel
    self.agoraKit.joinChannel(byToken: token, channelId: self.channelName, info: nil, uid: 0) { (channel, uid, elapsed) in
        if self.debug {
            print("Successfully joined: \(channel), with \(uid): \(elapsed) seconds ago")
        }
    }
    UIApplication.shared.isIdleTimerDisabled = true // Disable the idle timer
}
The joinChannel function will set the device to use the speakerphone for audio playback and join the channel set by ViewController.swift.
Note: This function will attempt to get the token value stored in keys.plist. This line is there in case you would like to use a temporary token from the Agora Console. For simplicity, I have chosen not to use token security, so we have not set the value. In this case the function will return nil, and the Agora engine will not use token-based security for this channel.
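If you do want to use a temporary token from the Agora Console, you would add a token entry alongside the AppID in keys.plist, something like the example below (the token string here is just a placeholder):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>AppID</key>
<string>69d5fd34f*******************5a1d</string>
<key>token</key>
<string>your-temporary-token-from-the-agora-console</string>
</dict>
</plist>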
Now that users can join a channel, we should add functionality to leave the channel. Similar to joinChannel, both ViewControllers use the same function to leave the channel. In both BroadcasterVC and AudienceVC, add the code below to the leaveChannel function.
func leaveChannel() {
    self.agoraKit.leaveChannel(nil) // leave channel and end chat
    self.sessionIsActive = false // session is no longer active
    UIApplication.shared.isIdleTimerDisabled = false // Enable idle timer
}
The leaveChannel function will get called in popView and viewWillDisappear, because we want to make sure we leave the channel whenever the user taps to exit the view or dismisses the app (backgrounded/exited).
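As a rough sketch of that wiring, assuming popView is the handler for the exit button and the view was presented modally (the starter project may differ slightly):
override func viewWillDisappear(_ animated: Bool) {
    super.viewWillDisappear(animated)
    leaveChannel() // make sure we leave the channel whenever the view goes away
}

@IBAction func popView() {
    leaveChannel() // leave the channel before dismissing the view
    self.dismiss(animated: true, completion: nil)
}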
The last "Video Chat" feature we need to implement is the toggleMic function, which gets called anytime the user taps the microphone button. Both BroadcasterVC and AudienceVC use the same function, so add the code below to the toggleMic function in both.
@IBAction func toggleMic() {
    guard let activeMicImg = UIImage(named: "mic") else { return }
    guard let disabledMicImg = UIImage(named: "mute") else { return }
    if self.micBtn.imageView?.image == activeMicImg {
        self.agoraKit.muteLocalAudioStream(true) // Disable Mic using Agora Engine
        self.micBtn.setImage(disabledMicImg, for: .normal)
        if debug {
            print("disable active mic")
        }
    } else {
        self.agoraKit.muteLocalAudioStream(false) // Enable Mic using Agora Engine
        self.micBtn.setImage(activeMicImg, for: .normal)
        if debug {
            print("enable mic")
        }
    }
}
Handling touch gestures
In our app, the AudienceVC will provide remote assistance by using their finger to draw on their screen. Within the AudienceVC we’ll need to capture and handle the user’s touches.
First we’ll want to capture the location whenever the user initially touches the screen and set that point as the starting point. As the user drags their finger across the screen, we’ll want to keep track of all of those points. We’ll add each point to the touchPoints array, so we need to ensure the array is empty with every new touch; I prefer to reset the array in touchesBegan to mitigate against instances where the user adds a second finger to the screen.
override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
    // get the initial touch event
    if self.sessionIsActive, let touch = touches.first {
        let position = touch.location(in: self.view)
        self.touchStart = position
        self.touchPoints = []
        if debug {
            print(position)
        }
    }
    // check if the color selection menu is visible
    if let colorSelectionBtn = self.colorSelectionBtn, colorSelectionBtn.alpha < 1 {
        toggleColorSelection() // make sure to hide the color menu
    }
}
Note: This example only supports drawing with a single finger. It is possible to support multi-touch drawing, but it would require some more effort to track the uniqueness of each touch event.
To handle the finger movement, let’s use a Pan Gesture. Within this gesture we’ll listen for the gesture start, change, and end states. Let’s start by registering the Pan Gesture.
func setupGestures() {
    // pan gesture
    let panGesture = UIPanGestureRecognizer(target: self, action: #selector(handlePan(_:)))
    panGesture.delegate = self
    self.view.addGestureRecognizer(panGesture)
}
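Make sure setupGestures() actually gets called. A natural place (if the starter project doesn’t already do it) is AudienceVC’s loadView, in place of the touch gestures TODO we left earlier:
// in AudienceVC's loadView, replacing the "// TODO: setup touch gestures" comment
setupGestures() // register the pan gesture recognizer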
Once the Pan Gesture is recognized, we’ll calculate the position of the touch within the view. The GestureRecognizer gives us the touch positions as values relative to the gesture’s initial touch, which means the translation reported when the gesture enters the .began state is (0,0). The self.touchStart value helps us calculate the x,y values relative to the view’s coordinate system.
@IBAction func handlePan(_ gestureRecognizer: UIPanGestureRecognizer) {
    // TODO: send touch started event
    // keep track of points captured during pan gesture
    if self.sessionIsActive && (gestureRecognizer.state == .began || gestureRecognizer.state == .changed) {
        let translation = gestureRecognizer.translation(in: self.view)
        // calculate touch movement relative to the superview
        guard let touchStart = self.touchStart else { return } // ignore accidental finger drags
        let pixelTranslation = CGPoint(x: touchStart.x + translation.x, y: touchStart.y + translation.y)
        // normalize the touch point to use view center as the reference point
        let translationFromCenter = CGPoint(x: pixelTranslation.x - (0.5 * self.view.frame.width), y: pixelTranslation.y - (0.5 * self.view.frame.height))
        self.touchPoints.append(pixelTranslation)
        // TODO: Send captured points
        DispatchQueue.main.async {
            // draw user touches to the DrawView
            guard let drawView = self.drawingView else { return }
            guard let lineColor: UIColor = self.lineColor else { return }
            let layer = CAShapeLayer()
            layer.path = UIBezierPath(roundedRect: CGRect(x: pixelTranslation.x, y: pixelTranslation.y, width: 25, height: 25), cornerRadius: 50).cgPath
            layer.fillColor = lineColor.cgColor
            drawView.layer.addSublayer(layer)
        }
        if debug {
            print(translationFromCenter)
            print(pixelTranslation)
        }
    }
    if gestureRecognizer.state == .ended {
        // TODO: send message to remote user that touches have ended
        // clear list of points
        if let touchPointsList = self.touchPoints {
            self.touchStart = nil // clear starting point
            if debug {
                print(touchPointsList)
            }
        }
    }
}
Once we’ve calculated pixelTranslation (the x,y values relative to the view’s coordinate system), we can use these values to draw the points to the screen and then to “normalize” the points relative to the screen’s center point.
I’ll discuss normalizing the touches in a moment, but first let’s go through drawing the touches to the screen. Since we are drawing to the screen, we’ll want to use the main thread, so within a Dispatch block we’ll use the pixelTranslation to draw the points into the DrawingView. For now don’t worry about removing the points, because we’ll handle that when we transmit them.
Before we can transmit the user’s touches, we need to "normalize" each point relative to the screen’s center. UIKit considers the upper left corner of the view to be (0,0), but within AR we don’t have any screen bounds, so in ARKit we’ll need to add the points relative to the ARCamera’s center point. To achieve this we calculate translationFromCenter by taking the pixelTranslation and subtracting half of the view’s width and height.
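For example, on a 390×844-point screen the view’s center is (195, 422), so a touch with a pixelTranslation of (300, 200) normalizes to a translationFromCenter of (105, -222): 105 points to the right of center and 222 points above it (remember that y grows downward in UIKit).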
Transmitting touches and colors
To add an interactive layer, we’ll use the DataStream provided as part of the Agora engine. Agora’s Video SDK allows us to create a data stream capable of sending up to 30 packets (of up to 1 KB each) per second. Since we will be sending small data messages, this will work well for us.
Let’s start by enabling the DataStream within the firstRemoteVideoDecoded callback. We’ll do this in both BroadcasterVC and AudienceVC.
func rtcEngine(_ engine: AgoraRtcEngineKit, firstRemoteVideoDecodedOfUid uid: UInt, size: CGSize, elapsed: Int) {
    // ...
    if self.remoteUser == uid {
        // ...
        // create the data stream
        self.streamIsEnabled = self.agoraKit.createDataStream(&self.dataStreamId, reliable: true, ordered: true)
        if self.debug {
            print("Data Stream initiated - STATUS: \(self.streamIsEnabled)")
        }
    }
}
If the data stream is enabled successfully, self.streamIsEnabled will have a value of 0. We’ll check this value before attempting to send any messages.
Now that the DataStream is enabled, we’ll start with AudienceVC. Let’s review what data we need to send: touch-start, touch-end, the points, and color. Starting with the touch events, we’ll update the PanGesture to send the appropriate messages.
Note: Agora’s Video SDK DataStream sends raw bytes, so we need to convert all messages to Strings and then use the data(using:) method to pass the raw data bytes.
@IBAction func handlePan(_ gestureRecognizer: UIPanGestureRecognizer) {
    if self.sessionIsActive && gestureRecognizer.state == .began && self.streamIsEnabled == 0 {
        // send message to remote user that touches have started
        self.agoraKit.sendStreamMessage(self.dataStreamId, data: "touch-start".data(using: String.Encoding.ascii)!)
    }
    if self.sessionIsActive && (gestureRecognizer.state == .began || gestureRecognizer.state == .changed) {
        let translation = gestureRecognizer.translation(in: self.view)
        // calculate touch movement relative to the superview
        guard let touchStart = self.touchStart else { return } // ignore accidental finger drags
        let pixelTranslation = CGPoint(x: touchStart.x + translation.x, y: touchStart.y + translation.y)
        // normalize the touch point to use view center as the reference point
        let translationFromCenter = CGPoint(x: pixelTranslation.x - (0.5 * self.view.frame.width), y: pixelTranslation.y - (0.5 * self.view.frame.height))
        self.touchPoints.append(pixelTranslation)
        if self.streamIsEnabled == 0 {
            // send data to remote user
            let pointToSend = CGPoint(x: translationFromCenter.x, y: translationFromCenter.y)
            self.dataPointsArray.append(pointToSend)
            if self.dataPointsArray.count == 10 {
                sendTouchPoints() // send touch data to remote user
                clearSubLayers() // remove touches drawn to the screen
            }
            if debug {
                print("streaming data: \(pointToSend)\n - STRING: \(self.dataPointsArray)\n - DATA: \(self.dataPointsArray.description.data(using: String.Encoding.ascii)!)")
            }
        }
        DispatchQueue.main.async {
            // draw user touches to the DrawView
            guard let drawView = self.drawingView else { return }
            guard let lineColor: UIColor = self.lineColor else { return }
            let layer = CAShapeLayer()
            layer.path = UIBezierPath(roundedRect: CGRect(x: pixelTranslation.x, y: pixelTranslation.y, width: 25, height: 25), cornerRadius: 50).cgPath
            layer.fillColor = lineColor.cgColor
            drawView.layer.addSublayer(layer)
        }
        if debug {
            print(translationFromCenter)
            print(pixelTranslation)
        }
    }
    if gestureRecognizer.state == .ended {
        // send message to remote user that touches have ended
        if self.streamIsEnabled == 0 {
            // transmit any left over points
            if self.dataPointsArray.count > 0 {
                sendTouchPoints() // send touch data to remote user
                clearSubLayers() // remove touches drawn to the screen
            }
            self.agoraKit.sendStreamMessage(self.dataStreamId, data: "touch-end".data(using: String.Encoding.ascii)!)
        }
        // clear list of points
        if let touchPointsList = self.touchPoints {
            self.touchStart = nil // clear starting point
            if debug {
                print(touchPointsList)
            }
        }
    }
}
ARKit runs at 60 fps, so sending the points individually would cause us to hit the 30-packet-per-second limit, resulting in point data not getting sent. Instead we’ll add the points to the dataPointsArray and transmit them every 10 points. Each touch-point is about 30–50 bytes as a string, so a batch of 10 points stays well under the 1 KB packet size, and at 60 fps we send at most 6 messages per second, keeping us well within the limits of the DataStream.
func sendTouchPoints() {
    let pointsAsString: String = self.dataPointsArray.description
    self.agoraKit.sendStreamMessage(self.dataStreamId, data: pointsAsString.data(using: String.Encoding.ascii)!)
    self.dataPointsArray = []
}
When sending the touch data, we can also clear the DrawingView. To keep it simple, we get the DrawingView’s sublayers, loop through them, and remove each one from its superlayer.
func clearSubLayers() {
    DispatchQueue.main.async {
        // loop through layers drawn from touches and remove them from the view
        guard let sublayers = self.drawingView.layer.sublayers else { return }
        for layer in sublayers {
            layer.isHidden = true
            layer.removeFromSuperlayer()
        }
    }
}
Lastly, we need to add support for changing the color of the lines. We’ll send cgColor.components to get the color value as a comma-delimited string, and we’ll prefix the message with color: so that we don’t confuse it with touch data.
@IBAction func setColor(_ sender: UIButton) {
    guard let colorSelectionBtn = self.colorSelectionBtn else { return }
    colorSelectionBtn.tintColor = sender.backgroundColor
    self.lineColor = colorSelectionBtn.tintColor
    toggleColorSelection()
    // send data message with color components
    if self.streamIsEnabled == 0 {
        guard let colorComponents = sender.backgroundColor?.cgColor.components else { return }
        self.agoraKit.sendStreamMessage(self.dataStreamId, data: "color: \(colorComponents)".data(using: String.Encoding.ascii)!)
        if debug {
            print("color: \(colorComponents)")
        }
    }
}
Now that we’re able to send data from the AudienceVC, let’s add the ability for BroadcasterVC to receive and decode the data. We’ll use the rtcEngine delegate’s receiveStreamMessage function to handle all data that is received from the DataStream.
func rtcEngine(_ engine: AgoraRtcEngineKit, receiveStreamMessageFromUid uid: UInt, streamId: Int, data: Data) {
    // successfully received message from user
    guard let dataAsString = String(bytes: data, encoding: String.Encoding.ascii) else { return }
    if debug {
        print("STREAMID: \(streamId)\n - DATA: \(data)\n - STRING: \(dataAsString)\n")
    }
    // check data message
    switch dataAsString {
    case var dataString where dataString.contains("color:"):
        if debug {
            print("color msg received\n - \(dataString)")
        }
        // remove the [ ] characters from the string
        if let closeBracketIndex = dataString.firstIndex(of: "]") {
            dataString.remove(at: closeBracketIndex)
            dataString = dataString.replacingOccurrences(of: "color: [", with: "")
        }
        // convert the string into an array -- using , as delimiter
        let colorComponentsStringArray = dataString.components(separatedBy: ", ")
        // safely convert the string values into numbers
        guard let redColor = NumberFormatter().number(from: colorComponentsStringArray[0]) else { return }
        guard let greenColor = NumberFormatter().number(from: colorComponentsStringArray[1]) else { return }
        guard let blueColor = NumberFormatter().number(from: colorComponentsStringArray[2]) else { return }
        guard let colorAlpha = NumberFormatter().number(from: colorComponentsStringArray[3]) else { return }
        // set line color to UIColor from remote user
        self.lineColor = UIColor(red: CGFloat(truncating: redColor), green: CGFloat(truncating: greenColor), blue: CGFloat(truncating: blueColor), alpha: CGFloat(truncating: colorAlpha))
    case "undo":
        break // TODO: add undo
    case "touch-start":
        // touch-starts
        print("touch-start msg received")
        // TODO: handle event
    case "touch-end":
        if debug {
            print("touch-end msg received")
        }
    default:
        if debug {
            print("touch points msg received")
        }
        // TODO: add points in ARSCN
    }
}
There are a few different cases that we need to account for, so we’ll use a Switch statement to check the message and handle it appropriately.
When we receive the message to change the color, we need to isolate the component values, so we need to remove any excess characters from the string. Then we can use the components to initialize the UIColor.
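For example, if the remote user picks a color whose components are [1.0, 0.2, 0.2, 1.0], the incoming message is the string color: [1.0, 0.2, 0.2, 1.0]; after removing the brackets and the color: prefix we’re left with 1.0, 0.2, 0.2, 1.0, which splits into the four component strings used to initialize the UIColor.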
In the next section we’ll go through handling the touch-start message and adding the touch points into the AR scene.
Display gestures in augmented reality
Upon receiving the message that a touch has started, we’ll want to add a new node to the scene and then parent all the touches to this node. We do this to group all the touch points and force them to always rotate to face the ARCamera.
case "touch-start":
    print("touch-start msg received")
    // add root node for points received
    guard let pointOfView = self.sceneView.pointOfView else { return }
    let transform = pointOfView.transform // transformation matrix
    let orientation = SCNVector3(-transform.m31, -transform.m32, -transform.m33) // camera rotation
    let location = SCNVector3(transform.m41, transform.m42, transform.m43) // location of camera frustum
    let currentPositionOfCamera = orientation + location // center of frustum in world space
    DispatchQueue.main.async {
        let touchRootNode: SCNNode = SCNNode() // create an empty node to serve as our root for the incoming points
        touchRootNode.position = currentPositionOfCamera // place the root node at the center of the camera's frustum
        touchRootNode.scale = SCNVector3(1.25, 1.25, 1.25) // touches projected in Z will appear smaller than expected - increase scale of root node to compensate
        guard let sceneView = self.sceneView else { return }
        sceneView.scene.rootNode.addChildNode(touchRootNode) // add the root node to the scene
        let constraint = SCNLookAtConstraint(target: self.sceneView.pointOfView) // force root node to always face the camera
        constraint.isGimbalLockEnabled = true // enable gimbal locking to avoid issues with rotations from LookAtConstraint
        touchRootNode.constraints = [constraint] // apply LookAtConstraint
        self.touchRoots.append(touchRootNode)
    }
Note: We need to impose the LookAt constraint to ensure the drawn points always face the camera (and therefore the user).
When we receive touch-points, we’ll need to decode the String into an Array of CGPoints that we can then append to the self.remotePoints array.
default:
    if debug {
        print("touch points msg received")
    }
    // convert data string into an array -- using given pattern as delimiter
    let arrayOfPoints = dataAsString.components(separatedBy: "), (")
    if debug {
        print("arrayOfPoints: \(arrayOfPoints)")
    }
    for pointString in arrayOfPoints {
        let pointArray: [String] = pointString.components(separatedBy: ", ")
        // make sure we have 2 points and convert them from String to number
        if pointArray.count == 2, let x = NumberFormatter().number(from: pointArray[0]), let y = NumberFormatter().number(from: pointArray[1]) {
            let remotePoint: CGPoint = CGPoint(x: CGFloat(truncating: x), y: CGFloat(truncating: y))
            self.remotePoints.append(remotePoint)
            if debug {
                print("POINT - \(pointString)")
                print("CGPOINT: \(remotePoint)")
            }
        }
    }
Within the session delegate’s didUpdate, we’ll check the self.remotePoints array. We’ll pop the first point from the list and render a single point per frame to create the effect that the line is being drawn. The nodes are parented to the root node that gets created upon receipt of the "touch-start" message.
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // if we have points - draw one point per frame
    if self.remotePoints.count > 0 {
        let remotePoint: CGPoint = self.remotePoints.removeFirst() // pop the first node every frame
        DispatchQueue.main.async {
            guard let touchRootNode = self.touchRoots.last else { return }
            let sphereNode: SCNNode = SCNNode(geometry: SCNSphere(radius: 0.015))
            // scale screen points down to scene units and flip the axes (UIKit's y points down,
            // and the root node faces back toward the camera)
            sphereNode.position = SCNVector3(-1 * Float(remotePoint.x / 1000), -1 * Float(remotePoint.y / 1000), 0)
            sphereNode.geometry?.firstMaterial?.diffuse.contents = self.lineColor
            touchRootNode.addChildNode(sphereNode) // add point to the active root
        }
    }
}
Add "Undo"
Now that we have the data transmission layer setup, we can quickly keep track of each touch gesture and undo it. We’ll start by sending the undo message from the AudienceVC to the BroadcasterVC. We’ll add the code below to the sendUndoMsg
function within our AudienceVC.
@IBAction func sendUndoMsg() {
    // if the data stream is enabled, send the undo message
    if self.streamIsEnabled == 0 {
        self.agoraKit.sendStreamMessage(self.dataStreamId, data: "undo".data(using: String.Encoding.ascii)!)
    }
}
Within the BroadcasterVC we’ll check for the undo message within the rtcEngine delegate’s receiveStreamMessage function. Since each set of touch points is parented to its own root node, with every undo message we’ll remove the last root node in the array from the scene.
case "undo":
    if !self.touchRoots.isEmpty {
        let latestTouchRoot: SCNNode = self.touchRoots.removeLast()
        latestTouchRoot.isHidden = true
        latestTouchRoot.removeFromParentNode()
    }
Build and Run
Now we are ready to build and run our app. Plug in your two test devices, then build and run the app on each device. On one device, enter the channel name and Create the channel; on the other device, enter the same channel name and Join the channel.
Thanks for following and coding along with me, below is a link to the completed project. Feel free to fork and make pull requests with any feature enhancements.
For more information about the Agora.io Video SDK, please refer to the Agora.io API Reference.