**My thoughts on a paper about a vision-based social distancing and critical density detection system using Deep CNNs and Linear Regression — from Ohio State University.**

If you’re living in the US, you’ve likely been under some form of ‘stay at home’ order for 5 months now (coming into 6 months!). This is longer than **any** other country in the world has been in isolation for.

Why did we go into lockdown?

To slow the spread, right?

Well… yes… but the main reason for us all to go into isolation was actually to **buy the government time** to put together a plan of attack to address the many, many flow-on effects (health, social, economic etc.) that a global pandemic initiates.

This time period wasn’t just about hiding away until the ‘spread slows and/or stops and to never reopen until it’s completely gone’, but to actually come up with, **and deploy,** measures that allow for safe, gradual reopening.

There are 2 (among many others) simple, yet effective measures that we can all take to help control the spread and allow for a somewhat safe* reopening:

- Wear a mask
- Social distance

Contrary to popular belief, those 2 things are **not** a political statement.

I have recently been reading up on research that’s being done to come up with innovative ideas addressing deployment of mass testing, privacy-concerned contact tracing methods and short-term social distancing protocols to implement in workplaces and public places.

These kinds of ideas should have been listened to in the initial months of lock-down, but let’s try not to get depressed and dwell on the past, and rather focus on the future…

Now, it’s not a question of

Should we reopen?

but a question of

Do we reopen with a plan, or without a plan?

Reopening without a plan will mean we’ll be in a perpetual yo-yo, in and out of isolation with no real end in sight.

I want to share the results of one such paper that I read that suggests a vision-based social distancing and critical density detection system.

I want to focus on the technical implementation of this, rather than the social and political argument of “should we actually do this”. But I do think it’s interesting (and refreshing) to see that there are people working hard to try and find solutions to get us through this mess!

**The Paper:**

This paper is titled *‘A Vision-based Social Distancing and Critical Density Detection System for COVID-19’* and was authored by Dongfang Yang, Ekim Yurtsever, Vishnu Renganathan, Keith A. Redmill and Ümit Özgüner from the Ohio State University, Columbus.

I want to make clear that I am simply relaying the methods and results of this work and that it is most certainly not my work (citation below).

**Motivation**

The authors acknowledge that individuals are not used to tracking the required 6-feet distance between themselves and others, and that an audio-visual system could help alert people to any social distancing violations. They also acknowledge the need to control social density, so the system also tries to control inflow to an area once it breaches a certain social density threshold.

The paper suggests an “artificial intelligence based real-time social distancing detection and warning system”**. The priorities of the work were to:

- Ensure the privacy of civilians by not recording/caching any data
- Ensure the alert system does not target individuals (this isn’t about shaming people)
- Keep human supervisors out of the detection/warning loop
- Make the code open-source and accessible to the public for transparency

**Method in a Nutshell:**

The method aims to address two areas of interest:

- Social distance detection
- Social density detection

The work used fixed monocular cameras to capture images of areas of interest and then used pre-trained, deep convolutional neural networks (CNNs) for object detection. In this scenario, the only class label of interest was ‘person’. Two CNN-based detectors were used in this work: Faster R-CNN and YOLOv4.

A simple linear regression model was used to determine a critical social density value.

This system was tested on three pedestrian crowd datasets: New York’s Grand Central Station, an indoor mall and a busy town center in Oxford (urban street).

A ‘scene’ was defined to be a 6-tuple of objects:

- I: an image matrix captured by the fixed monocular camera (containing height, width and RGB of the image)
- A₀: the area of the region of interest
- dᶜ***: minimum physical distance for social distancing (typically 6-feet)
- c₁: binary control for sending non-intrusive cue if the distance between any two people is less than dᶜ
- c₂: binary control for social density detection. Set to 1 if the social density 𝝆 is above the critical threshold 𝝆ᶜ, in which case a signal is cued to control inflow to the area.
- U₀: a probability threshold. Given the critical social density value (𝝆ᶜ), the probability of a social distancing violation should stay under U₀. For example, U₀ = 0.05 means the probability of a violation given 𝝆ᶜ should stay under 0.05.
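To make the 6-tuple a little more concrete, here is a minimal sketch of how a ‘scene’ could be held in code. The class name, field names and example values are my own assumptions for illustration (they are not taken from the paper or its code), and I have assumed distances are stored in metres so that they match an area given in m².

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # one (R, G, B) value


@dataclass
class Scene:
    """Hypothetical container for the 6-tuple (I, A0, dc, c1, c2, U0)."""
    image: List[List[Pixel]]  # I: height x width grid of RGB pixels
    area_m2: float            # A0: area of the region of interest, in m^2
    d_c: float                # dc: minimum physical distance, in metres (assumed unit)
    c1: int = 0               # 1 -> send the non-intrusive distancing cue
    c2: int = 0               # 1 -> density too high, control inflow
    u0: float = 0.05          # U0: accepted probability of a violation


# Example values only: a 1x1 black image, a 50 m^2 area, 6 feet is about 1.83 m.
scene = Scene(image=[[(0, 0, 0)]], area_m2=50.0, d_c=1.83)
```

Both control signals start at 0 here, and would be flipped to 1 by whatever logic evaluates the distances and density for the current frame.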

**Implementation:**

The implementation of this system can be narrowed down to three steps****:

**Step 1:**

Determine the perspective transformation matrix M for each dataset. This is how the real-world dimensions of the area of interest are determined from the image. The result of this transformation can be seen in the image below, where the area of interest (as well as the ‘person’ labeled objects) is transformed from the image to real-world dimensions and cast onto the *x-y* plane.

Details on how the area was determined for each dataset can be found in section 5.1. of the paper.

**Step 2:**

After deploying the deep CNN model to detect the ‘person’ class objects in the area, these are also transformed from their image coordinates to the real-world coordinate system, as shown on the right in the image above. The point on the ‘real-world’ coordinate plane was calculated from the mid-point of the lower edge of the bounding box determined by the CNN method used.
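Steps 1 and 2 can be sketched roughly as follows, assuming M is a 3×3 perspective (homography) matrix that has already been determined, and that bounding boxes arrive as `(x_min, y_min, x_max, y_max)` in pixels with y growing downwards. The function names and the example matrices are hypothetical; this is not the paper’s code.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Point = Tuple[float, float]


def bbox_ground_point(bbox: Tuple[float, float, float, float]) -> Point:
    """Mid-point of the lower edge of a bounding box (image coordinates)."""
    x_min, _y_min, x_max, y_max = bbox
    # y grows downwards in image coordinates, so the lower edge is y_max.
    return ((x_min + x_max) / 2.0, y_max)


def to_world(M: Matrix3, point: Point) -> Point:
    """Apply a 3x3 perspective matrix to an image point."""
    x, y = point
    xh = M[0][0] * x + M[0][1] * y + M[0][2]
    yh = M[1][0] * x + M[1][1] * y + M[1][2]
    w = M[2][0] * x + M[2][1] * y + M[2][2]
    # Divide by the homogeneous coordinate to get back to the x-y plane.
    return (xh / w, yh / w)


# Toy matrix: 1 pixel = 1 cm, no perspective distortion.
M_scale = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
foot = bbox_ground_point((100, 50, 140, 250))
world = to_world(M_scale, foot)
```

The lower edge is used because that is roughly where the person touches the ground, which is the plane the matrix maps onto.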

**Step 3:**

Following steps 1 and 2, values for 𝝆ᶜ, dᵤ,ᵥ and *v* can be calculated where

*𝝆ᶜ is the critical social density value,*

*dᵤ,ᵥ is the distance between pedestrians u and v (i, j used in the paper) and*

*v is the number of violations in the area of interest at any particular time.*

Calculating 𝝆ᶜ, dᵤ,ᵥ and *v* is needed in order to determine the values of the 6-tuple ‘scene’ object, which ultimately determines the operation of the system.

dᵤ,ᵥ

Once the real-world coordinates of each person object are calculated, we have a set of 2-dimensional vectors, P, that represent the coordinates of each person. From here, we can calculate the Euclidean distance between every pair of people (u, v). These distances are then compared with the dᶜ value in the 6-tuple to determine whether there is a social distance violation.

We can also calculate each person’s minimum physical distance (the distance to their closest neighbour), as well as the average minimum physical distance, by averaging those minimum distances over all people.
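The distance calculations above can be sketched in a few lines. This is only an illustration under my own naming (`pairwise_distances`, `min_distances` and `average_min_distance` are hypothetical helpers), assuming the points are already in real-world coordinates.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def pairwise_distances(points: List[Point]) -> Dict[Tuple[int, int], float]:
    """Euclidean distance for every unique pair of people (u, v), u < v."""
    return {
        (u, v): dist(points[u], points[v])
        for u, v in combinations(range(len(points)), 2)
    }


def min_distances(points: List[Point]) -> List[float]:
    """Distance from each person to their closest neighbour."""
    if len(points) < 2:
        return []  # nobody to compare with
    return [
        min(dist(p, q) for v, q in enumerate(points) if v != u)
        for u, p in enumerate(points)
    ]


def average_min_distance(points: List[Point]) -> float:
    """Mean of the per-person minimum distances (needs at least 2 people)."""
    mins = min_distances(points)
    return sum(mins) / len(mins)


people = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
d = pairwise_distances(people)
```

With n people there are n(n-1)/2 unique pairs, so this brute-force approach is fine for a single camera view but grows quadratically with crowd size.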

v

Now that we have all the dᵤ,ᵥ values, the number of social distancing violations (*v*) can be calculated.

This is done by counting the pairs (u, v) for which dᵤ,ᵥ < dᶜ.

The function here is piece-wise:

if dᵤ,ᵥ < dᶜ, then vᵤ,ᵥ = 1 and

if dᵤ,ᵥ ≥ dᶜ, then vᵤ,ᵥ = 0.

So we then have that

*v* = 𝚺(vᵤ,ᵥ), summed over every unique pair of people (u, v)

— this is using my own notation which summarises the paper’s notation/explanation (see section 4.4. of the paper).
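As a rough sketch of that count, assuming the real-world points and dᶜ are in the same unit (metres here), each unique pair contributes 1 when it is closer than dᶜ and 0 otherwise. The function name and the sample points are mine, for illustration only.

```python
from itertools import combinations
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]


def count_violations(points: List[Point], d_c: float) -> int:
    """Number of unique pairs whose distance is strictly below d_c."""
    return sum(1 for p, q in combinations(points, 2) if dist(p, q) < d_c)


people = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
v = count_violations(people, d_c=1.83)  # 6 feet is roughly 1.83 m
c1 = 1 if v > 0 else 0                  # cue the warning if anyone is too close
```

Note the strict inequality: a pair standing at exactly dᶜ is not counted, which matches the piece-wise definition above.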

𝝆 and 𝝆ᶜ

𝝆 is the measure of social density given by,

𝝆 = number of pedestrians / A₀ (that is, pedestrians per m²)

It was found that 𝝆 and *v* had a linear relationship (as can be seen in the image below) and so a simple linear regression model was employed to determine the value of 𝝆ᶜ.

𝝆ᶜ is the value that ensures the probability of social distancing violations stays lower than U₀ (as found in the 6-tuple ‘scene’ definition).

Once the linear regression model was fitted, 𝝆ᶜ was then identified as the lower bound of the 95% confidence interval where *v* = 0 (that is, no social distancing violations).
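To illustrate the idea, here is a simplified sketch that fits an ordinary least-squares line of *v* against 𝝆 and finds the density at which the fitted line predicts zero violations. It deliberately leaves out the 95% interval step described above, so it is not the paper’s exact procedure, and the data points are made-up toy values rather than results from the paper. All function names are my own.

```python
from typing import List, Tuple


def density(n_pedestrians: int, area_m2: float) -> float:
    """Social density: pedestrians per square metre."""
    return n_pedestrians / area_m2


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def density_at_zero_violations(intercept: float, slope: float) -> float:
    """Density where the fitted line crosses v = 0."""
    return -intercept / slope


# Toy observations (NOT from the paper): density vs. number of violations.
rho_obs = [0.1, 0.2, 0.3, 0.4]
v_obs = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_line(rho_obs, v_obs)
rho_zero = density_at_zero_violations(b0, b1)
c2 = 1 if density(12, 60.0) > rho_zero else 0  # 12 people in 60 m^2
```

In a real deployment the threshold would come from the interval bound rather than the fitted line itself, and c₂ would be re-evaluated on every frame.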

It was also found that 𝝆 had a strong negative correlation with both the minimum and the average physical distances between people. This makes sense since, if there is low social density, you would expect minimum distances between people to be greater (i.e. not violating social distancing) and vice versa. You can see this in the figure below:

**Conclusion**

As you can see, this is a simple, yet somewhat effective implementation of a system that could really help people and businesses in their social distancing efforts. Some large companies are already employing systems like this in their workplaces to make their employees safer if they want to/need to return to work.

Perhaps, as a first step, a system like this could be more easily implemented in workplace and office environments. It may be harder to implement in public places at first, as retail, commercial and public spaces have their own nuances.

A limitation noted in the paper was that it did not consider groups. This could be a major flaw in the system that would need to be overcome for public use since people often shop and travel in couples and groups and so the system would constantly be alerting breaches in the area if it can’t detect groups.

A naive solution of mine to the group detection problem is a time-based one (given the system is based on real-time data). Perhaps we could implement some time-threshold such that if two sets of person-coordinates stay within a certain, pre-determined distance of each other for longer than that threshold, they are considered a related ‘group’. This would have its own nuanced downfalls too, of course, but could be a place to start to address the group detection issue.
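Here is a minimal sketch of that time-based idea, just to show it is simple to prototype. It rests on a big assumption that is not part of the paper’s system as I have described it: each person keeps a stable ID from frame to frame (i.e. some form of tracking). The names `update_close_time`, `grouped_pairs`, `group_dist` and `min_frames` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import dist
from typing import Dict, Set, Tuple

Point = Tuple[float, float]
Pair = Tuple[int, int]


def update_close_time(close_frames: Dict[Pair, int],
                      tracks: Dict[int, Point],
                      group_dist: float) -> None:
    """Update, per pair, the number of consecutive frames spent close together.

    Only pairs visible in the current frame are updated; pairs where someone
    has left the view keep their old count in this simple version.
    """
    for a, b in combinations(sorted(tracks), 2):
        if dist(tracks[a], tracks[b]) < group_dist:
            close_frames[(a, b)] += 1
        else:
            close_frames[(a, b)] = 0  # streak broken, start again


def grouped_pairs(close_frames: Dict[Pair, int], min_frames: int) -> Set[Pair]:
    """Pairs that stayed close for at least min_frames consecutive frames."""
    return {pair for pair, n in close_frames.items() if n >= min_frames}


# Three toy frames: persons 1 and 2 walk together, person 3 is far away.
frames = [
    {1: (0.0, 0.0), 2: (0.5, 0.0), 3: (10.0, 10.0)},
    {1: (1.0, 0.0), 2: (1.5, 0.0), 3: (9.0, 9.0)},
    {1: (2.0, 0.0), 2: (2.5, 0.0), 3: (8.0, 8.0)},
]
close_frames: Dict[Pair, int] = defaultdict(int)
for tracks in frames:
    update_close_time(close_frames, tracks, group_dist=1.0)
groups = grouped_pairs(close_frames, min_frames=3)
```

Pairs flagged as a group could then be skipped when counting violations. Keeping per-person IDs over time would need to be weighed against the paper’s privacy priority of not caching data.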

There are many ideas like the one detailed in this paper that address ways to reopen the economy safely. I hope you enjoyed reading this, as I did.

I hope we can continue to be curious about this problem and utilise some of the best brains we have in this world!

Stay safe out there!

**Footnotes:**

- *I say ‘safe’ here very loosely…
- **Cite: Abstract, page 1, paragraph 2
- ***Actually noted as ‘d subscript c’ in the paper but I could not get a subscript c in Medium…
- ****I have of course simplified these steps for the sake of this being a blog article!

**Paper Citation:**

Dongfang Yang, Ekim Yurtsever, Vishnu Renganathan, Keith A. Redmill and Ümit Özgüner (2020). *A Vision-based Social Distancing and Critical Density Detection System for COVID-19*. arXiv:2007.03578 [eess.IV].