Revealing police violence in America.

Leveraging data visualization to help users understand incidents of police violence.

This project aims to help users of our website become well-informed consumers of this information. We believe the best way to do that is through data visualizations: bar graphs, pie charts, and interactive data maps.

We wanted our solution to be a dashboard: a central location that pulls from multiple sources and presents the user with informative, interactive data. We sourced our datasets from the Washington Post database and from tagged posts on Twitter and Reddit, then built pipelines to extract the information, transform the data into a presentable form, and load it to our website. The Washington Post dataset was both extensive and limited: extensive in the volume and accuracy of the data it provides, but limited in that it only covers lethal police encounters. It excludes non-lethal police violence, such as uses of tear gas, tasers, or batons. This is where the Reddit and Twitter posts came in handy. We used an API from 2020PB that provides information specifically about police violence, filtered from Twitter and Reddit.
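The sketch below shows the general shape of that pipeline. The source URLs, column names, and the "data" key in the API response are placeholders for illustration, not the project's actual code; only the extract/transform/load structure is the point.

```python
import sqlite3

import pandas as pd
import requests

# Placeholder locations -- the real pipeline pointed at the Washington Post
# fatal-police-shootings CSV and the 2020PB incidents API.
WAPO_CSV = "https://example.com/fatal-police-shootings-data.csv"
PB_API = "https://example.com/api/incidents"


def extract():
    """Pull both raw sources: lethal encounters (CSV) and non-lethal incidents (JSON)."""
    lethal = pd.read_csv(WAPO_CSV)
    # The "data" key is an assumption about the API's response envelope.
    nonlethal = pd.DataFrame(requests.get(PB_API, timeout=30).json()["data"])
    return lethal, nonlethal


def transform(df):
    """Normalize the fields every chart needs: a parsed date plus city/state strings."""
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df["city"] = df["city"].str.strip()
    df["state"] = df["state"].str.upper().str.strip()
    return df.dropna(subset=["date"])


def load(lethal, nonlethal, db_path="incidents.db"):
    """Load the cleaned frames into the store the visualization API reads from."""
    with sqlite3.connect(db_path) as con:
        lethal.to_sql("lethal_encounters", con, if_exists="replace", index=False)
        nonlethal.to_sql("nonlethal_incidents", con, if_exists="replace", index=False)


if __name__ == "__main__":
    lethal, nonlethal = extract()
    load(transform(lethal), transform(nonlethal))
```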

Screenshot of our implementation.

A concern going into this was the subject matter. Police shootings and policing are hot-button subjects in today's culture, and as data scientists, it is our duty to present the data in an unbiased and non-partisan manner. We have to present a thorough representation of the data and give users enough information that they can draw their own conclusions.

At the beginning of the project, we met with the stakeholder, Dr. Chang. Dr. Chang was from HRF and was quite knowledgeable on the subject matter. He set some expectations and described challenges that previous groups had faced. We started our planning phase on Trello boards and created user stories, which describe what we want the user to be able to do on our website. Each user story has a checklist that lays out how the story can be achieved and by whom. This gave us a place to organize our efforts, check in, and always see what is done and what comes next as the project proceeds.

While Trello operates as a checklist for the project's progress, we also needed to map out the project's technology and how each piece, owned by different parts of the team, interacts with the others. The following is the section of that framework covering the data science part of the project.

This was the plan for the DS team at the beginning of the project. In the final product, we excluded parts such as the NLP model, because we decided that building it would require a lot of human resources and would not provide a significant benefit compared to creating an extensive data visualization engine that lets the user generate interactive, informative visualizations.

My personal contributions fall into three main parts: data gathering and munging, data visualization, and creation of the API as part of the data engineering pipeline. My colleague sourced the datasets from two places: the Washington Post dataset and 2020PB, which gets its data from Reddit and Twitter posts. The dataset on police killings was vast and informative but lacked some features vital for data visualization. It has addresses, but to put these incidents on a map we had to convert those addresses into latitude and longitude, which meant geocoding each location. I used a library called geopy with the Google geocoding API to turn each address into separate latitude and longitude columns. I did this for both datasets; however, the police violence dataset from 2020PB doesn't have exact street addresses, so its locations are based on the city and state values.
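A minimal sketch of that geocoding step, assuming geopy's GoogleV3 geocoder; the column names and the environment-variable key are placeholders:

```python
import os

import pandas as pd
from geopy.geocoders import GoogleV3
from geopy.extra.rate_limiter import RateLimiter

# GoogleV3 wraps the Google Geocoding API; the key name is a placeholder.
geolocator = GoogleV3(api_key=os.environ["GOOGLE_API_KEY"])
# Throttle requests so we stay under the API's rate limits.
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=0.1)


def add_lat_lon(df: pd.DataFrame, address_col: str) -> pd.DataFrame:
    """Append latitude/longitude columns derived from a free-text address column."""
    locations = df[address_col].apply(geocode)
    df["latitude"] = locations.apply(lambda loc: loc.latitude if loc else None)
    df["longitude"] = locations.apply(lambda loc: loc.longitude if loc else None)
    return df


# Usage (column names are illustrative). The 2020PB data has no street address,
# so its lookup string is built from city and state instead:
# wapo = add_lat_lon(wapo, "address")
# pb["city_state"] = pb["city"] + ", " + pb["state"]
# pb = add_lat_lon(pb, "city_state")
```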

I then did some data analysis and mapped out all the possible data visualizations.

A framework on what visualization could be created from the provided dataset.

At first, I thought this task was going to take a while, because I was used to creating one graph per data representation; the brute-force approach would have meant building over 50 individual charts, each with its own endpoint. My solution drew on my time learning computer science: I wrote an algorithm that takes in inputs and generates different graphs depending on those inputs. I figured that as long as charts share the same graph type, I could combine common data types with different data orderings, which reduced the work to only six endpoints. The following is an example of one of the more complex algorithms.
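The original code isn't reproduced here, but a simplified sketch of the idea looks like this; the chart names, column names, and top_n parameter are illustrative, not the project's actual interface:

```python
import pandas as pd
import plotly.express as px


def build_chart(df: pd.DataFrame, chart: str, column: str, top_n: int = 10) -> str:
    """Return Plotly figure JSON for the requested chart type and column.

    One parameterized function replaces dozens of hand-written charts: the
    caller picks the chart type ("bar", "pie", or "map") and the column to
    aggregate on, and the same code path serves every combination.
    """
    if chart == "map":
        # Assumes the geocoding step already added latitude/longitude columns.
        fig = px.scatter_mapbox(
            df, lat="latitude", lon="longitude", hover_name=column, zoom=3
        )
        fig.update_layout(mapbox_style="open-street-map")
    else:
        # Count occurrences of each value in the chosen column.
        counts = (
            df[column]
            .value_counts()
            .nlargest(top_n)
            .rename_axis(column)
            .reset_index(name="count")
        )
        if chart == "pie":
            fig = px.pie(counts, names=column, values="count")
        else:
            fig = px.bar(counts, x=column, y="count")
    return fig.to_json()


# Usage: build_chart(lethal_df, chart="bar", column="state")
```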

I built this API in a Docker container deployed on AWS Elastic Beanstalk, ready to be queried by the back-end team.
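With the container running on Elastic Beanstalk, the back end only needs to send a POST request with body instructions. The endpoint path and payload fields below are placeholders showing the pattern, not the real deployment's contract:

```python
import requests

# Placeholder host and route for the Elastic Beanstalk deployment.
API_URL = "http://<elastic-beanstalk-app>.elasticbeanstalk.com/chart"

payload = {
    "chart": "bar",           # which visualization to generate
    "column": "state",        # which field to aggregate on
    "filters": {"year": 2020},  # optional narrowing of the dataset
}

response = requests.post(API_URL, json=payload, timeout=30)
figure_json = response.json()  # Plotly figure JSON the front end can render
```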

Our front-end web team was hit early by the loss of a team member due to a family emergency. The web team was not used to Plotly or to making POST requests with body instructions. The DS and back-end teams worked extensively to ensure all the functionality flows through the back end seamlessly. Multiple filters are supported, but the front end could not implement them in time despite the efforts of the back-end and DS teams. The DS team finished its portion of the work relatively quickly and made multiple adjustments and additional endpoints that will prove useful for the user. However, due to the delays and setbacks on the front end, what is shown is only a small slice of the capabilities available from the back-end and DS teams.

As part of the DS team, I feel that this project, based on what we put into it and the backbone we created, could have been even more impressive. On the data science side, the work can be passed on to the next team, which won't have to focus as much on setting up the API and the data visualizations and can instead focus on fancier, more experimental data science, such as model building or even creating a personalized database for both user entries and dataset entries. The DS team did build out a model; however, we decided that until the dataset grows with more entries, it won't be of much use to users.

Since the DS team has built a solid foundation in data visualization, the next team can finish the current front-end and enable the full functionality that is already available. The DS team can continue working to address the lack of general datasets by other means, or work on creating a centralized database where users can submit entries and an operator can scan them for anomalies.

This project proved quite challenging. Time was minimal, and despite that, the DS team really led the project and created clear plans. There were ups and downs to this project in particular. The project was semi-dynamic, and the team was able to receive feedback directly from the stakeholder regarding implementation and features. The DS team provided a solid foundation for the next team to work from, and I believe this project can be polished within two more iterations of teams if they use the foundation we created. The setbacks, although disappointing, provided real-world examples of events that can come up on the job.
