Patent application title: MEDIA AND MARKETING OPTIMIZATION WITH CROSS PLATFORM CONSUMER AND CONTENT INTELLIGENCE
Inventors:
Jyotika Singh (Winnetka, CA, US)
Michael Avon (Mclean, VA, US)
Serge Matta (Vienna, VA, US)
Assignees:
ICX Media, Inc.
IPC8 Class: AG06Q3002FI
Publication date: 2021-07-01
Patent application number: 20210201349
Abstract:
The invention is directed to a computer-implemented method of analyzing
video interactions on internet-supported computer platforms, such as
online social media platforms, to extract video and audience
intelligence, i.e. unique analytics, insights and recommendations for
audience engagement optimization, audience engagement, network growth,
advertising, and marketing purposes.
Claims:
1. A computer-implemented method comprising the following steps:
generating analytical reporting comprising consumer interests comprising
topics, themes, celebrities, media, brands, influencers, actors, movies,
television shows, and combinations thereof; profiling the consumer
interests with social media averages for the consumer interests;
calculating a platform specific index report of over- or
under-representation of the consumer interests; and using the consumer
interests profiling report to develop or improve a marketing campaign
and/or advertising campaign, to develop or improve content creation
strategies, or to provide intelligence related to a consumer section in
combination with demographic sectors for measuring trends.
2. The computer-implemented method of claim 1, further comprising computing platform averages based on social media subscription and interaction behavior, wherein general audience content creation, consumption and engagement patterns are identified on social media platforms and assigned quantitative metrics to reflect a proportion of audiences engaging in or with a topic, keyword, term, object, brand, media, influencer, movie, television show, or combination thereof.
3. The computer-implemented method of claim 1, further comprising defining and assigning baseline scores for terms, key-phrases, entities, content, videos, content creators, media, brands, audience sectors, or topics in a given time period by using algorithmic steps of collection, aggregation, averaging and normalization across an online media platform.
4. The computer-implemented method of claim 1, further comprising categorizing social media platform profiles and channels into categories and subcategories of topics using predictive modeling techniques.
5. The computer-implemented method of claim 1, further comprising categorizing social media platform profiles and channels into categories of media, brand, influencer, entertainer, television show, movie, celebrity, actor, director, chef, politician, retailer, journalist, artist, comedian, news channels, or combination thereof, and sub-categories of the categories including brand name, movie name, movie genre, actor names, famous persons, influencer or entertainer names, or combinations thereof, using natural language processing and a combination of supervised and unsupervised machine learning.
6. The computer-implemented method of claim 1, further comprising combining categories of content and content providers on social media with demographics of audiences interacting with the content and content providers, wherein the combinations calculate an analysis of audience interests, and wherein the demographics comprise age, gender and ethnicity, location, direct marketing area, brand or individual creator status, low-level categorical and topical attributes, high-level categorical and topical attributes, political views, cross-platform account linkages, education level and income range, or combinations thereof.
7. The computer-implemented method of claim 1, further comprising calculating scores predicted for an audience sector and a general audience for social media platforms presented as platform averages, wherein the scores comprise an over-index or under-index factor conveying if the chosen audience segment over-indexes in terms of interest in a media, brand, influencer, actor, television show, movie, or combinations thereof.
8. The computer-implemented method according to claim 1, further comprising algorithmically defining a Return on Investment (ROI) analysis, wherein the algorithmically defining Return on Investment (ROI) analysis comprises automatically scoring for content, audience, content provider, audience engagement types, audience demographics, audience interests, or advertising and marketing platforms, by analyzing trends and engagement in a given time frame, and using the scoring of the ROI analysis to automatically inform or suggest improved content creation, advertising, and marketing strategies.
9. The computer-implemented method according to claim 1, further comprising analyzing post interactions on internet-supported computer platforms to extract intelligence and quantify a performance of social media posts and factors contributing to performance, and outputting predicted performance of audience engagement optimization, network growth, advertising, marketing, or combinations thereof.
10. The computer-implemented method according to claim 1, further comprising calculating an engagement score measuring a likelihood of a trend, video, topic or creator to engage viewership, by measuring content against other posts and target audience from comments, likes, shares, or time engaged with content, and using weighted averaging and normalization for score quantification.
11. The computer-implemented method according to claim 1, further comprising providing a mechanism for quantifying an estimated success of a piece of content, content provider, content publisher, or content marketer, the mechanism comprising extracting numerical representations based on text, images, video intelligence, target audience demographics, and interests, which is then processed to provide tailored content and audience analysis, and wherein the numerical representations are extracted algorithmically using key-phrase/entity extraction and vector formation from text data, applying a variety of filters on image and video data, and dividing audiences into clusters of interest and demographic groups.
12. The computer-implemented method according to claim 1, further comprising assigning numerical scores to an engagement of social media audiences with video content and video creators, including trends, affinities, and performance above or below a baseline.
13. The computer-implemented method according to claim 1, further comprising processing audience consumption behaviors from interactions on social media platforms and calculating recommendations or predictions for media analysts, content creators, direct marketers, or combinations thereof.
14. The computer-implemented method according to claim 1, further comprising identifying and disaggregating audience segments by viewing, commenting, sharing, and other engagement behavior, wherein the audience segments are extracted from groups of undifferentiated social media messages.
15. The computer-implemented method according to claim 1, further comprising measuring popularity, trendiness, or virality of video content themes, including messages, topics, perspectives, sentiments, brands, media, or persona according to computed baselines, topical baselines, sub-topical baselines, or combinations thereof.
16. The computer-implemented method according to claim 1, further comprising scoring attributes of videos and video consumers using data analysis techniques and machine learning on social media content including text, image, audio, video, or combinations thereof.
17. The computer-implemented method according to claim 1, further comprising assigning scores to terms, key-phrases, and entities algorithmically extracted from social media text, image, and video content that are trending in terms of attracting engaged viewership in a given time period compared to a different or previous time period.
18. The computer-implemented method according to claim 1, further comprising defining and assigning a baseline score for terms, key-phrases, entities, content, videos, content creators, audience sectors and topics in a given time period by using algorithmic steps of collection, aggregation, averaging, and normalization.
19. The computer-implemented method according to claim 1, further comprising calculating an affinity score measuring a likelihood that content, topics, and content makers will attract different audience sectors by combining data of audience attributes including age, gender, location, topics of interest, interest groups, or combinations thereof, wherein the measuring is performed algorithmically using text matching and key-phrase extraction techniques and by using a combination of distance metrics, geographical distance, and/or similarity to trending items on social media platforms in a given time period.
20. A computer-implemented method comprising automatically detecting and recognizing sensitive Personally Identifiable Information (PII) within publicly available text written by a consumer on a social media platform using natural language processing techniques and machine learning algorithms that detect PII presence and recognize a type of PII.
21. The computer-implemented method of claim 20, further comprising automatically removing the Personally Identifiable Information (PII) from storage databases if the PII is detected.
22. The computer-implemented method of claim 20, further comprising using the natural language processing and a combination of supervised and unsupervised machine learning algorithms to detect the Personally Identifiable Information (PII) in the publicly available text written by a consumer on a social media platform.
23. The computer-implemented method of claim 20, further comprising algorithmically detecting a location of the Personally Identifiable Information (PII) in the publicly available text written by a consumer on a social media platform and categorizing the detected PII into categories of PII.
24. The computer-implemented method according to claim 20, further comprising using language and phrases, identified using the natural language processing techniques, that content consumers and content producers use to describe social media profiles, interests, and/or content preferences and comments, and using predictive modeling to detect and recognize email addresses, location information such as geographical coordinates, phone numbers, contact information, street address(es), IP address(es), social security number, bank account details, or combinations thereof.
25. A computer-implemented method comprising matching social media consumers to location-based profiles based on content consumption and engagement patterns on social media, using predicted demographics and linked audience to automatically connect offline data to online data for marketing or advertising to the social media consumers.
26. The computer-implemented method according to claim 25, further comprising matching profiles from a first social media platform to a second social media platform using predictive modeling using names, comments, typographic patterns, emojis, hyperlink usage, demographics, posting behavior, language, phrases, profile descriptions, content interests, or combinations thereof, identified using natural language processing techniques.
27. The computer-implemented method according to claim 25, further comprising using data from two or more of a consumer, content, engagement, or demographics, automatically gathered from two or more social media platforms, to match a social media profile to a non-social media profile.
28. The computer-implemented method according to claim 25, further comprising matching social media data with additional location-based profile data comprising demographics, location, user first name, user last name, user middle name, income band, education level, household income, number of children, or combinations thereof.
29. The computer-implemented method according to claim 25, further comprising deduplicating social media user matches with location-based profile matches based on calculated accuracy of data and enriched data predictions made on social media data.
30. The computer-implemented method according to claim 25, further comprising automated processing for finding audiences based on a content interest and/or demographic criteria across different social media platforms, matching consumers from social media to a location-based profile dataset based on interests and demographic predictions across social media platforms, performing deduplication algorithmically, converting files to a format compliant or compatible with an identity link store, and automatically uploading the files to the identity link store.
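By way of non-limiting illustration, the over- or under-representation index recited in claims 1 and 7 can be sketched as below. The sketch assumes the index is the ratio of an audience segment's share of interest to the platform average, scaled so that 100 denotes parity. That scaling, the function names `interest_index` and `index_report`, and the sample shares are assumptions made for illustration and are not taken from the application.

```python
def interest_index(segment_share: float, platform_share: float) -> float:
    """Return an index where 100 means the segment matches the platform average."""
    if platform_share <= 0:
        raise ValueError("platform_share must be positive")
    return 100.0 * segment_share / platform_share


def index_report(segment: dict, platform: dict) -> dict:
    """Label each interest as over-, under-, or at-index relative to the platform."""
    report = {}
    for interest, share in segment.items():
        idx = interest_index(share, platform[interest])
        if idx > 100:
            label = "over-index"
        elif idx < 100:
            label = "under-index"
        else:
            label = "at-index"
        report[interest] = (round(idx, 1), label)
    return report


# Hypothetical shares: fraction of each audience engaging with an interest.
segment_shares = {"cooking shows": 0.30, "action movies": 0.10}
platform_shares = {"cooking shows": 0.20, "action movies": 0.25}
report = index_report(segment_shares, platform_shares)
```

Under these assumed figures, the segment over-indexes on cooking shows and under-indexes on action movies.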
Description:
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application is a continuation of and relies on the disclosures of and claims priority to and the benefit of the filing dates of U.S. application Ser. No. 16/838,234, filed Apr. 2, 2020 and of U.S. Provisional Application No. 62/828,814, filed Apr. 3, 2019. The disclosures of those applications are hereby incorporated by reference herein in their entireties.
BACKGROUND OF THE INVENTION
Field of the Invention
[0002] Embodiments of the present invention are directed to a computer-implemented method of analyzing video interactions on internet-supported computer platforms, such as online social media platforms, to extract video intelligence, i.e. unique insights and recommendations for audience engagement optimization, network growth, advertising, and marketing purposes. In aspects, the more general field of invention is commonly referred to by people of ordinary skill in the art as "big data."
[0003] Embodiments include a software implementation that enables the study of audience behavior from their interactions on social media platforms, which yields insights for video content creators on content and messaging that will be more or less successful with their target audiences.
[0004] One focus area is on the language used by consumers of video content, and the design and implementation of language-based classifiers that can be used to understand the characteristics of audiences of specific videos, of individual creators, or groups of related creators.
[0005] Another focus area is on automated video content analysis, where audio-visual components are extracted from raw video content and used as features to construct models to predict the interest and engagement of different audience demographics.
[0006] Embodiments also include an unsupervised machine learning model that analyzes both language and video patterns to identify peer sets of creators, which enables cross-creator analytics and comparison across different social media platforms.
Description of Related Art
[0007] The process of designing, sourcing, producing, publishing, marketing, and optimizing video content on social media platforms has historically relied upon trial-and-error. Video content creators create content they believe may resonate with audiences, and select a social media platform or platforms on which to publish the new content. Next, they use a combination of free-text and platform-specific drop-down fields to add descriptive titles, narratives, tags, and other metadata to the content, which will be used by the social media platforms to make the content discoverable and appealing to the video consumers searching through content. Unfortunately, such ad hoc processes often result in many wasted hours of creator time, failures to successfully market content to target audiences, missed opportunities to unpack and explore contingencies within a creator's viewership, and poor search engine optimization for effective audience capture. As such, computer-implemented and automated processes for capturing potential audiences of social media video content and increasing their engagement are needed.
SUMMARY OF THE INVENTION
[0008] Embodiments of the present invention infer audience attributes using computer-implemented natural language processing and supervised machine learning on the textual artifacts of their interactions with video content.
[0009] For example, in one embodiment, a computer-implemented method is provided that allows for reviewing and analyzing viewing and engagement habits of video consumers to algorithmically determine their gender, age, location, ethnicity, education, income, topical interests, personality type, political stance, sentiment (positive, negative and neutral) and emotion categories (e.g., anger, joy, humor, sadness) and sentiment themes, based on consumer video interactions across online social media platforms such as YouTube, Facebook, Instagram, Twitter, Vimeo and Twitch, by way of example.
[0010] In another embodiment, a computer-implemented method is provided that allows for reviewing and analyzing video viewing and engagement habits of consumers and determining topical interest areas of social video consumers based on the video content they watch. These topical interests can be used to find similarities and differences between two or more target audiences, such as the affinity between the audience of an independent video creator and that of a large consumer brand, or the differences between the audience of a large consumer brand and that of their competitors.
[0011] Embodiments also analyze the audio-visual components (objects, people, ambient noise, music, etc.) of videos with respect to their audience engagement patterns including views, likes, dislikes, upvotes, downvotes, shares, links, quotes, and comments.
[0012] For instance, in another embodiment, a computer-implemented method is provided that allows for recognizing audio components of videos and modeling the relationships between those audio components and the engagement metrics for video audiences across online social media platforms such as YouTube, Facebook, Instagram, Twitter, Vimeo and Twitch with respect to their demographics, interests, sentiments, and emotions.
[0013] In another embodiment, a computer-implemented method is provided that allows for recognizing visual components of videos (by analyzing video frames and identifying elements of videos, including the setting, number and ethnicity of cast, objects in the video, and music) and modeling the relationships between those components (objects, background, color, shapes, human faces and figures) and engagement metrics for video audiences across online social media platforms such as YouTube, Facebook, Instagram, Twitter, Vimeo and Twitch with respect to their demographics, interests, sentiments, and emotions.
[0014] Embodiments also evaluate performance and practices of a content creator with respect to their extracted peer set. In another embodiment, a computer-implemented method is provided that allows for extrapolating peer sets of content creators based on video content similarity, content statistics, audience demographics and audience engagement metrics across online social media platforms such as YouTube, Facebook, Instagram, Twitter, Vimeo and Twitch.
[0015] In another embodiment, a computer-implemented method is provided that extrapolates trends in high-engagement video topics, themes, and elements from peer set video content, thereby rendering automated recommendations for video creators for in-demand content.
[0016] In another embodiment, a computer-implemented method is provided that identifies real-time or near real-time trends from video topics, themes, and elements from peer set video content, thereby rendering recommendations for video creators on optimal post time, frequency, platform, length, messaging and marketing (title, description, keywords, hashtags).
[0017] In another embodiment, a computer-implemented method is provided that renders automated recommendations for optimized video messaging and marketing (title, description, keywords, hashtags) to content creators that can be accepted and automatically applied.
BRIEF DESCRIPTION OF THE DRAWINGS
[0018] The accompanying drawings illustrate certain aspects of embodiments of the present invention, and should not be used to limit the invention. Together with the written description the drawings serve to explain certain principles of the invention.
[0019] FIG. 1 is a schematic diagram illustrating the extraction of features (names, comments, typographic patterns, emoji, hyperlink usage, posting behavior) from raw text using Natural Language Processing for use in supervised machine learning model training according to an embodiment.
[0020] FIG. 2 is a schematic diagram demonstrating the transformation of extracted text features into composite numeric feature vectors and training of an ensemble of proprietary feature-specific models according to an embodiment.
[0021] FIG. 3 is a schematic diagram showing how a pre-trained ensemble model is used to classify new, unseen data using stored model artifacts that include pipelines for feature extraction, vectorization, and classification according to an embodiment.
[0022] FIG. 4 is a schematic diagram illustrating the extraction of audiovisual content elements from raw video and their transformation into numeric feature vectors for downstream modeling according to an embodiment.
[0023] FIG. 5 is a schematic diagram showing how supervised machine learning is used to build a regression model to predict video performance score (computed from engagement metrics including views, likes, dislikes, upvotes, downvotes, shares, links, quotes, and comments) from the vectorized, extracted video content features according to an embodiment.
[0024] FIG. 6 is a schematic diagram showing how supervised machine learning is used to build a classification model to predict the proportion of audience engagement by demographics (computed from the natural language classifier pipeline shown in FIGS. 1-3) from the vectorized, extracted video content features according to an embodiment.
[0025] FIG. 7 is a schematic diagram demonstrating how unsupervised machine learning is used to build a series of cohort (creator peer set) models, informed by high-level creator statistics (e.g. number of videos made, number of followers or subscribers, and/or daily views) as well as vectorized, extracted video content and language features according to an embodiment. FIG. 7 further illustrates how the resultant models are stored and reconstituted on demand in deployment to map new creators to existing peer sets according to an embodiment.
[0026] FIG. 8 is a schematic diagram illustrating how unsupervised machine learning is used to find clusters of creators who share similar features in terms of high-level statistics (number of videos made, number of followers or subscribers, daily views), as well as video content and audience demographics according to an embodiment.
[0027] FIG. 9 is a schematic diagram showing how new creators are quickly mapped to their peer set and served content development, messaging, and marketing recommendations with respect to the most and least successful topics, themes, and strategies within that specific peer set according to an embodiment.
[0028] FIG. 10 is a schematic diagram illustrating how, once mapped to their cohort of most relevant peers, creators explore within their cohort cluster and identify their most similar peers within the cohort according to an embodiment.
[0029] FIG. 11 is a schematic diagram showing extraction of audio features from a video according to an embodiment.
[0030] FIG. 12 is a schematic diagram showing some of the categories and subcategories that an audio signal can be classified into according to an embodiment.
DETAILED DESCRIPTION OF VARIOUS EMBODIMENTS OF THE INVENTION
[0031] Reference will now be made in detail to various exemplary embodiments of the invention. It is to be understood that the following discussion of exemplary embodiments is not intended as a limitation on the invention. Rather, the following discussion is provided to give the reader a more detailed understanding of certain aspects and features of the invention.
[0032] Embodiments of the present invention provide a computer-implemented mechanism for automatically extracting video intelligence from existing video data to provide tailored insights for video content creators to empower informed, tactical and strategic decision-making around both the creative process and the ensuing publishing and marketing processes.
[0033] In one embodiment, a computer processing device(s) analyzes existing video data in an asynchronous or synchronous manner to produce automated recommendations for how a video content creator can grow their network of viewers (i.e. get more people to watch their videos), optimize their audience engagement (i.e. get more people to provide feedback on their videos such as commenting, liking, upvoting, linking to, or sharing the video content), and more effectively market their content to specific demographic and interest groups (e.g. improve discoverability with Millennial Asian women, or with electronics enthusiasts from Atlanta, Ga.).
[0034] Demographic and topical recommendations are generated in part using an ensemble of computer-applied language-based classifiers to understand, and then predict, audience characteristics. This process includes extracting data (as shown in FIG. 1) from the text of comments on and interactions with videos, including commenters' proper names, emojis, URLs, keywords and keyphrases, grammatical patterns, and posting behaviors such as times and frequencies. FIG. 1 illustrates the extraction of features (names, comments, typographic patterns, emoji, hyperlink usage, posting behavior) from raw text using Natural Language Processing for use in supervised machine learning model training. From such extracted features, further features can be engineered to capture how the audience perceives the content, such as video sentiment and emotions (i.e., how a viewer or set of viewers feel(s) about the video).
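By way of non-limiting illustration, the kind of feature extraction described for FIG. 1 can be sketched as below. The record layout (`author`, `text`, `posted_at`), the regular expressions, and the function name `extract_features` are assumptions made for illustration. The emoji pattern covers only a rough subset of Unicode ranges, and a production system would likely rely on a fuller emoji table and a proper tokenizer.

```python
import re
from collections import Counter
from datetime import datetime

URL_RE = re.compile(r"https?://\S+")
# Rough emoji ranges only; real emoji detection needs a fuller Unicode table.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
WORD_RE = re.compile(r"[a-z']+")


def extract_features(comment: dict) -> dict:
    """Pull name, emoji, URL, keyword, typographic, and timing features from one comment."""
    text = comment["text"]
    text_without_urls = URL_RE.sub(" ", text)
    return {
        "display_name": comment["author"],
        "emojis": EMOJI_RE.findall(text),
        "urls": URL_RE.findall(text),
        "keywords": Counter(WORD_RE.findall(text_without_urls.lower())),
        "exclamations": text.count("!"),  # one simple typographic pattern
        "post_hour": datetime.fromisoformat(comment["posted_at"]).hour,
    }


# Hypothetical comment record.
sample = {
    "author": "Ana",
    "text": "Loved this!! \U0001F60D more at https://example.com/a",
    "posted_at": "2021-07-01T18:30:00",
}
features = extract_features(sample)
```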
[0035] Embodiments then determine features out of this data that capture interactions and consumer details. FIG. 2 demonstrates the transformation of extracted text features into composite numeric feature vectors which can be projected into a multidimensional space and used in the training of an ensemble of feature-based text classification models. These feature-based models include models that, for example, convert a username into an age or gender probability distribution, that transform the text of comments into frequency distributions of keywords, and that extract a time series from users' posted comments. As an ensemble, these gender probability distributions, keyword frequency distributions, and time series analyses can be used together to make predictions about changes in a certain demographic group's beliefs and feelings about popular issues over time. The text classifiers are built (trained) using a variety of datasets, some publicly available and some that have been gathered, and using a combination of manual and algorithmic curation. This algorithmic curation includes a process through which similar data samples can be identified using feature similarity, for example, finding similar comments using similarity in context and writing style. A classification pipeline is constructed using a combination of these datasets and natural language processing-based feature extraction, feature engineering, and feature vectorization methods. This classification pipeline is built to incorporate the full series of model training steps, including data cleaning, streaming, filtering, feature extraction, vectorization, normalization, and dimensionality reduction, followed by supervised learning, evaluation, comparison and testing processes. This pipeline is stored in addition to the serialized versions of each of the classifiers, such that the classifiers can be deployed to serve predictions on demand, and such that the pipeline can be continually retrained and improved as new data become available.
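By way of non-limiting illustration, one simple way to use several feature-specific models together is to take a weighted average of the probability distributions they output. The application does not state how the ensemble combines its models, so the averaging rule, the function name `combine`, the age-band labels, and the weights below are all assumptions made for illustration.

```python
from typing import Dict, List

Distribution = Dict[str, float]


def combine(distributions: List[Distribution], weights: List[float]) -> Distribution:
    """Weighted average of per-model probability distributions, renormalized to sum to 1."""
    if len(distributions) != len(weights) or not distributions:
        raise ValueError("need one weight per distribution")
    labels = sorted({label for dist in distributions for label in dist})
    total_weight = sum(weights)
    combined = {
        label: sum(w * dist.get(label, 0.0) for dist, w in zip(distributions, weights))
        / total_weight
        for label in labels
    }
    norm = sum(combined.values())
    return {label: p / norm for label, p in combined.items()}


# Hypothetical outputs of a username-based model and a keyword-based model.
name_dist = {"18-24": 0.6, "25-34": 0.4}
keyword_dist = {"18-24": 0.2, "25-34": 0.8}
# Hypothetical weights, e.g. reflecting each model's validation accuracy.
ensemble = combine([name_dist, keyword_dist], weights=[1.0, 3.0])
```

With these assumed inputs the keyword-based model dominates, so the combined distribution leans toward the 25-34 band.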
[0036] In aspects and as used herein generally, vectorization may mean, but is not limited to, vectorizing comments or profile descriptions from social media. In general, vectorization may refer to creating a set of words and textual features (e.g., emoji, keyphrases, or timestamps) that appear either in those comments and descriptions or in their metadata; creating a fixed-length array of numbers (like a cipher) that is the same length as the total count of words and textual features across all of the comments and descriptions; and creating a representation of every comment, or of some comments and descriptions, with respect to the array. As used herein, the terms "multidimensional" and "high dimensional" have the meaning understood by one of ordinary skill in the art, referring to the multidimensional or high dimensional vectors used in machine learning, and in some cases the terms are used interchangeably.
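By way of non-limiting illustration, the vectorization described above can be sketched as a simple count-based encoding. The sketch assumes that the "total count of words and textual features" refers to the number of distinct tokens across all comments, and that tokens are obtained by whitespace splitting. Both are assumptions made for illustration, as are the function names `build_vocabulary` and `vectorize`.

```python
from typing import Dict, List


def build_vocabulary(documents: List[str]) -> Dict[str, int]:
    """Assign each distinct token a fixed position in the feature array."""
    vocabulary: Dict[str, int] = {}
    for document in documents:
        for token in document.lower().split():
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def vectorize(document: str, vocabulary: Dict[str, int]) -> List[int]:
    """Represent one document as token counts over the shared vocabulary."""
    vector = [0] * len(vocabulary)
    for token in document.lower().split():
        position = vocabulary.get(token)
        if position is not None:  # tokens unseen at build time are ignored
            vector[position] += 1
    return vector


# Hypothetical comments.
comments = ["great video", "great great song"]
vocabulary = build_vocabulary(comments)
vectors = [vectorize(comment, vocabulary) for comment in comments]
```

Every comment maps to an array of the same length, so the arrays can be stacked into a matrix for downstream model training.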
[0037] Once trained, the classifiers, using computing processing, classify new incoming data into meaningful categories. For example, after a new video is published, comments posted in response to the new video can be aggregated, and predictions can be made about the gender, age, location, ethnicity/race, religious views, political stance, education, income, personality type, and language of the commenters. FIG. 3 shows how a pre-trained ensemble model is used to classify new, unseen data using stored model artifacts that include pipelines for feature extraction, vectorization, and classification. In one embodiment, the results of the classification are used to better inform content creators about their audience, and to automate and supply potential recommendations that can help a content creator or advertiser grow the audience, their reach, and their influence. For example, the classifiers can be used to provide a given Creator X with an algorithmically filtered list of creators who are successfully making similar content, to automatically evaluate Creator X's content relative to the competition, and to automatically adjust the tunable parameters of Creator X's content on the social platform (e.g. hashtags, titles, descriptions) in order to optimize for more views, more positive comments, more shares, etc. The classifiers can also be used to propose new topics and content recommendations to Creator X for content likely to perform well with his or her audience.
[0038] However, embodiments of the present invention go beyond just the textual contexts of videos (e.g. video titles, comments, tags, etc.) in order to derive video intelligence analytics and recommendations, such as by combining textual context with audio-visual content (objects, people, ambient noise, music, etc.) to generate unique predictions about audience affinity and topical relevance. To achieve this, computer-implemented extraction of audio-visual content elements from raw video is performed, as illustrated in FIG. 4. Extracting audio-visual features from raw video content requires using computer processing mechanisms to extract both audio components and visual components from such content. The audio component operates by first separating the audio components from the video components and breaking down the extracted audio into multiple audio features and computer files, as illustrated in FIG. 11. The audio signal is converted to a numerical representation which is based on the length of the signal and the sampling frequency. The numerical audio signal is then passed through computerized filters and transformations to represent mathematically meaningful elements of the audio signal. This begins by reading the audio file and converting the stereo signal to a mono signal. The signal is then further broken down using the sampling frequency, mid-term signal attributes, and short-term signal attributes. Window ratio and step ratio are then extracted from the frequency and signal attributes, and signal normalization is performed. Other information associated with metadata attached to the video and other derived components also plays into this classification process.
Some of the features used in the classification process include explorations with Gammatone Frequency Cepstral Coefficients (which have previously been found useful for the speech category of audio, and a version of which the present invention now applies to other novel categories as well), Mel Frequency Cepstral Coefficients, spectral centroid and spread, spectral entropy, spectral flux, short-term energy and entropy of energy, zero crossing rate, and more. The Gammatone Frequency Cepstral Coefficients feature is constructed by processing the signal with a gammatone filter bank. The resultant signal is passed through a matrix multiplication with a DCT-II matrix of the order of the signal array. These feature extractions are performed for short-term windows across the entire signal. For every window, normalized Fast Fourier Transforms (FFTs) are computed and combined with the other features to produce a single vector for each audio instance. The feature vector then goes through additional mathematical manipulations such as feature normalization. The resultant feature is passed through digitally pre-built classifiers, which use the proprietary input features generated from the audio files as described and carefully selected hyperparameters, resulting in the piece of audio being classified into categories including, for example, music, speech, birds, pets, animals, crowd, beach, water, weather, rain, vehicles, wind, footsteps, gunshots, silence, running, motor, other noise and more, as illustrated in FIG. 12. The audio is then further classified into more detailed subcategories of those listed above, such as music genre, type of musical instrument, gender of speech, speaker identification, speech style, speech transcription, type of vehicle, animal species, and so on. The data used to train the classifier is manually, digitally, automatically, and/or algorithmically cleaned and enhanced to result in a training set that represents each category.
The algorithmic cleaning and enhancing includes removing silent and unwanted portions of an audio instance, using any hand labeled samples to find audio instances that share similar attributes, and using the classifier previously trained on the known audio to classify unseen instances into classes, which aids further segregation of the samples and helps add to the training data set.
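As a non-limiting illustration of the audio steps described above, the following sketch converts a stereo signal to mono, normalizes it, splits it into short-term windows, and computes two of the named features (zero crossing rate and short-term energy) per window. An unnormalized DCT-II is also shown, since a DCT-II multiplication is named in the cepstral step. The gammatone filter bank, the FFT-based features, and the classifiers are not implemented here. All function names, the peak-normalization choice, and the representation of audio as Python lists are assumptions made for illustration only.

```python
import math


def to_mono(stereo):
    """Average the left and right channels: [(l, r), ...] -> [m, ...]."""
    return [(left + right) / 2.0 for left, right in stereo]


def normalize(signal):
    """Scale to peak amplitude 1.0; an all-zero signal is returned as-is."""
    peak = max((abs(s) for s in signal), default=0.0)
    return [s / peak for s in signal] if peak > 0 else list(signal)


def frames(signal, window, step):
    """Yield full short-term windows of `window` samples, hopping by `step`."""
    for start in range(0, len(signal) - window + 1, step):
        yield signal[start:start + window]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_term_energy(frame):
    """Mean squared amplitude of the window."""
    return sum(s * s for s in frame) / len(frame)


def dct_ii(values):
    """Unnormalized DCT-II: X_k = sum_i x_i * cos(pi/N * (i + 1/2) * k)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi / n * (i + 0.5) * k)
            for i, v in enumerate(values))
        for k in range(n)
    ]


def feature_vectors(stereo, window, step):
    """One [zero crossing rate, energy] vector per short-term window."""
    mono = normalize(to_mono(stereo))
    return [
        [zero_crossing_rate(f), short_term_energy(f)]
        for f in frames(mono, window, step)
    ]
```

In a fuller pipeline, the per-window vectors would be extended with the remaining features and normalized before being passed to a classifier.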
[0039] The visual component operates using a computer processing mechanism to analyze each frame image of the video. These image frames are represented in pixel values and ultimately processed into numeric-based features after being passed through a series of processing and filtering steps. For example, images are passed through a process that detects blank sections, crops the image to exclude blank borders, and divides the image into regions. Parts of an image can then be used to perform face detection, where min, max, mean and median color calculations are used to narrow down to specific portions of human faces (eyebrows, nostrils, lips, etc.). The processing steps return features that mathematically represent the useful image details. These features are then passed through pre-built neural network classifiers to recognize the objects observed in the images, such as a chair, a table, a beach, a human, or a cheetah. The visual component also undergoes more processing to detect humans in the images and analyze their faces. This additional processing helps in analyzing such features as the number of humans, genders, diversity, emotion, expression and more. In addition, further processing yields an image background analysis that differentiates between indoor and outdoor settings and performs further classification of the background scene, such as home, theatre, forest and so on. Other image analysis of pixels, shapes and colors also plays into analysis at the video frame level. These per-frame elements are then combined per video to provide an aggregate video-based analysis. The audio-visual analysis components are also combined with other statistical, demographic, sentiment, emotion, and audience interest data, resulting in a more powerful analysis of consumer viewing habits.
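Two of the visual steps described above, cropping blank borders from a frame and combining per-frame results into a per-video aggregate, can be illustrated with the following minimal sketch. It assumes a frame is a list of equal-length rows of grayscale values, that "blank" means a single fixed pixel value, and that object labels per frame have already been produced by some upstream recognizer. The function names and the frame-fraction aggregate are hypothetical choices.

```python
from collections import Counter


def crop_blank_borders(image, blank=0):
    """Drop outer rows and columns whose pixels all equal `blank`.

    `image` is a list of equal-length rows of grayscale pixel values.
    Returns an empty list when the whole image is blank.
    """
    rows = [i for i, row in enumerate(image) if any(p != blank for p in row)]
    if not rows:
        return []
    cols = [
        j for j in range(len(image[0]))
        if any(row[j] != blank for row in image)
    ]
    return [
        row[cols[0]:cols[-1] + 1]
        for row in image[rows[0]:rows[-1] + 1]
    ]


def aggregate_frame_labels(frame_labels):
    """Fraction of frames in which each label appears at least once.

    `frame_labels` holds one list of recognized labels per frame.
    """
    counts = Counter(
        label for labels in frame_labels for label in set(labels)
    )
    return {label: n / len(frame_labels) for label, n in counts.items()}
```

The per-video fractions are one simple way to turn frame-level recognitions into a fixed-length description of a video; other aggregates (counts, durations, co-occurrences) could be substituted.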
[0040] Once extracted, these elements are computationally transformed into numeric feature vectors that can be projected into a multidimensional space and used for downstream modeling as shown in FIGS. 5 and 6. FIG. 5 shows how supervised machine learning is used to build a regression model to predict a video performance score (computed from engagement metrics including views, likes, dislikes, upvotes, downvotes, shares, links, quotes, and comments) from the vectorized, extracted video content features. FIG. 6 shows how supervised machine learning is used to build a classification model to predict the proportion of audience engagement by demographics (computed from the natural language classifier pipeline shown in FIGS. 1-3) from the vectorized, extracted video content features.
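The regression described with reference to FIG. 5 can be illustrated, in a non-limiting way, by the sketch below. It assumes the performance score is a weighted sum of engagement metrics (the weights are hypothetical; the specification does not state a formula) and uses a plain linear model fitted by batch gradient descent as a stand-in for whatever supervised regressor is actually employed. Feature vectors are assumed to be lists of already-scaled numbers.

```python
def performance_score(metrics, weights):
    """Weighted sum of engagement metrics; the weights are hypothetical."""
    return sum(weights[name] * metrics.get(name, 0) for name in weights)


def fit_linear_regression(features, targets, learning_rate=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on mean squared error."""
    n, dim = len(features), len(features[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, targets):
            error = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            grad_b += error
            for j in range(dim):
                grad_w[j] += error * x[j]
        # Gradient of the mean squared error is (2 / n) * accumulated sums.
        bias -= learning_rate * 2 * grad_b / n
        weights = [
            w - learning_rate * 2 * g / n for w, g in zip(weights, grad_w)
        ]
    return weights, bias


def predict(model, x):
    """Predicted performance score for one feature vector."""
    weights, bias = model
    return bias + sum(w * xi for w, xi in zip(weights, x))
```

A usage pattern would be to compute performance_score for each historical video, fit the model on the corresponding content feature vectors, and call predict on the feature vector of a new video. The fixed learning rate assumes features on a comparable, roughly unit scale.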
[0041] By offering an automated computer-implemented method with which to extract and transform raw text, audio, and video into features, embodiments of the present invention also provide a mechanism for comparing any two creators together, generating cross-creator analytics, and offering comparisons within specific peer sets. Peer sets or "cohorts" of creators are formed using unsupervised machine learning, as shown in FIG. 7. Resultant models are stored and reconstituted on demand in deployment to map new creators to existing peer sets.
[0042] The creation of these cohorts is informed by high-level creator statistics (e.g. number of videos made, number of followers or subscribers, and/or daily views) as well as vectorized, extracted video content and language features. FIG. 8 illustrates how unsupervised machine learning is used to find clusters of creators who share similar features in terms of high-level statistics (e.g. number of videos made, number of followers or subscribers, and/or daily views), as well as video content and audience demographics.
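As one non-limiting illustration of forming cohorts by unsupervised learning, the sketch below clusters creator feature vectors with k-means. The specification does not name a particular clustering algorithm, so k-means is an assumption, as are the function names, the deterministic initialization from the first k points, and the fixed iteration count. Feature vectors are assumed to be numeric lists that have already been scaled so that no single statistic (for example, subscriber count) dominates the distance.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20):
    """Cluster `points` into `k` cohorts.

    Returns (centroids, assignments), where assignments[i] is the cohort
    index of points[i]. Centroids are initialized from the first k points
    so that the result is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each creator joins the nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [
                    sum(col) / len(members) for col in zip(*members)
                ]
    return centroids, assignments
```

The returned centroids are the kind of model artifact that could be stored and later reloaded to map new creators to existing peer sets.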
[0043] Cohort clusters are computationally created based on the nature of content produced, statistics, location, audience demographics and audience interests. In aspects, each content creator will have a peer group based on these factors. Therefore, using the cohort models, a new creator can be served recommendations relative to the best practices of relevant successful content creators within their cohort, which can be much more nuanced and specific and reflect unique cohort behavioral patterns. FIG. 9 shows how new creators are mapped to their peer set and served content development, messaging, and marketing recommendations with respect to the most and least successful topics, themes, and strategies within that specific peer set. The practices observed may include content labeling practices such as titles, descriptions, vocabulary words, phrases, tags, and categories; content posting practices such as time of day, day of week, and frequency; and other practices related to content creation and marketing.
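The mapping of a new creator to a cohort, followed by a ranking of practices within that cohort, might be sketched as follows. This is an illustrative assumption rather than the disclosed method: cohorts are represented by stored centroids, a "practice" is represented by a tag attached to a video, and success is measured as the average performance score of the videos carrying that tag. All names are hypothetical.

```python
from collections import defaultdict


def nearest_cohort(vector, centroids):
    """Index of the stored cohort centroid closest to a new creator."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((x - y) ** 2 for x, y in zip(vector, centroids[c])),
    )


def rank_practices(videos):
    """Rank tags by average performance score within one cohort.

    `videos` is a list of (tags, score) pairs for the cohort's videos.
    Returns (tag, average score) pairs, most successful first; ties are
    broken alphabetically by tag.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for tags, score in videos:
        for tag in set(tags):
            totals[tag][0] += score
            totals[tag][1] += 1
    averages = {tag: total / count for tag, (total, count) in totals.items()}
    return sorted(averages.items(), key=lambda item: (-item[1], item[0]))
```

The head of the ranked list corresponds to the most successful practices in the peer set and the tail to the least successful, which matches the two ends of the recommendations described above. The same ranking could be applied to posting hour, day of week, or title vocabulary in place of tags.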
[0044] A nearest neighbor approach is used to find peer sets and the peers are further compared by performance within a cluster. FIG. 10 illustrates how, once mapped to their cohort of most relevant peers, creators explore within their cohort cluster and identify their most similar peers within the cohort.
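A nearest neighbor lookup within a cohort, followed by a comparison of the resulting peers by performance, can be illustrated with the sketch below. Euclidean distance over pre-scaled feature vectors is an assumption, as are the function names and the dictionary-based data layout.

```python
import math


def nearest_peers(target, cohort, n=3):
    """Names of the `n` cohort members most similar to `target`.

    `cohort` maps creator name -> feature vector. Similarity is Euclidean
    distance; ties are broken alphabetically by name.
    """
    distance = {
        name: math.sqrt(sum((x - y) ** 2 for x, y in zip(target, vector)))
        for name, vector in cohort.items()
    }
    return sorted(distance, key=lambda name: (distance[name], name))[:n]


def rank_by_performance(peers, scores):
    """Order the given peers by performance score, highest first."""
    return sorted(peers, key=lambda name: (-scores[name], name))
```

A creator could thus first retrieve the closest peers with nearest_peers and then use rank_by_performance to see which of those peers are performing best.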
[0045] The present disclosure provides for a computer program comprising computer-executable instructions, which when the program is executed by a computer, cause the computer to carry out any of the processes, methods, and/or algorithms according to the above. The computer-executable instructions can be programmed in any suitable programming language, including JavaScript, C, C#, C++, Java, Python, Perl, Ruby, Swift, Visual Basic, and Objective C.
[0046] Also provided herein is a non-transitory computer-readable medium (or media) comprising computer-executable instructions, which when executed by a computer, cause the computer to carry out any of the processes, methods, and/or algorithms according to the above. As used in the context of this specification, a "non-transitory computer-readable medium (or media)" may include any kind of computer memory, including magnetic storage media, optical storage media, nonvolatile memory storage media, and volatile memory. Non-limiting examples of non-transitory computer-readable storage media include floppy disks, magnetic tape, conventional hard disks, CD-ROM, DVD-ROM, BLU-RAY, Flash ROM, memory cards, optical drives, solid state drives, flash drives, erasable programmable read only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), non-volatile ROM, and RAM. The non-transitory computer readable media can include one or more sets of computer-executable instructions for providing an operating system as well as for implementing the processes, methods, and/or algorithms of the invention.
[0047] Embodiments of this disclosure include one or more computers or devices loaded with a set of the computer-executable instructions described herein. The computers or devices may be a general purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a particular machine, such that the one or more computers or devices are instructed and configured to carry out the calculations, processes, steps, operations, algorithms, statistical methods, formulas, or computational routines of this disclosure. The computer or device performing the specified calculations, processes, steps, operations, algorithms, statistical methods, formulas, or computational routines of this disclosure may comprise at least one processing element such as a central processing unit (i.e., processor) and a form of computer-readable memory which may include random-access memory (RAM) or read-only memory (ROM). The computer-executable instructions can be embedded in computer hardware or stored in the computer-readable memory such that the computer or device may be directed to perform one or more of the calculations, steps, processes and operations depicted and/or described herein.
[0048] Additional embodiments of this disclosure comprise a computer system for carrying out the computer-implemented method of this disclosure. The computer system may comprise a processor for executing the computer-executable instructions, one or more electronic databases containing the data or information described herein, an input/output interface or user interface, and a set of instructions (e.g., software) for carrying out the method. The computer system can include a stand-alone computer, such as a desktop computer, a portable computer, such as a tablet, laptop, PDA, or smartphone, or a set of computers connected through a network including a client-server configuration and one or more database servers. The network may use any suitable network protocol, including IP, UDP, or ICMP, and may be any suitable wired or wireless network including any local area network, wide area network, Internet network, telecommunications network, Wi-Fi enabled network, or Bluetooth enabled network. In one embodiment, the computer system comprises a central computer connected to the internet that has the computer-executable instructions stored in memory that is operably connected to an internal electronic database. The central computer may perform the computer-implemented method based on input and commands received from remote computers through the internet. The central computer may effectively serve as a server and the remote computers may serve as client computers such that the server-client relationship is established, and the client computers issue queries or receive output from the server over a network.
[0049] The present invention has been described with reference to particular embodiments having various features. In light of the disclosure provided above, it will be apparent to those skilled in the art that various modifications and variations can be made in the practice of the present invention without departing from the scope or spirit of the invention. One skilled in the art will recognize that the disclosed features may be used singularly, in any combination, or omitted based on the requirements and specifications of a given application or design. When an embodiment refers to "comprising" certain features, it is to be understood that the embodiments can alternatively "consist of" or "consist essentially of" any one or more of the features. Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention.
[0050] It is noted in particular that where a range of values is provided in this specification, each value between the upper and lower limits of that range is also specifically disclosed. The upper and lower limits of these smaller ranges may independently be included or excluded in the range as well. The singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. It is intended that the specification and examples be considered as exemplary in nature and that variations that do not depart from the essence of the invention fall within the scope of the invention. Further, all of the references cited in this disclosure are each individually incorporated by reference herein in their entireties and as such are intended to provide an efficient way of supplementing the enabling disclosure of this invention as well as provide background detailing the level of ordinary skill in the art.