After graduating from UIUC, I joined the Big Data team in iFlytek's Cloud Platform department.
The main function of the department is to develop and maintain the platforms behind iFlytek's cloud services. These platforms support billion-scale online voice-recognition services, a computation platform for big data analysis, and the computation platform for iFlytek's in-house online advertising services. The voice-recognition and voice-translation services carry extremely heavy loads: almost every voice-related service in China uses the iFlytek API, and iFlytek's voice input method held 50% of the Chinese cell phone market. The challenge facing the department was tremendous.
Working at iFlytek was a great opportunity. With the help of friendly colleagues and my team leader, Mr. Liu, I was able to learn quickly. I mastered Scala, YARN/HDFS, and Spark within a short time. They not only taught me how to use these tools but also gave me a thorough understanding of the underlying mechanisms of RDDs and YARN/HDFS.
Although computation resources were never sufficient, my team leader encouraged me to run experimental computations. This freedom is extremely valuable for any data-science work. Sometimes an algorithm fails after running for a whole day, either from an out-of-memory (OOM) error or a node failure, and that is deeply frustrating. However, testing the limits of the system and the data, adjusting the algorithm, and making reasonable compromises have always been the essence of big data work. After one month of handling business requests such as user-profile analysis and ad-auction log analysis, I had figured out how to optimize Spark code for maximum efficiency. Eventually, I could finish Spark jobs in very limited time, generate visualizations, and deliver analysis reports.
Some tricks I learned are especially useful in an industry-driven environment: for example, writing memory-safe classes that are serializable, reducing communication between nodes, caching reusable RDDs before an action or a wide dependency, and partitioning data according to the available computation resources. This hands-on experience was critical and was key to completing business requests successfully.
I was extremely grateful for the resources and support my team offered me. With them, I could learn and experiment under pressure while receiving proper guidance. In my second month, I compiled a thorough study on resource management. Covering three levels of computational complexity and nine different sources of TB-scale data, the study provided a method for finding the minimum resources needed to run a given Spark job. With it, my colleagues could predict the amount of resources a job needed before running it, without requesting excess resources from the cluster. This report completed my first major project and saved the platform a great deal of precious computation resources.
Not long before I started working at iFlytek, the company had launched its online advertising business. Three new platforms were established: a Data Management Platform (DMP), an Ad Exchange platform, and a Demand-Side Platform (DSP). Our team was loaded with business requests of all kinds, such as mining target audiences, evaluating ad campaign results, adjusting bidding and matching algorithms, and updating DMP labels. These tasks usually had very short life spans and required immediate action. Customized ad campaigns involving short-window discount offers were the most stressful. They usually arrived on short notice and demanded no more than three hours of processing time before going online. The computation job could not fail, or the order would expire, and the strategy had to be thoughtful, or the campaign results would not satisfy the client. Under such heavy pressure, multitasking and focus were essential. Working overtime was also common in the office, especially when running heavy jobs. I spent countless weekends and vacations in the office debugging complex data-mining algorithms and running heavy jobs, because weekends are when the cluster is mostly free and vacations are the rush season for sales campaigns. This hands-on experience in online advertising was truly life-changing for me. I am proud that during my time in the position I never failed or turned down a campaign, and most of them delivered strong business conversion.
Like any demand-side platform, iFlytek DSP faced a serious problem: click fraud. Non-Human Traffic and Deceitful Human Traffic from media sources were constantly undermining the business, and just as iFlytek's ad business was earning reputation and revenue for the company, three of the team's best engineers and I were assigned to the anti-fraud project "Vine". Vine was prompted by an absurd anomaly. In mid-May 2016, one week after a new group of media sources was connected, one of them reported an astonishing 80% click-through rate (CTR). Although that media source was banned immediately, the incident was an urgent signal that an anti-fraud system had to be implemented at once. After two weeks of intensive work, the first version of Vine went online. It was a single-layer volume firewall. The second version added IP and hardware-info firewall features, and third and fourth generations followed. During Vine's test phase, a critical challenge was to design a testing strategy that let us monitor its performance without affecting live traffic until the next phase. Drawing on my statistics background, I designed an A/B test that solved the problem. The test split a campaign into two separate orders. Whenever traffic came in, Vine decided whether it went to A or B: if Vine would block it, it went to A; otherwise, it went to B. This let us monitor and compare the key metrics of A and B without sacrificing any audience until we were certain Vine was ready to kick in. It also amplified the differences in conversion rate, which made adjusting strategies easier. In the end, Vine proved a great success, and it soon attracted more clients to the platform because the anti-fraud feature safeguarded their money.
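The A/B routing rule can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in: the `Impression` record, the `vine_blocks` predicate, and the single blocked-IP rule do not reproduce Vine's actual filters (volume, IP, and hardware info). It shows only the routing idea: traffic the filter would block is diverted to order A, the rest goes to order B, and both orders keep serving so their CTRs can be compared.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Impression:
    # Hypothetical minimal record of one ad impression.
    ip: str
    clicked: bool


def split_orders(
    traffic: Iterable[Impression],
    vine_blocks: Callable[[Impression], bool],
) -> Tuple[List[Impression], List[Impression]]:
    """Route traffic the filter WOULD block to order A and everything else
    to order B. Both orders are still served, so no audience is lost while
    the filter is being evaluated."""
    order_a: List[Impression] = []
    order_b: List[Impression] = []
    for imp in traffic:
        (order_a if vine_blocks(imp) else order_b).append(imp)
    return order_a, order_b


def ctr(order: List[Impression]) -> float:
    """Click-through rate of one order (0.0 for an empty order)."""
    return sum(imp.clicked for imp in order) / len(order) if order else 0.0


# Hypothetical stand-in for Vine's rules: flag one known-bad IP.
blocked_ips = {"10.0.0.1"}
traffic = [
    Impression("10.0.0.1", True),
    Impression("10.0.0.1", True),
    Impression("10.0.0.1", False),
    Impression("1.2.3.4", False),
    Impression("5.6.7.8", True),
    Impression("9.9.9.9", False),
]
order_a, order_b = split_orders(traffic, lambda imp: imp.ip in blocked_ips)
print(f"A: {len(order_a)} impressions, CTR {ctr(order_a):.2f}")
print(f"B: {len(order_b)} impressions, CTR {ctr(order_b):.2f}")
```

In this toy data, order A shows a CTR of 0.67 against 0.33 for order B. Because the suspicious traffic sits in its own order, its abnormal CTR is not diluted by clean traffic, which is one way to read the observation that the split amplified the differences between the two groups.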
The Vine project had a profound impact on me. On August 24, I resigned from my position to pursue graduate education at U of R. During my master's studies, a good portion of my research projects built on fraud detection and continued the work of Vine, especially the VisualDX project.