Archive

Archive for July, 2009

Searching For a Number in Shifted Sorted Array within O(log(n)) time

July 29th, 2009 Bali No comments

Run into the algorithm problem long time ago. Now post my answer here. A sorted array, say: {1,2,3,4,5,6,7,8,9,10,11,12}, do right rotate through carry unknown times, and then it might become: {6,7,8,9,10,11,12,1,2,3,4,5}. Now we need get the index of a given number, say 4, from the array within O(log(n)) time. Apparently a 8-year-old can get it done with O(n) time.

We can think of it this way: take the middle element of array, if target is found, fine; if not, and then array become two parts, one is sorted array, the other is shifted sorted array. As illustrated as below diagram:

If the target falls into the sorted array half, we can simple do a binary search; otherwise, repeat this operation in the other half in recursive way. You can see this is divide-and-conquer algorithm. Obviously this is O(log(n)).

Code

//

// A typical binary search implementation

//

int _BinarySearch(unsigned int ShiftedArray[], unsigned int start, unsigned int end, unsigned int target)

{

// Not found

if( start == end && ShiftedArray[start] != target) {

return -1;

}

unsigned int middle = start + (end – start)/2;

if(target == ShiftedArray[middle])

{

return middle;

} else if (target > ShiftedArray[middle]) {

return _BinarySearch(ShiftedArray, middle + 1, end, target);

} else {

return _BinarySearch(ShiftedArray, start, middle – 1, target);

}

}

//

// Select a given number from shifted array.

// ShiftedArray is something like = {6,7,8,9,10,11,12,1,2,3,4,5}

// If found, return index of the number; if not, reutrn -1

// Require log(N)

//

int SearchShiftedArray(unsigned int ShiftedArray[], unsigned int start, unsigned int end, unsigned int target)

{

// Start meets end

if( start == end && ShiftedArray[start] != target) {

return -1;

}

unsigned int middle = start + (end – start)/2;

if(target == ShiftedArray[middle])

{

return middle;

} else if(ShiftedArray[middle] < ShiftedArray[start]) { // Right half is sorted linearly

if((target > ShiftedArray[middle]) && (target <= ShiftedArray[end])) {

return _BinarySearch(ShiftedArray, middle + 1, end, target);

} else {

return SearchShiftedArray(ShiftedArray, start, middle-1, target);

}

} else { // Left half is sorted linearly

if((target >= ShiftedArray[start]) && (target < ShiftedArray[middle])) {

return _BinarySearch(ShiftedArray, start, middle – 1, target);

} else {

return SearchShiftedArray(ShiftedArray, middle + 1, end, target);

}

}

}

Test cases

Positive: {6,7,8,9,10,11,12,1,2,3,4,5}, target = 3, target = 8

Negative: {6,7,8,9,10,11,12,1,2,3,4,5}, target = 0, target = 13

Boundary: {6,7,8,9,10,11,12,1,2,3,4,5}, target = 6, target = 5

Exceptional: {…max}, target = max

One more interesting thing is the statement that “only about 10 percent of the professional programmers implemented binary search correctly.” Do you know why? Check this.

Categories: 一些老文章

To Next Cuil

July 29th, 2009 Bali No comments

Cuil, another so-called Google killer, is at its last gasp. I just knew it. I am not predicting present. Cuil is not the first one, and apparently not the last. For upcoming cuils, here are my words.

Brand. Brand. Brand.

For many people, word of Google has close sentimental connection with bunch of splendid words such as cool, innovation, unselfish, impartial, revolution, and powerful, etc… With brand, Google claims that “People don’t work at Google for the money. They work at Google because they want to change the world!”. With brand, debut of Google’s every new service always arouses buzzes, but seldom notices that Live also has compelling equivalence. With brand, people think only Google can provide best results, but often they can’t tell who is search provider when presented anonymous results set. It is very interesting to take a look at curve of Cuil’s daily unique visitors:

Curl's Daily Unique Visitors

At launch momentum, people rushed to see what this Google killer looks like because of Google’s brand. Ridiculous? Not actually. It is everyone’s inherent attributes as people love to check out events of small probability such as Shoes thrown at Bush, one crazy million-dollar idea. As part of branding strategy, naming is essential. Cuil might not a good name actually. Let me share a story of mine. Back to several years ago, a group of my friends decided to build a website aimed to provide 3rd service for franchising, called JiaMeng in Chinese. The guys with solid academic management background came with the domain name of 51franchise.com. It turned out a real trouble – hard to explain to customers, not localized. Even ordinary college students don’t know the word franchise, not to mention clients with much less schooling. So, ditu.live.com for Chinese is much better than chinamap.live.com if you take a look at average education level of internet users. All in all, BRAND works like religion, and it takes lifetime to build.

Brand -> prouducts

“A Google approach to email” – see how brand helps product marketing.

Infrastructure

GFS. BigTable. MapReduce. They can be competitive advantages. With these put in place, Google can roll out new internet services faster, cheaper, and at scale at few others can compete with. They are designed solely for Internet services. Users quit quickly after dissatisfied performance experience in Cuil. Microsoft software is mainly for an enterprise, supporting 100K concurrent users is “good enough”, but it is far more perfect in internet scenario.


Understand/Repsect Customers


There is no one-size-fits-all solution given the growingly diversified market. Of course you can educate customers, but never expect to change their inherent attributes coming from culture/history/economic development level. If you doubt this claim, check out this article: Search site moves at the speed of China, which reports, “But appreciating such cultural differences is what Baidu.com Inc.’s chief financial officer, Shawn Wang, says gives the Chinese search giant unique insight into the country’s 1.3 billion people as it competes with American rivals such as Google Inc. and Yahoo Inc.” As a result:

Baidu beats Google in China market

Culture

Per Wikipedia, culture means the set of shared attitudes, values, goals, and practices that characterizes an institution, organization or group. Google’s business is built on top of internet, so its organization/knowledge base is built for the internet, just like Microsoft is built for software, mainly enterprise software. I met strong feature PM with deep knowledge needed for enterprise software, say reporting, admin UI, DB admin UI, and information work flow. They understand their customers so much after years of interactions with them. It takes time to accumulate. Top-down hierarchy, heavyweight development process, years of in-house development can hardly catch up with the pace of internet evolution. The same thing is applied to Google – I am equally not optimistic if Google step into enterprise software because of the same reason – culture, enterprise’s DNA.

Web Competition Strategy

What is Cuil’s selling point? (1) Fancy UI. UI is critical for adoption and usage, but it hardly provides a moat. This is provided by two case studies of Apple computer of the nineties and the “X window” system on *nix OS. Both these systems with more attractive UI couldn’t beat windows OS with lower cost and rich applications available. (2) More relevant result. This is an ambiguous area which lacks of widely accepted measure criteria. (3) Cheaper solution. There is a question of sunk cost, of course you can claim you are 1/10 cheaper once reaching Google’s current scale. None of these is compelling from users’ point of view. Why do users bother to go to your site instead? One of the significant differences between web service (say, search) and traditional software business(say, DB) is purchasing decision making process. DB vendors can send to salesmen to target customers’ office and argue the deal. Only quite a few key persons have the final call. They are more analytical, love data. As comparison, everyone can be customers of search, we are more emotional. If I don’t miss anything, looks like the best strategy to monetize Cuil is to be acquired by Google.

No chance to win in search?

Definitely No. But you are doomed to fail if following essential parts are missed:

  1. Remember brand. Remember “winners take all”.
  2. Build your DNA towards internet. DNA = SUM(people, team arch, process, knowledge, …)
  3. Put infrastructure in place. This is the way to help turn your idea into profitable traffic. Not scale-up, scale-out instead.
  4. One thumb rule to compete with dominant market leader
    • Avoid playing games whose rules are set by opponents. You can hardly win. In this case, better search engine defined by Google are faster, relevant results, simple UI, magic algorithm, PB of data, … Let us think of solving same problems with different approaches. Why search? Help explore and share information. If someone tries to solve this problem by following Yahoo’s tail light to build yet another portal, he has little change to take off. Another example is download – P2P technology solved the download problem without adding more expensive servers/bandwidth.
    • Attack opponents’ weak points. Google is designed to search everything, but it may not be good at all vertical industries, say shopping. Nibble at its market share if we can’t win in head-to-head way.
  5. Before rolling up sleeves, why we have to win? Why not step away and go find next big thing? Let Google be Google.
Categories: 一些老文章

Principles for Building Secure Database Applications in Action

July 29th, 2009 Bali No comments

What I am talking about in this post might be well known to many people(too simple, sometimes naive?), but often most basic things make a difference. OK, get down to business. Thumbs rules for DB security might be:

  • Define your security boundary(or attack surface)
  • All input is evil! Evaluate them with whitelist
  • Don’t store blank password, even hard-coded in the source
  • Put DB in a dedicated server and access it with accounts with least privilege
  • Put connection string in registry and read it out from code
  • Use stored procedure
  • The attacker is told nothing
  • Save your resources
  • Specify least assembly permission requirements with attributes

FxCop is obviously a “must-have” for .NET developer, but we have to eliminate complaints one by one. Instead of remembering all “bad behavior” in various tutorials, why not make them our built-in features towards great developers? (if you are still developers, why not much better? ) Let us put most significant principles into simple sample lines of code. Pay special attention to highlighted words.

// <THIS IS UPDATED ON 2/5/2009 PER FEEDBACKS>

using System;

using System.Data;

using System.Data.SqlTypes;

using System.Data.SqlClient;

using System.Security.Principal;

using System.Security.Permissions;

using System.Text.RegularExpressions;

using System.Threading;

using System.Web;

using Microsoft.Win32;

namespace Sample

{

public class SecureDBAppSample

{

[SqlClientPermissionAttribute(SecurityAction.PermitOnly,

AllowBlankPassword = false)] // (1) Blank password is never allowed

[RegistryPermissionAttribute(SecurityAction.PermitOnly,

Read = @"HKEY_LOCAL_MACHINE\SOFTWARE\MyApp")] // (2) Can read only one specific registry key

static string GetName(string Id)

{

string Status = “Name Unknown”;

try

{

// (3) Check for valid shipping ID with white list

// 4-10 digist only, anything else is bad. In most production environment,

// inputs check should be done in attack boundary instead. Of course we can check

// it here for defensive programming efforts

Regex r = new Regex(@”^\d{4,10}$”);

if (!r.Match(Id).Success)

{

throw new Exception(“Invalid ID”);

}

// (8) Shut down connection–even on failure.

using (SqlConnection sqlConn = new SqlConnection(ConnectionString))

{

//Add shipping ID parameter.

// (4) Use a store procedure to hide the application business logic

// in case the code is compromised

string str = “sp_GetName”;

// (8) Release resources–even on failure.

using (SqlCommand cmd = new SqlCommand(str, sqlConn))

{

cmd.CommandType = CommandType.StoredProcedure;

// (5) Use parameters, instead of string concatentation to build the query

// (6) Force the input to be 64 bits integer

cmd.Parameters.Add(“@ID”, Convert.ToInt64(Id));

cmd.Connection.Open();

Status = cmd.ExecuteScalar().ToString();

}

}

}

catch (Exception e)

{

// TODO: For better debugging purpose, we need log the exception with

// something like Logger.Log(e);

// (7) On error, the attacker is told nothing

if (HttpContext.Current.Request.UserHostAddress == “127.0.0.1″)

{

Status = e.ToString();

}

else

{

Status = “Error Processing Request”;

}

}

return Status;

}

//Get connection string.

internal static string ConnectionString

{

get

{

// (9) Store connection string in registry key intead of xml files

return (string)Registry

.LocalMachine

.OpenSubKey(@”SOFTWARE\MyApp\”)

.GetValue(“ConnectionString”);

}

}

}

}

The data in registry key is the connection string.

Data Source=MyDb008; // (10) DB is on remote server.

// Compromised web service does not lead to SQL data access automatically

Integrated Security=SSPI;// (11) Use Windows authentication

Initial Catalog=client

In stead of storing plain text, we can encrypt above connetion string. Keep in mind that I don’t say that they are necessarily the best choice at all times, but many times they are.

Reference: Write Secure Code

Categories: 一些老文章

An AD System to Pay Content Generators

July 29th, 2009 Bali No comments

Back to not too long ago, I had a half-completed advertising idea related to social shopping. Now I post it here to collect more feedbacks. I call it HappyDog. (Just a name, not related to that DogFood widely used within Microsoft J)

Problems

As everybody knows, ‘YOU’ is named Time’s person of 2006 for the growth and influence of user-generated content on the internet.  Why? In Wikipedia’s words:

“… chose the millions of anonymous contributors of user-generated content to Wikipedia, YouTube, MySpace, Facebook, Digg, Second Life, the Linux Operating System, and other providers, as Person of the Year, personified simply as You.”

But on the other hand, if we carefully think of people’s motives of generating contents we can easily find that people do this mostly out of curiosity, self-achievement or volunteerism. Problem #1: Per basic economy principles, these efforts can hardly stand long. Content quality is another downside in such circumstances.

Another awkward problem around current most successful business model, online ad, is the fact that people get tired of spammed AD when they use certain online service for free. Problem #2: But interesting enough, people often have trouble in making the right purchasing decision given even exposed to so much AD, probably because: (1) AD timing is not good, (2) they don’t trust the publisher, (3) AD is not carefully targeted. This looks pretty twisted, doesn’t?

At a glance

Taking above problems into consideration, HappyDog is an innovative advertising system which provides following unique values:

  • For general content generator – get cash paid by sharing your experience with the world, even if you does not own a website.
  • For shopper – make your shopping decision more smartly
  • For business, either online or offline – market your product more directly to the most potential customers

Let us take a look at below diagram first:

By content generators, it could be anyone(professors, housewives, kids,…) who contribute anything(answers, e-books, songs, tutorials, …) in any format(wiki, blog post, video,…), including but not limited to wiki & QA.

When people produce contents, HappyDog helps insert contextual AD into the content. The big difference from existing advertizing system is that content generators gain real revenue in HappyDog system. The revenue coming from advertisers would be divided into several parts: 60% to content generator, 30% to site owner, 10% to HD. HD takes the smallest piece of pie, but apparently we will gain most amount of them, because there is only single HD, 1*M site owners, 1*M*N content generators. In addtion to real $s, the incentives here can also be points/happiness, etc. The point here is to encourage ones by certain ways. But the reason why we highlight monetary incentives is the belief that this makes something long-awaited possible. Let us think about this: Sites like Wikipedia, IMDB, or Amazon, have tons of high quality content that have been contributed freely, but why only them? Because They are cheap, handy offerings to the community. How about an book written online, sold online? Few will do this free mainly because of its costly nature. HD makes it possible though. Another interesting thing is how wikipedia-clones are going in China. They never take off. It might be cultural reasons related to general finance status.

Monetary incentives can help imporve quality of content because one gets financial penalty due to “thumb down” when we put a rating system in place.

HD helps people do better purchasing decision in that HD AD is more personal, and people are encouraged to advertise the products they sincerely love and have first hand experience on. We will address this in detail next. One will be concerned to play dirty because one also has to care about its repuation/credit disclosed by rating system.

Demo

Take Yahoo answers as example, an working page from publisher side would be:

P0 features in 1st Milestone

In order to pay content generator effectively, following features are essential in M0:

(1)    Users have the rights to select their preferred AD – When you answer questions in forums; you actually care your reputation in same way you deliver a public speech. So you also care the AD itself along with your content. Another reason supporting users selectable AD is that people might like to show certain products more persuasively via real personal experience. In short, we prefer “recommend your favorites to friends” scenario.

(2)    PPA(Pay Per Action) as major pricing model along with PPC – Advantages of PPA: no click fraud, advertisers’ preferences.

(3)    AD type – signature text, picture, inner-text popup, mini-cast

(4)    Support offline mode – Although internet users reach 0.8 billion statistically, but not every business has a website. This is especially true when people surf the net mainly for entertainment instead of business. HappyDog aims to benefit this kind of business via so called offline mode. We will take about this later.

(5)    An open platform – Open API to foster a strong ecosystem around HappyDog.

How It Works

The conceptual architect for HappyDog might be looking like follows:

In real implementation, HappyDog doesn’t depend on any specific technology platform. Take general open source platform as example, related technology could be:

•       Client

–      HTML

–      Use Javascript/Flash extensively

•       Servlet container

–      Tomcat/Jboss

–      Spring, OR structs

•       JAS(Java Application Server)

–      Jboss

–      Hibernate

•       File Server

–      Store static data, say photos, html, pic, js

•       DB

–      MySQL

Its key use cases would be:

Buyers can be publishers or unregistered users. By online, it means the transactions whose completion can be confirmed online, say online order, user registration, complete one survey, software download, etc. By contrast, offline means the remaining trade types, such as haircare, restaurant, face-to-face trading.

PPA Implementation

PPA, as one of the core features of HappyDog System, comes with two flavors: online and offline, as illustrated in following sequence diagrams.

Online-PPA

The steps here are:

(1)    User browses and click the publisher site link somehow

(2)    Browser sends request to publisher

(3)    Publisher response the corresponding content along with Javascript inside

(4)    Browser starts rendering the page, and then the Javascript in Browser call the HD to get AD

(5)    HD returns content with targeted AD

(6)    Brower completes rendering

(7)    User continues browsing and click one of our AD

(8)    Brower follow the link and send the request to HD along with the needed parameters

(9)    HD do several things here:

a)        Write “who is publisher, when, advertiser, campaign, etc” into the DB and return with a TRANSACTION_ID

b)        Write TRANSACTION_ID into user’s browser cookie

c)         Redirect user to advertiser website

(10)Advertiser returns the commercial pages to user browser

(11)Just show it

(12)User is interested in something in advertiser’s web site and fills in form(sales order, lead, signup, etc) and submit

(13)Submit user inputs to advertiser

(14)Advertiser will do below things:

a)        Do anything necessary to close the deal

b)        Read user’s browser to get TRANSACTION_ID

c)         Call beacon code the confirm the transaction with HD with TRANSACTION_ID as parameter

(15)Charge advertiser and share revenue with publisher

To take part in AD promotion of this advertiser, user must give his comments about this shopping experience.

Offline-PPA

Offline mode is bit more complex.

(1)    User browses and clicks the publisher site link somehow

(2)    Browser sends request to publisher

(3)    Publisher response the corresponding content along with Javascript inside

(4)    Browser starts rendering the page, and then the Javascript in Browser call the HD to get AD

(5)    HD returns content with targeted AD

(6)    Brower completes rendering

(7)    User continues browsing and click one of our AD

(8)    Brower follow the link and send the request to HD along with the needed parameters

(9)    HD do several things here:

a)        Write “who is publisher, when, advertiser, campaign, etc” into the DB and return with a TRANSACTION_ID

b)        Give user the advertiser’s profile and coupon which contains TRANSACTION_ID, advertiser name, promotion campaign.

(10)Just show it

(11)User is interested in advertiser’s  promotion program and decide to print the coupon out

(12)Go to HD

(13)Record this as a successful transaction in DB and wait for confirmation(TTL is 2weeks)

(14)User go to the advertiser with coupon to enjoy the service or product(say haircare, dinner, etc), and go back with CONFIRMATION_CODE

(15)Provide service and a CONFIRMATION_CODE

(16)User logon to HD, submit TRANSACTION_ID and CONFIRMATION_CODE

(17)After validation, give user further kickback

A series of CONFIRMATION_CODE are issued to advertiser by HD while he/she enrolls the program. Additionally, in step(15), alternatively advertiser could logon to HD and confirm the transaction.

Open Questions

There are also several problems deserving consideration at current stage:

1.       How to get advertisers? Alimama? Google AdSense is such a close system, hard to extend.

2.       Looks like current publishing place are not ready for HappyDog. They often filter AD content out. How to solve this?

3.       How to share revenue in Wiki scenario? Say 50 people contribute to a wiki page, how to distribute revenue among them?

4.       Is the money too little to stimulate enthusiasm among high quality content contributors who are usually payed very well in regular job?

5.       How to protect content, for example 1st tutorial about jailbreaking iPhone? Share revenue?

This post is mainly for feedback collection purpose. Feel free to Pai Zhuan. :-)

Categories: 一些老文章

Words from potential ex-customers

July 29th, 2009 Bali No comments

It is only customers who pay dollars as your revenue and enable your read this post comfortably at your office. Almost every company considers “Customer-oriented” as one of their key values although they can be expressed in various ways. It is “Passion for customers, partners, and technology” in Microsoft. This post is about two real personal experiences around customers, or ex-customers.

#1: My mother’s mobile phone

My over-55-year-old mother is of the old school, short-sighted. She has nearly zero knowledge about computer, menu, input method or any that sort of high-tech things. One day she realizes needs of contacting us from time to time in her way and decides to grab a mobile phone. Typical user scenarios of her interest can be summarized in descending order as follows:

P0:

  • Power On/Off – for power saving
  • Receive a phone call
  • Call her contacts(<10)
  • Set/Close alarm – She need prepare breakfast for family every morning
  • Handle recharging easily – when to charge, when to start/complete

P1(Features):

  • Larger UI elements – It is not convenient for her to bring glasses always
  • Louder sound – Her hearing is aging

And then I go ahead to bring her one of my used feature-rich phones with enough confidence. After helping input her contacts with concerns whether she can ever figure out where to start, I start demo-ing how to use the phone. Problems arise from the very beginning. To power-on, the phone provides two approaches:

(1)    Hit a green button, wait for a second, and then hit another button to confirm

(2)    Alternatively, hold the green button for 3 seconds

My mother finds she is in trouble telling “hit” vs. “hold”, and it is especially hard for her to judge when the startup itself is completed, or whether she hit the button heavily enough. She also complains that the button is too many for her and keyboard is totally useless for her. UI controls are over-crowded to hardly figure out what is going on in the screen… Similar things happen in other scenarios. This makes me think several things seriously.

(1)    Is feature-rich always good thing for customers? People(especially software makers) tend to add more and more features to their products version by version, and then charge customers more because of this. But in reality, this may result in confusion or even troubles sometimes.

(2)    Are customers kidnapped? Feature designers always think their product should be used in this way, or in that way. But they can never imagine that for everyone. Providing enough customization-ability is probably a good practice. Customers should have the call on their click length.

(3)    The elderly mobile market is ignored? I search for a phone dedicated to the elderly, but to my surprise there is no such type.  Because kids and women have much bigger spending power? Can Windows Mobile easily hit this market niche?

#2: Medical checking report

A friend of mine was sent to the hospital after two days of Diarrhea. The doctor made a chemical examination of his excrements with the suspicious of certain virus. When we got the report, one highlighted message shows:

Rotavirus –> Negative

I am totally confused. Negative? What should I do about it? And then I had to turn to a doctor in the front desk and had the conversation as follows.

Me: “Execute me, could you let me know what it means by ‘Negative’?”

Doctor: “It means no reaction for this virus.”

Me: “But what does ‘no reaction’ mean here?”

Doctor: “There is no such virus found in the examination. In other words, your friend looks OK from the report.”

Software industry is not alone around such kind of customer interactions. This reminds me of two thumb rules:

(1)  Don’t assume customers are as knowledgeable you/your study/your expectation. They can be lack of most basic things in your own area.

(2) User customers words. Customers care about their own problems more, and don’t care how you help solve them that much.

Categories: 一些老文章